c-c-c-c-c's favorites | Hacker News


jcul on July 4, 2024 | on: Do not taunt happy fun branch predictor (2023)

I'm not super familiar with ARM / ARM64 assembly and was confused as to how x0 was incremented. Was going to ask here, but decided to not be lazy and just look it up.

`const float f = *data++;`
`ldr s1, [x0], #4`

Turns out this instruction loads and increments x0 by 4 at the same time. It looks like you can use negative values too, so you could iterate over something in reverse. Kind of cool, I don't think x86_64 has a single instruction that can load and increment in one go.

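To make the load-and-increment pattern concrete, here is a minimal C sketch of loops built around `*data++`. The function names `sum_forward` and `sum_reverse` are hypothetical, chosen for illustration only. Whether a compiler actually emits the post-indexed `ldr s1, [x0], #4` (or a pre-indexed load with a negative offset for the reverse walk) depends on the compiler, target, and optimization level, so the assembly in the comments is an assumption, not guaranteed output.

```c
#include <stddef.h>

/* Forward walk: load *data, then advance the pointer by one float (4 bytes). */
float sum_forward(const float *data, size_t n) {
    float total = 0.0f;
    while (n--) {
        /* On AArch64 a compiler may turn this into: ldr s1, [x0], #4 */
        const float f = *data++;
        total += f;
    }
    return total;
}

/* Reverse walk: start one past the end and step backwards before each load. */
float sum_reverse(const float *data, size_t n) {
    float total = 0.0f;
    const float *p = data + n;
    while (n--) {
        /* Decrement-then-load; a compiler may use a pre-indexed form
           such as: ldr s1, [x0, #-4]! */
        total += *--p;
    }
    return total;
}
```

Calling either function on `{1.0f, 2.0f, 3.0f, 4.0f}` with `n = 4` walks all four elements. The reverse version decrements before loading so the pointer never moves below the start of the array.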
dragontamer on Sept 26, 2022 | on: Zen4's AVX512 Teardown

Excellent Teardown by "Mysticial" from mersenneforum.org. Cliffnotes:

* Zen4 AVX512 is mostly double-pumped: native 256-bit hardware that processes the two halves of the 512-bit register.

* No throttling observed.

* 512-bit shuffle pipeline (!!). A powerful exception to the "double-pumping" found in most other AVX512 instructions.

* AMD seemingly handles the AVX512 mask registers better than Intel.

* Gather/Scatter is slow on AMD's Zen4 implementation.

* Intel's 512-bit native load/store unit has clear advantages over AMD's 256-bit load/store unit when reading/writing to L1 cache and beyond.
