
AArch64 came from AArch32. That's why it keeps things like condition codes, which are a big mistake for large out-of-order implementations. RISC-V sensibly avoids this by having condition-and-branch instructions instead. Otherwise, RISC-V is conservative because it tries to avoid possibly encumbered techniques. But other than that it's remarkably simple and elegant.


> That's why it keeps things like condition codes, which are a big mistake for large out-of-order implementations. RISC-V sensibly avoids this by having condition-and-branch instructions instead.

Respectfully, the statement in question is partially erroneous and, in far greater measure, profoundly misleading. A distortion draped in fragments of truth remains a falsehood nonetheless.

Whilst AArch64 does retain condition flags, it is not simply because of «AArch32 stretched to 64-bit», and condition codes are not a «big mistake» for large out-of-order (OoO) cores. AArch64 also provides compare-and-branch forms similar to RISC-V, so the contrast given is a false dichotomy.

Namely:

  – «AArch64 came from AArch32» – historically AArch64 was a fresh ARMv8-A ISA design that removed many AArch32 features. It has kept flags, but discarded pervasive per-instruction predication and redesigned much of the encoding and register model;

  – «Flags are a big mistake for large OoO» – global flags do create extra dependencies, yet modern cores (x86 and ARM) eliminate most of the cost with techniques such as flag renaming, out-of-order flag generation and using instruction forms that avoid setting flags when unnecessary. The existence of high-IPC x86 and ARM cores built this way shows that flags are not an inherent limiter;

  – «RISC-V avoids this by having condition-and-branch» – AArch64 also has condition-and-branch style forms that do not use flags, for example:

  1) CBZ/CBNZ xN, label – compare register to zero and branch;

  2) TBZ/TBNZ xN, #bit, label – test bit and branch.

Compilers freely choose between these and flag-based sequences, depending on what is already available and the code/data flow. Also, many arithmetic operations do not set flags unless explicitly requested, which reduces false flag dependencies.
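
To make the distinction concrete, here is a rough C sketch of three source patterns. The function names are made up for illustration, and the instructions named in the comments are only what an AArch64 compiler might plausibly pick; actual code generation depends on the compiler, the optimization level and the surrounding code, and an optimizer may well remove a branch entirely.

```c
#include <stdint.h>

/* Comparison against zero: a candidate for CBZ/CBNZ, no NZCV involved.
   (Hypothetical helper, not taken from any real code base.) */
int64_t first_nonzero(const int64_t *p, int n) {
    for (int i = 0; i < n; i++) {
        if (p[i] != 0)          /* might become: cbnz xN, label */
            return p[i];
    }
    return 0;
}

/* Single-bit test: a candidate for TBZ/TBNZ, again without touching flags. */
int has_bit5(uint64_t x) {
    if (x & (1ull << 5))        /* might become: tbnz wN, #5, label */
        return 1;
    return 0;
}

/* General two-register comparison: typically a flag-setting compare
   followed by a conditional branch or a conditional select. */
int64_t max_i64(int64_t a, int64_t b) {
    if (a < b)                  /* might become: cmp x0, x1 ; csel ..., lt */
        return b;
    return a;
}
```

Only the last pattern really needs NZCV; the first two can be expressed as a single fused test-and-branch.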

Lastly, but not least importantly, Apple’s big cores are among the widest, deepest out-of-order designs in production, with very high IPC and excellent branch handling. Their microarchitectures and toolchains make effective use of:

  – Flag-free branches where convenient – CBZ/CBNZ, TBZ/TBNZ (see above);

  – Flag-setting only when it is free or beneficial – ADDS/SUBS feeding a conditional branch or CSEL;

  – Advanced renaming – including flag renaming – which removes most practical downsides of a global NZCV.
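
For readers less familiar with NZCV, the following is a minimal software model in C of what "flag-setting only when requested" means. It is a sketch of the architectural semantics of SUB versus SUBS as I understand them, not of how any core implements them; the type and function names (nzcv_t, subs64, cond_lt and so on) are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct for this sketch: the four AArch64 condition flags. */
typedef struct { bool n, z, c, v; } nzcv_t;

/* Model of SUB: produces the difference and leaves the flags alone,
   so it creates no dependency on (or for) NZCV. */
uint64_t sub64(uint64_t a, uint64_t b) { return a - b; }

/* Model of SUBS (CMP is SUBS with the result discarded):
   same difference, but NZCV is also written. */
uint64_t subs64(uint64_t a, uint64_t b, nzcv_t *f) {
    uint64_t r = a - b;
    f->n = (r >> 63) & 1;                     /* result negative */
    f->z = (r == 0);                          /* result zero */
    f->c = (a >= b);                          /* carry set means no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 63) & 1;   /* signed overflow */
    return r;
}

/* A few condition codes as consumed by B.cond or CSEL. */
bool cond_eq(nzcv_t f) { return f.z; }
bool cond_lt(nzcv_t f) { return f.n != f.v; } /* signed less-than */
bool cond_lo(nzcv_t f) { return !f.c; }       /* unsigned lower */

/* Model of CSEL: pick a if the condition holds, else b. */
uint64_t csel64(bool cond, uint64_t a, uint64_t b) { return cond ? a : b; }
```

In this model one SUBS can feed several consumers (a branch and a CSEL, say), which is the "free or beneficial" case; everything else can use the plain SUB form and never touch the flags.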


[flagged]


You are, of course, most welcome to offer your contributions – whether in debate or in contestation of the points I have raised – beyond the hollow reverberations of yet another LLM echo chamber.

The information I used to contest the original statement comes from the AArch64 ISA documentation as well as from the infamous «M1 Explainer (070)» publication, namely sections titled «Theory of a modern OoO machine» and «How Do “set flags” Instructions, Like ADDS, Modify the History File?».


Thanks for the link to that article, by the way! I missed a lot of the “ephemeral literature” that was being passed around when M1 was first released and we were collectively trying to understand it.


Yeah the problem with having flags is demonstrated by multiple very high performance implementations of arm64 and x86, while risc-v has exactly zero.


The time in which you will be able to truthfully say that is very rapidly coming to an end.


RVA23 hopefully.

It looks a lot like Zeno's paradox of RISC-V implementation.


I wish this were true, but we are more than a year away from a consumer RISC-V chip that can beat my Intel N150 mini PC.


That will be amazing when it happens, and a year is VERY soon!

Tenstorrent's first "Atlantis" Ascalon dev board is going to have a similar µarch to Apple M1 but run at a lower clock speed. All 8 cores are "performance" cores, so it should be in the N150 ballpark single-core and soundly beat it multi-core.

They are currently saying Q2 2026, which is only 4-7 months from now.


How are you defining "large"? Apple seems to do pretty well with the M-series.


AFAIR, AArch64 was basically designed by Apple for their A-series iPhone processors, and pushed to be the official ARM standard. Those guys really knew what they were doing and it shows.


It's clear that Arm worked with Apple on AArch64 but saying it was basically designed 'by Apple' rather than 'with Apple' is demonstrably unfair to the Arm team who have decades of experience in ISA design.

If Apple didn't need Arm then they would have probably found a way of going it alone.


Apple helped develop Arm originally and was a (very) early user with Newton. Why would they go it alone when they already had a large amount of history and familiarity available?


Sorry, Apple didn’t help to develop ARM originally. They were an early investor and customer of Advanced RISC Machines when it was spun out of Acorn.


RISC-V’s variable instruction length (since compression is required to have decent density) is a bigger problem for wide designs.

Not insurmountable, as evidenced by recent AMDs. But still a limitation.
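
A small C sketch of why this matters for a wide front end: with the compressed extension, where an instruction starts depends on the length of the one before it. The length rule below (low two bits not equal to 11 means 16-bit) is the standard RISC-V encoding; the function names are invented here, and the sketch deliberately ignores the reserved longer-than-32-bit formats.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of a RISC-V instruction, given its first 16-bit parcel.
   Returns 0 for the reserved longer encodings, which this sketch skips. */
size_t rv_insn_len(uint16_t parcel) {
    if ((parcel & 0x3) != 0x3)
        return 2;               /* compressed (C extension) */
    if ((parcel & 0x1c) != 0x1c)
        return 4;               /* standard 32-bit encoding */
    return 0;                   /* not handled in this sketch */
}

/* Count the instructions in a buffer of 16-bit parcels. Note the serial
   dependency: the start of instruction k+1 is unknown until the length
   of instruction k has been decoded. */
size_t rv_count_insns(const uint16_t *parcels, size_t n) {
    size_t i = 0, count = 0;
    while (i < n) {
        size_t len = rv_insn_len(parcels[i]);
        if (len == 0 || i + len / 2 > n)
            break;              /* unsupported or truncated instruction */
        i += len / 2;
        count++;
    }
    return count;
}
```

Hardware does not walk the stream one instruction at a time like this loop; it has to guess or pre-compute boundaries for many parcels in parallel, which is extra work a fixed 32-bit encoding such as AArch64 avoids.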



