Yeah, I wouldn’t complain if one dropped in my lap, but they’re not at the top of my list for inference hardware.
Although... Is it possible to pair a fast GPU with one? Right now my inference setup for large MoE LLMs has shared experts in system memory, with KV cache and dense parts on a GPU, and a Spark would do a better job of handling the experts than my PC, if only it could talk to a fast GPU.
[edit] Oof, I forgot these have only 128GB of RAM. I take it all back, I still don’t find them compelling.
> And yet, if I go to Youtube or just about any other modern site, it takes literally a minute to load and render, none of the UI elements are responsive, and the site is unusable for playing videos. Why? I'm not asking for anything the hardware isn't capable of doing.
But the website and web renderer are definitely not optimized for a netbook from 2010; even modern smartphones are better at rendering pages and video than your Atom (or even 8350U) machines.
That's an understatement if I've ever seen one! For web rendering, single-threaded performance is what matters most, and smartphones have gotten crazy good single-core performance these days. The latest iPhone has faster single-core performance than most laptops.
Yes, but the parent comment clearly implied they weren't talking about people running the latest and greatest. Even mid-range smartphones today are leaps and bounds better than an Atom from 2010.
x86-64 introduced a `syscall` instruction to allow system calls with lower overhead than going through interrupts. I don't know of any reason to prefer `int 80h` over `syscall` when the latter is available. For documentation, see for example https://www.felixcloutier.com/x86/syscall
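For anyone curious what this looks like in practice, here is a minimal sketch of calling the kernel directly through `syscall` on x86-64 Linux using GCC/Clang inline asm (the `raw_write` wrapper name is just for illustration; the register convention shown is the standard Linux one):

```cpp
#include <cstddef>

// Sketch: invoke Linux write(2) directly via the x86-64 `syscall` instruction.
// Kernel convention: syscall number in rax, arguments in rdi/rsi/rdx;
// the instruction itself clobbers rcx and r11.
static long raw_write(int fd, const void* buf, std::size_t len) {
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1L /* __NR_write on x86-64 */), "D"(fd), "S"(buf), "d"(len)
                 : "rcx", "r11", "memory");
    return ret;  // >= 0: bytes written, negative: -errno
}

int main() {
    const char msg[] = "hello from syscall\n";
    raw_write(1 /* stdout */, msg, sizeof msg - 1);
}
```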
While AMD's syscall and Intel's sysenter can provide much higher performance than the old "int" instruction, both were designed very badly, as Linus himself has explained in many places. It is extremely easy to use them in ways that do not work correctly, because of subtle bugs.
It is actually quite puzzling why both the Intel and the AMD designers were so incompetent in specifying a "syscall" instruction, when well-designed instructions of this kind had been included in many other CPU ISAs for decades.
Outside an established operating system, where the "syscall" handling has been tested for many years and hopefully all bugs have been removed, there may be a reason to use the "int" instruction to transition into privileged mode: it is relatively foolproof and it requires a minimum amount of handling code.
Now Intel has specified FRED, a new mechanism for handling interrupts, exceptions and system calls, which does not have any of the defects of "int", "syscall" and "sysenter".
The first CPU implementing FRED should be Intel Panther Lake, due to launch by the end of this year. Surprisingly, in Intel's recent presentation on Panther Lake not a word was said about FRED, even though it is expected to be the greatest innovation in Panther Lake.
I hope the Panther Lake implementation of FRED is not buggy, which could force Intel to disable it and postpone its introduction to a future CPU, as they have done many times in the past. For instance, the "sysenter" instruction was intended to debut in the Pentium Pro at the end of 1995, but because of bugs it was disabled and left undocumented until the Pentium II in mid-1997, where it finally worked.
The real issue with packed structs and bitfields shows up when concurrency gets involved. Most modern CPU caches are private, and only one core can hold a cache line for writing at a time, so packing creates more false sharing when cores try to modify data that has been compacted into a cache line another core is using.
Avoiding false sharing is a separate problem best solved by explicitly aligning the struct or relevant members to `std::hardware_destructive_interference_size`.
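For context, a minimal sketch of that alignment trick (C++17; the `Counters` struct and the loop counts are made up for illustration):

```cpp
#include <atomic>
#include <new>      // std::hardware_destructive_interference_size (C++17)
#include <thread>

// Two counters hammered by two different threads. Without the alignas, both
// atomics would likely share one cache line and ping-pong between cores
// (false sharing); with it, each counter gets its own line.
struct Counters {
    alignas(std::hardware_destructive_interference_size) std::atomic<long> a{0};
    alignas(std::hardware_destructive_interference_size) std::atomic<long> b{0};
};

int main() {
    Counters c;
    std::thread t1([&] { for (int i = 0; i < 10'000'000; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 10'000'000; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
}
```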