zeusk's comments | Hacker News

Get the DGX Spark computers? They’re exactly what you’re trying to build.

They’re very slow.

They're okay, generally, but slow for the price. You're paying more for the ConnectX-7 networking than for inference performance.

Yeah, I wouldn’t complain if one dropped in my lap, but they’re not at the top of my list for inference hardware.

Although... Is it possible to pair a fast GPU with one? Right now my inference setup for large MoE LLMs has shared experts in system memory, with KV cache and dense parts on a GPU, and a Spark would do a better job of handling the experts than my PC, if only it could talk to a fast GPU.

[edit] Oof, I forgot these have only 128GB of RAM. I take it all back, I still don’t find them compelling.


The TB5 link (even with RDMA) is much slower than direct access to system memory.


Nvidia has been investing in confidential compute for inference workloads in the cloud - that covers physical ownership/attacks in their threat model.

https://www.nvidia.com/en-us/data-center/solutions/confident...

https://developer.nvidia.com/blog/protecting-sensitive-data-...


It's likely I'm mistaken about details here, but I _think_ tee.fail bypassed this technology, and the AT article covers exactly that.


> And yet, if I go to Youtube or just about any other modern site, it takes literally a minute to load and render, none of the UI elements are responsive, and the site is unusable for playing videos. Why? I'm not asking for anything the hardware isn't capable of doing.

But the website and web renderer are definitely not optimized for a netbook from 2010 - even modern smartphones are better at rendering pages and video than your Atom (or even 8350U) machines.


> even modern smartphones are better

That's an understatement if I've ever seen one! For web rendering, single-threaded performance is what matters most, and smartphones have gotten crazy good single-core performance these days. The latest iPhones have faster single-core performance than even most laptops.


Yes, but the parent comment definitely implied they weren't talking about people running the latest and greatest out there. Even mid-range smartphones today are leaps and bounds better than a 2010 Atom.


What do you mean by that? Most syscalls are still interrupt based.


x86-64 introduced a `syscall` instruction to allow syscalls with lower overhead than going through interrupts. I don't know of any reason to prefer `int 80h` over `syscall` when the latter is available. For documentation, see for example https://www.felixcloutier.com/x86/syscall
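
For concreteness, here is a minimal sketch of my own (assuming x86-64 Linux and GCC/Clang inline asm, not something from the linked reference) of invoking write(2) directly through the `syscall` instruction:

    // x86-64 Linux convention: rax = syscall number (1 = write),
    // arguments in rdi, rsi, rdx; `syscall` clobbers rcx and r11.
    #include <cstddef>

    static long sys_write(int fd, const void *buf, std::size_t len) {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(1L), "D"(static_cast<long>(fd)), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;
    }

    int main() {
        sys_write(1, "hello via syscall\n", 18);
        return 0;
    }

The same call through `int 80h` would have to use the 32-bit ABI (different syscall numbers, arguments in ebx/ecx/edx, pointers truncated to 32 bits), which is another reason to avoid it in 64-bit code.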


While AMD's syscall and Intel's sysenter can provide much higher performance than the old "int" instruction, both syscall and sysenter have been designed very badly, as Linus himself has explained in many places. It is extremely easy to use them in ways that do not work correctly, because of subtle bugs.

It is actually quite puzzling why both the Intel designers and the AMD designers were so incompetent in specifying a "syscall" instruction, when well-designed instructions of this kind had been included in many other CPU ISAs for decades.

When not using an established operating system, where the "syscall" implementation has been tested for many years and hopefully all the bugs have been shaken out, there may be a reason to use the "int" instruction to transition into privileged mode, because it is relatively foolproof and requires a minimal amount of handling code (a rough sketch of the extra setup that "syscall" needs is further down).

Now Intel has specified FRED, a new mechanism for handling interrupts, exceptions and system calls, which does not have any of the defects of "int", "syscall" and "sysenter".

The first CPU implementing FRED should be Intel Panther Lake, to be launched by the end of this year, but surprisingly, when Intel recently gave a presentation about Panther Lake, not a word was said about FRED, even though it is expected to be the greatest innovation of Panther Lake.

I hope that the Panther Lake implementation of FRED is not buggy, which could force Intel to disable it and postpone its introduction to a future CPU, as they have done many times in the past. For instance, the "sysenter" instruction was intended to be introduced in the Pentium Pro, by the end of 1995, but because of bugs it was disabled and left undocumented until the Pentium II, in mid-1997, where it finally worked.
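
To put a rough size on the earlier point about setup cost: before `syscall` works at all, the kernel has to program several MSRs and supply an entry stub that switches stacks and preserves rcx/r11 by hand, while the "int" path only needs an IDT gate. A bare-metal sketch of my own (MSR numbers from the AMD64 manuals; `enable_syscall` and `syscall_entry` are hypothetical names):

    #include <cstdint>

    constexpr std::uint32_t MSR_EFER   = 0xC0000080;  // bit 0 = SCE (syscall enable)
    constexpr std::uint32_t MSR_STAR   = 0xC0000081;  // kernel/user segment selectors
    constexpr std::uint32_t MSR_LSTAR  = 0xC0000082;  // 64-bit syscall entry point
    constexpr std::uint32_t MSR_SFMASK = 0xC0000084;  // rflags bits cleared on entry

    static inline std::uint64_t rdmsr(std::uint32_t msr) {
        std::uint32_t lo, hi;
        __asm__ volatile ("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }

    static inline void wrmsr(std::uint32_t msr, std::uint64_t val) {
        __asm__ volatile ("wrmsr" :: "c"(msr),
                          "a"(static_cast<std::uint32_t>(val)),
                          "d"(static_cast<std::uint32_t>(val >> 32)));
    }

    // Assembly stub (not shown): it runs with the *user* stack still in rsp, so it
    // must switch stacks itself and keep rcx/r11 intact for sysret -- exactly the
    // kind of subtlety where the bugs mentioned above tend to hide.
    extern "C" void syscall_entry();

    void enable_syscall(std::uint16_t kernel_cs, std::uint16_t user_cs_base) {
        wrmsr(MSR_LSTAR, reinterpret_cast<std::uint64_t>(&syscall_entry));
        wrmsr(MSR_STAR, (static_cast<std::uint64_t>(user_cs_base) << 48) |
                        (static_cast<std::uint64_t>(kernel_cs) << 32));
        wrmsr(MSR_SFMASK, 1u << 9);             // mask IF: enter with interrupts off
        wrmsr(MSR_EFER, rdmsr(MSR_EFER) | 1u);  // set EFER.SCE
    }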


32-bit x86 also has sysenter/sysexit.


Only on Intel. AMD has had its own "syscall" instead of Intel's "sysenter" since the K6 CPU, so x86-64 inherited that.

AMD's "syscall" corrects some defects of Intel's "sysenter", but unfortunately it introduces some new defects.

Details can be found in the Linux documentation, in comments by Linus Torvalds about the use of these instructions in the kernel.


Double-buffering a 4K framebuffer at 4 bytes per pixel is already about 64 MB by itself (3840 × 2160 × 4 bytes × 2 buffers ≈ 63 MiB).


> same hardware as the higher end models but needs a firmware bit flip

Is this firmware bit flip known? I couldn't find anything on Google.



This is incredible. I had no idea there was a Homebrew Channel!


AUM is not theirs to keep, and market cap is a very misleading metric, especially for banks, where liabilities dwarf the market cap.


My point is that it's a minor transaction for them.


Well, so is the Snapdragon X Elite, including the older Snapdragons (anyone remember the Scorpion cores in the QSD8x50?).


The real issue with packed structs and bitfields shows up when concurrency gets involved. Most modern CPU caches are private per core, and the coherence protocol lets only one core hold a cache line in a writable state at a time - so packing creates more false sharing when different cores try to modify data that was compacted into the same cache line.


Avoiding false sharing is a separate problem best solved by explicitly aligning the struct or relevant members to std::hardware_destructive_interference_size.
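
A minimal sketch of my own of what that looks like (std::hardware_destructive_interference_size lives in <new> since C++17, though some toolchains still don't define it):

    #include <atomic>
    #include <cstdint>
    #include <new>  // std::hardware_destructive_interference_size

    struct Counters {
        // Without the alignas, these two hot counters could be packed into one
        // cache line, and two cores incrementing them concurrently would bounce
        // that line between their private caches (false sharing).
        alignas(std::hardware_destructive_interference_size)
            std::atomic<std::uint64_t> reads{0};
        alignas(std::hardware_destructive_interference_size)
            std::atomic<std::uint64_t> writes{0};
    };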

