eblanshey's comments | Hacker News

What hardware can you buy for $5k to be able to run K2? That's a huge model.


This older HN thread shows R1 running on a ~$2k box using ~512 GB of system RAM, no GPU, at ~3.5-4.25 TPS: https://news.ycombinator.com/item?id=42897205

If you scale that setup and add a couple of used RTX 3090s with heavy memory offloading, you can technically run something in the K2 class.
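
If you want a feel for the hybrid CPU/GPU route, here's a minimal sketch using the llama-cpp-python bindings. The model filename, layer split, and thread count are placeholders, not a tested K2 config; you'd need a quantised GGUF that actually fits your RAM:

    # Hedged sketch: partial GPU offload with llama-cpp-python.
    # model_path, n_gpu_layers, and n_threads are placeholders to tune.
    from llama_cpp import Llama

    llm = Llama(
        model_path="kimi-k2-q4_k_m.gguf",  # hypothetical quantised GGUF
        n_ctx=8192,        # context window; larger costs more RAM
        n_gpu_layers=20,   # layers offloaded to the GPUs; the rest stay in system RAM
        n_threads=32,      # CPU threads for the layers left on the host
    )

    out = llm("Explain memory offloading in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])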


Is 4 TPS actually useful for anything?

That's around 350,000 tokens in a day (4 TPS × 86,400 s ≈ 346k). I don't track my Claude/Codex usage, but Kilocode with the free Grok model does, and I'm using between 3.3M and 50M tokens a day there (plus additional usage in Claude + Codex + Mistral Vibe + Amp Coder).

I'm trying to imagine a use case where I'd want this. Maybe running some small coding task overnight? But it just doesn't seem very useful.


3.3-50M tokens a day? What are you doing with all those tokens?

Yesterday I asked Claude to write one function. I didn't ask it to do anything else because it wouldn't have been helpful.


Here’s my own stats, for comparison: https://news.ycombinator.com/item?id=46216192

Essentially migrating codebases and implementing features, plus all the reading of existing code, test writing, and automation scripting needed to make sure the code changes are okay. Over 95% of those tokens are reads, since there's often a need for a lot of consistency and iteration.

It works pretty well if you’re not limited by a tight budget.


https://github.com/nlothian/Vibe-Prolog chews a lot of tokens.

Have a bunch of other side projects as well as my day job.

It's pretty easy to get through lots of tokens.


I only run small models (a 70B model on my hardware gets me around 10-20 TPS), and just for random things (personal-assistant kind of stuff), not for coding tasks.

For coding-related tasks I consume 30-80M tokens per day, and I want something as fast as it gets.


Stop recommending 3090s; they're all but obsolete now. Not having native bf16 is a showstopper.


Hard disagree. The difference in performance is not something you'll notice if you actually use these cards. In AI benchmarks, the RTX 3090 beats the RTX 4080 SUPER despite the latter having native BF16 support; memory bandwidth (936 GiB/s on the 3090 vs 736 GiB/s on the 4080) plays a major role. Additionally, the 3090 is the last NVIDIA consumer card to support SLI.

It's also unbeatable in price-to-performance, as the next best 24 GiB card would be the 4090, which, even used, is almost triple the price these days while only offering about 25-30% more performance in real-world AI workloads.

You can basically get an SLI-linked dual 3090 setup for less money than a single used 4090 and get about the same or even more performance and double the available VRAM.
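
A crude back-of-envelope (my own sketch, not a benchmark) shows why bandwidth dominates here: single-stream decode has to stream the full set of weights per token, so tokens/s is roughly capped at bandwidth divided by model size:

    # Rough upper bound on single-stream decode speed, assuming the whole
    # model is read from VRAM once per token. Absolute numbers are
    # optimistic; the ratio between the cards is the point.
    def tps_ceiling(bandwidth_gb_s, model_size_gb):
        return bandwidth_gb_s / model_size_gb

    model_gb = 13  # hypothetical ~13 GB quantised model
    for card, bw in [("RTX 3090", 936), ("RTX 4080 SUPER", 736)]:
        print(f"{card}: <= {tps_ceiling(bw, model_gb):.0f} tok/s")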


If you run fp32, maybe, but no sane person does that. The tensor performance of the 3090 is also abysmal. If you run bf16 or fp8, stay away from obsolete cards. It's barely usable for LLMs and borderline garbage-tier for video and image gen.


Actual benchmarks show otherwise.

> The tensor performance of the 3090 is also abysmal.

I for one compared my 50-series card's performance to my 3090's and didn't see "abysmal performance" on the older card at all. In fact, in actual real-world use (quantised models only; no one runs big fp32 models locally), the difference in performance isn't very noticeable at all. But I'm sure you'll be able to provide actual numbers (TTFT, TPS) to prove me wrong. I don't use diffusion models, so there might be a substantial difference there (I doubt it, though), but for LLMs I can tell you for a fact that you're just wrong.


To be clear, we are not discussing small toy models, though to be fair I also don't use consumer cards. Benchmarks are out there (Phoronix, RunPod, Hugging Face, or Nvidia's own presentations), and they show at least 2x at high precision and nearly 4x at low precision, which is comparable to the uplift I see on my 6000 cards. If you don't see the performance uplift everyone else sees, there is something wrong with your setup, and I don't have the time to debug it.


> To be clear, we are not discussing small toy models, though to be fair I also don't use consumer cards.

> If you don't see the performance uplift everyone else sees, there is something wrong with your setup, and I don't have the time to debug it.

Read these two statements and think about what might be the issue. I only run what you call "toy models" (good enough for my purposes), so of course your experience is fundamentally different from mine. Spending 5 figures on hardware just to run models locally is usually a bad investment. Repurposing old hardware OTOH is just fine to play with local models and optimise them for specific applications and workflows.


Even with something like a 5090, I’d still run Q4_K_S/Q4_K_M because they’re far more resource-efficient for inference.

Also, the 3090 supports NVLink, which is actually more useful for inference speed than native BF16 support.

Maybe bf16 matters if you're training?
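
Rough math on why the Q4 quants are the pragmatic default (my back-of-envelope; the K-quant bit width is an approximate average): weight memory scales linearly with bits per weight, so Q4_K_M is roughly a third the size of bf16.

    # Approximate weight memory for a 70B-parameter model at different
    # precisions; Q4_K_M averages out to roughly 4.8 bits per weight.
    params = 70e9
    for name, bits in [("bf16", 16), ("fp8", 8), ("Q4_K_M", 4.8)]:
        print(f"{name:>7}: ~{params * bits / 8 / 1e9:.0f} GB of weights")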


That's a smart thing to do, considering a 5090 has native tensor cores for 4-bit precision...


OTOH I installed it on my elderly mother's computer, and she said that it did everything that MS Office could do. She's perfectly happy with it.


I do pretty much this; a rough sketch of the commands follows the list below.

- /home/me -- 660 default permissions

- /home/me/project -- owner "claude", add "me" to "claude" group

- symlink /home/claude/project to /home/me/project

- Run claude in different user session

- Use CTRL+ALT+F2 to immediately switch between my main and claude sessions. Claude continues working in the background.
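
Here's that setup as a hedged script: the user, group, and path names mirror the list above, while the useradd/usermod flags are assumptions for a Debian-like system (note that a directory usually needs the execute bit to be traversable, so 770 may serve better than 660):

    # Sketch of the setup above via subprocess; run as root. Flags are
    # assumptions for a Debian-like system, not a tested recipe.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run("useradd", "--create-home", "claude")   # dedicated user for the agent
    run("usermod", "-aG", "claude", "me")       # add "me" to the "claude" group
    run("chmod", "660", "/home/me")             # as in the list; dirs usually need +x too
    run("chown", "-R", "claude:claude", "/home/me/project")
    run("ln", "-s", "/home/me/project", "/home/claude/project")
    # Then switch to the "claude" session (Ctrl+Alt+F2) and run the agent there.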


Like others have mentioned, Blender has become quite the successful open-source story. They used to be riddled with bugs and UX issues, much like FreeCAD was. Yesterday FreeCAD released v1 of their software, and they seem to be on the same redemption path as Blender. It's too bad their v1 release didn't gain much traction on here, as more people ought to give FreeCAD another whirl. The improvements there are massive. And it's the only proper parametric CAD software available on Linux.


To what extent can Blender replace FreeCAD for mechanical engineering purposes?


I would say avoid it. Blender is an excellent MESH modeler, but that puts it fundamentally at odds with being a good parametric modeler. A parametric modeler's base primitives are based on deformations of solid objects; mesh modelers are just vertices connected by line segments, where three form a face. Serviceable if you're just doing simple objects for a 3D printer, but disastrous if you need precision.


I don't understand why precision would be an issue? Is it not possible to fix the position of vertices to sub-micron precision?

I know that Blender is used more in the movie industry. But what if I wanted to make, say, an animation of some cartoon character that gets shredded in a gearbox? What program would I use?


A curve in a parametric CAD program has an internal representation which is perfectly smooth: rather than being a set of straight lines (edges) connected by vertices, it is a mathematical description of a curve with infinite resolution.

For your animation example Blender would be the appropriate tool to use as you are doing stuff that requires flexibility of form rather than precision.
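
To put a number on the resolution point, a toy calculation (mine, not from the thread): the worst-case gap between a true circle and a regular N-gon mesh of it only shrinks as you add segments, while the parametric form has no error at all.

    # Toy illustration: max deviation (the sagitta) between a circle and a
    # regular N-gon approximation of it. A parametric curve has zero error.
    import math

    def max_deviation(radius, n_segments):
        return radius * (1 - math.cos(math.pi / n_segments))

    for n in (16, 64, 256, 1024):
        print(f"{n:5d} segments: max error = {max_deviation(1.0, n):.2e} x radius")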


Aha, so it is a bit like bitmap versus vector graphics in 2d painting programs.


Yeah, somewhat. There's also the thing where mesh models can potentially have no thickness (e.g. a single polygon), as well as gaps in the mesh, whereas this is (usually) impossible in the case of a parametric model.


Rigging. The assembly bits in FreeCAD just haven't been great historically, and the Ondsel assembly layer is very new. If you want to visually check for clashes I can see how someone might prefer to just import a bunch of STLs into Blender, rig them up, and wiggle them about.


Why? They’re fundamentally different applications for different purposes.


I was thinking that since Blender has physics simulation, and it also has nice video renderings, that would be two great reasons to use it for mechanical designs with moving parts, for example.

But I don't have much experience in designing parts. I like SolveSpace, but it becomes slow for medium/large designs. I know FreeCAD has a lot of problems with stability and UI consistency, so I avoided it.


FreeCAD has more rigorous simulation features - FEM/FEA, mechanical assembly, CAM path generation and simulation, and robotics to name a few - out of the box which makes sense as it’s for engineering rather than art, and there are additional addons for CFD and sheet metal available among many others.

The recent 1.0 update brought some major UI/UX improvements, though if you're coming from other software you'll find the Ribbon addon extremely helpful for getting comfortable. I think it gets a lot of over-the-top criticism, given there are more people working on just the Autodesk CAD kernel than on the entirety of FreeCAD and its dependencies. The rate of improvement is gradually accelerating, and it's already a big jump from where it was a few years ago.


So Linux now officially supports RTOS capabilities, without patches, which is pretty cool. I wonder, realistically, how many applications that were originally designed to use microcontrollers for real-time purposes can be migrated to Linux, which vastly simplifies development and lowers its cost. And the ability to use high-level languages like Python significantly lowers the barrier to entry. Obviously certain applications require the speed of an MCU without an operating system, but how many projects really don't need dedicated MCUs?
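
As a taste of what that looks like, here's a hedged sketch of a 1 ms loop pinned to SCHED_FIFO via Python's os module. It needs root or CAP_SYS_NICE, the priority of 50 is arbitrary, and Python's GC still adds jitter, so treat it as a demo rather than a hard-RT recipe:

    # Sketch: a 1 ms periodic loop under SCHED_FIFO on a PREEMPT_RT kernel.
    # Requires root/CAP_SYS_NICE; priority 50 is an arbitrary choice.
    import os, time

    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(50))

    period_ns = 1_000_000            # 1 ms period
    deadline = time.monotonic_ns()
    worst = 0
    for _ in range(5000):            # ~5 seconds of loop
        deadline += period_ns
        while time.monotonic_ns() < deadline:
            pass                     # busy-wait; real code would sleep to the deadline
        worst = max(worst, time.monotonic_ns() - deadline)
    print(f"worst-case overshoot: {worst} ns")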


Unfortunately, migrating real-time stuff to Linux _doesn't_ necessarily reduce costs or simplify real-time development. I've been doing embedded development for 5+ years at a few companies, and embedded Linux is still a slog. I prefer a good MCU running Nim or another modern language. Heck, there's even MicroPython nowadays.

Especially for anything that needs to "just run" for multiple years. Linux means you must deal with a distro or something like Yocto or Buildroot, both of which have major pain points.


I would think the portability of, say, a Python application running on Linux is a nice benefit. Try switching from one MCU to a totally different one and you may have to start from scratch (e.g. try going from Microchip to STM.) Can you describe why embedded Linux is still a slog? And what do you think it would take for the issues to be addressed?


I thought we were talking about real-time applications, which I'm not sure Python is suited for (even with GC tuning). But if we're talking about the difficulty of changing MCU families (remember, STM32 is >1000 different chips), changing OS is also difficult; even moving from Yocto to Buildroot can be a lot of pain on Linux.


Doesn't Micropython already get you 95% of the way towards just running the same Python code on multiple MCUs?


I'm not sure, I've never used it. But I think the issue is that the number of MCUs that support micropython is very small.


MicroPython supports[1] PIC16 and SAMD21/SAMD51, STM32, ESP8266/ESP32 and more, but it also supports Zephyr as a target, and with it the platforms Zephyr supports[2].

So yeah not everything under the sun, but certainly not what I'd consider a "very small" number of MCUs.

Of course, support level varies among the platforms, but you're not going to be doing too fancy things in MicroPython I imagine.

[1]: https://github.com/micropython/micropython?tab=readme-ov-fil...

[2]: https://docs.zephyrproject.org/latest/boards/index.html
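
For what it's worth, the portability argument looks like this in practice: in the sketch below only the pin ID is board-specific (an integer on ESP32/RP2, names like "PA5" on the STM32 port), while the machine API stays the same across ports.

    # MicroPython sketch: the common machine API works across ports.
    # Pin 2 is a placeholder; it's the onboard LED on many ESP32 boards.
    from machine import Pin
    import time

    led = Pin(2, Pin.OUT)
    while True:
        led.value(not led.value())   # toggle the LED
        time.sleep(0.5)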


I think there's still a wide range of devices for which a bare-metal or small-RTOS approach is more cost-effective: anything simple enough that it doesn't need networking, a filesystem, or a display, for example. Especially considering bare-metal embedded work seems to pay less than development on Linux. But yes, embedded Linux can address a huge part of the market, and RT expands that a lot (though, of course, most people for whom that is a good option are already using it; it was a well-supported patchset for a long time).


I'd buy a Framework laptop just for that.


How long before AI crawlers start picking this up and start including it in their training data?


Let's hope so.


An old-time hang glider pilot has been keeping his website up-to-date since the very early 2000s, complete with rotating gifs, hidden SEO keywords, index pages, photo albums, and more goodies!


Wow, the state of self-hosted photo libraries has gotten way better since I last settled on Photoprism a few years ago. Both Memories and Immich seem very polished. The timeline features look great. I may need to play around with these.


Most mobile bank websites have 100% functionality already, including depositing checks. The bank app isn't needed.

