Sure, you're not going to get anything close to a Claude Code style agent from a local model (unless you shell out $10,000+ for a 512GB Mac Studio or similar).
This post isn't about building Claude Code - it's about hooking up an LLM to one or two tool calls in order to run something like ping. For an educational exercise like that a model like Qwen 4B should still be sufficient.
The expectation that reasonable people have isn't fully local Claude Code; that's a strawman. But it's also not ping tools or the simple weather agent that tutorials like to use. It's somewhere in between, isn't that obvious? If you're into evangelism, acknowledging this and actually taking a measured stance would help prevent light skeptics from turning into complete AI-deniers. If you mislead people about one thing, they'll assume they're being misled about everything.
https://fly.io/blog/everyone-write-an-agent/ is a tutorial about writing a simple "agent" - aka a thing that uses an LLM to call tools in a loop - that can make a simple tool call. The complaint I was responding to here was that there's no point trying this if you don't want to be hooked on expensive APIs. I think this is one of the areas where the existence of tiny but capable local models is relevant - especially for AI skeptics who refuse to engage with this technology at all if it means spending money with companies they don't like.
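For reference, the entire pattern that post teaches - call the model, run whatever tool it asks for, feed the result back, repeat - fits in a few dozen lines. Here's a minimal sketch, assuming the OpenAI Python client; the `run_ping` helper and the model name are placeholders of mine, and pointing the client at a local server swaps in a local model:

```python
# Minimal "tool in a loop" agent: ask the model, run whatever tool it
# requests, feed the result back, repeat until it answers in plain text.
# Assumes the OpenAI Python client; run_ping and the model name are
# illustrative placeholders.
import json
import subprocess

from openai import OpenAI

client = OpenAI()  # point base_url at a local server to use a local model


def run_ping(hostname: str) -> str:
    """Ping a host once and return the raw command output."""
    result = subprocess.run(
        ["ping", "-c", "1", hostname],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout or result.stderr


tools = [{
    "type": "function",
    "function": {
        "name": "run_ping",
        "description": "Ping a hostname and return the output",
        "parameters": {
            "type": "object",
            "properties": {"hostname": {"type": "string"}},
            "required": ["hostname"],
        },
    },
}]

messages = [{"role": "user", "content": "Is example.com reachable?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",  # any tool-capable model works here
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    messages.append(message)
    if not message.tool_calls:
        print(message.content)
        break
    for call in message.tool_calls:
        # Only one tool is registered, so dispatch is trivial.
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_ping(**args),
        })
```

That's the whole "agent": the loop keeps going until the model stops asking for tools.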
I think it is misleading to suggest today that tool-calling for nontrivial stuff really works with local models. It only works in demos because those tools always accept one or two arguments, usually string literals or numbers.
In the real world, functions take more complex arguments, many arguments, or a single argument that's an object with multiple attributes. You can begin to work around this by passing function signatures, typing details, and JSON schemas to set expectations in context, but local models tend to fail at handling this kind of thing long before you ever hit limits in the context window. There's a reason demos always use one string literal like a hostname, or two floats like lat/long. It's normal for passing a dictionary with a few strict requirements to take 300 retries instead of 3 before you get a tool call that's syntactically correct with properly formed arguments. `ping --help` alone shows me about 20 options, and any attempt to map something with that many args 1:1 would start to break down pretty quickly.
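To make that contrast concrete, here's roughly the gap in schema complexity I'm describing - the one-string tool every demo uses next to a tool whose arguments are nested objects with required fields (the second schema is invented purely for illustration):

```python
# What demos use: a tool with one required string argument.
ping_tool = {
    "type": "function",
    "function": {
        "name": "run_ping",
        "description": "Ping a hostname",
        "parameters": {
            "type": "object",
            "properties": {"hostname": {"type": "string"}},
            "required": ["hostname"],
        },
    },
}

# What real integrations look like: arguments that are themselves objects
# with nested, typed, required fields. (Field names invented for illustration.)
create_alert_tool = {
    "type": "function",
    "function": {
        "name": "create_alert",
        "description": "Create a monitoring alert",
        "parameters": {
            "type": "object",
            "properties": {
                "target": {
                    "type": "object",
                    "properties": {
                        "hostname": {"type": "string"},
                        "port": {"type": "integer"},
                        "protocol": {"type": "string", "enum": ["icmp", "tcp", "http"]},
                    },
                    "required": ["hostname", "protocol"],
                },
                "thresholds": {
                    "type": "object",
                    "properties": {
                        "latency_ms": {"type": "number"},
                        "packet_loss_pct": {"type": "number"},
                    },
                    "required": ["latency_ms"],
                },
                "notify": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["target", "thresholds"],
        },
    },
}
```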
Zooming in on the details is fun but doesn't change the shape of what I was saying before. No need to muddy the water; anything beyond very, very simple stuff still requires very big local hardware or a SOTA model.
You and I clearly have a different idea of what "very very simple stuff" involves.
Even the small models are very capable of stringing together a short sequence of simple tool calls these days - and if you have 32GB of RAM (e.g. a ~$1,500 laptop) you can run models like gpt-oss:20b, which are capable of operating tools like bash in a reasonably useful way.
This wasn't true even six months ago - the local models released in 2025 have almost all had tool calling specially trained into them.
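If you want to try it yourself, the same loop from the sketch above works against a local model. A minimal variant, assuming Ollama's OpenAI-compatible endpoint on localhost (the bash tool definition here is illustrative):

```python
# Same loop as the sketch above; only the client and the tool change.
# Assumes Ollama's OpenAI-compatible endpoint on localhost - Ollama
# ignores the API key value, but the client requires one.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

bash_tool = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its output",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "How much free disk space is there?"}],
    tools=[bash_tool],
)
# With a tool-trained local model this should come back as a bash tool
# call (something like `df -h`) rather than a plain-text guess.
print(response.choices[0].message.tool_calls)
```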
You mean like a demo for simple stuff? Something like hello world type tasks? The small models you mentioned earlier are incapable of doing anything genuinely useful for daily use. The few tasks they can handle are easier and faster to just write yourself with the added assurance that no mistakes will be made.
I’d love to have small local models capable of using tools the way current SOTA models do, but the reality is that small models are still incapable of it, and hardly anyone has a machine powerful enough to run the 1-trillion-parameter Kimi model.
Yes, I mean a demo for simple stuff. This whole conversation is attached to an article about building the simplest possible tool-in-a-loop agent as a learning exercise for how they work.