I’ve found that experienced devs use agentic coding in a more “hands-on” way than beginners and pure vibe-coders.
Vibecoders are the best because they push the models in humorous and unexpected ways.
Junior devs are like “I automated the deploy process via an agent and this markdown file”
Seasoned devs will spend more time writing the prompt for a bug fix, or lazily paste the error and then make the 1-line change themselves.
The current crop of LLMs is more powerful than any of these use cases, and it’s exciting to see experienced devs start to figure that out (I’m not stanning Gas Town[0], but it’s a glimpse of the potential).
[0] https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Partially related: I really dislike the vibe of Gas Town, both the post and the tool. I really hope this isn't what the future looks like; it just feels disappointing.
To be fair, the author says: "Do not use Gas Town."
I started "fully vibecoding" 6 months ago, on a side-project, just to see if it was possible.
It was painful. The models kept breaking existing functionality, overcomplicating things, and generally just making spaghetti ("You're absolutely right! There are 4 helpers across 3 files that have overlapping logic").
A combination of adjusting my process (read: context management) and the models getting better has led me to prefer "fully vibecoding" for all new side-projects.
Note: I still read the code that gets merged for my "real" work, but it's no longer difficult for me to imagine a future where that's not the case.
I have noticed in just the past two weeks or so, a lot of the naysayers have changed their tunes. I expect over the next 2 months there will be another sea change as the network effect and new frameworks kick in.
No. If anything we are getting "new" models but hardly any improvements. Things are "improving" on scores, rankings, and whatever other metrics the AI industry has invented, but nothing is really materializing in real work.
I think we have crossed the chasm and the pragmatists have adopted these tools because they are actually useful now. They've thrown out a lot of their previously held principles and norms to do so and I doubt the more conservative crowd will be so quick to compromise.
2 years sounds more likely than 2 months since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
Curious what fidelity/precision the author finds necessary with Claude 4.5 Opus/GPT 5.2.
Looking at the screenshot of "Tracked Issues", it seems many of the "tasks" are likely overlapping in terms of code locality.
Based on my own experience, I've found the current crop of models to work well at a slightly higher level of complexity than the tasks listed there, and they often benefit from having a shared context vs. when I've tried to parallelize down to that level of work (individual schema changes/helper creation/etc.).
Maybe I'm still just unclear on the inner workings, but it's my understanding each of those tasks is passed to Claude Code and developed separately?
In either case, I think this project is a glimpse into the future of software development (albeit with a grungy desert punk tinted lens).
For context, I've been "full vibe-coding"[0] for the past 6 months, and though it started painfully, the models are now good enough that not reading the code isn't much of an issue anymore.
> why can't there be an LLM that would always give the exact same output for the exact same input
LLMs are inherently deterministic, but LLM providers add randomness through “temperature” and random seeds.
Without the random seed and variable randomness (temperature setting), LLMs will always produce the same output for the same input.
Of course, the context you pass to the LLM also affects the determinism in a production system.
Theoretically, with a detailed enough spec, the LLM would produce the same output, regardless of temp/seed.
Side note: A neat trick to force more “random” output for prompts (when temperature isn’t variable enough) is to add some “noise” data to the input (i.e. off-topic data that the LLM “ignores” in its response).
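As a rough illustration of the knobs being discussed, here is a minimal sketch using the OpenAI Python client (the model name and prompt are just placeholders); it asks for as much reproducibility as the API exposes, though as the replies below point out, this still isn't a hard guarantee:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Summarize RFC 2119 in one sentence."}],
        temperature=0,  # always take the highest-probability token
        seed=42,        # best-effort reproducibility hint, not a guarantee
    )
    print(resp.choices[0].message.content)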
No, setting the temperature to zero is still going to yield different results. One might think they add random seeds, but that makes no sense for temperature zero. One theory is that the distributed nature of their systems adds entropy and thus produces different results each time.
Random seeds might be a thing, but from what I see there's a lot of demand for reproducibility and yet no certain way to achieve it.
It's not really a mystery why it happens. LLM APIs are non-deterministic from the user's point of view because your request is going to get batched with other users' requests. The batch behavior is deterministic, but your batch is going to be different each time you send your request.
The size of the batch influences the order of atomic float operations. And because float operations are not associative, the results might be different.
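A quick way to see the non-associativity point in plain Python (nothing LLM-specific here):

    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c)                   # 0.6000000000000001
    print(a + (b + c))                   # 0.6
    print((a + b) + c == a + (b + c))    # False: summation order matters

If the batch size changes the order in which the same terms get reduced, the logits can shift by a tiny amount, which is occasionally enough to flip the argmax token even at temperature 0.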
> Without the random seed and variable randomness (temperature setting), LLMs will always produce the same output for the same input.
Except they won't.
Even at temperature 0, you will not always get the same output for the same input. And it's not because of random noise from inference providers.
There are papers that explore this subject because for some use-cases this is extremely important. Everything from floating-point precision to hardware timing differences makes this difficult.
> Text is the oldest and most stable communication technology
Minor nit: complex language (i.e. Zipf’s law) is the oldest and most stable communication technology.
Before text, we had oral storytelling. It allowed us to communicate one generation’s knowledge to the next, and so on.
Arguably this is present elsewhere in the animal kingdom (orcas, elephants, etc.), but human language proves to be the most complex.
Side note: one of my favorite examples is from the Gunditjmara (a group of Aboriginal Australians) who recall a volcanic eruption from 30k+ years ago [0].
Written language (i.e. text) is unique, in that it allows information to pass across multiple generations, without a man-in-the-middle telephone-like game of storytelling.
But both are similar: text requires you to read, in your own voice, the thoughts of another. Storytelling requires you to hear a story, and then communicate it to others.
In either case, the person is required to retell the knowledge, either as an internal monologue or as an external broadcast.
Well, the article had "assuming we treat speech/signing as natural phenomenon" but if you are including biological communication you'd probably have to go with genetic code written in RNA. Nature's way of writing down life's assembly instructions. Four billion years and going strong.
We basically do. The habitable-planet hunt is almost by definition "since we depend on water and complex organic molecules for 'life', we will hunt for this signature to decide whether we think we have found extrasolar life, radio signals aside".
> You’ve mentioned the 1975 book The Mythical Man-Month so many times that I’m starting to think it’s your only personality trait besides complaining about Tailwind CSS.
I think the future is likely one that mixes the kitchen-sink style MCP resources with custom skills.
Services can expose an MCP-like layer that provides semantic definitions of everything you can do with said service (API + docs).
Skills can then be built that combine some subset of the 3rd party interfaces, some bespoke code, etc. and then surface these more context-focused skills to the LLM/agent.
Couldn’t we just use APIs?
Yes, but not every API is documented in the same way. An “MCP-like” registry might be the right abstraction for 3rd parties to expose their services in a semantic-first way.
Agree. I'd add that an aha moment with skills is that AI agents are pretty good at writing skills. Let's say you have developed an involved prompt that explains how to hit an API (possibly with the complexity of reading credentials from an env var or config file) or run a tool locally to get some output you want the agent to analyze (for example, downloading two versions of a Python package and diffing them to analyze changes). Usually the agent reading the prompt is going to leverage local tools to do it (curl, shell + stdout, git, whatever) every single time. Every time you execute that prompt there is a lot of thinking spent on deciding to run these commands, and you are burning tokens (and time!). As an eng you know that this is a relatively consistent and deterministic process to fetch the data, and if you were consuming it yourself, you'd write a script to automate it.
So you read about skills (prompt + scripts) to make this more repeatable and reduce time spent thinking. At that point there are two paths you can go down -- write the skill and prompt yourself for the agent to execute -- or better -- just tell the agent to write the skill and prompt and then you lightly edit it and commit it.
This may seem obvious to some, but I've seen engineers create skills from scratch because they have a mental model of skills as something that people must build for the agent, whereas IMO skills are just you bridging a productivity gap the agent can't close itself (for now) by instructing it to write tools to automate its own day-to-day tedium.
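For the package-diffing example above, the kind of script an agent might write for itself could look something like this (a hedged sketch, not anyone's actual skill; package/version handling and paths are simplified):

    import subprocess, sys, tarfile, tempfile
    from pathlib import Path

    def fetch_sdist(package: str, version: str, dest: Path) -> Path:
        # --no-binary forces a source tarball we can unpack and diff
        subprocess.run(
            [sys.executable, "-m", "pip", "download", f"{package}=={version}",
             "--no-deps", "--no-binary", ":all:", "-d", str(dest)],
            check=True,
        )
        archive = next(dest.glob("*.tar.gz"))
        with tarfile.open(archive) as tar:
            tar.extractall(dest)
        # return the extracted source directory
        return next(p for p in dest.iterdir() if p.is_dir())

    if __name__ == "__main__":
        pkg, old, new = sys.argv[1:4]
        with tempfile.TemporaryDirectory() as tmp:
            a = fetch_sdist(pkg, old, Path(tmp) / "old")
            b = fetch_sdist(pkg, new, Path(tmp) / "new")
            # recursive diff; the agent reads this output instead of improvising the steps
            subprocess.run(["diff", "-ru", str(a), str(b)])

The point isn't this particular script; it's that the agent can author it once and reuse it, instead of re-deriving the curl/pip/diff dance on every run.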
The example Datasette plugin authoring skill I used in my article was entirely written by Claude Opus 4.5 - I uploaded a zip file of the Datasette repo to it (after it failed to clone that itself for some weird environment reason) and had it use its skill-writing skill to create the rest: https://claude.ai/share/0a9b369b-f868-4065-91d1-fd646c5db3f4
That's awesome and I have a few similar conversations with Claude. I wasn't quite an AI luddite a couple months ago, but close. I joined a new company recently that is all in on AI and I have a comically huge token budget so I jumped all the way in myself. I have my choice of tools I can use and once I tried Claude Code it all clicked. The topology they are creating for AI tooling and concepts is the best of all the big LLMs, by far. If they can figure out the remote/cloud agent piece with the level of thoughtfulness they have given to Code, it'd be amazing. Cursor Cloud has that area locked down right now, but I'm looking forward to how Anthropic approaches it.
Completely agree with both points. Skills replacing one-off microservices and agents writing their own skills feel like two sides of the same coin to me.
I’m a solo developer building a markdown-first slide editing app. The core format is just Markdown with --- slide separators, but it has custom HTML comment directives for layouts (<!-- layout: title -->, <!-- layout: split -->, etc.) and content-type detection for tables, code blocks, and Mermaid diagrams. It’s a small DSL, but enough that an LLM without context will generate slides that don’t render optimally.
Right now my app is designed for copy-paste from external LLMs, which means users have to manually include the format spec in their prompts every time. But your comment about agents writing skills made me realize the better path: I could just ask Claude Code to read my parser and layout components, then generate a Slide_Syntax_Guide skill for me. The agent already understands the codebase—it can write the definitive spec better than I could document it manually.
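For what it's worth, the format as described sounds like it parses down to something like this (a hypothetical sketch, not the app's actual code; parse_deck and the "default" fallback are made up, and the layout names are just the ones mentioned above):

    import re

    LAYOUT_RE = re.compile(r"<!--\s*layout:\s*(\w+)\s*-->")

    def parse_deck(markdown: str) -> list[dict]:
        slides = []
        # slides are separated by lines containing only "---"
        for chunk in re.split(r"^---\s*$", markdown, flags=re.MULTILINE):
            body = chunk.strip()
            if not body:
                continue
            match = LAYOUT_RE.search(body)
            layout = match.group(1) if match else "default"
            slides.append({"layout": layout, "body": LAYOUT_RE.sub("", body).strip()})
        return slides

A generated Slide_Syntax_Guide skill would essentially document these same rules (plus the table/code/Mermaid content-type detection) so external LLMs stop producing slides that don't render optimally.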
I like to think of LLMs as the internet's Librarian. They've read nearly all the books in the library, can't always cite the exact page, but can point you in the right direction most of the time.
Completely agree, and for me it is not just about the easier/quicker access to information, but the interactivity. I can ask Claude to spend half an hour to create a learning plan for me, then refine it by explaining what I already know and where I see my main gaps.
And then I can, in the same context, ask questions while reading the articles suggested for learning. There's also danger involved there, as the constant affirmation ("Great Point!", "You're absolutely right!") breeds overconfidence, but it has led me to learn quite a few things in a more formal capacity that I would have endlessly postponed before.
For example, I work quite a lot with k8s, but during the day, I'm always trying to solve a specific problem. I have never just sat down, and started reading about the architecture, design decisions, and underlying tech in a structured format. Now I have a detailed plan ready on how to fill my foundational gaps over the Christmas break, and this will hopefully save me time during the next big deployment/feature rollout.