The thing that this article seems to be studiously avoiding mentioning is Kent Overstreet's repeated refusal to adhere to the kernel's testing, code quality, patch, and release standards, such that Linus has had to call him out before, as well as his dispute with the Linux Kernel Code of Conduct Committee. This lets them paint Kent as some reasonable, rational person who's just being attacked, unwarranted, left and right by everyone on the mailing list. But that's really bullshit. Honestly, I wish people would stop sharing stuff from The Register, because it's a hack news blog.
There's plenty of cheap furniture that was designed only in CAD or something that is flimsy, doesn't fit human proportions well, and looks ugly in real life, because it was quicker to just throw it together on the computer, CNC it out, and mail the parts to people for them to build themselves than to actually carefully test it and work out the kinks. That's basically what half of IKEA is. So I think this is a decent analogy.
This is a really good post. I'm a naturally controlling person, and I care about my craft a lot, so even in my recent dabbling (on a ~3000 LOC project) with agentic coding, one of the things I naturally did from the start was not just skim the diffs that the AI generated, but decide for myself what technologies should be used, describe the logic and architecture of the code I wanted in detail — to keep my mental model fresh and accurate — and read every single line of code as if it were someone else's, explicitly asking the AI to restructure anything that didn't feel like the way I'd have implemented it, thus ensuring that everything fit my mental model. I also went in and manually added features, and always did all the debugging myself, as a natural way to get more familiar with the code.
One of the things I noticed is that I'm pretty sure I was still more productive with AI, but I still had full control over the codebase, precisely because I didn't let the AI take over any part of the mental-modelling part of the role, only treating it as, essentially, a really, really good refactoring, autocompletion, and keyboard-macro tool that I interact with through an InterLISP-style REPL instead of a GUI. It feels like a lever that actually enables me to add more error handling, make more significant refactors for clarity to fit my mental model, and so on. So I still have a full mental model of where everything is, how it works, and how it passes data back and forth, and the only technologies used in the codebase that I'm not familiar with are things I've made the explicit choice not to learn because I don't want to (Tkinter, lol).
Meanwhile, when I introduced my girlfriend (a data scientist) to the same agentic coding tool, her first instinct was to essentially vibe code: let it architect things however it wanted, not describe logic, not build the mental model and list of features explicitly herself, and skim the code (if that). We quickly ended up in a cul-de-sac where the code was unfixable without a ton of work that would've eliminated all the productivity benefits.
So basically, it's like that study: if you use AI to replace thinking, you end up with cognitive debt and have to struggle to catch up, which eventually washes out all the benefits and leaves you confused and adrift.
Something I've noticed recently is that the new Opus 4.1 model seems to be incredibly good at getting out of these cul-de-sacs.
I've always had a subscription to both ChatGPT and Claude, but Claude has recently almost one-shotted cleanups of major toxic waste dumps left by the previous models.
I'll still use ChatGPT; it seems to be pretty good at algorithms and at bouncing ideas back and forth. But when things go off the rails, Opus 4.1 bails me out.
The thing is that since these models aren't actually doing reasoning and don't possess internal world models, you're always going to end up having to rely on your own understanding at some point. They can fill in more of the map with the things they can do, but they can't ever make it complete. There will always be cul-de-sacs they end up stuck in, or messes they make, or mistakes they keep making, consistently or stochastically. So, although that's rather neat, it doesn't really change my point, I don't think.
I understand they don't have a logic engine built into them, i.e. no deduction, but I do think their inference is a weak form of reasoning, and I'm not sure about the world model.
I suppose it depends on the definition of model.
I currently do consider the transformer weights to be a world model, but having a rigid one based on statistical distributions tends to create pretty wonky behavior at times.
That's why I do agree that relying on your own understanding of the code is the best way.
It's amazing seeing these things produce some beautiful functions and designs, then promptly forget they exist, and then begin writing incompatible, half-re-implemented, non-idiomatic code.
If you're blind to what they are doing, it's just going to be layers upon layers of absolute dreck.
I don't think they will get out of cul-de-sacs without a true deductive engine, and a core of hard, testable facts to build on. (I'm honestly a bit surprised that this behavior didn't emerge early in training.)
Though I think human minds are the same way in this respect, and fall for the same sort of traps. At least our neurons can rewire themselves on the fly.
I know a LOT of people who sparingly use their more advanced reasoning faculties, and instead primarily rely on vibes, or pre-trained biases. Even though I KNOW they are capable of better.
Good comment. I'm pretty much on the same page; my only disagreement is that transformers, if they are a world model, are a world model of some sort of semiotic shadow world, not an experiential, physically consistent world like ours, so they're not equipped to handle modelling our world.
I’d only recently given Claude a try (been using ChatGPT for a while) and I was blown away. Both for conversational things as well as coding. Enough of a tangible difference for me to cancel my ChatGPT subscription and switch.
Interesting. Would you mind elaborating a bit on your workflow? In my work I go back and forth between the "stock" GUIs and copy-pasting into a separate terminal for model prompts. I hate the vibe-code-y agent menu in things like Cursor; I'm always afraid integrated models will make changes that I miss, because it really only works if you check "allow all changes" fairly quickly.
Ah, yeah. Some agentic coding systems try really hard to force you into clicking "allow". I don't think it's intentional, but I don't think they're really thinking through the workflow of someone who's picky and wants to be involved as much as I am. So they make it so that canceling things is really disruptive to the agent, or difficult or annoying to do, and it kind of railroads you into letting the agent do whatever it wants and then trying to clean up after, which is a mess.
Typically, I just use something like QwenCode. One of the things I like about it, and I assume this is true of Gemini CLI as well, is that it's explicitly designed to make it as easy as possible to interrupt an agent in the middle of its thought or execution process and redirect it, or to reject its code changes and then directly iterate on them without having to recapitulate everything from the start. It's as easy as hitting Escape at any time. So I tell it what I want to do, usually by giving it a little markdown-formatted paragraph or so, with some bullet points or numbers and maybe a heading or two, explaining the exact architecture and logic I want for a feature, not just the general feature. Then I let it get started and see where it's going. If I generally agree with the approach it's taking, I let it turn out a diff. If I like the diff after reading through it fully, I accept it; and if there's anything I don't like about it at all, I hit Escape and tell it what to change about the diff before it even gets to merge it in.
There are three advantages to this workflow over the ChatGPT copy-and-paste workflow.
One is that the agent can automatically use grep and find and read source files, which makes it much easier and more convenient to load it up with all of the context it needs to understand the existing style, architecture, and purpose of your codebase. Thus, it typically generates code that I'm willing to accept more often, without me doing a ton of legwork.
The second is that it allows the agent, of its own accord, to run things like linters, type checkers, compilers, and tests, and automatically try to fix any warnings or errors that result, so that it's more likely to produce correct code that adheres to whatever style guide I've provided. Of course, I could run those tools manually and copy and paste the output into a chat window, but that's just enough extra effort and friction, once I've ostensibly got something working, that I know I'd be lazy and skip it at some point. This sort of ensures that it's always done. Some tools like OpenCode even automatically run LSPs and linters and feed that back into the model after the diff is applied, letting it automatically correct things. (I've sketched roughly what that loop looks like below, after the third point.)
Third, this has the benefit of forcing the AI to use small and localized diffs to generate code, instead of regenerating whole files or just autoregressively completing or filling in the middle, which makes it way easier to keep up with what it's doing and make sure you know everything that's going on. It can't slip subtle modifications past you, and it doesn't tend to generate 400 lines of nonsense.
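To make that second point concrete, here's roughly the shape of the check-and-feed-back loop I mean, as a sketch only: the specific checkers (ruff, mypy, pytest) are just stand-ins for whatever your project actually uses, and ask_model_to_fix is a hypothetical hook into whatever model API you're driving, not any particular tool's real code.

    # Rough sketch of the check-and-feed-back loop an agentic tool can run
    # after applying a diff. Checker commands are stand-ins; ask_model_to_fix
    # is a hypothetical callback into the model/agent you're using.
    import subprocess

    CHECKS = [
        ["ruff", "check", "."],   # linter
        ["mypy", "."],            # type checker
        ["pytest", "-q"],         # tests
    ]

    def run_checks() -> list[str]:
        """Run each checker and collect any failing output."""
        failures = []
        for cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
        return failures

    def review_loop(ask_model_to_fix, max_rounds: int = 3) -> bool:
        """Let the model iterate until the checks pass or we give up."""
        for _ in range(max_rounds):
            failures = run_checks()
            if not failures:
                return True           # everything passes; hand the diff to the human
            ask_model_to_fix("\n\n".join(failures))  # feed tool output back as context
        return False                  # still failing; a human needs to look at it

The point is just that the tool output goes straight back into the model's context without me having to copy and paste it, so it always happens.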
Jon Gjengset (jonhoo), who is famously fastidious, did a live-coding stream where he did something similar in terms of control. Worth a watch if that is a style you want to explore.
I don't have the energy to do that for most things I am writing these days, which are small PoCs where the vibe is fine.
I suspect as you do more, you will create dev guides and testing guides that can encapsulate more of that direction so you won't need to micromanage it.
If you used Gemini CLI, you picked the coding agent with the worst output. So if you got something that worked to your liking, you should try Claude.
> I suspect as you do more, you will create dev guides and testing guides that can encapsulate more of that direction so you won't need to micromanage it.
Definitely. Prompt adherence to stuff that's in an AGENTS/QWEN/CLAUDE/GEMINI.md is not perfect ime though.
>If you used Gemini CLI, you picked the coding agent with the worst output. So if you got something that worked to your liking, you should try Claude.
I'm aware, actually, lol! I started with OpenCode + GLM 4.5 (via OpenRouter), but I started burning through cash extremely quickly, and I can't remotely afford Claude Code, so I was using qwen-code mostly just for the 2000 free requests a day and the prompt caching, and because I prefer Qwen 3 Coder to Gemini... anything, for agentic coding.
You can use Claude Code against Kimi K2, DeepSeek, Qwen, etc. The $20 a month plan gets you access to a token amount of Sonnet for coding, but that wouldn't be indicative of how people are using it.
We gave Gemini CLI a spin; it is kinda unhinged, so I am impressed you were able to get your results. After reading through the Gemini CLI codebase, it appears to be a shallow photocopy knockoff of Claude Code, but it has no built-in feedback loops or development guides other than "you are an excellent senior programmer ..."; the built-in prompts are embarrassingly naive.
> You can use Claude Code against Kimi K2, DeepSeek, Qwen, etc.
Yeah but I wouldn't get a generous free tier, and I am Poor lmao.
> I am impressed you were able to get your results
Compared to my brief stint with OpenCode, and with Claude Code via claude-code-router, qwen-code (which is basically a carbon copy of Gemini CLI) is indeed unhinged, and worse than the other options, but if you baby it just right you can get stuff done lol
Having read parts of e.g. the "Refactoring" and "Patterns of Enterprise Application Architecture" books, the ThoughtWorks and Fowler web pages and blog posts, "The Clean Coder", and material about distributed computing algorithms, I've been working with a limited set of refactoring terms in my prompts, like "factor out", "factor up", and "extract an interface/superclass from".
TIL according to Wikipedia, the more correct terms are "pull up" and "push down".
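For anyone who hasn't run into the terms: "pull up" just means hoisting a shared member into a parent class, and "push down" is the inverse. A toy Python illustration of pull-up-method (the classes here are made up for the example):

    # Before "pull up": total_pay() is duplicated in both subclasses.
    class Engineer:
        def __init__(self, base):
            self.base = base
        def total_pay(self):
            return self.base * 12

    class Salesperson:
        def __init__(self, base):
            self.base = base
        def total_pay(self):
            return self.base * 12

    # After "pull up": the duplicated members are hoisted into a superclass.
    class Employee:
        def __init__(self, base):
            self.base = base
        def total_pay(self):
            return self.base * 12

    class Engineer(Employee):    # ("push down" would be the reverse move:
        pass                     #  sinking a member only one subclass needs)

    class Salesperson(Employee):
        pass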
How should people learn terms for refactoring today? Should they too train to code and refactor and track customer expectations without LLMs? There's probably an opportunity to create a good refactoring exercise, with and without LLMs and IDEs and git diff.
System Prompt, System Message, User, User Prompt, Agent, Subagent, Prompt Template, Preamble, Instructions, Prompt Prefix, Few-Shot Examples; which of these do we add this to:
First, summarize Code Refactoring terms in a glossary.
Would the methods software quality teams use, like documentation and tests, prevent this cognitive catch-up on so much code, with however much explanation, all at once?
"Generate comprehensive unit tests for this. Generate docstrings and add comments to this."
If you build software with genai from just a short prompt, it is likely that the output will be inadequate with regard to the unstated customer specifications, and that there will then need to be revisions. Eventually, it is likely that a rewrite or a clone of the then-legacy version of the project will be more efficient and maintainable. Will we be attached to the idea of refactoring the code, or will we refactor the prompts and run them again with the latest model instead?
Retyping is an opportunity to rewrite! ("Punch the keys" -- Finding Forrester)
Are the prompts worth more than the generated code now?
simonw/llm by default saves all prompt inputs and outputs in a SQLite database. Copilot has /save and gemini-cli has /export, but they don't yet autosave or flush before attempting to modify code based on the prompt output?
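If I remember right, the llm one is easy to poke at directly: "llm logs path" prints where the database lives, and something like the sketch below works, though the table and column names here are from memory and may differ between versions.

    # Dump recent prompt/response pairs out of simonw/llm's log database.
    # NOTE: the db path comes from `llm logs path`; the "responses" table and
    # its column names are from memory and may differ between llm versions.
    import sqlite3, subprocess

    db_path = subprocess.run(
        ["llm", "logs", "path"], capture_output=True, text=True
    ).stdout.strip()

    con = sqlite3.connect(db_path)
    for prompt, response in con.execute(
        "SELECT prompt, response FROM responses ORDER BY rowid DESC LIMIT 5"
    ):
        print("PROMPT:", prompt[:120])
        print("REPLY: ", response[:120])
        print("---")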
Catch up as a human coder; catch up the next LLM chat context with the prior chat prompt sequences (and the manual modifications, which aren't, but probably should be, auto-committed distinctly from the LLM response's modifications).
Thanks! I just wish I could afford to work on this full time, or at least even part time. It would help me a lot and prevent me from having to work what is effectively two full time jobs. Rent and food keep getting more expensive in Canada.
Me too. I wouldn't mind Project Xanadu-style micropayments for blogs; it'd fix both the AI scraper issue and the ads issue, and help people fund hosting costs sustainably. I think the issue is that taxes and transaction fees would push the prices too high, and it'd possibly price out people with very low income. It'd also create really perverse incentives for even tighter copyright control, since your content appearing even in part on anyone else's website would then be directly losing you money, so it'd destroy the public commons even more, which would be bad. But maybe not, who knows.
I'm not anti-the-tech-behind-AI, but this behavior is just awful, and makes the world worse for everyone. I wish AI companies would instead, I don't know, fund common crawl or something so that they can have a single organization and set of bots collecting all the training data they need and then share it, instead of having a bunch of different AI companies doing duplicated work and resulting in a swath of duplicated requests. Also, I don't understand why they have to make so many requests so often. Why wouldn't like one crawl of each site a day, at a reasonable rate, be enough? It's not like up to the minute info is actually important since LLM training cutoffs are always out of date anyway. I don't get it.
Greed. It's never enough money, never enough data, we must have everything all the time and instantly. It's also human nature it seems, looking at how we consume like there's no tomorrow.
Maybe they assume there'll be only one winner and think, "what if this gives me an edge over the others". And money is no object. Imagine if they cared about "the web".
This isn't AI. This is corporations doing things because they have a profit motive. The issue here is the non-human corporations and their complete lack of accountability, even if someone brings legal charges against them. Their structure is designed to abstract away responsibility, and they behave that way.
Yeah, that's why I said I'm not against AI as a technology, but against the behavior of the corporations currently building it. What I'm confused by (not really confused, I understand it's just negligence and not giving a fuck, but frustrated and confused in a sort of helpless sense of not being able to get into the mindset) is that while there isn't a profit motive against doing this (obviously), there's also not clearly a profit motive to do it: it seems like they're wasting their own resources on unnecessarily frequent data collection, and it'd also be cheaper to pool data collection efforts.
Yeah, personally I don't think we can regulate this problem away. Whatever regulations get made will either be technically impossible, nonsensical products of people who don't understand what they're regulating, and will produce worse side effects (@simonw extracted a great quote from a recent Doctorow post on this: https://simonwillison.net/2025/Aug/14/cory-doctorow/), or they'll just increase regulatory capture and corporate-state bonds, or even facilitate corp interests, because the big corps are the ones with economic and lobbying power.
> fund common crawl or something so that they can have a single organization and set of bots collecting all the training data they need and then share it
That, or they could just respect robots.txt, and we could put enforcement penalties in place for not respecting a web service's request not to be crawled. Granted, we probably need a new standard, but all these AI companies are just shitting all over the web, being disrespectful of site owners, because who's going to stop them? We need laws.
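For what it's worth, the opt-out itself is trivial; something like this in robots.txt already tells the big AI crawlers that bother to identify themselves to stay out (user-agent names as of the last time I checked). The problem is purely that ignoring it carries zero penalty:

    # robots.txt -- opt out of the AI crawlers that identify themselves
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /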
IMO, if digital information is posted publicly online, it's fair game to be crawled unless that crawl is unreasonably expensive or takes it down for others, because these are non-rivalrous resources that are literally already public.
> we could put enforcement penalties for not respecting the web service's request to not be crawled... We need laws.
How would that be enforceable? A central government agency watching network traffic? A means of appealing to a bureaucracy like the FCC? Setting it up so you can sue companies that do it? All of those seem like bad options to me.
> IMO, if digital information is posted publicly online, it's fair game to be crawled unless that crawl is unreasonably expensive or takes it down for others, because these are non rivalrous resources that are literally already public.
I disagree. Whether or not content should be available to be crawled is dependent on the content's license, and what the site owner specifies in robots.txt (or, in the case of user submitted content, whatever the site's ToS allows)
It should be wholly possible to publish a site intended for human consumption only.
> How would that be enforceable?
Making robots.txt, or something else, a legal standard instead of a voluntary one. Make it easy for site owners to report violations along with logs, with legal action taken against the violators.
> It should be wholly possible to publish a site intended for human consumption only.
You have just described the rationale behind DRM. If you think DRM is a net positive for society, I won't stop you, but there has been plenty published online on the anguish, pain and suffering it has wrought.
Precisely. This would be a system essentially designed to ensure that your content can only be accessed by specific kinds of users you approve of, for specific kinds of use you approve of, and only with clients and software that you approve of, by means of legislation, so that you don't have to go through the hassle of actually setting up the (user-hostile) technologies that would otherwise be necessary to enforce this, and/or give up the appearance of an open web by requiring sign-ins, while just being hostile on another level. It's trying to have your cake and eat it too, and it will only massively strengthen the entire ecosystem of DRM and IP. I also just personally find the idea of posting something on a board in the town square and then trying to decide who gets to look at it ethically repugnant.
This is actually kind of why I like Anubis. Instead of trying to dictate what clients or purposes or types of users can access a site, it just changes the asymmetry of costs enough that it hopefully fixes the problem. You can still scrape a site behind Anubis; it just takes a little bit more commitment, so it's easier to do on an individual level than at a mass-DoS level.
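The underlying trick is just a proof-of-work check, something like this toy sketch (the general idea, not Anubis's actual implementation): finding a nonce costs the client a pile of hashes, verifying it costs the server one.

    # Toy proof-of-work sketch of the idea behind Anubis-style challenges
    # (illustrative only, not its actual implementation). The client must
    # find a nonce whose hash has N leading zero bits; the server verifies
    # with a single hash.
    import hashlib, itertools

    DIFFICULTY = 20  # leading zero bits required; tunes the cost per request

    def _leading_zero_bits(digest: bytes) -> int:
        bits = bin(int.from_bytes(digest, "big"))[2:].zfill(len(digest) * 8)
        return len(bits) - len(bits.lstrip("0"))

    def solve(challenge: str) -> int:
        """Client side: expensive -- try nonces until the hash clears the bar."""
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if _leading_zero_bits(digest) >= DIFFICULTY:
                return nonce

    def verify(challenge: str, nonce: int) -> bool:
        """Server side: cheap -- one hash per request."""
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return _leading_zero_bits(digest) >= DIFFICULTY

    # One visitor barely notices ~2^20 hashes; a crawler hammering every page
    # on every forge it can find pays that cost millions of times over.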
> unless that crawl is unreasonably expensive or takes it down for others
This _is_ the problem Anubis is intended to solve -- forges like Codeberg or Forgejo, where many routes perform expensive Git operations (e.g. git blame), and scrapers do not respect the robots.txt asking them not to hit those routes.
Laws are inherently national, which the internet is not. By all means, write a law that crawlers need to obey robots.txt, but how are you going to make Russia or China follow that law?
I think the fundamental problem here is that there are two uses for the internet: as a source of on-demand information to learn a specific thing or solve a specific problem, and as a sort of proto-social network, to build human connections. For most people looking things up on the internet, the primary purpose is the former, whereas for most people posting things to the internet, the primary purpose is more the latter.

With traditional search, there was an integration of the two desires, because people who wanted information had to go directly to sources of information that were oriented towards human connection, and could maybe then be on-ramped onto the human connection part. But it was also frustrating for that same reason, from the perspective of people that just wanted information — a lot of the time the information you were trying to gather was buried in stuff that focused too much on the personal, on the context and storytelling, when that wasn't wanted, or wasn't quite what you were looking for and so you had to read several sources and synthesize them together.

The introduction of AI has sort of totally split those two worlds. Now people who just want straight-to-the-point information targeted at specifically what they want will use an AI with web search or something enabled, whereas people who want to make connections will use RSS, explore other pages on blogs, and use Marginalia and Wiby to find blogs in the first place. I'm not even really sure that this separation is ultimately a bad thing, since one would hope its long-term effect would be to filter the users that show up on your blog down to those who are actually looking for precisely what you're offering.
>from the perspective of people that just wanted information — a lot of the time the information you were trying to gather was buried in stuff that focused too much on the personal, on the context and storytelling, when that wasn't wanted, or wasn't quite what you were looking for and so you had to read several sources and synthesize them together.
When looking for information, it's critically important to have the story and the context included alongside the information. The context is what makes a technical blog post more reliable than an old forum post. When an AI looks at both and takes the answer, the AI user no longer knows where that answer came from, and therefore can't make an informed decision on how to interpret the information.
That's a fair point. But it can cite that original context in case the human user decides they need it, which might be the best of both worlds? I'm not sure. Also, long-form posts may be more useful in certain cases than forum posts, but technical forums didn't pop up out of nowhere; people created and went to them precisely because they were useful even when blog posts already existed, so there's clearly a space for both. There's overlap, for sure, though.
I don't recall who (unfortunately), but back when I first heard of Gemini (the protocol and related websites, not the AI), I read a similar (though not exact) comparison... and that was their justification for why something like Gemini websites might eventually thrive... and I agreed with that assessment then, and I agree with your opinions now! My question is: as this splintering gets more and more pronounced, will each separate "world" be named something like the "infonet" (for the AI/get-quick-answers world) and the "socialnet" (for the fun, meandering digital gardens)? Hmmm...
That's sort of my ideal, to be honest — it's why I'm less hostile to AI agent browsers. A semantic, Wikipedia-like internet designed for AI agents, as well as more traditional org-mode-like hypertext database and lookup systems, to crawl and correlate for users; and a Neocities- or Gemini-like place full of digital gardens and personal posts and stories. I don't think they'd have to be totally separate — I'm not a huge fan of splitting onto a different protocol, for instance — I more imagine them as sort of parallel universes living interlaced through the same internet. I like infonet as a name, but maybe something like personanet would be better for the other?
If you'll forgive me putting my debugging hat on for a bit, because solving problems is what most of us do here: I wonder if it's not actually reading the URL, and maybe that's the source of the problem, because I've had a lot of success feeding manuals and such to AIs and then asking them to synthesize commands or answer questions about them. Also, I just tried asking Gemini 2.5 Flash this and it did a web search, found a source, answered my question correctly (ls -a, or -la for more detail), and linked me to the precise part of the source it referenced: https://kinsta.com/blog/show-hidden-files/#:~:text=If%20you'... (this is the precise link it gave me).
Well, in one case (it was the borg or restic docs) I noticed it actually picked something correctly from the URL/page and then still messed up in the answer.
My guess is: maybe it read the URL and mentioned a few things in one part of that answer/output, but for the other part it relied on the training it already had. Maybe it doesn't learn "on the go". I don't know; it could be a safeguard against misinformation or against spamming the model or something.
As I said in my comment, I hadn't asked it the "ls -a" question but rather something else: different commands at different times, which I don't recall now except the borg and restic ones, which I did recently. "ls -a" is the example I picked to show one of the things I was "cribbing" about.