As of Dec 2025, Sonnet/Opus and GPT Codex are both trained for agentic exploration, and most good agent tools (i.e. opencode, claude-code, codex) have prompts to fire off subagents during an exploration (use the word "explore"), so you should be able to research without the extra steps of writing plans and resetting context. I'd save that expense unless you need some huge multi-step verifiable plan implemented.
The biggest gotcha I found is that these LLMs love to assume all code is C/Python, just written in your favorite language of choice. Instead of considering that something should be encapsulated into an object to maintain state, it will instead write 5 functions, passing the state as parameters between each function. It will also consistently ignore most of the code around it, even when reading it would show what specifically could be reused. So you end up with copy-pasta code, and unstructured copy-pasta at best.
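Here's a minimal Python sketch of the pattern I mean (all the names are made up for illustration):

    # What the model tends to write: state threaded through free functions.
    def open_session(url):
        return {"url": url, "retries": 0, "cache": {}}

    def fetch(state, key):
        state["retries"] += 1
        state["cache"][key] = "data for " + key
        return state, state["cache"][key]

    def close_session(state):
        state["cache"].clear()

    # What the surrounding codebase usually wants: state owned by one object.
    class Session:
        def __init__(self, url):
            self.url = url
            self.retries = 0
            self.cache = {}

        def fetch(self, key):
            self.retries += 1
            self.cache[key] = "data for " + key
            return self.cache[key]

        def close(self):
            self.cache.clear()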
The other gotcha is that Claude usually ignores CLAUDE.md. So for me, I first prompt it to read CLAUDE.md, and then I prompt it to explore. With those two steps done, it usually does a good job following my request to fix a bug, add a new feature, or whatever, all within a single context. These recent agents do a much better job of throwing away useless context.
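Concretely, those first two prompts look something like this (the path and feature name are placeholders, not a magic incantation):

    Read CLAUDE.md and follow it for the rest of this session.

    Explore the code under src/ that implements <feature> before making any changes.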
I do think the older models and agents get better results when writing things to a plan document, but I've noticed recent Opus and Sonnet usually end up just writing the same code into the plan document anyway. That usually ends up confusing the model, because it can't connect the plan to the code around the changes as easily.
>Instead of considering that something should be encapsulated into an object to maintain state, it will instead write 5 functions, passing the state as parameters between each function.
Sounds very functional, testable, and clean. Sign me up.
I know this is tongue in cheek, but writing functional code in an object-oriented language, or even worse, taking a giant procedural trail of tears and spreading it across a few files like a Roomba through a pile of dog doo, is ... well ... a code smell at best.
I have a saved user prompt called "clean code" that makes a pass through the changes to remove unused code, DRY things up, and refactor - literally the high points of Uncle Bob's Clean Code. It works shockingly well at taking AI code and making it somewhat maintainable.
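The gist of the saved prompt is roughly this (the exact wording matters less than the checklist):

    Make a cleanup pass over only the files changed in this session:
    - Remove unused imports, variables, and dead code paths.
    - Deduplicate repeated logic (DRY) into shared helpers.
    - Refactor long functions into small, well-named ones.
    - Do not change behavior, and keep the diff reviewable.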
>I know this is tongue in cheek, but writing functional code in an object-oriented language, or even worse, taking a giant procedural trail of tears and spreading it across a few files like a Roomba through a pile of dog doo, is ... well ... a code smell at best.
After forcing myself over the years to apply various OOP principles in multiple languages, I believe OOP has truly been the worst thing to happen to me personally as an engineer. Now I believe that what you're describing is just an "aesthetics" issue, and a purely learned one at that.
> As of Dec 2025, Sonnet/Opus and GPT Codex are both trained for agentic exploration, and most good agent tools (i.e. opencode, claude-code, codex) have prompts to fire off subagents during an exploration (use the word "explore"), so you should be able to research without the extra steps of writing plans and resetting context. I'd save that expense unless you need some huge multi-step verifiable plan implemented.
Does the UI show clearly what portion was done by a subagent?
The UI (terminal) in Claude Code will tell you when it has launched a subagent to research a particular file or problem, but the subagent's work won't be highlighted for you; it's simply displayed in the record of prompts and actions.