
At first glance, it sounds vaguely similar to creating a bug before implementing a feature. (Or writing a design doc). Is there more to it?

Features and architectural decisions are largely separate things, although there can of course be causal links between them. But you can implement new features without having to add a single architectural decision, and you can make architectural decisions and implement them without having to change a single feature (similar to a refactoring). The architecture can enable certain features, but the same feature can usually be implemented in the context of wildly different architectures. You want to keep an organized record of all architectural decisions, independently of features, even if some of them are motivated by features. Architectural decisions often remain relevant even after features have been changed or removed. You could take the architectural decisions (or some subset of them) of one project and apply them to a different project with very different features.

You could use an issue tracker as a database to maintain ADRs, but they would be their own item type. You could maintain ADRs as a list of subsections in a design document (probably not so convenient), or as a (usually rather short) document per architectural decision, though then you'd have to organize those documents somehow. ADRs are more granular than design documents, and they collectively maintain a history of the decisions made.
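
For illustration, a per-decision ADR file might look something like this (a hypothetical example; teams use many different templates):

    # ADR-0012: Use PostgreSQL for persistent storage (hypothetical)

    Status: Accepted (2024-03-01)

    Context: We need transactional guarantees and ad-hoc reporting
    queries across services.

    Decision: All services persist state in a shared PostgreSQL cluster.

    Consequences:
    - Schema migrations become part of every release.
    - Supersedes ADR-0007 ("SQLite per service").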


Some might be duped. But it could also be a way for fans or whales to “donate” to him.

An alternative might be to run the agent in a VM in the cloud and use Syncthing or some other tool like that to move files back and forth. (I'm using exe.dev for the VM.)

fly.io released sprites.dev, which is basically this. Discussed on HN a few days ago: https://news.ycombinator.com/item?id=46557825

A bog standard devcontainer works fine too.
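
For example, a minimal devcontainer.json along these lines is often enough to give the agent an isolated, reproducible sandbox (hypothetical config; swap in whatever image and setup commands your project needs):

    // .devcontainer/devcontainer.json -- hypothetical minimal setup
    {
      "name": "agent-sandbox",
      "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
      "postCreateCommand": "npm ci",
      "remoteUser": "vscode"
    }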

Yes this is definitely an area I'm interested in exploring.


I think a good UI would be to prompt it with something like "how far is that hole from the edge?" and it would measure it for you, and then "give me a slider to adjust it," and it gives you a slider that moves it in the appropriate direction. If there were already a dimension for that, it wouldn't help much, but sometimes the distance is derived.

I'd love to have that kind of UI for adjusting dimensions in regular (non-CAD) images. Or maybe adjusting the CSS on web pages?


I think that would make a lot of sense for non-CAD images, but the particular task you described is doable in just a few clicks in most CAD systems already. I think the AI would almost always take longer to do those kinds of actions than if you did it yourself.

For experts maybe, but beginners would probably find asking questions about how to do things useful.

Being able to undo any changes that Cowork makes seems important. Any plans for automatic snapshots or an undo log?

Technically, isn't the API they want third-party software to use better anyway? This is really about pricing. The price difference between the regular API and the Oauth API is too large.

This "orchestration" software is about people trying to increase productivity by running many instances of a coding agent on the same project, without stepping on each other too much. It doesn't seem to be fully baked yet. A "shared nothing" architecture where you work have each instance work on a distinct project seems simpler if you want to spin more plates.

For a coding agent, the project "learns" as you improve its onboarding docs (AGENTS.md), code, and tests. If you assume you're going to start a new conversation for each task and the LLM is a temp that's going to start from scratch, you'll have a better time.
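
As a rough, entirely hypothetical sketch of what those onboarding docs can capture:

    # AGENTS.md (hypothetical excerpt)

    ## Build & test
    - `make test` runs the unit tests; run it before every commit.

    ## Conventions
    - New endpoints live in api/, each with a matching test in api/tests/.

    ## Gotchas
    - The ORM layer caches aggressively; flush the cache in tests that
      touch the database.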

But these docs are the notes; it constantly needs to be re-primed with them, an approach which doesn't scale. How much of this knowledge can you really put in these agent docs? There's only so much you can do, and for any serious-scale project, there's SO much knowledge that needs to be captured. Not just "do this, do that", but also context about why certain decisions were made (rationale, business context, etc.).

It is exactly akin to a human who has to write everything down in notes and re-read them every time.


They don’t need to re-read them all every time, though; they review the relevant context for a particular task, exactly as a human would need to do when revisiting an area of the application that they have no recent exposure to… If you’re putting everything into one master MD file in the repo root, you’re going very wrong.

But that's the thing: Claude Plays Pokemon is an experiment in having Claude work fully independently, so there's no "you" who would improve its onboarding docs or anything else; it has to do so on its own. And as long as it cannot do so reliably, it effectively has anterograde amnesia.

And just to be clear, I'm mentioning this because I think that Claude Plays Pokemon is a playground for any agentic AI doing any sort of long-term independent work; I believe that the solution needed here is going to bring us closer to a fully independent agent in coding and other domains. It reminds me of the codeclash.ai benchmark, where similar issues are seen across multiple "rounds" of an AI working on the same codebase.


No, but it can produce the onboarding docs itself with some "bootstrap" prompting. E.g. give it a scratchpad to write its own notes in, and direct it to use it liberally. Give it a persistent todo list, and direct it to use it liberally. Tell it to keep a work log. Tell it to commit early and often - you can squash things later, and Claude is very good at navigating git logs.
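
As an illustration, the bootstrap instructions could read something like this (hypothetical wording and file names):

    You start every session with no memory of previous sessions.
    - Keep running notes in NOTES.md and write to it liberally.
    - Maintain TODO.md; check items off as you finish them.
    - Append a short entry to WORKLOG.md after each significant step.
    - Commit early and often with descriptive messages; history can be
      squashed later.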

Sure, it's not close to fully independent. But I was interpreting "much, much less employable" as not very useful for programming in its current state, and I think it is quite useful.

The way amp does this explicitly with threads and hand-offs (and of course the capability to summarize/fetch parts of other threads on demand, as opposed to eagerly, like compaction essentially tries to do) makes a ton of sense, imho, for the way LLMs currently work. "Infinite scroll but not actually" is an inferior approach. I'm surprised others aren't replicating this approach; it's easy to understand, simple to implement, and works well.

mfw people write better documentation for the AI than for the other people on the project

Yeah but it feels terrible. I put as much as I can into Claude skills and CLAUDE.md but the fact that this is something I even have to think about makes me sad. The discrete points where the context gets compacted really feel bad and not like how I think AGI or whatever should work.

It should just continuously learn and have a super duper massive memory. Maybe I just need a bazillion GPUs to myself to get that.

But no-one wants to manage context all the time; it's incidental complexity.


I agree with essentially everything you said, except for the final claim that managing context is incidental complexity. From what I know of cognitive science, I would argue that context management is a central facet of intelligence, and a lot of the success of humans in society is dependent on their ability to do so. Looking at it from the other side, executive function disorders such as ADHD offer significant challenges for many humans, and they seem to be not quite entirely unlike these context issues that Claude faces.

> no-one wants to manage context all the time

Maybe we'll start needing to have daily stand-ups with our coding agents.


It already should be. Though given the speed difference, the stand-up would come after every equivalent of a human-day of work, rather than every 24 hours of wall-clock time.

Even with humans, if a company is a car and the non-managers are the engine, meetings are the steering wheel and the mirror checks.


I agree, I started doing something like that a while ago.

I've had great success using Claude Opus 4.5, as long as I hold its hand very tightly.

Constantly updating the CLAUDE.md file, adding an FAQ to my prompts, making sure it remembers what it tried before and what the outcome was. It became a lot more productive after I started doing this.

Using the "main" agent as an orchestrator, and making it do any useful work or research in subagents, has also really helped to make useful sessions last much longer, because as soon as that context fills up you have to start over.

Compaction is fucking useless. It tries to condense roughly 160,000 tokens into a few thousand tokens, and for anything a bit complex this won't work. So my "compaction" is very manual: I keep track of most of the things it has said during the session and what resulted from that. So it reads a lot more like a transcript of the session, without _any_ of the actual tool call results. And this has worked surprisingly well.
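
For a rough sense of the format, an entry looks something like this (the names and times here are made up):

    [10:02] Asked why the importer drops rows with empty dates.
    [10:06] Claude: parser treats "" as end-of-input; proposed a fix in importer.py.
    [10:12] Applied the fix; test run shows 2 failures in test_dates.py.
    [10:20] Claude: those failures are pre-existing; confirmed on main.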

In the past I've tried various ways of automating this process, but it's never really turned out great. And none of the LLMs are good at writing _truly_ useful notes.


> If the customer gets stuck on an issue with their own generated codebase, how do we have a hope of finding the problem?

Effectively, the coding agent has to provide front-line support. The customer asks the coding agent to diagnose the bug and either fix it directly, or generate a bug report and send it upstream.
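
A hypothetical shape for that upstream report, so the vendor gets something actionable rather than a raw transcript:

    Title: Generated client crashes on empty response body
    Generator version: (fill in)
    Spec excerpt: the schema element involved
    Generated code: the offending snippet
    Agent's diagnosis: the decoder assumes a non-empty body; suggest
      guarding the empty case before decoding
    Repro: minimal spec plus the request that triggers the crash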

It seems like a better idea to have a downstream maintainer generate and maintain the language-specific code? If you're providing enterprise support, maybe that downstream maintainer is you.


I assume you meant that the TypeScript compiler is being rewritten in Go. (At first I read it as something entirely different.)

Indeed, I meant the compiler being rewritten from JS to Go.

With Haxe this would never have been a problem, as the compiler (written in OCaml) was always as fast as anything out there.

