Hacker News | CharlesW's comments

"Specific on the surface, generic underneath" (Medium paywalled): https://medium.com/tech-renaissance/generic-internals-specif...

Everything related to LLMs is probabilistic, but agents also tend to follow those rules well most of the time.

Yes, they do, most of the time. Then they don't. Yesterday I told codex that it must always run tests by invoking a make target. That target is even configurable with parameters, e.g. to filter by test name. But every time, at some point in the session, codex started disregarding that rule and fell back to using the platform-native test tool directly. I used strong language to steer it back, but 20% or so of context later, it did it again.

Once the LLM has made one mistake, it's often best to start a new context.

Since its mechanism is to predict the next token of the conversation, it's reasonable for it to "predict" itself making more mistakes once it has made one.


I'm not sure this is still the case with codex. In this instance, restarting had no strong effect.

I'd assume it's related to this Amazon "Socratic Human Feedback (SoHF): Expert Steering Strategies for LLM Code Generation" paper: https://assets.amazon.science/bf/d7/04e34cc14e11b03e798dfec5...

> We're back to using LOC as a productivity metric because LLMs are best at cranking out thousands of LOC really fast.

Can you point me to anyone who knows what they're talking about declaring that LOC is the best productivity metric for AI-assisted software development?


Are you implying that the author of this article doesn't know what they are talking about? Because they basically declared it in the article we just read.

Can you point me to where the author of this article offers any evidence for the claim of 10x increased productivity other than the screenshot of their git commits, which shows more squares in recent weeks? I know git commits could be net deleting code rather than adding code, but that's still using LOC, or number of commits as a proxy for it, as a metric.


> I know git commits could be net deleting code rather than adding code…

Yes, I'm also reading that the author believes commit velocity is one reflection of the productivity increases they're seeing, but I assume they're not a moron and have access to many other signals they're not sharing with us. Probably stuff like: https://www.amazon.science/blog/measuring-the-effectiveness-...


The iWork apps are all brilliant.

YES!

Also, don't sleep on the tragically underappreciated Pages.

Fair, but Pages tries too hard to be a Word replacement. And I think it calls home to the Apple mothership quite often, too.

Oh, for the good old days of AppleWorks!



Absolutely, but the mods here are great and I trust their opinion.

The panopticon is built on the technology and culture at the center of HN, so hopefully they have sympathy for my not being able to predict that the post would be flagged. I think it's important to understand and discuss the escalating technology-based erosion of our privacy and rights, but I guess we'll just do that at the monthly Antifa meetings. /s


> You responded that this is a "gross overgeneralization of the content of the actual study", but the study appears to back up the original statement.

It doesn't, and the study authors themselves are pretty clear about the limitations. The irony is that current foundation models are pretty good at helping to identify why this study doesn't offer useful general insights into the productivity benefits (or lack thereof) of AI-assisted development.


This story is not about developers, but setting that aside: The reason it's not damning is that the results can't be generalized. It's mostly self-reported "minutes per issue" anecdata from 16 experienced OSS maintainers working on their own, mature repos, most of whom were new to Cursor and did not have a chance to develop or adopt frameworks for AI-assisted development.

I think this is missing an important point: the developers in question thought they were faster, when they were in fact slower.

I'd love to see a larger study on more experienced users. The mismatch between their perception and reality is really curious.


Just a guess, but maybe they were really estimating how much effort they put in.

It's quite possible that it took less overall mental effort from the developers using AI, but it took more elapsed time.


That has not been my experience at all. Whenever I tried asking the AI to do something, it took an inordinate amount of time and thought to go through its changes.

The mistakes were subtle but critical. Like copying a mutex by value. Now, if I were writing the code myself, I would not make that mistake. But when reviewing, it almost slipped by.
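
For concreteness, here's a minimal Go sketch of the kind of mistake I mean (illustrative names, not the actual code under review): a method with a value receiver silently copies the struct, mutex and all.

    package counter

    import "sync"

    // Counter guards n with a mutex.
    type Counter struct {
        mu sync.Mutex
        n  int
    }

    // Bug: the value receiver copies Counter, mutex included, so Lock is
    // taken on a throwaway copy and the increment never reaches the
    // caller's value. It compiles and "works" in a single-goroutine test;
    // `go vet` flags it along the lines of "Inc passes lock by value".
    func (c Counter) Inc() {
        c.mu.Lock()
        c.n++
        c.mu.Unlock()
    }

    // Fix: a pointer receiver locks and updates the original.
    func (c *Counter) IncFixed() {
        c.mu.Lock()
        c.n++
        c.mu.Unlock()
    }

That's exactly the kind of thing that compiles, passes a quick test, and only looks wrong if you're already suspicious of it.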

And that's where the issue is: you have to review the code as if it were written by a clueless junior dev. So building up the mental model when reviewing, going through all the cases, and thinking of the mistakes that could possibly have happened... sorry, no help.

Maybe it saves 10% of the typing, but when I think about it, it takes more time overall: I have to build the mental model in my mind, then build a second mental model out of the thing the AI typed out, and then compare the two. That latter part is much more time-consuming than building the model in my mind and typing it out myself.


I think that programming languages (well, at least some of them, maybe not all) have succeeded in being good ways of expressing programmatic thought. If I know what I want in (say) C, it can be faster for me to write C code than to write English that describes C code.

I guess it depends on what you use it for. I found it quite relaxing to get AI to write a bunch of unit tests for existing code. Simple and easy to review, and not fun to write myself.
