
> ... I’ll keep pulling PRs locally, adding more git hooks to enforce code quality, and zooming through coding tasks—only to realize ChatGPT and Claude hallucinated library features and I now have to rip out Clerk and implement GitHub OAuth from scratch.

I don't get this. How many git hooks do you need to identify that Claude hallucinated a library feature? Wouldn't a single hook running your tests catch that?
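A minimal sketch of what that single hook could look like, assuming a Python project with pytest (any test runner slots in the same way); a hallucinated library API tends to show up here as an import error or a failing test:

```python
#!/usr/bin/env python3
# .git/hooks/pre-push (sketch): refuse to push if the test suite fails.
# Assumes pytest; swap the command for your project's test runner.
import subprocess
import sys

result = subprocess.run(["pytest", "-q"])
if result.returncode != 0:
    # A hallucinated library feature usually dies here as an import or call error.
    sys.exit("pre-push: tests failed, refusing to push")
```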



They probably don't have any tests, or the tests the LLM creates are flawed and don't detect these problems.


Just tell the AI "and make sure you don't add bugs or break anything"

Works every time


Yesterday Claude Code assured me the following:

• Good news! The code is compiling successfully (the errors shown are related to an existing macro issue, not our new code).

When in fact it had managed to insert 10 compilation errors that had nothing to do with any macros.


The other day I had Claude proudly proclaim it had fixed the bug... by deleting the log line that exposed the bug.


I tried using agents in Cursor, and when they run into issues they just rip out the offending code :)


I've had similar cases where the fix to the test was... delete the test. Ah, if only I'd realized that little hack earlier in my career!


I don't have a ton of tests. From what I've seen, Claude will often just update the tests to no-ops, so tests passing isn't trustworthy.

My workflow is often to plan with ChatGPT, and what I was getting at here is that ChatGPT can often hallucinate features of third-party libraries. I usually dump the plan from ChatGPT straight into Claude Code and only look at the details when I'm testing.

That said, I've become more careful about auditing the plans so I don't run into issues like this.


Tell Claude to use a code review sub agent after every significant change set: have the sub agent run the tests and evaluate the change set, don't tell it that Claude wrote the code, and give it strict review instructions. Works like a charm.
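A rough sketch of what that reviewer could look like, assuming current Claude Code conventions (sub agents as Markdown files with YAML frontmatter under .claude/agents/; check the docs for the exact fields and tool names):

```markdown
---
name: code-reviewer
description: Independent review of a change set. Use after every significant change.
tools: Read, Grep, Glob, Bash
---
You are reviewing a change set written by another engineer. Do not assume it is
correct because it compiles or because the author says so.

1. Run the test suite and report the real output, not a summary of it.
2. Flag any test that was deleted, skipped, or rewritten into a no-op.
3. Flag removed mocks, removed assertions, and deleted log lines.
4. Verify that every third-party API used actually exists in the pinned library versions.
5. Report concrete defects with file and line references; never answer "looks good" without evidence.
```

The "another engineer" framing is the point of not telling it Claude wrote the code: the reviewer shouldn't go easy on work it believes is its own.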


Interesting. I had not thought about a code review sub agent. I will give that a shot.


Any tips on writing productive review sub agent instructions?


Yes. Go on ChatGPT, explain what you're doing (Claude Code, trying to get it to be more rigorous with itself and reduce defects), then click deep research and tell it you'd like it to look up code review best practices, AI code review, and smells/patterns to look out for in AI-generated code. Then have it take the result of that and generate an XML-structured document with a flowchart of the code review best practices it discovered, cribbing from an established schema for element names/attributes when possible, and put it in fenced xml blocks in your sub agent. You can also tell Claude Code to do deep research; you just have to be a little specific about what it should go after.
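The document it spits out might look roughly like this (element names invented here for illustration, not taken from any established schema):

```xml
<code-review>
  <step name="build-and-test">
    <check>Run the full build and test suite; record exact commands, exit codes, and failures.</check>
  </step>
  <step name="test-integrity">
    <check>No tests deleted, skipped, or rewritten to always pass.</check>
    <check>Mocks and assertions from the previous revision are still present, or their removal is justified.</check>
  </step>
  <step name="api-reality">
    <check>Every imported symbol and called method exists in the declared dependency versions.</check>
  </step>
  <step name="verdict">
    <check>List defects with file and line references; approve only if the list is empty.</check>
  </step>
</code-review>
```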


AI agents have been known to rip out mocks so that the tests pass.


I have had human devs do that too.


cool, can you think of any differences between a human engineer, who is presumably employed by an employer and subject to review and evaluation by a manager and inherently assumed to be capable of receiving feedback and reliably applying it on a go-forward basis to their future work, and an LLM, when they each make this same kind of mistake?


Yes, the difference is about $197,600 if playing fair, or $57,600 if offshoring.


the difference between an arbitrary LLM and a human engineer is completely described by the salary you would pay to the human engineer? in all other dimensions they are indistinguishable? nice, super cool


yeah, when it's a human it's not random chance; it's them subverting testing and safety requirements to shortcut their way through their job.


nope not that


"hallucinated" library features are identified even earlier, when claude builds your project. i also don't get what author is talking about.



