> ... I’ll keep pulling PRs locally, adding more git hooks to enforce code quality, and zooming through coding tasks—only to realize ChatGPT and Claude hallucinated library features and I now have to rip out Clerk and implement GitHub OAuth from scratch.
I don't get this: how many git hooks do you need to identify that Claude hallucinated a library feature? Wouldn't a single hook running your tests catch that?
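Something in `.git/hooks/pre-push` is all it should take. A rough sketch, assuming a pytest suite; swap in whatever your project's test command actually is:

```python
#!/usr/bin/env python3
# .git/hooks/pre-push  (make it executable: chmod +x .git/hooks/pre-push)
# Rough sketch: refuse the push if the test suite fails.
# Assumes a pytest suite; replace the command with your own test runner.
import subprocess
import sys

result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
if result.returncode != 0:
    print("pre-push: tests failed, refusing to push", file=sys.stderr)
    sys.exit(1)
```

A hallucinated library function blows up the first time a test imports and calls it, so one hook like this would surface it before the code ever leaves your machine.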
I don't have a ton of tests. From what I've seen, Claude will often just update the tests into no-ops, so tests passing isn't trustworthy.
My workflow is often to plan with ChatGPT, and what I was getting at here is that ChatGPT can often hallucinate features of third-party libraries. I usually dump the plan from ChatGPT straight into Claude Code and only look at the details when I'm testing.
That said, I've become more careful about auditing the plans so I don't run into issues like this.
Tell Claude to use a code review sub agent after every significant change set, have the sub agent run the tests and evaluate the change set, don't tell it that Claude wrote the code, and give it strict review instructions. Works like a charm.
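The sub agent itself is basically just a prompt file. Roughly what mine looks like, assuming Claude Code's `.claude/agents/` markdown-with-frontmatter format (field names and tool names may vary by version, so treat this as a sketch, not gospel):

```markdown
---
name: code-reviewer
description: Independently reviews the most recent change set. Use after every significant change.
tools: Read, Grep, Glob, Bash
---
<!-- .claude/agents/code-reviewer.md; illustrative, adapt to your setup -->
You are a strict, independent code reviewer. You did not write this code
and you have no stake in it being approved.

For the change set you are given:
1. Run the test suite and report any failures verbatim.
2. Verify that every third-party API call actually exists in the installed
   version of the library; check the lockfile or docs, don't assume.
3. Flag tests that were weakened, skipped, or turned into no-ops.
4. List concrete defects with file and line references; do not summarize
   or praise the change.
```

The "you did not write this code" framing is the point of not telling it Claude wrote it: the reviewer persona stays adversarial instead of defending its own work.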
Yes. Go on ChatGPT, explain what you're doing (Claude Code, trying to get it to be more rigorous with itself and reduce defects), then click deep research and tell it you'd like it to look up code review best practices, AI code review, smells/patterns to look out for in AI code, etc. Then have it take the result of that and generate an XML-structured document with a flowchart of the code review best practices it discovered, cribbing from an established schema for element names/attributes when possible, and put it in fenced xml blocks in your subagent. You can also tell Claude Code to do deep research; you just have to be a little specific about what it should go after.
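For a sense of the shape, the output ends up being something like this (purely illustrative; the element names here are invented, not from any real schema, and yours will depend on what the research turns up):

```xml
<!-- purely illustrative review flowchart; element names are made up -->
<review-flow>
  <step id="1" name="build-and-test">
    <check>Run the full test suite; record failures verbatim.</check>
    <on-fail goto="report"/>
  </step>
  <step id="2" name="verify-dependencies">
    <check>Confirm every library API used exists in the pinned version.</check>
    <smell>Hallucinated or renamed functions from third-party libraries.</smell>
  </step>
  <step id="3" name="test-integrity">
    <smell>Tests rewritten to no-ops, broad exception swallowing, weakened assertions.</smell>
  </step>
  <step id="report" name="report">
    <check>List concrete defects with file and line references.</check>
  </step>
</review-flow>
```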
cool, can you think of any differences between a human engineer, who is presumably employed by an employer and subject to review and evaluation by a manager and inherently assumed to be capable of receiving feedback and reliably applying it on a go-forward basis to their future work, and an LLM, when they each make this same kind of mistake?
the difference between an arbitrary LLM and a human engineer is completely described by the salary you would pay to the human engineer? in all other dimensions they are indistinguishable? nice, super cool