Something I've noticed recently is that the new Opus 4.1 model seems to be incredibly good at getting out of these cul-de-sacs.
I've always had a subscription to both ChatGPT and Claude, but Claude has recently come close to one-shotting fixes for major toxic waste dumps left by the previous models.
I'll still use ChatGPT; it seems to be pretty good at algorithms and at bouncing ideas back and forth. But when things go off the rails, Opus 4.1 bails me out.
The thing is, since these models aren't actually doing reasoning and don't possess internal world models, you're always going to end up having to rely on your own understanding at some point. They can fill in more of the map with the things they can do, but they can't ever make it complete. There will always be cul-de-sacs they end up stuck in, messes they make, or mistakes they keep making, whether consistently or stochastically. So, although that's rather neat, I don't think it really changes my point.
I understand they don't have a logic engine built into them, i.e. no deduction, but I do think their inference is a weak form of reasoning, and I'm not sure about the world model part.
I suppose it depends on the definition of model.
I currently do consider the transformer weights to be a world model, but having a rigid one based on statistical distributions tends to create pretty wonky behavior at times.
That's why I agree: relying on your own understanding of the code is the best way.
It's amazing seeing these things produce some beautiful functions and designs, then promptly forget they exist and begin writing incompatible, half-re-implemented, non-idiomatic code.
If you're blind to what they are doing, it's just going to be layers upon layers of absolute dreck.
I don't think they will get out of cul-de-sacs without a true deductive engine and a core of hard, testable facts to build on. (I'm honestly a bit surprised this behavior didn't emerge early in training.)
I think human minds are the same way in this respect, though, and fall for the same sorts of traps. At least our neurons can rewire themselves on the fly.
I know a LOT of people who use their more advanced reasoning faculties sparingly and instead rely primarily on vibes or pre-trained biases, even though I KNOW they are capable of better.
Good comment. I'm pretty much on the same page; my only disagreement is that transformers, if they are a world model, are a model of some sort of semiotic shadow world, not an experiential, physically consistent world like ours, so they're not equipped to model our world.
I’d only recently given Claude a try (been using ChatGPT for a while) and I was blown away. Both for conversational things as well as coding. Enough of a tangible difference for me to cancel my ChatGPT subscription and switch.