There are a whole lot of undecidable (or effectively undecidable) edge cases that can be adequately covered. As a matter of fact, Decidability Logic is compatible with Prolog.
We would begin with a Prolog server of some kind (I have no idea whether Prolog implementations are parallelized, but they could well be, since we're dealing with Horn clauses).
There would be MCP bindings to said server, accessible on request. The LLM would provide a message (it could even formulate Prolog statements following a structured prompt), await the result, and then continue.
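To make that concrete, here's a minimal sketch of the kind of tool such a binding could expose, using SWI-Prolog through the pyswip Python bindings. The toy knowledge base, the function name, and the MCP wiring around it are assumptions for illustration, not a spec:

```python
# Minimal sketch: a Prolog query exposed as a callable tool an LLM could invoke.
# Assumes SWI-Prolog and the pyswip bindings are installed; how this function
# gets registered with an MCP server is deliberately left out.
from pyswip import Prolog

prolog = Prolog()
# Toy knowledge base purely for illustration.
prolog.assertz("parent(alice, bob)")
prolog.assertz("parent(bob, carol)")
prolog.assertz("(grandparent(X, Z) :- parent(X, Y), parent(Y, Z))")

def prolog_query(goal: str) -> list[dict]:
    """Run a Prolog goal and return the variable bindings for the LLM to read."""
    return [dict(solution) for solution in prolog.query(goal)]

# prolog_query("grandparent(alice, Who)") would return the bindings for Who,
# which the LLM reads back before continuing its reasoning.
```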
Good. The world model is absolutely the right play in my opinion.
AI agents like LLMs make great use of pre-computed information. Providing a comprehensive but efficient world model (one where more detail is available wherever one is paying more attention, given a specific task) will definitely unlock new kinds of autonomous agents.
Swarms of these, acting in concert or with some hive mind, could be how we get to AGI.
I wish I could help; world models are something I am very passionate about.
One theory of how humans work is the so-called predictive coding approach. Basically, the theory assumes that human brains work similarly to a Kalman filter: we have an internal model of the world that makes predictions and then checks whether those predictions are congruent with the observed changes in reality. Learning then comes down to minimizing the error between this internal model and the actual observations; this is sometimes called the free energy principle. When researchers talk about world models specifically, they tend to mean internal models of the actual external world, i.e. models that can predict what happens next based on input streams like vision.
Why is this idea of a world model helpful? Because it enables several interesting things: predicting what happens next, modeling counterfactuals (what would happen if I do X or don't do X), and many other capabilities that tend to be needed for actual principled reasoning.
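As a toy illustration of that predict-then-correct loop, here is a standard 1-D Kalman filter; all constants and the drift model are made up, and this is just the shape of the computation, not a model of the brain:

```python
# Toy 1-D "predict, then correct by the prediction error" loop in the spirit
# of the Kalman-filter analogy above.
import numpy as np

rng = np.random.default_rng(0)
true_positions = np.cumsum(np.full(50, 0.1))             # world drifts by 0.1/step
observations = true_positions + rng.normal(0, 0.5, 50)   # noisy senses

x_est, p = 0.0, 1.0      # internal belief and its uncertainty
q, r = 0.01, 0.25        # assumed process noise and sensory noise

for z in observations:
    # Predict: the internal model guesses the next state of the world.
    x_pred, p_pred = x_est + 0.1, p + q
    # Correct: weigh the prediction error by how much we trust the senses.
    k = p_pred / (p_pred + r)          # gain
    error = z - x_pred                 # prediction error ("surprise")
    x_est = x_pred + k * error         # belief update minimizes this error
    p = (1 - k) * p_pred
```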
In this video we explore Predictive Coding – a biologically plausible alternative to the backpropagation algorithm, deriving it from first principles.
Predictive coding and Hebbian learning are interconnected learning mechanisms where Hebbian learning rules are used to implement the brain's predictive coding framework. Predictive coding models the brain as a hierarchical system that minimizes prediction errors by sending top-down predictions and bottom-up error signals, while Hebbian learning, often simplified as "neurons that fire together, wire together," provides a biologically plausible way to update the network's weights to improve predictions over time.
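A tiny sketch of that combination, with made-up sizes, data, and learning rate: the top-down weights predict the lower-level activity, and each weight update is a Hebbian-style product of presynaptic activity and the local prediction error rather than a backpropagated gradient.

```python
# Toy predictive-coding weight update with a Hebbian flavour: the change in a
# weight is the product of presynaptic activity and a local prediction error,
# not a backpropagated gradient. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=8)                 # higher-level (top-down) activity
target = rng.normal(size=16)                # lower-level / sensory activity
W = rng.normal(scale=0.1, size=(8, 16))     # top-down prediction weights
lr = 0.01

for _ in range(200):
    prediction = latent @ W                 # top-down prediction of the input
    error = target - prediction             # bottom-up error signal
    W += lr * np.outer(latent, error)       # pre-activity x error: local update
# The prediction error shrinks toward zero as W learns to predict the input.
```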
Only if you also provide it with a way to richly interact with the world (i.e. an embodiment). Otherwise, how do you train it? How does a world model verify the correctness of its model in novel situations?
Learning from the real world, including how it responds to your own actions, is the only way to achieve real-world competency, intelligence, reasoning and creativity, including going beyond human intelligence.
The capabilities of LLMs are limited by what's in their training data. You can use all the tricks in the book to squeeze the most out of that data (RL, synthetic data, agentic loops, tools, etc.), but at the end of the day their core intelligence and understanding are limited by that data and their auto-regressive training. They are built for mimicry, not creativity and intelligence.
Training on 2,500 hours of prerecorded video of people playing Minecraft, they produce a neural net world model of Minecraft. It is basically a learned Minecraft simulator. You can actually play Minecraft in it, in real time.
They then train a neural net agent to play Minecraft and achieve specific goals all the way up to obtaining diamonds. But the agent never plays the real game of Minecraft during training. It only plays in the world model. The agent is trained in its own imagination. Of course this is why it is called Dreamer.
The advantage of this is that once you have a world model, no extra real data is required to train agents. The only input to the system is a relatively small dataset of prerecorded video of people playing Minecraft, and the output is an agent that can achieve specific goals in the world. Traditionally this would require many orders of magnitude more real data to achieve, and the real data would need to be focused on the specific goals you want the agent to achieve. World models are a great way to cheaply amplify a small amount of undifferentiated real data into a large amount of goal-directed synthetic data.
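In rough pseudocode, that recipe reads something like the following; world_model, policy, and their methods are hypothetical placeholders, not Dreamer's actual API:

```python
# Hedged sketch of the two-phase recipe described above: (1) fit a world model
# on prerecorded trajectories, (2) train the policy only on imagined rollouts.
def train_world_model(world_model, recorded_trajectories, steps):
    for _ in range(steps):
        batch = recorded_trajectories.sample()
        # Learn to predict the next observation and reward from the current
        # latent state and action -- a learned simulator of the environment.
        world_model.update(batch.observations, batch.actions, batch.rewards)

def train_agent_in_imagination(world_model, policy, steps, horizon=15):
    for _ in range(steps):
        state = world_model.sample_initial_state()
        trajectory = []
        for _ in range(horizon):
            action = policy.act(state)
            # No real environment here: the world model "dreams" the outcome.
            state, reward = world_model.imagine_step(state, action)
            trajectory.append((state, action, reward))
        policy.update(trajectory)   # e.g. maximize imagined returns
```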
Now, Minecraft itself is already a world model that is cheap to run, so a learned world model of Minecraft may not seem that useful. Minecraft is just a testbed. World models are very appealing for domains where it is expensive to gather real data, like robotics. I recommend listening to the interview above if you want to know more.
World models can also be useful in and of themselves, as games that you can play, or to generate videos. But I think their most important application will be in training agents.
He is one of these people who think that humans have a direct experience of reality, not mediated by, as Alan Kay put it, three pounds of oatmeal. So he thinks a language model cannot be a world model, despite our own contact with reality being mediated through a myriad of filters and fun-house-mirror distortions. Our vision transposes left and right and delivers images to our nerves upside down, for gawd's sake. He imagines none of that is the case, and that if only he can build computers more like us, they will be in direct contact with the world, and then he can (he thinks) make a model that is better at understanding the world.
Isn't this idea demonstrably false due to the existence of various sensory disorders too?
I have a disorder characterised by the brain failing to filter out its own sensory noise; my vision is full of analogue-TV-like distortion and other artefacts. Sometimes when it's bad I can see my brain constructing an image in real time rather than the perception happening instantaneously, particularly when I'm out walking: a deer becomes a bundle of sticks becomes a muddy pile of rocks (what it actually is), for example, over the space of seconds. This to me is pretty strong evidence that we do not experience reality directly, and instead construct our perceptions predictively from whatever is to hand.
Pleased to meet someone else who suffers from "visual snow". I'm fortunate in that like my tinnitus, I'm only acutely aware of it when I'm reminded of it, or, less frequently, when it's more pronounced.
You're quite correct that our "reality" is in part constructed. The Flashed Face Distortion Effect [0][1] (wherein faces in the peripheral vision appear distorted due to the brain filling in the missing information with what was there previously) is just one example.
Only tangentially related but maybe interesting to someone here so linking anyways: Brian Kohberger is a visual snow sufferer. Reading about his background was my first exposure to this relatively underpublicized phenomenon.
Ah that's interesting, mine is omnipresent and occasionally bad enough I have to take days off work as I can't read my own code; it's like there's a baseline of it that occasionally flares up at random. Were you born with visual snow or did you acquire it later in life? I developed it as a teenager, and it was worsened significantly after a fever when I was a fresher.
Also do you get comorbid headaches with yours out of interest?
I developed it later in life. The tinnitus came earlier (and isn't as a result of excessive sound exposure as far as I know), but in my (unscientific) opinion they are different manifestations (symptoms) of the same underlying issue – a missing or faulty noise filter on sensory inputs to the brain.
Thankfully I don't get comorbid headaches – in fact I seldom get headaches at all. And even on the odd occasion that I do, they're mild and short-lived (like minutes). I don't recall ever having a headache that was severe, or that lasted any length of time.
Yours does sound much more extreme than mine, in that mine is in no way debilitating. It's more just frustrating that it exists at all, and that it isn't more widely recognised and researched. I have yet to meet an optician that seems entirely convinced that it's even a real phenomenon.
Interesting, definitely agree it likely shares an underlying cause with tinnitus. It's also linked to migraine, and was sometimes conflated with unusual forms of migraine in the past, although it has since been found to be a distinct disorder. There have been a few studies done on visual snow patients, including a 2023 fMRI study which implicated regions rich in glutamate and 5HT2A receptors.
I actually suspected 5HT2A might be involved before that study came out, since my visual distortions sometimes resemble those caused by psychedelics. It's also known that psychedelics, and (anecdotally, from patient groups) SSRIs too, can cause symptoms similar to visual snow syndrome. I had a bad experience with SSRIs, for example, but serotonin antagonists actually fixed my vision temporarily, albeit with intolerable side effects, so I had to stop.
It's definitely a bit of a faff that people have never heard of it, I had to see a neuro-ophthalmologist and a migraine specialist to get a diagnosis. On the other hand being relatively unknown does mean doctors can be willing to experiment. My headaches at least are controlled well these days.
scoot, you may find the current mini-series by the podcast Unexplainable to be interesting. It's on sound, and one episode is about tinnitus and research into it.
The default philosophical position for human biology and psychology is known as Representational Realism. That is, reality as we know it is mediated by changes and transformations applied to sensory (and other) input data in a complex process, and is altered enough to be something "different enough" from what is actually real.
Direct Realism is the idea that reality is directly available to us and that any intermediate transformations made by our brains are not enough to change the dial.
Direct Realism has long been refuted. There are a number of examples, e.g. the hot and cold bucket; the straw in a glass; rainbows and other epiphenomena, etc.
the fact that a not-so-direct experience of reality produces "good enough" results (e.g. human intelligence) doesn't mean that a more-direct experience of reality won't produce much better results, and it clearly doesn't mean it can't produce those better results in AI
your whole reasoning is neither here nor there, and attacks a straw man - YLC for sure knows that human experience of reality is heavily modified and distorted
but he also knows, and I'd bet he's very right on this, that we don't "sip reality through a narrow straw of tokens/words", and that we don't learn "just from our/approved written down notes", and only under very specific and expensive circumstances (training runs)
anything closer to more-direct-world-models (as LLMs are ofc at a very indirect level world models) has very high likelihood of yielding lots of benefits
But he seems to like pretending that we can't reconfigure that straw of tokens into 4096 straws, or a couple billion straws for that matter. LLMs are just barely getting started. That's not to say there's no other or better way, but in yucking our yum he fails to acknowledge that there's a lot more that can be done with this stuff.
The world model of a language model is a ... language model. Imagine the mind of a blind, limbless person, locked in a cell their whole life, never having experienced anything different, who just listens all day to a piped-in feed of randomized snippets of Wikipedia, 4chan and math olympiad problems.
The mental model this person has of that feed of words is, at best, what an LLM has (though the human's model is likely much richer, since they have a brain, not just a transformer). No real-world experience or grounding, therefore no real-world model. The only model they have is of the world they have experience with - a world of words.
Whatever idea Yann has of JEPA and its supposed superiority compared to LLMs, he doesn't seem to have done a good job of "selling it" without resorting to strawmanning LLMs. From what little I gathered (which may be wrong), his objection to LLMs is something like: the "predict next token" inductive bias is too weak for models to meaningfully learn models of things like physics, sufficient to properly predict motion and do well on physical reasoning tasks.
And LLMs are trained on the humans trying to describe all of this through text. The point is not if humans have a true experience of reality, it’s that human writings are a poor descriptor of reality anyway, and so LLMs cannot be a stepping stone.
A world model is a persistent representation of the world (however compressed) that an AI can access and compute over. For example, a weather world model would likely include things like wind speed, surface temperature, various atmospheric layers, total precipitable water, etc. Now suppose we provide a real-time live feed to an AI like an LLM, allowing the LLM to have constant, up-to-date weather knowledge that it loads into context for every new query. This LLM should have a leg up in predictive power.
Some world models can also be updated by their respective AI agents, e.g. "I, Mr. Bot, have moved the ice cream into the freezer from the car" (thereby updating the state of freezer and car, by transferring ice cream from one to the other, and making that the context for future interactions).
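A minimal sketch of that pattern, where the schema, helper names, and values are all made up for illustration: a persistent state blob that gets serialized into the LLM's context on each query and updated when the agent acts.

```python
# Persistent world state: serialized into every prompt, mutated by the agent.
import json

world_state = {
    "weather": {"surface_temp_c": 18.5, "wind_speed_kph": 22.0, "tpw_mm": 31.2},
    "locations": {"car": ["ice cream"], "freezer": []},
}

def context_snapshot() -> str:
    """Compressed view of the world model, prepended to each LLM query."""
    return json.dumps(world_state, separators=(",", ":"))

def apply_action(item: str, src: str, dst: str) -> None:
    """Agent-reported state change, e.g. moving the ice cream car -> freezer."""
    world_state["locations"][src].remove(item)
    world_state["locations"][dst].append(item)

apply_action("ice cream", "car", "freezer")
prompt = f"World state: {context_snapshot()}\nUser question: ..."
```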
If your "world model" only models a small portion of the world, I think the more appropriate label is a time-series model. Once you truncate correlated data, the model you're left with isn't very worldly at all.
You don't need to load the entire world model in order to be effective at a task. LLM providers already do something similar to what you're describing with model routing.
The way I think of it (and I might be wrong) is basically a model that has sensors similar to humans' (eyes, ears) and has action-oriented outputs with some objective function (a goal to optimize against). I think autopilot is the closest thing to a world model, in that it has eyes, it has the ability to interact with the world (go in different directions), and it sees the response.
> Swarms of these, acting in concert or with some hive mind, could be how we get to AGI.
There's absolutely no reason to think this. In fact, all of the evidence we have to this point suggests that scaling intelligence horizontally doesn't increase capabilities – you have to scale vertically.
Additionally, as it stands I'd argue there are foundational architectural advancements needed before artificial neural networks can learn and reason at the same level as (or better than) humans across a wide variety of tasks. I suspect that when we solve this for LLMs, the same techniques could be applied to world models. Fundamentally, the question to ask here is whether AGI is I/O-dependent, and I see no reason to believe that it is – if someone removes your eyes and cuts off your hands, they don't make you any less generally intelligent.
It really hasn't, to the scale that you imply. Why haven't Ukraine and Russia both used this to completely shut down each other's infrastructure? Why isn't Russia just hacking all the Ukrainian COTS drones? Why hasn't anyone hacked a nuclear power plant?
There is power in restricting access and air gapping helps a lot. A drone (for example) can fall back to basic cryptography to limit access.
Air gapping is a baseline requirement in most safety-critical systems. Nuclear power plants in particular have lots of redundant layers of safety. AFAIK Russia hasn't physically tried to cause a meltdown, presumably due to the political blowback (although they have attacked Chernobyl's sarcophagus). I assume this limits their digital espionage attacks too.
We do get glimpses of the use of such malware, like when Saudi Arabia hacked Jeff Bezos' phone. But we don't hear about most of it because there is a benefit to keeping a hack secret, so as to keep access.
Finally, it's usually cheaper to social engineer someone into loading a PowerPoint presentation and doing a local privilege escalation. They burn those for things as petty as getting embarrassing political information.
I doubt that most critical systems are air gapped. Even if they are, most of Russia's economy is not, yet it still runs on IT built from COTS systems. Why wouldn't Ukraine DoS or compromise the whole non-air-gapped IT infrastructure of Russia to hit the economy, if being a government gave them easy access to RCEs?
I mean, they do all the time. The value is generally in keeping access, however, and operational security and access control is helpful. You can knock a system out but then you just get kicked out and have to start over.
I remain convinced App Store Connect is the project they put interns on. It also explains why they keep redesigning / reimplementing it, then losing interest and leaving it part-finished and incoherent. It’s because the interns working on it go back to school.
Most of the time, I don't personally look at it as cheap labour because I am just ordering, e.g. 60,000 of something or 100,000 of something else.
It's cheap, yes. I can indeed buy 1,000 of something more locally or from other than China.
But when it comes to scale, needing vast shipments, they are the ones who can actually ship it and do it reliably. It also happens to be cheaper, which is more of a convenience or a cherry on top than the actual attractive part: vast scale.
Is it the noise cancellation making a feedback sound, or is it the pressure differential in the ear canal pulling the ear drum back to produce a white noise?
He said that it goes away when he yawns, so I'm thinking it might be the pressure differential.
Yawning alters the conformation of the external auditory canal by displacing the mandible, which articulates with the tympanic plate of the temporal bone adjacent to the canal.
The seal might be so good that a small pressure differential develops as cabin pressure drops, which causes some issue with the microphone or speaker. Yawning might break that seal, or otherwise cause pressure equalization. Why only the left one? Apple might put some kind of special signal diagnostics or sensors in that side that bugs out under those conditions, or maybe human anatomy on the left side is consistently subtly different in a subset of people.
Because this doesn't happen to everybody, it could be some kind of "instrument effect" where the particular shape of someone's ear canal, and its interaction with their eardrum and the speakers and sensors in the earbud, creates this tone, likely assisted by the constant driving signal of aircraft cabin white noise.
> The seal might be so good that a small pressure differential happens as cabin pressure drops
That's my guess. I'm very sensitive to pressure changes and I know that cabin pressure on most planes is not constant even when cruising. It's in a range that most people won't notice but it definitely fluctuates near constantly within that band.