
Of course they hallucinate: we are training in random mode. Since you mentioned 3blue1brown, there is an excellent video on ANN interpretability based on the work of well-known researchers who try to give plausible explanations of how these (transformer-based) architectures store and retrieve information. Randomness and stochasticity are literally the most basic components that allow all these billions of parameters to represent better embedding spaces, almost Hilbertian in nature, with directions that become only nearly orthogonal as training progresses.
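
A minimal NumPy sketch of that near-orthogonality point (my own illustration, not from the video; the dimension and sample count are arbitrary choices): random directions in a high-dimensional space concentrate around pairwise cosine similarity zero, which is what lets a single embedding space pack in far more "almost orthogonal" directions than its dimension.

    import numpy as np

    # Illustrative only: random high-dimensional vectors are nearly orthogonal
    # with high probability.
    rng = np.random.default_rng(0)
    dim, n = 10_000, 1_000

    # Sample unit vectors uniformly on the sphere via normalized Gaussians.
    vecs = rng.standard_normal((n, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # Pairwise cosine similarities, off-diagonal entries only.
    cos = vecs @ vecs.T
    off_diag = cos[~np.eye(n, dtype=bool)]

    print(f"mean |cos|: {np.abs(off_diag).mean():.4f}")  # ~0.008 for dim=10k
    print(f"max  |cos|: {np.abs(off_diag).max():.4f}")   # still close to 0

The mean off-diagonal |cos| scales like 1/sqrt(dim), so at billions of parameters the "barely orthogonal" regime is the default, not an achievement.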

The "emergent structures" you are mentioning are just the outcome of randomness guided by "gradiently" descending to data landscapes. There is nothing to learn by studying these frankemonsters. All these experiments have been conducted in the past (decades past) multiple times but not at this scale.

We are still missing basic theorems, not more papers about which tech bro paid the highest electricity bill to "train" on extremely inefficient gaming hardware.


