I would say compression and intelligence are closely related, not equal. For example, zstd is a pretty good compressor, but does anyone think it should be called a pretty good “AI”?
Zstd, while it might be better than gzip or bzip2, is still a very poor compressor compared to an ideal compressor (which hasn't been built yet).
That is why zstd acts like a rather bad AI. Note that if you wanted to use zstd as an AI, you would patch the checksum checks out of the source code, then feed it a compressed file to decompress ("The cat sat on the mat"), followed by a few bytes of random noise.
A great compressor would output: The cat sat on the mat. It was comfortable, so he then lay down to sleep.
A medium compressor would output: The cat sat on the mat. bat cat cat mat sat bat.
A terrible compressor would output: The cat sat on the mat. D7s"/r %we
See how each uses knowledge at a different level to generate a completion. Notice also how those few bytes of noise generate different amounts of output depending on the compressor's level of world understanding, and therefore its compression ratio.
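To make that concrete, here's a toy sketch of the decompression-as-generation idea (nothing to do with zstd's actual format; the model and the training string are made up for illustration). The "world model" is just an order-0 character-frequency table, and an arithmetic decoder fed random bits emits text sampled from that model. It will babble with roughly English letter statistics; a better model decoding the same bits would babble better.

    # Toy "decompressor as generator": order-0 arithmetic decoding of random bits.
    # Purely illustrative; not zstd, and the training text is invented.
    from fractions import Fraction
    import random

    training = "the cat sat on the mat and then the cat lay down to sleep "
    counts = {c: training.count(c) for c in sorted(set(training))}
    total = sum(counts.values())

    # Cumulative probability interval [lo, hi) for each character.
    intervals, lo = {}, Fraction(0)
    for c, n in counts.items():
        intervals[c] = (lo, lo + Fraction(n, total))
        lo += Fraction(n, total)

    def decode(bits: str, n_chars: int) -> str:
        """Arithmetic-decode a bit string into n_chars characters under the toy model."""
        x = Fraction(int(bits, 2), 2 ** len(bits))  # the noise, read as a number in [0, 1)
        low, high, out = Fraction(0), Fraction(1), []
        for _ in range(n_chars):
            width = high - low
            target = (x - low) / width
            for c, (clo, chi) in intervals.items():
                if clo <= target < chi:
                    out.append(c)
                    low, high = low + clo * width, low + chi * width
                    break
        return "".join(out)

    noise = "".join(random.choice("01") for _ in range(64))
    print("The cat sat on the mat. " + decode(noise, 40))

Swap the character-frequency table for a stronger predictive model and the same few bytes of noise decode into a longer, more coherent continuation, which is exactly the compression ratio / world understanding link above.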
> A terrible compressor would output: The cat sat on the mat. Dsr %we3 9T23 }£{D:rg!@ !jv£dP$
LLMs are sort of unable to do this because they use a fixed tokenizer instead of raw bytes. That means they won't output binary garbage even early in training, and it saves a lot of memory, but it may hurt their ability to learn things like capitalization, rhyming, etc. that we think are obvious.
Even if you trained an LLM with a simplified tokenizer that simply had 256 tokens, one for each possible byte value, you would see the same result.
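For what it's worth, such a byte-level "tokenizer" is trivial; here's a sketch (not taken from any particular LLM codebase):

    # A minimal byte-level tokenizer: 256 token ids, one per possible byte value.
    def encode(text: str) -> list[int]:
        """Map text to token ids (0-255), one id per UTF-8 byte."""
        return list(text.encode("utf-8"))

    def decode(token_ids: list[int]) -> str:
        """Map token ids back to text; invalid byte sequences are replaced."""
        return bytes(token_ids).decode("utf-8", errors="replace")

    ids = encode("The cat sat on the mat.")
    print(ids[:8])      # [84, 104, 101, 32, 99, 97, 116, 32]
    print(decode(ids))  # The cat sat on the mat.

The point is that the vocabulary isn't the limiting factor: a byte-level model could in principle emit any byte sequence, but once trained it still assigns almost no probability to binary garbage, for the same reason a good compressor wouldn't produce it.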