If you want to learn something cool and related, check out autoencoders (or VAEs). They effectively compress information by learning compact representations of the data.
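For anyone who wants to poke at one, here's a minimal sketch (toy dimensions, plain PyTorch, not any particular paper's architecture); the bottleneck is what forces the compression:

```python
import torch
import torch.nn as nn

# Tiny autoencoder: the 32-dim bottleneck forces the net to find a
# compressed representation of the 784-dim input.
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)     # compress: 784 -> 32
        return self.decoder(code)  # reconstruct: 32 -> 784

model = AutoEncoder()
x = torch.rand(16, 784)                     # dummy batch
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
```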
Indeed. They are extremely cool! <3 :) I think in a way, every neural network works on compression! Though AEs & VAEs are a bit more blatant in how they do it. It's just always turned around a little, here and there. I have a pet theory that no neural network can work without compression (even generative ones that have constantly increasing layer depths! :D :)))) )
The link between compression and intelligence is a popular theory. Indeed, the linked work here came out of the Hutter Prize:
> The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. Hutter proved that the optimal behavior of a goal-seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest programs consistent with all interaction so far.
NNCP (Bellard's prelude to ts_zip, using similar techniques) doesn't qualify for the Hutter Prize, btw, because of the hardware and speed limitations the prize specifies:
"Must run in ≲50 hours using a single CPU core and <10GB RAM and <100GB HDD on our test machine."
Which is an Intel Core i7-620M.
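For anyone wondering how a language model becomes a compressor at all: as I understand it, NNCP/ts_zip feed the model's next-symbol probabilities into an arithmetic coder, which spends about -log2(p) bits per symbol, so better prediction directly means smaller files. Here's a toy sketch of that accounting (the `predict` callback is hypothetical; a real implementation would be a neural net plus an actual arithmetic coder):

```python
import math

# Toy illustration of "better prediction == better compression":
# under arithmetic coding, a symbol with model probability p costs
# about -log2(p) bits, so total size ~= the model's cross-entropy.
def code_length_bits(text, predict):
    total = 0.0
    for i, ch in enumerate(text):
        p = predict(text[:i], ch)  # model's probability of the next char
        total += -math.log2(p)
    return total

# A uniform model over 27 symbols (a-z plus space): ~4.75 bits/char.
uniform = lambda ctx, ch: 1.0 / 27

msg = "the theory of compression"
print(code_length_bits(msg, uniform) / len(msg), "bits/char")
# A model that predicts well (say, confident about 'e' after 'th')
# spends far fewer bits whenever it's right -- that's the whole game
# NNCP/ts_zip play, just with a large neural net as `predict`.
```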
I agree that compression is most certainly necessary for intelligence; after all, it is the bridge from empirical examples to a learned policy. The subtle switch often comes when people say that compression IS intelligence.
Similar to how metabolism in many ways is required for life, yet metabolism itself isn't life.
One of the challenges is that the true MDL (Minimum Description Length -- https://en.wikipedia.org/wiki/Minimum_description_length) of a model is intractable to prove directly; we can only show that we are a bit closer to it than we were before. This becomes even harder in the temporal regime: proving anything about an iterative mapping (i.e. a decision-making algorithm, or what have you, in this case) applied over each time slice of a temporal system is nigh-impossible. We can tell _something_ about it from the attractors such a system generates, but even then, I think it's basically impossible to do in closed form.
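To make the two-part-code version of MDL concrete, here's a toy model-selection sketch: total bits = bits to describe the model + bits to describe the data given the model. The 32-bits-per-coefficient cost and the Gaussian residual term are crude assumptions, just enough to show the trade-off (and note we can only compare candidates; nothing here certifies the true minimum):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 3 * x**2 - x + rng.normal(0, 0.1, size=x.size)  # quadratic + noise

# Toy two-part MDL: L(model) + L(data | model). 32 bits per
# polynomial coefficient is a crude stand-in for a real universal
# code; the data term is the Gaussian negative log-likelihood of
# the residuals, in bits.
def mdl_bits(degree, bits_per_param=32):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma = max(resid.std(), 1e-6)
    nll_bits = x.size * 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    return bits_per_param * (degree + 1) + nll_bits

for d in range(8):
    print(d, round(mdl_bits(d), 1))
# Degree 2 should score near the minimum: extra coefficients cost
# more to describe than they save in residual bits.
```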
That being said, I do believe compression is required for intelligence, and that deliciously drags in all of the info theory stuff, which is very fun indeed. It just seems to get really messy with that time component, but that's my 2 cents at least. :'(
I didn't know that was related to the Hutter Prize. Very cool. I'd read a little bit about that prize before, and I'll take another look at it now. Probably won't start any work on anything because I really, really, really do not want to Collatz Conjecture myself again.
Oh, I appreciate this point, thanks for making it! I like it a lot. <3 :))))
I think, perhaps, compression happens whenever inputs > outputs and there is some dimensionality reduction (though I think orthogonality would be a trait of an ideal system [i.e. an emergent property approached in the limit], not one that is explicitly enforced at each step).
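A concrete version of that ideal-in-the-limit point, taking PCA as the optimal linear case (a sketch of the math, not a claim about what trained nets actually converge to): nothing below enforces orthogonality step by step, yet the optimum of squared-error linear reconstruction comes out orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))  # correlated data
X -= X.mean(axis=0)

# Top-3 principal directions via SVD (inputs=10 > outputs=3).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:3]                   # 3 x 10 linear encoder

print(np.round(W @ W.T, 6))  # identity: the code axes are orthogonal
Z = X @ W.T                  # compressed 3-dim codes
X_hat = Z @ W                # best rank-3 linear reconstruction
```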