Notably, I think this would fail on the Wikipedia compression contest, because that contest counts the size of the decompression program (the model, in this case) toward the total, so that the score approximates Kolmogorov complexity rather than just compressed size. And RWKV is surely too big. It's a neat thing to think about though, and maybe a winning idea if a more space-efficient model can be found.
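
To make the accounting concrete, here is a rough sketch of how I understand that kind of scoring: the total is compressed output plus decompressor, and the smallest total wins. The `Entry` class and every byte count below are made up for illustration; they are not real contest figures or real RWKV sizes.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical contest entry; all sizes are in bytes."""
    name: str
    compressed_bytes: int    # size of the compressed Wikipedia dump
    decompressor_bytes: int  # size of the program (or model) needed to decode it

    def total_bytes(self) -> int:
        # The decompressor counts toward the score, so a huge model
        # cannot hide information "for free" in its weights.
        return self.compressed_bytes + self.decompressor_bytes


# Illustrative numbers only (assumptions, not measurements).
model_based = Entry("big-model", compressed_bytes=10_000_000, decompressor_bytes=300_000_000)
classical = Entry("small-classical", compressed_bytes=25_000_000, decompressor_bytes=500_000)

# Smallest total wins, even though the model-based entry compresses the data better.
winner = min([model_based, classical], key=Entry.total_bytes)
print(winner.name, winner.total_bytes())
```

With numbers like these, the model-based entry has the smaller compressed file but loses on the total, which is the whole problem. It only becomes competitive once the model shrinks by more than the compression gain it buys.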