Author of related work here. This is very cool! I was hoping they would try to invert layer by layer from the output back to the input, but it seems they run a search at the input layer instead. They rightly point out that the residual connections make a layer-by-layer approach difficult. I would point out, though, that an RMSNorm layer should be invertible in principle: because of the epsilon term in the denominator, the output's magnitude still depends on the input's magnitude, so the latter can be recovered.
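
To make the RMSNorm point concrete, here is a minimal NumPy sketch. It is my own illustration, not anything from the paper, and it assumes the common form y = gain * x / sqrt(mean(x^2) + eps) with eps inside the square root and every gain entry nonzero. The names `rmsnorm`, `rmsnorm_inverse`, `gain` and `eps` are just placeholders I picked.

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    """Forward RMSNorm, assuming eps is added inside the square root."""
    return x / np.sqrt(np.mean(x**2) + eps) * gain

def rmsnorm_inverse(y, gain, eps=1e-6):
    """Recover x from y = rmsnorm(x, gain, eps); needs every gain entry nonzero."""
    u = y / gain                      # u = x / sqrt(mean(x^2) + eps)
    s = np.mean(u**2)                 # s = r2 / (r2 + eps), with r2 = mean(x^2), so s < 1
    scale = np.sqrt(eps / (1.0 - s))  # algebraically equal to sqrt(r2 + eps)
    return u * scale

rng = np.random.default_rng(0)
x = rng.normal(size=8) * 0.01         # small inputs keep s well below 1
gain = rng.uniform(0.5, 1.5, size=8)  # hypothetical learned gain, all nonzero
x_rec = rmsnorm_inverse(rmsnorm(x, gain), gain)
print(np.max(np.abs(x_rec - x)))
```

The catch is conditioning: when mean(x^2) is much larger than eps, s sits extremely close to 1 and the `1.0 - s` step loses most of its precision, so in low-precision arithmetic the magnitude may not be recoverable in practice even though the map is invertible on paper.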