What about duplicating the entire executable essentially a few times, and jumping to the right version at the very beginning of execution?
You have bigger binaries, but the logistics are simplified compared to shipping multiple binaries and you should get the same speed as multiple binaries with fully inlined code.
Since they don't seem to be doing that, my question is: what's the caveat I'm missing? (Or are the bigger binaries enough of a caveat by themselves?)
Ideally you only need to duplicate until you hit the first not-inlined function call; at that point there’s nothing gained and it’s just a waste of binary size.
You have bigger binaries, but the logistics are simplified compared to shipping multiple binaries and you should get the same speed as multiple binaries with fully inlined code.
Since they don't seem to be doing that, my question is: what's the caveat I'm missing? (Or are the bigger binaries enough of a caveat by themselves?)