IMO that's the fundamental difference between statistics and ML. The culture of stats is about fitting a model and interpreting the fit, while the culture of ML is to treat the model as a black box.
That's one of the reasons that multicollinearity is seen as a big deal by statisticians, but ML practitioners couldn't give a hoot.
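To make that concrete, here is a minimal sketch (numpy only; the data and variable names are mine, not from any particular study): two nearly collinear predictors produce coefficient estimates that swing wildly from sample to sample, while the fitted predictions barely move. That is exactly why the statistician interpreting coefficients worries and the black-box predictor doesn't.

    # Toy demo: unstable coefficients, stable predictions under
    # near-perfect collinearity. Numbers below are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    for trial in range(3):
        x1 = rng.normal(size=n)
        x2 = x1 + 0.01 * rng.normal(size=n)      # corr(x1, x2) ~ 0.999
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        mse = float(np.mean((y - X @ beta) ** 2))
        # betas jump around from trial to trial; mse stays ~1
        print(trial, np.round(beta, 2), round(mse, 2))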
You are describing the difference between academic, mathematically oriented statisticians and the "applied/engineering/actuarial/business" people who use statistics. The "black box" culture goes back to before ML, before both computing Machines and statistical Learning (iterative models).
I suspect that the "black box" philosophy for statistics/ML is actually bad if you don't have a quick way of verifying the predictions. For instance, using PCA as a "black box" is perfectly fine if you're using it to de-noise readings from a camera or other instrument, because a human being can quickly tell if the de-noising is working correctly or not. But if you're using PCA to make novel discoveries, where you don't have an independent way of checking those discoveries, then it might be outright essential to have a deep definition-theorem-proof style understanding of PCA. What do people think of this hunch?
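For what it's worth, here is a hedged sketch of the "verifiable black box" case with synthetic data (scikit-learn's PCA; the array shapes and noise level are arbitrary choices of mine): keep the top components, reconstruct, and the denoising either visibly works or it doesn't.

    # Sketch: PCA as a denoiser you can check by eye. 300 noisy "frames"
    # of one underlying signal; reconstruct from the top 2 components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 100)
    clean = np.sin(2 * np.pi * 5 * t)                 # shared low-rank structure
    X = clean + 0.5 * rng.normal(size=(300, 100))     # noisy observations

    pca = PCA(n_components=2)
    denoised = pca.inverse_transform(pca.fit_transform(X))

    # reconstruction error should drop well below the raw noise level
    print(float(np.mean((X - clean) ** 2)), float(np.mean((denoised - clean) ** 2)))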
The point about PCA applies to population genetics and psychometrics (IQ). In both fields, some conclusions derived using PCA appear to be supported by little else, and they have been called into question.
You make a good point, though the difference between ML and statistics isn't just about interpreting and validating the model. It's about the "novel discoveries" part aka Doing Science.
Statistical modeling is done primarily in service of scientific discovery: making an inference (a population estimate from a sample) or a comparison to test a hypothesis derived from a theoretical causal model of a real-world process, specified before viewing the data. The parameters of a model are interpreted because they represent an estimate of the treatment effect of some intervention.
Methods like PCA can be part of that modeling process either way, but analyzing and fitting models to data to mine it for patterns without an a priori hypothesis is not science.
Only perfect multicollinearity (an exact linear dependence among predictors, e.g., a pairwise correlation of exactly 1.0 or -1.0) is a problem at the linear-algebra level when fitting a statistical model: the design matrix becomes rank-deficient, so the normal equations have no unique solution.
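A quick numerical illustration of that point (numpy; the example matrices are mine): an exactly collinear column makes X'X singular, while a nearly collinear one merely makes it badly conditioned.

    # Perfect vs. near-perfect collinearity: rank and conditioning of X'X.
    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    X_perfect = np.column_stack([x1, 2 * x1])                          # x2 = 2*x1 exactly
    X_near = np.column_stack([x1, 2 * x1 + 1e-6 * rng.normal(size=100)])

    for name, X in [("perfect", X_perfect), ("near", X_near)]:
        g = X.T @ X
        # perfect: rank 1, effectively infinite condition number;
        # near: full rank 2, just a very large condition number
        print(name, np.linalg.matrix_rank(g), np.linalg.cond(g))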
But theoretically speaking, in a scientific context, why would you want to fit an explanatory model that includes multiple highly (but not perfectly) correlated independent variables?
It shouldn't be an accident. Usually it's because you've intentionally taken multiple proxy measurements of the same theoretical latent variable and you want to reduce measurement error. So that becomes a part of your measurement and modeling strategy.
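As a toy sketch of that strategy (numpy; the latent variable, number of proxies, and noise level are made up): several deliberately correlated proxies of one latent variable, whose composite (a simple mean here; a first principal component would behave similarly) tracks the latent variable better than any single proxy does.

    # Multiple noisy proxies of one latent variable; averaging them
    # reduces measurement error, which is why the proxies correlate.
    import numpy as np

    rng = np.random.default_rng(0)
    latent = rng.normal(size=1000)
    proxies = latent[:, None] + 0.8 * rng.normal(size=(1000, 4))   # 4 noisy measures
    composite = proxies.mean(axis=1)

    # each single proxy correlates ~0.78 with the latent variable;
    # the composite correlates ~0.93
    print([round(float(np.corrcoef(latent, p)[0, 1]), 3) for p in proxies.T])
    print(round(float(np.corrcoef(latent, composite)[0, 1]), 3))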