But if you are iterating on code with an LLM without even looking at the code, there's a reasonable chance that when you prompt "okay, now also handle factor y", you end up with code that handles factor y but also silently handles pre-existing factor x differently, for no good reason. And scientific work is more likely than average programming to involve numerics, where seemingly innocuous changes to how things are computed can have significant impacts, because floating-point arithmetic is generally unfriendly.
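To make the floating-point point concrete, here is a minimal Python sketch (my own illustration, not from the article) of how a change in grouping or summation order, exactly the kind of "harmless" refactor an LLM might slip in, changes the result:

    import math

    # Float addition is not associative: regrouping changes the answer.
    print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
    print(0.1 + (0.2 + 0.3))  # 0.6

    # Summation order can absorb small terms entirely.
    xs = [1e16, 1.0, -1e16]
    print(sum(xs))            # 0.0  (the 1.0 is swallowed by 1e16)
    print(math.fsum(xs))      # 1.0  (exactly rounded summation recovers it)

In a simulation that accumulates millions of terms, a reordering like this can be enough to shift downstream results.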
Totally agree; in my experience we are far from getting reliable research code out of prompts alone.
We are clearly not there yet, but I feel the article is pushing in that direction, perhaps precisely to steer research that way.
A long time ago there was an article by one of the creators of Mathematica or Maple (I don't remember which) that said something similar. The question was: why do we make students grind through matrix operations by hand at school, when modern tools can perform all the computation? We should teach matrix algebra, the concepts, and let students use the software, a little bit like using calculators. This would let children learn more abstract thinking and test far more interesting ideas. (If someone has the reference, I'm interested.)
I feel the article follows the same lines, but with current tools.
(Of course, I'm skipping over the fact that Mathematica is deterministic when doing algebra, and LLMs are far from it.)
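For what it's worth, that deterministic "let the software do the mechanical work" experience is easy to get today. A small sketch using SymPy (my choice of tool; the commenter mentioned Mathematica and Maple):

    import sympy as sp

    # The CAS does the mechanical matrix work, deterministically,
    # in the spirit of the Mathematica/Maple argument above.
    a, b = sp.symbols("a b", nonzero=True)
    M = sp.Matrix([[a, 1], [0, b]])

    print(M.inv())        # symbolic inverse: Matrix([[1/a, -1/(a*b)], [0, 1/b]])
    print(M.eigenvals())  # {a: 1, b: 1}, read off without hand computation

The student's job becomes choosing the matrix and interpreting the output, not carrying out the row operations.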