At Plotly we did a decent amount of benchmarking to see how much of `uv`'s performance comes from its different defaults. This was necessary so we could advise our enterprise customers on the transition. We found you lost almost all of the speed gains if you configured uv to behave as much like pip as you could. A trivial example is the precompile flag, which can easily account for 50% of pip's install time for a typical data science venv.
The precompilation thing was brought up to the uv team several months ago IIRC. It doesn't make as much of a difference for uv as for pip, because when uv is told to pre-compile it can parallelize that process. This is easily done in Python (the standard library even provides rudimentary support, which Python's own Makefile uses); it just isn't in pip yet (I understand it will be soon).
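For anyone curious what that standard-library support looks like, here is a minimal sketch using `compileall.compile_dir` with its `workers` argument. The helper name `precompile_tree` and the throwaway `demo_pkg` directory are made up for illustration; this is not how uv or pip actually implement it.

```python
import compileall
import pathlib
import tempfile


def precompile_tree(root, workers=0):
    """Byte-compile every .py file under root.

    workers=0 asks compileall to use one process per CPU core;
    workers=1 compiles sequentially in the current process.
    """
    return bool(compileall.compile_dir(str(root), quiet=1, workers=workers))


if __name__ == "__main__":
    # Hypothetical throwaway package, only here to have something to compile.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = pathlib.Path(tmp) / "demo_pkg"
        pkg.mkdir()
        for i in range(8):
            (pkg / f"mod_{i}.py").write_text(f"VALUE = {i}\n")

        ok = precompile_tree(pkg)
        compiled = list((pkg / "__pycache__").glob("*.pyc"))
        print("all compiled:", ok, "| .pyc files:", len(compiled))
```

The `__main__` guard matters here because the parallel path starts worker processes, which can re-import the calling module on some platforms.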
You'll never be able to do maintenance or upgrade these things. The upfront cost seems extremely high given the risk of hardware failure or obsolescence at data center scales.
Plotly's new Plotly Studio product is a spec-anchored approach to building data applications. Each chart or dataset gets its own prompt/spec.
The question of how much detail to include in a spec is really hard. We actually split it into two levels - an input prompt describing details the user cares about in that component and an output spec describing what was built to allow verification.
A company I worked for switched over to pre-recorded demos and everyone talked about how clever it was for the first few larger audiences. Then they made a mistake and replayed a clip during the same session and the audience chat blew up. You could see a dip in new users for days after the demo.
While I get the academic perspective of sharing these insights, this article comes across as corporate justifying/complaining that their model's score is lower than it should be on the leaderboards... by saying the leaderboards are wrong.
Or an even darker take is that it's corporate saying they won't prioritize eliminating hallucinations until the leaderboards reward it.
Yes, it's self-interested because they want to improve the leaderboards, which will help GPT-5 scores, but on the other hand, the changes they suggest seem very reasonable and will hopefully help everyone in the industry do better.
And I'm sure other people will complain if they notice that changing the benchmarks makes things worse.
This very much confused me. Isn't the idea behind this movement that Europe doesn't want to be dependent on external companies for critical infrastructure? Won't this just be the equivalent of a shell company completely dependent on Amazon in the US for any future fixes or R&D?
https://plotly.com/blog/uv-python-package-manager-quirks/