> "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
Survivor bias.
I can assure you that GPT-5 fucks up even relatively easy searches. I need to have a very good idea of what the results should look like, and the ability to test them, before I can use any result from GPT-5.
If I throw the dice 1000 times and post about it each time I get a double six, am I the best dice thrower there is?
I'm not really sure what you mean. Literature search is about casting a wide net to make a reading list that is relevant to your research.
It is pretty hard to fuck that up, since you aren't expected to find everything anyway. The idea of "testing" and "using any result from GPT" is just, like, reading the papers and seeing if they are tangentially related.
If I may speak to my own experience, literature search has been the most productive application I've personally used, more than coding, and I've found many interesting papers and research directions with it.
One time when I was a kid my dad and I were playing Yahtzee, and he rolled five 5s on his first roll of the turn. He was absolutely stunned, and at the time I was young enough that I didn't understand just how unlikely it was. If only I had known that I was playing against the best dice thrower!
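For anyone who wants the numbers behind the dice examples in this thread, here is a rough sketch. It assumes fair six-sided dice and independent throws, which are my assumptions and not something anyone above stated; the variable names are just for illustration.

```python
from fractions import Fraction

# Assumption: fair six-sided dice, independent throws.

# Chance that a pair of dice shows double six on one throw.
p_double_six = Fraction(1, 6) ** 2

# Expected number of double sixes across 1000 throws of the pair.
expected_double_sixes = 1000 * p_double_six

# Chance of seeing at least one double six somewhere in those 1000 throws.
p_at_least_one = 1 - (1 - float(p_double_six)) ** 1000

# Chance that five dice all show 5 on a single roll (the Yahtzee story).
p_five_fives = Fraction(1, 6) ** 5

print("P(double six) =", p_double_six)
print("Expected double sixes in 1000 throws =", float(expected_double_sixes))
print("P(at least one double six in 1000 throws) =", p_at_least_one)
print("P(five 5s in one roll) =", p_five_fives)
```

So someone who only posts their double sixes would have a couple dozen "successes" to show off from 1000 throws without any skill at all, which is the survivor bias point.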
For literature search that might be OK. It doesn't need to replace any other tools, and if 1 time in 10 it surfaces something you wouldn't have found otherwise, it could be worth the time spent on the dud attempts.