It's crap at the visual Raven IQ test though; it scores 22% vs an algorithm that ...

thenaturalist · on March 1, 2023

I'd be cautious with such general statements given the rapid pace of development in this area.

Benchmark shelf lives aren't that long.

You omitted the fact that tuning bumped it to 26% vs random.

Sure, it's questionable how much effort is involved in that step, but at the same time, that hints to me that tuning will be the new baseline within the next 12-24 months.

didntreadarticl · on March 1, 2023

Sure, I would expect it to improve. But it was a bit fishy how 'it took an IQ test!' is in all the highlights, yet they mumble quietly about the score it actually got and hope no one is listening to that bit.

It's notable that it was able to attempt it at all, I suppose.

nmarinov · on March 1, 2023

Semi-related: there's a (pretty good) course at OMSCS where the main project is building an agent to solve RPM problems: https://lucylabs.gatech.edu/kbai/spring-2023/project-overvie...

And quite a lot of papers about that: https://scholar.google.com/scholar?q=%22raven%27s+progressiv...

kalium-xyz · on March 1, 2023

Bet you 5 bucks I can train one that gets 100%. Just gotta train it on the Raven's answer key.