
It's crap at the visual Raven IQ test though: it scores 22%, versus 17% for an algorithm that just guesses at random.
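As a sanity check on that 17% figure: a uniform random guesser gets 1/k of the items right when each item has k answer options. The sketch below assumes k = 6, which is my assumption (it matches the 17% quoted above but is not stated in this thread), and the function name is made up for illustration.

```python
import random

def random_guess_accuracy(num_items, num_choices, seed=0):
    """Simulate a guesser that picks uniformly among num_choices options."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_items):
        answer = rng.randrange(num_choices)  # hidden correct option
        guess = rng.randrange(num_choices)   # uniform random guess
        correct += (guess == answer)
    return correct / num_items

NUM_CHOICES = 6  # assumption: six candidate answers per item

expected = 1 / NUM_CHOICES                               # analytic baseline
simulated = random_guess_accuracy(100_000, NUM_CHOICES)  # empirical check

print(f"expected:  {expected:.1%}")
print(f"simulated: {simulated:.1%}")
```

So 22% is only about five points above chance under that assumption, and on a small test set a gap that size can be hard to tell apart from noise.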


I'd be cautious with such general statements, given the rapid pace of development in this area.

Benchmark shelf lives aren't that long.

You omitted the fact that tuning bumped it to 26% vs random.

Sure, it's questionable how much effort that step involves, but it hints to me that the tuned score will be the new baseline within the next 12-24 months.


Sure, I would expect it to improve. But it was a bit fishy how 'it took an IQ test!' is in all the highlights, but then they mumble quietly about the score it actually got and hope no-one is listening to that bit.

It's notable that it was able to attempt it at all, I suppose.


Semi-related: there's a (pretty good) course at OMSCS where the main project is building an agent to solve RPM problems: https://lucylabs.gatech.edu/kbai/spring-2023/project-overvie...

And there are quite a lot of papers about that: https://scholar.google.com/scholar?q=%22raven%27s+progressiv...


Bet you 5 bucks I can train one that gets 100%. Just gotta train it on the Raven's answer key.



