

mitemte 10 months ago | on: 30% drop in O1-preview accuracy when Putnam proble...

Wow, this seems ridiculous. The expected answer is basically finding a loophole in the problem. I can imagine how worthless all of these models would be if they behaved that way.

stavros 10 months ago

It's not a loophole. The question is "how can he get the goat across?" The answer is that he just takes it across.
