Hacker News
Gemini 3 is #1 on Vending-Bench 2 (andonlabs.com)
1 point by lukaspetersson 28 days ago
Our LLM-controlled office robot can't pass butter (andonlabs.com)
229 points by lukaspetersson 49 days ago | 117 comments
Misaligned Vending Machines [pdf] (andonlabs.com)
1 point by bulla 3 months ago
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
3 points by andromaton 5 months ago | 1 comment
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
1 point by vector_spaces 7 months ago
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
5 points by tosh 8 months ago | 2 comments
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
1 point by gdeglin 9 months ago
Claude isn't the best Computer-use agent (andonlabs.com)
2 points by lukaspetersson 11 months ago