zhisbug's comments | Hacker News


Pokémon is increasingly used to evaluate modern large language models, but current practices lack standardization and depend heavily on game-specific harnesses. Playing Pokémon Red involves three major tasks: navigation, combat control, and training a competitive Pokémon team. We find each comes with limitations: navigation is too hard, combat control is too simple, and team training is too expensive. We address these issues in Lmgame Bench, a new framework offering standardized evaluations and initial results across diverse games.
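To make the idea of a standardized, harness-free evaluation concrete, here is a minimal sketch of a game-eval loop in the spirit described above. The names (GameEnv, query_model, evaluate) are hypothetical illustrations, not Lmgame Bench's actual API.

    # Minimal sketch of a standardized game-eval loop (illustrative only).
    from dataclasses import dataclass


    @dataclass
    class GameEnv:
        """Tiny text-game stand-in: reach the goal position on a 1-D track."""
        goal: int = 3
        position: int = 0
        steps: int = 0
        max_steps: int = 20

        def observe(self) -> str:
            return f"position={self.position}, goal={self.goal}"

        def act(self, action: str) -> None:
            self.steps += 1
            if action == "right":
                self.position += 1
            elif action == "left":
                self.position -= 1

        def done(self) -> bool:
            return self.position == self.goal or self.steps >= self.max_steps

        def score(self) -> float:
            return 1.0 if self.position == self.goal else 0.0


    def query_model(prompt: str) -> str:
        """Placeholder for an LLM call; a real harness would hit a model API."""
        return "right"


    def evaluate(env: GameEnv) -> float:
        """Run one episode: show the model the state, apply its action, repeat."""
        while not env.done():
            action = query_model(f"State: {env.observe()}\nAction (left/right)?")
            env.act(action)
        return env.score()


    if __name__ == "__main__":
        print("episode score:", evaluate(GameEnv()))

The point of the sketch is the interface: if every game exposes the same observe/act/score loop, models can be compared across games without per-game harness code.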


where other models top out after a few moves



We find that spatial perception and spatial reasoning remain very difficult even for the strongest models like o3 or Claude 3.7.




Sliding tile attention accelerates Hunyuan video generation by 3x with no quality drop and no training required.
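For intuition, here is a toy sketch of the tile-local attention idea: each query tile attends only to key tiles within a small window instead of the full sequence, which is where the compute savings come from. The tile size, window width, and masking scheme below are illustrative assumptions, not the actual Hunyuan kernel.

    # Toy tile-local attention: each query tile attends to nearby key tiles only.
    import numpy as np


    def sliding_tile_attention(q, k, v, tile=4, window=1):
        """q, k, v: (seq_len, dim). Each query tile attends to key tiles
        within `window` tiles on either side of it."""
        seq_len, dim = q.shape
        n_tiles = seq_len // tile
        out = np.zeros_like(v)
        for t in range(n_tiles):
            q_slice = slice(t * tile, (t + 1) * tile)
            lo = max(0, t - window) * tile
            hi = min(n_tiles, t + window + 1) * tile
            # Softmax attention restricted to the local key/value window.
            scores = q[q_slice] @ k[lo:hi].T / np.sqrt(dim)
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            out[q_slice] = weights @ v[lo:hi]
        return out


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
        print(sliding_tile_attention(q, k, v).shape)  # (16, 8)

Restricting each tile to a fixed window drops the cost from quadratic in sequence length to roughly linear, which is the rough intuition behind the reported speedup.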


Try our demo and let us know what you think.


This is pretty clever and seems to have high potential, but it still relies on humans. What happens if, some day, no human can outsmart AI?


When superintelligence arrives, it would be very interesting to see multi-party gameplay among AIs too. What role humans would play in this story is unclear. Maybe humans couldn't directly engage in the games either, as they would be too naive and would be immediately identified and exploited by the AI :)


So transparent.

https://news.ycombinator.com/item?id=43017857


We hope to redefine AI evaluation with our gamified evaluation platform, Game Arena!

