Hey - yes, I think this is definitely possible, as you don't need any training compute for it to work. Its super easy to plug-and-play different models into new games, once an API is made available.
Models struggle in 2 main areas. The first is spatial reasoning: often the models make off-by-one errors which they find it hard to recover from (as factories are very sensitive to these mistakes - like in programming). The second is in long-term planning, i.e figuring out what to do strategically, before making tactical subgoals.
The difficulty scales in lab-play generally in proportion to the depth of the production chains. If an item requires several factory segments first, this makes it a lot more challenging. I think this is related to planning though, as the models tend to get down 'into the weeds' of fixing minor issues - rather than coming up with a master plan first.
Yes we tried that - as well as a few other visual DSLs for spatial reasoning. They didn't seem to help much, i.e there were no failure modes that this approach solved compared to the simpler approach. As ARC-AGI results showed - there don't seem to be many 'free lunch' solutions to this without actually training.
Models struggle in 2 main areas. The first is spatial reasoning: often the models make off-by-one errors which they find it hard to recover from (as factories are very sensitive to these mistakes - like in programming). The second is in long-term planning, i.e figuring out what to do strategically, before making tactical subgoals.
The difficulty scales in lab-play generally in proportion to the depth of the production chains. If an item requires several factory segments first, this makes it a lot more challenging. I think this is related to planning though, as the models tend to get down 'into the weeds' of fixing minor issues - rather than coming up with a master plan first.