Just to jump in here as one of the authors We designed the API to be as spatiall...

Hammershaft · 2025-03-11T14:35:12 1741703712

Someone below mentioned the ASCII interface for dwarf fortress as being ideal for this, and I wonder if that kind of representation with a legend might produce spatially better results. The drawback I see is that elements can be layered on a tile in Factorio, or have properties that are not visually obvious in ASCII, so the llm would need to be able to introspect on the map.

noddybear · 2025-03-11T14:39:09 1741703949

I think your intuition is correct about the amount of information that needs to be encoded into an ASCII char. You could potentially use unicode to pack more more into each char, e.g direction, type, status etc. Or make each representation available on-demand, i.e 'show me the direction of all inserters in a 10 tile radius'.

vessenes · 2025-03-11T15:09:16 1741705756

Well we learned last month on HN that you can encode arbitrary data into Unicode; anecdotally, o3-mini-high at least could decode it if given instructions.

I wonder what a quick way to calculate how many Unicode characters you’d need is.. I guess every entity + four orientations. Underground belts and pipes seem tough. But I guess you could just add an encoding showing if the square has an underground pipe or encoding.

I propose this would work. I think I’ll give it a try today.. I’d love dwarf fortress factorio. That said, the encode/decode phase seems like a lot of tokens for a model that’s not trained to understand the Unicode ‘map’. Seems like you’d want to fine tune something at least. Maybe a layout model.

noddybear · 2025-03-11T15:24:55 1741706695

Checkout the 'ObserveAll' tool in the repo - its deprecated now, but it pipes all the raw entities on the map back to the agent. You could procedurally convert it to unicode format given a pre-defined codebook (which you give to the agent) before letting the agent observe and reason over it.