This is exactly how FLE works: the agent writes a program that executes its policy.
I think you bring up a good point: we could create tasks where the goal is to optimise a static factory, starting from a kernel of functionality such as a steam engine power supply.
But in the examples it seems to be used to generate short snippets that are equivalent to command lists, as opposed to a full program that actually plays the whole game by itself.
The model could also then be fed back the results of running the program and iteratively change it as needed.
I.e. prompt first with "Write a program that can play Factorio automatically given an interface <INTERFACE SPECIFICATION> and a set of goals in <GOAL FORMAT>, and produces text output that can help determine whether the program is working correctly and whether tasks are performed efficiently and goals are reached as fast as possible"
And then with "the program was run and produced this text output: <TEXT OUTPUT> Determine any possible bugs, avenues of improvements or missing output information and modify the program accordingly, printing the new version".
And iterate until there doesn't seem to be an improvement anymore.
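As a rough sketch of that loop in Python (not FLE's actual API): `generate` and `run` are hypothetical callables standing in for the LLM call and for running the program against the game, and I'm assuming `run` returns both the text output and a numeric score so that "no improvement anymore" can be checked. The prompt strings are abbreviated versions of the two prompts above, and passing the current program back in the second prompt is also my assumption.

```python
from typing import Callable, Tuple

# Abbreviated versions of the two prompts described above.
FIRST_PROMPT = (
    "Write a program that can play Factorio automatically given an interface "
    "<INTERFACE SPECIFICATION> and a set of goals in <GOAL FORMAT>, and "
    "produces text output that helps determine whether it is working."
)
REVISE_PROMPT = (
    "Current program:\n{program}\n\n"
    "The program was run and produced this text output:\n{output}\n\n"
    "Determine any possible bugs, avenues of improvement or missing output "
    "information and modify the program accordingly, printing the new version."
)


def refine(
    generate: Callable[[str], str],            # hypothetical: prompt -> program text
    run: Callable[[str], Tuple[str, float]],   # hypothetical: program -> (output, score)
    max_rounds: int = 5,
    patience: int = 2,
) -> Tuple[str, float]:
    """Generate a program, then revise it until the score stops improving."""
    program = generate(FIRST_PROMPT)
    output, best_score = run(program)
    best_program = program
    stale = 0  # consecutive rounds without a new best score

    for _ in range(max_rounds):
        # Feed the latest program and its output back to the model.
        program = generate(REVISE_PROMPT.format(program=program, output=output))
        output, score = run(program)
        if score > best_score:
            best_program, best_score = program, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # "there doesn't seem to be an improvement anymore"

    return best_program, best_score
```

The `patience` cutoff is just one way to decide when to stop; a fixed round budget or a human looking at the output would work too.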
If I understand you correctly, this approach is sort of supported in FLE: the agents can create functions that encapsulate more complex logic. However, interaction is still synchronous/turn-based. I think to do what you propose, you would need to create event listeners that can trigger the agent's program whenever appropriate.
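To make the event-listener idea concrete, here is a minimal sketch of what such a dispatcher could look like. None of this exists in FLE as far as this thread says: `EventBus`, the `resource_depleted` event name and the payload shape are all made up for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Any]


class EventBus:
    """Hypothetical dispatcher: maps game event names to agent-written handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_name: str, handler: Handler) -> None:
        # Register a handler to be called whenever event_name is emitted.
        self._handlers[event_name].append(handler)

    def emit(self, event_name: str, payload: Dict[str, Any]) -> List[Any]:
        # Call every handler registered for this event, in registration order.
        return [handler(payload) for handler in self._handlers.get(event_name, [])]


# Usage: the agent registers its logic once; the game loop emits events later.
bus = EventBus()
actions: List[str] = []
bus.on(
    "resource_depleted",  # hypothetical event name
    lambda event: actions.append(f"relocate drill at {event['position']}"),
)
bus.emit("resource_depleted", {"position": (10, 4)})
```

The difference from the turn-based setup is that the agent's code runs when the game state changes, rather than only when the agent is asked for its next step.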