Show HN: FLE v0.3 – Claude Code Plays Factorio

Benchmark: Lab-Play

Early signs of life for production automation

Lab-play is a highly constrained environment in which agents are given a fixed set of resources and a single target entity whose production throughput they must maximize. This simple setting has only a tiny fraction of the complexity of open-play, where agents spawn in a procedurally generated map and must achieve a complex goal with no starting inventory and sparser resources. Agents write Python using the FLE API to interact with the game, and observe the standard output and error messages from their execution.

We replicate the methodology from the original FLE paper for the lab-play setting to evaluate the strongest models as of September 2025.

The standardized agent harness is minimal: it continuously appends environment interactions to a single conversational history, and when the token budget is nearing exhaustion, it invokes the agent to summarize the older history so it can continue reasoning while remaining aware of past interactions.
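The append-then-summarize loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the FLE harness itself: the `History` class, the whitespace-based `count_tokens`, the 80% threshold, and the `keep_recent` parameter are all assumptions made for the example, and `summarize` stands in for a call to the agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): one token per word.
    return len(text.split())


@dataclass
class History:
    """Hypothetical single conversational history with summarization."""
    token_budget: int
    summarize: Callable[[List[str]], str]  # stand-in for asking the agent
    keep_recent: int = 4                   # newest messages kept verbatim
    threshold: float = 0.8                 # fraction of budget that triggers a summary
    messages: List[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        # Every environment interaction is appended to the same history.
        self.messages.append(message)
        split = len(self.messages) - self.keep_recent
        near_limit = self.total_tokens() >= self.threshold * self.token_budget
        if near_limit and split > 0:
            # Replace the older messages with a single summary entry.
            older, recent = self.messages[:split], self.messages[split:]
            self.messages = ["[summary] " + self.summarize(older)] + recent


# Usage with a toy summarizer in place of a model call.
history = History(
    token_budget=20,
    summarize=lambda msgs: f"{len(msgs)} earlier messages",
    keep_recent=2,
)
for step in range(4):
    history.append(f"step {step} placed one entity")
```

In this sketch the summary replaces only the older part of the history, so the most recent interactions stay available verbatim while earlier ones survive in compressed form.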

We do not evaluate agents with backtracking and/or reflection logic as we did in FLE 0.2.0, and instead we encourage the community to experiment with more advanced agent designs.

Setting

Objective: to achieve production throughput targets of 16 per minute for solid items and 250 per minute for fluids.

Prompt: documentation of the FLE API, Factorio recipes, and a guide describing common patterns.

Inventory: a set of useful items for building functional factories.
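To make the throughput targets concrete, the arithmetic can be sketched as below. The helper names and the idea of measuring units over an elapsed interval are assumptions for illustration; the text does not specify how FLE measures throughput, only the per-minute targets.

```python
# Targets stated in the objective: units per minute.
TARGETS_PER_MINUTE = {"solid": 16.0, "fluid": 250.0}


def throughput_per_minute(units_produced: float, elapsed_seconds: float) -> float:
    # Hypothetical helper: scale an observed count to a per-minute rate.
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return units_produced * 60.0 / elapsed_seconds


def meets_target(kind: str, units_produced: float, elapsed_seconds: float) -> bool:
    # kind is "solid" or "fluid" (assumed labels for the two target classes).
    rate = throughput_per_minute(units_produced, elapsed_seconds)
    return rate >= TARGETS_PER_MINUTE[kind]
```

For example, a factory that produces 8 solid items in 30 seconds runs at 16 per minute and meets the solid target under these assumptions, while 100 units of fluid in 60 seconds falls short of 250 per minute.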
