MEM1 is an RL framework that trains LLM agents to operate over long multi-turn tasks while keeping memory usage nearly constant.
At each turn, the previous internal state and the new observation are consolidated into a compact updated internal state, and the earlier context is discarded, keeping the prompt length nearly constant.
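A minimal sketch of that consolidation loop, with hypothetical names (`rollout`, the policy/env interfaces, and the toy stand-ins are assumptions for illustration, not the paper's actual code):

```python
# Sketch of a MEM1-style constant-memory rollout: the policy only ever sees
# the current compact state plus the newest observation, never the full history.
def rollout(policy, env, max_turns=16):
    state = ""                      # compact internal state carried across turns
    obs = env.reset()
    for _ in range(max_turns):
        prompt = f"<state>{state}</state>\n<obs>{obs}</obs>"
        out = policy(prompt)        # emits an updated state and an action
        state, action = out["state"], out["action"]
        obs, done = env.step(action)
        if done:
            break
    return state

# Toy stand-ins to show the interface (not the real MEM1 components).
class ToyEnv:
    def __init__(self): self.t = 0
    def reset(self): self.t = 0; return "obs0"
    def step(self, action):
        self.t += 1
        return f"obs{self.t}", self.t >= 3

def toy_policy(prompt):
    # A real policy would be an LLM; this stub just rewrites the state.
    return {"state": f"summary@{len(prompt)}", "action": "act"}
```

The key property is that `prompt` is rebuilt from scratch each turn, so memory stays bounded regardless of the number of turns.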
A masked-trajectory RL scheme reconstructs valid trajectories for PPO without feeding the entire history.
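One way to picture the masked-trajectory idea (a hedged sketch; the function name and per-turn tuple layout are assumptions): each turn's pruned prompt and the model's response are stitched back into a single sequence, and a token-level mask ensures PPO's loss only touches the tokens the policy actually generated.

```python
# Reassemble per-turn (prompt_tokens, response_tokens) pairs into one trajectory.
# Mask is 0 over prompt tokens (context, no gradient) and 1 over response tokens
# (policy-generated, eligible for the PPO loss).
def build_masked_trajectory(turns):
    tokens, mask = [], []
    for prompt_toks, response_toks in turns:
        tokens.extend(prompt_toks)
        mask.extend([0] * len(prompt_toks))
        tokens.extend(response_toks)
        mask.extend([1] * len(response_toks))
    return tokens, mask
```

This lets training treat the episode as one valid trajectory even though no single turn ever contained the full history.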
MEM1-7B matches or beats much larger baselines on tasks with up to 16 sequential objectives while reducing memory use by ~3.7×.