Is There a Half-Life for the Success Rates of AI Agents?
In general, ability to perform a task drops off as its duration increases, so they use the AI agent’s performance on tasks of different lengths to estimate the task-length at which the model would have a 50% success rate. They then showed that this length has been doubling every 7 months as the capabilities of frontier agents improve. The task-lengths are measured by how long it took humans to solve the same tasks. They used 50% success rate as their chief performance threshold because it is th