Evaluating Agents
“Models constantly change and improve but evals persist”

Look at the data

No amount of evals will replace the need to look at the data. Once your evals have good coverage you’ll be able to spend less time on it, but you will always need to read the agent traces to identify possible issues or things to improve.

Starting: end-to-end evals

You must create evals for your agents; stop relying solely on manual testing. Not sure where to start? Add e2e evals: define a success criterion (