
Evaluating Agents


“Models constantly change and improve but evals persist”

Look at the data

No amount of evals will replace the need to look at the data. Once your evals have good coverage you will be able to spend less time on it, but reading the agent traces to identify possible issues or things to improve will always be a must.

Getting started: end-to-end evals

You must create evals for your agents; stop relying solely on manual testing.

Not sure where to start?

Add e2e evals: define a success criterion (did the agent meet the user’s goal?) and make the evals output a simple yes/no value.
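
As a rough illustration of that idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: `EvalCase`, `run_evals`, `pass_rate`, and `toy_agent` are hypothetical names, not part of any particular framework, and the toy agent is only a stand-in for a real one. Each case pairs a user request with a success check, and every eval reduces to a plain yes/no.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One end-to-end eval: a user request plus a yes/no success check."""
    name: str
    user_request: str
    # Hypothetical success criterion: True if the agent's final output
    # shows that the user's goal was met.
    succeeded: Callable[[str], bool]


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run the agent once per case and record a simple pass/fail."""
    results: Dict[str, bool] = {}
    for case in cases:
        try:
            output = agent(case.user_request)
            results[case.name] = bool(case.succeeded(output))
        except Exception:
            # A crashing agent did not meet the user's goal.
            results[case.name] = False
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of evals that passed (0.0 when there are no results)."""
    return sum(results.values()) / len(results) if results else 0.0


def toy_agent(request: str) -> str:
    """Stand-in for a real agent, used only to make the sketch runnable."""
    if "refund" in request.lower():
        return "I have opened a refund ticket for your order."
    return "Sorry, I cannot help with that."


cases = [
    EvalCase(
        name="refund",
        user_request="I want a refund for my order",
        succeeded=lambda out: "refund ticket" in out.lower(),
    ),
    EvalCase(
        name="cancel_subscription",
        user_request="Please cancel my subscription",
        succeeded=lambda out: "cancel" in out.lower(),
    ),
]

results = run_evals(toy_agent, cases)
for name, ok in results.items():
    print(f"{name}: {'yes' if ok else 'no'}")
print(f"pass rate: {pass_rate(results):.0%}")
```

In a real setup the string checks would likely be replaced by something closer to the actual goal (for example, checking the final state of a system the agent acted on), but the yes/no shape of the result stays the same, which keeps the pass rate easy to track as models change.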
