
A case study in testing with 100+ Claude agents in parallel

Why This Matters

This case study highlights how large-scale parallel testing with over 100 Claude agents can significantly enhance the development and refinement of AI tools like mngr. By automating the generation, testing, and improvement of code examples, the approach accelerates innovation and ensures more robust, user-friendly interfaces. This methodology exemplifies how AI-driven testing can transform software development in the tech industry, leading to faster iteration cycles and higher-quality products for consumers.

Key Takeaways

In our previous blog post, we introduced mngr and showed how you can use it to launch hundreds of parallel agents. Here are the details of how we actually use mngr to run and improve itself, by testing its own demo script.

High-level architecture

This is how the entire setup works:

We start from a tutorial script, tutorial.sh, containing blocks of commands. A block is simply a sequence of consecutive non-empty lines.

For each block, we derive one or more pytest functions.

For each pytest function, we launch an agent to run, debug, fix and improve it.

Finally, we integrate the outcomes of all the agents.

Let’s dive into how each step works.

Writing the tutorial script

We seeded this script with a lot of content we wrote ourselves, but writing examples by hand becomes tiring after the first 50 or so. So we simply:
