AgentSAT
An autonomous AI agent that teaches itself to become the world's top expert on MaxSAT. Given 229 weighted MaxSAT instances from the 2024 MaxSAT Evaluation (main anytime weighted track), it discovers novel strategies, finds better solutions and iteratively refines its toolbox. No human guidance.
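To make "better solutions" concrete: in weighted MaxSAT, an assignment must satisfy every hard clause, and its cost is the total weight of the soft clauses it falsifies, so lower is better. The sketch below evaluates that cost. The function names and the in-memory clause representation are illustrative assumptions, not part of this repo's library.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def clause_satisfied(clause: Clause, assignment: Dict[int, bool]) -> bool:
    """A clause holds if at least one literal is true.

    Variables missing from the assignment are treated as False (an assumption).
    """
    return any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause)


def maxsat_cost(
    hard: List[Clause],
    soft: List[Tuple[int, Clause]],
    assignment: Dict[int, bool],
) -> Optional[int]:
    """Return the summed weight of falsified soft clauses.

    Returns None when any hard clause is violated (infeasible assignment).
    """
    if not all(clause_satisfied(c, assignment) for c in hard):
        return None
    return sum(w for w, c in soft if not clause_satisfied(c, assignment))


# Hypothetical toy instance: x1 OR x2 is hard; prefer NOT x1 (weight 5), NOT x2 (weight 3).
hard = [[1, 2]]
soft = [(5, [-1]), (3, [-2])]
print(maxsat_cost(hard, soft, {1: False, 2: True}))
```

On this toy instance the cheapest feasible assignment sets only x2 to true and pays the weight of the single falsified soft clause.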
How it works
1. An AI agent (e.g. Claude Code) reads program.md for instructions
2. It reads expert.md for accumulated knowledge from prior runs
3. It reads the library for available tools
4. It runs solvers on instances, discovers what works, updates everything
5. It commits and pushes to this repo so other agents can build on its findings
                    ┌─────────────────┐
                    │   GitHub Repo   │
                    │                 │
                    │ expert.md       │
                    │ library/        │
                    │ best-solutions  │
                    │ experiments.log │
                    └────────┬────────┘
                       git pull/push
                             │
       ┌────────────┬────────┴────────┬─────────────┐
       │            │                 │             │
┌──────▼──────┐ ┌───▼────────┐ ┌──────▼──────┐     ...
│    VM 1     │ │   VM 2     │ │    VM 3     │
│             │ │            │ │             │
│ ┌─────────┐ │ │ ┌────────┐ │ │ ┌─────────┐ │
│ │ Agent 1 │ │ │ │Agent 3 │ │ │ │ Agent 5 │ │
│ │ ┌─┬─┬─┐ │ │ │ │┌─┬─┬─┐ │ │ │ │ ┌─┬─┬─┐ │ │
│ │ │S│S│S│ │ │ │ ││S│S│S│ │ │ │ │ │S│S│S│ │ │
│ │ └─┴─┴─┘ │ │ │ │└─┴─┴─┘ │ │ │ │ └─┴─┴─┘ │ │
│ ├─────────┤ │ │ ├────────┤ │ │ ├─────────┤ │
│ │ Agent 2 │ │ │ │Agent 4 │ │ │ │ Agent 6 │ │
│ │ ┌─┬─┬─┐ │ │ │ │┌─┬─┬─┐ │ │ │ │ ┌─┬─┬─┐ │ │
│ │ │S│S│S│ │ │ │ ││S│S│S│ │ │ │ │ │S│S│S│ │ │
│ │ └─┴─┴─┘ │ │ │ │└─┴─┴─┘ │ │ │ │ └─┴─┴─┘ │ │
│ └─────────┘ │ │ └────────┘ │ │ └─────────┘ │
└─────────────┘ └────────────┘ └─────────────┘

S = solver process (Python)
# Launch on EC2 (handles everything: installs deps, clones repo,
# downloads benchmarks from Helsinki, launches agents in tmux)
./run.sh --host ec2-user@<ip> --agents 3
Requires a .env file with CLAUDE_CODE_API_KEY and GITHUB_ACCESS_TOKEN. The API key is auto-refreshed from your local Claude Code login on each deploy.
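Before deploying, it can help to confirm that both variables are actually present. The following is a minimal sketch that assumes the .env file uses plain KEY=VALUE lines; how run.sh itself reads the file is not documented here, and the helper names are hypothetical.

```python
from typing import Dict, List

# The two variable names come from this README; everything else is illustrative.
REQUIRED_KEYS = ("CLAUDE_CODE_API_KEY", "GITHUB_ACCESS_TOKEN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> List[str]:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [k for k in REQUIRED_KEYS if not env.get(k)]


sample = "CLAUDE_CODE_API_KEY=abc\n# token not set yet\n"
print(missing_keys(sample))
```

In practice the text would come from reading the .env file; the sketch does not handle `export` prefixes or multi-line values.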
Multiple agents can work on the same repo simultaneously, communicating through git: each agent pulls the latest solutions and expert knowledge, builds on what others found, and pushes its own improvements. No coordination needed beyond git pull and git push.
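One reason this can work without a central coordinator is that "keep the lower cost per instance" is an order-independent merge. The sketch below illustrates the idea with a hypothetical in-memory mapping from instance name to best known cost; the actual on-disk layout of best-solutions is not described here, and `merge_best` is an illustrative name.

```python
from typing import Dict


def merge_best(shared: Dict[str, int], local: Dict[str, int]) -> Dict[str, int]:
    """Combine two cost tables, keeping the lower cost for each instance.

    Lower is better because MaxSAT cost is the weight of falsified soft clauses.
    """
    merged = dict(shared)  # copy so the caller's table is not mutated
    for instance, cost in local.items():
        if instance not in merged or cost < merged[instance]:
            merged[instance] = cost
    return merged


# Hypothetical instance names and costs.
pulled = {"inst-a": 10, "inst-b": 7}
mine = {"inst-a": 8, "inst-b": 9, "inst-c": 4}
print(merge_best(pulled, mine))
```

Because the merge takes a per-instance minimum, applying it in either order or more than once gives the same table, so agents that pull, merge, and push at different times still converge on the same best-known costs.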
Results so far
| Metric | Count |
| --- | --- |
| Instances solved | 220 / 229 |
| Optimal (matching competition best) | 30 |
| Better than competition | 5 |
| Novel solve (no known solution existed) | 1 |
| Within 1.1x of reference | 123 |
| Within 1.5x | 183 |
| Within 2x | 209 |
| Unsolved | 9 |
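The rows above compare each found cost against a reference cost. As a rough sketch of how one instance could be placed in a bucket, the function below returns the tightest matching label. The label strings, the inclusive thresholds, and the treatment of a zero reference are assumptions; the table does not say whether its "within" rows are cumulative.

```python
from typing import Optional

# (numerator, denominator, label): integer ratios avoid floating-point edge cases.
THRESHOLDS = ((11, 10, "within 1.1x"), (3, 2, "within 1.5x"), (2, 1, "within 2x"))


def classify(cost: Optional[int], reference: Optional[int]) -> str:
    """Label one instance by comparing its cost with the reference cost.

    cost is None when no feasible solution was found; reference is None
    when no solution was previously known. Lower cost is better.
    """
    if cost is None:
        return "unsolved"
    if reference is None:
        return "novel"
    if cost < reference:
        return "better"
    if cost == reference:
        return "matches reference"
    for num, den, label in THRESHOLDS:
        # cost / reference <= num / den, written without division.
        if cost * den <= reference * num:
            return label
    return "over 2x"


print(classify(110, 100))
```

Comparing `cost * den` with `reference * num` keeps the arithmetic exact for large integer weights and avoids dividing by a reference cost of zero.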