From 0% to 36% on Day 1 of ARC-AGI-3

The Agentica SDK by Symbolica achieves an unverified competition score of 36.08% on ARC-AGI-3 [1], passing 113 of 182 playable levels and completing 7 of the 25 available games [2].

Our implementation outperforms the CoT baselines of 0.2% (Opus 4.6 Max) and 0.3% (GPT 5.4 High) at a far lower cost: Agentica scores 36.08% for $1,005, versus Opus 4.6's 0.25% for $8,900. Check out the code on GitHub: symbolica-ai/ARC-AGI-3-Agents
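To make the cost comparison concrete, here is a minimal sketch that turns the two quoted data points into ratios. The numbers come straight from the paragraph above; the helper name `score_per_dollar` and the "score per dollar" metric itself are illustrative assumptions, not part of the ARC-AGI-3 scoring.

```python
# Figures quoted in the text: (score in %, total cost in USD).
agentica = (36.08, 1005.0)
opus_cot = (0.25, 8900.0)


def score_per_dollar(score_pct: float, cost_usd: float) -> float:
    """Percentage points of score obtained per dollar spent (illustrative metric)."""
    return score_pct / cost_usd


# How many times higher the Agentica score is than the CoT baseline.
score_ratio = agentica[0] / opus_cot[0]
# How many times more the CoT baseline cost than the Agentica run.
cost_ratio = opus_cot[1] / agentica[1]
# Combined effect: score obtained per dollar, Agentica relative to the baseline.
efficiency_ratio = score_per_dollar(*agentica) / score_per_dollar(*opus_cot)

print(f"score ratio:            {score_ratio:.1f}x")
print(f"cost ratio:             {cost_ratio:.1f}x")
print(f"score-per-dollar ratio: {efficiency_ratio:.0f}x")
```

On these two data points, the score is roughly 144 times higher at roughly one ninth of the cost.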

[Figure: ARC-AGI-3: Score vs Cost. Axes: Score (%), 0% to 40%; Cost ($), $1 to $10k. Points: Gemini 3.1 Pro (Preview), Grok 4.20 (Beta Reasoning), GPT-5.4 (High), Opus 4.6 (Max), and Agentica Opus 4.6 (High), labeled SOTA.]

Figure 1. Score versus cost per task on the ARC-AGI-3 public eval set for Chain of Thought (CoT) models and the Agentica ARC-AGI-3 agent with Opus 4.6 (120k) High. For details on the cost per task for Agentica Opus 4.6 (120k) High, see the code.

Gallery - Games Won

CN04: 97.6%, 118 actions, WIN
LP85: 84.16%, 273 actions, WIN
AR25: 83.28%, 516 actions, WIN
FT09: 77.59%, 123 actions, WIN

Score Breakdown - All Games

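Legend: Beat human baseline · Game won · Game ended

| Game | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CN04 | 20 | 19 | 22 | 21 | 35 | — | | | | | 97.60 |
| LP85 | 17 | 11 | 18 | 18 | 23 | 153 | 19 | 13 | | | 84.16 |
| AR25 | 50 | 30 | 97 | 28 | 73 | 84 | 106 | 47 | | | 83.28 |
| FT09 | 3 | 7 | 14 | 21 | 21 | 56 | | | | | 77.59 |
| CD82 | 60 | 36 | 57 | 14 | 16 | 20 | | | | | 70.15 |
| TR87 | 42 | 32 | 39 | 29 | 43 | 3,962 | | | | | 69.21 |
| TU93 | 17 | 18 | 23 | 45 | 81 | 62 | 14 | 91 | 48 | | 67.87 |
| KA59 | 37 | 56 | 37 | 52 | 27 | 113 | 59 | | | | 65.33 |
| SB26 | 18 | 221 | 15 | 20 | 17 | 19 | 67 | 203 | | | 49.35 |
| M0R0 | 25 | 43 | 121 | 12 | 61 | — | | | | | 40.06 |
| RE86 | 24 | 37 | 61 | 132 | 66 | 280 | 263 | — | | | 35.54 |
| SU15 | 16 | 232 | 17 | 105 | 136 | 90 | 27 | 150 | — | | 35.17 |
| S5I5 | 33 | 72 | 77 | 141 | 365 | — | — | — | | | 23.85 |
| WA30 | 39 | 58 | 86 | 80 | 132 | — | — | — | — | | 22.22 |
| SC25 | 78 | 9 | 30 | 42 | — | — | | | | | 18.42 |
| VC33 | 11 | 15 | 29 | 143 | — | — | — | | | | 17.14 |
| DC22 | 94 | 99 | 114 | 128 | — | — | | | | | 15.56 |
| G50T | 69 | 180 | 467 | — | — | — | — | | | | 8.70 |
| LS20 | 26 | 387 | 251 | 213 | 212 | 502 | — | | | | 7.13 |
| LF52 | 23 | 137 | 246 | 174 | 928 | — | — | — | — | — | 5.36 |
| R11L | 4 | 432 | — | — | — | — | | | | | 4.76 |
| TN36 | 57 | 69 | 528 | — | — | — | — | | | | 1.31 |
| SK48 | 74 | 72 | 266 | — | — | — | — | — | | | 1.21 |
| SP80 | 28 | 120 | — | — | — | — | | | | | 0.73 |
| BP35 | 48 | 10 | — | — | — | — | — | — | — | | 0.22 |
| Overall | | | | | | | | | | | 36.08 |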
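As a sanity check on the breakdown, the Overall figure is consistent with an unweighted mean of the 25 per-game scores. The sketch below only reproduces that arithmetic from the Score column; it is an observation about the numbers in this table, not a statement of the official ARC-AGI-3 scoring rule, and the variable names are illustrative.

```python
from statistics import fmean

# Per-game scores (%) copied from the Score column of the breakdown.
game_scores = {
    "CN04": 97.60, "LP85": 84.16, "AR25": 83.28, "FT09": 77.59, "CD82": 70.15,
    "TR87": 69.21, "TU93": 67.87, "KA59": 65.33, "SB26": 49.35, "M0R0": 40.06,
    "RE86": 35.54, "SU15": 35.17, "S5I5": 23.85, "WA30": 22.22, "SC25": 18.42,
    "VC33": 17.14, "DC22": 15.56, "G50T": 8.70, "LS20": 7.13, "LF52": 5.36,
    "R11L": 4.76, "TN36": 1.31, "SK48": 1.21, "SP80": 0.73, "BP35": 0.22,
}

# Unweighted mean across all games, assuming every game counts equally.
overall = fmean(game_scores.values())

print(f"games: {len(game_scores)}")
print(f"mean score: {overall:.2f}%")
```

With these inputs the mean rounds to 36.08, matching the Overall row.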

Chat with Agentica

We've sandboxed the SDK and let it run any persistent task, including solving ARC puzzles.
