Comparing Fable and 10 other LLMs on refactoring a LangGraph god node

Twilight of the Gods. Fable and 10 more LLMs on a Code Reorganization Task. Comparison.

This article is also available in Russian: Гибель богов (Twilight of the Gods).

Materials and raw data: all 11 model proposals, the cross-reviews, the theses runs, and the ranking script are published on the "Materials & reproduce this experiment" page.

This is a detailed write-up of one experiment. I took a god node from a real LangGraph agent and asked 5 American and 6 Chinese models to first propose how to untangle it and then evaluate each other's proposals. After that, I tried three different ways to figure out which of them to trust on the matter.
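Cross-reviews of this kind form a reviewer-by-author score matrix. One simple way to turn such a matrix into a ranking is to average the scores each proposal received from the other models and ignore self-reviews, since a model grading its own proposal is an obvious source of bias. The sketch below assumes one numeric score (1 to 10) per reviewer and proposal; the model names and scores are placeholders, not data from this experiment, and this is not necessarily what the published ranking script does.

```python
from statistics import mean

# Hypothetical cross-review matrix: scores[reviewer][author], assumed 1-10 scale.
# Names and numbers are placeholders, not results from the experiment.
scores = {
    "model_a": {"model_a": 9, "model_b": 6, "model_c": 7},
    "model_b": {"model_a": 8, "model_b": 9, "model_c": 5},
    "model_c": {"model_a": 7, "model_b": 7, "model_c": 10},
}


def peer_average(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average score each author received from the other reviewers (self-reviews dropped)."""
    authors = {author for row in scores.values() for author in row}
    result = {}
    for author in authors:
        peer_scores = [
            row[author]
            for reviewer, row in scores.items()
            if reviewer != author and author in row
        ]
        if peer_scores:  # skip authors nobody else reviewed
            result[author] = mean(peer_scores)
    return result


def ranking(scores: dict[str, dict[str, float]]) -> list[str]:
    """Authors ordered from highest to lowest peer average."""
    avg = peer_average(scores)
    return sorted(avg, key=lambda author: avg[author], reverse=True)
```

With the placeholder numbers, `model_a` averages 7.5 from its peers, `model_b` 6.5, and `model_c` 6, so `ranking(scores)` returns `["model_a", "model_b", "model_c"]`; counting the self-scores would instead leave `model_b` and `model_c` tied.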


The original problem

You know how it goes: you're building a practice AI agent with the fellas on a course by Data Sanity, and amid the colorful whirl of rapidly accreting features you suddenly notice that one of the project's internal agents has a state graph (LangGraph) that looks like this:

[Figure: the agent's LangGraph state graph]

At first glance this is just a cute little octopus, nothing to worry about. But once you know how much logic this octopus has to hold in its modest eight-legged head, it becomes clear right away that we're looking at an anti-pattern. Let's call it a god node.

The plan node hides about 350 lines of logic, including iterative checks, bootstrap questions about region and currency, schema preparation, acquisition-task routing, the LLM call, the subsequent correction of the decision, and so on.
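To make the contrast concrete, here is a minimal sketch of the opposite shape: some of the responsibilities listed above (bootstrap questions, schema preparation, the LLM call, decision correction, acquisition routing) split into single-purpose node functions plus small routing functions. It is plain Python so it runs without LangGraph installed. Every node name, state field, and default value (such as `"EU"`/`"EUR"`) is hypothetical, the LLM call is stubbed, the iterative checks are left out, and this is not the refactoring the author or any of the models proposed.

```python
from __future__ import annotations

from typing import Callable, Optional, TypedDict

END = "__end__"  # sentinel meaning "stop", analogous to LangGraph's END


class PlanState(TypedDict, total=False):
    # Hypothetical state fields; the real agent's state schema is not shown here.
    region: Optional[str]
    currency: Optional[str]
    schema: dict
    decision: str
    acquisition_done: bool
    trace: list[str]


# --- Nodes: each does one job and returns only the keys it changes ---

def ask_bootstrap(state: PlanState) -> dict:
    # Stand-in for the bootstrap questions: fills defaults instead of asking a user.
    return {"region": state.get("region") or "EU",
            "currency": state.get("currency") or "EUR"}


def prepare_schema(state: PlanState) -> dict:
    return {"schema": {"region": state["region"], "currency": state["currency"]}}


def call_llm(state: PlanState) -> dict:
    # Stand-in for the LLM call: returns a fixed, slightly messy raw decision.
    return {"decision": "  acquire_prices  "}


def correct_decision(state: PlanState) -> dict:
    # Post-processing of the raw decision lives in its own node.
    return {"decision": state["decision"].strip()}


def run_acquisition(state: PlanState) -> dict:
    return {"acquisition_done": True}


# --- Routers: small pure functions instead of branches buried in one node ---

def route_start(state: PlanState) -> str:
    has_context = state.get("region") and state.get("currency")
    return "prepare_schema" if has_context else "ask_bootstrap"


def route_after_decision(state: PlanState) -> str:
    return "run_acquisition" if state["decision"].startswith("acquire") else END


NODES: dict[str, Callable[[PlanState], dict]] = {
    "ask_bootstrap": ask_bootstrap,
    "prepare_schema": prepare_schema,
    "call_llm": call_llm,
    "correct_decision": correct_decision,
    "run_acquisition": run_acquisition,
}

# A string is a fixed edge; a function is a conditional edge evaluated on the state.
EDGES = {
    "ask_bootstrap": "prepare_schema",
    "prepare_schema": "call_llm",
    "call_llm": "correct_decision",
    "correct_decision": route_after_decision,
    "run_acquisition": END,
}


def run(initial: PlanState, max_steps: int = 20) -> PlanState:
    """Tiny graph runner: apply each node's partial update, then follow its edge."""
    state: PlanState = dict(initial)
    node = route_start(state)
    steps = 0
    while node != END:
        if steps == max_steps:
            raise RuntimeError("graph did not reach END within max_steps")
        steps += 1
        state.update(NODES[node](state))
        state["trace"] = [*state.get("trace", []), node]
        nxt = EDGES[node]
        node = nxt(state) if callable(nxt) else nxt
    return state
```

In LangGraph itself, the same functions would be registered with `StateGraph.add_node`, the fixed transitions with `add_edge`, and the routers with `add_conditional_edges`; the `run` loop here only imitates that wiring. The payoff of the split is that each piece can be tested in isolation, for example `correct_decision({"decision": " x "})` returns `{"decision": "x"}` without mocking an LLM.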
