Comparing Fable and 10 other LLMs on refactoring a LangGraph god node

Why This Matters

This experiment compares how well various large language models (LLMs) refactor a complex code structure: a 'god node' in a LangGraph agent. Evaluating models on code comprehension and restructuring matters for the maintainability and reliability of AI-assisted development, and knowing how the models differ on such a task helps developers decide how far to trust AI coding tools.

Twilight of the Gods. Fable and 10 more LLMs on a Code Reorganization Task. Comparison.

Other languages: This article is also available in Russian: Гибель богов.

Materials & raw data: All 11 model proposals, the cross-reviews, the theses runs, and the ranking script are published here: Materials & reproduce this experiment.

This is a detailed write-up of one experiment. I took a god node from a real LangGraph agent and asked 5 American and 6 Chinese models first to propose how to untangle it, then to evaluate each other's proposals. After that, I tried three different ways to figure out which of them to trust on the matter.

The original problem

You know how it goes: you're building a practice AI agent with the fellas on a course by Data Sanity, and amid the colorful whirl of rapidly accreting features you suddenly notice that one of the project's internal agents has a state graph (LangGraph) that looks like this:

flowchart TD
    planner_start([START]) --> plan[plan]
    plan -->|search| search[search]
    plan -->|ask_user| ask_user[ask_user / interrupt]
    plan -->|reflect| reflect[reflect]
    plan -->|calculate| calculate[calculate]
    plan -->|finish| finish[finish]
    search -->|last_observation| observe[observe]
    search -->|no hits / backend failure| plan
    observe --> plan
    calculate --> plan
    ask_user --> observe_user[observe_user]
    observe_user --> plan
    reflect --> plan
    finish --> planner_end([END])

At first glance this is just a cute little octopus — nothing to worry about. But once you know how much logic this octopus has to hold in its modest eight-legged head, it becomes clear right away that we're looking at an anti-pattern. In this case, let's call it a god node.
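To make the octopus shape concrete, here is a minimal sketch in plain Python that transcribes the flowchart's edges and counts how many of them touch each node. It does not use LangGraph itself; the `EDGES` list, the `busiest_node` helper, and the use of `"START"` / `"END"` as plain strings are assumptions made for illustration.

```python
from collections import Counter

# Edges transcribed from the flowchart: (source, target, routing label or None).
EDGES = [
    ("START", "plan", None),
    ("plan", "search", "search"),
    ("plan", "ask_user", "ask_user"),
    ("plan", "reflect", "reflect"),
    ("plan", "calculate", "calculate"),
    ("plan", "finish", "finish"),
    ("search", "observe", "last_observation"),
    ("search", "plan", "no hits / backend failure"),
    ("observe", "plan", None),
    ("calculate", "plan", None),
    ("ask_user", "observe_user", None),
    ("observe_user", "plan", None),
    ("reflect", "plan", None),
    ("finish", "END", None),
]

# Outgoing and incoming edge counts per node.
fan_out = Counter(src for src, _, _ in EDGES)
fan_in = Counter(dst for _, dst, _ in EDGES)


def busiest_node(edges):
    """Return (node, degree) for the node touched by the most edges."""
    degree = Counter()
    for src, dst, _ in edges:
        degree[src] += 1
        degree[dst] += 1
    return degree.most_common(1)[0]


print(busiest_node(EDGES))  # ('plan', 11)
```

Eleven of the fourteen edges start or end at `plan`, so every loop in the graph passes through it. The count only describes the shape of the graph; it says nothing about how much logic sits inside the node.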

The plan node hides about 350 lines of logic, including iterative checks, bootstrap questions about region and currency, schema preparation, acquisition-task routing, the LLM call, the subsequent correction of the decision, and so on.
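For a sense of what untangling could mean here, the sketch below shows one generic way to break such a node into small steps that are tried in order. It is not the author's code and not any of the model proposals: the `State` alias, the step functions, `MAX_ITERATIONS`, the `llm_decision` key, and the fallback to `reflect` are all hypothetical, and the LLM call is replaced by a value read from the state.

```python
from typing import Callable, List, Optional

State = dict  # hypothetical stand-in; the agent's real state schema is not shown

# Edge labels that leave `plan` in the flowchart.
ROUTES = {"search", "ask_user", "reflect", "calculate", "finish"}
MAX_ITERATIONS = 8  # hypothetical budget

# A step inspects the state and returns a route, or None to defer to the next step.
Step = Callable[[State], Optional[str]]


def check_iteration_budget(state: State) -> Optional[str]:
    # Iteration check: stop once the (hypothetical) budget is used up.
    if state.get("iteration", 0) >= MAX_ITERATIONS:
        return "finish"
    return None


def bootstrap_region_currency(state: State) -> Optional[str]:
    # Bootstrap question: ask the user if region or currency is still unknown.
    if "region" not in state or "currency" not in state:
        return "ask_user"
    return None


def decide_with_llm(state: State) -> Optional[str]:
    # Stand-in for the LLM call: the "model output" is read from the state.
    decision = state.get("llm_decision")
    # Hypothetical correction rule: an unknown route is replaced by "reflect".
    return decision if decision in ROUTES else "reflect"


STEPS: List[Step] = [check_iteration_budget, bootstrap_region_currency, decide_with_llm]


def plan(state: State, steps: List[Step] = STEPS) -> str:
    # Return the route chosen by the first step that has an opinion.
    for step in steps:
        route = step(state)
        if route is not None:
            return route
    raise RuntimeError("no step produced a route")
```

Each step can be tested on its own, for example `plan({})` returns `"ask_user"` because region and currency are missing. Schema preparation and acquisition-task routing are left out of the sketch because the article does not describe them in enough detail to imitate.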
