Open-sourcing circuit tracing tools

Published on: 2025-06-14 18:16:54

In our recent interpretability research, we introduced a new method to trace the thoughts of a large language model. Today, we’re open-sourcing the method so that anyone can build on our research.

Our approach is to generate attribution graphs, which (partially) reveal the steps a model took internally to decide on a particular output. The open-source library we’re releasing supports the generation of attribution graphs on popular open-weights models, and a frontend hosted by Neuronpedia lets you explore the graphs interactively.

This project was led by participants in our Anthropic Fellows program, in collaboration with Decode Research.

Figure: An overview of the interactive graph explorer UI on Neuronpedia.

To get started, you can visit the Neuronpedia interface to generate and view your own attribution graphs for prompts of your choosing. For more sophisticated usage and research, you can view the code repository.

This release enables researchers to: Trace circuits on supported models, ...