The octopus architecture for AI agents

TorkBot is designed a bit like an octopus. This architecture was born from a series of dead-ends and iterative improvement. When I say octopus, what I mean is that TorkBot has a centralized “brain” directing many semi-autonomous appendages, each with their own brains, reporting back to the central dispatcher.

Static lanes are the long-lived appendages. Curator is one. Plugins can contribute others, like the Google Workspace lane. Lane templates are different. A template is a capability that can be instantiated for a bounded purpose. A sandbox snapshot is different again: it is not a collaborator at all, just a saved filesystem starting point for a future sandbox-backed lane.

Interaction vs capability

Several competing pressures are at play that pushed me into this architecture.

Responsiveness to surface interactions — The agent requires a design in which its turns are more or less bounded in complexity and can avoid I/O entirely. This allows the agent to interact quickly even when tasks or work may take quite some time. Capability — The agent shouldn’t be limited in what it can accomplish just to keep turns efficient. It needs mechanisms to pursue complex tasks through delegation and be able to observe and steer those tasks close to real-time. Continuity — The agent should maintain a continuous perspective and personality. The best continuity comes from a single LLM conversation that is continually curated. In this way, the personality and short-term memory don’t need to be “added in”; instead they’re a side effect of the architecture.

These pressures pushed me into a design with multiple “lanes”, as you can see in the diagram above. The “foreground” lane is the LLM conversation users interact with through surface activity. But here, I have made a bet that is likely controversial: all activity across all surfaces goes through the same foreground conversation. Threads, channels, and even platforms are all collapsed. Right now, that cognitive complexity is perhaps beyond the ability of most models and perhaps even beyond the frontier. But I’m certain that will not be the case for long.

All activity across all surfaces goes through the same foreground conversation.

Input multiplexing That does not mean one model turn per event. Surface messages, system reminders and lane messages accumulate as pending input. They are injected when the target lane can accept a user message: idle, or after a tool batch has flushed. This is what decouples interactivity from activity volume. Ten things can happen and still become one coherent turn at the right boundary. The catch is that the foreground model has to understand recency, priority and interruption.

Part of my thesis with TorkBot is to bet on emergent behaviour and emergent intelligence. Coming up with systems that split LLM conversations across arbitrary platform-defined boundaries is antithetical to the continuity goal. I want my agent to make links across threads and even across surfaces. I want the agent to be able to trivially continue work started in Slack and continued on GitHub. If we’re not there yet in model intelligence, I bet we will soon be and the agentic system designed for that world will stand above the competition in terms of intuitiveness and power.

How the octopus works

... continue reading