Durable Execution the Hard Way

Why This Matters

This article walks through building a durable execution engine from scratch. Such engines provide reliability and fault tolerance for long-running, stateful applications like AI agents and workflow systems, and building one yourself shows how that resilience is actually achieved.

Inspired by Kelsey Hightower's Kubernetes the Hard Way, we're going to build a durable execution engine from scratch using Go and Postgres.

Durable execution is a mechanism to incrementally checkpoint the state of a function as it makes progress, so that in the case of unexpected failure, the function can recover from where it left off. It's particularly relevant in newer stacks and projects implementing AI agents, which are long-running and stateful. A system which implements durable execution is often called a "workflow engine."
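
To make the checkpointing idea concrete, here is a minimal sketch in Go. It is not the engine this guide builds: the names `Checkpoints`, `RunStep`, and `Workflow` are hypothetical, and an in-memory map stands in for the Postgres table a real engine would write to. Each step records its result once it completes, so a retry after a failure skips the steps that already finished.

```go
package main

import (
	"errors"
	"fmt"
)

// Checkpoints is a hypothetical in-memory stand-in for a Postgres table
// that maps a step name to that step's saved result.
type Checkpoints map[string]string

// RunStep returns the saved result if the step already completed.
// Otherwise it runs fn and records the result before returning it.
func RunStep(cp Checkpoints, name string, fn func() (string, error)) (string, error) {
	if saved, ok := cp[name]; ok {
		return saved, nil
	}
	out, err := fn()
	if err != nil {
		return "", err // nothing is recorded, so a retry runs this step again
	}
	cp[name] = out
	return out, nil
}

// Workflow runs two steps in order. calls counts how often each step body
// actually executed; failSecond simulates a crash in the second step.
func Workflow(cp Checkpoints, calls map[string]int, failSecond bool) (string, error) {
	fetched, err := RunStep(cp, "fetch", func() (string, error) {
		calls["fetch"]++
		return "data", nil
	})
	if err != nil {
		return "", err
	}
	return RunStep(cp, "process", func() (string, error) {
		calls["process"]++
		if failSecond {
			return "", errors.New("simulated crash")
		}
		return fetched + "-processed", nil
	})
}

func main() {
	cp := Checkpoints{}
	calls := map[string]int{}

	// First attempt: "fetch" completes and is checkpointed, "process" fails.
	_, err := Workflow(cp, calls, true)
	fmt.Println("first attempt:", err)

	// Second attempt: "fetch" is served from its checkpoint, "process" reruns.
	out, err := Workflow(cp, calls, false)
	fmt.Println("second attempt:", out, err)
	fmt.Println("fetch executed", calls["fetch"], "time(s)")
}
```

In this sketch the workflow function is simply called again after the failure. The body of the first step runs only once across both attempts, which is the "recover from where it left off" behavior described above. A real engine would also need to persist checkpoints durably and decide when to retry.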

This guide uses Go, with SQL queries templated via sqlc. The only dependencies are:

Go 1.25+

Postgres (by default, created via Docker)

pgx

If you are interested in contributing support for other languages, please create a GitHub issue. I'll be sharing updates (new lessons, other languages) for this guide on Twitter if you'd like to follow along.

Target audience

You will benefit from this guide if you:

... continue reading