
Taming P99s in OpenFGA: How we built a self-tuning strategy planner


Operating a latency-critical system means the inevitable work of reducing tail latency: the response time experienced by the slowest requests (the outliers), rather than the average. Since authorization happens on every request, these decisions must be fast; otherwise, they add overhead directly to the total response time. For OpenFGA, an open-source authorization system modeled after Google's Zanzibar that powers Auth0 FGA, this challenge manifests in its most critical operation: Check. Answering "Can user X access resource Y?" requires traversing relationship graphs. In this context, traversal performance isn't just a feature; it is the fundamental constraint of the system's architecture.

In our quest to reduce latency for the Check API, we initially developed multiple graph traversal strategies tailored to specific data distributions. Our early iterations selected these strategies statically based on the graph node’s complexity, lacking the context to determine whether a specific strategy would actually outperform the default traversal algorithm for a given dataset.

We needed a way to consistently select the optimal path based on performance data, not just static rules. This led to the development of a dynamic, self-tuning planner that learns from production latency in real-time. Because every node in a customer’s graph possesses unique complexity—varying by type of operations, operation count, data cardinality, and subgraph distribution—the planner treats each node independently, applying the most effective strategy for that specific point in the traversal.
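The per-node independence described above can be sketched as learning state keyed by each point in the traversal. The key shape `(object_type, relation)` and the strategy names are assumptions for illustration, not OpenFGA's actual internals:

```python
from collections import defaultdict

# Illustrative: keep independent latency observations per graph node,
# keyed here by (object_type, relation). Key shape is an assumption.
node_stats = defaultdict(lambda: {"default": [], "fastpath": []})

def record(object_type: str, relation: str, strategy: str, latency_ms: float) -> None:
    """Feed an observed latency back into this node's own statistics."""
    node_stats[(object_type, relation)][strategy].append(latency_ms)

# Two different nodes accumulate feedback without influencing each other.
record("document", "viewer", "fastpath", 4.2)
record("folder", "owner", "default", 18.0)
print(len(node_stats))  # prints 2
```

Keying state per node is what lets the planner pick different winners for, say, `document#viewer` and `folder#owner`, even when both are eligible for the same strategies.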

This post details the algorithmic framework chosen for the self-tuning planner and the methodology used to calibrate the probabilistic distributions for each traversal strategy. We will examine how this architecture creates an extensible feedback loop, allowing us to continuously inject new, pre-tuned strategies into the decision engine (the planner), further improving the system's performance. By decoupling strategy definition from selection logic, we ensure the planner evolves in lockstep with our expanding library of optimization heuristics.

The Problem

At the heart of OpenFGA is a graph of relationship tuples. This graph is defined as a Weighted Directed Cyclic Graph that represents the OpenFGA model. Each edge and node has a defined weight (complexity factor) for each reachable user type in the subgraph. Each element in the graph also stores information about the characteristics of its subgraph, such as recursiveness, cycle presence, and public-access reachability, which allows us to select the appropriate traversal algorithm for each case in O(1) time.
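A minimal sketch of how precomputed metadata makes candidate selection O(1). The field names, thresholds, and strategy names below are hypothetical; the point is that the flags live on the node itself, so no subgraph traversal is needed to build the candidate set:

```python
from dataclasses import dataclass

# Hypothetical metadata precomputed per node of the model graph.
# Field names are illustrative, not OpenFGA's actual internals.
@dataclass(frozen=True)
class SubgraphInfo:
    weight: int                 # complexity factor for a reachable user type
    recursive: bool             # subgraph contains a recursive relation
    has_cycle: bool             # a cycle is present in the subgraph
    publicly_accessible: bool   # public (wildcard) access is reachable

def eligible_strategies(info: SubgraphInfo) -> list:
    """Filter candidate traversal algorithms using only the stored flags.

    Because the flags are precomputed, this is a constant-time check:
    the subgraph is never walked to decide which strategies apply.
    """
    candidates = ["default"]
    if not info.has_cycle and info.weight <= 2:
        candidates.append("direct_lookup")
    if info.recursive:
        candidates.append("recursive_fastpath")
    return candidates

info = SubgraphInfo(weight=1, recursive=False,
                    has_cycle=False, publicly_accessible=False)
print(eligible_strategies(info))  # prints ['default', 'direct_lookup']
```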

Using this weighted graph, we were able to introduce new algorithms for traversing the subgraph and began seeing latency improvements for some customers. However, this wasn't applicable to everyone, and sometimes performance shifted as data distribution evolved. The reality is that the optimal strategy for traversing a subgraph is not just dependent on static graph characteristics and complexity, but is also highly dependent on the subgraph tuple distribution. This required building a traversal planner, a heuristic engine that selects the optimal algorithm for each subgraph based on both factors.

Our Approach: Why We Chose Thompson Sampling

Our goal was to create an adaptive heuristic enabling the engine to dynamically modify traversal selection during execution based on observed performance. We initially considered implementing the planner using counters and thresholds. While a step in the right direction, this model felt brittle and necessitated the constant manual tuning of "magic numbers." A key insight emerged during our design discussions: this wasn't a simple routing problem, but a classic reinforcement learning scenario known as the Multi-Armed Bandit problem.

In this problem, a gambler faces multiple slot machines and must decide which machine to play, and how often, to maximize their cumulative reward. We can think of our traversal algorithms as the slot machines in the bandit problem, and our planner acts as the agent selecting the strategy for each subgraph request to maximize the payout (or, in our context, minimize latency). Crucially, the planner does not evaluate every algorithm indiscriminately; the selection pool is first restricted to only those strategies compatible with the specific subgraph characteristics and request context. This architecture forces us to balance two competing fundamental needs:

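The bandit loop described above can be sketched as follows. This is a minimal Thompson-style sampler assuming Gaussian posteriors over per-strategy latency; the strategy names, the optimistic prior, and the 10 ms uncertainty scale are illustrative assumptions, not OpenFGA's actual implementation:

```python
import math
import random

random.seed(0)  # deterministic run for the demo below

class Planner:
    """Thompson-sampling sketch: sample a plausible latency from each
    arm's posterior and pick the minimum (reward = negative latency)."""

    def __init__(self, strategies):
        # Per-strategy running statistics of observed latency (ms).
        self.stats = {s: {"n": 0, "mean": 0.0} for s in strategies}

    def choose(self, compatible):
        """Select among the strategies compatible with this subgraph."""
        def sample(s):
            st = self.stats[s]
            if st["n"] == 0:
                # Optimistic prior so untested arms get explored first.
                return random.uniform(0.0, 1.0)
            # Uncertainty shrinks as an arm accumulates observations.
            return random.gauss(st["mean"], 10.0 / math.sqrt(st["n"]))
        return min(compatible, key=sample)

    def observe(self, strategy, latency_ms):
        """Feed a measured latency back into the running mean."""
        st = self.stats[strategy]
        st["n"] += 1
        st["mean"] += (latency_ms - st["mean"]) / st["n"]

planner = Planner(["default", "direct_lookup"])
for _ in range(1000):
    pick = planner.choose(["default", "direct_lookup"])
    # Simulated environment: direct_lookup is genuinely faster here.
    true_mean = 5.0 if pick == "direct_lookup" else 20.0
    planner.observe(pick, random.gauss(true_mean, 2.0))

# After enough feedback, the faster strategy dominates the traffic,
# while the slower arm still receives occasional exploratory picks.
print(planner.stats["direct_lookup"]["n"] > planner.stats["default"]["n"])
```

Note how exploration falls out of the sampling itself: a rarely tried strategy keeps a wide posterior, so it occasionally samples a low latency and gets re-tested, without any hand-tuned thresholds or "magic numbers".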