Programming Extensible Data Types in Rust with CGP - Part 2: Modular Interpreters and Extensible Visitors
Posted on 2025-07-09
Authored by Soares Chen
Discuss on Reddit, GitHub or Discord.
This is the second part of the blog series on Programming Extensible Data Types in Rust with CGP. You can read the first part here.
As a recap, we have covered the new release of CGP v0.4.2 which now supports the use of extensible records and variants, allowing developers to write code that operates on any struct containing specific fields or any enum containing specific variants, without needing their concrete definition.
In the first part of the series, Modular App Construction and Extensible Builders, we demonstrated an example use of the extensible builder pattern, which uses extensible records to support modular construction of an application context.
In this second part of the series, we will explore the use of extensible variants, by examining how it can be used in an extensible visitor pattern to build a modular interpreter for a toy math expression language.
Part 1: Modular App Construction and Extensible Builders – In this introductory part, we present a high-level overview of the key features enabled by extensible data types. We then dive into a hands-on demonstration showing how extensible records can be used to build and compose modular builders for real-world applications.
Part 2: Modular Interpreters and Extensible Visitors (this post) – This part continues the demonstration by introducing extensible variants. We use them to address the expression problem, implementing a set of reusable interpreter components for a small toy language.
Part 3: Implementing Extensible Records – Here, we walk through the internal mechanics behind extensible records. We show how CGP supports the modular builder pattern demonstrated in Part 1 through its underlying type and trait machinery.
Part 4: Implementing Extensible Variants – This part mirrors Part 3, but for extensible variants. We examine how extensible variants are implemented, and compare the differences and similarities between extensible records and variants.
Earlier, we explored how CGP uses the extensible builder pattern to enable modular construction of context structs. In this article, we will see how a similar approach can be applied to context enums, allowing each variant to be destructured and handled by a flexible, composable set of handlers.
In Rust and many object-oriented languages, this pattern is commonly referred to as the visitor pattern. However, Rust’s powerful enum and match features often reduce the need for the visitor pattern, especially when the concrete enum type is known. In such cases, developers can simply use match expressions to handle each variant explicitly and concisely.
Despite this, the visitor pattern remains useful in situations where the concrete enum type is unknown or abstracted away. This is especially true in libraries like serde and syn , where visitors are used to traverse abstract syntax trees or serialization payloads without tying the implementation to a specific format or structure. For instance, in serde , deserialization is driven by a visitor provided by the target type, which walks through structures like JSON or TOML without coupling the deserializer to any specific data format.
While the visitor pattern is useful, it suffers from a major drawback: it is inherently closed for extension. All possible variants and visitable types must be declared upfront in the visitor interface, and it is challenging to add or remove variants later without breaking existing implementations.
For example, consider the Visitor trait in serde , which defines methods for visiting a fixed set of primitive types — up to 128-bit integers. If a developer wants to deserialize a type that contains a U256 value, there is no way to extend the Visitor trait to support native 256-bit integers. Likewise, if someone builds a new serialization format that introduces support for such a type, it cannot cleanly integrate with serde because the trait cannot be expanded.
To work around this, serde includes a broad set of 26 visitor methods in its core Visitor trait to accommodate a wide range of cases. However, this introduces the opposite problem: when a serialization format does not support a specific visitor method, the only option is to return a runtime error. There is no way to signal at compile time that a type is incompatible with the format, even if it formally implements Serialize or Deserialize .
This mismatch becomes especially noticeable when using compact formats like postcard or bincode , which support only a small subset of types compared to JSON. These libraries accept any type implementing Deserialize , but compatibility is only verified at runtime — leaving users to discover format mismatches through runtime errors instead of compile-time errors.
In short, the traditional visitor pattern tends to be either too restrictive (by enforcing a closed set of operations) or too permissive (by relying on runtime errors to reject unsupported operations). What’s needed is a more flexible, composable alternative — one that allows both sides (visitor and visitee) to express precise requirements at compile time.
This is exactly the problem that the extensible visitor pattern in CGP aims to solve. It enables open-ended, modular visitors that preserve type safety and extensibility, without the pitfalls of runtime errors or rigid interfaces.
While it’s theoretically possible to replace serde ’s visitor pattern with CGP’s extensible alternative, doing so would require significant refactoring and is outside the scope of this post. Instead, we’ll explore a simpler but well-known challenge that illustrates the same limitations: the expression problem.
Suppose we want to implement an interpreter for a toy arithmetic language in Rust. This language might support basic math expressions like 1 + (2 * 3) . A typical way to represent such a language is with an enum like this:
pub enum MathExpr { Literal( u64 ), Plus(Box, Box), Times(Box, Box), }
Here, MathExpr represents arithmetic expressions. The Plus and Times variants contain boxed sub-expressions, and Literal holds an integer value. The use of Box is necessary due to Rust’s size constraints for recursive data structures.
To evaluate these expressions, we can implement a straightforward eval function:
pub fn eval(expr: MathExpr) -> u64 { match expr { MathExpr::Literal(value) => value, MathExpr::Plus(a, b) => eval(*a) + eval(*b), MathExpr::Times(a, b) => eval(*a) * eval(*b), } }
This works well for small examples. But real-world interpreters quickly grow in complexity. Each evaluation case might span dozens — or hundreds — of lines of code. Additionally, the enum itself might have many more variants. For example, syn::Expr , a real-world expression type for Rust, defines over 40 variants.
Let’s assume our toy MathExpr is similarly complex. Now imagine that alongside eval , we also want to define other operations, like pretty-printing:
pub fn expr_to_string(expr: & MathExpr) -> String { match expr { MathExpr::Literal(value) => value.to_string(), MathExpr::Plus(a, b) => format!( "( {} + {} )" , expr_to_string(a), expr_to_string(b)), MathExpr::Times(a, b) => format!( "( {} * {} )" , expr_to_string(a), expr_to_string(b)), } }
Here lies the crux of the expression problem: as the language evolves, we frequently need to add new expression variants or remove old ones. But any modification to the MathExpr enum forces us to update all pattern-matching functions like eval , expr_to_string , and others. The enum becomes tightly coupled to every function that consumes it.
Worse, this coupling is not easy to break. The recursive nature of MathExpr — where variants like Plus contain other MathExpr values — means even modular helper functions (e.g., eval_plus ) must still operate on MathExpr , perpetuating the tight dependency.
This isn’t a problem unique to interpreters. Many recursive data structures — like JSON Value types — suffer from similar issues. A JSON object may contain maps of nested Value s, making any function over the type deeply tied to its structure.
Because of this, extending or experimenting with the enum often becomes burdensome. If the type is part of an upstream crate, users may need to submit a pull request just to add a variant. And if the maintainer declines, downstream users may be forced to fork the crate just to gain the flexibility they need.
In the next section, we’ll explore how CGP's extensible visitor pattern addresses this problem — by decoupling the implementation of each variant from the concrete enum definition.
To demonstrate how CGP enables extensible and decoupled evaluation logic, we will now walk through how to implement a small part of the eval function — specifically, the logic for handling the Plus operator. Rather than tying ourselves to a fixed MathExpr enum, we begin by defining Plus as an independent struct:
pub struct Plus { pub left: Box, pub right: Box, }
In this definition, Plus is no longer a variant of a hardcoded enum. Instead, it is a generic data structure that takes an Expr type parameter. This parameter represents the broader expression type and allows Plus to be reused in many different expression trees. The left and right operands are wrapped in Box to support recursive structures while still satisfying Rust’s size requirements later on.
To actually evaluate such a sub-expression, CGP introduces the concept of a Computer — a CGP component designed for pure computation. It is defined as follows:
#[cgp_component(Computer)] pub trait CanCompute { type Output ; fn compute( & self, _code: PhantomData, input: Input) -> Self:: Output; }
This trait behaves similarly to the Handler trait introduced earlier, but with one key distinction: compute is a synchronous function and does not return a Result . It is called a computer because it embodies a pure, deterministic computation that transforms input into output.
The Computer trait serves as the foundation for extensible evaluation. It abstracts the idea of computation away from any specific expression type or evaluation strategy. Using this abstraction, we can implement evaluation logic for each sub-expression in isolation. For example, here is how we define a provider for evaluating the Plus struct:
#[cgp_new_provider] impl Computer> for EvalAdd where Context: CanCompute, Output: Add