Aug 14, 2025 Tags: devblog, programming, rust, zizmor
I recently solved an interesting problem inside zizmor with a type of state machine/automaton I hadn’t used before: a finite state transducer (FST).
This is just a quick write-up of the problem and how I solved it. It doesn't go particularly deep into the data structures themselves; for more background on FSTs, I strongly recommend burntsushi's article on transducers (which is what actually led me to his fst crate).
TL;DR: I used the fst crate to build a finite state transducer that maps GitHub Actions context patterns to their logical "capability" in the context of a potential template injection vulnerability. This ended up being an order of magnitude smaller in terms of representation (~14.5 KB instead of ~240 KB) and faster and more memory-efficient than my naïve initial approaches (tables and prefix trie walks). It also enabled me to fully precompute the FST at compile time, eliminating the startup cost of priming a trie- or table-based map.
Background
zizmor is a static analysis tool for GitHub Actions.
One of the categories of weaknesses it can find is template injections, wherein the CI author uses a GitHub Actions expression in a shell or similar context without realizing that the expression can escape any shell-level quoting intended to “defuse” it.
Here’s an example, derived from a pattern that gets exploited over and over again:
```yaml
- name: "Print the current ref"
  run: |
    echo "The current ref is: ${{ github.ref }}"
```
If this step is part of a workflow that grants elevated privileges to third parties (like pull_request_target), an attacker can contrive a git ref that escapes the shell quoting and runs arbitrary code.
For example, the following ref:
```
innocent";cat${IFS}/etc/passwd;true${IFS}"
```
…would expand as:
```
echo "The current ref is: innocent" ; cat /etc/passwd ; true ""
```
Fortunately, zizmor detects these:
```
zizmor hackme.yml
error[template-injection]: code injection via template expansion
  --> hackme.yml:15:41
   |
14 |         run: |
   |         ^^^ this run block
15 |           echo "The current ref is: ${{ github.ref }}"
   |                                     ^^^^^^^^^^ may expand into attacker-controllable code
   |
   = note: audit confidence → High
   = note: this finding has an auto-fix
```
The problem
There’s a very simple way to detect these vulnerabilities: we could walk every code “sink” in a given workflow (e.g. run: blocks, action inputs that are known to contain code, &c.) and look for the fences of an expression (${{ ... }}). If we see those fences, we know that the contents are a potential injection risk.
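In code, that naive check could be as simple as a regex scan over each sink. Here's a minimal sketch (purely illustrative; not zizmor's actual implementation):

```rust
use regex::Regex;

/// A naive fence scan: does this sink contain an actions expression?
/// Illustrative only; real detection needs to track spans, not just presence.
fn has_expression(sink: &str) -> bool {
    // matches ${{ ... }} fences non-greedily
    Regex::new(r"\$\{\{.+?\}\}").unwrap().is_match(sink)
}
```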
This is appealing for reasons of simplicity, but is unacceptably noisy:
- There are many actions expressions that are trivially safe, or non-trivial but deductively safe:
  - Literals, e.g. ${{ true }} or ${{ 'lol' }};
  - Any expression that can only expand to a literal:

    ```
    # only ever expands to 'main' or 'not-main'
    # despite using the github.ref context
    ${{ github.ref == 'main' && 'main' || 'not-main' }}
    ```

  - Any expression that can't expand to meaningful code, e.g. due to the expression's type:

    ```
    # only ever expands to a number
    ${{ github.run_number }}

    # only ever expands to `true` or `false`
    ${{ github.ref == 'main' }}
    ```

- There are many expressions that can appear unsafe by virtue of dataflow or context expansion, but are actually safe because of the context's underlying type or constraints:
  - ${{ github.event.pull_request.merged }} is populated by GitHub's backend and can only expand to true or false, but requires us to know a priori that it's a "safe" context;
  - ${{ github.actor }} is an arbitrary string, but is limited in structure to characters that make a useful injection infeasible (no semicolons, $, &c.).
zizmor generally aims to present low-noise findings, so filtering these out by default is paramount.
The first group is pretty easy: we can do a small amount of dataflow analysis to determine whether an expression’s evaluation is “tainted” by arbitrary controllable inputs.
The second group is harder, because it requires us to know additional facts about arbitrary-looking contexts. The two main facts we care about are type (whether a context expands to a string, a number, or something else) and capability (whether the expansion is fully arbitrary, or constrained in some manner that might make it safe or at least less risky). In practice these both collapse down to capability, since we can categorize certain types (e.g. booleans and numbers) as inherently safe.
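To make that concrete, here's a minimal sketch of the capability classification as a Rust enum. The variant names match the lookup code later in this post; the doc comments are my own gloss, and zizmor's actual definition may differ:

```rust
/// How much attacker control a context's expansion permits (a sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    /// Fully attacker-controllable expansion, e.g. an issue title.
    Arbitrary,
    /// Constrained in structure (e.g. a URL): risky, but harder to abuse.
    Structured,
    /// Effectively inert: booleans, numbers, or a closed set of strings.
    Fixed,
}
```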
Fact finding
So, what we want is a way to collect facts about every valid GitHub Actions context.
The trick to this lies in remembering that, under the hood, GitHub Actions is driven by GitHub’s webhooks API: most of the context state loaded into a GitHub Actions workflow run is derived from the webhook payload corresponding to the event that triggered the workflow.
So, how do we get a list of all valid contexts along with information about their expansion? GitHub doesn’t provide this directly, but we can derive it from their OpenAPI specification for the webhooks API.
This comes in the form of a ~4.5MB OpenAPI schema, which is a pain in the ass to work with directly: it's heavily self-referential (by necessity, since an "unrolled" version with inline schemas for each property would be infeasibly large), heavily telescoped (also by necessity, since GitHub's API responses themselves are not particularly flat), and makes ample use of OpenAPI constructions like oneOf, anyOf, and allOf that require careful additional handling.
At the bottom of all of this, however, is our reward: detailed information about every property provided by each webhook event, including the property’s type and valuable information about how the property is constrained.
For example, here’s the schema for a pull_request.state property:
```json
"state": {
  "description": "State of this Pull Request. Either `open` or `closed`.",
  "enum": ["open", "closed"],
  "type": "string",
  "examples": ["open"]
}
```
This tells us that pull_request.state is a string, but that its value is constrained to either open or closed. We categorize this as having a "fixed" capability, since we know that the attacker can't control the structure of the value itself in a meaningful way.
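To illustrate the kind of bucketing this enables, here's a rough Rust sketch of how a property's schema might map to a capability. The specific rules here are my own assumptions for illustration (as described next, the real extraction happens in a Python helper script):

```rust
use serde_json::Value;

/// A sketch of bucketing a webhook property's JSON schema into a
/// capability. Illustrative rules only; not zizmor's actual logic.
fn classify(schema: &Value) -> &'static str {
    match schema.get("type").and_then(Value::as_str) {
        // non-string types can't expand into meaningful shell code
        Some("boolean" | "integer" | "number" | "null") => "fixed",
        Some("string") => {
            if schema.get("enum").is_some() {
                // a closed set of values, like `open` / `closed` above
                "fixed"
            } else if matches!(
                schema.get("format").and_then(Value::as_str),
                Some("uri" | "uri-template" | "date-time")
            ) {
                // constrained in structure, but not a fixed set
                "structured"
            } else {
                "arbitrary"
            }
        }
        // anything else (objects, arrays, missing types): be conservative
        _ => "arbitrary",
    }
}
```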
Long story short: this is implemented as a helper script within zizmor called webhooks-to-contexts.py. This script is run periodically in GitHub Actions and walks the OpenAPI schema to produce a CSV, context-capabilities.csv, that looks like this:
| context | capability |
| --- | --- |
| github.event.pull_request.active_lock_reason | arbitrary |
| github.event.pull_request.additions | fixed |
| github.event.pull_request.allow_auto_merge | fixed |
| github.event.pull_request.allow_update_branch | fixed |
| github.event.pull_request.assignee | fixed |
| github.event.pull_request.assignee.avatar_url | structured |
| github.event.pull_request.assignee.deleted | fixed |
| github.event.pull_request.assignee.email | arbitrary |
| github.event.pull_request.assignee.events_url | arbitrary |
| github.event.pull_request.assignee.followers_url | structured |
| github.event.pull_request.assignee.following_url | arbitrary |
| github.event.pull_request.assignee.gists_url | arbitrary |
| github.event.pull_request.assignee.gravatar_id | arbitrary |
| github.event.pull_request.assignee.html_url | structured |
…to the tune of about 4000 unique contexts.
The bigger problem
So, we have a few thousand contexts, each with a capability that tells us how much of a risk that context poses in terms of template injection. We can just shove these into a map and call it a day, right?
Wrong. We’ve glossed over a significant wrinkle, which is that context accesses in GitHub Actions are not themselves always literal. Instead, they can be patterns that can expand to multiple values.
A good example of this is github.event.pull_request.labels: labels is an array of objects, each of which has a name property corresponding to the label's actual name. To access these, we can use syntaxes that select individual labels or all labels:
```
# access the first label's name
github.event.pull_request.labels[0].name

# access all labels' names
github.event.pull_request.labels.*.name
```
In both cases, we want to apply the same capability to the context's expansion. To make things even more ~~complicated~~ exciting, GitHub's own context access syntax is surprisingly malleable: each of the following is a valid and equivalent way to access the first label's name:
```
github.event.pull_request.labels[0].name
github.EVENT.PULL_REQUEST.LABELS[0].NAME
github['event']['pull_request']['labels'][0]['name']
github['event'].pull_request['labels'][0].name
github.event.pULl_ReQUEst['LaBEls'][0].nAmE
```
…and so forth.
In sum, we have two properties that blow a hole in our “just shove it in a map” approach:
1. Contexts are patterned, and can't be expanded into a static finite enumeration of simplified contexts. For example, we can't know how many labels a repository has, so we can't statically unfold the github.event.pull_request.labels.*.name context into N contexts that match everything the user might write in a workflow.
2. Contexts can be expressed through multiple syntaxes. We can keep things simple on the capability-extraction side by only using the jq-ish syntax, but we still need to normalize contexts as they appear in workflows (see the sketch after this list). This is not very difficult, but it makes a single map lookup even less appealing.
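For the second point, here's a rough sketch of what that normalization might look like. The rules here, including collapsing numeric indices into *, are assumptions on my part rather than zizmor's actual implementation:

```rust
/// A sketch of context normalization: lowercases segments and rewrites
/// bracket accesses into dotted form. Collapsing numeric indices into `*`
/// (so that labels[0].name matches a stored labels.*.name pattern) is an
/// assumption, not necessarily what zizmor does.
fn normalize_context(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        match c {
            '[' => {
                // collect the bracketed segment, dropping quotes
                let mut segment = String::new();
                for c in chars.by_ref() {
                    if c == ']' {
                        break;
                    }
                    if c != '\'' && c != '"' {
                        segment.push(c);
                    }
                }
                out.push('.');
                if segment.chars().all(|c| c.is_ascii_digit()) {
                    out.push('*');
                } else {
                    out.push_str(&segment.to_ascii_lowercase());
                }
            }
            _ => out.push(c.to_ascii_lowercase()),
        }
    }
    out
}

// github['EVENT'].pull_request['labels'][0].name
//   -> github.event.pull_request.labels.*.name
```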
The solution
To recap:
- We have a few thousand contexts, each of which is really a pattern that can match one or more "concrete" context usages as they appear in a user's workflow.
- Each of these context patterns has an associated capability, one of fixed | structured | arbitrary, indicating how much of a risk the context poses in terms of template injection.
- Our goal is to efficiently match these patterns against the contexts as they appear in a user's workflow, and return the associated capability.
To me, this originally smacked of a prefix/radix trie problem: there are a large number of common prefixes/segments in the pattern set, meaning that the trie could be made relatively compact. However, tries are optimized for operations that the problem doesn't require:
Tries are optimized to grow and shrink at runtime, but we don’t need that: the set of context patterns is static, and we’d ideally trade off some compile-time cost for runtime size and speed.
Tries can perform efficient prefix and exact matches, but (typically) at the cost of a larger runtime memory footprint.
Finally, on a more practical level: I couldn’t find a great trie/radix trie crate to use. Some of this might have been a discovery failure on my part, but I couldn’t find one that was already widely used and still actively maintained. radix_tree came the closest, but hasn’t been updated in nearly 5 years.
While reading about other efficient prefix representation structures, I came across DAFSAs (also sometimes called DAWGs). These offer a significantly more compact representation of prefixes than a trie, but at a cost: unlike a trie, a DAFSA cannot contain auxiliary data. This makes them great for inclusion checking (including of prefixes), but not so great for my purpose of storing each pattern’s associated capability.
That brought me to transducers as a class of finite state machines: unlike acceptors (DAFSAs, but also normal DFAs and NFAs) that map from an input sequence to a boolean accept/reject state, transducers map an input sequence to an output sequence. That output sequence can then be composed (e.g. via summation) into an output value. In effect, the “path” an input takes through the transducer yields an output value.
In this way, FSTs can behave a lot like a map (whether backed by a prefix trie or a hash table), but with some appealing additional properties:
A finite state transducer can compress its entire input, unlike a prefix trie (which only compresses the prefixes). That means a more compact representation.
A finite state transducer can share duplicated output values across inputs, unlike a prefix trie or hash table (which would store the same output value multiple times). This also means a more compact representation, and is particularly appealing in our case as we only have a small set of possible capabilities.
These desirable properties come with downsides too: optimal FST construction requires memory proportional to the total input size, and requires ordered insertion of each input. Modifications to an FST are also limited: optimal insertions must be ordered, while deletions or changes to an associated value require a full rebuild of the FST. In practice, this means that FST construction is a static affair over a preprocessed input set. But that's perfectly fine for my use case!
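As a quick illustration of that constraint, here's a minimal in-memory example with the fst crate (not zizmor code): keys must be inserted in lexicographic order, and identical output values are shared across keys:

```rust
use fst::{Map, MapBuilder};

fn main() -> Result<(), fst::Error> {
    let mut builder = MapBuilder::memory();

    // insertion must happen in lexicographic key order...
    builder.insert("bar", 1)?;
    builder.insert("baz", 2)?;
    builder.insert("foo", 1)?; // ...and duplicate outputs are shared
    // inserting "abc" here would fail: it's out of order

    let map = Map::new(builder.into_inner()?)?;
    assert_eq!(map.get("baz"), Some(2));
    assert_eq!(map.get("quux"), None);
    Ok(())
}
```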
Putting it together
As it turns out, using the fst crate to construct an FST at build time is pretty simple. Here's the totality of the code that I put in build.rs to transform context-capabilities.csv:
```rust
fn do_context_capabilities() {
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let source = Path::new(&manifest_dir).join("data/context-capabilities.csv");

    println!(
        "cargo::rerun-if-changed={source}",
        source = source.display()
    );

    let out_dir = env::var("OUT_DIR").unwrap();
    let out_path = Path::new(&out_dir).join("context-capabilities.fst");
    let out = io::BufWriter::new(File::create(out_path).unwrap());

    let mut build = fst::MapBuilder::new(out).unwrap();

    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_path(source)
        .unwrap();

    for record in rdr.records() {
        let record = record.unwrap();
        let context = record.get(0).unwrap();
        let capability = match record.get(1).unwrap() {
            "arbitrary" => 0,
            "structured" => 1,
            "fixed" => 2,
            _ => panic!("Unknown capability"),
        };

        build.insert(context, capability).unwrap();
    }

    build.finish().unwrap();
}
```
…and then, loading and querying it:
```rust
static CONTEXT_CAPABILITIES_FST: LazyLock<Map<&[u8]>> = LazyLock::new(|| {
    fst::Map::new(
        include_bytes!(concat!(env!("OUT_DIR"), "/context-capabilities.fst")).as_slice(),
    )
    .expect("couldn't initialize context capabilities FST")
});

impl Capability {
    fn from_context(context: &str) -> Option<Self> {
        match CONTEXT_CAPABILITIES_FST.get(context) {
            Some(0) => Some(Capability::Arbitrary),
            Some(1) => Some(Capability::Structured),
            Some(2) => Some(Capability::Fixed),
            Some(_) => unreachable!("unexpected context capability"),
            _ => None,
        }
    }
}
```
(where the input context has been normalized, e.g. from github["event"]["pull_request"]["title"] to github.event.pull_request.title ).
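Putting the pieces together, a lookup ends up looking something like this (hypothetical usage, assuming the "fixed" capability of pull_request.state from the CSV excerpt above):

```rust
// hypothetical usage; assumes pull_request.state's "fixed" capability
assert_eq!(
    Capability::from_context("github.event.pull_request.state"),
    Some(Capability::Fixed)
);
assert_eq!(Capability::from_context("github.not_a_real_context"), None);
```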
Concluding thoughts
Everything above was included in zizmor 1.9.0, as part of a large-scale refactor of the template-injection audit.
Was this overkill? Probably: I only have ~4000 valid context patterns, which would have comfortably fit into a hash table.
However, using an FST for this makes the footprint ludicrously and satisfyingly small: each pattern takes less than 4 bytes to represent in the serialized FST, well below the roughly linear memory footprint of loading the equivalent data from context-capabilities.csv .
Using an FST also unlocked future optimization ideas that I haven't bothered to experiment with yet: