Fil-C: A memory-safe C implementation

Fil-C: A memory-safe C implementation [LWN subscriber-only content]

Welcome to LWN.net The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net!

Fil-C is a memory-safe implementation of C and C++ that aims to let C code — complete with pointer arithmetic, unions, and other features that are often cited as a problem for memory-safe languages — run safely, unmodified. Its dedication to being " fanatically compatible " makes it an attractive choice for retrofitting memory-safety into existing applications. Despite the project's relative youth and single active contributor, Fil-C is capable of compiling an entire memory-safe Linux user space (based on Linux From Scratch), albeit with some modifications to the more complex programs. It also features memory-safe signal handling and a concurrent garbage collector.

Fil-C is a fork of Clang; it's available under an Apache v2.0 license with LLVM exceptions for the runtime. Changes from the upstream compiler are occasionally merged in, with Fil-C currently being based on version 20.1.8 from July 2025. The project is a personal passion of Filip Pizlo, who has previously worked on the runtimes of a number of managed languages, including Java and JavaScript. When he first began the project, he was not sure that it was even possible. The initial implementation was prohibitively slow to run, since it needed to insert a lot of different safety checks. This has given Fil-C reputation for slowness. Since the initial implementation proved viable, however, Pizlo has managed to optimize a number of common cases, making Fil-C-generated code only a few times slower than Clang-generated code, although the exact slowdown depends heavily on the structure of the benchmarked program.

Reliable benchmarking is notoriously finicky, but in order to get some rough feel for whether that level of performance impact would be problematic, I compiled Bash version 5.2.32 with Fil-C and tried using it as my shell. Bash is nearly a best case for Fil-C, because it spends more time running external programs than running its own code, but I still expected the performance difference to be noticeable. It wasn't. So, at least for some programs, the performance overhead of Fil-C does not seem to be a problem in practice.

In order to support its various run-time safety checks, Fil-C does use a different internal ABI than Clang does. As a result, objects compiled with Fil-C won't link correctly against objects generated by other compilers. Since Fil-C is a full implementation of C and C++ at the source-code level, however, in practice this just requires everything to be recompiled with Fil-C. Inter-language linking, such as with Rust, is not currently supported by the project.

Capabilities

The major challenge of rendering C memory-safe is, of course, pointer handling. This is especially complicated by the fact that, as the long road to CHERI-compatibility has shown, many programs expect a pointer to be 32 or 64 bits, depending on the architecture. Fil-C has tried several different ways to represent pointers since the project's beginning in 2023. Fil-C's first pointers were 256 bits, not thread-safe, and didn't protect against use-after-free bugs. The current implementation, called "InvisiCaps", allows for pointers that appear to match the natural pointer size of the architecture (although this requires storing some auxiliary information elsewhere), with full support for concurrency and catching use-after-free bugs, at the expense of some run-time overhead.

Fil-C's documentation compares InvisiCaps to a software implementation of CHERI: pointers are separated into a trusted "capability" piece and an untrusted "address" piece. Since Fil-C controls how the program is compiled, it can ensure that the program doesn't have direct access to the capabilities of any pointers, and therefore the runtime can rely on them being uncorrupted. The tricky part of the implementation comes from how these two pieces of information are stored in what looks to the program like 64 bits.

When Fil-C allocates an object on the heap, it adds two metadata words before the start of the allocated object: an upper bound, used to check accesses to the object based on its size, and an "aux word" that is used to store additional pointer metadata. When the program first writes a pointer value into an object, the runtime allocates a new auxiliary allocation of the same size as the object being written into, and puts an actual hardware-level pointer (i.e., one without an attached capability) to the new allocation into the aux word of the object. This auxiliary allocation, which is invisible to the program being compiled, is used to store the associated capability information for the pointer being stored (and is also reused for any additional pointers stored into the object later). The address value is stored into the object as normal, so any C bit-twiddling techniques that require looking at the stored value of the pointer work as expected.

... continue reading