Garbage collection for Rust: The finalizer frontier

Abstract Rust is a non-Garbage Collected (GCed) language, but the lack of GC makes expressing data-structures that require shared ownership awkward, inefficient, or both. In this paper we explore a new design for, and implementation of, GC in Rust, called Alloy. Unlike previous approaches to GC in Rust, Alloy allows existing Rust destructors to be automatically used as GC finalizers: this makes Alloy integrate better with existing Rust code than previous solutions but introduces surprising soundness and performance problems. Alloy provides novel solutions for the core problems: finalizer safety analysis rejects unsound destructors from automatically being reused as finalizers; finalizer elision optimises away unnecessary finalizers; and premature finalizer prevention ensures that finalizers are only run when it is provably safe to do so.

1 Introduction

Amongst the ways one can classify programming languages are whether they are Garbage Collected (GCed) or not: GCed languages enable implicit memory management; non-GCed languages require explicit memory management (e.g C 's malloc / free functions). Rust's use of affine types [25, p. 5] and ownership does not fit within this classification: it is not GCed but it has implicit scope-based memory management. Most portions of Rust programs are as succinct as a GCed equivalent, but ownership is too inflexible to express shared ownership for data-structures that require multiple owners (e.g. doubly linked lists). Workarounds such as reference counting impose an extra burden on the programmer, make mistakes more likely, and often come with a performance penalty.

In an attempt to avoid such problems, there are now a number of GCs for Rust (e.g. [2, 11, 14, 32, 33]). Most introduce a user-visible type Gc which takes a value v of type T and moves v to the 'GC heap'. The Gc value itself is a wrapper around a pointer to v on the GC heap. Gc can be cloned (i.e. duplicated) and dereferenced to a value of type &T (i.e. a type-safe pointer) at will by the user. When no Gc wrappers pointing to v can be found, indirectly or directly, from the program's roots (e.g. variables on the stack), then the GC heap memory for v can be reclaimed.

It has proven hard to find a satisfying design and implementation for a GC for Rust, as perhaps suggested by the number of attempts to do so. We identify two fundamental challenges for GC for Rust: how to give Gc an idiomatic and complete API; and how to make finalizers (i.e. the code that is run just before a value is collected by the GC) safe, performant, and ergonomic.

struct GcNode { value: u8, nbr: Option>>} impl Drop for GcNode { fn drop(&mut self) { println!("drop {}", self.value); } } fn main() { let mut gc1 = Gc::new(RefCell::new(GcNode { value: 1, nbr: None } )); gc1.borrow_mut().nbr = Some(gc1); let gc2 = Gc::new(RefCell::new(GcNode { value: 2, nbr: None } )); gc2.borrow_mut().nbr = Some(gc2); gc1 = gc2; force_gc(); println!("{} {}", gc1.borrow().value, gc1.borrow().nbr.unwrap().borrow().value); }

In this paper we introduce Alloy, a new GC for Rust: an example of its use is shown in Listing 1. Alloy uses conservative garbage collection (i.e. treating each reachable machine word as a potential pointer), which naturally solves the API challenge. However, the finalization challenge is much more involved: the causes of this challenge, and our solutions to it, occupy the bulk of this paper.

Normal Rust code uses destructors (i.e. code which is run just before a value is reclaimed by Rust's implicit memory management) extensively. Although finalizers and destructors may seem to be synonyms, existing GCs for Rust cannot reuse destructors as finalizers: the latter must be manually implemented for each type that needs it. Unfortunately, even this is trickier than it appears: it is not possible to implement a finalizer for Gc if T is an external library; some parts of destructors are automatically created by the Rust compiler, but hand-written finalizers must duplicate those parts manually; and users can accidentally cause a type's finalizer to be run more than once. In short, finalization in existing GCs for Rust is unpalatable.

GCs for Rust are not alone in requiring manually written finalizers. In a close cousin to our work, a GC proposal for C++, the reuse of destructors as finalizers was ruled out due to seemingly insoluble problems [8, p. 32], which we divide into four categories: (1) some safe destructors are not safe finalizers; (2) finalizers can be run prematurely; (3) running finalizers on the same thread as a paused mutator can cause race conditions and deadlocks; (4) and finalizers are prohibitively slower than destructors. All are, at least to some degree, classical GC problems; all are exacerbated in some way by Rust; and none, with the partial exception of #2, has existing solutions.

We show that it is possible to reuse most Rust destructors as finalizers in a satisfying way. We introduce novel solutions to the long-standing problems this implies by making use of some of Rust's unusual static guarantees. We thus gain a better GC for Rust and solutions to open GC problems. Our solutions, in order, are: (1) finalizer safety analysis extends Rust's static analyses to reject programs whose destructors are not provably safe to be used as finalizers; (2) premature finalizer prevention automatically inserts fences to prevent the GC from being 'tricked' into collecting values before they are dead; (3) we run finalizers on a separate thread; and (4) and finalizer elision statically optimises away finalizers if the underlying destructor duplicates the GC's work.

... continue reading