Unix GC Remastered

Introduction

The AF_UNIX garbage collector is an interesting piece of the kernel. It exists because sockets can be passed between processes with SCM_RIGHTS, and a socket sent this way can become unreachable from user space while the kernel still holds a reference to it. Without intervention such a socket would never be freed, so the garbage collector steps in to reclaim it. Not long ago the subsystem was rewritten from scratch on top of a graph model based on strongly connected components (SCCs), but it is still bug-prone. This post walks through the rewrite end to end and discusses a use-after-free bug.
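To make the "unreachable but still alive" situation concrete, here is a minimal user-space sketch of one way to produce it. It is not taken from the original post: the helper names send_fd() and make_unreachable_socket() are hypothetical, and the sketch assumes a Linux system where socketpair() and SCM_RIGHTS are available. It sends one end of a socket pair into its own receive queue and then closes both descriptors.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send fd_to_send over the AF_UNIX socket `via`
 * as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on failure. */
static int send_fd(int via, int fd_to_send)
{
    char data = 'x';    /* one byte of ordinary payload to carry the control message */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {             /* union guarantees cmsghdr alignment, as in cmsg(3) */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: leave a socket alive in the kernel with no
 * user-space handle. Returns 0 if the fd was queued successfully. */
static int make_unreachable_socket(void)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* A message sent on sv[1] is queued on its peer sv[0], so the file
     * behind sv[0] ends up inside sv[0]'s own receive queue. */
    rc = send_fd(sv[1], sv[0]);

    /* After both closes, the queued message holds the only reference
     * to sv[0]'s file, and nobody can ever receive that message. */
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling make_unreachable_socket() from a small main() is enough to leave one such socket behind; reclaiming it is then the garbage collector's job.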

AF_UNIX Garbage Collector — Background

A per-subsystem garbage collector is responsible for reclaiming kernel objects that can no longer be reached through user-space handles. For AF_UNIX, the entry point is unix_gc():

    static DECLARE_WORK(unix_gc_work, __unix_gc);

    void unix_gc(void)
    {
        WRITE_ONCE(gc_in_progress, true);
        queue_work(system_dfl_wq, &unix_gc_work);
    }

Its real body is __unix_gc():

    static void __unix_gc(struct work_struct *work)
    {
        struct sk_buff_head hitlist;
        struct sk_buff *skb;

        spin_lock(&unix_gc_lock);

        if (!unix_graph_maybe_cyclic) {
            spin_unlock(&unix_gc_lock);
            goto skip_gc;
        }

        __skb_queue_head_init(&hitlist);

        if (unix_graph_grouped)
            unix_walk_scc_fast(&hitlist);
        else
            unix_walk_scc(&hitlist);

        spin_unlock(&unix_gc_lock);

        skb_queue_walk(&hitlist, skb) {
            if (UNIXCB(skb).fp)
                UNIXCB(skb).fp->dead = true;
        }

        __skb_queue_purge_reason(&hitlist, SKB_DROP_REASON_SOCKET_CLOSE);
    skip_gc:
        WRITE_ONCE(gc_in_progress, false);
    }

The unix_sock structure

    struct unix_sock {
        /* WARNING: sk has to be the first member */
        struct sock          sk;        /* inheritance */
        struct unix_address *addr;      /* bound name */
        struct path          path;      /* filesystem path if bound */
        struct mutex         iolock, bindlock;
        struct sock         *peer;      /* connected peer */
        struct list_head     link;
        atomic_long_t        inflight;  /* [1] SCM_RIGHTS fd count */
        /* ... */
        struct sk_buff      *oob_skb;
    };

The critical field for the GC is inflight ([1]). A socket is “in flight” when its struct file * is riding as SCM_RIGHTS payload: sent by process A but not yet received by process B. Each time the socket is sent, inflight is incremented; each time it is received, inflight is decremented. The GC looks for sockets for which file_count == inflight: the only remaining references are the ones trapped in receive queues, so no user-space handle can ever reach them again.
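The counting rule can be illustrated with a small user-space model. This is only a sketch of the bookkeeping described above, not kernel code: struct toy_sock and the toy_* helpers are hypothetical names, and the model assumes that sending takes one extra file reference while receiving hands that reference over to the receiver's new descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters the GC compares. */
struct toy_sock {
    long file_count;  /* every reference to the socket's struct file */
    long inflight;    /* references held by queued SCM_RIGHTS messages */
};

/* A freshly created socket has one reference: the creator's fd. */
static void toy_open(struct toy_sock *s)
{
    s->file_count = 1;
    s->inflight = 0;
}

/* Sending the socket: the queued message takes a file reference. */
static void toy_send(struct toy_sock *s)
{
    s->file_count++;
    s->inflight++;
}

/* Receiving it: the message's reference becomes the receiver's fd,
 * so file_count is unchanged and only inflight drops. */
static void toy_recv(struct toy_sock *s)
{
    assert(s->inflight > 0);
    s->inflight--;
}

/* Closing one user-space descriptor drops one file reference. */
static void toy_close(struct toy_sock *s)
{
    assert(s->file_count > s->inflight);
    s->file_count--;
}

/* The check from the text: all remaining references are in flight. */
static bool toy_is_gc_candidate(const struct toy_sock *s)
{
    return s->file_count > 0 && s->file_count == s->inflight;
}
```

In this model a socket that is opened, sent, and then closed by its sender ends with file_count == inflight == 1 and is flagged, whereas one that is received before the sender closes ends with inflight == 0 and is not. Passing this check only makes a socket a candidate; deciding whether it is truly unreachable still depends on who holds the queues it sits in, which is where the graph analysis comes in.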