This is a work-in-progress draft describing lightweight userland cooperative threads (fibers) for SBCL. The implementation is under active development and details may change. This is a living document; you can view its revision history. You can try it out on the fibers-v2 branch at github.com/atgreen/sbcl.
Table of Contents
- Introduction
  - Motivation: Why Fibers?
  - Design Goals
  - Terminology
- Programming API
  - Creating and Running Fibers
  - Yielding and Waiting
  - Fiber Sleep and Timed Waits
  - Fiber Parking (Condition-Based Suspend/Resume)
  - Fiber Join
  - Spawning Fibers from Fibers
  - Multi-Carrier Scheduling
    - run-fibers (Simple, Blocking)
    - start-fibers / submit-fiber / finish-fibers (Dynamic, Long-Lived)
  - Fiber-Aware Blocking Primitives
    - Mutex Acquisition
    - Condition Variables
    - Semaphores
    - I/O (wait-until-fd-usable)
    - sleep and wait-for
  - Fiber Pinning (Preventing Yield)
  - Introspection (list-all-fibers, fiber-state, Backtraces)
  - Idle Hooks
  - API Reference Summary
- Architecture Overview
  - Carrier Threads and Schedulers
  - Fiber Lifecycle State Machine
  - Data Flow: Yield and Resume
- Context Switching
  - Register Save/Restore Convention
  - The fiber_switch Assembly Routine
  - Stack Frame Initialization for New Fibers
  - The Entry Trampoline (Assembly to C to Lisp)
  - Zero-Allocation Design (No SAPs, No GC Pressure)
  - Thread Register Patching on Carrier Migration
- Stack Management
  - Control Stack Layout and Guard Pages
  - Binding Stack (Separate Allocation)
  - Stack Pooling (madvise(MADV_DONTNEED) Recycling)
  - Stack Overflow Detection in Signal Handlers
  - Stack Size Tradeoffs (No Dynamic Growth)
- Dynamic Variable Bindings (TLS)
  - The Problem: unbind_to_here Zeroes Entries
  - TLS Overlay Arrays (Save/Restore Without Unbinding)
  - Carrier Value Update on Migration
  - Catch Block and Unwind-Protect Chain Save/Restore
  - Empty Binding Stack Fast Path
  - Same-Carrier Resume Optimization
- Garbage Collector Integration
  - The Two-List Design: Suspended Fibers vs. Active Contexts
  - fiber_gc_info: Conservative Control Stack Scanning
  - active_fiber_context: Carrier + Fiber Stack Visibility
  - Precise Binding Stack Scavenging
  - Persistent Carrier Context (Lock-Free Hot Path)
  - The fiber-context Thread Struct Slot (O(1) Lookup)
  - GC Safety Windows (without-interrupts vs. without-gcing)
  - Correctness Argument: Why Partial Updates Are Safe
- Scheduler Design
  - The Scheduler Loop
  - Fast Path: Skip Maintenance When Deque Is Hot
  - Maintenance: Pending Queue Drain, Deadline Expiry, Wake Checks
  - Post-Switch Dispatch (Suspended, Dead)
  - Idle Detection and Carrier Parking
  - Maintenance Frequency Backstop (Every 64 Fibers)
- Work Stealing
  - Chase-Lev Lock-Free Deque
  - Owner Operations (Push/Pop from Bottom, LIFO)
  - Thief Operations (Steal from Top, FIFO, CAS)
  - Buffer Growth (Power-of-Two Circular Array)
  - Random Victim Selection
  - Fiber Migration and Thread Register Fixup
- I/O Multiplexing
  - Platform Abstraction (epoll, kqueue, poll Fallback)
  - Edge-Triggered Mode with One-Shot (EPOLLET | EPOLLONESHOT)
  - Indexed I/O Waiters (fd-to-Fiber Table)
  - Bounded Epoll Drain Loop
  - fd-ready-p and wait-until-fd-usable Integration
  - The Default I/O Idle Hook
  - Batched FD Polling vs. Per-Fiber Polling
- Deadline Scheduling
  - Binary Min-Heap with Inline Index
  - O(log N) Insert and Remove
  - Batch Expiry (Pop All Expired in One Pass)
  - Interaction with I/O Waiters (Dual-Indexed Fibers)
- Fiber Death and Cleanup
  - The Lisp Trampoline (fiber-trampoline)
  - Error Handling and Result Capture
  - Binding Stack Cleanup on Death (Without unbind_to_here)
  - GC Info Unregistration Timing
  - Stack Return to Pool
- Integration with SBCL
  - Thread Struct Extension (fiber-context Slot)
  - serve-event Dispatch (Fiber-Aware wait-until-fd-usable)
  - sleep and wait-for Dispatch
  - Mutex and Condition Variable Dispatch
  - Pinned Blocking Fallback to OS Primitives
  - *pinned-blocking-action* Warning/Error Policy
  - interrupt-thread
  - Debugger Integration
- Performance
  - HTTP Benchmark
  - Memory Efficiency
  - Scalability Under High Connection Counts
  - Context Switch Microbenchmark
  - GC Impact
- Platform Support
  - x86-64 (Linux, macOS, Windows)
  - ARM64
  - ARM32
  - PPC64
  - PPC32
  - RISC-V
  - Feature Flag: :sb-fiber
  - Platform-Specific Assembly and I/O Backends
- Appendix A: Using Hunchentoot with Fibers
  - The Fiber Taskmaster
  - Taskmaster Methods
  - Starting the Server
  - How It Works
  - SSL Considerations
Many server workloads are concurrent but not parallel. A web server handling 10,000 connections spends almost all of its time waiting for network I/O; the actual computation per request is trivial. The natural programming model is one thread of control per connection — read a request, compute a response, write it back — but OS threads are too expensive to use this way at scale.
Each OS thread in SBCL carries a full-sized control stack (typically 8 MB), a binding stack, signal handling infrastructure, and a kernel task_struct. Creating a thread requires mmap, clone, and TLS setup; destroying one requires the reverse. Context switching between threads requires a kernel transition, TLB management, and scheduler bookkeeping. At 10,000 concurrent connections, this means 80 GB of virtual address space for stacks alone, and the kernel scheduler — designed for dozens to hundreds of runnable tasks — begins to degrade.
The conventional alternative is event-driven programming: a single thread multiplexes connections using epoll or kqueue, dispatching callbacks on I/O readiness. This scales well but inverts control flow. Sequential logic must be broken into state machines or continuation chains. Error handling, resource cleanup, and debugging all become harder. Backtraces show the event loop, not the logical call stack of the request being served.
Fibers offer a third option. A fiber is a user-space thread with its own control stack and binding stack, scheduled cooperatively by a library-level scheduler rather than the kernel. Fibers preserve the sequential programming model — the code reads top-to-bottom like a normal thread — while achieving the resource efficiency of event-driven I/O. A fiber’s control stack is 256 KB by default (not 8 MB), context switching saves and restores six registers in user space (not a full kernel transition), and thousands of fibers can share a small pool of OS carrier threads.
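To make the model concrete, here is a hedged sketch of what a many-connection server might look like. Only the operator names (start-fibers, submit-fiber, finish-fibers) come from this document's outline; the sb-fiber package prefix, the :carriers keyword argument, and the helpers accept-connection and handle-connection are illustrative assumptions, not a confirmed API.

```lisp
;; Illustrative sketch only. The SB-FIBER package name, the :CARRIERS
;; keyword, and ACCEPT-CONNECTION / HANDLE-CONNECTION are hypothetical;
;; only the operator names are taken from this draft's outline.
(sb-fiber:start-fibers :carriers 4)       ; small pool of OS carrier threads

(loop for connection = (accept-connection listener)
      do (sb-fiber:submit-fiber
          (lambda ()
            ;; One fiber per connection: plain sequential code on a
            ;; ~256 KB control stack instead of an 8 MB thread stack.
            (handle-connection connection))))

(sb-fiber:finish-fibers)                  ; drain outstanding fibers
```

The point of the sketch is the shape of the code: each connection is written as straight-line logic, while the scheduler multiplexes thousands of such fibers over a handful of carrier threads.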
Common Lisp complicates this picture because the language carries more implicit per-thread state than most: special variable bindings live in a separate binding stack, non-local exits thread through catch tags and unwind-protect chains, and all of these must be saved and restored correctly across suspension points. A fiber implementation that only switches the control stack will corrupt this state. SBCL’s fiber implementation handles all of it: dynamic bindings made within a fiber are local to that fiber, unwind-protect cleanup forms run on fiber death, and handler-case / handler-bind work as expected.
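The binding and unwind behavior described above can be sketched as follows. The run-fibers entry point is named in the outline; the sb-fiber package prefix and the helpers process-request and close-quietly are illustrative assumptions.

```lisp
(defvar *request-id* nil)   ; special variable with a global binding

;; Sketch only: SB-FIBER prefix and helper functions are assumed.
(sb-fiber:run-fibers
 (lambda ()
   (let ((*request-id* 17))        ; dynamic binding local to this fiber
     (unwind-protect
          (process-request)        ; may suspend and resume many times;
                                   ; *REQUEST-ID* remains 17 here even if
                                   ; other fibers bind it concurrently
       (close-quietly)))))         ; cleanup runs even if the fiber dies
```

The binding of *request-id* lives on the fiber's own binding stack, so it must travel with the fiber across suspensions and carrier migrations; the unwind-protect cleanup form must run when the fiber exits, normally or not.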