Bringing restartable sequences out of the niche

Please consider subscribing to LWN Subscriptions are the lifeblood of LWN.net. If you appreciate this content and would like to see more of it, your subscription will help to ensure that LWN continues to thrive. Please visit this page to join up and keep LWN on the net.

The restartable sequences feature, which was added to the 4.18 kernel in 2018, exists to enable better performance in certain types of threaded applications. While there are users for restartable sequences, they tend to be relatively specialized code; this is not a tool that most application developers reach for. Over time, though, the use of restartable sequences has grown, and it looks to grow further as the feature is tied to new capabilities provided by the kernel. As restartable sequences become less of a niche feature, though, some problems have turned up; fixing one of them may involve an ABI change visible in user space.

A restartable sequences overview

As the number of CPUs in a system grows, so does the desire to write and run highly parallel programs. In user space, as well as in the kernel, concurrent code using locks eventually runs into scalability problems, leading to an interest in the use of lockless algorithms instead. The kernel has a distinct advantage when it comes to lockless concurrent access, though, in that code running in kernel space can prevent interruptions from interrupts, preemption, or migration to another CPU during critical sections; user space has no such guarantees. So any user-space lockless algorithm must work correctly in an environment where code execution can be interrupted at any time.

Restartable sequences are one solution to this problem. To use this feature, an application must designate a critical section that does some work, culminating in a single atomic instruction that commits whatever change is being made. One of the earliest uses of restartable sequences was a memory allocator that would, within its critical section, identify an object to allocate from a per-CPU list, then remove it from that list with an atomic compare-and-exchange operation.

It would obviously be problematic if this critical section were to be interrupted between the identification of the object to allocate and the final step committing the change. To work around this problem, the application informs the kernel of the critical section by filling a structure ( struct rseq_cs ) describing the address range of the instructions within that section, along with a special abort address. If the kernel interrupts the execution of the relevant thread, to preempt it, move it to another CPU, or to deliver a signal, it will, on return to user space check for the existence of this structure. If the current instruction pointer lies within the critical section, the kernel will cause execution to jump to the abort address. The thread can then clean up after its aborted attempt and restart the operation.

The hope, of course, is that most runs through the critical section will complete without interruption, accomplishing their job in about the fastest way possible. The extra overhead of dealing with aborts only enters the picture in cases where the critical section is interrupted for some reason.

Using restartable sequences requires setting up a shared memory area using the rseq() system call. The actual critical section almost certainly must be written in assembly. Until relatively recently, there was no support for restartable sequences in the GNU C Library (glibc), though that has been changing. Given the effort that is required to use this feature properly, and given that it is a Linux-only feature, it is not surprising that relatively few application developers have made use of it.

Not so niche anymore

... continue reading