Modernizing Linux swapping: introducing the swap table

Modernizing swapping: introducing the swap table [LWN subscriber-only content]

Welcome to LWN.net The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net!

The kernel's swap subsystem is a complex and often unloved beast. It is also a critical component in the memory-management subsystem and has a significant impact on the performance of the system as a whole. At the 2025 Linux Storage, Filesystem, Memory-Management and BPF Summit, Kairui Song outlined a plan to simplify and optimize the kernel's swap code. A first installment of that work , written with help from Chris Li, was merged for the 6.18 release. This article will catch up with the 6.18 work, setting the stage for a future look at the changes that are yet to be merged.

In a virtual-memory system, memory shortages must be addressed by reclaiming RAM and, if necessary, writing its contents to the appropriate persistent backing store. For file-backed memory, the file itself is that backing store. Anonymous memory — the memory that holds the variables and data structures used by a process — lacks that natural backing store, though. That is where the swap subsystem comes in: it provides a place to write anonymous pages when the memory they occupy is needed for other uses. Swapping allows unused (or seldom-used) pages to be pushed out to slower storage, making the system's RAM available for data that is currently in use.

A quick swap-subsystem primer

A full description of the kernel's swap subsystem would be lengthy indeed; there is a lot of complexity, much of which has built up over time. What follows is a partial, simplified overview of how the swap subsystem looked in the 6.17 kernel, which can then be used as a base for understanding the subsequent changes.

The swap subsystem uses one or more swap files, which can be either partitions on a storage device or ordinary files within a filesystem. Inside the kernel, active swap files are described by struct swap_info_struct , but are usually referred to using a simple integer index instead. Each file is divided into page-sized slots; any given slot in the kernel's swap areas can be identified using the swp_entry_t type:

typedef struct { unsigned long val; } swp_entry_t;

This long value is divided into two fields: the upper six bits are the index number of the swap file (which, for extra clarity, is called the "type" in the swap code), and the rest is the slot number within the file. There is a set of simple functions used to create swap entries and get the relevant information back out.

Note that the above describes the architecture-independent form of the swap entry; each architecture will also have an architecture-dependent version that is used in page-table entries. Curious readers can look at the x86_64 macros that convert between the two formats. Within the swap subsystem itself, though, the architecture-independent version of the swap entry is used.

... continue reading