Modernizing swapping: virtual swap spaces

This article brought to you by LWN subscribers Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

The problem with swap entries

As a reminder, a "swap entry" identifies a slot on a swap device that can be used to hold a page of data. It is a 64-bit value split into two fields: the device index (called the "type" within the code), and an offset within the device. When an anonymous page is pushed out to a swap device, the associated swap entry is stored into all page-table entries referring to that page. Using that entry, the kernel can quickly locate a swapped-out page when that page needs to be faulted back into RAM.

The "swap table" is, in truth, a set of tables, one for each swap device in the system. The transition to swap tables has simplified the kernel considerably, but the current design of swap entries and swap tables ties swapped-out pages firmly to a specific device. That creates some pain for system administrators and designers.

As a simple example, consider the removal of a swap device. Clearly, before the device can be removed, all pages of data stored on that device must be faulted back into RAM; there is no getting around that. But there is the additional problem of the page-table entries pointing to a swap slot that no longer exists once the device is gone. To resolve that problem, the kernel must, at removal time, scan through all of the anonymous page-table entries in the system and update them to the page's new location. That is not a fast process.

This design also, as Pham describes, creates trouble for users of the zswap subsystem. Zswap works by intercepting pages during the swap-out process and, rather than writing them to disk, compresses them and stores the result back into memory. It is well integrated with the rest of the swapping subsystem, and can be an effective way of extending memory capacity on a system. When the in-memory space fills, zswap is able to push pages out to the backing device.

The problem is that the kernel must be able to swap those pages back in quickly, regardless of whether they are still in zswap or have been pushed to slower storage. For this reason, zswap hides behind the index of the backing device; the same swap entry is used whether the page is in RAM or on the backing device. For this trick to work, though, the slot in the backing device must be allocated at the beginning, when a page is first put into zswap. So every zswap usage must include space on a backing device, even if the intent is to never actually store pages on disk. That leads to a lot of wasted storage space and makes zswap difficult or impossible to use on systems where that space is not available to waste.

Virtual swap spaces

The solution that Pham proposes, as is so often the case in this field, is to add another layer of indirection. That means the replacement of the per-device swap tables with a single swap table that is independent of the underlying device. When a page is added to the swap cache, an entry from this table is allocated for it; the swap-entry type is now just a single integer offset. The table itself is an array of swp_desc structures:

... continue reading