The Linux kernel lets you tune how memory allocation behaves through the vm.overcommit_memory sysctl. When overcommit is enabled (sadly, this is the default), the kernel will typically return a mapping when brk(2) or mmap(2) is called to increase a program’s heap size, regardless of whether memory is actually available to back it. Sounds good, right?
Not really. While overcommit is convenient for application developers, it fundamentally changes the contract of memory allocation: a successful allocation no longer represents an atomic acquisition of a real resource. Instead, the returned mapping serves as a deferred promise, which will only be fulfilled by the page fault handler if and when the memory is first accessed. This is an important distinction, as it means overcommit effectively replaces a fail-fast transactional allocation model with a best-effort one where failures are only caught after the fact rather than at the point of allocation.
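To make the policy concrete before digging into the mechanics: the sysctl takes three values, 0 (heuristic overcommit, the default), 1 (always overcommit), and 2 (strict accounting, i.e. overcommit disabled). Here is a minimal C sketch that reads the current setting straight from procfs; the same value is visible with sysctl vm.overcommit_memory.

```c
/* Minimal sketch: read the current overcommit policy from procfs.
 *   0 - heuristic overcommit (the default)
 *   1 - always overcommit, never check
 *   2 - strict accounting, overcommit disabled
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }

    int mode;
    if (fscanf(f, "%d", &mode) != 1) {
        fprintf(stderr, "unexpected contents\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    printf("vm.overcommit_memory = %d\n", mode);
    return 0;
}
```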
To understand how this deferral works in practice, let’s consider what happens when a program calls malloc(3) to get a new memory allocation. At a high level, the allocator calls brk(2) or mmap(2) to request additional virtual address space from the kernel, which is represented by virtual memory area objects, also known as VMAs.
On a system where overcommit is disabled, the kernel ensures that enough backing memory is available to satisfy the request before allowing the allocation to succeed. In contrast, when overcommit is enabled, the kernel simply allocates a VMA object without guaranteeing that backing memory is available: the mapping succeeds immediately, even though it is not known whether the request can ultimately be satisfied.
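A short sketch makes the contrast visible. The 8 GiB figure below is an arbitrary choice: pick something comfortably larger than the memory that is actually free on your machine (the default heuristic only refuses mappings that are obviously too large, roughly bigger than RAM plus swap).

```c
/* Sketch of the deferred-promise behavior. Under the default policy the
 * mapping below typically succeeds even when nowhere near that much memory
 * is free: the kernel only creates a VMA. Backing pages are allocated
 * lazily, on the first write to each page, which is where the OOM killer
 * can strike. Under vm.overcommit_memory=2, a request beyond the commit
 * limit fails right here with ENOMEM instead.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = (size_t)8 << 30; /* 8 GiB; arbitrary, adjust for your machine */

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        /* With strict accounting, this is where the failure is reported. */
        fprintf(stderr, "mmap: %s\n", strerror(errno));
        return 1;
    }

    printf("mapped %zu bytes; no backing memory has been committed yet\n", len);

    /* Writing through the mapping is what actually consumes memory. On an
     * overcommitted system this loop is likely to end in a SIGKILL from the
     * OOM killer rather than any error return:
     *
     *     for (size_t i = 0; i < len; i += 4096) p[i] = 1;
     */

    munmap(p, len);
    return 0;
}
```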
The decoupling of success from backing memory availability makes allocation failures impossible to handle correctly. Programs have no option but to assume the allocation succeeded before the kernel has actually determined whether the request can be fulfilled. Disabling overcommit solves this problem by restoring admission control at allocation time, ensuring that allocations either fail immediately or succeed with a guarantee of backing memory.
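Switching the policy is a one-line change: sysctl -w vm.overcommit_memory=2, or the equivalent entry in /etc/sysctl.conf. As a sketch, the same thing can be done programmatically by writing to procfs; under strict accounting the commit limit defaults to swap plus 50% of RAM, tunable via vm.overcommit_ratio or vm.overcommit_kbytes.

```c
/* Sketch: switch to strict accounting (vm.overcommit_memory=2) by writing
 * to procfs. Equivalent to `sysctl -w vm.overcommit_memory=2`; requires
 * root, and a sysctl.conf entry is needed to make it persistent. Once set,
 * allocations beyond the commit limit fail with ENOMEM at brk(2)/mmap(2)
 * time instead of triggering the OOM killer later.
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
    if (f == NULL) {
        perror("fopen (root required)");
        return 1;
    }

    if (fputs("2\n", f) == EOF) {
        perror("fputs");
        fclose(f);
        return 1;
    }

    if (fclose(f) == EOF) { /* procfs may report the write error at flush time */
        perror("fclose");
        return 1;
    }

    return 0;
}
```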
Failure locality is important for debugging
When allocations fail fast, they are dramatically easier to debug, as the failure is synchronous with the request. When a program crashes due to an allocation failure, the entire context of that allocation is preserved: the requested size, the subsystem making the allocation, and the underlying operation that required it are already known.
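As an illustration (the wrapper and names here are hypothetical, not taken from any particular codebase), a fail-fast allocation path can carry all of that context in the failure report itself:

```c
/* Sketch of why fail-fast allocation preserves debugging context: a thin
 * wrapper that reports the size and call site the moment an allocation
 * fails. With overcommit disabled, this is where the failure surfaces;
 * with overcommit enabled, malloc() rarely returns NULL and the process
 * is instead killed later, far away from this code.
 */
#include <stdio.h>
#include <stdlib.h>

static void *xmalloc(size_t size, const char *file, int line, const char *what)
{
    void *p = malloc(size);
    if (p == NULL) {
        /* Everything needed to debug the failure is right here. */
        fprintf(stderr, "%s:%d: allocation of %zu bytes for %s failed\n",
                file, line, size, what);
        abort();
    }
    return p;
}

#define XMALLOC(size, what) xmalloc((size), __FILE__, __LINE__, (what))

int main(void)
{
    /* Hypothetical caller: the subsystem and operation are named at the
     * allocation site, so a failure report carries them automatically. */
    char *buf = XMALLOC(1 << 20, "request parsing buffer");
    free(buf);
    return 0;
}
```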
With overcommit, this locality is lost by design. Allocations appear to succeed and the program proceeds under the assumption that the memory is available. When the memory is eventually touched and the kernel cannot find pages to back it, the kernel typically responds by invoking the OOM killer and terminating the process outright. From the program’s perspective, there is no allocation failure to handle, only a SIGKILL. From the operator’s perspective, there is no stack trace pointing to the failure. There are only post-mortem logs, which often fail to paint a clear picture of what happened.
Would you rather debug a crash at the allocation site or reconstruct an outage caused by an asynchronous OOM kill? Overcommit doesn’t make allocation failure recoverable. It makes it unreportable.
Dishonorable mention: Redis