
Dissecting the CPU-memory relationship in garbage collection (OpenJDK 26)


Every time you configure a Java application's heap size, you are essentially making a trade-off: spending more on infrastructure to improve performance metrics such as throughput or latency. Historically, this trade-off was visible when an undersized heap triggered long pauses, signaling a need for more resources. With modern collectors, however, pause duration and computational effort have become decoupled. This creates an operational blind spot: dashboards may show excellent response times, while the collector silently consumes excess compute capacity to compensate for a constrained heap. To address this, it is essential to look beyond GC pauses and examine overall efficiency using new tools. This article, therefore, analyzes why we need additional metrics for infrastructure efficiency and introduces the new Java API for GC CPU in OpenJDK 26, which empowers engineers and researchers to quantify the collector's CPU overhead and make informed memory-CPU trade-off decisions.
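The operational blind spot described above is visible in the metrics Java already exposes today. The standard `GarbageCollectorMXBean` reports accumulated pause time and collection counts, but with a concurrent collector most of the CPU work happens off the critical path and never appears in these numbers. The sketch below (my own illustration, not part of the new OpenJDK 26 API) shows what a dashboard based on today's API can and cannot see:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseStats {
    public static void main(String[] args) {
        long totalPauseMs = 0;
        long totalCollections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is accumulated *pause* time, not CPU time:
            // work done by concurrent GC threads is invisible here, which is
            // exactly the blind spot the article describes.
            totalPauseMs += Math.max(0, gc.getCollectionTime());
            totalCollections += Math.max(0, gc.getCollectionCount());
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("Total: %d collections, %d ms pause time%n",
                totalCollections, totalPauseMs);
    }
}
```

A heap that is too small can therefore look healthy through this lens: pauses stay short while concurrent GC threads quietly burn CPU to keep up.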


1. Background

Since the popularization of garbage collection (GC) in Lisp almost 70 years ago, managed runtimes have provided developers with a kind of magic: automatic memory management, freeing programmers from complex object lifecycle management. This idea, along with many others, influenced the design of Smalltalk. Following this lineage, Smalltalk was in turn one of several languages that inspired the authors of Java, the language and runtime I spend my days improving.

While the programmer was liberated, the CPU was not. The GC now sat on the critical path to reclaim memory, accruing a debt that could not be deferred forever. For decades, settling this debt meant pausing the application entirely, or "stopping the world" in GC parlance: the collector would halt the application and scan the heap to identify and reclaim reusable memory. In the single-core era, the pause time therefore served as a reliable proxy for machine load.

1.1. The GC Cost Taxonomy

To reason about the performance implications of GC, we need to decompose it into three dimensions as depicted in Figure 1.

Figure 1: The GC cost taxonomy. (1) Explicit cost: work performed by dedicated GC threads alongside the application. (2) Implicit cost: a simple field store in source code (n.next = newNode) actually executes with injected barriers, e.g. a pre-barrier that enqueues the old referent while marking is active and a post-barrier that updates the card table. (3) Microarchitectural effects: GC scans evict hot application data from the L3 cache, causing cache misses when the application resumes.

Explicit GC cost: The CPU cycles consumed by dedicated GC threads performing tasks such as traversing the object graph to find live data, relocating memory to free space, or updating references.

Implicit GC cost: Code may be injected directly into the application to support specific GC capabilities. These injected snippets are often referred to as barriers and are required for features such as reference counting, tracking object age (generations), or ensuring heap consistency when objects move concurrently.

Microarchitectural effects: GC also impacts the memory subsystem. It can degrade performance by evicting application data from CPU caches or, alternatively, enhance it by rearranging objects to improve spatial locality.
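To make the implicit cost concrete, the following is a hypothetical simulation (plain Java, not JVM-internal code) of the barrier pattern shown in Figure 1: the application thinks it performs one field store, but the compiled code also runs a pre-barrier and a post-barrier. The collector state (marking flag, mark queue, card-update counter) is invented here purely for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BarrierSketch {
    static class Node { Node next; }

    // Simulated collector state; a real JVM keeps this in VM-internal structures.
    static boolean marking = false;
    static final Deque<Node> markQueue = new ArrayDeque<>();
    static long cardUpdates = 0;

    // The application writes one field, but the emitted code also runs barriers.
    static void update(Node n, Node newNode) {
        if (marking && n.next != null) {
            markQueue.push(n.next); // pre-barrier: keep concurrent marking consistent
        }
        n.next = newNode;           // the actual application store
        cardUpdates++;              // post-barrier: remember cross-generation pointers
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        update(a, b);               // marking off: only the post-barrier fires
        marking = true;
        update(a, c);               // marking on: pre-barrier enqueues the old b
        System.out.println("card updates: " + cardUpdates
                + ", queued for marking: " + markQueue.size());
        // prints "card updates: 2, queued for marking: 1"
    }
}
```

Every application store pays this tax whether or not a collection is running, which is why the implicit cost is so hard to attribute to the GC in a profile.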

Measuring the implicit GC cost is difficult. Blackburn and Hosking (2004) [1] augmented Jikes RVM (a VM optimized for research) to establish a baseline without barriers for comparison. However, such approaches do not easily lend themselves to a performance-optimized VM like OpenJDK.
