Part 1 of 3 in the Java Performance Optimization series. Parts 2 and 3 coming soon.
I built a Java order-processing app for a talk I gave at DevNexus a couple of weeks ago. The app worked. Tests passed. I ran a load test and captured a recording with Java Flight Recorder (JFR).
Before any changes: 1,198ms elapsed time, 85,000 orders per second, peak heap sitting at just over 1GB, 19 GC pauses.
After: 239ms. 419,000 orders per second. 139MB heap. 4 GC pauses.
Same app. Same tests. Same JDK. No architectural changes. And those numbers get a lot more meaningful when you consider that code like this doesn’t run on a single box in production. It runs across a fleet.
In Part 2 I’ll walk through the profiling data behind those numbers: the flame graph, which methods were actually hot, and what changed when we fixed them. Before we get there, you need to know what kinds of things we were actually fixing.
The problems were patterns that show up in real codebases. They compile fine, they sneak through code review, and they’re the kind of thing you could miss without profiling data telling you where to look. Here are eight of them.
TL;DR: Fixing anti-patterns like these turned a Java app that took 1,198ms into one that took 239ms. Here are some to look for and fix:
- String concatenation in loops — O(n²) copying from immutability
- Stream iteration inside loops — streaming the full list per element
- String.format() in hot paths — slowest string builder, parses the format every call
- Autoboxing in hot paths — millions of throwaway wrapper objects
- Exceptions for control flow — fillInStackTrace() walks the entire call stack
- Too-broad synchronization — one lock becomes the bottleneck
- Recreating reusable objects — ObjectMapper, DateTimeFormatter, Gson per call
- Virtual thread pinning (JDK 21–23) — synchronized + blocking I/O pins carriers
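As a taste of what these look like in code, here is a minimal sketch of the first item on the list. The class and method names are mine for illustration, not from the talk's app: `+=` on a `String` inside a loop re-copies the entire accumulated string on every iteration (quadratic total work, because `String` is immutable), while `StringBuilder` appends into a growable buffer.

```java
import java.util.List;

public class ConcatDemo {
    // Anti-pattern: each += allocates a new String and copies
    // everything accumulated so far — O(n²) characters copied overall.
    static String slowJoin(List<String> parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // Fix: append into one mutable buffer, then materialize once.
    static String fastJoin(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("order-", "42", "-shipped");
        // Both produce the same result; only the allocation behavior differs.
        System.out.println(fastJoin(parts).equals(slowJoin(parts)));
    }
}
```

The output is identical either way, which is exactly why this pattern sails through tests and code review; only a profiler shows the copying cost.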
After fixing: 5x throughput, 87% less heap, 79% fewer GC pauses. Same app, same tests, same JDK.