
Reflections on 30 years of HPC programming


Last summer, I had the opportunity to give the keynote at HIPS 2025—the 30th International Workshop on High-Level Parallel Programming Models and Supportive Environments. This was quite an honor since, over its history, HIPS has been a key workshop for projects like Chapel that strive to create productive approaches to scalable parallel programming [note:For readers unfamiliar with HIPS, its publications focus on high-level programming of multiprocessors, compute clusters, and massively parallel machines via language design, compilers, runtime systems, and programming tools. A long-term refrain from its call for papers has been “We especially invite papers demonstrating innovative approaches in the area of emerging programming models for large-scale parallel systems and many-core architectures.”].

To commemorate the 30th instance of HIPS, I took the approach of using my talk to reflect on the past 30 years of programming within the field of HPC, or High-Performance Computing. This was a sobering exercise, but one that was well-received. In November, I reprised the talk in a condensed lightning talk format for CLSAC 2025. In this blog article, I’ll attempt to capture some of the main elements of those talks for a wider audience.

Like so many “n years of HPC” retrospectives, let’s start by looking to the TOP500 list [note:The TOP500 is a ranking of HPC systems, as measured by their performance on the Linpack benchmark. All TOP500 results and images in this article originate from top500.org and are used with permission. Note that I’ve updated the original talk contents to reflect the latest results from November 2025.] to see how HPC systems themselves have changed over the past three decades. For simplicity, I’ll just focus on the top five systems from each list.

Browsing the results from 30 years ago—November 1995—we see that systems from Fujitsu, Intel, and Cray make up the top five, where their network interconnects used crossbar, 2D mesh, and 3D torus topologies, respectively. Core counts ranged from 80 to 3,680, and performance as measured by Rmax values ranged from 98.9 to 170 GFlop/s. The following screenshot from the TOP500 website summarizes these systems and results:

Jumping forward to the latest TOP500 list, published in November 2025, we see systems from HPE Cray, Eviden/Bull, and Microsoft. These systems use Slingshot-11 and InfiniBand NDR interconnects with topologies based on dragonfly[+] and/or fat-trees. Core counts have jumped to the millions (2,073,600–11,340,000 cores), and Rmax values range from 561 to 1809 PFlop/s:

Summarizing the changes over these 30 years, core counts have increased by factors of hundreds to hundreds of thousands, while performance has improved by factors of millions to tens of millions—a massive improvement!

| | 1995 top 5 | 2025 top 5 | Delta |
|---|---|---|---|
| Cores | 80–3,680 | 2,073,600–11,340,000 | ~563–141,750× |
| Rmax | 98.9–170 GFlop/s | 561.2–1809 PFlop/s | ~3,300,000–18,300,000× |
| Vendors | Fujitsu, Intel, Cray | HPE, Eviden, Microsoft | — |
| Networks | crossbar, mesh, torus | dragonfly[+], fat-trees | higher-radix, lower-diameter |
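As a quick sanity check on the delta factors above, the following sketch recomputes them from the quoted core counts and Rmax ranges (the only assumption is the unit conversion, 1 PFlop/s = 10^6 GFlop/s):

```python
# Recompute the "Delta" factors from the 1995 and 2025 figures quoted above.
# Units: Rmax in GFlop/s throughout; 1 PFlop/s = 1e6 GFlop/s.

GF_PER_PF = 1_000_000

cores_1995 = (80, 3_680)
cores_2025 = (2_073_600, 11_340_000)

rmax_1995_gf = (98.9, 170.0)                              # GFlop/s
rmax_2025_gf = (561.2 * GF_PER_PF, 1809.0 * GF_PER_PF)    # PFlop/s -> GFlop/s

# Smallest gain: smallest new value over largest old value; largest gain: the reverse.
core_gain = (cores_2025[0] / cores_1995[1], cores_2025[1] / cores_1995[0])
rmax_gain = (rmax_2025_gf[0] / rmax_1995_gf[1], rmax_2025_gf[1] / rmax_1995_gf[0])

print(f"core-count gain: ~{core_gain[0]:,.0f}x to ~{core_gain[1]:,.0f}x")
print(f"Rmax gain:       ~{rmax_gain[0]:,.0f}x to ~{rmax_gain[1]:,.0f}x")
```

Running this reproduces the ranges in the table: roughly 563× to 141,750× for core counts, and roughly 3.3 million× to 18.3 million× for Rmax.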

Million-fold improvements like these don’t happen without significant effort, even over decades; so it’s worth reflecting on which changes in hardware and HPC system architecture generated the massive gains seen here. Though I’m not a hardware architect, I tend to think of the main factors as having been:

the commodification of processors with vector instructions

the commodification of multicore/manycore CPUs and chiplet-based designs

... continue reading