The newest exascale-class supercomputer to be profiled in the June Top500 rankings is the long-awaited “Jupiter” system at the Forschungszentrum Jülich facility in Germany. We finally have a sense of how this hybrid CPU-GPU machine will perform, although some of the details of its configuration are still not nailed down publicly.
Jupiter is the first exascale system to be completed under the EuroHPC Joint Undertaking of the European Union, and the fact that it is not using a custom CPU and XPU created by European companies, as was originally hoped, and is basically an Nvidia machine from top to middle – the bottom would mean Nvidia storage, which Nvidia has not acquired yet but will – speaks volumes about how difficult it is to start from scratch to achieve chip independence for Europe. But the Universal Cluster module will be based on the “Rhea1” Arm server CPU created by SiPearl, which is a step in the direction of independence for European HPC.
The Jupiter machine is built by Eviden, the HPC division of Atos that was going to be spun out but which the company has had second – and good – thoughts about doing, and ParTec, the German HPC system designer and installer.
Like its predecessor, the “JUWELS” system that was first deployed in 2018 and upgraded a few times over the years, Jupiter is a hybrid supercomputer that links blocks of CPU and GPU compute with other kinds of storage and acceleration blocks. With JUWELS, the Cluster Module was installed first, based on Intel “Skylake” Xeon SP processors linked with 100 Gb/sec EDR InfiniBand from the then-independent Mellanox Technologies, with everything installed in a BullSequana X1000 system from Eviden. In 2020, a BullSequana XH2000 system loaded up with AMD “Rome” Epyc CPUs and Nvidia “Ampere” GPU accelerators, called the Booster Module, was added to JUWELS using 200 Gb/sec HDR InfiniBand.
Here is the honeycomb diagram for Jupiter, showing its modular components:
The vast majority of the floating point and integer performance in Jupiter is, of course, in the GPU Booster module, which was taken for a spin using the High Performance LINPACK benchmark that is commonly used to rank supercomputer throughput. That run placed the Jupiter Booster module in the number four position on the June 2025 Top500 rankings of supposedly HPC-centric systems.
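As a refresher, HPL factors a big, dense matrix to solve a system of linear equations, and the Rmax rating is just the benchmark’s nominal operation count divided by the wall time of the run. Here is a minimal sketch of that bookkeeping in Python, using made-up numbers rather than the actual parameters of the Jupiter Booster run:

```python
# Minimal sketch of how an HPL run turns into a flops rating. HPL solves a
# dense N x N system of linear equations with LU factorization plus partial
# pivoting, and the convention is to credit the run with 2/3*N^3 + 2*N^2
# floating point operations. The problem size and wall time below are
# placeholders for illustration, not Jupiter's actual run parameters.

def hpl_operation_count(n: int) -> float:
    """Nominal floating point operation count for an N x N HPL solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def hpl_rmax_pflops(n: int, seconds: float) -> float:
    """Sustained HPL rate (Rmax) in petaflops for a given size and runtime."""
    return hpl_operation_count(n) / seconds / 1e15

if __name__ == "__main__":
    n = 10_000_000        # hypothetical problem size
    runtime = 3_600.0     # hypothetical wall time in seconds
    print(f"Rmax ~= {hpl_rmax_pflops(n, runtime):.0f} petaflops")
```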
The Universal Cluster will have more than 1,300 CPU-only nodes, each based on a pair of Rhea1 chips with 80 of the “Zeus” Neoverse V1 cores. These are the same V1 cores used in the “Graviton3” Arm chip designed by Amazon Web Services, and each one has a pair of 256-bit SVE vector engines. Each Rhea1 has a bank of 64 GB of HBM memory, the same fast but not fat memory used on GPU and XPU accelerators these days. As far as we know, the Rhea1 chip, which was delayed back in June 2024, is expected sometime later this year for FZJ. Some variant of the SiPearl Arm CPU – maybe Rhea1, but also maybe its Rhea2 kicker – will also be employed in the second exascale system in Europe, called “Alice Recoque,” which is set to be hosted in France and presumably will also be built by Eviden. The Alice Recoque system has a budget of €542 million ($580.2 million), which includes money for the system, the facility, and its power and cooling.
This Universal Cluster is expected to deliver a mere 5 petaflops of FP64 performance running the HPL benchmark, which probably puts it somewhere around 7 petaflops of peak theoretical performance. That is tiny compared to the Jupiter GPU Booster module that was tested for the June Top500 list.
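That peak number is easy enough to sanity check from the feeds and speeds above. A Neoverse V1 core can retire two 256-bit SVE fused multiply-adds per clock, which works out to 16 FP64 operations per cycle per core. Here is a back-of-the-envelope sketch; the 2 GHz clock speed is our assumption for illustration, since SiPearl has not published a final Rhea1 frequency as far as we know:

```python
# Back-of-the-envelope FP64 peak for the Universal Cluster, using the node
# and core counts cited above. The 2.0 GHz clock speed is an assumption for
# illustration; as far as we know, SiPearl has not published a final Rhea1
# frequency.

nodes = 1_300             # "more than 1,300" CPU-only nodes
sockets_per_node = 2      # a pair of Rhea1 chips per node
cores_per_socket = 80     # "Zeus" Neoverse V1 cores per Rhea1
sve_pipes_per_core = 2    # two 256-bit SVE vector units per V1 core
fp64_lanes_per_pipe = 4   # 256-bit vector width / 64-bit doubles
flops_per_fma = 2         # a fused multiply-add counts as two operations
clock_ghz = 2.0           # assumed clock speed

flops_per_core_per_cycle = sve_pipes_per_core * fp64_lanes_per_pipe * flops_per_fma
peak_pflops = (nodes * sockets_per_node * cores_per_socket
               * flops_per_core_per_cycle * clock_ghz * 1e9) / 1e15

print(f"Estimated FP64 peak: {peak_pflops:.2f} petaflops")   # about 6.66 petaflops
```

At that assumed clock speed, the math lands at roughly 6.7 petaflops of peak, which squares with the 5 petaflops HPL estimate at typical HPL computational efficiencies.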
The Jupiter GPU Booster node is based on a unique four-way clustering of Nvidia “Grace” G100 Arm server CPUs, which essentially uses four “Hopper” H200 GPUs as NUMA node controllers to link four CPUs and four GPUs into a heftier cluster of eight compute engines working in harmony.
For those of you who didn’t see it when we wrote about the Jupiter nodes back in September 2024, here is a block diagram of the Jupiter GPU Booster node, which has a pair of sleds, each with a quad of Grace-Hopper modules linked by their main memories using direct NVLink ports off the CPUs and GPUs:
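To make that topology a little more concrete, here is a toy model of one quad of Grace-Hopper modules, treating each CPU-GPU pair as a NUMA domain and the NVLink ports as the links that stitch the four pairs into one eight-engine compute complex. The memory capacities in the sketch are our assumptions for illustration, not confirmed figures for the production Jupiter nodes:

```python
# Toy model of one quad of Grace-Hopper modules in a Jupiter GPU Booster sled.
# The memory capacities are illustrative assumptions, not confirmed figures
# for the production Jupiter nodes.

from itertools import combinations

# Treat each Grace-Hopper pair as one NUMA domain: a Grace CPU with LPDDR5X
# memory coherently coupled to a Hopper GPU with HBM memory over NVLink-C2C.
quad = {
    f"domain{i}": {"cpu_mem_gb": 120, "gpu_mem_gb": 96}   # assumed capacities
    for i in range(4)
}

# In this simplified model, NVLink ports off the CPUs and GPUs tie every
# domain to every other one, so the four pairs behave like a single NUMA
# machine with eight compute engines sharing one coherent memory space.
nvlink_pairings = list(combinations(quad, 2))

total_mem_gb = sum(d["cpu_mem_gb"] + d["gpu_mem_gb"] for d in quad.values())
print(f"{len(quad)} CPU-GPU domains, {len(nvlink_pairings)} NVLink pairings, "
      f"{total_mem_gb} GB of coherent memory per quad under these assumptions")
```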