Nvidia's dominance on the Green500 faces challenges from AMD – and itself
Blackwell's weaker FP64 performance could give the House of Zen's Instinct accelerators a leg up in future efficiency benchmarks
SC24 Nvidia's accelerators are among the most power-hungry machines in their class, yet the chips continue to dominate the Green500 ranking of the most sustainable supercomputers in the world.
Eight of the ten most power-efficient systems on the twice-yearly list employed Nvidia parts, and five of those were powered by the GPU giant's 1,000-watt Grace Hopper Superchip (GH200).
The parts, which meld a 72-core Grace CPU based on Arm's Neoverse V2 design and 480 GB of LPDDR5x memory with an H100 GPU with 96 to 144 GB of HBM3 or HBM3e memory, have become quite popular in the HPC community.
On the latest Green500 list, the chip powers both the first and second most efficient systems, EuroHPC's JEDI and the Romeo HPC Center's Romeo-2025, which achieved 72.7 and 70.9 gigaFLOPS per watt respectively in the High-Performance Linpack benchmark. That's FP64, of course.
The two systems are nearly identical, having been built using Eviden's BullSequana XH3000 platform and employing the same GH200 accelerators. Nvidia's GH200 also claims positions four, six, and seven on the list with the Isambard-AI Phase 1 (68.8 gigaFLOPS/watt), Jupiter Exascale Transition Instrument (67.9 gigaFLOPS/watt), and Helios GPU (66.9 gigaFLOPS/watt).
The Jupiter Exascale Development Instrument ... (Image: Forschungszentrum Jülich / Ralf-Uwe Limbach)
Meanwhile, Nvidia's venerable H100 powers the fifth, eighth, and ninth most efficient machines, namely the Capella, Henri, and HoreKa-Teal systems.
It is doubtful Nvidia will retain its high ranking on the Green500. Its Grace-Blackwell superchips are already on the way in the form of the 2.7-kilowatt GB200 and the 5.4-kilowatt GB200 NVL4.
The trouble is that newer parts don't necessarily deliver more FP64 performance per watt.
From the A100 in 2020 to the H100 in 2022, FP64 performance jumped roughly 3.5x. However, compared to the 1,200-watt Blackwell, the 700-watt H100 is actually faster in FP64 matrix math. In fact, for FP64, the only improvement is for vector math, where the incoming chip boasts 32 percent higher perf.
So while Nvidia enjoys high positions on the Green500 today, AMD isn't out of the game just yet. In fact, the House of Zen's MI300A accelerated processing unit claimed the number three spot on the latest list with the Adastra 2 system.
If you're not familiar, AMD's MI300A was announced a little under a year ago and fuses 24 CPU cores and six CDNA 3 GPU dies into a single APU with up to 128 GB of HBM3 memory on board, and a configurable TDP of 550 to 760 watts. And, at least on paper, the part already boasts 1.8x the HPC performance of the H100.
Built by HPE Cray using EX255a blades – as used in the world's most powerful publicly known supercomputer – Adastra 2 managed 69 gigaFLOPS/watt of performance. It's not alone either. The 10th most efficient system is another MI300A-based machine at Lawrence Livermore National Laboratory called RZAdams, which managed 62.8 gigaFLOPS/watt.
Scaling up
All of the systems in the Green500's top 10 are now well above the 50 gigaFLOPS/watt target necessary to achieve an exaFLOP of compute in a 20-megawatt envelope. But, as it turns out, maintaining these levels of efficiency at scale is rather tricky.
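For the curious, the 50 gigaFLOPS/watt figure falls straight out of the division. Here's a minimal Python sketch of it; the function name `required_gflops_per_watt` is our own invention for illustration, not anything from the Green500 methodology:

```python
def required_gflops_per_watt(target_flops: float, power_budget_watts: float) -> float:
    """Efficiency, in gigaFLOPS/watt, needed to hit target_flops within a power budget."""
    if power_budget_watts <= 0:
        raise ValueError("power budget must be positive")
    # FLOPS divided by watts gives FLOPS/watt; divide by 1e9 to express it in gigaFLOPS/watt.
    return target_flops / power_budget_watts / 1e9


EXAFLOP = 1e18        # one exaFLOPS, expressed in FLOPS
BUDGET_WATTS = 20e6   # the 20-megawatt envelope

print(required_gflops_per_watt(EXAFLOP, BUDGET_WATTS))  # prints 50.0
```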
The three most efficient machines on the Green500 are all on the small side. JEDI is rated for just 67 kilowatts of power. For comparison, the Swiss National Supercomputing Centre's Alps machine – the most powerful GH200 system on the Top500 – achieves 434 petaFLOPS in the HPL benchmark while consuming 7.1 megawatts, making it the 14th most efficient machine at 61 gigaFLOPS per watt.
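Alps' efficiency figure can be sanity-checked from the two numbers quoted above. A rough sketch follows. The helper name `gflops_per_watt` is hypothetical, and the inputs are the rounded HPL and power figures from this article rather than the list's exact measurements:

```python
def gflops_per_watt(hpl_flops: float, power_watts: float) -> float:
    """Achieved efficiency in gigaFLOPS/watt from an HPL score (FLOPS) and power draw (watts)."""
    if power_watts <= 0:
        raise ValueError("power draw must be positive")
    return hpl_flops / power_watts / 1e9


# Rounded figures as quoted in the article: 434 petaFLOPS at 7.1 megawatts.
alps = gflops_per_watt(434e15, 7.1e6)
print(f"Alps: {alps:.1f} gigaFLOPS/watt")  # prints "Alps: 61.1 gigaFLOPS/watt"
```

That's a little over 11 gigaFLOPS/watt shy of JEDI's 72.7, despite both machines being built around the same GH200 silicon.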
It's a similar story for Adastra 2, which is even smaller than JEDI at 37 kilowatts. If you could maintain 69 gigaFLOPS per watt at scale, you'd only need about 25.2 megawatts to match El Capitan's 1.742 exaFLOPS of real-world performance. In reality, El Capitan needed nearly 29.6 megawatts of power to achieve its record-breaking run. ®