Intel has announced new processors with high-bandwidth memory (HBM) targeting high-performance computing (HPC), supercomputing and artificial intelligence (AI).
These products are called Xeon CPU Max series and GPU Max series. The chips are based on existing technology; the CPU is fourth-generation Xeon Scalable, also known as Sapphire Rapids, and the GPU is Ponte Vecchio, the data center version of Intel’s Xe GPU technology.
The difference is that both processors come with HBM in the processor package, rather than relying solely on standard DRAM. HBM is much faster than DDR4 or DDR5 memory and sits inside the package next to the CPU or GPU die, connected by a high-speed interconnect, rather than on a DIMM like DDR memory.
“If you look at the overall set of workloads in HPC and AI, there’s a lot of variety,” said Jeff McVeigh, vice president and general manager of supercomputing at Intel. “Traditionally, there have been two routes to the top. One is the CPU route, and the other is the GPU route, and each has its own obstacles. Our goal is to really move forward and address these problems comprehensively.”
The focus of CPU Max and GPU Max is on maximizing bandwidth, maximizing compute and maximizing the capabilities and possibilities they offer to address the breadth of workloads, McVeigh said.
CPU Max
CPU Max is available in three server configurations. In the first, there is no DRAM at all; the only memory in the system is the 64GB of HBM in the CPU Max package. This is how Japan’s Fugaku supercomputer, formerly the fastest in the world, works. In a dual-socket system, that adds up to 128GB of memory, which McVeigh said is “more than enough for many applications and workloads.” In this configuration, applications run as-is, with no code changes.
The second configuration, called HBM flat mode, combines the HBM in the CPU package with standard DDR5 DIMMs in the system. In this mode, software needs to be optimized to place data in, and move it between, the two memory regions, since the HBM and DDR5 are exposed as separate pools.
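In flat mode, the HBM typically appears to the operating system as its own NUMA node alongside the DDR5 nodes, so one common approach is to bind bandwidth-critical allocations to that node explicitly. The following is only a minimal sketch using libnuma; the HBM node number (2) and the buffer size are assumptions for illustration, and on a real system the actual topology would come from numactl --hardware.

```cpp
// Minimal sketch: steering a bandwidth-critical buffer onto the HBM NUMA node
// in flat mode using libnuma. The HBM node number (2) is an assumption; discover
// the real one via numactl --hardware or the sysfs topology.
// Build with: g++ hbm_flat.cpp -lnuma
#include <numa.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    const int hbm_node = 2;            // hypothetical HBM NUMA node
    const size_t bytes = 1ULL << 30;   // 1 GiB working buffer

    // Allocate the hot buffer directly on the HBM node...
    double* hot = static_cast<double*>(numa_alloc_onnode(bytes, hbm_node));
    // ...and leave colder data in ordinary (DDR5-backed) memory.
    double* cold = static_cast<double*>(std::malloc(bytes));
    if (!hot || !cold) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    std::memset(hot, 0, bytes);   // touch pages so they fault in on the HBM node
    std::memset(cold, 0, bytes);

    // ... a bandwidth-bound kernel would operate on 'hot' here ...

    numa_free(hot, bytes);
    std::free(cold);
    return 0;
}
```

An unmodified binary can also simply be pinned to the HBM node at launch with numactl --membind, which is often the first thing to try before touching application code.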
The third configuration is HBM cache mode, in which the HBM acts as a cache for the DDR memory in the system. In this mode, no software code changes are required. “You might want to make some adjustments to take advantage of the very large caches you now have, but you don’t have to do that to get the immediate benefit,” McVeigh said.
GPU Max
GPU Max also comes in three configurations: the 1100, 1350 and 1550 models. The 1100 is a 300-watt, double-wide PCIe card with 56 Xe cores and 48GB of HBM2e memory. Multiple cards can be connected via the Intel Xe Link bridge.
The other two configurations use the Open Compute Project (OCP) accelerator module form factor, called OAM, a faster alternative to the PCIe card interface.
The 1350 GPU is a 450-watt OAM module with 112 Xe cores and 96GB of HBM. The 1550 GPU is a 600-watt OAM module with 128 Xe cores and 128GB of HBM.
PCIe cards are well suited to standard servers and even workstation systems, but the OAM modules are geared toward higher-density environments, McVeigh said. Intel has numerous system designs in the works with OEMs and system solution providers, he said, and servers with OAM modules will begin shipping in 2023.
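Ponte Vecchio and the rest of the Max Series GPUs are programmed through Intel’s oneAPI stack, whose core language is SYCL. As a rough illustration of what targeting these cards looks like from the host side, the sketch below simply enumerates the GPUs the runtime can see and prints their memory capacity; it assumes a oneAPI/DPC++ toolchain (compiled with icpx -fsycl) and is not specific to any particular Max model.

```cpp
// Minimal sketch: enumerating Xe GPUs through SYCL (Intel oneAPI / DPC++).
// Assumes the oneAPI toolchain is installed; compile with: icpx -fsycl list_gpus.cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    // Ask the SYCL runtime for every device it classifies as a GPU.
    auto gpus = sycl::device::get_devices(sycl::info::device_type::gpu);
    if (gpus.empty()) {
        std::printf("No SYCL GPU devices found\n");
        return 0;
    }
    for (const auto& dev : gpus) {
        auto name = dev.get_info<sycl::info::device::name>();
        auto mem  = dev.get_info<sycl::info::device::global_mem_size>();
        auto cu   = dev.get_info<sycl::info::device::max_compute_units>();
        std::printf("%s: %.1f GB global memory, %u compute units\n",
                    name.c_str(), mem / (1024.0 * 1024.0 * 1024.0),
                    static_cast<unsigned>(cu));
    }
    return 0;
}
```

Each card appears as its own SYCL device; spreading work across multiple linked cards is left to the application or to a higher-level library built on the runtime.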
Intel is building a supercomputer, Aurora, for Argonne National Laboratory equipped with CPU Max and GPU Max processors that will deliver more than 2 exaFLOPS of performance when it comes online in 2023. That is twice the performance of Frontier, the current leader in the supercomputer race. The combination of the two processors is what makes this possible, McVeigh said.
“You would argue that we’ve offloaded everything to the GPU and we don’t need the highest-end CPU, right? We don’t need HBM memory, right? Wrong. By turning on the HBM integrated into the CPU, we get significant performance gains, because there’s still a lot of code running on the CPU even though we’ve offloaded some of the larger kernels to the GPU,” he said.
The new processors have already begun shipping to initial customers, including Argonne. The Max series is scheduled to launch in January 2023.