It’s fair to say that AI is changing the priorities of CPU and GPU manufacturers. The performance race in the x86 space continues, but it is now shaped by a focus on sustainability and energy efficiency, leading Intel and AMD to release versions of their CPUs optimized for those needs alongside the traditional “we’re the fastest at workload X” positioning.
The GPU market has also seen massive changes over the last few years. Record sales of Nvidia and AMD GPUs, driven beyond the gaming community by the cryptocurrency mining market, ended when the bottom fell out of crypto as currency values crashed over the last two years. Demand had been so strong that Intel, which had given dedicated GPUs short shrift for years, entered the discrete GPU market with competitive cards almost exactly as demand for them collapsed.
But now AI has taken up the slack in the GPU space, given the ability of GPUs to handle machine learning and AI workloads with greater aplomb than most CPU technologies, though CPUs still handle significant amounts of AI development and processing.
GPUs have become the preferred processor for machine learning tasks because of the way they work. While CPUs can perform exceedingly fast sequential operations, GPUs break complex problems down into smaller pieces and process them in parallel. Machine learning, deep learning, and AI inference are usually massively parallel workloads that benefit significantly from the capabilities of GPUs when compared to CPUs.
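That split-and-combine pattern can be sketched in plain Python. This is a toy illustration of the decomposition idea, not GPU code: a dot product is cut into independent chunks whose partial sums are computed concurrently and then combined. A thread pool stands in for the thousands of GPU lanes, and the names `parallel_dot` and `dot_chunk` are illustrative, not from any vendor API.

```python
from concurrent.futures import ThreadPoolExecutor


def dot_chunk(pair):
    """Multiply-accumulate one independent slice of two vectors."""
    xs, ys = pair
    return sum(x * y for x, y in zip(xs, ys))


def parallel_dot(xs, ys, workers=4):
    """Split a dot product into independent chunks, process them
    concurrently, and combine the partial sums -- the same
    decompose/compute/reduce shape a GPU applies at massive scale."""
    n = len(xs)
    step = -(-n // workers)  # ceiling division: chunk size per worker
    chunks = [(xs[i:i + step], ys[i:i + step]) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(dot_chunk, chunks))
```

Because each chunk touches disjoint data, no coordination is needed until the final reduction, which is exactly why this kind of workload maps so well onto thousands of GPU cores.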
So where do these traditional players in the market stand? Let’s take a look.
Earlier this year Intel formally announced availability of the fourth-generation Xeon Scalable server CPUs built on its Intel 7 process. With up to 60 cores and support for 8-socket configurations, the CPU fully supports Advanced Matrix Extensions (AMX), Intel’s extensions to the x86 instruction set that are specifically designed to accelerate machine learning and AI workloads.
At the product announcement, Intel claimed that in company testing the new CPUs achieved up to 10 times higher PyTorch performance for real-time inference and training when using AMX with BF16 precision, compared to the previous generation using FP32. The CPU also includes up to four integrated IP accelerators, depending on model. These accelerators include:
- Intel Quick Assist Technology (QAT)
- Intel Dynamic Load Balancer (DLB)
- Intel In-Memory Analytics Accelerator (IAA)
- Intel Data Streaming Accelerator (DSA)
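The BF16 format behind that claimed speedup is essentially float32 with the low 16 bits dropped: it keeps float32’s 8-bit exponent (and therefore its dynamic range) but only 7 mantissa bits, halving the memory traffic per value. A minimal sketch of the conversion, using simple truncation (real hardware typically rounds to nearest even; `to_bf16` is an illustrative helper, not an Intel API):

```python
import struct


def to_bf16(x: float) -> float:
    """Approximate a value at bfloat16 precision by zeroing the low
    16 bits of its float32 bit pattern. BF16 keeps float32's 8-bit
    exponent but truncates the mantissa from 23 bits to 7."""
    bits, = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]
```

Powers of two and small integers survive the conversion exactly, while a value like pi loses its low mantissa bits (3.14159265 becomes 3.140625), which is the precision trade-off that buys the throughput gain over FP32.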
Additionally, AI technology startup Deci claims that, in conjunction with its proprietary Automated Neural Architecture Construction technology, it was able to deliver performance comparable to GPU-based systems on natural language processing (NLP) and computer vision models using this latest-generation Xeon hardware.
Directly in the GPU market, Intel has announced the Xe HPG (high performance graphics) architecture for the discrete GPU market. This is a difficult market to design products for, spanning everything from the low-performance space to data center-specific parts to gaming.
Tom Petersen of Intel said in an interview with HardwareLuxx, “because you’re getting into the datacenter, external desktop, and integrated graphics business; they all have different ways to optimize.” Intel’s discrete GPU products are marketed under the Intel Arc name.
In 2022, Intel launched Gaudi2, the second generation of its Habana deep learning processor, moving the technology to a 7nm process and tripling the number of cores and the memory, aiming to compete directly with the Nvidia A100 80GB on computer vision and NLP models. Intel acquired Habana in 2019 to boost its AI offerings.
AMD is in an interesting position. It has announced that it will integrate AI hardware into its x86 CPUs with the AMD Ryzen AI engine, which will make its first appearance in the Ryzen 7040 Series mobile processors, which should be shipping now. AMD’s most recent refresh of its top-of-the-line data center CPU, the fourth-generation Epyc, offers up to 96 cores, PCIe 5.0 support, and exemplary memory performance, all important in the AI space, along with support for the AVX-512 Vector Neural Network Instructions (VNNI), also supported by Intel’s latest parts, which are designed to accelerate convolutional neural network loops. No specific announcement has been made yet regarding integrated AI hardware at this level of processor, but, as with Intel, this class of CPU can be used for AI/ML tasks in the data center.
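VNNI’s contribution is a fused multiply-accumulate on low-precision integers: its VPDPBUSD instruction multiplies four unsigned 8-bit values by four signed 8-bit values and adds all four products into a 32-bit accumulator in a single step, collapsing the multi-instruction inner loop of a quantized convolution. A rough Python model of one 32-bit lane (saturation behavior omitted for clarity; the function name is illustrative):

```python
def vnni_dot_accumulate(acc, a_u8, b_s8):
    """Model one 32-bit lane of a VPDPBUSD-style VNNI operation:
    four unsigned-byte x signed-byte products are summed into a
    32-bit accumulator in one fused step."""
    assert len(a_u8) == len(b_s8) == 4
    assert all(0 <= a <= 255 for a in a_u8)      # unsigned 8-bit inputs
    assert all(-128 <= b <= 127 for b in b_s8)   # signed 8-bit weights
    return acc + sum(a * b for a, b in zip(a_u8, b_s8))
```

A real AVX-512 register holds 16 such lanes, so one instruction performs 64 of these byte multiplies per cycle per port, which is where the convolution-loop speedup comes from.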
AMD is also introducing AI across its product line, as evidenced by today’s announcement of the Alveo MA35D media accelerator, a dedicated video processing unit designed to accelerate the entire video processing pipeline. It includes an integrated AI processor that evaluates content on a frame-by-frame basis to minimize bitrate while improving perceived visual quality, allowing for higher-performance, more energy-efficient deployments.
Of course, AMD has an established line of high-performance discrete gaming GPUs, the AMD Radeon RX series, that can be repurposed for AI/ML projects. To encourage this, the AMD GPU software stack, ROCm, has been evolving to add a focus on the deep learning work critical to AI. With the most recent release, ROCm 5.0, AMD offers a selection of turnkey frameworks on the AMD Infinity Hub to speed up development of ML workloads in areas such as life sciences, image and video processing, and autonomous driving. ROCm is a direct competitor to Nvidia’s CUDA toolkit.
Nvidia is the big dog in the AI data center hardware game. As we pointed out in March, AI is where Nvidia’s focus now lies, and its product portfolio acknowledges this. Multiple versions of the Nvidia A100 Tensor Core GPU deliver acceleration at scale for enterprise-class GPU deployments, along with a complete set of supporting CUDA software and the Nvidia EGX platform with its enterprise-optimized software.
Entry- and mid-level hardware solutions, in the form of servers built around discrete graphics cards starting with the RTX 4000 series, provide a selection of price points for building an ML/AI infrastructure for your business. The range of Nvidia AI hardware runs the gamut from dipping your toes in the water to complete hybrid cloud enterprises. Development can be done on the desktop, in the cloud, or in the data center, depending on your hardware investment for on-premises work and your choice of cloud provider for off-premises solutions.
What might be most interesting for data centers is Nvidia’s attack on the CPU market to accompany its existing AI-focused GPU hardware. Due for release in the first half of the year, the Grace CPU Superchip has 144 ARM-based cores and 1 TB/s of memory bandwidth and is explicitly designed for HPC applications that are CPU and memory intensive.
The Grace Hopper Superchip, due to ship alongside the Grace Superchip, combines the Grace ARM CPU architecture with Hopper, Nvidia’s ninth-generation data center GPU. It is designed to handle applications that require terabytes of data and is expected to process these huge data sets and giant-scale AI problems at unprecedented levels of performance.
It’s a bright and shining future
Whatever choice your business makes in terms of a direction to follow for your data center AI/ML hardware support, there will be plenty of options available and significant changes in both the ML support and deployed inference models as we move forward. As Nvidia CEO Jensen Huang said at the 2023 Nvidia GTC last month, “AI has reached its iPhone moment,” and just taking a look around your own workspace will show you the impact of that moment on hardware, software, and the way work is done.