What might the rise of artificial intelligence revolution look like in the data center? If one new ssytem is any indication, it could look like GPUs immersed in dielectric liquid coolant fluid, supporting water-cooled x86 servers.
That’s the vision put forward by the creators of Frontera, a new $60 million supercomputer to be built at the Texas Advanced Computing Center (TACC) in Austin. It is expected to be the most powerful supercomputer at any U.S. university, and continue the TACC’s history of deploying new systems ranking among the top 10 on the Top500 list of the world’s leading supercomputers.
The vast majority of data centers continue to cool IT equipment using air, while liquid cooling has been used primarily in high-performance computing (HPC). With the growing use of artificial intelligence, more companies are facing data-crunching challenges that resemble those seen by the HPC sector, which could make liquid cooling relevant for a larger pool of data center operators.
The design for Frontera reflects the leading edge of HPC efficiency. Frontera is Spanish for “frontier,” and the new supercomputer will help advance the frontiers of liquid cooling, with a hybrid system that will combine Dell EMC servers with x86 Intel processors and water-cooling systems from CoolIT, and a smaller system using NVIDIA GPUs (graphic processing units) immersed in a tank of liquid coolant from GRC (previously Green Revolution Cooling). Data Direct Networks will contribute the primary storage system, and Mellanox will provide the high-performance interconnect for Frontera.
Applying Immersion Benefits to GPUs
Anticipated early projects for Frontera include analyses of particle collisions from the Large Hadron Collider, global climate modeling, and improved hurricane forecasting and “multi-messenger” astronomy research using gravitational waves and electromagnetic radiation.
“Many of the frontiers of research today can be advanced only by computing, and Frontera will be an important tool to solve grand challenges that will improve our nation’s health, well-being, competitiveness and security.” said Dan Stanzione, TACC executive director.
TACC has been a leader in the use of immersion cooling, which sinks servers in liquid to cool the components, and began working with Austin-based neighbor GRC in 2009. In 2017 this collaboration was expanded to immersion cooling for NVIDIA GPUs, test-driving a system created by server vendor Supermicro. Using immersion cooling with GPUs is a fairly recent phenomenon, but may attract interest as more companies adopt GPUs for AI and other parallel processing challenges.
“The cost savings that immersion cooling enables (on the hardware side) are extremely impressive,” TACC’s Stanzione said of the 2017 project. “Being early adopters of GRC’s immersion cooling system we have seen the technology mature rapidly over the years. And with the growing power and computing needs of AI and machine learning applications, especially with hotter and hotter GPUs, cooling is even more important for reliability.”
AI Data Crunching Boosts Density
New hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density – the amount of electricity used by servers and storage in a rack or cabinet – and the accompanying heat. The trend is challenging traditional practices in data center cooling, and prompting data center operators to adapt new strategies and designs.
The alternative is to bring liquids into the server chassis to cool chips and components. Some vendors integrate water cooling into the rear-door of a rack or cabinet. This can also be done by immersing servers in tanks of coolant, or through enclosed systems featuring pipes and plates that bring cooling inside the chassis and directly to the processor.
The latter approach will be used by the CPU-powered component of Frontera, which will features a CoolIT DLC system adapted for the Dell EMC servers. CoolIT recently shared an image on its social channels showing what a prototype of the Frontera system will look like.
“The new Frontera systems represents the next phase in the long-term relationship between TACC and Dell EMC, focused on applying the latest technical innovation to truly enable human potential,” said Thierry Pellegrino, vice president of Dell EMC High Performance Computing. “The substantial power and scale of this new system will help researchers from Austin and across the U.S. harness the power of technology to spawn new discoveries and advancements in science and technology for years to come.”
“Accelerating scientific discovery lies at the foundation of the TACC’s mission, and enabling technologies to advance these discoveries and innovations is a key focus for Intel,” said Patricia Damkroger, Vice President in Intel’s Data Center Group and General Manager, Extreme Computing Group. “We are proud that the close partnership we have built with TACC will continue with TACC’s selection of next-generation Intel Xeon Scalable processors as the compute engine for their flagship Frontera system.”