Immersion GPU System Provides AI Horsepower for Frontera

Sept. 5, 2018
The new Frontera supercomputer at the University of Texas combines several approaches to liquid cooling, including GPUs immersed in dielectric coolant and x86 servers using direct water cooling to the processor.

What might the rise of artificial intelligence look like in the data center? If one new system is any indication, it could look like GPUs immersed in dielectric coolant, supporting water-cooled x86 servers.

That’s the vision put forward by the creators of Frontera, a new $60 million supercomputer to be built at the Texas Advanced Computing Center (TACC) in Austin. It is expected to be the most powerful supercomputer at any U.S. university, and continue the TACC’s history of deploying new systems ranking among the top 10 on the Top500 list of the world’s leading supercomputers.

The vast majority of data centers continue to cool IT equipment using air, while liquid cooling has been used primarily in high-performance computing (HPC). With the growing use of artificial intelligence, more companies are facing data-crunching challenges that resemble those seen by the HPC sector, which could make liquid cooling relevant for a larger pool of data center operators.

The design for Frontera reflects the leading edge of HPC efficiency. Frontera is Spanish for “frontier,” and the new supercomputer will help advance the frontiers of liquid cooling, with a hybrid system that will combine Dell EMC servers with x86 Intel processors and water-cooling systems from CoolIT, and a smaller system using NVIDIA GPUs (graphics processing units) immersed in a tank of liquid coolant from GRC (previously Green Revolution Cooling). DataDirect Networks will contribute the primary storage system, and Mellanox will provide the high-performance interconnect for Frontera.

Applying Immersion Benefits to GPUs

Anticipated early projects for Frontera include analyses of particle collisions from the Large Hadron Collider, global climate modeling, and improved hurricane forecasting and “multi-messenger” astronomy research using gravitational waves and electromagnetic radiation.

“Many of the frontiers of research today can be advanced only by computing, and Frontera will be an important tool to solve grand challenges that will improve our nation’s health, well-being, competitiveness and security,” said Dan Stanzione, TACC executive director.

A GRC immersion cooling container in action. (Source: GRC)

TACC has been a leader in the use of immersion cooling, which sinks servers in liquid to cool the components, and began working with Austin-based neighbor GRC in 2009. In 2017 this collaboration was expanded to immersion cooling for NVIDIA GPUs, test-driving a system created by server vendor Supermicro. Using immersion cooling with GPUs is a fairly recent phenomenon, but may attract interest as more companies adopt GPUs for AI and other parallel processing challenges.

“The cost savings that immersion cooling enables (on the hardware side) are extremely impressive,” TACC’s Stanzione said of the 2017 project. “Being early adopters of GRC’s immersion cooling system we have seen the technology mature rapidly over the years. And with the growing power and computing needs of AI and machine learning applications, especially with hotter and hotter GPUs, cooling is even more important for reliability.”

AI Data Crunching Boosts Density

New hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density – the amount of electricity used by servers and storage in a rack or cabinet – and the accompanying heat. The trend is challenging traditional practices in data center cooling, and prompting data center operators to adopt new strategies and designs.

The alternative is to bring liquids into the server chassis to cool chips and components. Some vendors integrate water cooling into the rear door of a rack or cabinet. Cooling can also be done by immersing servers in tanks of coolant, or through enclosed systems featuring pipes and plates that bring cooling inside the chassis and directly to the processor.

An example of what the CoolIT direct liquid cooling system for Frontera will look like. (Source: CoolIT)

The latter approach will be used by the CPU-powered component of Frontera, which will feature a CoolIT DLC system adapted for the Dell EMC servers. CoolIT recently shared an image on its social channels showing what a prototype of the Frontera system will look like.

“The new Frontera system represents the next phase in the long-term relationship between TACC and Dell EMC, focused on applying the latest technical innovation to truly enable human potential,” said Thierry Pellegrino, vice president of Dell EMC High Performance Computing. “The substantial power and scale of this new system will help researchers from Austin and across the U.S. harness the power of technology to spawn new discoveries and advancements in science and technology for years to come.”

“Accelerating scientific discovery lies at the foundation of the TACC’s mission, and enabling technologies to advance these discoveries and innovations is a key focus for Intel,” said Patricia Damkroger, Vice President in Intel’s Data Center Group and General Manager, Extreme Computing Group. “We are proud that the close partnership we have built with TACC will continue with TACC’s selection of next-generation Intel Xeon Scalable processors as the compute engine for their flagship Frontera system.”

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
