Google Unveils TPU/GPU Enhancements for Large-Scale Data Center AI Workloads

Just in time for "back-to-school" season come new TPU / GPU data center hardware and software goodies from Google Cloud.

Matt Vincent

Sept. 5, 2023

It's been a busy summer for the data center industry, particularly in the area of AI, and now, just in time for "back-to-school" season, come new TPU / GPU hardware and software goodies from Google for furnishing enhanced cloud capabilities.

As previously noted in DCF reporting by David Chernicoff, cloud-based AI is having an immediate impact on the way the world does business. Google offers more than 20 AI products, services, and solution sets via Google Cloud, whose primary constituency includes developers and data scientists.

Chernicoff notes further:

"Google defines their available services by use case, breaking those down by data scientist, developer, infrastructure, business solutions, and consulting.
Infrastructure is where customers get access to just that; preconfigured deep learning containers and deep learning VMs to run on the Google cloud, but also access to high performance GPUs, Google’s in-house developed hardware accelerated TPUs, and TensorFlow enterprise, which allows users to scale resources across CPUs, GPUs, and TPUs.
This infrastructure supports Google's AI tools, from their unified ML development platform, Vertex AI, to their AutoML tool for training custom ML models."

In an August 29 blog post, Google's Amin Vahdat, VP/GM, ML, Systems, and Cloud AI, and Mark Lohmeyer, VP/GM, Compute and ML Infrastructure, announced significant enhancements to Google Cloud's AI infrastructure technologies in the form of TPUs and GPUs. There were two central announcements:

Now available in preview, the TPU v5e platform integrates with Google Kubernetes Engine (GKE) and Vertex AI systems, as well as leading frameworks such as PyTorch, JAX and TensorFlow, allowing users to get started with easy-to-use, familiar interfaces.
Concurrently announced, the company's A3 VMs, based on Nvidia H100 GPUs and delivered as a GPU supercomputer, are to be made generally available this month to power large-scale AI models.

Hitting the AI sweet spot

The company bills its Cloud TPU v5e hardware and software platform as its "most cost-efficient, versatile, and scalable Cloud TPU to date."

The company added that the platform is tailored to hit the performance and cost-efficiency "sweet spot" for medium- and large-scale training and inference purposes, delivering up to 2x higher training performance per dollar, and up to 2.5x inference performance per dollar for LLMs and generative AI models, as compared to Cloud TPU v4.
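As a rough illustration of what those ratios mean for a budget, the short sketch below converts a performance-per-dollar gain into the relative cost of a fixed amount of work. The two gains are the "up to" figures quoted above; the baseline budget and the function name are hypothetical, and real savings depend on the workload.

```python
def relative_cost(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed amount of work relative to the baseline (1.0 = unchanged)."""
    if perf_per_dollar_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_dollar_gain


# "Up to" figures quoted in the article, relative to Cloud TPU v4.
TRAINING_GAIN = 2.0
INFERENCE_GAIN = 2.5

# Hypothetical spend on TPU v4 for some fixed workload, in dollars.
baseline_budget = 100_000.0

training_cost = baseline_budget * relative_cost(TRAINING_GAIN)
inference_cost = baseline_budget * relative_cost(INFERENCE_GAIN)

print(f"Same training work at best case:  ${training_cost:,.0f}")
print(f"Same inference work at best case: ${inference_cost:,.0f}")
```

In the best case, the same training work costs half the baseline and the same inference work costs 40% of it.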

At less than half the cost of TPU v4, the company said TPU v5e makes it possible for more organizations to train and deploy larger, more complex AI models. How so? The TPU v5e pods allow up to 256 chips to be interconnected with an aggregate bandwidth of more than 400 Tb/s and 100 petaOps of INT8 performance.
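To put the pod-level figures on a per-chip footing, the sketch below simply divides the quoted aggregates by the 256 chips in a pod. Even division across chips is an assumption made here for illustration, and the bandwidth figure is a lower bound ("more than 400 Tb/s"), so the results are approximations rather than published per-chip specifications.

```python
# Pod-level figures quoted in the article.
POD_CHIPS = 256
POD_INT8_PETAOPS = 100      # aggregate INT8 performance
POD_BANDWIDTH_TBPS = 400    # aggregate bandwidth, quoted as "more than" this

# Assumption: the aggregates divide evenly across chips.
per_chip_teraops = POD_INT8_PETAOPS * 1000 / POD_CHIPS
per_chip_tbps = POD_BANDWIDTH_TBPS / POD_CHIPS

print(f"INT8 per chip:      ~{per_chip_teraops:.1f} teraOps")
print(f"Bandwidth per chip: >{per_chip_tbps:.2f} Tb/s")
```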

The TPU v5e platform also provides support for eight different virtual machine (VM) configurations, ranging from one chip to more than 250 chips within a single slice, allowing customers to choose the right configurations for serving a wide range of LLM and generative AI model sizes.
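Choosing among those configurations amounts to picking the smallest slice that still fits the job. The sketch below shows that selection in plain Python. The list of slice sizes is an assumption chosen to match the article's description (eight options, from one chip up to a full 256-chip pod); the article does not enumerate them, and the helper name is hypothetical.

```python
# Assumed slice sizes in chips: eight options from 1 to 256.
# The article gives only the count and the range, not this exact list.
ASSUMED_SLICE_SIZES = (1, 4, 8, 16, 32, 64, 128, 256)


def smallest_slice(chips_needed: int, sizes=ASSUMED_SLICE_SIZES) -> int:
    """Return the smallest slice size that provides at least chips_needed chips."""
    if chips_needed < 1:
        raise ValueError("chips_needed must be at least 1")
    for size in sorted(sizes):
        if size >= chips_needed:
            return size
    raise ValueError("job does not fit in a single slice")


for need in (1, 5, 200):
    print(f"{need} chips needed -> {smallest_slice(need)}-chip slice")
```

A job that exceeds the largest entry raises an error here; scaling past one slice is what the Multislice technology described later is meant to address.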

"Our speed benchmarks are demonstrating a 5X increase in the speed of AI models when training and running on Google Cloud TPU v5e," asserted Wonkyum Lee, Head of Machine Learning, Gridspace. "We are also seeing a tremendous improvement in the scale of our inference metrics. We can now process 1000 seconds in one real-time second for in-house speech-to-text and emotion prediction models, a 6x improvement."

The blog also announced the general availability of Cloud TPUs in GKE, the scalable and ubiquitous Kubernetes service. Customers can now improve AI development productivity by leveraging GKE to manage large-scale AI workload orchestration on Cloud TPU v5e, as well as on Cloud TPU v4.

Additionally, as part of the enhanced platform launch, Google's Vertex AI now supports training with various frameworks and libraries using Cloud TPU VMs, for organizations that prefer the simplicity of managed services.

The company noted that the Cloud TPU v5e platform also now supports open-source AI tools such as Hugging Face’s Transformers and Accelerate, as well as PyTorch Lightning and Ray. The upcoming PyTorch/XLA 2.1 release will include Cloud TPU v5e support, along with new features including model and data parallelism for large-scale model training, among others.

Finally, the company's new Multislice technology, now available in preview, allows users to scale AI models beyond the boundaries of physical TPU pods, up to tens of thousands of Cloud TPU v5e or TPU v4 chips, for easier scale-up of training jobs.

Announced at last month's Google Cloud Next, Multislice is billed as a full-stack large-scale training technology that enables near-linear scaling up to tens of thousands of Cloud Tensor Processing Unit (TPU) chips.

As further noted by the Google blog:

"Historically, a training run could only use a single slice, a reservable collection of chips connected via inter-chip interconnect (ICI). This meant that a run could use no more than 3072 TPU v4 chips, which is currently the largest slice in the largest Cloud TPU system. With Multislice, a training run can scale beyond a single slice and use multiple slices over a number of pods by communicating over data center networking (DCN)."
"Until now, training jobs using TPUs were limited to a single slice of TPU chips, capping the size of the largest jobs at a maximum slice size of 3,072 chips for TPU v4. With Multislice, developers can scale workloads up to tens of thousands of chips over inter-chip interconnect (ICI) within a single pod, or across multiple pods over a data center network (DCN). Multislice technology powered the creation of our state-of-the-art PaLM models; now we are pleased to make this innovation available to Google Cloud customers."
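The two-tier idea in those passages (fast ICI inside a slice, slower DCN between slices) can be sketched in plain Python as a two-stage average of per-chip values. This is a toy illustration and not Google's Multislice implementation: the function names and sample values are hypothetical, and only the 3,072-chip slice limit comes from the quoted text.

```python
MAX_V4_SLICE = 3072  # largest TPU v4 slice, per the quoted Google text


def slices_needed(chips: int, max_slice: int = MAX_V4_SLICE) -> int:
    """Minimum number of slices required to provide the requested chip count."""
    if chips < 1:
        raise ValueError("chips must be at least 1")
    return -(-chips // max_slice)  # ceiling division


def two_level_mean(slices):
    """Average per-chip values in two stages.

    Stage 1 stands in for ICI: each slice reduces its own chips locally.
    Stage 2 stands in for DCN: only one (sum, count) pair per slice is combined.
    """
    partials = [(sum(chip_values), len(chip_values)) for chip_values in slices]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(chip_count for _, chip_count in partials)
    return total / count


# Hypothetical per-chip gradient values for three slices of unequal size.
gradients = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

print(slices_needed(10_000))      # 4
print(two_level_mean(gradients))  # 3.5
```

The point of the structure is that only one partial result per slice has to cross the slower network, while the result matches a flat average over all chips.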

"Easily one of the most exciting features for our development team is the unified toolset. The amount of wasted time and hassle they can avoid by not having to make different tools fit together will streamline the process of taking AI from idea to training to deployment," observed Yoav HaCohen, Core Generative AI Team Lead, Lightricks, who added, "Configuring and deploying your AI models on Google Kubernetes Engine and Google Compute Engine along with Google Cloud TPU infrastructure enables our team to speed up the training and inference of the latest foundation models at scale, while enjoying support for autoscaling, workload orchestration, and automatic upgrades."

A3 VMs power GPU supercomputers to drive gen AI workloads in partnership with Nvidia

To empower customers to take advantage of the rapid advancements in AI, Google Cloud is partnering closely with Nvidia on new AI cloud infrastructure and open-source tools for Nvidia GPUs, as well as building workload-optimized end-to-end solutions specifically for generative AI.

Google Cloud and Nvidia aim to make AI more accessible for a broad set of workloads. The blog notes that this vision is now being realized with the concurrent announcement that A3 VMs, which power GPU supercomputers for generative AI workloads, will be generally available this month.

Powered by Nvidia’s H100 Tensor Core GPUs, which address trillion-parameter models, the Google Cloud A3 VMs are purpose-built to train and serve especially demanding generative AI workloads and LLMs.

The companies said the combination of Nvidia GPUs with Google Cloud’s infrastructure technologies stands to improve scale and performance and represents a significant extension in supercomputing capabilities, with 3x faster training and 10x greater networking bandwidth compared to the prior-generation platform.

The A3 enhancement enables users to scale models to tens of thousands of Nvidia H100 GPUs.

As further specified by Google:

"The A3 VM features dual next-generation 4th Gen Intel Xeon scalable processors, eight Nvidia H100 GPUs per VM, and 2TB of host memory. Built on the latest Nvidia HGX H100 platform, the A3 VM delivers 3.6 TB/s bisectional bandwidth between the eight GPUs via fourth generation Nvidia NVLink technology. Improvements in the A3 networking bandwidth are powered by our Titanium network adapter and Nvidia Collective Communications Library (NCCL) optimizations. In all, A3 represents a huge boost for AI innovators and enterprises looking to build the most advanced AI models."

The following video provides an inside view of the data centers where the Google Cloud TPUs operate. As noted by the company, "Customers use Cloud TPUs to run some of the world's largest AI workloads and that power comes from much more than just a chip." The video provides a rare peek at the data center physical infrastructure of the TPU system, including core components for data center networking, optical circuit switches, advanced biometric security verification capabilities, and new cooling systems technology.

About the Author

Matt Vincent

Matt Vincent is Editor in Chief of Data Center Frontier, where he leads editorial strategy and coverage focused on the infrastructure powering cloud computing, artificial intelligence, and the digital economy. A veteran B2B technology journalist with more than two decades of experience, Vincent specializes in the intersection of data centers, power, cooling, and emerging AI-era infrastructure. Since assuming the EIC role in 2023, he has helped guide Data Center Frontier’s coverage of the industry’s transition into the gigawatt-scale AI era, with a focus on hyperscale development, behind-the-meter power strategies, liquid cooling architectures, and the evolving energy demands of high-density compute, while working closely with the Digital Infrastructure Group at Endeavor Business Media to expand the brand’s analytical and multimedia footprint. Vincent also hosts The Data Center Frontier Show podcast, where he interviews industry leaders across hyperscale, colocation, utilities, and the data center supply chain to examine the technologies and business models reshaping digital infrastructure. He has also served as Head of Content for the Data Center Frontier Trends Summit since its inception. Before becoming Editor in Chief, he served in multiple senior editorial roles across Endeavor Business Media’s digital infrastructure portfolio, with coverage spanning data centers and hyperscale infrastructure, structured cabling and networking, telecom and datacom, IP physical security, and wireless and Pro AV markets. He began his career in 2005 within PennWell’s Advanced Technology Division and later held senior editorial positions supporting brands such as Cabling Installation & Maintenance, Lightwave Online, Broadband Technology Report, and Smart Buildings Technology.
Vincent is a frequent moderator, interviewer, and keynote speaker at industry events including the HPC Forum, where he delivers forward-looking analysis on how AI and high-performance computing are reshaping digital infrastructure. He graduated with honors from Indiana University Bloomington with a B.A. in English Literature and Creative Writing and lives in southern New Hampshire with his family, remaining an active musician in his spare time.
