The rapid rise of AI workloads is generating a great deal of excitement, activity, predictions, and investment in the industry. For some, it’s a bigger topic right now than the cloud. Indeed, the rise of AI is a major driver behind the shortage of data center space, and existing data centers that are already running at or near full load will struggle to cope with the demands of the upcoming AI deployment boom.
To address the mammoth demands of AI, developers are constructing new data centers that are built with AI in mind. Although this is a major industry trend, such deployments remain relatively uncommon and represent a new way of doing things. As developers rush to erect facilities for AI deployments, data center end users need to be confident that their developers are knowledgeable and prepared to manage their exponential AI-based growth. To that end, let’s peel back the first layer of AI-related construction issues to understand more about the specific requirements of AI applications, what data center end users need to look for in their developer, and why it’s so important to get it all right.
The Problems With Power
Let’s start with the most obvious area: power. The surge in AI adoption comes with a unique set of challenges, particularly concerning utility power constraints for data centers. AI applications, fueled by machine learning and deep learning algorithms, require substantial computational power. Data centers find themselves grappling with intensifying demand for computing resources, which in turn drives up their power consumption. Here at PowerHouse Data Centers, we are seeing power requirements at least two to three times greater with AI than in more traditional use cases. This uptick in demand places additional stress on local utility power infrastructure, leading to constraints and necessitating upgrades to meet evolving needs. For these reasons, it is imperative that end users partner with a data center developer that has trusted relationships with each utility, as well as a firm grasp of power timetables, forecasting, and availability.
Simultaneously, the semiconductor industry faces shortages in the production of the high-performance chipsets crucial for AI computations. The demand for specialized chips designed for AI workloads has outpaced supply, leading to scarcity in the market. This shortage amplifies the challenges faced by data centers and hyperscalers alike, as they strive to keep pace with AI demands while contending with limited access to the cutting-edge hardware required for optimal performance.
The power requirements of AI applications differ significantly from those of traditional commercial cloud computing. In contrast to commercial cloud computing, which often handles conventional business workloads, AI workloads are more computationally intensive, leading to higher power demands. AI applications deployed in scenarios requiring real-time decision-making demand rapid and continuous processing of data. Instantaneous responses in applications such as autonomous vehicles or robotics place additional stress on the computational infrastructure, once again leading to higher power needs compared with the intermittent processing requirements of commercial cloud computing. Additionally, AI workloads often require considerable scalability to handle vast datasets and complex models efficiently. While commercial cloud computing is also scalable, the scale and complexity of distributed AI computing are often far greater.
Layouts And All The Rest
The same pattern holds for other important facets of a new build project, including cooling. It’s generally accepted that the heat generated by AI hardware can be substantial. The specialty cooling solutions used to address that heat, though, are not yet widely practiced. Simply put, a 50-kilowatt rack can’t be cooled like a 10-kilowatt rack, and a developer with know-how in the areas of chipset cooling, rack-level cooling, and liquid cooling should be a must.
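To put rough numbers on the 10-kilowatt versus 50-kilowatt comparison, the sketch below applies the standard sensible-heat relation (heat removed = air density × specific heat × airflow × temperature rise) to estimate how much air would have to move through a rack. The air properties, the 12-kelvin temperature rise, and the function name are illustrative assumptions, not figures from this article or from any particular facility.

```python
# Rough airflow estimate for air-cooled racks (illustrative sketch only).
# Assumed values, not from the article: air density, specific heat,
# and the 12 K inlet-to-outlet temperature rise used below.
AIR_DENSITY_KG_M3 = 1.2            # approximate density of air near room temperature
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate specific heat of air
M3_S_TO_CFM = 2118.88              # cubic meters per second to cubic feet per minute


def required_airflow_m3_s(rack_power_w: float, delta_t_k: float) -> float:
    """Airflow needed to carry away rack_power_w with an air temperature rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return rack_power_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


for rack_kw in (10, 50):
    flow = required_airflow_m3_s(rack_kw * 1000, delta_t_k=12.0)
    print(f"{rack_kw} kW rack: {flow:.2f} m^3/s (about {flow * M3_S_TO_CFM:.0f} CFM)")
```

Under these assumptions the required airflow scales linearly with rack power, so the 50-kilowatt rack needs five times the air of the 10-kilowatt rack through a cabinet of similar size. That is one way to see why rack-level and liquid cooling enter the conversation at these densities.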
Here’s another one: hardware optimization. It’s commonly known that AI-devoted data centers need to support various types of specialized hardware, like graphics processing units (GPUs), tensor processing units (TPUs), and field-programmable gate arrays (FPGAs). It’s much less commonly understood how those units need to be clustered and connected. As with power, AI shifts the priorities: the latency into the data center is much less important than the latency within the data center. So, the cabling between each rack and cluster requires far more attention in order for the groups of servers to operate as quickly and cohesively as possible, especially with AI-specific communication patterns like model parallelism or distributed training. Many variables are different from the average data center. All links must be shorter, active optical cables may be in use instead of transceivers, and each server must be connected to the switch fabric, storage, and out-of-band management. For AI workloads like model training and inference, high-throughput internal networking is all the more critical. For large model training, one estimate holds that a full 30 percent of the application’s processing time is spent on network latency instead of compute. In sum, a good data center developer will know about the hardware requirements, but a great one will know about the design functionality necessary to optimize their networking capabilities.
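As a back-of-the-envelope illustration of why that 30 percent estimate matters, the sketch below uses an Amdahl's-law-style calculation: if a fixed fraction of training time is network wait, reducing that wait by some factor speeds up the whole job by a bounded amount. It assumes network and compute time do not overlap, which real training frameworks only partly satisfy, and the function name and improvement factors are hypothetical.

```python
# Amdahl's-law-style estimate of training speedup from faster internal networking.
# Simplifying assumption (not from the article): network time and compute time
# do not overlap, so total time = compute time + network time.


def training_speedup(network_fraction: float, network_improvement: float) -> float:
    """Overall speedup when the network share of run time shrinks by network_improvement."""
    if not 0.0 <= network_fraction <= 1.0:
        raise ValueError("network_fraction must be between 0 and 1")
    if network_improvement <= 0:
        raise ValueError("network_improvement must be positive")
    compute_fraction = 1.0 - network_fraction
    return 1.0 / (compute_fraction + network_fraction / network_improvement)


# 0.30 is the estimate quoted in the text; the improvement factors are illustrative.
for factor in (1.5, 2.0, 4.0):
    print(f"network {factor}x faster -> job {training_speedup(0.30, factor):.2f}x faster")
```

With a 30 percent network share, even an arbitrarily fast network caps the gain at 1 / 0.7, or roughly 1.43 times, under this simplified model. Most of that gain is available from the first few multiples of improvement, which is where cabling and topology decisions made at design time come in.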
While all of these areas (and more) are crucially important for data center operations, they converge very early in the process: at the initial design stage. A plan to accommodate 40 kilowatts per rack needs to be charted carefully, and from the very start. It’s not simply a matter of taking a traditional data center space and introducing new IT equipment. As the International Data Corporation’s Ashish Nadkarni says in regard to contemporary data center builds, “Do as much pre-planning as you possibly can. Preparation is far more important than the building.” Hot and cold aisles need to be larger, server cabinets need to be deeper, and flooring needs to be thicker. All of this, of course, has major ramifications for the size and shape of the facility, not to mention other considerations like electrical equipment, humidity control, and fire suppression. Determining how to move air and power through the building has a substantial effect on a data center layout, all of which a stalwart developer will be able to navigate.
For Now And For The Future
Building an AI data center the right way is vital to launching a successful deployment. It’s even more important for that facility years after deployment. As a novel presence in the market, AI technology will undergo significant change. As such, AI data centers should be designed with scalability and flexibility in mind, allowing for the integration of new software frameworks, libraries, tools, and regulations that may emerge.
While data center end clients have been deploying traditional IT assets for decades, AI deployments represent a new kind of build whose true demands are still being established. Prospective clients should seek out strong developers that can guide them through the uncharted waters of new AI deployment by demonstrating an understanding of the specific requirements of AI applications, designing the data center infrastructure to support diverse AI workloads, and future-proofing the facility for evolving AI technologies.