How AI is driving changes in network infrastructure

In the rapidly evolving realm of AI technology, new developments surface nearly every day.

Gary Bernstein, global data center solutions specialist at Siemon

Recent milestones include the debut of OpenAI's ChatGPT in November 2022 and GPT-4 in March 2023, followed by the wider rollout of Google's Bard AI chatbot in May 2023. Clearly, AI has significant capacity to transform our lives, through applications ranging from chatbots and facial recognition to self-driving vehicles and early disease detection.

According to Statista, the global AI market is valued at 142.3 billion U.S. dollars in 2023, with finance, healthcare and the high-tech/telco markets taking the lead in AI adoption.

Within data centers, AI is already being used to monitor assets, proactively detect faults and improve energy efficiency through better management of Power Usage Effectiveness (PUE).
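
For context, PUE is the ratio of the total energy a facility consumes to the energy delivered to its IT equipment, so a value closer to 1.0 means less overhead. The short Python sketch below shows the calculation; the function name and the sample readings are illustrative assumptions, not measurements from any real facility.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Illustrative readings only: 1,500 kWh drawn by the whole facility,
# of which 1,000 kWh reached the IT equipment.
pue = power_usage_effectiveness(1500.0, 1000.0)
print(f"PUE = {pue:.2f}")  # prints "PUE = 1.50"
```

In this made-up case, every kWh of IT load carries another 0.5 kWh of cooling, power distribution and other overhead.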

AI is now being used not only by hyperscalers but also by many large enterprises.

InfiniBand vs. Ethernet

Many of today's AI networks utilize InfiniBand technology. This is noteworthy as Ethernet remains the prevailing global standard in most data centers, with InfiniBand holding a mere fraction of the market share, primarily for HPC networks. A competition is emerging between InfiniBand market leader Nvidia and prominent Ethernet switch and chip manufacturers such as Cisco, Arista, and Broadcom. Recently, Broadcom unveiled the "Jericho3-AI" StrataDNX chip, designed to construct AI clusters using Ethernet instead of InfiniBand. Regardless of the protocol chosen, both InfiniBand and Ethernet share requirements for high bandwidth and low latency, necessitating top-tier optical cabling solutions for optimal performance.

Skyrocketing demands for power and bandwidth

Two key challenges facing data centers are the extreme power and associated cooling requirements of the equipment, and the very high bandwidth needs of the GPUs.

Supercomputers with GPUs running AI applications, such as Nvidia's DGX platform, demand vast power and multiple high-bandwidth connections. Nvidia currently offers DGX models based on the A100 and H100, and announced the latest, the GH200, at the Computex conference in May 2023. These systems draw 6.5kW to over 11kW per 6U unit. Compared with fully loaded data center cabinets, which average 7-8kW and max out at 15-20kW per cabinet, the extent of AI's power appetite becomes clear.
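
To put those figures side by side, the Python sketch below works out how many units fit within a cabinet's power budget. It uses only the numbers quoted above and ignores cooling, rack space and redundancy, so treat it as rough arithmetic rather than a sizing tool. The function name is hypothetical.

```python
import math


def units_per_cabinet(cabinet_budget_kw: float, unit_draw_kw: float) -> int:
    """Number of whole units whose combined draw fits within the cabinet budget."""
    return math.floor(cabinet_budget_kw / unit_draw_kw)


# Figures quoted in the text: unit draw of 6.5kW to 11kW,
# cabinets averaging up to 8kW and peaking at up to 20kW.
unit_draws_kw = {"6.5kW unit": 6.5, "11kW unit": 11.0}
cabinet_budgets_kw = {"8kW cabinet": 8.0, "20kW cabinet": 20.0}

for cabinet_label, budget in cabinet_budgets_kw.items():
    for unit_label, draw in unit_draws_kw.items():
        count = units_per_cabinet(budget, draw)
        print(f"{cabinet_label}, {unit_label}: {count} unit(s)")
```

On these numbers an average 8kW cabinet can power at most one of the lower-draw units and none of the higher-draw ones, while even a 20kW cabinet holds only one to three.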

Regarding bandwidth, these GPUs typically need up to 8x100Gb/s (EDR) or 8x200Gb/s (HDR) connections. Each GPU offers 8 connections, for a total of up to 8x200G (1.6Tb/s) per GPU.

Jensen Huang, CEO of Nvidia, recently stated in Data Center Frontier that "Generative AI is driving exponential growth in compute requirements," and that "You're seeing the beginning of a 10-year transition to basically recycle or reclaim the world's data centers and build it out as accelerated computing."

How is IT infrastructure going to cope?

The power and cooling demands are pushing network managers to reconsider infrastructure design. This often involves changing network layouts and spacing GPU cabinets further apart, potentially adopting End of Row (EoR) configurations to better handle rising temperatures. The result is a greater physical distance between switches and GPUs. To accommodate these longer switch-to-GPU links, data center operators may need to install more fiber cabling, supplementing the structured fiber cabling customarily used for switch-to-switch connections. Over these longer spans, direct attach cables (DACs) are unlikely to be suitable, as they are limited to 3 to 5 meters at most at such speeds. Alongside fiber cabling, active optical cables (AOCs) are also a feasible choice because they can cover greater distances than DACs. AOCs offer the added advantages of significantly lower power consumption than transceivers, as well as lower latency. Siemon provides active optical cables in increments of 0.5m, which helps with cable management.
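
The distance reasoning above can be sketched as a simple selection rule in Python. The 5 m DAC limit is the upper bound quoted in the text. The 30 m AOC limit is a placeholder assumption, not a figure from this article or from any product datasheet, and should be replaced with the real limit for the cable and speed in question. The function names are hypothetical.

```python
import math


def choose_link_media(length_m: float, dac_max_m: float = 5.0, aoc_max_m: float = 30.0) -> str:
    """Pick a cable type for a switch-to-GPU link based on its length.

    dac_max_m: upper DAC reach quoted in the text (3 to 5 m at these speeds).
    aoc_max_m: ASSUMED placeholder; check the vendor's specification.
    """
    if length_m <= 0:
        raise ValueError("link length must be positive")
    if length_m <= dac_max_m:
        return "DAC"
    if length_m <= aoc_max_m:
        return "AOC"
    return "structured fiber with transceivers"


def round_up_to_increment(length_m: float, increment_m: float = 0.5) -> float:
    """Round a measured route length up to the next available cable length."""
    return math.ceil(length_m / increment_m) * increment_m


for measured_m in (2.0, 7.2, 45.0):
    media = choose_link_media(measured_m)
    if media == "AOC":
        # AOCs are ordered in fixed lengths, so round up to the next 0.5 m step.
        print(f"{measured_m} m -> {media}, order {round_up_to_increment(measured_m)} m")
    else:
        print(f"{measured_m} m -> {media}")
```

Ordering in 0.5 m steps keeps slack to a minimum: a 7.2 m route rounds up to a 7.5 m cable rather than to a much longer stock length.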

Data center backbone interconnections between switches will need parallel optic technology to keep pace with increasing bandwidth demands. Several existing parallel fiber optic options use 8 fibers together with multi-fiber push-on connectivity (MPO/MTP fiber connectors). These MPO Base-8 solutions support either multimode or singlemode fiber and ease migration to higher speeds. Enterprise data centers should consider a Base-8 MPO OM4 cabling solution when upgrading to 100Gb/s and 400Gb/s. Cloud data centers, by contrast, should select a Base-8 MPO singlemode cabling solution when transitioning to 400Gb/s and 800Gb/s.
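
As a rough planning aid, the guidance above can be written as a small lookup plus a fiber count in Python. The table only restates the recommendations in this article, and the function names and the 32-link example are hypothetical.

```python
from typing import Optional

FIBERS_PER_BASE8_LINK = 8  # each Base-8 MPO link uses 8 fibers

# Recommendations as stated in the text: data center type -> (fiber type, target speeds in Gb/s).
BASE8_RECOMMENDATIONS = {
    "enterprise": ("OM4 multimode", (100, 400)),
    "cloud": ("singlemode", (400, 800)),
}


def fibers_required(parallel_links: int) -> int:
    """Total fiber count for a number of Base-8 parallel optic links."""
    if parallel_links < 0:
        raise ValueError("link count cannot be negative")
    return parallel_links * FIBERS_PER_BASE8_LINK


def recommended_fiber(data_center_type: str, target_gbps: int) -> Optional[str]:
    """Return the fiber type the text recommends, or None if it gives no guidance."""
    entry = BASE8_RECOMMENDATIONS.get(data_center_type)
    if entry is None:
        return None
    fiber_type, speeds = entry
    return fiber_type if target_gbps in speeds else None


print(fibers_required(32))                  # prints 256
print(recommended_fiber("enterprise", 400))  # prints "OM4 multimode"
print(recommended_fiber("cloud", 800))       # prints "singlemode"
```

Returning None for combinations the article does not cover keeps the sketch from implying guidance the text never gives.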

Innovative new fiber enclosure systems on the market can flexibly support different fiber modules, including Base-8 and Base-12 with shuttered LC, MTP pass-thru modules, and splicing modules. They allow for easy access and improved cable management.

In AI applications, where latency is critical, Siemon recommends 'AI-Ready' solutions that combine ultra-low loss (ULL) performance with MTP/APC connectors. Ultra-low loss fiber connectivity is pivotal for emerging short-reach singlemode applications (supporting 100, 200, and 400 Gb/s speeds over distances exceeding 100 meters). ULL connectivity meets the more stringent insertion loss requirements of AI applications and so improves overall network performance. Siemon also advises using APC (angled physical contact) fiber connectors, including the MTP/APC variant, for certain multimode cabling applications in addition to their traditional use on singlemode. The angle-polished end face of APC connectors (in contrast to UPC connectors) reduces reflectance, improving fiber performance.
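
To illustrate why connector loss matters, the Python sketch below adds up the insertion loss of a channel and compares it with a loss budget. Every number in it (the 3.0 dB budget, the per-connection losses, the fiber attenuation, the channel length and the connection count) is an illustrative placeholder, not a Siemon specification or a figure from any standard.

```python
def channel_insertion_loss_db(connector_losses_db, fiber_length_m, fiber_db_per_km):
    """Total channel loss: sum of connection losses plus fiber attenuation."""
    fiber_loss_db = (fiber_length_m / 1000.0) * fiber_db_per_km
    return sum(connector_losses_db) + fiber_loss_db


# All values are ASSUMED for illustration only.
BUDGET_DB = 3.0          # maximum channel insertion loss allowed by the application
FIBER_DB_PER_KM = 0.4    # fiber attenuation
LENGTH_M = 150.0         # channel length
CONNECTIONS = 4          # MPO connections in the channel
STANDARD_MPO_DB = 0.75   # loss per standard-loss connection
ULL_MPO_DB = 0.35        # loss per ultra-low loss connection

for label, per_connection_db in (("standard", STANDARD_MPO_DB), ("ultra-low loss", ULL_MPO_DB)):
    loss = channel_insertion_loss_db([per_connection_db] * CONNECTIONS, LENGTH_M, FIBER_DB_PER_KM)
    verdict = "within" if loss <= BUDGET_DB else "exceeds"
    print(f"{label}: {loss:.2f} dB ({verdict} the {BUDGET_DB:.1f} dB budget)")
```

With these placeholder values, four standard-loss connections alone use up the whole budget, while the ultra-low loss channel leaves headroom for extra connections or longer runs.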

AI is a disruptive technology, but one with the capacity to transform how we live and work. Data center operators need to start preparing now for AI's requirements. They should consider measures that ease the shift to higher data speeds while also improving the energy efficiency of their data centers. Operators who prepare well for AI's demands will be well placed to take advantage of the opportunities that come with AI's evolution and widespread adoption.

Gary Bernstein is global data center solutions specialist at Siemon. Established in 1903, Siemon is an industry leader specializing in the design and manufacture of high quality, high performance IT infrastructure solutions and services for Data Centers, LANs, and Intelligent Buildings. Contact Siemon to learn more.