Uptime: Longer Data Center Outages Are Becoming More Common

The frequency of data center downtime hasn’t changed significantly, but outages are becoming longer and more expensive, according to new research from The Uptime Institute, which reported a “disturbing increase” in outages longer than 24 hours.

Rich Miller

June 8, 2022

The Uptime Institute summary of downtime in 2022. (Image: The Uptime Institute)

Uptime is always the prime directive for data centers. As the world recovers from the COVID-19 pandemic, reliable digital infrastructure is more important than ever in keeping the economy connected.

So how are things going? The frequency of data center downtime hasn’t changed significantly, but outages are becoming longer and more expensive, according to new research from The Uptime Institute. The key findings:

Prolonged downtime is becoming more common in publicly reported outages. The gap between the beginning of a major public outage and full recovery has stretched significantly over the last five years, with nearly 30% of these outages in 2021 lasting more than 24 hours, which Uptime characterized as “a disturbing increase” from just 8% in 2017.
Downtime is also becoming more expensive, with more than 60% of failures resulting in at least $100,000 in total losses, up substantially from 39% in 2019. The share of outages that cost upwards of $1 million increased from 11% to 15% over that same period.
In a trend we first highlighted last year, networking issues have become the single biggest cause of all IT service downtime incidents – regardless of severity – over the past three years. Uptime attributes this to “complexities from the increasing use of cloud technologies, software-defined architectures and hybrid, distributed architectures.”
The most significant outages are usually tied to electrical equipment, especially uninterruptible power supply (UPS) failures. “Power-related outages account for 43% of outages that are classified as significant (causing downtime and financial loss),” said Uptime.

The report is the latest annual survey from The Uptime Institute, whose data is notable because it highlights trends in data center outages that may not be publicly reported.

Online services are more important than ever in the wake of the COVID-19 pandemic, which has boosted reliance on remote work and learning – meaning that service outages are more broadly felt, and generate wider notice.

Lengthy Downtime Incidents Make Headlines

A new wrinkle is the growth of lengthier outages over the past two years. Some of these have been very public, such as a massive global outage at Meta last October that left Facebook, Instagram and WhatsApp offline for at least five hours. Facebook later said that a configuration error broke its connection to a key network backbone, disconnecting all of its data centers from the Internet and leaving its DNS servers unreachable.

Another example is the 73-hour outage last year at Roblox, which cost the metaverse company an estimated $25 million in lost bookings. In an incident report, Roblox said several software services contended for resources, making it harder to diagnose a bug in a database.

The Facebook and Roblox incidents illustrate how the growing complexity of online applications can sometimes make it harder to troubleshoot automated infrastructure, leading to lengthier outages.

The growing role of network issues was evident in a major outage at Amazon Web Services in December, which rippled across the Internet and interrupted service for many popular web services that run their infrastructure on the AWS cloud. The issue was traced to problems with several network devices in the AWS data center cluster in Northern Virginia.

Power and equipment issues were prominent in another lengthy outage in 2021, when a data center fire at OVH in Strasbourg, France left many customers offline for days. The SBG2 data center was destroyed by fire on March 9, which required the power to be turned off for the entire four-building campus. A second data center building, SBG1, was eventually shuttered after a smoke incident in a UPS room.

The growing financial impact of outages is not a surprise, given how digital services have become central to nearly every business. The “cost of downtime” has long been used to underscore the value of data center services and maintenance, but now serves as a reflection of increased reliance on data infrastructure.

More Investment, Yet More Complexity

All of this is happening in a period of enormous investment in digital infrastructure, including huge growth for cloud platforms, record-setting M&A action and the creation of new operating platforms for data centers.

That investment doesn’t neatly translate into improved reliability, especially in a complex environment in which new architectures are spreading IT workloads across cloud, colocation, edge and on-premises facilities.

“Digital infrastructure operators are still struggling to meet the high standards that customers expect and service level agreements demand – despite improving technologies and the industry’s strong investment in resiliency and downtime prevention,” said Andy Lawrence, founding member and executive director, Uptime Institute Intelligence. The survey results will be summarized in a presentation next week.

“The lack of improvement in overall outage rates is partly the result of the immensity of recent investment in digital infrastructure, and all the associated complexity that operators face as they transition to hybrid, distributed architectures,” said Lawrence. “In time, both the technology and operational practices will improve, but at present, outages remain a top concern for customers, investors, and regulators. Operators will be best able to meet the challenge with rigorous staff training and operational procedures to mitigate the human error behind many of these failures.”

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
