Uptime: Networks, Software Play Growing Role in Data Center Outages

April 19, 2021

Networking and software issues are emerging as two of the more common causes of data center outages, while power problems are becoming somewhat less of an issue, according to new data from Uptime Institute’s Annual Outage Analysis.

This trend is not surprising, given the growing role of cloud computing and SaaS (software as a service) applications, which increasingly use architectures that can route around physical failures of electrical components like UPS systems, transfer switches and generators. To be sure, power chain issues still have a major impact on downtime, as seen in several headline-grabbing incidents in early 2021.

“Overall, the causes of outages are changing,” said Andy Lawrence, executive director of research, Uptime Institute. “Software and IT configuration issues are becoming more common, while power issues are now less likely to cause a major IT service outage.”

Online services were more important than ever in 2020, as the COVID-19 pandemic and social distancing practices boosted remote work and learning. As a result, service outages were felt more broadly and drew wider notice.

“Although there were significant disruptions affecting financial trading, government services, internet and telecom, the outages that made headlines in 2020 were often about the impact to consumers and workers at home, with interruptions to applications such as Microsoft Exchange and Teams, Zoom, fitness trackers and the like,” Uptime noted.

Outages Create More Concern, More Cost

The cost of data center outages goes beyond headlines and user complaints on social media. More than half of the respondents who reported an outage to Uptime in the past three years estimated its cost at more than $100,000, and almost a third reported costs of $1 million or above.

“Resiliency remains near the top of management priorities when delivering business services,” said Lawrence. “The fact is outages remain common and justify the increased concern and investment in preventing them. Because of the disruption and high costs that result from disrupted IT services, identifying and analyzing the root causes of failures is a critical step in avoiding more expensive problems.”

Some of the findings from Uptime’s 2020 survey include:

  • Almost half (44%) of data center operators surveyed think that concern about resiliency of data center/mission-critical IT has increased in the past twelve months.
  • Serious and severe outages are less common (one in six reported having one in the past three years) but can have catastrophic results for stakeholders. Vigilance and investment are necessary.
  • More than half (56%) of all organizations using a third-party data service have experienced a moderate or serious IT service outage in the last three years that was itself caused by the provider.

As Architecture Shifts, So Do Outage Culprits

The focus on third-party service performance accompanies the ongoing shift from on-premises data centers to colocation facilities and cloud platforms. That shift has a positive impact on uptime, but it also amplifies any failures in the networks and software automation that drive the cloud delivery model. (For more on this shift, see our DCF article “Rethinking Redundancy: Is Culture Part of the Problem.”)

“This rise in outages caused by IT systems and network issues is due to the broad shift in recent years from siloed IT services running on dedicated, specialized equipment to an architecture in which more IT functions run on standard IT systems, often distributed or replicated across many sites,” Uptime says in its outage report. “As more organizations move to cloud-based, distributed IT (driven by a desire for greater agility and automation), the underlying data center infrastructure is becoming less of a focus or a single point of failure.

“This does not mean, however, that there is any case, at least at present, for de-emphasizing site-level resiliency or investing less,” Uptime added. “Site-level failures invariably cause major problems, regardless of whether distributed resiliency architectures are deployed.”

As the originator of the Tier System, which has long been used as a benchmark for reliability design, Uptime has an ongoing interest in equipment redundancy, a key focus for the tier ratings. As recent events have shown, power equipment continues to be central to uptime.

  • In March, an OVH data center in Strasbourg, France was destroyed by a fire. While no final analysis has been provided, early indications point to UPS units as the likely origin of the incident.
  • This month, an emergency generator caught fire at a WebNX data center in Ogden, Utah, causing the full shutdown of the data center and lengthy outages for customers.

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
