Equipment Longevity & Performance: Why the Bathtub Curve Is Inaccurate

Aug. 12, 2022
Chad Peters, Director of Infrastructure Solutions for Service Express, revisits the Bathtub Curve theory and explains how to track equipment reliability and performance for data-driven buying decisions.
The power of computing and data storage is only as good as product reliability.
With billions of individuals accessing data daily for personal and professional use, confidence in its accessibility and availability is paramount to everyday life. But how can OEMs and suppliers ensure their products are reliable — not just at the point of sale — but for years down the line?

We often hear that the Bathtub Curve is the tried-and-true method for determining product reliability, but this common approach overlooks the complexity of longevity and reliability. Read on as we dispel the myth that this method is ironclad and break down the importance of continued data collection throughout the product lifecycle.

Revisiting the Bathtub Curve

In our popular 2020 article The Bathtub Curve and Data Center Equipment Reliability, Jake Blough, Chief Technology Officer for Service Express, dispels several myths about new and aging hardware. The Bathtub Curve theory suggests that equipment failure rates are high when a product is released, then decrease over the next two to three years. According to the theory, the failures increase again toward the product’s End of Life (EOL) date.
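For readers who want to see the shape the theory describes, here is a minimal sketch that models a bathtub-style failure rate as the sum of three terms: a decreasing early-life term, a constant background term and an increasing wear-out term. The Weibull form, the function names and every parameter value are illustrative assumptions, not figures from Service Express data.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Hypothetical bathtub curve: early-life + steady-state + wear-out terms."""
    early = weibull_hazard(t, shape=0.5, scale=10.0)     # shape < 1: falls with age
    steady = 0.02                                        # assumed constant background rate
    wear_out = weibull_hazard(t, shape=5.0, scale=10.0)  # shape > 1: rises with age
    return early + steady + wear_out


# Print the modeled failure rate at a few equipment ages (in years).
for age in (0.5, 1, 4, 8, 10):
    print(f"age {age:>4} yr: hazard {bathtub_hazard(age):.3f}")
```

With these assumed parameters the rate starts high, bottoms out around year four and climbs again toward year ten, which is the pattern the theory predicts.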

Although the Bathtub Curve (pictured below) accurately reflects the failure behavior of many products, we’ve found it does not apply equally to all equipment.

Defining critical and non-critical failures

To better understand the context of the Bathtub Curve, it’s essential to define the common types of failures seen in data center hardware: critical and non-critical.

A critical failure occurs when a component such as a CPU or system board fails. Critical server failures result in the loss of access to applications or data, a significant issue that impacts overall business productivity.

A non-critical failure occurs when a component like a disk drive or power supply fails. Modern data center equipment has built-in redundancies for these components, so there is no data loss.

Contrary to what the Bathtub Curve portrays, the sample data above shows failure rates don’t drastically increase. By combining reliability and longevity data, we can better understand how equipment performs over time.

Testing the myth

At Service Express, we’ve collected over 15 years of equipment data from over half a million devices to understand equipment longevity and reliability better.

Our previous article studied only equipment longevity and how it stacks up against the Bathtub Curve. In the past two years, we’ve implemented real-time reliability studies that allow for previously unseen granularity. But what’s the difference between longevity and reliability?

Longevity reporting details a product’s expected failure rate over time. This can be useful for CapEx budget planning purposes. However, beyond equipment longevity, reliability plays a critical role. Reliability studies examine how equipment has performed over time. This practice is useful in identifying outliers within a product model number or family.
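As a rough illustration of longevity reporting, the sketch below tallies failures per unit in service by equipment age, split into the critical and non-critical categories defined earlier. The record layout, the component list, the function names and the sample numbers are all assumptions made for illustration, not Service Express data or tooling.

```python
from collections import Counter

# Assumed mapping based on the article's examples: CPU and system board
# failures are critical; everything else is treated as non-critical.
CRITICAL_COMPONENTS = {"cpu", "system board"}


def severity(component):
    """Classify a failed component as 'critical' or 'non-critical'."""
    return "critical" if component.lower() in CRITICAL_COMPONENTS else "non-critical"


def failure_rate_by_age(calls, units_in_service):
    """Return {(age, severity): failures per unit in service at that age}.

    calls: iterable of (age_in_years, component) service-call records.
    units_in_service: dict mapping age_in_years to the number of units observed.
    """
    counts = Counter((age, severity(component)) for age, component in calls)
    return {key: n / units_in_service[key[0]] for key, n in counts.items()}


# Made-up sample records, for illustration only.
sample_calls = [(1, "CPU"), (1, "disk drive"), (1, "disk drive"), (2, "power supply")]
sample_units = {1: 100, 2: 50}
for key, rate in sorted(failure_rate_by_age(sample_calls, sample_units).items()):
    print(key, f"{rate:.2%}")
```

Keeping the two severities separate matters because a rising count of redundant-component failures does not by itself mean more downtime.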

Below is an example of equipment and its reliability. This graph draws on data sets from over 500,000 pieces of equipment under agreement for over 6,000 customers and compares performance using an average of 55,000 service calls per year.

In this example (pictured above), we use the following definitions when describing the quadrants.

  • Upper right – Customer model is performing better than the product family and perhaps better than similar models
  • Upper left – Customer model is worse than the overall product family but is better than similar models for other customers
  • Lower left – Model is underperforming against the product family, and the customer’s experience is worse than others with similar models
  • Lower right – Customer experience with the model is worse than others with similar models but better than the product family

The data shows that Product A is in the lower right quadrant. This position signifies that, based on equipment model failures, Product A is typically a well-performing product compared to other products within its product family. Still, the customer is having a worse experience than most.
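The quadrant definitions can be sketched as a small classifier. This assumes each axis is scored as a baseline failure rate minus the observed failure rate, so a positive score means fewer failures than the baseline. The scoring method, function names and rates below are hypothetical; the article does not say how the graph's axes are actually computed.

```python
def score(baseline_rate, observed_rate):
    """Positive when the observed failure rate is lower (better) than the baseline."""
    return baseline_rate - observed_rate


def quadrant(model_vs_family, customer_vs_peers):
    """Map two scores to a quadrant name.

    model_vs_family: how the model compares with its product family (x-axis).
    customer_vs_peers: how this customer's units compare with the same
        model at other customers (y-axis).
    """
    vertical = "upper" if customer_vs_peers >= 0 else "lower"
    horizontal = "right" if model_vs_family >= 0 else "left"
    return f"{vertical} {horizontal}"


# Made-up rates for a product like Product A: the model beats its family,
# but this customer sees more failures than peers running the same model.
x = score(baseline_rate=0.05, observed_rate=0.03)
y = score(baseline_rate=0.03, observed_rate=0.06)
print(quadrant(x, y))
```

Treating a score of exactly zero as "better" is an arbitrary tie-breaking choice in this sketch.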

By using historical equipment reliability and performance data, customers can better understand which equipment meets their performance needs and which does not. This practice has helped customers make data-driven decisions in the data center to optimize current environments or make necessary upgrades to maintain productivity.

The impact of leveraging data-driven equipment insights

The Bathtub Curve is a useful tool when speaking about equipment performance in general. However, this method comes with many misconceptions that can cause suppliers to repurchase parts or switch models before it is necessary.

By combining longevity studies with real-time reliability data, we can better understand which equipment will continue to perform over time and which hardware requires our attention. Once those high performers (or outliers) are identified, customers can better plan or extend hardware refreshes and develop budget and resource plans more efficiently.

Chad Peters is Director of Infrastructure Solutions for Service Express, a global data center solutions provider that helps IT teams control costs, optimize infrastructure strategies and automate support.

About the Author

Voices of the Industry

Our Voice of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.
