Getting the Most Reliability from Your Assets with Remote Monitoring

Feb. 7, 2024
Max Hamner and Ray Daugherty of Modius explain why hyperscalers should include environmental and power compliance as part of their colocation contracts.

As a data center operator, the success of your business model is dependent on your uptime and SLA compliance – which are highly affected by your assets’ reliability and longevity. There is a direct correlation between reliability and your ROI.

Power distribution and cooling gear for data centers (generators, PDUs, UPSs, CRACs, CRAHs) is costly to purchase and maintain, but downtime and lost contracts are far more expensive.

To maximize availability and SLA compliance, two of the top factors (above vendor quality) are the environmental conditions in which your assets operate and the quality of the power they receive.

ASHRAE, ITIC, and other organizations have long established the direct effect of power quality and operating environment on hardware uptime and reliability. They have produced many standards to define the challenges and goals of environmental and power conditions. This awareness has led all major hyperscalers to include environmental and power compliance as part of their colocation contracts.

How Power Affects Your Infrastructure Gear

Power quality directly affects the life of your hardware. It is not just significant power events that matter: ongoing variations in power quality accumulate as degraded reliability in your infrastructure and IT gear. The many small sags, surges, and spikes slowly degrade the components in an electrical device. This effect has been well researched and documented by multiple organizations, which have produced reports and created standards that define the optimum operating conditions to maximize equipment life. The most prevalent industry standard is provided by ITIC (Information Technology Industry Council), which maintains the ITIC curve (shown below). The curve allows the damage or risk to a device to be assessed for individual power events.
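As a rough illustration of how an individual power event might be scored against the ITIC curve, the sketch below classifies an event by its magnitude (percent of nominal voltage) and its duration. The breakpoints are a simplified, stepwise approximation of commonly cited values for the curve, not an official table, and the name `classify_event` is hypothetical. Check the published ITIC curve before relying on any of these numbers.

```python
# Simplified, stepwise approximation of the ITIC curve envelope.
# ASSUMPTION: the breakpoints are commonly cited values, not an official table.
# The published curve is sloped for very short swells; this sketch uses steps.

# Each entry: (maximum event duration in seconds, limit in % of nominal voltage).
# The final entry covers steady-state conditions.
LOWER_ENVELOPE = [(0.02, 0.0), (0.5, 70.0), (10.0, 80.0), (float("inf"), 90.0)]
UPPER_ENVELOPE = [(0.001, 200.0), (0.003, 140.0), (0.5, 120.0), (float("inf"), 110.0)]


def _limit(envelope, duration_s):
    """Return the voltage limit that applies to an event of this duration."""
    for max_duration, limit in envelope:
        if duration_s <= max_duration:
            return limit
    return envelope[-1][1]


def classify_event(voltage_pct, duration_s):
    """Classify an event as 'under', 'over', or 'within' the envelope."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    if voltage_pct < _limit(LOWER_ENVELOPE, duration_s):
        return "under"   # sag or dropout deeper/longer than tolerated
    if voltage_pct > _limit(UPPER_ENVELOPE, duration_s):
        return "over"    # swell or surge higher/longer than tolerated
    return "within"


if __name__ == "__main__":
    # A 100 ms sag to 65% of nominal falls below the 70% step.
    print(classify_event(65.0, 0.1))
```

Counting how many events per month land outside the envelope, per device, is one simple way to turn individual power events into a trend.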

How Environmental Conditions Affect Your Infrastructure Gear

Humidity, air pressure, air quality, and temperature combine to affect condensation and oxidation on surfaces, as well as airflow, which in turn can affect cooling efficiency.

High humidity can lead to condensation and moisture buildup inside IT equipment, causing electrical short circuits and corrosion. Low humidity can lead to static electricity buildup, causing electrostatic discharge events that can damage electronic components.

Dust and other particulate matter can accumulate inside IT equipment, blocking airflow and causing overheating. It can also interfere with moving parts, such as fans and drives, reducing their efficiency and potentially leading to equipment failure.

Excessive heat can cause IT equipment to overheat, leading to reduced performance and potentially causing hardware failures. Extremely cold conditions can also affect IT equipment, causing it to become less efficient and possibly leading to condensation and moisture-related problems. Most IT equipment has recommended operating temperature ranges, and it’s important to ensure that the operating environment stays within these ranges.

The above factors can be controlled by climate control systems (HVAC) to maintain proper temperature and humidity levels, and by dust filters and clean rooms to reduce particulate contamination.

ASHRAE (The American Society of Heating, Refrigerating and Air-Conditioning Engineers) has provided research results from studies into the impact of environmental conditions on hardware. One example is provided here:

From this research they have provided a recommended standard (Equipment Thermal Guidelines for Data Processing Environments) specifically for data center operations. This specifies optimal environmental conditions to maximize equipment reliability and life.

The ASHRAE standard for thermal guidelines in the data center can be found in the 2016 ASHRAE white paper TC9.9 Data Center Power Equipment Thermal Guidelines and Best Practices. The guidelines were last updated in 2021, and ASHRAE provides a PDF reference card for the 2021 Equipment Thermal Guidelines for Data Processing Environments.
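To make the idea of an operating envelope concrete, here is a minimal sketch that checks a temperature and relative humidity reading against a set of limits. The default limits (18 to 27 °C dry bulb, -9 to 15 °C dew point) are illustrative assumptions and should be confirmed against the ASHRAE reference card for your equipment class. The dew point uses the Magnus approximation, and the names `Envelope`, `dew_point_c`, and `violations` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    # ASSUMPTION: illustrative limits only; confirm against the ASHRAE
    # reference card for your equipment class before using them.
    min_temp_c: float = 18.0
    max_temp_c: float = 27.0
    min_dew_point_c: float = -9.0
    max_dew_point_c: float = 15.0


def dew_point_c(temp_c, rh_pct):
    """Approximate dew point (deg C) with the Magnus formula."""
    a, b = 17.62, 243.12  # Magnus coefficients for water vapor over liquid water
    gamma = math.log(rh_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def violations(temp_c, rh_pct, env=Envelope()):
    """Return a list of the limits this reading violates (empty if none)."""
    issues = []
    if temp_c < env.min_temp_c:
        issues.append("temperature_low")
    if temp_c > env.max_temp_c:
        issues.append("temperature_high")
    dp = dew_point_c(temp_c, rh_pct)
    if dp < env.min_dew_point_c:
        issues.append("dew_point_low")
    if dp > env.max_dew_point_c:
        issues.append("dew_point_high")
    return issues


if __name__ == "__main__":
    print(violations(30.0, 45.0))
```

Expressing the humidity limit as a dew point rather than a raw relative humidity figure keeps the check meaningful as the air temperature changes.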

Visibility of Your Power Quality and Impact

How do you know if you are being affected by these factors – is your critical infrastructure gear slowly degrading in reliability, or is an increase in downtime looming ahead?

A DCIM (Data Center Infrastructure Management) solution can help monitor power conditions on your gear and provide easier access to power events and waveforms logged by PQMs (Power Quality Meters). General monitoring of individual device report points (e.g., voltage, current, etc.) allows you to track and trend different aspects of your power quality and detect when thresholds have been exceeded:

Power Quality Meter monitoring: high resolution power monitoring with event capture.

Event capture: seeing very short-duration events that power gear such as UPSs, PDUs, and RPPs generally cannot detect or report. PQMs (Power Quality Meters) perform high-resolution power monitoring and, when an event occurs, save a snapshot around the moment of the event so it can be reviewed afterward. Resolution is often down to the sub-millisecond level, with advanced PQMs resolving up to the 63rd harmonic.
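The threshold detection and event capture described above can be sketched in software. The example below keeps a short rolling history of readings and, when a reading leaves a tolerance band around nominal, saves a snapshot containing the samples before and after the trigger. This is only a conceptual illustration: real PQMs do this in hardware at far higher sample rates, and the class name `EventCapture` and its parameters are hypothetical.

```python
from collections import deque


class EventCapture:
    """Snapshot the samples surrounding each out-of-tolerance reading."""

    def __init__(self, nominal, tolerance_pct=10.0, pre_samples=4, post_samples=4):
        band = nominal * tolerance_pct / 100
        self.low = nominal - band
        self.high = nominal + band
        self.post_samples = post_samples
        self._history = deque(maxlen=pre_samples)  # rolling pre-trigger buffer
        self._pending = None    # snapshot currently being filled, if any
        self._remaining = 0     # samples still to collect for that snapshot
        self.snapshots = []     # completed snapshots

    def add(self, sample):
        out_of_band = not (self.low <= sample <= self.high)
        if self._pending is None and out_of_band:
            # Trigger: start the snapshot with the pre-trigger history.
            self._pending = list(self._history)
            self._remaining = self.post_samples + 1  # trigger + post-trigger
        if self._pending is not None:
            # Excursions during an open capture are folded into the same snapshot.
            self._pending.append(sample)
            self._remaining -= 1
            if self._remaining == 0:
                self.snapshots.append(self._pending)
                self._pending = None
        self._history.append(sample)


if __name__ == "__main__":
    capture = EventCapture(nominal=480.0, pre_samples=2, post_samples=2)
    for volts in [480, 481, 479, 480, 380, 470, 480, 480, 481, 480]:
        capture.add(volts)
    print(capture.snapshots)
```

Keeping the pre-trigger samples is the important design choice here: the readings just before an event are often what explain it.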

Visibility of Your Environmental Conditions and Impact

Environmental monitoring in data centers is crucial to ensure the efficient and reliable operation of IT equipment while protecting against environmental factors that can lead to downtime or hardware failures. Usually this is accomplished with sensors and detectors:

  • Temperature/humidity sensors ensure your HVAC systems are allowing you to operate within the ASHRAE guidelines or within Service Level Agreement (SLA) guidelines imposed by your
  • If you have sufficient sensor density, you can create heat maps of your data center to correct hot and cold spots and improve cooling.
  • Monitoring the supply and return temperatures of your cooling equipment helps ensure you are not overcooling your data center and wasting energy.
  • Airflow sensors monitor air movement and help ensure that hot and cold aisles are well-structured.
  • Water leak detectors alert you to leaks or flooding, helping to prevent damage to hardware and electrical systems.
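The heat map and supply/return ideas in the list above can be sketched with a small grid of sensor readings. The example flags readings above or below a pair of limits and computes the temperature rise across a cooling unit. The 18 and 27 °C limits are illustrative assumptions, not recommendations, and the function names are hypothetical.

```python
def find_spots(grid, min_c=18.0, max_c=27.0):
    """Return (hot, cold) lists of (row, col, temp) for out-of-range readings.

    ASSUMPTION: min_c and max_c are placeholder limits; use the limits
    from your own guidelines or SLA.
    """
    hot, cold = [], []
    for r, row in enumerate(grid):
        for c, temp in enumerate(row):
            if temp > max_c:
                hot.append((r, c, temp))
            elif temp < min_c:
                cold.append((r, c, temp))
    return hot, cold


def delta_t(supply_c, return_c):
    """Temperature rise across the load: return air minus supply air."""
    return return_c - supply_c


if __name__ == "__main__":
    # Rows and columns stand in for sensor positions on the data center floor.
    readings = [
        [22.0, 23.5, 28.1],
        [21.0, 17.2, 24.0],
    ]
    hot, cold = find_spots(readings)
    print("hot spots:", hot)
    print("cold spots:", cold)
    print("delta T:", delta_t(20.0, 31.5))
```

Cold spots and a persistently small supply-to-return rise are both worth investigating as possible signs of overcooling, though the right thresholds depend on the facility.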

How to Minimize the Risk

These factors all contribute to a risk of reduced reliability and downtime for your critical infrastructure gear. The risk and impact have been demonstrated through thorough research, with documented data provided by independent agencies. Tracking these risk factors should be part of your ongoing monitoring and data collection for your hardware.

A powerful tool for managing these risks is a full-feature DCIM solution capable of tracking these conditions, providing real-time alarms, as well as analysis of historical data. The visibility of these factors can be challenging, but a quality DCIM solution provides this visibility in addition to meeting your basic monitoring and alerting needs.

A DCIM solution that can track these aspects of your hardware also provides core data, and advanced views like thermal distribution maps and psychrometric charts, which will allow you to maximize the efficiency of your power distribution and cooling infrastructure.

About the Author

Max Hamner

Max Hamner is Research and Development Engineer at Modius. Contact Modius to learn more about its DCIM solutions. Modius OpenData provides integrated tools including machine learning capability to manage the assets and performance of colocation facilities, enterprise data centers, and critical infrastructure. OpenData is a ready-to-deploy DCIM featuring an enterprise-class architecture that scales incredibly well. In addition, OpenData gives you real-time, normalized, actionable data accessible through a single sign-on and a single pane of glass.

Delivering DCIM solutions since 2007, Modius is passionate about helping clients run more profitable data centers and providing operators with the best possible view into a managed facility’s data. Modius is based in San Francisco, California, and is proudly a Veteran Owned Small Business (VOSB Certified).

About the Author

Ray Daugherty

Ray Daugherty is Senior Services Consultant at Modius. Contact Modius to learn more about its DCIM solutions and other critical infrastructure management software that optimize the infrastructure and operations of critical facilities, including data centers, telecom, smart buildings and other IoT environments.
