Existential DCIM: How New Trends in Data Center Monitoring Inform Facility Operations and Planning

Feb. 5, 2024
New software and hardware advancements have cemented data center infrastructure management (DCIM) technology's status as a widely acknowledged necessity for the sustainable operation of modern data centers.

In hallway conversations at data center industry events last fall, the widely acknowledged necessity of DCIM software in modern facilities came attached to a detectable perception that the hype surrounding the technology may be growing somewhat long in the tooth.

To counter such assumptions, DCF recently reported on how new software such as Eaton's BrightLayer Data Centers suite of digital tools for facilities both embodies and transcends DCIM in aligning performance management with sustainability metrics and monitoring, while reckoning with energy data from an array of sources including utilities, generators, renewables, and emerging technologies such as hydrogen and microgrids.

But it always helps to have a universally respected industry analyst's confirmation of a trend. So it piqued our interest when, during Uptime Institute's January webinar, "Five Data Center Predictions for 2024," Max Smolaks, a research analyst for the firm, assessed that the industry is currently seeing "something of a data center software renaissance."

A DCIM Renaissance

Smolaks noted how, for more than a decade, DCIM software has been positioned as the main platform for organizing and presenting data center facilities data. "Which is very useful," he said. "But its role has been mostly limited to real-time monitoring, asset management and capacity planning."

In contrast, Smolaks asserted how today, the industry is seeing the emergence of a new school of thought on data center management. 

He explained, "This comes from data scientists and statisticians. For these people, a data center is not a collection of physical components, it's a complex system of data patches. Every device has a known optimal state, and you can balance the overall system so that as many devices as possible are working in this optimal state."

Smolaks emphasized that any deviation in the data can indicate an equipment problem affecting overall platform performance. He pointed out that data halls are full of power and cooling equipment, along with temperature and humidity sensors that are either wired or wireless. "Of course, IT equipment is absolutely full of sensors," he added. "There's like a dozen on every motherboard."

Smolaks also pointed out that in recent years, major data center equipment suppliers have also started adding network connectivity to their equipment. "They benefit from this data because it can be used in quality control, in condition-based or predictive maintenance, and in new product design," he said. 

He emphasized that data center operators can also tap into the same data, because it is exchanged using industry-standard protocols. "Once they can harvest this data, they can put it to work as a source of valuable operational insight, as long as they have the right tools, and these tools can often involve machine learning," he added.

BMS and DCIM

Uptime Institute's Smolaks pointed out that most data centers nowadays use building management systems (BMS) or DCIM software as primary interfaces for facility operations.

"But you will also need new software tools that can generate value from the facilities equipment," he explained. He noted that these tools are not intended to replace BMS or DCIM, but can be used to augment the functionality of existing data center management tools.

The process of deploying such tools usually involves a software stack whose core elements include the data platform and DCIM. "And then you can layer your machine learning applications on top," said Uptime Institute's Smolaks.

He continued:

"What are the benefits? Why would you consider doing this? When you combine equipment sensor data with an emerging category of data center optimization tools (we call it data-centric management software, [with] the same abbreviation as DCIM), you can achieve improved facility efficiency because some of your equipment will be configured automatically.

These systems can also help you identify faulty or inefficient equipment. DCIM can do it and BMS can do it, but these tools can do it with greater granularity. If, for example, you have five UPSs and one of them underperforms, they will identify that unit. This is a real use case. You can also achieve better maintenance: predictive maintenance and condition-based maintenance are fairly new for data centers.

Not a lot of people are doing predictive maintenance yet, but we definitely will see more of this happening in the future because it helps you more efficiently use your resources and it should be cheaper, at least in theory.

Beyond discovery of trended capacity, these new tools also give you an opportunity for a more thorough analysis of data center metrics, not just high-level indicators like how much power, cooling and space you have left, but combinatorial problems. If you move things and if you combine things, can you actually free up more space? It's an interesting problem that is very complex to solve if you're just running it through a spreadsheet, but fairly easy if you're using a machine learning model.

Then there's elimination of human error. Obviously, if there's a higher degree of automation or the system produces automatically generated recommendations for human employees, there's less opportunity for human error.

Finally, we're all concerned about the shortage of employees in the data center sector. Some of these systems can actually learn by analyzing the skills of your most experienced staff and codifying them in software, so when your experienced employees leave, you still retain the knowledge they acquired on the job."
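Smolaks' five-UPS example hints at how simple the core of such anomaly detection can be. The sketch below is a hypothetical illustration, not any vendor's algorithm: it compares each unit's average efficiency against the fleet median using the median absolute deviation, a statistic that stays robust when one unit drifts. All unit names and readings are invented.

```python
from statistics import mean, median

def flag_underperformers(readings: dict[str, list[float]], thresh: float = 5.0) -> list[str]:
    """Return units whose average efficiency falls well below the fleet median.

    readings maps a unit ID to a series of efficiency samples (0-1 scale).
    Toy peer-comparison check; real tools would use far richer telemetry.
    """
    avgs = {unit: mean(vals) for unit, vals in readings.items()}
    med = median(avgs.values())
    # median absolute deviation: robust to the very outlier we want to catch
    mad = median(abs(a - med) for a in avgs.values())
    if mad == 0:
        return []  # fleet is perfectly uniform
    return sorted(u for u, a in avgs.items() if (med - a) / mad > thresh)

# Hypothetical fleet: UPS-4 runs a few percentage points below its peers.
ups_fleet = {
    "UPS-1": [0.955, 0.957, 0.954],
    "UPS-2": [0.956, 0.955, 0.958],
    "UPS-3": [0.953, 0.956, 0.955],
    "UPS-4": [0.912, 0.909, 0.915],  # degraded unit
    "UPS-5": [0.957, 0.954, 0.956],
}
print(flag_underperformers(ups_fleet))  # ['UPS-4']
```

A plain z-score would struggle here: with only five units, a single large outlier inflates the standard deviation enough to hide itself, which is why the sketch uses the median-based statistic.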

Smolaks added that chief among the new challenges that accompany use of such new tools is the issue of data quality and how facilities collect data.

Getting the Most Reliability from Data Center Assets with Remote Monitoring

Data Center Frontier recently engaged with Max Hamner, Research and Development Engineer, and Ray Daugherty, Senior Services Consultant, with DCIM technology provider Modius, developer of the OpenData platform, a ready-to-deploy DCIM featuring "an enterprise-class architecture optimized for scalability which provides real-time, normalized, actionable data accessible through a single sign-on and a single pane of glass," as stated by Modius.

Hamner and Daugherty contend that the success of any data center operator's business model depends on uptime and SLA compliance, two factors that are highly affected by facility assets' reliability and longevity. For data center operators, they add, there is a direct correlation between reliability and ROI.

The Modius subject matter experts (SMEs) emphasized that power distribution and cooling gear for data centers (including generators, PDUs, UPSs, CRACs and CRAHs) is costly to purchase and maintain, but that downtime and lost contracts are far more expensive. Meanwhile, two major factors -- beyond vendor quality -- in maximizing availability and SLA compliance are the environmental conditions in which such gear operates, and the quality of power being provided.

ASHRAE, ITIC, and other organizations have long documented the direct relationship between power quality, operating environment, and data center hardware uptime and reliability. These organizations have produced multiple standards to define the challenges and goals of environmental and power conditions.

Such awareness has led all major hyperscalers to include environmental and power compliance as part of their colocation contracts.

How Power and Environmental Conditions Affect Data Center Infrastructure Gear 

Hamner and Daugherty emphasized that power quality directly affects the life of data center infrastructure hardware, not only in terms of significant power events, but also through ongoing variations in power quality that accumulate as degraded reliability in data center infrastructure and IT gear.

Many small sags, surges, and spikes in power slowly erode the quality of the components in any electrical device. This effect has been well codified by multiple organizations, which have produced reports and created standards that define the optimum operating conditions to maximize equipment life.

The most prevalent industry standard is provided by the ITIC [Information Technology Industry Council], creator and maintainer of the "ITIC curve" metric, which allows for measurement of damage and/or risk to a device for individual power events.
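As a rough illustration of how the ITIC curve is applied, the function below checks a voltage sag against approximate breakpoints commonly cited for the curve's lower (undervoltage) envelope. The numbers are illustrative only; the official ITI publication defines the real envelope, including the overvoltage side omitted here.

```python
def sag_within_itic(voltage_pct: float, duration_s: float) -> bool:
    """Approximate check of a voltage sag against the ITIC lower envelope.

    voltage_pct: remaining voltage as a percentage of nominal.
    duration_s: event duration in seconds.
    Breakpoints are commonly cited approximations, not the official curve.
    """
    if duration_s <= 0.02:          # up to ~1 cycle: any sag, even 0 V, is tolerated
        return True
    if duration_s <= 0.5:
        return voltage_pct >= 70.0
    if duration_s <= 10.0:
        return voltage_pct >= 80.0
    return voltage_pct >= 90.0      # steady-state tolerance

# A 65% sag lasting 100 ms falls outside the envelope (risk of disruption);
# a 10 ms interruption does not.
print(sag_within_itic(65.0, 0.1), sag_within_itic(0.0, 0.01))
```

A DCIM platform ingesting PQM events could run each logged sag through a check like this to separate benign transients from events worth investigating.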

Meanwhile, humidity, air pressure, air quality and temperature combine to affect condensation and oxidation on data center surfaces, as well as airflow, which can affect cooling efficiency. High humidity can lead to condensation and moisture buildup inside IT equipment, causing electrical short circuits and corrosion.

Low humidity can lead to static electricity buildup, causing electrostatic discharge events that can damage electronic components. Dust and other particulate matter can accumulate inside IT equipment, blocking airflow and causing overheating. It can also interfere with moving parts, such as fans and drives, reducing their efficiency and potentially leading to equipment failure.

Then too, excessive heat can cause IT equipment to overheat, leading to reduced performance and potentially causing hardware failures, while extremely cold conditions can also affect IT equipment, causing it to become less efficient and possibly leading to condensation and moisture-related problems. Most IT equipment has recommended operating temperature ranges, and it’s important to ensure that the operating environment stays within these ranges.
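The condensation risk described above can be estimated numerically. The sketch below uses the well-known Magnus approximation for dew point (with the standard coefficients, valid roughly from 0 to 60 °C) plus a hypothetical safety margin to flag at-risk surfaces:

```python
import math

def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Magnus approximation of dew point temperature in Celsius."""
    b, c = 17.62, 243.12  # standard Magnus coefficients
    gamma = math.log(rel_humidity_pct / 100.0) + (b * air_temp_c) / (c + air_temp_c)
    return (c * gamma) / (b - gamma)

def condensation_risk(surface_temp_c: float, air_temp_c: float,
                      rh_pct: float, margin_c: float = 2.0) -> bool:
    """Flag surfaces within a (hypothetical) safety margin of the dew point."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_pct) + margin_c

# At 25 C and 60% RH the dew point is about 16.7 C, so a 15 C chilled-water
# pipe surface would be at risk of condensation while a 22 C rack door is not.
print(round(dew_point_c(25.0, 60.0), 1))  # 16.7
```

Monitoring platforms that compute psychrometric charts are doing a more complete version of this same calculation across every sensor pair in the room.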

The above factors can be controlled by climate control systems (HVAC) to maintain proper temperature and humidity levels, and by dust filters and clean rooms to reduce particulate contamination. 

ASHRAE [The American Society of Heating, Refrigerating and Air-Conditioning Engineers] has provided research results from studies into the impact of environmental conditions on hardware.

From this research, ASHRAE has provided a recommended standard (Equipment Thermal Guidelines for Data Processing Environments) specifically for data center operations, which specifies optimal environmental conditions to maximize equipment reliability and life. The ASHRAE standard for thermal guidelines in the data center can be found in the 2016 ASHRAE white paper, "TC9.9 Data Center Power Equipment Thermal Guidelines and Best Practices," last updated in 2021. 

How Visibility of Power Quality and Environmental Conditions Reduces Risk for Data Centers

The Modius experts asked: how do data center operators know if they are being affected by these environmental factors, if critical infrastructure gear is slowly degrading in reliability, and if an increase in downtime is looming ahead?

DCIM platforms help monitor power conditions around facilities gear, and provide easier access to power events and waveforms logged by PQMs (Power Quality Meters). General monitoring of individual device report points (e.g., voltage, current, etc.) allows operators to track and trend different aspects of power quality and detect when thresholds have been exceeded.

Hamner and Daugherty noted that PQM monitoring's high-resolution event capture capabilities see very short-duration events that power gear such as UPSs, PDUs and RPPs generally cannot detect or report. When an event occurs, the PQM saves a waveform snapshot around the moment of the event so it can be reviewed afterward, often down to the sub-millisecond level (up to the 63rd harmonic for advanced PQMs).
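The pre/post-trigger snapshot mechanism described above can be mimicked with a rolling buffer. This is a toy sketch of the idea only; real PQMs implement it in hardware at very high sample rates:

```python
from collections import deque

class WaveformCapture:
    """Toy pre/post-trigger capture, mimicking how a PQM snapshots an event.

    Keeps a rolling window of recent samples; when a trigger fires, the
    pre-trigger history is frozen and a fixed number of post-trigger
    samples is appended to form the snapshot.
    """
    def __init__(self, pre: int = 4, post: int = 4):
        self.pre = deque(maxlen=pre)   # rolling pre-trigger window
        self.post_needed = post
        self.snapshot = None
        self._post = None              # non-None while collecting post-trigger

    def feed(self, sample: float, triggered: bool = False) -> None:
        if self._post is not None:     # collecting post-trigger samples
            self._post.append(sample)
            if len(self._post) == self.post_needed:
                self.snapshot = list(self.pre) + self._post
                self._post = None
            return
        if triggered:
            self._post = [sample]      # event detected: start post capture
        else:
            self.pre.append(sample)    # normal operation: keep rolling

# Hypothetical sample stream: steady values, then a triggering event.
cap = WaveformCapture(pre=4, post=4)
for s in range(6):
    cap.feed(s)
cap.feed(100, triggered=True)
for s in (101, 102, 103):
    cap.feed(s)
print(cap.snapshot)  # [2, 3, 4, 5, 100, 101, 102, 103]
```

The point of the rolling window is that the samples *before* the event are already in memory when the trigger fires, which is what lets a PQM show the lead-up to a sub-millisecond disturbance after the fact.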

Environmental monitoring in data centers is crucial to ensure the efficient and reliable operation of IT equipment, while protecting against environmental factors that can lead to downtime or hardware failures. Usually this is accomplished with sensors and detectors.

Temperature/humidity sensors ensure HVAC systems are operable within the ASHRAE guidelines or within SLA guidelines imposed by customers. Sufficient sensor density enables creation of data center heat maps to correct hot and cold spots and improve cooling efficiency. 

Monitoring the supply and return temperatures of cooling equipment helps ensure the operator is not overcooling the data center and wasting energy. Airflow sensors monitor the air movement in and around racks. They ensure that hot and cold aisles are well-structured. Water leak detectors alert facilities to leaks or flooding, helping to prevent damage to hardware and electrical systems.
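Putting such sensor readings to work can be as simple as auditing rack inlet temperatures against the ASHRAE recommended envelope (18 to 27 °C for the common equipment classes). The rack names and readings below are hypothetical:

```python
ASHRAE_RECOMMENDED_C = (18.0, 27.0)  # ASHRAE recommended inlet range

def audit_inlets(inlet_temps: dict[str, float],
                 band: tuple[float, float] = ASHRAE_RECOMMENDED_C) -> dict[str, list[str]]:
    """Split racks into hot spots (above band) and overcooled racks (below band)."""
    low, high = band
    return {
        "hot_spots": sorted(r for r, t in inlet_temps.items() if t > high),
        "overcooled": sorted(r for r, t in inlet_temps.items() if t < low),
    }

# Hypothetical inlet readings: R02 needs attention, R03 suggests wasted cooling.
racks = {"R01": 21.4, "R02": 28.3, "R03": 16.9, "R04": 24.0}
print(audit_inlets(racks))
```

With enough sensor density, the same per-rack readings feed the heat maps mentioned above; the audit is just the threshold layer on top of them.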

Hamner and Daugherty explained that these factors all contribute to the risk of reduced reliability and downtime for critical infrastructure gear.

To meet the challenge, the Modius SMEs noted that a full-featured DCIM solution is capable of tracking these conditions and managing risk factors through ongoing hardware monitoring and data collection, providing real-time alarms as well as analysis of historical data.

Gaining visibility of these factors can be challenging, but a quality DCIM platform provides it, in addition to meeting basic monitoring and alerting needs. An optimal DCIM solution tracks these aspects of hardware and provides core data as well as advanced views, such as thermal distribution maps and psychrometric charts, which help operators maximize the efficiency of power distribution and cooling infrastructure.

Hamner and Daugherty noted that the Modius OpenData DCIM provides integrated tools, including machine learning capability to manage the assets and performance of colocation facilities, enterprise data centers, and critical infrastructure.

The OpenData DCIM consists of three core elements: 

  • The Modius OpenData Remote Monitoring Module tracks and trends raw data values across power parameters (load/voltage/current) and captures events from PQMs and other equipment.
  • The OpenData Environment Module provides visibility of cooling loads and distribution of cooling loads, plus: the ability to assess 'what-if scenarios'; heat maps via thermal images; and ASHRAE compliance/psychrometric charts, with the ability to measure/show compliance.
  • The OpenData Machine Learning Module detects anomalies in device operations which can potentially identify equipment that has degraded. This allows operators to fix or replace the equipment before it fails and causes an outage.

 

More Advancements in Data Center Power Monitoring 

 

Legrand Expands Starline CPM Platform

As global energy consumption continues to rise -- the U.S. Energy Information Administration projects global electric power generating capacity to increase anywhere between 55% and 108% by 2050 -- the need for greater efficiency and power density within mission-critical spaces is surging.

Correspondingly, energy monitoring systems are becoming more crucial than ever in providing site managers and decision-makers with the real-time data they need to optimize their electrical infrastructure. As demand for energy monitoring systems soars, Legrand's Starline brand, which provides a track busway system enabling customizable data center power distribution, has launched the M70 CPM, the third generation of its Critical Power Monitor products.

The Starline CPM has become a preferred method of data center busway power monitoring for major data center providers globally. Initially introduced in 2014, Starline says it has worked since 2020 to develop this latest generation of its flagship energy meter to help customers achieve greater energy efficiency in mission-critical and energy-hungry sites.

The company says the M70 CPM is now positioned to empower site managers across numerous verticals, from data centers to hospitals, to monitor energy consumption easily, reliably and in real-time.

"With the M70 CPM, we are demonstrating not only our commitment to listening to customers' needs, but also our desire to innovate through our products, with the aim of improving lives," says John Berenbrok, Product Director at Starline.

Berenbrok added, "The M70 builds on our legacy of being at the forefront of technology, especially as it is also the only meter that will work optimally with track busway systems thanks to its compact size. It also outperforms most other power monitors on the market today, allowing up to six current transformer inputs versus the one or two that others permit. The M70 CPM is robust, innovative and flexible, designed to provide customers with the information they need to increase energy efficiency and reduce costs through preventative maintenance."

Whether monitoring is required at the power feed, branch-circuit level or stand-alone enclosure, the M70 offers greater granularity, says Starline. The meter is equipped with a comprehensive breadth of features and functionality including unique temperature monitoring capabilities, audible alarms and a pivoting display for easy visual access from the floor.

The M70 also includes extensive communications protocol support. Other features include redesigned LED and LCD displays for real-time visual data reference, and the ability to configure it with DCIM and BMS packages for quick deployment.

Starline expects the M70 CPM to transform power monitoring for industries of all kinds, since in addition to the platform's added features and functionality, significant improvements have been made to aid in its speed, robustness and ease of implementation.

The platform supports single-, two- and three-phase power outlet monitoring with circuit breaker position sensing and end feed lug temperature monitoring. Daisy-chain Ethernet and Modbus connectivity is standard on all AC and DC versions, with optional Wi-Fi capability included.

nVent, Hyperview Collaborate for Cloud-Based PDU Monitoring and Control

Last December's announced collaboration between global electrical connection and protection specialist nVent Electric and Hyperview, a provider of a cloud-based DCIM platform, offers data center operators expanded control and monitoring capabilities for power distribution units (PDUs).

The new Hyperview integration will enable greater remote monitoring of nVent PDUs through the cloud, including real-time updates and notifications for data center system functions supporting customers' mission-critical operations. The integration also provides for seamless monitoring of operational technology (OT) equipment, and the ability to securely control and upgrade PDUs, streamlining operations while improving security and promoting sustainability.

"At nVent, we are building a more sustainable and electrified world,” said Shubhayu Chakraborty, president of CIS Global, part of nVent. “Because of this collaboration, our customers will be able to better monitor and control mission critical systems, making them more efficient and safer. The combination of Hyperview's innovative DCIM platform and our reliable PDUs will help operators manage their infrastructure more efficiently than ever before."

The Hyperview integration also offers a secure architecture to access an end-to-end control pipeline for outlet control and firmware delivery. Via this advancement, users will receive proactive notifications for firmware availability, confirming authenticity, model compatibility and delivery. Additionally, the platform's upgrade and rollback process has been simplified with automated change management and documentation.

"Security is paramount in infrastructure management, especially when dealing with OT equipment located in remote areas," said Jad Jebara, president and CEO of Hyperview. "By empowering operations teams to remotely monitor, control outlets and perform firmware upgrades, they can fortify their infrastructure, reduce travel and lighten the workload on their staff."

 

In Conclusion: AI and DCPI 

AI power and cooling requirements will lift data center physical infrastructure (DCPI) market to over $46 billion by 2028, according to the latest data center industry forecast from Dell’Oro Group. 

“The AI opportunity for the DCPI market is getting closer, and bigger,” said Lucas Beran, Research Director at Dell’Oro Group. “DCPI vendors are increasing manufacturing capacity to support the expected scale of orders for purpose-built AI facilities."

The analyst said the proliferation of accelerated computing to support AI and ML workloads has emerged as a major DCPI market driver, which is significantly increasing data center power and thermal management requirements. 

Of particular note, Dell'Oro's Beran added, "The emergence of accelerated computing is propelling liquid cooling to become a mainstream technology, with its presence likely in most greenfield facilities. In response, we have raised our liquid cooling forecast to now surpass $3 billion in 2028.” 

Meanwhile, the market sub-segments for data center cabinet power distribution (busway and rack power distribution) and for thermal management products and systems, including DCIM, are forecast to grow at the fastest rates during the forecast period.

 


About the Author

Matt Vincent

A B2B technology journalist and editor with more than two decades of experience, Matt Vincent is Editor in Chief of Data Center Frontier.
