Coolan’s Analytics Offer Community-Powered View of Data Center TCO

Coolan is developing a community-based analytics solution that can extend the benefits of Big Data analytics beyond the hyperscale computing sector.

Rich Miller

Dec. 5, 2015

What’s the real total cost of ownership of your servers? Data center startup Coolan has developed software to help companies get a fuller picture of server cost and performance. (Image: Rich Miller)

Your servers aren’t always happy. When they’re unhappy, Amir Michael thinks you should know it. The same goes for whether your servers are happier and more productive than the servers in your neighbors’ data center.

Michael, a veteran of the data center teams at Google and Facebook and a co-founder of the Open Compute Project, founded Coolan to develop software that can extend the benefits of Big Data analytics beyond the hyperscale computing sector. Coolan is designed as a community-based analytics solution, in which the collective experience of data center operators can be harnessed to spot trends that can benefit the industry, including both end users and IT vendors.

Coolan is applying machine learning to discover problems that impact the total cost of ownership (TCO) of data center infrastructure. One example is “unhappy server days,” the metric Coolan uses to track how often a server is undergoing maintenance and how long it is out of service (mean time to repair, or MTTR).
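As a rough illustration of how metrics like these might be derived from a maintenance log, here is a minimal Python sketch. The event format, the server names, and the exact definitions (an "unhappy server day" counted as any calendar day on which a server is out of service, MTTR as the average outage length) are assumptions made for this example, not Coolan's published formulas.

```python
from datetime import datetime, timedelta

# Hypothetical maintenance log: (server id, taken out of service, returned to service).
events = [
    ("srv-01", datetime(2015, 11, 2, 9, 0), datetime(2015, 11, 2, 15, 0)),
    ("srv-01", datetime(2015, 11, 20, 8, 0), datetime(2015, 11, 21, 8, 0)),
    ("srv-02", datetime(2015, 11, 10, 12, 0), datetime(2015, 11, 10, 18, 0)),
]


def unhappy_server_days(events):
    """Count distinct (server, calendar day) pairs with the server out of service."""
    days = set()
    for server_id, start, end in events:
        day = start.date()
        while day <= end.date():
            days.add((server_id, day))
            day += timedelta(days=1)
    return len(days)


def mttr_hours(events):
    """Mean time to repair: average outage duration in hours."""
    if not events:
        return 0.0
    total_seconds = sum((end - start).total_seconds() for _, start, end in events)
    return total_seconds / 3600 / len(events)


print(unhappy_server_days(events))  # 4
print(mttr_hours(events))           # 12.0
```

Counting calendar days rather than 24-hour blocks means a short repair that crosses midnight registers as two unhappy days; a real tool might well draw that line differently.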

“I got the sense that by not measuring this, we were missing an important piece of the picture,” said Michael, the CEO and co-founder of Coolan. “No one had a really good system or metric for these things.”

Tracking maintenance and failure rates helps operators understand the reliability (and thus the TCO) of servers from different vendors. It also provides data on the cost of owning hardware (either on premises or in a colo facility) or of running workloads in the cloud.

These analytics will grow in importance as Open Compute hardware gains broader adoption. Many enterprise companies are interested in the benefits of open hardware, but don’t have the luxury of building in-house tools for operations analysis, as Google and Facebook do.

Harnessing The Power of Community

With his new venture, Michael is seeking to replicate a key feature of the Open Compute Project, the open source hardware initiative that drew upon a collective approach to innovation in server design.

Amir Michael, CEO of Coolan.

Coolan is designed as a community-based analytics solution, in which the data of many users can be pooled and analyzed to spot trends that can benefit the entire industry, such as comparing the reliability of servers from traditional vendors like Dell and HP with that of Open Compute products.

“No one actually publishes that data,” said Michael. “The consumers of infrastructure need to be informed.”

That information leads to interesting questions and disruptive ideas. For example: Could detailed failure data for servers and components allow users to hold hardware vendors to uptime agreements, much like the service level agreements (SLAs) used to compensate users for data center downtime?

“It seemed like an opportunity to shake things up, and that’s what I like to do,” said Michael.

A History of Innovation

Michael was previously a member of the data center engineering teams at Google and Facebook as those companies began building their own server hardware and data centers. He helped Facebook develop custom hardware and power distribution designs for its own data centers, which formed the nucleus of the Open Compute Project (OCP) and the open hardware movement.

In 2013, Michael formed Coolan with his brother Yoni, a software engineer with experience at Silver Spring Networks and Practice Fusion, and former Facebook colleague and investor Jonathan Heiliger. The company is based in San Mateo, Calif., and recently raised seed funding.

Coolan’s analytics software is currently in beta, with pilot programs underway with several end users, representing thousands of servers and tens of thousands of components. The product places a data collector on each server and offers its analytics tools in a software as a service (SaaS) model. Reports can be accessed through an online portal or APIs, and alerts and trouble tickets can be generated automatically.

The software runs more than 40 different types of analyses, including downtime, unhappy server days, reboot rates and many more. Coolan correlates these with the data center operating environment (including server inlet temperature and humidity) to look for trends.

“We’re providing intelligence and insight into how hardware is running,” said Michael.

How Big Data Benefits the Bottom Line

These analytics are important because they can evaluate the economics of more aggressive environmental set points – for example, running inlet temperatures of 80 degrees, instead of 75 degrees – and allow further refinement. They can also spot trends among different vendor hardware under various conditions, like which servers have the fewest failures in high heat or humidity.

These insights can guide procurement teams in selecting vendors, and help operations teams optimize the data center environment.

If companies elect to opt in to the data sharing features, Coolan will anonymize the data and establish a “community benchmark” for its key metrics, allowing customers to gauge how they perform versus others using similar equipment and data center environments.
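To make the idea of a community benchmark concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the customer names, the failure rates, the use of a salted hash to hide names, and the choice of the median as the benchmark. The article does not describe how Coolan anonymizes or aggregates data, and a salted hash on its own is only a simple form of pseudonymization.

```python
import hashlib
from statistics import median


def pseudonymize(customer_name, salt):
    """Replace a customer name with a short salted hash (illustrative only)."""
    digest = hashlib.sha256((salt + customer_name).encode("utf-8")).hexdigest()
    return digest[:12]


def compare_to_community(own_rate, community_rates):
    """Compare one customer's failure rate against the pooled rates."""
    rates = list(community_rates.values())
    worse_peers = sum(1 for rate in rates if rate > own_rate)
    return {
        "community_median": median(rates),
        "share_of_peers_with_higher_rate": worse_peers / len(rates),
    }


# Hypothetical annual server failure rates from customers who opted in.
raw = {"Acme": 0.02, "Globex": 0.03, "Initech": 0.05, "Umbrella": 0.08, "Hooli": 0.04}
community = {pseudonymize(name, salt="example-salt"): rate for name, rate in raw.items()}

result = compare_to_community(0.045, community)
print(result)
```

In this made-up data set, a customer with a 4.5 percent failure rate sits slightly above the community median of 4 percent, with two of five peers reporting a higher rate.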

“That’s really where customers begin to benefit from each other,” said Amir Michael, who noted that the community benchmarks can identify failure rates when equipment reaches a certain age and alert customers.

Why Elderly Servers Are Problematic

Coolan provided an example of its analytics capabilities with a recent blog post that examined how to calculate the best time to refresh server hardware, evaluating the cost and performance of existing units against newer servers with improved power and energy efficiency.

“Aging infrastructure costs more than you might think,” the blog post noted. “Old machines lingering in a data center exact a hidden cost – with each new generation of hardware, servers become more powerful and energy efficient. Over time, the total cost of ownership drops through reduced energy bills, a lower risk of downtime, and improved IT performance. Figuring out when to invest in new servers doesn’t have to be a guessing game, though.”

Coolan’s analysis of the relative costs and total cost of ownership (TCO) of server hardware replacement.

The answer? In many cases, a three-year refresh cycle will yield the best results. Coolan offers a downloadable spreadsheet to help companies test their own TCO assumptions.
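The shape of such a refresh calculation can be sketched in a few lines of Python. This is not Coolan's spreadsheet or model. It assumes a deliberately simple cost structure in which the purchase price is spread over the years a server is kept, while yearly operating cost grows with age. The dollar figures and the 60 percent growth rate are invented placeholders standing in for rising repair costs and the efficiency gains foregone by not upgrading.

```python
def annualized_cost(capex, first_year_opex, opex_growth, years):
    """Average yearly cost of buying a server and keeping it for `years` years."""
    opex_total = sum(
        first_year_opex * (1 + opex_growth) ** age for age in range(years)
    )
    return (capex + opex_total) / years


def best_refresh_interval(capex, first_year_opex, opex_growth, max_years=6):
    """Return the refresh interval with the lowest average yearly cost."""
    costs = {
        n: annualized_cost(capex, first_year_opex, opex_growth, n)
        for n in range(1, max_years + 1)
    }
    return min(costs, key=costs.get), costs


# Hypothetical inputs; change them to test your own assumptions.
best, costs = best_refresh_interval(capex=6000, first_year_opex=1000, opex_growth=0.6)
for n, cost in costs.items():
    print(f"keep {n} year(s): about ${cost:,.0f} per year")
print("lowest-cost interval:", best, "years")
```

With these particular inputs the minimum falls at three years, but that is a product of the placeholder numbers. A slower rise in operating cost pushes the optimum later, and a cheaper server pulls it earlier.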

Amir Michael says most commercial tools available today are developed for enterprise IT, rather than “scale-out” computing. Coolan can be used as a diagnostic and failure tracking tool, but also offers features of a configuration management database (CMDB) or asset management software.

What has Coolan learned from its early feedback?

“The biggest surprise is that the state of operations is not as advanced as I thought,” said Amir Michael. “We’ve seen a range of solutions, from spreadsheets to SQL databases to full asset management tools. It’s eye-opening to see how much variation there is between those who are automating really well, and those who aren’t.

“Some customers have really poor utilizations,” he added. “Some are buying server hardware almost as though they are deploying instances.”

Michael believes the sweet spot for Coolan will be customers whose server counts are growing faster than the staff needed to support them.

“Once you have 1,000 servers, you have events on a regular basis,” he said. “It becomes a problem when you have someone dealing with this on a regular basis. We want to allow these administrators to focus on their jobs, rather than tracking down dead servers.”

Or even servers that are having an unhappy day.

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
