As so many modern things have, the Open Compute Project (OCP) of course started in a Facebook data center. Since OCP's founding in 2011, when it grew out of an internal project at Facebook, the scope and mandate of the group have scaled along with the cloud data centers its technology initiatives serve.
Today as a top-shelf hyperscale titan, Facebook’s progenitive colossus Meta is perched explicitly in the OCP ranks alongside co-titans Google and Microsoft, among other hyperscalers - and, in fact, every other data center industry stakeholder soon to be in attendance this October at the 2023 OCP Global Summit in San Jose, CA.
The proof is in the board appointments, if not the earnings calls, of which there have been a few lately. We can parse the particulars of the earnings announcements, but overall, the news for Meta, Alphabet/Google and Microsoft for 1H 2023 was good, very good.
Last week Meta reported double-digit revenue growth for the first time since 4Q 2021, with profit up 16% to $7.8 billion and revenue up 11% to $32 billion. With its forecast for the third quarter better than analysts predicted, the company added that its 2024 expenses are expected to grow due to investments in data centers and AI.
Meanwhile, both Microsoft and Alphabet/Google also reported better-than-expected earnings last week, largely based on both companies' AI efforts. With apologies to 20th century American writer Raymond Carver comes now a question which DCF's editors ask each other at least twice a month on the site's podcast: What do we talk about when we talk about AI? For the record, according to reporting from Reuters, "Participants on Alphabet's analyst call on Tuesday [July 25] mentioned AI 62 times, up from 52 times three months ago. The same day, AI was mentioned 58 times on Microsoft's call, up from 35 times in its previous call."
Now onto the subject of the board appointments: In June, OCP announced its election of Microsoft's Zaid Kahn as its board chair, while also appointing Google’s Amber Huffman to the OCP board of directors.
Zaid is currently GM for Microsoft’s Silicon, Cloud Hardware, and Infrastructure Engineering team where he leads systems engineering and hardware development for Azure, including AI systems and infrastructure. He is part of the technical leadership team across Microsoft that sets AI hardware strategy for training and inference. Zaid's teams are also responsible for software and hardware engineering efforts developing specialized compute systems, FPGA network products and ASIC hardware accelerators.
Prior to Microsoft, Zaid was head of infrastructure at LinkedIn, where he was responsible for all aspects of architecture and engineering for data centers, networking, compute, storage, and hardware. He also led several software development teams focused on building and managing infrastructure as code, including zero touch provisioning, software-defined networking, network operating systems (SONiC, OpenSwitch), self-healing networks, backbone controllers, software defined storage, and distributed host-based firewalls. Under his leadership, the global network for LinkedIn was created including POPs, peering for edge services, IPv6 implementation, DWDM infrastructure, and data center network fabric, while his hardware and data center engineering teams were responsible for water cooling to the racks, optical fiber infrastructure, and open hardware development which was contributed to the Open Compute Project Foundation (OCP).
Zaid holds several patents in networking and is a sought-after keynote speaker at top-tier conferences and events. He currently serves on the EECS External Advisory Board (EAB) at UC Berkeley and is a board member of the Internet Ecosystem Innovation Committee (IEIC), a global internet think tank promoting internet diversity. He holds a Bachelor of Science in Computer Science and Physics from the University of the South Pacific.
"I am honored to have been chosen as the next chairperson of the OCP Foundation," said Kahn. "This organization has been at the forefront of revolutionizing the data center industry through open collaboration. I am excited to work alongside the talented community of contributors, industry leaders, Foundation staff and stakeholders to accelerate the pace of innovation, drive sustainable practices, and create a more open and interconnected future."
Amber Huffman is a principal engineer at Google responsible for leading industry engagement in the company's data center ecosystem. Before joining Google, Amber spent 25 years at Intel, serving as an Intel Fellow and VP. She is the president of NVM Express, co-chair of the OCP Storage Project, chair of the RISC-V Software Ecosystem (RISE) Project, and sits on the board of directors for the Universal Chiplet Interconnect Express (UCIe) consortium. She has led numerous industry standards to successful adoption including NVM Express, Open NAND Flash Interface, and Serial ATA. She began her term on June 1.
“The Open Compute Project’s enduring mission to develop open solutions that meet the market and shape the future has a crucial role to play as AI transforms our world,” said Huffman. “I am honored to continue the responsible stewardship of the organization, driving even more impact for the data center industry through key initiatives such as AI, security, resilience and sustainability.”
Importantly, as OCP enters its second decade, the group says it has explicitly evolved its original mission of community-driven open hardware specifications for operations at scale to embrace a wider focus on innovation and collaboration. Adjacent communities specifically identified in a recent OCP blog post include academics, research labs and "the start-up ecosystem."
In a runup to October's Global Summit, DCF recently conducted an interview with newly installed OCP board members Kahn and Huffman, alongside Dirk Van Slyke, vice president and chief marketing officer of the Open Compute Project Foundation.
FTS and FTI redouble focus on collaboration and innovation
Just as OCP has been directly involved in building the compute infrastructure to power the incredibly rapid expansion of the cloud, Van Slyke described how the group has been building the profile of its ongoing signature events, the Future Technologies Symposium and Future Technologies Initiative.
The OCP Future Technologies Symposium (FTS) showcases promising developments from the world’s most innovative research and academic institutions. Van Slyke noted that the FTS was initiated as an event three years ago, pre-COVID.
He continued, “It was intentionally, specifically to bring in academics, research, as well as the investor and the startup community into the OCP ecosystem, because we recognized that there's a lot of organizations and individuals working on technologies that are going to eventually have a place in the OCP Community five, maybe even 7-8-9-10 years out in the future, but it's still relevant for the conversation, and we want to make sure that people know about them from not only an interest perspective, but potentially a participation and even an investment perspective.”
The FTS is a contest held at both annual OCP summits: the regional event held each spring in Europe, whose location changes each year, and the global event held each October. “We've been there for 10 years now and that's our largest event,” added Van Slyke. OCP is expecting nearly 4,000 people this fall in San Jose, where it will award a $10,000 prize through the FTS.
The OCP Community cites collaboration in an open environment as the critical element that gives the group its strength to tackle the large technology and environmental challenges facing the data center industry in a safe and trusted environment, while moving to a definitively sustainable consumption pattern. As such, the OCP Foundation has chosen "Scaling Innovation Through Collaboration" as a slogan for the fall Global Summit.
The OCP Future Technologies Initiative (FTI) represents the group's newest project workstream, which focuses on emerging technologies such as optical interconnect architectures, computational storage and AI hardware/software co-design.
OCP board chair Kahn commented, "Thinking about where the journey started, with Meta / Facebook at that time coming up with the concept of making hardware open and creating an ability for suppliers to engage in innovation, in 10 years, the amount of contribution that has happened, the amount of interest in the community, has been really great to see. If you think about open source software, how mature it was [ten years ago], people didn't really think that that was possible."
Of the project's expanded collaborative ambitions, Kahn added, "Initially, it was just making sure that there weren’t any shackles of proprietary, one-size-fits-all gear, but I think over time very specific innovation started happening. It was not just about the supply side of things. It started to become about innovation of hardware in general and also hardware/software co-design, which started to take shape in the last few years, where you just can't think of one without the other."
"I think that the vision of the Open Compute Project has been that we always believe that collaboration is the best way to get there," summarized Microsoft's Kahn. "Our goal being the platform of a highly collaborative environment to enable CSPs, enterprises, vendors all to come together to become a center of innovation. Looking forward to the future as things are pivoting to AI, we've already started seeing a lot of interest in [questions such as], How do we innovate for the next generation of hardware that's completely different?"
OCP participant Axiado debuts AI-driven security processors
An example of such next-generation hardware was on display at Computex 2023 in Taipei (May 30-June 2), where cybersecurity semiconductor specialist Axiado, a company developing a new class of processors called the trusted control/compute unit (TCU), demonstrated its AX3000/AX2000 TCUs, which leverage its AI-driven hardware security platform.
Embodying an AI-driven approach to platform security against ransomware, supply chain, side-channel and other cyberattacks, the single-chip TCU provides hardware-anchored capability for detecting attacks on servers in cloud data centers, 5G infrastructure, and network switches.
Axiado notes its TCU comes to market at a time when cybercrime and ransomware attacks are skyrocketing. According to Cybersecurity Ventures as cited by the company, cybercrime is forecast to cost the global economy around $10.5 trillion annually. Meanwhile, DataProt estimates suggest that in 2022, a ransomware attack took place successfully every 40 seconds, with an attempt nearly every 11 seconds.
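To give a sense of scale, the cadence quoted above can be converted into rough annual counts. This is a minimal sketch of that arithmetic only; it assumes the 40-second and 11-second intervals held evenly across a 365-day year, which the cited estimates do not claim, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year such as 2022


def events_per_year(interval_seconds: int) -> int:
    """Whole number of events in a year if one occurs every `interval_seconds`."""
    return SECONDS_PER_YEAR // interval_seconds


print(events_per_year(40))  # 788400 (one successful attack every 40 seconds)
print(events_per_year(11))  # 2866909 (one attempt every 11 seconds)
```

Under those assumptions, the figures work out to roughly 790,000 successful attacks and nearly 2.9 million attempts over the year.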
Residing in the lowest layer of the hardware stack and integrating all security functions within a single SoC or module, Axiado says its TCU effectively acts as a "last line of defense" against cyberattacks, even when all other network functions have been compromised. The TCU detects and stops ongoing attacks and recovers the system from an attack by isolating it from the network.
Axiado contends its AX3000/AX2000 TCUs represent a new category of forensic-enabled cybersecurity processors, designed to enhance existing zero-trust models. The company said the TCUs' design combines silicon, AI and data collection, and software into a compact, power-efficient SoC with unique AI functionality explicitly designed for security. The single-chip platform enables real-time, proactive AI with pre-emptive threat detection and comprehensive protection abilities furnished by a dedicated co-processor.
"This is a major step forward in our vision to provide comprehensive, AI-driven platform security in a single-chip SoC," said Gopi Sirineni, president and CEO of Axiado. "Our TCU is a game-changer that delivers a lower cost of ownership than any other alternative in the market. We look forward to collaborating with ODMs/OEMs, cloud service providers, and the entire security ecosystem to help make the world's digital infrastructure safer and more secure."
Axiado contends its TCU platform has capabilities never available before. Housed in a 23 x 23 BGA SoC and drawing under 5W, the TCU features a distributed hardware security manager with anti-tamper and anti-counterfeit hardware, and a control/management plane SmartNIC network interface controller that includes platform and tenant virtualization. It also offers protection from ransomware and side-channel attacks such as differential power analysis, voltage glitching and clock manipulation, which are used to extract cryptographic keys.
"There are multiple market pressure points we are grappling with when it comes to cloud computing," observed Patrick Moorhead, founder, CEO, and chief analyst at Moor Insights & Strategy. "Finding ways to protect against the endemic ransomware trend, the move towards modular server systems as driven by OCP and the natural integration of functions in silicon SoCs sets us up for a new wave of innovation in silicon and systems. I believe Axiado is well positioned to shine in the new world that takes zero-trust security to the next level."
Axiado adds that the TCU relies extensively on AI-based real-time threat mitigation with forensic-enabled hardware fingerprints, as well as platform monitoring and optimization (clocks/voltages/temp) using AI and machine learning. Further, the SoC includes Root of Trust (RoT), a baseboard management controller (BMC), a trusted platform module (TPM), a hardware security module, SmartNIC, firewall, and AI and machine learning.
Axiado is currently sampling its AX3000/AX2000 TCUs to early-access partners in servers, wireless base stations, wired security appliances, centralized and distributed infrastructure, and smart edge gateways. The AX2000 TCU provides a cost-effective advanced platform security option, while the AX3000 adds runtime protection and AI-based automation.
Axiado also offers AX3000/AX2000 TCUs in a Smart-SCM security module that is compliant with the Open Compute Project (OCP) data center-ready secure control module (DC-SCM) standard.
Danny Hsu, president of Tyan Computer Corporation, noted, "We have collaborated with Axiado on their DC-SCM project for cloud service providers -- DC-SCM 2.0 server is a product that very few manufacturers in the OCP community can develop and bring to the market. Tyan is excited to partner with Axiado for its TCU product design; Axiado's Smart-SCM provides our DC-SCM 2.0 servers with the cutting-edge security that puts us at the forefront of next-generation multi-node servers."
George Tchaparian, CEO of OCP, concluded, "We are very pleased to have Axiado actively participate in the OCP Community by taking the OCP-approved Datacenter Secure Control Module Specification (DC-SCM) and building a product compliant with this specification. Axiado's innovative security platform is a perfect example of how OCP's open specifications make hyperscale DC operator-led innovation available to all. Similarly, adopters from all corners of the market can now easily deploy OCP's DC-SCM standard."
'A very complex industry and ecosystem'
In the spirit of continuing to establish new collaborations and communities driving innovation, the OCP's blog noted that this year's Global Summit includes special focus tracks covering optical networking, quantum computing and communications, and a track that highlights significant deployments of OCP-recognized equipment.
"We're experiencing a very complex industry and ecosystem," explained Google's Huffman. "It’s difficult for suppliers to accommodate every customers’ unique request even when they're large customers."
She continued:
"I’ll give you a very concrete example: NVM Express in storage is what everyone uses as interface for everyone's SSD [solid state drive]. But if you look at the NVM Express standard, it has a gazillion optional features. So Microsoft and Meta came together in 2020 to define the OCP data center NVMe SSD specification.
What that does is very clear, in defining what out of the NVMe specification is required in an SSD for my data center use model. Then in 2021, Dell and HPE joined with Microsoft and Meta, and they published the 2.0 version of that specification that comprised those four customers saying, this is what we need, the 90%, and they still have some differences, but it really provided clarity.
Then this year, Google joined those four companies and we're working on the 2.5 specification. It'll come out later this year, and just provides clarity, so that there's not confusion by the supplier base in terms of, what should I build and how do I build something that will be broadly applicable?
So that's the value of collaboration as it's becoming ever more complex storage. It's just one example, but in every area, we have challenges of, how do we really come together as community and create clarity, and really provide that better time-to-market, that better value for everyone?"
OCP data center NVMe SSD developments
Last month, SMART Modular Technologies, Inc. (Taipei), a division of SGH (NASDAQ: SGH), announced that the OCP has accepted SMART’s DC4800 data center SSDs as an OCP Inspired™ product to be featured on the OCP website in its Marketplace section. Only products that comply with 100% of OCP’s stringent specifications are selected for OCP Inspired honors after a rigorous process that demonstrates the product’s efficiency, openness, impact and scale.
SMART says its DC4800 represents a new class of high performance, power efficient data center SSDs. This family of PCIe Gen 4 data center class drives was designed for compliance with the OCP NVMe data center storage standards, and is available in capacities up to 7.68TB in U.2 and E1.S form factors.
By designing to OCP’s standards, SMART contends, it assures an expanding base of enterprise-class data center SSD users that the OCP-class DC4800 exceeds basic NVMe standards by including incremental management, LED and other form-factor-related requirements that help drive a greater level of standardization for data center servers and other equipment using OCP-class SSDs.
The DC4800 devices are manufactured with a specialized hardware-accelerated SSD controller that draws less power without compromising storage input/output (I/O) performance. This unique architecture results in near zero thermal induced throttling, enabling these SSDs to perform better under continued duress, even when they are pushed to their performance limit. This further translates to significant power consumption improvement per server, as well as consistent latency performance of up to 7-nines or 99.99999% of the time, according to the company.
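To put the "7-nines" figure in perspective, here is a minimal sketch of the arithmetic. It assumes that the figure means 99.99999% of I/O operations complete within some target latency bound, which is an interpretation and not something the company spells out. The function name and the one-billion-operation workload are illustrative.

```python
from fractions import Fraction


def allowed_outliers(total_ops: int, nines: int) -> Fraction:
    """Operations permitted to miss the latency bound at an N-nines level."""
    # N nines corresponds to a miss fraction of 10**-N,
    # e.g. 7 nines (99.99999%) allows 1 miss in 10,000,000 operations.
    miss_fraction = Fraction(1, 10 ** nines)
    return total_ops * miss_fraction


# Hypothetical workload: one billion I/O operations.
print(allowed_outliers(1_000_000_000, 7))  # 100
```

Read this way, a drive holding seven nines could miss its latency target on only about 100 of every billion operations.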
“OCP-compliant SSDs are becoming the new standard for both data center and classic enterprise storage applications, which is a testament to how essential the standard has become,” said Andy Mills, senior director of advanced product development at SMART. “The Open Compute Project represents a truly cooperative venture in leading innovation within the technology industry and we are proud that our DC4800 product has now joined the ranks of OCP Inspired products.”
OCP's Fifth Tenet: Sustainability
The OCP's blog notes how, within the industry, many look to the OCP Community to take the lead on pioneering a sustainable computational infrastructure. With sustainability now as the OCP's newly codified fifth tenet, the organization looks forward to first results being revealed at the OCP Global Summit.
Commenting on sustainability as OCP's new fifth tenet, Google's Huffman said, "One of the challenges is transparency, and really, everybody measuring things the same way. How do we create the metrics and the transparency and the standards so that each of us can report our environmental product declarations (EPDs) that really understand what are we doing and move the needle in the right way for the environment?"
Further explaining OCP's burgeoning sustainability concerns, Huffman added:
"Then in data center facilities, there's a sustainability sub-project that talks about, how are we commonly defining the metrics? How are we doing things? Not only cooling, but heat re-use and other aspects. OCP is partnering with other leaders in this space, like iMasons Climate Accord and other bodies, asking how do we collaborate and really move the entire ecosystem forward? I think it's a great opportunity for how we all come together and just up-level, to do more for the globe."
Establishing 'a true open chiplet economy'
While all the industry disruptions being worked on by the OCP Community are too numerous to recount here (including significant work on data center cooling), OCP's blog concludes that "one which must not go unnoticed" is the standardization provided by the OCP ODSA Project, which the group says "provides leadership in establishing a true open chiplet economy as silicon supply chains reconfigure to meet the needs of AI and ML workloads."
Google's Huffman explained, "In the chiplet area, ODSA has done some leadership work on both the open API, and then the Bunch of Wires specifications, showing how to bring about standardization. As we see an increasing focus on chiplets, that has really spurred a lot of innovation. You're also seeing the Universal Chiplet Interconnect Express (UCIe) coming along."
Huffman continued:
"There's a whole industry of collaboration around chiplets and OCP has been at the center of spurring this [question] in the industry, of how do we go from the SoC level to the chiplet level, and where do we get to mix-and-match chiplets. That's still a journey -- how do you truly get to have interoperability, instead of having to have a deep collaboration within a company or between companies on chiplets, and OCP has been at the center of really driving people to pioneer this space. There's more to come, and I think it really will help with supply chain resilience as we move forward.”
As an example of such advances, in May, Eliyan Corporation (Santa Clara), credited with the invention of the semiconductor industry's highest-performance and most efficient chiplet interconnect, announced silicon availability for its breakthrough NuLink PHY technology. First silicon, achieved within one year of the company's initial funding, validates Eliyan's UCIe-compatible approach to enabling high-performance and highly scalable multi-die architectures for compute-intensive applications.
Eliyan's chiplet interconnect technology has been the foundation of the aforementioned Bunch of Wires (BoW) standard, which has been adopted by the OCP, and is fundamentally compatible with the UCIe standardization efforts. Eliyan is currently working with standards bodies to create an efficient universal die-to-die interconnect optimized for memory traffic to help accelerate the adoption of memory chiplets.
Multi-die architectures are an increasingly necessary approach to handling the needs of compute-intensive applications in data centers, cloud computing, and especially generative AI that require large amounts of memory and fast inter-chip communications. Additionally, Eliyan notes that a more efficient approach to implementing chiplet products is needed in industries such as automotive and gaming devices, which require much more reliable and cost-effective solutions than what advanced packaging options can offer.
"First silicon with the results and time frame we have achieved is a significant milestone and differentiator in the successful commercialization of our technology. It positions us as the front runner in enabling the most efficient chiplet interconnect in the semiconductor industry," said Eliyan's founding CEO Ramin Farjadrad. "With a proven silicon implementation, chip developers will now be able to realize the full benefits of the multi-die architectures without constraints imposed by advanced packaging such as size limitations of silicon interposers. It also enables the practical mix and match of chiplets in different processes and foundries."
As stated in a press release, implemented in a standard 5nm process from TSMC, the new chip operates at 40Gbps/bump, delivering over 2.2Tbps/mm of beachfront bandwidth at 130um pitch on standard organic packaging, and meets the company's aggressive power and area targets. The highly area-efficient NuLink PHY is bump limited and can deliver up to 3Tbps/mm once implemented on available standard packaging technologies at finer bump pitches, leveraging the device's innovative interference cancellation techniques.
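The relationship between the per-bump rate and the beachfront figures can be checked with simple division. The sketch below is a back-of-the-envelope calculation, not Eliyan's method: it assumes every counted bump carries data at the full 40Gbps, ignores power, ground and clock bumps, and assumes the same per-bump rate applies at the finer pitch. The function name is illustrative.

```python
def data_bumps_per_mm(edge_bandwidth_gbps_per_mm: int, per_bump_gbps: int) -> float:
    """Data bumps needed per mm of die edge to reach a given edge bandwidth."""
    return edge_bandwidth_gbps_per_mm / per_bump_gbps


# 2.2 Tbps/mm = 2200 Gbps/mm at 130um pitch; 3 Tbps/mm = 3000 Gbps/mm at finer pitch.
print(data_bumps_per_mm(2200, 40))  # 55.0
print(data_bumps_per_mm(3000, 40))  # 75.0
```

Because the release says "over" 2.2Tbps/mm, 55 is a floor under these assumptions. Reaching 3Tbps/mm at the same per-bump rate would take about 75 data bumps per millimeter of die edge, which is consistent with the description of the PHY as bump limited.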
"The economics of adopting a chiplet approach for IC design are tightly linked with the cost and maturity of the interconnect and packaging solution, as we demonstrated in our analysis," asserted John Lorenz, senior analyst, Computing and Software Solutions at Yole Intelligence. "Eliyan's chiplet interconnect technology will make multi-die approaches more attractive to chip suppliers whose designs must optimize on power and bandwidth vectors. This is especially the case for those in accelerated server computing applications, a market mainly served by data center GPU hardware, and which we see sustaining a 22% unit growth CAGR through 2028."
Eliyan said the successful silicon implementation demonstrates that its technology, in standard organic packaging, achieves bandwidth, power efficiency, and latency similar to those of die-to-die implementations using advanced packaging technologies, but without the drawbacks of those complex and expensive solutions.
The company added that its ability to implement chiplet-based systems in standard organic packages enables the creation of larger system-in-package (SiP) solutions, and thus higher performance per power, at considerably lower cost and higher yield. These factors stand to provide major gains in sustainability as well.