The AI Arms Race: Startups Offer New Technology, Target Niche Markets

With every major sea change in the technology world, new players rise to the top. AI hardware is still dominated by Nvidia, with some analysts putting its share of the market at as much as 90%. But as we showed you last week, Intel and AMD are doing their best to gain ground in the AI hardware space.

A number of lesser-known players in the AI hardware space have made announcements in the last few years, all looking to make their mark with specific solutions, novel technologies, or simply what they think is a better and more effective way to address AI technology issues. The attrition rate among these startups has been significant, but a few have survived and even flourished.

Whether looking to provide new, general-purpose AI solutions or to fill an underserved niche, new and lesser-known companies are looking to make their mark in the AI world. The companies we look at here are introducing new types of hardware, along with the necessary software and SDKs, to move AI technology forward. Some focus on specific markets, such as edge AI and the AI Internet of Things; others take a different approach to supporting faster and more efficient training and development of generative AI.

Let’s look at a few of the contenders.

Cerebras Systems

Cerebras has developed and deployed its Wafer-Scale Engine, now in its second generation. The WSE-2 is designed to give cluster-scale performance on the largest single chip in production. It puts 850,000 cores and 40 GB of SRAM, available at 20 PB/s of bandwidth, on a single wafer, along with an on-wafer interconnect that provides a 220 Pb/s connection between the cores. The product is designed specifically for AI development and operations, with a focus on sparse tensor cores, which skip computations on zero values and so offer an effective shortcut in the way deep neural nets process information.
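To make the sparsity idea concrete, here is a minimal Python sketch. It is an illustration only, not Cerebras code: the function names `dense_dot` and `sparse_dot` are hypothetical, and the assumption is simply that a multiply by zero can be skipped without changing the result.

```python
def dense_dot(weights, activations):
    """Multiply every pair, including pairs where the activation is zero."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        total += w * a
        multiplies += 1
    return total, multiplies


def sparse_dot(weights, activations):
    """Skip any pair whose activation is zero, since it cannot change the sum."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if a == 0.0:
            continue  # no work is done for a zero operand
        total += w * a
        multiplies += 1
    return total, multiplies


# Hypothetical values: many activations are zero, as is common after a ReLU layer.
weights = [0.5, -1.0, 2.0, 0.25, 4.0, -3.0]
activations = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]

dense_result, dense_ops = dense_dot(weights, activations)
sparse_result, sparse_ops = sparse_dot(weights, activations)
print(dense_result, dense_ops)    # same sum, six multiplies
print(sparse_result, sparse_ops)  # same sum, two multiplies
```

Both functions return the same sum, but the sparse version performs a multiply only for the non-zero activations. Hardware that exploits sparsity aims for the same saving at the level of individual cores.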

Cerebras has offered access to its Andromeda AI supercomputer since late 2022. This AI-specific supercomputer is built around 16 WSE-2 wafers, with more than 13 million AI cores in total and just over 18,000 third-generation AMD Epyc CPU cores. It can deliver over 1 exaflop of AI compute and 120 petaflops of dense compute. The highlight of the system has been its near-linear scaling of workloads. The system is available to academic and commercial users.

Groq

Groq is an AI-focused hardware company whose offerings range from a single PCIe card to full data center racks to its own cloud. Groq uses its own AI chip, the Tensor Streaming Processor (TSP), which is optimized for the large-scale matrix operations used in machine learning algorithms. The TSP’s optimizations for handling high-dimensional data accurately and with low latency make it a good choice for real-time AI operations.

The TSP is a hardware accelerator designed to efficiently perform the tensor operations commonly used in machine learning and other AI applications. Tensor operations such as matrix multiplication, convolution, and pooling are critical to many AI workloads. By accelerating them, TSPs can significantly improve performance and energy efficiency compared with traditional GPU-based ML/AI systems.
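For readers unfamiliar with the three tensor operations named above, the following is a small pure-Python sketch of each. It is not Groq code and does not use the Groq SDK; the function names are hypothetical, and the convolution assumes no padding and a stride of 1.

```python
def matmul(a, b):
    """Matrix multiplication of two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def conv2d(image, kernel):
    """2D cross-correlation (the 'convolution' of CNN layers), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(image, size):
    """Non-overlapping max pooling with a square window of the given size."""
    return [
        [
            max(image[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(image[0]) - size + 1, size)
        ]
        for i in range(0, len(image) - size + 1, size)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
feature_map = conv2d(image, kernel)
pooled = max_pool2d(image, 2)
print(product)      # [[19, 22], [43, 50]]
print(feature_map)  # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
print(pooled)       # [[6, 8], [14, 16]]
```

All three reduce to long runs of multiply-and-add or compare operations over regular grids of numbers, which is why dedicated hardware can speed them up so effectively.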

Groq offers an SDK, the GroqWare Developer Tools Package, along with consulting services to speed up development and a group of productivity tools, all focused on enabling developers and users to get the greatest possible benefit from their hardware.

Kneron

Kneron develops hardware and software products for on-device edge AI. These devices do their AI processing at the edge, on the device that needs the results, without requiring access to greater processing or storage in the cloud. Kneron markets to a variety of industries that can make use of AI on the edge, including autonomous driving, machine vision, and tasks that require semantic analysis.

Their latest hardware is the KL530, which integrates their next-generation NPU with a state-of-the-art image processing sensor in a single package. The package has exceptionally low power requirements and is designed to enable L1 and L2 autonomous driving capabilities, as well as being targeted at future generations of Artificial Intelligence of Things (AIoT) products.

Mythic

Mythic offers an AI-focused processor called the Mythic Analog Matrix Processor (AMP). Its M1076 processor is a low-power design that can be scaled from edge AI devices to server-class data center compute. The processor is based on the Mythic Analog Compute Engine (Mythic ACE), an array of compute tiles that integrate flash memory and ADCs; the tiles store the AI model parameters and perform low-power, high-performance matrix multiplication.

The company’s AMP processors use an architecture called "analog compute-in-memory," which combines memory and compute functions in a single chip. This enables the chips to perform computations with minimal data movement, resulting in faster processing and lower power consumption.
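The idea of computing where the weights are stored can be illustrated with a toy Python model. This is a conceptual sketch under stated assumptions, not Mythic's design: the class name `ComputeInMemoryTile`, the `adc_bits` and `full_scale` parameters, and the quantization scheme are all hypothetical choices made for illustration.

```python
class ComputeInMemoryTile:
    """Toy model: weights stay in the tile; only inputs and outputs move."""

    def __init__(self, weights, adc_bits=8, full_scale=16.0):
        self.weights = weights                 # stored once, like parameters held in memory cells
        self.levels = 2 ** (adc_bits - 1) - 1  # signed quantization levels (assumed)
        self.full_scale = full_scale           # assumed ADC input range, plus or minus
        self.step = full_scale / self.levels   # size of one quantization step

    def _adc(self, value):
        """Clip an analog sum to the ADC range and round it to the nearest level."""
        clipped = max(-self.full_scale, min(self.full_scale, value))
        return round(clipped / self.step) * self.step

    def matvec(self, inputs):
        """Each row accumulates its weighted inputs, then one ADC read digitizes the sum."""
        return [
            self._adc(sum(w * x for w, x in zip(row, inputs)))
            for row in self.weights
        ]


tile = ComputeInMemoryTile([[1.0, 2.0], [-1.0, 0.5]])
outputs = tile.matvec([1.0, 2.0])  # exact answers would be 5.0 and 0.0
print(outputs)
```

The weights are loaded once and never leave the tile, so each inference moves only a short input vector in and a short output vector out. The price in this sketch is a small quantization error at the ADC step, which stands in for the precision trade-offs an analog design has to manage.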

In March 2023, the company completed another round of funding, enabling it to move forward with its next-generation chip.

Qualcomm

When thinking of Qualcomm, what comes to mind for most people is its Snapdragon processors, the multicore ARM CPUs found in the world’s leading mobile phones that aren’t made by Apple. What most people don’t realize is that the Snapdragon CPU has support for AI processing, including NNP, which means that on-device machine learning is possible. Qualcomm is working on making AI ubiquitous and believes that mobile will be the pervasive AI platform.

Beyond that, Qualcomm offers purpose-built hardware under the name Qualcomm Cloud AI 100, designed for high-performance, low-power AI processing in the cloud and at the edge, with a focus on AI inference acceleration. Currently, the five models support PCIe Gen 4 for server slots and Gen 3 for the M.2 form factor, allowing data centers to add the capability to run inference more quickly and, thanks to the low-power design, more efficiently.

SambaNova Systems

SambaNova Systems offers SambaNova DataScale, a comprehensive software and hardware solution for accelerating AI workloads. DataScale consists of two primary components: the SambaNova Software Stack and the SambaNova System. The SambaNova Software Stack is a software platform that includes machine learning frameworks, libraries, and tools for building, training, and deploying AI models, and it enables developers to optimize easily for the SambaNova System. The SambaNova System is the hardware platform, built on a unique chip architecture called the Reconfigurable Dataflow Unit (RDU), which is optimized for AI workloads.

The RDU is a programmable chip that can be dynamically reconfigured to handle different types of AI workloads, and the system includes a high-speed interconnect fabric that allows multiple RDUs to be clustered and lets users build highly scalable systems.

SambaNova’s goal is to provide a comprehensive end-to-end solution for AI workloads, from data preparation and model development to training and inference, and one that can be deployed in a standardized form that fits into existing data center infrastructure.

Syntiant

Syntiant specializes in developing ultra-low-power AI chips for edge devices. Their chips are designed to perform machine learning tasks locally on the edge device, without requiring access to cloud or off-device resources. Syntiant Neural Decision Processors (NDPs) come in multiple versions explicitly designed for specific tasks such as always-on speech recognition, always-on vision, and combinations of tasks such as always-on speech, vision, and sensor processing. Voice activation, environmental control, and biometrics are some of the areas in which these processors are used.

The NDPs are flexible and can support a wide range of AI models and algorithms, including deep learning and convolutional neural networks. The company provides a software development kit that enables developers to easily integrate their AI models with the NDPs.

The Road Ahead

Despite all the media coverage of AI and its issues, the market is still in its infancy, and the opportunity for a new name to rise to the top tier is there. In the last few years, more than a dozen startups have announced a focus on introducing new AI devices, chips, technologies, software development kits, and a wide array of AI-driven solutions and products. The majority of those announcements came to nothing. But the technologies that are surviving are making AI more practical and usable across a broad range of industries.

There is yet to be a clear winner in the world of AI computers at the top end. For example, Argonne National Laboratory has Cerebras WSE, Groq, and SambaNova DataScale systems in use on AI projects. These are in addition to the forthcoming Aurora exascale supercomputer, being built in partnership with Intel and Hewlett Packard Enterprise, and the existing Polaris supercomputer, also built by HPE and using AMD CPUs and Nvidia GPUs.

New technology approaches may change the way we currently train and interact with AI. Understanding AI and ML is an important focus for any forward-looking business. Knowing how it works is just the starting point for understanding how it can best benefit your business.