
GAUDI2 PROCESSOR FOR DEEP LEARNING TRAINING AND INFERENCE WORKLOADS.

BORN FOR DEEP LEARNING.
RAISED TO A WHOLE NEW LEVEL.
The Gaudi2 processor significantly increases training and inference performance, building on the high-efficiency, first-generation Gaudi architecture that delivers up to 40% better price performance in the AWS cloud with Amazon EC2 DL1 instances and on-premises with the Supermicro Gaudi®2 AI Training Server. Gaudi2 takes training and inference performance to a whole new level with a leap from the first-gen Gaudi's 16nm process technology to 7nm. It increases the number of AI-customized Tensor Processor Cores from 8 to 24, adds support for FP8, and integrates a media processing engine that handles compressed media on chip, offloading the host subsystem. Gaudi2’s in-package memory has tripled to 96 GB of HBM2e at 2.45 terabytes per second of bandwidth.

Scaling Systems with Gaudi2
Networking capacity, efficiency and flexibility
Habana has made it cost-effective and easy for customers to scale out training and inference capacity, amplifying Gaudi2 bandwidth by integrating 24 on-chip 100-gigabit RDMA over Converged Ethernet (RoCE v2) ports, up from ten ports on first-gen Gaudi.
Twenty-one of the ports on every Gaudi2 processor are dedicated to connecting to the other seven processors in the server in an all-to-all, non-blocking configuration. The remaining three ports on every processor are dedicated to scale out, providing 2.4 terabits per second of scale-out networking throughput in the 8-card Gaudi2 server, the HLS-Gaudi2.
To simplify system design for its customers, Habana also offers an 8-Gaudi2 baseboard as a product. With industry-standard RoCE integrated on chip, customers can easily scale and configure Gaudi2 systems to suit their deep learning cluster requirements, from one Gaudi2 to thousands.
Because systems are built on widely used, industry-standard Ethernet connectivity, Gaudi2 lets customers choose from a wide array of Ethernet switching and related networking equipment, enabling cost savings. And the on-chip integration of the network interface controller (NIC) ports lowers component costs.
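The port figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not Habana tooling; the function and key names are made up for illustration, and the even split of the 21 scale-up ports across the seven peers is an assumption inferred from the all-to-all description.

```python
# Back-of-the-envelope check of the Gaudi2 port figures quoted above.
# All names here are illustrative; only the numbers come from the text.

PORT_GBPS = 100          # each RoCE port is 100 gigabit
PORTS_PER_CHIP = 24      # total on-chip ports per Gaudi2
SCALE_UP_PORTS = 21      # ports used inside the server
SCALE_OUT_PORTS = 3      # ports used between servers
CHIPS_PER_SERVER = 8     # 8-card HLS-Gaudi2 server


def port_summary() -> dict:
    """Derive per-peer and per-server bandwidth from the port counts."""
    assert SCALE_UP_PORTS + SCALE_OUT_PORTS == PORTS_PER_CHIP
    peers = CHIPS_PER_SERVER - 1
    # Assumption: the 21 scale-up ports are split evenly across 7 peers.
    ports_per_peer = SCALE_UP_PORTS // peers
    return {
        "ports_per_peer": ports_per_peer,
        "per_peer_gbps": ports_per_peer * PORT_GBPS,
        "scale_out_gbps_per_chip": SCALE_OUT_PORTS * PORT_GBPS,
        "scale_out_tbps_per_server": (
            SCALE_OUT_PORTS * PORT_GBPS * CHIPS_PER_SERVER / 1000
        ),
    }


if __name__ == "__main__":
    for name, value in port_summary().items():
        print(f"{name}: {value}")
```

Under that assumption, each pair of processors in the server is joined by three ports, and the 2.4 terabit figure is simply 3 ports × 100 gigabits × 8 processors.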
HLS GAUDI®2 SERVER
GAUDI®2 System Partners


SIMPLIFIED MODEL BUILD AND MIGRATION WITH SYNAPSEAI SOFTWARE SUITE
The Habana SynapseAI® Software Suite is optimized for deep learning model development and for easing the migration of existing GPU-based models to Gaudi platform hardware. It integrates the TensorFlow and PyTorch frameworks and a rapidly expanding array of computer vision, natural language processing and multi-modal models. Developers are supported with documentation, tools, how-to content and a community support forum on the Habana Developer Site, and with reference models and a model roadmap on the Habana GitHub. Getting started with model migration is as easy as adding two lines of code, and for expert users who wish to program their own kernels, Habana offers the full toolkit to do that as well.
SynapseAI software supports both training and inference on Gaudi2 with ease and flexibility. SynapseAI is also integrated with ecosystem partners such as Hugging Face, with its transformer model repositories and tools, Grid.ai PyTorch Lightning, and cnvrg.io MLOps software.
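As a rough illustration of what a small migration change can look like in a PyTorch script, here is a minimal Python sketch. It assumes that the Habana PyTorch integration is installed as the `habana_frameworks` package and exposes Gaudi as the `hpu` device; the helper `pick_device_name` is hypothetical, and the text above does not specify which two lines are meant.

```python
# Sketch only: choose a device name so the same script runs with or
# without the Habana PyTorch integration installed.
import importlib.util


def pick_device_name() -> str:
    """Return "hpu" if the habana_frameworks package is importable, else "cpu"."""
    found = importlib.util.find_spec("habana_frameworks") is not None
    return "hpu" if found else "cpu"


DEVICE_NAME = pick_device_name()
print(f"selected device: {DEVICE_NAME}")

# In an existing PyTorch training script, the change would then be
# along these lines (assumed usage, shown as comments because it
# needs PyTorch and, for "hpu", Gaudi hardware):
#
#   import habana_frameworks.torch.core as htcore   # when DEVICE_NAME == "hpu"
#   model = model.to(torch.device(DEVICE_NAME))
```

Falling back to `cpu` keeps the script usable on a development machine without Gaudi hardware; consult the Habana Developer Site for the supported migration steps.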

SynapseAI® Software Suite
Habana Developer Site
Habana GitHub
Habana Community Forum
Software Ecosystem


“As a world leader in automotive and driving assistance systems, training cutting-edge deep learning models is mission-critical to Mobileye's business and vision. As training such models is time consuming and costly, multiple teams across Mobileye have chosen to use Gaudi-accelerated training machines, either on Amazon EC2 DL1 instances or on-prem; those teams constantly see significant cost savings relative to existing GPU-based instances across model types, enabling them to achieve much better time-to-market for existing models or to train much larger and more complex models aimed at exploiting the advantages of the Gaudi architecture,” said Gaby Hayon, executive vice president of R&D at Mobileye. “We’re excited to see Gaudi2’s leap in performance, as our industry depends on the ability to push the boundaries with large-scale, high-performance deep learning training accelerators.”

“The rapid-pace R&D required to tame COVID demonstrates an urgent need our medical and health sciences customers have for fast, efficient deep learning training of medical imaging data sets, when hours and even minutes count, to unlock disease causes and cures. We expect Gaudi2, building on the speed and cost-efficiency of Gaudi1, to provide customers with dramatically accelerated model training, while preserving the DL efficiency we experienced with first-gen Gaudi,” said Chetan Paul, CTO for Health and Human Services at Leidos.
“We are excited to bring our next-generation AI deep learning server to market, featuring the high-performance Gaudi2 accelerators enabling our customers to achieve faster time-to-train, efficient scaling with industry-standard Ethernet connectivity and improved TCO,” said Charles Liang, president and CEO of Supermicro. “We are committed to collaborating with Intel and Habana to deliver leadership AI solutions optimized for deep-learning in the cloud and data center.”
“We congratulate Habana on the launch of its new high-performance, 7nm Gaudi2 accelerator. We look forward to collaborating on the turnkey AI solution consisting of our DDN AI400X2 storage appliance combined with Supermicro Gaudi®2 AI Training Servers to help enterprises with large, complex deep learning workloads unlock meaningful business value with simple but powerful storage,” said Paul Bloch, president and co-founder of DataDirect Networks.