INTRODUCING THE GAUDI2 PROCESSOR FOR TRAINING DEEP LEARNING WORKLOADS.
BUILT ON THE HIGH-EFFICIENCY GAUDI ARCHITECTURE, NOW IN 7nm
BORN FOR DEEP LEARNING.
RAISED TO A WHOLE NEW LEVEL.
Now, the Gaudi2 processor significantly increases training performance. It builds on the high-efficiency, first-generation Gaudi architecture, which delivers up to 40% better price performance in the AWS cloud with Amazon EC2 DL1 instances and on-premises with the Supermicro Gaudi®2 AI Training Server. Gaudi2 takes training performance to a whole new level with a leap from first-gen Gaudi’s 16nm process technology to 7nm. It increases the number of AI-customized Tensor Processor Cores from 8 to 24, adds support for FP8, and integrates a media processing engine that processes compressed media on chip, offloading the host subsystem. Gaudi2’s in-package memory has tripled to 96 GB of HBM2e with 2.45 terabytes per second of bandwidth.
NATURAL LANGUAGE PROCESSING
Scaling Systems with Gaudi2
Networking capacity, efficiency and flexibility
Habana has made it cost-effective and easy for customers to scale out training capacity. Gaudi2 amplifies training bandwidth by integrating 24 100-Gigabit RDMA over Converged Ethernet (RoCE v2) ports on chip, up from ten ports on first-gen Gaudi.
Twenty-one of the ports on every Gaudi2 processor are dedicated to connecting to the other seven processors in an all-to-all, non-blocking configuration within the server, which works out to three ports per peer. The remaining three ports on every processor are dedicated to scale-out, providing 2.4 terabits per second of scale-out networking throughput in the 8-card Gaudi server, the HLS-Gaudi2.
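The port arithmetic above can be checked with a short calculation. The sketch below uses only the figures quoted in the text (24 ports per processor, 21 for scale-up, 3 for scale-out, 100 Gigabits per port, 8 processors per server). The function name and dictionary keys are illustrative and are not part of any Habana tool.

```python
def port_budget(ports_per_chip=24, scale_up_ports=21, scale_out_ports=3,
                port_gbps=100, chips_per_server=8):
    """Split one processor's Ethernet ports into scale-up and scale-out budgets.

    All default values come from the text; the names are hypothetical.
    """
    # Every port is either a scale-up port or a scale-out port.
    if scale_up_ports + scale_out_ports != ports_per_chip:
        raise ValueError("scale-up and scale-out ports must add up to the total")

    # All-to-all inside the server: each processor talks to every other one.
    peers = chips_per_server - 1
    links_per_peer = scale_up_ports // peers

    # Scale-out throughput summed over all processors in the server, in Tb/s.
    scale_out_tbps = scale_out_ports * port_gbps * chips_per_server / 1000

    return {
        "peers": peers,
        "links_per_peer": links_per_peer,
        "scale_out_tbps": scale_out_tbps,
    }


budget = port_budget()
print(budget)
```

With the default figures, this gives three 100-Gigabit links to each of the seven peers and 2.4 terabits per second of scale-out throughput per server, matching the numbers in the text.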
To simplify system design for its customers, Habana also offers an 8-Gaudi2 baseboard as a product. With industry-standard RoCE integrated on chip, customers can easily scale and configure Gaudi2 systems to suit their deep learning cluster requirements, from one Gaudi2 to thousands.
Because Gaudi2 systems are built on widely used, industry-standard Ethernet connectivity, customers can choose from a wide array of Ethernet switching and related networking equipment, which can reduce costs. The on-chip integration of the Network Interface Controller (NIC) ports also lowers component costs.
HLS GAUDI®2 SERVER
SIMPLIFIED MODEL BUILD AND MIGRATION WITH SYNAPSEAI SOFTWARE SUITE
The Habana SynapseAI® Software Suite is optimized for deep learning model development and for easy migration of existing GPU-based models to Gaudi platform hardware. It integrates the TensorFlow and PyTorch frameworks and 37 reference models, covering primarily computer vision and natural language processing. Developers are supported with documentation, tools, how-to content and a community support forum on the Habana Developer Site, and with reference models and a model roadmap on the Habana GitHub. Getting started with model migration is as easy as adding two lines of code, and for expert users who wish to program their own kernels, Habana offers the full toolkit to do that as well.
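The text does not say which two lines are meant. As a hedged illustration, the sketch below assumes a PyTorch script and the `habana_frameworks` package that ships with SynapseAI: the two added lines are the import of the Habana PyTorch bridge and the selection of the `hpu` device. The helper `select_device_name` is hypothetical and exists only so the sketch also runs on a machine without Gaudi software installed.

```python
import importlib.util


def select_device_name() -> str:
    """Return "hpu" if the Habana PyTorch bridge package is present, else "cpu".

    Assumption: SynapseAI installs its framework bridges as the
    `habana_frameworks` package. find_spec returns None when it is absent.
    """
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


# Typical use inside an existing PyTorch training script. It is shown as
# comments so this sketch runs without PyTorch or Gaudi hardware:
#
#   import torch
#   import habana_frameworks.torch.core as htcore   # added line 1
#   device = torch.device("hpu")                    # added line 2
#   model = model.to(device)
#
print(select_device_name())
```

The rest of the training loop would stay as written for the GPU version, apart from moving tensors to the selected device. Details vary by framework and SynapseAI release, so the Habana documentation remains the reference.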
SynapseAI software supports training models on Gaudi2 and running inference on any target, including Intel® Xeon® processors, Habana® Greco™ or Gaudi2 itself. SynapseAI is also integrated with ecosystem partners such as Hugging Face, with its transformer model repositories and tools, Grid.ai PyTorch Lightning and cnvrg.io MLOps software.
“As a world leader in automotive and driving assistance systems, training cutting-edge deep learning models is mission-critical to Mobileye business and vision. As training such models is time-consuming and costly, multiple teams across Mobileye have chosen to use Gaudi-accelerated training machines, either on Amazon EC2 DL1 instances or on-prem; those teams constantly see significant cost savings relative to existing GPU-based instances across model types, enabling them to achieve much better time-to-market for existing models or to train much larger and more complex models aimed at exploiting the advantages of the Gaudi architecture,” said Gaby Hayon, executive vice president of R&D at Mobileye. “We’re excited to see Gaudi2’s leap in performance, as our industry depends on the ability to push the boundaries with large-scale, high-performance deep learning training accelerators.”
“The rapid-pace R&D required to tame COVID demonstrates an urgent need our medical and health sciences customers have for fast, efficient deep learning training of medical imaging data sets, when hours and even minutes count, to unlock disease causes and cures. We expect Gaudi2, building on the speed and cost-efficiency of Gaudi1, to provide customers with dramatically accelerated model training, while preserving the DL efficiency we experienced with first-gen Gaudi,” said Chetan Paul, CTO for Health and Human Services at Leidos.
“We’re excited to bring our next-generation AI deep learning server to market featuring the high-performance 7nm Gaudi2 processor that will enable our customers to achieve faster time-to-train advantages while preserving the efficiency and expanding on the scalability of first-generation Gaudi,” said Charles Liang, CEO of Supermicro.
“We congratulate Habana on the launch of its new high-performance, 7nm Gaudi2 accelerator. We look forward to collaborating on the turnkey AI solution consisting of our DDN AI400X2 storage appliance combined with Supermicro Gaudi®2 AI Training Servers to help enterprises with large, complex deep learning workloads unlock meaningful business value with simple but powerful storage,” said Paul Bloch, president and co-founder of DataDirect Networks.