
GAUDI2 PROCESSOR FOR DEEP LEARNING TRAINING AND INFERENCE WORKLOADS.

BUILT ON THE HIGH-EFFICIENCY GAUDI ARCHITECTURE, NOW IN 7nm

BORN FOR DEEP LEARNING.
RAISED TO A WHOLE NEW LEVEL.

The Gaudi2 processor significantly increases training and inference performance. It builds on the high-efficiency, first-generation Gaudi architecture, which delivers up to 40% better price-performance in the AWS cloud with Amazon EC2 DL1 instances and on-premises with the Supermicro Gaudi®2 AI Training Server. Gaudi2 takes training and inference performance to a whole new level with a leap from first-gen Gaudi's 16nm process technology to 7nm. It triples the number of AI-customized Tensor Processor Cores from 8 to 24, adds FP8 support, and integrates a media processing engine that handles compressed media, offloading that work from the host subsystem. Gaudi2's in-package memory has tripled to 96 GB of HBM2e with 2.45 Terabytes-per-second bandwidth.

See what customers and partners are saying about Gaudi2 >

Read the Gaudi2 whitepaper >

2X
These Gaudi2 advances result in ~2X the throughput of the A100 on ResNet-50 and BERT.

Computer Vision

ResNet-50 throughput measurement based on:

A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=256; Phase-2: seq len=512, BS=40, accu steps=768)
A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC   (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096)       
Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.5.0 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
Results may vary

NATURAL LANGUAGE PROCESSING

BERT Effective Training Throughput: Combining Phase-1 & Phase-2 
A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=256; Phase-2: seq len=512, BS=40, accu steps=768)
A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC   (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096)     
Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.5.0 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
Results may vary

Natural Language Processing

BERT Phase-1 and Phase-2 throughput measurement based on:
A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=256; Phase-2: seq len=512, BS=40, accu steps=768)
A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 22.03-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096)
Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.5.0 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048)
Results may vary.
For a topline view of Gaudi2, please watch >

Scaling Systems with Gaudi2

Networking capacity, efficiency and flexibility

Habana has made it cost-effective and easy for customers to scale out training and inference capacity, amplifying Gaudi2 bandwidth by integrating 24 100-Gigabit RDMA over Converged Ethernet (RoCE) ports on chip, an increase from ten ports on first-gen Gaudi.

Twenty-one of the ports on every Gaudi2 processor are dedicated to connecting to the other seven processors in an all-to-all, non-blocking configuration within the server. The remaining three ports on every processor are dedicated to scale-out, providing 2.4 Terabits-per-second of networking throughput in the 8-card Gaudi2 server, the HLS-Gaudi2.
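The 2.4 Terabit figure follows directly from the port counts above: 8 cards, each dedicating 3 of its 100-Gigabit ports to scale-out. A quick sketch of the arithmetic:

```python
# Scale-out bandwidth of the 8-card HLS-Gaudi2 server, using the figures above
cards_per_server = 8
scale_out_ports_per_card = 3   # of 24 total ports; the other 21 connect in-server peers
gbps_per_port = 100            # 100-Gigabit RoCE ports

total_gbps = cards_per_server * scale_out_ports_per_card * gbps_per_port
print(total_gbps / 1000, "Terabits per second")  # → 2.4 Terabits per second
```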

To simplify system design for its customers, Habana also offers an 8-Gaudi2 baseboard as a product. With the integration of industry-standard RoCE on chip, customers can easily scale and configure Gaudi2 systems to suit their deep learning cluster requirements, from one to thousands of Gaudi2 processors.

With system implementation on widely used, industry-standard Ethernet connectivity, Gaudi2 lets customers choose from a wide array of Ethernet switching and related networking equipment, enabling cost savings. The on-chip integration of the Network Interface Controller (NIC) ports further lowers component costs.

To see how easy it is to scale systems with a single-card to thousands of Gaudi2s, watch this >

HLS GAUDI®2 SERVER

Customers who want to build custom servers can begin with the HLS-Gaudi2 reference server shown here.

GAUDI®2 System Partners

For more information about Gaudi2 on-premises solutions contact us >
Supermicro Server
> 8 Gaudi2 Mezzanine Cards
> 2 Intel® Xeon® Scalable Processors
DDN Storage
> Supporting advanced AI storage
> To learn more about the Gaudi and DDN storage solution, see the whitepaper

SIMPLIFIED MODEL BUILD AND MIGRATION WITH SYNAPSE AI SOFTWARE SUITE


The Habana SynapseAI® Software Suite is optimized for deep learning model development and for easing migration of existing GPU-based models to Gaudi platform hardware. It integrates the TensorFlow and PyTorch frameworks and a rapidly expanding array of computer vision, natural language processing, and multi-modal models. Developers are supported with documentation and tools, how-to content, and a community support forum on the Habana Developer Site, and with reference models and a model roadmap on the Habana GitHub. Getting started with model migration is as easy as adding 2 lines of code, and for expert users who wish to program their own kernels, Habana offers the full toolkit to do that as well.
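As an illustration of what the two-line migration looks like in PyTorch, here is a minimal sketch. The `habana_frameworks.torch` module name follows Habana's public SynapseAI documentation; the CPU fallback is our own addition so the snippet also runs on machines without Gaudi hardware installed.

```python
# Minimal sketch of a two-line PyTorch migration to Gaudi.
# Assumption: module path per Habana's SynapseAI docs; falls back to CPU
# when the SynapseAI stack is not installed, so this stays runnable anywhere.
try:
    import habana_frameworks.torch.core as htcore  # line 1: load the Gaudi bridge
    device = "hpu"                                 # line 2: target the Gaudi device
except ImportError:
    device = "cpu"  # fallback for machines without the SynapseAI software stack

# With `device` set, an existing training script only changes where tensors
# and modules are placed, e.g.:
#   model = model.to(device)
#   batch = batch.to(device)
print("target device:", device)
```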

SynapseAI software supports training and inference models on Gaudi2 with ease and flexibility. SynapseAI is also integrated with ecosystem partners such as Hugging Face with transformer model repositories and tools, Grid.ai PyTorch Lightning, and cnvrg.io MLOps software.

Please see more SynapseAI details here >


SynapseAI® Software Suite

Optimized for deep learning model development and for easing migration of existing GPU-based models to Gaudi platform hardware. It integrates the TensorFlow and PyTorch frameworks and 30+ reference models, covering primarily computer vision and natural language processing. Getting started with model migration is as easy as adding 2 lines of code, and for expert users who wish to program their own kernels, Habana offers the full toolkit and libraries to do that as well. SynapseAI software supports training models on first-gen Gaudi and Gaudi2 and running inference on any target, including Intel® Xeon® processors, Habana® Greco™, or Gaudi2 itself.

Habana Developer Site

Developers are supported with documentation and tools, how-to content, tutorials, and updates on training opportunities, from webinars to hands-on trainings, to make using Habana processors and the software platform as fast and easy as possible.

Habana GitHub

Our GitHub is the go-to hub for developers to access our Gaudi- and Gaudi2-integrated reference models, plan ahead with our open model roadmap, and file issues and bugs with the Habana team and the GitHub community.

Habana Community Forum

Our Developer Site houses the Habana Community Forum, another channel where developers can access relevant information and topics, log opinions and engage with industry peers.

Software Ecosystem

SynapseAI is also integrated with ecosystem partners such as Hugging Face with transformer model repositories and tools, Grid.ai PyTorch Lightning, and cnvrg.io MLOps software.
For more information on how we can help make it easy for you to build on the Habana Gaudi platform, please see our Developer Site >
What customers and partners are saying about Gaudi:

“As a world leader in automotive and driving-assistance systems, training cutting-edge deep learning models is mission-critical to Mobileye's business and vision. As training such models is time-consuming and costly, multiple teams across Mobileye have chosen to use Gaudi-accelerated training machines, either on Amazon EC2 DL1 instances or on-prem. These teams consistently see significant cost savings relative to existing GPU-based instances across model types, enabling them to achieve much better time-to-market for existing models or to train much larger and more complex models aimed at exploiting the advantages of the Gaudi architecture,” said Gaby Hayon, executive vice president of R&D at Mobileye. “We’re excited to see Gaudi2’s leap in performance, as our industry depends on the ability to push the boundaries with large-scale, high-performance deep learning training accelerators.”


“The rapid-pace R&D required to tame COVID demonstrates an urgent need our medical and health sciences customers have for fast, efficient deep learning training of medical imaging data sets, when hours and even minutes count, to unlock disease causes and cures. We expect Gaudi2, building on the speed and cost-efficiency of first-gen Gaudi, to provide customers with dramatically accelerated model training, while preserving the DL efficiency we experienced with first-gen Gaudi,” said Chetan Paul, CTO Health and Human Services at Leidos.


“We are excited to bring our next-generation AI deep learning server to market, featuring the high-performance Gaudi2 accelerators, enabling our customers to achieve faster time-to-train, efficient scaling with industry-standard Ethernet connectivity, and improved TCO,” said Charles Liang, president and CEO of Supermicro. “We are committed to collaborating with Intel and Habana to deliver leadership AI solutions optimized for deep learning in the cloud and data center.”


“We congratulate Habana on the launch of its new high-performance, 7nm Gaudi2 accelerator. We look forward to collaborating on the turnkey AI solution consisting of our DDN AI400X2 storage appliance combined with Supermicro Gaudi®2 AI Training Servers to help enterprises with large, complex deep learning workloads unlock meaningful business value with simple but powerful storage,” said Paul Bloch, president and co-founder of DataDirect Networks.