Habana Gaudi AI Processors to bring lower cost-to-train to Amazon EC2 customers

by Eitan Medina, Santa Clara, CA, United States

Today, in the re:Invent CEO keynote, Amazon Web Services announced EC2 instances that will leverage up to 8 Gaudi accelerators and deliver up to 40% better price performance than current GPU-based EC2 instances for machine learning workloads. Availability of Gaudi-based EC2 instances is targeted for the first half of 2021.

In this morning’s keynote, AWS CEO Andy Jassy underscored the rapidly expanding demand across industry sectors for high-performance, yet more affordable AI workloads. With the company’s plans to introduce new EC2 instances featuring Gaudi for deep learning training, AWS will further reduce the cost of training AI models and lower the total cost of operations for customers who want to leverage the business insights, efficiencies and enhanced end-user experiences that AI can provide.

An 8-card Gaudi solution can process about 12,000 images per second when training the ResNet-50 model on TensorFlow. Each Gaudi processor integrates 32 GB of HBM2 memory and features on-chip RoCE integration for inter-processor connectivity inside the server. Scaling across servers will be enabled using AWS Elastic Fabric Adapter (EFA) technology, allowing AWS and its customers to seamlessly expand use of multiple Gaudi-based systems for efficient, scalable distributed training.
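The scaling story above can be turned into a rough, back-of-the-envelope capacity estimate. The sketch below is ours, not AWS’s or Habana’s: it takes the quoted ~12,000 images/sec for one 8-card server and a hypothetical scaling-efficiency factor (real efficiency depends on the model, batch size and fabric) to project aggregate ResNet-50 throughput across servers.

```python
# Illustrative estimate of aggregate training throughput when scaling
# Gaudi servers over EFA. The 12,000 img/s per-server figure comes from
# the text above; the efficiency factor is a hypothetical placeholder.

PER_SERVER_IMGS_PER_SEC = 12_000  # ResNet-50 on one 8-Gaudi server (quoted above)

def aggregate_throughput(num_servers: int, scaling_efficiency: float = 0.95) -> float:
    """Projected images/sec across `num_servers`, assuming each added
    server contributes `scaling_efficiency` of its standalone throughput."""
    if num_servers < 1:
        raise ValueError("need at least one server")
    # First server runs at full rate; each additional server is discounted.
    return PER_SERVER_IMGS_PER_SEC * (1 + (num_servers - 1) * scaling_efficiency)

print(aggregate_throughput(1))  # 12000.0
print(aggregate_throughput(4))  # 46200.0 (vs. 48000 for perfect linear scaling)
```

At 95% assumed efficiency, four servers deliver roughly 96% of ideal linear throughput, which is the kind of near-linear behavior the announcement describes.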

Habana’s SynapseAI Software Suite is designed to facilitate high-performance DL training on Habana’s Gaudi processor. Popular DL frameworks such as TensorFlow and PyTorch are integrated with SynapseAI and optimized for Gaudi. Developers will have open access to Gaudi software, reference models and documentation. Reference models will be hosted publicly in Habana’s GitHub repository and will include popular models for diverse applications such as image classification, object detection, natural language processing and recommendation systems. The SynapseAI software suite includes Habana’s graph compiler and runtime, the Tensor Processor Core (TPC) kernel library, firmware and drivers, and developer tools such as the TPC SDK for custom kernel development and the SynapseAI Profiler. For more information on enabling the new AWS EC2 instances on Gaudi, please see our whitepaper.
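As a concrete illustration of the framework integration described above, the sketch below probes for Habana’s PyTorch bridge and falls back to CPU when it is absent, so the same script runs on and off Gaudi hardware. The module path `habana_frameworks.torch.core` follows Habana’s public documentation and is an assumption on our part, not something specified in this announcement.

```python
# Sketch of how SynapseAI's PyTorch bridge is typically enabled.
# NOTE: the module path `habana_frameworks.torch.core` is taken from
# Habana's public docs and is an assumption; it is not defined in this post.

def gaudi_available() -> bool:
    """Report whether the SynapseAI PyTorch bridge can be imported."""
    try:
        import habana_frameworks.torch.core  # noqa: F401 -- registers the HPU backend
        return True
    except ImportError:
        return False

# On a Gaudi-equipped instance, models and tensors are then placed on the
# "hpu" device much as one would use "cuda" today, e.g. model.to("hpu").
device = "hpu" if gaudi_available() else "cpu"
print(device)
```

Because the bridge plugs into the stock framework, existing training scripts largely need only this device selection changed, which is the point of the SynapseAI framework integration.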

Habana will build on Gaudi’s current efficiencies with its next-generation Gaudi2, built on TSMC’s 7nm process, making AI training applications and services even more accessible to a breadth of customers, data scientists and researchers.

For more information about the advantages of AI training on Gaudi, please see habana.ai/training.

The price performance claim is made by AWS and based on AWS internal testing. Habana Labs does not control or audit third-party data; your costs and results may vary.

Gaudi performance is based on an 8-Gaudi server configuration, the HLS-1, and SynapseAI software release 0.11.

For more information about Habana, visit habana.ai.


Deep-Dive into Habana’s GAUDI AI Training Processor

With the launch of the GAUDI processor, Habana has brought two critically important sets of advantages to the world of AI computing. GAUDI delivers extraordinary AI training throughput that scales near-linearly from a single card to more than a thousand processors, a capability that enables optimized system performance and total cost of ownership. GAUDI also features unprecedented on-chip integration of RDMA over Converged Ethernet (RoCE) based on open Ethernet standards, boosting system-wide performance and providing extremely flexible scale-out and scale-up options that let datacenter and cloud customers avoid the lock-in associated with proprietary interconnects.

For a deeper dive into GAUDI’s unique design and capability, check out the GAUDI whitepaper or see Eitan Medina’s session on scaling with GAUDI.