by Eitan Medina, Santa Clara, CA, United States
Today, in the re:Invent CEO Keynote, Amazon Web Services announced EC2 instances that will use up to 8 Gaudi accelerators and deliver up to 40% better price performance than current GPU-based EC2 instances for machine learning workloads. Availability of Gaudi-based EC2 instances is targeted for the first half of 2021.
In this morning’s keynote, AWS CEO Andy Jassy underscored the massively expanding demand across industry sectors for AI workloads that are high-performance yet more affordable. With its plans to introduce new EC2 instances featuring Gaudi for deep learning training, AWS will further reduce the cost of training AI models and lower total cost of operations for customers who want to leverage the business insights, efficiencies and enhanced end-user experiences that AI can provide.
An 8-card Gaudi EC2 instance can process about 12,000 images per second when training the ResNet-50 model on TensorFlow. Each Gaudi processor integrates 32GB of HBM2 memory and on-chip RoCE, which is used for inter-processor connectivity inside the server. Scaling across servers will be enabled by the AWS Elastic Fabric Adapter (EFA) technology, allowing AWS and its customers to seamlessly expand their use of multiple Gaudi-based systems for efficient and scalable distributed training.
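To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. The throughput, card count and memory size come from the paragraph above. The dataset size (the ImageNet-1k training set, commonly used with ResNet-50) and the assumption that throughput divides evenly across cards are ours, not part of the announcement, and the function names are purely illustrative.

```python
# Figures quoted in the announcement.
IMAGES_PER_SECOND = 12_000   # approximate, 8-card instance, ResNet-50 on TensorFlow
CARDS = 8
HBM2_GB_PER_CARD = 32

# Assumption (not from the announcement): ImageNet-1k training set size.
DATASET_IMAGES = 1_281_167


def per_card_throughput(total_ips: float, cards: int) -> float:
    """Average images per second per card, assuming an even split across cards."""
    return total_ips / cards


def seconds_per_epoch(dataset_images: int, total_ips: float) -> float:
    """Time for one pass over the dataset at a steady throughput."""
    return dataset_images / total_ips


def total_hbm_gb(cards: int, gb_per_card: int) -> int:
    """Aggregate HBM2 capacity across all cards in the instance."""
    return cards * gb_per_card


if __name__ == "__main__":
    print(f"Per-card throughput: {per_card_throughput(IMAGES_PER_SECOND, CARDS):.0f} images/s")
    print(f"Aggregate HBM2:      {total_hbm_gb(CARDS, HBM2_GB_PER_CARD)} GB")
    print(f"One epoch:           {seconds_per_epoch(DATASET_IMAGES, IMAGES_PER_SECOND):.0f} s")
```

Under these assumptions, each card averages roughly 1,500 images per second and a single pass over the dataset takes a little under two minutes. Real training runs include data loading, evaluation and other overheads, so treat this as an estimate only.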
Habana’s SynapseAI Software Suite is designed to facilitate high-performance DL training on Habana’s Gaudi processor. Popular DL frameworks such as TensorFlow and PyTorch are integrated with SynapseAI and optimized for Gaudi. Developers will have open access to Gaudi software, reference models and documentation. Reference models will be hosted publicly in Habana’s GitHub repository and will include popular models for diverse applications such as image classification, object detection, natural language processing and recommendation systems. The SynapseAI Software Suite includes Habana’s graph compiler and runtime, Tensor Processor Core (TPC) kernel library, firmware and drivers, and developer tools such as the TPC SDK for custom kernel development and the SynapseAI Profiler. For more information on enabling the new Gaudi-based AWS EC2 instances, please see our whitepaper.
Habana will build on Gaudi’s current efficiencies with its next-generation Gaudi2, built on TSMC’s 7nm process, making AI training applications and services even more accessible to a broad range of customers, data scientists and researchers.
For more information about the advantages of AI training on Gaudi, please see habana.ai/training
The price performance claim is made by AWS and based on AWS internal testing. Habana Labs does not control or audit third-party data; your costs and results may vary.
Gaudi performance is based on an 8-Gaudi server configuration, the HLS-1, and SynapseAI software release 0.11.