
Gaudi-Based Amazon EC2 DL1 Training Instances

Habana’s Gaudi Accelerator technology powers new Amazon EC2 DL1 instances for training deep learning models.
All In on price/performance
Delivering up to 40% better price performance than comparable GPU-based training instances, Amazon EC2 DL1 instances make training models in the cloud more accessible to customers, enabling them to leverage the insights, efficiencies, and enhanced end-user experiences that AI computer vision and natural language applications can provide.
To learn how to set up and run Habana-based Amazon EC2 DL1 training instances, visit Habana’s Developer Site.
All In on usability
To help customers build new models on Gaudi or migrate existing GPU-based models to it, we provide developers with SynapseAI®, Habana’s Gaudi-optimized software platform with integrated TensorFlow and PyTorch frameworks. Documentation, how-to videos, tools, and other resources are available on Habana’s Developer Site, and models, scripts, and source code are available on Habana’s GitHub.
The new EC2 DL1 instance, powered by eight Gaudi accelerators, can be launched using the AWS Deep Learning AMIs, Amazon EKS and Amazon ECS for containerized applications, or Amazon SageMaker.
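As a sketch of the Deep Learning AMI route, the snippet below uses boto3’s EC2 `run_instances` call to request one `dl1.24xlarge` instance. The AMI ID, key-pair name, and region are placeholders you must supply yourself: choose a Deep Learning AMI that bundles SynapseAI, in a region that offers DL1. The helper names `build_dl1_launch_request` and `launch_dl1` are hypothetical; boto3 and configured AWS credentials are needed only for the actual launch.

```python
from typing import Any, Dict

# Instance size named in the benchmark footnotes on this page (eight Gaudi accelerators).
DL1_INSTANCE_TYPE = "dl1.24xlarge"


def build_dl1_launch_request(ami_id: str, key_name: str) -> Dict[str, Any]:
    """Keyword arguments for EC2 RunInstances that request exactly one DL1 instance.

    ami_id and key_name are placeholders: pass a Deep Learning AMI that
    includes SynapseAI and an existing EC2 key pair in your account.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": DL1_INSTANCE_TYPE,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_dl1(ami_id: str, key_name: str, region: str) -> str:
    """Launch one DL1 instance and return its instance ID.

    Requires boto3 and AWS credentials; region must be one where DL1 is offered.
    """
    import boto3  # imported here so the request builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_dl1_launch_request(ami_id, key_name))
    return response["Instances"][0]["InstanceId"]
```

Building the request in its own function keeps the launch parameters easy to inspect or test without touching AWS.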

Price/performance by the numbers
Below are the details, based on publicly available performance metrics and pricing, behind the up to 40% better price performance of Amazon EC2 DL1 instances based on Gaudi accelerators.
ResNet-50 training (TensorFlow)
GPU baseline model: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/ConvNets/resnet50v1.5
(**) Measured by Habana on an AWS EC2 dl1.24xlarge instance, using the DLAMI integrating the SynapseAI 1.0.1-81 TensorFlow 2.5.1 container from Habana’s Vault. Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/computer_vision/Resnets/resnet_keras
Based on pricing published at https://aws.amazon.com/ec2/pricing/on-demand
Your measured performance results may vary.

BERT training (TensorFlow)
GPU baseline model: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT
(**) Measured by Habana on an AWS EC2 dl1.24xlarge instance, using the DLAMI integrating the SynapseAI 1.0.1-81 TensorFlow 2.5.1 container from Habana’s Vault. Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/nlp/bert
Based on pricing published at https://aws.amazon.com/ec2/pricing/on-demand
Your measured performance results may vary.
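The measured figures behind these comparisons are not reproduced in this text, but the arithmetic of a price/performance comparison is simple: divide training throughput by the hourly on-demand price, then compare the two ratios. The sketch below assumes that definition (training samples per dollar); the function names and the example inputs are hypothetical and are not Habana’s or AWS’s measured results.

```python
SECONDS_PER_HOUR = 3600.0


def samples_per_dollar(samples_per_sec: float, price_per_hour: float) -> float:
    """Training samples processed per dollar of on-demand instance time."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return samples_per_sec * SECONDS_PER_HOUR / price_per_hour


def price_performance_gain(candidate: float, baseline: float) -> float:
    """Fractional improvement in samples per dollar (0.40 means 40% better)."""
    return candidate / baseline - 1.0


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measured results.
    baseline = samples_per_dollar(samples_per_sec=1000.0, price_per_hour=20.0)
    candidate = samples_per_dollar(samples_per_sec=700.0, price_per_hour=10.0)
    print(f"price/performance gain: {price_performance_gain(candidate, baseline):.0%}")
```

With these made-up inputs the candidate processes 252,000 samples per dollar against the baseline’s 180,000, a ratio of 1.4, i.e. 40% better price performance even though its raw throughput is lower. This is why price/performance comparisons need both the throughput and the published hourly price for each instance.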
For a deep dive on the Amazon EC2 DL1 instances powered by Gaudi processors, watch this video.
Customer references
Seagate® Technology has been a global leader offering data storage and management solutions for over 40 years. Seagate’s data science and machine learning engineers have built an advanced deep learning (DL) defect detection system and deployed it globally across the company’s manufacturing facilities. In a recent proof of concept, Habana Gaudi exceeded the performance targets for training one of the DL semantic segmentation models currently used in production at Seagate.
“We expect the significant price/performance advantage of Amazon EC2 DL1 instances, powered by Habana Gaudi accelerators, could make a compelling future addition to AWS compute clusters,” said Seagate’s Senior Engineering Director of Operations and Technology, Advanced Analytics, Darrell Louder. “As Habana Labs continues to evolve and enables broader coverage of operators, there is potential for expanding to additional enterprise use cases, and thereby harnessing additional cost savings.”


Leidos is recognized as a Top 10 Health IT provider delivering a broad range of customizable and scalable solutions to hospitals, biomedical organizations, and every U.S. federal agency focused on health.
“One of the numerous technologies we are enabling to advance healthcare today is the use of machine learning and deep learning for disease diagnosis based on medical imaging data. Our massive data sets require timely and efficient training to aid researchers seeking to solve some of the most urgent medical mysteries. Given Leidos’ and its customers’ need for quick, easy, and cost-effective training for deep learning models, we are excited to have begun this journey with Intel to use Amazon EC2 DL1 instances based on Habana Gaudi AI processors. Using DL1 instances, we expect an increase in model training speed and efficiency, with a subsequent reduction in risk and cost of research and development,” said Chetan Paul, CTO Health and Human Sciences at Leidos.
Riskfuel provides real-time valuations and risk sensitivities to companies managing financial portfolios, helping them increase trading accuracy and performance.
“Two factors drew us to Amazon EC2 DL1 instances based on Habana Gaudi AI accelerators. First, we want to make sure our banking and insurance clients can run Riskfuel models that take advantage of the newest hardware. We found migrating our models to DL1 instances to be simple and straightforward – really, it was just a matter of changing a few lines of code. Second, training costs are a big component of our spending, and the promise of up to 40% improvement in price/performance offers potentially substantial benefit to our bottom line,” said Ryan Ferguson, CEO of Riskfuel.


Fractal is a global leader in artificial intelligence and analytics, powering decisions in Fortune 500 companies.
“AI and deep learning are at the core of our Machine Vision capability, enabling customers to make better decisions across industries we serve. In order to improve accuracy, data sets are becoming larger and more complex, requiring larger and more complex models. This is driving the need for improved compute price-performance,” said Srikanth Velamakanni, Group CEO of Fractal. “The new Amazon EC2 DL1 instances promise significantly lower cost training than GPU-based EC2 instances. We expect this to make training of AI models on cloud much more cost competitive and accessible than before for a broad array of clients.”
Intel’s 3D Athlete Tracking (3DAT) is a solution that utilizes deep learning to analyze athlete-in-action videos, captured in real time, to inform human performance workflows and enhance the audience viewing experience during sports competitions.
“Training our models on Amazon EC2 DL1 instances, powered by Gaudi accelerators from Habana Labs, will enable us to accurately and reliably process thousands of videos and generate associated performance data, while lowering training cost,” said Rick Echevarria, Vice President, Sales and Marketing Group, Intel. “With DL1 instances, we can now train at the speed and cost required to productively serve athletes, teams, and broadcasters of all levels across a variety of sports.”

The price performance claim is made by AWS and based on AWS internal testing. Habana Labs does not control or audit third-party data; your price performance may vary.