Fine-Tuning Llama2-70B with DeepSpeed ZeRO-3 and Low-Rank Adaptation (LoRA) on Intel® Gaudi®2 AI Accelerator
With Habana’s SynapseAI 1.13.0 release, users can fine-tune the Llama2 70B model using only 8 Gaudi2 accelerators; a minimal LoRA and ZeRO-3 configuration sketch follows below.
DeepSpeed, Fine Tuning, Llama, LoRA
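As a rough illustration of the two techniques in the title, here is a minimal sketch that attaches a LoRA adapter with the Hugging Face peft library and expresses a DeepSpeed ZeRO-3 configuration as a Python dict. The 7B stand-in checkpoint, target modules, ranks, and batch sizes are illustrative assumptions, not the recipe from the post.

```python
# Minimal sketch: LoRA adapter via peft + a ZeRO-3 DeepSpeed config dict.
# The 7B checkpoint below is a lightweight stand-in for Llama2-70B, and all
# hyperparameters are illustrative rather than the recipe from the post.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",        # stand-in for the 70B checkpoint (gated model)
    torch_dtype=torch.bfloat16,
)

# LoRA: train small low-rank adapters instead of updating all base weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ZeRO-3 shards parameters, gradients, and optimizer states across the workers,
# which is what makes a 70B model trainable on 8 accelerators.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
# ds_config would be passed to the training entry point, e.g. via the
# `deepspeed` argument of a (Gaudi)Trainer or written out as a JSON file.
```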
Training Llama and Bloom 13 Billion Parameter LLMs with 3D Parallelism on Habana® Gaudi2®
One of the main challenges in training Large Language Models (LLMs) is that they are often too large to fit on a single node, or, even when they do fit, training may be too slow. To address this, training can be parallelized across multiple Gaudi accelerators (HPUs); the sketch below illustrates how the devices are factored into the three parallelism dimensions.
3D-Parallelism, DeepSpeed, GenAI, Large Language Models
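As a quick, explanatory sketch of the arithmetic behind 3D parallelism (not code from the post): the total device count must equal the product of the data-, tensor-, and pipeline-parallel degrees.

```python
# Explanatory sketch of the 3D-parallelism device layout (not code from the post).
# A cluster of HPUs is factored into data- (DP), tensor- (TP), and pipeline-parallel
# (PP) groups, so the number of devices must equal DP * TP * PP.

def describe_3d_layout(num_devices: int, tp: int, pp: int) -> dict:
    """Return the implied data-parallel degree for a given TP/PP split."""
    if num_devices % (tp * pp) != 0:
        raise ValueError("num_devices must be divisible by tp * pp")
    dp = num_devices // (tp * pp)
    return {"tensor_parallel": tp, "pipeline_parallel": pp, "data_parallel": dp}

# Example: 64 devices split as TP=8 within a node and PP=4 across nodes
# leaves DP=2 replicas of the model (numbers are illustrative).
print(describe_3d_layout(num_devices=64, tp=8, pp=4))
```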
Porting a model to Megatron-DeepSpeed with Habana Gaudi
If you want to train a large model using Megatron-DeepSpeed, but the model you want is not included in the implementation, you can port it to the Megatron-DeepSpeed package. Assuming your model is transformer-based, you can add your implementation easily by basing it on the existing code; a generic transformer-block sketch follows below.
3D-Parallelism, DeepSpeed, GenAI, Large Language Models
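For orientation only, here is a plain PyTorch transformer block of the kind such a port would re-express in terms of the existing Megatron-DeepSpeed layers (attention, MLP, layer norm); this is generic PyTorch, not the Megatron-DeepSpeed API.

```python
# Generic PyTorch transformer block, shown only to illustrate the kind of module
# a port would map onto the existing Megatron-DeepSpeed layers. This is NOT the
# Megatron-DeepSpeed API.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm attention with a residual connection.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Pre-norm MLP with a residual connection.
        return x + self.mlp(self.ln2(x))
```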
Optimizing Large Language Model Inference on Gaudi2 with Hugging Face Optimum-Habana
We have optimized additional Large Language Models on Hugging Face using the Optimum Habana library.
DeepSpeed, Hugging Face, Inference
BLOOM 176B Inference on Habana Gaudi2
With support for DeepSpeed Inference added in Habana’s SynapseAI 1.8.0 release, users can run inference on large language models, including BLOOM 176B; a generic DeepSpeed Inference sketch follows below.
BLOOM, DeepSpeed, Inference
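For readers who want a feel for the code shape, here is a rough sketch of the generic DeepSpeed Inference call with tensor-parallel sharding. The smaller BLOOM checkpoint and the settings are illustrative, and the exact Gaudi2/SynapseAI launch recipe is described in the post itself.

```python
# Rough sketch of DeepSpeed Inference with tensor-parallel sharding.
# Meant to be started with the DeepSpeed launcher across the devices;
# the smaller checkpoint and parallelism degree here are illustrative.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"   # stand-in for the full 176B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Shard the model across devices for inference (tensor parallelism).
ds_engine = deepspeed.init_inference(
    model,
    mp_size=8,                  # number of devices the model is sharded across
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)
model = ds_engine.module        # underlying (now sharded) Hugging Face model

inputs = tokenizer("DeepSpeed Inference on large models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```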
Pre-Training the BERT 1.5B model with DeepSpeed
In this post, we show you how to run Habana’s DeepSpeed-enabled BERT 1.5B model from our Model-References repository; a minimal DeepSpeed training-loop sketch follows below.
BERT, DeepSpeed, developer, Gaudi, Gaudi2, pytorch, synapseai
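As a minimal sketch of the DeepSpeed training-loop pattern that such pre-training builds on (meant to be started with the DeepSpeed launcher; the tiny stand-in model and ZeRO settings are illustrative, not the Model-References recipe):

```python
# Minimal DeepSpeed training-loop sketch. The tiny model stands in for the
# BERT 1.5B network, and the config values are illustrative.
import torch
import deepspeed

model = torch.nn.Sequential(          # stand-in for the real network
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "zero_optimization": {"stage": 1},   # shard optimizer states across workers
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One illustrative training step; a real run is started with the `deepspeed`
# launcher so that all workers participate.
x = torch.randn(8, 1024, device=model_engine.device)
loss = model_engine(x).pow(2).mean()
model_engine.backward(loss)
model_engine.step()
```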
Fine tuning GPT2 with Hugging Face and Habana Gaudi
In this tutorial, we will demonstrate fine-tuning a GPT2 model on Habana Gaudi AI processors using the Hugging Face optimum-habana library with DeepSpeed; a sketch of the GaudiTrainer pattern follows below.
DeepSpeed, developer, Fine Tuning, Gaudi, GPT, GPT2, Hugging Face
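Here is a minimal sketch of the GaudiTrainer pattern from optimum-habana combined with a DeepSpeed config. Dataset preparation is omitted and the hyperparameters and config path are illustrative, so treat it as a sketch of the API shape rather than the tutorial's exact recipe.

```python
# Minimal sketch of fine-tuning GPT2 with optimum-habana's GaudiTrainer and a
# DeepSpeed config. Dataset preparation is omitted; values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

training_args = GaudiTrainingArguments(
    output_dir="./gpt2-finetuned",
    use_habana=True,             # run on Gaudi HPUs
    use_lazy_mode=True,          # SynapseAI lazy execution mode
    per_device_train_batch_size=4,
    num_train_epochs=1,
    deepspeed="ds_config.json",  # path to a DeepSpeed config file (assumed to exist)
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=GaudiConfig.from_pretrained("Habana/gpt2"),
    args=training_args,
    train_dataset=None,          # replace with a tokenized dataset
    tokenizer=tokenizer,
)
# trainer.train()  # started via the distributed launcher shown in the tutorial
```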
Memory-Efficient Training on Habana® Gaudi® with DeepSpeed
One of the key challenges in Large Language Model (LLM) training is reducing memory requirements without sacrificing compute/communication efficiency or model accuracy; the sketch below shows the rough per-device memory arithmetic that ZeRO addresses.
DeepSpeed, developer, Gaudi, Large Language Models
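As a back-of-the-envelope illustration of why ZeRO helps, based on the well-known estimates for mixed-precision Adam training (and ignoring activations and memory fragmentation):

```python
# Approximate per-device memory for mixed-precision Adam training of P parameters:
# ~2P bytes of bf16/fp16 weights, ~2P bytes of gradients, and ~12P bytes of fp32
# optimizer state (master weights, momentum, variance). ZeRO-1/2/3 partition
# progressively more of that state across the data-parallel workers.

def per_device_gb(params_billion: float, num_devices: int, stage: int) -> float:
    p = params_billion * 1e9
    weights, grads, optim = 2 * p, 2 * p, 12 * p
    if stage >= 1:
        optim /= num_devices      # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= num_devices      # ZeRO-2: also shard gradients
    if stage >= 3:
        weights /= num_devices    # ZeRO-3: also shard parameters
    return (weights + grads + optim) / 1e9

# Example: a 13B-parameter model trained across 8 devices (numbers are approximate).
for stage in (0, 1, 2, 3):
    print(f"ZeRO-{stage}: ~{per_device_gb(13, 8, stage):.1f} GB per device")
```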