Efficient scale-out for the large-scale generative AI era
Scale deep learning with native integration of 24x 100 Gigabit Ethernet ports on every Gaudi 2 accelerator
Architected to scale deep learning acceleration from one to thousands of processors, Gaudi 2 integrates 24x 100 Gigabit Ethernet ports on every accelerator, giving it substantial scaling capacity. Integrated Ethernet connectivity brings tangible advantages in flexibility, efficiency and cost.
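As a quick check on the port figures above, the aggregate Ethernet line rate per accelerator can be worked out directly. This is a minimal sketch: the constant and function names are illustrative, and the result is the raw per-direction line rate implied by 24 ports at 100 Gb/s each, not a measured or achievable throughput.

```python
# Figures taken from the text: 24 ports, each 100 Gigabit Ethernet.
PORTS_PER_ACCELERATOR = 24
PORT_SPEED_GBPS = 100  # gigabits per second, per port


def aggregate_bandwidth_gbps(ports: int = PORTS_PER_ACCELERATOR,
                             port_speed_gbps: int = PORT_SPEED_GBPS) -> int:
    """Return the summed line rate of all ports, in gigabits per second."""
    return ports * port_speed_gbps


total_gbps = aggregate_bandwidth_gbps()
total_tbps = total_gbps / 1000          # terabits per second
total_gigabytes_per_s = total_gbps / 8  # gigabytes per second (8 bits per byte)

print(f"{total_gbps} Gb/s = {total_tbps} Tb/s = {total_gigabytes_per_s} GB/s")
```

Note the difference between terabits and terabytes: the same aggregate is 2.4 Tb/s but only 300 GB/s.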
Gaudi 2 Accelerator Architecture
Gaudi 2 Server Architecture
Advantages of Gaudi 2 Integrated Connectivity
Gaudi 2’s integrated networking gives the customer greater versatility and freedom to build the system that fits their needs
Large-scale capacity
- All-to-all scale-up
- 2.4 Tb/s scale-out
- Reduces network bottlenecks
Flexible
- Non-proprietary, industry standard
- Scale from one to thousands of Gaudi 2s
- Wide variety of Ethernet switches for build-out
Efficient
- Most data centers already utilize Ethernet
- Ease of build with industry standard
- Cost-efficient with breadth of Ethernet choices
Control
- Industry standard
- No vendor lock-in to proprietary solutions
- Cost control through choice of system equipment
Scale-out scenarios
Diagram legend: Required Components; Optional Components / Examples
Small Pod Architecture
- 16-40 Gaudi 2s (2-5 servers with 8x Gaudi 2 each)
Large Pod Architecture
- Supports variable ratio of nodes-to-switches
- 3x 400G switches + up to 128 Gaudi 2s (16 servers with 8x Gaudi 2 each)
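The pod sizes quoted above follow from the 8x Gaudi 2 per server figure. A minimal sketch of that arithmetic, with illustrative names that are not part of any Gaudi tooling:

```python
ACCELERATORS_PER_SERVER = 8  # from the text: 8x Gaudi 2 per server


def accelerators_in_pod(servers: int) -> int:
    """Return the accelerator count for a pod with the given number of servers."""
    if servers < 1:
        raise ValueError("a pod needs at least one server")
    return servers * ACCELERATORS_PER_SERVER


# Small pod: 2 to 5 servers.
small_pod_sizes = [accelerators_in_pod(n) for n in range(2, 6)]

# Large pod: up to 16 servers.
large_pod_max = accelerators_in_pod(16)

print(small_pod_sizes)  # [16, 24, 32, 40]
print(large_pod_max)    # 128
```

The results match the ranges stated for the small pod (16-40) and the large pod (up to 128).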
Gaudi 2 MegaPod Architecture
Large clusters can be easily built using multiple MegaPods
- 8 Gaudi 2 servers & 3 switches
- 8x Gaudi 2 per node (server)
- Leaf switches are placed close to Gaudi 2 nodes to minimize communication latency
- Copper cables can be used within the pod
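To illustrate how a cluster might be sized from MegaPods, the sketch below estimates how many are needed for a target accelerator count. It uses only the per-MegaPod figures listed above (8 servers, 3 switches, 8x Gaudi 2 per server). The class and function names are hypothetical, and any additional switches needed to interconnect MegaPods are not covered by the text and are deliberately left out.

```python
import math
from dataclasses import dataclass

# Per-MegaPod figures from the text.
SERVERS_PER_MEGAPOD = 8
SWITCHES_PER_MEGAPOD = 3
ACCELERATORS_PER_SERVER = 8
ACCELERATORS_PER_MEGAPOD = SERVERS_PER_MEGAPOD * ACCELERATORS_PER_SERVER  # 64


@dataclass
class ClusterEstimate:
    """Hypothetical sizing result; excludes any inter-MegaPod switching."""
    megapods: int
    servers: int
    pod_switches: int
    accelerators: int


def size_cluster(target_accelerators: int) -> ClusterEstimate:
    """Round a target accelerator count up to a whole number of MegaPods."""
    if target_accelerators < 1:
        raise ValueError("target must be at least one accelerator")
    megapods = math.ceil(target_accelerators / ACCELERATORS_PER_MEGAPOD)
    return ClusterEstimate(
        megapods=megapods,
        servers=megapods * SERVERS_PER_MEGAPOD,
        pod_switches=megapods * SWITCHES_PER_MEGAPOD,
        accelerators=megapods * ACCELERATORS_PER_MEGAPOD,
    )


estimate = size_cluster(1000)
print(estimate)
```

For a target of 1,000 accelerators this rounds up to 16 MegaPods, which is 128 servers, 48 in-pod switches and 1,024 Gaudi 2s.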