Goya + Glow

April 4, 2019

Habana Goya Inference Processor is the first AI processor to implement and open source the Glow comp

Keynote Announcement

September 18, 2018

See the Goya launch keynote from the AI Hardware Summit

GOYA PERFORMANCE ON BERT

Workload: Task: Question Answering, Dataset: SQuAD, Base Model, Layers=12, Hidden Size=768, Heads=12, Intermediate Size=3,072, Max Seq Len=128
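The workload parameters above can be captured as a small configuration object. The following Python sketch is illustrative only: the class and method names are hypothetical, and the weight count assumes the standard Transformer encoder layout (four attention projection matrices and two feed-forward matrices per layer), which this page does not spell out. It ignores biases, embeddings, and layer normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertWorkload:
    """Hypothetical container for the workload parameters listed above."""
    layers: int = 12
    hidden_size: int = 768
    heads: int = 12
    intermediate_size: int = 3072
    max_seq_len: int = 128

    def head_dim(self) -> int:
        # Width of each attention head: hidden size split evenly across heads.
        return self.hidden_size // self.heads

    def encoder_matmul_weights(self) -> int:
        # Assumption: standard Transformer encoder layer.
        # Attention: Q, K, V and output projections, each hidden x hidden.
        attention = 4 * self.hidden_size * self.hidden_size
        # Feed-forward: hidden -> intermediate and intermediate -> hidden.
        feed_forward = 2 * self.hidden_size * self.intermediate_size
        return self.layers * (attention + feed_forward)


if __name__ == "__main__":
    cfg = BertWorkload()
    print("head dimension:", cfg.head_dim())
    print("encoder matmul weights:", f"{cfg.encoder_matmul_weights():,}")
```

Under these assumptions each head is 64 wide and the encoder matrices hold 84,934,656 weights, which gives a feel for the size of the model being benchmarked.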

Goya Configuration:
Hardware: Goya HL-100; CPU Xeon Gold 6152@2.10GHz
Software: Ubuntu v-16.04.4; SynapseAI v-0.2.0-1173
Precision: 16-bit

GPU Configuration:
Hardware: T4; CPU Xeon Gold 6154@3GHz/16GB/4 VMs
Software: Ubuntu-18.04.2.x86_64-gnu; CUDA Ver 10.1, cuDNN 7.5; TensorRT-5.1.5.0
Precision: 16-bit

GOYA IMAGE CLASSIFICATION ON RESNET-50

Goya Measurement:
Hardware: Goya HL-100 PCIe Card; CPU XEON E5
Software: Ubuntu v-16.04; SynapseAI v-0.1.6
Workload implementation: Precision INT8; Batch size 10;

GPU Measurement:
Hardware Configuration: T4; Host Supermicro SYS-4029GP – TRT T4
Software Configuration: TensorRT 5.1; Synthetic dataset; Container – 19.03-py3;
Workload implementation: Precision INT8; Batch size 128;

RESNET-50 THROUGHPUT & LATENCY

Processor   Batch Size   Images Per Second   Latency (ms)
Goya        1            7,466               0.2
Goya        4            13,221              0.9
Goya        8            14,546              0.9
Goya        10           15,453              1
Goya        128          15,453              1
T4          1            1,037               0.96
T4          2            1,710               1.2
T4          8            3,730               2.2
T4          128          5,013               26
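
As a rough cross-check of the table above, the following Python sketch computes the Goya-to-T4 throughput ratio at the batch sizes both processors list, and the per-batch time that throughput alone would imply. The figures are copied from the table. The dictionary layout and function names are illustrative assumptions, not part of any Habana or NVIDIA tooling.

```python
# Figures copied from the RESNET-50 THROUGHPUT & LATENCY table.
# Mapping: batch size -> (images per second, reported latency in ms).
GOYA = {1: (7466, 0.2), 4: (13221, 0.9), 8: (14546, 0.9),
        10: (15453, 1.0), 128: (15453, 1.0)}
T4 = {1: (1037, 0.96), 2: (1710, 1.2), 8: (3730, 2.2), 128: (5013, 26.0)}


def throughput_ratio(batch_size):
    """Goya images/s divided by T4 images/s at a batch size both rows list."""
    return GOYA[batch_size][0] / T4[batch_size][0]


def implied_batch_time_ms(batch_size, images_per_second):
    """Time to process one batch if throughput were the only factor."""
    return batch_size / images_per_second * 1000.0


if __name__ == "__main__":
    # Only batch sizes present in both rows can be compared directly.
    for bs in sorted(set(GOYA) & set(T4)):
        print(f"batch {bs}: Goya/T4 throughput ratio = {throughput_ratio(bs):.2f}x")
    t4_ips, t4_latency = T4[128]
    print(f"T4 batch 128: implied {implied_batch_time_ms(128, t4_ips):.1f} ms, "
          f"reported {t4_latency} ms")
```

At batch size 128 the ratio works out to about 3.08x. The implied per-batch time is only a sanity check and need not match the reported latency, since the page does not describe how latency was measured.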

MLPERF PUBLISHED GOYA PERFORMANCE RESULTS

Habana Reported MLPerf Inference Results for the Goya Processor in the Available Category

Some of the key distinctions assessed are:

Product status

  • Available – available now for purchase/deployment
  • Preview – on a path to availability; not yet there
  • Research – to test, learn and iterate

Test specification adherence

  • Closed – tested to match the specification, enabling comparison with other closed results
  • Open – tested to a set of parameters defined by the vendor to present product results in the most favorable light
  • Number of accelerators contained in the tested solution
  • Type of host processor employed

Other important factors, such as power, were not included in the measurements.

The Goya performance shown here was reported in the Available and Closed categories.

For more details, see the MLPerf industry-wide results and whitepaper.

Highest AI inference throughput speeds with lowest power

>3x Images-per-second (IPS) and 2x IPS/Watt vs. GPUs

24/12/2018
Linley Microprocessor Report

Market-leading inference performance, regardless of batch size

18/02/2019
Linley Microprocessor Report

General-purpose AI Processor (AIP) supporting any topology

  • Image recognition
  • Sentiment analysis
  • Neural machine translation
  • Recommendation system

Goya Hardware

Goya
  • PCIe: Gen4 x16 (16 lanes)
  • Form Factor: Dual-slot
  • Memory: 4/8/16GB with ECC

GOYA HL-100 PCIe Card Datasheet

8-card server: Typical deployment

For more details on Goya's performance, see the Goya whitepaper.

Goya’s software platform

Delivers full programmability and support for popular frameworks

Goya Software Stack

 

SynapseAI and Habana development tools speed and ease development

Goya profiler and debugger screenshots