GOYA™ PERFORMANCE ON BERT
Workload: Task: Question Answering, Dataset: SQuAD, Base Model, Layers=12, Hidden Size=768, Heads=12, Intermediate Size=3,072, Max Seq Len=128
Goya Configuration:
Hardware: Goya HL-100; CPU Xeon Gold [email protected]
Software: Ubuntu v-16.04.4; SynapseAI v-0.2.0-1173
Precision: 16-bit
GPU Configuration:
Hardware: T4; CPU Xeon Gold [email protected]/16GB/4 VMs
Software: Ubuntu-18.04.2.x86_64-gnu; CUDA Ver 10.1, cudnn7.5; TensorRT-5.1.5.0
Precision: 16-bit
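As a reading aid, the workload parameters listed for this benchmark can be written down as a small configuration object. This is only an illustrative sketch: the class name `BertWorkload` and its field names are hypothetical and do not come from SynapseAI or TensorRT. The one derived quantity, the per-head dimension, follows from splitting the hidden size evenly across the attention heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertWorkload:
    """Hypothetical container for the BERT Base parameters quoted in the workload line."""
    layers: int = 12
    hidden_size: int = 768
    heads: int = 12
    intermediate_size: int = 3072
    max_seq_len: int = 128

    @property
    def head_dim(self) -> int:
        # The hidden size is divided evenly across the attention heads.
        if self.hidden_size % self.heads != 0:
            raise ValueError("hidden_size must be divisible by heads")
        return self.hidden_size // self.heads


cfg = BertWorkload()
print(f"per-head dimension: {cfg.head_dim}")
print(f"feed-forward expansion: {cfg.intermediate_size // cfg.hidden_size}x")
```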
GOYA™ IMAGE CLASSIFICATION ON RESNET-50
Goya Measurement:
Hardware: Goya HL-100 PCIe Card; CPU XEON E5
Software: Ubuntu v-16.04; SynapseAI v-0.1.6
Workload implementation: Precision INT8; Batch size 10;
GPU Measurement:
Hardware Configuration: T4; Host Supermicro SYS-4029GP-TRT T4
Software Configuration: TensorRT 5.1; Synthetic dataset; Container – 19.03-py3;
Workload implementation: Precision INT8; Batch size 128;
RESNET-50 THROUGHPUT & LATENCY
| Processor | Batch Size | Images Per Second | Latency (ms) |
|---|---|---|---|
| Goya | 1 | 7,466 | 0.2 |
| Goya | 4 | 13,221 | 0.9 |
| Goya | 8 | 14,546 | 0.9 |
| Goya | 10 | 15,453 | 1 |
| Goya | 128 | 15,453 | 1 |
| T4 | 1 | 1,037 | 0.96 |
| T4 | 2 | 1,710 | 1.2 |
| T4 | 8 | 3,730 | 2.2 |
| T4 | 128 | 5,013 | 26 |
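The throughput figures in the table can be compared directly. The short sketch below copies the published numbers and computes the Goya-to-T4 throughput ratio at the batch sizes both processors report (1, 8, and 128). The names `RESULTS`, `throughput_at`, and `speedup` are illustrative only, and the ratios are plain arithmetic on the table, not additional measurements.

```python
# (batch size, images per second, latency in ms), copied from the table above.
RESULTS = {
    "Goya": [(1, 7466, 0.2), (4, 13221, 0.9), (8, 14546, 0.9),
             (10, 15453, 1.0), (128, 15453, 1.0)],
    "T4": [(1, 1037, 0.96), (2, 1710, 1.2), (8, 3730, 2.2), (128, 5013, 26.0)],
}


def throughput_at(device: str, batch: int) -> int:
    """Return the images-per-second figure reported for a device and batch size."""
    for reported_batch, images_per_second, _latency_ms in RESULTS[device]:
        if reported_batch == batch:
            return images_per_second
    raise KeyError(f"no result for {device} at batch size {batch}")


def speedup(goya_batch: int, t4_batch: int) -> float:
    """Ratio of Goya throughput to T4 throughput at the given batch sizes."""
    return throughput_at("Goya", goya_batch) / throughput_at("T4", t4_batch)


# Compare at the batch sizes that both processors report.
for batch in (1, 8, 128):
    print(f"batch {batch:>3}: Goya/T4 throughput ratio = {speedup(batch, batch):.2f}")
```

Comparing at matched batch sizes keeps the ratio like-for-like. The measurement notes above list Goya at batch size 10 and T4 at batch size 128, which corresponds to `speedup(10, 128)`.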
MLPERF PUBLISHED GOYA™ PERFORMANCE RESULTS
Habana Reported MLPerf Inference Results for the Goya Processor in Available Category
Some of the key distinctions assessed are:
Product status
- Available – available now for purchase/deployment
- Preview – on a path to availability; not yet there
- Research – to test, learn and iterate
Test specification adherence
- Closed – tested to match the specification, enabling comparison with other closed results
- Open – tested to a set of parameters defined by the vendor to present product results in the most favorable light
Number of accelerators contained in the tested solution
Type of host processor employed
Other important factors, such as power, were not included in the measurements.
Goya performance shown here was reported in the Available and Closed categories.
For more details, see the MLPerf industry-wide results and whitepaper.

Goya Hardware™

- PCIe: Gen4 x 16 lanes
- Form Factor: Dual-slot
- Memory: 4/8/16GB with ECC
8-card server: Typical deployment
