While the Habana GOYA™ AI Inference processor is relatively new to AI processing, having been introduced only in September 2018, its performance is redefining what customers can expect from a processor that’s custom-designed and optimized for AI inference.
On the ResNet-50 benchmark, GOYA outpaces its closest rival, Nvidia’s T4 processor, by a factor of more than 3. GOYA delivers inference throughput of 15,393 images per second, compared with the 4,944 images per second that Nvidia reports for the T4. As the numbers show, 3x makes a tangible difference: 3 times faster processing = 3 times quicker turnaround on deep learning workloads = a 3 times increase in productivity.
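To make the throughput gap concrete, here is a minimal Python sketch that works through the arithmetic using only the two ResNet-50 figures quoted above. The workload size of one million images is a hypothetical value chosen purely for illustration, and the calculation assumes each processor sustains its quoted throughput for the whole run.

```python
# Throughput figures quoted above (images per second, ResNet-50).
GOYA_IPS = 15_393
T4_IPS = 4_944  # as reported by Nvidia, per the text

# Ratio of the two throughput figures.
speedup = GOYA_IPS / T4_IPS

# Hypothetical workload size, chosen only for illustration.
NUM_IMAGES = 1_000_000

# Time to process the workload, assuming sustained quoted throughput.
goya_seconds = NUM_IMAGES / GOYA_IPS
t4_seconds = NUM_IMAGES / T4_IPS

print(f"Throughput ratio: {speedup:.2f}x")
print(f"Time for {NUM_IMAGES:,} images: "
      f"GOYA {goya_seconds:.1f} s, T4 {t4_seconds:.1f} s")
```

Because processing time is inversely proportional to throughput, the time ratio equals the throughput ratio, which is where the "3 times quicker" claim comes from.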
The key factors in assessing inference performance are throughput, power efficiency, latency and the ability to support small batch sizes. In this same ResNet-50 benchmark, GOYA offered power efficiency of 149 images-per-second-per-Watt (IPS/W) vs. the T4’s 71 IPS/W, roughly 2.1 times better. GOYA also delivers latency of 1.01 ms (well below the industry requirement of 7 ms) vs. the T4’s whopping 26 ms. In addition, GOYA’s performance is linear and sustained even at small batch sizes.
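The efficiency and latency comparison can be checked the same way. The sketch below is a rough illustration built only from the figures quoted above; the dictionary layout and the `meets_target` helper are hypothetical names introduced for this example, not part of any Habana or Nvidia tooling.

```python
# Figures quoted above for the ResNet-50 benchmark.
GOYA = {"ips_per_watt": 149, "latency_ms": 1.01}
T4 = {"ips_per_watt": 71, "latency_ms": 26.0}

# Industry latency requirement cited in the text, in milliseconds.
LATENCY_TARGET_MS = 7.0


def meets_target(latency_ms: float, target_ms: float = LATENCY_TARGET_MS) -> bool:
    """Hypothetical helper: True if the latency is within the target."""
    return latency_ms <= target_ms


# Higher is better for efficiency, lower is better for latency.
efficiency_ratio = GOYA["ips_per_watt"] / T4["ips_per_watt"]
latency_ratio = T4["latency_ms"] / GOYA["latency_ms"]

print(f"Power efficiency ratio: {efficiency_ratio:.2f}x")
print(f"Latency ratio: {latency_ratio:.1f}x")
print(f"GOYA within {LATENCY_TARGET_MS} ms target: {meets_target(GOYA['latency_ms'])}")
print(f"T4 within {LATENCY_TARGET_MS} ms target: {meets_target(T4['latency_ms'])}")
```

Note that the two ratios are computed in opposite directions, since a larger IPS/W figure is better while a smaller latency figure is better.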
For more information on the GOYA AI Inference Processor, check out the whitepaper.