Intel's Habana overtakes Nvidia in latest MLPerf results

Intel’s Habana has overtaken Nvidia in the latest MLPerf benchmark results. MLPerf has become the industry-standard benchmark suite for comparing AI accelerators. Nvidia has already announced its next-generation GPUs, and the results show that competition in deep learning training hardware is intensifying.

Intel acquired the startup Habana for $2 billion in late 2019, and at the end of last year its first-generation 16nm Gaudi NPU (Neural Processing Unit) became available in Amazon’s AWS cloud, where it boasted 40% higher performance per dollar than Nvidia-based instances. However, because it was competing with Nvidia’s 7nm A100, Habana delivered that value mostly by charging a lower price rather than by beating Nvidia on performance.

This changed in May, when Habana announced Gaudi2, built on a 7nm process. It triples the number of tensor processor cores and provides up to 96GB of HBM2e memory. Habana claimed that it outperforms Nvidia’s two-year-old flagship data center GPU, the A100, by a comfortable margin. The announcement came just in time for inclusion in the latest MLPerf results, an industry effort to standardize deep learning benchmarks.

Performance results

According to Habana, it had only 10 days after launch to submit results, so it was not possible to run all eight tests. It submitted only the two most widely known benchmarks: ResNet-50 (image recognition) and BERT (natural language processing). MLPerf submissions go through a one-month peer review process.

Habana also said that the short timeline meant there was not yet time to thoroughly optimize the software. For example, Gaudi2 adds support for a new low-precision FP8 format, which was not used in the submissions. Instead, Habana chose to submit results based on the same software that is available to all Habana customers, whereas Nvidia's submissions used optimizations that are not available in the software it ships to customers.

This means that the performance gap is much larger when the software is not specially optimized. In Habana’s own tests using public repositories on Azure instances, Habana measured Gaudi2 to be at least twice as fast as the A100 on both ResNet-50 and BERT. Habana claims that these results better represent the out-of-the-box performance that customers see with published software.

The MLPerf results show that Gaudi2 trained ResNet-50 in 36% less time than Nvidia’s submission, which corresponds to 56% higher performance. However, keep in mind that the MLPerf results from MosaicML, a deep learning startup using PyTorch, show a training time of 23.8 minutes, which is slower than Gaudi2 but beats Nvidia’s own submission. On the other hand, further software optimization may reduce Gaudi2’s time in future submissions.
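The relationship between "36% less time" and "56% higher performance" follows from performance being inversely proportional to training time. The following is a minimal Python sketch of that arithmetic. The function name is hypothetical and chosen only for illustration; the 36% figure is the one quoted above.

```python
def speedup_from_time_reduction(time_reduction: float) -> float:
    """Convert a fractional reduction in training time into a speedup factor.

    Assumes performance is simply the inverse of time-to-train, so finishing
    in (1 - r) of the time means running 1 / (1 - r) times as fast.
    """
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - time_reduction)


# ResNet-50 figure quoted in the article: 36% less training time.
resnet_speedup = speedup_from_time_reduction(0.36)
print(f"{resnet_speedup:.4f}")                 # 1.5625
print(f"{(resnet_speedup - 1.0) * 100:.0f}%")  # 56%
```

The same conversion explains why a percentage reduction in time always understates the equivalent percentage gain in performance.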

On BERT, the win was smaller: Gaudi2’s training time was 7% shorter than the A100’s. Compared to the first-generation Gaudi, Gaudi2 was 3x and 4.7x faster on ResNet-50 and BERT, respectively. The results for all accelerators are based on 8-card servers. Habana also showed results for a 256-accelerator system, which delivered about 25x higher performance against a theoretical scaling limit of 32x. This demonstrates that performance largely holds up in the scale-out configurations where these chips are frequently deployed.
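The scaling figures can be turned into an efficiency number. The sketch below assumes the 32x limit is the ratio of 256 accelerators to the 8-card baseline server and that the roughly 25x gain is measured against that same baseline. The article does not state this explicitly, and the function name is hypothetical.

```python
def scaling_efficiency(measured_speedup: float,
                       large_system_size: int,
                       baseline_size: int) -> float:
    """Return measured speedup as a fraction of ideal linear scaling.

    Ideal (linear) scaling is taken to be the ratio of accelerator counts.
    """
    if baseline_size <= 0 or large_system_size <= 0:
        raise ValueError("system sizes must be positive")
    ideal_speedup = large_system_size / baseline_size
    return measured_speedup / ideal_speedup


# Assumption: ~25x measured on 256 accelerators relative to one 8-card server.
efficiency = scaling_efficiency(25.0, 256, 8)
print(f"{efficiency:.1%}")  # 78.1%
```

Under these assumptions, the 256-accelerator system retains a little under four fifths of ideal linear scaling.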

What’s next

The thesis of most AI startups has been that they could beat Nvidia by discarding everything GPU-specific and focusing solely on AI hardware. Although it had only a few days between its official launch and the submission of results, Habana’s Gaudi2 defeated Nvidia’s A100, which is likewise built on 7nm process technology, using shipping hardware and off-the-shelf software. Habana further argues that on non-optimized code outside MLPerf, the performance advantage can be more than 2x. Habana is likely to price Gaudi2 below Nvidia’s A100, and each Gaudi2 chip also integrates 24 100G Ethernet ports, so the difference in total cost of ownership could be even greater, as Habana and AWS have already claimed for the first Gaudi generation.

Habana may have won the performance crown in this round, but Nvidia has already announced its next-generation H100, which will be available later this year. Habana has not yet announced a cloud instance for Gaudi2.