Inspur AI Servers Demonstrate Leading Performance in the Latest MLPerf™ Training v1.0 Benchmarks
Inspur improves on its results from previous MLPerf™ Training benchmarks, setting four single node performance records in Image Classification, NLP, Object Detection (lightweight) and Recommendation.
SAN JOSE, Calif., July 9, 2021 – MLCommons™, a well-known open engineering consortium, recently released new results for MLPerf™ Training v1.0, the organization’s machine learning training performance benchmark suite. Inspur topped the list in four out of eight tasks in the Single Node Closed division of MLPerf™ Training v1.0.
Inspur’s Single Node Performance Results
MLPerf™ Training v1.0
MLPerf™ is the leading industry benchmark for AI performance, developed in 2018. Inspur is a founding member of MLCommons™, alongside more than 50 other leading organizations and companies from across the artificial intelligence field. MLPerf™ Training v1.0 measures the time it takes to train machine learning models to a standard quality target in a variety of tasks including Image Classification (ResNet), Image Segmentation (U-Net3D), Object Detection (Light weight, SSD), Object Detection (Heavy weight, Mask R-CNN), Speech Recognition (RNN-T), NLP (BERT), Recommendation (DLRM) and Reinforcement Learning (MiniGo), each with both Closed and Open performance divisions.
Inspur ranked first in the training tasks of Image Classification (ResNet), NLP (BERT), Object Detection (SSD) and Recommendation (DLRM) in the Closed division, with Inspur NF5688M6 achieving the best single node performance in ResNet, DLRM, and SSD, and NF5488A5 in BERT.
With its ability to optimize both software and hardware, Inspur dramatically improved its single node performance in the MLPerf™ training benchmark. Compared with its results in the MLPerf™ Training v0.7 benchmark in 2020, Inspur set single node performance records in Image Classification, NLP, Object Detection and Recommendation by shortening the training time of these models by 17.95%, 56.85%, 18.61% and 42.64%, respectively, clearly demonstrating the value of using top-level AI servers to improve the efficiency of AI model training.
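To put these percentages in more familiar terms, a reduction in training time can be converted into a speedup factor. The short Python sketch below is illustrative only and is not part of Inspur's submission; the function name `speedup_from_reduction` is hypothetical, and it assumes each quoted percentage is a reduction in time-to-train relative to the v0.7 result, so that speedup = 1 / (1 - reduction).

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in training time into a speedup factor.

    Assumes new_time = old_time * (1 - reduction), so
    speedup = old_time / new_time = 1 / (1 - reduction).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Training-time reductions quoted in the text (v1.0 compared with v0.7).
reductions = {
    "Image Classification": 17.95,
    "NLP": 56.85,
    "Object Detection": 18.61,
    "Recommendation": 42.64,
}

for task, pct in reductions.items():
    factor = speedup_from_reduction(pct)
    print(f"{task}: {pct}% less training time is about {factor:.2f}x faster")
```

Under this assumption, the 56.85% reduction reported for NLP corresponds to training a little more than twice as fast, while the 17.95% reduction for Image Classification corresponds to roughly a 1.2x speedup.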
Inspur’s success in the MLPerf™ benchmark stems from the strength of its system design and full-stack optimization, which are part of its innovation in AI computing systems. In terms of hardware, Inspur comprehensively optimized and tuned data transmission between NUMA nodes and GPUs to ensure non-blocking I/O during training. In addition, Inspur developed an advanced cold-plate liquid cooling system for the A100 GPU at 500W TDP (the highest power in the industry) so that the GPU can run properly at full capacity, significantly increasing the performance of the AI computing system.
In keeping with the philosophy of MLCommons™, Inspur contributed the optimized solutions explored in the benchmark to the community to accelerate machine learning innovation and AI technology.
During the MLPerf™ Training v0.7 benchmark in 2020, Inspur developed an optimization to speed up the convergence of ResNet: on ImageNet, the solution reached the 75.9% target accuracy using only 85% of the iterations, improving training efficiency by 15%. Since then, the optimization has been adopted by community members and widely used in the MLPerf™ Training v1.0 benchmark, which is an important reason for the significant improvement in ResNet results this year.
Since 2020, Inspur has participated in four MLPerf™ benchmarks: Training v0.7, Inference v0.7, Inference v1.0 and Training v1.0. In this year’s MLPerf™ Inference v1.0, Inspur set 11 records in the data center Closed division and 7 records in the edge Closed division, becoming the company with the highest number of top results.
As a leading AI computing company, Inspur is committed to the R&D and innovation of AI computing, resource-based and algorithm platforms. It also works with other leading AI enterprises to promote the industrialization of AI and the development of AI-driven industries through its “Meta-Brain” technology ecosystem.
To view the complete results of MLPerf™ Training v1.0, please visit: https://mlcommons.org/en/training-normal-10/