Inspur Blog

Inspur AI server sets 18 performance records in the world’s authoritative MLPerf benchmark test

On October 21st, MLPerf, the world’s highly anticipated and authoritative AI benchmark, announced this year’s inference test results. The Inspur AI server NF5488A5 set 18 performance records in one fell swoop, far outpacing other manufacturers’ products in data center AI inference performance.

MLPerf is currently the world’s most influential AI computing benchmark evaluation organization, jointly established by Turing Award winner David Patterson together with Google, Stanford University, Harvard University, and other institutions. It conducts global AI training and inference performance tests annually; the MLPerf AI inference benchmark in particular evaluates the performance of AI computing products from 23 companies and organizations around the world in data center and edge scenarios.

Inspur NF5488A5 leads the pack in data center AI performance

Data center AI performance was the most highly anticipated test category this year, with participating organizations submitting a total of 507 performance results. The Inspur NF5488A5 led the category, setting 13 performance records across 22 data center tasks, while NVIDIA DGX systems achieved 5 data center performance records. In the previous MLPerf training list, the NF5488A5 also set a performance record in the core ResNet50 training task, and its single-machine performance topped the list.

3x performance improvement highlights the strengths of full-stack AI capabilities

The Inspur AI server NF5488A5 performed well in both the Open and Closed division ResNet50 benchmarks, delivering a 3x improvement over the best server performance on the 2019 MLPerf inference list.

The NF5488A5 is one of Inspur’s new generation of AI servers and the only AI server in the MLPerf global competition to support eight NVIDIA Ampere architecture A100 GPUs with NVLink high-speed interconnect in a 4U chassis. The NF5488A5 adopts an ultra-low-latency system topology optimized for PCIe 4.0, with the high-frequency communication units arranged to maximize communication performance between the processors and the AI accelerators. In addition, NUMA nodes are configured so that each processor communicates optimally with its directly connected GPUs, minimizing communication latency. The system’s structural design is also optimized to ensure stable operation in high-temperature environments.
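The NUMA configuration described above can be illustrated with a minimal sketch: pin the worker process that feeds each GPU to CPU cores on the NUMA node that GPU attaches to. The GPU-to-node mapping and core counts below are hypothetical placeholders, not the NF5488A5's actual topology.

```python
# Hypothetical sketch of NUMA-aware GPU affinity. The topology is
# illustrative only: assume GPUs 0-3 hang off NUMA node 0 and
# GPUs 4-7 off node 1, with 32 CPU cores per node.
GPU_TO_NUMA = {g: 0 if g < 4 else 1 for g in range(8)}
NUMA_TO_CORES = {0: list(range(0, 32)), 1: list(range(32, 64))}

def cores_for_gpu(gpu_id: int) -> list:
    """Return the CPU cores local to the given GPU's NUMA node."""
    return NUMA_TO_CORES[GPU_TO_NUMA[gpu_id]]

# On Linux, a feeder process for GPU 5 could then be pinned with:
#   os.sched_setaffinity(0, cores_for_gpu(5))
# so that host-side preprocessing never crosses the NUMA interconnect.
```

Keeping each GPU's host-side data pipeline on local cores is what avoids the extra hop across the inter-socket link that the article's "minimize communication delays" refers to.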

The NF5488A5 owes its outstanding results to Inspur’s full-stack AI software and hardware optimization capabilities. At the hardware level, in-depth calibration of CPU and GPU performance and their interconnects tuned the system for AI inference; at the software level, GPU-topology-aware round-robin scheduling across multiple GPUs enables approximately linear scaling; at the algorithmic level, the Tensor Core GPU’s capabilities enable a channel compression algorithm to maximally compress the model, nearly doubling performance without loss of accuracy.
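The round-robin multi-GPU scheduling mentioned above can be sketched in a few lines. This is a simplified stand-in, not Inspur's implementation: batches are plain labels and the per-GPU work queues are lists, but the dispatch pattern is the same idea.

```python
from collections import defaultdict
from itertools import cycle

NUM_GPUS = 8  # assumed GPU count, matching the 8-GPU system described

def dispatch(batches, num_gpus=NUM_GPUS):
    """Assign each incoming batch to the next GPU in round-robin order,
    returning a mapping of gpu_id -> list of batches for that GPU."""
    queues = defaultdict(list)
    gpu_ids = cycle(range(num_gpus))
    for batch in batches:
        queues[next(gpu_ids)].append(batch)
    return dict(queues)

queues = dispatch([f"batch-{i}" for i in range(16)])
# With 16 batches over 8 GPUs, each queue holds exactly 2 batches.
```

Because every GPU receives the same share of work, throughput grows roughly in proportion to the GPU count as long as no single queue becomes a bottleneck, which is the "approximately linear expansion" the article describes.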

More detailed results can be found at the MLPerf official site.
