Summary
As a key strategic partner of Intel, Inspur worked with Intel to develop an artificial intelligence inference solution based on the 2nd Generation Intel Xeon Scalable processor, which delivers stronger computing performance and better user experience support. The solution accelerates AI inference and is designed to help industry and enterprise customers reduce deployment time and cost while maintaining the performance needed to meet growing demand. Intel Select Solutions for AI Inferencing gives customers an efficient AI inference solution built on proven Intel architecture that can be launched and deployed quickly, enabling faster innovation and time to market. To further accelerate AI inferencing and time to market, Intel Select Solutions for AI Inferencing also combines several Intel and third-party software and hardware technologies.
Background
In the context of artificial intelligence and machine learning, training is the stage in which a neural network learns from data. Inference puts that learning into practice: the trained model is used to classify, identify, and process new input data and to predict outcomes based on what it has learned. This solution targets deep learning inference in some of the fastest-growing areas of artificial intelligence, such as video, natural language processing, and image processing. As a turnkey platform built from proven Intel architecture (IA) building blocks, it lets partners innovate and bring an integrated solution to market, making AI simple and efficient. Once a neural network model has been trained, running inference on it efficiently becomes the challenge, and this solution gives customers a starting point for deploying efficient AI inference algorithms. It uses OpenVINO to accelerate inference (see the sketch below), shortening the time from enterprise data to strategic decisions, delivering low latency and high end-to-end throughput, and reducing enterprise costs.
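As a minimal sketch of this acceleration step, the snippet below loads an Intermediate Representation with the 2018 R5-era OpenVINO Inference Engine Python API (the version listed in the software table later in this document) and runs one inference on the CPU. The IR file names and input shape are hypothetical placeholders for a model converted with the Model Optimizer.

```python
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# Hypothetical IR files produced by the OpenVINO Model Optimizer
# from a trained TensorFlow/PyTorch/MXNet model.
net = IENetwork(model="model.xml", weights="model.bin")

# Target the Xeon Scalable CPU; the CPU plugin uses MKL-DNN kernels.
plugin = IEPlugin(device="CPU")
exec_net = plugin.load(network=net)

input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# Single synchronous inference request on a dummy input
# (hypothetical 1x3x224x224 NCHW image).
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = exec_net.infer(inputs={input_blob: data})
print(result[output_blob].shape)
```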
Solution Introduction: NF5280M5 AI Server
Industry: Cross-industry, mainly retail, finance, healthcare, and other fields.
Use cases: Natural language processing, image classification, object detection and object tracking.
Deployment: Edge and data center deployments.
To offer this Intel Select Solution for AI Inferencing, a server vendor or data center solution provider must meet or exceed the minimum configuration components and the reference minimum benchmark performance thresholds listed below.
Basic Configuration
Component Model | Description | Qty |
---|---|---|
NF5280M5 | 2U dual-socket rackmount server | 1 |
6240/6248 | Intel Xeon Gold 6240 CPU at 2.6 GHz / 18C / 36T, or Intel Xeon Gold 6248 CPU at 2.5 GHz / 20C / 40T | 2 |
Memory | 192 GB (12 x 16 GB) or higher, 2933 MHz or higher, DDR4 ECC RDIMM | 12 |
S4510 | Intel SSD D3-S4510 Series (240 GB, 2.5in SATA 6 Gb/s, 3D2, TLC) or Intel SSD D3-S4510 Series (240 GB, M.2 80mm SATA 6 Gb/s, 3D2, TLC) or higher | 1 |
P4610 | Intel SSD DC P4610 Series (2.5in PCIe 3.1 x4, 3D2, TLC) @ 3.2 TB or higher | 1 |
25Gb Network | 56 Gb InfiniBand / 25 Gb Ethernet | 1 |
Upgrade Configuration
Component Model | Description | Qty |
---|---|---|
NF5280M5 | 2U dual-socket rackmount server | 1 |
8280 | Intel Xeon Platinum 8280 CPU at 2.7 GHz / 28C / 56T | 2 |
Memory | 384 GB (12 x 32 GB) or higher, 2933 MHz or higher, DDR4 ECC RDIMM | 12 |
S4510 | Intel SSD D3-S4510 Series (240 GB, 2.5in SATA 6 Gb/s, 3D2, TLC) or Intel SSD D3-S4510 Series (240 GB, M.2 80mm SATA 6 Gb/s, 3D2, TLC) or higher | 1 |
P4610 | Intel SSD DC P4610 Series (2.5in PCIe 3.1 x4, 3D2, TLC) @ 3.2 TB or higher | 1 |
25Gb Network | 56 Gb InfiniBand / 25 Gb Ethernet | 1 |
Single Node Spec
CPU | 2 x Intel Xeon Gold 6230/6240/6248 or Platinum 8280 |
---|---|
Memory (min) | 192 GB (12 x 16 GB 2666 MHz DDR4 ECC RDIMM) |
Boot Drive | 1 x Intel SSD DC S4500 Series >= 240 GB (P4101) |
Storage | Intel SSD D3-S4510 Series (240 GB, 2.5in SATA 6 Gb/s, 3D2, TLC) or Intel SSD D3-S4510 Series (240 GB, M.2 80mm SATA 6 Gb/s, 3D2, TLC) or higher; Intel SSD DC P4610 Series (2.5in PCIe 3.1 x4, 3D2, TLC) @ 3.2 TB or higher |
Data Network | InfiniBand (IB) 56 Gb and Ethernet 25 Gb |
Management Network | Integrated 1 GbE port 0 / RMM port |
Software
Software | Required or Recommended | Version |
---|---|---|
Linux Distribution | Recommended | CentOS* Linux* release 7.6.1810 (Core) |
Intel Math Kernel Library (MKL) | Required | 2018 Update 3 |
Intel MKL-DNN | Required | 0.17 |
OpenVINO Toolkit | Required | 2018 R5 |
OpenVINO Model Server | Recommended | 0.41 |
TensorFlow | Required | 1.14 |
PyTorch | Recommended | 1.0.1 |
MXNet | Recommended | 1.3.1 |
Intel Python | Recommended | 2019 Update 1 |
Intel performance and monitoring tools | Recommended | VTune Platform Profiler |
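As a quick sanity check of the stack above, a short script like the following (a sketch assuming pip/conda package installs and a sourced OpenVINO setupvars.sh) can confirm the framework versions on a node:

```python
# Minimal environment check for the framework versions in the table above.
import tensorflow as tf
import torch
import mxnet

print("TensorFlow:", tf.__version__)    # expect 1.14.x
print("PyTorch:   ", torch.__version__) # expect 1.0.1
print("MXNet:     ", mxnet.__version__) # expect 1.3.1

# OpenVINO's Python API is on the path once the toolkit's setupvars.sh
# has been sourced; a successful import confirms the install.
from openvino.inference_engine import IENetwork  # noqa: F401
print("OpenVINO Inference Engine import OK")
```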
Customer Benefits
Test Results
In tests comparing the Intel Xeon Gold 6240 with the Intel Xeon Gold 6130, training performance improves by 1.35x to 1.43x:
- ResNet50 training performance reaches a 1.35x speedup
- Inception_v3 training performance reaches a 1.43x speedup
FP32 inference performance on the Intel Xeon Gold 6240 reaches a 1.3x to 3x speedup over the Intel Xeon Gold 6130, with multi-stream inference (batch size = 1) performing best; a measurement sketch follows this list.
- ResNet50 multi-stream inference achieves up to a 3x speedup
- Inception_v3 multi-stream inference achieves up to a 2.6x speedup
- A3C multi-stream inference achieves up to a 1.32x speedup
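For illustration, the sketch below approximates the multi-stream methodology: N independent batch-size-1 streams share one TensorFlow 1.14 session (the version in the software table) and aggregate throughput is reported. The frozen-graph path, tensor names, and input shape are hypothetical placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tensorflow as tf  # TensorFlow 1.14, per the software table

NUM_STREAMS = 40        # parallel streams, as in the results below
ITERS_PER_STREAM = 50
MODEL_PB = "resnet50_frozen.pb"                 # hypothetical frozen graph
INPUT_T, OUTPUT_T = "input:0", "predictions:0"  # hypothetical tensor names

# Load the frozen graph once and share it across all streams.
graph_def = tf.GraphDef()
with tf.io.gfile.GFile(MODEL_PB, "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

sess = tf.Session(graph=graph)  # Session.run() is thread-safe
image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # batch size = 1

def one_stream(_):
    # Each stream issues batch-size-1 requests back to back.
    for _ in range(ITERS_PER_STREAM):
        sess.run(OUTPUT_T, feed_dict={INPUT_T: image})

start = time.time()
with ThreadPoolExecutor(max_workers=NUM_STREAMS) as pool:
    list(pool.map(one_stream, range(NUM_STREAMS)))
elapsed = time.time() - start
print("aggregate images/sec:", NUM_STREAMS * ITERS_PER_STREAM / elapsed)
```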
With TensorFlow upgraded and optimized to use VNNI (Vector Neural Network Instructions), INT8 multi-stream inference on the Intel Xeon Gold 6240 is 1.9x to 3.5x faster than the FP32 model; a VNNI availability check follows this list.
- With 40 parallel streams, ResNet50 INT8 performance reaches a 3.5x speedup
- With 40 parallel streams, Inception INT8 performance reaches a 3.2x speedup
- With 40 parallel streams, Wide & Deep (W&D) INT8 performance reaches up to a 1.9x speedup
- With 20 parallel streams, SSD-MobileNet INT8 performance doubles (2x)
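These INT8 gains rely on AVX-512 VNNI, introduced with the 2nd Generation Intel Xeon Scalable processors; a quick Linux-only check that the instructions are available is to look for the avx512_vnni CPU flag:

```python
# Minimal Linux-only check for the AVX-512 VNNI CPU flag that the INT8
# results above rely on (absent on 1st-gen Xeon Scalable, e.g. Gold 6130).
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("AVX-512 VNNI available:", "avx512_vnni" in flags)
```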
Compared with TensorFlow, OpenVINO further improves INT8 performance:
- On the Intel Xeon Gold 6240, ResNet50 INT8 with OpenVINO achieves a 1.3x speedup over TensorFlow
- On the Intel Xeon Gold 6240, Inception_v3 INT8 with OpenVINO achieves a 1.4x speedup over TensorFlow
Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the United States and/or other countries.