
HPC & AI Performance on DSS8440 with V100S GPUs

Summary: GPU, V100S, V100, DSS8440, 8 GPUs, MLPerf, HPL, LAMMPS, Benchmark



Authors: Frank Han, Rengan Xu, Quy Ta
Dell EMC HPC & AI Innovation Lab, May 2020

Executive Summary

This blog presents the results of a study evaluating eight V100S GPUs in a DSS8440 server on HPC and deep learning applications, including HPL, LAMMPS, and the MLPerf v0.6 suite. In summary:

  • Applications limited by GPU memory bandwidth, such as LAMMPS, can take advantage of the new V100S GPUs and run faster on both single and multiple GPUs.
  • Deep learning applications, such as those tested in MLPerf, benefit modestly (around 1-5%) from the higher boost clock and higher bandwidth of the V100S.
  • Compute-bound applications, such as the HPL benchmark, perform about the same on the V100S as on the V100-PCIe.

The rest of this blog describes the testing in detail. In the future, the same applications will be run on the DSS8440 with RTX GPUs in place of the V100S. Other tests will also be run, such as V100S performance on an AMD platform.


Overview of the Testbed

The Dell EMC DSS8440 is an accelerator-optimized server designed specifically for high-performance computing and deep learning workloads. The NVIDIA V100S is the latest member of the Tesla Volta series. It is a double-width, PCIe-based GPU card with 32 GB of memory.

Table 1 lists the hardware and software details of the DSS8440 server under test, and Table 2 compares the specifications of the V100S and V100-PCIe.

Table 1: Hardware and software details

Table 2: Specification differences between V100S and V100-PCIe

HPC Application Performance

Figure 1: V100S and V100-PCIe HPL results on DSS8440

Figure 1 shows the HPL results. The V100S and V100-PCIe perform almost identically because HPL is an extreme stress test. It leaves little thermal headroom for GPU Boost, so the GPUs fall back to their base clock very quickly. The V100S and V100-PCIe have nearly the same base clock. As a result, the V100S delivers about the same performance as the V100-PCIe on compute-bound applications like HPL.
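The compute-bound argument can be made concrete with a minimal roofline-style sketch. This is an illustration, not the methodology used in the study. It assumes that attainable throughput is the lower of two limits: a compute roof that scales with the SM clock and a memory-bandwidth roof. The figures it plugs in are assumptions and should be checked against Table 2: 5,120 FP64 FLOPs per cycle for both cards, 900 and 1,134 GB/s of bandwidth, and base clocks of 1,230 and 1,245 MHz. The arithmetic intensity `HPL_LIKE_INTENSITY` is likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuModel:
    name: str
    fp64_flops_per_cycle: float  # assumed: 80 SMs x 32 FP64 units x 2 FLOPs per FMA
    mem_bw_gbs: float            # assumed peak memory bandwidth in GB/s

    def roofline_gflops(self, sm_clock_mhz: float, flops_per_byte: float) -> float:
        """Attainable GFLOP/s: the lower of the compute roof and the bandwidth roof."""
        compute_roof = self.fp64_flops_per_cycle * sm_clock_mhz / 1e3  # MHz -> GFLOP/s
        bandwidth_roof = self.mem_bw_gbs * flops_per_byte
        return min(compute_roof, bandwidth_roof)

# Assumed figures for illustration; check Table 2 for the published specifications.
V100_PCIE = GpuModel("V100-PCIe", 5120.0, 900.0)
V100S = GpuModel("V100S", 5120.0, 1134.0)
BASE_CLOCK_MHZ = {"V100-PCIe": 1230.0, "V100S": 1245.0}  # assumed base clocks

# Hypothetical DGEMM-like intensity, far above the ridge point, so compute-bound.
HPL_LIKE_INTENSITY = 50.0  # FLOPs per byte

def hpl_like_ratio() -> float:
    """V100S / V100-PCIe throughput when both GPUs are held at their base clocks."""
    old = V100_PCIE.roofline_gflops(BASE_CLOCK_MHZ[V100_PCIE.name], HPL_LIKE_INTENSITY)
    new = V100S.roofline_gflops(BASE_CLOCK_MHZ[V100S.name], HPL_LIKE_INTENSITY)
    return new / old

print(f"V100S / V100-PCIe at base clock: {hpl_like_ratio():.3f}x")
```

With these assumed numbers, the ratio simply tracks the base-clock ratio of about 1.2%. The extra bandwidth never matters because the kernel sits on the compute roof. The ridge point (compute roof divided by bandwidth) lies near 5.6 to 7.0 FLOPs/byte here. Passing `roofline_gflops` an intensity well below that point instead makes the ratio follow the 1.26 bandwidth ratio, which is the regime LAMMPS falls into.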

Figure 2: V100S and V100-PCIe LAMMPS results on DSS8440

Figure 2 shows the LAMMPS results, in timesteps per second, for the Lennard-Jones dataset. LAMMPS is a molecular dynamics code known to be bound by GPU memory bandwidth. In this testing, the V100S delivers 27% more performance than the V100-PCIe. Part of the speedup comes from the 15% higher boost frequency and 26% higher memory bandwidth. The rest comes from a newer software version. The V100-PCIe numbers were obtained with the older KOKKOS package in LAMMPS version 8Feb2019. The newer 24Jan2020 version added support for running cuFFT on the GPU with KOKKOS. More details can be found in the LAMMPS 24Jan2020 release notes.
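The LAMMPS numbers also allow a quick consistency check. Assume a roofline model, which is an illustrative assumption and not part of the study. Under that model, hardware alone can speed a kernel up by at most the larger of the clock ratio and the bandwidth ratio. Dividing the observed speedup by that bound therefore gives a lower bound on the factor that hardware cannot explain, here attributed to the newer LAMMPS release. The three ratios come from the paragraph above. The function name is hypothetical.

```python
# Ratios quoted in the text above (V100S relative to V100-PCIe).
BOOST_CLOCK_RATIO = 1.15  # 15% higher boost frequency
BANDWIDTH_RATIO = 1.26    # 26% more memory bandwidth
OBSERVED_SPEEDUP = 1.27   # 27% more LAMMPS performance

def min_non_hardware_factor(observed: float, clock_ratio: float, bw_ratio: float) -> float:
    """Lower bound on the part of a speedup not explained by hardware.

    Assumes a roofline model: min(c*P, b*B) <= max(c, b) * min(P, B),
    so faster hardware alone gives at most max(clock_ratio, bw_ratio).
    """
    hardware_bound = max(clock_ratio, bw_ratio)
    return observed / hardware_bound

factor = min_non_hardware_factor(OBSERVED_SPEEDUP, BOOST_CLOCK_RATIO, BANDWIDTH_RATIO)
print(f"hardware-only upper bound: {max(BOOST_CLOCK_RATIO, BANDWIDTH_RATIO):.2f}x")
print(f"non-hardware factor is at least {factor:.3f}x")
```

The factor comes out at about 1.008. This is only a lower bound: if LAMMPS does not sit exactly on the bandwidth roof, the hardware gain is smaller than 1.26x and the software contribution correspondingly larger.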


Deep Learning Application Performance

Figure 3: V100S and V100-PCIe MLPerf results on DSS8440

The MLPerf Training v0.6 closed division has six benchmarks covering a wide range of deep learning domains: image classification (ResNet-50), object detection (Mask R-CNN and SSD), translation (NMT and Transformer), and reinforcement learning (MiniGo). Figure 3 compares the results for the two GPU cards. The V100S showed performance gains of around 1-5% across the MLPerf suite. This is consistent with the 1-5% higher throughput reported in the result log files. The GPU clock rate was monitored in real time, and the V100S GPUs ran at a 1-5% higher clock in all of these tests. The performance benefit therefore came from the higher boost frequency of the V100S.
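One way to reproduce this kind of clock monitoring is with `nvidia-smi`, which can print the SM clock of every GPU at a fixed interval. For example, `nvidia-smi --query-gpu=index,clocks.sm --format=csv,noheader,nounits -lms 1000` samples once per second. The sketch below assumes a log captured with that command for each run. It averages the SM clock per GPU and compares the clock gain with the gain implied by the MLPerf time-to-train. The sampling interval, function names, and sample values are illustrative assumptions, not the tooling or data used in this study.

```python
from collections import defaultdict
from statistics import mean

def mean_sm_clock_mhz(csv_lines):
    """Average SM clock (MHz) per GPU index.

    Expects lines shaped like the output of
    `nvidia-smi --query-gpu=index,clocks.sm --format=csv,noheader,nounits`,
    e.g. "0, 1380".
    """
    samples = defaultdict(list)
    for line in csv_lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        index, clock = (field.strip() for field in line.split(","))
        samples[int(index)].append(float(clock))
    return {gpu: mean(values) for gpu, values in samples.items()}

def clock_gain(new_log, baseline_log):
    """Relative gain of the mean SM clock, averaged over all GPUs in each log."""
    new = mean(mean_sm_clock_mhz(new_log).values())
    base = mean(mean_sm_clock_mhz(baseline_log).values())
    return new / base - 1.0

def time_to_train_gain(baseline_minutes, new_minutes):
    """Relative performance gain implied by a shorter time-to-train (lower is better)."""
    return baseline_minutes / new_minutes - 1.0

# Hypothetical values for illustration only; not measurements from this study.
v100_log = ["0, 1380", "1, 1370", "0, 1375", "1, 1365"]
v100s_log = ["0, 1420", "1, 1410", "0, 1415", "1, 1405"]
print(f"SM clock gain: {clock_gain(v100s_log, v100_log):.1%}")
print(f"time-to-train gain: {time_to_train_gain(100.0, 97.0):.1%}")
```

Time-to-train is inverted when computing the gain because a shorter training time means higher performance. That puts both numbers on the same "percent faster" scale.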

Conclusions and Future Work

This blog compared the V100S and V100-PCIe GPU cards in the same DSS8440 server on HPC performance (HPL and LAMMPS) and deep learning performance (MLPerf). Applications limited by GPU memory bandwidth, such as LAMMPS, can take advantage of the new V100S GPUs and run faster on both single and multiple GPUs. Deep learning applications tested in MLPerf also benefit from the higher boost clock and higher bandwidth of the V100S. The compute-bound HPC benchmark HPL performs the same on the V100S as on the V100-PCIe. In the future, the same applications will be run on the DSS8440 with RTX GPUs. Other tests, such as V100S performance on an AMD platform, will also be explored.

Affected Products

DSS 8440, High Performance Computing Solution Resources