Savitha Pareek, HPC and AI Innovation Lab, November 2019
AMD recently announced its 2nd generation EPYC processors (codenamed "ROME"), which support up to 64 cores, and Dell EMC has just released High Performance Computing (HPC) servers designed from the ground up to take full advantage of these new processors. We have been evaluating applications on these servers in our HPC and AI Innovation Lab, including the molecular dynamics application GROMACS (GROningen MAchine for Chemical Simulations), and report our findings for GROMACS in this blog.
GROMACS is a free and open-source parallel molecular dynamics package designed for simulations of biochemical molecules such as proteins, lipids, and nucleic acids. It is used by a wide variety of researchers, particularly for biomolecular and chemistry simulations, and supports all the usual algorithms expected from a modern molecular dynamics implementation. The latest versions are available under the GNU Lesser General Public License (LGPL). The code is mainly written in C++ and makes use of both MPI and OpenMP parallelism.
This blog describes the performance of GROMACS on two-socket PowerEdge servers using the latest AMD EPYC Rome processors listed in Table 1(a). For this study, we carried out all benchmarks on a single server equipped with two processors, running only a single job at a time on the server. We compared the performance of the 2nd generation AMD EPYC Rome (7xx2 series) based PowerEdge servers with that of the previous generation Dell EMC PowerEdge servers equipped with the 1st generation AMD EPYC Naples (7xx1 series) processors listed in Table 1(b).
Table 1(a) - ROME CPU models evaluated for single node study
CPU | Cores/Socket | Config | Base frequency | TDP
7742 | 64c | 4c per CCX | 2.25 GHz | 225W
7702 | 64c | 4c per CCX | 2.0 GHz | 200W
7502 | 32c | 4c per CCX | 2.5 GHz | 180W
7452 | 32c | 4c per CCX | 2.35 GHz | 155W
7402 | 24c | 3c per CCX | 2.8 GHz | 180W
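The "Config" column in Table 1(a) gives the number of cores in each core complex (CCX), so the number of CCXs per socket follows by simple division. The short sketch below works that arithmetic out for each SKU; the dictionary and function names are illustrative only and are not part of any AMD or GROMACS tooling.

```python
# Cores per socket and cores per CCX, copied from Table 1(a).
ROME_SKUS = {
    "7742": (64, 4),
    "7702": (64, 4),
    "7502": (32, 4),
    "7452": (32, 4),
    "7402": (24, 3),
}


def ccx_per_socket(cores_per_socket: int, cores_per_ccx: int) -> int:
    """Return the number of core complexes (CCXs) on one socket."""
    if cores_per_socket % cores_per_ccx != 0:
        raise ValueError("cores per socket must be a multiple of cores per CCX")
    return cores_per_socket // cores_per_ccx


ccx_counts = {sku: ccx_per_socket(*cfg) for sku, cfg in ROME_SKUS.items()}

for sku, count in ccx_counts.items():
    print(f"EPYC {sku}: {count} CCXs per socket")
```

This shows that the 24-core 7402 has the same number of CCXs per socket as the 32-core parts, with one fewer core in each CCX.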
Table 1(b) - Naples CPU model evaluated for comparison
CPU | Cores/Socket | Config | Base frequency | TDP
7601 | 32c | 4c per CCX | 2.2 GHz | 180W
Server configurations are included in Table 2(a), with the list of the benchmark data sets given in Table 2(b).
Table 2(a) - Testbed
Component | ROME Platform | NAPLES Platform
Processor | As shown in Table 1(a) | As shown in Table 1(b)
Memory | 256 GB, 16x16GB 3200 MT/s DDR4 | 256 GB, 16x16GB 2400 MT/s DDR4
Operating System | Red Hat Enterprise Linux 7.6 | Red Hat Enterprise Linux 7.5
Kernel | 3.10.0-957.27.2.el7.x86_64 | 3.10.0-862.el7.x86_64
Application | GROMACS 2019.2 (both platforms)
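To put the memory speeds in Table 2(a) in perspective, the sketch below estimates theoretical peak memory bandwidth per node. It assumes a 64-bit (8-byte) DDR4 channel and eight populated memory channels per socket (16 DIMMs across two sockets, one DIMM per channel); these assumptions are not stated in the testbed table, and the results are theoretical peaks rather than measured values.

```python
BYTES_PER_TRANSFER = 8    # assumption: 64-bit DDR4 channel
CHANNELS_PER_SOCKET = 8   # assumption: 16 DIMMs over 2 sockets, one per channel
SOCKETS = 2               # dual-socket servers, as in the study


def peak_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Theoretical peak node memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    mb_per_s = (mega_transfers_per_s * BYTES_PER_TRANSFER
                * CHANNELS_PER_SOCKET * SOCKETS)
    return mb_per_s / 1000


rome_bw = peak_bandwidth_gbs(3200)     # Rome platform, 3200 MT/s DDR4
naples_bw = peak_bandwidth_gbs(2400)   # Naples platform, 2400 MT/s DDR4

print(f"Rome:   {rome_bw:.1f} GB/s")
print(f"Naples: {naples_bw:.1f} GB/s")
print(f"Ratio:  {rome_bw / naples_bw:.2f}x")
```

Under these assumptions the Rome platform has about one third more theoretical memory bandwidth than the Naples platform, independent of core count.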
Table 2(b) - Benchmark datasets used for GROMACS performance evaluation on ROME
Dataset | Details
Water | 1536K and 3072K
HECBIOSIM | 1400K and 3000K
Prace – Lignocellulose | 3M
For this single node study, we compiled GROMACS version 2019.3 with the latest Open MPI and FFTW, testing several different compilers, their associated high-level compiler options, and electrostatics load balancing settings (e.g. PME). We carried out two studies for this blog: the first focused on the performance of the Rome based systems with Hyperthreading enabled vs. Hyperthreading disabled, and the second investigated the performance advantage obtained with Rome over the Naples system.
For the Hyperthreading study, we enabled Hyperthreading in the BIOS and adjusted the benchmarking parameters to run each benchmark with twice as many threads as its non-Hyperthreaded counterpart. As an example, for the 24-core 7402, the non-Hyperthreaded single node runs used 48 threads (dual-processor server) and the Hyperthreaded runs used 96 threads. Our results are presented in Figure 1.
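The thread counts described above follow directly from the core counts in Table 1(a). The sketch below computes them for a dual-socket node and, as a purely illustrative extra, lists the ways a given thread count could be split into MPI ranks and OpenMP threads per rank. The blog does not state which rank/thread split was used, so `rank_thread_layouts` and its `max_omp_threads` cap are hypothetical helpers, not the configuration from the study.

```python
SOCKETS = 2  # dual-processor server, as in the study


def thread_counts(cores_per_socket: int, sockets: int = SOCKETS) -> tuple:
    """Return (threads without Hyperthreading, threads with Hyperthreading)."""
    physical = cores_per_socket * sockets
    return physical, 2 * physical


def rank_thread_layouts(total_threads: int, max_omp_threads: int = 8) -> list:
    """List (mpi_ranks, omp_threads_per_rank) pairs that use every thread.

    Hypothetical helper: the cap on OpenMP threads per rank is an
    arbitrary illustrative choice, not a value taken from the study.
    """
    return [
        (total_threads // omp, omp)
        for omp in range(1, max_omp_threads + 1)
        if total_threads % omp == 0
    ]


no_ht, with_ht = thread_counts(24)   # EPYC 7402: 24 cores per socket
print(no_ht, with_ht)                # 48 96

for ranks, omp in rank_thread_layouts(with_ht):
    print(f"{ranks} MPI ranks x {omp} OpenMP threads")
```

In practice the best split between MPI ranks and OpenMP threads depends on the dataset and processor, so each candidate layout would need to be benchmarked.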
Figure 1. GROMACS performance evaluation with hyper-threading disabled vs hyper-threading enabled on ROME
For these benchmarks, the electrostatics method used was Particle Mesh Ewald (PME) for the Water-1536K, Water-3072K, and HECBIOSIM datasets (1.4M and 3M). We used the reaction field (RF) method for the Lignocellulose_3M case.
While the performance gains observed (higher is better) with Hyperthreading enabled varied with both the processor and the dataset, every result was better than its non-Hyperthreaded baseline (1.0). GROMACS shows a clear performance boost with Hyperthreading enabled across the ROME SKUs.
In the second study, we compared the Rome based servers to the Naples based server, with Hyperthreading enabled for all tests based on the results of the first study. We measured the performance of each ROME SKU relative to the Naples 7601 as the baseline (1.0). These results are shown in Figure 2.
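Relative performance against a baseline of 1.0, as used in both studies, is a simple normalization. The sketch below assumes the underlying metric is simulation throughput in ns/day, as reported by GROMACS, where higher is better. The system names and throughput values are placeholders chosen for easy arithmetic; they are not measurements from this study.

```python
def relative_performance(throughput: dict, baseline: str) -> dict:
    """Normalize throughput values so that the baseline system scores 1.0."""
    base = throughput[baseline]
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    return {name: value / base for name, value in throughput.items()}


# Placeholder throughput in ns/day (NOT results from this study).
example_ns_per_day = {
    "baseline_cpu": 2.0,
    "cpu_a": 2.5,
    "cpu_b": 6.0,
}

relative = relative_performance(example_ns_per_day, baseline="baseline_cpu")

for name, score in relative.items():
    print(f"{name}: {score:.2f}")
```

The same function covers the Hyperthreading comparison by passing the non-Hyperthreaded run as the baseline.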
Figure 2. Performance evaluation across different AMD EPYC Generation Processors
Comparing the 32-core based servers (7551, 7601, 7452, 7502), we observed a generational performance improvement of about 50%. The 24-core Rome based 7402, despite having fewer cores than the Naples systems, still outperformed the Naples based systems by about 20-40%, depending on the benchmark. The 64-core based (7702, 7742) systems displayed close to a 250% increase in overall performance over the 32-core based Naples server. Overall, the Rome results, particularly with Hyperthreading enabled, demonstrated a substantial performance improvement for GROMACS over Naples.
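Percentage gains and relative-performance ratios are easy to mix up, so the sketch below converts between the two. It takes each percentage literally as an increase over the baseline of 1.0, which is an assumption about how the figures above are quoted; the function names are illustrative.

```python
def gain_to_ratio(percent_gain: float) -> float:
    """Convert a percentage increase over the baseline into a ratio."""
    return 1.0 + percent_gain / 100.0


def ratio_to_gain(ratio: float) -> float:
    """Convert a relative-performance ratio into a percentage increase."""
    return (ratio - 1.0) * 100.0


# Percentages quoted in the text, read as increases over a baseline of 1.0.
for gain in (20, 40, 50, 250):
    print(f"{gain}% increase -> {gain_to_ratio(gain):.2f}x the baseline")
```

Read this way, a 50% improvement corresponds to 1.5 on a chart where the baseline is 1.0.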
Conclusion
Dell EMC PowerEdge servers equipped with the AMD ROME processors offer significant single node performance gains over their previous generation Naples counterparts for applications such as GROMACS. We found a strong positive correlation between overall system performance and processor core count, and a weak correlation between performance and processor frequency. The 64-core Rome processors delivered a sizable performance advantage over the 24-core and 32-core processors. We are in the process of exploring how these single node performance gains (with and without Hyperthreading) will translate into multi-node performance gains for Molecular Dynamics applications on our new Minerva Cluster at the HPC and AI Innovation Lab. Watch this blog site for updates.