
WRF Performance on AMD Rome Platform


Symptoms

Puneet Singh, HPC and AI Innovation Lab, November 2019

 

WRF – the Weather Research and Forecasting model – is a numerical weather prediction system that performs well on the latest generation of the AMD® EPYC processor family, codenamed Rome. In this article we highlight the performance gains that AMD Rome processors in Dell servers deliver for HPC workloads such as WRF.

Resolution

AMD® recently rolled out its 2nd Gen EPYC (7002 series) processors. This blog describes the performance of the WRF (Weather Research and Forecasting) model on two-socket Dell EMC PowerEdge servers using the latest addition to the AMD® EPYC processor family, code-named "Rome". This is a follow-up to the first blog in this series, where we introduced the processor architecture, key BIOS tuning options, and baseline microbenchmark performance.

The WRF model is a next-generation mesoscale numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers, and can generate atmospheric simulations from real data (observations, analyses) or from idealized conditions. We analyzed the performance improvement of servers based on the latest AMD EPYC Rome (7002 series) processors over servers based on the first-generation AMD EPYC Naples (7001 series) processors, using the datasets listed in Table 1. These tests were carried out on two-socket Dell PowerEdge servers with the BIOS set to the HPC workload profile.

Table 1: Test bed hardware and software details –


 

WRF was compiled with the dm+sm (hybrid MPI + OpenMP) configuration. For this test, all available cores in the system were utilized with one job per system. To optimize performance, we tried different process–thread combinations, tiling schemes (WRF_NUM_TILES), and Transparent Huge Pages (THP) settings. We found that one MPI process per CPU Complex (CCX), with THP disabled, gave the best results. The tile sizes used in the study, listed in Table 2, were selected based on our past experience with WRF combined with trial and error.
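As an illustrative sketch, the run configuration described above might look like the following job script. This assumes Open MPI for the rank mapping and a Rome CPU with 4 cores per CCX; the WRF_NUM_TILES value is hypothetical and should be taken from Table 2 for your dataset.

```shell
#!/bin/bash
# Sketch of the run setup described above (assumptions: Open MPI,
# 4 cores per CCX; adjust counts for your CPU model and dataset).

# Disable Transparent Huge Pages (requires root) -- THP disabled gave
# the best results in these tests.
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Tiling scheme (hypothetical value -- see Table 2 for the study's sizes).
export WRF_NUM_TILES=8

# One OpenMP thread per core within the CCX.
export OMP_NUM_THREADS=4

# One MPI process per CCX: each CCX shares an L3 cache, so map one rank
# per L3 domain and give it 4 cores (pe=4) for its OpenMP threads.
mpirun --map-by ppr:1:l3cache:pe=4 --bind-to core ./wrf.exe
```

The l3cache mapping target is what makes "one process per CCX" work on EPYC: each CCX corresponds to one L3 cache domain, so ranks and their threads stay local to a CCX.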

 
 

Table 2: Tile size details


Two case studies/datasets were used to analyze WRF's performance: Maria at 3 km and Conus at 2.5 km grid resolution. The input data for the Maria case was prepared from NCEP GFS model runs; the raw data can be downloaded from https://rda.ucar.edu/datasets/ds084.1/. The dataset files (wrfbdy_01, wrfrst_01, wrfinput, etc.) for the official Conus 2.5 km benchmark are available for download at http://www2.mmm.ucar.edu/wrf/WG2/benchv3/.
 

 

During a full run of WRF, the time taken to compute each model time step (along with the time to write history and restart output files) is logged in the rsl.error.000 file. The mean time taken per time step (MTT/ts) is a purely computational metric and was calculated from the rsl.error.000 file. This blog uses MTT/ts as the performance metric; Figure 1 presents performance relative to the 7551 CPU, where a higher bar denotes better performance. The graph compares the CPU models listed in Table 1.
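As an illustration, MTT/ts can be computed from the log with a short script like the one below. This is a sketch, not the study's actual tooling: it assumes the common "Timing for main" line format, which can vary slightly between WRF versions.

```python
import re
import statistics

# WRF logs one line per model time step, e.g.:
#   Timing for main: time 2019-11-01_00:00:30 on domain   1:    1.23456 elapsed seconds
# Lines for history/restart output ("Timing for Writing ...") are not matched,
# so the metric stays purely computational.
STEP_RE = re.compile(r"Timing for main:.*?:\s*([0-9.]+)\s+elapsed seconds")

def mean_time_per_step(log_text: str) -> float:
    """Return the mean elapsed seconds per model time step (MTT/ts)."""
    times = [float(m.group(1)) for m in STEP_RE.finditer(log_text)]
    if not times:
        raise ValueError("no 'Timing for main' lines found in log")
    return statistics.mean(times)
```

Usage would be along the lines of `mean_time_per_step(open("rsl.error.000").read())`; a lower MTT/ts means a faster run, so the relative-performance bars in Figure 1 correspond to the ratio of MTT/ts values.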

 


Figure 1: Relative performance of WRF on the CPU models listed in Table 1, normalized to the 7551 CPU


Available memory capacity is not a constraint, as memory utilization did not exceed 30% for either dataset. As the number of cores increases, the computational part of WRF is able to use the additional computational resources available on the C6525 with both the Maria 3 km and Conus 2.5 km datasets.

On the Rome platform, WRF gains little performance from a higher clock frequency when CPU models have an identical number of cores (and cache). The 7742 and 7702 have 128 cores each (per two-socket server) and the same amount of cache per core; the 7742's frequency is 12.5% higher than the 7702's, yet the performance gain is only ~1.3–2.8%. The 7502 and 7452, with 64 cores each, delivered nearly identical performance (less than 1% difference) despite the 7502's ~6.4% higher frequency. On Naples, WRF is ~3.2–3.4% faster on the 7601 (which has a 10% higher frequency) than on the 7551 for the Conus and Maria datasets.

As expected, on the Rome platform WRF simulates fastest on the 7742 and slowest on the 7402. Comparing the relative performance gains of the 7402 and 7742 against the Naples 7601, the Rome platform delivers ~18–54% better performance than Naples for the Conus and Maria datasets.


WRF multi-node studies will be carried out on the new Minerva cluster at the HPC and AI Innovation Lab to test WRF scalability on Rome processors. Watch this blog site for updates.

Affected Products

High Performance Computing Solution Resources