
The Performance study with Cascade Lake for Genomics Applications

Summary: HPC High Performance Computing, HPC & AI Innovation Lab, Genomics, Variant Calling, De Novo Assembly, Next Generation Sequencing, BWA-GATK, SOAPdenovo2, SPAdes, Cascade Lake


Symptoms

Article written by Kihoon Yoon of HPC and AI Innovation Lab in May 2019

Resolution


Variant calling and De novo assembly


Overview


The Second Generation Intel® Xeon® Scalable processor family (Cascade Lake) is the successor to Skylake and offers up to 56 cores in a single processor (Cascade Lake AP 9282). In addition to more cores, it brings Optane support, faster DRAM (DDR4-2933 in a 1 DPC, or one DIMM per channel, configuration), and more DRAM configurations (1TB, 2TB, and 4TB). Customers generally expect more performance, better efficiency, and lower power from a newer processor. However, some customers look for less obvious improvements, such as support for new instructions, layered ecosystem optimizations, support for new technology, or a new product direction. Cascade Lake builds on the Skylake foundation and focuses on these secondary characteristics, so its improvements are not as visible in raw performance.
Typically, applications in Next Generation Sequencing (NGS) data analysis are open source and are not updated as quickly as new technology emerges. As a result, the improvements that come with Cascade Lake are less likely to affect the performance of NGS applications.
This blog illustrates how Cascade Lake CPUs behave on two genomics workloads, Variant Calling and De Novo Assembly.
The detailed test configurations for Variant Calling and De Novo Assembly are listed in Table 1.

 

Table 1 Test configuration for variant calling and De Novo assembly

Server (workload)                        Generation     CPU         Base Frequency (GHz)   Number of Cores   TDP (W)
Dell PowerEdge R640 (Variant Calling)    Skylake        2x 6154     3.0                    18                200
                                         Skylake        2x 6148     2.4                    20                150
                                         Skylake        2x 6152     2.1                    22                140
                                         Skylake        2x 6138     2.0                    20                125
                                         Cascade Lake   2x 6248     2.5                    20                150
                                         Cascade Lake   2x 6252     2.1                    24                150
                                         Cascade Lake   2x 6230     2.1                    20                125
Dell PowerEdge R940 (De Novo Assembly)   Skylake        4x 8168     2.7                    24                205
                                         Cascade Lake   4x 8280M    2.7                    28                205

Server (workload)                        Generation     Memory                          Storage
Dell PowerEdge R640 (Variant Calling)    Skylake        24x 16GB DDR4-2666MHz, 2 DPC    10x 1.2TB SAS 12 Gbps, 10K in RAID 0
                                         Cascade Lake   12x 32GB DDR4-2933MHz, 1 DPC    10x 1.2TB SAS 12 Gbps, 10K in RAID 0
Dell PowerEdge R940 (De Novo Assembly)   Skylake        48x 32GB DDR4-2666MHz, 2 DPC    18x 1.2TB SAS 12 Gbps, 10K in RAID 0
                                         Cascade Lake   24x 64GB DDR4-2933MHz, 1 DPC    18x 1.2TB SAS 12 Gbps, 10K in RAID 0

Common to all configurations
System BIOS      2.1.3
Kernel           3.10.0-957.el7.x86_64
OS               Red Hat Enterprise Linux Server release 7.6 (Maipo)
Sequence Reads   ERR194161, 50x Whole Human Genome for Variant Calling and ERR318658, 3.2 Billion Reads of Whole Human Genome for De Novo Assembly

 

Variant Calling

 
BWA-GATK Pipeline
As shown in Figure 1, each step behaves quite differently on each tested CPU, and the performance differences across the tested CPUs range from 0.61% to 46.34% depending on the step. However, the differences in overall runtime are small (Table 2).


Figure 1 Runtimes of each step in Variant Calling pipeline

Cascade Lake 6248 outperformed the other CPUs in most steps and achieved the best overall runtime, but it performed poorly at the "Mark Duplicates" step, running 27% slower than Cascade Lake 6252. It is unclear why the 6248 performs poorly at this step, although repeated tests show consistent results. Given this inconsistent behavior across steps, it makes more sense to consider overall performance when selecting a CPU for the workflow.

Table 2 Total runtime comparison between Skylake and Cascade Lake CPUs

Generation     CPU    Price                    Spec                       Total BWA-GATK Runtime (hours)
Skylake        6148   $3,072.00 - $3,078.00    2.4 GHz, 20 cores, 150W    24.26
Skylake        6154   $3,543.00                3.0 GHz, 18 cores, 200W    23.47
Skylake        6152   $3,655.00 - $3,661.00    2.1 GHz, 22 cores, 140W    24.58
Skylake        6138   $2,612.00 - $2,618.00    2.0 GHz, 20 cores, 125W    24.83
Cascade Lake   6248   $3,072.00 - $3,078.00    2.5 GHz, 20 cores, 150W    23.36
Cascade Lake   6252   $3,655.00 - $3,662.00    2.1 GHz, 24 cores, 150W    23.82
Cascade Lake   6230   $1,894.00 - $1,900.00    2.1 GHz, 20 cores, 125W    23.68

Although the best overall performance is achieved with Cascade Lake 6248, Cascade Lake 6230 is a reasonable choice for customers with a limited power budget. Since the results shown here are based on a single-sample test, it is hard to conclude whether Cascade Lake 6230 and 6248 are better than Cascade Lake 6252 without throughput test results. Because of its higher core count, Cascade Lake 6252 can process more samples simultaneously and could therefore come out ahead in throughput tests. Nonetheless, Cascade Lake 6230 could be the most cost-effective choice among the tested CPUs.
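The single-sample trade-off between runtime and price can be made concrete with a small calculation over Table 2. The sketch below is an illustration and not part of the original study: it uses the lower bound of each listed price range, counts two CPUs per R640 server as in Table 1, and ranks the CPUs by a hypothetical figure of merit (CPU cost multiplied by runtime, lower is better). The names TABLE2, slowdown_vs_best, and cost_runtime_product are assumptions introduced for this example.

```python
# Data copied from Table 2: (generation, model, lower-bound list price in USD,
# TDP in W, total BWA-GATK runtime in hours).
TABLE2 = [
    ("Skylake", "6148", 3072.00, 150, 24.26),
    ("Skylake", "6154", 3543.00, 200, 23.47),
    ("Skylake", "6152", 3655.00, 140, 24.58),
    ("Skylake", "6138", 2612.00, 125, 24.83),
    ("Cascade Lake", "6248", 3072.00, 150, 23.36),
    ("Cascade Lake", "6252", 3655.00, 150, 23.82),
    ("Cascade Lake", "6230", 1894.00, 125, 23.68),
]

CPUS_PER_SERVER = 2  # dual-socket R640, per Table 1


def slowdown_vs_best(rows):
    """Percent extra runtime of each CPU relative to the fastest one."""
    best = min(row[4] for row in rows)
    return {row[1]: 100.0 * (row[4] - best) / best for row in rows}


def cost_runtime_product(rows):
    """Hypothetical figure of merit: CPU cost per server times runtime (USD*h)."""
    return {row[1]: CPUS_PER_SERVER * row[2] * row[4] for row in rows}


if __name__ == "__main__":
    slow = slowdown_vs_best(TABLE2)
    merit = cost_runtime_product(TABLE2)
    # Print the CPUs from lowest to highest cost-runtime product.
    for model in sorted(merit, key=merit.get):
        print(f"{model}: +{slow[model]:.2f}% runtime vs best, "
              f"cost x runtime = {merit[model]:,.0f} USD*h")
```

With these inputs, the slowest CPU (6138) is about 6.3% behind the fastest (6248), and the 6230 has the lowest cost-runtime product. This metric ignores power draw, memory cost, and multi-sample throughput, so it is only a first-order view.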
 


De Novo Assembly


For De Novo Assembly, Skylake 8168 and Cascade Lake 8280M are compared with the same amount of system memory, 1.5TB in R940. Cascade Lake 8280M was chosen mainly for its higher core count and because it supports more memory, which is beneficial because the data size for De Novo assembly continues to grow over time.
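The two R940 memory configurations from Table 1 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming six DDR4 memory channels per socket (a platform detail not stated in this article); the function names are hypothetical.

```python
# Assumption (not stated in the article): each Skylake-SP / Cascade Lake-SP
# socket exposes six DDR4 memory channels.
CHANNELS_PER_SOCKET = 6
SOCKETS_R940 = 4  # Table 1 lists 4x 8168 and 4x 8280M


def total_memory_gb(dimm_count: int, dimm_size_gb: int) -> int:
    """Total installed memory in GB."""
    return dimm_count * dimm_size_gb


def dimms_per_channel(dimm_count: int, sockets: int) -> float:
    """Average number of DIMMs populated on each memory channel."""
    return dimm_count / (sockets * CHANNELS_PER_SOCKET)


# (DIMM count, DIMM size in GB) taken from Table 1
R940_CONFIGS = {
    "Skylake 8168": (48, 32),
    "Cascade Lake 8280M": (24, 64),
}

for name, (count, size_gb) in R940_CONFIGS.items():
    total = total_memory_gb(count, size_gb)
    dpc = dimms_per_channel(count, SOCKETS_R940)
    print(f"{name}: {total} GB ({total / 1024:.1f} TB), {dpc:.0f} DPC")
```

Both configurations come to 1536 GB, so the comparison holds capacity constant while DIMM speed and DIMMs per channel differ.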

SOAPdenovo2

The maximum performance gain from upgrading from Skylake 8168 to Cascade Lake 8280M is roughly 1%, as shown by the comparison of 92 cores of Skylake 8168 versus 108 cores of Cascade Lake 8280M in Figure 2. For the test, one core per CPU was left for the OS and other housekeeping use. Although the results show that Cascade Lake 8280M is slower by 2% on average across the various core counts tested, the comparison between 92 cores of 8168 and 108 cores of 8280M confirmed that Cascade Lake 8280M performs slightly better than Skylake 8168.
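The 92-core and 108-core figures follow from reserving one core on each of the four CPUs. A minimal sketch of that arithmetic, with hypothetical function names, is:

```python
def usable_cores(sockets: int, cores_per_cpu: int, reserved_per_cpu: int = 1) -> int:
    """Cores left for the assembler after reserving some on every CPU."""
    if not 0 <= reserved_per_cpu <= cores_per_cpu:
        raise ValueError("reserved_per_cpu must be between 0 and cores_per_cpu")
    return sockets * (cores_per_cpu - reserved_per_cpu)


def percent_more(new: int, old: int) -> float:
    """Percent increase of `new` over `old`."""
    return 100.0 * (new - old) / old


skylake_8168 = usable_cores(sockets=4, cores_per_cpu=24)   # 92
cascade_8280m = usable_cores(sockets=4, cores_per_cpu=28)  # 108

print(f"Skylake 8168: {skylake_8168} usable cores")
print(f"Cascade Lake 8280M: {cascade_8280m} usable cores")
print(f"Extra cores on 8280M: {percent_more(cascade_8280m, skylake_8168):.1f}%")
```

The 8280M configuration has about 17% more usable cores, which puts the roughly 1% runtime gain reported for SOAPdenovo2 in context.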


Figure 2 Runtimes and peak memory consumption plots for SOAPdenovo2 with various number of cores

SOAPdenovo2 appears to be memory-bandwidth bound. The peak memory consumption rises steadily as more cores are used for a process with the 1 DPC configuration on the Cascade Lake CPU, while it declines with the 2 DPC configuration on the Skylake CPU. As shown in Figure 3 of our previously published blog, memory bandwidth can differ by 11% between 1 DPC and 2 DPC configurations with the same type of dual-ranked DIMMs. To draw a firmer conclusion, further tests are required with a 2 DPC configuration (DDR4-2666) on the Cascade Lake 8280M CPU.
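For reference, the nominal speed advantage of DDR4-2933 over DDR4-2666 can be estimated from theoretical peak bandwidth. The sketch below assumes six DDR4 channels per socket and a 64-bit (8-byte) data bus per channel, neither of which is stated in this article. It computes only the theoretical peak and does not model the rank and DPC effects discussed above, so it should not be read as a prediction of measured bandwidth.

```python
CHANNELS_PER_SOCKET = 6  # assumption: six DDR4 channels per socket
BYTES_PER_TRANSFER = 8   # assumption: 64-bit data bus per DDR4 channel


def peak_bandwidth_gbs(mega_transfers_per_s: float, sockets: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = (mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER
                   * CHANNELS_PER_SOCKET * sockets)
    return bytes_per_s / 1e9


ddr4_2666 = peak_bandwidth_gbs(2666)
ddr4_2933 = peak_bandwidth_gbs(2933)

print(f"DDR4-2666: {ddr4_2666:.1f} GB/s per socket")
print(f"DDR4-2933: {ddr4_2933:.1f} GB/s per socket")
print(f"Nominal advantage of DDR4-2933: {100.0 * (ddr4_2933 / ddr4_2666 - 1):.1f}%")
```

The nominal advantage is about 10%, which is of the same order as the 11% difference between 1 DPC and 2 DPC cited above. That is consistent with the observation that faster DIMMs in a 1 DPC layout do not necessarily deliver more usable bandwidth.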

SPAdes

Cascade Lake 8280M performs better across the tests with various numbers of cores, and 5% better performance is achievable in a CPU vs CPU comparison (92-core 8168 versus 108-core 8280M), as shown in Figure 3. The patterns of peak memory consumption are similar between the two CPUs; however, Cascade Lake 8280M with the 1 DPC configuration shows higher memory consumption than Skylake 8168 with the 2 DPC configuration. Although memory bandwidth does not seem to be as critical as in the SOAPdenovo2 tests, a 2 DPC configuration with DDR4-2666MHz can be a better configuration for De Novo Assembly.


Figure 3 Runtimes and peak memory consumption plots for SPAdes with various number of cores

 


Conclusion

Overall, the Cascade Lake CPUs tested here do not outperform Skylake CPUs for Genomics workloads such as Variant Calling and De Novo Assembly. Similar performance was somewhat expected, since the Cascade Lake CPU is based on the Skylake CPU and aims to improve supporting functionality rather than raw performance. However, Cascade Lake provides more choices than Skylake in terms of lower TDP and higher core count for Variant Calling types of workloads. Notably, the 1 DPC configuration with DDR4 2933 MHz DIMMs does not improve performance for SOAPdenovo2. For De Novo Assembly applications, higher memory bandwidth seems to be better, and there is no benefit from upgrading memory to DDR4 2933MHz in a 1 DPC configuration for Cascade Lake CPUs. It is recommended to set up a 2 DPC configuration with DDR4 2666MHz, especially for De Novo Assembly applications.

 

Article Properties
Article Number: 000146815
Article Type: Solution
Last Modified: 10 Apr 2021
Version: 3