Table 1 Test configuration for variant calling and De Novo assembly
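| Server | Generation | CPU | Base Frequency (GHz) | Number of Cores | TDP (W) | Memory | Storage |
|---|---|---|---|---|---|---|---|
| Dell PowerEdge R640 | Skylake | 2x 6154 | 3.0 | 18 | 200 | 24x 16GB DDR4-2666MHz, 2 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Skylake | 2x 6148 | 2.4 | 20 | 150 | 24x 16GB DDR4-2666MHz, 2 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Skylake | 2x 6152 | 2.1 | 22 | 140 | 24x 16GB DDR4-2666MHz, 2 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Skylake | 2x 6138 | 2.0 | 20 | 125 | 24x 16GB DDR4-2666MHz, 2 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Cascade Lake | 2x 6248 | 2.5 | 20 | 150 | 12x 32GB DDR4-2933MHz, 1 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Cascade Lake | 2x 6252 | 2.1 | 24 | 150 | 12x 32GB DDR4-2933MHz, 1 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R640 | Cascade Lake | 2x 6230 | 2.1 | 20 | 125 | 12x 32GB DDR4-2933MHz, 1 DPC | 10x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R940 | Skylake | 4x 8168 | 2.7 | 24 | 205 | 48x 32GB DDR4-2666MHz, 2 DPC | 18x 1.2TB SAS 12 Gbps, 10K in RAID 0 |
| Dell PowerEdge R940 | Cascade Lake | 4x 8280M | 2.7 | 28 | 205 | 24x 64GB DDR4-2933MHz, 1 DPC | 18x 1.2TB SAS 12 Gbps, 10K in RAID 0 |

| Common to all systems | Value |
|---|---|
| System BIOS | 2.1.3 |
| Kernel | 3.10.0-957.el7.x86_64 |
| OS | Red Hat Enterprise Linux Server release 7.6 (Maipo) |
| Sequence Reads | ERR194161, 50x Whole Human Genome for Variant Calling, and ERR318658, 3.2 Billion Reads of Whole Human Genome for De Novo Assembly |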
Figure 1 Runtimes of each step in Variant Calling pipeline
The Cascade Lake 6248 was fastest in most steps and had the best overall runtime, but it performed poorly at the "Mark Duplicates" step, running 27% slower than the Cascade Lake 6252. It is unclear why the 6248 is slow at this step, although repeated tests gave consistent results. Because the relative ranking of the CPUs varies from step to step, overall runtime is the more sensible basis for selecting a CPU for this workflow.
Table 2 Total runtime comparisons among Skylake vs Cascade Lake CPUs
| Generation | CPU | Price | Spec | Total BWA-GATK Runtime (hours) |
|---|---|---|---|---|
| Skylake | 6148 | $3,072.00 - $3,078.00 | 2.4 GHz, 20 cores, 150W | 24.26 |
| Skylake | 6154 | $3,543.00 | 3.0 GHz, 18 cores, 200W | 23.47 |
| Skylake | 6152 | $3,655.00 - $3,661.00 | 2.1 GHz, 22 cores, 140W | 24.58 |
| Skylake | 6138 | $2,612.00 - $2,618.00 | 2.0 GHz, 20 cores, 125W | 24.83 |
| Cascade Lake | 6248 | $3,072.00 - $3,078.00 | 2.5 GHz, 20 cores, 150W | 23.36 |
| Cascade Lake | 6252 | $3,655.00 - $3,662.00 | 2.1 GHz, 24 cores, 150W | 23.82 |
| Cascade Lake | 6230 | $1,894.00 - $1,900.00 | 2.1 GHz, 20 cores, 125W | 23.68 |
Although the Cascade Lake 6248 delivers the best overall performance, the Cascade Lake 6230 is a reasonable choice for customers with a limited power budget. Because these results come from a single-sample test, it is hard to conclude that the Cascade Lake 6230 and 6248 are better than the Cascade Lake 6252 without throughput test results. With its higher core count, the Cascade Lake 6252 can process more samples simultaneously and could come out ahead in throughput tests. Nonetheless, the Cascade Lake 6230 could be the most cost-effective choice among the tested CPUs.
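The cost-effectiveness argument can be made concrete with a small calculation over the Table 2 figures. The sketch below is illustrative only: it assumes the lower bound of each listed price range, and the "price x runtime" ranking metric and the function names are our own choices, not part of the original benchmark.

```python
# Figures copied from Table 2: (price in USD, total BWA-GATK runtime in hours).
# Assumption: the lower bound of each listed price range is used.
CPUS = {
    "Skylake 6148": (3072.00, 24.26),
    "Skylake 6154": (3543.00, 23.47),
    "Skylake 6152": (3655.00, 24.58),
    "Skylake 6138": (2612.00, 24.83),
    "Cascade Lake 6248": (3072.00, 23.36),
    "Cascade Lake 6252": (3655.00, 23.82),
    "Cascade Lake 6230": (1894.00, 23.68),
}


def rank_by_cost_runtime(cpus):
    """Sort CPU names by price x runtime (lower is better).

    This product is a hypothetical metric chosen for illustration.
    """
    return sorted(cpus, key=lambda name: cpus[name][0] * cpus[name][1])


def slowdown_vs_fastest(cpus):
    """Return each CPU's runtime penalty, in percent, relative to the fastest CPU."""
    fastest = min(runtime for _, runtime in cpus.values())
    return {
        name: 100.0 * (runtime / fastest - 1.0)
        for name, (_, runtime) in cpus.items()
    }


if __name__ == "__main__":
    gaps = slowdown_vs_fastest(CPUS)
    for name in rank_by_cost_runtime(CPUS):
        price, runtime = CPUS[name]
        print(f"{name:18s} ${price:>9,.2f} {runtime:6.2f} h  +{gaps[name]:.1f}% vs fastest")
```

On these numbers the Cascade Lake 6230 ranks first under the assumed metric while being only about 1.4% slower than the 6248. Prices are per CPU, so doubling them for a dual-socket R640 does not change the ranking.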
For De Novo Assembly, the Skylake 8168 and Cascade Lake 8280M are compared with the same amount of system memory, 1.5TB, in the R940. The Cascade Lake 8280M was chosen mainly for its higher core count and because it supports more memory, which is beneficial because the data size for De Novo assembly continues to grow over time.
The maximum performance gain from upgrading from the Skylake 8168 to the Cascade Lake 8280M is roughly 1%, as shown by the comparison of 92 cores of Skylake 8168 with 108 cores of Cascade Lake 8280M in Figure 2. For the test, one core per CPU was left for the OS and other housekeeping use. Although the results show that the Cascade Lake 8280M is slower by 2% on average across the various core counts tested, the comparison between 92 cores of the 8168 and 108 cores of the 8280M confirmed that the Cascade Lake 8280M performs slightly better than the Skylake 8168.
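The 92-core and 108-core figures follow from the Table 1 core counts once one core per CPU is reserved. A minimal sketch of that arithmetic, with a helper name of our own choosing:

```python
def usable_cores(sockets, cores_per_cpu, reserved_per_cpu=1):
    """Cores available to the workload after reserving some on each CPU for the OS."""
    return sockets * (cores_per_cpu - reserved_per_cpu)


# Four-socket R940 configurations from Table 1.
skylake_8168 = usable_cores(sockets=4, cores_per_cpu=24)
cascade_lake_8280m = usable_cores(sockets=4, cores_per_cpu=28)

if __name__ == "__main__":
    extra = 100.0 * (cascade_lake_8280m / skylake_8168 - 1.0)
    print(f"Skylake 8168: {skylake_8168} cores")
    print(f"Cascade Lake 8280M: {cascade_lake_8280m} cores ({extra:.1f}% more)")
```

The 8280M configuration therefore has about 17% more usable cores in the full-system comparison.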
SOAPdenovo2 appears to be memory-bandwidth bound. Peak memory consumption rises steadily as more cores are used with the 1 DPC configuration on the Cascade Lake CPU, whereas it declines with the 2 DPC configuration on the Skylake CPU. As shown in Figure 3 of our previously published blog, memory bandwidth can differ by 11% between 1 DPC and 2 DPC configurations with the same type of dual-rank DIMMs. Drawing a firmer conclusion requires further tests with a 2 DPC configuration (DDR4-2666) on the Cascade Lake 8280M CPU.
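For readers unfamiliar with the DPC (DIMMs per channel) notation, the sketch below derives DPC and total capacity from the DIMM counts in Table 1. It assumes six memory channels per socket, which is the channel count of these Xeon Scalable generations; the function and dictionary names are hypothetical.

```python
# Assumption: six DDR4 memory channels per socket on the tested Xeon Scalable CPUs.
CHANNELS_PER_SOCKET = 6


def memory_layout(sockets, dimms, dimm_gb):
    """Return (DIMMs per channel, total capacity in GB) for a balanced population."""
    channels = sockets * CHANNELS_PER_SOCKET
    dpc, remainder = divmod(dimms, channels)
    if remainder:
        raise ValueError("DIMMs are not evenly distributed across channels")
    return dpc, dimms * dimm_gb


# (sockets, DIMM count, DIMM size in GB) taken from Table 1.
CONFIGS = {
    "R640 Skylake": (2, 24, 16),
    "R640 Cascade Lake": (2, 12, 32),
    "R940 Skylake 8168": (4, 48, 32),
    "R940 Cascade Lake 8280M": (4, 24, 64),
}

if __name__ == "__main__":
    for name, (sockets, dimms, dimm_gb) in CONFIGS.items():
        dpc, total_gb = memory_layout(sockets, dimms, dimm_gb)
        print(f"{name:24s} {dpc} DPC, {total_gb} GB")
```

Both R940 configurations come out at 1536 GB (1.5TB), so the comparison differs in DIMM population and speed rather than in capacity.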
The Cascade Lake 8280M performs better across the tests at the various core counts, and a 5% improvement is achievable in the CPU-to-CPU comparison (92-core 8168 versus 108-core 8280M), as shown in Figure 3. The peak memory consumption patterns are similar for the two CPUs; however, the Cascade Lake 8280M with the 1 DPC configuration consumes more memory than the Skylake 8168 with the 2 DPC configuration. Although memory bandwidth does not seem to be as critical here as in the SOAPdenovo2 tests, a 2 DPC configuration with DDR4-2666MHz can be a better configuration for De Novo Assembly.
Overall, the Cascade Lake CPUs tested here do not outperform the Skylake CPUs for genomics workloads such as Variant Calling and De Novo Assembly. Similar performance was expected to some extent, since the Cascade Lake CPU is based on the Skylake CPU and aims to improve supporting functionality rather than raw performance. However, Cascade Lake offers more choices than Skylake in terms of lower TDP and higher core count for Variant Calling workloads. Notably, the 1 DPC configuration with DDR4-2933MHz DIMMs does not improve performance for SOAPdenovo2. For De Novo Assembly applications, higher memory bandwidth seems to be better, and there is no benefit from upgrading memory to DDR4-2933MHz in a 1 DPC configuration for Cascade Lake CPUs. A 2 DPC configuration with DDR4-2666MHz is recommended, especially for De Novo assembly applications.