Overview
The purpose of this blog is to provide performance information for the BWA-GATK pipeline benchmarked on Dell EMC Ready Solutions for HPC BeeGFS Storage. Unfortunately, we were not able to set up enough compute nodes or a BeeGFS storage system large enough to compare against the performance results previously published for Lustre storage. However, the results are still helpful for estimating the amount of computational resources required for a given variant calling workload.
The test cluster configurations are summarized in Table 1.
Table 1 Tested compute node configuration
| Dell EMC PowerEdge C6420 | |
| --- | --- |
| CPU | 2x Intel Xeon Gold 6248, 20 cores, 2.5 GHz (Cascade Lake) |
| RAM | 12x 16 GB at 2933 MT/s |
| OS | Red Hat Enterprise Linux Server release 7.4 (Maipo) |
| Interconnect | Mellanox EDR InfiniBand |
| BIOS System Profile | Performance Optimized |
| Logical Processor | Disabled |
| Virtualization Technology | Disabled |
| BWA | 0.7.15-r1140 |
| Sambamba | 0.7.0 |
| Samtools | 1.6 |
| GATK | 3.6-0-g89b7209 |
The tested compute nodes were connected to the BeeGFS storage through Mellanox EDR InfiniBand switches: the BeeGFS storage is attached to a bridge EDR switch, and this bridge is connected to an additional EDR switch to which all compute nodes are connected. The storage configuration is summarized in Table 2.
Table 2 BeeGFS solution hardware and software specifications
| Specification | |
| --- | --- |
| Management server | 1x Dell EMC PowerEdge R640 |
| MDS | 2x Dell EMC PowerEdge R740 |
| Storage servers (SS) | 2x Dell EMC PowerEdge R740 |
| Processors | Management server: Dual Intel Xeon Gold 5218; MDS and SS servers: Dual Intel Xeon Gold 6230 |
| Memory | Management server: 12x 8 GB 2666 MT/s DDR4 RDIMMs; MDS and SS servers: 12x 32 GB 2933 MT/s DDR4 RDIMMs |
| Local disks and RAID controller | Management server: PERC H740P integrated RAID with 8 GB NV cache, 6x 300 GB 15K SAS HDDs in RAID 10; MDS and SS servers: PERC H330+ integrated RAID, 2x 300 GB 15K SAS HDDs in RAID 1 for the OS |
| InfiniBand HCA | Mellanox ConnectX-6 HDR100 InfiniBand adapter |
| External storage controllers | On each MDS: 2x Dell 12 Gb/s SAS HBAs; on each SS: 4x Dell 12 Gb/s SAS HBAs |
| Object storage enclosures | 4x Dell EMC PowerVault ME4084, fully populated with a total of 336 drives |
| Metadata storage enclosure | 1x Dell EMC PowerVault ME4024 with 24 SSDs |
| RAID controllers | Duplex RAID controllers in the ME4084 and ME4024 enclosures |
| Drives | Each ME4084 enclosure: 84x 8 TB 3.5 in. 7.2K RPM NL SAS3 HDDs; ME4024 enclosure: 24x 960 GB SAS3 SSDs |
| Operating system | CentOS Linux release 8.1.1911 (Core) |
| Kernel version | 4.18.0-147.5.1.el8_1.x86_64 |
| Mellanox OFED version | 4.7-3.2.9.0 |
| BeeGFS file system version | 7.2 (beta2) |
The test data was chosen from one of Illumina's Platinum Genomes. ERR194161, sequenced on an Illumina HiSeq 2000 and submitted by Illumina, can be obtained from EMBL-EBI. The DNA identifier for this individual is NA12878. The data description on the linked website lists a depth of coverage greater than 30x; in practice, the sample reaches roughly 53x.
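As a quick sanity check, the depth of coverage can be estimated from the read count and read length. The sketch below uses the standard coverage formula with illustrative values, not the exact ERR194161 metadata; substitute the actual numbers from the ENA record.

```python
# Rough depth-of-coverage estimate: coverage = (reads * read_length) / genome_size.
# The read count and read length below are illustrative assumptions, not the
# exact ERR194161 metadata.
reads = 1.6e9            # total reads (assumed for illustration)
read_length = 101        # bp per read (assumed; typical for HiSeq 2000 runs)
genome_size = 3.1e9      # approximate human genome size in bp

coverage = reads * read_length / genome_size
print(f"Estimated depth of coverage: {coverage:.1f}x")  # ~52x with these assumptions
```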
Performance Evaluation
Multiple-Sample/Multiple-Node Performance
A typical way of running an NGS pipeline is to process multiple samples on each compute node and to use multiple compute nodes to maximize throughput. The tests used eight C6420 compute nodes with seven samples per node; hence, up to 56 samples were processed concurrently to estimate the maximum number of genomes per day achievable without a job failure.
As shown in Figure 1, a single C6420 compute node can process 3.69 50x whole human genomes per day when 7 samples are processed concurrently; each sample is allocated 5 cores and 20 GB of memory.
Figure 1 Throughput tests with up to 8x C6420s with BeeGFS
Fifty-six 50x whole human genomes can be processed on 8 C6420 compute nodes in roughly 54 hours. In other words, the test configuration delivers 25.11 genomes per day for whole human genomes with 50x depth of coverage.
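For context, the per-node packing behind these numbers follows directly from Table 1. The short Python sketch below simply tallies what the 7-samples-per-node configuration consumes against the node's core and memory totals.

```python
# Resources consumed on one C6420 node by the test configuration:
# 7 concurrent samples, each allocated 5 cores and 20 GB of memory.
node_cores = 2 * 20          # 2x Xeon Gold 6248, 20 cores each (Table 1)
node_memory_gb = 12 * 16     # 12x 16 GB DIMMs (Table 1)

samples_per_node = 7
cores_used = samples_per_node * 5        # 35 of 40 cores
memory_used_gb = samples_per_node * 20   # 140 of 192 GB

print(f"Cores:  {cores_used}/{node_cores}")
print(f"Memory: {memory_used_gb}/{node_memory_gb} GB")
```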
Conclusion
The data size of whole genome sequencing (WGS) has been growing constantly. The current average depth is about 55x, roughly five times larger than that of a typical WGS sample four years ago, when we started benchmarking the BWA-GATK pipeline. The increasing data size does not strain the storage side, since most applications in the pipeline are bound by CPU clock speed. Hence, the pipeline runs longer with larger data rather than generating heavier I/O.
However, more temporary files are generated during processing because the larger data must be split for parallelization, and the increased number of temporary files open at the same time can exhaust the open-file limit of the Linux operating system. One of the applications silently fails to complete when it hits this limit. A simple solution is to raise the limit to more than 150K.
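As a hedged illustration (exact values and mechanisms depend on the distribution and job scheduler), the per-process limit can be inspected and, where the hard limit allows, raised from within a Python driver script; a permanent, system-wide change would normally go through /etc/security/limits.conf or the scheduler's prolog instead.

```python
import resource

# Inspect the current per-process open-file limits (RLIMIT_NOFILE).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit toward the suggested >150K, capped at the hard limit.
# Raising the hard limit itself requires root (or an entry in
# /etc/security/limits.conf), so this only helps if the hard limit is already high.
target = 160_000
new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print(f"soft limit raised to {new_soft}")
```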
The results in Figure 1 show that the throughput tests did not reach the maximum capacity of the system. Since there was no sign of a significant slowdown as more samples were added, it should be possible to process more than 7 samples per node if the compute nodes are set up with more memory. Overall, the BeeGFS storage is a suitable scratch storage for NGS data processing.