Article written by Ankit Sethia of Parabricks and Kihoon Yoon of HPC and AI Innovation Lab in October 2018
This blog post describes Parabricks NGS secondary analysis on a Dell PowerEdge server.
Advancements in Next Generation Sequencing (NGS) technologies have jump-started the personalized medicine revolution, in which medical treatment can be customized based on a patient’s DNA. This is driving increased research and clinical applications. As a result, the number of human genomes sequenced is predicted to double every year, transforming the diagnosis and treatment of diseases and leading to a disruptive change in modern medicine.
Parabricks brings high performance computing technologies tailored for NGS analyses to the standard NGS software, cutting run times from several days to approximately one hour. The accelerated software is a drop-in replacement for existing tools that does not sacrifice output accuracy or configurability. Parabricks provides 30-50 times faster secondary analysis, taking the FASTQ files that come out of the sequencer to Variant Call Format (VCF) files for tertiary analysis. The standard pipeline shown below, defined by the Genome Analysis Toolkit (GATK), consists of three steps. Parabricks accelerates the existing GATK 4 best practices and generates results equivalent to the baseline. Figure 1 shows the pipeline currently supported by Parabricks.
Figure 1 Parabricks GPU-accelerated pipeline
The FASTQ files that come out of the sequencer, along with the reference genome, are the input to the GPU-accelerated BWA-Mem alignment. The aligned output is then coordinate sorted, and duplicate reads are marked. The result, in binary alignment map (BAM) format, is the first output of the standard pipeline. This BAM file is then used for base quality score recalibration (BQSR), after which ApplyBQSR updates the base qualities in the BAM. Finally, a variant caller is chosen according to the task at hand. Parabricks has accelerated several variant callers: GATK HaplotypeCaller, GATK Mutect2, and CNVkit; support for Google DeepVariant is in development.
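The data flow described above can be sketched as an ordered list of stages, where each stage may only consume data that was supplied up front or produced by an earlier stage. This is a hypothetical model for illustration, not the Parabricks command-line interface; the `Stage` class, `check_pipeline` function, and data names are assumptions. The "known sites" input is also an addition: GATK's BaseRecalibrator requires a known-variants resource, which the text does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the Figure 1 pipeline (hypothetical model, not a Parabricks API)."""
    name: str
    inputs: tuple
    output: str


# Inputs available before the pipeline starts. "known sites" is an assumption:
# GATK's BaseRecalibrator needs a known-variants VCF that the text does not mention.
INITIAL_INPUTS = ("FASTQ", "reference", "known sites")

PIPELINE = [
    Stage("BWA-Mem alignment", ("FASTQ", "reference"), "aligned reads"),
    Stage("coordinate sort", ("aligned reads",), "sorted reads"),
    Stage("mark duplicates", ("sorted reads",), "BAM"),
    Stage("BQSR", ("BAM", "reference", "known sites"), "recalibration table"),
    Stage("ApplyBQSR", ("BAM", "recalibration table"), "recalibrated BAM"),
    Stage("variant calling", ("recalibrated BAM", "reference"), "VCF"),
]


def check_pipeline(stages, initial=INITIAL_INPUTS):
    """Raise if a stage needs data no earlier stage produced; return the final output."""
    available = set(initial)
    for stage in stages:
        missing = [item for item in stage.inputs if item not in available]
        if missing:
            raise ValueError(f"{stage.name} is missing inputs: {missing}")
        available.add(stage.output)
    return stages[-1].output


print(check_pipeline(PIPELINE))  # VCF
```

Reordering the list (for example, running ApplyBQSR before BQSR) makes `check_pipeline` raise, which mirrors why the stages in Figure 1 must run in sequence.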
Dell Hardware Configuration
The PowerEdge C4140 Server is an accelerator-optimized server with support for two Intel Xeon Scalable processors and four NVIDIA Tesla GPUs (PCIe or NVLink) in a 1U form factor. The tested server was equipped with the PCIe version of the GPUs (standard PCIe Gen3 connections between GPU and CPU) and used GPU configuration B (shown in Figure 2 below), one of four available configurations: B, C, K, and G. The hardware and system software configurations are summarized below.
Table 1 Hardware Configuration
| Component | Configuration |
|---|---|
| Server | Dell EMC PowerEdge C4140 |
| Processor | Intel Xeon Gold 6148, 20 cores, 2.40 GHz |
| Memory | 384 GB @ 2667 MT/s |
| GPU | 4x NVIDIA V100-16GB PCIe |
| Storage | 1x Samsung Electronics Co Ltd NVMe SSD Controller 172Xa (rev 01), 1.2 TB |
| Power Supplies | Dual 2000 W |
Table 2 Software/Firmware Configuration
| Component | Version |
|---|---|
| BIOS | 1.1.7 |
| OS | Red Hat Enterprise Linux 7.4 |
| Kernel | 3.10.0-693.17.1.el7.x86_64 |
| System Profile | Performance optimized (Turbo enabled, C-States disabled, Power management set to Max Performance) |
| CUDA Driver | 390.46 |
| CUDA Toolkit | 9.1 |
| Compilers | GCC 4.8.5, OpenMPI 1.10.2 |
| Intel MKL | From Intel Parallel Studio 2017 |
Figure 2 PowerEdge C4140 in Configuration B with 4x V100
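Before reproducing the benchmark, it can help to confirm that a test system matches Tables 1 and 2. A minimal sketch, assuming `nvidia-smi` is on the PATH; it uses the tool's `--query-gpu` and `--format=csv,noheader` options, while the helper names `parse_gpu_csv` and `query_gpus` are hypothetical. On a server configured like the one tested, you would expect four V100 16 GB entries reporting driver 390.46.

```python
import shutil
import subprocess

# Fields accepted by nvidia-smi's --query-gpu option.
QUERY = "name,memory.total,driver_version"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory, driver = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "memory": memory, "driver": driver})
    return gpus


def query_gpus():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    gpus = query_gpus()
    print(f"{len(gpus)} GPU(s) found")
    for gpu in gpus:
        print(gpu)
```

Keeping the parser separate from the subprocess call lets it be checked against saved output on a machine without GPUs.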
Performance Evaluation
On a c3.8xlarge AWS node, secondary analysis of 30x whole-genome sequencing (WGS) data using the pipeline shown in Figure 1, with HaplotypeCaller for variant calling, can take 30-40 hours. Below, the raw run times in minutes for the Parabricks software on a Dell EMC PowerEdge C4140 are presented for three DNA samples with different coverages (10x, 38x, 53x).
Table 3 Run times in minutes
| Benchmark | Coverage | BWA-Mem | Others* | HaplotypeCaller | Total |
|---|---|---|---|---|---|
| ERR091571 | 10x | 16.5 | 6 | 7.5 | 30 |
| SRR12837 | 38x | 61 | 14.5 | 14 | 89.5 |
| ERR194161 | 53x | 89 | 23.5 | 20 | 132.5 |
*Others include coordinate sorting, marking duplicates, BQSR, and ApplyBQSR.
Figure 3 Variant calling pipeline benchmark on 3 different DNA samples
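As a quick sanity check on Table 3, the short script below re-enters the published run times, confirms that each row's stage times add up to the reported total, and derives two quantities the table implies: BWA-Mem's share of the total run time (roughly 55-68%) and the minutes per 1x of coverage (roughly 2.4-3.0). This is only arithmetic on the published figures, not a new measurement; the `RUNS` dictionary and `stage_breakdown` function are illustrative names.

```python
# Run times in minutes from Table 3:
# sample -> (coverage, BWA-Mem, Others, HaplotypeCaller, Total)
RUNS = {
    "ERR091571": (10, 16.5, 6.0, 7.5, 30.0),
    "SRR12837": (38, 61.0, 14.5, 14.0, 89.5),
    "ERR194161": (53, 89.0, 23.5, 20.0, 132.5),
}


def stage_breakdown(runs):
    """Return {sample: (BWA-Mem share of total, minutes per 1x coverage)}."""
    result = {}
    for sample, (coverage, bwa, others, hc, total) in runs.items():
        # The three stage times should add up to the reported total.
        if abs(bwa + others + hc - total) > 1e-6:
            raise ValueError(f"{sample}: stage times do not sum to the total")
        result[sample] = (bwa / total, total / coverage)
    return result


for sample, (share, per_x) in stage_breakdown(RUNS).items():
    print(f"{sample}: BWA-Mem {share:.0%} of total, {per_x:.2f} min per 1x coverage")
```

The near-constant minutes per 1x of coverage suggests run time grows roughly linearly with coverage across these three samples, with alignment as the dominant stage.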
Throughput Evaluation
The Parabricks GPU solution with four V100 GPUs on a Dell PowerEdge C4140 Server delivered significantly higher throughput. One such server can analyze 48 whole genomes at 10x coverage per day, whereas a similar CPU-only solution can process only about 8 genomes per day. For centers processing large volumes of genomic data, this 6-fold increase in throughput results in large savings in total cost of ownership (TCO) by reducing hardware, IT management, cooling, power, and maintenance costs.
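The throughput figures follow from Table 3 if samples are processed one after another with no idle time between runs (an assumption; the text does not describe how jobs were scheduled): 1440 minutes per day divided by 30 minutes per 10x genome gives 48 genomes per day, and 48 / 8 gives the 6-fold figure. The names below are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def genomes_per_day(minutes_per_genome):
    """Genomes per day for one server, assuming back-to-back runs with no idle time."""
    return MINUTES_PER_DAY / minutes_per_genome


# Total run time per genome from Table 3 (minutes), keyed by coverage.
TOTALS = {"10x": 30.0, "38x": 89.5, "53x": 132.5}
CPU_ONLY_PER_DAY = 8  # CPU-only throughput quoted in the text (10x genomes/day)

for coverage, minutes in TOTALS.items():
    print(f"{coverage}: {genomes_per_day(minutes):.1f} genomes/day")

gain = genomes_per_day(TOTALS["10x"]) / CPU_ONLY_PER_DAY
print(f"Throughput gain at 10x: {gain:.1f}x")
```

Under the same assumption, the 38x and 53x samples work out to about 16.1 and 10.9 genomes per day, respectively.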
Features of Parabricks software
- 25-30 times faster analysis: Compared to a CPU-only solution, Parabricks accelerates secondary analysis by more than an order of magnitude.
- 100% Deterministic and Reproducible: Regardless of platform and number or type of resources, Parabricks software generates exactly the same results on every run.
- Equivalent Results: Parabricks’ pipeline generates results equivalent to those of the reference Broad Institute GATK 4 best practices pipeline because it uses the same algorithms.
- Up to Date Support of All Tool Versions: Parabricks’ accelerated software supports multiple versions of BWA-Mem, Picard and GATK and will support all future versions of these tools.
- Visualization: Parabricks generates several key visualizations in real time while performing secondary analysis, which can improve the user’s understanding of the data.
- Single Node Execution: The entire pipeline is run using one computing node and does not incur any overhead of distributing data and work across multiple servers.
- Turnkey Solution: Parabricks software runs on standard CPU and GPU nodes available on the cloud or on-premise, and requires no additional setup steps by the user.
- On-Premise and Cloud: Parabricks software can run on local servers, AWS, Google Cloud, and Azure.
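One way to check the determinism claim on your own data is to run the same sample twice and compare the variant calls. A minimal sketch, assuming uncompressed VCF output: it hashes every line except `##` meta-information lines, because VCF headers can record run-specific details such as dates or command lines. The function names and file paths are hypothetical, and a bgzipped `.vcf.gz` file would need `gzip.open` instead of `open`.

```python
import hashlib


def records_digest(lines):
    """SHA-256 over VCF lines (bytes), skipping '##' meta-information lines."""
    digest = hashlib.sha256()
    for line in lines:
        if not line.startswith(b"##"):  # headers may carry dates/command lines
            digest.update(line)
    return digest.hexdigest()


def vcf_digest(path):
    """Digest of an uncompressed VCF file, streamed line by line."""
    with open(path, "rb") as fh:
        return records_digest(fh)


def same_calls(path_a, path_b):
    """True if two VCFs have identical column headers and records."""
    return vcf_digest(path_a) == vcf_digest(path_b)
```

For example, `same_calls("run1/sample.vcf", "run2/sample.vcf")` should return `True` when two runs produced identical calls; streaming the file keeps memory use flat even for whole-genome VCFs.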
Please contact info@parabricks.com for further information.