Accelerated Exome Analysis with Clara Parabricks

Step #2: Running fq2bam

The following examples are run from the system console of the GPU host. Click the “System Console” link in the left menu of this page to open a web-based SSH session.


The fq2bam command runs read alignment, sorting, duplicate marking, and base quality score recalibration (BQSR) according to GATK best practices, but finishes much faster than the equivalent community tools because the work runs on the GPUs.
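The command that follows reads its inputs from the shell variables FASTQ1, FASTQ2, and REFERENCE_FILE, and mounts the current directory into the container at /data. A long alignment run that fails on a mistyped path is costly, so a quick pre-flight check can help. The sketch below is illustrative only and is not part of Parabricks; the function names are hypothetical, and it assumes the variables hold either paths relative to the current directory or container paths under /data.

```shell
# Pre-flight check for the fq2bam inputs (illustrative sketch, not part of Parabricks).

# Map a container path under /data to the matching host path.
# Assumption: the current directory is mounted at /data, as in the docker command.
to_host_path() {
  case "$1" in
    /data/*) printf '%s\n' "$PWD/${1#/data/}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# Check one input; $1 is the variable name, $2 is its value.
check_input() {
  if [ -z "$2" ]; then
    echo "ERROR: $1 is not set" >&2
    return 1
  fi
  if [ ! -r "$(to_host_path "$2")" ]; then
    echo "ERROR: $1 points to '$2', which is not readable on the host" >&2
    return 1
  fi
}

# Return 0 only if all three inputs are set and readable.
# Every problem is reported, not just the first one.
check_fq2bam_inputs() {
  local status=0
  check_input FASTQ1 "${FASTQ1:-}" || status=1
  check_input FASTQ2 "${FASTQ2:-}" || status=1
  check_input REFERENCE_FILE "${REFERENCE_FILE:-}" || status=1
  return "$status"
}
```

After defining these functions in the console session, running `check_fq2bam_inputs && echo "inputs look OK"` from the directory that will be mounted reports any unset or unreadable input before the container starts.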


docker run --gpus all --rm \
    -v $(pwd):/results \
    -v $(pwd):/data \
    -w /data nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun fq2bam \
    --in-fq $FASTQ1 $FASTQ2 \
    --ref ${REFERENCE_FILE} \
    --out-bam /results/fastq2bam.pb.bam \
    --out-qc-metrics-dir qc-metrics


The output should look like:


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /data/HG002/MPHG002_S1_L001_R1_001.fastq.gz and /data/HG002/MPHG002_S1_L001_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:C6UP4ANXX.1\tLB:lib1\tPL:bar\tSM:sample\tPU:C6UP4ANXX.1
[PB Info 2022-Sep-13 08:16:23] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 08:16:23] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 08:16:23] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 08:16:23] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-13 08:16:23] ---------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Warning 2022-Sep-13 08:16:51][ParaBricks/src/pbOpts.cu:316] WARNING The system has 186 GB, however recommended RAM with 4 GPU is 196 GB. The run might not finish or might have less than expected performance.
[PB Info 2022-Sep-13 08:16:51] GPU-BWA mem
[PB Info 2022-Sep-13 08:16:51] ProgressMeter Reads Base Pairs Aligned
[PB Info 2022-Sep-13 08:17:41] 5040000 620000000
[PB Info 2022-Sep-13 08:18:34] 10080000 1260000000
[PB Info 2022-Sep-13 08:19:28] 15120000 1870000000
…
[PB Info 2022-Sep-13 09:35:53] 428400000 53550000000
[PB Info 2022-Sep-13 09:36:38] GPU-BWA Mem time: 4787.160169 seconds
[PB Info 2022-Sep-13 09:36:38] GPU-BWA Mem is finished.
[main] CMD: /usr/local/parabricks/binaries//bin/bwa mem -Z ./pbOpts.txt /data/Test/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna /data/HG002/MPHG002_S1_L001_R1_001.fastq.gz /data/HG002/MPHG002_S1_L001_R2_001.fastq.gz @RG\tID:C6UP4ANXX.1\tLB:lib1\tPL:bar\tSM:sample\tPU:C6UP4ANXX.1
[main] Real time: 4815.253 sec; CPU: 204261.998 sec
[PB Info 2022-Sep-13 09:36:38] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:38] || Program: GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-13 09:36:38] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:36:38] || Start Time: Tue Sep 13 08:16:23 2022 ||
[PB Info 2022-Sep-13 09:36:38] || End Time: Tue Sep 13 09:36:38 2022 ||
[PB Info 2022-Sep-13 09:36:38] || Total Time: 80 minutes 15 seconds ||
[PB Info 2022-Sep-13 09:36:38] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 09:36:41] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 09:36:41] || Sorting Phase-II ||
[PB Info 2022-Sep-13 09:36:41] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] progressMeter - Percentage
[PB Info 2022-Sep-13 09:36:41] 0.0 0.00 GB
[PB Info 2022-Sep-13 09:36:51] 16.2 1.00 GB
[PB Info 2022-Sep-13 09:37:01] 32.4 1.00 GB
[PB Info 2022-Sep-13 09:37:11] 47.1 1.00 GB
[PB Info 2022-Sep-13 09:37:21] 61.2 1.00 GB
[PB Info 2022-Sep-13 09:37:31] 76.2 1.00 GB
[PB Info 2022-Sep-13 09:37:41] 90.4 1.00 GB
[PB Info 2022-Sep-13 09:37:51] Sorting and Marking: 70.002 seconds
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] || Program: Sorting Phase-II ||
[PB Info 2022-Sep-13 09:37:51] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:37:51] || Start Time: Tue Sep 13 09:36:41 2022 ||
[PB Info 2022-Sep-13 09:37:51] || End Time: Tue Sep 13 09:37:51 2022 ||
[PB Info 2022-Sep-13 09:37:51] || Total Time: 1 minute 10 seconds ||
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 09:37:51] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 09:37:51] || Marking Duplicates, BQSR ||
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] progressMeter - Percentage
[PB Info 2022-Sep-13 09:38:01] 0.0 8.50 GB
[PB Info 2022-Sep-13 09:38:11] 0.0 16.68 GB
…
[PB Info 2022-Sep-13 09:50:11] 100.0 0.00 GB
[PB Info 2022-Sep-13 09:50:18] BQSR and writing final BAM: 746.711 seconds
[PB Info 2022-Sep-13 09:50:18] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:50:18] || Program: Marking Duplicates, BQSR ||
[PB Info 2022-Sep-13 09:50:18] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:50:18] || Start Time: Tue Sep 13 09:37:51 2022 ||
[PB Info 2022-Sep-13 09:50:18] || End Time: Tue Sep 13 09:50:18 2022 ||
[PB Info 2022-Sep-13 09:50:18] || Total Time: 12 minutes 27 seconds ||
[PB Info 2022-Sep-13 09:50:18] ---------------------------------------------------------------------------------
/tmp/7QB5DKRM_run.sh
Generating qualityscore pdf...
Generating insertsize pdf...
Generating meanqualitybycycle pdf...
Generating qualityscore pdf...
Generating gcbias pdf...
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
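Each stage in the log ends with a line that reports its elapsed time in seconds (GPU-BWA Mem time, Sorting and Marking, BQSR and writing final BAM). To see where the run spent its time, those lines can be pulled out and summed. The sketch below assumes the console output was saved with one message per line, for example by appending `2>&1 | tee fq2bam.log` to the docker command; the file name fq2bam.log and the function name are hypothetical.

```shell
# Sum the per-stage times reported in a saved fq2bam log (illustrative sketch).
# $1 is the path to the saved log; fq2bam.log is a hypothetical file name.
summarize_fq2bam_times() {
  awk '
    # Stage summary lines have the form "<label>: <number> seconds".
    /(GPU-BWA Mem time|Sorting and Marking|BQSR and writing final BAM): [0-9.]+ seconds/ {
      sub(/^\[[^]]*\] */, "")   # drop the "[PB Info <timestamp>]" prefix
      print                     # show the stage line itself
      total += $(NF - 1)        # the number just before "seconds"
    }
    END { printf "Total: %.1f s (%.1f min)\n", total, total / 60 }
  ' "$1"
}
```

Running `summarize_fq2bam_times fq2bam.log` prints the three stage lines followed by their total. For the run shown above, the stages add up to about 5,604 seconds (roughly 93 minutes), of which alignment accounts for 4,787 seconds.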


Now let’s look at the generated outputs:


ls -l fastq2bam*
-rw-r--r-- 1 root root 4819386784 Sep 6 12:43 fastq2bam.pb.bam
-rw-r--r-- 1 root root    6882792 Sep 6 12:43 fastq2bam.pb.bam.bai

ls qc-metrics/
alignment.txt                   insert_size.pdf            qualityscore.png
base_distribution_by_cycle.pdf  insert_size.png            qualityscore.txt
base_distribution_by_cycle.png  insert_size.txt            sequencingArtifact.bait_bias_detail_metrics.txt
base_distribution_by_cycle.txt  mean_quality_by_cycle.pdf  sequencingArtifact.bait_bias_summary_metrics.txt
gcbias.pdf                      mean_quality_by_cycle.png  sequencingArtifact.error_summary_metrics.txt
gcbias_0.png                    mean_quality_by_cycle.txt  sequencingArtifact.pre_adapter_detail_metrics.txt
gcbias_detail.txt               quality_yield.txt          sequencingArtifact.pre_adapter_summary_metrics.txt
gcbias_summary.txt              qualityscore.pdf
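Beyond listing the files by eye, a script can confirm that the BAM and its index were written and are not empty before the next step consumes them. The sketch below is one way to do that; the function name is hypothetical. The optional `samtools quickcheck` call assumes samtools is installed on the host, which this lab does not state, so it is skipped when the tool is absent.

```shell
# Check that fq2bam left a non-empty BAM and index in a directory (illustrative sketch).
# $1 is the output directory; it defaults to the current directory.
check_fq2bam_outputs() {
  local dir="${1:-.}" bam f status=0
  bam="$dir/fastq2bam.pb.bam"
  for f in "$bam" "$bam.bai"; do
    if [ ! -s "$f" ]; then
      echo "MISSING or empty: $f" >&2
      status=1
    fi
  done
  # Optional integrity check of the BAM header and end-of-file marker.
  # Assumption: samtools is available on the host; skipped otherwise.
  if [ "$status" -eq 0 ] && command -v samtools >/dev/null 2>&1; then
    samtools quickcheck "$bam" || status=1
  fi
  return "$status"
}
```

Calling `check_fq2bam_outputs .` from the directory used for the run returns a non-zero status if either file is missing, empty, or (when samtools is present) fails the quick integrity check.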



© Copyright 2022-2023, NVIDIA. Last updated on May 22, 2023.