Step #2: Running fq2bam

Accelerated Exome Analysis with Clara Parabricks (Latest Version)

The following examples require the system console of the GPU host. Click the “System Console” link in the left menu of this page to open a web-based SSH session.

parabricks-05.png

The fq2bam command runs read alignment, sorting, duplicate marking, and base quality score recalibration (BQSR) according to GATK best practices, but much faster than community tools because it runs these steps on GPUs.


docker run --gpus all --rm \
    -v $(pwd):/results \
    -v $(pwd):/data \
    -w /data nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun fq2bam \
    --in-fq $FASTQ1 $FASTQ2 \
    --ref ${REFERENCE_FILE} \
    --out-bam /results/fastq2bam.pb.bam \
    --out-qc-metrics-dir qc-metrics
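The command expands three shell variables (FASTQ1, FASTQ2, and REFERENCE_FILE), and an empty one would silently change the arguments that pbrun receives. The following is a minimal sketch of a pre-flight check, assuming those variables were exported earlier in the lab. The function names `require_set` and `check_fq2bam_inputs` are hypothetical helpers and not part of Parabricks.

```bash
# Hypothetical helper: report an error if a variable's value is empty.
# $1 = variable name (used only in the message), $2 = its current value
require_set() {
    if [ -z "$2" ]; then
        echo "error: $1 is not set" >&2
        return 1
    fi
}

# Hypothetical helper: check every variable the docker command expands.
# Returns 0 only when all three are non-empty.
check_fq2bam_inputs() {
    local status=0
    require_set FASTQ1 "${FASTQ1:-}" || status=1
    require_set FASTQ2 "${FASTQ2:-}" || status=1
    require_set REFERENCE_FILE "${REFERENCE_FILE:-}" || status=1
    return "$status"
}
```

Running `check_fq2bam_inputs` before the docker command lists every variable that still needs a value. The check only tests for non-empty values; it does not verify that the files exist, because the paths may be meant to resolve inside the container.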


The output should look like:


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /data/HG002/MPHG002_S1_L001_R1_001.fastq.gz and /data/HG002/MPHG002_S1_L001_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:C6UP4ANXX.1\tLB:lib1\tPL:bar\tSM:sample\tPU:C6UP4ANXX.1
[PB Info 2022-Sep-13 08:16:23] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 08:16:23] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 08:16:23] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 08:16:23] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-13 08:16:23] ---------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Warning 2022-Sep-13 08:16:51][ParaBricks/src/pbOpts.cu:316] WARNING The system has 186 GB, however recommended RAM with 4 GPU is 196 GB. The run might not finish or might have less than expected performance.
[PB Info 2022-Sep-13 08:16:51] GPU-BWA mem
[PB Info 2022-Sep-13 08:16:51] ProgressMeter Reads Base Pairs Aligned
[PB Info 2022-Sep-13 08:17:41] 5040000 620000000
[PB Info 2022-Sep-13 08:18:34] 10080000 1260000000
[PB Info 2022-Sep-13 08:19:28] 15120000 1870000000
…
[PB Info 2022-Sep-13 09:35:53] 428400000 53550000000
[PB Info 2022-Sep-13 09:36:38] GPU-BWA Mem time: 4787.160169 seconds
[PB Info 2022-Sep-13 09:36:38] GPU-BWA Mem is finished.
[main] CMD: /usr/local/parabricks/binaries//bin/bwa mem -Z ./pbOpts.txt /data/Test/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna /data/HG002/MPHG002_S1_L001_R1_001.fastq.gz /data/HG002/MPHG002_S1_L001_R2_001.fastq.gz @RG\tID:C6UP4ANXX.1\tLB:lib1\tPL:bar\tSM:sample\tPU:C6UP4ANXX.1
[main] Real time: 4815.253 sec; CPU: 204261.998 sec
[PB Info 2022-Sep-13 09:36:38] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:38] || Program: GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Sep-13 09:36:38] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:36:38] || Start Time: Tue Sep 13 08:16:23 2022 ||
[PB Info 2022-Sep-13 09:36:38] || End Time: Tue Sep 13 09:36:38 2022 ||
[PB Info 2022-Sep-13 09:36:38] || Total Time: 80 minutes 15 seconds ||
[PB Info 2022-Sep-13 09:36:38] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 09:36:41] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 09:36:41] || Sorting Phase-II ||
[PB Info 2022-Sep-13 09:36:41] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:36:41] progressMeter - Percentage
[PB Info 2022-Sep-13 09:36:41] 0.0 0.00 GB
[PB Info 2022-Sep-13 09:36:51] 16.2 1.00 GB
[PB Info 2022-Sep-13 09:37:01] 32.4 1.00 GB
[PB Info 2022-Sep-13 09:37:11] 47.1 1.00 GB
[PB Info 2022-Sep-13 09:37:21] 61.2 1.00 GB
[PB Info 2022-Sep-13 09:37:31] 76.2 1.00 GB
[PB Info 2022-Sep-13 09:37:41] 90.4 1.00 GB
[PB Info 2022-Sep-13 09:37:51] Sorting and Marking: 70.002 seconds
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] || Program: Sorting Phase-II ||
[PB Info 2022-Sep-13 09:37:51] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:37:51] || Start Time: Tue Sep 13 09:36:41 2022 ||
[PB Info 2022-Sep-13 09:37:51] || End Time: Tue Sep 13 09:37:51 2022 ||
[PB Info 2022-Sep-13 09:37:51] || Total Time: 1 minute 10 seconds ||
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Sep-13 09:37:51] || Version 4.0.0-1 ||
[PB Info 2022-Sep-13 09:37:51] || Marking Duplicates, BQSR ||
[PB Info 2022-Sep-13 09:37:51] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:37:51] progressMeter - Percentage
[PB Info 2022-Sep-13 09:38:01] 0.0 8.50 GB
[PB Info 2022-Sep-13 09:38:11] 0.0 16.68 GB
…
[PB Info 2022-Sep-13 09:50:11] 100.0 0.00 GB
[PB Info 2022-Sep-13 09:50:18] BQSR and writing final BAM: 746.711 seconds
[PB Info 2022-Sep-13 09:50:18] ---------------------------------------------------------------------------------
[PB Info 2022-Sep-13 09:50:18] || Program: Marking Duplicates, BQSR ||
[PB Info 2022-Sep-13 09:50:18] || Version: 4.0.0-1 ||
[PB Info 2022-Sep-13 09:50:18] || Start Time: Tue Sep 13 09:37:51 2022 ||
[PB Info 2022-Sep-13 09:50:18] || End Time: Tue Sep 13 09:50:18 2022 ||
[PB Info 2022-Sep-13 09:50:18] || Total Time: 12 minutes 27 seconds ||
[PB Info 2022-Sep-13 09:50:18] ---------------------------------------------------------------------------------
/tmp/7QB5DKRM_run.sh
Generating qualityscore pdf...
Generating insertsize pdf...
Generating meanqualitybycycle pdf...
Generating qualityscore pdf...
Generating gcbias pdf...
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
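Each phase in the log ends with a "Total Time" line, so the overall run time is the sum of those lines. The sketch below adds them up, assuming the console output was saved to a file (the name `fq2bam.log` is hypothetical) and that durations appear only in minutes and seconds, as they do in the sample above. The function name `sum_phase_seconds` is also an illustrative choice.

```bash
# Hypothetical helper: read a saved fq2bam log on stdin and print the
# sum of all "Total Time" entries, in seconds.
sum_phase_seconds() {
    awk '
        /Total Time:/ {
            # The number comes right before the unit word,
            # e.g. "80 minutes 15 seconds" or "1 minute 10 seconds".
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^minutes?$/) total += $(i - 1) * 60
                if ($i ~ /^seconds?$/) total += $(i - 1)
            }
        }
        END { print total + 0 }
    '
}

# Example invocation (assumes the log was saved beforehand):
#   sum_phase_seconds < fq2bam.log
```

For the log shown above, the three phases (80 minutes 15 seconds, 1 minute 10 seconds, and 12 minutes 27 seconds) add up to 5632 seconds, or roughly 94 minutes, with read alignment accounting for most of the time.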


Now let’s look at the generated outputs:


ls -l fastq2bam*
-rw-r--r-- 1 root root 4819386784 Sep 6 12:43 fastq2bam.pb.bam
-rw-r--r-- 1 root root 6882792 Sep 6 12:43 fastq2bam.pb.bam.bai

ls qc-metrics/
alignment.txt                   insert_size.pdf            qualityscore.png
base_distribution_by_cycle.pdf  insert_size.png            qualityscore.txt
base_distribution_by_cycle.png  insert_size.txt            sequencingArtifact.bait_bias_detail_metrics.txt
base_distribution_by_cycle.txt  mean_quality_by_cycle.pdf  sequencingArtifact.bait_bias_summary_metrics.txt
gcbias.pdf                      mean_quality_by_cycle.png  sequencingArtifact.error_summary_metrics.txt
gcbias_0.png                    mean_quality_by_cycle.txt  sequencingArtifact.pre_adapter_detail_metrics.txt
gcbias_detail.txt               quality_yield.txt          sequencingArtifact.pre_adapter_summary_metrics.txt
gcbias_summary.txt              qualityscore.pdf
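In a scripted workflow, the same inspection can be automated. This is a minimal sketch that checks for a non-empty BAM, a non-empty index, and a populated qc-metrics directory, using the file names from the command above. The function name `check_fq2bam_outputs` is hypothetical, and the check confirms only that the files exist, not that their contents are valid.

```bash
# Hypothetical helper: confirm the fq2bam outputs exist in a directory.
# $1 = directory holding the results (defaults to the current directory)
check_fq2bam_outputs() {
    local dir="${1:-.}"
    local status=0
    local f

    # -s is true only for files that exist and are larger than zero bytes.
    for f in "$dir/fastq2bam.pb.bam" "$dir/fastq2bam.pb.bam.bai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done

    # The QC directory should exist and contain at least one entry.
    if [ ! -d "$dir/qc-metrics" ] || [ -z "$(ls -A "$dir/qc-metrics")" ]; then
        echo "missing or empty: $dir/qc-metrics" >&2
        status=1
    fi

    return "$status"
}
```

Calling `check_fq2bam_outputs` from the directory where the docker command ran returns 0 when all three outputs are present and prints the missing items otherwise.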


parabricks-03.png

parabricks-04.png

© Copyright 2022-2023, NVIDIA. Last updated on May 22, 2023.