FQ2BAM Tutorial
The fq2bam tool aligns reads, sorts them by coordinate, and marks duplicates in paired-end FASTQ data. The data files used in this example come from the sample data downloaded in the previous section.
If you run the following command on the Clara Parabricks sample data, you should get the same results as shown here:
$ pbrun fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--out-bam output.bam
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mnt/dsk/parabricks_sample/Data/sample_1.fq.gz and
/mnt/dsk/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version 3.6.1-1.beta6 ||
|| GPU-BWA mem, Sorting Phase-I ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
GPU-BWA mem
ProgressMeter Reads Base Pairs Aligned
[19:02:36] 5043564 580000000
[19:03:04] 10087128 1160000000
[19:03:32] 15130692 1740000000
[19:04:01] 20174256 2320000000
[19:04:30] 25217820 2900000000
[19:04:59] 30261384 3480000000
[19:05:28] 35304948 4060000000
[19:05:56] 40348512 4640000000
[19:06:26] 45392076 5220000000
[19:06:55] 50435640 5800000000
GPU-BWA Mem time: 312.780832 seconds
GPU-BWA Mem is finished.
GPU Sorting, Marking Dups, BQSR
ProgressMeter SAM Entries Completed
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 312.781866 seconds
[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /mnt/dsk/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/dsk/parabricks_sample/Data/sample_1.fq.gz /mnt/dsk/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 317.249 sec; CPU: 3288.001 sec
------------------------------------------------------------------------------
|| Program: GPU-BWA mem, Sorting Phase-I ||
|| Version: 3.6.1-1.beta6 ||
|| Start Time: Tue Sep 21 19:01:58 2021 ||
|| End Time: Tue Sep 21 19:07:16 2021 ||
|| Total Time: 5 minutes 18 seconds ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version 3.6.1-1.beta6 ||
|| Sorting Phase-II ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[19:07:17] 0.0 0.00 GB
Sorting and Marking: 10.000 seconds
------------------------------------------------------------------------------
|| Program: Sorting Phase-II ||
|| Version: 3.6.1-1.beta6 ||
|| Start Time: Tue Sep 21 19:07:16 2021 ||
|| End Time: Tue Sep 21 19:07:27 2021 ||
|| Total Time: 11 seconds ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version 3.6.1-1.beta6 ||
|| Marking Duplicates, BQSR ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[19:07:38] 72.3 5.13 GB
[19:07:48] 100.0 0.00 GB
BQSR and writing final BAM: 20.034 seconds
------------------------------------------------------------------------------
|| Program: Marking Duplicates, BQSR ||
|| Version: 3.6.1-1.beta6 ||
|| Start Time: Tue Sep 21 19:07:27 2021 ||
|| End Time: Tue Sep 21 19:07:48 2021 ||
|| Total Time: 21 seconds ||
------------------------------------------------------------------------------
On an AWS g4dn.8xlarge instance (32 vCPUs, one T4 GPU, 128 GB memory), the run takes approximately six minutes.
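The six-minute figure can be checked against the log above by adding the three phase totals. The short Python sketch below does that arithmetic and also derives an approximate alignment throughput from the last ProgressMeter line. The variable names are illustrative and not part of Parabricks.

```python
# Phase totals copied from the "Total Time" lines of the log above.
phase_seconds = {
    "GPU-BWA mem, Sorting Phase-I": 5 * 60 + 18,
    "Sorting Phase-II": 11,
    "Marking Duplicates, BQSR": 21,
}

total_seconds = sum(phase_seconds.values())
total_minutes = total_seconds / 60

# Last ProgressMeter line and the reported "GPU-BWA Mem time".
reads_aligned = 50_435_640
bwa_seconds = 312.780832
reads_per_second = reads_aligned / bwa_seconds

print(f"Total: {total_seconds} s ({total_minutes:.1f} min)")
print(f"Alignment throughput: about {reads_per_second:,.0f} reads/s")
```

The sum of 350 seconds also matches the wall-clock span between the first start time (19:01:58) and the last end time (19:07:48) in the log.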
This fq2bam command produces three output files:
$ ls -l
total 14330820
-rw-r--r-- 1 root root 4728919831 Sep 21 19:07 output.bam
-rw-r--r-- 1 root root 6882792 Sep 21 19:07 output.bam.bai
-rw-r--r-- 1 root root 87690 Sep 21 19:07 output_chrs.txt
(input files not shown)
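A quick way to confirm that a run completed is to check for these three files. The sketch below derives the companion file names from the --out-bam value, assuming the naming pattern visible in the listing above (the BAM name plus .bai, and the BAM stem plus _chrs.txt). The helper names are hypothetical, and other Parabricks versions or options may produce different files.

```python
from pathlib import Path


def expected_outputs(out_bam: str) -> list[Path]:
    """Return the files the listing above shows for a given --out-bam path.

    Assumption: the index is '<out_bam>.bai' and the third file is
    '<stem>_chrs.txt', as seen for output.bam in this tutorial.
    """
    bam = Path(out_bam)
    return [
        bam,
        bam.with_name(bam.name + ".bai"),
        bam.with_name(bam.stem + "_chrs.txt"),
    ]


def missing_outputs(out_bam: str) -> list[Path]:
    """Return the expected output files that do not exist on disk."""
    return [p for p in expected_outputs(out_bam) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_outputs("output.bam")
    print("all outputs present" if not missing else f"missing: {missing}")
```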
The first few lines of output.bam (as viewed with the samtools view output.bam command) are as follows:
HWI-D00127:570:HK3TJBCX2:1:1202:9643:76055 99 chr1 10027 26 24M5I86M = 10178 231 ACCCTAACCCTAACCCTAACCCGACCCCGACCCCGACCCAAACCCAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACC DDDDDHGHIIIIIHIIHHIHHHIHIIIIIIHDHHIHHHIHIHIIIIFHIEHHIIHHIIIIEHIIIIHHIHIIICHE@1FHH?1GEFE1111D11<FH11<FD11<<FFE111<11 MD:Z:22T5T0A4T5T41A27 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:11 AS:i:69 XS:i:72
HWI-D00127:570:HK3TJBCX2:1:2207:14059:11605 163 chr1 10114 18 31M1I15M4D14M3D15M4D35M4S = 10258 219 TAACCCTAACCCTAACCCTAACCCTAACCCTAAACCCCTAACCCTAAAACCCCAACCCTAAAACCCTAACCCTAACACCCAAACCCAAACCCCAACCCTAACCCTAACCCCACCA DDDADIHHIIIIHHIHHHIHIF?EFHHEHIHEG?1<0<1D1D@11<11<10000000011<1<C0E1<111C1<11<<00111<<00111<<<00<00111<<@1<D######## MD:Z:46^CCCT5T8^CCT15^CCTA4T5T5T18 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:16 AS:i:54 XS:i:52
HWI-D00127:570:HK3TJBCX2:1:1208:11381:9809 97 chr1 10160 0 35M1D38M42S chr3 10596 0 CCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCTAACCCTAACCCTAACACTAAACCCAACCCTAACCCTCCCTCACACCCACCGCCCCATCACGTCTCCCCACACGCCCC DBAA@GCGFHEHHHI1FHHGFHH@@CHCHHFH1<?FH1<G@HHH1FC@H1C<<111<11<100100<1111<<110<1<1<1100<D############################ XA:Z:chr18,+10062,74M41S,5;chr1,+10141,6M1D12M1D56M41S,5;chr3_KI270784v1_alt,+62121,35M1D9M1D22M49S,4;chr3,-198173372,49S20M1D9M1D37M,4;chr20,-64287255,66S12M1D37M,2; MD:Z:35^C18C4C2T11 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:4 AS:i:51 XS:i:49
HWI-D00127:570:HK3TJBCX2:1:1202:9643:76055 147 chr1 10178 26 34S11M1I69M = 10027 -231 CCCTCCACCCCCCCCCCCACCCTCACCCCCACCCCCTAACCCTAACCCCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCCTAAACCTAACCCTAAACCCTAAA ##############################################E</<11@GD11@ECD11GG?HCD<HHHCF<HHHCIHHGHHHIHIIHHGCHHFEF<IHHHECFHIDBDD@ MD:Z:14T38C6C19 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:4 AS:i:58 XS:i:56
HWI-D00127:570:HK3TJBCX2:1:2105:11353:65509 83 chr1 10217 13 18M1D12M1D85M = 10252 -82 AACCCCAACCCCAACCCCAACCCTTACCCTAACCCTAAACCCCAACCCTAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCC ################C/C<<<111HFGD<<<1HGCCC<E?C/<<EHE?<<E?E<C/HEC/1FHHEFHCHECCEEECCC<<IHC?C<IHCCC<IHIHHEHFHHECIHIHEBDBBD XA:Z:chr1,+248946154,76M1I20M18S,6;chr18,-10310,12S45M1I57M,9;chr17,+83247311,78M37S,5;chr4,+190122704,76M1I8M1I29M,12; MD:Z:5T5T6^T6A5^A12T11T5T47T6 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:9 AS:i:66 XS:i:68
HWI-D00127:570:HK3TJBCX2:1:2105:11353:65509 163 chr1 10252 13 76M39S = 10217 82 CCTAAACCCCAACCCTAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCCACCCTCACCCACAACCTAAACCCAAACCCAAACCCAACCCCAAA DDDDCIIIIIIIIIIHEHHIIIIGC0GHHI0EE1<<<<<00<00<<<E0D0D<G00<0000<<00<00D0/</D01<F11C10<1<011111<00<1<<0<D1<<00<0<C<000 XA:Z:chr17,-83247317,39S70M1I5M,5;chr1,-248946166,45S64M1I5M,4;chr1,-248946154,39S70M1I5M,6;chr4,-190122787,33S42M1I35M4S,6; MD:Z:9T11T5T42A5 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:4 AS:i:56 XS:i:52
HWI-D00127:570:HK3TJBCX2:1:2207:14059:11605 83 chr1 10258 18 27S30M1D44M14S = 10114 -219 AACCCCACCCCTACCCCCTACCCCTACCCCTACCCCTAACCCTAACCCTACCCCTAACCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCGAACCCGAACCCGAACCCGA ##D/D<0/?<<0<//D<<00/E<00</HD<00/CD<1<1G?<<<1HHC<</D<<1<0HDC?FHGHHC<IIHEC/HDIHF<IIHHC<HHHECDIIHHCFIIHIIHIHHHEHDDD?D MD:Z:5A17A6^C38T5 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:4 AS:i:52 XS:i:61
HWI-D00127:570:HK3TJBCX2:1:1212:19530:27014 163 chr1 11071 0 115M = 11218 262 GGGGAGGGTGGCGCCGTGCACGCGCAGAAACTCACGTCACGGTGGCGCGGCGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCCG DADDA?HIIICEHIIIIIIGIIHIIH<HFHHIIHIHIHIHIIHIHIIHIIHHIIGHIIHHII<FHHIIIHIIFHHHHIIIIGIHIGIHGIIIIIHHIIIIIEHIIIIHDEGEHII XA:Z:chr15,-101979782,115M,0;chr16,+10751,16M1D99M,4; MD:Z:115 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:0 AS:i:115 XS:i:115
HWI-D00127:570:HK3TJBCX2:1:1212:19530:27014 83 chr1 11218 0 115M = 11071 -262 TGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGCAGGGCTCTCTTGCTTAGAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTAT IIIIIIIIHIIIHCIIIHIIIIIIGIHIHHGIIIFIFIIIIIIIIIIIHIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIHIIIIHIIIIIIIIIIGIIIIIIIIIIHHDDDDD MD:Z:115 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:0 AS:i:115 XS:i:115
HWI-D00127:570:HK3TJBCX2:1:1113:5411:67415 99 chr1 11953 0 115M = 12074 236 TTTCCACCGGGCCTTTGGGAGGTCACAGGGTCTTGATGCTGTGGTCTTGATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGGAAGCTGAGCACTGGAGTG D@DDDHIEHCEC/EHHECEHIHGHHHFIHFH@GHHEHIIIICHHHHHIIEHHIIF?GHHHHHHHIHIIGHEHHCG@@CCGEHEHHIIHGCHHHICEH?EHHFHFHHHIIIH1CD@ MD:Z:17A30C47C18 PG:Z:MarkDuplicates RG:Z:HK3TJBCX2.1 NM:i:3 AS:i:100 XS:i:105
....
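The second column of each record is the SAM FLAG, a bit field defined by the SAM specification. The sketch below decodes it in plain Python so the records above are easier to read; the function and dictionary names are illustrative. For example, 99 in the first record means a properly paired, first-in-pair read whose mate maps to the reverse strand. A record marked as a duplicate would have the 0x400 bit set, which none of the records shown above does.

```python
# Bit meanings from the SAM specification (FLAG field).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_duplicate(flag: int) -> bool:
    """True if the record is marked as a PCR or optical duplicate."""
    return bool(flag & 0x400)


# FLAG values that appear in the records above.
for flag in (99, 163, 97, 147, 83):
    status = "duplicate" if is_duplicate(flag) else "not duplicate"
    print(flag, decode_flag(flag), status)
```

The record with FLAG 97 lacks the proper_pair bit, which is consistent with its mate being reported on chr3 rather than chr1.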
If the fq2bam command is run on a system with less memory than recommended, you will see this message after the initial header:
WARNING
The system has 62 GB, however recommended RAM with 1 GPU is 64 GB.
The run might not finish or might have less than expected performance.
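To catch this before launching a long run, you can compare the machine's physical memory with the recommendation quoted in the warning. The sketch below is a minimal check for Linux. The 64 GB figure for one GPU is taken from the message above; the function names are hypothetical, the recommendation for other GPU counts is not stated here (pass it explicitly), and how Parabricks itself measures memory is an assumption.

```python
import os


def total_ram_gb() -> float:
    """Physical memory in GB (assuming 1 GB = 1024**3 bytes), via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def meets_recommendation(ram_gb: float, recommended_gb: float = 64.0) -> bool:
    """True if ram_gb is at least the recommended amount.

    The default of 64 GB is the value the warning quotes for 1 GPU.
    """
    return ram_gb >= recommended_gb


if __name__ == "__main__":
    ram = total_ram_gb()
    if meets_recommendation(ram):
        print(f"{ram:.0f} GB RAM: meets the 64 GB recommendation for 1 GPU")
    else:
        print(f"{ram:.0f} GB RAM: below the 64 GB recommendation for 1 GPU")
```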