FQ2BAM Tutorial

The fq2bam tool aligns, sorts (by coordinate), and marks duplicates in paired-end FASTQ data. The data files used in this example come from the sample data downloaded in the previous section.
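Because the two FASTQ files are treated as mates, read N in the first file is expected to pair with read N in the second. The Python sketch below is not part of Parabricks; it is a hypothetical pre-flight check (read_names and check_pairs are illustrative names) that walks both files in lockstep and confirms the read names agree. It assumes standard 4-line FASTQ records, with mates marked either by /1 and /2 suffixes or by a space-separated comment field.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:  # header line of each record
                name = line[1:].split()[0]  # drop '@' and any comment field
                # Older Illumina output marks mates with /1 and /2 suffixes.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def check_pairs(path_1, path_2):
    """Return the number of pairs; raise ValueError on the first mismatch.

    A file that runs out of records early shows up as a mismatch against None.
    """
    count = 0
    for name_1, name_2 in zip_longest(read_names(path_1), read_names(path_2)):
        if name_1 != name_2:
            raise ValueError(f"pair {count + 1}: {name_1!r} != {name_2!r}")
        count += 1
    return count
```

For the sample data you might call check_pairs("parabricks_sample/Data/sample_1.fq.gz", "parabricks_sample/Data/sample_2.fq.gz"). Decompressing and scanning every record in pure Python is slow for files this size, so a spot check of the first few thousand pairs may be enough in practice.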

If you run the following command on the Clara Parabricks sample data, you should see output similar to that shown here (timestamps and run times will vary with your hardware):

$ pbrun fq2bam \
         --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
         --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
         --out-bam output.bam
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation



[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mnt/dsk/parabricks_sample/Data/sample_1.fq.gz and
/mnt/dsk/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                           Version 3.6.1-1.beta6                          ||
||                       GPU-BWA mem, Sorting Phase-I                       ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs

GPU-BWA mem
ProgressMeter       Reads           Base Pairs Aligned
[19:02:36]  5043564         580000000
[19:03:04]  10087128        1160000000
[19:03:32]  15130692        1740000000
[19:04:01]  20174256        2320000000
[19:04:30]  25217820        2900000000
[19:04:59]  30261384        3480000000
[19:05:28]  35304948        4060000000
[19:05:56]  40348512        4640000000
[19:06:26]  45392076        5220000000
[19:06:55]  50435640        5800000000

GPU-BWA Mem time: 312.780832 seconds
GPU-BWA Mem is finished.

GPU Sorting, Marking Dups, BQSR
ProgressMeter       SAM Entries Completed

Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 312.781866 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /mnt/dsk/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/dsk/parabricks_sample/Data/sample_1.fq.gz /mnt/dsk/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 317.249 sec; CPU: 3288.001 sec
------------------------------------------------------------------------------
||        Program:                      GPU-BWA mem, Sorting Phase-I        ||
||        Version:                                     3.6.1-1.beta6        ||
||        Start Time:                       Tue Sep 21 19:01:58 2021        ||
||        End Time:                         Tue Sep 21 19:07:16 2021        ||
||        Total Time:                           5 minutes 18 seconds        ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                           Version 3.6.1-1.beta6                          ||
||                             Sorting Phase-II                             ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter - Percentage
[19:07:17]  0.0      0.00 GB
Sorting and Marking: 10.000 seconds
------------------------------------------------------------------------------
||        Program:                                  Sorting Phase-II        ||
||        Version:                                     3.6.1-1.beta6        ||
||        Start Time:                       Tue Sep 21 19:07:16 2021        ||
||        End Time:                         Tue Sep 21 19:07:27 2021        ||
||        Total Time:                                     11 seconds        ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                           Version 3.6.1-1.beta6                          ||
||                         Marking Duplicates, BQSR                         ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter -     Percentage
[19:07:38]  72.3     5.13 GB
[19:07:48]  100.0    0.00 GB
BQSR and writing final BAM:  20.034 seconds
------------------------------------------------------------------------------
||        Program:                          Marking Duplicates, BQSR        ||
||        Version:                                     3.6.1-1.beta6        ||
||        Start Time:                       Tue Sep 21 19:07:27 2021        ||
||        End Time:                         Tue Sep 21 19:07:48 2021        ||
||        Total Time:                                     21 seconds        ||
------------------------------------------------------------------------------

On an AWS g4dn.8xlarge instance (32 vCPUs, one T4 GPU, 128 GB memory), this takes approximately six minutes.
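The six-minute figure can be checked against the Start Time and End Time lines in the three phase banners of the log. The Python sketch below (PHASES, FMT, and seconds are illustrative names) copies those timestamps and computes each phase's duration and the overall wall-clock time.

```python
from datetime import datetime

# Start/End Time values copied from the three phase banners in the log above.
PHASES = {
    "GPU-BWA mem, Sorting Phase-I": ("Tue Sep 21 19:01:58 2021", "Tue Sep 21 19:07:16 2021"),
    "Sorting Phase-II": ("Tue Sep 21 19:07:16 2021", "Tue Sep 21 19:07:27 2021"),
    "Marking Duplicates, BQSR": ("Tue Sep 21 19:07:27 2021", "Tue Sep 21 19:07:48 2021"),
}
FMT = "%a %b %d %H:%M:%S %Y"  # matches timestamps such as "Tue Sep 21 19:01:58 2021"


def seconds(start, end):
    """Elapsed seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


for name, (start, end) in PHASES.items():
    print(f"{name}: {seconds(start, end):.0f} s")

# Wall-clock time from the first phase's start to the last phase's end.
total = seconds(PHASES["GPU-BWA mem, Sorting Phase-I"][0], PHASES["Marking Duplicates, BQSR"][1])
print(f"wall clock: {total:.0f} s ({total / 60:.1f} min)")
```

The phases take 318, 11, and 21 seconds, matching the Total Time lines in the banners, for a total of 350 seconds (about 5.8 minutes).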

This fq2bam command produces three output files:

$ ls -l
total 14330820
-rw-r--r-- 1 root   root   4728919831 Sep 21 19:07 output.bam
-rw-r--r-- 1 root   root      6882792 Sep 21 19:07 output.bam.bai
-rw-r--r-- 1 root   root        87690 Sep 21 19:07 output_chrs.txt
(input files not shown)
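To confirm that a run produced all three outputs, you could use a small script like the following. It is a sketch under an assumption: the naming pattern (output.bam.bai for the index and output_chrs.txt next to the BAM) is inferred from the listing above rather than from the Parabricks documentation, and expected_outputs, human_size, and report are hypothetical helpers.

```python
import os


def expected_outputs(out_bam):
    """Files this tutorial's run produced for a given --out-bam path.

    Assumption: pattern inferred from the ls listing (output.bam ->
    output.bam.bai and output_chrs.txt), not a documented rule.
    """
    stem, _ = os.path.splitext(out_bam)
    return [out_bam, out_bam + ".bai", stem + "_chrs.txt"]


def human_size(n_bytes):
    """Format a byte count with binary (1024-based) units, capped at GiB."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def report(out_bam):
    """Print the size of each expected output and return any missing paths."""
    missing = []
    for path in expected_outputs(out_bam):
        if os.path.exists(path):
            print(f"{path}: {human_size(os.path.getsize(path))}")
        else:
            missing.append(path)
    return missing
```

With the files listed above, report("output.bam") would print sizes of about 4.4 GiB, 6.6 MiB, and 85.6 KiB and return an empty list.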

The first few records of output.bam, as displayed by samtools view output.bam, are as follows:

HWI-D00127:570:HK3TJBCX2:1:1202:9643:76055  99      chr1    10027   26      24M5I86M        =       10178   231     ACCCTAACCCTAACCCTAACCCGACCCCGACCCCGACCCAAACCCAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCCTAACC     DDDDDHGHIIIIIHIIHHIHHHIHIIIIIIHDHHIHHHIHIHIIIIFHIEHHIIHHIIIIEHIIIIHHIHIIICHE@1FHH?1GEFE1111D11<FH11<FD11<<FFE111<11     MD:Z:22T5T0A4T5T41A27   PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1     NM:i:11 AS:i:69 XS:i:72
HWI-D00127:570:HK3TJBCX2:1:2207:14059:11605 163     chr1    10114   18      31M1I15M4D14M3D15M4D35M4S       =       10258   219     TAACCCTAACCCTAACCCTAACCCTAACCCTAAACCCCTAACCCTAAAACCCCAACCCTAAAACCCTAACCCTAACACCCAAACCCAAACCCCAACCCTAACCCTAACCCCACCA     DDDADIHHIIIIHHIHHHIHIF?EFHHEHIHEG?1<0<1D1D@11<11<10000000011<1<C0E1<111C1<11<<00111<<00111<<<00<00111<<@1<D########     MD:Z:46^CCCT5T8^CCT15^CCTA4T5T5T18      PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:16 AS:i:54 XS:i:52
HWI-D00127:570:HK3TJBCX2:1:1208:11381:9809  97      chr1    10160   0       35M1D38M42S     chr3    10596   0       CCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCTAACCCTAACCCTAACACTAAACCCAACCCTAACCCTCCCTCACACCCACCGCCCCATCACGTCTCCCCACACGCCCC     DBAA@GCGFHEHHHI1FHHGFHH@@CHCHHFH1<?FH1<G@HHH1FC@H1C<<111<11<100100<1111<<110<1<1<1100<D############################     XA:Z:chr18,+10062,74M41S,5;chr1,+10141,6M1D12M1D56M41S,5;chr3_KI270784v1_alt,+62121,35M1D9M1D22M49S,4;chr3,-198173372,49S20M1D9M1D37M,4;chr20,-64287255,66S12M1D37M,2;  MD:Z:35^C18C4C2T11      PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:4  AS:i:51 XS:i:49
HWI-D00127:570:HK3TJBCX2:1:1202:9643:76055  147     chr1    10178   26      34S11M1I69M     =       10027   -231    CCCTCCACCCCCCCCCCCACCCTCACCCCCACCCCCTAACCCTAACCCCCAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCCTAAACCTAACCCTAAACCCTAAA     ##############################################E</<11@GD11@ECD11GG?HCD<HHHCF<HHHCIHHGHHHIHIIHHGCHHFEF<IHHHECFHIDBDD@     MD:Z:14T38C6C19 PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:4  AS:i:58 XS:i:56
HWI-D00127:570:HK3TJBCX2:1:2105:11353:65509 83      chr1    10217   13      18M1D12M1D85M   =       10252   -82     AACCCCAACCCCAACCCCAACCCTTACCCTAACCCTAAACCCCAACCCTAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCC     ################C/C<<<111HFGD<<<1HGCCC<E?C/<<EHE?<<E?E<C/HEC/1FHHEFHCHECCEEECCC<<IHC?C<IHCCC<IHIHHEHFHHECIHIHEBDBBD     XA:Z:chr1,+248946154,76M1I20M18S,6;chr18,-10310,12S45M1I57M,9;chr17,+83247311,78M37S,5;chr4,+190122704,76M1I8M1I29M,12; MD:Z:5T5T6^T6A5^A12T11T5T47T6   PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:9  AS:i:66 XS:i:68
HWI-D00127:570:HK3TJBCX2:1:2105:11353:65509 163     chr1    10252   13      76M39S  =       10217   82      CCTAAACCCCAACCCTAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCCACCCTCACCCACAACCTAAACCCAAACCCAAACCCAACCCCAAA     DDDDCIIIIIIIIIIHEHHIIIIGC0GHHI0EE1<<<<<00<00<<<E0D0D<G00<0000<<00<00D0/</D01<F11C10<1<011111<00<1<<0<D1<<00<0<C<000     XA:Z:chr17,-83247317,39S70M1I5M,5;chr1,-248946166,45S64M1I5M,4;chr1,-248946154,39S70M1I5M,6;chr4,-190122787,33S42M1I35M4S,6;    MD:Z:9T11T5T42A5        PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:4  AS:i:56 XS:i:52
HWI-D00127:570:HK3TJBCX2:1:2207:14059:11605 83      chr1    10258   18      27S30M1D44M14S  =       10114   -219    AACCCCACCCCTACCCCCTACCCCTACCCCTACCCCTAACCCTAACCCTACCCCTAACCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCGAACCCGAACCCGAACCCGA     ##D/D<0/?<<0<//D<<00/E<00</HD<00/CD<1<1G?<<<1HHC<</D<<1<0HDC?FHGHHC<IIHEC/HDIHF<IIHHC<HHHECDIIHHCFIIHIIHIHHHEHDDD?D     MD:Z:5A17A6^C38T5       PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1     NM:i:4  AS:i:52 XS:i:61
HWI-D00127:570:HK3TJBCX2:1:1212:19530:27014 163     chr1    11071   0       115M    =       11218   262     GGGGAGGGTGGCGCCGTGCACGCGCAGAAACTCACGTCACGGTGGCGCGGCGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCCG     DADDA?HIIICEHIIIIIIGIIHIIH<HFHHIIHIHIHIHIIHIHIIHIIHHIIGHIIHHII<FHHIIIHIIFHHHHIIIIGIHIGIHGIIIIIHHIIIIIEHIIIIHDEGEHII     XA:Z:chr15,-101979782,115M,0;chr16,+10751,16M1D99M,4;   MD:Z:115        PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:0  AS:i:115        XS:i:115
HWI-D00127:570:HK3TJBCX2:1:1212:19530:27014 83      chr1    11218   0       115M    =       11071   -262    TGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGCAGGGCTCTCTTGCTTAGAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTAT     IIIIIIIIHIIIHCIIIHIIIIIIGIHIHHGIIIFIFIIIIIIIIIIIHIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIHIIIIHIIIIIIIIIIGIIIIIIIIIIHHDDDDD     MD:Z:115        PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:0  AS:i:115        XS:i:115
HWI-D00127:570:HK3TJBCX2:1:1113:5411:67415  99      chr1    11953   0       115M    =       12074   236     TTTCCACCGGGCCTTTGGGAGGTCACAGGGTCTTGATGCTGTGGTCTTGATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGGAAGCTGAGCACTGGAGTG     D@DDDHIEHCEC/EHHECEHIHGHHHFIHFH@GHHEHIIIICHHHHHIIEHHIIF?GHHHHHHHIHIIGHEHHCG@@CCGEHEHHIIHGCHHHICEH?EHHFHFHHHIIIH1CD@     MD:Z:17A30C47C18        PG:Z:MarkDuplicates     RG:Z:HK3TJBCX2.1        NM:i:3  AS:i:100        XS:i:105
....
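Columns 2, 4, 6, 8, and 9 of each record are FLAG, POS, CIGAR, PNEXT (mate position), and TLEN. The Python sketch below decodes FLAG bits using the meanings defined in the SAM specification and computes how many reference bases a CIGAR string covers. The tlen helper is an assumption: it measures between the 5' ends of the two mates (the leftmost position of a forward read, the rightmost of a reverse read), which reproduces the TLEN values in the records above but is not taken from Parabricks documentation. It applies only to mates on the same chromosome; the record with FLAG 97 has its mate on chr3 and a TLEN of 0.

```python
import re

# Bit meanings from the SAM format specification (FLAG field, column 2).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def decode_flag(flag):
    """List the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def ref_length(cigar):
    """Number of reference bases covered by an alignment's CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def five_prime_pos(pos, cigar, reverse):
    """1-based reference position of a read's 5' end."""
    return pos + ref_length(cigar) - 1 if reverse else pos


def tlen(pos, cigar, flag, mate_pos, mate_cigar, mate_flag):
    """Signed template length between the mates' 5' ends (assumed convention)."""
    p0 = five_prime_pos(pos, cigar, bool(flag & 0x10))
    p1 = five_prime_pos(mate_pos, mate_cigar, bool(mate_flag & 0x10))
    d = p1 - p0
    # Count both end positions inclusively; the sign says which mate is downstream.
    return d + (1 if d > 0 else -1 if d < 0 else 0)
```

For example, decode_flag(99) returns ["paired", "proper pair", "mate reverse strand", "first in pair"], and tlen(10027, "24M5I86M", 99, 10178, "34S11M1I69M", 147) returns 231, matching the first record. None of the FLAG values shown (99, 163, 97, 147, 83) has the 0x400 duplicate bit set, so none of these records was marked as a duplicate.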

Note

If you run fq2bam on a system with less than the recommended amount of memory, a warning like the following appears after the initial header:

WARNING
The system has 62 GB, however recommended RAM with 1 GPU is 64 GB.
The run might not finish or might have less than expected performance.
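To check in advance whether a Linux machine meets the 64 GB recommendation for one GPU quoted in the warning, you could read MemTotal from /proc/meminfo. The following Python sketch is hypothetical (mem_total_gb and check_memory are illustrative names). It converts the kB value to GiB; how Parabricks itself measures memory is not stated here, so its rounding may differ slightly from the figure in the warning. The threshold covers only the single-GPU case shown, and the sketch does not model multi-GPU requirements.

```python
RECOMMENDED_GB_ONE_GPU = 64  # from the warning above (1 GPU)


def mem_total_gb(meminfo_text):
    """Parse MemTotal from /proc/meminfo content; its 'kB' unit is 1024 bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def check_memory(meminfo_text, recommended_gb=RECOMMENDED_GB_ONE_GPU):
    """Return (meets_recommendation, total_gib)."""
    total = mem_total_gb(meminfo_text)
    return total >= recommended_gb, total
```

On Linux you could run it as: with open("/proc/meminfo") as f: print(check_memory(f.read())). A machine reporting MemTotal of 65301108 kB comes out at about 62 GiB and fails the check.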