HaplotypeCaller Tutorial

This tutorial shows how to run HaplotypeCaller, the gold-standard GATK variant caller. It takes the aligned BAM produced in the FQ2BAM Tutorial, assembles plausible haplotypes in active regions, and uses Bayes’ Rule to compute genotype likelihoods. The result is a Variant Call Format (VCF) file containing the variant calls across the genome, including each variant's position and allelic information.

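To do this, run the following command from the directory that contains fq2bam_output.bam and the parabricks_sample directory (the command mounts the current directory into the container):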

$ docker run \
      --gpus all \
      --rm \
      --volume $(pwd):/workdir \
      --volume $(pwd):/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun haplotypecaller \
      --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
      --in-bam /workdir/fq2bam_output.bam \
      --out-variants /outputdir/variants.vcf

[PB Info 2022-Sep-02 20:08:13] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 20:08:13] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2022-Sep-02 20:08:13] ||                           Version 4.0.0-1                                ||
[PB Info 2022-Sep-02 20:08:13] ||                         GPU-GATK4 HaplotypeCaller                        ||
[PB Info 2022-Sep-02 20:08:13] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 20:08:48] 0 /outputdir/fq2bam_output.bam/outputdir/variants.vcf
[PB Info 2022-Sep-02 20:08:48] ProgressMeter -  Current-Locus   Elapsed-Minutes Regions-Processed   Regions/Minute
[PB Info 2022-Sep-02 20:08:58] chr1:26179201    0.2 143431  860586
[PB Info 2022-Sep-02 20:09:08] chr1:88982401    0.3 428734  1286202
[PB Info 2022-Sep-02 20:09:18] chr1:155342401   0.5 622031  1244062
[PB Info 2022-Sep-02 20:09:28] chr1:222508801   0.7 921183  1381774
[PB Info 2022-Sep-02 20:09:38] chr2:38697601    0.8 1206544 1447852
[PB Info 2022-Sep-02 20:09:48] chr2:101697601   1.0 1469587 1469585
[PB Info 2022-Sep-02 20:09:58] chr2:176107201   1.2 1802198 1544741
[PB Info 2022-Sep-02 20:10:08] chr3:8990401 1.3 2124644 1593483
[PB Info 2022-Sep-02 20:10:18] chr3:55473601    1.5 2349628 1566418
[PB Info 2022-Sep-02 20:10:28] chr3:133680001   1.7 2671435 1602861
[PB Info 2022-Sep-02 20:10:38] chr3:192724801   1.8 2922628 1594160
[PB Info 2022-Sep-02 20:10:48] chr4:61934401    2.0 3222268 1611134
[PB Info 2022-Sep-02 20:10:58] chr4:143524801   2.2 3573491 1649303
[PB Info 2022-Sep-02 20:11:08] chr5:38947201    2.3 3949926 1692825
[PB Info 2022-Sep-02 20:11:18] chr5:116337601   2.5 4255653 1702261
[PB Info 2022-Sep-02 20:11:28] chr5:176035201   2.7 4508018 1690506
[PB Info 2022-Sep-02 20:11:38] chr6:54768001    2.8 4780747 1687322
[PB Info 2022-Sep-02 20:11:48] chr6:152140801   3.0 5176363 1725454
[PB Info 2022-Sep-02 20:11:58] chr7:27105601    3.2 5377117 1698036
[PB Info 2022-Sep-02 20:12:08] chr7:105105601   3.3 5713863 1714158
[PB Info 2022-Sep-02 20:12:18] chr8:18086401    3.5 6002114 1714889
[PB Info 2022-Sep-02 20:12:28] chr8:73915201    3.7 6267992 1709451
[PB Info 2022-Sep-02 20:12:38] chr9:5553601 3.8 6588617 1718769
[PB Info 2022-Sep-02 20:12:48] chr9:93672001    4.0 6886981 1721745
[PB Info 2022-Sep-02 20:12:58] chr10:4094401    4.2 7095584 1702939
[PB Info 2022-Sep-02 20:13:08] chr10:94593601   4.3 7489282 1728295
[PB Info 2022-Sep-02 20:13:18] chr11:18398401   4.5 7757420 1723871
[PB Info 2022-Sep-02 20:13:28] chr11:95976001   4.7 8083942 1732273
[PB Info 2022-Sep-02 20:13:38] chr12:6652801    4.8 8282324 1713584
[PB Info 2022-Sep-02 20:13:48] chr12:70632001   5.0 8560282 1712056
[PB Info 2022-Sep-02 20:13:58] chr13:24446401   5.2 8860588 1714952
[PB Info 2022-Sep-02 20:14:08] chr13:99038401   5.3 9200934 1725175
[PB Info 2022-Sep-02 20:14:18] chr14:68467201   5.5 9480546 1723735
[PB Info 2022-Sep-02 20:14:28] chr15:66432001   5.7 9821985 1733291
[PB Info 2022-Sep-02 20:14:38] chr16:31128001   5.8 10123582    1735471
[PB Info 2022-Sep-02 20:14:48] chr17:15782401   6.0 10402640    1733773
[PB Info 2022-Sep-02 20:14:58] chr17:55262401   6.2 10553414    1711364
[PB Info 2022-Sep-02 20:15:08] chr18:27960001   6.3 10790487    1703761
[PB Info 2022-Sep-02 20:15:18] chr19:15883201   6.5 11074759    1703809
[PB Info 2022-Sep-02 20:15:28] chr20:16108801   6.7 11311965    1696794
[PB Info 2022-Sep-02 20:15:38] chr21:10468801   6.8 11563626    1692237
[PB Info 2022-Sep-02 20:15:48] chr22:29289601   7.0 11766272    1680896
[PB Info 2022-Sep-02 20:15:58] chrX:77452801    7.2 12197330    1701952
[PB Info 2022-Sep-02 20:16:08] chrUn_JTFH01000876v1_decoy:1 7.3 12604293    1718767
[PB Info 2022-Sep-02 20:16:18] Total time taken: 450.258102
[PB Info 2022-Sep-02 20:16:18] ------------------------------------------------------------------------------
[PB Info 2022-Sep-02 20:16:18] ||        Program:                         GPU-GATK4 HaplotypeCaller        ||
[PB Info 2022-Sep-02 20:16:18] ||        Version:                                     4.0.0-1              ||
[PB Info 2022-Sep-02 20:16:18] ||        Start Time:                       Fri Sep  2 20:08:13 2022        ||
[PB Info 2022-Sep-02 20:16:18] ||        End Time:                         Fri Sep  2 20:16:18 2022        ||
[PB Info 2022-Sep-02 20:16:18] ||        Total Time:                            8 minutes 5 seconds        ||
[PB Info 2022-Sep-02 20:16:18] ------------------------------------------------------------------------------
/usr/local/parabricks/binaries//bin/htvc /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /outputdir/fq2bam_output.bam 1 -o /outputdir/variants.vcf -nt 5
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
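
Before launching the command above, it can help to confirm that the inputs it names are present in the directory that will be mounted as /workdir. The following is a minimal sketch, not part of Parabricks: check_inputs is a hypothetical helper, and including the .bai index in the check is an assumption based on the index that fq2bam writes next to the BAM.

```shell
# Pre-flight check: confirm the inputs named in the docker command exist
# (and are non-empty) under the directory that will be mounted as /workdir.
# check_inputs is a hypothetical helper, not a Parabricks command.
check_inputs() {
    dir="${1:-.}"
    missing=0
    for f in \
        parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
        fq2bam_output.bam \
        fq2bam_output.bam.bai   # assumption: the index is expected next to the BAM
    do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_inputs "$(pwd)" && echo "inputs look OK"
```

The function returns a non-zero status if any file is missing, so it can gate the docker command in a script.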

If you get an out-of-memory error, make sure your computer has enough RAM and that other programs aren't using large amounts of memory.
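
One way to check how much memory is free before rerunning is to read the kernel's own estimate. This is a sketch that assumes a Linux host with /proc/meminfo; mem_available_gib is a hypothetical helper name, and the tutorial does not state how much RAM is required, so the number is only a starting point for diagnosis.

```shell
# Print available system memory in GiB, read from a meminfo-style file.
# Defaults to /proc/meminfo, so this assumes a Linux host.
# MemAvailable is reported in kB; 1048576 kB = 1 GiB.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Example:
#   mem_available_gib
#   ps -eo pid,rss,comm --sort=-rss | head -n 6   # largest memory users (procps ps)
```

If the reported value is low, the ps line in the comments lists the processes currently holding the most resident memory.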

You should now have the following files in your directory:

$ ls -lrt
-rw-r--r--  1 root     root     4819386804 Sep  2 15:58 fq2bam_output.bam
-rw-r--r--  1 root     root        6882792 Sep  2 15:59 fq2bam_output.bam.bai
-rw-r--r--  1 root     root          87690 Sep  2 15:59 fq2bam_output_chrs.txt
-rw-r--r--  1 root     root       23643404 Sep  2 16:16 variants.vcf
(sample data not shown)
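
Beyond checking the listing, a quick sanity check is to count the variant records in variants.vcf and look at the first few. The sketch below assumes an uncompressed VCF, in which header lines start with "#"; count_variants and first_variants are hypothetical helper names.

```shell
# Count variant records (non-header lines) in an uncompressed VCF.
count_variants() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Print the first N non-header lines of a VCF (N defaults to 10).
first_variants() {
    awk -v n="${2:-10}" '!/^#/ { print; if (++seen >= n + 0) exit }' "$1"
}

# Example:
#   count_variants variants.vcf
#   first_variants variants.vcf 10
```

A count of 0 would suggest the run produced only a header, which is worth investigating before moving on.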

The first ten non-header lines of variants.vcf should be as follows:

chr1    16378   .   T   C   45.28   .   AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=23.55;QD=22.64;SOR=2.303    GT:AD:DP:GQ:PL  1/1:0,2:2:6:57,6,0
chr1    63268   .   T   C   43.28   .   AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=23.09;QD=21.64;SOR=2.303    GT:AD:DP:GQ:PL  1/1:0,2:2:6:55,6,0
chr1    63516   .   A   G   1202.03 .   AC=2;AF=1.00;AN=2;DP=40;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=26.22;QD=30.05;SOR=1.292    GT:AD:DP:GQ:PL  1/1:0,40:40:99:1216,120,0
chr1    63527   .   T   C   1002.03 .   AC=2;AF=1.00;AN=2;DP=33;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=26.05;QD=30.36;SOR=1.179    GT:AD:DP:GQ:PL  1/1:0,33:33:99:1016,99,0
chr1    131609  .   C   A   83.60   .   AC=1;AF=0.500;AN=2;BaseQRankSum=-0.275;DP=11;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=20.00;MQRankSum=0.000;QD=7.60;ReadPosRankSum=-1.349;SOR=0.551 GT:AD:DP:GQ:PL  0/1:6,5:11:91:91,0,115
chr1    133483  .   G   T   70.60   .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.534;DP=8;ExcessHet=3.0103;FS=3.332;MLEAC=1;MLEAF=0.500;MQ=26.65;MQRankSum=1.611;QD=8.83;ReadPosRankSum=-0.703;SOR=2.303  GT:AD:DP:GQ:PL  0/1:5,3:8:78:78,0,100
chr1    264627  .   A   G   37.28   .   AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=49.00;QD=18.64;SOR=0.693    GT:AD:DP:GQ:PL  1/1:0,2:2:6:49,6,0
chr1    268012  .   G   A   330.60  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.636;DP=116;ExcessHet=3.0103;FS=1.639;MLEAC=1;MLEAF=0.500;MQ=39.33;MQRankSum=-6.168;QD=2.85;ReadPosRankSum=-1.358;SOR=0.895    GT:AD:DP:GQ:PL  0/1:93,23:116:99:338,0,3307
chr1    268130  .   G   T   2188.60 .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.543;DP=228;ExcessHet=3.0103;FS=8.632;MLEAC=1;MLEAF=0.500;MQ=44.88;MQRankSum=-12.533;QD=9.60;ReadPosRankSum=1.105;SOR=0.356    GT:AD:DP:GQ:PL  0/1:128,100:228:99:2196,0,4482
chr1    268516  .   C   T   59.60   .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.465;DP=7;ExcessHet=3.0103;FS=3.680;MLEAC=1;MLEAF=0.500;MQ=52.18;MQRankSum=-0.180;QD=8.51;ReadPosRankSum=-0.366;SOR=2.258  GT:AD:DP:GQ:PL  0/1:5,2:7:67:67,0,138
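
In these records the genotype likelihoods appear in the PL field: one Phred-scaled value per candidate genotype, normalized so that the most likely genotype has PL 0. In all ten records shown, GQ equals the second-smallest PL, capped at 99. The sketch below pulls out the position, alleles, and genotype of each record and recomputes GQ from PL for comparison. It assumes single-sample records like the ones above; gq_from_pl is a hypothetical helper name.

```shell
# For each record, print CHROM, POS, REF, ALT, GT, the reported GQ, and a GQ
# recomputed as (second-smallest PL - smallest PL), capped at 99.
# Assumes single-sample records (sample data in column 10), as shown above.
gq_from_pl() {
    awk '!/^#/ {
        # Map FORMAT keys (column 9) to the sample values (column 10).
        nf = split($9, key, ":"); split($10, val, ":")
        for (i = 1; i <= nf; i++) field[key[i]] = val[i]

        # Find the smallest and second-smallest PL values.
        n = split(field["PL"], pl, ",")
        lo = -1; second = -1
        for (i = 1; i <= n; i++) {
            v = pl[i] + 0
            if (lo < 0 || v < lo)              { second = lo; lo = v }
            else if (second < 0 || v < second) { second = v }
        }
        gq = second - lo
        if (gq > 99) gq = 99

        print $1, $2, $4, $5, field["GT"], field["GQ"], gq
    }' "$1"
}

# Example: gq_from_pl variants.vcf | head -n 10
```

For the first record shown above (PL 57,6,0), this prints `chr1 16378 T C 1/1 6 6`; the last two columns agree when the reported GQ follows the rule.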