Comparison with Baseline Tools

Many users want to compare the output of Parabricks software with that of other standard tools. We recommend the following methods for comparing Parabricks output with the output of the corresponding non-accelerated (CPU) software.

GATK4 sorts SAM records based on QNAME, FLAG, RNAME, POS, MAPQ, MRNM/RNEXT, MPOS/PNEXT, and ISIZE/TLEN. If all of these fields are the same for two different records, the records are considered equal for sorting purposes. Therefore, we compare two sorted BAMs by using the BamUtil diff tool to compare only these fields:


$ bam diff --in1 mark_dups_gpu.bam --in2 mark_dups_cpu.bam --noCigar --isize --flag --mate --mapQual

This comparison should report no differences.
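As a rough cross-check (not a replacement for `bam diff`), a similar field-level comparison can be sketched with standard text tools. The hypothetical `compare_sam_keys` function below is not part of Parabricks, BamUtil, or GATK; it assumes both inputs are SAM text, for example produced with `samtools view` (which is assumed to be installed). It keeps only SAM columns 1-5 and 7-9 (QNAME, FLAG, RNAME, POS, MAPQ, RNEXT, PNEXT, and TLEN/ISIZE) and sorts them, so records that tie on all of these fields do not count as differences if they appear in a different order.

```shell
# Hypothetical helpers (not part of Parabricks, BamUtil, or GATK).
# Inputs are SAM text files, e.g. produced with `samtools view`.

# Print the sort-key fields of every alignment record, sorted:
# QNAME, FLAG, RNAME, POS, MAPQ, RNEXT, PNEXT, TLEN (SAM columns 1-5, 7-9).
# Header lines starting with '@' are dropped; CIGAR, SEQ, QUAL, and tags are ignored.
sam_sort_keys() {
  grep -v '^@' "$1" | cut -f1-5,7-9 | LC_ALL=C sort
}

# Compare two SAM files on those fields only. Prints any differing lines
# and returns diff's exit status: 0 if the sorted key lists are identical.
compare_sam_keys() {
  keys1=$(mktemp) || return 2
  keys2=$(mktemp) || { rm -f "$keys1"; return 2; }
  sam_sort_keys "$1" > "$keys1"
  sam_sort_keys "$2" > "$keys2"
  diff "$keys1" "$keys2"
  status=$?
  rm -f "$keys1" "$keys2"
  return "$status"
}
```

For example, after running `samtools view mark_dups_gpu.bam > gpu.sam` and `samtools view mark_dups_cpu.bam > cpu.sam`, the call `compare_sam_keys gpu.sam cpu.sam` should print nothing and return exit status 0. Sorting the extracted keys makes the check order-insensitive by design, at the cost of converting and sorting the full files as text, which can be slow for large BAMs.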

The base quality score recalibration (BQSR) report files generated by Parabricks and GATK4 should be identical. The following command should produce no output:


$ diff -w recal_gpu.txt recal_cpu.txt
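To run this check from a script, you can rely on the exit status of `diff` instead of inspecting its output: 0 means the files match, 1 means they differ, and a higher value means the comparison itself failed (for example, because a file is missing). The hypothetical `recal_reports_match` wrapper below is a minimal sketch of that convention; its name and messages are illustrative and not part of Parabricks or GATK.

```shell
# Hypothetical helper (not part of Parabricks or GATK): compare two BQSR
# reports with `diff -w` (ignore all whitespace) and map diff's exit status
# to a result: 0 = identical, 1 = different, 2 = comparison failed.
recal_reports_match() {
  diff -w "$1" "$2" > /dev/null
  case $? in
    0) return 0 ;;
    1) echo "reports differ: $1 vs $2" >&2; return 1 ;;
    *) echo "could not compare $1 and $2" >&2; return 2 ;;
  esac
}
```

For example, `recal_reports_match recal_gpu.txt recal_cpu.txt && echo identical` prints `identical` only when the two reports match. Distinguishing status 1 from status 2 keeps a missing or unreadable file from being mistaken for a real difference.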

To compare VCF files, we use the GATK Concordance tool, which reports the sensitivity and precision of SNP and INDEL calls. When the following command is run, the variant accuracy summary is written to out.txt.


$ gatk Concordance --evaluation result_gpu.vcf --truth result_cpu.vcf --summary out.txt
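The two rates follow the usual definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), where TP, FP, and FN are the true-positive, false-positive, and false-negative counts measured against the truth VCF (here, the CPU result). The hypothetical `concordance_rates` function below recomputes them per variant type from out.txt. It assumes the summary is a tab-separated table whose header row names the columns `type`, `true-positive`, `false-positive`, and `false-negative`; check those names against your own out.txt before relying on it.

```shell
# Hypothetical helper (not part of GATK): recompute per-type sensitivity and
# precision from a Concordance summary table. ASSUMES a tab-separated header
# row with the columns "type", "true-positive", "false-positive", and
# "false-negative"; lines starting with '#' and blank lines are skipped.
concordance_rates() {
  awk -F '\t' '
    /^#/ || NF == 0 { next }
    !header_seen {
      # Map each column name to its position in the header row.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("type" in col) || !("true-positive" in col) ||
          !("false-positive" in col) || !("false-negative" in col)) {
        print "missing expected Concordance columns" > "/dev/stderr"
        exit 2
      }
      header_seen = 1
      next
    }
    {
      tp = $(col["true-positive"])
      fp = $(col["false-positive"])
      fn = $(col["false-negative"])
      # sensitivity = TP / (TP + FN), precision = TP / (TP + FP);
      # an undefined rate (zero denominator) is printed as 0.
      sens = (tp + fn > 0) ? tp / (tp + fn) : 0
      prec = (tp + fp > 0) ? tp / (tp + fp) : 0
      printf "%s\tsensitivity=%.4f\tprecision=%.4f\n", $(col["type"]), sens, prec
    }
  ' "$1"
}
```

Running `concordance_rates out.txt` prints one line per variant type found in the summary (for example, SNP and INDEL). Looking up columns by header name rather than by position keeps the sketch working if the column order differs from what is assumed here.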

© Copyright 2022, Nvidia. Last updated on Jun 28, 2023.