VARIANT CALLERS
NVIDIA Clara Parabricks Pipelines accelerated variant callers
BCFTOOLS CALL
Accelerated bcftools call.
bcftoolscall calls variants from the output of bcftools mpileup.
QUICK START
$ pbrun bcftoolscall --in-file pileup.bcf \
--out-file output.vcf
COMPATIBLE CPU COMMAND
The command below is the CPU counterpart of the Parabricks command above and generates exactly the same results. See the Output Comparison page for how to compare the results.
bcftools call pileup.bcf -c -o output.vcf
OPTIONS
- --in-file
Path to the input mpileup file (default: None)
- --out-file
Path of the output file. If this option is omitted, output is written to standard output (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --variant-sites
Output variant sites only (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
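It can help to see the pbrun options next to their bcftools call equivalents when you compare against the CPU command. The sketch below is a dry run that only prints a command. The helper name bcftoolscall_cpu_cmd is hypothetical. It always adds -c to mirror the CPU command above. Mapping --variant-sites to bcftools' -v (variants only) is an assumption, not documented Parabricks behavior.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the bcftools call command
# corresponding to a pbrun bcftoolscall invocation.
# Assumed mapping: --variant-sites -> -v. --num-threads is accepted but not
# translated, because bcftools' --threads only affects output compression.
bcftoolscall_cpu_cmd() {
    in_file=""; out_file=""; variant_sites=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --in-file)       in_file="$2"; shift 2 ;;
            --out-file)      out_file="$2"; shift 2 ;;
            --variant-sites) variant_sites=1; shift ;;
            --num-threads)   shift 2 ;;
            *) echo "unsupported option: $1" >&2; return 1 ;;
        esac
    done
    [ -n "$in_file" ] || { echo "an input file is required" >&2; return 1; }
    cmd="bcftools call -c"
    [ "$variant_sites" -eq 1 ] && cmd="$cmd -v"
    # Without an output file, bcftools call writes to standard output.
    [ -n "$out_file" ] && cmd="$cmd -o $out_file"
    printf '%s\n' "$cmd $in_file"
}

bcftoolscall_cpu_cmd --in-file pileup.bcf --out-file output.vcf
```

For the quick start arguments, this prints `bcftools call -c -o output.vcf pileup.bcf`, which is the CPU command above with the input file moved last.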
HAPLOTYPECALLER
GPU-accelerated haplotypecaller.
This tool runs a GPU-accelerated version of GATK HaplotypeCaller. Users can provide an optional BQSR report to recalibrate base qualities in the BAM, as GATK ApplyBQSR does. In that case, the updated base qualities are used for variant calling.
QUICK START
$ pbrun haplotypecaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--in-recal-file recal_gpu.txt \
--out-variants result.vcf
COMPATIBLE GATK4 COMMAND
The commands below are the GATK4 counterpart of the Parabricks command above and generate exactly the same results. See the Output Comparison page for how to compare the results.
# Run the ApplyBQSR step
$ gatk ApplyBQSR --java-options "-Xmx30g" -R Ref/Homo_sapiens_assembly38.fasta \
-I mark_dups_cpu.bam --bqsr-recal-file recal_file.txt -O cpu_nodups_BQSR.bam
# Run HaplotypeCaller
$ gatk HaplotypeCaller --java-options "-Xmx30g" --input cpu_nodups_BQSR.bam \
--output result_cpu.vcf --reference Ref/Homo_sapiens_assembly38.fasta \
--native-pair-hmm-threads 16
OPTIONS
- --ref
(required) The reference genome in fasta format.
- --in-bam
(required) Path to the input bam file.
- --out-variants
(required) Path of the output .vcf, .g.vcf, or .g.vcf.gz file.
- --in-recal-file
Path to the input BQSR report. Only required if the ApplyBQSR step is needed.
- --static-quantized-quals
Use static quantized quality scores to a given number of levels. Repeat this option multiple times for multiple bins.
- --ploidy
Defaults to 2.
Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.
- --interval-file
Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
(-L) Interval within which to call variants from the input reads. All intervals will have a padding of 100 to get read records, and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
- --interval-padding
(-ip) Padding size (in base pairs) to add to each interval you are including (default: None)
- --gvcf
Defaults to False.
Generate variant calls in gVCF format. When using this option, the --out-variants file name should end with .g.vcf or .g.vcf.gz. If the --out-variants file name ends in .gz, the tool generates a compressed gVCF and an index for it.
- --batch
Given an input list of BAMs, run variant calling on each BAM using one GPU per BAM. BAMs are processed in parallel according to the number of GPUs in the system.
- --disable-read-filter
Disable the read filters for bam entries. Currently supported read filters that can be disabled are: MappingQualityAvailableReadFilter, MappingQualityReadFilter, and NotSecondaryAlignmentReadFilter. This option can be repeated multiple times.
- --max-alternate-alleles
Maximum number of alternate alleles to genotype (default: None)
- --annotation-group
(-G) Which groups of annotations to add to the output variant calls. Currently supported annotation groups: StandardAnnotation, StandardHCAnnotation, AS_StandardAnnotation (default: None)
- --gvcf-gq-bands
(-GQB) Exclusive upper bounds for reference confidence GQ bands. Must be in the range [1, 100] and specified in increasing order (default: None)
- --dont-use-soft-clipped-bases
Don't use soft-clipped bases for variant calling.
- --haplotypecaller-options
Pass supported haplotype caller options as one string. Currently supported original haplotypecaller options:
-min-pruning <int>
-standard-min-confidence-threshold-for-calling <int>
-max-reads-per-alignment-start <int>
-min-dangling-branch-length <int>
-pcr-indel-model <NONE, HOSTILE, AGGRESSIVE, CONSERVATIVE>
e.g. --haplotypecaller-options="-min-pruning 4 -standard-min-confidence-threshold-for-calling 30"
- --rna
Run haplotypecaller optimized for RNA data.
- --num-gpus NUM_GPUS
Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used.
- --gpu-devices GPU_DEVICES
Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
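The --gvcf naming rule can be checked before a run starts. The sketch below is a hypothetical pre-flight helper, not part of pbrun. It accepts only .g.vcf and .g.vcf.gz names, as the --gvcf description requires, and reports whether the tool should write compressed output (which it also indexes).

```shell
#!/bin/sh
# Hypothetical pre-flight check for pbrun haplotypecaller --gvcf.
# Per the --gvcf description, --out-variants must end in .g.vcf or .g.vcf.gz;
# a name ending in .gz yields compressed gVCF output plus an index.
check_gvcf_out() {
    case "$1" in
        *.g.vcf)    echo "plain" ;;
        *.g.vcf.gz) echo "compressed" ;;
        *) echo "error: '$1' must end with .g.vcf or .g.vcf.gz when using --gvcf" >&2
           return 1 ;;
    esac
}

out=result.g.vcf.gz
if kind=$(check_gvcf_out "$out"); then
    echo "gVCF output ($kind): $out"
    # pbrun haplotypecaller --ref Ref/Homo_sapiens_assembly38.fasta \
    #   --in-bam mark_dups_gpu.bam --gvcf --out-variants "$out"
fi
```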
MUTECTCALLER
GPU-accelerated Mutect2.
mutectcaller supports tumor-only and tumor-normal variant calling. The figure below shows the high-level functionality of mutectcaller. All dotted boxes indicate optional data, with some constraints.
QUICK START
$ pbrun mutectcaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--tumor-name foobar \
--out-vcf output.vcf
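The quick start runs mutectcaller in tumor-only mode. Adding the normal inputs from the OPTIONS list switches it to tumor-normal calling. The dry-run sketch below prints the pbrun command line for either mode. The function name mutect_cmd and the sample name normal_sample are hypothetical. Requiring --normal-name whenever --in-normal-bam is given is an assumption, not a documented rule.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print a pbrun mutectcaller command line.
# Arguments: REF TUMOR_BAM TUMOR_NAME OUT_VCF [NORMAL_BAM NORMAL_NAME]
# Four arguments build a tumor-only run; six build a tumor-normal run.
mutect_cmd() {
    if [ "$#" -ne 4 ] && [ "$#" -ne 6 ]; then
        echo "usage: mutect_cmd REF TUMOR_BAM TUMOR_NAME OUT_VCF [NORMAL_BAM NORMAL_NAME]" >&2
        return 1
    fi
    cmd="pbrun mutectcaller --ref $1 --in-tumor-bam $2 --tumor-name $3 --out-vcf $4"
    if [ "$#" -eq 6 ]; then
        # Assumption: a normal BAM is always paired with its sample name.
        cmd="$cmd --in-normal-bam $5 --normal-name $6"
    fi
    printf '%s\n' "$cmd"
}

# Tumor-only (as in the quick start), then tumor-normal.
mutect_cmd Ref/Homo_sapiens_assembly38.fasta tumor.bam foobar output.vcf
mutect_cmd Ref/Homo_sapiens_assembly38.fasta tumor.bam foobar output.vcf normal.bam normal_sample
```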
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above and generates exactly the same results. See the Output Comparison page for how to compare the results.
gatk Mutect2 -R Ref/Homo_sapiens_assembly38.fasta --input tumor.bam --tumor-sample foobar --output result.vcf
OPTIONS
- --ref
(required) The reference genome in fasta format. We assume that the indexing required to run bwa has been completed by the user.
- --in-tumor-bam
(required) Path of bam file for tumor reads.
- --tumor-name
(required) Name of sample for tumor reads.
- --out-vcf
(required) Path to the VCF output file.
- --in-tumor-recal-file
Path of BQSR report for tumor sample.
- --in-normal-bam
Path of bam file for normal reads.
- --in-normal-recal-file
Path of BQSR report for normal sample.
- --normal-name
Name of sample for normal reads.
- --ploidy
Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.
- --interval-file
Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
(-L) Interval within which to call variants from the input reads. All intervals will have a padding of 100 to get read records, and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
- --interval-padding
(-ip) Padding size (in base pairs) to add to each interval you are including (default: None)
- --mutectcaller-options
Pass supported mutectcaller options as one string. Currently supported original mutectcaller options:
-pcr-indel-model <NONE, HOSTILE, AGGRESSIVE, CONSERVATIVE>
e.g. --mutectcaller-options="-pcr-indel-model HOSTILE" (default: None)
- --num-gpus NUM_GPUS
Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used.
- --gpu-devices GPU_DEVICES
Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
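The --interval examples use GATK-style interval strings. The sketch below parses them into contig, start, and end under the usual GATK reading, which is assumed here rather than stated above:
- a bare contig is the whole contig;
- chr:pos is a single base;
- chr:pos+ runs to the end of the contig;
- chr:start-end is an inclusive range.

parse_interval is a hypothetical name. The padding of 100 applied by the tool is not modeled.

```shell
#!/bin/sh
# Hypothetical parser for GATK-style interval strings such as those given to -L.
# Prints "CONTIG START END" (1-based, inclusive); "end" means end of contig.
# Contig names that themselves contain ':' are not handled.
parse_interval() {
    s="$1"
    case "$s" in
        *:*) contig="${s%%:*}"; range="${s#*:}" ;;
        *)   printf '%s 1 end\n' "$s"; return 0 ;;
    esac
    case "$range" in
        *+)  start="${range%+}"; end="end" ;;
        *-*) start="${range%%-*}"; end="${range#*-}" ;;
        *)   start="$range"; end="$range" ;;
    esac
    case "$start" in ''|*[!0-9]*) echo "bad interval: $s" >&2; return 1 ;; esac
    case "$end" in
        end) ;;
        ''|*[!0-9]*) echo "bad interval: $s" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' "$contig" "$start" "$end"
}

for iv in chr1 chr2:10000 chr3:20000+ chr4:10000-20000; do
    parse_interval "$iv"
done
```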
SOMATICSNIPER
Accelerated Somatic Sniper.
Somatic Sniper supports tumor-normal variant calling. Parabricks provides Somatic Sniper as a standalone tool. Alternatively, you can use the Somatic Sniper workflow (somaticsniper_workflow) to generate a VCF file from BAM/CRAM.
QUICK START
$ pbrun somaticsniper --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--min-mapq 1 --no-gain --no-loh --out-format vcf \
--out-file output.vcf
COMPATIBLE CPU COMMAND
The command below is the CPU counterpart of the Parabricks command above and generates exactly the same results. See the Output Comparison page for how to compare the results.
bam-somaticsniper -q 1 -G -L -F vcf -f Ref/Homo_sapiens_assembly38.fasta tumor.bam normal.bam output.vcf
OPTIONS
- --ref
Path to the reference file (default: None)
- --in-tumor-bam
Path of bam/cram file for tumor reads. Path can be a Google Cloud Storage object (default: None)
- --in-normal-bam
Path of bam/cram file for normal reads. Path can be a Google Cloud Storage object (default: None)
- --out-file
Path of output file (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --min-mapq
Filter out reads with mapping quality less than this value (default: 0)
- --out-format
Type of output format. Possible values are {classic, vcf} (default: classic)
- --correct
Fix bugs present in the baseline Somatic Sniper. If this option is not passed, the output is the same as the baseline (default: None)
- --no-gain
Do not report Gain of Reference variants as determined by genotypes (default: None)
- --no-loh
Do not report LOH variants as determined by genotypes (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
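Four options above have descriptions that match bam-somaticsniper flags:
- --min-mapq corresponds to -q;
- --no-gain corresponds to -G;
- --no-loh corresponds to -L;
- --out-format corresponds to -F.

The dry-run sketch below prints the bam-somaticsniper command for a set of pbrun options. The mapping is inferred from those descriptions, and the helper name sniper_cpu_cmd is hypothetical. Options without a clear CPU flag, such as --correct, are rejected.

```shell
#!/bin/sh
# Hypothetical dry-run helper: translate pbrun somaticsniper options into the
# equivalent bam-somaticsniper command (mapping inferred, see lead-in).
sniper_cpu_cmd() {
    ref=""; tumor=""; normal=""; out=""; flags=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --ref)           ref="$2"; shift 2 ;;
            --in-tumor-bam)  tumor="$2"; shift 2 ;;
            --in-normal-bam) normal="$2"; shift 2 ;;
            --out-file)      out="$2"; shift 2 ;;
            --min-mapq)      flags="$flags -q $2"; shift 2 ;;
            --out-format)    flags="$flags -F $2"; shift 2 ;;
            --no-gain)       flags="$flags -G"; shift ;;
            --no-loh)        flags="$flags -L"; shift ;;
            *) echo "no bam-somaticsniper equivalent in this sketch: $1" >&2; return 1 ;;
        esac
    done
    # bam-somaticsniper positional order: tumor BAM, normal BAM, output file.
    printf '%s\n' "bam-somaticsniper$flags -f $ref $tumor $normal $out"
}

sniper_cpu_cmd --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-tumor-bam tumor.bam --in-normal-bam normal.bam --out-file output.vcf \
    --min-mapq 1 --no-gain --no-loh --out-format vcf
```

With these arguments, the printed line matches the CPU command given earlier in this section.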
SOMATICSNIPER WORKFLOW
Somatic sniper workflow to generate VCF from BAM/CRAM input files.

QUICK START
$ pbrun somaticsniper_workflow --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output
COMPATIBLE CPU COMMAND
The commands below are the CPU counterpart of the Parabricks command above and generate exactly the same results. See the Output Comparison page for how to compare the results.
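# Call somatic SNVs (VCF output, minimum mapping quality 1, no Gain of Reference or LOH calls)
bam-somaticsniper -q 1 -G -L -F vcf -f Ref/Homo_sapiens_assembly38.fasta tumor.bam normal.bam output.vcf
# Build a filtered indel pileup for the tumor BAM; awk drops the first 55 lines
bcftools mpileup -A -B -d 2147483647 -Ou -f Ref/Homo_sapiens_assembly38.fasta tumor.bam | bcftools call -c | vcfutils.pl varFilter -Q 20 | awk 'NR > 55 {print}' > output.indel_pileup_Tum.pileup
# Filter out SNVs near indels
perl snpfilter.pl --snp-file output.vcf --indel-file output.indel_pileup_Tum.pileup
# Write the SNV positions for bam-readcount
perl prepare_for_readcount.pl --snp-file output.vcf.SNPfilter
# Count reads at those positions (minimum base quality 15)
bam-readcount -b 15 -f Ref/Homo_sapiens_assembly38.fasta -l output.vcf.SNPfilter.pos tumor.bam > output.readcounts.rc
# Apply the false-positive filter, then the high-confidence filter
perl fpfilter.pl -snp-file output.vcf.SNPfilter -readcount-file output.readcounts.rc
perl highconfidence.pl -snp-file output.vcf.SNPfilter.fp_pass.vcf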
OPTIONS
- --ref
(required) The reference genome in fasta format. We assume that the indexing required to run bwa has been completed by the user.
- --in-tumor-bam
(required) Path of bam file for tumor reads.
- --in-normal-bam
Path of bam file for normal reads.
- --out-prefix
Prefix filename for output data (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --min-mapq
Filter out reads with mapping quality less than this value (default: 1)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
DEEPVARIANT
Run the GPU-accelerated DeepVariant algorithm.
Parabricks has accelerated Google DeepVariant to make extensive use of GPUs, finishing a 30x WGS analysis in 25 minutes instead of hours. The Parabricks version of DeepVariant works like other command-line tools users are familiar with. It takes the BAM and reference as inputs and produces variants as output. Currently, DeepVariant is supported on T4, V100, and A100 GPUs.
QUICK START
$ pbrun deepvariant --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--out-variants output.vcf
COMPATIBLE GOOGLE DEEPVARIANT COMMANDS
The commands below are the Google counterpart of the Parabricks command above and generate exactly the same results. See the Output Comparison page for how to compare the results.
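# N_SHARDS, LOGDIR, BIN_VERSION, REF, BAM, OUTPUT_DIR, CALL_VARIANTS_OUTPUT,
# MODEL, and FINAL_OUTPUT_VCF must be set before running these commands.
# Run make_examples in parallel, one task per shard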
seq 0 $((N_SHARDS-1)) | \
parallel --eta --halt 2 --joblog "${LOGDIR}/log" --res "${LOGDIR}" \
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "${REF}" \
--reads "${BAM}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--task {}
# Run call_variants on all example shards
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--checkpoint "${MODEL}"
# Run postprocess_variants to produce the final VCF
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "${REF}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
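The examples.tfrecord@${N_SHARDS}.gz argument is a sharded file spec. Each make_examples task writes one shard, and call_variants reads all of them. The sketch below expands such a spec into concrete file names, assuming DeepVariant's NAME-00000-of-0000N naming for shards. The helper name expand_shards is hypothetical, and it handles only the simple BASE@N[.SUFFIX] form.

```shell
#!/bin/sh
# Hypothetical helper: expand a sharded spec "BASE@N[.SUFFIX]" into
# BASE-00000-of-0000N[.SUFFIX] ... one name per shard.
expand_shards() {
    spec="$1"
    base="${spec%@*}"        # text before the last '@'
    rest="${spec##*@}"       # shard count plus optional suffix
    n="${rest%%.*}"          # shard count
    suffix="${rest#"$n"}"    # e.g. ".gz", or empty
    case "$n" in ''|*[!0-9]*) echo "bad spec: $spec" >&2; return 1 ;; esac
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s-%05d-of-%05d%s\n' "$base" "$i" "$n" "$suffix"
        i=$((i + 1))
    done
}

expand_shards "examples.tfrecord@4.gz"
```

This can be useful for checking that every make_examples task produced its shard before call_variants starts.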
OPTIONS
- --ref
(required) The reference genome in fasta format.
- --in-bam
(required) Path to the input BAM file.
- --out-variants
(required) Name of output VCF file.
- --pb-model-file
Path of a non-default parabricks model file for deepvariant.
- --interval-file
Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
(-L) Interval within which to call variants from the input reads. Overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
- --disable-use-window-selector-model
Change the window selector model from Allele Count Linear to Variant Reads. This option will increase the accuracy and run time (default: Allele Count Linear)
- --gvcf
Generate variant calls in gVCF format.
- --num-gpus NUM_GPUS
Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used.
- --gpu-devices GPU_DEVICES
Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
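Per the descriptions above, --num-gpus N uses GPUs 0 through N-1. It therefore selects the same devices as an explicit --gpu-devices list. The hypothetical helper below prints that equivalent list, which is handy when a script needs the explicit form.

```shell
#!/bin/sh
# Hypothetical helper: print the --gpu-devices list equivalent to --num-gpus N,
# i.e. the comma-separated device numbers 0..N-1.
gpu_devices_for() {
    n="$1"
    case "$n" in ''|*[!0-9]*|0*) echo "N must be a positive integer" >&2; return 1 ;; esac
    list=0
    i=1
    while [ "$i" -lt "$n" ]; do
        list="$list,$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

gpu_devices_for 2    # prints 0,1 (the same GPUs as --num-gpus 2)
```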
CNVKIT
CPU-accelerated copy number variant calling.
Run CNVkit with accelerated coverage calculation from read depths.
QUICK START
$ pbrun cnvkit --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--output-dir output
OPTIONS
- --ref
(required) Path to the reference file.
- --in-bam
(required) Path to the bam file.
- --output-dir
Path to the directory that will contain all of the generated files.
- --cnvkit-options
Pass supported cnvkit options as one string. Currently supported options are --count-reads and --drop-low-coverage, e.g.
--cnvkit-options="--count-reads --drop-low-coverage"
- --generate-vcf
Export the output cns to VCF after running batch (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
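--cnvkit-options accepts only --count-reads and --drop-low-coverage, so a wrapper script can reject anything else before a run starts. The sketch below is one hypothetical way to do that check. It splits the string on whitespace, which is sufficient for these two value-less flags.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the string passed to --cnvkit-options.
# Only the two options documented as supported are accepted; every
# unsupported token is reported before returning a failure status.
check_cnvkit_options() {
    set -f               # disable globbing while splitting the string
    status=0
    for opt in $1; do    # intentional word splitting on whitespace
        case "$opt" in
            --count-reads|--drop-low-coverage) ;;
            *) echo "unsupported cnvkit option: $opt" >&2; status=1 ;;
        esac
    done
    set +f
    return "$status"
}

check_cnvkit_options "--count-reads --drop-low-coverage" && echo "options OK"
```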
MANTA
Structural variant (SV) and indel caller from mapped paired-end sequencing reads. This tool is not accelerated and is installed from source on the server.
QUICK START
$ pbrun manta --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output
OPTIONS
- --ref
Path to the reference file (default: None)
- --in-tumor-bam
Path of bam file for tumor reads (default: None)
- --in-normal-bam
Path of bam file for normal reads. This option can be used multiple times (default: None)
- --bed
Optional bgzip-compressed/tabix-indexed BED file containing the set of regions to call (default: None)
- --out-prefix
Prefix filename for output data (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --manta-options
Pass supported manta options as one string, e.g. --manta-options="--rna --unstrandedRNA" (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
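The --bed option expects a bgzip-compressed, tabix-indexed BED file. The hypothetical pre-flight check below assumes the usual tabix convention of an index named FILE.gz.tbi next to the file. It checks only the file name, the index, and the gzip magic bytes, not full BGZF validity. Such a file is normally produced from a sorted BED file with `bgzip regions.bed` followed by `tabix -p bed regions.bed.gz`.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a --bed argument (Manta/Strelka).
# Expects a .gz file plus a tabix index named FILE.gz.tbi.
# To create one from a sorted BED file:
#   bgzip regions.bed && tabix -p bed regions.bed.gz
check_bed_arg() {
    bed="$1"
    case "$bed" in
        *.gz) ;;
        *) echo "error: $bed does not end in .gz (bgzip-compressed)" >&2; return 1 ;;
    esac
    [ -f "$bed" ] || { echo "error: $bed not found" >&2; return 1; }
    [ -f "$bed.tbi" ] || { echo "error: missing tabix index $bed.tbi" >&2; return 1; }
    # BGZF files are valid gzip files, so they start with the bytes 1f 8b.
    magic=$(od -An -tx1 -N2 "$bed" | tr -d ' \n')
    [ "$magic" = "1f8b" ] || { echo "error: $bed is not gzip-compressed" >&2; return 1; }
}

# Example: check_bed_arg regions.bed.gz && echo "BED file looks usable"
```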
STRELKA
SNP and indel caller from mapped paired-end sequencing reads. This tool is not accelerated and is installed from source on the server.
QUICK START
$ pbrun strelka --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--indel-candidates candidates.vcf \
--out-prefix output
OPTIONS
- --ref
Path to the reference file (default: None)
- --in-tumor-bam
Path of bam file for tumor reads (default: None)
- --in-normal-bam
Path of bam file for normal reads. This option can be used multiple times (default: None)
- --indel-candidates
Path to a VCF of candidate indel alleles. Must be in vcf/vcf.gz format. This option can be used multiple times (default: None)
- --bed
Optional bgzip-compressed/tabix-indexed BED file containing the set of regions to call (default: None)
- --out-prefix
Prefix filename for output data (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --strelka-options
Pass supported strelka options as one string, e.g. --strelka-options="--exome" (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
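--indel-candidates can be repeated, and each file must be in .vcf or .vcf.gz format. The hypothetical dry-run helper below turns a list of candidate files into repeated flags and rejects other extensions. It assumes file names without spaces. The second candidate file name in the usage line is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print one "--indel-candidates FILE" pair per argument,
# rejecting files that are not .vcf or .vcf.gz.
indel_candidate_args() {
    args=""
    for f in "$@"; do
        case "$f" in
            *.vcf|*.vcf.gz) args="$args --indel-candidates $f" ;;
            *) echo "error: $f must be .vcf or .vcf.gz" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "${args# }"    # drop the leading space
}

echo "pbrun strelka --ref Ref/Homo_sapiens_assembly38.fasta --in-tumor-bam tumor.bam --in-normal-bam normal.bam $(indel_candidate_args candidates.vcf more_candidates.vcf.gz) --out-prefix output"
```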
STRELKA WORKFLOW
Strelka workflow to generate VCF from BAM/CRAM input files.

QUICK START
$ pbrun strelka_workflow --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output
COMPATIBLE CPU COMMAND
The commands below are the CPU counterpart of the Parabricks command above and generate exactly the same results. See the Output Comparison page for how to compare the results.
# MANTA_DIR, STRELKA_PATH, and MAX_NUM_PROCESSORS must be set beforehand.
# Step 1: run Manta, which also writes candidate small indels for Strelka
mkdir -p manta_work
python $MANTA_DIR/bin/configManta.py --referenceFasta Ref/Homo_sapiens_assembly38.fasta \
--normalBam normal.bam --tumorBam tumor.bam \
--runDir manta_work
cd manta_work
python ./runWorkflow.py -m local -j ${MAX_NUM_PROCESSORS}
cd ..
# Step 2: run Strelka somatic calling with Manta's candidate indels
mkdir -p strelka_work
python $STRELKA_PATH/configureStrelkaSomaticWorkflow.py \
--referenceFasta Ref/Homo_sapiens_assembly38.fasta \
--normalBam normal.bam --tumorBam tumor.bam \
--indelCandidates ${PWD}/manta_work/results/variants/candidateSmallIndels.vcf.gz \
--runDir strelka_work
cd strelka_work
python ./runWorkflow.py -m local -j ${MAX_NUM_PROCESSORS}
cd ..
OPTIONS
- --ref
(required) The reference genome in fasta format. We assume that the indexing required to run bwa has been completed by the user.
- --in-tumor-bam
(required) Path of bam file for tumor reads.
- --in-normal-bam
Path of bam file for normal reads.
- --out-prefix
Prefix filename for output data (default: None)
- --num-threads
Number of threads for worker (default: 1)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.