VARIANT CALLERS
NVIDIA Clara Parabricks Pipelines accelerated variant callers
GPU accelerated haplotypecaller.
This tool runs GPU-accelerated HaplotypeCaller. Users can optionally provide a BQSR report, which is applied to the input BAM in the same way as GATK ApplyBQSR. In that case the updated base qualities are used for variant calling.
QUICK START
$ pbrun haplotypecaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--in-recal-file recal_gpu.txt \
--out-variants result.vcf
COMPATIBLE GATK4 COMMAND
The commands below are the GATK4 counterpart of the Parabricks command above. They generate exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.
# Run ApplyBQSR Step
$ gatk ApplyBQSR --java-options -Xmx30g -R Ref/Homo_sapiens_assembly38.fasta \
-I=mark_dups_cpu.bam --bqsr-recal-file=recal_file.txt -O=cpu_nodups_BQSR.bam
# Run HaplotypeCaller Step
$ gatk HaplotypeCaller --java-options -Xmx30g --input cpu_nodups_BQSR.bam --output \
result_cpu.vcf --reference Ref/Homo_sapiens_assembly38.fasta \
--native-pair-hmm-threads 16
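The Output Comparison page is not reproduced in this section. As a rough first check, the variant records of the GPU and CPU outputs can be compared while ignoring the header lines, which normally differ between tools. The sketch below assumes that a byte-for-byte comparison of the non-header lines is sufficient; same_variants is a hypothetical helper, not a Parabricks or GATK command.

```shell
# Hypothetical helper: succeed (exit status 0) when two VCF files contain
# identical variant records. Header lines (starting with '#') are skipped
# because they record the tool name, version, and command line.
same_variants() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    # grep exits with 1 when a file has no variant records; ignore that.
    grep -v '^#' "$1" > "$tmp_a" || true
    grep -v '^#' "$2" > "$tmp_b" || true
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage, with the file names from the commands above:
#   same_variants result.vcf result_cpu.vcf && echo "variant records match"
```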
OPTIONS
- --ref
  (required) The reference genome in fasta format.
- --in-bam
  (required) Path to the input bam file.
- --out-variants
  (required) Path of the output .vcf, g.vcf, or gvcf file.
- --in-recal-file
  Path to the input BQSR report. Only required if the ApplyBQSR step is needed.
- --haplotypecaller-options
  Pass supported HaplotypeCaller options as one string. Currently supported original HaplotypeCaller options: -min-pruning, -standard-min-confidence-threshold-for-calling, -max-reads-per-alignment-start, -min-dangling-branch-length, and -pcr-indel-model.
- --static-quantized-quals
  Use static quantized quality scores to a given number of levels. Repeat this option multiple times for multiple bins.
- --ploidy
  Defaults to 2.
  Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.
- --interval-file
  Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
  (-L) Interval within which to call variants from the input reads. All intervals will have a padding of 100 to get read records and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times.
  e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000"
  (default: None)
- --interval-padding
  (-ip) Padding size (in base pairs) to add to each interval you are including (default: None)
- --gvcf
  Defaults to False.
  Generate variant calls in gvcf format. When using this option, the --out-variants file should end with g.vcf or g.vcf.gz. If the --out-variants file ends in gz, the tool will generate gvcf.gz and an index for it.
- --batch
  Given an input list of BAMs, run the variant calling of each BAM using one GPU, and process BAMs in parallel based on how many GPUs the system has.
- --disable-read-filter
  Disable the read filters for bam entries. Currently supported read filters that can be disabled are: MappingQualityAvailableReadFilter, MappingQualityReadFilter, and NotSecondaryAlignmentReadFilter. This option can be repeated multiple times.
- --max-alternate-alleles
  Maximum number of alternate alleles to genotype (default: None)
- --annotation-group
  (-G) Which groups of annotations to add to the output variant calls. Currently supported annotation groups: StandardAnnotation, StandardHCAnnotation, AS_StandardAnnotation (default: None)
- --gvcf-gq-bands
  (-GQB) Exclusive upper bounds for reference confidence GQ bands. Must be in the range [1, 100] and specified in increasing order (default: None)
- --tmp-dir
  Defaults to "." (the current directory).
  Full path to the directory where temporary files will be stored.
- --num-gpus
  Defaults to number of GPUs in the system.
  The number of GPUs to be used for this analysis task.
- --gpu-devices
  Which GPU devices to use for a run. By default, all GPU devices will be used. To set specific GPU devices, enter a comma-separated list of GPU device numbers.
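As an illustration of how several of these options combine, the sketch below prepares a gVCF run restricted to two intervals on two GPUs. The output name sample.g.vcf.gz and the helper is_gvcf_name are assumptions made for this example. The helper only checks the documented naming convention for --gvcf output, using the slightly stricter suffixes .g.vcf and .g.vcf.gz. The command is printed rather than executed so the sketch also works on a machine without pbrun.

```shell
# Hypothetical helper (not part of Parabricks): succeed when the output
# name ends in .g.vcf or .g.vcf.gz, as expected when --gvcf is used.
is_gvcf_name() {
    case "$1" in
        *.g.vcf|*.g.vcf.gz) return 0 ;;
        *) return 1 ;;
    esac
}

out="sample.g.vcf.gz"   # assumed output name

if is_gvcf_name "$out"; then
    # Remove the leading 'echo' to actually launch the run.
    echo pbrun haplotypecaller \
        --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-bam mark_dups_gpu.bam \
        --in-recal-file recal_gpu.txt \
        --gvcf \
        -L chr1 -L chr4:10000-20000 \
        --num-gpus 2 \
        --out-variants "$out"
else
    echo "error: with --gvcf, --out-variants should end in g.vcf or g.vcf.gz" >&2
fi
```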
GPU accelerated mutect2.
mutectcaller supports tumor-only or tumor-normal variant calling. The figure below shows the high-level functionality of mutectcaller. All dotted boxes are optional, with some constraints.
QUICK START
$ pbrun mutectcaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--tumor-name foobar \
--out-vcf output.vcf
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above. It generates exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.
$ gatk Mutect2 -R Ref/Homo_sapiens_assembly38.fasta --input tumor.bam --tumor-sample foobar --output result.vcf
OPTIONS
- --ref
  (required) The reference genome in fasta format. We assume that the indexing required to run bwa has been completed by the user.
- --in-tumor-bam
  (required) Path of bam file for tumor reads.
- --tumor-name
  (required) Name of sample for tumor reads.
- --out-vcf
  (required) Path to the VCF output file.
- --in-tumor-recal-file
  Path of BQSR report for tumor sample.
- --in-normal-bam
  Path of bam file for normal reads.
- --in-normal-recal-file
  Path of BQSR report for normal sample.
- --normal-name
  Name of sample for normal reads.
- --ploidy
  Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.
- --interval-file
  Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
  (-L) Interval within which to call variants from the input reads. All intervals will have a padding of 100 to get read records and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times.
  e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000"
  (default: None)
- --interval-padding
  (-ip) Padding size (in base pairs) to add to each interval you are including (default: None)
- --tmp-dir
  Defaults to "." (the current directory).
  Full path to the directory where temporary files will be stored.
- --num-gpus
  Defaults to number of GPUs in the system.
  The number of GPUs to be used for this analysis task.
- --gpu-devices
  Which GPU devices to use for a run. By default, all GPU devices will be used. To set specific GPU devices, enter a comma-separated list of GPU device numbers.
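The Quick Start shows tumor-only calling. For tumor-normal calling, the normal sample options listed above are added. The sketch below builds both forms of the command from one hypothetical helper, mutect_cmd. It assumes that --in-normal-bam and --normal-name are always supplied together, which this section does not state explicitly; normal.bam and foobar_normal are placeholder names. The helper prints the command instead of running it.

```shell
# Hypothetical helper: print a pbrun mutectcaller command line.
# Arguments: tumor BAM, tumor sample name, then optionally the
# normal BAM and normal sample name for tumor-normal calling.
mutect_cmd() {
    tumor_bam=$1
    tumor_name=$2
    normal_bam=${3:-}
    normal_name=${4:-}
    # Reuse the positional parameters as an argument list.
    set -- pbrun mutectcaller \
        --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-tumor-bam "$tumor_bam" \
        --tumor-name "$tumor_name"
    # Assumption: the normal BAM and normal name are given as a pair.
    if [ -n "$normal_bam" ] && [ -n "$normal_name" ]; then
        set -- "$@" --in-normal-bam "$normal_bam" --normal-name "$normal_name"
    fi
    set -- "$@" --out-vcf output.vcf
    echo "$*"
}

mutect_cmd tumor.bam foobar                            # tumor-only
mutect_cmd tumor.bam foobar normal.bam foobar_normal   # tumor-normal
```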
Run GPU-accelerated deepvariant algorithm.
Parabricks has accelerated Google DeepVariant to make extensive use of GPUs and finish a 30x WGS analysis in 25 minutes. The Parabricks flavor of DeepVariant works like other command-line tools that users are familiar with: it takes the BAM and reference as inputs and produces variants as outputs. Future versions will allow users to choose the exact model to use.
QUICK START
$ pbrun deepvariant --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--out-variants output.vcf
COMPATIBLE GOOGLE DEEPVARIANT COMMANDS
The commands below are the Google DeepVariant counterpart of the Parabricks command above. They generate exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.
# Run make_examples in parallel
seq 0 $((N_SHARDS-1)) | \
parallel --eta --halt 2 --joblog "${LOGDIR}/log" --res "${LOGDIR}" \
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "${REF}" \
--reads "${BAM}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--regions '"chr20:10,000,000-10,010,000"' \
--task {}
# Run call_variants
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--checkpoint "${MODEL}"
# Run postprocess_variants
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "${REF}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
OPTIONS
- --ref
  (required) The reference genome in fasta format.
- --in-bam
  (required) Path to the input BAM file.
- --out-variants
  (required) Name of output vcf file.
- --pb-model-file
  Path of a non-default parabricks model file for deepvariant.
- --interval-file
  Path to an interval file for BQSR step with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --interval
  (-L) Interval within which to call variants from the input reads. All intervals will have a padding of 100 to get read records and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times.
  e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000"
  (default: None)
- --interval-padding
  (-ip) Padding size (in base pairs) to add to each interval you are including (default: None)
- --disable-use-window-selector-model
  Change the window selector model from Allele Count Linear to Variant Reads. This option will increase the accuracy and runtime (default: None)
- --gvcf
  Generate variant calls in gvcf format.
- --tmp-dir
  Defaults to "." (the current directory).
  Full path to the directory where temporary files will be stored.
- --num-gpus
  Defaults to number of GPUs in the system.
  The number of GPUs to be used for this analysis task.
- --gpu-devices
  Which GPU devices to use for a run. By default, all GPU devices will be used. To set specific GPU devices, enter a comma-separated list of GPU device numbers.
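The Google DeepVariant commands above restrict calling to a region written with thousands separators (chr20:10,000,000-10,010,000). A comparable Parabricks run would pass that region through -L. The sketch below strips the commas first, on the assumption that plain digits are the safest form for --interval; this section does not say whether commas are accepted. Because each interval is padded by 100, records near the region boundaries may not line up exactly with the Google output. The command is printed rather than executed.

```shell
# Region exactly as written in the Google DeepVariant example above.
region='chr20:10,000,000-10,010,000'

# Remove the thousands separators to get a plain chr:start-end interval.
interval=$(printf '%s' "$region" | tr -d ',')

# Remove the leading 'echo' to actually launch the run.
echo pbrun deepvariant \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    -L "$interval" \
    --out-variants output.vcf
```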
CPU accelerated Copy number variant calling.
Run CNVkit with accelerated coverage calculation from read depths. CNVkit is not available as part of the free for Covid19 program.
QUICK START
$ pbrun cnvkit --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--out-file output.vcf
OPTIONS
- --ref
  (required) Path to the reference file.
- --in-bam
  (required) Path to the bam file.
- --out-file
  Path to the output vcf file.
- --cnvkit-options
  Pass supported cnvkit options as one string.
  e.g. --cnvkit-options="--count-reads --drop-low-coverage"
- --generate-vcf
  Export the output cns to vcf after running batch (default: None)
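The requirement to pass --cnvkit-options "as one string" depends on shell quoting. The sketch below uses a hypothetical helper, count_args, to show that the quoted form reaches the program as a single argument while the unquoted form is split on whitespace. It then prints a command that uses the quoted form.

```shell
# Hypothetical helper: report how many arguments the shell passes on.
count_args() {
    echo "$#"
}

cnvkit_opts="--count-reads --drop-low-coverage"

# Quoted: the whole option string stays attached as one argument.
count_args --cnvkit-options="$cnvkit_opts"
# Unquoted: the shell splits the value on whitespace into two arguments.
count_args --cnvkit-options=$cnvkit_opts

# Remove the leading 'echo' to actually launch the run.
echo pbrun cnvkit \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --out-file output.vcf \
    --cnvkit-options="$cnvkit_opts"
```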