VARIANT CALLERS - NVIDIA Docs

NVIDIA Clara Parabricks Pipelines accelerated variant callers

BCFTOOLS CALL

Accelerated bcftools call.

bcftools-call calls variants from mpileup output

QUICK START

            $ pbrun bcftoolscall --in-file pileup.bcf \
--out-file output.vcf

The command below is the CPU counterpart of the Parabricks command above. The two commands produce identical results. See the Output Comparison page for how to compare them.

            bcftools call pileup.bcf -c -o output.vcf
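
The Output Comparison page remains the reference for how results should be compared. As a rough first check, and assuming both VCFs are uncompressed, the sketch below compares only the column header and the variant records, skipping the "##" meta-information lines (these typically record command lines and tool versions, so they differ between runs). The function name compare_vcf_records is hypothetical and is not part of Parabricks or bcftools.

```shell
#!/bin/sh
# compare_vcf_records A.vcf B.vcf
# Hypothetical helper: returns 0 when two uncompressed VCFs have identical
# "#CHROM" header lines and variant records, ignoring "##" meta-information lines.
compare_vcf_records() {
    a_body=$(mktemp) || return 2
    b_body=$(mktemp) || { rm -f "$a_body"; return 2; }
    # Drop "##" meta lines before comparing the remaining content.
    grep -v '^##' "$1" > "$a_body"
    grep -v '^##' "$2" > "$b_body"
    diff "$a_body" "$b_body"
    status=$?
    rm -f "$a_body" "$b_body"
    return "$status"
}
```

Since both commands above write to output.vcf, the CPU run would need a different output name first; cpu_output.vcf is a hypothetical choice, after which `compare_vcf_records output.vcf cpu_output.vcf && echo identical` reports a match.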

OPTIONS

--in-file
--out-file
--num-threads
--variant-sites

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

HAPLOTYPECALLER

GPU accelerated haplotypecaller.

This tool runs GPU-accelerated HaplotypeCaller. You can optionally provide a BQSR report, which is applied to the BAM in the same way as ApplyBQSR; the recalibrated base qualities are then used for variant calling.

QUICK START

            $ pbrun haplotypecaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--in-recal-file recal_gpu.txt \
--out-variants result.vcf
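
Because the BQSR report is optional, a wrapper script may want to pass --in-recal-file only when a report is supplied. The sketch below is one way to do that. The function name run_haplotypecaller and the PBRUN override (set PBRUN=echo to print the arguments instead of running the tool) are assumptions made for illustration, not Parabricks features.

```shell
#!/bin/sh
# run_haplotypecaller REF IN_BAM OUT_VCF [RECAL_FILE]
# Hypothetical wrapper: adds --in-recal-file only when a BQSR report is given.
run_haplotypecaller() {
    ref=$1; in_bam=$2; out_vcf=$3; recal=${4:-}
    set -- haplotypecaller --ref "$ref" --in-bam "$in_bam"
    if [ -n "$recal" ]; then
        # Fail early rather than silently calling without recalibration.
        [ -f "$recal" ] || { echo "BQSR report not found: $recal" >&2; return 1; }
        set -- "$@" --in-recal-file "$recal"
    fi
    # PBRUN defaults to pbrun; PBRUN=echo gives a dry run.
    "${PBRUN:-pbrun}" "$@" --out-variants "$out_vcf"
}
```

For example, `run_haplotypecaller Ref/Homo_sapiens_assembly38.fasta mark_dups_gpu.bam result.vcf recal_gpu.txt` corresponds to the quick start command, and omitting the last argument runs without recalibration.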

COMPATIBLE GATK4 COMMAND

The commands below are the GATK4 counterpart of the Parabricks command above. They produce identical results. See the Output Comparison page for how to compare them.

            # Run ApplyBQSR Step
$ gatk ApplyBQSR --java-options -Xmx30g -R Ref/Homo_sapiens_assembly38.fasta \
-I=mark_dups_cpu.bam --bqsr-recal-file=recal_file.txt -O=cpu_nodups_BQSR.bam

#Run Haplotype Caller
$ gatk HaplotypeCaller --java-options -Xmx30g --input cpu_nodups_BQSR.bam --output \
result_cpu.vcf --reference Ref/Homo_sapiens_assembly38.fasta \
--native-pair-hmm-threads 16

OPTIONS

--ref
--in-bam
--out-variants
--in-recal-file
--haplotypecaller-options
--static-quantized-quals
--ploidy
--interval-file
--interval
--interval-padding
--gvcf
--batch
--disable-read-filter
--max-alternate-alleles
--annotation-group
--gvcf-gq-bands
--dont-use-soft-clipped-bases
--rna

--num-gpus NUM_GPUS
--gpu-devices GPU_DEVICES

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

MUTECTCALLER

GPU accelerated mutect2.

mutectcaller supports tumor-only or tumor-normal variant calling. The figure below shows the high-level functionality of mutectcaller. All dotted boxes indicate optional data, with some constraints.

QUICK START

            $ pbrun mutectcaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--tumor-name foobar \
--out-vcf output.vcf
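
The quick start above is a tumor-only run. For tumor-normal calling, the --in-normal-bam and --normal-name options listed under OPTIONS come into play. The sketch below assumes a normal.bam whose sample is named normal_sample (both hypothetical) and wraps the call in a hypothetical function; setting PBRUN=echo prints the arguments instead of running the tool.

```shell
#!/bin/sh
# run_mutect_tumor_normal TUMOR_BAM TUMOR_NAME NORMAL_BAM NORMAL_NAME OUT_VCF
# Hypothetical wrapper around a tumor-normal mutectcaller invocation.
run_mutect_tumor_normal() {
    "${PBRUN:-pbrun}" mutectcaller \
        --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-tumor-bam "$1" \
        --tumor-name "$2" \
        --in-normal-bam "$3" \
        --normal-name "$4" \
        --out-vcf "$5"
}
```

A call might look like `run_mutect_tumor_normal tumor.bam foobar normal.bam normal_sample output.vcf`.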

COMPATIBLE GATK4 COMMAND

The command below is the GATK4 counterpart of the Parabricks command above. The two commands produce identical results. See the Output Comparison page for how to compare them.

            gatk Mutect2 -R Ref/Homo_sapiens_assembly38.fasta --input tumor.bam --tumor-sample foobar --output result.vcf

OPTIONS

--ref
--in-tumor-bam
--tumor-name
--out-vcf
--in-tumor-recal-file
--in-normal-bam
--in-normal-recal-file
--normal-name
--ploidy
--interval-file
--interval
--interval-padding
--mutectcaller-options

--num-gpus NUM_GPUS
--gpu-devices GPU_DEVICES

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

SOMATICSNIPER

Accelerated Somatic Sniper.

Somatic Sniper supports tumor-normal variant calling. Parabricks provides Somatic Sniper as a standalone tool; alternatively, the Somatic Sniper workflow (somaticsniper_workflow) generates a VCF file from BAM/CRAM input.

QUICK START

            $ pbrun somaticsniper --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-file output.vcf

COMPATIBLE CPU COMMAND

The command below is the CPU counterpart of the Parabricks command above. The two commands produce identical results. See the Output Comparison page for how to compare them.

            bam-somaticsniper -q 1 -G -L -F vcf -f  Ref/Homo_sapiens_assembly38.fasta  tumor.bam normal.bam output.vcf

OPTIONS

--ref
--in-tumor-bam
--in-normal-bam
--out-file
--num-threads
--min-mapq
--out-format
--correct
--no-gain
--no-loh

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

SOMATICSNIPER WORKFLOW

Somatic sniper workflow to generate VCF from BAM/CRAM input files.

QUICK START

            $ pbrun somaticsniper_workflow --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output

COMPATIBLE CPU COMMAND

The commands below are the CPU counterpart of the Parabricks command above. They produce identical results. See the Output Comparison page for how to compare them.

            bam-somaticsniper -q 1 -G -L -F vcf -f  Ref/Homo_sapiens_assembly38.fasta  tumor.bam normal.bam output.vcf

bcftools mpileup -A -B -d 2147483647 -Ou -f Ref/Homo_sapiens_assembly38.fasta tumor.bam | bcftools call -c | vcfutils.pl varFilter -Q 20 | awk 'NR > 55 {print}' > output.indel_pileup_Tum.pileup

perl snpfilter.pl --snp-file output.vcf --indel-file output.indel_pileup_Tum.pileup

perl prepare_for_readcount.pl --snp-file output.vcf.SNPfilter

bam-readcount -b 15 -f  Ref/Homo_sapiens_assembly38.fasta -l output.vcf.SNPfilter.pos tumor.bam > output.readcounts.rc

perl fpfilter.pl -snp-file output.vcf.SNPfilter -readcount-file output.readcounts.rc

perl highconfidence.pl -snp-file output.vcf.SNPfilter.fp_pass.vcf
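
The `awk 'NR > 55 {print}'` step in the pileup command above drops the first 55 lines of its input, presumably the VCF header produced by that particular toolchain. The sketch below contrasts this with a pattern-based alternative that drops every line beginning with "#" regardless of header length. The alternative is an assumption, not the documented command, and it will not give the same output as the fixed count if the header is not exactly 55 lines. Both function names are hypothetical.

```shell
#!/bin/sh
# skip_fixed_header [N]
# Mirrors the documented step: skip a fixed number of leading lines (default 55).
skip_fixed_header() {
    awk -v n="${1:-55}" 'NR > n {print}'
}

# skip_hash_header
# Alternative (an assumption, not the documented step): skip lines starting with "#".
skip_hash_header() {
    grep -v '^#'
}
```

Either function reads from standard input, so it can stand in for the awk stage of the pipeline, for example `... | vcfutils.pl varFilter -Q 20 | skip_fixed_header 55 > output.indel_pileup_Tum.pileup`.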

OPTIONS

--ref
--in-tumor-bam
--in-normal-bam
--out-prefix
--num-threads
--min-mapq

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

DEEPVARIANT

Run GPU-accelerated deepvariant algorithm.

Parabricks has accelerated Google DeepVariant to make extensive use of GPUs, finishing a 30x WGS analysis in 25 minutes instead of hours. The Parabricks flavor of DeepVariant behaves like other familiar command-line tools: it takes a BAM file and a reference as inputs and produces variants as output. Currently, DeepVariant is supported on T4, V100, and A100 GPUs.

QUICK START

            $ pbrun deepvariant --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--out-variants output.vcf

COMPATIBLE GOOGLE DEEPVARIANT COMMANDS

The commands below are the Google DeepVariant counterpart of the Parabricks command above. They produce identical results. See the Output Comparison page for how to compare them.

            # Run make_examples in parallel
seq 0 $((N_SHARDS-1)) | \
parallel --eta --halt 2 --joblog "${LOGDIR}/log" --res "${LOGDIR}" \
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "${REF}" \
--reads "${BAM}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--task {}

# Run call_variants
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--checkpoint "${MODEL}"

# Run postprocess_variants
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "${REF}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
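
The Google commands above rely on shell variables that are not defined on this page. The sketch below shows one way to set them before running the three steps. Every value is a placeholder assumption: the directory layout and file names are hypothetical, and BIN_VERSION and MODEL in particular must be set to the DeepVariant release and model checkpoint you are comparing against.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; adjust them to your environment.

# One make_examples shard per online CPU core (falls back to 1 if getconf is unavailable).
N_SHARDS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# DeepVariant docker image tag and model checkpoint path: no defaults are assumed here.
BIN_VERSION="${BIN_VERSION:-CHANGE_ME}"
MODEL="${MODEL:-CHANGE_ME}"

# Inputs; kept under $HOME because the docker commands mount ${HOME}:${HOME}.
REF="${HOME}/Ref/Homo_sapiens_assembly38.fasta"
BAM="${HOME}/mark_dups_gpu.bam"

# Outputs and logs.
OUTPUT_DIR="${HOME}/deepvariant_output"
LOGDIR="${OUTPUT_DIR}/logs"
CALL_VARIANTS_OUTPUT="${OUTPUT_DIR}/call_variants_output.tfrecord.gz"
FINAL_OUTPUT_VCF="${OUTPUT_DIR}/output_cpu.vcf"

mkdir -p "${OUTPUT_DIR}" "${LOGDIR}"
```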

OPTIONS

--ref
--in-bam
--out-variants
--pb-model-file
--mode
--proposed-variants
--interval-file
--interval
--disable-use-window-selector-model
--gvcf
--norealign-reads
--sort-by-haplotypes
--keep-duplicates
--vsc-min-count-snps
--vsc-min-count-indels
--vsc-min-fraction-snps
--vsc-min-fraction-indels
--min-mapping-quality
--min-base-quality
--alt-aligned-pileup
--variant-caller

--num-gpus NUM_GPUS
--gpu-devices GPU_DEVICES

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

CNVKIT

CPU-accelerated copy number variant calling. You need to pass --extra-tools to the installer to use this tool.

Run CNVkit with accelerated coverage calculation from read depths.

QUICK START

            $ pbrun cnvkit --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--out-file output.vcf

OPTIONS

--ref
--in-bam
--output-dir
--cnvkit-options
--generate-vcf

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

MANTA

Structural variant (SV) and indel caller from mapped paired-end sequencing reads. This tool is not accelerated; the original precompiled binary runs on the server.

QUICK START

            $ pbrun manta --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output

OPTIONS

--ref
--in-tumor-bam
--in-normal-bam
--bed
--out-prefix
--num-threads
--manta-options

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

STRELKA

SNP and indel caller from mapped paired-end sequencing reads. This tool is not accelerated; the original precompiled binary runs on the server.

QUICK START

            $ pbrun strelka --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--indel-candidates candidates.vcf \
--out-prefix output

OPTIONS

--ref
--in-tumor-bam
--in-normal-bam
--indel-candidates
--bed
--out-prefix
--num-threads
--strelka-options

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version

STRELKA WORKFLOW

Strelka workflow to generate VCF from BAM/CRAM input files.

QUICK START

            $ pbrun strelka_workflow --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-bam tumor.bam \
--in-normal-bam normal.bam \
--out-prefix output

COMPATIBLE CPU COMMAND

The commands below are the CPU counterpart (Manta followed by Strelka) of the Parabricks command above. They produce identical results. See the Output Comparison page for how to compare them.

            mkdir -p manta_work

python $MANTA_DIR/bin/configManta.py --referenceFasta Ref/Homo_sapiens_assembly38.fasta \
    --normalBam normal.bam --tumorBam tumor.bam \
    --runDir manta_work

cd manta_work

python ./runWorkflow.py -m local -j ${MAX_NUM_PROCESSORS}

cd ..

mkdir -p strelka_work

python $STRELKA_PATH/configureStrelkaSomaticWorkflow.py \
    --referenceFasta Ref/Homo_sapiens_assembly38.fasta \
    --normalBam normal.bam --tumorBam tumor.bam \
    --indelCandidates ${WORK_PATH}/manta_work/results/variants/candidateSmallIndels.vcf.gz \
    --runDir strelka_work

cd strelka_work

python ./runWorkflow.py -m local -j ${MAX_NUM_PROCESSORS}

cd ..
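
The Strelka configuration step above consumes the candidateSmallIndels.vcf.gz file produced by the Manta run, so it can help to confirm that the file exists before configuring Strelka. The helper below is a hypothetical sketch; require_file is not part of Manta or Strelka, and WORK_PATH is assumed to be the directory in which manta_work was created.

```shell
#!/bin/sh
# require_file PATH
# Hypothetical helper: succeed only if PATH is an existing, non-empty regular file.
require_file() {
    if [ ! -f "$1" ] || [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        return 1
    fi
}
```

Between the two workflows, one might call `require_file "${WORK_PATH}/manta_work/results/variants/candidateSmallIndels.vcf.gz" || exit 1` so that a failed Manta run stops the script before Strelka is configured.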

OPTIONS

--ref
--in-tumor-bam
--in-normal-bam
--out-prefix
--num-threads

--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--version