VARIANT PROCESSING

ACCELERATED VARIANT MANIPULATION METHODS

VCFQC

Generate QC charts from the input VCF. You need to pass “--extra-tools” to the installer to use this tool.

QUICK START

$ pbrun vcfqc --in-vcf sample.vcf \
              --output-dir sample_charts

OPTIONS

--in-vcf

(required) Path to the vcf file (default: None)

--output-dir

(required) Output Directory to store all images (default: None)

--in-bamqc-dir

Input directory containing BAMQC images. These should be the output of collectmultiplemetrics (default: None)

--caller

The variant caller tag for the input VCF (default: None)

--quality

Name of the quality field in the VCF to plot. Generally it is QUAL (default: None)

--mapq

Name of the mapping quality field in the VCF to plot. Generally it is MQ (default: None)

--depth

Name of the depth field in the VCF to plot. Generally it is DP (default: None)

--allele-depth

Name of the allele depth field in the VCF to plot. Generally it is AD (default: None)

--vaf

Name of the variant allele frequency field in the VCF to plot. Generally it is VAF (default: None)

--window-size

Window size for the vcfqc tool (default: 1000)

--image-format

Image format for saved plots [png] (default: png)

--threads

Number of processing threads for VCF reading (default: 4)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
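
As a rough illustration of how the vcfqc field options combine, the Python sketch below assembles a command line as an argument list. The helper name build_vcfqc_args and the example paths are hypothetical. Only flags from the option list above are used, and the field names follow the "generally it is" hints (QUAL, DP, and so on), which may differ in your VCF.

```python
from typing import Dict, List, Optional

# Flags from the option list above that name a VCF field to plot.
FIELD_FLAGS = ("quality", "mapq", "depth", "allele-depth", "vaf")


def build_vcfqc_args(in_vcf: str, output_dir: str,
                     fields: Optional[Dict[str, str]] = None,
                     image_format: str = "png") -> List[str]:
    """Return an argv list for `pbrun vcfqc` (hypothetical helper)."""
    args = ["pbrun", "vcfqc", "--in-vcf", in_vcf, "--output-dir", output_dir]
    for flag, field_name in (fields or {}).items():
        if flag not in FIELD_FLAGS:
            raise ValueError(f"unknown field option: --{flag}")
        args += [f"--{flag}", field_name]
    args += ["--image-format", image_format]
    return args


if __name__ == "__main__":
    # Plot the QUAL and DP fields; the other field plots are not requested.
    argv = build_vcfqc_args("sample.vcf", "sample_charts",
                            {"quality": "QUAL", "depth": "DP"})
    print(" ".join(argv))
```

The resulting list can be handed to subprocess.run on a machine where pbrun is installed. Building a list rather than a single string avoids shell quoting problems in paths.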

VCFQCBYBAM

Generate a summary file using samtools mpileup that can be used for plotting/report generation. You need to pass “--extra-tools” to the installer to use this tool.

QUICK START

$ pbrun vcfqcbybam --in-vcf sample.vcf \
                   --in-bam sample.bam \
                   --out-file output_pileup.txt \
                   --output-dir sample_qc

OPTIONS

--ref

Path to the reference file (default: None)

--in-vcf

(required) Path to the vcf file to be QC’ed (default: None)

--in-bam

(required) Path to the bam file. Path can be a Google Cloud Storage object or AWS S3 Storage object. This option can be used multiple times (default: None)

--out-file

(required) Path of output text pileup (default: None)

--output-dir

(required) Path to the directory that will contain all of the generated files (default: None)

--interval-file

Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)

--num-threads

Number of threads for worker (default: 12)

--min-mapq MIN_MAPQ

Skip alignments with mapping quality smaller than this value (default: 0)

--enable-baq

Enable BAQ (per-Base Alignment Quality) (default: None)

--interval (-L)

Interval within which to call the variants from the bam file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. “-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000” (default: None)

--anomalous-reads

Do not discard anomalous read pairs (default: None)

--window-size

Size of output plot window (default: 1000)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
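
The four -L forms shown for vcfqcbybam (chr1, chr2:10000, chr3:20000+, chr4:10000-20000) can be read as in the Python sketch below. This is an interpretation, not the tool's parser: it assumes GATK-style semantics, where a bare position means a single base and a trailing "+" means "to the end of the contig". The names Interval and parse_interval are hypothetical, and contig names that themselves contain a colon are not handled.

```python
import re
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    contig: str
    start: Optional[int]  # None: from the start of the contig
    end: Optional[int]    # None: to the end of the contig


# contig, optionally followed by :start, :start+ or :start-end
_PATTERN = re.compile(
    r"^(?P<contig>[^:]+)(?::(?P<start>\d+)(?:(?P<plus>\+)|-(?P<end>\d+))?)?$"
)


def parse_interval(text: str) -> Interval:
    """Parse one -L value (assumed GATK-style semantics)."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    contig = match["contig"]
    if match["start"] is None:
        return Interval(contig, None, None)      # whole contig
    start = int(match["start"])
    if match["plus"]:
        return Interval(contig, start, None)     # start to end of contig
    if match["end"] is not None:
        end = int(match["end"])
        if end < start:
            raise ValueError(f"end before start in {text!r}")
        return Interval(contig, start, end)      # closed range
    return Interval(contig, start, start)        # assumption: single position


if __name__ == "__main__":
    for value in ("chr1", "chr2:10000", "chr3:20000+", "chr4:10000-20000"):
        print(value, "->", parse_interval(value))
```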

VBVM

Run vote-based VCF merging on two or more VCF files to generate a merged VCF.

QUICK START

$ pbrun vbvm  --in-vcf deepvariant:deepvariant.vcf(.gz) \
              --in-vcf haplotypecaller:haplotypecaller.vcf(.gz) \
              --min-votes 2 \
              --out-vcf merged_2votes.vcf

OPTIONS

--in-vcf

(required) A tag and VCF in the format <tag>:<vcf-file> where tag can be the name of the variant caller. The VCF file must be an absolute path. This option can be used multiple times, but at least two input VCFs are required (default: None)

--out-vcf

(required) Path for output vcf file (default: None)

--min-votes

Minimum number of votes to consider for filtering the VCF (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
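
The voting step behind --min-votes can be pictured with the Python sketch below. It is not the vbvm implementation: it assumes that a vote is one tagged input VCF reporting a variant with the same CHROM, POS, REF and ALT, which the description above does not specify. The function name merge_by_votes and the in-memory tuples are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def merge_by_votes(calls: Dict[str, Iterable[Variant]],
                   min_votes: int) -> Dict[Variant, List[str]]:
    """Keep variants reported by at least `min_votes` tagged inputs.

    `calls` maps a caller tag (the <tag> in <tag>:<vcf-file>) to the
    variants found in that caller's VCF.
    """
    if len(calls) < 2:
        raise ValueError("at least two input VCFs are required")
    voters: Dict[Variant, Set[str]] = defaultdict(set)
    for tag, variants in calls.items():
        for variant in variants:
            # A set ensures each caller votes at most once per variant.
            voters[variant].add(tag)
    return {
        variant: sorted(voters[variant])
        for variant in sorted(voters)
        if len(voters[variant]) >= min_votes
    }


if __name__ == "__main__":
    calls = {
        "deepvariant": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
        "haplotypecaller": [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")],
    }
    for variant, tags in merge_by_votes(calls, min_votes=2).items():
        print(variant, tags)
```

With two inputs, a threshold of 2 behaves like an intersection and a threshold of 1 like a union.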

DBSNP

Annotate variants based on a variant database.

QUICK START

$ pbrun dbsnp --in-vcf sample.vcf \
              --out-vcf output.vcf \
              --in-dbsnp-file database.vcf.gz

OPTIONS

--in-vcf

Path to the input VCF file (default: None)

--in-dbsnp-file

Path to the input dbsnp file in vcf.gz format with its tabix index (default: None)

--out-vcf

Output annotated VCF file (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.

CNNSCOREVARIANTS

GPU-accelerated CNNScoreVariants

Generate variant scores using a Convolutional Neural Network.

QUICK START

$ pbrun cnnscorevariants --ref Ref.fa \
                         --in-bam sample.bam \
                         --in-vcf sample.vcf \
                         --out-vcf output.vcf

COMPATIBLE GATK4 COMMAND

gatk CNNScoreVariants -R Ref.fa \
                      -I sample.bam \
                      -V sample.vcf \
                      -O output.vcf \
                      --tensor-type read_tensor

POST-ANALYSIS FILTERING

CNNScoreVariants generates an info field for each variant called CNN_2D. This field can be used to create filters for each variant by running the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
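
To inspect the CNN_2D score before running FilterVariantTranches, the value can be read from the INFO column of each record. The Python sketch below shows one way to do that for a single tab-separated VCF data line. The function names are hypothetical, and the sketch only extracts the score. It does not reproduce how FilterVariantTranches derives its thresholds.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to '')."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else ""
    return fields


def cnn_2d_score(vcf_line: str) -> Optional[float]:
    """Return the CNN_2D value of one VCF data line, or None if absent."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    raw = parse_info(columns[7]).get("CNN_2D")
    if raw in (None, "", "."):
        return None
    return float(raw)


if __name__ == "__main__":
    # Illustrative record, not real data.
    line = "chr1\t100\t.\tA\tG\t50\t.\tDP=30;CNN_2D=-1.25"
    print(cnn_2d_score(line))
```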

OPTIONS

--ref

(required) Path to the reference file.

--in-bam

(required) Path to the input BAM/CRAM file.

--in-vcf

(required) Path to the input VCF file.

--out-vcf

(required) Path to the output VCF file.

--pb-model-file

Path of a non-default parabricks model file for cnnscorevariants.

--num-gpus NUM_GPUS

Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using Flexera, please include --gpu-devices too.

--gpu-devices GPU_DEVICES

Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.

VARIANTFILTRATION

Accelerated variant filtration based on conditions

Filter a VCF using a boolean expression.

QUICK START

$ pbrun variantfiltration --in-vcf sample.vcf \
                          --out-file output.vcf \
                          --expression "QD < 2.0 || ReadPosRankSum < -20.0" \
                          --filter-name FILTER

COMPATIBLE GATK4 COMMAND

gatk VariantFiltration -V sample.vcf \
                       -O output.vcf \
                       --filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
                       --filter-name FILTER

OPTIONS

--in-vcf

(required) Path to the input VCF file.

--out-file

(required) Path to the output variants file with an extension of either ‘.vcf’ or ‘.csv’.

--expression

(required) Boolean expression for filtering variants.

--filter-name

(required) Name written to the FILTER field of variants that match the filter expression.

--mode

Defaults to BOTH.

Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
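
The effect of the quick-start expression "QD < 2.0 || ReadPosRankSum < -20.0" can be sketched in Python as below. The sketch assumes the semantics of the compatible GATK4 VariantFiltration command: a record that matches the expression receives the filter name in its FILTER column, and all other records are marked PASS. Treating a missing annotation as "does not match" is also an assumption, and the function names are hypothetical.

```python
from typing import Callable, Dict

Info = Dict[str, float]


def below(info: Info, key: str, threshold: float) -> bool:
    """True if the annotation is present and below the threshold."""
    value = info.get(key)
    return value is not None and value < threshold


def matches_example_expression(info: Info) -> bool:
    """Python equivalent of: QD < 2.0 || ReadPosRankSum < -20.0"""
    return below(info, "QD", 2.0) or below(info, "ReadPosRankSum", -20.0)


def filter_value(info: Info,
                 expression: Callable[[Info], bool],
                 filter_name: str = "FILTER") -> str:
    """Return the FILTER column value under the assumed semantics."""
    return filter_name if expression(info) else "PASS"


if __name__ == "__main__":
    records = [
        {"QD": 1.5, "ReadPosRankSum": 0.3},     # low QD: matches
        {"QD": 12.0, "ReadPosRankSum": -25.0},  # low rank sum: matches
        {"QD": 12.0, "ReadPosRankSum": 0.3},    # neither: passes
    ]
    for info in records:
        print(info, "->", filter_value(info, matches_example_expression))
```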

FREQUENCYFILTRATION

Filter variants within a VCF file by numeric fields containing frequency/count information.

QUICK START

$ pbrun frequencyfiltration --in-vcf input.vcf \
                            --out-vcf output.vcf \
                            --or-expression "gnomad_AF <= 0.02" \
                            --or-expression "dbsnpCOMMON != 1"

OPTIONS

--in-vcf

(required) Path to the input VCF file to filter (default: None)

--out-vcf

(required) Path to the output filtered VCF file (default: None)

--excluded-vcf

A path to write variants which fail filtration (default: None)

--and-expression

A string of the form “VARIABLE OPERATOR THRESHOLD” to use for filtering (e.g. “AF < 0.02”). A variant must pass all AND expressions to pass filtering (default: None)

--or-expression

A string of the form “VARIABLE OPERATOR THRESHOLD” to use for filtering (e.g. “AF < 0.02”). A variant need only pass a single OR expression to pass filtering (default: None)

--drop-missing

Drop variants that are missing any fields used in filtering expressions from output (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
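
The interplay of --and-expression, --or-expression and --drop-missing can be sketched in Python as below. Several points are assumptions, because the option text does not settle them: the set of supported operators, the rule for combining the two groups (here: all AND expressions must pass, and at least one OR expression if any are given), and the treatment of a missing field when --drop-missing is not set (here: the expression is treated as passing). The function names are hypothetical.

```python
import operator
import re
from typing import Callable, Dict, Sequence, Tuple

Compare = Callable[[float, float], bool]
Expression = Tuple[str, Compare, float]

# Assumed operator set; the examples above only show <, <= and !=.
OPERATORS: Dict[str, Compare] = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

_EXPRESSION = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(<=|>=|==|!=|<|>)\s*(\S+)\s*$")


def parse_expression(text: str) -> Expression:
    """Parse 'VARIABLE OPERATOR THRESHOLD', e.g. 'AF < 0.02'."""
    match = _EXPRESSION.match(text)
    if match is None:
        raise ValueError(f"expected 'VARIABLE OPERATOR THRESHOLD', got {text!r}")
    variable, symbol, threshold = match.groups()
    return variable, OPERATORS[symbol], float(threshold)


def _evaluate(expression: Expression, info: Dict[str, float]) -> bool:
    variable, compare, threshold = expression
    if variable not in info:
        return True  # assumption: a missing field does not fail the expression
    return compare(info[variable], threshold)


def passes(info: Dict[str, float],
           and_expressions: Sequence[str] = (),
           or_expressions: Sequence[str] = (),
           drop_missing: bool = False) -> bool:
    """Decide whether one variant (given its INFO values) passes filtering."""
    ands = [parse_expression(e) for e in and_expressions]
    ors = [parse_expression(e) for e in or_expressions]
    if drop_missing and any(name not in info for name, _, _ in ands + ors):
        return False
    all_and = all(_evaluate(e, info) for e in ands)
    any_or = not ors or any(_evaluate(e, info) for e in ors)
    return all_and and any_or


if __name__ == "__main__":
    ors = ["gnomad_AF <= 0.02", "dbsnpCOMMON != 1"]
    print(passes({"gnomad_AF": 0.5, "dbsnpCOMMON": 0}, or_expressions=ors))
    print(passes({"gnomad_AF": 0.5, "dbsnpCOMMON": 1}, or_expressions=ors))
```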

VCFANNO

Annotate a VCF using dbsnp and annotation files (Original VCFANNO Project)

QUICK START

$ pbrun vcfanno --in-vcf input.vcf \
                --out-vcf output.vcf \
                --annotations database.vcf.gz \
                --dbsnp dbsnp.vcf.gz

OPTIONS

--in-vcf

(required) Path to the input VCF file to annotate (default: None)

--out-vcf

(required) Path to the output annotated VCF file (default: None)

--annotations

A prefix and VCF in the format <prefix:/absolute/path/anno.vcf.gz>. INFO fields from <anno.vcf.gz> will be added to the input VCF. This option can be used multiple times and is required if --dbsnp is not used. Annotation VCFs must be bgzipped and tabix indexed. At least one of the dbsnp or annotations arguments is required (default: None)

--dbsnp

dbSNP file(s) used to annotate the input VCF, passed in the same way as --annotations (a prefix and VCF in the format <prefix:/absolute/path/dbsnp.vcf.gz>). This option can be used multiple times and is required if --annotations is not used. Must be passed separately, as special handling is performed to fix errors in the dbSNP VCF (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
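
Before launching vcfanno it can help to check that each --annotations or --dbsnp value has the <prefix:/absolute/path/file.vcf.gz> shape described above. The Python sketch below performs that check on the string alone. The function names are hypothetical, requiring the ".vcf.gz" extension is an assumption drawn from the examples, and the ".tbi" index name is the conventional tabix output rather than something the option text states.

```python
from typing import Tuple


def parse_annotation_arg(arg: str) -> Tuple[str, str]:
    """Split '<prefix>:<absolute path>' and check its shape."""
    prefix, sep, path = arg.partition(":")
    if not sep or not prefix or not path:
        raise ValueError(f"expected <prefix>:<path>, got {arg!r}")
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute: {path!r}")
    if not path.endswith(".vcf.gz"):
        # Assumption: bgzipped VCFs are named *.vcf.gz as in the examples.
        raise ValueError(f"expected a bgzipped VCF (.vcf.gz): {path!r}")
    return prefix, path


def expected_index(path: str) -> str:
    """Conventional tabix index name for a bgzipped VCF."""
    return path + ".tbi"


if __name__ == "__main__":
    prefix, path = parse_annotation_arg("gnomad:/data/anno/gnomad.vcf.gz")
    print(prefix, path, expected_index(path))
```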

VQSR

Accelerated variant filtration using VQSR

Build a recalibration model to score variant quality and apply a score cutoff to filter variants.

QUICK START

$ pbrun vqsr --in-vcf sample.vcf \
             --out-vcf output.vcf \
             --out-recal output.recal \
             --out-tranches output.tranches \
             --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
             --annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum

COMPATIBLE GATK4 COMMAND

gatk VariantRecalibrator -V sample.vcf \
                         -O output.recal \
                         --tranches-file output.tranches \
                         --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
                         -an QD -an MQ -an MQRankSum -an ReadPosRankSum \
                         --mode BOTH

gatk ApplyVQSR -V sample.vcf \
               --recal-file output.recal \
               --tranches-file output.tranches \
               -O output.vcf \
               --mode BOTH

OPTIONS

--in-vcf

(required) Path to the input VCF file.

--out-vcf

(required) Path to the output VCF file.

--out-recal

(required) Path to the output recal file.

--out-tranches

(required) Path to the output tranches file.

--resource

(required) Known, truth, and training sets. The format string is

<set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the VCF file>.

There must be at least one resource that is training and one resource that is truth. Any resource can be both. This option can be used multiple times.

--annotation

(required) Annotation which should be used for calculations. This option can be used multiple times.

--mode

Defaults to BOTH.

Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH.

--max-gaussians

Defaults to 8.

Max number of Gaussians for the positive model.

--truth-sensitivity-level

The truth sensitivity level at which to start filtering.

--lod-score-cutoff

The VQSLOD score below which to start filtering.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--seccomp-override

Do not override seccomp options for docker

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
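
The --resource format string and its "at least one training and one truth resource" rule can be checked ahead of a run with a small parser. The Python sketch below is one possible reading of the format given above. The names Resource, parse_resource and validate_resources are hypothetical, and the sketch does not open or verify the VCF files themselves.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Resource:
    name: str
    known: bool
    training: bool
    truth: bool
    prior: float
    path: str


def _to_bool(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return lowered == "true"


def parse_resource(text: str) -> Resource:
    """Parse '<set name>,known=..,training=..,truth=..,prior=..:<path>'."""
    spec, sep, path = text.partition(":")
    if not sep or not path:
        raise ValueError("missing ':<path to the VCF file>'")
    name, *pairs = spec.split(",")
    options: Dict[str, str] = {}
    for pair in pairs:
        key, eq, value = pair.partition("=")
        if not eq:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key.strip()] = value
    try:
        return Resource(
            name=name.strip(),
            known=_to_bool(options["known"]),
            training=_to_bool(options["training"]),
            truth=_to_bool(options["truth"]),
            prior=float(options["prior"]),
            path=path,
        )
    except KeyError as missing:
        raise ValueError(f"missing key {missing} in {text!r}") from None


def validate_resources(resources: Iterable[Resource]) -> None:
    """Require at least one training and one truth resource."""
    items = list(resources)
    if not any(r.training for r in items):
        raise ValueError("no resource has training=true")
    if not any(r.truth for r in items):
        raise ValueError("no resource has truth=true")


if __name__ == "__main__":
    omni = parse_resource(
        "omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf"
    )
    validate_resources([omni])
    print(omni)
```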