VARIANT PROCESSING

ACCELERATED VARIANT MANIPULATION METHODS

VCFQC

Generate QC charts from the input VCF. To use this tool, you need to pass "--extra-tools" to the installer.

QUICK START

$ pbrun vcfqc --in-vcf sample.vcf \
              --out-image-dir sample_charts \
              --out-report sample_qc_report

OPTIONS

--in-vcf

(required) Path to the input VCF file (default: None)

--out-image-dir

(required) Output directory to store all images (default: None)

--out-report

(required) Output report file (default: None)

--in-bamqc-dir

Input directory containing BAMQC images. These should be the output of collectmultiplemetrics (default: None)

--caller

The variant caller tag for the input VCF (default: None)

--quality

Name of the quality field in the VCF to plot. Generally it is QUAL (default: None)

--mapq

Name of the mapping quality field in the VCF to plot. Generally it is MQ (default: None)

--depth

Name of the depth field in the VCF to plot. Generally it is DP (default: None)

--allele-depth

Name of the allele depth field in the VCF to plot. Generally it is AD (default: None)

--vaf

Name of the variant allele frequency field in the VCF to plot. Generally it is VAF (default: None)

--window-size

Window size for the vcfqc tool (default: 1000)

--image-format

Image format for saved plots [png] (default: png)

--threads

Number of processing threads for VCF reading (default: 4)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.

DBSNP

Annotate variants based on a variant database

QUICK START

$ pbrun dbsnp --in-vcf sample.vcf \
              --out-vcf output.vcf \
              --in-dbsnp-file database.vcf.gz

OPTIONS

--in-vcf

Path to the input VCF file (default: None)

--in-dbsnp-file

Path to the input dbSNP file in vcf.gz format, accompanied by its tabix index (default: None)

--out-vcf

Output annotated VCF file (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
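
The --in-dbsnp-file option expects a vcf.gz file together with its tabix index. The following is a minimal pre-flight sketch, not part of the tool itself. It assumes the index sits next to the database file with the conventional .tbi suffix, and the helper name has_tabix_index is hypothetical.

```shell
# Succeed only if the database file exists and a tabix index (.tbi) sits
# next to it. Assumption: the index uses the default "<file>.tbi" name.
has_tabix_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}

# Usage (file name taken from the quick start above):
#   has_tabix_index database.vcf.gz || echo "missing database.vcf.gz or its .tbi index" >&2
```

Running the check before launching pbrun dbsnp surfaces a missing index early, rather than after the job has started.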

CNNSCOREVARIANTS

GPU-accelerated CNNScoreVariants

Generate variant scores using a Convolutional Neural Network.

QUICK START

$ pbrun cnnscorevariants --ref Ref.fa \
                         --in-bam sample.bam \
                         --in-vcf sample.vcf \
                         --out-vcf output.vcf

COMPATIBLE GATK4 COMMAND

gatk CNNScoreVariants -R Ref.fa \
                      -I sample.bam \
                      -V sample.vcf \
                      -O output.vcf \
                      --tensor-type read_tensor

POST-ANALYSIS FILTERING

CNNScoreVariants generates an info field for each variant called CNN_2D. This field can be used to create filters for each variant by running the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
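
Before choosing filter thresholds, it can help to look at the CNN_2D values themselves. The sketch below is an illustration rather than part of Parabricks or GATK: it assumes an uncompressed VCF on standard input in which the score appears as CNN_2D=<value> in the INFO column, and the function name cnn_2d_scores is hypothetical.

```shell
# Print CHROM, POS and the CNN_2D score for every record of a scored VCF.
# Assumption: uncompressed VCF on stdin, score stored as CNN_2D=<value> in INFO.
cnn_2d_scores() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($8, info, ";")         # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^CNN_2D=/) {
                    sub(/^CNN_2D=/, "", info[i])
                    print $1 "\t" $2 "\t" info[i]
                }
        }'
}

# Usage (file name taken from the quick start above):
#   cnn_2d_scores < output.vcf | sort -k3,3g | head
```

Sorting on the third column lists the lowest-scoring variants first, which gives a rough view of the records a score-based filter would remove.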

OPTIONS

--ref

(required) Path to the reference file.

--in-bam

(required) Path to the input BAM file.

--in-vcf

(required) Path to the input VCF file.

--out-vcf

(required) Path to the output VCF file.

--pb-model-file

Path to a non-default Parabricks model file for cnnscorevariants.

--num-gpus NUM_GPUS

Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using Flexera, please include --gpu-devices too.

--gpu-devices GPU_DEVICES

Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
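
The relationship between --num-gpus and --gpu-devices described above can be sketched in a few lines of shell. This is only an illustration of the 0..(NUM_GPUS-1) numbering. The helper name gpu_device_list is hypothetical and is not provided by Parabricks.

```shell
# Build the comma-separated device list that corresponds to "--num-gpus N",
# that is, devices 0..(N-1).
# Device numbers on a given machine can be listed with, for example:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
gpu_device_list() {
    n=$1
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        list="${list:+$list,}$i"     # append ",i", without a leading comma
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

gpu_device_list 2   # prints 0,1
```

The printed string can be passed as the value of --gpu-devices when both options need to be supplied.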

VARIANTFILTRATION

Accelerated variant filtration based on conditions

Filter a VCF using a boolean expression.

QUICK START

$ pbrun variantfiltration --in-vcf sample.vcf \
                          --out-file output.vcf \
                          --expression "QD < 2.0 || ReadPosRankSum < -20.0" \
                          --filter-name FILTER

COMPATIBLE GATK4 COMMAND

gatk VariantFiltration -V sample.vcf \
                       -O output.vcf \
                       --filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
                       --filter-name FILTER

OPTIONS

--in-vcf

(required) Path to the input VCF file.

--out-file

(required) Path to the output variants file with an extension of either ‘.vcf’ or ‘.csv’.

--expression

(required) Boolean expression for filtering variants.

--filter-name

(required) Value written to the FILTER field of variants that match the filter expression.

--mode

Defaults to BOTH.

Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
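
To see how many variants a filter expression caught, the FILTER column of the output can be tallied. The sketch below is an illustration under stated assumptions: the output was written as an uncompressed '.vcf' file (not '.csv'), the FILTER value is in the seventh column as in standard VCF, and the function name filter_counts is hypothetical.

```shell
# Count records per FILTER value in a VCF read from stdin.
# Assumption: uncompressed, tab-separated VCF with FILTER in column 7.
filter_counts() {
    awk -F'\t' '
        /^#/ { next }                 # skip header lines
        { count[$7]++ }               # tally each FILTER value
        END { for (f in count) print f "\t" count[f] }' | sort
}

# Usage (file name taken from the quick start above):
#   filter_counts < output.vcf
```

With the quick start command, records matching the expression would be expected to appear under the name given to --filter-name.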

VQSR

Accelerated variant filtration using VQSR

Build a recalibration model to score variant quality and apply a score cutoff to filter variants.

QUICK START

$ pbrun vqsr --in-vcf sample.vcf \
             --out-vcf output.vcf \
             --out-recal output.recal \
             --out-tranches output.tranches \
             --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
             --annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum

COMPATIBLE GATK4 COMMAND

gatk VariantRecalibrator -V sample.vcf \
                         -O output.recal \
                         --tranches-file output.tranches \
                         --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
                         -an QD -an MQ -an MQRankSum -an ReadPosRankSum \
                         --mode BOTH

gatk ApplyVQSR -V sample.vcf \
               --recal-file output.recal \
               --tranches-file output.tranches \
               -O output.vcf \
               --mode BOTH

OPTIONS

--in-vcf

(required) Path to the input VCF file.

--out-vcf

(required) Path to the output VCF file.

--out-recal

(required) Path to the output recal file.

--out-tranches

(required) Path to the output tranches file.

--resource

(required) Known, truth, and training sets. The format string is

<set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the VCF file>.

There must be at least one resource that is training and one resource that is truth. Any resource can be both. This option can be used multiple times.

--annotation

(required) Annotation which should be used for calculations. This option can be used multiple times.

--mode

Defaults to BOTH.

Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH.

--max-gaussians

Defaults to 8.

Max number of Gaussians for the positive model.

--truth-sensitivity-level

The truth sensitivity level at which to start filtering.

--lod-score-cutoff

The VQSLOD score below which to start filtering.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
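
The --resource format string described above is easy to mistype, so it can be convenient to assemble it from its parts. The following is a minimal sketch. The helper name vqsr_resource and its argument order are hypothetical and are not part of Parabricks.

```shell
# Assemble a --resource value from its parts.
# Arguments: <set name> <known> <training> <truth> <prior> <path to the VCF file>
vqsr_resource() {
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Reproduces the resource string used in the quick start above.
vqsr_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf
```

Because --resource can be given multiple times, the helper can be called once per resource, for example as --resource "$(vqsr_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf)".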