vqsr

Accelerated variant filtration using Variant Quality Score Recalibration (VQSR).

Build a recalibration model to score variant quality and apply a score cutoff to filter variants.

Quick Start

$ pbrun vqsr \
    --in-vcf sample.vcf \
    --out-vcf output.vcf \
    --out-recal output.recal \
    --out-tranches output.tranches \
    --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
    --annotation QD \
    --annotation MQ \
    --annotation MQRankSum \
    --annotation ReadPosRankSum

Compatible GATK4 Command

gatk VariantRecalibrator -V sample.vcf \
                         -O output.recal \
                         --tranches-file output.tranches \
                         --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
                         -an QD -an MQ -an MQRankSum -an ReadPosRankSum \
                         --mode BOTH

gatk ApplyVQSR -V sample.vcf \
               --recal-file output.recal \
               --tranches-file output.tranches \
               -O output.vcf \
               --mode BOTH

vqsr Reference

Build a recalibration model to score variant quality and apply a score cutoff to filter variants.

Input/Output file options

--in-vcf IN_VCF

Path to the input VCF file. (default: None)

Option is required.

--out-recal OUT_RECAL

Path to the output recal file. (default: None)

Option is required.

--out-tranches OUT_TRANCHES

Path to the output tranches file. (default: None)

Option is required.

-r RESOURCE [RESOURCE ...], --resource RESOURCE [RESOURCE ...]

Known, truth, and training sets. The format string is "[set name],known=[boolean],training=[boolean],truth=[boolean],prior=[float]:[path to the VCF file]". There must be at least one resource that is training and one resource that is truth. A single resource can be both (e.g. "--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf"). (default: None)

Option is required.
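The format string above is easy to mistype, since a stray space or a misspelled boolean changes the value. The following is a minimal sketch of a helper that assembles one --resource value from its parts. The function name make_resource is hypothetical and is not part of pbrun; the sample values are taken from the example above.

```shell
# Hypothetical helper (not part of pbrun): assemble one --resource value.
# Usage: make_resource NAME KNOWN TRAINING TRUTH PRIOR VCF_PATH
make_resource() {
    # Accept only the literal strings true/false for the three boolean fields.
    for flag in "$2" "$3" "$4"; do
        case "$flag" in
            true|false) ;;
            *) echo "expected true or false, got: $flag" >&2; return 1 ;;
        esac
    done
    # Join the fields with no embedded spaces, as the format string requires.
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

make_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf
# prints: omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf
```

The printed string can then be passed as the argument to --resource.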

--out-vcf OUT_VCF

Path to the output VCF file. (default: None)

Option is required.
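One way to check what the filtering step did is to tally the FILTER column (column 7 in the VCF format) of the output VCF. The sketch below is an illustration only: the function name count_filters is hypothetical, the records are made up, and EXAMPLE_FILTER is a placeholder label. The actual labels written to the output file depend on the tool and the cutoff used.

```shell
# Hypothetical helper: count how many records carry each FILTER value.
count_filters() {
    # Skip header lines (starting with '#'); FILTER is the 7th tab-separated column.
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f, n[f] }' "$1" | sort
}

# Tiny stand-in VCF so the sketch runs on its own; the records are made up.
tmp_vcf="$(mktemp)"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
    printf 'chr1\t200\t.\tC\tT\t12\tEXAMPLE_FILTER\t.\n'
    printf 'chr1\t300\t.\tG\tA\t60\tPASS\t.\n'
} > "$tmp_vcf"

count_filters "$tmp_vcf"
rm -f "$tmp_vcf"
```

On a real run, the same function could be pointed at the file given to --out-vcf.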

Tool options:

-a ANNOTATION [ANNOTATION ...], --annotation ANNOTATION [ANNOTATION ...]

Annotation which should be used for calculations (e.g. "-a QD"). (default: None)

Option is required.

-m MODE, --mode MODE

Type of variants to include in the recalibration. Possible values are {SNP, INDEL, BOTH}. (default: BOTH)

-g MAX_GAUSSIANS, --max-gaussians MAX_GAUSSIANS

Max number of Gaussians for the positive model. (default: 8)

-t TRUTH_SENSITIVITY_LEVEL, --truth-sensitivity-level TRUTH_SENSITIVITY_LEVEL

The truth sensitivity level at which to start filtering. (default: None)

-l LOD_SCORE_CUTOFF, --lod-score-cutoff LOD_SCORE_CUTOFF

The VQSLOD score below which to start filtering. (default: None)
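The tool options can be combined with the required input and output options. The following dry-run sketch prints a command line that restricts recalibration to SNPs and filters by truth sensitivity, rather than executing it. The function name vqsr_snp_cmd, the file names, and the value 99.0 are placeholders. The level is assumed here to be a percentage, as with the corresponding GATK ApplyVQSR option; this page does not state the unit.

```shell
# Dry-run sketch: prints a pbrun vqsr command line rather than running it.
# Arguments: input VCF, output prefix, truth sensitivity level (all placeholders).
vqsr_snp_cmd() {
    echo pbrun vqsr \
        --in-vcf "$1" \
        --out-vcf "$2.vcf" \
        --out-recal "$2.recal" \
        --out-tranches "$2.tranches" \
        --resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
        --annotation QD \
        --annotation MQ \
        --mode SNP \
        --truth-sensitivity-level "$3"
}

# 99.0 is an assumed example value, not a documented default.
vqsr_snp_cmd sample.vcf output 99.0
```

Removing the leading echo would run the command on a system where pbrun is installed.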

Common options:

--logfile LOGFILE

Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally, set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables, which are used for data and credentials. (default: None)

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to the license file (license.bin) if it is not in the installation directory.

--no-seccomp-override

Do not override seccomp options for Docker. (default: None)

--version

View compatible software versions.