VARIANT PROCESSING OVERVIEW
ACCELERATED VARIANT MANIPULATION METHOD
GPU-accelerated CNNScoreVariants
Generate variant scores using a Convolutional Neural Network.
QUICK START
$ pbrun cnnscorevariants --ref Ref.fa \
--in-bam sample.bam \
--in-vcf sample.vcf \
--out-vcf output.vcf
COMPATIBLE GATK4 COMMAND
gatk CNNScoreVariants -R Ref.fa \
-I sample.bam \
-V sample.vcf \
-O output.vcf \
--tensor-type read_tensor
POST-ANALYSIS FILTERING
CNNScoreVariants annotates each variant with an INFO field named CNN_2D. To turn these scores into per-variant filters, run the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
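As a rough sketch of that follow-up step, the shell snippet below assembles a FilterVariantTranches command that reads the CNN_2D field. It only prints the command so that it can be reviewed before being run. The resource files hapmap.vcf.gz and mills.vcf.gz, the output name filtered.vcf, and the tranche values are placeholders chosen for illustration; they are assumptions, not values taken from this page.

```shell
#!/usr/bin/env bash
# Assemble (but do not run) a GATK4 FilterVariantTranches command that
# filters on the CNN_2D INFO field written by CNNScoreVariants.
# Assumptions: the two --resource files and both tranche values are
# illustrative placeholders and should be replaced with real inputs.
build_filter_tranches_cmd() {
    local in_vcf="$1" out_vcf="$2"
    local cmd=(gatk FilterVariantTranches
        -V "$in_vcf"
        --resource hapmap.vcf.gz
        --resource mills.vcf.gz
        --info-key CNN_2D
        --snp-tranche 99.95
        --indel-tranche 99.4
        -O "$out_vcf")
    # Print the command on one line, arguments separated by spaces.
    printf '%s\n' "${cmd[*]}"
}

build_filter_tranches_cmd output.vcf filtered.vcf
```

Once the printed command looks right, it can be pasted into a shell where gatk is available.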
OPTIONS
- --ref
  (required) Path to the reference file.
- --in-bam
  (required) Path to the input bam file.
- --in-vcf
  (required) Path to the input vcf file.
- --out-vcf
  (required) Path to the output vcf file.
- --pb-model-file
  Path of a non-default parabricks model file for cnnscorevariants.
- --num-gpus
  Number of GPUs to use for a run. Defaults to number of GPUs in the system.
- --gpu-devices
  Which GPU devices to use for a run. By default, all GPU devices will be used. To set specific GPU devices, enter a comma-separated list of GPU device numbers.
Accelerated variant filtration based on conditions
Filter a vcf using a boolean expression.
QUICK START
$ pbrun variantfiltration --in-vcf sample.vcf \
--out-file output.vcf \
--expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
COMPATIBLE GATK4 COMMAND
gatk VariantFiltration -V sample.vcf \
-O output.vcf \
--filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
OPTIONS
- --in-vcf
  (required) Path to the input vcf file.
- --out-file
  (required) Path to the output variants file with an extension of either ‘.vcf’ or ‘.csv’.
- --expression
  (required) Boolean expression for filtering variants.
- --filter-name
  (required) Field value for variants that pass the filter expression.
- --mode
  Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.
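To make the meaning of the boolean expression concrete, the sketch below re-implements the quick-start expression (QD < 2.0 || ReadPosRankSum < -20.0) with awk. It is not the pbrun implementation and is only meant to show the logic. It assumes that a variant for which the expression is true gets the filter name written to its FILTER column, as GATK VariantFiltration does, and that all other variants are marked PASS. The function name apply_example_filter is hypothetical. The sketch overwrites any existing FILTER value, skips annotations that are missing, and ignores --mode.

```shell
#!/usr/bin/env bash
# Illustrative re-implementation of the example expression
#   QD < 2.0 || ReadPosRankSum < -20.0
# applied to VCF lines read from stdin. This is a sketch of the logic only,
# not what pbrun or GATK run internally.
apply_example_filter() {
    local filter_name="$1"
    awk -v name="$filter_name" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                  # header lines pass through unchanged
    {
        hit = 0
        n = split($8, kv, ";")            # column 8 is INFO: key=value pairs
        for (i = 1; i <= n; i++) {
            if (split(kv[i], pair, "=") != 2) continue   # skip flag-only keys
            if (pair[1] == "QD" && pair[2] + 0 < 2.0) hit = 1
            if (pair[1] == "ReadPosRankSum" && pair[2] + 0 < -20.0) hit = 1
        }
        $7 = hit ? name : "PASS"          # column 7 is FILTER
        print
    }'
}

# Usage: two made-up records; the first has QD below 2.0, the second does not.
printf 'chr1\t100\t.\tA\tG\t50\t.\tQD=1.5;MQ=60\nchr1\t200\t.\tC\tT\t50\t.\tQD=10.0;ReadPosRankSum=0.3\n' \
    | apply_example_filter FILTER
```

In this usage example the first record is labelled FILTER and the second PASS.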
Accelerated variant filtration using VQSR
Build a recalibration model to score variant quality and apply a score cutoff to filter variants.
QUICK START
$ pbrun vqsr --in-vcf sample.vcf \
--out-vcf output.vcf \
--out-recal output.recal \
--out-tranches output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
--annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum
COMPATIBLE GATK4 COMMAND
gatk VariantRecalibrator -V sample.vcf \
-O output.recal \
--tranches-file output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum \
--mode BOTH
gatk ApplyVQSR -V sample.vcf \
--recal-file output.recal \
--tranches-file output.tranches \
-O output.vcf \
--mode BOTH
OPTIONS
- --in-vcf
  (required) Path to the input vcf file.
- --out-vcf
  (required) Path to the output vcf file.
- --out-recal
  (required) Path to the output recal file.
- --out-tranches
  (required) Path to the output tranches file.
- --resource
  (required) Known, truth, and training sets. The format string is
  <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the vcf file>.
  There must be at least one resource that is training and one resource that is truth. Any resource can be both. This option can be used multiple times.
- --annotation
  (required) Annotation which should be used for calculations. This option can be used multiple times.
- --mode
  Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.
- --max-gaussians
  Max number of Gaussians for the positive model. Defaults to 8.
- --truth-sensitivity-level
  The truth sensitivity level at which to start filtering.
- --lod-score-cutoff
  The VQSLOD score below which to start filtering.
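Because the --resource format string is easy to mistype, a small helper can compose it from its parts. The sketch below follows the format given for --resource above. The function name make_resource is hypothetical, and the second call uses a made-up dbsnp.hg38.vcf path and prior purely as an illustration of a known-only resource.

```shell
#!/usr/bin/env bash
# Compose one --resource value in the documented format:
#   <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the vcf file>
# make_resource is a hypothetical helper, not part of pbrun or GATK.
make_resource() {
    local name="$1" known="$2" training="$3" truth="$4" prior="$5" path="$6"
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
        "$name" "$known" "$training" "$truth" "$prior" "$path"
}

# Reproduces the resource string used in the quick start.
make_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf

# Illustrative second resource (path and prior are assumptions).
make_resource dbsnp true false false 2.0 dbsnp.hg38.vcf
```

Each generated string would be passed with its own --resource flag, keeping in mind that at least one resource must be a training set and at least one must be a truth set.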