VARIANT PROCESSING¶
ACCELERATED VARIANT MANIPULATION METHODS
DBSNP¶
Annotate variants based on a variant database
QUICK START¶
$ pbrun dbsnp --in-vcf sample.vcf \
--out-vcf output.vcf \
--in-dbsnp-file database.vcf.gz
OPTIONS¶
- --in-vcf
Path to the input vcf file (default: None)
- --in-dbsnp-file
Path to the input dbsnp file in vcf.gz format with its tabix index (default: None)
- --out-vcf
Output annotated vcf file (default: None)
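Because --in-dbsnp-file expects a vcf.gz file accompanied by its tabix index, it can help to check for the index before starting a run. The following is a minimal sketch, assuming the index sits next to the database with the default `.tbi` suffix that `tabix -p vcf` writes. The helper name `has_tabix_index` and the file names are illustrative and not part of pbrun.

```shell
# Return 0 if the file is a .vcf.gz and a tabix index (<file>.tbi) sits beside it.
# Assumption: the index uses the default ".tbi" suffix.
has_tabix_index() {
    case "$1" in
        *.vcf.gz) ;;
        *) return 1 ;;
    esac
    [ -f "$1" ] && [ -f "$1.tbi" ]
}

# Usage (hypothetical file names):
#   has_tabix_index database.vcf.gz || tabix -p vcf database.vcf.gz
#   pbrun dbsnp --in-vcf sample.vcf --out-vcf output.vcf --in-dbsnp-file database.vcf.gz
```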
MUTATION SIGNATURE¶
Generate graphs from mutational signature weights in a tumor sample. This tool is implemented similarly to deconstructSigs.
QUICK START¶
$ pbrun dbsnp --vcf sample.vcf \
--ref Ref/ref.fa \
--out-prefix output
OPTIONS¶
- --ref
Path to the reference file (default: None)
- --vcf
Path to the input vcf file (default: None)
- --out-prefix
Prefix filename for output data and graphs (default: None)
- --signatures-limit
Number of signatures to limit the search to (default: None)
- --signature-cutoff
Discard any signature contributions with a weight less than this amount (default: 0.06)
- --tri-counts-method
Additional method of normalization that should match how the input signatures were normalized. By default there is no further normalization. Possible values are {default, genome, exome, exome2genome, genome2exome} (default: default)
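To illustrate what --signature-cutoff means, the sketch below drops every signature whose weight falls below the cutoff. It is not the tool's implementation: the two-column "name weight" input layout and the helper name `apply_signature_cutoff` are assumptions made for the example, and the weights are invented.

```shell
# Keep only signatures whose weight is at least the cutoff (default 0.06).
# Input on stdin: "<signature name> <weight>" per line (hypothetical layout).
apply_signature_cutoff() {
    awk -v cutoff="${1:-0.06}" '$2 + 0 >= cutoff + 0 { print $1, $2 }'
}

# Invented weights; Signature.3 is below the cutoff and is discarded.
printf '%s\n' 'Signature.1 0.52' 'Signature.3 0.04' 'Signature.13 0.06' |
    apply_signature_cutoff 0.06
```

A weight exactly equal to the cutoff is kept, since the option discards only weights less than the given amount.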
CNNSCOREVARIANTS¶
GPU-accelerated CNNScoreVariants
Generate variant scores using a Convolutional Neural Network.
QUICK START¶
$ pbrun cnnscorevariants --ref Ref.fa \
--in-bam sample.bam \
--in-vcf sample.vcf \
--out-vcf output.vcf
COMPATIBLE GATK4 COMMAND¶
gatk CNNScoreVariants -R Ref.fa \
-I sample.bam \
-V sample.vcf \
-O output.vcf \
--tensor-type read_tensor
POST-ANALYSIS FILTERING¶
CNNScoreVariants generates an info field for each variant called CNN_2D. This field can be used to create filters for each variant by running the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
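The filtering step might look like the sketch below. The resource file names are placeholders and the tranche values are only illustrative, so adjust both to your data. The small `run` wrapper is a hypothetical helper that prints the command by default, so the example can be inspected without GATK4 installed.

```shell
# Print the command by default; set DRY_RUN=0 to execute it (requires GATK4 on PATH).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Resource files and tranche values are placeholders, not recommendations.
run gatk FilterVariantTranches \
    -V output.vcf \
    --resource hapmap.vcf.gz \
    --resource mills.vcf.gz \
    --info-key CNN_2D \
    --snp-tranche 99.95 \
    --indel-tranche 99.4 \
    -O filtered.vcf
```

Here --info-key is set to CNN_2D so that FilterVariantTranches reads the scores written by CNNScoreVariants.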
OPTIONS¶
- --ref
(required) Path to the reference file.
- --in-bam
(required) Path to the input bam file.
- --in-vcf
(required) Path to the input vcf file.
- --out-vcf
(required) Path to the output vcf file.
- --pb-model-file
Path of a non-default parabricks model file for cnnscorevariants.
- --num-gpus
Number of GPUs to use for a run.
Defaults to the number of GPUs in the system.
- --gpu-devices
Which GPU devices to use for a run. By default, all GPU devices will be used. To set specific GPU devices, enter a comma-separated list of GPU device numbers.
VARIANTFILTRATION¶
Accelerated variant filtration based on conditions
Filter a vcf using a boolean expression.
QUICK START¶
$ pbrun variantfiltration --in-vcf sample.vcf \
--out-file output.vcf \
--expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
COMPATIBLE GATK4 COMMAND¶
gatk VariantFiltration -V sample.vcf \
-O output.vcf \
--filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
OPTIONS¶
- --in-vcf
(required) Path to the input vcf file.
- --out-file
(required) Path to the output variants file with an extension of either ‘.vcf’ or ‘.csv’.
- --expression
(required) Boolean expression for filtering variants.
- --filter-name
(required) Name written to the FILTER field of variants that match the filter expression.
- --mode
Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH.
Defaults to BOTH.
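The constraints in the options above (the output extension and the allowed --mode values) can be checked before launching a run. The following is a minimal sketch; `check_variantfiltration_args` is a hypothetical helper, not a pbrun feature.

```shell
# Return 0 when the output file name and mode satisfy the documented constraints.
check_variantfiltration_args() {
    out_file="$1"
    mode="${2:-BOTH}"   # BOTH is the documented default
    case "$out_file" in
        *.vcf|*.csv) ;;
        *) echo "out-file must end in .vcf or .csv: $out_file" >&2; return 1 ;;
    esac
    case "$mode" in
        SNP|INDEL|BOTH) ;;
        *) echo "mode must be SNP, INDEL, or BOTH: $mode" >&2; return 1 ;;
    esac
}

# Usage:
#   check_variantfiltration_args output.vcf SNP &&
#       pbrun variantfiltration --in-vcf sample.vcf --out-file output.vcf \
#           --expression "QD < 2.0 || ReadPosRankSum < -20.0" \
#           --filter-name FILTER --mode SNP
```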
VQSR¶
Accelerated variant filtration using VQSR
Build a recalibration model to score variant quality and apply a score cutoff to filter variants.
QUICK START¶
$ pbrun vqsr --in-vcf sample.vcf \
--out-vcf output.vcf \
--out-recal output.recal \
--out-tranches output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
--annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum
COMPATIBLE GATK4 COMMAND¶
gatk VariantRecalibrator -V sample.vcf \
-O output.recal \
--tranches-file output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum \
--mode BOTH
gatk ApplyVQSR -V sample.vcf \
--recal-file output.recal \
--tranches-file output.tranches \
-O output.vcf \
--mode BOTH
OPTIONS¶
- --in-vcf
(required) Path to the input vcf file.
- --out-vcf
(required) Path to the output vcf file.
- --out-recal
(required) Path to the output recal file.
- --out-tranches
(required) Path to the output tranches file.
- --resource
(required) Known, truth, and training sets. The format string is
<set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the vcf file>.
There must be at least one resource that is training and one resource that is truth. Any resource can be both. This option can be used multiple times.
- --annotation
(required) Annotation which should be used for calculations. This option can be used multiple times.
- --mode
Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH.
Defaults to BOTH.
- --max-gaussians
Max number of Gaussians for the positive model.
Defaults to 8.
- --truth-sensitivity-level
The truth sensitivity level at which to start filtering.
- --lod-score-cutoff
The VQSLOD score below which to start filtering.
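The --resource format string is easy to mistype, so a small helper can assemble it and check the rule that at least one resource is a training set and at least one is a truth set. This is a sketch only: `make_resource` and `resources_ok` are hypothetical helper names, and the file name is the one from the quick start above.

```shell
# Build one --resource value:
#   <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path>
make_resource() {
    # $1 name, $2 known, $3 training, $4 truth, $5 prior, $6 path to the vcf file
    for flag in "$2" "$3" "$4"; do
        case "$flag" in
            true|false) ;;
            *) echo "expected true or false, got: $flag" >&2; return 1 ;;
        esac
    done
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Return 0 if at least one resource is a training set and at least one is a truth set.
resources_ok() {
    has_training=0
    has_truth=0
    for r in "$@"; do
        case "$r" in *,training=true,*) has_training=1 ;; esac
        case "$r" in *,truth=true,*) has_truth=1 ;; esac
    done
    [ "$has_training" = 1 ] && [ "$has_truth" = 1 ]
}

omni="$(make_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf)"
echo "$omni"
```

The resulting string can be passed as `--resource "$omni"`, and `resources_ok` can be called with every resource string before invoking pbrun vqsr.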