VARIANT PROCESSING
ACCELERATED VARIANT MANIPULATION METHODS
Generate QC charts from an input VCF. To use this tool, pass --extra-tools to the installer.
QUICK START
$ pbrun vcfqc --in-vcf sample.vcf \
--out-image-dir sample_charts \
--out-report sample_qc_report
OPTIONS
- --in-vcf
  (required) Path to the VCF file (default: None)
- --out-image-dir
  (required) Output directory to store all images (default: None)
- --out-report
  (required) Output report file (default: None)
- --in-bamqc-dir
  Input directory containing BAMQC images. These should be the output of collectmultiplemetrics (default: None)
- --caller
  The variant caller tag for the input VCF (default: None)
- --quality
  Name of the quality field in the VCF to plot. Generally QUAL (default: None)
- --mapq
  Name of the mapping quality field in the VCF to plot. Generally MQ (default: None)
- --depth
  Name of the depth field in the VCF to plot. Generally DP (default: None)
- --allele-depth
  Name of the allele depth field in the VCF to plot. Generally AD (default: None)
- --vaf
  Name of the variant allele frequency field in the VCF to plot. Generally VAF (default: None)
- --window-size
  Window size for the vcfqc tool (default: 1000)
- --image-format
  Image format for saved plots [png] (default: png)
- --threads
  Number of processing threads for VCF reading (default: 4)
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
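The field options above (--mapq, --depth, --allele-depth, --vaf) take the name of a field in the VCF, and the usual names are only conventions. As a minimal sketch, assuming the VCF declares its fields in standard ##INFO and ##FORMAT header lines, the following lists what a given file actually provides. The helper name list_vcf_fields and the example header are hypothetical and not part of Parabricks. QUAL is a fixed VCF column rather than a declared field, so it will not appear in this list.

```shell
# List the INFO and FORMAT field IDs declared in a VCF header.
list_vcf_fields() {
  grep -E '^##(INFO|FORMAT)=<ID=' "$1" |
    sed -E 's/^##(INFO|FORMAT)=<ID=([^,>]+).*/\1 \2/'
}

# Hypothetical header, used only to demonstrate the helper.
example_vcf=$(mktemp)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
  '##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS mapping quality">' \
  '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">' \
  > "$example_vcf"

# Prints one "KIND ID" pair per line, in header order.
list_vcf_fields "$example_vcf"
```

For an uncompressed real file, the same call would be list_vcf_fields sample.vcf.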
Annotate variants based on a variant database
QUICK START
$ pbrun dbsnp --in-vcf sample.vcf \
--out-vcf output.vcf \
--in-dbsnp-file database.vcf.gz
OPTIONS
- --in-vcf
  Path to the input VCF file (default: None)
- --in-dbsnp-file
  Path to the input dbSNP file in vcf.gz format with its tabix index (default: None)
- --out-vcf
  Output annotated VCF file (default: None)
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
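Because --in-dbsnp-file expects a vcf.gz file together with its tabix index, a quick pre-flight check can catch a missing index before the run starts. This is a minimal sketch, not a Parabricks feature: the helper name check_dbsnp_inputs is hypothetical, and it assumes the index uses tabix's default naming, that is, the database path with .tbi appended.

```shell
# Return success only if the database looks usable for --in-dbsnp-file.
# Assumption: the tabix index sits next to the file as <file>.tbi.
check_dbsnp_inputs() {
  db="$1"
  case "$db" in
    *.vcf.gz) ;;
    *) echo "expected a .vcf.gz file: $db" >&2; return 1 ;;
  esac
  [ -f "$db" ] || { echo "not found: $db" >&2; return 1; }
  [ -f "$db.tbi" ] || { echo "missing index: $db.tbi" >&2; return 1; }
}

# Typical use (commented out so the sketch runs without Parabricks installed):
# check_dbsnp_inputs database.vcf.gz && pbrun dbsnp --in-vcf sample.vcf \
#   --out-vcf output.vcf --in-dbsnp-file database.vcf.gz
```

If the index is missing, it is usually created with tabix -p vcf database.vcf.gz on a bgzip-compressed file.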
GPU-accelerated CNNScoreVariants
Generate variant scores using a Convolutional Neural Network.
QUICK START
$ pbrun cnnscorevariants --ref Ref.fa \
--in-bam sample.bam \
--in-vcf sample.vcf \
--out-vcf output.vcf
COMPATIBLE GATK4 COMMAND
gatk CNNScoreVariants -R Ref.fa \
-I sample.bam \
-V sample.vcf \
-O output.vcf \
--tensor-type read_tensor
POST-ANALYSIS FILTERING
CNNScoreVariants adds an INFO field named CNN_2D to each variant. To turn this score into per-variant filters, run the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
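Before filtering, it can help to look at the CNN_2D values that were written. The following is a minimal sketch, not part of Parabricks or GATK: the helper name extract_cnn_2d and the three-record example file are hypothetical, and it assumes CNN_2D appears as a key=value pair in the INFO column (column 8) of an uncompressed VCF.

```shell
# Print CHROM, POS and the CNN_2D score for every record; "NA" if absent.
extract_cnn_2d() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      n = split($8, kv, ";")            # INFO is the 8th column
      score = "NA"
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^CNN_2D=/) score = substr(kv[i], 8)
      print $1 "\t" $2 "\t" score
    }' "$1"
}

# Hypothetical records standing in for real cnnscorevariants output.
cnn_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=30;CNN_2D=3.215\n'
  printf 'chr1\t200\t.\tC\tT\t12\t.\tCNN_2D=-4.870;DP=8\n'
  printf 'chr1\t300\t.\tG\tA\t40\t.\tDP=25\n'
} > "$cnn_vcf"

extract_cnn_2d "$cnn_vcf"
```

Records without a CNN_2D entry are reported as NA rather than dropped, so unscored variants stay visible.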
OPTIONS
- --ref
  (required) Path to the reference file.
- --in-bam
  (required) Path to the input BAM file.
- --in-vcf
  (required) Path to the input VCF file.
- --out-vcf
  (required) Path to the output VCF file.
- --pb-model-file
  Path to a non-default Parabricks model file for cnnscorevariants.
- --num-gpus NUM_GPUS
  Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using Flexera, also include --gpu-devices.
- --gpu-devices GPU_DEVICES
  Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, --gpu-devices 0,1 would use only the first two GPUs.
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
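When both --num-gpus and --gpu-devices have to be given, the device list that corresponds to a GPU count is simply 0 through NUM_GPUS-1. The sketch below builds that list; the helper name gpu_device_list is hypothetical and the function only formats a string, it does not query the hardware.

```shell
# Build the comma-separated device list 0..(N-1) for a given GPU count.
gpu_device_list() {
  n="$1"
  list=""
  i=0
  while [ "$i" -lt "$n" ]; do
    list="${list:+$list,}$i"   # add a comma only between entries
    i=$((i + 1))
  done
  printf '%s\n' "$list"
}

gpu_device_list 2   # prints 0,1
```

The result can be passed along with the count, for example --num-gpus 2 --gpu-devices "$(gpu_device_list 2)".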
Accelerated variant filtration based on conditions
Filter a VCF using a boolean expression.
QUICK START
$ pbrun variantfiltration --in-vcf sample.vcf \
--out-file output.vcf \
--expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
COMPATIBLE GATK4 COMMAND
gatk VariantFiltration -V sample.vcf \
-O output.vcf \
--filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
OPTIONS
- --in-vcf
  (required) Path to the input VCF file.
- --out-file
  (required) Path to the output variants file with an extension of either '.vcf' or '.csv'.
- --expression
  (required) Boolean expression for filtering variants.
- --filter-name
  (required) Field value for variants that pass the filter expression.
- --mode
  Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
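To make the Quick Start expression concrete, the sketch below evaluates "QD < 2.0 || ReadPosRankSum < -20.0" against a few INFO strings. It only illustrates what the expression selects and is not how variantfiltration is implemented. The helper name matches_expression is hypothetical. Two behaviors are assumptions: an annotation missing from a record is treated as not matching, and records that do not match are labeled PASS.

```shell
# Print the filter name (FILTER, as in the Quick Start) when the record's
# INFO string satisfies: QD < 2.0 || ReadPosRankSum < -20.0
# Assumptions: a missing annotation never matches; non-matching records
# are labeled PASS.
matches_expression() {
  printf '%s\n' "$1" | awk -F';' '
    {
      has_qd = 0; has_rprs = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "QD")             { qd = kv[2] + 0;   has_qd = 1 }
        if (kv[1] == "ReadPosRankSum") { rprs = kv[2] + 0; has_rprs = 1 }
      }
      hit = (has_qd && qd < 2.0) || (has_rprs && rprs < -20.0)
      print (hit ? "FILTER" : "PASS")
    }'
}

matches_expression "QD=1.5;ReadPosRankSum=-0.3"     # low QD
matches_expression "QD=12.0;ReadPosRankSum=-25.0"   # low ReadPosRankSum
matches_expression "QD=12.0;ReadPosRankSum=0.1"     # neither condition holds
```

Because the two conditions are joined with ||, either one alone is enough to flag a record.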
Accelerated variant filtration using VQSR
Build a recalibration model to score variant quality and apply a score cutoff to filter variants.
QUICK START
$ pbrun vqsr --in-vcf sample.vcf \
--out-vcf output.vcf \
--out-recal output.recal \
--out-tranches output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
--annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum
COMPATIBLE GATK4 COMMAND
gatk VariantRecalibrator -V sample.vcf \
-O output.recal \
--tranches-file output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum \
--mode BOTH
gatk ApplyVQSR -V sample.vcf \
--recal-file output.recal \
--tranches-file output.tranches \
-O output.vcf \
--mode BOTH
OPTIONS
- --in-vcf
  (required) Path to the input VCF file.
- --out-vcf
  (required) Path to the output VCF file.
- --out-recal
  (required) Path to the output recal file.
- --out-tranches
  (required) Path to the output tranches file.
- --resource
  (required) Known, truth, and training sets. The format string is
  <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the VCF file>.
  There must be at least one training resource and at least one truth resource; a single resource can be both. This option can be used multiple times.
- --annotation
  (required) Annotation to use for calculations. This option can be used multiple times.
- --mode
  Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.
- --max-gaussians
  Max number of Gaussians for the positive model. Defaults to 8.
- --truth-sensitivity-level
  The truth sensitivity level at which to start filtering.
- --lod-score-cutoff
  The VQSLOD score below which to start filtering.
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
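The --resource format string is easy to mistype, so it can be worth assembling it from its parts and checking the training and truth requirement before launching vqsr. This is a minimal sketch under stated assumptions: the helper names make_resource and check_resources are hypothetical, the dbsnp entry and its file name are illustrative only, and the check is a plain substring match on the string rather than anything Parabricks performs.

```shell
# usage: make_resource NAME KNOWN TRAINING TRUTH PRIOR PATH
make_resource() {
  printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Succeed only if at least one resource has training=true and at least
# one has truth=true (the same resource may satisfy both).
check_resources() {
  has_training=0
  has_truth=0
  for r in "$@"; do
    case "$r" in *training=true*) has_training=1 ;; esac
    case "$r" in *truth=true*) has_truth=1 ;; esac
  done
  [ "$has_training" -eq 1 ] && [ "$has_truth" -eq 1 ]
}

# Same resource as in the Quick Start.
omni=$(make_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf)
# Hypothetical known-only resource, for illustration.
dbsnp=$(make_resource dbsnp true false false 2.0 dbsnp.vcf)

echo "$omni"
check_resources "$omni" "$dbsnp" && echo "resources OK"
```

Each resulting string would be passed with its own --resource option, since the option can be used multiple times.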