VARIANT PROCESSING
ACCELERATED VARIANT MANIPULATION METHODS
Generate QC charts from the input VCF. You need to pass “--extra-tools” to the installer to use this tool. This tool supports single-sample VCFs only.
QUICK START
$ pbrun vcfqc --in-vcf sample.vcf \
--output-dir sample_charts
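Because vcfqc accepts single-sample VCFs only, it can help to count the sample columns before launching it. The following is a minimal sketch, not part of Parabricks: the helper name `count_vcf_samples` and the demo data are hypothetical, and it assumes an uncompressed VCF whose `#CHROM` header line has the nine fixed columns followed by one column per sample.

```shell
# Count sample columns in a VCF: the #CHROM header line has 9 fixed
# columns (CHROM..FORMAT) followed by one column per sample.
count_vcf_samples() {
    awk -F'\t' '/^#CHROM/ { n = NF - 9; print (n < 0 ? 0 : n); exit }' "$1"
}

# Tiny illustrative VCF (hypothetical data) so the sketch runs on its own.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n' >> "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT\t0/1\n' >> "$demo_vcf"

n_samples=$(count_vcf_samples "$demo_vcf")
if [ "$n_samples" -eq 1 ]; then
    echo "single-sample VCF: OK for vcfqc"
else
    echo "found $n_samples samples: split the VCF before running vcfqc" >&2
fi
```

For a real run, point `count_vcf_samples` at sample.vcf instead of the temporary demo file.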
OPTIONS
- --in-vcf
  (required) Path to the VCF file (default: None)
- --output-dir
  (required) Output directory to store all images (default: None)
- --in-bamqc-dir
  Input directory containing BAMQC images. These should be the output of collectmultiplemetrics (default: None)
- --caller
  The variant caller tag for the input VCF (default: None)
- --quality
  Name of the quality field in the VCF to plot. Generally it is QUAL (default: None)
- --mapq
  Name of the mapping quality field in the VCF to plot. Generally it is MQ (default: None)
- --depth
  Name of the depth field in the VCF to plot. Generally it is DP (default: None)
- --allele-depth
  Name of the allele depth field in the VCF to plot. Generally it is AD (default: None)
- --vaf
  Name of the variant allele frequency field in the VCF to plot. Generally it is VAF (default: None)
- --window-size
  Window size for the vcfqc tool (default: 1000)
- --image-format
  Image format for saved plots [png] (default: png)
- --threads
  Number of processing threads for VCF reading (default: 4)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Generate a summary file using samtools mpileup that can be used for plotting/report generation. You need to pass “--extra-tools” to the installer to use this tool.
QUICK START
$ pbrun vcfqcbybam --in-vcf sample.vcf \
--in-bam sample.bam \
--out-file output_pileup.txt \
--output-dir sample_qc
OPTIONS
- --ref
  Path to the reference file (default: None)
- --in-vcf
  (required) Path to the VCF file to be QC’ed (default: None)
- --in-bam
  (required) Path to the BAM file. Path can be a Google Cloud Storage object or AWS S3 Storage object. This option can be used multiple times (default: None)
- --out-file
  (required) Path of output text pileup (default: None)
- --output-dir
  (required) Path to the directory that will contain all of the generated files (default: None)
- --interval-file
  Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)
- --num-threads
  Number of threads for worker (default: 12)
- --min-mapq MIN_MAPQ
  Skip alignments with mapping quality smaller than this value (default: 0)
- --enable-baq
  Enable BAQ (per-Base Alignment Quality) (default: None)
- --interval (-L)
  Interval within which to call the variants from the BAM file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. “-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000” (default: None)
- --anomalous-reads
  Do not discard anomalous read pairs (default: None)
- --window-size
  Size of output plot window (default: 1000)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
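When the regions of interest are only known as "chr:start-end" strings, they have to be written as BED lines before they can go through --interval-file. The sketch below shows one way to do the conversion. It rests on an assumption that this page does not state: that -L style regions are 1-based and inclusive, as in GATK and samtools, while BED uses a 0-based start and an exclusive end. The helper name `region_to_bed` is hypothetical, and it handles only the full "chr:start-end" form (not "chr1", "chr2:10000", or "chr3:20000+") and contig names without colons.

```shell
# Convert "chr:start-end" (assumed 1-based, inclusive) to a BED line
# (0-based start, end-exclusive): only the start shifts by one.
region_to_bed() {
    chrom=${1%%:*}      # text before the first colon
    range=${1#*:}       # text after the first colon
    start=${range%-*}   # text before the dash
    end=${range#*-}     # text after the dash
    printf '%s\t%s\t%s\n' "$chrom" "$((start - 1))" "$end"
}

region_to_bed "chr4:10000-20000"
```

Redirecting the output to a .bed file gives something that can be passed with --interval-file.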
Run vote-based VCF merging on two or more VCF files to generate a merged VCF.
QUICK START
$ pbrun vbvm --in-vcf deepvariant:deepvariant.vcf(.gz) \
--in-vcf haplotypecaller:haplotypecaller.vcf(.gz) \
--min-votes 2 \
--out-vcf merged_2votes.vcf
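To build intuition for what --min-votes does, the voting idea can be sketched with awk. This is an illustration only and not how vbvm is implemented: it assumes a variant is identified by CHROM, POS, REF and ALT, that each input file casts at most one vote per variant, and that the inputs are uncompressed. The helper name `count_votes` and the two demo call sets are hypothetical.

```shell
# Hypothetical inputs: two tiny call sets written to temp files.
a=$(mktemp); b=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\n' > "$a"
printf 'chr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\n' >> "$a"
printf '#CHROM\tPOS\tID\tREF\tALT\n' > "$b"
printf 'chr1\t100\t.\tA\tG\nchr2\t300\t.\tG\tA\n' >> "$b"

# Usage: count_votes MIN_VOTES file1.vcf file2.vcf ...
# Each file casts at most one vote per CHROM:POS:REF:ALT key; keys with
# at least MIN_VOTES votes are printed together with their vote count.
count_votes() {
    min_votes=$1; shift
    awk -F'\t' -v min="$min_votes" '
        /^#/ { next }
        {
            key = $1 ":" $2 ":" $4 ":" $5
            if (!((FILENAME, key) in seen)) { seen[FILENAME, key] = 1; votes[key]++ }
        }
        END { for (k in votes) if (votes[k] >= min) print k, votes[k] }
    ' "$@" | sort
}

count_votes 2 "$a" "$b"   # prints: chr1:100:A:G 2
```

With a threshold of 1 the same call lists all three variants, which mirrors how lowering --min-votes makes the merge more permissive.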
OPTIONS
- --in-vcf
  (required) A tag and VCF in the format <tag>:<vcf-file> where tag can be the name of the variant caller. The VCF file must be an absolute path. This option can be used multiple times, but at least two input VCFs are required (default: None)
- --out-vcf
  (required) Path for output VCF file (default: None)
- --min-votes
  Minimum number of votes to consider for filtering the VCF (default: None)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Annotate variants based on a variant database
QUICK START
$ pbrun dbsnp --in-vcf sample.vcf \
--out-vcf output.vcf \
--in-dbsnp-file database.vcf.gz
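The database passed with --in-dbsnp-file has to be a vcf.gz file with a tabix index. A quick pre-flight check along these lines may save a failed run. This is a sketch under assumptions: the helper name `has_tabix_index` is hypothetical, and it assumes the index sits next to the database using tabix's usual suffixes (.tbi by default, .csi as the alternative).

```shell
# Return success when FILE ends in .vcf.gz and a tabix index sits next to it.
# .tbi is tabix's default index suffix; .csi is its alternative format.
has_tabix_index() {
    case "$1" in
        *.vcf.gz) ;;
        *) return 1 ;;
    esac
    [ -f "$1.tbi" ] || [ -f "$1.csi" ]
}

db=database.vcf.gz   # same placeholder name as in the quick start
if has_tabix_index "$db"; then
    echo "$db: index found"
else
    echo "$db: missing index; with htslib installed, 'tabix -p vcf $db' creates one" >&2
fi
```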
OPTIONS
- --in-vcf
  Path to the input VCF file (default: None)
- --in-dbsnp-file
  Path to the input dbSNP file in vcf.gz format with its tabix index (default: None)
- --out-vcf
  Output annotated VCF file (default: None)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
GPU-accelerated CNNScoreVariants
Generate variant scores using a Convolutional Neural Network.
QUICK START
$ pbrun cnnscorevariants --ref Ref.fa \
--in-bam sample.bam \
--in-vcf sample.vcf \
--out-vcf output.vcf
COMPATIBLE GATK4 COMMAND
gatk CNNScoreVariants -R Ref.fa \
-I sample.bam \
-V sample.vcf \
-O output.vcf \
--tensor-type read_tensor
POST-ANALYSIS FILTERING
CNNScoreVariants adds an INFO field called CNN_2D to each variant. This field can be used to create filters for each variant by running the GATK4 tool FilterVariantTranches on the CNNScoreVariants output.
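Before choosing filtering thresholds it can be useful to look at the CNN_2D scores themselves. The sketch below pulls them out of the INFO column with awk. It is an inspection aid only, not a replacement for FilterVariantTranches; the helper name `list_cnn_scores` and the two demo records are hypothetical, and the input is assumed to be an uncompressed VCF.

```shell
# Print CHROM, POS and the CNN_2D score for every record that carries one.
list_cnn_scores() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, info, ";")          # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^CNN_2D=/) {
                    sub(/^CNN_2D=/, "", info[i])
                    print $1, $2, info[i]
                }
        }
    ' "$1"
}

# Hypothetical two-record VCF so the sketch runs without real data.
scored=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$scored"
printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=30;CNN_2D=3.512\n' >> "$scored"
printf 'chr1\t200\t.\tC\tT\t20\t.\tCNN_2D=-7.250;DP=8\n' >> "$scored"

list_cnn_scores "$scored"
```

On real data the same function would be pointed at output.vcf from the quick start.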
OPTIONS
- --ref
  (required) Path to the reference file.
- --in-bam
  (required) Path to the input BAM/CRAM file.
- --in-vcf
  (required) Path to the input VCF file.
- --out-vcf
  (required) Path to the output VCF file.
- --pb-model-file
  Path of a non-default parabricks model file for cnnscorevariants.

- --num-gpus NUM_GPUS
  Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using flexera, please include --gpu-devices too.
- --gpu-devices GPU_DEVICES
  Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Accelerated variant filtration based on conditions
Filter a VCF using a boolean expression.
QUICK START
$ pbrun variantfiltration --in-vcf sample.vcf \
--out-file output.vcf \
--expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
COMPATIBLE GATK4 COMMAND
gatk VariantFiltration -V sample.vcf \
-O output.vcf \
--filter-expression "QD < 2.0 || ReadPosRankSum < -20.0" \
--filter-name FILTER
OPTIONS
- --in-vcf
  (required) Path to the input VCF file.
- --out-file
  (required) Path to the output variants file with an extension of either ‘.vcf’ or ‘.csv’.
- --expression
  (required) Boolean expression for filtering variants.
- --filter-name
  (required) Field value for variants that pass the filter expression.
- --mode
  Type of variants to include in the filter. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Filter variants within a VCF file by numeric fields containing frequency/count information.
QUICK START
$ pbrun frequencyfiltration --in-vcf input.vcf \
--out-vcf output.vcf \
--or-expression "gnomad_AF <= 0.02" \
--or-expression "dbsnpCOMMON != 1"
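The two --or-expression arguments above mean that a variant is kept as soon as either test holds. The sketch below spells out that OR logic for the same two expressions on a few made-up INFO strings. It illustrates the semantics only and is not the tool's implementation: the helper names `info_value` and `passes_or_filters` are hypothetical, and treating a missing field as a failed test is an assumption (the tool's own handling of missing fields is governed by --drop-missing).

```shell
# Look up KEY in a semicolon-separated INFO string; prints nothing if absent.
info_value() {
    printf '%s\n' "$1" | tr ';' '\n' | awk -F'=' -v k="$2" '$1 == k { print $2; exit }'
}

# OR semantics for the two quick-start expressions: one true test is enough.
passes_or_filters() {
    af=$(info_value "$1" gnomad_AF)
    common=$(info_value "$1" dbsnpCOMMON)
    # Missing fields simply fail their own test here (an assumption).
    if [ -n "$af" ] && awk -v v="$af" 'BEGIN { exit !(v + 0 <= 0.02) }'; then
        return 0        # gnomad_AF <= 0.02
    fi
    if [ -n "$common" ] && [ "$common" != "1" ]; then
        return 0        # dbsnpCOMMON != 1
    fi
    return 1
}

for info in "gnomad_AF=0.001;dbsnpCOMMON=1" \
            "gnomad_AF=0.30;dbsnpCOMMON=0" \
            "gnomad_AF=0.30;dbsnpCOMMON=1"; do
    if passes_or_filters "$info"; then echo "keep  $info"; else echo "drop  $info"; fi
done
```

The first record is kept for its low frequency, the second because it is not flagged as common, and the third fails both tests. With --and-expression every listed test would have to hold instead.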
OPTIONS
- --in-vcf
  (required) Path to the input VCF file to filter (default: None)
- --out-vcf
  (required) Path to the output filtered VCF file (default: None)
- --excluded-vcf
  A path to write variants which fail filtration (default: None)
- --and-expression
  A string of the form “VARIABLE OPERATOR THRESHOLD” to use for filtering (e.g. “AF < 0.02”). A variant must pass all AND expressions to pass filtering (default: None)
- --or-expression
  A string of the form “VARIABLE OPERATOR THRESHOLD” to use for filtering (e.g. “AF < 0.02”). A variant need only pass a single OR expression to pass filtering (default: None)
- --drop-missing
  Drop variants that are missing any fields used in filtering expressions from output (default: None)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Annotate a VCF using dbsnp and annotation files (Original VCFANNO Project)
QUICK START
$ pbrun vcfanno --in-vcf input.vcf \
--out-vcf output.vcf \
--annotations database.vcf.gz \
--dbsnp dbsnp.vcf.gz
OPTIONS
- --in-vcf
  (required) Path to the input VCF file to annotate (default: None)
- --out-vcf
  (required) Path to the output annotated VCF file (default: None)
- --annotations
  A prefix and VCF in the format <prefix:/absolute/path/anno.vcf.gz>. INFO fields from <anno.vcf.gz> will be added to the input VCF. This option can be used multiple times and is required if --dbsnp is not used. Annotation VCFs must be bgzipped and tabix indexed. At least one of the dbsnp or annotation arguments is required (default: None)
- --dbsnp
  dbSNP file(s) used to annotate the input VCF, passed in the same way as --annotations (a prefix and VCF in the format <prefix:/absolute/path/dbsnp.vcf.gz>). This option can be used multiple times and is required if --annotations is not used. Must be passed separately as special handling is performed to fix errors in the dbSNP VCF (default: None)

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Accelerated variant filtration using VQSR
Build a recalibration model to score variant quality and apply a score cutoff to filter variants.
QUICK START
$ pbrun vqsr --in-vcf sample.vcf \
--out-vcf output.vcf \
--out-recal output.recal \
--out-tranches output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
--annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum
COMPATIBLE GATK4 COMMAND
gatk VariantRecalibrator -V sample.vcf \
-O output.recal \
--tranches-file output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum \
--mode BOTH
gatk ApplyVQSR -V sample.vcf \
--recal-file output.recal \
--tranches-file output.tranches \
-O output.vcf \
--mode BOTH
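The --resource string in the commands above is easy to mistype, so a small check before a long run may be worthwhile. The sketch below validates the shape <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path> and confirms that the resources together provide at least one training set and one truth set. It is not part of Parabricks: the helper names are hypothetical, and the strict key order and lowercase true/false spelling are assumptions taken from the example rather than documented rules.

```shell
# Succeeds when the argument matches
# <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path>
is_valid_resource() {
    printf '%s\n' "$1" | grep -Eq \
        '^[^,:]+,known=(true|false),training=(true|false),truth=(true|false),prior=[0-9]+(\.[0-9]+)?:.+$'
}

# Succeeds when the given resources include at least one training=true
# and at least one truth=true entry (a single resource may supply both).
has_training_and_truth() {
    training=0; truth=0
    for r in "$@"; do
        is_valid_resource "$r" || return 1
        spec=${r%%:*}    # drop the path so only the key=value part is inspected
        case "$spec" in *,training=true,*) training=1 ;; esac
        case "$spec" in *,truth=true,*) truth=1 ;; esac
    done
    [ "$training" -eq 1 ] && [ "$truth" -eq 1 ]
}

omni='omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf'
if has_training_and_truth "$omni"; then
    echo "resource set looks usable"
fi
```

Several resources can be checked at once by passing each string as a separate argument.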
OPTIONS
- --in-vcf
  (required) Path to the input VCF file.
- --out-vcf
  (required) Path to the output VCF file.
- --out-recal
  (required) Path to the output recal file.
- --out-tranches
  (required) Path to the output tranches file.
- --resource
  (required) Known, truth, and training sets. The format string is <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the VCF file>. There must be at least one resource that is training and one resource that is truth. Any resource can be both. This option can be used multiple times.
- --annotation
  (required) Annotation which should be used for calculations. This option can be used multiple times.
- --mode
  Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH. Defaults to BOTH.
- --max-gaussians
  Max number of Gaussians for the positive model. Defaults to 8.
- --truth-sensitivity-level
  The truth sensitivity level at which to start filtering.
- --lod-score-cutoff
  The VQSLOD score below which to start filtering.

- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --no-seccomp-override
  Do not override seccomp options for docker.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.