snpswift
snpswift annotates variants in a VCF file with VCF or GTF databases.
$ pbrun snpswift \
--input-vcf input_to_be_annotated.vcf \
--anno-vcf prefix1:source_of_annotations_1.vcf.gz \
--anno-vcf prefix2:source_of_annotations_2.vcf.gz \
--output-vcf the_annotated_version.vcf
If the --ensembl option (see below) is used, snpswift requires that the chromosome names be prefixed with 'chr'. If the chromosome names do not already have a 'chr' prefix, it can be added with the following command:
$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' < Homo_sapiens.GRCh38.104.gtf > Homo_sapiens.GRCh38.104.withchr.gtf
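Before rewriting a GTF, it can help to check whether the prefix is actually missing, since prepending 'chr' to a line that already has it would produce names like 'chrchr1'. The following is a minimal sketch that assumes a standard tab-separated GTF; the sample file and its contents are hypothetical stand-ins created only for the demonstration.

```shell
# Build a tiny stand-in GTF (hypothetical content) to demonstrate the check.
sample_gtf=$(mktemp)
printf '#!genome-build GRCh38\n1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\nchr2\tensembl\tgene\t300\t400\t.\t+\t.\tgene_id "G2";\n' > "$sample_gtf"

# Count data lines (non-comment) whose chromosome name lacks the 'chr' prefix.
missing=$(awk '!/^#/ && !/^chr/ {n++} END {print n+0}' "$sample_gtf")
echo "lines without chr prefix: $missing"

# Add the prefix only where it is missing; comment lines and
# already-prefixed lines pass through unchanged.
prefixed=$(awk '/^#/ || /^chr/ {print; next} {print "chr"$0}' "$sample_gtf")

# Show the first column of the result.
echo "$prefixed" | cut -f1

rm -f "$sample_gtf"
```

On a real file, the same two awk programs can be pointed at the GTF path instead of the temporary sample.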
In cases where there are multiple matches for a query variant in a single annotation database, snpswift will annotate with details from the first match found. For example, given this query variant:
chr2 | 15013 | T | G
and these matches in a single database:
chr2 | 15013 | SNP1186 | T | G | . | . | GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
chr2 | 15013 | SNP1186 | T | G | . | . | GENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
snpswift uses the information from the first match to produce this annotated record:
chr2 | 15013 | T | G | GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
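Because only the first match is used, it may be worth checking an annotation database for repeated records before relying on its annotations. The sketch below lists keys that occur more than once. It assumes that a match is defined by chromosome, position, REF and ALT, which the example above suggests but does not state outright. The sample records are hypothetical and shortened.

```shell
# Hypothetical, uncompressed stand-in for an annotation database.
anno=$(mktemp)
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP\nchr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP_ENST00000450\nchr2\t20000\tSNP2\tA\tC\t.\t.\tGENE=OTHER\n' > "$anno"

# Key each record on CHROM:POS:REF:ALT (assumed matching rule) and
# print a key the second time it is seen, so each duplicate is listed once.
dups=$(awk -F'\t' '!/^#/ {key = $1 ":" $2 ":" $4 ":" $5; if (++seen[key] == 2) print key}' "$anno")
echo "$dups"

rm -f "$anno"
```

For a bgzipped database, the same awk program can read from a decompressed stream, for example the output of `gzip -dc`.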
If the input VCF file contains multi-allelic variants, we suggest splitting the multi-allelics with bcftools before using snpswift for optimal annotation coverage:
$ bcftools norm --multiallelics - multiallelic_variants.vcf -o input_to_snpswift.vcf
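To decide whether this splitting step is needed, one option is to count the records whose ALT column lists more than one allele. This is a minimal sketch that relies only on the standard VCF layout, where ALT is the fifth tab-separated column and alleles are comma-separated. The sample file is a hypothetical stand-in.

```shell
# Hypothetical two-record VCF: one multi-allelic site, one bi-allelic site.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tC,G\t.\t.\t.\nchr1\t200\t.\tT\tG\t.\t.\t.\n' > "$vcf"

# Count non-header records whose ALT field contains a comma.
multi=$(awk -F'\t' '!/^#/ && $5 ~ /,/ {n++} END {print n+0}' "$vcf")
echo "multi-allelic records: $multi"

rm -f "$vcf"
```

A count of zero suggests the file can be passed to snpswift as it is. Running the same count on the bcftools output is a quick way to confirm that the split took effect.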
Annotate a VCF file using VCF and GTF databases.
Input/Output file options
- --input-vcf INPUT_VCF
  An input VCF to annotate with VCF and GTF database files. (default: None)
  Option is required.
- --anno-vcf ANNO_VCF
  A prefix and VCF in the format <prefix:/absolute/path/anno.vcf.gz>. INFO fields from <anno.vcf.gz> will be added to the input VCF. This option can be used multiple times. Annotation VCFs must be bgzipped and tabix indexed. (default: None)
- --ensembl ENSEMBL
  A GTF file from ENSEMBL; the Gene Name and Gene ID fields will be added to the input VCF. (default: None)
- --output-vcf OUTPUT_VCF
  Path to the output annotated VCF file. (default: None)
  Option is required.
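The --anno-vcf requirements above (a prefix, a colon, and a bgzipped, tabix-indexed VCF) can be checked before launching a run. The following sketch uses a hypothetical helper, check_anno_arg, that is not part of snpswift. It assumes the tabix index sits next to the VCF with a .tbi suffix, and it only checks the file name suffix rather than verifying the compression format.

```shell
# Hypothetical pre-flight check for one --anno-vcf value of the form prefix:path.
check_anno_arg() {
    arg=$1
    prefix=${arg%%:*}   # text before the first colon
    path=${arg#*:}      # text after the first colon

    # With no colon present, both expansions return the whole argument.
    if [ "$prefix" = "$arg" ] || [ -z "$prefix" ]; then
        echo "missing prefix (expected prefix:path)"; return 1
    fi
    case $path in
        *.gz) ;;
        *) echo "annotation VCF does not end in .gz"; return 1 ;;
    esac
    [ -f "$path" ] || { echo "annotation VCF not found: $path"; return 1; }
    # Assumption: the tabix index is stored as <path>.tbi.
    [ -f "$path.tbi" ] || { echo "tabix index not found: $path.tbi"; return 1; }
    echo "ok"
}

# Demonstration with empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/anno.vcf.gz" "$demo_dir/anno.vcf.gz.tbi"
check_anno_arg "dbsnp:$demo_dir/anno.vcf.gz"
check_anno_arg "anno.vcf.gz" || true
```

If the index is missing, it is normally created with tabix after compressing the VCF with bgzip.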
Options specific to this tool
- --num-threads NUM_THREADS
Number of worker threads to run for vcf annotation. (default: 8)
Common options:
- --logfile LOGFILE
  Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables that are used for data and credentials. (default: None)
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in the installation directory.
- --no-seccomp-override
  Do not override seccomp options for docker (default: None).
- --version
  View compatible software versions.