snpswift - NVIDIA Docs

snpswift annotates variants in a VCF file with VCF or GTF databases.

Quick Start

    $ pbrun snpswift \
        --input-vcf input_to_be_annotated.vcf \
        --anno-vcf prefix1:source_of_annotations_1.vcf.gz \
        --anno-vcf prefix2:source_of_annotations_2.vcf.gz \
        --output-vcf the_annotated_version.vcf
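The --anno-vcf option can be repeated, and each annotation source is passed as prefix:path. This makes it easy to assemble the command from a list of sources. The Bash sketch below is a hypothetical dry-run helper (snpswift_dry_run is not part of Parabricks). It builds the arguments in an array, rejects entries without a prefix, and prints the resulting command instead of running it. The file names are the placeholders from the Quick Start.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: build a pbrun snpswift command line from
# prefix:path annotation sources and print it instead of executing it.
snpswift_dry_run() {
    local input=$1 output=$2
    shift 2
    local cmd=(pbrun snpswift --input-vcf "$input")
    local pair
    for pair in "$@"; do
        # Each source must look like prefix:path; it gets its own --anno-vcf flag.
        if [[ $pair != *:* ]]; then
            echo "expected prefix:path, got '$pair'" >&2
            return 1
        fi
        cmd+=(--anno-vcf "$pair")
    done
    cmd+=(--output-vcf "$output")
    # Print space-joined; replace this line with "${cmd[@]}" to actually run it.
    echo "${cmd[*]}"
}

snpswift_dry_run input_to_be_annotated.vcf the_annotated_version.vcf \
    prefix1:source_of_annotations_1.vcf.gz \
    prefix2:source_of_annotations_2.vcf.gz
```

Keeping the arguments in an array rather than a single string preserves paths that contain spaces when the command is eventually executed.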

If the --ensembl option (see below) is used, snpswift requires the chromosome names in the Ensembl GTF file to be prefixed with 'chr'. If the chromosome names do not already have a 'chr' prefix, it can be added with the following command:

    $ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' < Homo_sapiens.GRCh38.104.gtf > Homo_sapiens.GRCh38.104.withchr.gtf
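The awk command above prefixes every non-header line. A file that already contains some 'chr' names would therefore end up with 'chrchr...' entries. The sketch below is a slightly more defensive variant. It assumes a tab-separated GTF with the chromosome (seqname) in column 1, as the GTF format specifies, and adds the prefix only where it is missing. The function name add_chr_prefix and the tiny input file are made up for illustration.

```shell
#!/usr/bin/env bash
# Prefix column 1 (seqname) of a tab-separated GTF with "chr" only where it is
# missing; header lines starting with '#' and empty lines pass through unchanged.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         !/^#/ && NF && $1 !~ /^chr/ { $1 = "chr" $1 }
         { print }' "$1"
}

# Made-up two-record GTF: one unprefixed and one already-prefixed seqname.
demo_gtf=$(mktemp)
trap 'rm -f "$demo_gtf"' EXIT
printf '#!genome-build GRCh38\n1\tensembl\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1";\nchr2\tensembl\tgene\t100\t200\t.\t-\t.\tgene_id "G2";\n' > "$demo_gtf"
add_chr_prefix "$demo_gtf"
```

To apply it to the Ensembl file, run: add_chr_prefix Homo_sapiens.GRCh38.104.gtf > Homo_sapiens.GRCh38.104.withchr.gtf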

When a query variant has multiple matches in a single annotation database, snpswift annotates it with the details from the first match found. For example, given this query variant:

CHROM	POS	REF	ALT
chr2	15013	T	G

and these matches in a single database:

chr2	15013	SNP1186	T	G	.	.	GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
chr2	15013	SNP1186	T	G	.	.	GENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2

snpswift uses the information from the first match to produce this annotation:

CHROM	POS	REF	ALT	INFO
chr2	15013	T	G	GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
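The first-match rule can be reproduced with awk to check which database record a given query would pick up. This sketch only illustrates the behavior described above and is not how snpswift is implemented. It assumes an uncompressed, tab-separated VCF and matches on CHROM, POS, REF and ALT. The helper name first_match_info is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: print the INFO column (field 8) of the FIRST record in an
# uncompressed VCF that matches CHROM, POS, REF and ALT, then stop reading.
first_match_info() {
    awk -v c="$2" -v p="$3" -v r="$4" -v a="$5" '
        BEGIN { FS = "\t" }
        !/^#/ && $1 == c && $2 == p && $4 == r && $5 == a { print $8; exit }
    ' "$1"
}

# The two database records from the example above, in file order.
db=$(mktemp)
trap 'rm -f "$db"' EXIT
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' > "$db"
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' >> "$db"
first_match_info "$db" chr2 15013 T G   # prints the INFO of the first record only
```

The call prints GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2, which matches the annotation shown above. The second record is never reached.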

If the input VCF file contains multi-allelic variants, we suggest splitting them into biallelic records with bcftools before running snpswift, for optimal annotation coverage:

    $ bcftools norm --multiallelics - multiallelic_variants.vcf -o input_to_snpswift.vcf
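To decide whether this step is needed, count the records whose ALT column lists more than one allele. Under the VCF specification, multiple ALT alleles are comma-separated. The sketch below assumes an uncompressed VCF and reads from a file argument or from standard input. For a bgzipped file, decompress it on the fly, for example with gzip -dc input.vcf.gz | count_multiallelic. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Count data records whose ALT column (field 5) contains a comma,
# i.e. multi-allelic sites. Reads the file given as $1, or stdin if none.
count_multiallelic() {
    awk 'BEGIN { FS = "\t" }
         !/^#/ && $5 ~ /,/ { n++ }
         END { print n + 0 }' "${1:--}"
}

# Made-up VCF with one biallelic and one multi-allelic record.
vcf=$(mktemp)
trap 'rm -f "$vcf"' EXIT
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> "$vcf"
printf 'chr1\t200\t.\tC\tT,G\t.\t.\t.\n' >> "$vcf"
count_multiallelic "$vcf"   # prints 1
```

A non-zero count means the bcftools norm step above is worth running before snpswift.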

snpswift Reference

Annotate a VCF file using VCF and GTF databases.

Input/Output file options

--input-vcf INPUT_VCF
--anno-vcf ANNO_VCF
--ensembl ENSEMBL
--output-vcf OUTPUT_VCF
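snpswift requires 'chr'-prefixed chromosome names in the GTF passed to --ensembl, so a quick pre-flight check can catch an unconverted file before a run. The sketch below assumes an uncompressed, tab-separated GTF and inspects only column 1. The function name check_chr_prefix and the demo file are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if every non-header, non-empty record in a tab-separated GTF has a
# seqname (column 1) starting with "chr"; otherwise report the first offender
# on stderr and return 1.
check_chr_prefix() {
    awk 'BEGIN { FS = "\t" }
         !/^#/ && NF && $1 !~ /^chr/ {
             printf "line %d: seqname %s has no chr prefix\n", NR, $1 > "/dev/stderr"
             bad = 1
             exit
         }
         END { exit bad }' "$1"
}

# Made-up GTF with one unprefixed seqname on line 3.
demo_gtf=$(mktemp)
trap 'rm -f "$demo_gtf"' EXIT
printf '#!genome-build GRCh38\nchr1\tensembl\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1";\n2\tensembl\tgene\t100\t200\t.\t-\t.\tgene_id "G2";\n' > "$demo_gtf"
if check_chr_prefix "$demo_gtf"; then
    echo "ready for --ensembl"
else
    echo "add the chr prefix first"
fi
```

The check stops at the first offending line, so it stays fast on a full Ensembl GTF that was never converted.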

Options specific to this tool

--num-threads NUM_THREADS

Common options

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--no-seccomp-override
--version