snpswift

snpswift annotates variants in a VCF file with VCF or GTF databases.

$ pbrun snpswift \
    --input-vcf input_to_be_annotated.vcf \
    --anno-vcf prefix1:source_of_annotations_1.vcf.gz \
    --anno-vcf prefix2:source_of_annotations_2.vcf.gz \
    --output-vcf the_annotated_version.vcf

If the --ensembl option (see below) is used, snpswift requires that the chromosome names be prefixed with 'chr'. If the chromosome names do not already have a 'chr' prefix, it can be added with the following command:

$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' < Homo_sapiens.GRCh38.104.gtf > Homo_sapiens.GRCh38.104.withchr.gtf
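
The command above prepends 'chr' to every non-comment line, so running it on a file that already has the prefix would produce names such as 'chrchr1'. The following is a minimal sketch of a check plus a variant that adds the prefix only where it is missing. The file names, the sample records, and the count_unprefixed helper are hypothetical and exist only for illustration.

```shell
# Hypothetical GTF content written to a temporary directory, for illustration only.
workdir=$(mktemp -d)
printf '%s\n' '#!genome-build GRCh38' > "$workdir/demo.gtf"
printf '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1";\n' >> "$workdir/demo.gtf"
printf 'chr2\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G2";\n' >> "$workdir/demo.gtf"

# Count data records (non-comment lines) whose chromosome name lacks "chr".
count_unprefixed() {
    awk -F'\t' '!/^#/ && $1 !~ /^chr/ {n++} END {print n+0}' "$1"
}

# Add "chr" only where it is missing, so re-running the step is harmless.
awk 'BEGIN {FS = OFS = "\t"} !/^#/ && $1 !~ /^chr/ {$1 = "chr" $1} {print}' \
    "$workdir/demo.gtf" > "$workdir/demo.withchr.gtf"

echo "before: $(count_unprefixed "$workdir/demo.gtf"), after: $(count_unprefixed "$workdir/demo.withchr.gtf")"
```

Comment lines (those starting with '#') pass through unchanged, as in the original command.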

In cases where there are multiple matches for a query variant in a single annotation database, snpswift will annotate with details from the first match found. For example, given this query variant:

CHROM  POS    REF  ALT
chr2   15013  T    G

and these matches in a single database:

CHROM  POS    ID       REF  ALT  QUAL  FILTER  INFO
chr2   15013  SNP1186  T    G    .     .       GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
chr2   15013  SNP1186  T    G    .     .       GENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2

snpswift uses the information from the first match and produces this annotated record:

CHROM  POS    REF  ALT  INFO
chr2   15013  T    G    GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
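
The first-match rule can be pictured as keeping only the first database record seen for each CHROM/POS/REF/ALT combination. The sketch below mimics that behaviour with awk on the two example records. It illustrates the rule as described here and is not snpswift's actual implementation; the temporary file and variable names are hypothetical.

```shell
# Write the two example database records to a temporary tab-separated file.
db=$(mktemp)
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' > "$db"
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' >> "$db"

# Keep only the first record per CHROM/POS/REF/ALT key and print
# the query columns followed by that record's INFO field.
first_match=$(awk 'BEGIN {FS = OFS = "\t"}
    !/^#/ && !seen[$1, $2, $4, $5]++ {print $1, $2, $4, $5, $8}' "$db")
echo "$first_match"
```

Only the GENE=EXPSNP record survives, because the second record shares the same key and is skipped.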

If the input VCF file contains multi-allelic variants, we suggest splitting them into bi-allelic records with bcftools before running snpswift, for optimal annotation coverage:

$ bcftools norm --multiallelics - multiallelic_variants.vcf -o input_to_snpswift.vcf
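
To see whether splitting is needed at all, one option is to count the records whose ALT column lists more than one allele (alleles are comma-separated in VCF). A small sketch, using a hypothetical three-record VCF written to a temporary file:

```shell
# Hypothetical VCF content, for illustration only.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr1\t100\t.\tA\tC\t.\t.\t.\n'     >> "$vcf"
printf 'chr1\t200\t.\tG\tA,T\t.\t.\t.\n'   >> "$vcf"
printf 'chr2\t300\t.\tT\tC,G,A\t.\t.\t.\n' >> "$vcf"

# A record is multi-allelic when ALT (column 5) contains a comma.
multi=$(awk -F'\t' '!/^#/ && $5 ~ /,/ {n++} END {print n+0}' "$vcf")
echo "multi-allelic records: $multi"
```

Running the same count on the bcftools output should report zero if every multi-allelic site was split.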

Annotate a VCF file using VCF and GTF databases.

Input/Output file options

--input-vcf INPUT_VCF

An input VCF to annotate with VCF and GTF database files. (default: None)

Option is required.

--anno-vcf ANNO_VCF

A prefix and VCF in the format <prefix:/absolute/path/anno.vcf.gz>. INFO fields from <anno.vcf.gz> will be added to the input VCF. This option can be used multiple times. Annotation VCFs must be bgzipped and tabix indexed. (default: None)

--ensembl ENSEMBL

A GTF file from ENSEMBL; the Gene Name and Gene ID fields will be added to the input VCF. (default: None)

--output-vcf OUTPUT_VCF

Path to the output annotated VCF file. (default: None)

Option is required.
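
Because each --anno-vcf value must name a bgzipped, tabix-indexed VCF, a quick pre-flight check can catch a missing index before a run starts. The sketch below is a hypothetical helper (check_anno_vcf is not part of Parabricks). It assumes the index sits next to the file as <file>.tbi or <file>.csi, and it only inspects the file name and the presence of the index, not the actual compression format.

```shell
# Annotation files are typically prepared beforehand with:
#   bgzip anno.vcf && tabix -p vcf anno.vcf.gz

# Hypothetical helper: split a "prefix:path" value and check its prerequisites.
check_anno_vcf() {
    spec=$1
    prefix=${spec%%:*}      # text before the first colon
    path=${spec#*:}         # text after the first colon
    if [ "$prefix" = "$spec" ] || [ -z "$prefix" ] || [ -z "$path" ]; then
        echo "expected prefix:path, got '$spec'" >&2
        return 1
    fi
    case $path in
        *.gz) ;;
        *) echo "$path does not end in .gz (is it bgzipped?)" >&2; return 1 ;;
    esac
    [ -f "$path" ] || { echo "$path not found" >&2; return 1; }
    # Look for a tabix index alongside the file.
    [ -f "$path.tbi" ] || [ -f "$path.csi" ] || { echo "no index for $path" >&2; return 1; }
    echo "ok: prefix=$prefix file=$path"
}
```

A call such as check_anno_vcf prefix1:/data/anno.vcf.gz (a hypothetical path) prints an "ok" line on success and returns a non-zero status otherwise.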

Options specific to this tool

--num-threads NUM_THREADS

Number of worker threads to run for VCF annotation. (default: 8)

Common options

--logfile LOGFILE

Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally, set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables, which are used for data and credentials. (default: None)

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in the installation directory.

--no-seccomp-override

Do not override seccomp options for Docker. (default: None)

--version

View compatible software versions.

© Copyright 2022, Nvidia. Last updated on Jun 28, 2023.