NVIDIA Clara Parabricks v3.8.0

snpswift

snpswift annotates variants in a VCF file using VCF, GTF, or TSV annotation databases.

Quick Start

$ pbrun snpswift \
    --input-vcf input_to_be_annotated.vcf \
    --anno-vcf prefix1:source_of_annotations_1.vcf.gz \
    --anno-vcf prefix2:source_of_annotations_2.vcf.gz \
    --output-vcf the_annotated_version.vcf

If the --ensembl option (see below) is used, snpswift requires the chromosome names in the GTF file to be prefixed with 'chr'. If the chromosome names do not already have a 'chr' prefix, it can be added with the following command:

$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' < Homo_sapiens.GRCh38.104.gtf > Homo_sapiens.GRCh38.104.withchr.gtf
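
The command above prefixes every non-header line, so running it on a file that is already prefixed would produce names such as 'chrchr1'. The following is a minimal sketch of a more defensive variant, not part of snpswift itself; the helper name add_chr_prefix, the file names, and the GTF fragment are made up for illustration.

```shell
# Hypothetical helper: prefix records with "chr" unless they are header
# lines (starting with '#') or already start with "chr".
add_chr_prefix() {
    awk '/^#/ || /^chr/ { print; next } { print "chr" $0 }' "$1"
}

# Made-up GTF fragment: one header line, one unprefixed record, one prefixed record.
printf '#!genome-build GRCh38\n' > sample.gtf
printf '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "GENE_A";\n' >> sample.gtf
printf 'chr2\thavana\tgene\t38814\t46870\t.\t-\t.\tgene_id "GENE_B";\n' >> sample.gtf

add_chr_prefix sample.gtf > sample.withchr.gtf

# Show the chromosome column of the records after the rewrite.
grep -v '^#' sample.withchr.gtf | cut -f1
```

With this variant, header lines and already-prefixed records pass through unchanged, so it can be applied to a file more than once without changing the result.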

In cases where there are multiple matches for a query variant in a single annotation database, snpswift will annotate with details from the first match found. For example, given this query variant:

chr2	15013

and these matches in a single database:

chr2	15013	SNP1186	T	G	.	.	GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
chr2	15013	SNP1186	T	G	.	.	GENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2

snpswift uses the information from the first match to produce this annotation:

chr2	15013	GENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2
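
To preview which record would win under a first-match rule, the duplicates in a database can be collapsed with awk. This is only an illustrative sketch, not snpswift's implementation: it assumes that a "match" is keyed on chromosome, position, ref and alt, which the text above does not state, and the file name db.tsv is hypothetical.

```shell
# The two database records from the example above, written to a scratch file.
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' > db.tsv
printf 'chr2\t15013\tSNP1186\tT\tG\t.\t.\tGENE=EXPSNP_ENST00000450;STRAND=+;LEGACY_ID=EXID2045;SNP;CNT=2\n' >> db.tsv

# Assumed key: (chrom, pos, ref, alt). Keep the first record seen for each key
# and print its chromosome, position and INFO column.
first_match=$(awk -F'\t' '!seen[$1,$2,$4,$5]++ { print $1 "\t" $2 "\t" $8 }' db.tsv)
printf '%s\n' "$first_match"
```

Only the first of the two records survives, so the GENE=EXPSNP_ENST00000450 entry is dropped, mirroring the behaviour described above.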

If the input VCF file contains multi-allelic variants, we suggest splitting them with bcftools before running snpswift for optimal annotation coverage:

$ bcftools norm --multiallelics - multiallelic_variants.vcf -o input_to_snpswift.vcf
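
Before splitting, it can be useful to check whether a VCF contains any multi-allelic records at all. The sketch below counts records whose ALT column lists more than one allele, using only awk; the file demo.vcf and its contents are made up for illustration.

```shell
# Hypothetical VCF with three records; the second one is multi-allelic (ALT = T,G).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT,G\t.\t.\t.\n' >> demo.vcf
printf 'chr2\t300\t.\tG\tA\t.\t.\t.\n' >> demo.vcf

# Count non-header records whose ALT column (field 5) contains a comma.
n_multi=$(awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' demo.vcf)
echo "multi-allelic records: $n_multi"
```

A count of zero suggests the splitting step can be skipped for that file.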

A TSV file passed with --anno-tsv must have column headers, and the first four columns must contain chromosome, position, ref and alt, in that order. The name of the first column in the header line must start with a '#', for example '#Chrom position ref alt'.
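
The header requirements can be checked before running snpswift. The following is a minimal sketch under stated assumptions: the helper name check_tsv_header, the file names, and the fifth column (GENE) are hypothetical, and the check covers only the leading '#' and the minimum column count, not the column contents.

```shell
# Hypothetical annotation TSV: tab-separated, header first, then one record.
printf '#Chrom\tposition\tref\talt\tGENE\n' > anno.tsv
printf 'chr2\t15013\tT\tG\tEXPSNP\n' >> anno.tsv

# Hypothetical helper: succeed only if the first line starts with '#'
# and has at least four tab-separated columns.
check_tsv_header() {
    awk -F'\t' 'NR == 1 { ok = ($1 ~ /^#/ && NF >= 4); exit } END { exit !ok }' "$1"
}

if check_tsv_header anno.tsv; then
    echo "header ok"
else
    echo "header invalid"
fi
```

An empty file, or a header whose first column name lacks the leading '#', makes the helper return a non-zero status.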

snpswift Reference

Annotate a VCF file using VCF/GTF/TSV databases.

Input/Output file options:

--input-vcf INPUT_VCF
--anno-vcf ANNO_VCF
--anno-tsv ANNO_TSV
--ensembl ENSEMBL
--output-vcf OUTPUT_VCF

Tool Options:

--num-threads NUM_THREADS

Common options:

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--no-seccomp-override
--version