Software Overview

Parabricks is a software suite for genomic analysis. It delivers major throughput improvements for common analytical tasks in genomics, including germline and somatic analysis. At its core is a data pipeline that takes raw data and transforms it according to the user's requirements.

Parabricks supports the tools shown below:

_images/pb_tools_v37.png

Parabricks supports the pipelines shown below:

_images/pb_pipelines2.png

The Parabricks software can be configured to run specific accelerated tools or to run complete, commonly used pipelines. The standalone tools page covers individual tools, and the pipelines page discusses how to run commonly used pipelines.

NVIDIA Parabricks pipelines have been tested on Dell, HPE, IBM, and NVIDIA servers, as well as on Amazon Web Services, Google Cloud, Oracle Cloud Infrastructure, and Microsoft Azure.

Software Tools

The following standalone tools can be used with the NVIDIA Clara Parabricks Pipelines software. Click on a tool name for tool-specific options.

Standalone Tools Overview

Tool                    Details
annotatebamwithumis     Annotate existing BAM files with UMIs (Unique Molecular Indices) from a separate FASTQ file
applybqsr               Apply a BQSR report to a BAM file and generate a new BAM file
arriba                  Detect gene fusions from RNA-Seq data
bam2fq                  Convert a BAM file to FASTQ
bammetrics              Collect WGS metrics on a BAM file
bamsort                 Sort a BAM file
bcftoolscall            Call variants from mpileup output
bcftoolscsq             Predict consequences for genomic variants
bcftoolsmpileup         Generate BCF/VCF pileup for one or multiple BAM files
bqsr                    Collect a BQSR report on a BAM file
cnnscorevariants        Generate variant scores using a convolutional neural network
cnvkit                  Run CNVkit with accelerated coverage calculation from read depths
collectmultiplemetrics  Collect multiple classes of metrics on a BAM file
consensusreads          Call consensus sequences from reads with the same unique molecular tag
dbsnp                   Annotate variants based on a dbSNP file
deeptrio                Run GPU-DeepTrio for calling de novo variants
deepvariant             Run GPU-DeepVariant for calling germline variants
demuxfastqs             Perform sample demultiplexing on FASTQs
duplexconsensusreads    Call consensus sequences from reads with the same double-stranded source molecule
expansionhunter         Estimate large repeats in a BAM file
fq2bam                  Run bwa mem, coordinate sorting, duplicate marking, and Base Quality Score Recalibration
fq2ubam                 Convert FASTQs to an unaligned BAM file
frequencyfiltration     Filter a VCF by allele frequency or allele count
genotypegvcf            Convert a GVCF to VCF
glnexus                 Merge and joint-call input gVCF files, emitting a multi-sample BCF
groupreadsbyumi         Group reads that appear to have come from the same original molecule
haplotypecaller         Run GPU-HaplotypeCaller for calling germline variants
indexgvcf               Index a GVCF file
kallisto                Quantify abundances of transcripts from bulk and single-cell RNA-Seq data
lofreq                  Call variants with high sensitivity, predicting variants below the average base-call quality
lofreq_call             Call variants from a BAM file
manta                   Analyze germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs
muse                    Call somatic variants with the accelerated MuSE variant caller
mutectcaller            Run GPU-Mutect2 for tumor-normal analysis
postpon                 Generate the final VCF output for Mutect with a panel of normals (PON)
prepon                  Build an index for a PON file, which is a prerequisite for running Mutect with a PON
rna_fq2bam              Run RNA-seq data through the fq2bam pipeline
samtoolsmpileup         Generate text pileup for one or multiple BAM files
setmateinfo             Add and/or fix mate information on paired-end reads
smoove                  Call and genotype structural variants (SVs) for short reads
snpswift                Annotate variants in a VCF file with VCF or GTF databases
somaticsniper           Identify single nucleotide positions that are different between tumor and normal BAM files
somaticsniper_workflow  Run the somaticsniper variant caller workflow
splitncigar             Split reads in a BAM file that contain Ns in their CIGAR string
starfusion              Identify candidate fusion transcripts supported by Illumina reads
strelka                 Analyze germline variation in small cohorts and somatic variation in tumor/normal sample pairs
strelka_workflow        Run the strelka variant caller workflow
triocombinegvcf         Combine the GVCFs of 2 or 3 samples
umi_fgbio               Run a UMI pipeline, based on the Fulcrum Genomics toolkit, that processes sequencing reads with molecular barcodes (also known as Unique Molecular Indices, UMIs), which enable error correction and increased accuracy at the consensus-read level
variantfiltration       Filter a VCF using a Boolean expression
vcfanno                 Annotate a VCF using dbSNP and annotation files
vcfqc                   Generate QC plots on a VCF file
vcfqcbybam              Generate a summary file using samtoolsmpileup that can be used for plotting and report generation
votebasedvcfmerger      Create union and intersection VCFs based on a minimum number of variant callers supporting a variant
vqsr                    Build a recalibration model to score variant quality and apply a score cutoff to filter variants
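
To make the idea behind the votebasedvcfmerger entry above more concrete, the following Python sketch shows vote-based merging on in-memory variant sets. It is a conceptual illustration only, not the Parabricks implementation: the function name vote_merge, the (chrom, pos, ref, alt) variant key, and the caller names are all hypothetical, and the real tool works on VCF files rather than Python sets.

```python
from collections import Counter


def vote_merge(calls_by_caller, min_votes):
    """Return (union, supported) sets of variant keys.

    calls_by_caller maps a caller name to a set of variant keys; this
    sketch assumes (chrom, pos, ref, alt) tuples as keys.
    `supported` keeps only variants reported by at least `min_votes` callers.
    """
    votes = Counter()
    for variants in calls_by_caller.values():
        votes.update(set(variants))  # each caller votes at most once per variant
    union = set(votes)
    supported = {variant for variant, n in votes.items() if n >= min_votes}
    return union, supported


# Hypothetical calls from three variant callers.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 50, "C", "G")},
}
union, supported = vote_merge(calls, min_votes=2)
print(sorted(union))
print(sorted(supported))
```

Setting min_votes to the number of callers reduces the supported set to a strict intersection, while min_votes=1 makes it equal to the union.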

Pipelines

In Clara Parabricks, each pipeline is a collection of several individual tools that are commonly used together, all wrapped up as a single tool. For example, the deepvariant_germline pipeline takes FASTA and FASTQ files as input and produces a VCF file and a BAM file as output. Internally, it runs BWA mem alignment, performs coordinate sorting, marks duplicates, and then runs DeepVariant.
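
As a rough illustration of the "pipeline as a single tool" idea, the Python sketch below builds (but does not run) a deepvariant_germline command line, along with a two-step sequence of standalone tools that covers roughly the same stages. The pbrun launcher name, the flags (--ref, --in-fq, --in-bam, --out-bam, --out-variants), and all file names are assumptions made for this example; consult the tool-specific options pages for the authoritative arguments.

```python
def deepvariant_germline_cmd(ref, fastq_r1, fastq_r2, out_bam, out_vcf):
    """Build a single-pipeline command: FASTA + FASTQ in, BAM + VCF out.

    Flag names are assumed for illustration, not taken from this page.
    """
    return [
        "pbrun", "deepvariant_germline",
        "--ref", ref,
        "--in-fq", fastq_r1, fastq_r2,
        "--out-bam", out_bam,
        "--out-variants", out_vcf,
    ]


def standalone_tool_cmds(ref, fastq_r1, fastq_r2, out_bam, out_vcf):
    """Build two standalone-tool commands covering similar stages.

    Step 1 (fq2bam): alignment, coordinate sorting, duplicate marking.
    Step 2 (deepvariant): variant calling on the BAM from step 1.
    """
    align = ["pbrun", "fq2bam", "--ref", ref,
             "--in-fq", fastq_r1, fastq_r2, "--out-bam", out_bam]
    call = ["pbrun", "deepvariant", "--ref", ref,
            "--in-bam", out_bam, "--out-variants", out_vcf]
    return [align, call]


# Placeholder file names; nothing is executed here.
args = ("ref.fasta", "sample_1.fastq.gz", "sample_2.fastq.gz",
        "sample.bam", "sample.vcf")
print(" ".join(deepvariant_germline_cmd(*args)))
for step in standalone_tool_cmds(*args):
    print(" ".join(step))
```

Returning argument lists rather than shell strings means a caller could pass them straight to subprocess.run without any quoting concerns.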

The following pipelines can be used with the NVIDIA Clara Parabricks Pipelines software. Click on a pipeline name for pipeline-specific options.

Pipeline Tools Overview

Tool                    Details
deepvariant_germline    Run the germline pipeline from FASTQ to VCF using deep neural network analysis
denovomutation          (BETA) Run the de novo mutation pipeline with three samples for de novo variant detection
germline                Run the germline pipeline from FASTQ to VCF
human_par               Run the germline pipeline from FASTQ to VCF with correct ploidy values for human sex chromosome handling
rna_gatk                Run the GATK Best Practices pipeline for RNA-seq data from FASTQ to VCF
somatic                 Run the somatic pipeline from FASTQ to VCF

Compatible CPU Software Versions

Clara Parabricks produces the same results as the following versions of the corresponding CPU tools:

Tool                Version
arriba              2.1.0
bcftools            1.10.2
BWA                 0.7.15
cnvkit              0.9.7
DeepVariant         1.1
Expansion Hunter    5.0.0
fgbio               1.4.0
GATK                4.2.0.0
glnexus             1.2.7
Kallisto            0.46.2
lofreq              2.1.5
manta               1.6.0
samtools            1.10
somaticsniper       1.0.5.0
STAR                2.7.2a
STAR-Fusion         1.7.0
strelka             2.9.0
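
When comparing Parabricks output against a CPU baseline, it can help to confirm that the baseline tools match the versions listed above. The Python sketch below encodes the table as a dictionary and reports mismatches. The function name check_baseline and the shape of its input (a mapping from tool name to installed version string) are hypothetical; collecting the installed versions is left to the reader.

```python
# Versions copied from the compatibility table above, keyed by lowercase tool name.
COMPATIBLE_VERSIONS = {
    "arriba": "2.1.0",
    "bcftools": "1.10.2",
    "bwa": "0.7.15",
    "cnvkit": "0.9.7",
    "deepvariant": "1.1",
    "expansion hunter": "5.0.0",
    "fgbio": "1.4.0",
    "gatk": "4.2.0.0",
    "glnexus": "1.2.7",
    "kallisto": "0.46.2",
    "lofreq": "2.1.5",
    "manta": "1.6.0",
    "samtools": "1.10",
    "somaticsniper": "1.0.5.0",
    "star": "2.7.2a",
    "star-fusion": "1.7.0",
    "strelka": "2.9.0",
}


def check_baseline(installed):
    """Return {tool: (installed_version, expected_version)} for mismatches.

    Tool names are matched case-insensitively; tools that are not in the
    table are ignored rather than reported.
    """
    mismatches = {}
    for tool, version in installed.items():
        expected = COMPATIBLE_VERSIONS.get(tool.lower())
        if expected is not None and version != expected:
            mismatches[tool] = (version, expected)
    return mismatches


# Hypothetical installed versions for a CPU baseline environment.
print(check_baseline({"BWA": "0.7.15", "GATK": "4.1.0.0", "samtools": "1.10"}))
```

The comparison is an exact string match, which is deliberately strict: the table lists specific versions, and this sketch makes no assumption about whether neighboring releases give identical results.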