WHAT’S NEW

New tools

  • LoFreq somatic caller: This tool has been added to the Parabricks toolkit and has a 10x acceleration compared to its native version. LoFreq version 2.0 is the fourth somatic caller in Parabricks and is well suited to calling low-frequency mutations in next-generation sequencing (NGS) data, specifically Illumina short-read data. By accounting for base-call qualities and other sources of error inherent in NGS data, such as mapping and indel alignment errors, LoFreq improves the accuracy of calling somatic mutations below the 10% allele frequency threshold. You can read more here: https://csb5.github.io/lofreq/. Currently, this accelerated version supports only SNV calling; indel calling will be added in a future release.

  • Bam2Fastq: This is an accelerated version (>10x) of GATK Sam2fastq. It converts a BAM or CRAM file to FASTQ, extracting read sequences and qualities from the input file and writing them to the output file in Sanger FASTQ format.

  • De novo mutation pipeline: Detecting de novo variants (DNVs), which arise in the germline genome and are found by comparing an offspring to its parents, is critical for studies of disease-related variation and for establishing a baseline for generational mutation rates. A GPU-based workflow for calling DNVs has been added to the Parabricks package. It uses Google’s DeepVariant and is intended for projects that use trios and other pedigree sequencing strategies. You can learn more about the pipeline here: https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.27.445979.full.pdf

  • Smoove structural variant caller: This tool is now part of Parabricks v3.6. It is the second structural variant caller in Parabricks and is based on Lumpy, allowing simplified calling and genotyping of structural variants within an individual and across populations. Smoove works with short-read data; more information can be found here: https://github.com/brentp/smoove

  • BamBasedVCFQC: This NVIDIA-developed tool helps QC VCF outputs using samtools mpileup results generated from the original BAM.

  • Vcfanno: This tool allows users to annotate VCF outputs using third-party data sources such as dbSNP, for example to add allele frequencies to the VCF.

  • Frequencyfilteration: This tool allows variants within a VCF to be filtered based upon numeric fields containing allele frequency and read count information.

  • Vote based somatic caller merger (vbvm): This tool merges two or more VCF files and then filters variants with a simple voting mechanism, based on the number of somatic callers that identified each variant.
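
The voting idea behind the somatic caller merger can be illustrated with a short Python sketch. This is not the vbvm implementation: the function names, the caller names, and the choice to identify a variant by (CHROM, POS, REF, ALT) are all assumptions made for illustration, and multi-allelic ALT fields are not split.

```python
from collections import defaultdict


def variant_keys(vcf_lines):
    """Return the set of (CHROM, POS, REF, ALT) keys found in VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def vote_merge(callsets, min_votes):
    """Keep variants reported by at least `min_votes` callers.

    `callsets` maps a (hypothetical) caller name to an iterable of VCF lines.
    Returns a dict mapping each kept variant key to the sorted list of
    callers that reported it.
    """
    votes = defaultdict(set)
    for caller, lines in callsets.items():
        for key in variant_keys(lines):
            votes[key].add(caller)
    return {
        key: sorted(callers)
        for key, callers in sorted(votes.items())
        if len(callers) >= min_votes
    }


# Illustrative records only, not real caller output.
example = {
    "caller_a": [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\tT\t.\tPASS\t.",
        "chr1\t200\t.\tG\tC\t.\tPASS\t.",
    ],
    "caller_b": ["chr1\t100\t.\tA\tT\t.\tPASS\t."],
    "caller_c": [
        "chr1\t100\t.\tA\tT\t.\tPASS\t.",
        "chr2\t50\t.\tC\tG\t.\tPASS\t.",
    ],
}
merged = vote_merge(example, min_votes=2)
```

With min_votes=2, only the chr1:100 A>T variant survives, because it is the only one reported by more than one of the three example callers.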

Improvements and bug fixes

General

  • Improved error messaging for BWA and GATK’s HaplotypeCaller. You can expect further improvements to error messaging across all Parabricks tools in future releases.

Germline/Somatic

  • The germline pipeline is accelerated on A100 DGX hardware. A standard 30x WGS dataset run through the pipeline on an on-prem instance will complete in 22 minutes.

  • A new argument, --read-from-temp-dir, lets HaplotypeCaller read temporary files instead of the BAM file. This reduces variant calling time, and the results are the same as in previous versions.

  • FASTQ reading is accelerated for both BWA and STAR using new decompression algorithms.

  • The Strelka workflow now also handles germline variant calling.

  • The number of threads used by Strelka is now customizable.

  • CNVKit now supports out-of-order or overlapping intervals.

RNA

  • STAR now generates a log file.

  • Fix an assertion failure in rna_fq2bam: ReadAlign_outputTranscriptCIGARp.cpp:81: string chimericDetector::outputTranscriptCIGARp_pb(const chimericTrans&, PBWindow*): Assertion P.readFilesIn.size() > 1 failed.

  • Fix a possible deadlock in rna_fq2bam.

  • Remove duplicate @HD lines in the output of rna_fq2bam.

  • A new parameter, “--read-name-separator”, has been added to rna_fq2bam.

QC

  • mpileup and collectmetrics can now accept intervals.

  • The output file of collectmultiplemetrics is now tab-separated instead of space-separated.

  • MAD_COVERAGE is updated to handle corner cases.
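
Downstream scripts that split collectmultiplemetrics output on spaces may need to switch to tab splitting. The Python sketch below shows one way to read a tab-separated table. The file layout it assumes (comment lines starting with "#", then a header row) and the column names are hypothetical and are not taken from actual tool output.

```python
import csv
import io


def read_metrics_table(handle):
    """Parse a tab-separated metrics table into a list of dicts.

    Assumption (not confirmed by the release notes): lines starting with
    '#' are comments, blank lines can be ignored, and the first remaining
    line is the header row.
    """
    rows = [line for line in handle if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(rows, delimiter="\t"))


# Illustrative content only; METRIC_A and METRIC_B are made-up column names.
sample = io.StringIO(
    "# illustrative content only, not real tool output\n"
    "METRIC_A\tMETRIC_B\n"
    "10\t0.5\n"
)
records = read_metrics_table(sample)
```

Values are returned as strings, so numeric columns need an explicit conversion such as float(records[0]["METRIC_B"]).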