This release provides a flexible end-to-end solution for analyzing Unique Molecular Indices (UMI) data by accelerating the fgbio pipeline. The Clara Parabricks fgbio solution can be run with a single command or as individual steps.
Annotates existing BAM files with UMIs (Unique Molecular Indices) from a separate FASTQ file.
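Conceptually, this annotation step is a join on read name between the alignment records and the UMI FASTQ. The following is a minimal Python sketch of that idea, not the Parabricks or fgbio implementation. It assumes records are plain dictionaries, that the UMI is stored under the RX tag (the SAM tag reserved for UMI sequence bases), and the function names read_umis and annotate are hypothetical.

```python
import io

def read_umis(fastq_handle):
    """Map read name -> UMI sequence from a 4-line-per-record FASTQ stream."""
    umis = {}
    while True:
        header = fastq_handle.readline()
        if not header.strip():
            break  # end of input (or trailing blank line)
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        # Read name: drop the leading '@' and anything after whitespace.
        name = header[1:].split()[0]
        umis[name] = seq
    return umis

def annotate(records, umis, tag="RX"):
    """Return copies of the records with the UMI stored under `tag`.

    Raises KeyError if a record has no matching UMI (an assumed policy).
    """
    annotated = []
    for rec in records:
        new_rec = dict(rec)
        new_tags = dict(rec.get("tags", {}))
        new_tags[tag] = umis[rec["name"]]
        new_rec["tags"] = new_tags
        annotated.append(new_rec)
    return annotated

# Toy data: two UMI reads, and a pair whose two records share one read name.
fastq = io.StringIO(
    "@read1 extra\nACGTAC\n+\nIIIIII\n"
    "@read2\nTTGACA\n+\nIIIIII\n"
)
records = [
    {"name": "read1", "tags": {}},
    {"name": "read2", "tags": {}},
    {"name": "read1", "tags": {}},
]
annotated = annotate(records, read_umis(fastq))
```

Both records of a pair share a read name, so they receive the same UMI.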
Sorts a BAM file. Five sort modes are supported:
Coordinate sort (Picard-compatible)
Coordinate sort (fgbio-compatible)
Queryname sort (Picard-compatible)
Queryname sort (fgbio-compatible)
Template coordinate sort (fgbio-compatible)
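The difference between the coordinate and queryname families can be illustrated with sort keys on toy records. This is a simplified Python sketch under stated assumptions: records are dictionaries with hypothetical field names, unmapped reads are placed last, and the Picard-specific and fgbio-specific tie-breaking rules, as well as template coordinate ordering, are deliberately not reproduced.

```python
# Toy records; ref_index is the position of the contig in the header,
# and -1 marks an unmapped read (an illustrative convention).
reads = [
    {"name": "q2", "ref_index": 1, "pos": 50},
    {"name": "q1", "ref_index": 0, "pos": 300},
    {"name": "q3", "ref_index": 0, "pos": 120},
    {"name": "q4", "ref_index": -1, "pos": 0},
]

def coordinate_key(read):
    """Order by contig, then position; unmapped reads sort last."""
    unmapped = read["ref_index"] < 0
    return (unmapped, read["ref_index"], read["pos"])

def queryname_key(read):
    """Order by read name using plain string comparison."""
    return read["name"]

by_coordinate = sorted(reads, key=coordinate_key)
by_queryname = sorted(reads, key=queryname_key)

coordinate_names = [r["name"] for r in by_coordinate]
queryname_names = [r["name"] for r in by_queryname]
```

Tools can still disagree on the final order within one family, because they break ties and compare names differently. This is why the Picard-compatible and fgbio-compatible variants are listed as separate modes.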
Adds and/or fixes mate information on paired-end reads.
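As a rough picture of what fixing mate information means, each read of a pair carries a copy of its mate's placement, and those copies must agree with the mate's own record. The Python sketch below uses hypothetical dictionary fields that mirror the SAM mate reference (RNEXT), mate position (PNEXT), and the mate-unmapped and mate-reverse flag bits. It is an illustration only and does not cover template length or other fields a real tool may update.

```python
def fix_mate_info(read1, read2):
    """Copy each read's placement into its mate's mate_* fields (in place)."""
    for read, mate in ((read1, read2), (read2, read1)):
        read["mate_ref"] = mate["ref"]
        read["mate_pos"] = mate["pos"]
        read["mate_reverse"] = mate["reverse"]
        read["mate_unmapped"] = mate["unmapped"]
    return read1, read2

# Toy pair: r1 has stale mate fields, r2 has none at all.
r1 = {"name": "p1", "ref": "chr1", "pos": 100, "reverse": False,
      "unmapped": False, "mate_ref": "chr2", "mate_pos": 1}
r2 = {"name": "p1", "ref": "chr1", "pos": 350, "reverse": True,
      "unmapped": False}

fix_mate_info(r1, r2)
```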
We are also adding the following new tools in this release:
GPU-accelerated DeepTrio for calling de novo variants. This is an accelerated version of DeepTrio from Google's DeepVariant team.
The MuSE somatic caller has been added to the Parabricks toolkit and runs 10x faster than its original implementation. MuSE is the fifth somatic caller in Parabricks. MuSE uses a novel approach to mutation calling based on the F81 Markov substitution model for molecular evolution, which models the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus. You can read more here.
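For readers unfamiliar with the F81 model mentioned above, its transition probabilities have a simple closed form: after time t, a base either stays unchanged with probability exp(-mu * t), or is redrawn from the equilibrium base frequencies. The Python sketch below implements that textbook formula. It is not MuSE code; the function name and the rate parameter mu are assumptions made for illustration.

```python
import math

BASES = "ACGT"

def f81_transition_matrix(pi, t, mu=1.0):
    """Return P where P[i][j] is the probability that base i becomes base j
    after time t under F81: P_ij = exp(-mu*t) * [i == j] + (1 - exp(-mu*t)) * pi_j.

    pi: equilibrium frequencies for A, C, G, T (must sum to 1).
    """
    if len(pi) != 4 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must hold four frequencies that sum to 1")
    decay = math.exp(-mu * t)
    return [
        [(decay if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(4)]
        for i in range(4)
    ]

pi = [0.3, 0.2, 0.2, 0.3]
p_zero = f81_transition_matrix(pi, t=0.0)   # no time elapsed: identity
p_mid = f81_transition_matrix(pi, t=0.5)
p_long = f81_transition_matrix(pi, t=50.0)  # long time: rows approach pi
```

Each row of the matrix sums to 1, and as t grows every row converges to the equilibrium frequencies regardless of the starting base.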
Annotates variants based on a Panel of Normals (PON) file by modifying the "INFO" field of the input VCF file. This is the post-processing step for calling mutect with "--pon": after the mutect2 VCF is generated, this step is required if you are using a PON.
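The idea behind PON annotation can be sketched as a lookup of each variant against the set of sites in the panel, followed by an edit to the INFO column. The Python example below is an illustration under stated assumptions, not the Parabricks implementation: the INFO key "PON" is a hypothetical choice, since the key actually written is not stated here, and multi-allelic records are not handled.

```python
def annotate_with_pon(vcf_lines, pon_sites, info_key="PON"):
    """Append `info_key` to the INFO column of records found in `pon_sites`.

    vcf_lines: iterable of VCF text lines.
    pon_sites: set of (chrom, pos, ref, alt) tuples from the panel of normals.
    """
    out = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            out.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        site = (fields[0], int(fields[1]), fields[3], fields[4])
        if site in pon_sites:
            info = fields[7]
            # '.' means an empty INFO column in VCF.
            fields[7] = info_key if info in (".", "") else info + ";" + info_key
        out.append("\t".join(fields))
    return out

vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\tDP=10",
    "chr1\t200\t.\tG\tC\t.\tPASS\tDP=12",
    "chr2\t300\t.\tC\tG\t.\tPASS\t.",
]
pon = {("chr1", 100, "A", "T"), ("chr2", 300, "C", "G")}
annotated_vcf = annotate_with_pon(vcf, pon)
```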
genotypegvcf now supports .gz files.
Fixed problems in triocombinegvcf and genotypegvcf when processing DeepVariant gVCF files.
Strelka workflow now accepts interval files.
STAR is roughly 2x faster for specific sets of data.
splitncigar is significantly faster than before.
Fixed a CRAM support bug for fq2bam.
Fixed a CRAM support bug for human_par.
Two sources of deadlock in lofreq are fixed.
STAR deadlock bug is fixed.
Fixed an assertion failure in rna_fq2bam: ReadAlign_outputTranscriptCIGARp.cpp:81:string chimericDetector::outputTranscriptCIGARp_pb(const chimericTrans&, PBWindow*): Assertion P.readFilesIn.size() > 1 failed.
Fixed a possible deadlock in rna_fq2bam.
Removed duplicate @HD lines from the output of rna_fq2bam.
Output of collectmultiplemetrics is now correctly tab separated, instead of using spaces.
Fixed a failure that occurred when the --gen-insert-size option was used.
The --gen-all-metrics option failed to create the sequencing artifact report. It now correctly generates all available reports.
Fixed a report generation bug in collectmultiplemetrics that occurred when --gen-alignment or --gen-insert-size was specified.
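Because collectmultiplemetrics output is now tab separated, downstream scripts can parse it with a standard delimited-text reader. The Python sketch below shows one way to do that. It assumes a Picard-style layout (comment lines starting with "#", blank lines, then a header row and data rows) and a single table per file. The sample content and column names are illustrative, not actual tool output.

```python
import csv
import io

def read_metrics_table(handle):
    """Parse a tab-separated metrics table, skipping '#' comments and blank lines."""
    rows = [line for line in handle if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(rows, delimiter="\t"))

# Illustrative sample only; real reports contain more columns and rows.
sample = io.StringIO(
    "## METRICS CLASS\texample\n"
    "CATEGORY\tTOTAL_READS\tPF_READS\n"
    "PAIR\t1000\t990\n"
    "\n"
)
table = read_metrics_table(sample)
```

Splitting on tabs rather than on whitespace keeps empty or space-containing fields aligned with their column headers.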