10.19. Clara Genomics Analysis

10.19.1. Overview

Clara Genomics Analysis is a GPU-accelerated library for biological sequence analysis. This section gives a brief overview of the components of Clara Genomics Analysis.

Source code is at https://github.com/clara-genomics

10.19.1.1. Components in the Toolset

10.19.1.1.1. cudamapper

The cudamapper package provides minimizer-based, GPU-accelerated approximate mapping. cudamapper outputs mappings in PAF format and is currently optimised for all-vs-all mapping of long-read sequences (Oxford Nanopore Technologies, Pacific Biosciences).
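A (w, k)-minimizer sketch keeps only the smallest k-mer in each window of w consecutive k-mers. This greatly reduces the number of seeds a mapper must index and compare. The CPU sketch below illustrates only the selection step. It is not cudamapper's GPU implementation. The function name `minimizers` is hypothetical, and the sketch orders k-mers lexicographically, whereas production mappers typically use a hash order over canonical (strand-independent) k-mers.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs selected as (w, k)-minimizers of seq.

    Simplified sketch: k-mers are ranked lexicographically and only the
    forward strand is considered.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first (leftmost) smallest k-mer on ties.
        offset = min(range(w), key=lambda j: window[j])
        hit = (start + offset, window[offset])
        # Neighbouring windows often share a minimizer; record it only once.
        if not selected or selected[-1] != hit:
            selected.append(hit)
    return selected


print(minimizers("ACGTACGT", k=3, w=2))
```

Only 4 of the 6 k-mers in this short example are kept. On real reads with larger windows, the reduction in seeds is much greater.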

Detailed documentation is at https://github.com/clara-genomics/ClaraGenomicsAnalysis

10.19.1.1.2. racon

Racon can be used as a polishing tool after assembly, with either Illumina data or data produced by third-generation sequencing. The type of input data is detected automatically.

Racon takes only three input files: contigs in FASTA/FASTQ format, reads in FASTA/FASTQ format, and overlaps/alignments between the reads and the contigs in MHAP/PAF/SAM format. The output is a set of polished contigs in FASTA format, printed to stdout. All input files can be gzip-compressed, which adds to parsing time.
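The sketch below shows what "FASTA/FASTQ, optionally gzip-compressed" means for an input file: it checks for the gzip magic bytes, then reads either format. It is an illustration only, not how racon parses its inputs. The helper names `open_maybe_gzip` and `read_sequences` are hypothetical. Detecting the format from the first character of each record is an assumption of this sketch. FASTQ records are assumed to be unwrapped (four lines each).

```python
import gzip
from typing import IO, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path: str) -> IO[str]:
    """Open a text file, transparently decompressing it if it is gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rt") if magic == GZIP_MAGIC else open(path, "r")


def read_sequences(path: str) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA or FASTQ file (gzip optional).

    Reads the whole file into memory, which is fine for a sketch.
    """
    with open_maybe_gzip(path) as fh:
        lines = [line.rstrip("\r\n") for line in fh]
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header:  # skip blank lines between records
            i += 1
            continue
        name = (header[1:].split() or [""])[0]  # first word after '>' or '@'
        if header.startswith("@"):
            # FASTQ: header, sequence, '+' separator, quality string.
            yield name, lines[i + 1]
            i += 4
        elif header.startswith(">"):
            # FASTA: the sequence may be wrapped until the next '>' header.
            j = i + 1
            while j < len(lines) and not lines[j].startswith(">"):
                j += 1
            yield name, "".join(lines[i + 1:j])
            i = j
        else:
            raise ValueError(f"unexpected line {i + 1}: {header!r}")
```

Checking the magic bytes rather than the `.gz` extension means a compressed file is handled correctly even if it is misnamed.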

Racon can also be used as a read error-correction tool. In this case, the MHAP/PAF/SAM file must contain pairwise overlaps between reads, including dual overlaps.
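For PAF input, a dual overlap is taken here to mean the mirrored record. If read A overlaps read B, the file should also hold the record in which B is the query and A is the target. That reading of the term is an assumption of this sketch, and the helper names `dual_overlap` and `add_dual_overlaps` are hypothetical. PAF query and target coordinates are both given on the forward strand, so swapping columns 1-4 with columns 6-9 yields a valid record. Optional SAM-style tags (for example a CIGAR string) would no longer be correct after the swap, so the mirrored record keeps only the 12 mandatory columns.

```python
def dual_overlap(paf_line: str) -> str:
    """Return the mirrored PAF record: the target becomes the query and vice versa."""
    cols = paf_line.rstrip("\n").split("\t")
    query = cols[0:4]     # query name, length, start, end
    strand = cols[4]      # relative strand, unchanged by the swap
    target = cols[5:9]    # target name, length, start, end
    rest = cols[9:12]     # matches, alignment block length, mapping quality
    return "\t".join(target + [strand] + query + rest)


def add_dual_overlaps(paf_lines: list[str]) -> list[str]:
    """Append a mirrored record for every read pair that lacks one."""
    records = [line.rstrip("\n") for line in paf_lines]
    seen = {(r.split("\t")[0], r.split("\t")[5]) for r in records}
    out = []
    for rec in records:
        out.append(rec)
        query, target = rec.split("\t")[0], rec.split("\t")[5]
        if (target, query) not in seen:  # self-overlaps already match themselves
            out.append(dual_overlap(rec))
    return out
```

A file that already contains both directions for each pair passes through unchanged.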

Detailed documentation is at https://github.com/clara-genomics/racon-gpu

10.19.1.2. Third-Party Tools

10.19.1.2.1. minimap2

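Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include:
- mapping PacBio or Oxford Nanopore genomic reads to a reference genome
- finding overlaps between long reads
- splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA and Direct RNA reads against a reference genome
- aligning Illumina single- or paired-end reads
- assembly-to-assembly alignment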

Detailed documentation is at https://github.com/lh3/minimap2

10.19.1.2.2. miniasm

Miniasm is a very fast OLC-based (overlap-layout-consensus) de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in GFA format. Unlike mainstream assemblers, miniasm has no consensus step; it simply concatenates pieces of read sequences to generate the final unitig sequences. As a result, the per-base error rate of the unitigs is similar to that of the raw input reads.
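The toy sketch below shows why skipping consensus leaves read errors in place. Each base of the unitig is copied from exactly one read, so an error in that read is carried over unchanged. This is not miniasm's algorithm. The function `concatenate_unitig` is hypothetical, and the sketch assumes the reads are already laid out on the forward strand with exact overlap lengths known. Miniasm derives the layout from the overlap graph and also handles reverse-complemented reads.

```python
def concatenate_unitig(reads: dict[str, str], layout: list[tuple[str, int]]) -> str:
    """Build a unitig by concatenating read pieces along a layout path.

    reads:  read name -> sequence (forward strand assumed).
    layout: (read name, overlap length with the previous read) in path order;
            the overlap given for the first read is ignored.
    No consensus is computed: each base is copied verbatim from one read.
    """
    pieces = []
    for i, (name, overlap) in enumerate(layout):
        seq = reads[name]
        # The first read contributes everything; later reads contribute only
        # the suffix that extends past their overlap with the previous read.
        pieces.append(seq if i == 0 else seq[overlap:])
    return "".join(pieces)


reads = {"a": "ACGTAC", "b": "TACGGA", "c": "GGATTT"}
print(concatenate_unitig(reads, [("a", 0), ("b", 3), ("c", 3)]))
```

Changing a base in the unexamined suffix of read `c` changes the unitig at the same position. Without a consensus step there is nothing to outvote the error.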

Detailed documentation is at https://github.com/lh3/miniasm

10.19.2. Directory Structure

This sample includes the following folders and files:

Dockerfile
- Creates the Docker image that packages all of the tools described above.