12.15. De Novo Sequence Assembly - Clara Genomics Analysis


This is a reference pipeline that uses Clara Genomics Analysis tools to assemble a genome de novo with the Clara Deploy SDK.

These tools use GPUs to accelerate genome assembly. In this pipeline, the mapping (cudamapper) and polishing (racon) operators each request a GPU.


api-version: 0.4.0
name: denovo-gpu
parameters:
  DOCKER_IMAGE: claraparabricks/cga_cuda10
  DOCKER_TAG: v0.5.2
  MAPPER_KMER_SIZE: 15
  MAPPER_WINDOW_SIZE: 5
  MAPPER_ADDITIONAL_PARAMS: ''
  RACON_LOOPS: 5
  RACON_THREADS: 5
  RACON_POLISH_BATCH_SIZE: 4
  RACON_ADDITIONAL_PARAMS: ''
  JOB_ID: '.'
operators:
- name: mapper
  description: CUDA mapper
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "cudamapper", "-i", "/input", "-o", "/mapperOutput/${{JOB_ID}}", "-p", "-k${{MAPPER_KMER_SIZE}}-w${{MAPPER_WINDOW_SIZE}}"]
  requests:
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  output:
  - path: /mapperOutput
- name: miniasm
  description: Miniasm
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "miniasm", "-i", "/input", "-l", "/mapperOutput/${{JOB_ID}}/overlaps.paf", "-o", "/asmOutput/${{JOB_ID}}"]
  input:
  - path: /input/
  - from: mapper
    path: /mapperOutput
  output:
  - path: /asmOutput
  requests:
    memory: 16384
- name: racon
  description: Polish Assembly using racon
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "racon", "-i", "/input", "-a", "/asmOutput/${{JOB_ID}}/reads.fa", "-o", "/raconOutput/${{JOB_ID}}", "-t", "${{RACON_THREADS}}", "-l", "${{RACON_LOOPS}}"]
  requests:
    cpu: 4
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  - from: miniasm
    path: /asmOutput
  - from: mapper
    path: /mapperOutput
  output:
  - path: /raconOutput/
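The following Python sketch makes the wiring of the definition above concrete. It mimics two things the definition implies. First, each ${{NAME}} token in a command is filled in from the parameters block. Second, an operator runs only after every operator it lists under from:. This is an illustration under those assumptions, not the Clara Deploy implementation. The names PARAMETERS, OPERATORS, substitute and execution_order are hypothetical, and the relevant fields are copied in as Python literals so that no YAML library is needed.

```python
import re

# Parameters referenced by the commands in the definition above, copied in as
# a Python literal (only the fields this sketch needs; no YAML parser is used).
PARAMETERS = {
    "MAPPER_KMER_SIZE": 15,
    "MAPPER_WINDOW_SIZE": 5,
    "RACON_LOOPS": 5,
    "RACON_THREADS": 5,
    "JOB_ID": ".",
}

# Each operator's command and the operators it reads from (its "from:" entries).
OPERATORS = [
    {"name": "mapper", "from": [],
     "command": ["pef", "cudamapper", "-i", "/input", "-o", "/mapperOutput/${{JOB_ID}}",
                 "-p", "-k${{MAPPER_KMER_SIZE}}-w${{MAPPER_WINDOW_SIZE}}"]},
    {"name": "miniasm", "from": ["mapper"],
     "command": ["pef", "miniasm", "-i", "/input", "-l", "/mapperOutput/${{JOB_ID}}/overlaps.paf",
                 "-o", "/asmOutput/${{JOB_ID}}"]},
    {"name": "racon", "from": ["miniasm", "mapper"],
     "command": ["pef", "racon", "-i", "/input", "-a", "/asmOutput/${{JOB_ID}}/reads.fa",
                 "-o", "/raconOutput/${{JOB_ID}}", "-t", "${{RACON_THREADS}}",
                 "-l", "${{RACON_LOOPS}}"]},
]

# Matches a ${{NAME}} reference and captures NAME.
PLACEHOLDER = re.compile(r"\$\{\{(\w+)\}\}")


def substitute(arg: str, params: dict) -> str:
    """Replace every ${{NAME}} in arg with str(params[NAME]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), arg)


def execution_order(operators: list) -> list:
    """Order operators so that each one comes after every operator it reads from."""
    pending = {op["name"]: set(op["from"]) for op in operators}
    order = []
    while pending:
        done = set(order)
        # Operators whose inputs are all produced already; sorted for a stable order.
        ready = sorted(name for name, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError(f"cyclic or unknown dependencies: {pending}")
        order.extend(ready)
        for name in ready:
            del pending[name]
    return order


if __name__ == "__main__":
    by_name = {op["name"]: op for op in OPERATORS}
    for name in execution_order(OPERATORS):
        resolved = [substitute(arg, PARAMETERS) for arg in by_name[name]["command"]]
        print(name, " ".join(resolved))
```

Running it prints the resolved commands in the order mapper, miniasm, racon. With the values above, the mapper's -p argument resolves to -k15-w5 and every ${{JOB_ID}} resolves to ".".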

Refer to "Run Reference Pipelines using Local Input Files" in the "How to run a Reference Pipeline" section to learn how to register the pipeline and run it with local input files.

The input is a folder containing the following files:

  • sample.fasta - Input FASTA sample file for all-to-all mapping

  • jobConfig (optional) - A file of parameter/value pairs in shell-script style. A sample (sample_job_config.sh) is provided. The following jobConfig file contains the default values:


    KMER_SIZE=15          # length of k-mer to use for minimizers
    WINDOW_SIZE=5         # length of window to use for minimizers
    INDEX_SIZE=10000      # batch size used for queries
    RACON_LOOPS=5         # number of polishing loops
    RACON_THREADS=15      # number of threads
    POLISH_BATCH_SIZE=6   # number of batches for CUDA-accelerated polishing
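A jobConfig file is a list of shell-style KEY=VALUE assignments with optional # comments. The Python sketch below assumes exactly that format. It checks that an input folder contains sample.fasta, then merges an optional jobConfig over the default values listed above. How Clara Genomics Analysis actually reads jobConfig is not described here. The names DEFAULTS, parse_job_config and load_input_folder are therefore hypothetical, as is the assumption that jobConfig values override the defaults. Values are kept as strings, as a shell would keep them.

```python
import shlex
from pathlib import Path

# Default values from the jobConfig listing above, kept as strings as a shell would.
DEFAULTS = {
    "KMER_SIZE": "15",
    "WINDOW_SIZE": "5",
    "INDEX_SIZE": "10000",
    "RACON_LOOPS": "5",
    "RACON_THREADS": "15",
    "POLISH_BATCH_SIZE": "6",
}


def parse_job_config(text: str) -> dict:
    """Parse shell-style KEY=VALUE assignments; blank lines and '#' comments are ignored."""
    values = {}
    for line in text.splitlines():
        # comments=True makes shlex drop everything from an unquoted '#' onward;
        # quoted values such as KEY="a b" come back with the quotes removed.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if not sep or not key:
                raise ValueError(f"not a KEY=VALUE assignment: {token!r}")
            values[key] = value
    return values


def load_input_folder(folder) -> dict:
    """Check that folder holds sample.fasta and return the effective job parameters."""
    folder = Path(folder)
    if not (folder / "sample.fasta").is_file():
        raise FileNotFoundError(f"missing required input file: {folder / 'sample.fasta'}")
    params = dict(DEFAULTS)
    job_config = folder / "jobConfig"
    if job_config.is_file():  # jobConfig is optional; in this sketch its values override the defaults
        params.update(parse_job_config(job_config.read_text()))
    return params
```

Using shlex rather than a plain split("=") keeps the parser close to shell behavior for trailing comments and quoted values. It is still not a full shell: it does not expand variables or run commands.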

The output is the assembled and polished sequence, which the racon operator writes to /raconOutput/.
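A quick sanity check on the result is to count the polished contigs and their total length. The Python sketch below assumes that the polished sequence is written as FASTA under the racon output directory (/raconOutput/${{JOB_ID}} in the definition). The exact file name is not given here. The helpers fasta_record_lengths and summarize are hypothetical names.

```python
from typing import Iterable, List


def fasta_record_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record, in file order.

    Accepts any iterable of lines (for example an open file), so large
    assemblies are streamed rather than loaded whole. Multi-line records
    are supported: every non-header line adds to the current record.
    """
    lengths: List[int] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            lengths.append(0)  # a header line starts a new record
        elif not lengths:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[-1] += len(line)
    return lengths


def summarize(lengths: List[int]) -> dict:
    """Basic assembly statistics: record count, total bases, longest record."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
    }
```

To use it, open the polished FASTA file, pass the file object to fasta_record_lengths, and pass the returned list to summarize.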

© Copyright 2018-2020, NVIDIA Corporation. All rights reserved. Last updated on Jun 28, 2023.