11.15. De Novo Sequence Assembly - Clara Genomics Analysis
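This reference pipeline uses Clara Genomics Analysis tools to assemble a genome de novo with the Clara Deploy SDK.
The pipeline runs three operators in sequence: a CUDA mapper for all-to-all read mapping, Miniasm for assembly, and Racon for polishing. The mapper and Racon operators each request one GPU.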
11.15.1. Pipeline Definition
api-version: 0.3.0
name: denovo-gpu
parameters:
  DOCKER_IMAGE: claraomics/cga_cuda10
  DOCKER_TAG: 0.4.3
  MAPPER_THREADS: 15
  MAPPER_KMER_SIZE: 15
  MAPPER_WINDOW_SIZE: 5
  MAPPER_INDEX_SIZE: 8000
  MAPPER_ADDITIONAL_PARAMS: ''
  RACON_LOOPS: 5
  RACON_THREADS: 5
  RACON_POLISH_BATCH_SIZE: 4
  RACON_ADDITIONAL_PARAMS: ''
  JOB_ID: '.'
operators:
  - name: mapper
    description: CUDA mapper
    container:
      image: ${{DOCKER_IMAGE}}
      tag: ${{DOCKER_TAG}}
      command: ["/bin/sh", "-c",
                "mapperWrapper.sh ${{MAPPER_ADDITIONAL_PARAMS}} -i /input -d /mapperOutput/${{JOB_ID}} -o /mapperOutput/${{JOB_ID}}/overlaps.paf -t ${{MAPPER_THREADS}} -k ${{MAPPER_KMER_SIZE}} -w ${{MAPPER_WINDOW_SIZE}} -s ${{MAPPER_INDEX_SIZE}}"]
    requests:
      gpu: 1
    input:
      - path: /input/
    output:
      - path: /mapperOutput
  - name: miniasm
    description: Miniasm
    container:
      image: ${{DOCKER_IMAGE}}
      tag: ${{DOCKER_TAG}}
      command: ["/bin/sh", "-c",
                "miniasmWrapper.sh -f /mapperOutput/${{JOB_ID}}/sample.fasta -l /mapperOutput/${{JOB_ID}}/overlaps.paf -o /asmOutput/${{JOB_ID}}/reads.gfa"]
    input:
      - path: /input/
      - from: mapper
        path: /mapperOutput
    output:
      - path: /asmOutput
  - name: racon
    description: Polish Assembly using racon
    container:
      image: ${{DOCKER_IMAGE}}
      tag: ${{DOCKER_TAG}}
      command: ["/bin/sh", "-c",
                "raconWrapper.sh ${{RACON_ADDITIONAL_PARAMS}} -r /mapperOutput/${{JOB_ID}}/sample.fasta -t ${{RACON_THREADS}} -l ${{RACON_LOOPS}} -p ${{RACON_POLISH_BATCH_SIZE}} -f /asmOutput/${{JOB_ID}}/reads.gfa -o /raconOutput/${{JOB_ID}} -a /raconOutput/${{JOB_ID}}/polished_assembly.fa"]
    requests:
      gpu: 1
    input:
      - path: /input/
      - from: miniasm
        path: /asmOutput
      - from: mapper
        path: /mapperOutput
    output:
      - path: /raconOutput/
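The definition above relies on two mechanisms: `${{NAME}}` placeholders that refer to entries under `parameters`, and `from:` inputs that make one operator depend on another. The following is a minimal Python sketch of both ideas, useful for previewing a command line and the resulting run order. It assumes plain textual substitution and is not the Clara Deploy implementation; the helper names `render` and `execution_order` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Default values copied from the "parameters" section of the pipeline definition.
PARAMETERS = {
    "MAPPER_ADDITIONAL_PARAMS": "",
    "MAPPER_THREADS": "15",
    "MAPPER_KMER_SIZE": "15",
    "MAPPER_WINDOW_SIZE": "5",
    "MAPPER_INDEX_SIZE": "8000",
    "JOB_ID": ".",
}

# Mapper command template copied from the pipeline definition.
MAPPER_COMMAND = (
    "mapperWrapper.sh ${{MAPPER_ADDITIONAL_PARAMS}} -i /input "
    "-d /mapperOutput/${{JOB_ID}} -o /mapperOutput/${{JOB_ID}}/overlaps.paf "
    "-t ${{MAPPER_THREADS}} -k ${{MAPPER_KMER_SIZE}} "
    "-w ${{MAPPER_WINDOW_SIZE}} -s ${{MAPPER_INDEX_SIZE}}"
)

# Matches ${{NAME}} and captures NAME.
PLACEHOLDER = re.compile(r"\$\{\{\s*(\w+)\s*\}\}")


def render(template, params):
    """Replace each ${{NAME}} with params[NAME] (assumed textual substitution).

    Raises KeyError if the template names a parameter that is not defined.
    """
    return PLACEHOLDER.sub(lambda match: params[match.group(1)], template)


def execution_order(inputs_from):
    """Return operators ordered so each runs after the operators it reads from."""
    return list(TopologicalSorter(inputs_from).static_order())


# Operator -> operators named in its "from:" inputs, as in the definition.
DEPENDENCIES = {
    "mapper": [],
    "miniasm": ["mapper"],
    "racon": ["miniasm", "mapper"],
}

if __name__ == "__main__":
    print(render(MAPPER_COMMAND, PARAMETERS))
    # Override a single parameter, for example a larger k-mer size.
    print(render(MAPPER_COMMAND, {**PARAMETERS, "MAPPER_KMER_SIZE": "19"}))
    print(execution_order(DEPENDENCIES))
```

Because `racon` reads from both `miniasm` and `mapper`, and `miniasm` reads from `mapper`, the only valid order is mapper, then miniasm, then racon.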
11.15.2. Executing the Pipeline
Refer to "Run Reference Pipelines using Local Input Files"
in the "How to run a Reference Pipeline" section to learn how to register the pipeline and
execute it using local input files.
11.15.3. Data Input
The input is a folder containing the following files:
sample.fasta - The input FASTA sample file used for all-to-all mapping.
jobConfig (optional) - A file of parameter assignments in shell-script style. A sample (sample_job_config.sh) is provided. The content of a jobConfig file with default values is as follows:
KMER_SIZE=15          # length of kmer to use for minimizers
WINDOW_SIZE=5         # length of window to use for minimizers
INDEX_SIZE=10000      # length of batch size used for query
RACON_LOOPS=5         # Number of polishing loops
RACON_THREADS=15      # number of threads
POLISH_BATCH_SIZE=6   # number of batches for CUDA accelerated polishing
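Because jobConfig uses shell-style `NAME=value` assignments with `#` comments, it can be checked before a run with a few lines of Python. This is a minimal sketch that handles only simple assignments, not general shell syntax; the function name `parse_job_config` is hypothetical, and how the wrapper scripts themselves read the file is not covered here.

```python
import shlex

# The default jobConfig content listed above, one assignment per line.
SAMPLE_JOB_CONFIG = """\
KMER_SIZE=15 # length of kmer to use for minimizers
WINDOW_SIZE=5 # length of window to use for minimizers
INDEX_SIZE=10000 # length of batch size used for query
RACON_LOOPS=5 # Number of polishing loops
RACON_THREADS=15 # number of threads
POLISH_BATCH_SIZE=6 # number of batches for CUDA accelerated polishing
"""


def parse_job_config(text):
    """Return {name: value} for every simple NAME=value assignment in text.

    Values are kept as strings. Anything after '#' on a line is ignored.
    """
    settings = {}
    for line in text.splitlines():
        # comments=True makes shlex drop everything from '#' to end of line.
        for token in shlex.split(line, comments=True):
            name, separator, value = token.partition("=")
            if separator and name.isidentifier():
                settings[name] = value
    return settings


if __name__ == "__main__":
    for name, value in parse_job_config(SAMPLE_JOB_CONFIG).items():
        print(f"{name} = {value}")
```

A check like this can flag a mistyped name or a non-numeric value before the pipeline is submitted.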
11.15.4. Data Output
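The assembled and polished sequence. The racon command in the pipeline definition passes /raconOutput/${{JOB_ID}}/polished_assembly.fa as its -a argument, under the /raconOutput output path.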
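A quick way to inspect the result is to list the records in the output file with their lengths. The following is a minimal sketch, assuming that polished_assembly.fa (the path passed to the racon wrapper in the pipeline definition) is an ordinary FASTA file; the function name `fasta_lengths` and the sample records are hypothetical.

```python
import io


def fasta_lengths(handle):
    """Return {record_name: sequence_length} for a FASTA text stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else f"record_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate their lengths.
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Made-up records used only to exercise the function.
    example = io.StringIO(">contig_1 demo\nACGTAC\nGT\n>contig_2\nAAAA\n")
    for record, length in fasta_lengths(example).items():
        print(record, length)
```

For a real run, open the output file with `open(path)` and pass the file object to `fasta_lengths` in place of the in-memory example.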