Best Performance
Parabricks software delivers its best performance when it has all the computing resources it needs, so the system should meet every requirement listed in the Installation Requirements section. The examples below show command lines configured for best performance.
See the Performance Tuning section for basic performance enhancement ideas.
See the Hardware Requirements section for minimum hardware requirements.
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
$ docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
--workdir /workdir \
--env TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
pbrun germline \
--ref /workdir/Homo_sapiens_assembly38.fasta \
--in-fq /workdir/fastq1.gz /workdir/fastq2.gz \
--out-bam /outputdir/fq2bam_output.bam \
--tmp-dir /workdir \
--num-cpu-threads 16 \
--out-recal-file recal.txt \
--knownSites /workdir/hg.known_indels.vcf \
--out-variants /outputdir/out.vcf \
--run-partition --no-alt-contigs \
--gpusort \
--gpuwrite \
--read-from-tmp-dir
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
$ docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
--workdir /workdir --env TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
pbrun fq2bam \
--ref /workdir/Homo_sapiens_assembly38.fasta \
--in-fq /workdir/fastq1.gz /workdir/fastq2.gz \
--out-bam /outputdir/fq2bam_output.bam \
--tmp-dir /workdir \
--num-cpu-threads 16 \
--out-recal-file recal.txt \
--knownSites /workdir/hg.known_indels.vcf \
--gpusort \
--gpuwrite
DeepVariant in Parabricks can use multiple streams per GPU. How many streams are usable depends on the available resources. The default is two streams, and it can be raised to a maximum of six for better performance. Finding the optimal number for your system requires experimentation.
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
$ docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
--workdir /workdir --env TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
pbrun deepvariant \
--ref /workdir/Homo_sapiens_assembly38.fasta \
--in-bam /outputdir/fq2bam_output.bam \
--out-variants /outputdir/out.vcf \
--num-streams-per-gpu 4 \
--run-partition \
--gpu-num-per-partition 1
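Because the best stream count has to be found by experiment, one way to approach it is a small sweep over the allowed range (two to six) that times each run. The following is a minimal sketch, not an official Parabricks tool. The `DRY_RUN` switch, the `run` and `sweep_streams` helpers, the placeholder directory paths, and the per-run output names (`out_streams<N>.vcf`) are assumptions made for illustration; the `pbrun deepvariant` options are the ones used in the command above.

```shell
#!/bin/sh
# Sketch: sweep --num-streams-per-gpu from the default (2) to the maximum (6).
# INPUT_DIR, OUTPUT_DIR, DRY_RUN, and the output file names are illustrative assumptions.
INPUT_DIR="${INPUT_DIR:-/path/to/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/path/to/output}"
DRY_RUN="${DRY_RUN:-1}"  # 1 prints each command; 0 executes it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run (or print) one DeepVariant job per stream count and report wall-clock time.
sweep_streams() {
  for streams in 2 3 4 5 6; do
    start=$(date +%s)
    run docker run --rm --gpus all \
      --volume "${INPUT_DIR}:/workdir" --volume "${OUTPUT_DIR}:/outputdir" \
      --workdir /workdir \
      --env TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
      nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
      pbrun deepvariant \
      --ref /workdir/Homo_sapiens_assembly38.fasta \
      --in-bam /outputdir/fq2bam_output.bam \
      --out-variants "/outputdir/out_streams${streams}.vcf" \
      --num-streams-per-gpu "${streams}" \
      --run-partition \
      --gpu-num-per-partition 1
    end=$(date +%s)
    # Timing goes to stderr so stdout contains only the commands in dry-run mode.
    echo "streams=${streams} elapsed_seconds=$((end - start))" >&2
  done
}

sweep_streams
```

With the default `DRY_RUN=1` the script only prints the five commands, which makes it easy to review them first. Setting `DRY_RUN=0` together with real `INPUT_DIR` and `OUTPUT_DIR` paths runs each job in turn; the stream count with the lowest elapsed time is a reasonable starting point for that system.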