Clara Parabricks v3.7.0
3.7.0

Performance Tuning

The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that you can tune to achieve maximum performance.

Parabricks software operates with two kinds of files:

  • Input/output files specified by the user

  • Temporary files created during execution and deleted at the end

The best performance is achieved when both kinds of files are on a fast, local SSD.

If this is not possible, you can place the input/output files on a fast network storage device and the temporary files on a local SSD using the --tmp-dir option.

Note

Tests have shown that you can use up to 4 GPUs and still get good performance with the Lustre network for Input/Output files. If you plan to use more than 4 GPUs, we highly recommend using local SSDs for all kinds of files.

DGX Users

The DGX comes with a SSD, usually mounted on /raid. Use this disk, and use a directory on this disk as the --tmp-dir. For initial testing, you can even copy the input files to this disk to eliminate variability in performance.

You can choose the number of GPUs to run using the command line option --num-gpus N for tools and pipelines that use GPUs. With this command, the GPUs used will be limited to the first N GPUs listed in the output of the nvidia-smi command.

To select specific GPUs, set the environment variable NVIDIA_VISIBLE_DEVICES. GPUs are numbered starting with zero. For example, this command will use only the second (GPU #1) and fourth (GPU #3) GPUs:

Copy
Copied!
            

$ NVIDIA_VISIBLE_DEVICES="1,3" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz

© Copyright 2022, Nvidia. Last updated on Jun 28, 2023.