Performance Tuning - NVIDIA Docs

The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that you can tune to achieve maximum performance.

Use a Fast SSD

Parabricks software operates with two kinds of files:

Input/output files specified by the user
Temporary files created during execution and deleted at the end

The best performance is achieved when both kinds of files are on a fast, local SSD.

If this is not possible, you can place the input/output files on a fast network storage device and the temporary files on a local SSD using the --tmp-dir option.

Note

Tests have shown that you can use up to 4 GPUs and still get good performance with the Lustre network for Input/Output files. If you plan to use more than 4 GPUs, we highly recommend using local SSDs for all kinds of files.

DGX Users

The DGX comes with a SSD, usually mounted on /raid. Use this disk, and use a directory on this disk as the --tmp-dir. For initial testing, you can even copy the input files to this disk to eliminate variability in performance.

Specifying which GPUs to use

You can choose the number of GPUs to run using the command line option --num-gpus N for tools and pipelines that use GPUs. With this command, the GPUs used will be limited to the first N GPUs listed in the output of the nvidia-smi command.

To select specific GPUs, set the environment variable NVIDIA_VISIBLE_DEVICES. GPUs are numbered starting with zero. For example, this command will use only the second (GPU #1) and fourth (GPU #3) GPUs:

Copy
Copied!

            
            $ NVIDIA_VISIBLE_DEVICES="1,3" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz