The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that you can tune to achieve maximum performance.
Parabricks software operates with two kinds of files:
Input/output files specified by the user
Temporary files created during execution and deleted at the end
The best performance is achieved when both kinds of files are on a fast, local SSD.
If this is not possible, you can place the input/output files on a fast network storage device and
the temporary files on a local SSD using the --tmp-dir option.
Tests have shown that you can use up to 4 GPUs and still get good performance with the input/output files on a Lustre network file system. If you plan to use more than 4 GPUs, we strongly recommend using local SSDs for both kinds of files.
The DGX comes with an SSD, usually mounted on /raid. Use a directory on this disk as the --tmp-dir.
For initial testing, you can even copy the input files to this disk to eliminate variability in performance.
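As a minimal sketch of the storage layout described above, the following snippet points the temporary directory at the local SSD and prints the resulting command for review. The LOCAL_SSD variable and the pb_tmp directory name are assumptions made for illustration; the fq2bam arguments are the placeholder file names used elsewhere on this page.

```shell
# Sketch only: LOCAL_SSD and the pb_tmp directory name are hypothetical.
LOCAL_SSD="${LOCAL_SSD:-/raid}"
TMP_DIR="${LOCAL_SSD}/pb_tmp"

# Create the scratch directory only if the SSD mount point exists and is writable
if [ -d "$LOCAL_SSD" ] && [ -w "$LOCAL_SSD" ]; then
    mkdir -p "$TMP_DIR"
fi

# Assemble the command with temporary files directed to the local SSD
PB_CMD="pbrun fq2bam --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz --tmp-dir $TMP_DIR"

# Print the command instead of running it, so the paths can be checked first
echo "$PB_CMD"
```

Once the printed paths look right, the command can be pasted into the shell and run as shown.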
You can choose the number of GPUs to run using the command line option --num-gpus N
for tools and pipelines that use GPUs. With this option, the GPUs used will be
limited to the first N GPUs listed in the output of the nvidia-smi command.
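Before choosing a GPU count, it can help to check how many GPUs the system actually reports. The sketch below assumes that nvidia-smi is the listing tool in question and uses its -L flag, which prints one line per GPU. REQUESTED is a hypothetical variable introduced for this example.

```shell
# Sketch only: REQUESTED is a hypothetical variable holding the desired GPU count.
REQUESTED="${REQUESTED:-2}"

# Count the GPUs reported by nvidia-smi, if the tool is installed
if command -v nvidia-smi >/dev/null 2>&1; then
    # "nvidia-smi -L" prints one line per device, each starting with "GPU "
    AVAILABLE=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ')
else
    AVAILABLE=0
fi

# Cap the request at the number of GPUs found; keep it unchanged if none were detected
if [ "$AVAILABLE" -gt 0 ] && [ "$REQUESTED" -gt "$AVAILABLE" ]; then
    NUM_GPUS="$AVAILABLE"
else
    NUM_GPUS="$REQUESTED"
fi

echo "GPUs detected: $AVAILABLE, value to pass as the GPU count: $NUM_GPUS"
```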
To select specific GPUs, set the environment variable
NVIDIA_VISIBLE_DEVICES. GPUs are
numbered starting with zero. For example, this command will use only the second (GPU #1) and
fourth (GPU #3) GPUs:
$ NVIDIA_VISIBLE_DEVICES="1,3" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz
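When the device list changes often, the GPU count passed to --num-gpus can be derived from the list so that the two values stay consistent. This is a sketch under stated assumptions: GPU_IDS is a hypothetical variable, and the command is printed for review rather than executed.

```shell
# Sketch only: GPU_IDS is a hypothetical variable holding zero-based GPU indices.
GPU_IDS="${GPU_IDS:-1,3}"

# Count the comma-separated entries so the GPU count matches the visible devices
NUM_GPUS=$(echo "$GPU_IDS" | tr ',' '\n' | grep -c .)

# Print the full command line for review before running it
echo "NVIDIA_VISIBLE_DEVICES=\"$GPU_IDS\" pbrun fq2bam --num-gpus $NUM_GPUS --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz"
```

With the default list of 1,3, the derived count is 2, matching the example above.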