Getting Started with Clara Parabricks

Installation Requirements

Hardware Requirements

Access to the internet.
Any NVIDIA GPU that supports CUDA architecture 60, 70, 75, or 80 (compute capability 6.0, 7.0, 7.5, or 8.0) and has at least 12GB of GPU memory. Parabricks has been tested on NVIDIA V100, NVIDIA A100, and NVIDIA T4 GPUs.
System requirements:
- A 2-GPU server should have at least 100GB of CPU RAM and at least 24 CPU threads.
- A 4-GPU server should have at least 196GB of CPU RAM and at least 32 CPU threads.
- An 8-GPU server should have at least 392GB of CPU RAM and at least 48 CPU threads.
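
The minimums above can also be checked from a script. The following is a minimal sketch, not a Parabricks tool: meets_cpu_requirements is a hypothetical helper, its thresholds are copied from the list above, and it only knows the 2-, 4-, and 8-GPU configurations.

```shell
# Hypothetical helper (not part of Parabricks): compares a server against the
# minimum CPU RAM (GB) and CPU thread counts listed above.
# Usage: meets_cpu_requirements NUM_GPUS RAM_GB CPU_THREADS
# Returns 0 if the minimums are met, 1 if not, 2 for an unlisted GPU count.
meets_cpu_requirements() {
    case "$1" in
        2) min_ram=100; min_threads=24 ;;
        4) min_ram=196; min_threads=32 ;;
        8) min_ram=392; min_threads=48 ;;
        *) echo "No documented minimum for $1 GPUs" >&2; return 2 ;;
    esac
    if [ "$2" -ge "$min_ram" ] && [ "$3" -ge "$min_threads" ]; then
        echo "OK: $1 GPUs, ${2}GB RAM, $3 CPU threads"
        return 0
    fi
    echo "Insufficient: $1 GPUs need >= ${min_ram}GB RAM and >= $min_threads CPU threads" >&2
    return 1
}

# Example: meets_cpu_requirements 4 256 64
```

The RAM and thread values to pass in can be read from /proc, as described under "Checking available CPU RAM and threads".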

Please note that Clara Parabricks is not supported on virtual GPUs (vGPU) or Multi-Instance GPU (MIG) partitions.

Software Requirements

The following are software requirements for running Clara Parabricks.

An NVIDIA driver that supports CUDA 10.1 or higher. Ampere GPUs require a driver that supports CUDA 11.0 or higher.
Any Linux operating system that supports one of the following:
- nvidia-docker2
- Singularity version 3.0 or higher
- a bare-metal installation (supported on Ubuntu 18.04 only)
Python 3

Verifying Hardware and Software Requirements

Checking available NVIDIA hardware and driver

To check your NVIDIA hardware and driver version, use the nvidia-smi command:

    $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-DGXS...  On   | 00000000:07:00.0 Off |                    0 |
    | N/A   44C    P0    38W / 300W |     74MiB / 16155MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
    | N/A   44C    P0    37W / 300W |      6MiB / 16158MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
    | N/A   44C    P0    39W / 300W |      6MiB / 16158MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
    | N/A   44C    P0    38W / 300W |      6MiB / 16158MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                 56MiB |
    |    0   N/A  N/A      3350      G   /usr/bin/gnome-shell               16MiB |
    |    1   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
    |    2   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
    |    3   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
    +-----------------------------------------------------------------------------+

This shows the following important information:

The NVIDIA driver version is 450.119.04.
The CUDA version is 11.0.
There are four Tesla V100 GPUs.
Each GPU has 16 GB of memory.
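
nvidia-smi can also print selected fields as CSV, which is easier to check from a script than the table above. The sketch below reads the output of nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits and flags GPUs with too little memory. check_gpu_memory is a hypothetical helper, and its default threshold of 12000 MiB is an assumption: nvidia-smi reports slightly less than the nominal capacity (16155 MiB for the 16 GB V100s above), so a strict 12 x 1024 MiB cutoff might reject a 12GB card.

```shell
# Hypothetical helper: reads "index, name, memory.total" CSV lines on stdin,
# as printed by:
#   nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits
# Prints one line per GPU and returns 1 if any GPU is below the threshold.
# Usage: ... | check_gpu_memory [MIN_MIB]   (default 12000 MiB, an assumption)
check_gpu_memory() {
    min_mib=${1:-12000}
    status=0
    while IFS=, read -r idx name mem; do
        # Fields after the first keep their leading space; strip it.
        name=$(echo "$name" | sed 's/^ *//')
        mem=$(echo "$mem" | tr -d ' ')
        if [ "$mem" -ge "$min_mib" ] 2>/dev/null; then
            echo "GPU $idx ($name): $mem MiB OK"
        else
            echo "GPU $idx ($name): $mem MiB is below $min_mib MiB" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits | check_gpu_memory
```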

Checking available CPU RAM and threads

To see how much RAM and how many CPU threads your machine has, run the following commands:

    # Check the total amount of memory (reported in kB)
    $ grep MemTotal /proc/meminfo

    # Check the number of available CPU threads
    $ grep -c ^processor /proc/cpuinfo
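
Both checks can be combined into a one-line summary. This is a minimal sketch assuming MemTotal is reported in kB (KiB), as on standard Linux kernels; cpu_summary is a hypothetical name, and its file paths are parameters only so it can be tested against sample files. MemTotal is usually a little below the installed capacity, so compare it against the minimums above with some margin.

```shell
# Hypothetical helper: prints total RAM (whole GiB, rounded down) and the
# number of CPU threads. The paths default to the real /proc files and are
# parameters only so the function can be tested against sample files.
cpu_summary() {
    meminfo=${1:-/proc/meminfo}
    cpuinfo=${2:-/proc/cpuinfo}
    # MemTotal is reported in kB (KiB); divide by 1024 twice to get GiB.
    ram_gib=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "$meminfo")
    # Each logical CPU (thread) has its own "processor" line.
    threads=$(grep -c '^processor' "$cpuinfo")
    echo "RAM: $ram_gib GiB, CPU threads: $threads"
}

# Example: cpu_summary
```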

Checking nvidia-docker2 installation

To make sure nvidia-docker2 is installed, run this command:

    $ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Once the container image has been downloaded, Docker runs nvidia-smi inside the container, which should show the same output as above.

Checking the Python version

To see which version of Python you have, enter the following command:

    $ python3 --version

Make sure it reports Python 3 (for example, 3.6.9 or 3.7).
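
In an automated setup script, the same check can be done without reading the output by eye. A minimal sketch, assuming the version string has the form printed by python3 --version (for example, "Python 3.6.9"); python_version_ok is a hypothetical name.

```shell
# Hypothetical helper: succeeds if a version string of the form printed by
# "python3 --version" (for example "Python 3.6.9") has major version 3 or higher.
python_version_ok() {
    ver=${1#Python }   # drop the "Python " prefix
    major=${ver%%.*}   # keep everything before the first dot
    # A non-numeric value (e.g. a "command not found" message) makes [ fail.
    [ "$major" -ge 3 ] 2>/dev/null
}

# Example: python_version_ok "$(python3 --version 2>&1)" && echo "Python 3 found"
```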

Licensing

There are two types of Parabricks installation licenses:

Node-locked: licenses are tied to a specific set of GPUs on a server.
Flexera-based: licenses allow a set number of GPUs to be used at once through a license server, the NVIDIA License Server. You can read more about it in the NVIDIA License Server documentation (optional).

Distribution Options

The software can be installed and run in three ways:

Docker container
Singularity container
Bare-metal Debian package (.deb)

Getting the Software

The software can be obtained through the trial program, through the NVIDIA License Server (for paid users), or through the Amazon Web Services (AWS) Marketplace.

Installing the Software

Parabricks software uses two licensing methods:

node-locked license
Flexera license manager

Parabricks software can be installed and used by the following three methods:

a Docker container
a Singularity container
a bare-metal installation via a .deb file

Consult the appropriate section below for your desired licensing and installation method.

Additional Installation Options

The standard way to install the Parabricks package is with one of the methods above. However, you can modify the installation with the following options of the installer.py script:

--install-location
--container
--release
--extra-tools
--ampere
--cpu-only
--symlink
--uninstall
--force
--container-path
--registry
--username
--access-token
--enable-fakeroot
--flexera-server
--skip-image-check

Package contents:

installer.py: The installer to download and install the software
license.bin: The license file for your NVIDIA Parabricks license
EULA.txt: The End User License Agreement

Performance Tuning

The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that you can tune to achieve maximum performance.

Use a Fast SSD

Parabricks software operates with two kinds of files:

Input/output files specified by the user
Temporary files created during execution and deleted at the end

The best performance is achieved when both kinds of files are on a fast, local SSD.

If this is not possible, you can place the input/output files on a fast network storage device and the temporary files on a local SSD using the --tmp-dir option.

Note

Tests have shown that you can use up to 4 GPUs and still get good performance with input/output files on a Lustre network file system. If you plan to use more than 4 GPUs, we highly recommend using local SSDs for all files.

DGX Users

The DGX comes with an SSD, usually mounted on /raid. Use this disk, with a directory on it as the --tmp-dir. For initial testing, you can even copy the input files to this disk to eliminate variability in performance.
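
When the same script runs on machines with and without a local SSD, it can choose the temporary directory before calling pbrun. A minimal sketch under stated assumptions: pick_tmp_dir is a hypothetical helper, the candidate paths in the example comment are illustrative, and a fallback such as /tmp may not sit on a fast SSD.

```shell
# Hypothetical helper: prints the first argument that is an existing,
# writable directory, for use as the value of --tmp-dir.
# Returns 1 if none of the candidates qualifies.
pick_tmp_dir() {
    for d in "$@"; do
        if [ -d "$d" ] && [ -w "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    echo "No writable directory among: $*" >&2
    return 1
}

# Example (illustrative paths; create the /raid directory first):
#   TMP_DIR=$(pick_tmp_dir /raid/parabricks_tmp /tmp) &&
#       pbrun fq2bam <other options> --tmp-dir "$TMP_DIR"
```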

Specifying which GPUs to use

You can choose the number of GPUs to use with the command-line option --num-gpus N for tools and pipelines that use GPUs. With this option, only the first N GPUs listed in the output of the nvidia-smi command are used.

To select specific GPUs, set the environment variable NVIDIA_VISIBLE_DEVICES. GPUs are numbered starting with zero. For example, this command will use only the second (GPU #1) and fourth (GPU #3) GPUs:

    $ NVIDIA_VISIBLE_DEVICES="1,3" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz S1_2.fastq.gz --out-bam S1.bam
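
Because --num-gpus should match the number of GPUs exposed through NVIDIA_VISIBLE_DEVICES, a script can derive one from the other so the two cannot drift apart. A minimal sketch; count_visible_gpus is a hypothetical helper, and it only handles an explicit comma-separated list such as "1,3".

```shell
# Hypothetical helper: counts the entries in an explicit, comma-separated
# NVIDIA_VISIBLE_DEVICES list (for example "1,3" -> 2).
# Values that are not an explicit list (empty, "all", "none", "void") are rejected.
count_visible_gpus() {
    case "$1" in
        ""|all|none|void)
            echo "Cannot derive a GPU count from '$1'" >&2
            return 1 ;;
    esac
    # One entry per line, then count the non-empty lines.
    echo "$1" | tr ',' '\n' | grep -c '[^[:space:]]'
}

# Example:
#   GPUS="1,3"
#   NVIDIA_VISIBLE_DEVICES="$GPUS" pbrun fq2bam --num-gpus "$(count_visible_gpus "$GPUS")" <other options>
```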

Uninstalling the software

You can uninstall Clara Parabricks using the following commands, based on your installation method:

If Parabricks was installed with Docker, use this command:

    $ sudo ./parabricks/installer.py --uninstall

If Parabricks was installed via apt (i.e. bare metal), use this command:

    $ sudo apt purge parabricks

If you get the warning "directory not empty so not removed", you will need to remove those directories manually.

Note

Answers to most FAQs can be found on the Developer Forum. If you need additional support, you can email us as described below:

Customers with paid Parabricks licenses have direct access to support and can contact EnterpriseSupport@nvidia.com.
Users of free evaluation licenses can contact parabricks-eval-support@nvidia.com with any questions about these evaluation licenses.