Getting Started with Clara Parabricks

Hardware Requirements

  • Access to the internet.

  • Any NVIDIA GPU that supports CUDA architecture 60, 70, 75 or 80 and has at least 12GB of GPU RAM. Parabricks has been tested on NVIDIA V100, NVIDIA A100 and NVIDIA T4 GPUs.

  • System Requirements:

    • A 2-GPU server should have at least 100GB CPU RAM and at least 24 CPU threads.

    • A 4-GPU server should have at least 196GB CPU RAM and at least 32 CPU threads.

    • An 8-GPU server should have at least 392GB CPU RAM and at least 48 CPU threads.

Please note that Clara Parabricks is not supported on virtual (vGPU) or MIG (Multi-Instance) GPUs.

Software Requirements

The following are software requirements for running Clara Parabricks.

  • An NVIDIA driver that supports cuda-10.1 or higher. If you're using an Ampere GPU, support for cuda-11.0 or higher is required.

  • Any Linux Operating System that supports one of the following:

  • Python 3

Verifying Hardware and Software Requirements

Checking available NVIDIA hardware and driver

To check which NVIDIA hardware and driver version you have, use the nvidia-smi command:


$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   44C    P0    38W / 300W |     74MiB / 16155MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   44C    P0    37W / 300W |      6MiB / 16158MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   44C    P0    39W / 300W |      6MiB / 16158MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   44C    P0    38W / 300W |      6MiB / 16158MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                 56MiB |
|    0   N/A  N/A      3350      G   /usr/bin/gnome-shell               16MiB |
|    1   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      3019      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

This shows the following important information:

  • The NVIDIA driver version is 450.119.04.

  • The CUDA version is 11.0.

  • There are four Tesla V100 GPUs.

  • Each GPU has 16 GB of memory.
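
The GPU name and memory can also be read in a script-friendly form and compared with the 12GB minimum from the hardware requirements. The following is a minimal sketch, assuming an nvidia-smi build that supports the --query-gpu and --format=csv options. The function name check_gpu_memory and the default threshold of 12288 MiB (12GB read as 12 x 1024 MiB) are illustrative assumptions, not part of Parabricks.

```shell
# Hypothetical helper (not part of Parabricks): reads "name, memory_in_MiB" CSV
# lines on stdin and marks each GPU against a minimum memory size.
# The default of 12288 MiB (12 x 1024) is an assumed reading of "12GB".
check_gpu_memory() {
  local min_mib="${1:-12288}"
  awk -F', ' -v min="$min_mib" '{
    status = ($2 + 0 >= min + 0) ? "ok" : "below minimum"
    printf "%s: %d MiB (%s)\n", $1, $2, status
  }'
}

# Usage on a machine with an NVIDIA driver installed:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | check_gpu_memory
```

The total that nvidia-smi reports can sit slightly below the nominal size (the 16 GB cards above report 16155 to 16158 MiB), so a different threshold can be passed as the first argument if the default proves too strict.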

Checking available CPU RAM and threads

To see how much RAM and how many CPU threads your machine has, run the following:


# To check available memory
$ cat /proc/meminfo | grep MemTotal

# To check available number of threads
$ cat /proc/cpuinfo | grep processor | wc -l
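
The two values can be compared against the system requirements listed earlier. The sketch below is one possible way to do that, not an official tool: the function names are hypothetical, the thresholds are copied from the System Requirements list in this guide, and dividing MemTotal by 1024 twice treats GB as GiB, which is an assumption.

```shell
# Hypothetical helpers (not part of Parabricks). The thresholds are copied from
# the System Requirements list in this guide.

# Print "<min RAM in GB> <min CPU threads>" for a 2-, 4- or 8-GPU server.
min_requirements() {
  case "$1" in
    2) echo "100 24" ;;
    4) echo "196 32" ;;
    8) echo "392 48" ;;
    *) return 1 ;;  # other GPU counts are not listed in this guide
  esac
}

# Succeed only if the given RAM (GB) and thread count meet the minimum.
# Usage: meets_requirements <gpus> <ram_gb> <threads>
meets_requirements() {
  local req
  req=$(min_requirements "$1") || return 2
  [ "$2" -ge "${req% *}" ] && [ "$3" -ge "${req#* }" ]
}

# Read the values for this machine from /proc (MemTotal is reported in kB).
host_ram_gb()  { awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo; }
host_threads() { grep -c '^processor' /proc/cpuinfo; }

# Example:
#   meets_requirements 4 "$(host_ram_gb)" "$(host_threads)" && echo "OK for 4 GPUs"
```

Keeping the comparison in meets_requirements separate from the /proc readers makes it possible to check a planned server configuration without running on it.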


Checking nvidia-docker2 installation

To make sure nvidia-docker2 is installed, run this command:


$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Once the container image has finished downloading, Docker runs the nvidia-smi command inside the container. The output should match the nvidia-smi output shown above.

Checking python version

To see which version of Python you have, enter the following command:


$ python3 --version

Make sure it's at least version 3 (for example, 3.6.9 or 3.7).

Licensing

There are two types of Parabricks installation licenses:

  • Node-locked: licenses are tied to a specific set of GPUs on a server.

  • Flexera-based: licenses allow a set number of GPUs to be used at once through a license server, in this case the NVIDIA License Server. You can read more about it here (optional).

Distribution Options

The software can be installed and run in 3 ways:

  • Docker container

  • Singularity container

  • Bare-metal Debian package (.deb)

The software can be accessed through the trial program, through the NVIDIA License Server for paid users, or through the Amazon Web Services Marketplace.

Consult the appropriate section below for your desired licensing and installation method.

Additional Installation Options

Installing the Parabricks package normally follows one of the methods above. You can also adjust the installation by passing the following options to the installer.py script:

--install-location

The location where the parabricks folder will be created. The parabricks folder contains everything needed to run the software. The user must have permission to write to this location. (default: /opt/)

--container

{docker,singularity} The type of container technology to use (default: docker)

--release

The parabricks release version to install. Contact parabricks-support@nvidia.com if you do not want the default version and need more information.

--extra-tools

Install extra tools like vcfqc, vcfqcbybam, and cnvkit. This will make installation slower. (default: False)

--ampere

Install the Ampere-based container. (default: False)

--cpu-only

Install on a CPU server. No GPU accelerated tools will run with this option.

--symlink

Create a symlink at /usr/bin/pbrun. You can choose to do this during installation.

--uninstall

Uninstall parabricks. Removes all the images and scripts from --install-location.

--force

Disable interactive installation. (default: False)

--container-path

A specific path for the container, relative to the registry (default: None)

--registry

The registry to pull from (default: None)

--username

Username to login to the registry (default: None)

--access-token

Access Token/Password for the supplied username

--enable-fakeroot

Install using --fakeroot capabilities in singularity 3.3 or higher. You will need sudo to build for singularity versions below 3.3. (default: False)

--flexera-server

The Flexera server name (default: None)

--skip-image-check

Use the locally present Docker image. (default: False)
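
To illustrate how these options combine, the sketch below assembles an installer command line and prints it for review instead of running it. It is an illustration only: build_install_cmd is a hypothetical helper, and the ./parabricks/installer.py path and the use of sudo are assumptions about where the package was unpacked and which permissions the install location needs. Every flag shown comes from the list above.

```shell
# Hypothetical helper (not part of Parabricks): assemble an installer.py command
# line from options documented above and print it for review.
# The ./parabricks/installer.py path is an assumption about where the package
# was unpacked.
# Usage: build_install_cmd <install-location> <docker|singularity> [extra flags...]
build_install_cmd() {
  local location="$1" container="$2"
  shift 2
  # Prepend the fixed part of the command; "$@" keeps any further flags,
  # for example --extra-tools or --force.
  set -- sudo ./parabricks/installer.py \
         --install-location "$location" \
         --container "$container" "$@"
  echo "$*"
}

build_install_cmd /opt/ singularity --extra-tools
```

Printing the command first gives a chance to confirm the install location and container type before anything is downloaded.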

Package contents:

  • installer.py: The installer to download and install the software

  • license.bin: The license file based on your license with NVIDIA Parabricks

  • EULA.txt: The End User License Agreement

The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that you can tune to achieve maximum performance.

Use a Fast SSD

Parabricks software operates with two kinds of files:

  • Input/output files specified by the user

  • Temporary files created during execution and deleted at the end

The best performance is achieved when both kinds of files are on a fast, local SSD.

If this is not possible, you can place the input/output files on a fast network storage device and the temporary files on a local SSD using the --tmp-dir option.

Note

Tests have shown that you can use up to 4 GPUs and still get good performance with the input/output files on a Lustre network file system. If you plan to use more than 4 GPUs, we highly recommend using local SSDs for all files.

DGX Users

The DGX comes with an SSD, usually mounted on /raid. Use a directory on this disk as the --tmp-dir. For initial testing, you can even copy the input files to this disk to eliminate variability in performance.
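
A small wrapper can choose the temporary directory before calling pbrun. The following is a minimal sketch under stated assumptions: pick_tmp_dir is a hypothetical helper, /raid/parabricks_tmp is a hypothetical directory on the DGX SSD, and the fq2bam arguments in the usage comment simply mirror the example command used elsewhere in this guide. The helper only checks that a directory exists and is writable; it does not verify that the disk is a local SSD.

```shell
# Hypothetical helper (not part of Parabricks): print the first candidate
# directory that exists and is writable, for use with --tmp-dir.
pick_tmp_dir() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1  # no usable directory among the candidates
}

# Example, preferring a directory on the DGX SSD and falling back to /tmp:
#   tmp_dir=$(pick_tmp_dir /raid/parabricks_tmp /tmp) &&
#     pbrun fq2bam --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz --tmp-dir "$tmp_dir"
```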

Specifying which GPUs to use

You can choose the number of GPUs to run on using the command line option --num-gpus N for tools and pipelines that use GPUs. This option limits the GPUs used to the first N GPUs listed in the output of the nvidia-smi command.

To select specific GPUs, set the environment variable NVIDIA_VISIBLE_DEVICES. GPUs are numbered starting with zero. For example, this command will use only the second (GPU #1) and fourth (GPU #3) GPUs:


$ NVIDIA_VISIBLE_DEVICES="1,3" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz
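
When the GPU list is set in a script, the value for --num-gpus can be derived from it so the two never disagree. The sketch below assumes a hypothetical helper named count_gpus; it only counts the comma-separated IDs and does not check that those GPUs exist.

```shell
# Hypothetical helper (not part of Parabricks): count the IDs in a
# comma-separated GPU list so that --num-gpus always matches it.
count_gpus() {
  echo "$1" | awk -F',' '{ print NF }'
}

gpus="1,3"
echo "NVIDIA_VISIBLE_DEVICES=$gpus --num-gpus $(count_gpus "$gpus")"

# On a machine with Parabricks installed, the same values can be passed to pbrun:
#   NVIDIA_VISIBLE_DEVICES="$gpus" pbrun fq2bam --num-gpus "$(count_gpus "$gpus")" \
#     --ref Ref.fa --in-fq S1_1.fastq.gz --in-fq S1_2.fastq.gz
```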


You can uninstall Clara Parabricks using the following commands, based on your installation method:

If Parabricks was installed with Docker, use this command:


$ sudo ./parabricks/installer.py --uninstall

If Parabricks was installed via apt (i.e. bare metal), use this command:


$ sudo apt purge parabricks
# If you get the warning "directory not empty so not removed", you'll need to remove those folders manually.

Note

Answers to most FAQs can be found on the Developer Forum. If you need additional support, you can email us as follows:

  1. Customers with paid Parabricks licenses have direct access to support and can contact EnterpriseSupport@nvidia.com.

  2. Users of free evaluation licenses can contact parabricks-eval-support@nvidia.com with any questions about these evaluation licenses.

© Copyright 2022, Nvidia. Last updated on Jun 28, 2023.