Configure NVIDIA Earth-2 Correction Diffusion NIM at Runtime#

Use this documentation for details about how to configure the NVIDIA Earth-2 Correction Diffusion (CorrDiff) NIM at runtime.

GPU Selection#

Passing --gpus all to docker run is acceptable in homogeneous environments with one or more identical GPUs. In some environments, it is beneficial to run the container on specific GPUs. Expose specific GPUs inside the container by using either:

  • The --gpus flag, for example --gpus='"device=1"'.

  • The environment variable NVIDIA_VISIBLE_DEVICES, for example -e NVIDIA_VISIBLE_DEVICES=1.

The device IDs to use as inputs are listed in the output of nvidia-smi -L:

GPU 0: Tesla H100 (UUID: GPU-b404a1a1-d532-5b5c-20bc-b34e37f3ac46)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-c215f4d3-91e6-4cf2-8e71-d02a8f9b3c57)

See the NVIDIA Container Toolkit documentation for more instructions.
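For example, either of the following commands restricts the container to GPU 1. The image name is a placeholder for the actual CorrDiff NIM image:

```shell
# Select GPU 1 via the --gpus flag (note the nested quoting)
docker run --rm --gpus '"device=1"' <corrdiff-nim-image>

# Equivalent selection via the NVIDIA_VISIBLE_DEVICES environment variable
docker run --rm -e NVIDIA_VISIBLE_DEVICES=1 <corrdiff-nim-image>
```

You can also pass a GPU UUID from nvidia-smi -L instead of an index, which is more robust when device ordering changes across reboots.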

Shared Memory Flag#

CorrDiff NIM uses Triton's Python backend, which scales with the number of available CPU cores. You might need to increase the shared memory given to the microservice container.

Example providing 1 GB of shared memory:

docker run ... --shm-size=1g ...

Model Profiles#

The CorrDiff NIM provides the following model profiles:

CorrDiff US GEFS HRRR#

NIM_MODEL_PROFILE: bf8e1ed158c1bf27d2e36fc4936a3d2989948a3f4e4e80e2b0e7a7124661911c

The Correction Diffusion (CorrDiff) US GEFS-HRRR model downscales several surface and atmospheric variables from 25-km resolution forecast data from the Global Ensemble Forecast System (GEFS) and predicts 3-km resolution High-Resolution Rapid Refresh (HRRR) data.
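A launch command selecting this profile might look like the following sketch. The image name is a placeholder, and NGC_API_KEY is assumed to be exported in your shell:

```shell
docker run --rm --gpus all --shm-size=1g \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=bf8e1ed158c1bf27d2e36fc4936a3d2989948a3f4e4e80e2b0e7a7124661911c \
  -p 8000:8000 \
  <corrdiff-nim-image>
```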

Environment Variables#

The CorrDiff NIM allows a few customizations that are read at container startup. The following variables can be used to change the NIM's behavior.

| Variable | Default | Description |
| --- | --- | --- |
| NGC_API_KEY | None | Your NGC API key with read access to the model registry for the model profile you are using. |
| NIM_MODEL_PROFILE | "bf8e1….1911c" | The model package to load into the NIM on launch. It is downloaded from NGC, assuming that you have the correct permissions. |
| NIM_HTTP_API_PORT | 8000 | Publishes the NIM service to the specified port inside the container. Make sure to adjust the port passed to the -p/--publish flag of docker run accordingly. |
| NIM_DISABLE_MODEL_DOWNLOAD | None | Disables the model download on container startup. |
| EARTH2NIM_TARGET_BATCHSIZE | 8 | The target sample batch size that the NIM initially splits a request into; batches are then dynamically scheduled across model instances. You might need to lower this for GPUs with less VRAM. The preferred batch sizes are 4, 8, 12, and 16. |
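The request splitting controlled by EARTH2NIM_TARGET_BATCHSIZE can be illustrated with a small helper. The function name and logic here are an illustrative sketch, not part of the NIM API:

```python
def split_into_batches(num_samples: int, target_batch_size: int = 8) -> list[int]:
    """Split a request of num_samples into chunks no larger than target_batch_size.

    Illustrative only: mimics how a request is divided into batches
    that can then be scheduled across model instances.
    """
    batches = []
    remaining = num_samples
    while remaining > 0:
        take = min(target_batch_size, remaining)
        batches.append(take)
        remaining -= take
    return batches

# A 20-sample request with the default target of 8 splits into [8, 8, 4]
print(split_into_batches(20, 8))
```

Lowering the target (for example to 4 on a GPU with less VRAM) produces more, smaller batches while still covering the full request.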

Mounted Volumes#

The following paths inside the container can be mounted to customize the NIM runtime:

| Container Path | Required | Description | Example |
| --- | --- | --- | --- |
| /opt/nim/.cache | No | The directory into which models are downloaded inside the container. This directory must be writable from inside the container, which can be achieved by adding the option -u $(id -u) to the docker run command. | -v ~/.cache/nim:/opt/nim/.cache |
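Putting the pieces together, a run command that persists downloaded models across container restarts might look like the following sketch. The image name is a placeholder, and NGC_API_KEY is assumed to be exported in your shell:

```shell
# Create the host-side cache directory so it is owned by the current user
mkdir -p ~/.cache/nim

docker run --rm --gpus all --shm-size=1g \
  -u $(id -u) \
  -e NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  <corrdiff-nim-image>
```

On subsequent launches, the NIM finds the previously downloaded model in the mounted cache and skips the download.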