Performance of NVIDIA Earth-2 Correction Diffusion NIM#

Use this documentation for details about the performance of the NVIDIA Earth-2 Correction Diffusion (CorrDiff) NIM.

Evaluation Process#

Performance of the CorrDiff NIM is measured as the time for an inference request with a single sample to complete. Two tests are performed, each with a different number of diffusion steps: the first uses eight steps and represents a low-quality prediction; the second uses 18 steps and represents a high-quality prediction. The NIM and the client run on the same computer to minimize network latency. Performance is measured on several supported GPUs and averaged across three runs. Note that the model contained in the NIM is compiled dynamically and requires warm-up before it becomes fully performant.

Performance Results#

The performance results appear in the following tables.

Important

The measurements shown here can vary between machines and between runs. The performance values depend on the specific hardware configuration on which the NIM and client are running.

v1.0.0#

| Diffusion Steps | L40s | RTX6000 | A100 | H100 |
|-----------------|--------|---------|--------|--------|
| 8 Steps         | 45.27s | 38.52s  | 25.10s | 17.66s |
| 18 Steps        | 95.46s | 81.71s  | 51.43s | 32.04s |

v1.1.0#

| Diffusion Steps | L40s | RTX6000 | A100 | H100 | B200 |
|-----------------|--------|---------|--------|--------|-------|
| 8 Steps         | 11.28s | 11.95s  | 9.12s  | 4.88s  | 4.13s |
| 18 Steps        | 25.01s | 28.20s  | 20.12s | 10.31s | 8.81s |
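For a rough sense of the gain between releases, the tabulated timings can be compared directly. The sketch below hard-codes the values from the two tables above (for the four GPUs present in both) and computes the per-configuration speedup of v1.1.0 over v1.0.0; the dictionaries and variable names are illustrative, not part of the NIM API.

```python
# Timings in seconds, copied from the tables above
# (only GPUs that appear in both the v1.0.0 and v1.1.0 tables).
v1_0_0 = {
    ("L40s", 8): 45.27, ("RTX6000", 8): 38.52, ("A100", 8): 25.10, ("H100", 8): 17.66,
    ("L40s", 18): 95.46, ("RTX6000", 18): 81.71, ("A100", 18): 51.43, ("H100", 18): 32.04,
}
v1_1_0 = {
    ("L40s", 8): 11.28, ("RTX6000", 8): 11.95, ("A100", 8): 9.12, ("H100", 8): 4.88,
    ("L40s", 18): 25.01, ("RTX6000", 18): 28.20, ("A100", 18): 20.12, ("H100", 18): 10.31,
}

# Speedup of v1.1.0 relative to v1.0.0 for each (GPU, steps) configuration.
speedup = {key: v1_0_0[key] / v1_1_0[key] for key in v1_0_0}

for (gpu, steps), s in sorted(speedup.items()):
    print(f"{gpu:8s} {steps:2d} steps: {s:.2f}x faster")
```

Every configuration common to both tables is more than 2.5x faster in v1.1.0.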

Replicating Benchmarking Results#

To approximately reproduce the results in the tables above, you can use the following script. This example does not tune the batch size (the benchmarks use only a single sample and input file). To tune the batch size manually, see Configure NVIDIA Earth-2 Correction Diffusion NIM at Runtime and set the environment variable EARTH2NIM_TARGET_BATCHSIZE upon initialization.

""" Simple client to time the corrdiff model. 

Assumes that the input is already in the input.npy file.
Assumes that the model is running on the localhost and is accessible at the url.
"""
import requests
from time import perf_counter

def time_corrdiff_nim(
    url: str,
    file: str,
    data: dict,
    headers: dict,
    timeout: int = 180,
    num_burn_in: int = 3,
    num_trials: int = 10,
):
    
    # Burn in is important for warming up the NIM
    for _ in range(num_burn_in):
        file_dict = {
            "input_array": ("input_array", open(file, "rb")),
        }
        r = requests.post(
            url, 
            headers=headers, 
            data=data, 
            files=files, 
            timeout=timeout
        )
        if r.status_code != 200:
            raise Exception(r.content)

    # Measure time
    total_time = 0
    for _ in range(num_trials):
        file_dict = {
            "input_array": ("input_array", open(file, "rb")),
        }
        start_time = perf_counter()
        r = requests.post(
            url, 
            headers=headers, 
            data=data, 
            files=files, 
            timeout=timeout
        )
        if r.status_code != 200:
            raise Exception(r.content)
        else:
            # Dump response to file
            end_time = perf_counter()
            total_time += end_time - start_time
    return total_time / num_trials


if __name__ == "__main__":
    url = "http://localhost:8000/v1/infer"
    file = "input.npy"
    samples = 1
    for steps in [8, 18]:
        data = {
            "samples": samples,
            "steps": steps,
            "seed": 0,
        }
        headers = {
            "accept": "application/x-tar",
        }
        print(f"Sending post request of file {file} to {url}")
        print(f"\t for samples: {samples}, steps: {steps}")
        r = time_corrdiff_nim(
            url, 
            files, 
            data, 
            headers, 
            timeout=180, 
            num_trials=10, 
            num_burn_in=3
        )
        print(f"Samples: {samples}, Steps: {steps}, Time: {r} seconds")