Performance of NVIDIA Earth-2 Correction Diffusion NIM#

Use this documentation for details about the performance of the NVIDIA Earth-2 Correction Diffusion (CorrDiff) NIM.

Evaluation Process#

Performance of the CorrDiff NIM is measured as the time for an inference request with a single sample to complete. Two tests are performed, each with a different number of diffusion steps: the first uses eight steps and represents a low-quality prediction; the second uses 18 steps and represents a high-quality prediction. The NIM and the client run on the same computer to minimize network latency. Performance is measured on several supported GPUs and averaged across three runs. Note that the model contained in the NIM is compiled dynamically and requires warm-up before it becomes fully performant.

Performance Results#

The performance results appear in the following tables.

Important

The measurements shown here can vary between machines and between runs. The performance values depend on the specific hardware configuration on which the NIM and client are running.

v1.0.0#

| Diffusion Steps | L40s | RTX6000 | A100 | H100 |
|-----------------|--------|---------|--------|--------|
| 8 Steps         | 45.27s | 38.52s  | 25.10s | 17.66s |
| 18 Steps        | 95.46s | 81.71s  | 51.43s | 32.04s |

v1.1.0#

| Diffusion Steps | L40s | RTX6000 | A100 | H100 | B200 |
|-----------------|--------|---------|--------|--------|-------|
| 8 Steps         | 11.28s | 11.95s  | 9.12s  | 4.88s  | 4.13s |
| 18 Steps        | 25.01s | 28.20s  | 20.12s | 10.31s | 8.81s |
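For a rough sense of the gain between releases, the tabulated timings can be compared directly. The sketch below hard-codes the values from the two tables above (for the four GPUs present in both) and computes the per-configuration speedup of v1.1.0 over v1.0.0; the dictionaries and variable names are illustrative, not part of the NIM API.

```python
# Timings in seconds, copied from the tables above
# (only GPUs that appear in both the v1.0.0 and v1.1.0 tables).
v1_0_0 = {
    ("L40s", 8): 45.27, ("RTX6000", 8): 38.52, ("A100", 8): 25.10, ("H100", 8): 17.66,
    ("L40s", 18): 95.46, ("RTX6000", 18): 81.71, ("A100", 18): 51.43, ("H100", 18): 32.04,
}
v1_1_0 = {
    ("L40s", 8): 11.28, ("RTX6000", 8): 11.95, ("A100", 8): 9.12, ("H100", 8): 4.88,
    ("L40s", 18): 25.01, ("RTX6000", 18): 28.20, ("A100", 18): 20.12, ("H100", 18): 10.31,
}

# Speedup of v1.1.0 relative to v1.0.0 for each (GPU, steps) configuration.
speedup = {key: v1_0_0[key] / v1_1_0[key] for key in v1_0_0}

for (gpu, steps), s in sorted(speedup.items()):
    print(f"{gpu:8s} {steps:2d} steps: {s:.2f}x faster")
```

Every configuration common to both tables is more than 2.5x faster in v1.1.0.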

Replicating Benchmarking Results#

To approximately reproduce the results in the tables above, you can use the following script. This example does not tune the batch size (the benchmarks use only a single sample and input file). To tune the batch size manually, see Configure NVIDIA Earth-2 Correction Diffusion NIM at Runtime and set the environment variable EARTH2NIM_TARGET_BATCHSIZE upon initialization.

""" Simple client to time the corrdiff model. 

Assumes that the input is already in the input.npy file.
Assumes that the model is running on the localhost and is accessible at the url.
"""
import requests
from time import perf_counter

def time_corrdiff_nim(
    url: str,
    file: str,
    data: dict,
    headers: dict,
    timeout: int = 180,
    num_burn_in: int = 3,
    num_trials: int = 10,
):
    
    # Burn in is important for warming up the NIM
    for _ in range(num_burn_in):
        file_dict = {
            "input_array": ("input_array", open(file, "rb")),
        }
        r = requests.post(
            url, 
            headers=headers, 
            data=data, 
            files=files, 
            timeout=timeout
        )
        if r.status_code != 200:
            raise Exception(r.content)

    # Measure time
    total_time = 0
    for _ in range(num_trials):
        file_dict = {
            "input_array": ("input_array", open(file, "rb")),
        }
        start_time = perf_counter()
        r = requests.post(
            url, 
            headers=headers, 
            data=data, 
            files=files, 
            timeout=timeout
        )
        if r.status_code != 200:
            raise Exception(r.content)
        else:
            # Dump response to file
            end_time = perf_counter()
            total_time += end_time - start_time
    return total_time / num_trials


if __name__ == "__main__":
    url = "http://localhost:8000/v1/infer"
    file = "input.npy"
    samples = 1
    for steps in [8, 18]:
        data = {
            "samples": samples,
            "steps": steps,
            "seed": 0,
        }
        headers = {
            "accept": "application/x-tar",
        }
        print(f"Sending post request of file {file} to {url}")
        print(f"\t for samples: {samples}, steps: {steps}")
        r = time_corrdiff_nim(
            url, 
            files, 
            data, 
            headers, 
            timeout=180, 
            num_trials=10, 
            num_burn_in=3
        )
        print(f"Samples: {samples}, Steps: {steps}, Time: {r} seconds")