Performance of NVIDIA Earth-2 Correction Diffusion NIM#
Use this documentation for details about the performance of the NVIDIA Earth-2 Correction Diffusion (CorrDiff) NIM.
Evaluation Process#
Performance of the CorrDiff NIM is measured as the time for an inference request for a single sample to complete. Two tests are performed, each with a different number of diffusion steps: the first uses 8 steps and represents a low-quality prediction; the second uses 18 steps and represents a high-quality prediction. The NIM and client run on the same computer to minimize network latency. Performance is measured on several supported GPUs and averaged across three runs. Note that the model contained in the NIM is dynamically compiled and requires warm-up requests before it becomes fully performant.
Performance Results#
The performance results appear in the following tables.
Important
The measurements that appear here can vary between computers and runs. The performance values depend on the specific hardware configuration that the NIM and client are running on.
v1.0.0#
| Diffusion Steps | L40S | RTX6000 | A100 | H100 |
|---|---|---|---|---|
| 8 Steps | 45.27s | 38.52s | 25.10s | 17.66s |
| 18 Steps | 95.46s | 81.71s | 51.43s | 32.04s |
v1.1.0#
| Diffusion Steps | L40S | RTX6000 | A100 | H100 | B200 |
|---|---|---|---|---|---|
| 8 Steps | 11.28s | 11.95s | 9.12s | 4.88s | 4.13s |
| 18 Steps | 25.01s | 28.20s | 20.12s | 10.31s | 8.81s |
Replicating Benchmarking Results#
To approximately recreate the tables of results above, you can use the following script. This example does not tune the batch size (the benchmarks are only for a single sample and input file). To manually tune the batch size, see Configure NVIDIA Earth-2 Correction Diffusion NIM at Runtime, and set the environment variable `EARTH2NIM_TARGET_BATCHSIZE` upon initialization.
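For example, batch size could be set by passing the environment variable when launching the container. This is a sketch only: the image name and other launch flags below are placeholders, so refer to the runtime configuration page for the exact command for your deployment.

```shell
# Placeholder launch command: <corrdiff-nim-image> and the remaining flags
# are illustrative; only the -e flag for the batch size is the point here.
docker run -e EARTH2NIM_TARGET_BATCHSIZE=4 ... <corrdiff-nim-image>
```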
""" Simple client to time the corrdiff model.
Assumes that the input is already in the input.npy file.
Assumes that the model is running on the localhost and is accessible at the url.
"""
import requests
from time import perf_counter
def time_corrdiff_nim(
url: str,
file: str,
data: dict,
headers: dict,
timeout: int = 180,
num_burn_in: int = 3,
num_trials: int = 10,
):
# Burn in is important for warming up the NIM
for _ in range(num_burn_in):
file_dict = {
"input_array": ("input_array", open(file, "rb")),
}
r = requests.post(
url,
headers=headers,
data=data,
files=files,
timeout=timeout
)
if r.status_code != 200:
raise Exception(r.content)
# Measure time
total_time = 0
for _ in range(num_trials):
file_dict = {
"input_array": ("input_array", open(file, "rb")),
}
start_time = perf_counter()
r = requests.post(
url,
headers=headers,
data=data,
files=files,
timeout=timeout
)
if r.status_code != 200:
raise Exception(r.content)
else:
# Dump response to file
end_time = perf_counter()
total_time += end_time - start_time
return total_time / num_trials
if __name__ == "__main__":
url = "http://localhost:8000/v1/infer"
file = "input.npy"
samples = 1
for steps in [8, 18]:
data = {
"samples": samples,
"steps": steps,
"seed": 0,
}
headers = {
"accept": "application/x-tar",
}
print(f"Sending post request of file {file} to {url}")
print(f"\t for samples: {samples}, steps: {steps}")
r = time_corrdiff_nim(
url,
files,
data,
headers,
timeout=180,
num_trials=10,
num_burn_in=3
)
print(f"Samples: {samples}, Steps: {steps}, Time: {r} seconds")