Performance of NVIDIA Earth-2 FourCastNet NIM#
Use this documentation for details about the performance of the NVIDIA Earth-2 FourCastNet NIM.
Evaluation Process#
The FourCastNet NIM API streams forecast data back as it is generated,
so you receive each time step sequentially.
Performance is measured as the time needed to complete each of three forecasts.
Each forecast uses an initial condition at 2020-01-01T00:00:00
and requests five output variables: ['t2m', 'z500', 't850', 'u10m', 'v10m'].
The requested lengths of the forecasts are as follows:

- 6-hour forecast: 6 hours / 1 step
- 5-day forecast: 120 hours / 20 steps
- 10-day forecast: 240 hours / 40 steps
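The step counts above follow from the model's 6-hour time step: each requested forecast length in hours divides evenly by 6. A quick sanity check in plain Python (no NIM required):

```python
# FourCastNet advances the forecast in 6-hour increments,
# so the step count is the forecast length in hours divided by 6.
STEP_HOURS = 6

for hours in (6, 120, 240):
    steps = hours // STEP_HOURS
    print(f"{hours} hour forecast -> {steps} steps")
```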
Note
Both model profiles that the NIM supports are expected to have similar performance values.
Performance Results#
The performance results appear in the following table. Latency is the total forecast time in seconds, and throughput is reported as seconds per step (lower is better).
Important
The listed measurements can deviate between machines and between runs. Performance depends heavily on the specific hardware configuration on which the NIM and client are running.
H100#
|            | 6 hour        | 5 day        | 10 day       |
|------------|---------------|--------------|--------------|
| Latency    | 7.5s          | 30.0s        | 60.0s        |
| Throughput | 1.75 sec/step | 1.5 sec/step | 1.5 sec/step |
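For the longer forecasts, the throughput row is consistent with latency divided by step count; a quick check against the table values:

```python
# Seconds per step for the 5-day and 10-day forecasts,
# derived from the latency row of the table: (latency_s, steps).
results = {"5 day": (30.0, 20), "10 day": (60.0, 40)}

for name, (latency_s, steps) in results.items():
    print(f"{name}: {latency_s / steps:.2f} sec/step")
```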
Replicating Benchmarking Results#
To approximately recreate the benchmarking results, you can use the following script to time end-to-end forecast requests. This script assumes:

- The NIM is running locally and reachable at `http://localhost:8000`.
- You have already created an `fcn_inputs.npy` file, as described in the quickstart guide.
"""Simple client to time FourCastNet NIM inference.
Assumes the model is running on localhost and the input is in fcn_inputs.npy.
"""
from time import perf_counter
import requests
def time_fourcastnet_nim(
url: str,
input_file: str,
input_time: str,
simulation_length: int,
variables: str | None = None,
timeout: int = 300,
num_burn_in: int = 2,
num_trials: int = 5,
) -> float:
headers = {"accept": "application/x-tar"}
data = {"input_time": input_time, "simulation_length": simulation_length}
if variables is not None:
data["variables"] = variables
# Burn-in requests help avoid measuring one-time initialization effects.
for _ in range(num_burn_in):
files = {"input_array": ("input_array", open(input_file, "rb"))}
r = requests.post(url, headers=headers, data=data, files=files, timeout=timeout)
r.raise_for_status()
total = 0.0
for _ in range(num_trials):
files = {"input_array": ("input_array", open(input_file, "rb"))}
t0 = perf_counter()
r = requests.post(url, headers=headers, data=data, files=files, timeout=timeout)
r.raise_for_status()
total += perf_counter() - t0
return total / num_trials
if __name__ == "__main__":
url = "http://localhost:8000/v1/infer"
input_file = "fcn_inputs.npy"
input_time = "2020-01-01T00:00:00Z"
# Example: match the evaluation process section
variables = "t2m,z500,t850,u10m,v10m"
for simulation_length in [1, 20, 40]:
avg = time_fourcastnet_nim(
url=url,
input_file=input_file,
input_time=input_time,
simulation_length=simulation_length,
variables=variables,
timeout=600,
)
print(f"simulation_length={simulation_length}: avg latency {avg:.2f}s")