Advanced features for DoMINO-Automotive-Aero NIM#

Use this documentation to learn about the advanced features of the DoMINO-Automotive-Aero NIM.

Inference with Custom Checkpoints#

By default, when launching the NIM container, a pre-trained model checkpoint is automatically pulled from the NGC registry and used to instantiate the model. However, users may wish to run inference with their own custom checkpoints—such as those trained on proprietary datasets or fine-tuned for specific applications.

To use a custom checkpoint, simply mount your checkpoint file into the container at runtime using Docker’s volume mounting feature.

export NGC_API_KEY=<NGC API Key>

docker run --rm --runtime=nvidia --gpus 1 --shm-size 2g \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v <PATH_TO_CUSTOM_CHECKPOINT>:/opt/nim/custom_checkpoint/checkpoint.pt \
    -t nvcr.io/nim/nvidia/domino-automotive-aero:2.1.0

Replace <PATH_TO_CUSTOM_CHECKPOINT> with the full path to your custom .pt checkpoint file.

How it works: When the NIM container starts, it checks for the presence of a checkpoint at /opt/nim/custom_checkpoint/checkpoint.pt. If this file exists, the container uses your custom checkpoint for model initialization and skips downloading the default checkpoint from NGC.

This allows you to easily run inference with your own trained or fine-tuned models.

Note

If you’re using a custom checkpoint, you may also want to provide a custom YAML configuration to control model defaults. See “Inference with Custom Configs” below.

Inference with Custom Configs#

By default, the NIM uses its built-in Hydra configuration. You can override it with your own YAML config by mounting a config file into the container at runtime.

export NGC_API_KEY=<NGC API Key>

docker run --rm --runtime=nvidia --gpus 1 --shm-size 2g \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v <PATH_TO_CUSTOM_CONFIG>:/opt/nim/custom_config/config.yaml \
    -t nvcr.io/nim/nvidia/domino-automotive-aero:2.1.0

Replace <PATH_TO_CUSTOM_CONFIG> with the full path to your .yaml config.

To build your custom configuration file, start from the default configs and modify them as needed. The default configs can be retrieved from the /v1/model/config endpoint.
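For example, a minimal sketch that saves the default configuration from a running NIM as a starting point for editing (this assumes a plain GET request to the endpoint; the raw response body is written to disk unchanged, so check its format before editing):

import httpx

# Fetch the built-in default configuration from a running NIM instance
# and save it locally as a starting point for a custom config.
r = httpx.get("http://localhost:8000/v1/model/config", timeout=30.0)
r.raise_for_status()

with open("config.yaml", "wb") as f:
    f.write(r.content)
print("Saved default config to config.yaml")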

How it works: On startup, the container looks for exactly one .yaml file under /opt/nim/custom_config/. If one is found, it is used as the model configuration, overriding the default Hydra config. If more than one .yaml file is present, startup fails to avoid ambiguity. If none are found, the built-in default Hydra config is used. If your custom config references other files, make sure those paths are valid inside the container (absolute paths are recommended).

Using a custom checkpoint? You may also want to pair it with a custom config to keep hyperparameters and preprocessing in sync. You can mount both in the same run:

export NGC_API_KEY=<NGC API Key>

docker run --rm --runtime=nvidia --gpus 1 --shm-size 2g \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -v <PATH_TO_CUSTOM_CHECKPOINT>:/opt/nim/custom_checkpoint/checkpoint.pt \
    -v <PATH_TO_CUSTOM_CONFIG>:/opt/nim/custom_config/config.yaml \
    -t nvcr.io/nim/nvidia/domino-automotive-aero:2.1.0

Multi-GPU Support#

The NIM container is built to take full advantage of systems with multiple GPUs. When multiple GPUs are available, the inference server automatically distributes incoming requests across the available devices: each request is assigned to a specific GPU, so different requests run in parallel on different GPUs. This enables efficient scaling for high-throughput or multi-user scenarios without requiring manual device management.

Start your container with the --gpus all option to use all available GPUs on your system. As in the previous examples, NGC_API_KEY must be exported in your shell.

docker run --rm --runtime=nvidia --gpus all --shm-size 2g \
    -p 8000:8000 \
    -e NGC_API_KEY \
    -t nvcr.io/nim/nvidia/domino-automotive-aero:2.1.0
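With several GPUs available, a client can submit multiple designs concurrently and let the server distribute the requests across devices. Below is a minimal sketch of such a client (the STL file names and parameter values are placeholders; the request format follows the inference examples later on this page):

import asyncio, io, httpx, numpy

URL = "http://localhost:8000/v1/infer"
DATA = {
    "stream_velocity": "30.0",
    "stencil_size": "1",
    "point_cloud_size": "500000",
    "batch_size": "128000",
}

async def infer(client, stl_path):
    # Upload one STL geometry; each request can be scheduled on a different GPU.
    with open(stl_path, "rb") as f:
        stl_bytes = f.read()
    files = {"design_stl": (stl_path, io.BytesIO(stl_bytes))}
    r = await client.post(URL, files=files, data=DATA, timeout=300.0)
    r.raise_for_status()
    with numpy.load(io.BytesIO(r.content)) as out:
        return {k: out[k] for k in out.keys()}

async def main():
    designs = ["design_a.stl", "design_b.stl"]  # placeholder STL files
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(infer(client, p) for p in designs))

results = asyncio.run(main())
print(len(results))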

Flexible Batched Inference#

The NIM supports flexible batched inference for efficient processing of large numbers of query points. During inference, the set of query points is automatically divided into batches, and each batch is processed sequentially until all points have been evaluated. The batch size is configurable via the batch_size parameter of the inference endpoints; adjusting it lets you balance memory usage against throughput and maximize GPU utilization for the available hardware.
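For example, a minimal sketch of a request that samples a large volume point cloud but caps per-batch GPU memory with a smaller batch_size (the point_cloud_size and batch_size values here are illustrative; tune them to your hardware):

import io, httpx, numpy

url = "http://localhost:8000/v1/infer"
data = {
    "stream_velocity": "30.0",
    "stencil_size": "1",
    "point_cloud_size": "1000000",  # illustrative: one million sampled query points
    "batch_size": "64000",          # illustrative: evaluated in batches of 64,000 points
}
with open("drivaer_112_single_solid.stl", "rb") as stl_file:
    files = {"design_stl": ("drivaer_112_single_solid.stl", stl_file)}
    r = httpx.post(url, files=files, data=data, timeout=300.0)
r.raise_for_status()
with numpy.load(io.BytesIO(r.content)) as output_data:
    print(list(output_data.keys()))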

Inference on Custom Volume Point Clouds#

By default, for volume predictions, the NIM samples query points within the computational domain using a uniform random distribution. However, users may wish to use a different set of query points, such as a custom point cloud with higher density near the car surface or one derived from simulation mesh nodes. To enable this, provide the point_cloud parameter instead of point_cloud_size; it should point to a .npy file containing the desired point cloud coordinates with shape (N, 3). This gives you full control over where predictions are made, supporting advanced workflows and custom analysis requirements. The sample client code below shows how a custom point cloud can be used.

import io, httpx, numpy

url = "http://localhost:8000/v1/infer"
point_cloud_path = 'random_points.npy'          # custom point cloud, shape (N, 3)
stl_file_path = 'drivaer_112_single_solid.stl'  # car geometry

data = {
    "stream_velocity": "30.0",
    "stencil_size": "1",
    "batch_size": "128000",
}
# Upload the STL geometry and the custom point cloud as multipart files.
with open(stl_file_path, "rb") as stl_file, open(point_cloud_path, "rb") as pc_file:
    files = {
        "design_stl": (stl_file_path, stl_file),
        "point_cloud": ("point_cloud.npy", pc_file)
    }
    r = httpx.post(url, files=files, data=data, timeout=120.0)
if r.status_code != 200:
    raise Exception(r.content)
# The response is a NumPy .npz archive; unpack it into a dictionary of arrays.
with numpy.load(io.BytesIO(r.content)) as output_data:
    output_dict = {key: output_data[key] for key in output_data.keys()}
print(output_dict.keys())
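If you do not already have a point cloud from a simulation mesh, you can build the .npy file yourself. Below is a minimal sketch that samples uniform random points inside a padded bounding box of the geometry using pyvista (the padding and point count are illustrative; denser sampling near the surface or exporting mesh nodes works just as well):

import numpy as np
import pyvista as pv

# Read the car geometry and take its axis-aligned bounds.
mesh = pv.read("drivaer_112_single_solid.stl")
xmin, xmax, ymin, ymax, zmin, zmax = mesh.bounds

# Pad the bounds to cover a surrounding volume region (illustrative padding).
pad = 2.0
lo = np.array([xmin - pad, ymin - pad, zmin - pad])
hi = np.array([xmax + pad, ymax + pad, zmax + pad])

# Sample uniform random points and save them with shape (N, 3),
# as expected by the point_cloud parameter.
points = np.random.uniform(lo, hi, size=(500_000, 3))
np.save("random_points.npy", points)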

Chunked Parallel Requests#

For very large point clouds, you can partition volume inference across multiple requests while issuing a single surface request, then merge the results client‑side.

  • Volume endpoint: /v1/infer/volume (send partitioned point clouds)

  • Surface endpoint: /v1/infer/surface (send once)

This pattern lets Triton schedule chunks across multiple instances/GPUs to reduce wall‑clock time. Returned volume arrays are shaped as (1, N_chunk, F) per chunk, so concatenate along axis 1 to get (1, N_total, F).

import asyncio, io, httpx, numpy as np

URL_VOLUME = "http://localhost:8000/v1/infer/volume"
URL_SURFACE = "http://localhost:8000/v1/infer/surface"
STL_PATH = "drivaer_112_single_solid.stl"
POINTS_PATH = "random_points.npy"
CHUNK_SIZE = 200_000

COMMON_DATA = {
    "stream_velocity": "30.0",
    "stencil_size": "1",
    "batch_size": "128000",
}

def load_bytes(path):
    with open(path, "rb") as f:
        return f.read()

def npy_bytes(arr: np.ndarray) -> bytes:
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

async def send_volume_chunk(client, stl_bytes: bytes, points_chunk: np.ndarray):
    files = {
        "design_stl": (STL_PATH, io.BytesIO(stl_bytes)),
        "point_cloud": ("point_cloud.npy", io.BytesIO(npy_bytes(points_chunk))),
    }
    r = await client.post(URL_VOLUME, files=files, data=COMMON_DATA, timeout=180.0)
    r.raise_for_status()
    with np.load(io.BytesIO(r.content)) as out:
        return {k: out[k] for k in out.keys()}

async def send_surface_once(client, stl_bytes: bytes):
    # surface endpoint requires either point_cloud_size or point_cloud
    data = {**COMMON_DATA, "point_cloud_size": "1"}
    files = {"design_stl": (STL_PATH, io.BytesIO(stl_bytes))}
    r = await client.post(URL_SURFACE, files=files, data=data, timeout=180.0)
    r.raise_for_status()
    with np.load(io.BytesIO(r.content)) as out:
        return {k: out[k] for k in out.keys()}

async def main():
    stl_bytes = load_bytes(STL_PATH)
    points = np.load(POINTS_PATH)  # (N, 3)
    slices = [slice(i, min(i+CHUNK_SIZE, len(points))) for i in range(0, len(points), CHUNK_SIZE)]

    async with httpx.AsyncClient() as client:
        volume_tasks = [send_volume_chunk(client, stl_bytes, points[s]) for s in slices]
        # run surface alongside volume (optional); or await after
        surface_task = asyncio.create_task(send_surface_once(client, stl_bytes))
        volume_results = await asyncio.gather(*volume_tasks)
        surface_result = await surface_task

    # merge volume arrays
    merge_keys = ["coordinates","velocity","pressure","turbulent_kinetic_energy","turbulent_viscosity"]
    merged = {k: np.concatenate([r[k] for r in volume_results], axis=1) for k in merge_keys}
    merged["bounding_box_dims"] = volume_results[0]["bounding_box_dims"]

    # add surface arrays
    for k in ["surface_coordinates","pressure_surface","wall_shear_stress","drag_force","lift_force"]:
        merged[k] = surface_result[k]

    return merged

if __name__ == "__main__":
    out_dict = asyncio.run(main())
    print(out_dict.keys())
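To keep the merged result for post-processing, one option is to write it to a compressed NumPy archive, for example by appending the following inside the __main__ block above:

    # Save all merged volume and surface arrays into a single archive.
    np.savez_compressed("merged_output.npz", **out_dict)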