Guardrails

Guardrails help validate inputs and outputs of physics ML models to build confidence in predictions. PhysicsNeMo provides tools for both pre-inference (input validation) and post-inference (output quality assessment) workflows.

Note

Guardrails are an experimental feature. APIs and functionality may change in future releases without backward compatibility guarantees. Contributions that advance the guardrails are welcome.

Overview

Type	Tool	Purpose
Pre-inference	Geometry Guardrail	Offline out-of-distribution (OOD) detection on 3D meshes using hand-crafted shape descriptors
In-model	Embedded OOD Guard	Runtime OOD detection inside the model, calibrated during training
Post-inference	PDE Residuals	Physics consistency via continuity and momentum
Post-inference	Model Ensemble Variance	Error quantification from multiple predictions

Tip

Which guardrail should I use for OOD detection?

If you are using a model that ships with an embedded OOD guard (currently GeoTransolver) and were able to train or obtain a checkpoint with the guard enabled, prefer the Embedded OOD Guard — calibration is automatic during training, the check runs inside the model’s forward pass, and it uses the model’s own learned geometry latent (more sensitive than a fixed descriptor set).
If you are working with raw STL geometries — e.g. screening a batch of inputs before inference, or your model does not have an embedded guard — use the Geometry Guardrail. It is model-agnostic and runs entirely offline from mesh files.
The two are complementary. Running the offline mesh check first and the embedded check at inference catches both gross geometry anomalies (wrong vehicle class, broken mesh) and subtler shape/parameter drift within the same domain.

Pre-inference Geometry Guardrail

The geometry guardrail detects out-of-distribution (OOD) 3D shapes before running inference. Models trained on a specific geometry distribution (e.g., automotive body shapes) may perform poorly on geometries that differ in position, orientation, scale, or shape. The guardrail learns the distribution of geometries from training data and flags unusual shapes at query time, so you can reject or investigate them before inference.

Location: physicsnemo.experimental.guardrails

Feature extraction

Each mesh is reduced to a 22-dimensional feature vector. Features are intentionally non-invariant to translation, rotation, and scale, so the guardrail can detect geometries that differ in absolute position, orientation, or size from the training distribution. The feature set includes centroid position, principal component axes and eigenvalues, bounding box extents, second moments of inertia, and total and projected surface areas.
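To illustrate why non-invariant descriptors are useful, the sketch below computes a few of the listed quantities (centroid, principal axes and eigenvalues, bounding box extents) from a raw vertex array with NumPy. It is not the library's 22-dimensional extractor: the function name `shape_descriptors` is hypothetical, and the statistics are taken over vertices only, without the area weighting a real mesh descriptor would likely use.

```python
import numpy as np

def shape_descriptors(vertices: np.ndarray) -> dict:
    """Hypothetical helper: a few non-invariant descriptors of an (N, 3) vertex array."""
    centroid = vertices.mean(axis=0)
    centered = vertices - centroid
    # Covariance of vertex positions; its eigenvectors are the principal axes.
    cov = centered.T @ centered / len(vertices)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    return {
        "centroid": centroid,
        "pca_eigenvalues": eigvals[order],
        "pca_axes": eigvecs[:, order].T,        # one axis per row
        "bbox_extents": vertices.max(axis=0) - vertices.min(axis=0),
    }

# Corners of a unit cube, then a translated copy and a scaled copy.
cube = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)])
base = shape_descriptors(cube)
shifted = shape_descriptors(cube + np.array([5.0, 0.0, 0.0]))
scaled = shape_descriptors(2.0 * cube)

print("centroid:", base["centroid"], "->", shifted["centroid"])
print("bbox extents:", base["bbox_extents"], "->", scaled["bbox_extents"])
```

Translating the cube changes the centroid but not the extents, and scaling changes the extents and eigenvalues. A density model fitted on such features can therefore respond to shifts in position and size, not only to changes in shape.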

Density modeling

The guardrail fits a probabilistic density model on the training feature set. Two methods are available:

GMM (default): Gaussian Mixture Model.
PCE: Polynomial Chaos Expansion with Hermite polynomials.
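As a rough illustration of density-based scoring, consider the simplest GMM case. With a single full-covariance component, the model is one Gaussian, and its negative log-likelihood grows monotonically with the squared Mahalanobis distance. The NumPy sketch below uses that distance as an anomaly score. The helper names and the choice of score are assumptions made for illustration, not the guardrail's internal implementation.

```python
import numpy as np

def fit_gaussian(features: np.ndarray):
    """Fit a single Gaussian; returns the mean and the inverse covariance."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    # A small ridge keeps the covariance invertible for near-degenerate features.
    cov = cov + 1e-6 * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def anomaly_score(x: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance; larger means lower density under the Gaussian."""
    d = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 4))   # synthetic stand-in for mesh features
mean, cov_inv = fit_gaussian(train_features)

typical_score = anomaly_score(np.zeros(4), mean, cov_inv)[0]
far_score = anomaly_score(np.full(4, 8.0), mean, cov_inv)[0]
print(f"typical: {typical_score:.2f}, far from training data: {far_score:.2f}")
```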

Classification

Anomaly scores are converted to empirical percentiles relative to the training distribution. Configurable thresholds (warn_pct, reject_pct) define:

OK: percentile < warn_pct (typical geometry, safe for inference)
WARN: warn_pct ≤ percentile < reject_pct (unusual, investigate)
REJECT: percentile ≥ reject_pct (highly anomalous, likely OOD)
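The thresholding rule above can be sketched in a few lines. The helpers `empirical_percentile` and `classify` are hypothetical names used only for this illustration, and the tie-handling convention (counting training scores less than or equal to the query) is an assumption. Only the three status bands and the `warn_pct` / `reject_pct` parameters come from the description above.

```python
import numpy as np

def empirical_percentile(score: float, train_scores: np.ndarray) -> float:
    """Percentage of training anomaly scores that are <= the query score."""
    train_sorted = np.sort(train_scores)
    rank = np.searchsorted(train_sorted, score, side="right")
    return 100.0 * rank / len(train_sorted)

def classify(percentile: float, warn_pct: float = 99.0, reject_pct: float = 99.9) -> str:
    """Map a percentile to OK / WARN / REJECT using the two thresholds."""
    if percentile >= reject_pct:
        return "REJECT"
    if percentile >= warn_pct:
        return "WARN"
    return "OK"

# Synthetic training scores 0, 1, ..., 999 make the percentiles easy to follow.
train_scores = np.arange(1000, dtype=float)
for query_score in (500.0, 990.0, 5000.0):
    pct = empirical_percentile(query_score, train_scores)
    print(f"score={query_score:.0f} percentile={pct:.1f} status={classify(pct)}")
```

Because the percentile is measured against the training scores themselves, roughly `100 - warn_pct` percent of training-like geometries are expected to land in WARN or REJECT. This is one reason the default thresholds sit so close to 100.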

Usage

Fit the guardrail from a directory of STL files (processed in parallel) or from a list of mesh objects. A query returns a status and a percentile for each geometry. Fitted guardrails can be saved to and loaded from .npz files; schema compatibility is checked on load.

from pathlib import Path
from physicsnemo.experimental.guardrails import GeometryGuardrail

guardrail = GeometryGuardrail(
    method="gmm",        # or "pce"
    gmm_components=1,
    warn_pct=99.0,
    reject_pct=99.9,
    device="cuda",
)
guardrail.fit_from_dir(Path("data/train_stl"), n_workers=8)
results = guardrail.query_from_dir(Path("data/test_stl"))

for r in results:
    print(f"{r['name']}: {r['status']} (p={r['percentile']:.1f}%)")

# Save for reuse
guardrail.save(Path("guardrail.npz"))

Example: See examples/minimal/guardrails/ for a full workflow using DrivAerML and AhmedML datasets.

In-model Embedded OOD Guard

The embedded OOD guard is a lightweight detector that lives inside a model’s forward pass. It calibrates passively during training and emits warnings at inference when inputs drift outside the training distribution. Unlike the offline geometry guardrail, the embedded guard sees the model’s own learned representation of the input rather than hand-crafted descriptors, so it can catch parameter and shape drift that survives gross-geometry screening.

Location: physicsnemo.experimental.guardrails.embedded.OODGuard

What it watches

The guard maintains two complementary checks:

Global parameters — per-channel bounding box on the model’s global embedding (e.g. inlet velocity, density, thickness scale). During training the guard tracks running min/max per channel; at inference it warns if any channel of a query falls outside the observed range.
Geometry latent (kNN) — \(k\)-nearest-neighbour distance in a fixed-dimensional pooled geometry latent. During training a first in, first out (FIFO) ring buffer accumulates pooled latents; the OOD threshold is set lazily on the first inference call as a multiple of the 99th-percentile leave-one-out kNN distance. At inference, queries whose mean kNN distance exceeds the threshold trigger a warning. The kNN search dispatches to physicsnemo.nn.functional.knn (cuML on GPU, SciPy KDTree on CPU).
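The two checks can be sketched with plain NumPy. This is a simplified illustration, not the OODGuard implementation: the function names are hypothetical, the kNN search is brute force rather than cuML or a KD-tree, and using the mean over the k neighbours during calibration is an assumption.

```python
import numpy as np

def mean_knn_distance(queries, reference, k, leave_one_out=False):
    """Mean Euclidean distance from each query to its k nearest reference points."""
    # Brute-force pairwise distances, shape (n_queries, n_reference).
    dists = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    dists.sort(axis=1)
    # In leave-one-out mode the queries are the reference points themselves,
    # so column 0 holds the zero self-distance and is skipped.
    start = 1 if leave_one_out else 0
    return dists[:, start:start + k].mean(axis=1)

def calibrate_threshold(buffer, k=10, sensitivity=1.5):
    """Threshold = sensitivity x 99th percentile of leave-one-out kNN distances."""
    loo = mean_knn_distance(buffer, buffer, k, leave_one_out=True)
    return sensitivity * np.percentile(loo, 99)

def channels_out_of_range(query, train_min, train_max):
    """Boolean mask of global-parameter channels outside the observed range."""
    return (query < train_min) | (query > train_max)

rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 8))          # stand-in for buffered geometry latents
threshold = calibrate_threshold(latents)
far_query = np.full((1, 8), 10.0)            # a latent far from the training cloud
far_distance = mean_knn_distance(far_query, latents, k=10)[0]
print("geometry OOD:", far_distance > threshold)

# Two hypothetical global channels, e.g. inlet velocity and density.
train_globals = np.array([[30.0, 1.20], [40.0, 1.10], [35.0, 1.25]])
flags = channels_out_of_range(
    np.array([50.0, 1.20]), train_globals.min(axis=0), train_globals.max(axis=0)
)
print("global channels out of range:", flags)
```

Here the first global channel (50.0) exceeds the observed maximum of 40.0 and is flagged, while the second stays inside its range.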

Supported models

Currently GeoTransolver (in physicsnemo.experimental.models.geotransolver). Other models can adopt the guard by instantiating an OODGuard submodule and calling its collect() / check() methods during the forward pass.

Usage

Enable the guard through the model config. buffer_size is required; knn_k (default 10) and sensitivity (default 1.5, a multiplier on the threshold — higher is less sensitive) are optional.

from physicsnemo.experimental.models.geotransolver import GeoTransolver

model = GeoTransolver(
    functional_dim=64,
    out_dim=3,
    geometry_dim=3,
    global_dim=2,
    guard_config={
        "buffer_size": 500,   # FIFO buffer; set to at least the training-set size
        "knn_k": 10,
        "sensitivity": 1.5,
    },
)

Or via Hydra in a YAML config:

model:
  guard_config:
    buffer_size: 500
    knn_k: 10
    sensitivity: 1.5

No changes to train.py or inference.py are required: during training the guard silently updates its bounds and FIFO buffer; during inference it emits Python warnings of the form OOD Guard: geometry sample ... above threshold ... or OOD Guard: global_embedding dim ... above training max ... without halting inference.

Configuration tips

buffer_size: set to at least the training-set size so that calibration sees every sample. Under Distributed Data Parallel (DDP), each rank keeps its own buffer; because the distributed sampler reshuffles, each rank’s FIFO covers most of the dataset after a few epochs.
knn_k: a smaller k is more sensitive to isolated training-set outliers; a larger k is smoother but can blur the boundaries between clusters in multi-modal data. The default of 10 works for buffer sizes from about 100 to a few thousand.
sensitivity: raise toward 2.0–3.0 if validation data triggers warnings; lower toward 1.0 if known-OOD inputs slip through.

Checkpoint compatibility

The guard adds extra buffers under the ood_guard submodule (global_min, global_max, geo_embeddings, geo_ptr, geo_full, knn_threshold). Loading a non-guard checkpoint into a guard-enabled model leaves these buffers at their initial values until training repopulates them. Loading a guard-enabled checkpoint into a non-guard model (guard_config=None) requires strict=False to ignore the extra keys.
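Note that strict=False in PyTorch's load_state_dict ignores every missing or unexpected key, not only the guard buffers. A stricter alternative is to drop the guard entries explicitly and then load with the default strict=True. The sketch below shows the filtering step on a plain dictionary. The helper name `strip_guard_keys` is hypothetical, and it assumes the guard buffers appear under a key segment named `ood_guard`, as described above.

```python
def strip_guard_keys(state_dict: dict, submodule: str = "ood_guard") -> dict:
    """Return a copy of state_dict without entries that belong to the guard submodule."""
    return {
        key: value
        for key, value in state_dict.items()
        if submodule not in key.split(".")   # match whole dotted segments only
    }

# Toy state dict with placeholder values; real checkpoints hold tensors.
checkpoint = {
    "encoder.weight": [0.1, 0.2],
    "ood_guard.global_min": [0.0],
    "ood_guard.global_max": [1.0],
    "ood_guard.knn_threshold": [2.5],
}
filtered = strip_guard_keys(checkpoint)
print(sorted(filtered))

# The filtered dict could then be passed to model.load_state_dict(filtered),
# so that any other key mismatch still raises an error.
```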

Example: See examples/structural_mechanics/crash/ for end-to-end configuration and OOD inference drivers.

Post-inference PDE Residuals

PDE residuals measure how well model predictions satisfy the governing equations, for example the continuity and momentum equations of the Navier-Stokes system. Predictions that violate these equations are less trustworthy, and high residuals often indicate model uncertainty.

PhysicsNeMo-CFD offers a sample workflow for quantifying continuity and momentum residuals, which can serve as a template for other use cases or PDEs. In that workflow, the functions compute_continuity_residuals and compute_momentum_residuals in physicsnemo.cfd.bench.metrics.physics compute mass conservation (the divergence of velocity) and the RANS momentum balance, respectively. The typical steps are: run inference, interpolate the predictions onto a volume mesh, then compute the residuals. High residuals in wake or high-shear regions often indicate model uncertainty. See the volume_benchmarking notebook in the PhysicsNeMo-CFD benchmarking workflow.
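To make the continuity check concrete, the sketch below evaluates the divergence of a velocity field on a uniform Cartesian grid with NumPy finite differences. It assumes incompressible flow and a structured grid. It is a standalone illustration, not the compute_continuity_residuals function, which operates on volume meshes; the name `continuity_residual` is hypothetical.

```python
import numpy as np

def continuity_residual(u, v, w, dx, dy, dz):
    """Divergence of velocity, du/dx + dv/dy + dw/dz, for arrays indexed [x, y, z]."""
    du_dx = np.gradient(u, dx, axis=0)
    dv_dy = np.gradient(v, dy, axis=1)
    dw_dz = np.gradient(w, dz, axis=2)
    return du_dx + dv_dy + dw_dz

n = 16
axis = np.linspace(0.0, 1.0, n)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
h = axis[1] - axis[0]

# Solid-body rotation about z is divergence-free, so the residual vanishes.
res_rotation = continuity_residual(-y, x, np.zeros_like(x), h, h, h)
# Radial expansion (u, v, w) = (x, y, z) has divergence 3 everywhere.
res_expansion = continuity_residual(x, y, z, h, h, h)

print("max |residual|, rotation :", np.abs(res_rotation).max())
print("mean residual, expansion:", res_expansion.mean())
```

For a predicted flow field, a map of the absolute residual can be inspected in the same way as an error map, with large values marking regions where mass conservation is violated.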

Post-inference Model Ensemble Variance

PhysicsNeMo-CFD offers a sample workflow for quantifying prediction uncertainty via ensemble variance, which can serve as a template for other use cases. In that workflow, the variance of predictions across multiple realizations provides an uncertainty proxy: run several inferences and use the standard deviation of the predictions as an error indicator. Two common ensemble variants are demonstrated. Input (mesh) sensitivity creates variants of the same geometry via decimation, subdivision, or remeshing and runs inference on each. Checkpoint sensitivity runs inference with each of several model checkpoints. For each input, collect the predictions from N realizations, compute the mean and standard deviation at each point, and visualize the standard deviation to identify regions of high uncertainty (e.g., front, wheels, and mirrors in automotive aerodynamics). See the benchmarking_in_absence_of_gt notebook in the PhysicsNeMo-CFD benchmarking workflow.
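The aggregation step can be sketched as follows, assuming all N realizations have already been evaluated at (or interpolated onto) the same set of points. The function name `ensemble_stats` and the toy values are illustrative only and are not taken from the notebook.

```python
import numpy as np

def ensemble_stats(predictions):
    """Point-wise mean and sample standard deviation over N realizations."""
    stacked = np.stack(predictions, axis=0)          # shape (N, n_points, ...)
    return stacked.mean(axis=0), stacked.std(axis=0, ddof=1)

# Three toy realizations of a scalar field at three points.
predictions = [
    np.array([1.0, 2.0, 3.0]),
    np.array([1.0, 2.5, 5.0]),
    np.array([1.0, 1.5, 4.0]),
]
mean, std = ensemble_stats(predictions)
most_uncertain = int(np.argmax(std))

print("mean:", mean)
print("std :", std)
print("most uncertain point index:", most_uncertain)
```

The realizations agree exactly at the first point and disagree most at the last one, so the last point has the largest standard deviation. On a real mesh, the same array can be attached as a point field and visualized.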