Guardrails
Guardrails help validate inputs and outputs of physics ML models to build confidence in predictions. PhysicsNeMo provides tools for both pre-inference (input validation) and post-inference (output quality assessment) workflows.
Note
Guardrails are an experimental feature. APIs and functionality may change in future releases without backward compatibility guarantees. Contributions are welcome to advance the guardrails.
Overview
| Type | Tool | Purpose |
|---|---|---|
| Pre-inference | Geometry Guardrail | Offline out-of-distribution (OOD) detection on 3D meshes using hand-crafted shape descriptors |
| In-model | Embedded OOD Guard | Runtime OOD detection inside the model, calibrated during training |
| Post-inference | PDE Residuals | Physics consistency via continuity and momentum |
| Post-inference | Model Ensemble Variance | Error quantification from multiple predictions |
Tip
Which guardrail should I use for OOD detection?
If you are using a model that ships with an embedded OOD guard (currently GeoTransolver) and were able to train or obtain a checkpoint with the guard enabled, prefer the Embedded OOD Guard: calibration is automatic during training, the check runs inside the model’s forward pass, and it uses the model’s own learned geometry latent, which is more sensitive than a fixed descriptor set.
If you are working with raw STL geometries (for example, screening a batch of inputs before inference), or your model does not have an embedded guard, use the Geometry Guardrail. It is model-agnostic and runs entirely offline from mesh files.
The two are complementary. Running the offline mesh check first and the embedded check at inference catches both gross geometry anomalies (wrong vehicle class, broken mesh) and subtler shape/parameter drift within the same domain.
Pre-inference Geometry Guardrail
The geometry guardrail detects out-of-distribution (OOD) 3D shapes before running inference. Models trained on a specific geometry distribution (e.g., automotive body shapes) may perform poorly on geometries that differ in position, orientation, scale, or shape. The guardrail learns the distribution of geometries from training data and flags unusual shapes at query time, so you can reject or investigate them before inference.
Location: physicsnemo.experimental.guardrails
Feature extraction
Each mesh is reduced to a 22-dimensional feature vector. Features are intentionally non-invariant to translation, rotation, and scale, so the guardrail can detect geometries that differ in absolute position, orientation, or size from the training distribution. The feature set includes centroid position, principal component axes and eigenvalues, bounding box extents, second moments of inertia, and total and projected surface areas.
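To make the idea of non-invariant descriptors concrete, here is a minimal NumPy sketch that computes a few of them from a vertex array. It is an illustration under stated assumptions, not the guardrail's implementation: the function name `simple_shape_features` is hypothetical, it returns 12 values rather than 22, and it uses raw vertex positions with no area weighting.

```python
import numpy as np


def simple_shape_features(vertices: np.ndarray) -> np.ndarray:
    """Hypothetical helper: centroid (3), PCA eigenvalues (3),
    leading PCA axis (3), and bounding-box extents (3)."""
    vertices = np.asarray(vertices, dtype=float)
    centroid = vertices.mean(axis=0)

    # Covariance of vertex positions about the centroid (rows = variables).
    cov = np.cov((vertices - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals = eigvals[order]
    leading_axis = eigvecs[:, order[0]]

    # Eigenvectors are defined up to sign; make the dominant component positive.
    if leading_axis[np.argmax(np.abs(leading_axis))] < 0:
        leading_axis = -leading_axis

    extents = vertices.max(axis=0) - vertices.min(axis=0)
    return np.concatenate([centroid, eigvals, leading_axis, extents])


# Corners of a 4 x 2 x 1 box, then the same box doubled in size and shifted in x.
box = np.array(
    [[x, y, z] for x in (0, 4) for y in (0, 2) for z in (0, 1)], dtype=float
)
moved = 2.0 * box + np.array([10.0, 0.0, 0.0])

f_box = simple_shape_features(box)
f_moved = simple_shape_features(moved)
```

Because none of these quantities is normalized, the shifted and scaled box yields a different centroid, different eigenvalues, and different extents, which is what lets a density model separate it from the original.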
Density modeling
The guardrail fits a probabilistic density model on the training feature set. Two methods are available:
GMM (default): Gaussian Mixture Model.
PCE: Polynomial Chaos Expansion with Hermite polynomials.
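As a rough illustration of density-based scoring, the sketch below fits a single Gaussian (equivalent to a one-component GMM) to a feature matrix and uses the negative log-likelihood as an anomaly score. The helper names, the ridge term, and the choice of negative log-likelihood as the score are assumptions made for this example; the guardrail's internal scoring may differ.

```python
import numpy as np


def fit_gaussian(features: np.ndarray, ridge: float = 1e-6):
    """Fit mean and covariance; the small ridge keeps the covariance invertible."""
    mean = features.mean(axis=0)
    cov = np.cov(features.T) + ridge * np.eye(features.shape[1])
    return mean, cov


def negative_log_likelihood(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Per-row Gaussian negative log-likelihood; larger means more anomalous."""
    diff = np.atleast_2d(x) - mean
    solved = np.linalg.solve(cov, diff.T).T           # cov^{-1} (x - mean)
    mahalanobis_sq = np.einsum("ij,ij->i", diff, solved)
    _, logdet = np.linalg.slogdet(cov)
    d = mean.shape[0]
    return 0.5 * (mahalanobis_sq + logdet + d * np.log(2.0 * np.pi))


# Synthetic stand-in for a training feature set (500 samples, 4 features).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
mean, cov = fit_gaussian(train)

typical_score = negative_log_likelihood(np.zeros(4), mean, cov)[0]
outlier_score = negative_log_likelihood(np.full(4, 8.0), mean, cov)[0]
```

A point near the bulk of the training data receives a low score, while a point far from it receives a much higher one.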
Classification
Anomaly scores are converted to empirical percentiles relative to the training
distribution. Configurable thresholds (warn_pct, reject_pct) define:
OK: Percentile < warn_pct — typical geometry, safe for inference
WARN: warn_pct ≤ percentile < reject_pct — unusual, investigate
REJECT: Percentile ≥ reject_pct — highly anomalous, likely OOD
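The thresholding rule above can be sketched in a few lines. The function name `classify` and the way the empirical percentile is computed (the share of training scores strictly below the query score) are assumptions for illustration; the threshold values are the ones used in the usage example.

```python
import numpy as np


def classify(score, train_scores, warn_pct=99.0, reject_pct=99.9):
    """Map an anomaly score to (status, percentile) relative to training scores."""
    train_scores = np.sort(np.asarray(train_scores, dtype=float))
    # Percentage of training scores strictly below the query score.
    below = np.searchsorted(train_scores, score, side="left")
    percentile = 100.0 * below / train_scores.size
    if percentile >= reject_pct:
        status = "REJECT"
    elif percentile >= warn_pct:
        status = "WARN"
    else:
        status = "OK"
    return status, percentile


# Stand-in training scores 0, 1, ..., 999.
train_scores = np.arange(1000)

typical = classify(500, train_scores)    # half of the training scores are lower
unusual = classify(995, train_scores)    # between warn_pct and reject_pct
extreme = classify(5000, train_scores)   # above every training score
```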
Usage
Fit from a directory of STL files (parallel processing), or from a list of mesh
objects. Query returns status and percentile per geometry. Save and load fitted
guardrails via .npz; schema compatibility is checked on load.
from pathlib import Path

from physicsnemo.experimental.guardrails import GeometryGuardrail

guardrail = GeometryGuardrail(
    method="gmm",  # or "pce"
    gmm_components=1,
    warn_pct=99.0,
    reject_pct=99.9,
    device="cuda",
)

# Fit on training geometries, then score unseen ones
guardrail.fit_from_dir(Path("data/train_stl"), n_workers=8)
results = guardrail.query_from_dir(Path("data/test_stl"))
for r in results:
    print(f"{r['name']}: {r['status']} (p={r['percentile']:.1f}%)")

# Save for reuse
guardrail.save(Path("guardrail.npz"))
Example: See examples/minimal/guardrails/ for a full workflow using
DrivAerML and AhmedML datasets.
In-model Embedded OOD Guard
The embedded OOD guard is a lightweight detector that lives inside a model’s forward pass. It calibrates passively during training and emits warnings at inference when inputs drift outside the training distribution. Unlike the offline geometry guardrail, the embedded guard sees the model’s own learned representation of the input rather than hand-crafted descriptors, so it can catch parameter and shape drift that survives gross-geometry screening.
Location: physicsnemo.experimental.guardrails.embedded.OODGuard
What it watches
The guard maintains two complementary checks:
Global parameters: a per-channel bounding box on the model’s global embedding (e.g. inlet velocity, density, thickness scale). During training the guard tracks the running min/max per channel; at inference it warns if any channel of a query falls outside the observed range.
Geometry latent (kNN): \(k\)-nearest-neighbour distance in a fixed-dimensional pooled geometry latent. During training a first-in, first-out (FIFO) ring buffer accumulates pooled latents; the OOD threshold is set lazily on the first inference call as a multiple of the 99th-percentile leave-one-out kNN distance. At inference, queries whose mean kNN distance exceeds the threshold trigger a warning. The kNN search dispatches to physicsnemo.nn.functional.knn (cuML on GPU, SciPy KDTree on CPU).
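The two checks can be sketched with plain NumPy. This is not the OODGuard implementation: the class and function names are hypothetical, the kNN search is brute force rather than cuML or a KD-tree, and using the mean of the k neighbour distances for calibration as well as for queries is an assumption.

```python
import numpy as np


class RunningBounds:
    """Hypothetical per-channel min/max tracker for the global-parameter check."""

    def __init__(self, n_channels: int):
        self.lo = np.full(n_channels, np.inf)
        self.hi = np.full(n_channels, -np.inf)

    def collect(self, batch: np.ndarray) -> None:
        # Widen the observed range with every training batch.
        self.lo = np.minimum(self.lo, batch.min(axis=0))
        self.hi = np.maximum(self.hi, batch.max(axis=0))

    def out_of_range(self, query: np.ndarray) -> np.ndarray:
        """Boolean mask of channels that fall outside the observed range."""
        return (query < self.lo) | (query > self.hi)


def mean_knn_distance(queries, bank, k, exclude_self=False):
    """Mean distance from each query to its k nearest rows of `bank`."""
    dists = np.linalg.norm(queries[:, None, :] - bank[None, :, :], axis=-1)
    dists.sort(axis=1)
    start = 1 if exclude_self else 0  # skip the zero self-distance (leave-one-out)
    return dists[:, start:start + k].mean(axis=1)


def calibrate_threshold(bank, k=10, sensitivity=1.5):
    """Threshold = sensitivity x 99th percentile of leave-one-out kNN distances."""
    loo = mean_knn_distance(bank, bank, k, exclude_self=True)
    return sensitivity * np.percentile(loo, 99)


# Synthetic stand-in for a buffer of pooled geometry latents.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 8))
threshold = calibrate_threshold(latents)

near_flag = mean_knn_distance(np.zeros((1, 8)), latents, k=10)[0] > threshold
far_flag = mean_knn_distance(np.full((1, 8), 10.0), latents, k=10)[0] > threshold

# Two training samples of a 2-channel global embedding, then one query.
bounds = RunningBounds(n_channels=2)
bounds.collect(np.array([[30.0, 1.0], [40.0, 1.2]]))
channel_mask = bounds.out_of_range(np.array([55.0, 1.1]))
```

In this sketch the query at the centre of the latent cloud stays below the threshold, the distant query exceeds it, and only the first global channel is reported as out of range.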
Supported models
Currently GeoTransolver (in
physicsnemo.experimental.models.geotransolver). Other models can adopt
the guard by instantiating an OODGuard submodule and calling its
collect() / check() methods during the forward pass.
Usage
Enable the guard through the model config. buffer_size is required;
knn_k (default 10) and sensitivity (default 1.5, a
multiplier on the threshold — higher is less sensitive) are optional.
from physicsnemo.experimental.models.geotransolver import GeoTransolver

model = GeoTransolver(
    functional_dim=64,
    out_dim=3,
    geometry_dim=3,
    global_dim=2,
    guard_config={
        "buffer_size": 500,  # FIFO buffer; set to ~training-set size
        "knn_k": 10,
        "sensitivity": 1.5,
    },
)
Or via Hydra in a YAML config:
model:
  guard_config:
    buffer_size: 500
    knn_k: 10
    sensitivity: 1.5
No changes to train.py or inference.py are required: during
training the guard silently updates its bounds and FIFO buffer; during
inference it emits Python warnings of the form
OOD Guard: geometry sample ... above threshold ... or
OOD Guard: global_embedding dim ... above training max ... without
halting inference.
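Because the guard reports through Python warnings rather than return values, a driver script may want to record which samples triggered one. The sketch below uses only the standard `warnings` module; `run_inference_stub` is a hypothetical stand-in for a model call, the message text is illustrative, and matching on the `OOD Guard:` prefix is an assumption, since the warning category is not stated here.

```python
import warnings


def run_inference_stub(sample_id: int) -> float:
    """Hypothetical stand-in for a model call; warns for one sample only."""
    if sample_id == 2:
        warnings.warn("OOD Guard: geometry sample 0 above threshold (stub message)")
    return 0.0


flagged = []
for sample_id in range(4):
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of letting the default filter deduplicate.
        warnings.simplefilter("always")
        run_inference_stub(sample_id)
    if any(str(w.message).startswith("OOD Guard:") for w in caught):
        flagged.append(sample_id)

print(f"Samples flagged as OOD: {flagged}")
```

Replacing `simplefilter("always")` with `simplefilter("error")` would turn the first warning into an exception, which can be useful when OOD inputs should stop a batch job.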
Configuration tips
buffer_size: set to at least the training-set size so calibration sees every sample. Under Distributed Data Parallel (DDP) each rank keeps its own buffer; the distributed sampler shuffles, so each rank’s FIFO covers most of the dataset after a few epochs.
knn_k: smaller k is more sensitive to isolated training-set outliers; larger k is smoother but can blur multi-modal cluster boundaries. The default of 10 works for buffer sizes from about 100 to a few thousand.
sensitivity: raise toward 2.0–3.0 if validation data trips warnings; lower toward 1.0 if known-OOD inputs slip through.
Checkpoint compatibility
The guard adds extra buffers under the ood_guard submodule
(global_min, global_max, geo_embeddings, geo_ptr,
geo_full, knn_threshold). Loading a non-guard checkpoint into a
guard-enabled model leaves these buffers at their initial values until
training repopulates them. Loading a guard-enabled checkpoint into a
non-guard model (guard_config=None) requires strict=False to
ignore the extra keys.
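The strict=False case follows standard PyTorch behaviour and can be reproduced with toy modules. `TinyGuard` and `TinyModel` are hypothetical stand-ins used only to show how extra buffers under an `ood_guard` submodule appear as unexpected keys; they are not the GeoTransolver classes.

```python
import torch
from torch import nn


class TinyGuard(nn.Module):
    """Hypothetical guard holding a single registered buffer."""

    def __init__(self):
        super().__init__()
        self.register_buffer("global_min", torch.zeros(2))


class TinyModel(nn.Module):
    """Hypothetical model that optionally owns an `ood_guard` submodule."""

    def __init__(self, with_guard: bool):
        super().__init__()
        self.linear = nn.Linear(2, 1)
        if with_guard:
            self.ood_guard = TinyGuard()


guarded_state = TinyModel(with_guard=True).state_dict()

# strict=False skips keys the target model does not have and reports them.
plain = TinyModel(with_guard=False)
result = plain.load_state_dict(guarded_state, strict=False)

# The default strict=True raises on the same state dict.
try:
    TinyModel(with_guard=False).load_state_dict(guarded_state)
    strict_failed = False
except RuntimeError:
    strict_failed = True
```

Inspecting `result.unexpected_keys` after a non-strict load is a quick way to confirm that only guard buffers were skipped.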
Example: See examples/structural_mechanics/crash/ for end-to-end
configuration and OOD inference drivers.
Post-inference PDE Residuals
PDE residuals measure how well model predictions satisfy the governing equations (e.g., the continuity and momentum equations of the Navier-Stokes system). Predictions that violate these equations are less trustworthy; high residuals often indicate model uncertainty.
PhysicsNeMo-CFD
offers a sample workflow for quantifying PDE residuals for
continuity and momentum. This sample can serve as a template for other use cases
or PDEs. In that workflow, the functions compute_continuity_residuals and
compute_momentum_residuals in physicsnemo.cfd.bench.metrics.physics
compute mass conservation (divergence of velocity) and RANS momentum balance,
respectively. The typical steps are to run inference, interpolate predictions
onto a volume mesh, then compute residuals. High residuals in wake or high-shear
regions often indicate model uncertainty. See the volume_benchmarking
notebook in the PhysicsNeMo-CFD benchmarking workflow.
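For intuition about what a continuity residual is, the sketch below evaluates the divergence of a velocity field on a uniform structured grid with finite differences. It does not call or reproduce compute_continuity_residuals; the function name `continuity_residual`, the uniform grid, and the incompressible-flow form of the residual are assumptions made for this example.

```python
import numpy as np


def continuity_residual(u, v, w, dx, dy, dz):
    """Divergence of velocity, du/dx + dv/dy + dw/dz, on a grid indexed [x, y, z]."""
    du_dx = np.gradient(u, dx, axis=0)
    dv_dy = np.gradient(v, dy, axis=1)
    dw_dz = np.gradient(w, dz, axis=2)
    return du_dx + dv_dy + dw_dz


# Uniform 16^3 grid on the unit cube.
axis = np.linspace(0.0, 1.0, 16)
h = axis[1] - axis[0]
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")

# Divergence-free field: u = x, v = -y, w = 0.
clean = continuity_residual(x, -y, np.zeros_like(z), h, h, h)

# Field that violates mass conservation: u = x, v = y, w = z (divergence 3).
leaky = continuity_residual(x, y, z, h, h, h)

print(f"max |residual|, clean field: {np.abs(clean).max():.2e}")
print(f"max |residual|, leaky field: {np.abs(leaky).max():.2e}")
```

On real predictions the residual field would be inspected spatially, since the regions where it is largest are the ones to distrust.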
Post-inference Model Ensemble Variance
PhysicsNeMo-CFD offers a sample workflow for quantifying prediction uncertainty via ensemble variance. This sample can serve as a template for other use cases. In that workflow, prediction variance across multiple realizations provides an uncertainty proxy: run several inferences and use the standard deviation of predictions as an error indicator. Two common ensemble variants are demonstrated:
Input (mesh) sensitivity: create variants of the same geometry via decimation, subdivision, or remeshing and run inference on each.
Checkpoint sensitivity: use multiple model checkpoints and run inference with each.
For each input, collect predictions from N realizations, compute the mean and standard deviation at each point, and visualize the standard deviation to identify regions of high uncertainty (e.g., front, wheels, mirrors in automotive aero). See the benchmarking_in_absence_of_gt notebook in the PhysicsNeMo-CFD benchmarking workflow.
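The per-point statistics can be sketched with NumPy. The array below is made-up stand-in data for three realizations at three points, not output from any model, and the variable names are illustrative.

```python
import numpy as np

# Stand-in predictions: N = 3 realizations of a scalar field at P = 3 points,
# stacked with shape (N, P). In practice each row would come from one mesh
# variant or one checkpoint.
predictions = np.array(
    [
        [1.0, 2.0, 3.0],
        [1.0, 2.2, 5.0],
        [1.0, 1.8, 1.0],
    ]
)

mean = predictions.mean(axis=0)  # ensemble mean at each point
std = predictions.std(axis=0)    # population standard deviation (ddof=0)

# Index of the point where the realizations disagree most.
most_uncertain = int(np.argmax(std))
```

Points where all realizations agree have zero standard deviation, while the point with the widest spread is the one to inspect first.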