DoMINO: Decomposable Multi-scale Iterative Neural Operator for External Aerodynamics
DoMINO is a local, multi-scale, point-cloud based model architecture to model large-scale physics problems such as external aerodynamics. The DoMINO model architecture takes STL geometries as input and evaluates flow quantities such as pressure and wall shear stress on the surface of the car as well as velocity fields and pressure in the volume around it. The DoMINO architecture is designed to be a fast, accurate and scalable surrogate model for large-scale industrial simulations.
DoMINO uses local geometric information to predict solutions on discrete points. First, a global geometry encoding is learnt from point clouds using a multi-scale, iterative approach. The geometry representation takes into account both the short- and long-range dependencies that are typically encountered in elliptic PDEs. Additional information, such as the signed distance field (SDF) and positional encoding, is used to enrich the global encoding. Next, discrete points are randomly sampled, a sub-region is constructed around each point, and the local geometry encoding is extracted in this region from the global encoding. The local geometry information is learnt using dynamic point convolution kernels. Finally, a computational stencil is constructed dynamically around each discrete point by sampling random neighboring points within the same sub-region. The local geometry encoding and the computational stencil are aggregated to predict the solutions on the discrete points.
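The stencil construction described above can be illustrated with a small NumPy sketch. This is not DoMINO's implementation: the function name `sample_stencil`, the spherical sub-region, and the brute-force distance search are assumptions chosen to keep the example short.

```python
import numpy as np


def sample_stencil(points, centers, radius, num_neighbors, rng):
    """For each center, draw `num_neighbors` random points lying within
    `radius` of it (sampling with replacement if too few are available)."""
    stencils = np.empty((len(centers), num_neighbors, points.shape[1]))
    for i, center in enumerate(centers):
        # Points inside the (assumed spherical) sub-region around this center.
        dist = np.linalg.norm(points - center, axis=1)
        candidates = np.flatnonzero(dist <= radius)
        if candidates.size == 0:
            # No neighbor in range: fall back to the single nearest point.
            candidates = np.array([np.argmin(dist)])
        replace = candidates.size < num_neighbors
        chosen = rng.choice(candidates, size=num_neighbors, replace=replace)
        stencils[i] = points[chosen]
    return stencils


rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2000, 3))  # stand-in point cloud
queries = cloud[rng.choice(len(cloud), size=8, replace=False)]
stencil = sample_stencil(cloud, queries, radius=0.3, num_neighbors=5, rng=rng)
print(stencil.shape)  # (8, 5, 3)
```

Each query point ends up with its own small set of neighbors, which is the kind of per-point stencil the model aggregates with the local geometry encoding.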
A preprint describing additional details about the model architecture can be found in the accompanying paper.
Install the required dependencies by running the command below:
pip install -r requirements.txt
Configuration basics
DoMINO data processing, training and testing are managed through YAML configuration files powered by Hydra. The base configuration file, config.yaml, is located in the src/conf directory.
To select a specific configuration, use the --config-name option when running the scripts. You can modify configuration options in two ways:
Direct Editing: Modify the YAML files directly
Command Line Override: Use Hydra’s ++ syntax to override settings at runtime
For example, to change the training epochs (controlled by train.epochs):
python train.py ++train.epochs=200 # Sets number of epochs to 200
This modular configuration system allows for flexible experimentation while maintaining reproducibility.
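To make the effect of a `++` override concrete, here is a minimal pure-Python sketch of what such an override does to a nested configuration. It only illustrates the "override the key, or add it if missing" behaviour; it is not Hydra's implementation, and the helper `apply_override` and the sample values are hypothetical.

```python
from copy import deepcopy


def apply_override(config, override):
    """Return a copy of `config` with one `++a.b.c=value` override applied."""
    dotted_key, raw_value = override.removeprefix("++").split("=", 1)
    # Interpret integer literals as numbers; keep everything else as text.
    value = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # `++` adds missing keys as well as overriding existing ones.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


base = {"project": {"name": "demo"}, "train": {"epochs": 100}}
merged = apply_override(base, "++train.epochs=200")
print(merged["train"]["epochs"])  # 200
print(base["train"]["epochs"])    # 100 (the base config is untouched)
```

Because the YAML file on disk is not modified, a run can be reproduced from the base file plus the recorded command-line overrides.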
Project logs
Save and track project logs, experiments, TensorBoard files etc. by specifying a project directory with project.name. Tag experiments with expt.
Data processing
The first step in running the DoMINO pipeline is processing the raw data (vtp, vtu and stl). The related configs can be set in the data_processor tab. Also, specify the variable names used in the raw dataset and their types in variables.surface and variables.volume. For example, you can set the input directory for raw data using data_processor.input_dir and the output directory for processed files using data_processor.output_dir.
Training
Specify the training and validation data paths, bounding box sizes etc. in the data tab and the training configs such as epochs, batch size etc. in the train tab.
Testing
Testing is carried out directly on the raw files. Specify the testing configs in the test tab.
Dataset details
In this example, the DoMINO model is trained using the DrivAerML dataset from the CAE ML Dataset collection. This high-fidelity, open-source (CC-BY-SA) public dataset is specifically designed for automotive aerodynamics research. It comprises 500 parametrically morphed variants of the widely utilized DrivAer notchback generic vehicle. Mesh generation and scale-resolving computational fluid dynamics (CFD) simulations were executed using consistent and validated automatic workflows that represent the industrial state-of-the-art. Geometries and comprehensive aerodynamic data are published in open-source formats. For more technical details about this dataset, please refer to the dataset paper.
Download the DrivAerML dataset using the provided download_aws_dataset.sh script or from the Hugging Face repo.
Data processing for DoMINO model
Each of the raw simulation files in the vtp, vtu and stl formats needs to be processed and saved into npy files. The data processing script extracts minimal information from these raw files, such as the STL mesh, surface mesh and fields, and volume point cloud and fields. Run process_data.py with the correct configurations to kick off data processing. Additionally, run cache_data.py to save the outputs of the DoMINO datapipe in .npy files. The DoMINO datapipe is set up to calculate Signed Distance Field and Nearest Neighbor interpolations on-the-fly during training. Caching saves these as a preprocessing step and should be used in cases where the STL surface meshes are upwards of 30 million cells. Data processing is parallelized and takes a couple of hours to write all the processed files.
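As a rough illustration of the storage format, the sketch below writes one processed sample to a .npy file and reads it back. It assumes each sample is stored as a dictionary of arrays; the key names are hypothetical, and the real ones are defined by the processing script.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical field names; the real keys are defined by the processing script.
sample = {
    "stl_coordinates": np.zeros((4, 3), dtype=np.float32),
    "surface_fields": np.ones((10, 4), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run_1.npy"
    # np.save pickles non-array objects such as dictionaries.
    np.save(path, sample)
    # allow_pickle=True is required to read the dictionary back;
    # .item() unwraps the 0-d object array that NumPy returns.
    loaded = np.load(path, allow_pickle=True).item()

print(sorted(loaded))  # ['stl_coordinates', 'surface_fields']
```

Only load pickled .npy files from sources you trust, since unpickling can execute arbitrary code.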
The final processed dataset should be divided and saved into 2 directories, for training and validation.
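One way to perform this split is sketched below. The helper `split_dataset`, the 10% validation fraction, and the directory names are assumptions for illustration; the repository does not prescribe a particular split.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_dataset(processed_dir, train_dir, val_dir, val_fraction=0.1, seed=0):
    """Move processed .npy files into separate training and validation directories."""
    files = sorted(Path(processed_dir).glob("*.npy"))
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(files)
    num_val = max(1, int(len(files) * val_fraction)) if files else 0
    for index, file in enumerate(files):
        target = Path(val_dir if index < num_val else train_dir)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(file), str(target / file.name))
    return len(files) - num_val, num_val


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "processed").mkdir()
    for i in range(20):
        (root / "processed" / f"run_{i}.npy").touch()
    counts = split_dataset(root / "processed", root / "train", root / "val")

print(counts)  # (18, 2)
```

The two resulting directories are the ones to reference from data.input_dir and data.input_dir_val.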
Training the DoMINO model
To train and test the DoMINO model on the AWS dataset, follow these steps:
1. Specify the configuration settings in conf/config.yaml.
2. Run train.py to start the training. Modify the data, train and model keys in the config file. If using cached data, use conf/cached.yaml instead of conf/config.yaml.
3. Run test.py to test on .vtp / .vtu files. Predictions are written to the same file. Modify the eval key in the config file to specify the checkpoint, input and output directory. Note that the data used for testing is in the raw simulation format and should not be processed to .npy.
4. Download the validation results (saved in the form of point clouds in .vtp / .vtu format) and visualize them in ParaView.
Training Guidelines:
Duration: A couple of days on a single node of H100 GPU
Checkpointing: Automatically resumes from latest checkpoint if interrupted
Multi-GPU Support: Compatible with torchrun or MPI for distributed training
If the training crashes because of OOM (out of memory), modify the number of points sampled in the volume (model.volume_points_sample) and on the surface (model.surface_points_sample) to manage memory requirements for your GPU.
The DoMINO model allows for training both volume and surface fields using a single model, but currently the recommendation is to train the volume and surface models separately. This can be controlled through conf/config.yaml.
MSE loss for both the volume and surface models gives the best results.
The bounding box is configurable and will depend on the use case. The presets are suitable for the DrivAerML dataset.
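The memory guideline above comes down to how many points are drawn per sample. The following sketch shows random subsampling of a point cloud and its fields; the function `subsample` and the array sizes are illustrative assumptions, not the datapipe's actual code.

```python
import numpy as np


def subsample(coordinates, fields, num_samples, rng):
    """Randomly keep at most `num_samples` points and their field values."""
    num_points = coordinates.shape[0]
    if num_points <= num_samples:
        # Fewer points than requested: keep everything.
        return coordinates, fields
    keep = rng.choice(num_points, size=num_samples, replace=False)
    return coordinates[keep], fields[keep]


rng = np.random.default_rng(0)
volume_coordinates = rng.uniform(size=(100_000, 3))
volume_fields = rng.normal(size=(100_000, 5))

# Halving the sample count roughly halves the per-batch point memory.
coords, values = subsample(volume_coordinates, volume_fields, 8192, rng)
print(coords.shape, values.shape)  # (8192, 3) (8192, 5)
```

The same index array is applied to coordinates and fields so that each retained point keeps its own target values.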
Training with Domain Parallelism
DoMINO has support for training and inference using domain parallelism in PhysicsNeMo, via the ShardTensor mechanisms and PyTorch’s FSDP tools. ShardTensor, built on PyTorch’s DTensor object, is a domain-parallel-aware tensor that can live on multiple GPUs and perform operations in a numerically consistent way. For more information about the techniques of domain parallelism and ShardTensor, refer to PhysicsNeMo tutorials such as ShardTensor (https://docs.nvidia.com/deeplearning/physicsnemo/physicsnemo-core/api/physicsnemo.distributed.shardtensor.html).
In DoMINO specifically, domain parallelism has been enabled in two ways, which can be used concurrently or separately. First, the input sampled volumetric and surface points can be sharded to accommodate higher resolution point sampling. Second, the latent space of the model, typically a regularized grid, can be sharded to reduce the computational complexity of the latent processing. When training with sharded models in DoMINO, the primary objective is to enable higher resolution inputs and larger latent spaces without sacrificing substantial compute time.
When configuring DoMINO for sharded training, adjust the following parameters from src/conf/config.yaml:
domain_parallelism:
  domain_size: 2
  shard_grid: True
  shard_points: True
The domain_size represents the number of GPUs used for each batch. Setting domain_size: 1 is not advised since that is the standard training regime, but with extra overhead. shard_grid and shard_points enable domain parallelism over the latent space and input/output points, respectively.
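Conceptually, sharding the points means that each GPU in the domain holds only a slice of the sampled point cloud while global results stay consistent. The NumPy sketch below imitates that idea on a single process; it does not use ShardTensor, and the variable names are assumptions.

```python
import numpy as np

domain_size = 2  # GPUs cooperating on one batch
points = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)

# Each rank holds one contiguous chunk of the sampled points.
shards = np.array_split(points, domain_size, axis=0)
for rank, shard in enumerate(shards):
    print(rank, shard.shape)  # 0 (5, 3) then 1 (5, 3)

# A global reduction combines per-shard partial results.
global_mean = sum(s.sum(axis=0) for s in shards) / len(points)
assert np.allclose(global_mean, points.mean(axis=0))
```

In actual sharded training the partial sums would be combined through collective communication between GPUs, which is the bookkeeping a domain-parallel tensor type takes care of.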
Please see src/train_sharded.py for more details regarding the changes from the standard training script required for domain parallel DoMINO training.
As one last note regarding domain-parallel training: in the phase of DoMINO where the output solutions are calculated, the model can use two different techniques (numerically identical) to calculate the output. Due to the overhead of potential communication at each operation, it’s recommended to set model.solution_calculation_mode to one-loop when doing sharded training. This technique launches vectorized kernels with less launch overhead at the cost of more memory use. For non-sharded training, the two-loop setting is preferable. The difference between one-loop and two-loop is purely computational, not algorithmic.
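The general trade-off between a looped and a vectorized evaluation can be seen in a toy example. The weighted neighbor average below is an assumption made for illustration and is not DoMINO's solution calculation; it only shows that both styles give the same numbers while differing in the number of operations launched and the size of intermediate arrays.

```python
import numpy as np


def weighted_average_looped(values, weights):
    """Accumulate neighbor contributions one neighbor at a time."""
    total = np.zeros(values.shape[0])
    norm = np.zeros(values.shape[0])
    for k in range(values.shape[1]):  # loop over stencil neighbors
        total += weights[:, k] * values[:, k]
        norm += weights[:, k]
    return total / norm


def weighted_average_vectorized(values, weights):
    """Same result from one batched expression over all neighbors."""
    # Materializes the full (points, neighbors) product: fewer, larger operations.
    return (weights * values).sum(axis=1) / weights.sum(axis=1)


rng = np.random.default_rng(0)
neighbor_values = rng.normal(size=(1000, 7))  # (points, neighbors)
neighbor_weights = rng.uniform(0.1, 1.0, size=(1000, 7))

looped = weighted_average_looped(neighbor_values, neighbor_weights)
vectorized = weighted_average_vectorized(neighbor_values, neighbor_weights)
print(np.allclose(looped, vectorized))  # True
```

When every operation may trigger communication between GPUs, issuing fewer and larger operations tends to pay off, which is consistent with the recommendation above for sharded training.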
Retraining recipe for DoMINO model
To enable retraining the DoMINO model from a pre-trained checkpoint, follow these steps:
1. Add the pre-trained checkpoints to the resume_dir defined in conf/config.yaml.
2. Add the volume and surface scaling factors to the output dir defined in conf/config.yaml.
3. Run retraining.py for the specified number of epochs to retrain the model at a small learning rate, starting from the checkpoint.
4. Run test.py to test on .vtp / .vtu files. Predictions are written to the same file. Modify the eval key in the config file to specify the checkpoint, input and output directory.
5. Download the validation results (saved in the form of point clouds in .vtp / .vtu format) and visualize them in ParaView.
DoMINO model pipeline for inference on STLs
The DoMINO model can be evaluated directly on unknown STLs using the pre-trained checkpoint. Follow the steps outlined below:
1. Run the inference_on_stl.py script to perform inference on an STL.
2. Specify the STL paths, velocity inlets, stencil size and model checkpoint path in the script.
3. The volume predictions are carried out on points sampled in a bounding box around the STL.
4. The surface predictions are carried out on the STL surface. The drag and lift accuracy will depend on the resolution of the STL.
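Sampling volume points in a bounding box around a geometry can be sketched as follows. The padding rule and the function `sample_in_bounding_box` are assumptions for illustration; the inference script may define its box differently, for example from fixed configured bounds.

```python
import numpy as np


def sample_in_bounding_box(vertices, num_points, padding, rng):
    """Draw uniform random points in the padded axis-aligned box of `vertices`."""
    lower = vertices.min(axis=0) - padding
    upper = vertices.max(axis=0) + padding
    return rng.uniform(lower, upper, size=(num_points, vertices.shape[1]))


rng = np.random.default_rng(0)
# Stand-in for STL vertex coordinates (x, y, z).
stl_vertices = rng.uniform([-1.0, -0.5, 0.0], [4.0, 0.5, 1.5], size=(500, 3))
volume_points = sample_in_bounding_box(stl_vertices, 10_000, padding=1.0, rng=rng)
print(volume_points.shape)  # (10000, 3)
```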
This repository includes examples of DoMINO training on the DrivAerML dataset. However, many use cases require training DoMINO on a custom dataset. The steps below outline the process.
1. Reorganize the dataset to have the same directory structure as DrivAerML. The raw data directory should contain a separate directory for each simulation. Each simulation directory needs to contain mainly 3 files, stl, vtp and vtu, corresponding to the geometry, surface and volume fields information. Additional details such as boundary condition information, for example inlet velocity, may be added in a separate .csv file, in case these vary from one case to the next.
2. Modify the following parameters in conf/config.yaml:
   project.name: Specify a name for your project.
   expt: This is the experiment tag.
   data_processor.input_dir: Input directory where the raw simulation dataset is stored.
   data_processor.output_dir: Output directory to save the processed dataset (.npy).
   data_processor.num_processors: Number of parallel processors for data processing.
   variables.surface: Variable names of surface fields and field types (vector or scalar).
   variables.volume: Variable names of volume fields and field types (vector or scalar).
   data.input_dir: Processed files used for training.
   data.input_dir_val: Processed files used for validation.
   data.bounding_box: Dimensions of the computational domain where the most prominent solution field variations occur. Volume fields are modeled inside this bounding box.
   data.bounding_box_surface: Dimensions of the bounding box enclosing the biggest geometry in the dataset. Surface fields are modeled inside this bounding box.
   train.epochs: Set the number of training epochs.
   model.volume_points_sample: Number of points to sample in the volume mesh per epoch per batch. Tune based on GPU memory.
   model.surface_points_sample: Number of points to sample on the surface mesh per epoch per batch. Tune based on GPU memory.
   model.geom_points_sample: Number of points to sample on the STL mesh per epoch per batch. Ensure the number of points sampled is less than the number of points on the STL (for coarser STLs).
   eval.test_path: Path of the directory of raw simulation files for testing and verification.
   eval.save_path: Path of the directory where the AI-predicted simulation files are saved.
   eval.checkpoint_name: Name of the checkpoint in outputs/{project.name}/models used to evaluate the model.
   eval.scaling_param_path: Scaling parameters populated in outputs/{project.name}.
3. Before running process_data.py to process the data, be sure to modify openfoam_datapipe.py. This is the entry point for the user to modify the datapipe for data processing. A couple of things that might need to be changed are the non-dimensionalizing schemes, based on the order of your variables, and the DrivAerAwsPaths class, to match the internal directory structure of your dataset. For example, here is a custom class written for a different dataset:

   from pathlib import Path

   class DriveSimPaths:
       # Specify the name of the STL in your dataset
       @staticmethod
       def geometry_path(car_dir: Path) -> Path:
           return car_dir / "body.stl"

       # Specify the name of the VTU and directory structure in your dataset
       @staticmethod
       def volume_path(car_dir: Path) -> Path:
           return car_dir / "VTK/simpleFoam_steady_3000/internal.vtu"

       # Specify the name of the VTP and directory structure in your dataset
       @staticmethod
       def surface_path(car_dir: Path) -> Path:
           return car_dir / "VTK/simpleFoam_steady_3000/boundary/aero_suv.vtp"

4. Before running train.py, modify the loss functions. The surface loss functions, specifically integral_loss_fn, loss_fn_surface and loss_fn_area, currently assume the variables to be in a specific order: Pressure followed by the Wall-Shear-Stress vector. Please modify these formulations if your variables are in a different order or don’t require these losses.
5. Run test.py to validate the trained model.
6. Use the inference_on_stl.py script to deploy the model in applications where inference is needed only from STL inputs and the volume mesh is not calculated.
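The dependence of the surface losses on variable order can be made explicit by naming the channel slices. The sketch below assumes a four-column surface array with pressure first and the three wall-shear-stress components after it; the function `surface_mse` is a simplified stand-in and not one of the repository's loss functions.

```python
import numpy as np

# Assumed channel layout: pressure first, then the 3 wall-shear-stress components.
PRESSURE = slice(0, 1)
WALL_SHEAR = slice(1, 4)


def surface_mse(prediction, target):
    """Return separate MSE values for pressure and wall shear stress."""
    pressure_mse = np.mean((prediction[:, PRESSURE] - target[:, PRESSURE]) ** 2)
    shear_mse = np.mean((prediction[:, WALL_SHEAR] - target[:, WALL_SHEAR]) ** 2)
    return float(pressure_mse), float(shear_mse)


target = np.zeros((100, 4))
prediction = np.zeros((100, 4))
prediction[:, 0] = 2.0  # error only in the pressure channel
print(surface_mse(prediction, target))  # (4.0, 0.0)
```

If a custom dataset stores its variables in another order, only the two slice definitions would need to change in a formulation written this way.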
The DoMINO model architecture is used to support the Real Time Digital Twin Blueprint and the DoMINO-Automotive-Aero NIM.
Some of the results are shown below.

Fig. 16 Results from DoMINO for RTWT SC demo