DoMINO: Decomposable Multi-scale Iterative Neural Operator for External Aerodynamics#
DoMINO is a local, multi-scale, point-cloud-based model architecture for large-scale physics problems such as external aerodynamics. The DoMINO model takes STL geometries as input and evaluates flow quantities such as pressure and wall shear stress on the surface of the vehicle, as well as velocity fields and pressure in the volume around it. The DoMINO architecture is designed to be a fast, accurate, and scalable surrogate model for large-scale industrial simulations.
DoMINO uses local geometric information to predict solutions on discrete points. First, a global geometry encoding is learnt from point clouds using a multi-scale, iterative approach. The geometry representation takes into account both the short- and long-range dependencies that are typically encountered in elliptic PDEs. Additional information, such as the signed distance field (SDF) and positional encodings, is used to enrich the global encoding. Next, discrete points are randomly sampled, a sub-region is constructed around each point, and the local geometry encoding is extracted in this region from the global encoding. The local geometry information is learnt using dynamic point convolution kernels. Finally, a computational stencil is constructed dynamically around each discrete point by sampling random neighboring points within the same sub-region. The local geometry encoding and the computational stencil are aggregated to predict the solutions on the discrete points.
A preprint describing additional details about the model architecture can be found in the paper.
Prerequisites#
Install the required dependencies by running:
pip install -r requirements.txt
Getting started with the DrivAerML example#
Configuration basics#
DoMINO training and testing are managed through YAML configuration files powered by Hydra. The base configuration file, config.yaml, is located in the src/conf directory.
To select a specific configuration, use the --config-name option when running the scripts. You can modify configuration options in two ways:
Direct Editing: Modify the YAML files directly
Command Line Override: Use Hydra's `++` syntax to override settings at runtime
For example, to change the training epochs (controlled by train.epochs):
python train.py ++train.epochs=200 # Sets number of epochs to 200
This modular configuration system allows for flexible experimentation while maintaining reproducibility.
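For example, a hedged sketch combining a non-default configuration file with runtime overrides (`conf/cached.yaml` is described later in this guide; the override values are illustrative):

```bash
# Select an alternate config and override settings at runtime
python train.py --config-name=cached ++train.epochs=200 ++project.name=domino_cached_run
```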
Project logs#
Save and track project logs, experiments, TensorBoard files, etc., by specifying a project directory with project.name. Tag experiments with expt.
Data#
Dataset details#
In this example, the DoMINO model is trained using the DrivAerML dataset from the CAE ML Dataset collection. This high-fidelity, open-source (CC-BY-SA) public dataset is specifically designed for automotive aerodynamics research. It comprises 500 parametrically morphed variants of the widely used DrivAer notchback generic vehicle. Mesh generation and scale-resolving computational fluid dynamics (CFD) simulations were executed using consistent and validated automatic workflows that represent the industrial state of the art. Geometries and comprehensive aerodynamic data are published in open-source formats. For more technical details about this dataset, please refer to their paper.
Data Preprocessing#
PhysicsNeMo has a related project to help with data processing, called PhysicsNeMo-Curator. Using PhysicsNeMo-Curator, the data needed to train a DoMINO model can be set up easily. Please refer to these instructions on getting started with PhysicsNeMo-Curator.
Download the DrivAerML dataset using the instructions provided in PhysicsNeMo-Curator. Each raw simulation is downloaded in vtp, vtu, and stl formats. The first step of the DoMINO pipeline is processing this raw data (vtp, vtu, and stl) into either Zarr or NumPy format for training. For instructions on running data processing to produce a DoMINO-training-ready dataset, please refer to How-to Curate data for DoMINO Model.
Caching is implemented in CachedDoMINODataset (see the NVIDIA/physicsnemo repository). Optionally, users can run cache_data.py to save the outputs of the DoMINO datapipe as .npy files. The DoMINO datapipe is set up to calculate the signed distance field and nearest-neighbor interpolations on the fly during training; caching saves these as a preprocessing step and can be used in cases where the STL surface meshes are upwards of 30 million cells. Data processing is parallelized and takes a couple of hours to write all the processed files.
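A minimal sketch of the optional caching step, assuming cache_data.py reads the Hydra configs in src/conf like the other scripts (check the script for the exact keys it expects):

```bash
# Precompute SDF and nearest-neighbor interpolations and write .npy caches
python cache_data.py
```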
The final processed dataset should be divided and saved into two directories, one for training and one for validation.
Data Scaling factors#
DoMINO has several data-specific configuration tools that rely on some knowledge of the dataset:
- The output fields (the labels) are normalized during training to a mean of zero and a standard deviation of one, averaged over the dataset. The scaling is controlled by passing the `volume_factors` and `surface_factors` values to the datapipe.
- The input locations are scaled by, and optionally cropped to, user-defined bounding boxes for both surface and volume. Whether cropping occurs is controlled by the `sample_in_bbox` value of the datapipe. Normalization to the bounding box is enabled with `normalize_coordinates`. By default, both are set to true. The bounding boxes are configured in the `config.yaml` file, separately for surface and volume.
Note: The datapipe module has a helper function `create_domino_dataset` with sensible defaults to help create a DoMINO datapipe.
To facilitate setting reasonable values for these, you can use the compute_statistics.py script. This will load the core dataset as defined in your config.yaml file, loop over a number of examples (200 by default), and both print and store the surface/volume field statistics as well as the coordinate statistics.
Note that, for volumetric fields especially, the min/max found may lie significantly outside the surface region. Many simulations extend volumetric sampling to the far field, and you may instead want to crop away a significant amount of the volume.
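For reference, the bounding boxes live under the data section of config.yaml (see the parameter list later in this guide). A hedged sketch, with illustrative rather than tuned values:

```yaml
data:
  bounding_box:           # volume region where volume fields are modeled
    min: [-3.5, -2.25, -0.32]   # illustrative values only
    max: [8.5, 2.25, 3.0]
  bounding_box_surface:   # box enclosing the largest geometry in the dataset
    min: [-1.1, -1.2, -0.32]
    max: [4.5, 1.2, 1.2]
```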
Training#
Specify the training and validation data paths, bounding box sizes, etc. in the data tab, and the training configs such as epochs, batch size, etc. in the train tab.
Testing#
Testing is carried out directly on the raw files. Specify the testing configs in the test tab.
Training the DoMINO model#
To train and test the DoMINO model on the DrivAerML dataset, follow these steps:
1. Specify the configuration settings in `conf/config.yaml`.
2. Run `train.py` to start the training. Modify the data, train, and model keys in the config file. If using cached data, use `conf/cached.yaml` instead of `conf/config.yaml`.
3. Run `test.py` to test on `.vtp`/`.vtu` files. Predictions are written to the same files. Modify the eval key in the config file to specify the checkpoint, input, and output directories. Note that the data used for testing is in the raw simulation format and should not be processed to `.npy`.
4. Download the validation results (saved in the form of point clouds in `.vtp`/`.vtu` format) and visualize them in ParaView.
Training Guidelines:

- Duration: a couple of days on a single node of H100 GPUs.
- Checkpointing: training automatically resumes from the latest checkpoint if interrupted.
- Multi-GPU Support: compatible with `torchrun` or MPI for distributed training.
- If the training crashes because of OOM, reduce the points sampled in the volume (`model.volume_points_sample`) and on the surface (`model.surface_points_sample`) to manage the memory requirements for your GPU; a launch sketch follows this list.
- The DoMINO model allows for training both volume and surface fields using a single model, but the current recommendation is to train the volume and surface models separately. This can be controlled through `conf/config.yaml`.
- MSE loss for both the volume and surface models gives the best results.
- The bounding box is configurable and will depend on the use case. The presets are suitable for the DrivAerML dataset.
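As referenced in the guidelines above, a minimal multi-GPU launch sketch that also lowers the sampled point counts (the point counts shown are illustrative, not recommendations):

```bash
# Distributed training on 8 GPUs; reduce sampled points to fit GPU memory
torchrun --nproc_per_node=8 train.py \
    ++model.volume_points_sample=4096 \
    ++model.surface_points_sample=4096
```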
Training with Domain Parallelism#
DoMINO supports training and inference with domain parallelism in PhysicsNeMo, via the ShardTensor mechanism and PyTorch's FSDP tools. ShardTensor, built on PyTorch's DTensor object, is a domain-parallel-aware tensor that can live on multiple GPUs and perform operations in a numerically consistent way. For more information about the techniques of domain parallelism and ShardTensor, refer to the PhysicsNeMo tutorials, such as the ShardTensor documentation (https://docs.nvidia.com/deeplearning/physicsnemo/physicsnemo-core/api/physicsnemo.distributed.shardtensor.html).
In DoMINO specifically, domain parallelism has been enabled in two ways, which can be used concurrently or separately. First, the sampled volumetric and surface input points can be sharded to accommodate higher-resolution point sampling. Second, the latent space of the model (typically a regularized grid) can be sharded to reduce the computational complexity of the latent processing. When training with sharded models in DoMINO, the primary objective is to enable higher-resolution inputs and larger latent spaces without sacrificing substantial compute time.
When configuring DoMINO for sharded training, adjust the following parameters in src/conf/config.yaml:
domain_parallelism:
domain_size: 2
shard_grid: True
shard_points: True
The domain_size represents the number of GPUs used for each batch: setting domain_size: 1 is the standard training regime, and domain_parallelism will be ignored. shard_grid and shard_points enable domain parallelism over the latent space and the input/output points, respectively.
Setting domain_size > 1 without specifying shard_points=True or shard_grid=True will result in a runtime error during configuration; if you do not want to use domain parallelism, leave domain_size=1.
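A hedged launch sketch combining the configuration above with `torchrun`, assuming the `domain_parallelism` keys can be overridden from the command line like the other Hydra settings:

```bash
# 8 GPUs with domain_size=2 gives 4 model replicas, each sharded over 2 GPUs
torchrun --nproc_per_node=8 train.py \
    ++domain_parallelism.domain_size=2 \
    ++domain_parallelism.shard_grid=True \
    ++domain_parallelism.shard_points=True
```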
Performance Optimizations#
The training and inference scripts for DoMINO contain several performance enhancements to accelerate the training and usage of the model. In this section we’ll highlight several of them, as well as how to customize them if needed.
Memory Pool Optimizations#
The preprocessor of DoMINO requires a k-nearest-neighbors computation, which is accelerated via the cuML nearest-neighbors tool. By default, cuML and torch both use memory allocation pools to speed up tensor allocation, but they do not use the same pool. This means that during preprocessing, the kNN operation can spend a significant amount of time in memory allocations, and, further, the memory available to torch is limited.
To mitigate this, DoMINO uses the RAPIDS Memory Manager (rmm, https://github.com/rapidsai/rmm) by default. If, for some reason, you wish to disable this, you can do so with an environment variable:
export PHYSICSNEMO_DISABLE_RMM=True
Or remove this line from the training script:
from physicsnemo.utils.memory import unified_gpu_memory
Note - why not make this configurable? We have to set up the shared memory pool allocation very early in the program, before the config has even been read. So, we enable it by default, and the opt-out path is via the environment.
Reduced Volume Reads#
The dataset size for volumetric data can be quite substantial: DrivAerML, for example, has mesh sizes of 160M points per example. Even though the models do not process all 160M points, they must all be read from disk in order to downsample dynamically, which can exceed the bandwidth and CPU decoding capacity on nodes with multiple GPUs.
As a performance enhancement, DoMINO's data pipeline offers a mitigation: instead of reading an entire volumetric mesh, the volumetric inputs and outputs are shuffled (in tandem) during preprocessing, and subsequent reads choose random slices of the volumetric data. By default, DoMINO will read about 100x more data than necessary for the sampling size. This allows the pipeline to still apply cuts for data inside the bounding box, plus further random sampling to improve training stability. To enable or disable this behavior, set data.volume_sample_from_disk=True or False.
Note - if your volumetric data is not larger than a few million mesh points, pre-shuffling and sampling from disk is likely not necessary for you.
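For example, to enable the reduced-read path for a single run without editing the config:

```bash
python train.py ++data.volume_sample_from_disk=True
```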
PhysicsNeMo-Curator supports shuffling the volumetric data during preprocessing. If, however, you've already preprocessed your data and just want to apply shuffling, use the script at src/shuffle_volumetric_curator_output.py.
The shuffling script will also apply sharding to the output files, which improves I/O performance; as a result, zarr>=3.0 is required to use the outputs from the curator. src/shuffle_volumetric_curator_output.py is meant to be an example of how to apply shuffling, so modify and update it as needed for your dataset.
If you have tensorstore installed (it's in requirements.txt), the data reader will work equally well with Zarr 2 or Zarr 3 files.
Overall Performance#
DoMINO is a computationally complex and challenging workload. Over the course of several releases, we have chipped away at performance bottlenecks to speed up training and inference (with inference_on_stl.py). Overall training time has decreased from about 5 days to just over 4 hours on eight H100 GPUs. We hope these optimizations enable you to explore more parameters and surrogate models; if you see a performance issue, please open an issue on GitHub.
Fig. 29 Results from DoMINO for RTWT SC demo#
Example Training Results#
To provide an example of what a successful training should look like, we include here some example results. Training curves may look similar to this:
Fig. 30 Combined Training Curve#
And, when evaluating the results on the validation dataset, this particular run had the following L2 and R2 Metrics:
| Metric | Surface Only | Combined |
|---|---|---|
| X Velocity | N/A | 0.086 |
| Y Velocity | N/A | 0.185 |
| Z Velocity | N/A | 0.197 |
| Volumetric Pressure | N/A | 0.106 |
| Turb. V | N/A | 0.134 |
| Surface Pressure | 0.101 | 0.105 |
| X-Tau (Shear) | 0.138 | 0.145 |
| Y-Tau (Shear) | 0.174 | 0.185 |
| Z-Tau (Shear) | 0.198 | 0.207 |
| Drag R2 | 0.983 | 0.975 |
| Lift R2 | 0.971 | 0.968 |
With the PhysicsNeMo-CFD tool, you can create plots of the lift and drag forces computed by DoMINO vs. the CFD solver. For example, here is the drag force:
Fig. 31 Drag Force R^2#
Training with Physics Losses#
DoMINO supports enforcing PDE residuals as soft constraints. This can be used to improve the model predictions' adherence to the governing laws of the problem, which include the continuity and Navier-Stokes equations.
Note: if you wish to modify the PDEs used for DoMINO, edit the compute_physics_loss function in train.py appropriately.
Prerequisites for PDE residuals#
The computation of physics residuals is supported using the PhysicsNeMo-Sym library. Install it using:
pip install "Cython"
pip install "nvidia-physicsnemo.sym>2.1.0" --no-build-isolation
To execute the training using physics losses, run train.py with the configuration below:
torchrun --nproc_per_node=<num-gpus> train.py \
++train.add_physics_loss=True ++model.num_neighbors_volume=8
Note that num_neighbors_volume is set to 8 to reduce the memory requirement. Also, when the physics losses are applied, the pipeline automatically samples num_neighbors_volume // 2 additional points for each point in num_neighbors_volume. These are considered "2-hop" neighbors, which are required to compute the higher-order gradients in the Navier-Stokes equations. Hence, even if num_neighbors_volume is set to 8, it will sample num_neighbors_volume * ((num_neighbors_volume // 2) + 1) (in this case 40) total points for the fields.
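A small worked example of the point-count arithmetic described above:

```python
# 2-hop neighbor sampling budget when physics losses are enabled
num_neighbors_volume = 8
two_hop_per_neighbor = num_neighbors_volume // 2  # 4 extra points per neighbor
total_points = num_neighbors_volume * (two_hop_per_neighbor + 1)
print(total_points)  # 40 points sampled per volume point
```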
The results of adding physics losses can be found below (using the DrivAerML dataset). The results are computed on design IDs 419 and 439 from the validation set and averaged.
We observe that the addition of physics losses improves the model predictions' adherence to the governing laws.
**L2 Errors**

| Type | Variable | Baseline (full dataset) | Baseline + Physics (full dataset) |
|---|---|---|---|
| Volume | p | 0.15413 | 0.17203 |
| Volume | U_x | 0.15566 | 0.16397 |
| Volume | U_y | 0.32229 | 0.34383 |
| Volume | U_z | 0.31027 | 0.32450 |
| Volume | nut | 0.21049 | 0.21883 |
| Surface | p | 0.16003 | 0.14298 |
| Surface | wss_x | 0.21476 | 0.20519 |
| Surface | wss_y | 0.31697 | 0.30335 |
| Surface | wss_z | 0.35056 | 0.32095 |
**Residual L2 Error (computed w.r.t. true residuals)**

| Variable | Baseline (full dataset) | Baseline + Physics (full dataset) | % Improvement |
|---|---|---|---|
| continuity | 30.352072 | 2.11262 | 93.04% |
| momentum_x | 19.109278 | 2.33800 | 87.77% |
| momentum_y | 99.36662 | 3.18452 | 96.80% |
| momentum_z | 45.73862 | 2.691725 | 94.11% |
Note: the addition of physics constraints to DoMINO training is under active development and might introduce breaking changes in the future.
Retraining recipe for DoMINO model#
To enable retraining the DoMINO model from a pre-trained checkpoint, follow these steps:

1. Add the pre-trained checkpoints to the resume_dir defined in `conf/config.yaml`.
2. Add the volume and surface scaling factors to the output dir defined in `conf/config.yaml`.
3. Run `retraining.py` for the specified number of epochs to retrain the model at a small learning rate, starting from the checkpoint.
4. Run `test.py` to test on `.vtp`/`.vtu`. Predictions are written to the same file. Modify the eval key in the config file to specify the checkpoint, input, and output directories.
5. Download the validation results (saved in the form of point clouds in `.vtp`/`.vtu` format) and visualize them in ParaView.
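A minimal retraining launch sketch, assuming `retraining.py` accepts the same Hydra overrides as the other scripts; the epoch count shown is illustrative:

```bash
python retraining.py ++train.epochs=50
```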
DoMINO model pipeline for inference on test samples#
After training is completed, the test.py script can be used to run inference on test samples. Follow the steps below to run test.py:

1. Update the config in `conf/config.yaml` under the `Testing data Configs` tab.
2. The test script is designed to run inference on the raw `.stl`, `.vtp`, and `.vtu` files for each test sample. Use the same scaling parameters that were generated during training. Typically these are in `outputs/<project.name>/`, where `project.name` is as defined in `config.yaml`. Update `eval.scaling_param_path` accordingly.
3. Run `test.py`. The test script can also be run in parallel; refer to the training guidelines for multi-GPU usage and the sketch after this list. Note that for running `test.py` in parallel, the number of GPUs chosen must be <= the number of test samples.
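A hedged parallel-testing launch sketch; the GPU count and scaling-parameter path are illustrative:

```bash
# The number of GPUs must not exceed the number of test samples
torchrun --nproc_per_node=4 test.py \
    ++eval.scaling_param_path=outputs/my_project/
```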
DoMINO model pipeline for inference on STLs#
The DoMINO model can be evaluated directly on unknown STLs using the pre-trained checkpoint. Follow the steps outlined below:
1. Run the `inference_on_stl.py` script to perform inference on an STL.
2. Specify the STL paths, velocity inlets, stencil size, and model checkpoint path in the script.
3. The volume predictions are carried out on points sampled in a bounding box around the STL.
4. The surface predictions are carried out on the STL surface. The drag and lift accuracy will depend on the resolution of the STL.
Incorporating multiple global simulation parameters for training/inference#
DoMINO supports incorporating multiple global simulation parameters (such as inlet velocity, air density, etc.) that can vary across different simulations.
1. Define global parameters in the `variables.global_parameters` section of `conf/config.yaml`. Each parameter must specify its type (`vector` or `scalar`) and reference values for non-dimensionalization; a hedged sketch of this block follows this list. For `vector` type parameters:
   - If values are single-direction vectors (e.g., [30, 0, 0]), define the reference as [30]
   - If values are two-direction vectors (e.g., [30, 30, 0]), define the reference as [30, 30]
2. Enable parameter encoding in the model configuration by setting `model.encode_parameters: true`. This will:
   - Create a dedicated parameter encoding network (`ParameterModel`)
   - Non-dimensionalize parameters using reference values from `config.yaml`
   - Integrate parameter encodings into both surface and volume predictions
3. Ensure your simulation data includes global parameter values. The DoMINO datapipe expects these parameters in the pre-processed `.npy`/`.npz` files:
   - Examine `openfoam_datapipe.py` and `process_data.py` for examples of how global parameter values are incorporated for external aerodynamics
   - For the automotive example, `air_density` and `inlet_velocity` remain constant across simulations
   - Adapt these files for your specific case to correctly calculate `global_params_values` and `global_params_reference` during data preprocessing
4. During training, the model automatically handles global parameter encoding when `model.encode_parameters: true` is set. You may need to adapt `train.py` if you plan to use global parameters in loss functions or de-non-dimensionalization.
5. During testing with `test.py`, define `global_params_values` for each test sample:
   - Global parameters must match those defined in `config.yaml`
   - For each parameter (e.g., `inlet_velocity`, `air_density`), provide appropriate values for each simulation
   - See the `main()` function in `test.py` for implementation examples
   - If using global parameters for de-non-dimensionalization, modify `test_step()`
6. When inferencing on unseen geometries with `inference_on_stl.py`:
   - Define `global_params_values` and `global_params_reference` in both the `compute_solution_in_volume()` and `compute_solution_on_surface()` methods
   - Adjust these parameters based on your specific use case and the parameters defined in `config.yaml`
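As referenced in step 1 above, a hedged sketch of what the `variables.global_parameters` block could look like; the parameter names and reference values are illustrative and should be matched to your dataset:

```yaml
variables:
  global_parameters:
    inlet_velocity:
      type: vector
      reference: [30.0]   # values like [30, 0, 0] are single-direction vectors
    air_density:
      type: scalar
      reference: 1.226    # kg/m^3
```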
Extending DoMINO to a custom dataset#
This repository includes examples of DoMINO training on the DrivAerML dataset. However, many use cases require training DoMINO on a custom dataset. The steps below outline the process.
1. Reorganize the dataset to have the same directory structure as DrivAerML. The raw data directory should contain a separate directory for each simulation. Each simulation directory needs to contain mainly 3 files, `stl`, `vtp`, and `vtu`, corresponding to the geometry, surface, and volume field information. Additional details such as boundary condition information, for example inlet velocity, may be added in a separate `.csv` file, in case these vary from one case to the next.
2. Modify the following parameters in `conf/config.yaml`:
   - `project.name`: Specify a name for your project.
   - `expt`: This is the experiment tag.
   - `data_processor.input_dir`: Input directory where the raw simulation dataset is stored.
   - `data_processor.output_dir`: Output directory to save the processed dataset (`.npy`).
   - `data_processor.num_processors`: Number of parallel processors for data processing.
   - `variables.surface`: Variable names of surface fields and field types (vector or scalar).
   - `variables.volume`: Variable names of volume fields and field types (vector or scalar).
   - `data.input_dir`: Processed files used for training.
   - `data.input_dir_val`: Processed files used for validation.
   - `data.bounding_box`: Dimensions of the computational domain containing the most prominent solution field variations. Volume fields are modeled inside this bounding box.
   - `data.bounding_box_surface`: Dimensions of a bounding box enclosing the biggest geometry in the dataset. Surface fields are modeled inside this bounding box.
   - `train.epochs`: Set the number of training epochs.
   - `model.volume_points_sample`: Number of points to sample in the volume mesh per epoch per batch. Tune based on GPU memory.
   - `model.surface_points_sample`: Number of points to sample on the surface mesh per epoch per batch. Tune based on GPU memory.
   - `model.geom_points_sample`: Number of points to sample on the STL mesh per epoch per batch. Ensure the number of points sampled is less than the number of points on the STL (relevant for coarser STLs).
   - `eval.test_path`: Path to the directory of raw simulation files for testing and verification.
   - `eval.save_path`: Path to the directory where the AI-predicted simulation files are saved.
   - `eval.checkpoint_name`: Checkpoint name in `outputs/{project.name}/models` to evaluate the model.
   - `eval.scaling_param_path`: Scaling parameters populated in `outputs/{project.name}`.
3. Before running `process_data.py` to process the data, be sure to modify `openfoam_datapipe.py`. This is the entry point for the user to modify the datapipe for data processing. A couple of things that might need to be changed are the non-dimensionalization schemes (based on the order of your variables) and the `DrivAerAwsPaths` class (to match the internal directory structure of your dataset). For example, here is a custom class written for a different dataset:

```python
from pathlib import Path


class DriveSimPaths:
    # Specify the name of the STL in your dataset
    @staticmethod
    def geometry_path(car_dir: Path) -> Path:
        return car_dir / "body.stl"

    # Specify the name of the VTU and directory structure in your dataset
    @staticmethod
    def volume_path(car_dir: Path) -> Path:
        return car_dir / "VTK/simpleFoam_steady_3000/internal.vtu"

    # Specify the name of the VTP and directory structure in your dataset
    @staticmethod
    def surface_path(car_dir: Path) -> Path:
        return car_dir / "VTK/simpleFoam_steady_3000/boundary/aero_suv.vtp"
```
4. Before running `train.py`, modify the loss functions. The surface loss functions, specifically `integral_loss_fn`, `loss_fn_surface`, and `loss_fn_area`, currently assume the variables to be in a specific order: pressure followed by the wall-shear-stress vector. Please modify these formulations if your variables are in a different order or don't require these losses.
5. Run `test.py` to validate the trained model.
6. Use the `inference_on_stl.py` script to deploy the model in applications where inference is needed only from STL inputs and the volume mesh is not calculated. A hedged end-to-end command sketch follows this list.
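As referenced in the last step above, a hedged end-to-end command sequence for a custom dataset; the scripts are the ones discussed in this section, and the override value is illustrative:

```bash
# 1. Process the raw stl/vtp/vtu simulations into training-ready files
python process_data.py

# 2. Train on the processed dataset
python train.py ++project.name=my_custom_dataset

# 3. Validate the trained model on raw held-out simulations
python test.py

# 4. Deploy on new geometries directly from STL inputs
python inference_on_stl.py
```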
The DoMINO model architecture is used to support the Real Time Digital Twin Blueprint and the DoMINO-Automotive-Aero NIM.
Some of the results are shown below.
Fig. 32 Results from DoMINO for RTWT SC demo#