NVIDIA Modulus Getting Started

Modulus Overview

Modulus is an open source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods.

Whether you are exploring neural operators such as Fourier Neural Operators, physics-informed neural networks (PINNs), or a hybrid of the two, Modulus provides an optimized stack for training your models at real-world scale.

Modulus code is open source and packaged in a modular fashion into two repos - Modulus Core and Modulus Sym.

Modulus Core provides a PyTorch-like experience for those proficient with Python for AI. It includes the core algorithms, network architectures, and utilities that cover the broad spectrum of physics-constrained and data-driven workflows across science and engineering disciplines.

Modulus Symbolic (Modulus Sym) provides Pythonic APIs, algorithms, and utilities that are used with Modulus Core to explicitly inform model training with physics. This includes symbolic APIs for PDEs, domain sampling, and PDE-based residuals. It also provides a higher-level abstraction for composing a training loop from a specification of the geometry, PDEs, and constraints such as boundary conditions, using simple symbolic APIs.

The entire Modulus source is also packaged in a single container image, freely available on NGC, to simplify setup. The container does not include the reference applications due to their size; we recommend downloading the examples directly from the GitHub source.

Roles and scope:

Generally, installing from source gives you full flexibility, along with the responsibility of managing your environment, while installing with the container gives you ease of use with flexibility that is sufficient for most users. The scenarios below spell out the paths available given your goals, expertise, and scope of work.

a. I’m an AI researcher and want to train a model in Modulus: We recommend that you use the container image. You will be able to create any model architecture from scratch. Refer to the Simple Training Tutorial for more details.

b. I’m a domain expert and want to use an existing model with some modifications: We recommend that you use the container image. We also recommend that you download the examples from the GitHub source to use as a starting point.

c. I’m an AI researcher and want to contribute to Modulus: We recommend that you clone the source code from GitHub and develop on a fork, as this makes it easy to contribute your work to the repos.

System Requirements

  • Operating System

    • Ubuntu 20.04 or Linux 5.13 kernel

  • Driver and GPU Requirements

    • pip: NVIDIA driver that is compatible with local PyTorch installation.

    • Docker container: Modulus container release 24.01 is based on CUDA 12.3.2, which requires NVIDIA Driver release 545 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), 510.47 (or later R510), 525.85 (or later R525), or 535.86 (or later R535). The CUDA driver’s compatibility package only supports particular drivers, so users should upgrade from all R418, R440, R460, and R520 drivers, which are not forward-compatible with CUDA 12.3. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades and the PyTorch NVIDIA Container Release Notes.

  • Required installations for pip install

    • Python >= 3.8

  • Recommended Hardware

    • 64-bit x86

    • NVIDIA GPUs:

      • Hopper GPUs - H100

      • NVIDIA Ampere GPUs - A100, A30, A40, A4000, A6000

      • Volta GPUs - V100

      • Turing GPUs - T4

    • Other Supported GPUs:

      • NVIDIA Ampere GPUs - RTX 30xx

      • Volta GPUs - Titan V, Quadro GV100

    • For other GPUs, please reach out to us on the Modulus Forums.

    • 64-bit ARM architecture is also supported.
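
As a rough illustration, the driver rule stated above for the 24.01 container can be encoded as a small Python check. The function and constant names are hypothetical and are not part of Modulus. The table covers only the driver branches named in this section, so the CUDA Application Compatibility topic remains the authoritative list.

```python
from typing import Tuple

# Minimum minor version per forward-compatible driver branch.
# These branches apply to data center GPUs only, as listed in this section.
DATA_CENTER_BRANCH_MINIMUMS = {450: 51, 470: 57, 510: 47, 525: 85, 535: 86}

# Driver branch required natively by CUDA 12.3.2 (container release 24.01).
NATIVE_MINIMUM_BRANCH = 545


def parse_driver_version(version: str) -> Tuple[int, int]:
    """Split a driver string such as '535.86.10' into (branch, minor)."""
    parts = version.strip().split(".")
    branch = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return branch, minor


def driver_is_supported(version: str, data_center_gpu: bool = False) -> bool:
    """Apply the rule from this section (hypothetical helper, not a Modulus API)."""
    branch, minor = parse_driver_version(version)
    if branch >= NATIVE_MINIMUM_BRANCH:
        return True
    if not data_center_gpu:
        return False
    minimum = DATA_CENTER_BRANCH_MINIMUMS.get(branch)
    return minimum is not None and minor >= minimum
```

The driver version string itself can be read from the output of nvidia-smi on the host.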

Note

To get the benefits of all the performance improvements (e.g. AMP, multi-GPU scaling, etc.), use the NVIDIA container for Modulus. This container comes with all the prerequisites and dependencies and allows you to get started efficiently with Modulus.

The NVIDIA Modulus NGC Container is the easiest way to start using Modulus. It comes with all Modulus software and its dependencies pre-installed, so you can get started with the Modulus examples with ease. The Modulus container is built on top of the NVIDIA PyTorch NGC Container, which is optimized for GPU acceleration.

Install the Docker Engine

To start working with Modulus repos, ensure that you have Docker Engine installed.

You will also need to install the NVIDIA Container Toolkit. The following should work on most Debian-based systems:

sudo apt-get install nvidia-docker2

Running Modulus in the Docker image while using the SDF library may require NVIDIA Container Toolkit version 1.0.4 or later.

To run the docker commands without sudo, add yourself to the docker group by following steps 1-4 in Manage Docker as a non-root user.

Get Modulus Container

Download the Modulus docker container from NGC using:

docker pull nvcr.io/nvidia/modulus/modulus:<tag>

A shell session can be launched in the container using:

docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    --runtime nvidia -it --rm nvcr.io/nvidia/modulus/modulus:<tag> bash

The current directory can be mounted inside the docker container using:

docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    --runtime nvidia -v ${PWD}:/workspace \
    -it --rm nvcr.io/nvidia/modulus/modulus:<tag> bash
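
If you launch the container from a script, the two docker run variants above can be assembled as an argument list. This is a minimal sketch under the assumption that you want the same flags as shown in this section. The helper name docker_run_args is hypothetical, and the tag is left as a parameter because it depends on the release you pulled.

```python
import os
from typing import List, Optional

IMAGE = "nvcr.io/nvidia/modulus/modulus"


def docker_run_args(tag: str, mount_dir: Optional[str] = None) -> List[str]:
    """Build the docker run argument list used in this section (hypothetical helper)."""
    args = [
        "docker", "run",
        "--shm-size=1g",
        "--ulimit", "memlock=-1",
        "--ulimit", "stack=67108864",
        "--runtime", "nvidia",
    ]
    if mount_dir is not None:
        # Mount the given host directory at /workspace inside the container.
        args += ["-v", f"{os.path.abspath(mount_dir)}:/workspace"]
    args += ["-it", "--rm", f"{IMAGE}:{tag}", "bash"]
    return args
```

The resulting list can be passed to subprocess.run(..., check=True). Passing a list rather than a single shell string avoids quoting problems when the mounted path contains spaces.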

To verify the installation, refer to the Running examples section. A quick installation check can also be done by running the following:

python
>>> import torch
>>> from modulus.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.shape
torch.Size([128, 64])

Modulus with Docker Image - Singularity

To build and run the Modulus Docker image with Singularity:

singularity build --fakeroot --sandbox Modulus.sif Modulus-Singularity
singularity run --writable --nv Modulus.sif jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/python

Modulus with pip install

While NVIDIA recommends using the provided Docker image to run the Modulus examples, installation instructions for a custom Python environment are also provided. Currently, many dependencies are not fully supported with the pip method, especially the tessellated geometry module in Modulus Sym. If you need this module, please use the provided Docker image.

Typically it is recommended to install python packages inside a virtual environment. Depending on your preference, you can choose to create either a conda environment or a python virtual environment.

Once you have the appropriate python environment set up, Modulus can be pip installed using:

pip install nvidia-modulus nvidia-modulus-sym

To verify the installation, refer to the Running examples section. A quick installation check can also be done by running the following:

python
>>> import torch
>>> from modulus.models.mlp.fully_connected import FullyConnected
>>> model = FullyConnected(in_features=32, out_features=64)
>>> input = torch.randn(128, 32)
>>> output = model(input)
>>> output.shape
torch.Size([128, 64])

Optional dependencies

  1. Additional packages are required for Modulus examples. Refer to the README.md of each example for instructions on adding required dependencies. Typically, you will have to install dgl library, nvFuser library, and then do a full installation of Modulus core (in addition to the above pip install steps). This can be done using pip install nvidia-modulus[all].

  2. Add packages for quadpy, orthopy, ndim and gdown if you intend to use the quadrature functionality of Modulus Sym or wish to download the example data for the Neural Operator training. This can be done using pip install quadpy orthopy ndim gdown
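
Before running an example, it can help to see which of the optional packages named above are present in the current environment. The sketch below uses only the standard library. The helper name find_missing is hypothetical, and the list assumes each import name matches the pip package name mentioned in this section.

```python
import importlib.util
from typing import Iterable, List

# Assumption: these import names match the pip package names listed above.
OPTIONAL_MODULES = ["dgl", "quadpy", "orthopy", "ndim", "gdown"]


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(OPTIONAL_MODULES)
    print("Missing optional modules:", ", ".join(missing) or "none")
```

The check only locates the modules without importing them, so it stays fast and does not require a GPU.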

Running examples

A summary of case studies and tutorials using Modulus can be found on GitHub:

Modulus Examples

Modulus Sym Examples

Modulus examples

Clone the Modulus repository in your working directory using:

git clone https://github.com/NVIDIA/modulus.git

Note

It is highly recommended to read the README.md file from each individual example to learn about the problem description and any other run instructions specific to each example.

To verify the examples run correctly, run these commands:

cd ./modulus/examples/cfd/darcy_fno/
pip install warp-lang # Warp is not included in Modulus container
python train_fno_darcy.py

If you see the outputs/ directory created after the execution of the command (~5 min), the installation is successful.

Modulus Sym examples

Clone the Modulus Sym repository in your working directory using:

git clone https://github.com/NVIDIA/modulus-sym.git

Note

The modulus-sym repository has Git LFS enabled. You will need to have Git LFS installed for the clone to work correctly. See the Git LFS documentation for more information.

To verify the examples run correctly, run these commands:

cd ./modulus-sym/examples/helmholtz/
python helmholtz.py

If you see the outputs/ directory created after the execution of the command (~5 min), the installation is successful.
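
The success criterion used for both example runs (an outputs/ directory appearing in the example folder) can also be checked from Python, for instance in an automated smoke test. This is a minimal sketch; the helper name outputs_created is hypothetical.

```python
from pathlib import Path


def outputs_created(example_dir: str) -> bool:
    """Return True if the example run left an outputs/ directory behind."""
    return (Path(example_dir) / "outputs").is_dir()


if __name__ == "__main__":
    # Paths follow the clone locations used in this section.
    for example in ("modulus/examples/cfd/darcy_fno", "modulus-sym/examples/helmholtz"):
        status = "ok" if outputs_created(example) else "no outputs/ directory yet"
        print(f"{example}: {status}")
```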

Development using Modulus

To start developing with Modulus, the key step is to install Modulus in editable mode. This allows you to change the source code without reinstalling Modulus after every change. The instructions below are the same whether you use the Docker or pip (virtual environment) method to work with Modulus. The Docker method is still preferred over the virtual environment method because it comes with the other dependencies pre-installed. This is especially useful if you intend to develop with the tessellated geometry features of Modulus Sym, which rely on pysdf.

To get started, first uninstall the existing Modulus installation using:

pip uninstall -y nvidia-modulus nvidia-modulus-sym

Next, clone the relevant Modulus repositories and install the source code in editable mode. The below instructions show the steps for the core modulus package.

git clone https://github.com/NVIDIA/modulus.git
cd modulus/
pip install -e .

The editable install now allows you to make changes to the source files from the modulus directory and the changes are immediately available without requiring new installation.
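
One way to confirm that Python is picking up the editable checkout rather than a previously installed copy is to check where the package resolves on disk. The sketch below is an illustration only: the helper name resolves_inside is hypothetical, and it assumes the package is importable as modulus, as in the installation check earlier.

```python
import importlib.util
from pathlib import Path


def resolves_inside(package: str, source_dir: str) -> bool:
    """Return True if `package` is loaded from a file located under `source_dir`."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    root = Path(source_dir).resolve()
    return root in origin.parents


if __name__ == "__main__":
    # Assumes the repository was cloned into ./modulus as shown above.
    print("editable checkout in use:", resolves_inside("modulus", "./modulus"))
```

If this prints False after pip install -e ., an older non-editable installation may still be shadowing the checkout.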

If you have made changes that you would like to contribute back to the Modulus repos, you can do so by following the typical GitHub contribution workflow. Refer to the Modulus Contribution Guidelines for more details.

Testing the developments

It is always a good idea to test your changes locally before submitting your code. While the Contribution Guidelines cover this process in detail, a few instructions are provided here for completeness. To install Modulus with all its development packages (including pytest, black, interrogate, and others that are also part of the CI/CD system):

git clone https://github.com/NVIDIA/modulus.git
cd modulus/
pip install -e .[dev]

Then run the tests with pytest as follows:

cd test/
pytest

This should produce output similar to the following:

================================= test session starts ==================================
platform linux -- Python 3.10.6, pytest-7.4.0, pluggy-1.2.0
rootdir: /examples/release_23.09/modulus-release-build-guide/modulus
plugins: hydra-core-1.3.2, shard-0.1.2, rerunfailures-12.0, xdist-3.3.1, xdoctest-1.0.2, hypothesis-5.35.1, flakefinder-1.1.0
collected 591 items
Running 591 items in this shard
test_multi_gpu_sample.py s [ 0%]
datapipes/test_ahmed_body.py ss [ 0%]
datapipes/test_climate_hdf5.py sssssssssssssssssssssssssss [ 5%]
datapipes/test_darcy.py sssssssssssssssss [ 7%]
datapipes/test_era5_hdf5.py sssssssssssssssssssssssssss [ 12%]
datapipes/test_kelvin_helmholtz.py sssssssssssssssssssssssssssss [ 17%]
datapipes/sfno/test_data_loader_dummy.py . [ 17%]
deploy/test_onnx_fft.py ......................................................... [ 27%]
................... [ 30%]
deploy/test_onnx_utils.py .... [ 31%]
distributed/test_autograd.py ssss [ 31%]
distributed/test_manager.py .....s [ 32%]
distributed/test_utils.py . [ 32%]
metrics/test_metrics_climate.py ...... [ 34%]
metrics/test_metrics_general.py ............ [ 36%]
models/test_afno.py .......... [ 37%]
models/test_dlwp.py ......................................................... [ 47%]
models/test_entrypoints.py ......... [ 48%]
models/test_fcn_mip_plugin.py ssssssss [ 50%]
models/test_fno.py .................................. [ 56%]
models/test_from_torch.py .......... [ 57%]
models/test_fully_connected.py .......... [ 59%]
models/test_layers_activations.py ....... [ 60%]
models/test_layers_weightnorm.py .. [ 60%]
models/test_model_factory.py .. [ 61%]
models/test_nd_conv_layers.py .............. [ 63%]
models/test_pix2pix.py .............. [ 65%]
models/test_rnn.py ............................ [ 70%]
models/test_rnn_layers.py ....................................................... [ 80%]
................. [ 82%]
models/test_super_res_net.py .......... [ 84%]
models/graphcast/test_concat_trick.py . [ 84%]
models/graphcast/test_cugraphops.py . [ 84%]
models/graphcast/test_grad_checkpointing.py .. [ 85%]
models/graphcast/test_graphcast.py .......... [ 86%]
models/meshgraphnet/test_meshgraphnet.py .......... [ 88%]
models/sfno/test_activations.py .... [ 89%]
models/sfno/test_sfno.py ................ [ 92%]
utils/test_capture.py .................................. [ 97%]
utils/test_filesystem.py .. [ 98%]
utils/graphcast/test_coordinate_transform.py ... [ 98%]
utils/graphcast/test_loss.py . [ 98%]
utils/sfno/test_img_utils.py .. [ 99%]
utils/sfno/test_logging.py . [ 99%]
utils/sfno/test_warmup_scheduler.py . [ 99%]
utils/sfno/test_yparams.py ... [100%]
================================== warnings summary ====================================
============== 475 passed, 116 skipped, 28 warnings in 134.51s (0:02:14) ===============

Note

This is just an example of the information shown. Test failures are not necessarily indicative of a broken Modulus installation.
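
When comparing your own run against the sample output, the final summary line is usually the most useful part. The following sketch extracts the counts from such a line with a regular expression. The function name parse_pytest_summary is hypothetical, and the pattern only targets the labels that appear in summaries like the one shown above.

```python
import re
from typing import Dict


def parse_pytest_summary(line: str) -> Dict[str, int]:
    """Extract counts such as passed/skipped/warnings from a pytest summary line."""
    pattern = r"(\d+) (passed|failed|skipped|errors?|warnings?)"
    return {label: int(number) for number, label in re.findall(pattern, line)}


summary = "============== 475 passed, 116 skipped, 28 warnings in 134.51s (0:02:14) ==============="
counts = parse_pytest_summary(summary)

# In the sample run, passed plus skipped accounts for all 591 collected items.
print(counts, counts["passed"] + counts["skipped"])
```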

Modulus on Public Cloud instances

Modulus can be used on public cloud instances such as AWS and GCP. To install and run Modulus:

  1. Get a GPU instance on AWS or GCP. (See System Requirements for the recommended hardware platforms.)

  2. Use the NVIDIA GPU-Optimized VMI on the cloud instance. For detailed instructions on setting up the VMI, refer to NGC Certified Public Clouds.

  3. Once the instance spins up, follow the NVIDIA Modulus Getting Started steps to load the Modulus Docker container and the examples.

© Copyright 2023, NVIDIA Modulus Team. Last updated on Sep 25, 2023.