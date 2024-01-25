There are a few different ways to construct a Modulus model. If you are a seasoned PyTorch user, the easiest way is to write your model using the optimized layers and utilities from Modulus or PyTorch. Let's look at a simple example of a UNet model, first as a plain PyTorch implementation and then as a Modulus implementation that supports CUDA Graphs and Automatic Mixed-Precision.

```python
import torch.nn as nn


class UNet(nn.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__()

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)

        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
```
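The encoder halves the spatial resolution twice (one MaxPool2d per block) and the decoder doubles it once, so a 128x128 input comes out at 64x64, which matches the 64x64 targets used in the training loop below. The following is a minimal pure-Python sketch that traces those sizes with the output-size formulas from the PyTorch docs for Conv2d/MaxPool2d and ConvTranspose2d; the helper names are hypothetical and no PyTorch install is needed. It also shows that odd input sizes lose a pixel to flooring in the pooling layers.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    # Output-size formula from the torch.nn.Conv2d docs (MaxPool2d uses the same one).
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0,
                         dilation: int = 1, output_padding: int = 0) -> int:
    # Output-size formula from the torch.nn.ConvTranspose2d docs.
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def unet_output_shape(batch: int, height: int, width: int, out_channels: int = 1) -> tuple:
    """Hypothetical helper: trace (N, C, H, W) through the UNet defined above."""
    dims = []
    for size in (height, width):
        for _ in range(2):                                # enc1, enc2
            size = conv2d_out(size, 3, padding=1)         # 3x3 conv, padding=1: size kept
            size = conv2d_out(size, 2, stride=2)          # MaxPool2d(2): size halved (floored)
        size = conv_transpose2d_out(size, 2, stride=2)    # dec1 transposed conv: size doubled
        size = conv2d_out(size, 3, padding=1)             # dec1 3x3 conv: size kept
        size = conv2d_out(size, 1)                        # final 1x1 conv: size kept
        dims.append(size)
    return (batch, out_channels, dims[0], dims[1])


print(unet_output_shape(8, 128, 128))  # (8, 1, 64, 64)
print(unet_output_shape(8, 127, 127))  # (8, 1, 62, 62)
```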

Now we rewrite this model in Modulus. First, we subclass the model from modulus.Module instead of torch.nn.Module. The modulus.Module class acts as a drop-in replacement for torch.nn.Module and provides additional functionality, such as saving and loading checkpoints. Refer to the API docs of modulus.Module for further details. Additionally, we attach metadata to the model that declares which optimizations it supports. In this case, the metadata enables JIT, CUDA Graphs, and Automatic Mixed-Precision (AMP) on both CPU and GPU.
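The metadata is a declaration: a subclass of the metadata dataclass only overrides the defaults for the optimizations the model supports, and tooling can then consult those flags. To make the idea concrete, here is a hypothetical, framework-free sketch of how a training wrapper could downgrade requested optimizations that the metadata does not declare. ToyMetaData, ToyUNetMetaData, and effective_options are invented for illustration; they are not modulus.ModelMetaData, and Modulus's own checks may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyMetaData:
    """Hypothetical stand-in for a metadata record (NOT modulus.ModelMetaData)."""
    name: str = "Toy"
    jit: bool = False
    cuda_graphs: bool = False
    amp_cpu: bool = False
    amp_gpu: bool = False


@dataclass
class ToyUNetMetaData(ToyMetaData):
    # Mirrors the flags set in UNetMetaData; only overridden defaults change.
    name: str = "UNet"
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True


def effective_options(meta: ToyMetaData, device: str,
                      use_graphs: bool = True, use_amp: bool = True) -> dict:
    """Hypothetical gate: keep a requested optimization only if the metadata declares it."""
    on_gpu = device.startswith("cuda")
    return {
        # CUDA Graphs only make sense on a GPU.
        "use_graphs": use_graphs and on_gpu and meta.cuda_graphs,
        # AMP support is declared separately for GPU and CPU.
        "use_amp": use_amp and (meta.amp_gpu if on_gpu else meta.amp_cpu),
    }


print(effective_options(ToyUNetMetaData(), "cuda"))  # {'use_graphs': True, 'use_amp': True}
print(effective_options(ToyUNetMetaData(), "cpu"))   # {'use_graphs': False, 'use_amp': True}
```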

```python
from dataclasses import dataclass

import modulus
import torch.nn as nn


@dataclass
class UNetMetaData(modulus.ModelMetaData):
    name: str = "UNet"
    # Optimization
    jit: bool = True
    cuda_graphs: bool = True
    amp_cpu: bool = True
    amp_gpu: bool = True


class UNet(modulus.Module):
    def __init__(self, in_channels=1, out_channels=1):
        super(UNet, self).__init__(meta=UNetMetaData())

        self.enc1 = self.conv_block(in_channels, 64)
        self.enc2 = self.conv_block(64, 128)

        self.dec1 = self.upconv_block(128, 64)

        self.final = nn.Conv2d(64, out_channels, kernel_size=1)

    def conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )

    def upconv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, 2, stride=2),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x = self.dec1(x2)
        return self.final(x)
```

Now that we have a Modulus model, we can take advantage of these optimizations with the modulus.utils.StaticCaptureTraining decorator. This decorator captures the training step function and applies the specified optimizations to it.
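The decorator also takes over the backward pass and the optimizer step. Stripped of AMP and CUDA Graph capture, that per-call bookkeeping resembles the eager-mode sketch below. eager_training_step is a hypothetical name, not part of Modulus, and this is not how StaticCaptureTraining is implemented internally. It only assumes that optim has zero_grad() and step() and that the wrapped function returns an object with backward(), which holds for torch.optim optimizers and scalar loss tensors.

```python
import functools
from typing import Any, Callable


def eager_training_step(optim: Any) -> Callable:
    """Hypothetical decorator factory: each call zeroes gradients, runs the wrapped
    function to get a loss, backpropagates it, and steps the optimizer.
    No AMP, gradient scaling, or CUDA Graph capture is performed."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            optim.zero_grad()          # clear gradients from the previous step
            loss = fn(*args, **kwargs)  # forward pass + loss, written by the user
            loss.backward()            # backward pass handled by the wrapper
            optim.step()               # parameter update handled by the wrapper
            return loss
        return wrapper
    return decorator
```

With real PyTorch objects, @eager_training_step(optim) could decorate the same training_step body used with StaticCaptureTraining, just without the CUDA Graph and AMP speed-ups.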

```python
import torch
from modulus.utils import StaticCaptureTraining

model = UNet().to("cuda")
input = torch.randn(8, 1, 128, 128).to("cuda")
output = torch.zeros(8, 1, 64, 64).to("cuda")
optim = torch.optim.Adam(model.parameters(), lr=0.001)

# Create training step function with optimization wrapper
# StaticCaptureTraining calls `backward` on the loss and
# `optimizer.step()` so you don't have to do that
# explicitly.
@StaticCaptureTraining(
    model=model,
    optim=optim,
    cuda_graph_warmup=11,
)
def training_step(invar, outvar):
    predvar = model(invar)
    loss = torch.sum(torch.pow(predvar - outvar, 2))
    return loss

# Sample training loop
for i in range(20):
    # In place copy of input and output to support cuda graphs
    input.copy_(torch.randn(8, 1, 128, 128).to("cuda"))
    output.copy_(torch.zeros(8, 1, 64, 64).to("cuda"))

    # Run training step
    loss = training_step(input, output)
```
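The in-place copy_ calls matter because a captured CUDA Graph replays its kernels on the memory it saw during capture; a newly allocated tensor passed in afterwards is not what the replayed graph reads. The pure-Python analogy below illustrates this. ToyGraph is a hypothetical class that only mimics the warmup, capture, and replay sequence (no CUDA is involved), contrasting an in-place update of a captured buffer with passing a new object.

```python
from typing import Callable, List, Optional, Tuple

Buffer = List[float]


class ToyGraph:
    """Pure-Python analogy of warmup -> capture -> replay (not a real CUDA Graph).

    The first `warmup` calls run `step` eagerly. The next call "captures" the
    buffer objects it receives, and every later call "replays" `step` on those
    captured objects, ignoring the arguments it is given.
    """

    def __init__(self, step: Callable[[Buffer, Buffer], float], warmup: int) -> None:
        self.step = step
        self.warmup = warmup
        self.calls = 0
        self.captured: Optional[Tuple[Buffer, Buffer]] = None

    def __call__(self, inp: Buffer, out: Buffer) -> float:
        self.calls += 1
        if self.calls <= self.warmup:
            return self.step(inp, out)      # eager warmup call
        if self.captured is None:
            self.captured = (inp, out)      # "capture": remember buffer identities
        return self.step(*self.captured)    # "replay" on the captured buffers


def squared_error(pred: Buffer, target: Buffer) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target))


step = ToyGraph(squared_error, warmup=2)
inp, out = [0.0, 0.0], [0.0, 0.0]
for _ in range(3):              # two warmup calls, then the capture call
    step(inp, out)

inp[:] = [1.0, 2.0]             # in-place update: the replay sees it
print(step(inp, out))           # 5.0
print(step([3.0, 3.0], out))    # new object: ignored by the replay, still 5.0
```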

For the simple model above, you can expect a speed-up of roughly 1.1x from CUDA Graphs and AMP. The speed-up varies from model to model and is typically larger for more complex models.
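Because the gain depends on the model, it is worth measuring on your own workload. Below is a minimal timing sketch in plain Python; mean_step_time and speedup are hypothetical helper names. CUDA kernels launch asynchronously, so for GPU code you would pass sync=torch.cuda.synchronize, and warmup should exceed cuda_graph_warmup so the eager warmup and capture iterations are not included in the timing.

```python
import time
from typing import Callable, Optional


def mean_step_time(step: Callable[[], object], iters: int = 50, warmup: int = 20,
                   sync: Optional[Callable[[], None]] = None) -> float:
    """Average wall-clock seconds per call of `step`, excluding `warmup` calls."""
    for _ in range(warmup):       # untimed calls, e.g. CUDA Graph warmup and capture
        step()
    if sync is not None:
        sync()                    # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()                    # wait for the timed GPU work to finish
    return (time.perf_counter() - start) / iters


def speedup(baseline: Callable[[], object], optimized: Callable[[], object], **kwargs) -> float:
    """Ratio of mean step times; a value above 1 means `optimized` is faster."""
    return mean_step_time(baseline, **kwargs) / mean_step_time(optimized, **kwargs)
```

For example, given a hypothetical eager_step(input, output) that performs the same forward pass, backward pass, and optimizer step in plain PyTorch, speedup(lambda: eager_step(input, output), lambda: training_step(input, output), sync=torch.cuda.synchronize) would compare the two versions.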

Note: ModelMetaData and modulus.Module do not automatically make a model support CUDA Graphs, AMP, or other optimizations. The user is responsible for writing model code that enables each of these optimizations. Models in the Modulus Model Zoo are written to support many of these optimizations and are tested in Modulus's CI to ensure that they work correctly.