Modulus Datapipes

class modulus.datapipes.benchmarks.darcy.Darcy2D(resolution: int = 256, batch_size: int = 64, nr_permeability_freq: int = 5, max_permeability: float = 2.0, min_permeability: float = 0.5, max_iterations: int = 30000, convergence_threshold: float = 1e-06, iterations_per_convergence_check: int = 1000, nr_multigrids: int = 4, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')[source]

Bases: Datapipe

2D Darcy flow benchmark problem datapipe.

This datapipe continuously generates solutions to the 2D Darcy equation with variable permeability. All samples are generated on the fly, and the datapipe is meant to be a benchmark problem for testing data-driven models. Permeability is drawn from a random Fourier series and thresholded to give a piecewise constant function. The solution is obtained using a GPU-enabled multi-grid Jacobi iterative method.
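The permeability sampling described above can be illustrated with a small pure-Python sketch. This is not the library's GPU implementation: the function name `sample_permeability`, the Gaussian amplitudes, the random phases, and the zero threshold are all assumptions chosen only to show how thresholding a random Fourier series yields a two-valued field.

```python
import math
import random


def sample_permeability(resolution, nr_freq, min_perm, max_perm, seed=None):
    """Return a resolution x resolution two-valued field as a list of rows.

    Hypothetical helper for illustration; not part of modulus.
    """
    rng = random.Random(seed)
    # One random amplitude and phase per 2D Fourier mode (kx, ky).
    modes = [
        (kx, ky, rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
        for kx in range(1, nr_freq + 1)
        for ky in range(1, nr_freq + 1)
    ]
    field = []
    for i in range(resolution):
        x = i / resolution
        row = []
        for j in range(resolution):
            y = j / resolution
            # Evaluate the random Fourier series at the grid point (x, y).
            value = sum(
                amp * math.sin(2.0 * math.pi * (kx * x + ky * y) + phase)
                for kx, ky, amp, phase in modes
            )
            # Threshold (assumed at zero): positive -> max_perm, otherwise min_perm.
            row.append(max_perm if value > 0.0 else min_perm)
        field.append(row)
    return field


field = sample_permeability(resolution=16, nr_freq=3, min_perm=0.5, max_perm=2.0, seed=0)
```

Raising `nr_freq` in this sketch adds higher-wavenumber modes, which is the same qualitative effect the `nr_permeability_freq` parameter is documented to have.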

Parameters
  • resolution (int, optional) – Resolution to run simulation at, by default 256

  • batch_size (int, optional) – Batch size of simulations, by default 64

  • nr_permeability_freq (int, optional) – Number of frequencies to use for generating random permeability. Higher values give higher-frequency permeability fields, by default 5

  • max_permeability (float, optional) – Max permeability, by default 2.0

  • min_permeability (float, optional) – Min permeability, by default 0.5

  • max_iterations (int, optional) – Maximum iterations to use for each multi-grid, by default 30000

  • convergence_threshold (float, optional) – Solver L-Infinity convergence threshold, by default 1e-6

  • iterations_per_convergence_check (int, optional) – Number of Jacobi iterations to run before checking convergence, by default 1000

  • nr_multigrids (int, optional) – Number of multi-grid levels, by default 4

  • normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys permeability and darcy. The value for each key is a tuple of two floats (mean, std), by default None

  • device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”

Raises

ValueError – Incompatible multi-grid and resolution settings
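As an illustration of the normaliser argument, the sketch below builds a dictionary with the documented keys and applies it to a handful of values. It assumes the conventional (x - mean) / std scaling; the helper names and the numbers are hypothetical and are not taken from the library.

```python
def normalise(values, mean, std):
    """Scale values to zero mean and unit std (assumed convention)."""
    return [(v - mean) / std for v in values]


def denormalise(values, mean, std):
    """Invert normalise()."""
    return [v * std + mean for v in values]


# Keys follow the documented names; the (mean, std) numbers are made up.
normaliser = {"permeability": (1.25, 0.75), "darcy": (0.0, 1.0)}

perm = [0.5, 2.0, 2.0, 0.5]
mean, std = normaliser["permeability"]
scaled = normalise(perm, mean, std)
restored = denormalise(scaled, mean, std)
```

A dictionary of this shape is what would be passed as `normaliser=...` when constructing the datapipe.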

generate_batch() → None[source]

Solve for new batch of simulations

initialize_batch() → None[source]

Initializes arrays for new batch of simulations

class modulus.datapipes.benchmarks.darcy.MetaData(name: str = 'Darcy2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)[source]

Bases: DatapipeMetaData

class modulus.datapipes.benchmarks.kelvin_helmholtz.KelvinHelmholtz2D(resolution: int = 512, batch_size: int = 16, seq_length: int = 8, nr_perturbation_freq: int = 5, perturbation_range: float = 0.1, nr_snapshots: int = 256, iteration_per_snapshot: int = 32, gamma: float = 1.6666666666666667, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')[source]

Bases: Datapipe

Kelvin-Helmholtz instability benchmark problem datapipe.

This datapipe continuously generates samples with random initial conditions. All samples are generated on the fly, and the datapipe is meant to be a benchmark problem for testing data-driven models. Initial conditions are given in the form of small perturbations. The solution is obtained using a GPU-enabled finite volume method.

Parameters
  • resolution (int, optional) – Resolution to run simulation at, by default 512

  • batch_size (int, optional) – Batch size of simulations, by default 16

  • seq_length (int, optional) – Sequence length of output samples, by default 8

  • nr_perturbation_freq (int, optional) – Number of frequencies to use for generating random initial perturbations, by default 5

  • perturbation_range (float, optional) – Range to use for random perturbations. This value will be the max amplitude of the initial perturbation, by default 0.1

  • nr_snapshots (int, optional) – Number of snapshots of simulation to generate for data generation. This will control how long the simulation is run for, by default 256

  • iteration_per_snapshot (int, optional) – Number of finite volume steps to take between each snapshot. Each step size is fixed as the smallest possible value that satisfies the Courant-Friedrichs-Lewy condition, by default 32

  • gamma (float, optional) – Heat capacity ratio, by default 5.0/3.0

  • normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys density, velocity, and pressure. The value for each key is a tuple of two floats (mean, std), by default None

  • device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”
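The relationship between the Courant-Friedrichs-Lewy condition and the step size can be sketched as follows. This is a generic estimate for a compressible ideal gas, not the library's solver: the function name, the `courant` safety factor, and the sample states are assumptions made for illustration.

```python
import math


def cfl_time_step(dx, density, velocity, pressure, gamma=5.0 / 3.0, courant=0.5):
    """Estimate a stable explicit step from sampled cell states.

    The fastest signal speed is |v| + c, with sound speed c = sqrt(gamma * p / rho).
    """
    max_signal_speed = 0.0
    for rho, v, p in zip(density, velocity, pressure):
        sound_speed = math.sqrt(gamma * p / rho)
        max_signal_speed = max(max_signal_speed, abs(v) + sound_speed)
    return courant * dx / max_signal_speed


dx = 1.0 / 512  # cell width for resolution=512 on a unit domain (assumed domain size)
dt = cfl_time_step(
    dx,
    density=[1.0, 2.0],
    velocity=[0.5, -0.5],
    pressure=[2.5, 2.5],
)

# With the documented defaults, one simulation takes this many finite volume steps.
total_steps = 256 * 32  # nr_snapshots * iteration_per_snapshot
```

Because the step is tied to the fastest wave in the domain, the simulated time covered by a run depends on the flow state as well as on `nr_snapshots` and `iteration_per_snapshot`.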

generate_batch() → None[source]

Solve for new batch of simulations

initialize_batch() → None[source]

Initializes arrays for new batch of simulations

class modulus.datapipes.benchmarks.kelvin_helmholtz.MetaData(name: str = 'KelvinHelmholtz2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)[source]

Bases: DatapipeMetaData

class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]

Bases: object

DALI Source for lazy-loading the HDF5 ERA5 files

Parameters
  • data_paths (Iterable[str]) – Paths to the stored ERA5 data files

  • num_samples (int) – Total number of training samples

  • channels (Iterable[int]) – List representing which ERA5 variables to load

  • stride (int) – Number of steps between input and output variables

  • num_steps (int) – Number of timesteps included in the output variables

  • num_samples_per_year (int) – Number of samples randomly taken from each year

  • batch_size (int, optional) – Batch size, by default 1

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1

class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]

Bases: Datapipe

ERA5 DALI data pipeline for HDF5 files

Parameters
  • data_dir (str) – Directory where ERA5 data is stored

  • stats_dir (Union[str, None], optional) – Directory to data statistic numpy files for normalization, if None, no normalization will be used, by default None

  • channels (Union[List[int], None], optional) – Defines which ERA5 variables to load, if None will use all in HDF5 file, by default None

  • batch_size (int, optional) – Batch size, by default 1

  • stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data at every 6 hours, a stride 1 = 6 hour delta t and stride 2 = 12 hours delta t, by default 1

  • num_steps (int, optional) – Number of timesteps included in the output variables, by default 1

  • patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None

  • num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all will be used, by default None

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • num_workers (int, optional) – Number of workers, by default 1

  • device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default “cuda”

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1
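The interplay of stride, num_steps, and patch_size can be sketched with two small helpers. Both are hypothetical illustrations rather than library code: they assume that output step k lies stride * k samples after the input sample, and that cropping rounds each image dimension down to the nearest multiple of the patch size.

```python
def output_indices(input_index, stride, num_steps):
    """Time indices of the output samples for one input sample (assumed layout)."""
    return [input_index + stride * k for k in range(1, num_steps + 1)]


def crop_to_patch(height, width, patch_size):
    """Round image dimensions down so they are divisible by patch_size."""
    if isinstance(patch_size, int):
        patch_size = (patch_size, patch_size)
    patch_h, patch_w = patch_size
    return (height // patch_h) * patch_h, (width // patch_w) * patch_w


# Data every 6 hours with stride=2 gives a 12 hour delta t between steps.
targets = output_indices(input_index=10, stride=2, num_steps=3)

# An illustrative 721 x 1440 grid cropped for patch_size=8.
cropped = crop_to_patch(721, 1440, 8)
```

With these assumptions, `targets` is `[12, 14, 16]` and `cropped` is `(720, 1440)`.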

load_statistics() → None[source]

Loads ERA5 statistics from pre-computed numpy files

The statistics files should be named global_means.npy and global_std.npy, have a shape of [1, C, 1, 1], and be located in stats_dir.

Raises
  • IOError – If mean or std numpy files are not found

  • AssertionError – If loaded numpy arrays are not of correct size

parse_dataset_files() → None[source]

Parses the data directory for valid HDF5 files and determines training samples

Raises

ValueError – If the specified channels or the number of samples per year is not valid

class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]

Bases: DatapipeMetaData

class modulus.datapipes.gnn.vortex_shedding_dataset.VortexSheddingDataset(name='dataset', data_dir=None, split='train', num_samples=1000, num_steps=600, noise_std=0.02, force_reload=False, verbose=False)[source]

Bases: DGLDataset

In-memory MeshGraphNet Dataset for stationary mesh

Parameters
  • name (str, optional) – Name of the dataset, by default “dataset”

  • data_dir (str, optional) – Directory that stores the raw data in .TFRecord format, by default None

  • split (str, optional) – Dataset split [“train”, “eval”, “test”], by default “train”

  • num_samples (int, optional) – Number of samples, by default 1000

  • num_steps (int, optional) – Number of time steps in each sample, by default 600

  • noise_std (float, optional) – The standard deviation of the noise added to the “train” split, by default 0.02

  • force_reload (bool, optional) – Whether to reload the dataset, by default False

  • verbose (bool, optional) – Whether to print progress information, by default False

static add_edge_features(graph, pos)[source]

adds relative displacement & displacement norm as edge features
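A minimal sketch of these edge features, written with plain lists instead of DGL graphs and tensors. The function name and the sign convention (source position minus destination position) are assumptions for illustration only.

```python
import math


def edge_features(src, dst, pos):
    """Return [dx, dy, ..., norm] for every edge (src[i], dst[i])."""
    features = []
    for u, v in zip(src, dst):
        # Relative displacement between the two end points (assumed: source minus destination).
        disp = [a - b for a, b in zip(pos[u], pos[v])]
        # Euclidean length of that displacement.
        norm = math.sqrt(sum(d * d for d in disp))
        features.append(disp + [norm])
    return features


pos = [(0.0, 0.0), (3.0, 4.0)]
feats = edge_features(src=[0, 1], dst=[1, 0], pos=pos)
```

Here `feats` is `[[-3.0, -4.0, 5.0], [3.0, 4.0, 5.0]]`: the two directions of the same edge share a norm and have opposite displacements.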

static cell_to_adj(cells)[source]

creates adjacency matrix in COO format from mesh cells
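The idea of turning mesh cells into a COO adjacency can be sketched in plain Python. This is an illustrative helper, not the library's implementation: it assumes each cell lists its vertices in order around the cell, and that every mesh edge is stored in both directions.

```python
def cells_to_coo(cells):
    """Return (src, dst) index lists for the edges of a polygonal mesh."""
    edges = set()
    for cell in cells:
        n = len(cell)
        for i in range(n):
            # Consecutive vertices of a cell share an edge; wrap around at the end.
            a, b = cell[i], cell[(i + 1) % n]
            edges.add((a, b))
            edges.add((b, a))  # store both directions (assumption)
    ordered = sorted(edges)  # the set removes edges shared by neighbouring cells
    src = [a for a, _ in ordered]
    dst = [b for _, b in ordered]
    return src, dst


# Two triangles sharing the edge 1-2.
src, dst = cells_to_coo([(0, 1, 2), (1, 2, 3)])
```

The two triangles have five distinct edges, so `src` and `dst` each hold ten entries.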

static create_graph(src, dst, dtype=torch.int32)[source]

creates a DGL graph from an adj matrix in COO format. torch.int32 can handle graphs with up to 2**31-1 nodes or edges.

static denormalize(invar, mu, std)[source]

denormalizes a tensor

static normalize_edge(graph, mu, std)[source]

normalizes a tensor

static normalize_node(invar, mu, std)[source]

normalizes a tensor

class modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset(data_dir: str, split: str = 'train', num_samples: int = 10, invar_keys: List[str] = ['pos', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], outvar_keys: List[str] = ['p', 'wallShearStress'], normalize_keys: List[str] = ['p', 'wallShearStress', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], normalization_bound: Tuple[float, float] = (-1.0, 1.0), force_reload: bool = False, name: str = 'dataset', verbose: bool = False, compute_drag: bool = False)[source]

Bases: DGLDataset, Datapipe

In-memory Ahmed body Dataset

Parameters
  • data_dir (str) – The directory where the data is stored.

  • split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.

  • num_samples (int, optional) – The number of samples to use, by default 10.

  • invar_keys (List[str], optional) – The input node features to consider. Default includes ‘pos’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.

  • outvar_keys (List[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.

  • normalize_keys (List[str], optional) – The features to normalize. Default includes ‘p’, ‘wallShearStress’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.

  • normalization_bound (Tuple[float, float], optional) – The lower and upper bounds for normalization. Default is (-1, 1).

  • force_reload (bool, optional) – If True, forces a reload of the data, by default False.

  • name (str, optional) – The name of the dataset, by default ‘dataset’.

  • verbose (bool, optional) – If True, enables verbose mode, by default False.

  • compute_drag (bool, optional) – If True, also returns the coefficient, mesh area, and normals that are required for computing the drag coefficient, by default False.

add_edge_features() → List[DGLGraph][source]

Add relative displacement and displacement norm as edge features for each graph in the list of graphs. The calculations are done using the ‘pos’ attribute in the node data of each graph. The resulting edge features are stored in the ‘x’ attribute in the edge data of each graph.

This method will modify the list of graphs in-place.

Returns

The list of graphs with updated edge features.

Return type

List[dgl.DGLGraph]

denormalize(pred, gt, device) → Tuple[Tensor, Tensor][source]

Denormalize the graph node data.

Parameters
  • pred (Tensor) – Normalized prediction

  • gt (Tensor) – Normalized ground truth

  • device (Any) – The device

Returns

Denormalized prediction and ground truth

Return type

Tuple[Tensor, Tensor]

normalize_edge() → List[DGLGraph][source]

Normalize edge data ‘x’ in each graph in the list of graphs using min-max normalization. The normalization is performed in-place. The normalization formula used is:

normalized_x = 2.0 * normalization_bound[1] * (x - edge_min) / (edge_max - edge_min) + normalization_bound[0]

This will bring the edge data ‘x’ in each graph into the range of [normalization_bound[0], normalization_bound[1]].

Returns

The list of graphs with normalized edge data ‘x’.

Return type

List[dgl.DGLGraph]
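The min-max formula given for normalize_edge can be checked with a small stand-alone sketch that uses plain lists instead of graph edge data. The function name is hypothetical, and the minimum and maximum are taken from the values themselves.

```python
def min_max_normalize(values, normalization_bound=(-1.0, 1.0)):
    """Apply the documented formula to a list of floats."""
    v_min, v_max = min(values), max(values)
    return [
        2.0 * normalization_bound[1] * (v - v_min) / (v_max - v_min)
        + normalization_bound[0]
        for v in values
    ]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

With the default bound (-1, 1), `normalized` is `[-1.0, 0.0, 1.0]`. As written, the formula reaches the upper bound exactly only when the bounds are symmetric about zero, as the default is.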

normalize_node() → List[DGLGraph][source]

Normalize node data in each graph in the list of graphs using min-max normalization. The normalization is performed in-place. The normalization formula used is:

normalized_data = 2.0 * normalization_bound[1] * (data - node_min) / (node_max - node_min) + normalization_bound[0]

This will bring the node data in each graph into the range of [normalization_bound[0], normalization_bound[1]]. After normalization, node data is concatenated according to the keys defined in ‘self.input_keys’ and ‘self.output_keys’, resulting in new node data ‘x’ and ‘y’, respectively.

Returns

The list of graphs with normalized and concatenated node data.

Return type

List[dgl.DGLGraph]

class modulus.datapipes.gnn.ahmed_body_dataset.MetaData(name: str = 'AhmedBody', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]

Bases: DatapipeMetaData

modulus.datapipes.gnn.utils.load_json(file: str) → Dict[str, Tensor][source]

Loads a JSON file into a dictionary of PyTorch tensors.

Parameters

file (str) – Path to the JSON file.

Returns

Dictionary where each value is a PyTorch tensor.

Return type

Dict[str, torch.Tensor]

modulus.datapipes.gnn.utils.read_vtp_file(file_path: str) → Any[source]

Read a VTP file and return the polydata.

Parameters

file_path (str) – Path to the VTP file.

Returns

The polydata read from the VTP file.

Return type

vtkPolyData

modulus.datapipes.gnn.utils.save_json(var: Dict[str, Tensor], file: str) → None[source]

Saves a dictionary of tensors to a JSON file.

Parameters
  • var (Dict[str, torch.Tensor]) – Dictionary where each value is a PyTorch tensor.

  • file (str) – Path to the output JSON file.

© Copyright 2023, NVIDIA Modulus Team. Last updated on Sep 21, 2023.