NVIDIA Modulus Core v0.1.0

Modulus Datapipes

class modulus.datapipes.benchmarks.darcy.Darcy2D(resolution: int = 256, batch_size: int = 64, nr_permeability_freq: int = 5, max_permeability: float = 2.0, min_permeability: float = 0.5, max_iterations: int = 30000, convergence_threshold: float = 1e-06, iterations_per_convergence_check: int = 1000, nr_multigrids: int = 4, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, torch.device] = 'cuda')

Bases: Datapipe

2D Darcy flow benchmark problem datapipe.

This datapipe continuously generates solutions to the 2D Darcy equation with variable permeability. All samples are generated on the fly, and the datapipe is meant to serve as a benchmark problem for testing data-driven models. The permeability is drawn from a random Fourier series and then thresholded to give a piecewise-constant function. The solution is obtained with a GPU-enabled multi-grid Jacobi iterative method.
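To make the permeability description concrete, here is a minimal NumPy sketch, assuming a sine series with normally distributed coefficients that is thresholded at zero. The mode choice, coefficient distribution, and threshold are assumptions, and random_piecewise_permeability is a hypothetical helper, not part of the Modulus API. Its nr_freq, min_perm, and max_perm arguments play the roles of nr_permeability_freq, min_permeability, and max_permeability.

```python
import numpy as np


def random_piecewise_permeability(
    resolution: int = 64,
    nr_freq: int = 5,
    min_perm: float = 0.5,
    max_perm: float = 2.0,
    seed: int = 0,
) -> np.ndarray:
    """Threshold a random sine (Fourier) series into a two-valued permeability field."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, resolution)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    # One random coefficient for each (i, j) frequency pair, i, j = 1..nr_freq
    coeffs = rng.normal(size=(nr_freq, nr_freq))
    field = np.zeros((resolution, resolution))
    for i in range(nr_freq):
        for j in range(nr_freq):
            field += coeffs[i, j] * np.sin(np.pi * (i + 1) * xx) * np.sin(np.pi * (j + 1) * yy)
    # Thresholding at zero turns the smooth field into a piecewise-constant one
    return np.where(field > 0.0, max_perm, min_perm)


perm = random_piecewise_permeability()
print(perm.shape)  # (64, 64)
```

Raising nr_freq adds higher sine modes, which is why higher values produce finer-grained permeability patterns.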

Parameters
  • resolution (int, optional) – Resolution to run simulation at, by default 256

  • batch_size (int, optional) – Batch size of simulations, by default 64

  • nr_permeability_freq (int, optional) – Number of frequencies to use for generating random permeability. Higher values give higher-frequency permeability fields, by default 5

  • max_permeability (float, optional) – Max permeability, by default 2.0

  • min_permeability (float, optional) – Min permeability, by default 0.5

  • max_iterations (int, optional) – Maximum number of iterations for each multi-grid level, by default 30000

  • convergence_threshold (float, optional) – Solver L-Infinity convergence threshold, by default 1e-6

  • iterations_per_convergence_check (int, optional) – Number of Jacobi iterations to run before checking convergence, by default 1000

  • nr_multigrids (int, optional) – Number of multi-grid levels, by default 4

  • normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys permeability and darcy. Each value is a tuple of two floats, (mean, std), by default None

  • device (Union[str, torch.device], optional) – Device on which to run the datapipe and place data, by default “cuda”

Raises

ValueError – If the multi-grid and resolution settings are incompatible
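A minimal sketch of a check that could raise this error, assuming each coarser multi-grid level halves the grid, so the resolution must be divisible by 2**(nr_multigrids - 1). The exact rule Darcy2D enforces is an assumption here, and multigrid_resolutions is a hypothetical helper.

```python
def multigrid_resolutions(resolution: int, nr_multigrids: int) -> list:
    """Grid sizes per level, assuming each coarser level halves the resolution."""
    factor = 2 ** (nr_multigrids - 1)
    if resolution % factor != 0:
        raise ValueError(
            f"Resolution {resolution} is not divisible by 2**(nr_multigrids - 1) = {factor}"
        )
    # Coarsest level first, finest (the requested resolution) last
    return [resolution // 2**level for level in reversed(range(nr_multigrids))]


print(multigrid_resolutions(256, 4))  # [32, 64, 128, 256]
```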

generate_batch() → None

Solves for a new batch of simulations.
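The production solver runs on the GPU across several multi-grid levels. The sketch below is a single-grid NumPy version of a Jacobi iteration whose stopping rule mirrors max_iterations, convergence_threshold, and iterations_per_convergence_check. The discretization (5-point stencil, arithmetic face averaging of permeability, unit forcing, zero Dirichlet boundary) and the jacobi_darcy name are assumptions, not necessarily what Darcy2D uses.

```python
import numpy as np


def jacobi_darcy(
    perm: np.ndarray,
    forcing: float = 1.0,
    max_iterations: int = 30000,
    convergence_threshold: float = 1e-6,
    iterations_per_convergence_check: int = 1000,
) -> np.ndarray:
    """Single-grid Jacobi solve of -div(k grad u) = forcing with u = 0 on the boundary."""
    n = perm.shape[0]
    h = 1.0 / (n - 1)
    u = np.zeros_like(perm, dtype=float)
    # Face permeabilities as arithmetic means of neighbouring nodes (interior nodes only)
    k_c = perm[1:-1, 1:-1]
    k_e = 0.5 * (k_c + perm[2:, 1:-1])
    k_w = 0.5 * (k_c + perm[:-2, 1:-1])
    k_n = 0.5 * (k_c + perm[1:-1, 2:])
    k_s = 0.5 * (k_c + perm[1:-1, :-2])
    diag = k_e + k_w + k_n + k_s
    for it in range(1, max_iterations + 1):
        u_new = u.copy()
        # Jacobi update: every interior node uses only values from the previous iterate
        u_new[1:-1, 1:-1] = (
            forcing * h**2
            + k_e * u[2:, 1:-1]
            + k_w * u[:-2, 1:-1]
            + k_n * u[1:-1, 2:]
            + k_s * u[1:-1, :-2]
        ) / diag
        # L-infinity convergence test, evaluated only every `iterations_per_convergence_check` steps
        converged = (
            it % iterations_per_convergence_check == 0
            and np.max(np.abs(u_new - u)) < convergence_threshold
        )
        u = u_new
        if converged:
            break
    return u


u = jacobi_darcy(np.ones((33, 33)), iterations_per_convergence_check=100)
print(float(u.max()))
```

Checking convergence only periodically avoids a global reduction on every step, which is the trade-off iterations_per_convergence_check controls.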

initialize_batch() → None

Initializes arrays for a new batch of simulations.

class modulus.datapipes.benchmarks.darcy.MetaData(name: str = 'Darcy2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)

Bases: DatapipeMetaData

class modulus.datapipes.benchmarks.kelvin_helmholtz.KelvinHelmholtz2D(resolution: int = 512, batch_size: int = 16, seq_length: int = 8, nr_perturbation_freq: int = 5, perturbation_range: float = 0.1, nr_snapshots: int = 256, iteration_per_snapshot: int = 32, gamma: float = 1.6666666666666667, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, torch.device] = 'cuda')

Bases: Datapipe

Kelvin-Helmholtz instability benchmark problem datapipe.

This datapipe continuously generates samples with random initial conditions. All samples are generated on the fly, and the datapipe is meant to serve as a benchmark problem for testing data-driven models. The initial conditions are given in the form of small perturbations. The solution is obtained with a GPU-enabled finite volume method.
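A minimal NumPy sketch of randomized Kelvin-Helmholtz initial conditions, assuming the common textbook setup: a dense band at y in (0.25, 0.75) flowing opposite to the surrounding fluid, uniform pressure, and a vertical-velocity perturbation built from nr_perturbation_freq random modes and rescaled so its peak equals perturbation_range. The layer values, the pressure, and the helper name kh_initial_condition are illustrative assumptions, not the values used by KelvinHelmholtz2D.

```python
import numpy as np


def kh_initial_condition(
    resolution: int = 128,
    nr_perturbation_freq: int = 5,
    perturbation_range: float = 0.1,
    seed: int = 0,
):
    """Two counter-flowing layers plus a random periodic perturbation of vy."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, resolution, endpoint=False)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    # Dense middle band moving one way, lighter outer fluid moving the other way
    band = np.abs(yy - 0.5) < 0.25
    rho = np.where(band, 2.0, 1.0)
    vx = np.where(band, 0.5, -0.5)
    p = np.full_like(rho, 2.5)
    # Sum of randomly weighted, randomly phased periodic modes along x
    vy = np.zeros_like(rho)
    for k in range(1, nr_perturbation_freq + 1):
        vy += rng.normal() * np.sin(2.0 * np.pi * k * xx + rng.uniform(0.0, 2.0 * np.pi))
    # Rescale so the largest perturbation equals perturbation_range
    vy *= perturbation_range / np.max(np.abs(vy))
    return rho, vx, vy, p


rho, vx, vy, p = kh_initial_condition()
```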

Parameters
  • resolution (int, optional) – Resolution to run simulation at, by default 512

  • batch_size (int, optional) – Batch size of simulations, by default 16

  • seq_length (int, optional) – Sequence length of output samples, by default 8

  • nr_perturbation_freq (int, optional) – Number of frequencies to use for generating random initial perturbations, by default 5

  • perturbation_range (float, optional) – Range to use for random perturbations. This value will be the max amplitude of the initial perturbation, by default 0.1

  • nr_snapshots (int, optional) – Number of simulation snapshots to generate. This controls how long the simulation runs, by default 256

  • iteration_per_snapshot (int, optional) – Number of finite volume steps to take between each snapshot. Each step size is fixed as the smallest possible value that satisfies the Courant-Friedrichs-Lewy condition, by default 32

  • gamma (float, optional) – Heat capacity ratio, by default 5.0/3.0

  • normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys density, velocity, and pressure. Each value is a tuple of two floats, (mean, std), by default None

  • device (Union[str, torch.device], optional) – Device on which to run the datapipe and place data, by default “cuda”
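Assuming each (mean, std) pair is applied as a standard z-score to its field, a normaliser dictionary and its effect can be sketched as below. Only the key names come from the parameter description; the statistic values and the apply_normaliser helper are hypothetical. A dictionary of this shape is what would be passed as the normaliser argument.

```python
import numpy as np

# Hypothetical statistics; real values would be computed from generated data
normaliser = {
    "density": (1.5, 0.5),
    "velocity": (0.0, 0.5),
    "pressure": (2.5, 0.1),
}


def apply_normaliser(fields: dict, normaliser: dict) -> dict:
    """Z-score each field that has an entry, (x - mean) / std; leave others unchanged."""
    out = {}
    for key, value in fields.items():
        if key in normaliser:
            mean, std = normaliser[key]
            out[key] = (value - mean) / std
        else:
            out[key] = value
    return out


normalised = apply_normaliser(
    {"density": np.array([1.0, 2.0]), "pressure": np.array([2.5, 2.6])}, normaliser
)
```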

generate_batch() → None

Solves for a new batch of simulations.
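The iteration_per_snapshot description says the step size is fixed by the Courant-Friedrichs-Lewy condition. A minimal sketch for the 2D Euler equations is shown below, assuming the usual signal-speed bound dt = C * dx / max(|v| + c) with sound speed c = sqrt(gamma * p / rho). The Courant number 0.4 and the helper cfl_time_step are hypothetical. Because the fastest cell in the grid sets dt, the most restrictive cell determines the global step; under this sketch one snapshot spans iteration_per_snapshot * dt of simulated time.

```python
import numpy as np


def cfl_time_step(rho, vx, vy, p, dx, gamma=5.0 / 3.0, courant=0.4):
    """CFL-limited step for the 2D compressible Euler equations on a uniform grid."""
    # Sound speed c = sqrt(gamma * p / rho) in every cell
    sound_speed = np.sqrt(gamma * p / rho)
    # Fastest signal speed |v| + c over the grid limits the global step
    signal_speed = np.sqrt(vx**2 + vy**2) + sound_speed
    return courant * dx / np.max(signal_speed)


shape = (4, 4)
dt = cfl_time_step(
    rho=np.ones(shape), vx=np.zeros(shape), vy=np.zeros(shape), p=np.full(shape, 0.6), dx=0.01
)
print(dt)  # approximately 0.004
```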

initialize_batch() → None

Initializes arrays for a new batch of simulations.

class modulus.datapipes.benchmarks.kelvin_helmholtz.MetaData(name: str = 'KelvinHelmholtz2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)

Bases: DatapipeMetaData

class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)

Bases: object

DALI external source for lazy-loading ERA5 HDF5 files

Parameters
  • data_paths (Iterable[str]) – Paths to the ERA5 HDF5 data files

  • num_samples (int) – Total number of training samples

  • channels (Iterable[int]) – List representing which ERA5 variables to load

  • stride (int) – Number of steps between input and output variables

  • num_steps (int) – Number of timesteps included in the output variables

  • num_samples_per_year (int) – Number of samples randomly taken from each year

  • batch_size (int, optional) – Batch size, by default 1

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1
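A minimal sketch of how stride and num_steps could map a sampled input time index to its output indices, assuming consecutive outputs are also spaced stride steps apart and that a sample never crosses a yearly file boundary. Both assumptions and the helper names are illustrative, not taken from the ERA5DaliExternalSource implementation.

```python
def sample_time_indices(t: int, stride: int, num_steps: int):
    """Input index t and num_steps output indices spaced `stride` steps apart."""
    return t, [t + stride * k for k in range(1, num_steps + 1)]


def valid_start_indices(steps_in_year: int, stride: int, num_steps: int) -> range:
    """Start indices whose outputs all stay inside one yearly file."""
    return range(steps_in_year - stride * num_steps)


# 6-hourly data: with stride=2, consecutive outputs are 12 hours apart
inp, out = sample_time_indices(t=10, stride=2, num_steps=3)
hours_ahead = [(o - inp) * 6 for o in out]
print(inp, out, hours_ahead)  # 10 [12, 14, 16] [12, 24, 36]
```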

class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, torch.device] = 'cuda', process_rank: int = 0, world_size: int = 1)

Bases: Datapipe

ERA5 DALI data pipeline for HDF5 files

Parameters
  • data_dir (str) – Directory where ERA5 data is stored

  • stats_dir (Union[str, None], optional) – Directory containing the pre-computed statistics NumPy files used for normalization. If None, no normalization is applied, by default None

  • channels (Union[List[int], None], optional) – Indices of the ERA5 variables to load. If None, all variables in the HDF5 file are used, by default None

  • batch_size (int, optional) – Batch size, by default 1

  • stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data every 6 hours, stride 1 gives a 6-hour time delta and stride 2 gives a 12-hour time delta, by default 1

  • num_steps (int, optional) – Number of timesteps included in the output variables, by default 1

  • patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None

  • num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all samples are used, by default None

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • num_workers (int, optional) – Number of workers, by default 1

  • device (Union[str, torch.device], optional) – Device for the DALI pipeline to run on, by default “cuda”

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1
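The patch_size cropping can be sketched in NumPy as below, assuming arrays laid out as [..., H, W] and that trailing rows and columns are the ones dropped; which side the datapipe actually crops is an assumption, and crop_to_patch_multiple is a hypothetical helper. The 721 x 1440 grid is the standard 0.25-degree ERA5 latitude-longitude grid.

```python
import numpy as np


def crop_to_patch_multiple(x: np.ndarray, patch_size) -> np.ndarray:
    """Crop the last two (H, W) dims of x so each is divisible by patch_size."""
    if patch_size is None:
        return x
    if isinstance(patch_size, int):
        patch_size = (patch_size, patch_size)
    h, w = x.shape[-2:]
    # Trailing rows/columns are dropped here; which side gets cropped is an assumption
    return x[..., : h - h % patch_size[0], : w - w % patch_size[1]]


# ERA5 at 0.25 degrees has a 721 x 1440 latitude-longitude grid
fields = np.zeros((1, 2, 721, 1440), dtype=np.float32)  # [T, C, H, W]
print(crop_to_patch_multiple(fields, 8).shape)  # (1, 2, 720, 1440)
```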

load_statistics() → None

Loads ERA5 statistics from pre-computed NumPy files.

The statistics files should be named global_means.npy and global_std.npy, each with shape [1, C, 1, 1], and located in stats_dir.

Raises
  • IOError – If mean or std numpy files are not found

  • AssertionError – If loaded numpy arrays are not of correct size
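The loading and shape check described above can be sketched with NumPy as follows. The file names and the [1, C, 1, 1] shape come from the description; the function signature, the num_channels argument, and the z-score normalize helper are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_statistics(stats_dir, num_channels: int):
    """Load per-channel mean/std arrays of shape [1, C, 1, 1] from stats_dir."""
    stats_dir = Path(stats_dir)
    mean_file = stats_dir / "global_means.npy"
    std_file = stats_dir / "global_std.npy"
    if not (mean_file.exists() and std_file.exists()):
        raise IOError(f"Mean or std statistics file not found in {stats_dir}")
    mu, sd = np.load(mean_file), np.load(std_file)
    expected = (1, num_channels, 1, 1)
    assert mu.shape == expected and sd.shape == expected, "Statistics have the wrong shape"
    return mu, sd


def normalize(x: np.ndarray, mu: np.ndarray, sd: np.ndarray) -> np.ndarray:
    """Broadcast [1, C, 1, 1] statistics over a [B, C, H, W] batch."""
    return (x - mu) / sd


# Write toy statistics to a temporary directory, then load and apply them
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "global_means.npy", np.full((1, 3, 1, 1), 2.0))
    np.save(Path(tmp) / "global_std.npy", np.full((1, 3, 1, 1), 4.0))
    mu, sd = load_statistics(tmp, num_channels=3)

batch = np.full((2, 3, 4, 8), 10.0)
print(normalize(batch, mu, sd)[0, :, 0, 0])  # [2. 2. 2.]
```

The singleton batch and spatial dimensions in [1, C, 1, 1] are what let one mean and one std per channel broadcast over every sample and grid point.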

parse_dataset_files() → None

Parses the data directory for valid HDF5 files and determines training samples

Raises

ValueError – If the specified channels or the number of samples per year is not valid
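A hypothetical sketch of the two ValueError conditions, assuming channels are validated as indices into the file's channel axis and that num_samples_per_year cannot exceed the number of usable time steps in a year. The helper, its arguments, and the exact bounds are illustrative, not the checks performed by parse_dataset_files.

```python
def validate_selection(
    num_file_channels: int,
    steps_in_year: int,
    channels=None,
    num_samples_per_year=None,
    stride: int = 1,
    num_steps: int = 1,
):
    """Hypothetical channel and samples-per-year checks raising ValueError."""
    if channels is None:
        # Default: use every channel stored in the file
        channels = list(range(num_file_channels))
    if not channels or min(channels) < 0 or max(channels) >= num_file_channels:
        raise ValueError(f"Channels {channels} are invalid for {num_file_channels} channels")
    # Start times whose outputs would run past the end of the year are unusable
    available = steps_in_year - stride * num_steps
    if num_samples_per_year is None:
        num_samples_per_year = available
    if not 0 < num_samples_per_year <= available:
        raise ValueError(f"num_samples_per_year must be in [1, {available}]")
    return channels, num_samples_per_year


print(validate_selection(20, 1460, channels=[0, 5], num_samples_per_year=100))  # ([0, 5], 100)
```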

class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)

Bases: DatapipeMetaData

class modulus.datapipes.gnn.mgn_dataset.MGNDataset(name='dataset', data_dir=None, split='train', num_samples=1000, num_steps=600, noise_std=0.02, force_reload=False, verbose=False)

Bases: DGLDataset

In-memory MeshGraphNet dataset for a stationary mesh.

Parameters
  • name (str, optional) – Name of the dataset, by default “dataset”

  • data_dir (str, optional) – Directory that stores the raw data in .TFRecord format, by default None

  • split (str, optional) – Dataset split [“train”, “eval”, “test”], by default “train”

  • num_samples (int, optional) – Number of samples, by default 1000

  • num_steps (int, optional) – Number of time steps in each sample, by default 600

  • noise_std (float, optional) – The standard deviation of the noise added to the “train” split, by default 0.02

  • force_reload (bool, optional) – Whether to force reloading the dataset, by default False

  • verbose (bool, optional) – Whether to print progress information, by default False

static add_edge_features(graph, pos)

Adds the relative displacement and the displacement norm as edge features.
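A NumPy sketch of these edge features, using plain src/dst index arrays in place of a DGL graph. The sign convention of the displacement and the name edge_features are assumptions; the Modulus method adds its features to a DGL graph rather than returning an array.

```python
import numpy as np


def edge_features(pos: np.ndarray, src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """[E, dim + 1] features: relative displacement per edge plus its Euclidean norm."""
    # Sign convention (source minus destination) is an assumption
    disp = pos[src] - pos[dst]
    disp_norm = np.linalg.norm(disp, axis=-1, keepdims=True)
    return np.concatenate([disp, disp_norm], axis=-1)


pos = np.array([[0.0, 0.0], [3.0, 4.0]])
feats = edge_features(pos, src=np.array([0, 1]), dst=np.array([1, 0]))
print(feats)  # rows: [-3, -4, 5] and [3, 4, 5]
```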

static cell_to_adj(cells)

Creates an adjacency matrix in COO format from mesh cells.
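A minimal NumPy sketch of this conversion, assuming triangular cells given as a [num_cells, 3] array of node indices and a bidirectional, deduplicated edge list. The helper name cells_to_coo is hypothetical; the Modulus implementation may differ in edge ordering or direction.

```python
import numpy as np


def cells_to_coo(cells: np.ndarray):
    """Bidirectional COO edges (src, dst) from triangle connectivity [num_cells, 3]."""
    # Each triangle (a, b, c) contributes the edges a-b, b-c, and c-a
    edges = np.concatenate([cells[:, [0, 1]], cells[:, [1, 2]], cells[:, [2, 0]]], axis=0)
    # Sort each pair so an edge shared by neighbouring cells is kept only once
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    # Emit both directions of every undirected edge
    src = np.concatenate([edges[:, 0], edges[:, 1]])
    dst = np.concatenate([edges[:, 1], edges[:, 0]])
    return src, dst


cells = np.array([[0, 1, 2], [1, 3, 2]])  # two triangles sharing edge 1-2
src, dst = cells_to_coo(cells)
print(len(src))  # 10 directed edges from 5 undirected ones
```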

static create_graph(src, dst, dtype=torch.int32)

Creates a DGL graph from an adjacency matrix in COO format. torch.int32 can index graphs with up to 2**31 - 1 nodes or edges.

static denormalize(invar, mu, std)

Denormalizes a tensor.

static normalize_edge(graph, mu, std)

Normalizes the edge features of a graph.

static normalize_node(invar, mu, std)

Normalizes a tensor of node features.
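Assuming normalize_node and normalize_edge apply a z-score, (x - mu) / std, with denormalize as its inverse, a NumPy round trip looks like this. The statistics are computed from the toy data purely for illustration; the library methods operate on tensors and, for normalize_edge, on a graph's edge data.

```python
import numpy as np


def normalize(x: np.ndarray, mu: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Per-feature z-score, broadcasting mu and std over leading dimensions."""
    return (x - mu) / std


def denormalize(x: np.ndarray, mu: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Inverse of normalize: x * std + mu."""
    return x * std + mu


features = np.array([[1.0, 10.0], [3.0, 30.0]])
mu, std = features.mean(axis=0), features.std(axis=0)  # [2, 20] and [1, 10]
z = normalize(features, mu, std)
restored = denormalize(z, mu, std)
```

Keeping mu and std from the training split and reusing them at inference is what makes denormalize map model outputs back to physical units.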

© Copyright 2023, NVIDIA Modulus Team. Last updated on Aug 8, 2023.