Modulus Datapipes
- class modulus.datapipes.benchmarks.darcy.Darcy2D(resolution: int = 256, batch_size: int = 64, nr_permeability_freq: int = 5, max_permeability: float = 2.0, min_permeability: float = 0.5, max_iterations: int = 30000, convergence_threshold: float = 1e-06, iterations_per_convergence_check: int = 1000, nr_multigrids: int = 4, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')
2D Darcy flow benchmark problem datapipe.
This datapipe continuously generates solutions to the 2D Darcy equation with variable permeability. All samples are generated on the fly and are meant to serve as a benchmark problem for testing data-driven models. Permeability is drawn from a random Fourier series and thresholded to give a piecewise-constant function. The solution is obtained using a GPU-enabled multi-grid Jacobi iterative method.
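To illustrate the "random Fourier series, then threshold" idea, here is a minimal NumPy sketch. The helper name `random_permeability`, the sine/cosine basis, the Gaussian coefficients, and the zero threshold are all assumptions for illustration; this is not the datapipe's internal (GPU) implementation.

```python
import numpy as np


def random_permeability(
    resolution: int = 64,
    nr_freq: int = 5,
    min_perm: float = 0.5,
    max_perm: float = 2.0,
    seed: int = 0,
) -> np.ndarray:
    """Hypothetical helper: threshold a random Fourier series into a
    piecewise-constant permeability field with two values."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, resolution)
    xx, yy = np.meshgrid(x, x, indexing="ij")

    # Random Fourier series: sum of sin/cos modes with Gaussian coefficients (assumed basis)
    field = np.zeros((resolution, resolution))
    for i in range(1, nr_freq + 1):
        for j in range(1, nr_freq + 1):
            a, b = rng.standard_normal(2)
            field += a * np.sin(np.pi * i * xx) * np.sin(np.pi * j * yy)
            field += b * np.cos(np.pi * i * xx) * np.cos(np.pi * j * yy)

    # Threshold at zero (assumed) to obtain a piecewise-constant field
    return np.where(field > 0.0, max_perm, min_perm)


perm = random_permeability()
print(perm.shape, np.unique(perm))
```

Raising `nr_freq` adds higher modes to the series, which is why larger `nr_permeability_freq` values produce finer-scale permeability patterns.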
- Parameters
resolution (int, optional) – Resolution to run simulation at, by default 256
batch_size (int, optional) – Batch size of simulations, by default 64
nr_permeability_freq (int, optional) – Number of frequencies to use for generating random permeability. Higher values give higher-frequency permeability fields, by default 5
max_permeability (float, optional) – Max permeability, by default 2.0
min_permeability (float, optional) – Min permeability, by default 0.5
max_iterations (int, optional) – Maximum number of iterations to use for each multi-grid level, by default 30000
convergence_threshold (float, optional) – Solver L-Infinity convergence threshold, by default 1e-6
iterations_per_convergence_check (int, optional) – Number of Jacobi iterations to run before checking convergence, by default 1000
nr_multigrids (int, optional) – Number of multi-grid levels, by default 4
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys permeability and darcy. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”
- Raises
ValueError – Incompatible multi-grid and resolution settings
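The roles of `max_iterations`, `convergence_threshold`, and `iterations_per_convergence_check` can be seen in a plain single-grid Jacobi solver. The sketch below assumes the problem -div(k grad u) = f on the unit square with u = 0 on the boundary, arithmetic-mean face permeabilities, and a convergence test on the L-infinity norm of one Jacobi update; the function name `jacobi_darcy` is hypothetical, and it omits the multi-grid hierarchy the datapipe uses.

```python
import numpy as np


def jacobi_darcy(
    perm: np.ndarray,
    forcing: float = 1.0,
    max_iterations: int = 30000,
    convergence_threshold: float = 1e-6,
    iterations_per_convergence_check: int = 1000,
) -> tuple[np.ndarray, int]:
    """Hypothetical single-grid Jacobi sketch for -div(k grad u) = f, u = 0 on the boundary.

    Returns the solution and the iteration at which it stopped."""
    n = perm.shape[0]
    h = 1.0 / (n - 1)
    u = np.zeros_like(perm, dtype=float)

    # Face permeabilities as arithmetic means of neighbouring nodes (assumed scheme)
    k = perm
    k_e = 0.5 * (k[1:-1, 1:-1] + k[2:, 1:-1])
    k_w = 0.5 * (k[1:-1, 1:-1] + k[:-2, 1:-1])
    k_n = 0.5 * (k[1:-1, 1:-1] + k[1:-1, 2:])
    k_s = 0.5 * (k[1:-1, 1:-1] + k[1:-1, :-2])
    k_sum = k_e + k_w + k_n + k_s

    for it in range(1, max_iterations + 1):
        u_new = u.copy()  # boundary values stay zero
        u_new[1:-1, 1:-1] = (
            k_e * u[2:, 1:-1]
            + k_w * u[:-2, 1:-1]
            + k_n * u[1:-1, 2:]
            + k_s * u[1:-1, :-2]
            + h * h * forcing
        ) / k_sum
        # Only test convergence every N iterations, using the L-infinity norm of the update
        if it % iterations_per_convergence_check == 0:
            if np.max(np.abs(u_new - u)) < convergence_threshold:
                return u_new, it
        u = u_new
    return u, max_iterations


u, iters = jacobi_darcy(np.ones((17, 17)), iterations_per_convergence_check=100)
print(iters, u.max())
```

Plain Jacobi converges slowly on fine grids, which is the usual motivation for a multi-grid hierarchy such as the one controlled by `nr_multigrids`.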
- generate_batch() → None
Solves for a new batch of simulations
- initialize_batch() → None
Initializes arrays for a new batch of simulations
- class modulus.datapipes.benchmarks.darcy.MetaData(name: str = 'Darcy2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)
Bases: DatapipeMetaData
- class modulus.datapipes.benchmarks.kelvin_helmholtz.KelvinHelmholtz2D(resolution: int = 512, batch_size: int = 16, seq_length: int = 8, nr_perturbation_freq: int = 5, perturbation_range: float = 0.1, nr_snapshots: int = 256, iteration_per_snapshot: int = 32, gamma: float = 1.6666666666666667, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')
Kelvin-Helmholtz instability benchmark problem datapipe.
This datapipe continuously generates samples with random initial conditions. All samples are generated on the fly and are meant to serve as a benchmark problem for testing data-driven models. Initial conditions are given in the form of small perturbations. The solution is obtained using a GPU-enabled finite volume method.
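A minimal sketch of a random initial perturbation built from a few Fourier modes, assuming the perturbation is a periodic sum of sine modes along x with random amplitudes and phases, rescaled so that its peak magnitude equals `perturbation_range`. The helper name `random_perturbation` and this exact form are assumptions; the datapipe's actual initial-condition formula is not documented here.

```python
import numpy as np


def random_perturbation(
    resolution: int = 128,
    nr_perturbation_freq: int = 5,
    perturbation_range: float = 0.1,
    seed: int = 0,
) -> np.ndarray:
    """Hypothetical helper: periodic random perturbation with peak magnitude
    equal to perturbation_range (assumed normalisation)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, resolution, endpoint=False)

    # Sum of sine modes with random amplitudes and phases
    pert = np.zeros(resolution)
    for k in range(1, nr_perturbation_freq + 1):
        amp = rng.uniform(-1.0, 1.0)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        pert += amp * np.sin(2.0 * np.pi * k * x + phase)

    # Rescale so the largest absolute value equals perturbation_range
    pert *= perturbation_range / np.max(np.abs(pert))

    # Repeat along the second axis to obtain a 2D perturbation field
    return np.tile(pert[:, None], (1, resolution))


p = random_perturbation()
print(p.shape, np.abs(p).max())
```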
- Parameters
resolution (int, optional) – Resolution to run simulation at, by default 512
batch_size (int, optional) – Batch size of simulations, by default 16
seq_length (int, optional) – Sequence length of output samples, by default 8
nr_perturbation_freq (int, optional) – Number of frequencies to use for generating random initial perturbations, by default 5
perturbation_range (float, optional) – Range to use for random perturbations. This value will be the max amplitude of the initial perturbation, by default 0.1
nr_snapshots (int, optional) – Number of snapshots of simulation to generate for data generation. This will control how long the simulation is run for, by default 256
iteration_per_snapshot (int, optional) – Number of finite volume steps to take between each snapshot. Each step size is fixed as the smallest possible value that satisfies the Courant-Friedrichs-Lewy condition, by default 32
gamma (float, optional) – Heat capacity ratio, by default 5.0/3.0
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys density, velocity, and pressure. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”
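The parameters `gamma` and `iteration_per_snapshot` interact through the Courant-Friedrichs-Lewy (CFL) time-step limit. The sketch below uses a common form of the CFL condition for the compressible Euler equations, dt = C * dx / max(c_s + |v|) with sound speed c_s = sqrt(gamma * p / rho); the Courant number C = 0.4 and the helper name `cfl_time_step` are assumptions, not values taken from the datapipe.

```python
import numpy as np


def cfl_time_step(
    rho: np.ndarray,
    vx: np.ndarray,
    vy: np.ndarray,
    p: np.ndarray,
    dx: float,
    gamma: float = 5.0 / 3.0,
    courant: float = 0.4,
) -> float:
    """Hypothetical helper: explicit time step limited by the CFL condition."""
    # Local sound speed for an ideal gas with heat capacity ratio gamma
    sound_speed = np.sqrt(gamma * p / rho)
    # Fastest signal speed: sound speed plus flow speed, maximised over all cells
    signal_speed = sound_speed + np.sqrt(vx**2 + vy**2)
    return float(courant * dx / np.max(signal_speed))


shape = (4, 4)
dt = cfl_time_step(
    rho=np.ones(shape),
    vx=np.full(shape, 0.5),
    vy=np.zeros(shape),
    p=np.full(shape, 2.5),
    dx=1.0 / 512,
)
print(dt)
```

With a fixed step size, as described for `iteration_per_snapshot`, consecutive snapshots are separated by `iteration_per_snapshot * dt` units of simulated time.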
- generate_batch() → None
Solves for a new batch of simulations
- initialize_batch() → None
Initializes arrays for a new batch of simulations
- class modulus.datapipes.benchmarks.kelvin_helmholtz.MetaData(name: str = 'KelvinHelmholtz2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)
Bases: DatapipeMetaData
- class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)
DALI Source for lazy-loading the HDF5 ERA5 files
- Parameters
data_paths (Iterable[str]) – Paths to the ERA5 HDF5 data files
num_samples (int) – Total number of training samples
channels (Iterable[int]) – List representing which ERA5 variables to load
stride (int) – Number of steps between input and output variables
num_steps (int) – Number of timesteps included in the output variables
num_samples_per_year (int) – Number of samples randomly taken from each year
batch_size (int, optional) – Batch size, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see the NVIDIA DALI documentation.
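The interaction of `stride` and `num_steps` can be made concrete with plain index arithmetic. This sketch assumes that the outputs are the `num_steps` frames following the input, each `stride` steps apart, and that samples never cross a yearly file boundary; both helper names are hypothetical and not part of the datapipe's API.

```python
def sample_indices(in_idx: int, num_steps: int, stride: int) -> tuple[int, list[int]]:
    """Hypothetical helper: input time index and the assumed output time indices."""
    # Each output frame lies `stride` steps after the previous one
    out_idx = [in_idx + (i + 1) * stride for i in range(num_steps)]
    return in_idx, out_idx


def valid_input_range(steps_per_year: int, num_steps: int, stride: int) -> int:
    """Hypothetical helper: number of input indices whose outputs stay inside one yearly file."""
    # The last output index in_idx + num_steps * stride must be <= steps_per_year - 1
    return steps_per_year - num_steps * stride


print(sample_indices(10, num_steps=3, stride=2))  # (10, [12, 14, 16])
print(valid_input_range(1460, num_steps=3, stride=2))  # 1454
```

With 6-hourly data (1460 steps in a 365-day year), `stride=2` and `num_steps=3` give outputs at +12 h, +24 h, and +36 h, and 1454 valid starting indices per year under these assumptions.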
- class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)
ERA5 DALI data pipeline for HDF5 files
- Parameters
data_dir (str) – Directory where ERA5 data is stored
stats_dir (Union[str, None], optional) – Directory containing the pre-computed data statistics NumPy files used for normalization; if None, no normalization is applied, by default None
channels (Union[List[int], None], optional) – Indices of the ERA5 variables to load; if None, all variables in the HDF5 file are used, by default None
batch_size (int, optional) – Batch size, by default 1
stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data every 6 hours, a stride of 1 corresponds to a 6-hour delta t and a stride of 2 to a 12-hour delta t, by default 1
num_steps (int, optional) – Number of timesteps included in the output variables, by default 1
patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None
num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all samples are used, by default None
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
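A minimal sketch of the `patch_size` cropping rule, assuming the crop keeps the leading rows and columns and drops the remainder at the end of each spatial axis. The helper name `crop_to_patch_multiple` is hypothetical. The 721 x 1440 grid is the size of ERA5 at 0.25 degree resolution.

```python
from typing import Optional, Tuple, Union

import numpy as np


def crop_to_patch_multiple(
    x: np.ndarray, patch_size: Optional[Union[Tuple[int, int], int]]
) -> np.ndarray:
    """Hypothetical helper: crop the last two (H, W) dims so each is divisible by patch_size."""
    if patch_size is None:
        return x
    if isinstance(patch_size, int):
        patch_size = (patch_size, patch_size)
    h, w = x.shape[-2:]
    new_h = h - h % patch_size[0]
    new_w = w - w % patch_size[1]
    # Keep the leading rows/columns and drop the remainder (assumed crop position)
    return x[..., :new_h, :new_w]


era5_like = np.zeros((1, 2, 721, 1440))
print(crop_to_patch_multiple(era5_like, 8).shape)  # (1, 2, 720, 1440)
```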
- load_statistics() → None
Loads ERA5 statistics from pre-computed numpy files
The statistics files should be named global_means.npy and global_std.npy, have a shape of [1, C, 1, 1], and be located in stats_dir.
- Raises
IOError – If mean or std numpy files are not found
AssertionError – If loaded numpy arrays are not of correct size
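Statistics of shape [1, C, 1, 1] broadcast directly over a [B, C, H, W] batch. The sketch below shows z-score normalization with optional channel selection. It assumes the arrays would come from `np.load` on `global_means.npy` and `global_std.npy` and that normalization is (x - mean) / std; the helper name `normalize_era5` is hypothetical, and where the datapipe actually applies this step inside its DALI pipeline is not shown.

```python
from typing import Optional, Sequence

import numpy as np


def normalize_era5(
    x: np.ndarray,
    mean: np.ndarray,
    std: np.ndarray,
    channels: Optional[Sequence[int]] = None,
) -> np.ndarray:
    """Hypothetical helper: standardize a [B, C, H, W] batch with [1, C_all, 1, 1] statistics."""
    if channels is not None:
        # Keep only the statistics of the selected channels; shape stays [1, C, 1, 1]
        mean = mean[:, list(channels)]
        std = std[:, list(channels)]
    # [1, C, 1, 1] broadcasts over the batch and spatial dimensions
    return (x - mean) / std


mean = np.arange(4, dtype=float).reshape(1, 4, 1, 1)
std = np.full((1, 4, 1, 1), 2.0)
batch = np.ones((2, 2, 3, 3))
out = normalize_era5(batch, mean, std, channels=[1, 3])
print(out[0, :, 0, 0])  # [ 0. -1.]
```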
- parse_dataset_files() → None
Parses the data directory for valid HDF5 files and determines training samples
- Raises
ValueError – If the specified channels or the number of samples per year are not valid
- class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)
Bases: DatapipeMetaData
- class modulus.datapipes.gnn.mgn_dataset.MGNDataset(name='dataset', data_dir=None, split='train', num_samples=1000, num_steps=600, noise_std=0.02, force_reload=False, verbose=False)
In-memory MeshGraphNet dataset for stationary meshes.
Notes
- This dataset prepares and processes the data available in MeshGraphNet’s repo:
- A single adjacency matrix is used for each transient simulation. Do not use with adaptive meshes or remeshing.
- Parameters
name (str, optional) – Name of the dataset, by default “dataset”
data_dir (str, optional) – Directory that stores the raw data in .TFRecord format, by default None
split (str, optional) – Dataset split [“train”, “eval”, “test”], by default “train”
num_samples (int, optional) – Number of samples, by default 1000
num_steps (int, optional) – Number of time steps in each sample, by default 600
noise_std (float, optional) – The standard deviation of the noise added to the “train” split, by default 0.02
force_reload (bool, optional) – Whether to force the dataset to be reloaded, by default False
verbose (bool, optional) – Whether to enable verbose output, by default False
- static add_edge_features(graph, pos)
Adds the relative displacement and the displacement norm as edge features
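The edge features described above can be computed with plain NumPy on a COO edge list. This sketch does not use DGL; it takes `src`/`dst` index arrays instead of a graph object, and the sign convention (destination minus source) and the helper name `edge_features` are assumptions.

```python
import numpy as np


def edge_features(pos: np.ndarray, src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Hypothetical helper: [relative displacement, displacement norm] per edge."""
    # Displacement between the endpoints of each edge, shape [E, D] (assumed sign: dst - src)
    disp = pos[dst] - pos[src]
    # Euclidean length of each displacement, shape [E, 1]
    norm = np.linalg.norm(disp, axis=1, keepdims=True)
    return np.concatenate([disp, norm], axis=1)


pos = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]])
feats = edge_features(pos, src=np.array([0, 1]), dst=np.array([1, 2]))
print(feats)  # rows: [3, 4, 5] and [0, -4, 4]
```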
- static cell_to_adj(cells)
Creates an adjacency matrix in COO format from mesh cells
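A minimal sketch of turning mesh cells into a COO edge list, assuming triangular cells and an undirected graph in which every edge appears in both directions exactly once. The helper name `cells_to_coo` is hypothetical; whether the library adds reverse edges at this step or later is not stated here.

```python
import numpy as np


def cells_to_coo(cells: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: bidirectional, de-duplicated COO edges from triangle cells."""
    # Each triangle (a, b, c) contributes the edges a-b, b-c and c-a
    edges = np.concatenate(
        [cells[:, [0, 1]], cells[:, [1, 2]], cells[:, [2, 0]]], axis=0
    )
    # Add reversed edges so the graph is undirected, then drop duplicate rows
    edges = np.concatenate([edges, edges[:, ::-1]], axis=0)
    edges = np.unique(edges, axis=0)
    return edges[:, 0], edges[:, 1]


# Two triangles sharing the edge 1-2
cells = np.array([[0, 1, 2], [1, 3, 2]])
src, dst = cells_to_coo(cells)
print(len(src))  # 10 directed edges (5 undirected)
```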
- static create_graph(src, dst, dtype=torch.int32)
Creates a DGL graph from an adjacency matrix in COO format. torch.int32 can handle graphs with up to 2**31-1 nodes or edges.
- static denormalize(invar, mu, std)
Denormalizes a tensor
- static normalize_edge(graph, mu, std)
Normalizes the edge features of a graph
- static normalize_node(invar, mu, std)
Normalizes a tensor of node features
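The normalize/denormalize pair can be sketched as z-score scaling and its inverse. This assumes normalization means (x - mu) / std and denormalization means x * std + mu, with per-feature statistics computed over nodes; the helper names `normalize_features` and `denormalize_features` are hypothetical and operate on NumPy arrays rather than the dataset's tensors.

```python
import numpy as np


def normalize_features(invar: np.ndarray, mu: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Hypothetical helper: z-score normalization (assumed definition)."""
    return (invar - mu) / std


def denormalize_features(invar: np.ndarray, mu: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Hypothetical helper: inverse of normalize_features, back to physical units."""
    return invar * std + mu


# Per-feature statistics over nodes (rows = nodes, columns = features)
x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mu = x.mean(axis=0, keepdims=True)
std = x.std(axis=0, keepdims=True)
z = normalize_features(x, mu, std)
x_back = denormalize_features(z, mu, std)
```

Keeping `mu` and `std` with shape [1, F] lets them broadcast over any number of nodes, and the round trip recovers the original values up to floating-point error.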