Modulus Datapipes
- class modulus.datapipes.benchmarks.darcy.Darcy2D(resolution: int = 256, batch_size: int = 64, nr_permeability_freq: int = 5, max_permeability: float = 2.0, min_permeability: float = 0.5, max_iterations: int = 30000, convergence_threshold: float = 1e-06, iterations_per_convergence_check: int = 1000, nr_multigrids: int = 4, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')[source]
Bases: Datapipe
2D Darcy flow benchmark problem datapipe.
This datapipe continuously generates solutions to the 2D Darcy equation with variable permeability. All samples are generated on the fly, and the datapipe is meant to be a benchmark problem for testing data-driven models. Permeability is drawn from a random Fourier series and thresholded to give a piecewise constant function. The solution is obtained using a GPU-enabled multi-grid Jacobi iterative method.
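The permeability construction described above can be sketched in plain Python. This is only an illustration of the idea (sum a few random Fourier modes, then threshold the result to two values); the function name `sample_permeability`, the Gaussian amplitudes, the uniform phases, and the zero threshold are all assumptions made for this sketch and are not taken from the Modulus implementation, which runs on the GPU.

```python
import math
import random


def sample_permeability(resolution=16, nr_freq=5, min_k=0.5, max_k=2.0, seed=0):
    """Hypothetical helper: piecewise constant field from a random Fourier series."""
    rng = random.Random(seed)
    # One random amplitude and phase per (kx, ky) mode (assumed distributions).
    modes = [
        (kx, ky, rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
        for kx in range(nr_freq)
        for ky in range(nr_freq)
    ]
    field = []
    for i in range(resolution):
        x = i / resolution
        row = []
        for j in range(resolution):
            y = j / resolution
            # Evaluate the Fourier series at grid point (x, y).
            value = sum(
                amp * math.cos(2.0 * math.pi * (kx * x + ky * y) + phase)
                for kx, ky, amp, phase in modes
            )
            # Threshold at zero (assumed): the field takes only two values.
            row.append(max_k if value > 0.0 else min_k)
        field.append(row)
    return field


field = sample_permeability(resolution=8, nr_freq=3, seed=1)
print(sorted({v for row in field for v in row}))
```

Raising `nr_freq` adds higher-frequency modes to the sum, which is consistent with the parameter description below: more frequencies give finer-scale permeability patterns.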
- Parameters
resolution (int, optional) – Resolution to run simulation at, by default 256
batch_size (int, optional) – Batch size of simulations, by default 64
nr_permeability_freq (int, optional) – Number of frequencies to use for generating random permeability. Higher values give higher-frequency permeability fields, by default 5
max_permeability (float, optional) – Max permeability, by default 2.0
min_permeability (float, optional) – Min permeability, by default 0.5
max_iterations (int, optional) – Maximum iterations to use for each multi-grid, by default 30000
convergence_threshold (float, optional) – Solver L-Infinity convergence threshold, by default 1e-6
iterations_per_convergence_check (int, optional) – Number of Jacobi iterations to run before checking convergence, by default 1000
nr_multigrids (int, optional) – Number of multi-grid levels, by default 4
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys permeability and darcy. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”
- Raises
ValueError – Incompatible multi-grid and resolution settings
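As a rough illustration of the normaliser argument, the sketch below applies a (mean, std) pair per key as `(value - mean) / std`. The helper `normalise`, the statistics, and the sample values are hypothetical; that the datapipe applies exactly this standardisation is an assumption based on the (mean, std) description above.

```python
from typing import Dict, List, Tuple


def normalise(
    fields: Dict[str, List[float]],
    normaliser: Dict[str, Tuple[float, float]],
) -> Dict[str, List[float]]:
    """Hypothetical helper: standardise each field with its (mean, std) pair."""
    scaled = {}
    for key, values in fields.items():
        mean, std = normaliser[key]
        scaled[key] = [(v - mean) / std for v in values]
    return scaled


# Made-up statistics using the two keys named in the parameter description.
normaliser = {"permeability": (1.25, 0.75), "darcy": (0.05, 0.025)}
sample = {"permeability": [0.5, 2.0], "darcy": [0.0, 0.1]}
print(normalise(sample, normaliser))
```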
- generate_batch() → None[source]
Solve for new batch of simulations
- initialize_batch() → None[source]
Initializes arrays for new batch of simulations
- class modulus.datapipes.benchmarks.darcy.MetaData(name: str = 'Darcy2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.benchmarks.kelvin_helmholtz.KelvinHelmholtz2D(resolution: int = 512, batch_size: int = 16, seq_length: int = 8, nr_perturbation_freq: int = 5, perturbation_range: float = 0.1, nr_snapshots: int = 256, iteration_per_snapshot: int = 32, gamma: float = 1.6666666666666667, normaliser: Optional[Dict[str, Tuple[float, float]]] = None, device: Union[str, device] = 'cuda')[source]
Bases: Datapipe
Kelvin-Helmholtz instability benchmark problem datapipe.
This datapipe continuously generates samples with random initial conditions. All samples are generated on the fly, and the datapipe is meant to be a benchmark problem for testing data-driven models. Initial conditions are given in the form of small perturbations. The solution is obtained using a GPU-enabled finite volume method.
- Parameters
resolution (int, optional) – Resolution to run simulation at, by default 512
batch_size (int, optional) – Batch size of simulations, by default 16
seq_length (int, optional) – Sequence length of output samples, by default 8
nr_perturbation_freq (int, optional) – Number of frequencies to use for generating random initial perturbations, by default 5
perturbation_range (float, optional) – Range to use for random perturbations. This value will be the max amplitude of the initial perturbation, by default 0.1
nr_snapshots (int, optional) – Number of snapshots of simulation to generate for data generation. This will control how long the simulation is run for, by default 256
iteration_per_snapshot (int, optional) – Number of finite volume steps to take between each snapshot. Each step size is fixed as the smallest possible value that satisfies the Courant-Friedrichs-Lewy condition, by default 32
gamma (float, optional) – Heat capacity ratio, by default 5.0/3.0
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys density, velocity, and pressure. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe runs and places data, by default “cuda”
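To make the roles of nr_snapshots, iteration_per_snapshot, and gamma more concrete, here is a minimal sketch of a Courant-Friedrichs-Lewy time step estimate and the resulting step count. The function names, the Courant number of 0.4, and the one-dimensional velocity are assumptions chosen for illustration; the solver inside the datapipe may compute its step differently.

```python
import math


def cfl_time_step(dx, density, velocity, pressure, gamma=5.0 / 3.0, courant=0.4):
    """Hypothetical helper: time step limited by the fastest signal in any cell."""
    fastest = 0.0
    for rho, v, p in zip(density, velocity, pressure):
        # Ideal-gas sound speed, which depends on the heat capacity ratio gamma.
        sound_speed = math.sqrt(gamma * p / rho)
        fastest = max(fastest, abs(v) + sound_speed)
    # The cell with the fastest signal sets the step for the whole grid.
    return courant * dx / fastest


def total_solver_steps(nr_snapshots=256, iteration_per_snapshot=32):
    """Total finite volume steps implied by the two snapshot parameters."""
    return nr_snapshots * iteration_per_snapshot


print(total_solver_steps())
print(cfl_time_step(0.5, [1.0, 1.0], [0.0, 1.0], [0.6, 0.6]))
```

With the documented defaults, one simulation therefore spans 256 * 32 = 8192 finite volume steps.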
- generate_batch() → None[source]
Solve for new batch of simulations
- initialize_batch() → None[source]
Initializes arrays for new batch of simulations
- class modulus.datapipes.benchmarks.kelvin_helmholtz.MetaData(name: str = 'KelvinHelmholtz2D', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = False)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]
Bases: object
DALI Source for lazy-loading the HDF5 ERA5 files
- Parameters
data_paths (Iterable[str]) – Directory where ERA5 data is stored
num_samples (int) – Total number of training samples
channels (Iterable[int]) – List representing which ERA5 variables to load
stride (int) – Number of steps between input and output variables
num_steps (int) – Number of timesteps included in the output variables
num_samples_per_year (int) – Number of samples randomly taken from each year
batch_size (int, optional) – Batch size, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see https://docs.nvidia.com/deeplearning/dali/archives/dali_1_13_0/user-guide/docs/examples/general/data_loading/parallel_external_source.html
- class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]
Bases: Datapipe
ERA5 DALI data pipeline for HDF5 files
- Parameters
data_dir (str) – Directory where ERA5 data is stored
stats_dir (Union[str, None], optional) – Directory to data statistic numpy files for normalization, if None, no normalization will be used, by default None
channels (Union[List[int], None], optional) – Defines which ERA5 variables to load, if None will use all in HDF5 file, by default None
batch_size (int, optional) – Batch size, by default 1
stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data at every 6 hours, a stride 1 = 6 hour delta t and stride 2 = 12 hours delta t, by default 1
num_steps (int, optional) – Number of timesteps included in the output variables, by default 1
patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None
num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all will be used, by default None
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
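The interaction of stride, num_steps, and patch_size can be sketched with a few small helpers. These functions are hypothetical and only mirror the parameter descriptions above; in particular, taking the k-th output at `in_idx + (k + 1) * stride` and cropping with a remainder are assumptions about the behavior, and the 721 x 1440 grid is an example value rather than something this page specifies.

```python
def output_indices(in_idx, num_steps=1, stride=1):
    """Hypothetical helper: time indices of the output variables for one input."""
    return [in_idx + (k + 1) * stride for k in range(num_steps)]


def lead_time_hours(stride, hours_per_record=6):
    """Time delta between consecutive samples for data stored every N hours."""
    return stride * hours_per_record


def crop_to_patch(shape, patch_size):
    """Shrink each image dimension to the nearest multiple of patch_size."""
    if isinstance(patch_size, int):
        patch_size = (patch_size, patch_size)
    return tuple(s - s % p for s, p in zip(shape, patch_size))


print(output_indices(10, num_steps=3, stride=2))
print(lead_time_hours(2))
print(crop_to_patch((721, 1440), 8))
```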
- load_statistics() → None[source]
Loads ERA5 statistics from pre-computed numpy files
The statistics files should be named global_means.npy and global_std.npy, each with a shape of [1, C, 1, 1], and located in stats_dir.
- Raises
IOError – If mean or std numpy files are not found
AssertionError – If loaded numpy arrays are not of correct size
- parse_dataset_files() → None[source]
Parses the data directory for valid HDF5 files and determines training samples
- Raises
ValueError – If the channels specified or the number of samples per year is not valid
- class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset(data_dir: str, split: str = 'train', num_samples: int = 10, invar_keys: List[str] = ['pos', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], outvar_keys: List[str] = ['p', 'wallShearStress'], normalize_keys: List[str] = ['p', 'wallShearStress', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], normalization_bound: Tuple[float, float] = (-1.0, 1.0), force_reload: bool = False, name: str = 'dataset', verbose: bool = False, compute_drag: bool = False)[source]
Bases: DGLDataset, Datapipe
In-memory Ahmed body Dataset
- Parameters
data_dir (str) – The directory where the data is stored.
split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.
num_samples (int, optional) – The number of samples to use, by default 10.
invar_keys (List[str], optional) – The input node features to consider. Default includes ‘pos’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
outvar_keys (List[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.
normalize_keys (List[str], optional) – The features to normalize. Default includes ‘p’, ‘wallShearStress’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
normalization_bound (Tuple[float, float], optional) – The lower and upper bounds for normalization. Default is (-1, 1).
force_reload (bool, optional) – If True, forces a reload of the data, by default False.
name (str, optional) – The name of the dataset, by default ‘dataset’.
verbose (bool, optional) – If True, enables verbose mode, by default False.
compute_drag (bool, optional) – If True, also returns the coefficient, mesh area, and normals that are required for computing the drag coefficient, by default False.
- add_edge_features() → List[DGLGraph][source]
Add relative displacement and displacement norm as edge features for each graph in the list of graphs. The calculations are done using the ‘pos’ attribute in the node data of each graph. The resulting edge features are stored in the ‘x’ attribute in the edge data of each graph.
This method will modify the list of graphs in-place.
- Returns
The list of graphs with updated edge features.
- Return type
List[dgl.DGLGraph]
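The edge features described for add_edge_features (relative displacement plus its norm) can be illustrated without DGL. In this sketch the helper name, the plain-list graph representation, and the source-minus-destination sign convention are assumptions; the actual method works on the ‘pos’ node data of each DGLGraph.

```python
import math


def edge_features(pos, edges):
    """Hypothetical helper: [dx, dy, dz, norm] for every (src, dst) edge."""
    features = []
    for src, dst in edges:
        # Relative displacement between the two end points of the edge.
        disp = [a - b for a, b in zip(pos[src], pos[dst])]
        norm = math.sqrt(sum(d * d for d in disp))
        features.append(disp + [norm])
    return features


pos = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
edges = [(0, 1), (1, 0)]
print(edge_features(pos, edges))
```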
- denormalize(pred, gt, device) → Tuple[Tensor, Tensor][source]
Denormalize the graph node data.
- Parameters
pred (Tensor) – Normalized prediction
gt (Tensor) – Normalized ground truth
device (Any) – The device
- Returns
Denormalized prediction and ground truth
- Return type
Tuple[Tensor, Tensor]
- normalize_edge() → List[DGLGraph][source]
Normalize edge data ‘x’ in each graph in the list of graphs using min-max normalization. The normalization is performed in-place. The normalization formula used is:
normalized_x = 2.0 * normalization_bound[1] * (x - edge_min) / (edge_max - edge_min) + normalization_bound[0]
This will bring the edge data ‘x’ in each graph into the range of [normalization_bound[0], normalization_bound[1]].
- Returns
The list of graphs with normalized edge data ‘x’.
- Return type
List[dgl.DGLGraph]
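The min-max formula quoted above can be checked on a few numbers. The sketch below transcribes the documented expression literally for a flat list of values; the helper name `min_max_scale` and the sample data are made up, and the real method operates on graph edge data rather than Python lists.

```python
def min_max_scale(values, normalization_bound=(-1.0, 1.0)):
    """Hypothetical helper: the documented formula applied to a list of floats."""
    v_min, v_max = min(values), max(values)
    return [
        2.0 * normalization_bound[1] * (v - v_min) / (v_max - v_min)
        + normalization_bound[0]
        for v in values
    ]


print(min_max_scale([0.0, 5.0, 10.0]))
print(min_max_scale([0.0, 5.0, 10.0], normalization_bound=(0.0, 1.0)))
```

As written, the expression maps the data onto [normalization_bound[0], normalization_bound[0] + 2.0 * normalization_bound[1]]. That interval equals [normalization_bound[0], normalization_bound[1]] when the bounds are symmetric about zero, as with the default (-1, 1); the second call shows that non-symmetric bounds give a different interval.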
- normalize_node() → List[DGLGraph][source]
Normalize node data in each graph in the list of graphs using min-max normalization. The normalization is performed in-place. The normalization formula used is:
normalized_data = 2.0 * normalization_bound[1] * (data - node_min) / (node_max - node_min) + normalization_bound[0]
This will bring the node data in each graph into the range of [normalization_bound[0], normalization_bound[1]]. After normalization, node data is concatenated according to the keys defined in ‘self.input_keys’ and ‘self.output_keys’, resulting in new node data ‘x’ and ‘y’, respectively.
- Returns
The list of graphs with normalized and concatenated node data.
- Return type
List[dgl.DGLGraph]
- class modulus.datapipes.gnn.ahmed_body_dataset.MetaData(name: str = 'AhmedBody', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- modulus.datapipes.gnn.utils.load_json(file: str) → Dict[str, Tensor][source]
Loads a JSON file into a dictionary of PyTorch tensors.
- Parameters
file (str) – Path to the JSON file.
- Returns
Dictionary where each value is a PyTorch tensor.
- Return type
Dict[str, torch.Tensor]
- modulus.datapipes.gnn.utils.read_vtp_file(file_path: str) → Any[source]
Read a VTP file and return the polydata.
- Parameters
file_path (str) – Path to the VTP file.
- Returns
The polydata read from the VTP file.
- Return type
vtkPolyData
- modulus.datapipes.gnn.utils.save_json(var: Dict[str, Tensor], file: str) → None[source]
Saves a dictionary of tensors to a JSON file.
- Parameters
var (Dict[str, torch.Tensor]) – Dictionary where each value is a PyTorch tensor.
file (str) – Path to the output JSON file.