Modulus Datapipes
- class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, num_history: int, stride: int, num_samples_per_year: int, use_cos_zenith: bool, cos_zenith_args: Dict, use_time_of_year_index: bool, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]
Bases: object
DALI Source for lazy-loading the HDF5 ERA5 files
- Parameters
data_paths (Iterable[str]) – Paths of the HDF5 files where the ERA5 data is stored
num_samples (int) – Total number of training samples
channels (Iterable[int]) – List representing which ERA5 variables to load
stride (int) – Number of steps between input and output variables
num_steps (int) – Number of timesteps included in the output variables
num_history (int) – Number of previous timesteps included in the input variables
num_samples_per_year (int) – Number of samples randomly taken from each year
batch_size (int, optional) – Batch size, by default 1
use_cos_zenith (bool) – If True, the cosine zenith angles corresponding to the coordinates will be produced
cos_zenith_args (Dict) – Dictionary containing the following keys:
dt: float – Time in hours between each timestep in the dataset
start_year: int – Start year of dataset
use_time_of_year_index (bool) – If True, also returns the index that can be used to determine the time of year corresponding to each sample
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see: https://docs.nvidia.com/deeplearning/dali/archives/dali_1_13_0/user-guide/docs/examples/general/data_loading/parallel_external_source.html
- class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, num_history: int = 0, stride: int = 1, latlon_resolution: Optional[Tuple[int, int]] = None, interpolation_type: Optional[str] = None, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, use_cos_zenith: bool = False, cos_zenith_args: Dict = {}, use_time_of_year_index: bool = False, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]
Bases: Datapipe
ERA5 DALI data pipeline for HDF5 files
- Parameters
data_dir (str) – Directory where ERA5 data is stored
stats_dir (Union[str, None], optional) – Directory of the statistics NumPy files used for normalization; if None, no normalization is applied, by default None
channels (Union[List[int], None], optional) – Defines which ERA5 variables to load; if None, all channels in the HDF5 file are used, by default None
batch_size (int, optional) – Batch size, by default 1
stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data every 6 hours, a stride of 1 corresponds to a 6-hour delta-t and a stride of 2 to a 12-hour delta-t, by default 1
num_steps (int, optional) – Number of timesteps included in the output variables, by default 1
num_history (int, optional) – Number of previous timesteps included in the input variables, by default 0
latlon_resolution (Tuple[int, int], optional) – The resolution for the latitude-longitude grid (H, W). Needs to be specified for cos zenith angle computation, or interpolation. By default None
interpolation_type (str, optional) – Interpolation type for resizing. Supports [“INTERP_NN”, “INTERP_LINEAR”, “INTERP_CUBIC”, “INTERP_LANCZOS3”, “INTERP_TRIANGULAR”, “INTERP_GAUSSIAN”]. By default None (no interpolation is done)
patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None
num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all will be used, by default None
use_cos_zenith (bool, optional) – If True, the cosine zenith angles corresponding to the coordinates will be produced, by default False
cos_zenith_args (Dict, optional) – Dictionary containing the following keys:
dt: float, optional – Time in hours between each timestep in the dataset, by default 6 hr
start_year: int, optional – Start year of dataset, by default 1980
latlon_bounds: Tuple[Tuple[float, float], Tuple[float, float]], optional – Bounds of latitude and longitude in the data, in the format ((lat_start, lat_end), (lon_start, lon_end)). By default ((90, -90), (0, 360)).
These defaults apply only if use_cos_zenith is True; otherwise, cos_zenith_args defaults to {}.
use_time_of_year_index (bool) – If True, also returns the index that can be used to determine the time of year corresponding to each sample. By default False.
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
- load_statistics() → None[source]
Loads ERA5 statistics from pre-computed NumPy files
The statistics files should be named global_means.npy and global_std.npy, have a shape of [1, C, 1, 1], and be located in stats_dir.
- Raises
IOError – If mean or std numpy files are not found
AssertionError – If loaded numpy arrays are not of correct size
- parse_dataset_files() → None[source]
Parses the data directory for valid HDF5 files and determines training samples
- Raises
ValueError – If the channels specified or the number of samples per year is not valid
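A minimal usage sketch for ERA5HDF5Datapipe is shown below. The data and statistics paths are hypothetical placeholders, and the "invar"/"outvar" output keys follow the convention used in the Modulus ERA5 training examples; adjust both to your setup.

```python
# Minimal sketch: build an ERA5 HDF5 datapipe and fetch one batch.
# Paths and channel indices are placeholders for your own dataset layout.
from modulus.datapipes.climate.era5_hdf5 import ERA5HDF5Datapipe

datapipe = ERA5HDF5Datapipe(
    data_dir="/data/era5/train",   # directory of yearly HDF5 files
    stats_dir="/data/era5/stats",  # holds global_means.npy / global_std.npy
    channels=[0, 1, 2],            # which ERA5 variables to load
    num_steps=1,                   # one output timestep per sample
    batch_size=2,
)

for data in datapipe:
    invar = data[0]["invar"]    # input fields
    outvar = data[0]["outvar"]  # target fields, one per output timestep
    break
```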
- class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.climate.climate.ClimateDaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, dt: float, start_year: int, num_samples_per_year: int, latlon: ndarray, variables: Optional[List[str]] = None, aux_variables: List[Union[str, Callable]] = (), batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1, backend_kwargs: Optional[dict] = None)[source]
Bases: ABC
DALI Source for lazy-loading the HDF5/NetCDF4 climate files
- Parameters
data_paths (Iterable[str]) – Paths of the files where the climate data is stored
num_samples (int) – Total number of training samples
channels (Iterable[int]) – List representing which climate variables to load
num_steps (int) – Number of timesteps to load
stride (int) – Number of steps between input and output variables
dt (float, optional) – Time in hours between each timestep in the dataset, by default 6 hr
start_year (int, optional) – Start year of dataset, by default 1980
num_samples_per_year (int) – Number of samples randomly taken from each year
variables (Union[List[str], None], optional for HDF5 files, mandatory for NetCDF4 files) – List of named variables to load. Variables will be read in the order specified by this parameter.
aux_variables (Union[Mapping[str, Callable], None], optional) – A dictionary mapping strings to callables that accept arguments (timestamps: numpy.ndarray, latlon: numpy.ndarray). These define any auxiliary variables returned from this source.
batch_size (int, optional) – Batch size, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see: https://docs.nvidia.com/deeplearning/dali/archives/dali_1_13_0/user-guide/docs/examples/general/data_loading/parallel_external_source.html
- class modulus.datapipes.climate.climate.ClimateDataSourceSpec(data_dir: str, name: Optional[str] = None, file_type: str = 'hdf5', stats_files: Optional[Mapping[str, str]] = None, metadata_path: Optional[str] = None, channels: Optional[List[int]] = None, variables: Optional[List[str]] = None, use_cos_zenith: bool = False, aux_variables: Optional[Mapping[str, Callable]] = None, num_steps: int = 1, stride: int = 1, backend_kwargs: Optional[dict] = None)[source]
Bases: object
A data source specification for ClimateDatapipe.
HDF5 files should contain a variable named fields: a tensor of shape (num_timesteps, num_channels, height, width) containing the climate data. The order of the channels should match the order of the channels in the statistics files. The statistics files should be .npy files with the shape (1, num_channels, 1, 1). The names of the variables are found in the metadata file located at metadata_path.
NetCDF4 files should contain a variable of shape (num_timesteps, height, width) for each variable they provide. Only the variables listed in variables will be loaded.
- Parameters
data_dir (str) – Directory where climate data is stored
name (Union[str, None], optional) – The name that is used to label datapipe outputs from this source. If None, the datapipe uses the number of the source in sequential order.
file_type (str) – Type of files to read, supported values are “hdf5” (default) and “netcdf4”
stats_files (Union[Mapping[str, str], None], optional) – NumPy files containing data statistics for normalization. Supports either a channels format, in which case the dict should contain the keys “mean” and “std”, or a named-variable format, in which case the dict should contain the key “norm”. If None, no normalization will be used, by default None
metadata_path (Union[str, None], optional for NetCDF4, required for HDF5) – Path to the metadata JSON file for the dataset (usually called data.json).
channels (Union[List[int], None], optional) – Defines which climate variables to load; if None, all channels in the HDF5 file are used, by default None
variables (Union[List[str], None], optional for HDF5 files, mandatory for NetCDF4 files) – List of named variables to load. Variables will be read in the order specified by this parameter. Must be used for NetCDF4 files; supported for HDF5 files, in which case it overrides channels.
use_cos_zenith (bool, optional) – If True, the cosine zenith angles corresponding to the coordinates of this data source will be produced, default False
aux_variables (Union[Mapping[str, Callable], None], optional) – A dictionary mapping strings to callables that accept arguments (timestamps: numpy.ndarray, latlon: numpy.ndarray). These define any auxiliary variables returned from this source.
num_steps (int, optional) – Number of timesteps to return, by default 1
stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data at every 6 hours, a stride 1 = 6 hour delta t and stride 2 = 12 hours delta t, by default 1
- dimensions_compatible(other) → bool[source]
Basic sanity check to test whether two ClimateDataSourceSpec instances are compatible.
- parse_dataset_files(num_samples_per_year: Optional[int] = None, patch_size: Optional[int] = None) → None[source]
Parses the data directory for valid files and determines training samples
- Parameters
num_samples_per_year (int, optional) – Number of samples taken from each year. If None, all will be used, by default None
patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None
- Raises
ValueError – If the channels specified or the number of samples per year is not valid
- class modulus.datapipes.climate.climate.ClimateDatapipe(sources: Iterable[ClimateDataSourceSpec], batch_size: int = 1, dt: float = 6.0, start_year: int = 1980, latlon_bounds: Tuple[Tuple[float, float], Tuple[float, float]] = ((90, -90), (0, 360)), crop_window: Optional[Tuple[Tuple[float, float], Tuple[float, float]]] = None, invariants: Optional[Mapping[str, Callable]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]
Bases: Datapipe
A Climate DALI data pipeline. This pipeline loads data from HDF5/NetCDF4 files. It can also return additional data such as the solar zenith angle for each time step. Additionally, it normalizes the data if a statistics file is provided. The pipeline returns a dictionary with the following structure, where {name} indicates the name of the data source provided:
- state_seq-{name}: Tensors of shape (batch_size, num_steps, num_channels, height, width). This sequence is drawn from the data file and normalized if a statistics file is provided.
- timestamps-{name}: Tensors of shape (batch_size, num_steps), containing timestamps for each timestep in the sequence.
- {aux_variable}-{name}: Tensors of shape (batch_size, num_steps, aux_channels, height, width), containing the auxiliary variables returned by each data source.
- cos_zenith-{name}: Tensors of shape (batch_size, num_steps, 1, height, width), containing the cosine of the solar zenith angle if specified.
- {invariant_name}: Tensors of shape (batch_size, invariant_channels, height, width), containing the time-invariant data (depending only on spatial coordinates) returned by the datapipe. These can include e.g. land-sea mask and geopotential/surface elevation.
To use this data pipeline, your data directory must be structured as follows:
data_dir
├── 1980.h5
├── 1981.h5
├── 1982.h5
├── ...
└── 2020.h5
The files are assumed to have no metadata, such as timestamps. Because of this, it is important to specify the dt and start_year parameters so that the pipeline can compute the correct timestamp for each timestep. These timestamps are then used to compute the cosine of the solar zenith angle, if specified.
- Parameters
sources (Iterable[ClimateDataSourceSpec]) – A list of data source specifications defining the sources for the climate variables
batch_size (int, optional) – Batch size, by default 1
dt (float, optional) – Time in hours between each timestep in the dataset, by default 6 hr
start_year (int, optional) – Start year of dataset, by default 1980
latlon_bounds (Tuple[Tuple[float, float], Tuple[float, float]], optional) – Bounds of latitude and longitude in the data, in the format ((lat_start, lat_end), (lon_start, lon_end)). By default ((90, -90), (0, 360)).
crop_window (Union[Tuple[Tuple[float, float], Tuple[float, float]], None], optional) – The window to crop the data to, in the format ((i0,i1), (j0,j1)) where the first spatial dimension will be cropped to i0:i1 and the second to j0:j1. If not given, all data will be used.
invariants (Mapping[str,Callable], optional) – Specifies the time-invariant data (for example latitude and longitude) included in the data samples. Should be a dict where the keys are the names of the invariants and the values are the corresponding functions. The functions need to accept an argument of the shape (2, data_shape[0], data_shape[1]) where the first dimension contains latitude and longitude in degrees and the other dimensions corresponding to the shape of data in the data files. For example, invariants={“trig_latlon”: invariants.LatLon()} will include the sin/cos of lat/lon in the output.
num_samples_per_year (int, optional) – Number of samples taken from each year. If None, all will be used, by default None
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
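The sketch below combines a ClimateDataSourceSpec with a ClimateDatapipe. Paths and channel indices are hypothetical placeholders, and the output keys use the default sequential source name (“0”) described above.

```python
# Sketch: one HDF5 climate source feeding a ClimateDatapipe.
from modulus.datapipes.climate.climate import (
    ClimateDatapipe,
    ClimateDataSourceSpec,
)

spec = ClimateDataSourceSpec(
    data_dir="/data/climate/train",
    file_type="hdf5",
    metadata_path="/data/climate/data.json",  # required for HDF5 sources
    channels=[0, 1],
    use_cos_zenith=True,  # also produce cos of the solar zenith angle
    num_steps=2,
)

datapipe = ClimateDatapipe(
    sources=[spec],
    batch_size=1,
    dt=6.0,           # hours between timesteps, used to build timestamps
    start_year=1980,  # needed since the files carry no time metadata
)

for data in datapipe:
    state = data[0]["state_seq-0"]    # (batch, num_steps, channels, H, W)
    zenith = data[0]["cos_zenith-0"]  # (batch, num_steps, 1, H, W)
    break
```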
- class modulus.datapipes.climate.climate.ClimateHDF5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, dt: float, start_year: int, num_samples_per_year: int, latlon: ndarray, variables: Optional[List[str]] = None, aux_variables: List[Union[str, Callable]] = (), batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1, backend_kwargs: Optional[dict] = None)[source]
Bases: ClimateDaliExternalSource
DALI source for reading HDF5 formatted climate data files.
- class modulus.datapipes.climate.climate.ClimateNetCDF4DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, dt: float, start_year: int, num_samples_per_year: int, latlon: ndarray, variables: Optional[List[str]] = None, aux_variables: List[Union[str, Callable]] = (), batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1, backend_kwargs: Optional[dict] = None)[source]
Bases: ClimateDaliExternalSource
DALI source for reading NetCDF4 formatted climate data files.
- class modulus.datapipes.climate.climate.MetaData(name: str = 'Climate', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.climate.synthetic.SyntheticWeatherDataLoader(*args, **kwargs)[source]
Bases: DataLoader
This custom DataLoader initializes the SyntheticWeatherDataset with given arguments.
- class modulus.datapipes.climate.synthetic.SyntheticWeatherDataset(channels: List[int], num_samples_per_year: int, num_steps: int, device: str | torch.device = 'cuda', grid_size: Tuple[int, int] = (721, 1440), base_temp: float = 15, amplitude: float = 10, noise_level: float = 2, **kwargs: Any)[source]
Bases: Dataset
A dataset for generating synthetic temperature data on a latitude-longitude grid for multiple atmospheric layers.
- Parameters
channels (list) – List of channels representing different atmospheric layers.
num_samples_per_year (int) – Total number of days to simulate per year.
num_steps (int) – Number of consecutive days in each training sample.
grid_size (tuple) – Latitude by longitude dimensions of the temperature grid.
base_temp (float) – Base temperature around which variations are simulated.
amplitude (float) – Amplitude of the sinusoidal temperature variation.
noise_level (float) – Standard deviation of the noise added to temperature data.
**kwargs – Additional keyword arguments for advanced configurations.
- generate_data(num_days: int, num_channels: int, grid_size: Tuple[int, int], base_temp: float, amplitude: float, noise_level: float) → ndarray[source]
Generates synthetic temperature data over a specified number of days for multiple atmospheric layers.
- Parameters
num_days (int) – Number of days to generate data for.
num_channels (int) – Number of channels representing different layers.
grid_size (tuple) – Grid size (latitude, longitude).
base_temp (float) – Base mean temperature for the data.
amplitude (float) – Amplitude of temperature variations.
noise_level (float) – Noise level to add stochasticity to the temperature.
- Returns
A 4D array of temperature values across days, channels, latitudes, and longitudes.
- Return type
numpy.ndarray
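Since the data is purely synthetic, the dataset can be instantiated without any files on disk. A short, hedged example follows; the argument values are illustrative only, and the exact structure of each returned sample is not specified here.

```python
# Generate synthetic daily temperature fields on a 0.25-degree grid.
from modulus.datapipes.climate.synthetic import SyntheticWeatherDataset

dataset = SyntheticWeatherDataset(
    channels=[0, 1, 2, 3],     # four synthetic atmospheric layers
    num_samples_per_year=365,  # one simulated day per sample
    num_steps=2,               # consecutive days per training sample
    grid_size=(721, 1440),     # latitude x longitude
    base_temp=15.0,
    amplitude=10.0,
    noise_level=2.0,
)

sample = dataset[0]  # first training sample
```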
- class modulus.datapipes.healpix.timeseries_dataset.MetaData(name: str = 'TimeSeries', auto_device: bool = False, cuda_graphs: bool = False, ddp_sharding: bool = False)[source]
Bases: DatapipeMetaData
Metadata for this datapipe
- class modulus.datapipes.healpix.timeseries_dataset.TimeSeriesDataset(dataset: Dataset, scaling: Optional[DictConfig] = None, input_time_dim: int = 1, output_time_dim: int = 1, data_time_step: Union[int, str] = '3h', time_step: Union[int, str] = '6h', gap: Optional[Union[int, str]] = None, batch_size: int = 32, drop_last: bool = False, add_insolation: bool = False, forecast_init_times: Optional[Sequence] = None, meta: DatapipeMetaData = MetaData(name='TimeSeries', auto_device=False, cuda_graphs=False, ddp_sharding=False))[source]
Bases: Dataset, Datapipe
Dataset for sampling from continuous time-series data, compatible with PyTorch data loading.
- get_constants()[source]
Returns the constants used in this dataset
- Returns
The constants used in this dataset, or None if there are no constants
- Return type
np.ndarray
- class modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset(data_dir: str, split: str = 'train', num_samples: int = 10, invar_keys: Iterable[str] = ('pos', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'), outvar_keys: Iterable[str] = ('p', 'wallShearStress'), normalize_keys: Iterable[str] = ('p', 'wallShearStress', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'), normalization_bound: Tuple[float, float] = (-1.0, 1.0), force_reload: bool = False, name: str = 'dataset', verbose: bool = False, compute_drag: bool = False, num_workers: Optional[int] = None)[source]
Bases: DGLDataset, Datapipe
In-memory Ahmed body Dataset
- Parameters
data_dir (str) – The directory where the data is stored.
split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.
num_samples (int, optional) – The number of samples to use, by default 10.
invar_keys (Iterable[str], optional) – The input node features to consider. Default includes ‘pos’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
outvar_keys (Iterable[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.
normalize_keys (Iterable[str], optional) – The features to normalize. Default includes ‘p’, ‘wallShearStress’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
normalization_bound (Tuple[float, float], optional) – The lower and upper bounds for normalization. Default is (-1, 1).
force_reload (bool, optional) – If True, forces a reload of the data, by default False.
name (str, optional) – The name of the dataset, by default ‘dataset’.
verbose (bool, optional) – If True, enables verbose mode, by default False.
compute_drag (bool, optional) – If True, also returns the coefficient, mesh areas, and normals that are required for computing the drag coefficient, by default False.
num_workers (int, optional) – Number of dataset pre-loading workers. If None, will be chosen automatically.
- add_edge_features() → List[DGLGraph][source]
Add relative displacement and displacement norm as edge features for each graph in the list of graphs. The calculations are done using the ‘pos’ attribute in the node data of each graph. The resulting edge features are stored in the ‘x’ attribute in the edge data of each graph.
This method will modify the list of graphs in-place.
- Returns
The list of graphs with updated edge features.
- Return type
List[dgl.DGLGraph]
- create_graph(index: int, file_path: str, info_path: str) → None[source]
Creates a graph from a VTP file.
This method is used in parallel loading of graphs.
- Return type
Tuple containing the graph index, the graph, and optionally the coefficient, normal, and area values.
- denormalize(pred, gt, device) → Tuple[Tensor, Tensor][source]
Denormalize the graph node data.
- Parameters
pred (Tensor) – Normalized prediction
gt (Tensor) – Normalized ground truth
device (Any) – The device
- Returns
Denormalized prediction and ground truth
- Return type
Tuple[Tensor, Tensor]
- normalize_edge() → List[DGLGraph][source]
Normalize edge data ‘x’ in each graph in the list of graphs.
- Returns
The list of graphs with normalized edge data ‘x’.
- Return type
List[dgl.DGLGraph]
- normalize_node() → List[DGLGraph][source]
Normalize node data in each graph in the list of graphs.
- Returns
The list of graphs with normalized and concatenated node data.
- Return type
List[dgl.DGLGraph]
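Because AhmedBodyDataset is a DGLDataset, it can be paired with DGL's standard graph data loader. A hedged sketch follows; the data directory is a hypothetical placeholder.

```python
# Sketch: iterate over Ahmed body graphs with DGL's GraphDataLoader.
from dgl.dataloading import GraphDataLoader
from modulus.datapipes.gnn.ahmed_body_dataset import AhmedBodyDataset

dataset = AhmedBodyDataset(
    data_dir="/data/ahmed_body",  # placeholder path
    split="train",
    num_samples=10,
)

loader = GraphDataLoader(dataset, batch_size=1, shuffle=True)
for graph in loader:
    edge_features = graph.edata["x"]  # displacement + norm, see add_edge_features
    break
```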
- class modulus.datapipes.gnn.ahmed_body_dataset.FileInfo(velocity: float, reynolds_number: float, length: float, width: float, height: float, ground_clearance: float, slant_angle: float, fillet_radius: float)[source]
Bases: object
VTP file info storage.
- class modulus.datapipes.gnn.ahmed_body_dataset.MetaData(name: str = 'AhmedBody', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.gnn.drivaernet_dataset.DrivAerNetDataset(data_dir: str | pathlib.Path, split: str = 'train', num_samples: int = 10, coeff_filename: str = 'AeroCoefficients_DrivAerNet_FilteredCorrected.csv', invar_keys: Iterable[str] = ('pos',), outvar_keys: Iterable[str] = ('p', 'wallShearStress'), normalize_keys: Iterable[str] = ('p', 'wallShearStress'), cache_dir: str | pathlib.Path = './cache/', force_reload: bool = False, name: str = 'dataset', verbose: bool = False, **kwargs)[source]
Bases: DGLDataset, Datapipe
DrivAerNet dataset.
Note: DrivAerNetDataset does not use the default DGLDataset caching functionality (such as has_cache and download), since that is invoked during the __init__ call and takes a lot of time. Instead, DrivAerNetDataset caches graphs in the __getitem__ call, thus avoiding a long initialization delay.
- Parameters
data_dir (str) – The directory where the data is stored.
split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.
num_samples (int, optional) – The number of samples to use, by default 10.
coeff_filename (str, optional) – DrivAerNet aerodynamic coefficients file name, by default ‘AeroCoefficients_DrivAerNet_FilteredCorrected.csv’.
invar_keys (Iterable[str], optional) – The input node features to consider. Default includes ‘pos’.
outvar_keys (Iterable[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.
normalize_keys (Iterable[str], optional) – The features to normalize. Default includes ‘p’ and ‘wallShearStress’.
cache_dir (str, optional) – Path to the cache directory to store graphs in DGL format for fast loading. Default is ./cache/.
force_reload (bool, optional) – If True, forces a reload of the data, by default False.
name (str, optional) – The name of the dataset, by default ‘dataset’.
verbose (bool, optional) – If True, enables verbose mode, by default False.
- denormalize(pred: Tensor, gt: Tensor, device: device) → tuple[torch.Tensor, torch.Tensor][source]
Denormalizes the inputs using previously collected statistics.
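DrivAerNetDataset is used much like AhmedBodyDataset, with the addition of the on-disk graph cache described in the note above. A hedged sketch, with data_dir and cache_dir as placeholders:

```python
# Sketch: DrivAerNet dataset with on-demand graph caching.
from modulus.datapipes.gnn.drivaernet_dataset import DrivAerNetDataset

dataset = DrivAerNetDataset(
    data_dir="/data/drivaernet",  # placeholder path
    split="train",
    num_samples=10,
    cache_dir="./cache/",  # graphs are cached in __getitem__, not __init__
)

sample = dataset[0]  # first access builds and caches the DGL graph
```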
- class modulus.datapipes.gnn.drivaernet_dataset.MetaData(name: str = 'DrivAerNet', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.gnn.stokes_dataset.StokesDataset(data_dir, split='train', num_samples=10, invar_keys=['pos', 'marker'], outvar_keys=['u', 'v', 'p'], normalize_keys=['u', 'v', 'p'], force_reload=False, name='dataset', verbose=False)[source]
Bases: DGLDataset
In-memory Stokes flow Dataset
- Parameters
data_dir (str) – The directory where the data is stored.
split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.
num_samples (int, optional) – The number of samples to use, by default 10.
invar_keys (List[str], optional) – The input node features to consider. Default includes ‘pos’ and ‘marker’
outvar_keys (List[str], optional) – The output features to consider. Default includes ‘u’, ‘v’, and ‘p’.
normalize_keys (List[str], optional) – The features to normalize. Default includes ‘u’, ‘v’, and ‘p’.
force_reload (bool, optional) – If True, forces a reload of the data, by default False.
name (str, optional) – The name of the dataset, by default ‘dataset’.
verbose (bool, optional) – If True, enables verbose mode, by default False.
- add_edge_features()[source]
Adds relative displacement and displacement norm as edge features.
- static denormalize(invar, mu, std)[source]
Denormalizes a tensor using the given mean and standard deviation.
- normalize_edge()[source]
Normalizes edge features.
- normalize_node()[source]
Normalizes node features.
- modulus.datapipes.gnn.utils.load_json(file: str) → Dict[str, Tensor][source]
Loads a JSON file into a dictionary of PyTorch tensors.
- Parameters
file (str) – Path to the JSON file.
- Returns
Dictionary where each value is a PyTorch tensor.
- Return type
Dict[str, torch.Tensor]
- modulus.datapipes.gnn.utils.read_vtp_file(file_path: str) → Any[source]
Read a VTP file and return the polydata.
- Parameters
file_path (str) – Path to the VTP file.
- Returns
The polydata read from the VTP file.
- Return type
vtkPolyData
- modulus.datapipes.gnn.utils.save_json(var: Dict[str, Tensor], file: str) → None[source]
Saves a dictionary of tensors to a JSON file.
- Parameters
var (Dict[str, torch.Tensor]) – Dictionary where each value is a PyTorch tensor.
file (str) – Path to the output JSON file.
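The two JSON helpers round-trip a dictionary of tensors; a small illustrative example (the file name is arbitrary):

```python
# Round-trip a dict of tensors through JSON with the Modulus GNN utils.
import torch

from modulus.datapipes.gnn.utils import load_json, save_json

stats = {"mean": torch.tensor([0.0, 1.0]), "std": torch.tensor([1.0, 2.0])}
save_json(stats, "edge_stats.json")      # tensors serialized to JSON
restored = load_json("edge_stats.json")  # values restored as torch.Tensor
```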
- class modulus.datapipes.cae.mesh_datapipe.MeshDaliExternalSource(data_paths: Iterable[str], file_format: str, variables: List[str], num_samples: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]
Bases: object
DALI Source for lazy-loading with caching of mesh data
- Parameters
data_paths (Iterable[str]) – Paths of the files where the mesh data is stored
file_format (str) – File format of the data
variables (List[str]) – Ordered list of variables to be loaded from the files
num_samples (int) – Total number of training samples
batch_size (int, optional) – Batch size, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see: https://docs.nvidia.com/deeplearning/dali/archives/dali_1_13_0/user-guide/docs/examples/general/data_loading/parallel_external_source.html
- class modulus.datapipes.cae.mesh_datapipe.MeshDatapipe(data_dir: str, variables: List[str], num_variables: int, file_format: str = 'vtp', stats_dir: Optional[str] = None, batch_size: int = 1, num_samples: int = 1, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]
Bases: Datapipe
DALI data pipeline for mesh data
- Parameters
data_dir (str) – Directory where mesh data is stored
variables (List[str]) – Ordered list of variables to be loaded from the files
num_variables (int) – Number of variables to be loaded from the files
file_format (str, optional) – File format of the data, by default “vtp”. Supported formats: “vtp”, “vtu”, “cgns”.
stats_dir (Union[str, None], optional) – Directory where statistics are stored, by default None. If provided, the statistics are used to normalize the attributes.
batch_size (int, optional) – Batch size, by default 1
num_samples (int, optional) – Number of training samples, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
- load_statistics() → None[source]
Loads statistics from pre-computed NumPy files
The statistics files should be named global_means.npy and global_std.npy, have a shape of [1, C], and be located in stats_dir.
- Raises
IOError – If mean or std numpy files are not found
AssertionError – If loaded numpy arrays are not of correct size
- parse_dataset_files() → None[source]
Parses the data directory for valid files and determines training samples
- Raises
ValueError – If the channels specified or the number of samples per year is not valid
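A minimal usage sketch for MeshDatapipe; the directory, variable names, and channel count are hypothetical placeholders.

```python
# Sketch: load per-mesh variables from VTP files with MeshDatapipe.
from modulus.datapipes.cae.mesh_datapipe import MeshDatapipe

datapipe = MeshDatapipe(
    data_dir="/data/meshes/",            # placeholder path
    variables=["p", "wallShearStress"],  # ordered variables to read
    num_variables=4,                     # assumed total channel count
    file_format="vtp",
    batch_size=1,
)

for batch in datapipe:
    ...  # batch holds the requested mesh variables as tensors
```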
- class modulus.datapipes.cae.mesh_datapipe.MetaData(name: str = 'MeshDatapipe', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData