- class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]
Bases: object
DALI Source for lazy-loading the HDF5 ERA5 files
- Parameters
data_paths (Iterable[str]) – Paths to the HDF5 ERA5 files to load
num_samples (int) – Total number of training samples
channels (Iterable[int]) – List representing which ERA5 variables to load
num_steps (int) – Number of timesteps included in the output variables
stride (int) – Number of steps between input and output variables
num_samples_per_year (int) – Number of samples randomly taken from each year
batch_size (int, optional) – Batch size, by default 1
shuffle (bool, optional) – Shuffle dataset, by default True
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
Note: For more information about the DALI external source operator, see https://docs.nvidia.com/deeplearning/dali/archives/dali_1_13_0/user-guide/docs/examples/general/data_loading/parallel_external_source.html
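A minimal sketch of how a callable source such as this one might be wired into a DALI pipeline via fn.external_source; the num_outputs value and pipeline settings are assumptions for illustration, not the library's implementation:

```python
# Hedged sketch: hooking a callable sample source into a DALI pipeline.
# `num_outputs=2` assumes the source yields (invar, outvar) pairs.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def era5_pipeline(source):
    # batch=False: the source is queried one sample at a time
    invar, outvar = fn.external_source(source=source, num_outputs=2, batch=False)
    return invar, outvar
```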
- class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]
Bases: Datapipe
ERA5 DALI data pipeline for HDF5 files
- Parameters
data_dir (str) – Directory where ERA5 data is stored
stats_dir (Union[str, None], optional) – Directory containing the numpy statistics files used for normalization; if None, no normalization is applied, by default None
channels (Union[List[int], None], optional) – Defines which ERA5 variables to load; if None, all channels in the HDF5 file are used, by default None
batch_size (int, optional) – Batch size, by default 1
num_steps (int, optional) – Number of timesteps included in the output variables, by default 1
stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data at 6-hour intervals, a stride of 1 gives a 6-hour delta t and a stride of 2 gives a 12-hour delta t, by default 1
patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None
num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all samples will be used, by default None
shuffle (bool, optional) – Shuffle dataset, by default True
num_workers (int, optional) – Number of workers, by default 1
device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda
process_rank (int, optional) – Rank ID of local process, by default 0
world_size (int, optional) – Number of training processes, by default 1
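A hedged usage sketch of the datapipe; the directory layout, channel indices, and the "invar"/"outvar" batch keys are assumptions for illustration:

```python
# Usage sketch, assuming one HDF5 file per year under data_dir and
# global_means.npy / global_std.npy under stats_dir.
from modulus.datapipes.climate.era5_hdf5 import ERA5HDF5Datapipe

datapipe = ERA5HDF5Datapipe(
    data_dir="./data/train",
    stats_dir="./data/stats",
    channels=[0, 1, 2],  # first three ERA5 variables (assumption)
    batch_size=2,
    num_steps=1,
    stride=1,
    device="cuda",
)

for batch in datapipe:
    # Batch structure is an assumption; inspect `batch` for the actual keys.
    invar, outvar = batch[0]["invar"], batch[0]["outvar"]
    break
```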
- load_statistics() → None[source]
Loads ERA5 statistics from pre-computed numpy files
The statistics files should be named global_means.npy and global_std.npy, each with shape [1, C, 1, 1], and located in stats_dir.
- Raises
IOError – If mean or std numpy files are not found
AssertionError – If loaded numpy arrays are not of correct size
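Since load_statistics expects files of a specific name and shape, here is a minimal sketch of producing compatible files; the channel count C is an assumption, and the zero-mean/unit-std values are placeholders (identity normalization), not real statistics:

```python
# Create statistics files in the [1, C, 1, 1] shape described above.
import numpy as np

C = 3  # assumed number of ERA5 channels
np.save("./data/stats/global_means.npy", np.zeros((1, C, 1, 1), dtype=np.float32))
np.save("./data/stats/global_std.npy", np.ones((1, C, 1, 1), dtype=np.float32))
```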
- parse_dataset_files() → None[source]
Parses the data directory for valid HDF5 files and determines training samples
- Raises
ValueError – If the channels specified or the number of samples per year is not valid
- class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- class modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset(data_dir: str, split: str = 'train', num_samples: int = 10, invar_keys: List[str] = ['pos', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], outvar_keys: List[str] = ['p', 'wallShearStress'], normalize_keys: List[str] = ['p', 'wallShearStress', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], normalization_bound: Tuple[float, float] = (-1.0, 1.0), force_reload: bool = False, name: str = 'dataset', verbose: bool = False, compute_drag: bool = False)[source]
Bases: DGLDataset, Datapipe
In-memory Ahmed body Dataset
- Parameters
data_dir (str) – The directory where the data is stored.
split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.
num_samples (int, optional) – The number of samples to use, by default 10.
invar_keys (List[str], optional) – The input node features to consider. Default includes ‘pos’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
outvar_keys (List[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.
normalize_keys (List[str], optional) – The features to normalize. Default includes ‘p’, ‘wallShearStress’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.
normalization_bound (Tuple[float, float], optional) – The lower and upper bounds for normalization. Default is (-1, 1).
force_reload (bool, optional) – If True, forces a reload of the data, by default False.
name (str, optional) – The name of the dataset, by default ‘dataset’.
verbose (bool, optional) – If True, enables verbose mode, by default False.
compute_drag (bool, optional) – If True, also returns the coefficient along with the mesh areas and normals required for computing the drag coefficient, by default False.
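A hedged instantiation sketch; the data directory path is an assumption, and indexing is assumed to return a single dgl.DGLGraph when compute_drag is False:

```python
from modulus.datapipes.gnn.ahmed_body_dataset import AhmedBodyDataset

dataset = AhmedBodyDataset(
    data_dir="./ahmed_body_data",  # assumed location of the raw data
    split="train",
    num_samples=10,
)
graph = dataset[0]  # assumed: a dgl.DGLGraph with features in ndata/edata
```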
- add_edge_features() → List[DGLGraph][source]
Add relative displacement and displacement norm as edge features for each graph in the list of graphs. The calculations are done using the ‘pos’ attribute in the node data of each graph. The resulting edge features are stored in the ‘x’ attribute in the edge data of each graph.
This method will modify the list of graphs in-place.
- Returns
The list of graphs with updated edge features.
- Return type
List[dgl.DGLGraph]
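The computation described above can be sketched as follows, assuming node positions live under ndata[‘pos’]; the displacement sign convention is also an assumption, and this is an illustration rather than the library's code:

```python
import dgl
import torch

def add_edge_features_sketch(graph: dgl.DGLGraph) -> dgl.DGLGraph:
    src, dst = graph.edges()
    pos = graph.ndata["pos"]
    disp = pos[src] - pos[dst]                          # relative displacement (sign assumed)
    disp_norm = torch.norm(disp, dim=-1, keepdim=True)  # displacement norm
    graph.edata["x"] = torch.cat([disp, disp_norm], dim=-1)
    return graph
```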
- denormalize(pred, gt, device) → Tuple[Tensor, Tensor][source]
Denormalize the graph node data.
- Parameters
pred (Tensor) – Normalized prediction
gt (Tensor) – Normalized ground truth
device (Any) – The device
- Returns
Denormalized prediction and ground truth
- Return type
Tuple(Tensor, Tensor)
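Typical usage, grounded in the signature above:

```python
# Undo normalization on model outputs before computing physical-space metrics.
pred, gt = dataset.denormalize(pred, gt, device="cuda")
```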
- normalize_edge() → List[DGLGraph][source]
Normalize edge data ‘x’ in each graph in the list of graphs.
- Returns
The list of graphs with normalized edge data ‘x’.
- Return type
List[dgl.DGLGraph]
- normalize_node() → List[DGLGraph][source]
Normalize node data in each graph in the list of graphs.
- Returns
The list of graphs with normalized and concatenated node data.
- Return type
List[dgl.DGLGraph]
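One common scheme consistent with the normalization_bound parameter above is min-max scaling into (lo, hi); the dataset's actual scheme may differ, so treat this as an assumption-laden sketch:

```python
import torch

def scale_to_bound(x: torch.Tensor, lo: float = -1.0, hi: float = 1.0) -> torch.Tensor:
    # Map x linearly so its minimum lands at `lo` and its maximum at `hi`.
    x_min, x_max = x.min(), x.max()
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)
```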
- class modulus.datapipes.gnn.ahmed_body_dataset.MetaData(name: str = 'AhmedBody', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]
Bases: DatapipeMetaData
- modulus.datapipes.gnn.utils.load_json(file: str) → Dict[str, Tensor][source]
Loads a JSON file into a dictionary of PyTorch tensors.
- Parameters
file (str) – Path to the JSON file.
- Returns
Dictionary where each value is a PyTorch tensor.
- Return type
Dict[str, torch.Tensor]
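A minimal sketch of the described behavior, assuming the JSON file stores plain lists or scalars; not necessarily the library's implementation:

```python
import json
from typing import Dict

import torch

def load_json_sketch(file: str) -> Dict[str, torch.Tensor]:
    with open(file, "r") as f:
        data = json.load(f)
    # Convert each JSON value (list/scalar) into a tensor.
    return {key: torch.tensor(value) for key, value in data.items()}
```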
- modulus.datapipes.gnn.utils.read_vtp_file(file_path: str) → Any[source]
Read a VTP file and return the polydata.
- Parameters
file_path (str) – Path to the VTP file.
- Returns
The polydata read from the VTP file.
- Return type
vtkPolyData
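A sketch using the standard VTK reader for .vtp files, which matches the described behavior but is not necessarily the library's implementation:

```python
import vtk

def read_vtp_sketch(file_path: str):
    reader = vtk.vtkXMLPolyDataReader()  # standard reader for .vtp files
    reader.SetFileName(file_path)
    reader.Update()
    return reader.GetOutput()  # vtkPolyData
```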
- modulus.datapipes.gnn.utils.save_json(var: Dict[str, Tensor], file: str) → None[source]
Saves a dictionary of tensors to a JSON file.
- Parameters
var (Dict[str, torch.Tensor]) – Dictionary where each value is a PyTorch tensor.
file (str) – Path to the output JSON file.
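The inverse of load_json above can be sketched by serializing each tensor as a list; this assumes all values are CPU-movable tensors and is an illustration, not the library's code:

```python
import json
from typing import Dict

import torch

def save_json_sketch(var: Dict[str, torch.Tensor], file: str) -> None:
    # Tensors are not JSON-serializable; convert each to a nested list first.
    serializable = {key: value.cpu().tolist() for key, value in var.items()}
    with open(file, "w") as f:
        json.dump(serializable, f)
```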