Modulus Datapipes

class modulus.datapipes.climate.era5_hdf5.ERA5DaliExternalSource(data_paths: Iterable[str], num_samples: int, channels: Iterable[int], num_steps: int, stride: int, num_samples_per_year: int, batch_size: int = 1, shuffle: bool = True, process_rank: int = 0, world_size: int = 1)[source]

Bases: object

DALI Source for lazy-loading the HDF5 ERA5 files

Parameters
  • data_paths (Iterable[str]) – Paths to the HDF5 files where the ERA5 data is stored

  • num_samples (int) – Total number of training samples

  • channels (Iterable[int]) – List representing which ERA5 variables to load

  • stride (int) – Number of steps between input and output variables

  • num_steps (int) – Number of timesteps included in the output variables

  • num_samples_per_year (int) – Number of samples randomly taken from each year

  • batch_size (int, optional) – Batch size, by default 1

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1
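
This class is consumed internally by ERA5HDF5Datapipe. As a rough, minimal sketch (not the library's actual wiring), an external source of this kind can be plugged into a DALI pipeline through fn.external_source; the two-output assumption and the pipeline arguments below are illustrative placeholders only:

    from nvidia.dali import fn, pipeline_def

    # Minimal sketch: assumes the source yields an (input, output) pair per sample.
    # Pipeline arguments here are placeholders, not values used by Modulus.
    @pipeline_def(batch_size=1, num_threads=1, device_id=0)
    def era5_pipe(source):
        invar, outvar = fn.external_source(source=source, num_outputs=2)
        return invar, outvar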

class modulus.datapipes.climate.era5_hdf5.ERA5HDF5Datapipe(data_dir: str, stats_dir: Optional[str] = None, channels: Optional[List[int]] = None, batch_size: int = 1, num_steps: int = 1, stride: int = 1, patch_size: Optional[Union[Tuple[int, int], int]] = None, num_samples_per_year: Optional[int] = None, shuffle: bool = True, num_workers: int = 1, device: Union[str, device] = 'cuda', process_rank: int = 0, world_size: int = 1)[source]

Bases: Datapipe

ERA5 DALI data pipeline for HDF5 files

Parameters
  • data_dir (str) – Directory where ERA5 data is stored

  • stats_dir (Union[str, None], optional) – Directory containing the pre-computed statistics numpy files used for normalization; if None, no normalization is applied, by default None

  • channels (Union[List[int], None], optional) – Defines which ERA5 variables to load; if None, all channels in the HDF5 file are used, by default None

  • batch_size (int, optional) – Batch size, by default 1

  • stride (int, optional) – Number of steps between input and output variables. For example, if the dataset contains data at 6 hour intervals, a stride of 1 corresponds to a 6 hour delta-t and a stride of 2 to a 12 hour delta-t, by default 1

  • num_steps (int, optional) – Number of timesteps included in the output variables, by default 1

  • patch_size (Union[Tuple[int, int], int, None], optional) – If specified, crops input and output variables so image dimensions are divisible by patch_size, by default None

  • num_samples_per_year (int, optional) – Number of samples randomly taken from each year. If None, all samples are used, by default None

  • shuffle (bool, optional) – Shuffle dataset, by default True

  • num_workers (int, optional) – Number of workers, by default 1

  • device (Union[str, torch.device], optional) – Device for DALI pipeline to run on, by default cuda

  • process_rank (int, optional) – Rank ID of local process, by default 0

  • world_size (int, optional) – Number of training processes, by default 1
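
A minimal usage sketch follows; the directory paths are placeholders, and the per-sample key names ("invar"/"outvar") are assumptions rather than something stated in this reference:

    from modulus.datapipes.climate.era5_hdf5 import ERA5HDF5Datapipe

    # Hypothetical paths and channel selection.
    datapipe = ERA5HDF5Datapipe(
        data_dir="./era5/train",
        stats_dir="./era5/stats",  # expects global_means.npy / global_std.npy
        channels=[0, 1, 2],
        num_steps=1,
        stride=1,
        batch_size=1,
    )
    for batch in datapipe:
        invar = batch[0]["invar"]    # assumed sample key names
        outvar = batch[0]["outvar"]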

load_statistics() → None[source]

Loads ERA5 statistics from pre-computed numpy files

The statistics files should be named global_means.npy and global_std.npy, have a shape of [1, C, 1, 1], and be located in stats_dir.

Raises
  • IOError – If mean or std numpy files are not found

  • AssertionError – If loaded numpy arrays are not of correct size
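
For reference, statistics files in the expected format can be produced with plain numpy; the values below are placeholders, only the file names, location, and [1, C, 1, 1] shape are prescribed above:

    import os
    import numpy as np

    C = 3  # hypothetical channel count; must match the channels being loaded
    os.makedirs("stats", exist_ok=True)
    np.save("stats/global_means.npy", np.zeros((1, C, 1, 1), dtype=np.float32))
    np.save("stats/global_std.npy", np.ones((1, C, 1, 1), dtype=np.float32))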

parse_dataset_files() → None[source]

Parses the data directory for valid HDF5 files and determines training samples

Raises

ValueError – If the specified channels or the number of samples per year is not valid

class modulus.datapipes.climate.era5_hdf5.MetaData(name: str = 'ERA5HDF5', auto_device: bool = True, cuda_graphs: bool = True, ddp_sharding: bool = True)[source]

Bases: DatapipeMetaData

class modulus.datapipes.gnn.ahmed_body_dataset.AhmedBodyDataset(data_dir: str, split: str = 'train', num_samples: int = 10, invar_keys: List[str] = ['pos', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], outvar_keys: List[str] = ['p', 'wallShearStress'], normalize_keys: List[str] = ['p', 'wallShearStress', 'velocity', 'reynolds_number', 'length', 'width', 'height', 'ground_clearance', 'slant_angle', 'fillet_radius'], normalization_bound: Tuple[float, float] = (-1.0, 1.0), force_reload: bool = False, name: str = 'dataset', verbose: bool = False, compute_drag: bool = False)[source]

Bases: DGLDataset, Datapipe

In-memory Ahmed body Dataset

Parameters
  • data_dir (str) – The directory where the data is stored.

  • split (str, optional) – The dataset split. Can be ‘train’, ‘validation’, or ‘test’, by default ‘train’.

  • num_samples (int, optional) – The number of samples to use, by default 10.

  • invar_keys (List[str], optional) – The input node features to consider. Default includes ‘pos’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.

  • outvar_keys (List[str], optional) – The output features to consider. Default includes ‘p’ and ‘wallShearStress’.

  • normalize_keys (List[str], optional) – The features to normalize. Default includes ‘p’, ‘wallShearStress’, ‘velocity’, ‘reynolds_number’, ‘length’, ‘width’, ‘height’, ‘ground_clearance’, ‘slant_angle’, and ‘fillet_radius’.

  • normalization_bound (Tuple[float, float], optional) – The lower and upper bounds for normalization. Default is (-1, 1).

  • force_reload (bool, optional) – If True, forces a reload of the data, by default False.

  • name (str, optional) – The name of the dataset, by default ‘dataset’.

  • verbose (bool, optional) – If True, enables verbose mode, by default False.

  • compute_drag (bool, optional) – If True, also returns the coefficients, mesh areas, and normals required for computing the drag coefficient, by default False.
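
Since the dataset is a DGLDataset, it can be wrapped in a standard DGL graph dataloader. A minimal sketch, assuming each sample yields a single graph and that input and output node features are stored under the ‘x’ and ‘y’ node data keys (assumptions, not stated above):

    from dgl.dataloading import GraphDataLoader
    from modulus.datapipes.gnn.ahmed_body_dataset import AhmedBodyDataset

    # Hypothetical data path and split.
    dataset = AhmedBodyDataset(data_dir="./ahmed_body", split="train", num_samples=10)
    loader = GraphDataLoader(dataset, batch_size=1, shuffle=True)
    for graph in loader:
        node_inputs = graph.ndata["x"]   # assumed key for concatenated input features
        node_targets = graph.ndata["y"]  # assumed key for 'p' and 'wallShearStress'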

add_edge_features() → List[DGLGraph][source]

Add relative displacement and displacement norm as edge features for each graph in the list of graphs. The calculations are done using the ‘pos’ attribute in the node data of each graph. The resulting edge features are stored in the ‘x’ attribute in the edge data of each graph.

This method will modify the list of graphs in-place.

Returns

The list of graphs with updated edge features.

Return type

List[dgl.DGLGraph]

denormalize(pred, gt, device) → Tuple[Tensor, Tensor][source]

Denormalize the graph node data.

Parameters
  • pred (Tensor) – Normalized prediction

  • gt (Tensor) – Normalized ground truth

  • device (Any) – The device

Returns

Denormalized prediction and ground truth

Return type

Tuple[Tensor, Tensor]
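
A one-line usage sketch, assuming pred and gt are normalized tensors shaped like the dataset’s output node features:

    # Hypothetical tensors; device may be a string or a torch.device.
    pred_phys, gt_phys = dataset.denormalize(pred, gt, device="cuda")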

normalize_edge() → List[DGLGraph][source]

Normalize edge data ‘x’ in each graph in the list of graphs.

Returns

The list of graphs with normalized edge data ‘x’.

Return type

List[dgl.DGLGraph]

normalize_node() → List[DGLGraph][source]

Normalize node data in each graph in the list of graphs.

Returns

The list of graphs with normalized and concatenated node data.

Return type

List[dgl.DGLGraph]
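
Taken together, the preprocessing helpers above can be chained; the sketch below assumes they are meant to be called manually (this reference does not state whether the constructor already invokes them):

    # Hypothetical preprocessing order: edge features must exist before
    # they can be normalized.
    graphs = dataset.add_edge_features()  # adds edge data 'x' from node 'pos'
    graphs = dataset.normalize_node()     # normalizes and concatenates node data
    graphs = dataset.normalize_edge()     # normalizes edge data 'x'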

class modulus.datapipes.gnn.ahmed_body_dataset.MetaData(name: str = 'AhmedBody', auto_device: bool = True, cuda_graphs: bool = False, ddp_sharding: bool = True)[source]

Bases: DatapipeMetaData

modulus.datapipes.gnn.utils.load_json(file: str) → Dict[str, Tensor][source]

Loads a JSON file into a dictionary of PyTorch tensors.

Parameters

file (str) – Path to the JSON file.

Returns

Dictionary where each value is a PyTorch tensor.

Return type

Dict[str, torch.Tensor]

modulus.datapipes.gnn.utils.read_vtp_file(file_path: str) → Any[source]

Read a VTP file and return the polydata.

Parameters

file_path (str) – Path to the VTP file.

Returns

The polydata read from the VTP file.

Return type

vtkPolyData
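
The returned vtkPolyData can be inspected with the standard VTK API; the file name and point-array name below are hypothetical:

    from modulus.datapipes.gnn.utils import read_vtp_file

    polydata = read_vtp_file("case_1.vtp")            # hypothetical file
    num_points = polydata.GetNumberOfPoints()         # vtkPolyData API
    pressure = polydata.GetPointData().GetArray("p")  # assumes a point array named 'p'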

modulus.datapipes.gnn.utils.save_json(var: Dict[str, Tensor], file: str) → None[source]

Saves a dictionary of tensors to a JSON file.

Parameters
  • var (Dict[str, torch.Tensor]) – Dictionary where each value is a PyTorch tensor.

  • file (str) – Path to the output JSON file.
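
save_json and load_json form a simple round trip for dictionaries of tensors (e.g. normalization statistics); the file name and contents below are hypothetical:

    import torch
    from modulus.datapipes.gnn.utils import load_json, save_json

    stats = {"mean": torch.zeros(4), "std": torch.ones(4)}  # hypothetical values
    save_json(stats, "node_stats.json")
    restored = load_json("node_stats.json")  # values come back as torch.Tensor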
