Benchmark Datapipes
Benchmark datapipes are a unique class of datapipes: they generate data on the fly rather than reading from disk. This makes them highly portable, well suited for testing new applications against known problems without worrying about IO, and generally useful for development.
The benchmark datapipes are v1 datapipes targeted at specific datasets. They are maintained but not actively developed.
- class physicsnemo.datapipes.benchmarks.darcy.Darcy2D(
- resolution: int = 256,
- batch_size: int = 64,
- nr_permeability_freq: int = 5,
- max_permeability: float = 2.0,
- min_permeability: float = 0.5,
- max_iterations: int = 30000,
- convergence_threshold: float = 1e-06,
- iterations_per_convergence_check: int = 1000,
- nr_multigrids: int = 4,
- normaliser: Dict[str, Tuple[float, float]] | None = None,
- device: str | device = 'cuda',
- )
Bases:
Datapipe

2D Darcy flow benchmark problem datapipe.
This datapipe continuously generates solutions to the 2D Darcy equation with variable permeability. All samples are generated on the fly and are meant as a benchmark problem for testing data-driven models. The permeability field is drawn from a random Fourier series and thresholded to give a piecewise constant function. The solution is obtained using a GPU-enabled multi-grid Jacobi iterative method.
- Parameters:
resolution (int, optional) – Resolution to run simulation at, by default 256
batch_size (int, optional) – Batch size of simulations, by default 64
nr_permeability_freq (int, optional) – Number of frequencies to use for generating the random permeability field. Higher values give higher-frequency permeability fields, by default 5
max_permeability (float, optional) – Max permeability, by default 2.0
min_permeability (float, optional) – Min permeability, by default 0.5
max_iterations (int, optional) – Maximum iterations to use for each multi-grid, by default 30000
convergence_threshold (float, optional) – Solver L-Infinity convergence threshold, by default 1e-6
iterations_per_convergence_check (int, optional) – Number of Jacobi iterations to run before checking convergence, by default 1000
nr_multigrids (int, optional) – Number of multi-grid levels, by default 4
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys permeability and darcy. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe places data, by default “cuda”
- Raises:
ValueError – Incompatible multi-grid and resolution settings
- class physicsnemo.datapipes.benchmarks.darcy.MetaData(
- name: str = 'Darcy2D',
- auto_device: bool = True,
- cuda_graphs: bool = True,
- ddp_sharding: bool = False,
- )
Bases:
DatapipeMetaData
The Darcy2D datapipe provides data loading and preprocessing utilities for 2D Darcy flow simulations. It handles permeability fields and pressure solutions, supporting various boundary conditions and mesh resolutions.
import torch

from physicsnemo.datapipes.benchmarks.darcy import Darcy2D


def main():
    # Create a datapipe for Darcy flow simulation data
    datapipe = Darcy2D(
        batch_size=32,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )

    # Iterate through the datapipe
    for batch in datapipe:
        # batch contains input features and target values
        input_features = batch["permeability"]
        target_values = batch["darcy"]
        # Use the data for training or inference
        ...


if __name__ == "__main__":
    main()
- class physicsnemo.datapipes.benchmarks.kelvin_helmholtz.KelvinHelmholtz2D(
- resolution: int = 512,
- batch_size: int = 16,
- seq_length: int = 8,
- nr_perturbation_freq: int = 5,
- perturbation_range: float = 0.1,
- nr_snapshots: int = 256,
- iteration_per_snapshot: int = 32,
- gamma: float = 1.6666666666666667,
- normaliser: Dict[str, Tuple[float, float]] | None = None,
- device: str | device = 'cuda',
- )
Bases:
Datapipe

Kelvin-Helmholtz instability benchmark problem datapipe.
This datapipe continuously generates samples with random initial conditions. All samples are generated on the fly and are meant as a benchmark problem for testing data-driven models. Initial conditions are given in the form of small perturbations. The solution is obtained using a GPU-enabled finite volume method.
- Parameters:
resolution (int, optional) – Resolution to run simulation at, by default 512
batch_size (int, optional) – Batch size of simulations, by default 16
seq_length (int, optional) – Sequence length of output samples, by default 8
nr_perturbation_freq (int, optional) – Number of frequencies to use for generating random initial perturbations, by default 5
perturbation_range (float, optional) – Range to use for random perturbations. This value will be the max amplitude of the initial perturbation, by default 0.1
nr_snapshots (int, optional) – Number of simulation snapshots to generate. This controls how long the simulation is run for, by default 256
iteration_per_snapshot (int, optional) – Number of finite volume steps to take between each snapshot. Each step size is fixed as the smallest possible value that satisfies the Courant-Friedrichs-Lewy condition, by default 32
gamma (float, optional) – Heat capacity ratio, by default 5.0/3.0
normaliser (Union[Dict[str, Tuple[float, float]], None], optional) – Dictionary with keys density, velocity, and pressure. The values for these keys are two floats corresponding to mean and std (mean, std).
device (Union[str, torch.device], optional) – Device on which the datapipe places data, by default “cuda”
- class physicsnemo.datapipes.benchmarks.kelvin_helmholtz.MetaData(
- name: str = 'KelvinHelmholtz2D',
- auto_device: bool = True,
- cuda_graphs: bool = True,
- ddp_sharding: bool = False,
- )
Bases:
DatapipeMetaData
The KelvinHelmholtz2D datapipe manages data for Kelvin-Helmholtz instability simulations, including velocity fields and density distributions. It supports 2D simulation data with various random initial conditions.