Fully Connected and MLP Layers#
- class physicsnemo.nn.module.fully_connected_layers.Conv1dFCLayer(*args, **kwargs)[source]#
Bases: ConvFCLayer

Channel-wise fully connected layer using 1D convolutions.
Applies a 1x1 convolution followed by an optional activation function. This is equivalent to a fully connected layer operating on the channel dimension of 1D signals.
- Parameters:
in_features (int) – Number of input channels \(C_{in}\).
out_features (int) – Number of output channels \(C_{out}\).
activation_fn (Union[nn.Module, Callable[[Tensor], Tensor], None], optional, default=None) – Activation function to use. Can be None for no activation.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
weight_norm (bool, optional, default=False) – Weight normalization (not currently supported; raises NotImplementedError).
- Forward:
x (torch.Tensor) – Input tensor of shape \((B, C_{in}, L)\) where \(B\) is batch size and \(L\) is sequence length.
- Outputs:
torch.Tensor – Output tensor of shape \((B, C_{out}, L)\).
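The equivalence the docstring states — a kernel-size-1 convolution acting as a fully connected layer over the channel dimension — can be checked in plain torch (this sketch uses `torch.nn` directly, not the physicsnemo class):

```python
import torch
import torch.nn as nn

# A kernel-1 Conv1d is a fully connected layer applied independently at each
# position of a (B, C_in, L) signal. Copy the conv weights into a Linear and
# compare the two paths.
conv = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=1)
fc = nn.Linear(8, 4)
fc.weight.data = conv.weight.data.squeeze(-1)   # (4, 8, 1) -> (4, 8)
fc.bias.data = conv.bias.data

x = torch.randn(2, 8, 16)                        # (B, C_in, L)
y_conv = conv(x)                                 # (B, C_out, L)
y_fc = fc(x.transpose(1, 2)).transpose(1, 2)     # same op via Linear
assert torch.allclose(y_conv, y_fc, atol=1e-5)
```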
- class physicsnemo.nn.module.fully_connected_layers.Conv2dFCLayer(*args, **kwargs)[source]#
Bases: ConvFCLayer

Channel-wise fully connected layer using 2D convolutions.
Applies a 1x1 convolution followed by an optional activation function. This is equivalent to a fully connected layer operating on the channel dimension of 2D images.
- Parameters:
in_channels (int) – Number of input channels \(C_{in}\).
out_channels (int) – Number of output channels \(C_{out}\).
activation_fn (Union[nn.Module, Callable[[Tensor], Tensor], None], optional, default=None) – Activation function to use. Can be None for no activation.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
- Forward:
x (torch.Tensor) – Input tensor of shape \((B, C_{in}, H, W)\) where \(B\) is batch size, \(H\) is height, and \(W\) is width.
- Outputs:
torch.Tensor – Output tensor of shape \((B, C_{out}, H, W)\).
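In the 2D case the same equivalence holds per pixel: a 1x1 conv applies one weight matrix to every spatial location's channel vector. A plain-torch check (not the physicsnemo class itself):

```python
import torch
import torch.nn as nn

# A 1x1 Conv2d treats each pixel's channel vector as the input of one shared
# fully connected layer. Reproduce it with an einsum over the channel axis.
conv = nn.Conv2d(3, 8, kernel_size=1)
x = torch.randn(2, 3, 5, 7)                      # (B, C_in, H, W)
y = conv(x)                                      # (B, C_out, H, W)

W = conv.weight.squeeze(-1).squeeze(-1)          # (8, 3, 1, 1) -> (8, 3)
y_fc = torch.einsum('oi,bihw->bohw', W, x) + conv.bias.view(1, -1, 1, 1)
assert torch.allclose(y, y_fc, atol=1e-5)
```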
- class physicsnemo.nn.module.fully_connected_layers.Conv3dFCLayer(*args, **kwargs)[source]#
Bases: ConvFCLayer

Channel-wise fully connected layer using 3D convolutions.
Applies a 1x1x1 convolution followed by an optional activation function. This is equivalent to a fully connected layer operating on the channel dimension of 3D volumes.
- Parameters:
in_channels (int) – Number of input channels \(C_{in}\).
out_channels (int) – Number of output channels \(C_{out}\).
activation_fn (Union[nn.Module, Callable[[Tensor], Tensor], None], optional, default=None) – Activation function to use. Can be None for no activation.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
- Forward:
x (torch.Tensor) – Input tensor of shape \((B, C_{in}, D, H, W)\) where \(B\) is batch size, \(D\) is depth, \(H\) is height, and \(W\) is width.
- Outputs:
torch.Tensor – Output tensor of shape \((B, C_{out}, D, H, W)\).
- class physicsnemo.nn.module.fully_connected_layers.ConvFCLayer(*args, **kwargs)[source]#
Bases: Module

Base class for 1x1 convolutional layers acting on image channels.
This abstract base class provides activation handling for convolutional layers that act like fully connected layers over the channel dimension.
- Parameters:
activation_fn (Union[nn.Module, Callable[[Tensor], Tensor], None], optional, default=None) – Activation function to use. Can be None for no activation.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
- Forward:
x (torch.Tensor) – Input tensor (shape depends on subclass).
- Outputs:
torch.Tensor – Output tensor with activation applied.
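The `activation_par` argument shared by these layers follows the adaptive-activation pattern: a learnable scalar rescales the pre-activation before the nonlinearity. A hedged plain-torch sketch — the exact placement of the scaling inside `ConvFCLayer` is an assumption here:

```python
import torch
import torch.nn as nn

# Adaptive-activation sketch: activation_par is a learnable scale applied to
# the layer output before the activation function (assumed semantics).
conv = nn.Conv2d(4, 4, kernel_size=1)
activation_par = nn.Parameter(torch.tensor(1.0))  # learnable scale
x = torch.randn(2, 4, 8, 8)
y = torch.tanh(activation_par * conv(x))          # act(scale * pre-activation)
assert y.shape == (2, 4, 8, 8)
```

Because `activation_par` is an `nn.Parameter`, the scale is trained jointly with the convolution weights.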
- class physicsnemo.nn.module.fully_connected_layers.ConvNdFCLayer(*args, **kwargs)[source]#
Bases: ConvFCLayer

Channel-wise fully connected layer with N-dimensional convolutions.
Applies a kernel-1 convolution followed by an optional activation function. For dimensions 1, 2, or 3, use Conv1dFCLayer, Conv2dFCLayer, or Conv3dFCLayer instead for better performance.
- Parameters:
in_channels (int) – Number of input channels \(C_{in}\).
out_channels (int) – Number of output channels \(C_{out}\).
activation_fn (Union[nn.Module, None], optional, default=None) – Activation function to use. Can be None for no activation.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
- Forward:
x (torch.Tensor) – Input tensor of shape \((B, C_{in}, *spatial)\) where \(B\) is batch size and \(*spatial\) represents arbitrary spatial dimensions.
- Outputs:
torch.Tensor – Output tensor of shape \((B, C_{out}, *spatial)\).
- class physicsnemo.nn.module.fully_connected_layers.ConvNdKernel1Layer(*args, **kwargs)[source]#
Bases: Module

Kernel-1 convolution layer for N-dimensional inputs.
Implements a 1x1 convolution by reshaping the input to 1D, applying a 1D convolution, and reshaping back. For dimensions 1, 2, or 3, use the specialized layer classes for better performance.
- Parameters:
in_channels (int) – Number of input channels \(C_{in}\).
out_channels (int) – Number of output channels \(C_{out}\).
- Forward:
x (torch.Tensor) – Input tensor of shape \((B, C_{in}, *spatial)\) where \(B\) is batch size and \(*spatial\) represents arbitrary spatial dimensions.
- Outputs:
torch.Tensor – Output tensor of shape \((B, C_{out}, *spatial)\).
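The reshape trick the docstring describes can be sketched in plain torch: flatten all spatial dimensions onto one axis, apply a kernel-1 Conv1d, and restore the original spatial shape.

```python
import torch
import torch.nn as nn

# N-D kernel-1 convolution via reshaping: valid because a kernel-1 conv only
# mixes channels, never spatial positions, so the spatial layout is irrelevant.
conv = nn.Conv1d(6, 3, kernel_size=1)
x = torch.randn(2, 6, 4, 5, 2, 3)                  # (B, C_in, *spatial)
spatial = x.shape[2:]
y = conv(x.flatten(2)).reshape(2, 3, *spatial)     # (B, C_out, *spatial)
assert y.shape == (2, 3, 4, 5, 2, 3)
```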
- class physicsnemo.nn.module.fully_connected_layers.FCLayer(*args, **kwargs)[source]#
Bases: Module

Densely connected neural network layer.
A single fully connected layer with optional activation, weight normalization, and weight factorization.
- Parameters:
in_features (int) – Size of input features \(D_{in}\).
out_features (int) – Size of output features \(D_{out}\).
activation_fn (Union[nn.Module, Callable[[Tensor], Tensor], None], optional, default=None) – Activation function to use. Can be None for no activation.
weight_norm (bool, optional, default=False) – Applies weight normalization to the layer.
weight_fact (bool, optional, default=False) – Applies weight factorization to the layer.
activation_par (Union[nn.Parameter, None], optional, default=None) – Learnable scaling parameter for adaptive activations.
- Forward:
x (torch.Tensor) – Input tensor of shape \((*, D_{in})\) where \(*\) denotes any number of leading batch dimensions.
- Outputs:
torch.Tensor – Output tensor of shape \((*, D_{out})\).
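The `weight_norm` option refers to the standard reparameterization of a weight matrix into a magnitude and a direction. A manual plain-torch sketch of that factorization (not the physicsnemo implementation):

```python
import torch

# Weight normalization reparameterizes W as g * V / ||V||: the direction V and
# the per-output-row magnitude g are learned as separate parameters.
V = torch.randn(8, 16)                      # direction parameter
g = torch.ones(8, 1)                        # magnitude parameter
W = g * V / V.norm(dim=1, keepdim=True)     # effective weight
# every row of W now has norm exactly g
assert torch.allclose(W.norm(dim=1), g.squeeze(-1), atol=1e-5)
```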
- class physicsnemo.nn.module.fully_connected_layers.Linear(*args, **kwargs)[source]#
Bases: Module

Fully connected (dense) layer with customizable initialization.
The layer's weights and biases can be initialized using custom strategies like "kaiming_normal", and scaled by init_weight and init_bias.
- Parameters:
in_features (int) – Size of each input sample \(D_{in}\).
out_features (int) – Size of each output sample \(D_{out}\).
bias (bool) – If True, adds a learnable bias to the output. If False, the layer will not learn an additive bias.
init_mode (str, optional, default="kaiming_normal") – The initialization mode for weights and biases. Supported modes are: "xavier_uniform", "xavier_normal", "kaiming_uniform", "kaiming_normal".
init_weight (float, optional, default=1) – A scaling factor to multiply with the initialized weights.
init_bias (float, optional, default=0) – A scaling factor to multiply with the initialized biases.
amp_mode (bool, optional, default=False) – Whether mixed-precision (AMP) training is enabled.
- Forward:
x (torch.Tensor) – Input tensor of shape \((*, D_{in})\) where \(*\) denotes any number of leading batch dimensions.
- Outputs:
torch.Tensor – Output tensor of shape \((*, D_{out})\).
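The documented init-then-scale scheme can be sketched with plain torch initializers; the exact scaling semantics of `init_weight` and `init_bias` are assumed here from the parameter descriptions:

```python
import torch
import torch.nn as nn

# Sketch of "initialize, then scale": kaiming_normal init (the default
# init_mode), weights multiplied by init_weight=1, biases by init_bias=0.
layer = nn.Linear(32, 16)
nn.init.kaiming_normal_(layer.weight)
with torch.no_grad():
    layer.weight.mul_(1.0)   # init_weight = 1 (default) -> unchanged
    layer.bias.mul_(0.0)     # init_bias = 0 (default) -> zero bias

x = torch.randn(2, 32)
assert layer(x).shape == (2, 16)
assert layer.bias.abs().sum() == 0
```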
Multi-layer perceptron (MLP) module with optional Transformer Engine support.
- class physicsnemo.nn.module.mlp_layers.Mlp(
- in_features: int,
- hidden_features: int | list[int] | None = None,
- out_features: int | None = None,
- act_layer: Module | type[Module] | str = torch.nn.GELU,
- drop: float = 0.0,
- final_dropout: bool = True,
- bias: bool = True,
- use_batchnorm: bool = False,
- spectral_norm: bool = False,
- use_te: bool = False,
)[source]#
Bases: Module

Multi-layer perceptron with configurable architecture.
Supports arbitrary depth, dropout, batch normalization, spectral normalization, bias control, and optional Transformer Engine linear layers.
- Parameters:
in_features (int) – Number of input features.
hidden_features (int | list[int] | None, optional) – Hidden layer dimension(s). Can be:
- int: Single hidden layer with this dimension
- list[int]: Multiple hidden layers with the specified dimensions
- None: Single hidden layer with in_features dimension
Default is None.
out_features (int | None, optional) – Number of output features. If None, defaults to in_features. Default is None.
act_layer (nn.Module | type[nn.Module] | str, optional) – Activation function. Can be:
- str: Name of an activation (e.g., "gelu", "relu", "silu")
- type: Activation class to instantiate (e.g., nn.GELU)
- nn.Module: Pre-instantiated activation module
Default is nn.GELU.
drop (float, optional) – Dropout rate applied after each hidden layer. Default is 0.0.
final_dropout (bool, optional) – Whether to apply dropout after the final linear layer. Default is True.
bias (bool, optional) – Whether to include bias terms in the linear layers. Default is True.
use_batchnorm (bool, optional) – If True, applies BatchNorm1d after each linear layer (including the output layer). Default is False.
spectral_norm (bool, optional) – If True, applies spectral normalization to all linear layer weights, constraining the spectral norm to 1. Default is False.
use_te (bool, optional) – Whether to use Transformer Engine linear layers for optimized performance. Requires Transformer Engine to be installed. Default is False.
Examples
>>> import torch
>>> mlp = Mlp(in_features=64, hidden_features=128, out_features=32)
>>> x = torch.randn(2, 64)
>>> out = mlp(x)
>>> out.shape
torch.Size([2, 32])

>>> mlp = Mlp(in_features=64, hidden_features=[128, 256, 128], out_features=32)
>>> x = torch.randn(2, 64)
>>> out = mlp(x)
>>> out.shape
torch.Size([2, 32])

>>> # With batch normalization and spectral normalization
>>> mlp = Mlp(
...     in_features=10,
...     hidden_features=[32, 16],
...     out_features=4,
...     use_batchnorm=True,
...     spectral_norm=True,
... )
>>> mlp(torch.randn(8, 10)).shape
torch.Size([8, 4])