Activations

class physicsnemo.nn.module.activations.CappedGELU(cap_value=1.0, **kwargs)

Bases: Module

Implements a GELU with capped maximum value.

Example

>>> import torch
>>> import physicsnemo.nn
>>> capped_gelu_func = physicsnemo.nn.CappedGELU()
>>> input = torch.Tensor([[-2,-1],[0,1],[2,3]])
>>> capped_gelu_func(input)
tensor([[-0.0455, -0.1587],
        [ 0.0000,  0.8413],
        [ 1.0000,  1.0000]])

forward(inputs)

Apply GELU to the inputs and cap the result at cap_value.

Note

Call the Module instance rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.
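As a rough illustration of what "GELU with capped maximum value" means, the sketch below evaluates the exact (erf-based) GELU for a single float and clamps it from above. The helper name capped_gelu is hypothetical, and the assumption that the cap is a simple upper clamp is inferred from the example output rather than from the library source.

```python
import math


def capped_gelu(x: float, cap_value: float = 1.0) -> float:
    """Scalar reference: exact (erf-based) GELU, clamped from above at cap_value.

    Assumption: the cap acts as a plain upper bound on the GELU output.
    """
    gelu = 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    return min(gelu, cap_value)


# Same inputs as the CappedGELU example.
values = [capped_gelu(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
print([round(v, 4) for v in values])
```

Under these assumptions the printed values agree with the tensor in the example to four decimal places: inputs 2 and 3 exceed the cap and are clipped to 1.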

class physicsnemo.nn.module.activations.CappedLeakyReLU(cap_value=1.0, **kwargs)

Bases: Module

Implements a LeakyReLU with capped maximum value.

Example

>>> import torch
>>> import physicsnemo.nn
>>> capped_leakyReLU_func = physicsnemo.nn.CappedLeakyReLU()
>>> input = torch.Tensor([[-2,-1],[0,1],[2,3]])
>>> capped_leakyReLU_func(input)
tensor([[-0.0200, -0.0100],
        [ 0.0000,  1.0000],
        [ 1.0000,  1.0000]])

forward(inputs)

Apply LeakyReLU to the inputs and cap the result at cap_value.

Note

Call the Module instance rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.
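The example output can be reproduced with a scalar sketch. The helper name capped_leaky_relu is hypothetical, and the negative slope of 0.01 is an assumption inferred from the example values (-2 maps to -0.02); the library may expose the slope through its keyword arguments.

```python
def capped_leaky_relu(
    x: float, cap_value: float = 1.0, negative_slope: float = 0.01
) -> float:
    """Scalar reference: leaky ReLU followed by an upper clamp at cap_value.

    Assumption: negative_slope=0.01, chosen to match the documented example.
    """
    leaky = x if x >= 0.0 else negative_slope * x
    return min(leaky, cap_value)


# Same inputs as the CappedLeakyReLU example.
values = [capped_leaky_relu(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
print(values)
```

Negative inputs keep a small non-zero slope, which is what distinguishes this from a capped plain ReLU.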

class physicsnemo.nn.module.activations.Identity(*args: Any, **kwargs: Any)

Bases: Module

Identity activation function

Dummy function for removing activations from a model

Example

>>> import torch
>>> import physicsnemo.nn
>>> idnt_func = physicsnemo.nn.Identity()
>>> input = torch.randn(2, 2)
>>> output = idnt_func(input)
>>> torch.allclose(input, output)
True

forward(x: Tensor) -> Tensor

Return the input tensor unchanged.

Note

Call the Module instance rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.

class physicsnemo.nn.module.activations.SquarePlus

Bases: Module

Squareplus activation

Note

Reference: arXiv preprint arXiv:2112.11687

Example

>>> import torch
>>> import physicsnemo.nn
>>> sqr_func = physicsnemo.nn.SquarePlus()
>>> input = torch.Tensor([[1,2],[3,4]])
>>> sqr_func(input)
tensor([[1.6180, 2.4142],
        [3.3028, 4.2361]])

forward(x: Tensor) -> Tensor

Apply the Squareplus activation to the input tensor.

Note

Call the Module instance rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.
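For reference, Squareplus is defined in the cited preprint as (x + sqrt(x^2 + b)) / 2. The sketch below is a scalar version; the helper name squareplus is hypothetical, and b = 4 is an assumption chosen because it reproduces the example output.

```python
import math


def squareplus(x: float, b: float = 4.0) -> float:
    """Scalar reference for Squareplus: 0.5 * (x + sqrt(x^2 + b)).

    Assumption: b=4, which matches the values in the documented example.
    """
    return 0.5 * (x + math.sqrt(x * x + b))


# Same inputs as the SquarePlus example.
values = [squareplus(x) for x in (1.0, 2.0, 3.0, 4.0)]
print([round(v, 4) for v in values])
```

Like softplus, the function is smooth, strictly positive, and approaches the identity for large positive inputs, but it needs only a square root rather than exp and log.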

class physicsnemo.nn.module.activations.Stan(out_features: int = 1)

Bases: Module

Self-scalable Tanh (Stan) for 1D Tensors

Parameters:

out_features (int, optional) – Number of features, by default 1

Note

Reference: Gnanasambandam, R., Shen, B., Chung, J., Yue, X., et al. Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. arXiv preprint arXiv:2204.12589, 2022.

Example

>>> import torch
>>> import physicsnemo.nn
>>> stan_func = physicsnemo.nn.Stan(out_features=1)
>>> input = torch.Tensor([[0],[1],[2]])
>>> stan_func(input)
tensor([[0.0000],
        [1.5232],
        [2.8921]], grad_fn=<MulBackward0>)

forward(x: Tensor) -> Tensor

Apply the Stan activation to the input tensor.

Note

Call the Module instance rather than forward() directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.
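The Stan activation from the cited paper has the form tanh(x) + beta * x * tanh(x), where beta is a learnable per-feature scale. The scalar sketch below uses a hypothetical helper name, stan, and assumes beta is initialised to 1, which is consistent with the example output (the grad_fn in that output suggests the scale is a trainable parameter).

```python
import math


def stan(x: float, beta: float = 1.0) -> float:
    """Scalar reference for Stan: (1 + beta * x) * tanh(x).

    Assumption: beta=1.0 at initialisation, matching the documented example.
    """
    return (1.0 + beta * x) * math.tanh(x)


# Same inputs as the Stan example.
values = [stan(x) for x in (0.0, 1.0, 2.0)]
print([round(v, 4) for v in values])
```

With beta = 0 the function reduces to a plain tanh, so the learnable scale controls how far the activation departs from it.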

physicsnemo.nn.module.activations.get_activation(activation: str) -> Module

Returns an activation function given a string

Parameters:

activation (str) – String identifier for the desired activation function

Return type:

Activation function

Raises:

KeyError – If the specified activation function is not found in the dictionary
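The lookup-by-name pattern described here can be sketched with a plain dictionary. Everything in this block is hypothetical: the registry name, its keys, and the helper lookup_activation are illustrative only and do not reflect the identifiers that get_activation actually accepts.

```python
import math
from typing import Callable, Dict

# Hypothetical registry for illustration; not PhysicsNeMo's actual table.
_ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "relu": lambda x: max(x, 0.0),
    "tanh": math.tanh,
}


def lookup_activation(name: str) -> Callable[[float], float]:
    """Return the callable registered under name, or raise KeyError."""
    try:
        return _ACTIVATIONS[name]
    except KeyError:
        available = ", ".join(sorted(_ACTIVATIONS))
        raise KeyError(
            f"Activation {name!r} not found. Available: {available}"
        ) from None


relu = lookup_activation("relu")
print(relu(-3.0), relu(2.0))
```

Raising KeyError with the list of available names keeps the failure mode the same as a raw dictionary lookup while making a misspelled identifier easier to diagnose.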

class physicsnemo.nn.module.fused_silu.FusedSiLU(*args, **kwargs)

Bases: Function

Fused SiLU activation implementation using nvfuser for a custom fused backward with activation recomputation

static backward(ctx, grad_output)

Backward method for SiLU activation

Parameters:
  • ctx – torch context

  • grad_output – output gradients

Return type:

input gradients

static forward(ctx, x)

Forward method for SiLU activation

Parameters:
  • ctx – torch context

  • x – input tensor

Return type:

output activation
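The idea of a backward pass with activation recomputation can be illustrated on scalars. This is a sketch of the mathematics only, not the nvfuser kernels: it assumes that "recomputation" means only the input x is kept from the forward pass and the sigmoid is evaluated again in backward. The helper names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu_forward(x: float) -> float:
    """SiLU(x) = x * sigmoid(x). Only x would need to be saved for backward."""
    return x * sigmoid(x)


def silu_backward(x: float, grad_output: float) -> float:
    """Input gradient: grad_output * d/dx SiLU(x).

    The sigmoid is recomputed from x here instead of being stored
    during the forward pass (assumed meaning of recomputation).
    """
    s = sigmoid(x)
    return grad_output * s * (1.0 + x * (1.0 - s))


# Compare against a central finite difference at one point.
x0, h = 0.3, 1e-6
numeric = (silu_forward(x0 + h) - silu_forward(x0 - h)) / (2.0 * h)
print(abs(numeric - silu_backward(x0, 1.0)) < 1e-8)
```

Recomputing the sigmoid trades a small amount of extra arithmetic for lower memory use, since no intermediate activation has to be held between forward and backward.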

class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_1(*args, **kwargs)

Bases: Function

Fused SiLU first derivative implementation using nvfuser with activation recomputation

static backward(ctx, grad_output)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, x)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass


@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_2(*args, **kwargs)

Bases: Function

Fused SiLU second derivative implementation using nvfuser with activation recomputation

static backward(ctx, grad_output)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, x)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass


@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_3(*args, **kwargs)

Bases: Function

Fused SiLU third derivative implementation using nvfuser with activation recomputation

static backward(ctx, grad_output)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, x)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass


@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

physicsnemo.nn.module.fused_silu.silu_backward_for(
fd: FusionDefinition,
dtype: dtype,
dim: int,
size: Size,
stride: Tuple[int, ...],
)

nvfuser frontend implementation of SiLU backward as a fused kernel with activation recomputation

Parameters:
  • fd (FusionDefinition) – nvFuser’s FusionDefinition class

  • dtype (torch.dtype) – Data type to use for the implementation

  • dim (int) – Dimension of the input tensor

  • size (torch.Size) – Size of the input tensor

  • stride (Tuple[int, ...]) – Stride of the input tensor

physicsnemo.nn.module.fused_silu.silu_double_backward_for(
fd: FusionDefinition,
dtype: dtype,
dim: int,
size: Size,
stride: Tuple[int, ...],
)

nvfuser frontend implementation of SiLU double backward as a fused kernel with activation recomputation

Parameters:
  • fd (FusionDefinition) – nvFuser’s FusionDefinition class

  • dtype (torch.dtype) – Data type to use for the implementation

  • dim (int) – Dimension of the input tensor

  • size (torch.Size) – Size of the input tensor

  • stride (Tuple[int, ...]) – Stride of the input tensor

physicsnemo.nn.module.fused_silu.silu_triple_backward_for(
fd: FusionDefinition,
dtype: dtype,
dim: int,
size: Size,
stride: Tuple[int, ...],
)

nvfuser frontend implementation of SiLU triple backward as a fused kernel with activation recomputation

Parameters:
  • fd (FusionDefinition) – nvFuser’s FusionDefinition class

  • dtype (torch.dtype) – Data type to use for the implementation

  • dim (int) – Dimension of the input tensor

  • size (torch.Size) – Size of the input tensor

  • stride (Tuple[int, ...]) – Stride of the input tensor
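The backward, double backward, and triple backward kernels correspond to the first three derivatives of SiLU. The closed forms below are a scalar sketch derived by hand from SiLU(x) = x * sigmoid(x), not a transcription of the nvfuser definitions; the helper names are hypothetical. Each derivative is written in terms of s = sigmoid(x) recomputed from x.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu_deriv_1(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


def silu_deriv_2(x: float) -> float:
    s = sigmoid(x)
    g = s * (1.0 - s)  # derivative of sigmoid
    return g * (2.0 + x * (1.0 - 2.0 * s))


def silu_deriv_3(x: float) -> float:
    s = sigmoid(x)
    g = s * (1.0 - s)
    t = 1.0 - 2.0 * s
    return g * (3.0 * t + x * (t * t - 2.0 * g))


def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Each closed form should match the finite difference of the one before it.
x0 = 0.7
print(abs(central_diff(silu_deriv_1, x0) - silu_deriv_2(x0)) < 1e-6)
print(abs(central_diff(silu_deriv_2, x0) - silu_deriv_3(x0)) < 1e-6)
```

Higher-order derivatives like these matter when the network output is differentiated with respect to its inputs, as in physics-informed losses, and the result is then differentiated again during training.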

class physicsnemo.nn.module.gumbel_softmax.GumbelSoftmax(tau: float = 1.0, learnable: bool = False)

Bases: Module

Gumbel-Softmax module for differentiable categorical sampling.

This module wraps the gumbel_softmax() function as an nn.Module, allowing it to be used as a layer in neural network architectures.

The Gumbel-Softmax trick provides a differentiable approximation to sampling from a categorical distribution, enabling end-to-end training of models with discrete latent variables.

Parameters:
  • tau (float, optional, default=1.0) – Initial temperature parameter. Lower values make the distribution more concentrated (closer to one-hot). Can be modified after initialization.

  • learnable (bool, optional, default=False) – If True, the temperature parameter is registered as a learnable nn.Parameter. If False, it is a fixed buffer.

Examples

>>> import torch
>>> from physicsnemo.nn.module.gumbel_softmax import GumbelSoftmax
>>> gs = GumbelSoftmax(tau=0.5)
>>> logits = torch.randn(2, 10)  # batch_size=2, num_categories=10
>>> probs = gs(logits)
>>> probs.shape
torch.Size([2, 10])
>>> torch.allclose(probs.sum(dim=-1), torch.ones(2))  # Each row sums to 1
True
>>> # With learnable temperature
>>> gs_learnable = GumbelSoftmax(tau=1.0, learnable=True)
>>> gs_learnable.tau.requires_grad
True

See also

gumbel_softmax()

Functional implementation of Gumbel-Softmax.

forward(
logits: Float[Tensor, '... num_categories'],
) -> Float[Tensor, '... num_categories']

Apply Gumbel-Softmax to input logits.

Parameters:

logits (torch.Tensor) – Input logits tensor of shape \((*, K)\) where \(K\) is the number of categories.

Returns:

Gumbel-Softmax output of the same shape as logits.

Return type:

torch.Tensor

physicsnemo.nn.module.gumbel_softmax.gumbel_softmax(
logits: Float[Tensor, '... num_categories'],
tau: Tensor | float = 1.0,
) -> Float[Tensor, '... num_categories']

Implementation of Gumbel Softmax from Transolver++.

Applies a differentiable approximation to sampling from a categorical distribution using the Gumbel-Softmax trick.

Original code: https://github.com/thuml/Transolver_plus/blob/main/models/Transolver_plus.py#L69

Parameters:
  • logits (torch.Tensor) – Input logits tensor of shape \((*, K)\) where \(K\) is the number of categories.

  • tau (torch.Tensor | float, optional, default=1.0) – Temperature parameter. Lower values make the distribution more concentrated.

Returns:

Gumbel-Softmax output of the same shape as logits.

Return type:

torch.Tensor
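The Gumbel-Softmax trick itself can be sketched without tensors: add independent Gumbel(0, 1) noise to each logit, divide by the temperature, and apply a softmax. The function below is a minimal pure-Python illustration for one vector of logits; the name gumbel_softmax_sketch and the rng argument are hypothetical, and the details of the library implementation (for example its numerical-stability constants) may differ.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax_sketch(
    logits: List[float], tau: float = 1.0, rng: Optional[random.Random] = None
) -> List[float]:
    """Soft categorical sample: softmax((logits + Gumbel noise) / tau)."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Guard against u == 0 so that log(u) stays finite.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.5, 1.0, -0.3, 2.0]
soft = gumbel_softmax_sketch(logits, tau=1.0, rng=random.Random(0))
sharp = gumbel_softmax_sketch(logits, tau=0.1, rng=random.Random(0))
print(abs(sum(soft) - 1.0) < 1e-9, max(sharp) >= max(soft))
```

Seeding both calls identically gives them the same noise, so the only difference is the temperature: the lower tau pushes the output closer to one-hot, as described for the tau parameter.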