Activations
- class physicsnemo.nn.module.activations.CappedGELU(cap_value=1.0, **kwargs)
Bases: Module
Implements a GELU with capped maximum value.
Example
>>> import torch
>>> import physicsnemo.nn
>>> capped_gelu_func = physicsnemo.nn.CappedGELU()
>>> input = torch.Tensor([[-2,-1],[0,1],[2,3]])
>>> capped_gelu_func(input)
tensor([[-0.0455, -0.1587],
        [ 0.0000,  0.8413],
        [ 1.0000,  1.0000]])
- forward(inputs)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
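The example output above can be reproduced by hand. The following is a minimal plain-Python sketch, not the library's implementation: it assumes the exact (erf-based) GELU followed by a clamp from above at cap_value, which is consistent with the printed values. The helper names gelu and capped_gelu are hypothetical.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def capped_gelu(x: float, cap_value: float = 1.0) -> float:
    """GELU clamped from above at cap_value (assumed behaviour)."""
    return min(gelu(x), cap_value)


# Same inputs as the documented example, rounded to four decimals.
values = [round(capped_gelu(x), 4) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
print(values)  # [-0.0455, -0.1587, 0.0, 0.8413, 1.0, 1.0]
```

Inputs whose GELU value exceeds cap_value (here 2 and 3) all map to the cap, so the output is flat in that region.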
- class physicsnemo.nn.module.activations.CappedLeakyReLU(cap_value=1.0, **kwargs)
Bases: Module
Implements a LeakyReLU with capped maximum value.
Example
>>> import torch
>>> import physicsnemo.nn
>>> capped_leakyReLU_func = physicsnemo.nn.CappedLeakyReLU()
>>> input = torch.Tensor([[-2,-1],[0,1],[2,3]])
>>> capped_leakyReLU_func(input)
tensor([[-0.0200, -0.0100],
        [ 0.0000,  1.0000],
        [ 1.0000,  1.0000]])
- forward(inputs)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
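As a rough illustration of the capped LeakyReLU arithmetic, here is a plain-Python sketch under stated assumptions: the negative slope of 0.01 is inferred from the example output (it is not documented in this section), and the helper name capped_leaky_relu is hypothetical.

```python
def capped_leaky_relu(
    x: float, cap_value: float = 1.0, negative_slope: float = 0.01
) -> float:
    """LeakyReLU followed by a clamp from above at cap_value (assumed behaviour)."""
    leaky = x if x >= 0.0 else negative_slope * x
    return min(leaky, cap_value)


# Same inputs as the documented example.
values = [capped_leaky_relu(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
print(values)  # [-0.02, -0.01, 0.0, 1.0, 1.0, 1.0]
```

Negative inputs are scaled by the small slope and are unaffected by the cap; only positive values above cap_value are clipped.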
- class physicsnemo.nn.module.activations.Identity(*args: Any, **kwargs: Any)
Bases: Module
Identity activation function.
Dummy function for removing activations from a model.
Example
>>> import torch
>>> import physicsnemo.nn
>>> idnt_func = physicsnemo.nn.Identity()
>>> input = torch.randn(2, 2)
>>> output = idnt_func(input)
>>> torch.allclose(input, output)
True
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class physicsnemo.nn.module.activations.SquarePlus
Bases: Module
Squareplus activation
Note
Reference: arXiv preprint arXiv:2112.11687
Example
>>> import torch
>>> import physicsnemo.nn
>>> sqr_func = physicsnemo.nn.SquarePlus()
>>> input = torch.Tensor([[1,2],[3,4]])
>>> sqr_func(input)
tensor([[1.6180, 2.4142],
        [3.3028, 4.2361]])
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
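The Squareplus values in the example can be checked with the closed form from the referenced preprint, 0.5 * (x + sqrt(x^2 + b)). The sketch below is plain Python and only illustrative: the value b = 4 is an assumption inferred from the printed output, and the helper name squareplus is hypothetical.

```python
import math


def squareplus(x: float, b: float = 4.0) -> float:
    """Squareplus: a smooth, strictly positive, ReLU-like function."""
    return 0.5 * (x + math.sqrt(x * x + b))


# Same inputs as the documented example, rounded to four decimals.
for x in (1.0, 2.0, 3.0, 4.0):
    print(x, round(squareplus(x), 4))
```

For large positive x the result approaches x, and for large negative x it approaches 0 from above, which is the ReLU-like behaviour the activation is designed to have.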
- class physicsnemo.nn.module.activations.Stan(out_features: int = 1)
Bases: Module
Self-scalable Tanh (Stan) for 1D Tensors
- Parameters:
out_features (int, optional) – Number of features, by default 1
Note
References: Gnanasambandam, Raghav and Shen, Bo and Chung, Jihoon and Yue, Xubo and others. Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. arXiv preprint arXiv:2204.12589, 2022.
Example
>>> import torch
>>> import physicsnemo.nn
>>> stan_func = physicsnemo.nn.Stan(out_features=1)
>>> input = torch.Tensor([[0],[1],[2]])
>>> stan_func(input)
tensor([[0.0000],
        [1.5232],
        [2.8921]], grad_fn=<MulBackward0>)
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
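The Stan example output is consistent with the form given in the referenced paper, tanh(x) + beta * x * tanh(x), with the learnable scale beta starting at 1. The following plain-Python sketch works on a single scalar and is only an illustration: the initial value beta = 1 is an assumption inferred from the printed output, and the helper name stan is hypothetical.

```python
import math


def stan(x: float, beta: float = 1.0) -> float:
    """Self-scalable tanh: (1 + beta * x) * tanh(x)."""
    return (1.0 + beta * x) * math.tanh(x)


# Same inputs as the documented example, rounded to four decimals.
for x in (0.0, 1.0, 2.0):
    print(x, round(stan(x), 4))
```

With beta = 0 the function reduces to a plain tanh, so the learnable scale controls how much of the unbounded x * tanh(x) term is mixed in.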
- physicsnemo.nn.module.activations.get_activation(activation: str) → Module
Returns an activation function given a string
- Parameters:
activation (str) – String identifier for the desired activation function
- Return type:
Activation function
- Raises:
KeyError – If the specified activation function is not found in the dictionary
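A string-to-activation lookup of this kind is commonly backed by a dictionary, which is why an unknown name surfaces as a KeyError. The sketch below shows that pattern in plain Python. It is not the library's code: the registry contents, the key names, and the function name lookup_activation are all hypothetical, since the supported identifiers are not listed in this section.

```python
import math
from typing import Callable, Dict

# Hypothetical registry; the real library's keys are not listed in this section.
_ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "relu": lambda x: max(x, 0.0),
    "tanh": math.tanh,
}


def lookup_activation(name: str) -> Callable[[float], float]:
    """Return the activation registered under name; raise KeyError if absent."""
    return _ACTIVATIONS[name]


relu = lookup_activation("relu")
print(relu(-1.0), relu(2.0))  # 0.0 2.0

try:
    lookup_activation("not_registered")
except KeyError as err:
    print("unknown activation:", err)
```

Callers that accept user-supplied names may want to catch the KeyError and report the valid options instead of letting it propagate.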
- class physicsnemo.nn.module.fused_silu.FusedSiLU(*args, **kwargs)
Bases: Function
Fused SiLU activation implementation using nvFuser for a custom fused backward with activation recomputation.
- class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_1(*args, **kwargs)
Bases: Function
Fused SiLU first derivative implementation using nvFuser with activation recomputation.
- static backward(ctx, grad_output)
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, x)
Define the forward of the custom autograd Function.
This function is to be overridden by all subclasses. There are two ways to define forward:
Usage 1 (Combined forward and ctx):
    @staticmethod
    def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
        pass
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
See combining-forward-context for more details.
Usage 2 (Separate forward and ctx):
    @staticmethod
    def forward(*args: Any, **kwargs: Any) -> Any:
        pass

    @staticmethod
    def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
        pass
The forward no longer accepts a ctx argument.
Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.
See extending-autograd for more details.
The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
- class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_2(*args, **kwargs)
Bases: Function
Fused SiLU second derivative implementation using nvFuser with activation recomputation.
- static backward(ctx, grad_output)
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, x)
Define the forward of the custom autograd Function.
This function is to be overridden by all subclasses. There are two ways to define forward:
Usage 1 (Combined forward and ctx):
    @staticmethod
    def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
        pass
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
See combining-forward-context for more details.
Usage 2 (Separate forward and ctx):
    @staticmethod
    def forward(*args: Any, **kwargs: Any) -> Any:
        pass

    @staticmethod
    def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
        pass
The forward no longer accepts a ctx argument.
Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.
See extending-autograd for more details.
The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
- class physicsnemo.nn.module.fused_silu.FusedSiLU_deriv_3(*args, **kwargs)
Bases: Function
Fused SiLU third derivative implementation using nvFuser with activation recomputation.
- static backward(ctx, grad_output)
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, x)
Define the forward of the custom autograd Function.
This function is to be overridden by all subclasses. There are two ways to define forward:
Usage 1 (Combined forward and ctx):
    @staticmethod
    def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
        pass
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
See combining-forward-context for more details.
Usage 2 (Separate forward and ctx):
    @staticmethod
    def forward(*args: Any, **kwargs: Any) -> Any:
        pass

    @staticmethod
    def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
        pass
The forward no longer accepts a ctx argument.
Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.
See extending-autograd for more details.
The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
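For orientation, the quantities that the FusedSiLU_deriv_1, FusedSiLU_deriv_2 and FusedSiLU_deriv_3 classes are described as computing can be written in closed form using only x and sigmoid(x). The plain-Python sketch below states those formulas for a scalar and checks each against a central finite difference. It illustrates the mathematics only and says nothing about how the nvFuser kernels are written; all function names are hypothetical.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    """SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)


def silu_d1(x: float) -> float:
    """First derivative: s * (1 + x * (1 - s))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


def silu_d2(x: float) -> float:
    """Second derivative: s * (1 - s) * (2 + x * (1 - 2s))."""
    s = sigmoid(x)
    return s * (1.0 - s) * (2.0 + x * (1.0 - 2.0 * s))


def silu_d3(x: float) -> float:
    """Third derivative: s(1-s) * ((1-2s) * (3 + x(1-2s)) - 2x * s(1-s))."""
    s = sigmoid(x)
    t = 1.0 - 2.0 * s
    return s * (1.0 - s) * (t * (3.0 + x * t) - 2.0 * x * s * (1.0 - s))


def central_diff(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Each analytic derivative should match the finite difference of the previous one.
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(central_diff(silu, x) - silu_d1(x)) < 1e-6
    assert abs(central_diff(silu_d1, x) - silu_d2(x)) < 1e-6
    assert abs(central_diff(silu_d2, x) - silu_d3(x)) < 1e-6
```

Because every derivative depends only on x and sigmoid(x), a backward pass can recompute sigmoid(x) from the saved input rather than storing intermediate activations, which is the general idea behind activation recomputation.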
- physicsnemo.nn.module.fused_silu.silu_backward_for(fd: FusionDefinition, dtype: dtype, dim: int, size: Size, stride: Tuple[int, ...])
nvFuser frontend implementation of SiLU backward as a fused kernel with activation recomputation.
- Parameters:
fd (FusionDefinition) – nvFuser’s FusionDefinition class
dtype (torch.dtype) – Data type to use for the implementation
dim (int) – Dimension of the input tensor
size (torch.Size) – Size of the input tensor
stride (Tuple[int, ...]) – Stride of the input tensor
- physicsnemo.nn.module.fused_silu.silu_double_backward_for(fd: FusionDefinition, dtype: dtype, dim: int, size: Size, stride: Tuple[int, ...])
nvFuser frontend implementation of SiLU double backward as a fused kernel with activation recomputation.
- Parameters:
fd (FusionDefinition) – nvFuser’s FusionDefinition class
dtype (torch.dtype) – Data type to use for the implementation
dim (int) – Dimension of the input tensor
size (torch.Size) – Size of the input tensor
stride (Tuple[int, ...]) – Stride of the input tensor
- physicsnemo.nn.module.fused_silu.silu_triple_backward_for(fd: FusionDefinition, dtype: dtype, dim: int, size: Size, stride: Tuple[int, ...])
nvFuser frontend implementation of SiLU triple backward as a fused kernel with activation recomputation.
- Parameters:
fd (FusionDefinition) – nvFuser’s FusionDefinition class
dtype (torch.dtype) – Data type to use for the implementation
dim (int) – Dimension of the input tensor
size (torch.Size) – Size of the input tensor
stride (Tuple[int, ...]) – Stride of the input tensor
- class physicsnemo.nn.module.gumbel_softmax.GumbelSoftmax(tau: float = 1.0, learnable: bool = False)
Bases: Module
Gumbel-Softmax module for differentiable categorical sampling.
This module wraps the gumbel_softmax() function as an nn.Module, allowing it to be used as a layer in neural network architectures.
The Gumbel-Softmax trick provides a differentiable approximation to sampling from a categorical distribution, enabling end-to-end training of models with discrete latent variables.
- Parameters:
tau (float, optional, default=1.0) – Initial temperature parameter. Lower values make the distribution more concentrated (closer to one-hot). Can be modified after initialization.
learnable (bool, optional, default=False) – If True, the temperature parameter is registered as a learnable nn.Parameter. If False, it is a fixed buffer.
Examples
>>> import torch
>>> from physicsnemo.nn.module.gumbel_softmax import GumbelSoftmax
>>> gs = GumbelSoftmax(tau=0.5)
>>> logits = torch.randn(2, 10)  # batch_size=2, num_categories=10
>>> probs = gs(logits)
>>> probs.shape
torch.Size([2, 10])
>>> torch.allclose(probs.sum(dim=-1), torch.ones(2))  # Each row sums to 1
True

>>> # With learnable temperature
>>> gs_learnable = GumbelSoftmax(tau=1.0, learnable=True)
>>> gs_learnable.tau.requires_grad
True
See also
gumbel_softmax(): Functional implementation of Gumbel-Softmax.
- forward(logits: Float[Tensor, '... num_categories'])
Apply Gumbel-Softmax to input logits.
- Parameters:
logits (torch.Tensor) – Input logits tensor of shape \((*, K)\) where \(K\) is the number of categories.
- Returns:
Gumbel-Softmax output of the same shape as logits.
- Return type:
torch.Tensor
- physicsnemo.nn.module.gumbel_softmax.gumbel_softmax(logits: Float[Tensor, '... num_categories'], tau: Tensor | float = 1.0)
Implementation of Gumbel Softmax from Transolver++.
Applies a differentiable approximation to sampling from a categorical distribution using the Gumbel-Softmax trick.
Original code: https://github.com/thuml/Transolver_plus/blob/main/models/Transolver_plus.py#L69
- Parameters:
logits (torch.Tensor) – Input logits tensor of shape \((*, K)\) where \(K\) is the number of categories.
tau (torch.Tensor | float, optional, default=1.0) – Temperature parameter. Lower values make the distribution more concentrated.
- Returns:
Gumbel-Softmax output of the same shape as logits.
- Return type:
torch.Tensor
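The Gumbel-Softmax trick described above can be sketched for a single vector of logits in plain Python: add independent Gumbel(0, 1) noise to each logit, divide by the temperature, and apply a softmax. This is an illustration under stated assumptions rather than the library's implementation; the helper names, the clamping of the uniform sample by eps, and the use of Python's random module are choices made for this sketch.

```python
import math
import random
from typing import List, Optional


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sketch(
    logits: List[float],
    tau: float = 1.0,
    rng: Optional[random.Random] = None,
    eps: float = 1e-8,
) -> List[float]:
    """Softmax of (logits + Gumbel noise) / tau for one vector of logits."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so that both logarithms are finite.
        u = min(max(rng.random(), eps), 1.0 - eps)
        gumbel = -math.log(-math.log(u))  # sample from Gumbel(0, 1)
        noisy.append((logit + gumbel) / tau)
    return softmax(noisy)


logits = [0.5, 1.5, -1.0, 0.0]
# The same seed gives the same noise, so only the temperature differs.
warm = gumbel_softmax_sketch(logits, tau=1.0, rng=random.Random(0))
cold = gumbel_softmax_sketch(logits, tau=0.1, rng=random.Random(0))
print(round(sum(warm), 6), round(sum(cold), 6))  # each output sums to 1
print(max(warm) <= max(cold))  # lower tau concentrates the mass
```

With the noise held fixed, lowering tau can only increase the largest probability, which is the sense in which lower temperatures push the output toward a one-hot vector.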