core.fusions.fused_bias_geglu#

Module Contents#

Classes#

BiasGeGLUFunction

Custom autograd function for GEGLU activation with bias support.

GeGLUFunction

Custom autograd function for GEGLU activation without bias.

WeightedQuickGeGLUFunction

Autograd function for token-wise weighted Quick-GEGLU (no bias).

WeightedBiasQuickGeGLUFunction

Autograd function for token-wise weighted Quick-GEGLU with bias support.

Functions#

geglu

Performs GEGLU (GELU-Gated Linear Unit) activation.

bias_geglu

Performs GEGLU activation with bias addition.

geglu_back

Computes the gradient for the GEGLU activation.

bias_geglu_back

Computes the gradient for the biased GEGLU activation.

bias_geglu_impl

Implementation of biased GEGLU that handles different input shapes.

quick_gelu

Sigmoid approximation of GELU.

quick_geglu

Performs Quick-GELU-based GEGLU activation: quick_gelu(y1) * (y2 + offset).

weighted_quick_geglu

Token-wise-weighted Quick-GEGLU activation.

quick_geglu_back

Backward helper for Quick-GEGLU.

weighted_quick_geglu_back

Backward helper for weighted Quick-GEGLU. Returns gradients w.r.t. the input y and the weights.

weighted_bias_quick_geglu

Token-wise weighted Quick-GEGLU activation with bias.

weighted_bias_quick_geglu_back

Backward helper for weighted Quick-GEGLU with bias.

weighted_bias_quick_geglu_impl

Token-wise-weighted bias quick_geglu fusion.

API#

core.fusions.fused_bias_geglu.geglu(y)#

Performs GEGLU (GELU-Gated Linear Unit) activation.

Parameters:

y (torch.Tensor) – Input tensor to be split into two halves along the last dimension.

Returns:

Result of GEGLU activation: GELU(y1) * y2, where y1, y2 are the split halves.

Return type:

torch.Tensor

core.fusions.fused_bias_geglu.bias_geglu(bias, y)#

Performs GEGLU activation with bias addition.

Parameters:
  • bias (torch.Tensor) – Bias tensor to be added to the input.

  • y (torch.Tensor) – Input tensor to be split and gated.

Returns:

Result of bias addition followed by GEGLU activation.

Return type:

torch.Tensor
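
For orientation, here is a minimal unfused sketch of geglu and bias_geglu. The exact GELU variant inside the fused kernel is an assumption; PyTorch's default GELU stands in for it here.

```python
import torch
import torch.nn.functional as F

def geglu_reference(y: torch.Tensor) -> torch.Tensor:
    # Split the last dimension into a gate half (y_1) and a linear half (y_2).
    y_1, y_2 = torch.chunk(y, 2, dim=-1)
    # GELU-gated linear unit: GELU(y_1) * y_2.
    return F.gelu(y_1) * y_2

def bias_geglu_reference(bias: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The bias is added before the split-and-gate.
    return geglu_reference(y + bias)
```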

core.fusions.fused_bias_geglu.geglu_back(g, y)#

Computes the gradient for the GEGLU activation.

Parameters:
  • g (torch.Tensor) – Gradient tensor from the subsequent layer.

  • y (torch.Tensor) – Input tensor that was used in the forward pass.

Returns:

Gradient with respect to the input tensor.

Return type:

torch.Tensor
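
As a sanity check, geglu_back should agree with autograd run through an unfused forward. A sketch, reusing the hypothetical geglu_reference helper above:

```python
import torch

y = torch.randn(4, 16, dtype=torch.double, requires_grad=True)
out = geglu_reference(y)        # unfused forward from the sketch above
g = torch.ones_like(out)        # upstream gradient
out.backward(g)
# y.grad should now match geglu_back(g, y.detach()) up to the GELU variant
# (exact vs. tanh approximation) used by the fused implementation.
```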

core.fusions.fused_bias_geglu.bias_geglu_back(g, y, bias)#

Computes the gradient for the biased GEGLU activation.

Parameters:
  • g (torch.Tensor) – Gradient tensor from the subsequent layer.

  • y (torch.Tensor) – Input tensor that was used in the forward pass.

  • bias (torch.Tensor) – Bias tensor that was added in the forward pass.

Returns:

Gradient with respect to the input tensor after bias addition.

Return type:

torch.Tensor

class core.fusions.fused_bias_geglu.BiasGeGLUFunction#

Bases: torch.autograd.Function

Custom autograd function for GEGLU activation with bias support.

static forward(ctx, input, bias)#

Forward pass of biased GEGLU activation.

Parameters:
  • ctx – Autograd context object for saving tensors for backward pass.

  • input (torch.Tensor) – Input tensor to apply GEGLU to.

  • bias (torch.Tensor) – Bias tensor to be added to input before GEGLU.

Returns:

Result of applying bias addition followed by GEGLU activation.

Return type:

torch.Tensor

static backward(ctx, grad_output)#

Backward pass of biased GEGLU activation.

Parameters:
  • ctx – Autograd context object containing saved tensors from forward pass.

  • grad_output (torch.Tensor) – Gradient of the loss with respect to the output.

Returns:

Tuple containing gradients with respect to the input and bias tensors.

Return type:

tuple
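
Call sites go through Function.apply, which is what registers the custom backward with autograd. A usage sketch with illustrative shapes:

```python
import torch
from core.fusions.fused_bias_geglu import BiasGeGLUFunction

x = torch.randn(8, 64, requires_grad=True)   # [tokens, 2 * hidden]
b = torch.randn(64, requires_grad=True)      # bias broadcastable to x
out = BiasGeGLUFunction.apply(x, b)          # [tokens, hidden] == [8, 32]
out.sum().backward()                         # populates x.grad and b.grad
```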

class core.fusions.fused_bias_geglu.GeGLUFunction#

Bases: torch.autograd.Function

Custom autograd function for GEGLU activation without bias.

static forward(ctx, input)#

Forward pass of GEGLU activation.

Parameters:
  • ctx – Autograd context object for saving tensors for backward pass.

  • input (torch.Tensor) – Input tensor to apply GEGLU to.

Returns:

Result of applying GEGLU activation.

Return type:

torch.Tensor

static backward(ctx, grad_output)#

Backward pass of GEGLU activation.

Parameters:
  • ctx – Autograd context object containing saved tensors from forward pass.

  • grad_output (torch.Tensor) – Gradient of the loss with respect to the output.

Returns:

Gradient with respect to the input tensor.

Return type:

torch.Tensor

core.fusions.fused_bias_geglu.bias_geglu_impl(input, bias)#

Implementation of biased GEGLU that handles different input shapes.

This function reshapes the input if necessary, applies the GEGLU activation (with or without bias), and restores the original shape.

Parameters:
  • input (torch.Tensor) – Input tensor to apply GEGLU activation.

  • bias (torch.Tensor, optional) – Bias tensor to be added to input. If None, uses the bias-free GEGLU variant.

Returns:

Result of biased GEGLU activation.

Return type:

torch.Tensor

Raises:

AssertionError – If input tensor does not have 2 or 3 dimensions.
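
A usage sketch of the shape handling; the [seq, batch, 2 * hidden] layout for the 3D case is an assumption based on the 2-or-3 dimension constraint:

```python
import torch
from core.fusions.fused_bias_geglu import bias_geglu_impl

x3d = torch.randn(128, 4, 512)        # assumed [seq, batch, 2 * hidden]
bias = torch.randn(512)
out = bias_geglu_impl(x3d, bias)      # output restored to [128, 4, 256]

x2d = torch.randn(512, 512)           # 2D inputs need no reshape
out2 = bias_geglu_impl(x2d, None)     # bias=None uses the bias-free variant
```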

core.fusions.fused_bias_geglu.quick_gelu(y: torch.Tensor) torch.Tensor#

Sigmoid approximation of GELU.
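
The usual definition behind this approximation; the 1.702 constant is the standard QuickGELU choice and is assumed here:

```python
import torch

def quick_gelu_reference(y: torch.Tensor) -> torch.Tensor:
    # x * sigmoid(1.702 * x) closely tracks GELU at lower cost.
    return y * torch.sigmoid(1.702 * y)
```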

core.fusions.fused_bias_geglu.quick_geglu(
y: torch.Tensor,
linear_offset: float = 0.0,
) torch.Tensor#

Performs Quick-GELU-based GEGLU activation: quick_gelu(y1) * (y2 + offset).

Parameters:
  • y – Input tensor split into two halves on the last dimension.

  • linear_offset – Optional linear offset added to the second half before gating.

Returns:

Tensor after applying the GEGLU activation.

core.fusions.fused_bias_geglu.weighted_quick_geglu(
y: torch.Tensor,
weights: torch.Tensor,
linear_offset: float = 0.0,
) torch.Tensor#

Token-wise-weighted Quick-GEGLU activation.

The weights tensor is expected to have the same first-dimension length as y and a trailing singleton dimension so that it broadcasts over the feature dimension.
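
A minimal sketch of the weighted variant, reusing the hypothetical quick_gelu_reference helper above:

```python
import torch

def weighted_quick_geglu_reference(
    y: torch.Tensor, weights: torch.Tensor, linear_offset: float = 0.0
) -> torch.Tensor:
    # y: [tokens, 2H]; weights: [tokens, 1] broadcasts over the feature dim.
    y_1, y_2 = torch.chunk(y, 2, dim=-1)
    return quick_gelu_reference(y_1) * (y_2 + linear_offset) * weights
```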

core.fusions.fused_bias_geglu.quick_geglu_back(g, y, linear_offset: float = 0.0) torch.Tensor#

Backward helper for Quick-GEGLU.

Parameters:
  • g (torch.Tensor) – Upstream gradient tensor.

  • y (torch.Tensor) – Input tensor used in the forward pass.

  • linear_offset (float, optional) – Linear offset used in the forward pass. Defaults to 0.0.

Returns:

Gradient with respect to the input tensor.

Return type:

torch.Tensor
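
A closed-form sketch of this backward, under the assumption that quick_gelu(x) = x * sigmoid(1.702 * x):

```python
import torch

def quick_geglu_back_reference(g, y, linear_offset: float = 0.0):
    y_1, y_2 = torch.chunk(y, 2, dim=-1)
    sig = torch.sigmoid(1.702 * y_1)
    # d/dx [x * sigmoid(1.702 x)]
    #   = sigmoid(1.702 x) * (1 + 1.702 x * (1 - sigmoid(1.702 x)))
    dquick = sig * (1 + 1.702 * y_1 * (1 - sig))
    grad_gate = g * (y_2 + linear_offset) * dquick   # w.r.t. the gate half
    grad_linear = g * (y_1 * sig)                    # w.r.t. the linear half
    return torch.cat([grad_gate, grad_linear], dim=-1)
```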

core.fusions.fused_bias_geglu.weighted_quick_geglu_back(g, y, weights, linear_offset: float = 0.0)#

Backward helper for weighted Quick-GEGLU. Returns gradients w.r.t. the input y and the weights.

core.fusions.fused_bias_geglu.weighted_bias_quick_geglu(
y: torch.Tensor,
bias: torch.Tensor,
weights: torch.Tensor,
linear_offset: float = 0.0,
) torch.Tensor#

Token-wise weighted Quick-GEGLU activation with bias.

Parameters:
  • y – Input tensor before bias addition.

  • bias – Bias tensor broadcastable to y.

  • weights – Weight tensor with shape [tokens, 1] broadcasting over feature dim.

  • linear_offset – Optional linear offset for the second half before gating.

Returns:

Activated tensor with same dtype as y.

core.fusions.fused_bias_geglu.weighted_bias_quick_geglu_back(
g,
y,
bias,
weights,
linear_offset: float = 0.0,
)#

Backward helper for weighted Quick-GEGLU with bias.

Returns gradients w.r.t. the input y, the bias, and the weights.

class core.fusions.fused_bias_geglu.WeightedQuickGeGLUFunction#

Bases: torch.autograd.Function

Autograd function for token-wise weighted Quick-GEGLU (no bias).

static forward(
ctx,
input: torch.Tensor,
weights: torch.Tensor,
fp8_input_store: bool,
linear_offset: torch.Tensor,
)#

Forward pass of weighted Quick-GEGLU.

Parameters:
  • ctx – Autograd context object for saving tensors for backward pass.

  • input (torch.Tensor) – Input tensor of shape [N, 2H].

  • weights (torch.Tensor) – Per-token weights of shape [N, 1].

  • fp8_input_store (bool) – If True, stores input for backward in FP8.

  • linear_offset (torch.Tensor) – Scalar tensor offset added to the linear half.

Returns:

Output tensor of shape [N, H] after weighted Quick-GEGLU.

Return type:

torch.Tensor

static backward(ctx, grad_output)#

Backward pass of weighted Quick-GEGLU.

Parameters:
  • ctx – Autograd context object containing saved tensors from forward pass.

  • grad_output (torch.Tensor) – Upstream gradient w.r.t. the output.

Returns:

Gradients with respect to (input, weights, fp8_input_store, linear_offset). The latter two gradients are None.

Return type:

tuple
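
A usage sketch via Function.apply; the shapes and the scalar-tensor offset are illustrative:

```python
import torch
from core.fusions.fused_bias_geglu import WeightedQuickGeGLUFunction

x = torch.randn(16, 128, requires_grad=True)   # [N, 2H]
w = torch.rand(16, 1, requires_grad=True)      # per-token weights [N, 1]
out = WeightedQuickGeGLUFunction.apply(x, w, False, torch.tensor(0.0))
out.sum().backward()   # x.grad and w.grad are populated; the last two get None
```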

class core.fusions.fused_bias_geglu.WeightedBiasQuickGeGLUFunction#

Bases: torch.autograd.Function

Autograd function for token-wise weighted Quick-GEGLU with bias support.

static forward(
ctx,
input: torch.Tensor,
bias: torch.Tensor,
weights: torch.Tensor,
fp8_input_store: bool,
linear_offset: torch.Tensor,
)#

Forward pass of weighted Quick-GEGLU.

Parameters:
  • ctx – Autograd context object for saving tensors for backward pass.

  • input (torch.Tensor) – Input tensor of shape [N, 2H].

  • bias (torch.Tensor) – Bias tensor broadcastable to input.

  • weights (torch.Tensor) – Per-token weights of shape [N, 1].

  • fp8_input_store (bool) – If True, stores input for backward in FP8.

  • linear_offset (torch.Tensor) – Scalar tensor offset added to the linear half.

Returns:

Output tensor of shape [N, H] after weighted Quick-GEGLU with bias.

Return type:

torch.Tensor

static backward(ctx, grad_output)#

Backward pass of weighted Quick-GEGLU with bias.

Parameters:
  • ctx – Autograd context object containing saved tensors from forward pass.

  • grad_output (torch.Tensor) – Upstream gradient w.r.t. the output.

Returns:

Gradients with respect to (input, bias, weights, fp8_input_store, linear_offset). The latter two gradients are None.

Return type:

tuple

core.fusions.fused_bias_geglu.weighted_bias_quick_geglu_impl(
input,
bias,
weights,
fp8_input_store=False,
linear_offset=0.0,
clamp_value=None,
)#

Token-wise-weighted bias quick_geglu fusion.

Parameters:
  • input (torch.Tensor) – Input tensor of shape [num_selected_experts * seq_len, hidden_size * 2].

  • bias (torch.Tensor, optional) – Bias tensor added to the input, or None.

  • weights (torch.Tensor) – Per-token weights of shape [num_selected_experts * seq_len, 1].

  • fp8_input_store (bool) – If True, stores the input for backward in FP8.

  • linear_offset (float) – Linear offset added to the second half before gating.

Returns:

Output tensor of shape [num_selected_experts * seq_len, hidden_size].

Return type:

torch.Tensor
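
A usage sketch with MoE-style shapes; the sizes are illustrative and bias is passed as None per the docstring:

```python
import torch
from core.fusions.fused_bias_geglu import weighted_bias_quick_geglu_impl

num_tokens, hidden = 4 * 256, 1024   # num_selected_experts * seq_len, hidden_size
x = torch.randn(num_tokens, hidden * 2, requires_grad=True)
w = torch.rand(num_tokens, 1)
out = weighted_bias_quick_geglu_impl(x, None, w)
assert out.shape == (num_tokens, hidden)
```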