activation.h

Activation functions.

Enums

enum class NVTE_Activation_Type

Enumeration of the supported activation function types.

Values:

enumerator GELU
enumerator GEGLU
enumerator SILU
enumerator SWIGLU
enumerator RELU
enumerator REGLU
enumerator QGELU
enumerator QGEGLU
enumerator SRELU
enumerator SREGLU
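
Each enumerator names one of the forward functions documented below; the *GLU values are the gated variants of their base activations. A dispatch sketch in C++ (apply_activation is an illustrative helper, not part of the API, and it assumes the one-to-one mapping implied by the names; the include path reflects the usual transformer_engine/ install layout):

    #include <cuda_runtime.h>
    #include <transformer_engine/activation.h>

    // Illustrative helper: route an activation type to the matching forward
    // function from this header.
    void apply_activation(NVTE_Activation_Type type, const NVTETensor input,
                          NVTETensor output, cudaStream_t stream) {
      switch (type) {
        case NVTE_Activation_Type::GELU:   nvte_gelu(input, output, stream);   break;
        case NVTE_Activation_Type::SILU:   nvte_silu(input, output, stream);   break;
        case NVTE_Activation_Type::RELU:   nvte_relu(input, output, stream);   break;
        case NVTE_Activation_Type::QGELU:  nvte_qgelu(input, output, stream);  break;
        case NVTE_Activation_Type::SRELU:  nvte_srelu(input, output, stream);  break;
        case NVTE_Activation_Type::GEGLU:  nvte_geglu(input, output, stream);  break;
        case NVTE_Activation_Type::SWIGLU: nvte_swiglu(input, output, stream); break;
        case NVTE_Activation_Type::REGLU:  nvte_reglu(input, output, stream);  break;
        case NVTE_Activation_Type::QGEGLU: nvte_qgeglu(input, output, stream); break;
        case NVTE_Activation_Type::SREGLU: nvte_sreglu(input, output, stream); break;
      }
    }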

Functions

void nvte_gelu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the GeLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.
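
A minimal call sketch in C++, assuming input and output are already-constructed NVTETensor handles of the same shape (tensor creation belongs to the tensor API, not this header); run_gelu is an illustrative name:

    #include <cuda_runtime.h>
    #include <transformer_engine/activation.h>

    // Sketch: `input` and `output` are assumed to be pre-built NVTETensor
    // handles of matching shape. The call is asynchronous: it only enqueues
    // the GeLU kernel on `stream`.
    void run_gelu(const NVTETensor input, NVTETensor output, cudaStream_t stream) {
      nvte_gelu(input, output, stream);
      // Synchronize only if the host needs the result immediately.
      cudaStreamSynchronize(stream);
    }

The other forward functions below follow the same call pattern.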

void nvte_silu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the SiLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_relu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the ReLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_qgelu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the Quick GeLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_srelu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the Squared ReLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_dgelu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the GeLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient.

  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.
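
The gradient functions combine the incoming gradient with the input saved from the forward pass, producing output = grad * GeLU'(input) elementwise. A sketch of the call pattern, assuming all three tensors share one shape; run_dgelu is an illustrative name:

    // Backward sketch: `grad` holds dL/dY, `input` is the tensor saved from
    // the forward pass, and `output` receives dL/dX = dL/dY * GeLU'(input).
    void run_dgelu(const NVTETensor grad, const NVTETensor input,
                   NVTETensor output, cudaStream_t stream) {
      nvte_dgelu(grad, input, output, stream);
    }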

void nvte_dsilu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the SiLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient.

  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_drelu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the ReLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient.

  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_dqgelu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the Quick GeLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient.

  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_dsrelu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the Squared ReLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient.

  • input[in] Input tensor for activation.

  • output[inout] Output tensor.

  • stream[in] CUDA stream used for the operation.

void nvte_geglu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated GeLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor of shape [N, H * 2].

  • output[inout] Output tensor of shape [N, H], computed as GeLU(input[:, :H]) * input[:, H:].

  • stream[in] CUDA stream used for the operation.
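
The gated variants split the last dimension of the input in half: the first half passes through the activation and the second half gates it multiplicatively, so the output has half the width of the input. A shape-annotated sketch (run_geglu is an illustrative name):

    // Gated forward sketch: for input of shape [N, 2*H],
    // output[n, h] = GeLU(input[n, h]) * input[n, H + h].
    void run_geglu(const NVTETensor input,  // shape [N, 2*H]
                   NVTETensor output,       // shape [N, H]
                   cudaStream_t stream) {
      nvte_geglu(input, output, stream);
    }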

void nvte_swiglu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Swish activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor of shape [N, H * 2].

  • output[inout] Output tensor of shape [N, H], computed as SiLU(input[:, :H]) * input[:, H:].

  • stream[in] CUDA stream used for the operation.

void nvte_reglu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated ReLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor of shape [N, H * 2].

  • output[inout] Output tensor of shape [N, H], computed as ReLU(input[:, :H]) * input[:, H:].

  • stream[in] CUDA stream used for the operation.

void nvte_qgeglu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Quick GeLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor of shape [N, H * 2].

  • output[inout] Output tensor of shape [N, H], computed as QGeLU(input[:, :H]) * input[:, H:].

  • stream[in] CUDA stream used for the operation.

void nvte_sreglu(const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Squared ReLU activation of the input. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • input[in] Input tensor of shape [N, H * 2].

  • output[inout] Output tensor of shape [N, H], computed as SReLU(input[:, :H]) * input[:, H:].

  • stream[in] CUDA stream used for the operation.

void nvte_dgeglu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated GeLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient of shape [N, H].

  • input[in] Forward input tensor of shape [N, H * 2].

  • output[inout] Outgoing gradient of shape [N, H * 2].

  • stream[in] CUDA stream used for the operation.
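
Note the shape asymmetry in the gated backward functions: the incoming gradient matches the forward output ([N, H]) while the outgoing gradient matches the forward input ([N, H * 2]), because both the activation half and the gating half receive gradient. A sketch (run_dgeglu is an illustrative name):

    // Gated backward sketch.
    void run_dgeglu(const NVTETensor grad,   // dL/dY, shape [N, H]
                    const NVTETensor input,  // saved forward input, [N, 2*H]
                    NVTETensor output,       // dL/dX, shape [N, 2*H]
                    cudaStream_t stream) {
      nvte_dgeglu(grad, input, output, stream);
    }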

void nvte_dswiglu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Swish activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient of shape [N, H].

  • input[in] Forward input tensor of shape [N, H * 2].

  • output[inout] Outgoing gradient of shape [N, H * 2].

  • stream[in] CUDA stream used for the operation.

void nvte_dreglu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated ReLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient of shape [N, H].

  • input[in] Forward input tensor of shape [N, H * 2].

  • output[inout] Outgoing gradient of shape [N, H * 2].

  • stream[in] CUDA stream used for the operation.

void nvte_dqgeglu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Quick GeLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient of shape [N, H].

  • input[in] Forward input tensor of shape [N, H * 2].

  • output[inout] Outgoing gradient of shape [N, H * 2].

  • stream[in] CUDA stream used for the operation.

void nvte_dsreglu(const NVTETensor grad, const NVTETensor input, NVTETensor output, cudaStream_t stream)

Computes the gated Squared ReLU activation gradient. If the scaling mode of the output tensor is set to NVTE_MXFP8_1D_SCALING, the output is quantized with MXFP8 block quantization using the specified block shape.

Parameters:
  • grad[in] Incoming gradient of shape [N, H].

  • input[in] Forward input tensor of shape [N, H * 2].

  • output[inout] Outgoing gradient of shape [N, H * 2].

  • stream[in] CUDA stream used for the operation.