transpose.h
Functions handling transposes.
Functions

void nvte_cast_transpose(const NVTETensor input, const NVTETensor scale, NVTETensor cast_output, NVTETensor transposed_output, NVTETensor amax, NVTETensor scale_inv, cudaStream_t stream)
Cast and transpose the input.
This function casts the input and produces 2 results:
cast_output is the result of the cast.
transposed_output is the transposed result of the cast.
 Parameters
input – [in] Input tensor of shape [N, H].
scale – [in] Scaling factor used for outputs.
cast_output – [out] Result of the cast. Shape: [N, H].
transposed_output – [out] Result of the cast and transpose. Shape: [H, N].
amax – [inout] AMAX value of the output tensor.
scale_inv – [out] Inverse of the output’s scaling factor.
stream – [in] CUDA stream used for the operation.
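The relationship between the two outputs can be sketched with a small CPU reference. This is only an illustration of the shapes and index mapping described above, not the library's implementation: the names `CastTransposeResult` and `reference_cast_transpose` are hypothetical, the "cast" is modelled as a multiplication by `scale` with no change of precision, and `amax` is left out because the text above does not say how it is updated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the library.
struct CastTransposeResult {
  std::vector<float> cast_output;        // shape [N, H], row-major
  std::vector<float> transposed_output;  // shape [H, N], row-major
  float scale_inv;                       // inverse of the scaling factor
};

CastTransposeResult reference_cast_transpose(const std::vector<float>& input,
                                             std::size_t N, std::size_t H,
                                             float scale) {
  assert(input.size() == N * H);
  assert(scale != 0.0f);
  CastTransposeResult r{std::vector<float>(N * H),
                        std::vector<float>(H * N),
                        1.0f / scale};
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t h = 0; h < H; ++h) {
      // Assumption: the cast is modelled as scaling only.
      const float v = input[n * H + h] * scale;
      r.cast_output[n * H + h] = v;        // same position as the input
      r.transposed_output[h * N + n] = v;  // row and column swapped
    }
  }
  return r;
}
```

Both outputs hold the same values; only the layout differs, which is why a single pass over the input can fill them together.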

void nvte_transpose(const NVTETensor input, NVTETensor transposed_output, cudaStream_t stream)
Transpose the input.
 Parameters
input – [in] Input tensor of shape [N, H].
transposed_output – [out] Result of the transpose. Shape: [H, N].
stream – [in] CUDA stream used for the operation.
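As a minimal sketch of the shape contract above, a CPU reference for a [N, H] to [H, N] transpose might look like the following. The function name `reference_transpose` is hypothetical and row-major storage is an assumption; the library operates on `NVTETensor` objects on a CUDA stream rather than on host vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: [N, H] row-major -> [H, N] row-major.
std::vector<float> reference_transpose(const std::vector<float>& input,
                                       std::size_t N, std::size_t H) {
  assert(input.size() == N * H);
  std::vector<float> out(H * N);
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t h = 0; h < H; ++h) {
      // Element (n, h) of the input becomes element (h, n) of the output.
      out[h * N + n] = input[n * H + h];
    }
  }
  return out;
}
```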

void nvte_cast_transpose_dbias(const NVTETensor input, const NVTETensor scale, NVTETensor cast_output, NVTETensor transposed_output, NVTETensor amax, NVTETensor dbias, NVTETensor scale_inv, NVTETensor workspace, cudaStream_t stream)
Cast and transpose the input. Additionally, reduce the input along the first dimension.
This function casts the input and produces 3 results:
cast_output is the result of the cast.
transposed_output is the transposed result of the cast.
dbias is the result of the reduction of the input along the first dimension.
Calling this function with an empty workspace tensor does not perform the operation; instead, it sets the shape and type of the workspace tensor to the required values.
 Parameters
input – [in] Input tensor of shape [N, H].
scale – [in] Scaling factor used for outputs.
cast_output – [out] Result of the cast. Shape: [N, H].
transposed_output – [out] Result of the cast and transpose. Shape: [H, N].
amax – [inout] AMAX value of the output tensor.
dbias – [out] Result of the reduction of the input along the first dimension. Shape: [H].
scale_inv – [out] Inverse of the output’s scaling factor.
workspace – [out] Workspace tensor.
stream – [in] CUDA stream used for the operation.
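The two-phase workspace behaviour and the dbias reduction described above can be sketched with a host-side mock. Everything here is hypothetical: `MockWorkspace` and `mock_dbias` are illustrative names, the workspace is modelled as a plain float buffer, and the required size of `H` elements is an assumption made for the example. The real function reports the required shape and type through the workspace tensor itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a workspace tensor.
// An empty buffer means "not yet allocated".
struct MockWorkspace {
  std::vector<float> data;
  std::size_t required = 0;  // filled in by the query call
};

// Hypothetical mock of the two-phase call. Returns true if the work ran.
bool mock_dbias(const std::vector<float>& input, std::size_t N, std::size_t H,
                std::vector<float>& dbias, MockWorkspace& ws) {
  assert(H > 0);
  assert(input.size() == N * H);
  if (ws.data.empty()) {
    // Query phase: report the size that is needed and do no work.
    ws.required = H;  // assumption: one accumulator per column
    return false;
  }
  assert(ws.data.size() >= H);
  // Work phase: reduce along the first dimension (sum over n for each h).
  for (std::size_t h = 0; h < H; ++h) ws.data[h] = 0.0f;
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t h = 0; h < H; ++h) {
      ws.data[h] += input[n * H + h];
    }
  }
  dbias.assign(ws.data.begin(),
               ws.data.begin() + static_cast<std::ptrdiff_t>(H));
  return true;
}
```

A caller would invoke the function once with an empty workspace, allocate the reported size, and then call it again with the same arguments to run the operation.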

void nvte_cast_transpose_dbias_dgelu(const NVTETensor input, const NVTETensor gelu_input, const NVTETensor scale, NVTETensor cast_output, NVTETensor transposed_output, NVTETensor amax, NVTETensor dbias, NVTETensor scale_inv, NVTETensor workspace, cudaStream_t stream)
Compute backward of GELU operation on the input, then cast and transpose. Additionally, reduce the result of the GELU backward along the first dimension.
This function produces 3 results:
cast_output is equal to cast(dGELU(input)).
transposed_output is equal to transpose(cast(dGELU(input))).
dbias is equal to reduce(dGELU(input), axis=0).
Calling this function with an empty workspace tensor does not perform the operation; instead, it sets the shape and type of the workspace tensor to the required values.
 Parameters
input – [in] Input tensor of shape [N, H].
gelu_input – [in] Tensor used as input to the forward of GELU operation. Shape [N, H].
scale – [in] Scaling factor used for outputs.
cast_output – [out] Result of the cast. Shape: [N, H].
transposed_output – [out] Result of the cast and transpose. Shape: [H, N].
amax – [inout] AMAX value of the output tensor.
dbias – [out] Result of the reduction of dGELU(input) along the first dimension. Shape: [H].
scale_inv – [out] Inverse of the output’s scaling factor.
workspace – [out] Workspace tensor.
stream – [in] CUDA stream used for the operation.
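The dGELU and dbias quantities above can be illustrated with a CPU reference under a few stated assumptions. The sketch reads dGELU(input) as the incoming gradient `input` multiplied element-wise by the derivative of GELU evaluated at `gelu_input`, and it uses the exact erf-based GELU; the library may use a different formulation, such as a tanh approximation. The names `gelu_grad`, `DgeluResult` and `reference_dgelu_dbias` are hypothetical, and the cast and transpose steps are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Derivative of the exact GELU(x) = x * Phi(x), i.e. Phi(x) + x * phi(x),
// where Phi and phi are the standard normal CDF and PDF.
double gelu_grad(double x) {
  const double kInvSqrt2 = 0.70710678118654752440;    // 1 / sqrt(2)
  const double kInvSqrt2Pi = 0.39894228040143267794;  // 1 / sqrt(2 * pi)
  const double cdf = 0.5 * (1.0 + std::erf(x * kInvSqrt2));
  const double pdf = kInvSqrt2Pi * std::exp(-0.5 * x * x);
  return cdf + x * pdf;
}

// Hypothetical CPU reference, not part of the library.
struct DgeluResult {
  std::vector<double> dgelu;  // shape [N, H], row-major
  std::vector<double> dbias;  // shape [H]
};

// `grad` plays the role of `input` in the documented signature.
DgeluResult reference_dgelu_dbias(const std::vector<double>& grad,
                                  const std::vector<double>& gelu_input,
                                  std::size_t N, std::size_t H) {
  assert(grad.size() == N * H);
  assert(gelu_input.size() == N * H);
  DgeluResult r{std::vector<double>(N * H), std::vector<double>(H, 0.0)};
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t h = 0; h < H; ++h) {
      const std::size_t i = n * H + h;
      r.dgelu[i] = grad[i] * gelu_grad(gelu_input[i]);
      r.dbias[h] += r.dgelu[i];  // reduce along the first dimension
    }
  }
  return r;
}
```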