Reductions and Scans
A reduction on a tile like object along a dimension is the result of repeatedly accumulating the elements of that dimension into a single value according to a binary operation. The resulting tile contains the accumulated result of each vector along the operating dimension. A scan is similar to a reduction except it preserves partial accumulation results in the final tile.
Definitions
In the following definitions, let \(a\) be a tile like object of rank \(N\), shape \(S\), and element type \(E\). Let \(0 \leq d < N\) be a dimension of \(a\) and \(\operatorname{op} : E \times E \rightarrow E\) denote a binary operation with identity element \(\mathrm{id}\) of scalar type \(E\).
- tile reduction
-
A reduction of \(a\) along \(d\) for operation \(\operatorname{op}\) is any tile like object \(r\) satisfying the following constraints.
The shape \(R\) of \(r\) matches \(S\) except at dimension \(d\) where its length is \(1\):
\[\begin{split}R_k = \begin{cases} S_k & k \neq d \\ 1 & k = d \end{cases}\end{split}\]Let
\[I = (i_0, i_1, ..., i_{d-1}, 1, i_{d+1}, \ldots , i_{N-1})\]be an element in the index space of \(r\). Consider a collection consisting of the elements of \(a\) along \(d\) at \(I\):
\[V = \left \{ a( i_0, i_1, ..., i_{d-1}, j, i_{d+1}, \ldots , i_{N-1}) \quad | \quad 0 \leq j < S_d \right \}\]Let \(v = (v_0, v_1, \ldots, v_k)\) be any sequence consisting of the elements of \(V\) and \(0\) or more instances of \(\mathrm{id}\).
The value of \(r\) at \(I\) is any application of \(\operatorname{op}\) to the sequence \(v\):
\[r(I) = v_0 \operatorname{op} v_1 \operatorname{op} \ldots \operatorname{op} v_k\]where the grouping of the application of \(\operatorname{op}\) and the ordering of the elements of \(v\) is unspecified.
If \(\operatorname{op}\) would incur undefined behavior on any possible grouping or ordering of the elements of \(v\), the computation of the reduction as a whole is undefined behavior.
Note
If \(\operatorname{op}\) is not commutative or not associative or if \(\mathrm{id}\) is not a true identity element, there may be multiple possible reductions of \(a\).
If a particular grouping or ordering of \(v\) is observed at index \(I\) in the result, this does not imply any constraint on the grouping or ordering of \(v\) for any other index of \(r\).
- tile scan
-
A scan of \(a\) along \(d\) for operation \(\operatorname{op}\) is any tile like object \(r\) of shape \(S\) satisfying the following constraints:
Let
\[I = (i_0, i_1, \ldots, i_{d-1}, i_d, i_{d+1}, \ldots, i_{N-1})\]be an element in the index space of \(r\).
Consider a collection consisting of the elements of \(a\) up to index \(i_d\) along \(d\):
\[V = \left \{ a( i_0, i_1, ..., i_{d-1}, j, i_{d+1}, ..., i_{N-1}) \quad | \quad 0 <= j <= i_d \right \}\]Let \(v = (v_0, v_1, \ldots, v_k)\) be any sequence consisting of the elements of \(V\) and \(0\) or more instances of \(\mathrm{id}\).
The value of \(r\) at \(I\) is any application of \(\operatorname{op}\) to the sequence \(v\)
\[r(I) = v_0 \operatorname{op} v_1 \operatorname{op} \ldots \operatorname{op} v_k\]where the grouping of the application of \(\operatorname{op}\) and the ordering of the elements of \(v\) is unspecified.
If \(\operatorname{op}\) would incur undefined behavior on any possible grouping or ordering of the elements of \(v\), the computation of the scan as a whole is undefined behavior.
Note
If \(\operatorname{op}\) is not commutative or not associative or if \(\mathrm{id}\) is not a true identity element, there may be multiple possible scans for \(a\).
If a particular grouping or ordering of \(v\) is observed at index \(I\) in the result, this does not imply any constraint on the grouping or ordering of \(v\) for any other index of \(r\).
cuda::tiles::reduction_result_t
cuda::tiles::reduce_max
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> reduce_max(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::nan_propagation_mode Nan = ct::default_nan_propagation_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ ct::reduction_result_t<T, D> reduce_max(T a, ct::integral_constant<D>, ct::nan_propagation_mode_constant<Nan>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified reduction of operand
aalong dimensionDaccording to binary operator \(\operatorname{op}(x, y)\) defined as:Overload 1: ct::max(x, y)
Overload 2: ct::max<Nan, SubMode>(x, y)
Let \(E\) be the element type of \(T\).
If \(E\) is an integral scalar. the identity element \(\mathrm{id}\) is the minimal representable value of \(E\).
If \(E\) is a basic floating point scalar. the identity element \(\mathrm{id}\) is negative infinity.
The atomic constraint of overload (2) validates that
\(T\) is a basic floating point scalar
If
SubModeis round subnormals to zero, then \(T\) isfloat.The values
NanModeandSubModeare enumerators of their respective types.
Note
The identity element for floating point arguments \(\mathrm{id} = -\infty\) may not be a true identity of \(\operatorname{op}\) depending on the provided numeric flags. For example, when suppress NaN (the default) is used:
\[\operatorname{op}(-\infty, \text{NaN}) = -\infty\]In this scenario, multiple valid reduction results are possible when any element of the tile is a \(\text{NaN}\).
Example
The following example performs a max reduction:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {0, 10, 2, 5}, {-3, 2, 22, 7}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::reduce_max(x, 1_ic); // No flags auto r1 = ct::reduce_max(x, 1_ic, // With flags ct::suppress_nan_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 0 & 10 & 2 & 5 \\ -3 & 2 & 22 & 7 \end{pmatrix} \rightarrow \begin{pmatrix} 10 \\ 22 \end{pmatrix}\end{split}\]
cuda::tiles::reduce_min
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> reduce_min(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::nan_propagation_mode Nan = ct::default_nan_propagation_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ ct::reduction_result_t<T, D> reduce_min(T a, ct::integral_constant<D>, ct::nan_propagation_mode_constant<Nan>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified reduction of operand
aalong dimensionDaccording to binary operator \(\operatorname{op}(x, y)\) defined as:Overload 1: ct::min(x, y)
Overload 2: ct::min<Nan, SubMode>(x, y)
Let \(E\) be the element type of \(T\).
If \(E\) is an integral scalar. the identity element \(\mathrm{id}\) is the maximal representable value of \(E\).
If \(E\) is a basic floating point scalar. the identity element \(\mathrm{id}\) is positive infinity.
The atomic constraint of overload (2) validates that
\(T\) is a basic floating point scalar
If
SubModeis round subnormals to zero, then \(T\) isfloat.The values
NanModeandSubModeare enumerators of their respective types.
Note
The identity element for floating point arguments \(\mathrm{id} = \infty\) may not be a true identity of \(\operatorname{op}\) depending on the provided numeric flags. For example, when suppress NaN (the default) is used:
\[\operatorname{op}(\infty, \text{NaN}) = \infty\]In this scenario, multiple valid reduction results are possible when any element of the tile is a \(\text{NaN}\).
Example
The following example performs a min reduction:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {0, 10, 2, 5}, {-3, 2, 22, 7}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::reduce_min(x, 0_ic); // No flags auto r1 = ct::reduce_min(x, 0_ic, // With flags ct::suppress_nan_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 0 & 10 & 2 & 5 \\ -3 & 2 & 22 & 7 \end{pmatrix} \rightarrow \begin{pmatrix} -3 & 2 & 2 & 5 \end{pmatrix}\end{split}\]
cuda::tiles::all_of
-
template<ct::integral auto D, ct::bool_tile_convertible T>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<ct::tile_with_element_t<T, bool>, D> all_of(T a, ct::integral_constant<D> = {}) noexcept;
-
Yields the logical AND reduction of the bool tile converted operand
aalong dimension \(D\). The reduction operator \(\operatorname{op}(x, y)\) is the expression x && y and the identity element istrue.Example
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; bool xData[2][4] = { {true, true, false, false}, {true, false, true, false}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r = ct::all_of(x, 0_ic);
\[\begin{split}\begin{pmatrix} \text{true} & \text{true} & \text{false} & \text{false} \\ \text{true} & \text{false} & \text{true} & \text{false} \end{pmatrix} \rightarrow \begin{pmatrix} \text{true} & \text{false} & \text{false} & \text{false} \end{pmatrix}\end{split}\]
cuda::tiles::any_of
-
template<ct::integral auto D, ct::bool_tile_convertible T>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<ct::tile_with_element_t<T, bool>, D> any_of(T a, ct::integral_constant<D> = {}) noexcept;
-
Yields the logical OR reduction of the bool tile converted operand
aalong dimension \(D\). The reduction operator \(\operatorname{op}(x, y)\) is the expression x || y and the identity element isfalse.Example
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; bool xData[2][4] = { {true, true, false, false}, {true, false, true, false}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r = ct::all_of(x, 0_ic);
\[\begin{split}\begin{pmatrix} \text{true} & \text{true} & \text{false} & \text{false} \\ \text{true} & \text{false} & \text{true} & \text{false} \end{pmatrix} \rightarrow \begin{pmatrix} \text{true} & \text{true} & \text{true} & \text{false} \end{pmatrix}\end{split}\]
cuda::tiles::sum
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> sum(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ ct::reduction_result_t<T, D> sum(T a, ct::integral_constant<D>, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified summation reduction of operand
aalong dimension \(D\). The binary operation \(\operatorname{op}(x, y)\) is defined asOverload 1: ct::add(x, y)
Overload 2: ct::add<Mode, SubMode>(x, y)
The identity element \(\mathrm{id}\) is \(0\) for integral scalar elements and positive zero for floating point elements of
a.The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar
Modeis a precise rounding modeIf
SubModeis round subnormals to zero, then \(T\) isfloat.The values
ModeandSubModeare enumerators of their respective types.
Note
The binary operation \(\operatorname{op}\) is not associative for floating point arguments. There may be multiple valid results of the reduction.
Undefined behavior may occur if any ordering or grouping of elements triggers signed integer overflow. See undefined behavior annex for an example.
Example
The following example sums the rows of a tile:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {3, 2, 1, 4}, {-3, 2, 1, 5}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::sum(x, 1_ic); // No flags auto r1 = ct::sum(x, 1_ic, // With flags ct::round_ties_to_even_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 3 & 2 & 1 & 4 \\ -3 & 2 & 1 & 5 \\ \end{pmatrix} \rightarrow \begin{pmatrix} 10 \\ 5 \end{pmatrix}\end{split}\]
cuda::tiles::prod
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> prod(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ ct::reduction_result_t<T, D> prod(T a, ct::integral_constant<D>, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified product reduction of operand
aalong dimension \(D\). The binary operation \(\operatorname{op}(x, y)\) is defined asOverload 1: ct::mul(x, y)
Overload 2: ct::mul<Mode, SubMode>(x, y)
The identity element \(\mathrm{id}\) is \(1\).
The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar
Modeis a precise rounding modeIf
SubModeis round subnormals to zero, then \(T\) isfloat.The values
ModeandSubModeare enumerators of their respective types.
Note
The binary operation \(\operatorname{op}\) is not associative for floating point arguments. There may be multiple valid results of the reduction.
Undefined behavior may occur if any ordering or grouping of elements triggers signed integer overflow. See the undefined behavior annex for an example.
Example
The following example performs a product along the rows of a tile:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {3, 2, 1, 4}, {-3, 2, 1, 5}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::prod(x, 1_ic); // No flags auto r1 = ct::prod(x, 1_ic, // With flags ct::round_ties_to_even_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 3 & 2 & 1 & 4 \\ -3 & 2 & 1 & 5 \\ \end{pmatrix} \rightarrow \begin{pmatrix} 24 \\ -30 \end{pmatrix}\end{split}\]
cuda::tiles::reduce_bitand
-
template<
ct::integral auto D,
ct::integral_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> reduce_bitand(T a, ct::integral_constant<D> = {}) noexcept;
-
Yields the unique bitwise AND reduction of the operand
aalong dimension \(D\).The binary operator \(\operatorname{op}(x, y)\) is the expression ct::operator&(x, y).
Let \(E\) be the element type of
a. If \(E\) is a signed integral scalar, the identity element \(\mathrm{id}\) is \(-1\). If \(E\) is an unsigned integral scalar the identity element is the maximum integer value representable in \(E\).Example
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; unsigned char xData[2][4] = { {0b00001111, 0b10101010}, {0b01010101, 0b11110000}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r = ct::reduce_bitand(x, 0_ic);
\[\begin{split}\begin{pmatrix} \text{0b00001111} & \text{0b10101010} \\ \text{0b01010101} & \text{0b11110000} \end{pmatrix} \rightarrow \begin{pmatrix} \text{0b00000101} & \text{0b10100000} \end{pmatrix}\end{split}\]
cuda::tiles::reduce_bitor
-
template<ct::integral auto D, ct::integral_tile T>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> reduce_bitor(T a, ct::integral_constant<D> = {}) noexcept;
-
Yields the unique bitwise OR reduction of the operand
aalong dimension \(D\).The binary operator \(\operatorname{op}(x, y)\) is the expression ct::operator|(x, y).
The identity element \(\mathrm{id}\) is \(0\).
Example
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; unsigned char xData[2][4] = { {0b00001111, 0b10101010}, {0b01010101, 0b11110000}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r = ct::reduce_bitor(x, 0_ic);
\[\begin{split}\begin{pmatrix} \text{0b00001111} & \text{0b10101010} \\ \text{0b01010101} & \text{0b11110000} \end{pmatrix} \rightarrow \begin{pmatrix} \text{0b01011111} & \text{0b11111010} \end{pmatrix}\end{split}\]
cuda::tiles::reduce_bitxor
-
template<ct::integral auto D, ct::integral_tile T>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ ct::reduction_result_t<T, D> reduce_bitxor(T a, ct::integral_constant<D> = {}) noexcept;
-
Yields the unique bitwise XOR reduction of the operand
aalong dimension \(D\).The binary operator \(\operatorname{op}(x, y)\) is the expression ct::operator^(x, y).
The identity element \(\mathrm{id}\) is \(0\).
Example
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; unsigned char xData[2][4] = { {0b00001111, 0b10101010}, {0b01010101, 0b11110000}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r = ct::reduce_bitxor(x, 0_ic);
\[\begin{split}\begin{pmatrix} \text{0b00001111} & \text{0b10101010} \\ \text{0b01010101} & \text{0b11110000} \end{pmatrix} \rightarrow \begin{pmatrix} \text{0b01011010} & \text{0b01011010} \end{pmatrix}\end{split}\]
cuda::tiles::partial_sum
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ T partial_sum(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ T partial_sum(T a, ct::integral_constant<D>, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified scan summation of operand
aalong dimension \(D\). The binary operation \(\operatorname{op}(x, y)\) is defined asOverload 1: ct::add(x, y)
Overload 2: ct::add<Mode, SubMode>(x, y)
The identity element \(\mathrm{id}\) is \(0\) for integral scalar elements and positive zero for floating point elements of
a.The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar
Modeis a precise rounding modeIf
SubModeis round subnormals to zero, then \(T\) isfloat.The values
ModeandSubModeare enumerators of their respective types.
Note
The binary operation \(\operatorname{op}\) is not associative for floating point arguments. There may be multiple valid results of the scan.
Undefined behavior may occur if any ordering or grouping of elements triggers signed integer overflow. See undefined behavior annex for an example.
Example
The following example performs a partial sum of the rows of a tile:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {3, 2, 1, 4}, {-3, 2, 1, 5}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::partial_sum(x, 1_ic); // No flags auto r1 = ct::partial_sum(x, 1_ic, // With flags ct::round_ties_to_even_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 3 & 2 & 1 & 4 \\ -3 & 2 & 1 & 5 \\ \end{pmatrix} \rightarrow \begin{pmatrix} 3 & 5 & 6 & 10 \\ -3 & -1 & 0 & 5 \end{pmatrix}\end{split}\]
cuda::tiles::partial_prod
-
template<
ct::integral auto D,
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>)
__tile__ T partial_prod(T a, ct::integral_constant<D> = {}) noexcept;
-
template<
ct::integral auto D,
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile T
>
requires (D >= 0) && (D < tile_rank_v<T>) && /* atomic constraint */
__tile__ T partial_prod(T a, ct::integral_constant<D>, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Yields an unspecified product scan of operand
aalong dimension \(D\). The binary operation \(\operatorname{op}(x, y)\) is defined asOverload 1: ct::mul(x, y)
Overload 2: ct::mul<Mode, SubMode>(x, y)
The identity element \(\mathrm{id}\) is \(1\).
The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar
Modeis a precise rounding modeIf
SubModeis round subnormals to zero, then \(T\) isfloat.The values
ModeandSubModeare enumerators of their respective types.
Note
The binary operation \(\operatorname{op}\) is not associative for floating point arguments. There may be multiple valid results of the scan.
Undefined behavior may occur if any ordering or grouping of elements triggers signed integer overflow. See the undefined behavior annex for an example.
Example
The following example performs a product scan along the rows of a tile:
namespace ct = ::cuda::tiles; using namespace ct::literals; using i32x2x4 = ct::tile<int, ct::shape<2, 4>>; float xData[2][4] = { {3, 2, 1, 4}, {-3, 2, 1, 5}, }; auto x = ct::load(&xData[0][0] + ct::iota<i32x2x4>()); auto r0 = ct::partial_prod(x, 1_ic); // No flags auto r1 = ct::partial_prod(x, 1_ic, // With flags ct::round_ties_to_even_t{}, ct::preserve_subnormals_t{});
\[\begin{split}\begin{pmatrix} 3 & 2 & 1 & 4 \\ -3 & 2 & 1 & 5 \\ \end{pmatrix} \rightarrow \begin{pmatrix} 3 & 6 & 6 & 24 \\ -3 & -6 & -6 & -30 \end{pmatrix}\end{split}\]