Arithmetic Operations
This section documents the APIs for performing arithmetic on tiles.
Addition
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator+(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> add(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> add(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise addition on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands and let \(T\) be their type.
If \(T\) is an unsigned integral scalar, the result of each addition is \(a + b \operatorname{mod} 2^n\) where \(n\) is the bitwidth of \(T\).
If \(T\) is a signed integral scalar, the result of each addition is \(a + b\). The behavior is undefined if \(a + b\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.
If \(T\) is a basic floating point scalar, the result of each addition is
\[\operatorname{subround}(\textbf{addition}(\operatorname{subround}(a), \operatorname{subround}(b)))\]
where \(\textbf{addition}\) is the IEEE 754 addition [1] operation with rounding mode and \(\operatorname{subround}\) applies a subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode determines the rounding mode and SubMode determines the subnormals rounding mode.
The atomic constraint of overload (3) validates that:
\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.
Example
The following example shows how arithmetic tile conversion is used in the addition APIs. The first operand (an integer scalar) is broadcast and converted to match the type of the second operand (a \(2 \times 2\) floating point tile).
    namespace ct = ::cuda::tiles;
    using i32x2x2 = ct::tile<int, ct::shape<2, 2>>;
    using f32x2x2 = ct::tile<float, ct::shape<2, 2>>;
    float data[2][2] = {
        {0.0, 1.5},
        {3.0, 3.5},
    };
    f32x2x2 x = ct::load(&data[0][0] + ct::iota<i32x2x2>());
    f32x2x2 result = 5 + x;
The result holds the following value after the code runs:
\[\begin{pmatrix} 5.0 & 6.5 \\ 8.0 & 8.5 \end{pmatrix}\]
Example
The following example uses the ct::round_toward_negative_t helper type to select a rounding mode for the addition. Note that the ct::add API must be used instead of the operator overload when specifying an explicit rounding mode:
    namespace ct = ::cuda::tiles;
    float eps = 0x0.000002p0f;
    float result = ct::add(8.0f, 5 * eps, ct::round_toward_negative_t{});
The result of the addition \(8 + 5\varepsilon\) is rounded toward negative infinity to yield the value 8.0f.
Arithmetic Promotion
-
template<ct::arithmetic_tile T>
__tile__ ct::arithmetic_tile_promotion_t<T> operator+(T x) noexcept;
-
Yields the result of arithmetic tile promotion on the operand x.
Example
In the following example, the tile of char elements is promoted to a tile of int elements. The numeric values are unchanged after the operation:
    namespace ct = ::cuda::tiles;
    using i8x2x2 = ct::tile<char, ct::shape<2, 2>>;
    using i32x2x2 = ct::tile<int, ct::shape<2, 2>>;
    char data[2][2] = {
        {0, 1},
        {2, 3},
    };
    i8x2x2 x = ct::load(&data[0][0] + ct::iota<i32x2x2>());
    i32x2x2 result = +x;
Subtraction
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator-(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> sub(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> sub(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise subtraction of the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an unsigned integral scalar, the result of each subtraction is \(a - b \operatorname{mod} 2^n\) where \(n\) is the bitwidth of \(T\).
If \(T\) is a signed integral scalar, the result of each subtraction is \(a - b\). The behavior is undefined if \(a - b\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.
If \(T\) is a basic floating point scalar, the result of each subtraction is
\[\operatorname{subround}(\textbf{subtraction}(\operatorname{subround}(a), \operatorname{subround}(b)))\]
where \(\textbf{subtraction}\) is the IEEE 754 subtraction [1] operation with rounding mode and \(\operatorname{subround}\) applies a subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode determines the rounding mode and SubMode determines the subnormals rounding mode.
The atomic constraint of overload (3) validates that:
\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.
Example
The following example shows how to use the ct::round_subnormals_to_zero_t helper type to specify a flush to zero behavior. Note that the ct::sub function must be used instead of the operator overload when specifying an explicit subnormals rounding mode. The result of the computation below is 0.0f.
    namespace ct = ::cuda::tiles;
    float result = ct::sub(0x1.1p-126f, 0x1.0p-126f, ct::round_ties_to_even_t{}, ct::round_subnormals_to_zero_t{});
Negation
-
template<ct::arithmetic_tile U>
__tile__ U operator-(U x) noexcept;
-
Performs elementwise negation of operand x.
Let \(a\) be an element of the operand x and \(T\) its type.
If \(T\) is an unsigned integral scalar, the result of the negation is \((2^n - a) \operatorname{mod} 2^n\), where \(n\) is the bitwidth of \(T\).
If \(T\) is a signed integral scalar, the result of the negation is \(-a\). If the result value is not representable in \(T\) for any element of the operand \(x\), the behavior is undefined.
If \(T\) is a basic floating point scalar, the result of the negation is \(\textbf{negate}(a)\) where \(\textbf{negate}\) is the IEEE 754 negate [2] operation.
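The unsigned case can be illustrated with plain C++ unsigned arithmetic, which wraps modulo \(2^n\) in the same way. The following is only a scalar sketch for 32-bit elements; the helper name neg_u32 is hypothetical and is not part of the ct API.

```cpp
#include <cstdint>

// Hypothetical scalar helper (not a ct API): negation of a 32-bit unsigned
// value, i.e. 2^32 - a reduced modulo 2^32.
constexpr std::uint32_t neg_u32(std::uint32_t a) {
    // Unsigned subtraction wraps modulo 2^32; the cast makes the width explicit.
    return static_cast<std::uint32_t>(0u - a);
}

static_assert(neg_u32(1u) == 0xFFFFFFFFu, "2^32 - 1");
static_assert(neg_u32(0u) == 0u, "negating zero yields zero");
static_assert(neg_u32(0x80000000u) == 0x80000000u, "2^31 is its own negation");
```

Note that zero maps to zero rather than to \(2^{32}\), which is not representable in 32 bits.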
Multiplication
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator*(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mul(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mul(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise multiplication on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an unsigned integral scalar, the result of each multiplication is \(a \cdot b \operatorname{mod} 2^n\) where \(n\) is the bitwidth of \(T\).
If \(T\) is a signed integral scalar, the result of each multiplication is \(a \cdot b\). If this result is not representable in \(T\) for any pair of corresponding elements, the behavior is undefined.
If \(T\) is a basic floating point scalar, the result of each multiplication is
\[\operatorname{subround}(\textbf{multiplication}(\operatorname{subround}(a), \operatorname{subround}(b)))\]
where \(\textbf{multiplication}\) is the IEEE 754 multiplication [1] operation with rounding mode and \(\operatorname{subround}\) applies a subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.
The atomic constraint of overload (3) validates that:
\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.
Division
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator/(Lhs dividend, Rhs divisor) noexcept;
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> div(Lhs dividend, Rhs divisor) noexcept;
-
template<
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> div(Lhs dividend, Rhs divisor, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise division of the arithmetic tile converted operands dividend and divisor.
Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.
If \(T\) is an integral scalar, the result of each division is
\[\operatorname{trunc}\left(\frac{a}{b}\right)\]
where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
If \(T\) is a basic floating point scalar, the result of each division is
\[\operatorname{subround}(\textbf{div}(\operatorname{subround}(a), \operatorname{subround}(b)))\]
where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{div}\) is determined by the selected rounding mode.
If the selected rounding mode is a precise rounding mode, then \(\textbf{div}\) is the IEEE 754 division [1] operation with that rounding mode.
If the selected rounding mode is round approximate, then \(\textbf{div}(x, y)\) performs a fast approximation of division using a multiplication by reciprocal.
If the selected rounding mode is round full, then \(\textbf{div}(x, y)\) performs a relatively fast approximation of division that has better accuracy across the full input range than the round approximate division.
Note
Neither the round approximate nor the round full division is IEEE 754 compliant.
For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.
The atomic constraint of overload (3) validates that:
\(T\) is a basic floating point scalar.
If Mode is round full or round approximate, then \(T\) is float.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.
Ceiling Integer Division
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> ceildiv(Lhs dividend, Rhs divisor) noexcept;
-
Performs elementwise integral ceiling division of the arithmetic tile converted operands dividend and divisor.
Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.
The result of each division is
\[\bigg \lceil \frac{a}{b} \bigg \rceil\]
If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
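To make the rounding direction concrete, the scalar sketch below derives a ceiling division from the truncating division built into C++. The function name ceildiv_ref is hypothetical and only models the per-element arithmetic; it is not the ct::ceildiv implementation.

```cpp
// Hypothetical scalar reference (not a ct API): ceiling division built from
// C++ truncating division.
// Precondition: b != 0 and a / b does not overflow.
constexpr int ceildiv_ref(int a, int b) {
    const int q = a / b;  // truncates toward zero
    const int r = a % b;  // zero, or carries the sign of a
    // The exact quotient is positive and inexact precisely when r is non-zero
    // and has the same sign as b; only then is the truncated value too small.
    return (r != 0 && ((r < 0) == (b < 0))) ? q + 1 : q;
}

static_assert(ceildiv_ref(7, 2) == 4, "ceil(3.5)");
static_assert(ceildiv_ref(-7, 2) == -3, "ceil(-3.5)");
static_assert(ceildiv_ref(7, -2) == -3, "ceil(-3.5)");
static_assert(ceildiv_ref(-7, -2) == 4, "ceil(3.5)");
static_assert(ceildiv_ref(6, 3) == 2, "exact quotient is unchanged");
```

A typical use of this rounding is counting how many fixed-size blocks are needed to cover a range, for example ceildiv_ref(1000, 128) blocks of 128 elements for 1000 elements.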
Floor Integer Division
-
template<ct::integral_tile Dividend, ct::integral_tile Divisor>
requires ct::arithmetic_tile_convertible<Dividend, Divisor>
__tile__ ct::arithmetic_tile_conversion_t<Dividend, Divisor> floordiv(Dividend dividend, Divisor divisor) noexcept;
-
Performs elementwise integral floor division of the arithmetic tile converted operands dividend and divisor.
Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.
The result of each division is
\[\bigg \lfloor \frac{a}{b} \bigg \rfloor\]
If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
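The difference between floor division and the truncating operator/ only shows up for negative, inexact quotients. The scalar sketch below models the per-element arithmetic under that reading; floordiv_ref is a hypothetical name, not a ct API.

```cpp
// Hypothetical scalar reference (not a ct API): floor division built from
// C++ truncating division.
// Precondition: b != 0 and a / b does not overflow.
constexpr int floordiv_ref(int a, int b) {
    const int q = a / b;  // truncates toward zero
    const int r = a % b;  // zero, or carries the sign of a
    // The exact quotient is negative and inexact precisely when r is non-zero
    // and its sign differs from that of b; only then is truncation too large.
    return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}

static_assert(floordiv_ref(7, 2) == 3, "floor(3.5)");
static_assert(floordiv_ref(-7, 2) == -4, "floor(-3.5), whereas -7 / 2 == -3");
static_assert(floordiv_ref(7, -2) == -4, "floor(-3.5)");
static_assert(floordiv_ref(-7, -2) == 3, "floor(3.5)");
```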
Remainder
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator%(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> remainder(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise remainder on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an integral scalar, each remainder is computed as
\[a - \operatorname{trunc}\left(\frac{a}{b}\right) \cdot b\]
where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. The behavior is undefined if \(b\) is zero or if \(\operatorname{trunc}\left( \frac{a}{b} \right)\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.
If \(T\) is a basic floating point scalar, each remainder is computed as follows:
\[a - \operatorname{trunc}\left(\frac{a}{b}\right) \cdot b\]
where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. If the resulting value is \(0\), its sign matches the sign of \(a\). If \(b\) is zero, the result is \(\text{NaN}\).
Non-finite values for \(a\) or \(b\) are handled as follows:
If either \(a\) or \(b\) is \(\text{NaN}\), the result is \(\text{NaN}\).
If \(a\) is infinite and \(b\) is finite, the result is \(\text{NaN}\).
If \(a\) is finite and \(b\) is infinite, the result is \(a\).
If \(a\) and \(b\) are both infinite, the behavior is unspecified.
Note
The C++ STL function std::remainder on integral arguments performs a floating point remainder whereas ct::remainder performs integral remainder in this case.
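The scalar sketch below contrasts the truncated-quotient definition with std::remainder from the C++ standard library, which rounds the quotient to the nearest integer instead. It uses only standard C++: the built-in % operator and std::fmod both follow the truncated definition for finite, non-zero divisors. The names int_remainder_ref, float_remainder_ref, and remainder_demo are hypothetical and are not ct APIs.

```cpp
#include <cassert>
#include <cmath>

// Integral case: a - trunc(a / b) * b, which equals the built-in a % b.
// Precondition: b != 0 and a / b is representable.
constexpr int int_remainder_ref(int a, int b) {
    return a - (a / b) * b;
}

// Floating point case: std::fmod also uses a truncated quotient, so a
// non-zero result carries the sign of a.
inline double float_remainder_ref(double a, double b) {
    return std::fmod(a, b);
}

// Self-check contrasting the truncated definition with std::remainder.
inline void remainder_demo() {
    assert(int_remainder_ref(7, 2) == 1);
    assert(int_remainder_ref(-7, 2) == -1);          // sign follows a
    assert(float_remainder_ref(-7.5, 2.0) == -1.5);  // sign follows a
    // std::remainder converts the integers to double and rounds the quotient
    // 3.5 to the nearest even integer 4, giving 7 - 4 * 2 = -1.
    assert(std::remainder(7, 2) == -1.0);
}
```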
Comparison Operators
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator==(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator!=(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<=(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs>
requires ct::arithmetic_tile_comparable<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>=(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise comparison of the arithmetic comparison converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an integral scalar, the result of the comparison is one of the following mathematical predicates according to the selected operator:

| Operator | Result |
| --- | --- |
| == | \(a = b\) |
| != | \(a \neq b\) |
| < | \(a < b\) |
| <= | \(a \leq b\) |
| > | \(a > b\) |
| >= | \(a \geq b\) |

If \(T\) is a basic floating point scalar, the result of each comparison is one of the following IEEE 754 comparison predicates [4], determined by the operator:

| Operator | Result |
| --- | --- |
| == | \(\operatorname{compareQuietEqual}(a, b)\) |
| != | \(\operatorname{compareQuietNotEqual}(a, b)\) |
| < | \(\operatorname{compareSignalingLess}(a, b)\) |
| <= | \(\operatorname{compareSignalingLessEqual}(a, b)\) |
| > | \(\operatorname{compareSignalingGreater}(a, b)\) |
| >= | \(\operatorname{compareSignalingGreaterEqual}(a, b)\) |

Note
For all the above predicates except \(\operatorname{compareQuietNotEqual}\), a NaN compares false to all other values. For details, refer to the IEEE 754 comparison predicates [4].
Bitwise And
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator&(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise bitwise AND on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:
\[r_i = \operatorname{AND}(a_i, b_i)\]
where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{AND}(x, y)\) is defined by the following truth table:

| \(x\) | \(y\) | \(\operatorname{AND}(x, y)\) |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Bitwise Or
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator|(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise bitwise OR on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:
\[r_i = \operatorname{OR}(a_i, b_i)\]
where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{OR}(x, y)\) is defined by the following truth table:

| \(x\) | \(y\) | \(\operatorname{OR}(x, y)\) |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Bitwise Xor
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator^(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise bitwise XOR on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:
\[r_i = \operatorname{XOR}(a_i, b_i)\]
where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{XOR}(x, y)\) is defined by the following truth table:

| \(x\) | \(y\) | \(\operatorname{XOR}(x, y)\) |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Bitwise Complement
-
template<ct::integral_tile T>
requires /* atomic constraint */
__tile__ T operator~(T in) noexcept;
-
Performs elementwise complement on the operand in.
Let \(a\) be an element of in and let \(a_i\) be the \(i^{th}\) bit of the base two representation of \(a\). The \(i^{th}\) bit \(r_i\) in the base two representation of each result is determined as
\[r_i = \begin{cases} 1 & \text{if } a_i = 0 \\ 0 & \text{if } a_i = 1 \end{cases}\]
The atomic constraint validates that the element type of T is not bool.
Left Bitshift
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, ct::tile_element_t<Lhs>> operator<<(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise left bitshift of the mutual broadcast converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively. Let \(T\) be the type of \(a\) and \(N\) its bitwidth.
The result of the operation is the unique integer representable in \(T\) which is congruent to \(a \cdot 2^b\) modulo \(2^N\).
If any element of the converted rhs is negative, the behavior is undefined. If any element of the converted rhs is greater than or equal to \(N\), the behavior is undefined.
The atomic constraint validates that \(T\) is not bool.
Right Bitshift
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, ct::tile_element_t<Lhs>> operator>>(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise right bitshift of the mutual broadcast converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively. Let \(T\) be the type of \(a\) and \(N\) its bitwidth.
The result of each operation is
\[\bigg \lfloor \frac{a}{2^b} \bigg \rfloor\]
If any element of the converted rhs is negative, the behavior is undefined. If any element of the converted rhs is greater than or equal to \(N\), the behavior is undefined.
The atomic constraint validates that \(T\) is not bool.
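The arithmetic definitions of the two shifts can be checked against a scalar model. The sketch below assumes 32-bit elements and computes each result directly from its formula: the left shift as \(a \cdot 2^b\) reduced modulo \(2^{32}\), and the right shift as a floor division by \(2^b\). The names shl_ref and shr_ref are hypothetical, not ct APIs.

```cpp
#include <cstdint>

// Hypothetical scalar references (not ct APIs) for 32-bit elements.
// Precondition for both: 0 <= b < 32.

// Left shift: a * 2^b reduced modulo 2^32, computed in a 64-bit type.
constexpr std::uint32_t shl_ref(std::uint32_t a, int b) {
    return static_cast<std::uint32_t>((std::uint64_t{a} << b) & 0xFFFFFFFFu);
}

// Right shift: floor(a / 2^b), built from truncating division.
constexpr std::int32_t shr_ref(std::int32_t a, int b) {
    const std::int64_t d = std::int64_t{1} << b;  // 2^b
    const std::int64_t q = a / d;                 // truncates toward zero
    // Truncation rounds negative inexact quotients up, so step down by one.
    return static_cast<std::int32_t>((a % d != 0 && a < 0) ? q - 1 : q);
}

static_assert(shl_ref(0x80000001u, 1) == 2u, "the top bit is discarded");
static_assert(shr_ref(5, 1) == 2, "floor(2.5)");
static_assert(shr_ref(-5, 1) == -3, "floor(-2.5), whereas -5 / 2 == -2");
```

For negative values the floor definition therefore differs from division by a power of two with operator/, which truncates toward zero.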
Logical Conjunction
-
template<ct::bool_tile_convertible Lhs, ct::bool_tile_convertible Rhs>
requires ct::broadcast_compatible<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator&&(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise logical conjunction of the mutual broadcast bool tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands. The result of each operation is determined by the following table:

| \(a\) | \(b\) | Result |
| --- | --- | --- |
| false | false | false |
| false | true | false |
| true | false | false |
| true | true | true |

Logical Disjunction
-
template<ct::bool_tile_convertible Lhs, ct::bool_tile_convertible Rhs>
requires ct::broadcast_compatible<Lhs, Rhs>
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator||(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise logical disjunction of the mutual broadcast bool tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands. The result of each operation is determined by the following table:

| \(a\) | \(b\) | Result |
| --- | --- | --- |
| false | false | false |
| false | true | true |
| true | false | true |
| true | true | true |

Logical Negation
-
template<ct::bool_tile_convertible T>
__tile__ ct::tile_with_element_t<T, bool> operator!(T x) noexcept;
-
Performs elementwise logical negation on the bool tile converted operand x.
Let \(a\) be an element of x. The result of each operation is
\[\begin{cases} \text{false} & \text{if } a \text{ is true} \\ \text{true} & \text{if } a \text{ is false} \end{cases}\]
Maximum
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> max(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::nan_propagation_mode NanMode = ct::default_nan_propagation_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> max(Lhs lhs, Rhs rhs, ct::nan_propagation_mode_constant<NanMode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise maximum on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an integral scalar, the result of each computation is
\[\begin{cases} a & \text{if } a > b \\ b & \text{if } a \leq b \end{cases}\]
If \(T\) is a basic floating point scalar, the result of each computation is
\[\operatorname{subround}(\textbf{maximum}(a, b))\]
where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{maximum}\) is determined according to the selected NaN propagation mode.
If the selected NaN propagation mode is suppress NaN, then \(\textbf{maximum}\) refers to the IEEE 754 maximumNumber [3] function.
If the selected NaN propagation mode is propagate NaN, then \(\textbf{maximum}\) refers to the IEEE 754 maximum [3] function.
For overload (1), the default NaN propagation mode and default subnormals rounding mode are selected. For overload (2), the NaN propagation mode and subnormals rounding mode are determined by NanMode and SubMode respectively.
The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar.
If SubMode is round subnormals to zero, then \(T\) is float.
The values NanMode and SubMode are enumerators of their respective types.
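The two NaN propagation modes differ only when exactly one operand is NaN. The scalar sketch below illustrates that difference under simplifying assumptions: it does not model the ordering of signed zeros or subnormal handling, and the names is_nan, max_suppress_nan, max_propagate_nan, and kQuietNaN are hypothetical, not ct APIs.

```cpp
#include <limits>

// Only NaN compares unequal to itself.
constexpr bool is_nan(double x) { return x != x; }

// "Suppress NaN" policy: a single NaN operand is ignored in favor of the number.
constexpr double max_suppress_nan(double a, double b) {
    return is_nan(a) ? b : (is_nan(b) ? a : (a > b ? a : b));
}

// "Propagate NaN" policy: any NaN operand makes the result NaN.
constexpr double max_propagate_nan(double a, double b) {
    return is_nan(a) ? a : (is_nan(b) ? b : (a > b ? a : b));
}

constexpr double kQuietNaN = std::numeric_limits<double>::quiet_NaN();

static_assert(max_suppress_nan(kQuietNaN, 1.0) == 1.0, "the number wins");
static_assert(is_nan(max_propagate_nan(kQuietNaN, 1.0)), "NaN propagates");
static_assert(is_nan(max_suppress_nan(kQuietNaN, kQuietNaN)), "two NaNs yield NaN");
```

The minimum operation follows the same pattern with the comparison reversed.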
Minimum
-
template<
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> min(Lhs lhs, Rhs rhs) noexcept;
-
template<
ct::nan_propagation_mode NanMode = ct::default_nan_propagation_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs
>
requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> min(Lhs lhs, Rhs rhs, ct::nan_propagation_mode_constant<NanMode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise minimum on the arithmetic tile converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.
If \(T\) is an integral scalar, the result of each computation is
\[\begin{cases} b & \text{if } a > b \\ a & \text{if } a \leq b \end{cases}\]
If \(T\) is a basic floating point scalar, the result of each computation is
\[\operatorname{subround}(\textbf{min}(a, b))\]
where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{min}\) is determined according to the selected NaN propagation mode.
If the selected NaN propagation mode is suppress NaN, then \(\textbf{min}\) refers to the IEEE 754 minimumNumber [3] function.
If the selected NaN propagation mode is propagate NaN, then \(\textbf{min}\) refers to the IEEE 754 minimum [3] function.
For overload (1), the default NaN propagation mode and default subnormals rounding mode are selected. For overload (2), the NaN propagation mode and subnormals rounding mode are determined by NanMode and SubMode respectively.
The atomic constraint of overload (2) validates that:
\(T\) is a basic floating point scalar.
If SubMode is round subnormals to zero, then \(T\) is float.
The values NanMode and SubMode are enumerators of their respective types.
Absolute Value
-
template<ct::arithmetic_tile U>
__tile__ U abs(U x) noexcept;
-
Yields the elementwise absolute value of the operand x.
Let \(a\) be an element of x and \(T\) its type.
If \(T\) is an integral scalar, the result of each computation is the absolute value \(|a|\). If this value is not representable in \(T\), the behavior is undefined.
If \(T\) is a basic floating point scalar, the result of each computation is
\[\textbf{abs}(a)\]
where \(\textbf{abs}\) is the IEEE 754 abs [2] function.
Fused Multiply Add
-
template<
ct::rounding_mode Mode = ct::default_rounding_mode(),
ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(),
ct::arithmetic_tile Lhs,
ct::arithmetic_tile Rhs,
ct::basic_floating_point_tile Acc
>
requires /* atomic constraint */
__tile__ Acc fma(Lhs lhs, Rhs rhs, Acc acc, ct::rounding_mode_constant<Mode> = {}, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;
-
Performs elementwise fused multiply-add of the operands lhs, rhs, and acc. The lhs and rhs operands are broadcast converted to the shape of acc, then tile converted to the type Acc.
Let \(a\), \(b\), and \(c\) be corresponding elements of the converted operands lhs, rhs, and acc respectively, and let \(T\) be their type. The result of each computation is
\[\operatorname{subround}(\textbf{fusedMultiplyAdd}(\operatorname{subround}(a), \operatorname{subround}(b), \operatorname{subround}(c)))\]
where \(\textbf{fusedMultiplyAdd}\) is the IEEE 754 fusedMultiplyAdd [1] operation with rounding mode and \(\operatorname{subround}\) applies a subnormals rounding mode. Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.
The atomic constraint validates that:
lhs and rhs are broadcast convertible to the shape of acc.
The element types of lhs and rhs are non-narrowing scalar convertible to the element type of acc.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.
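The practical effect of fusing is that the product is not rounded before the addition. The scalar sketch below shows this with std::fma from the C++ standard library, assuming IEEE 754 single precision and the default round-to-nearest-even mode on the host; the function names fused, unfused, and fma_demo are hypothetical, and the code does not use or model ct::fma itself.

```cpp
#include <cassert>
#include <cmath>

// a * a - c with a single rounding at the end.
inline float fused(float a, float c) {
    return std::fma(a, a, -c);
}

// a * a - c with the product rounded to float before the subtraction.
inline float unfused(float a, float c) {
    volatile float product = a * a;  // volatile keeps the compiler from fusing
    return product - c;
}

inline void fma_demo() {
    const float a = 0x1.001p0f;  // 1 + 2^-12
    const float c = 0x1.002p0f;  // 1 + 2^-11
    // Exactly, a * a = 1 + 2^-11 + 2^-24. The last term is half a unit in the
    // last place and is lost when the product is rounded on its own.
    assert(unfused(a, c) == 0.0f);
    assert(fused(a, c) == 0x1p-24f);
}
```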
Multiply High Bits
-
template<ct::integral_tile Lhs, ct::integral_tile Rhs>
requires ct::arithmetic_tile_convertible<Lhs, Rhs>
__tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mulhi(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise multiplication of the arithmetic tile converted operands lhs and rhs and yields only the upper \(N\) bits of the \(2N\)-bit product.
Let \(a\) and \(b\) be corresponding elements of the converted operands, let \(T\) be their type, and let \(N\) be its bitwidth.
If \(T\) is an unsigned integral scalar, the result of each computation is
\[\textbf{mulhiUnsigned}(a, b) = \bigg \lfloor \frac{a \cdot b}{2^N} \bigg \rfloor\]
This corresponds to multiplication of \(a\) and \(b\) in an integer type of twice the bitwidth of \(T\) followed by an \(N\)-bit right shift.
If \(T\) is a signed integral scalar, the result of each computation is the unique integer representable in \(T\) that is congruent modulo \(2^N\) to
\[\textbf{mulhiUnsigned}(a \operatorname{mod} 2^N, b \operatorname{mod} 2^N)\]
This corresponds to an unsigned mulhi operation where the operands are reinterpreted as unsigned integers.
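For the unsigned case with \(N = 32\), the description of a widened multiplication followed by a right shift translates directly into standard C++. The following scalar sketch is for illustration only; mulhi_u32 is a hypothetical name and not a ct API.

```cpp
#include <cstdint>

// Hypothetical scalar reference (not a ct API): upper 32 bits of the 64-bit
// product of two 32-bit unsigned values.
constexpr std::uint32_t mulhi_u32(std::uint32_t a, std::uint32_t b) {
    // Widen first so that the full 64-bit product is available, then keep
    // only the upper half.
    return static_cast<std::uint32_t>((std::uint64_t{a} * b) >> 32);
}

static_assert(mulhi_u32(3u, 4u) == 0u, "small products have no high bits");
static_assert(mulhi_u32(0x80000000u, 2u) == 1u, "2^31 * 2 = 2^32");
static_assert(mulhi_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "(2^32 - 1)^2");
```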
Pointer No-Op
-
template<ct::pointer_tile Tile>
__tile__ Tile operator+(Tile x) noexcept;
-
Yields the value x without modification.
Pointer Addition
-
template<ct::integral_tile Offset, ct::pointer_tile Ptr>
requires ct::broadcast_compatible<Offset, Ptr> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Offset, Ptr, ct::tile_element_t<Ptr>> operator+(Offset offset, Ptr ptr) noexcept;
-
template<ct::pointer_tile Ptr, ct::integral_tile Offset>
requires ct::broadcast_compatible<Ptr, Offset> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Ptr, Offset, ct::tile_element_t<Ptr>> operator+(Ptr ptr, Offset offset) noexcept;
-
Performs elementwise pointer addition of the mutual broadcast converted operands ptr and offset.
Let \(p\) and \(x\) denote corresponding elements of the converted operands ptr and offset. The result of each computation is the result of evaluating the C++ pointer addition operation [5] p + x.
If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.
The atomic constraint tests that the element type of ptr is not a pointer to (possibly cv-qualified) void.
Note
The builtin C++ pointer addition operation is well defined only if the resulting pointer is valid in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
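The elementwise and broadcast behavior described above can be modeled on the host with ordinary C++ pointers. The sketch below is a hypothetical scalar model, not the ct API. The name add_offsets is invented, std::array stands in for a tile, and a single base pointer is broadcast across all offsets.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model: broadcast one base pointer across N offsets
// and evaluate the builtin expression p + x for every pair.
template <typename T, std::size_t N>
constexpr std::array<T*, N> add_offsets(T* base,
                                        const std::array<std::ptrdiff_t, N>& offsets) {
    std::array<T*, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Each result must point into the same array as base, or one past
        // its end; otherwise the whole operation has undefined behavior.
        out[i] = base + offsets[i];
    }
    return out;
}
```

With int data[4] and offsets {0, 2, 3}, each resulting pointer refers to the corresponding element of data. An offset of 4 would still be valid as a one-past-the-end pointer, while an offset of 5 would make the whole call undefined.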
Pointer Subtraction
-
template<ct::pointer_tile Ptr, ct::integral_tile Offset>
requires ct::broadcast_compatible<Ptr, Offset> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Ptr, Offset, ct::tile_element_t<Ptr>> operator-(Ptr ptr, Offset offset) noexcept;
-
Performs elementwise pointer subtraction on the mutual broadcast converted operands ptr and offset.
Let \(p\) and \(x\) denote corresponding elements of the converted operands ptr and offset. The result of each computation is the result of evaluating the C++ pointer subtraction expression p - x [5].
If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.
The atomic constraint tests that the element type of ptr is not a pointer to (possibly cv-qualified) void.
Note
The builtin C++ pointer subtraction operation is well defined only if the resulting pointer is valid in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
Pointer Difference
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, ptrdiff_t> operator-(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise pointer difference on the mutual broadcast converted operands lhs and rhs.
Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively. The result of each computation is the result of evaluating the C++ pointer difference expression a - b [5].
If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.
The atomic constraint validates that the element types of lhs and rhs may participate in the builtin pointer difference operation.
Note
The builtin C++ pointer difference operation is well defined only if the arguments are appropriate pointers in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
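As a rough illustration of the per-element rule, the following host-side sketch applies the builtin a - b to each pair of pointers. It is a hypothetical model and not the ct API. The name pointer_difference is invented and std::array stands in for a tile.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model: evaluate the builtin a - b for every pair
// of corresponding elements and collect the ptrdiff_t results.
template <typename T, std::size_t N>
constexpr std::array<std::ptrdiff_t, N> pointer_difference(
    const std::array<T*, N>& lhs, const std::array<T*, N>& rhs) {
    std::array<std::ptrdiff_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Both pointers of a pair must point into the same array (or one
        // past its end); otherwise the behavior is undefined.
        out[i] = lhs[i] - rhs[i];
    }
    return out;
}
```

The result counts elements rather than bytes, so for an int array the difference between data + 4 and data + 1 is 3.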
Pointer Comparison
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator==(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator!=(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<=(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>(Lhs lhs, Rhs rhs) noexcept;
-
template<ct::pointer_tile Lhs, ct::pointer_tile Rhs>
requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */
__tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>=(Lhs lhs, Rhs rhs) noexcept;
-
Performs elementwise pointer comparison of the mutual broadcast converted operands lhs and rhs.
Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively, and let @ denote the selected overloaded operator. The result of each computation is the result of evaluating the expression a @ b, which is either a builtin relational operator expression [6] or a builtin equality operator expression [7].
The atomic constraint validates that the element types of Lhs and Rhs may participate in the builtin C++ relational and equality operators.
Note
The results of the builtin relational and equality operators may be unspecified if the pointer arguments are not appropriately arranged in the C++ object model. For details, see [expr.rel] § 7.6.9 and [expr.eq] § 7.6.10 of ISO/IEC 14882:2024.
Null Pointer Equality
-
template<ct::pointer_tile Ptrs>
__tile__ ct::tile_with_element_t<Ptrs, bool> operator==(Ptrs ptrs, nullptr-t) noexcept;
-
template<ct::pointer_tile Ptrs>
__tile__ ct::tile_with_element_t<Ptrs, bool> operator==(nullptr-t, Ptrs ptrs) noexcept;
-
Performs elementwise equality comparison to a null pointer value.
Let \(a\) be an element of operand ptrs. The result of each computation is true if \(a\) is a null pointer value and false otherwise.
Null Pointer Inequality
-
template<ct::pointer_tile Ptrs>
__tile__ ct::tile_with_element_t<Ptrs, bool> operator!=(Ptrs ptrs, nullptr-t) noexcept;
-
template<ct::pointer_tile Ptrs>
__tile__ ct::tile_with_element_t<Ptrs, bool> operator!=(nullptr-t, Ptrs ptrs) noexcept;
-
Performs elementwise inequality comparison to a null pointer value.
Let \(a\) be an element of operand ptrs. The result of each computation is false if \(a\) is a null pointer value and true otherwise.
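The null pointer equality and inequality rules are complements of each other, which a small host-side sketch makes concrete. This is a hypothetical model and not the ct API. The names is_null and is_not_null are invented and std::array stands in for a tile.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model of ptrs == nullptr: true exactly where the
// element is a null pointer value.
template <typename T, std::size_t N>
constexpr std::array<bool, N> is_null(const std::array<T*, N>& ptrs) {
    std::array<bool, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = (ptrs[i] == nullptr);
    }
    return out;
}

// Hypothetical host-side model of ptrs != nullptr: the elementwise negation
// of is_null.
template <typename T, std::size_t N>
constexpr std::array<bool, N> is_not_null(const std::array<T*, N>& ptrs) {
    std::array<bool, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = (ptrs[i] != nullptr);
    }
    return out;
}
```

The result has the same shape as the input and element type bool, which mirrors the ct::tile_with_element_t<Ptrs, bool> return type in the signatures above.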
Footnotes
[1] See Arithmetic operations § 5.4.1 of IEEE 754-2019
[2] See Sign bit operations § 5.5.1 of IEEE 754-2019
[3] See Minimum and maximum operations § 9.6 of IEEE 754-2019
[4] See Details of comparison predicates § 5.11 of IEEE 754-2019
[5] See Additive operators [expr.add] § 7.6.6 of ISO/IEC 14882:2024
[6] See Relational operators [expr.rel] § 7.6.9 of ISO/IEC 14882:2024
[7] See Equality operators [expr.eq] § 7.6.10 of ISO/IEC 14882:2024