Arithmetic Operations

This section documents the APIs for performing arithmetic on tiles.

Addition

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator+(Lhs lhs, Rhs rhs) noexcept;

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> add(Lhs lhs, Rhs rhs) noexcept;

template< ct::rounding_mode Mode = ct::default_rounding_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> add(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise addition on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands and let \(T\) be their type.

If \(T\) is an unsigned integral scalar, the result of each addition is \((a + b) \operatorname{mod} 2^n\), where \(n\) is the bitwidth of \(T\).

If \(T\) is a signed integral scalar, the result of each addition is \(a + b\). The behavior is undefined if \(a + b\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.
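The integral rules above can be modelled with ordinary C++ scalars. The following is a minimal host-side sketch, not part of ct: wrapping_add and checked_signed_add are hypothetical helpers, and an empty std::optional marks exactly the inputs for which the specification leaves signed addition undefined.

```cpp
#include <limits>
#include <optional>
#include <type_traits>

// Unsigned case: (a + b) mod 2^n. Narrow types are promoted to int before
// the addition, so the cast back to T performs the reduction modulo 2^n.
template <typename T>
T wrapping_add(T a, T b) {
  static_assert(std::is_unsigned_v<T>, "unsigned element types only");
  return static_cast<T>(a + b);
}

// Signed case: a + b, or std::nullopt when the sum is not representable
// in T (the case the specification leaves undefined). The bounds are
// checked before adding, so the check itself cannot overflow.
template <typename T>
std::optional<T> checked_signed_add(T a, T b) {
  static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                "signed integral element types only");
  using L = std::numeric_limits<T>;
  if ((b > 0 && a > L::max() - b) || (b < 0 && a < L::min() - b)) {
    return std::nullopt;
  }
  return static_cast<T>(a + b);
}
```

For example, wrapping_add<std::uint8_t>(200, 100) is 44 because 300 mod 256 = 44, while checked_signed_add<std::int8_t>(100, 100) is empty because 200 exceeds 127.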

If \(T\) is a basic floating point scalar, the result of each addition is

\[\operatorname{subround}(\textbf{addition}(\operatorname{subround}(a), \operatorname{subround}(b)))\]

where \(\textbf{addition}\) is the IEEE 754 addition operation [1] under the selected rounding mode and \(\operatorname{subround}\) applies the selected subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode determines the rounding mode and SubMode determines the subnormals rounding mode.

The atomic constraint of overload (3) validates that:

\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.

Example

The following example shows how arithmetic tile conversion is used in the addition APIs. The first operand (an integer scalar) is broadcast and converted to match the type of the second operand (a \(2 \times 2\) floating point tile).

namespace ct = ::cuda::tiles;
using i32x2x2 = ct::tile<int, ct::shape<2, 2>>;
using f32x2x2 = ct::tile<float, ct::shape<2, 2>>;

float data[2][2] = {
  {0.0, 1.5},
  {3.0, 3.5},
};

f32x2x2 x = ct::load(&data[0][0] + ct::iota<i32x2x2>());
f32x2x2 result = 5 + x;

The result holds the following value after the code runs:

\[\begin{split}\begin{pmatrix} 5.0 & 6.5 \\ 8.0 & 8.5 \end{pmatrix}\end{split}\]

Example

The following example uses the ct::round_toward_negative_t helper type to select a rounding mode for the addition. Note that the ct::add API must be used instead of the operator overload when specifying an explicit rounding mode:

namespace ct = ::cuda::tiles;
float eps = 0x0.000002p0f;
float result = ct::add(8.0f, 5 * eps, ct::round_toward_negative_t{});

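The exact sum \(8 + 5\varepsilon\), where \(\varepsilon = 2^{-23}\) is the value of eps, lies strictly between 8.0f and the next larger float, \(8 + 2^{-20}\). Rounding toward negative infinity therefore yields 8.0f, whereas rounding to nearest would yield \(8 + 2^{-20}\).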

Arithmetic Promotion

template<ct::arithmetic_tile T> __tile__ ct::arithmetic_tile_promotion_t<T> operator+(T x) noexcept;

Yields the result of arithmetic tile promotion on the operand x.

Example

In the following example, the tile of char elements is promoted to a tile of int elements. The numeric values are unchanged after the operation:

namespace ct = ::cuda::tiles;
using i8x2x2 = ct::tile<char, ct::shape<2, 2>>;
using i32x2x2 = ct::tile<int, ct::shape<2, 2>>;

char data[2][2] = {
  {0, 1},
  {2, 3},
};

i8x2x2 x = ct::load(&data[0][0] + ct::iota<i32x2x2>());
i32x2x2 result = +x;
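For scalars, unary + in standard C++ applies the integral promotions in the same way. Assuming arithmetic tile promotion mirrors those rules element by element, as the char to int example above suggests, the following sketch checks the resulting types at compile time; promote is a hypothetical helper.

```cpp
#include <type_traits>

// Unary + applies the integral promotions to its operand.
template <typename T>
constexpr auto promote(T x) {
  return +x;
}

static_assert(std::is_same_v<decltype(promote(char{})), int>);
static_assert(std::is_same_v<decltype(promote(short{})), int>);
// Unary + leaves floating point types unchanged.
static_assert(std::is_same_v<decltype(promote(1.0f)), float>);
```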

Subtraction

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator-(Lhs lhs, Rhs rhs) noexcept;

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> sub(Lhs lhs, Rhs rhs) noexcept;

template< ct::rounding_mode Mode = ct::default_rounding_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> sub(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise subtraction of the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an unsigned integral scalar, the result of each subtraction is \((a - b) \operatorname{mod} 2^n\), where \(n\) is the bitwidth of \(T\).

If \(T\) is a signed integral scalar, the result of each subtraction is \(a - b\). The behavior is undefined if \(a - b\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.

If \(T\) is a basic floating point scalar, the result of each subtraction is

\[\operatorname{subround}(\textbf{subtraction}(\operatorname{subround}(a), \operatorname{subround}(b)))\]

where \(\textbf{subtraction}\) is the IEEE 754 subtraction operation [1] under the selected rounding mode and \(\operatorname{subround}\) applies the selected subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode determines the rounding mode and SubMode determines the subnormals rounding mode.

The atomic constraint of overload (3) validates that:

\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.

Example

The following example shows how to use the ct::round_subnormals_to_zero_t helper type to request flush-to-zero behavior. Note that the ct::sub function must be used instead of the operator overload when specifying an explicit subnormals rounding mode. The exact difference, 0x0.1p-126 (that is, \(2^{-130}\)), is subnormal, so it is flushed to zero and the result of the computation below is 0.0f.

namespace ct = ::cuda::tiles;
float result = ct::sub(0x1.1p-126f, 0x1.0p-126f,
                       ct::round_ties_to_even_t{},
                       ct::round_subnormals_to_zero_t{});
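The arithmetic behind this example can be traced on the host. In the sketch below, flush_subnormal and sub_flush_to_zero are hypothetical helpers that emulate the subnormals rounding mode in software, assuming that a flushed value becomes a zero with the same sign.

```cpp
#include <cmath>

// Host emulation of "round subnormals to zero": a subnormal value is
// replaced by a zero of the same sign; every other value is unchanged.
float flush_subnormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// subround(subtraction(subround(a), subround(b))), with the subtraction
// performed in the host's current rounding mode (normally ties to even).
float sub_flush_to_zero(float a, float b) {
  return flush_subnormal(flush_subnormal(a) - flush_subnormal(b));
}
```

Both inputs of the example are normal, so only the final subround changes anything: plain host subtraction keeps the subnormal 0x1p-130f, and sub_flush_to_zero returns 0.0f.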

Negation

template<ct::arithmetic_tile U> __tile__ U operator-(U x) noexcept;

Performs elementwise negation of operand x.

Let \(a\) be an element of the operand x and \(T\) its type.

If \(T\) is an unsigned integral scalar, the result of the negation is \((2^n - a) \operatorname{mod} 2^n\), where \(n\) is the bitwidth of \(T\).

If \(T\) is a signed integral scalar, the result of the negation is \(-a\). If the result value is not representable in \(T\) for any element of the operand \(x\), the behavior is undefined.

If \(T\) is a basic floating point scalar, the result of the negation is \(\textbf{negate}(a)\), where \(\textbf{negate}\) is the IEEE 754 negate operation [2].

Multiplication

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator*(Lhs lhs, Rhs rhs) noexcept;

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mul(Lhs lhs, Rhs rhs) noexcept;

template< ct::rounding_mode Mode = ct::default_rounding_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mul(Lhs lhs, Rhs rhs, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise multiplication on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an unsigned integral scalar, the result of each multiplication is \((a \cdot b) \operatorname{mod} 2^n\), where \(n\) is the bitwidth of \(T\).

If \(T\) is a signed integral scalar, the result of each multiplication is \(a \cdot b\). If this result is not representable in \(T\) for any pair of corresponding elements, the behavior is undefined.
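When modelling the unsigned rule with host C++, one pitfall is that narrow unsigned types are promoted to signed int before multiplication, so 0xFFFF * 0xFFFF on std::uint16_t operands overflows int, which is undefined behavior in C++. The hypothetical helper below widens first; it is a sketch, not ct code.

```cpp
#include <type_traits>

// (a * b) mod 2^n for an unsigned element type T. Both operands are widened
// to unsigned long long first; the wide product wraps modulo 2^64, and the
// cast back to T then reduces it modulo 2^n.
template <typename T>
T wrapping_mul(T a, T b) {
  static_assert(std::is_unsigned_v<T>, "unsigned element types only");
  const unsigned long long wide =
      static_cast<unsigned long long>(a) * static_cast<unsigned long long>(b);
  return static_cast<T>(wide);
}
```

For example, wrapping_mul<std::uint16_t>(0xFFFF, 0xFFFF) returns 1, because (2^16 - 1)^2 = 2^32 - 2^17 + 1, which is 1 modulo 2^16.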

If \(T\) is a basic floating point scalar, the result of each multiplication is

\[\operatorname{subround}(\textbf{multiplication}(\operatorname{subround}(a), \operatorname{subround}(b)))\]

where \(\textbf{multiplication}\) is the IEEE 754 multiplication operation [1] under the selected rounding mode and \(\operatorname{subround}\) applies the selected subnormals rounding mode. For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.

The atomic constraint of overload (3) validates that:

\(T\) is a basic floating point scalar.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.

Division

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator/(Lhs dividend, Rhs divisor) noexcept;

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> div(Lhs dividend, Rhs divisor) noexcept;

template< ct::rounding_mode Mode = ct::default_rounding_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> div(Lhs dividend, Rhs divisor, ct::rounding_mode_constant<Mode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise division of the arithmetic tile converted operands dividend and divisor.

Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.

If \(T\) is an integral scalar, the result of each division is

\[\operatorname{trunc}\left(\frac{a}{b}\right)\]

where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
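Standard C++ integer division has used this truncating rule since C++11, so the built-in / operator is a direct scalar model. The sketch below, with the hypothetical name trunc_div, turns the undefined cases into assertions.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Truncating integer division, i.e. trunc(a / b). Since C++11 the built-in
// / operator rounds the quotient toward zero, so it implements trunc directly.
template <typename T>
T trunc_div(T a, T b) {
  static_assert(std::is_integral_v<T>, "integral element types only");
  assert(b != 0);  // undefined in the specification
  if constexpr (std::is_signed_v<T>) {
    // min / -1 is not representable in T, which is also undefined.
    assert(!(a == std::numeric_limits<T>::min() && b == T(-1)));
  }
  return static_cast<T>(a / b);
}
```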

If \(T\) is a basic floating point scalar, the result of each division is

\[\operatorname{subround}(\textbf{div}(\operatorname{subround}(a), \operatorname{subround}(b)))\]

where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{div}\) is determined by the selected rounding mode.

If the selected rounding mode is a precise rounding mode, then \(\textbf{div}\) is the IEEE 754 division operation [1] with that rounding mode.

If the selected rounding mode is round approximate, then \(\textbf{div}(x, y)\) performs a fast approximation of division by multiplying \(x\) by the reciprocal of \(y\).

If the selected rounding mode is round full, then \(\textbf{div}(x, y)\) performs a relatively fast approximation of division that has better accuracy across the full input range than the round approximate division.

Note

Neither the round approximate nor the round full division is IEEE 754 compliant.

For overloads (1) and (2), the default rounding mode and default subnormals rounding mode are used. For overload (3), Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.

The atomic constraint of overload (3) validates that:

\(T\) is a basic floating point scalar.
If Mode is round full or round approximate, then \(T\) is float.
If SubMode is round subnormals to zero, then \(T\) is float.
The values Mode and SubMode are enumerators of their respective types.

Ceiling Integer Division

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> ceildiv(Lhs dividend, Rhs divisor) noexcept;

Performs elementwise integral ceiling division of the arithmetic tile converted operands dividend and divisor.

Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.

The result of each division is

\[\bigg \lceil \frac{a}{b} \bigg \rceil\]

If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
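Because C++ only provides truncating division, a ceiling division has to be derived from it. The following hypothetical ceil_div is one common construction, shown as a scalar sketch of the formula above rather than as the ct implementation; like the specification, it assumes the divisor is nonzero and the quotient is representable.

```cpp
#include <cassert>
#include <type_traits>

// Ceiling division ceil(a / b) built from C++'s truncating / and %.
// Truncation rounds toward zero, so the quotient is bumped up by one exactly
// when the division is inexact and the true quotient is positive, which is
// when the remainder and the divisor have the same sign.
template <typename T>
T ceil_div(T a, T b) {
  static_assert(std::is_integral_v<T>, "integral element types only");
  assert(b != 0);  // undefined in the specification
  T q = static_cast<T>(a / b);
  T r = static_cast<T>(a % b);
  if (r != 0 && ((r > 0) == (b > 0))) {
    ++q;
  }
  return q;
}
```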

Floor Integer Division

template<ct::integral_tile Dividend, ct::integral_tile Divisor> requires ct::arithmetic_tile_convertible<Dividend, Divisor> __tile__ ct::arithmetic_tile_conversion_t<Dividend, Divisor> floordiv(Dividend dividend, Divisor divisor) noexcept;

Performs elementwise integral floor division of the arithmetic tile converted operands dividend and divisor.

Let \(a\) and \(b\) denote corresponding elements of the converted operands dividend and divisor respectively and let \(T\) be their type.

The result of each division is

\[\bigg \lfloor \frac{a}{b} \bigg \rfloor\]

If the resulting value is not representable in \(T\) or if \(b\) is zero for any pair of corresponding elements, the behavior is undefined.
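Floor division can be derived from truncating division in the same way, adjusting downward instead of upward. floor_div is a hypothetical scalar sketch under the same assumptions (nonzero divisor, representable quotient); for unsigned types truncation already equals the floor.

```cpp
#include <cassert>
#include <type_traits>

// Floor division floor(a / b) built from C++'s truncating / and %.
// Truncation rounds a negative, inexact quotient up toward zero, so it is
// moved down by one when the remainder and the divisor differ in sign.
template <typename T>
T floor_div(T a, T b) {
  static_assert(std::is_integral_v<T>, "integral element types only");
  assert(b != 0);  // undefined in the specification
  T q = static_cast<T>(a / b);
  if constexpr (std::is_signed_v<T>) {
    T r = static_cast<T>(a % b);
    if (r != 0 && ((r < 0) != (b < 0))) {
      --q;
    }
  }
  return q;
}
```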

Remainder

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator%(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> remainder(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise remainder on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an integral scalar, each remainder is computed as

\[a - \operatorname{trunc}\left( \frac{a}{b} \right) \cdot b\]

where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. The behavior is undefined if \(b\) is zero or if \(\operatorname{trunc}\left( \frac{a}{b} \right)\) is not representable in \(T\) for any pair of corresponding elements in the converted operands.

If \(T\) is a basic floating point scalar, each remainder is computed as follows:

\[a - \operatorname{trunc}\left(\frac{a}{b}\right) \cdot b\]

where the truncation function \(\operatorname{trunc}(x)\) yields the nearest integer value to \(x\) that does not exceed \(x\) in magnitude. If the resulting value is \(0\), its sign matches the sign of \(a\). If \(b\) is zero, the result is \(\text{NaN}\).

Non-finite values for \(a\) or \(b\) are handled as follows:

If either \(a\) or \(b\) is \(\text{NaN}\), the result is \(\text{NaN}\).
If \(a\) is infinite and \(b\) is finite, the result is \(\text{NaN}\).
If \(a\) is finite and \(b\) is infinite, the result is \(a\).
If \(a\) and \(b\) are both infinite, the behavior is unspecified.

Note

The C++ standard library function std::remainder, when called with integral arguments, converts them to double and computes a floating point remainder, whereas ct::remainder computes an integral remainder in this case.
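Judging from the formulas above, the closest standard C++ scalar analogues are the built-in % for integral operands and std::fmod, rather than std::remainder, for floating point operands: std::remainder rounds the quotient to the nearest integer instead of truncating it. The sketch below, with hypothetical helper names, contrasts them; it does not model the unspecified case where both operands are infinite.

```cpp
#include <cmath>

// Integral case: since / truncates, the built-in % yields a - trunc(a/b)*b,
// so a nonzero remainder takes the sign of the dividend.
constexpr int int_remainder(int a, int b) { return a % b; }

// Floating point case: std::fmod returns a - trunc(a/b)*b computed exactly,
// with a zero result carrying the sign of a; fmod(a, 0) is NaN and
// fmod(a, infinity) is a for finite a.
float float_remainder(float a, float b) { return std::fmod(a, b); }
```

For example, 7 % 2 is 1, while std::remainder(7, 2) converts to double, rounds the quotient 3.5 to 4, and returns -1.0.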

Comparison Operators

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator==(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator!=(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<=(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>(Lhs lhs, Rhs rhs) noexcept;

template<ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs> requires ct::arithmetic_tile_comparable<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>=(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise comparison of the arithmetic comparison converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an integral scalar, the result of the comparison is one of the following mathematical predicates according to the selected operator:

Operator	Result
==	\(a = b\)
!=	\(a \neq b\)
<	\(a < b\)
<=	\(a \leq b\)
>	\(a > b\)
>=	\(a \geq b\)

If \(T\) is a basic floating point scalar, the result of each comparison is one of the following IEEE 754 comparison predicates [4], selected according to the operator:

Operator	Result
==	\(\operatorname{compareQuietEqual}(a, b)\)
!=	\(\operatorname{compareQuietNotEqual}(a, b)\)
<	\(\operatorname{compareSignalingLess}(a, b)\)
<=	\(\operatorname{compareSignalingLessEqual}(a, b)\)
>	\(\operatorname{compareSignalingGreater}(a, b)\)
>=	\(\operatorname{compareSignalingGreaterEqual}(a, b)\)

Note

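For all of the above predicates except \(\operatorname{compareQuietNotEqual}\), the result is false whenever either operand is a NaN, including when a NaN is compared with itself; \(\operatorname{compareQuietNotEqual}\) yields true in that case. For details, refer to the IEEE 754 comparison predicates [4].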

Bitwise And

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator&(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise bitwise AND on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:

\[r_i = \operatorname{AND}(a_i, b_i)\]

where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{AND}(x, y)\) is defined by the following truth table:

\(x\)	\(y\)	\(\operatorname{AND}(x, y)\)
0	0	0
0	1	0
1	0	0
1	1	1

Bitwise Or

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator|(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise bitwise OR on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:

\[r_i = \operatorname{OR}(a_i, b_i)\]

where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{OR}(x, y)\) is defined by the following truth table:

\(x\)	\(y\)	\(\operatorname{OR}(x, y)\)
0	0	0
0	1	1
1	0	1
1	1	1

Bitwise Xor

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> operator^(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise bitwise XOR on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands, and let \(a_i\) and \(b_i\) be the \(i^{th}\) bit of the base two representation of \(a\) and \(b\) respectively. The result of each computation is determined by:

\[r_i = \operatorname{XOR}(a_i, b_i)\]

where \(r_i\) is the \(i^{th}\) bit of the base two representation of the result and \(\operatorname{XOR}(x, y)\) is defined by the following truth table:

\(x\)	\(y\)	\(\operatorname{XOR}(x, y)\)
0	0	0
0	1	1
1	0	1
1	1	0
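The three truth tables can be checked against the built-in C++ operators by applying each table bit by bit. In this sketch, bitwise, and_table, or_table, and xor_table are hypothetical names, and the operands are 32-bit unsigned values so that each bit position is read directly from the base two representation.

```cpp
#include <cstdint>

// Computes r_i = op(a_i, b_i) for every bit position i of two 32-bit
// operands, where op is a per-bit truth table on the bits 0 and 1.
template <typename Op>
std::uint32_t bitwise(std::uint32_t a, std::uint32_t b, Op op) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    const unsigned ai = (a >> i) & 1u;  // i-th bit of a
    const unsigned bi = (b >> i) & 1u;  // i-th bit of b
    r |= static_cast<std::uint32_t>(op(ai, bi) ? 1u : 0u) << i;
  }
  return r;
}

// The AND, OR, and XOR truth tables from the sections above.
constexpr auto and_table = [](unsigned x, unsigned y) { return x == 1 && y == 1; };
constexpr auto or_table = [](unsigned x, unsigned y) { return x == 1 || y == 1; };
constexpr auto xor_table = [](unsigned x, unsigned y) { return x != y; };
```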

Bitwise Complement

template<ct::integral_tile T> requires /* atomic constraint */ __tile__ T operator~(T in) noexcept;

Performs elementwise complement on the operand in.

Let \(a\) be an element of in and let \(a_i\) be the \(i^{th}\) bit of the base two representation of \(a\). The \(i^{th}\) bit \(r_i\) in the base two representation of each result is determined as

\[\begin{split}r_i = \begin{cases} 1 \quad \text{if} \quad a_i = 0 \\ 0 \quad \text{if} \quad a_i = 1 \\ \end{cases}\end{split}\]

The atomic constraint validates that the element type of T is not bool.
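When the same operation is written with host C++ scalars, the usual integral promotions apply before ~, so complementing a std::uint8_t yields an int unless the result is converted back. complement is a hypothetical helper that keeps the element type, matching the T return type in the signature above.

```cpp
#include <type_traits>

// Bitwise complement that keeps the element type. In C++, ~ applied to a
// narrow type such as std::uint8_t first promotes it to int, so the result
// is cast back to T to stay within T's bitwidth.
template <typename T>
T complement(T a) {
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                "the element type must be integral and not bool");
  return static_cast<T>(~a);
}
```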

Left Bitshift

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, ct::tile_element_t<Lhs>> operator<<(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise left bitshift of the mutual broadcast converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively. Let \(T\) be the type of \(a\) and \(N\) its bitwidth.

The result of the operation is the unique integer representable in \(T\) which is congruent to \(a \cdot 2^b\) modulo \(2^N\).

If any element of the converted rhs is negative or greater than or equal to \(N\), the behavior is undefined.

The atomic constraint validates that \(T\) is not bool.
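The modular definition can be sketched on the host by shifting the unsigned bit pattern, which avoids the signed-overflow cases of the built-in << under older C++ standards. shift_left is a hypothetical helper; unlike the tile operator, it takes a single int shift count.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Left shift defined as the value congruent to a * 2^b modulo 2^N.
template <typename T>
T shift_left(T a, int b) {
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                "the element type must be integral and not bool");
  using U = std::make_unsigned_t<T>;
  assert(b >= 0 && b < std::numeric_limits<U>::digits);  // otherwise undefined
  // For narrow U the shift happens in int after promotion; the inner cast
  // back to U discards the bits above N. The outer cast reinterprets the
  // N-bit pattern as T (modular conversion, guaranteed since C++20).
  return static_cast<T>(static_cast<U>(static_cast<U>(a) << b));
}
```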

Right Bitshift

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, ct::tile_element_t<Lhs>> operator>>(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise right bitshift of the mutual broadcast converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively. Let \(T\) be the type of \(a\) and \(N\) its bitwidth.

The result of each operation is

\[\bigg \lfloor \frac{a}{2^b} \bigg \rfloor\]

If any element of the converted rhs is negative or greater than or equal to \(N\), the behavior is undefined.

The atomic constraint validates that \(T\) is not bool.
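For right shifts, the floor in the formula above means that negative values round toward negative infinity, so shifting -7 right by 1 gives -4, not -3. The following hypothetical shift_right is a scalar sketch that relies on the built-in >> for this behavior.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Right shift by b, i.e. floor(a / 2^b). For negative signed a, C++20
// guarantees this floor behavior for >>; earlier standards leave it
// implementation-defined, although mainstream compilers shift arithmetically.
template <typename T>
T shift_right(T a, int b) {
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                "the element type must be integral and not bool");
  assert(b >= 0 && b < std::numeric_limits<std::make_unsigned_t<T>>::digits);
  return static_cast<T>(a >> b);
}
```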

Logical Conjunction

template<ct::bool_tile_convertible Lhs, ct::bool_tile_convertible Rhs> requires ct::broadcast_compatible<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator&&(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise logical conjunction of the mutual broadcast and bool tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands. The result of each operation is determined by the following table:

\(a\)	\(b\)	Result
`false`	`false`	`false`
`false`	`true`	`false`
`true`	`false`	`false`
`true`	`true`	`true`

Logical Disjunction

template<ct::bool_tile_convertible Lhs, ct::bool_tile_convertible Rhs> requires ct::broadcast_compatible<Lhs, Rhs> __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator||(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise logical disjunction of the mutual broadcast and bool tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands. The result of each operation is determined by the following table:

\(a\)	\(b\)	Result
`false`	`false`	`false`
`false`	`true`	`true`
`true`	`false`	`true`
`true`	`true`	`true`

Logical Negation

template<ct::bool_tile_convertible T> __tile__ ct::tile_with_element_t<T, bool> operator!(T x) noexcept;

Performs elementwise logical negation on the bool tile converted operand x.

Let \(a\) be an element of x. The result of each operation is

\[\begin{split}\begin{cases} \text{false} \quad \text{if a is true} \\ \text{true} \quad \text{if a is false} \end{cases}\end{split}\]

Maximum

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> max(Lhs lhs, Rhs rhs) noexcept;

template< ct::nan_propagation_mode NanMode = ct::default_nan_propagation_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> max(Lhs lhs, Rhs rhs, ct::nan_propagation_mode_constant<NanMode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise maximum on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an integral scalar, the result of each computation is

\[\begin{split}\begin{cases} a & \text{if } a > b \\ b & \text{if } a \leq b \end{cases}\end{split}\]

If \(T\) is a basic floating point scalar, the result of each computation is

\[\operatorname{subround}(\textbf{maximum}(a, b))\]

where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{maximum}\) is determined according to the selected NaN propagation mode.

If the selected NaN propagation mode is suppress NaN, then \(\textbf{maximum}\) refers to the IEEE 754 maximumNumber function [3].

If the selected NaN propagation mode is propagate NaN, then \(\textbf{maximum}\) refers to the IEEE 754 maximum function [3].

For overload (1), the default NaN propagation mode and default subnormals rounding mode are selected. For overload (2), the NaN propagation mode and subnormals rounding mode are determined by NanMode and SubMode respectively.

The atomic constraint of overload (2) validates that:

\(T\) is a basic floating point scalar.
If SubMode is round subnormals to zero, then \(T\) is float.
The values NanMode and SubMode are enumerators of their respective types.
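The difference between the two NaN propagation modes can be sketched for scalar floats on the host. The helpers below are hypothetical, ignore the subnormals rounding mode, and follow the IEEE 754-2019 definitions in treating -0.0 as less than +0.0; the minimum and minimumNumber functions are the mirror images.

```cpp
#include <cmath>
#include <limits>

// IEEE 754 maximum (propagate NaN): any NaN operand yields NaN.
float max_propagate_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  if (a == b) {
    return std::signbit(a) ? b : a;  // prefers +0.0 over -0.0
  }
  return a > b ? a : b;
}

// IEEE 754 maximumNumber (suppress NaN): a NaN operand is ignored, so the
// result is NaN only when both operands are NaN.
float max_suppress_nan(float a, float b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return max_propagate_nan(a, b);
}
```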

Minimum

template< ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> min(Lhs lhs, Rhs rhs) noexcept;

template< ct::nan_propagation_mode NanMode = ct::default_nan_propagation_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs > requires ct::arithmetic_tile_convertible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> min(Lhs lhs, Rhs rhs, ct::nan_propagation_mode_constant<NanMode>, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise minimum on the arithmetic tile converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively and let \(T\) be their type.

If \(T\) is an integral scalar, the result of each computation is

\[\begin{split}\begin{cases} b & \text{if } a > b \\ a & \text{if } a \leq b \end{cases}\end{split}\]

If \(T\) is a basic floating point scalar, the result of each computation is

\[\operatorname{subround}(\textbf{minimum}(a, b))\]

where \(\operatorname{subround}\) applies a subnormals rounding mode and the behavior of \(\textbf{minimum}\) is determined according to the selected NaN propagation mode.

If the selected NaN propagation mode is suppress NaN, then \(\textbf{minimum}\) refers to the IEEE 754 minimumNumber function [3].

If the selected NaN propagation mode is propagate NaN, then \(\textbf{minimum}\) refers to the IEEE 754 minimum function [3].

For overload (1), the default NaN propagation mode and default subnormals rounding mode are selected. For overload (2), the NaN propagation mode and subnormals rounding mode are determined by NanMode and SubMode respectively.

The atomic constraint of overload (2) validates that:

\(T\) is a basic floating point scalar.
If SubMode is round subnormals to zero, then \(T\) is float.
The values NanMode and SubMode are enumerators of their respective types.

Absolute Value

template<ct::arithmetic_tile U> __tile__ U abs(U x) noexcept;

Yields the elementwise absolute value of the operand x.

Let \(a\) be an element of x and \(T\) its type.

If \(T\) is an integral scalar, the result of each computation is the absolute value \(|a|\). If this value is not representable in \(T\), the behavior is undefined.

If \(T\) is a basic floating point scalar, the result of each computation is

\[\textbf{abs}(a)\]

where \(\textbf{abs}\) is the IEEE 754 abs function [2].
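The only integral input without a representable absolute value is the minimum of a signed type, for example -128 for an 8-bit element. checked_abs is a hypothetical scalar sketch that reports that case instead of invoking undefined behavior.

```cpp
#include <limits>
#include <optional>
#include <type_traits>

// |a| for an integral element type. Returns std::nullopt for the one input
// whose absolute value is not representable in a signed T (its minimum),
// which is the case the specification leaves undefined.
template <typename T>
std::optional<T> checked_abs(T a) {
  static_assert(std::is_integral_v<T>, "integral element types only");
  if constexpr (std::is_signed_v<T>) {
    if (a == std::numeric_limits<T>::min()) {
      return std::nullopt;
    }
    return static_cast<T>(a < 0 ? -a : a);
  } else {
    return a;  // unsigned values are already non-negative
  }
}
```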

Fused Multiply Add

template< ct::rounding_mode Mode = ct::default_rounding_mode(), ct::subnormals_rounding_mode SubMode = ct::default_subnormals_rounding_mode(), ct::arithmetic_tile Lhs, ct::arithmetic_tile Rhs, ct::basic_floating_point_tile Acc > requires /* atomic constraint */ __tile__ Acc fma(Lhs lhs, Rhs rhs, Acc acc, ct::rounding_mode_constant<Mode> = {}, ct::subnormals_rounding_mode_constant<SubMode> = {}) noexcept;

Performs elementwise fused multiply-add of the operands lhs, rhs, and acc. The lhs and rhs operands are broadcast converted to the shape of acc and then tile converted to the type Acc.

Let \(a\), \(b\), and \(c\) be corresponding elements of the converted operands lhs, rhs, and acc respectively. The result of each computation is

\[\operatorname{subround}(\textbf{fusedMultiplyAdd}(\operatorname{subround}(a), \operatorname{subround}(b), \operatorname{subround}(c)))\]

where \(\textbf{fusedMultiplyAdd}\) is the IEEE 754 fusedMultiplyAdd operation [1] under the selected rounding mode and \(\operatorname{subround}\) applies the selected subnormals rounding mode. Mode specifies the rounding mode and SubMode specifies the subnormals rounding mode.

The atomic constraint validates that:

lhs and rhs are broadcast convertible to the shape of acc.
The element types of lhs and rhs are non-narrowing scalar convertible to the element type of acc.
Mode is a precise rounding mode.
If SubMode is round subnormals to zero, then the element type of Acc is float.
The values Mode and SubMode are enumerators of their respective types.
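The single rounding of fused multiply-add is observable on the host through std::fma from <cmath>. In this sketch (compare_fma is a hypothetical helper), x = 1 + 2^-12, so the exact product x·x = 1 + 2^-11 + 2^-24 lies exactly halfway between two floats; the separately rounded product drops the 2^-24 term under ties-to-even, while the fused operation keeps it.

```cpp
#include <cmath>

// Fused and unfused evaluations of x * y + z in single precision.
struct FmaComparison {
  float fused;     // one rounding: std::fma
  float separate;  // two roundings: product, then sum
};

FmaComparison compare_fma(float x, float y, float z) {
  // volatile forces the product to be rounded to float and stored, so the
  // compiler cannot contract the multiply and add into an FMA instruction.
  volatile float product = x * y;
  return {std::fma(x, y, z), product + z};
}
```

For example, compare_fma(0x1.001p0f, 0x1.001p0f, -0x1.002p0f) yields fused equal to 0x1p-24f and separate equal to 0.0f.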

Multiply High Bits

template<ct::integral_tile Lhs, ct::integral_tile Rhs> requires ct::arithmetic_tile_convertible<Lhs, Rhs> __tile__ ct::arithmetic_tile_conversion_t<Lhs, Rhs> mulhi(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise multiplication of the arithmetic tile converted operands lhs and rhs and yields only the upper \(N\) bits of the \(2N\)-bit product.

Let \(a\) and \(b\) be corresponding elements of the converted operands, let \(T\) be their type, and let \(N\) be its bitwidth.

If \(T\) is an unsigned integral scalar, the result of each computation is

\[\textbf{mulhiUnsigned}(a, b) = \bigg \lfloor \frac{a \cdot b}{2^N} \bigg \rfloor\]

This corresponds to multiplying \(a\) and \(b\) in an integer type of twice the bitwidth of \(T\), followed by an \(N\)-bit right shift.

If \(T\) is a signed integral scalar, the result of each computation is the unique integer representable in \(T\) that is congruent modulo \(2^N\) to

\[\textbf{mulhiUnsigned}(a \operatorname{mod} 2^N, b \operatorname{mod} 2^N)\]

This corresponds to an unsigned mulhi operation where the operands are reinterpreted as unsigned integers.
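For 32-bit elements, the definitions above can be sketched with a 64-bit product on the host. mulhi_u32 and mulhi_i32 are hypothetical names; the signed version follows the definition above literally by reinterpreting the operands as unsigned, so mulhi_i32(-1, 2) is 1 rather than the -1 that the high half of the signed 64-bit product would give.

```cpp
#include <cstdint>

// Unsigned case: widen to 64 bits, multiply, keep the upper 32 bits.
std::uint32_t mulhi_u32(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// Signed case as defined above: reinterpret the operands as unsigned, take
// the unsigned high half, and reinterpret the 32-bit result as signed.
std::int32_t mulhi_i32(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(
      mulhi_u32(static_cast<std::uint32_t>(a), static_cast<std::uint32_t>(b)));
}
```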

Pointer No-Op

template<ct::pointer_tile Tile> __tile__ Tile operator+(Tile x) noexcept;

Yields the value x without modification.

Pointer Addition

template<ct::integral_tile Offset, ct::pointer_tile Ptr> requires ct::broadcast_compatible<Offset, Ptr> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Offset, Ptr, ct::tile_element_t<Ptr>> operator+(Offset offset, Ptr ptr) noexcept;

template<ct::pointer_tile Ptr, ct::integral_tile Offset> requires ct::broadcast_compatible<Ptr, Offset> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Ptr, Offset, ct::tile_element_t<Ptr>> operator+(Ptr ptr, Offset offset) noexcept;

Performs elementwise pointer addition on the mutual broadcast converted operands ptr and offset.

Let \(p\) and \(x\) denote corresponding elements of the converted operands ptr and offset. The result of each computation is the result of evaluating the builtin C++ pointer addition expression p + x [5].

If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.

The atomic constraint tests that the element type of ptr is not a pointer to (possibly cv-qualified) void.

Note

The builtin C++ pointer addition operation is well defined only if the resulting pointer is valid in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
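The elementwise semantics, including broadcasting a single base pointer against a tile of offsets, can be sketched with plain C++ arrays. This assumes `std::array` as a stand-in for a tile and a scalar pointer as a broadcast operand; `broadcast_ptr_add` is a hypothetical helper, not part of the ct API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not the ct API): the base pointer is broadcast
// to the shape of offsets, so element i of the result is base + offsets[i].
template <typename T, std::size_t N>
std::array<T*, N> broadcast_ptr_add(T* base, const std::array<std::ptrdiff_t, N>& offsets) {
    std::array<T*, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        // Builtin pointer addition: well defined only if the result stays
        // within the array that base points into (or one past its end).
        result[i] = base + offsets[i];
    }
    return result;
}
```

Given `float data[8]` and offsets `{0, 2, 4, 6}`, the result holds pointers to `data[0]`, `data[2]`, `data[4]` and `data[6]`.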

Pointer Subtraction

template<ct::pointer_tile Ptr, ct::integral_tile Offset> requires ct::broadcast_compatible<Ptr, Offset> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Ptr, Offset, ct::tile_element_t<Ptr>> operator-(Ptr ptr, Offset offset) noexcept;

Performs elementwise pointer subtraction on the mutual broadcast converted operands ptr and offset.

Let \(p\) and \(x\) denote corresponding elements of the converted operands ptr and offset. The result of each computation is the result of evaluating the builtin C++ pointer subtraction expression p - x [5].

If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.

The atomic constraint tests that the element type of ptr is not a pointer to (possibly cv-qualified) void.

Note

The builtin C++ pointer subtraction operation is well defined only if the resulting pointer is valid in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
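A scalar sketch of the elementwise subtraction, assuming two equal-length `std::array` values as stand-ins for already broadcast tiles. `ptr_sub` is a hypothetical helper, not the ct API. Stepping back from a one-past-the-end pointer is a typical well-defined case.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not the ct API): element i is ptrs[i] - offsets[i].
template <typename T, std::size_t N>
std::array<T*, N> ptr_sub(const std::array<T*, N>& ptrs, const std::array<std::ptrdiff_t, N>& offsets) {
    std::array<T*, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        // Builtin pointer subtraction: the result must stay within the same
        // array as ptrs[i] (or one past its end).
        result[i] = ptrs[i] - offsets[i];
    }
    return result;
}
```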

Pointer Difference

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, ptrdiff_t> operator-(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise pointer difference on the mutual broadcast converted operands lhs and rhs.

Let \(a\) and \(b\) denote corresponding elements of the converted operands lhs and rhs respectively. The result of each computation is the result of evaluating the builtin C++ pointer difference expression a - b [5].

If this expression would yield undefined behavior for any pair of corresponding elements, the behavior of the operation as a whole is undefined.

The atomic constraint validates that the element types of lhs and rhs may participate in the builtin pointer difference operation.

Note

The builtin C++ pointer difference operation is well defined only if the arguments are appropriate pointers in the C++ object model. For details see [expr.add] § 7.6.6 of ISO/IEC 14882:2024.
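The per-element computation yields a `ptrdiff_t` counted in elements, not bytes. The sketch below assumes `std::array` as a tile stand-in and, for simplicity, the same pointee type on both sides; `ptr_diff` is a hypothetical helper, not the ct API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not the ct API): element i is lhs[i] - rhs[i].
template <typename T, std::size_t N>
std::array<std::ptrdiff_t, N> ptr_diff(const std::array<T*, N>& lhs, const std::array<T*, N>& rhs) {
    std::array<std::ptrdiff_t, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        // Each pair must point into (or one past) the same array object.
        result[i] = lhs[i] - rhs[i];
    }
    return result;
}
```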

Pointer Comparison

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator==(Lhs lhs, Rhs rhs) noexcept;

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator!=(Lhs lhs, Rhs rhs) noexcept;

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<(Lhs lhs, Rhs rhs) noexcept;

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator<=(Lhs lhs, Rhs rhs) noexcept;

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>(Lhs lhs, Rhs rhs) noexcept;

template<ct::pointer_tile Lhs, ct::pointer_tile Rhs> requires ct::broadcast_compatible<Lhs, Rhs> && /* atomic constraint */ __tile__ ct::mutual_broadcast_t<Lhs, Rhs, bool> operator>=(Lhs lhs, Rhs rhs) noexcept;

Performs elementwise pointer comparison on the mutual broadcast converted operands lhs and rhs.

Let \(a\) and \(b\) be corresponding elements of the converted operands lhs and rhs respectively, and let @ denote the selected overloaded operator. The result of each computation is the result of evaluating the expression a @ b, which is either a builtin relational operator expression [6] or a builtin equality operator expression [7].

The atomic constraint validates that the element types of Lhs and Rhs may participate in the builtin C++ relational and equality operators.

Note

The result of the builtin relational and equality operators may be unspecified if the pointer arguments are not appropriately arranged in the C++ object model. For details, see [expr.rel] § 7.6.9 and [expr.eq] § 7.6.10 of ISO/IEC 14882:2024.
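The elementwise comparison can be sketched by applying one builtin operator to each pair and collecting a tile of `bool`. This assumes `std::array` as a tile stand-in; `ptr_compare` is hypothetical. The usage passes lambdas wrapping the builtin `<` and `==` rather than `std::less`, because `std::less` imposes a total order even on unrelated pointers and therefore is not the builtin operator described above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model (not the ct API): result[i] = cmp(lhs[i], rhs[i]),
// where cmp wraps one builtin pointer comparison operator.
template <typename T, std::size_t N, typename Cmp>
std::array<bool, N> ptr_compare(const std::array<T*, N>& lhs, const std::array<T*, N>& rhs, Cmp cmp) {
    std::array<bool, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = cmp(lhs[i], rhs[i]);
    }
    return result;
}
```

For example, `ptr_compare(a, b, [](int* p, int* q) { return p < q; })` models `a < b` for pointers that all point into the same array.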

Null Pointer Equality

template<ct::pointer_tile Ptrs> __tile__ ct::tile_with_element_t<Ptrs, bool> operator==(Ptrs ptrs, nullptr_t) noexcept;

template<ct::pointer_tile Ptrs> __tile__ ct::tile_with_element_t<Ptrs, bool> operator==(nullptr_t, Ptrs ptrs) noexcept;

Performs elementwise equality comparison against a null pointer value. Let \(a\) be an element of operand ptrs.

The result of each computation is true if \(a\) is a null pointer value and false otherwise.

Null Pointer Inequality

template<ct::pointer_tile Ptrs> __tile__ ct::tile_with_element_t<Ptrs, bool> operator!=(Ptrs ptrs, nullptr_t) noexcept;

template<ct::pointer_tile Ptrs> __tile__ ct::tile_with_element_t<Ptrs, bool> operator!=(nullptr_t, Ptrs ptrs) noexcept;

Performs elementwise inequality comparison against a null pointer value. Let \(a\) be an element of operand ptrs.

The result of each computation is false if \(a\) is a null pointer value and true otherwise.
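Both null pointer comparisons produce a tile of `bool` with the same shape as ptrs, and the two results are complements of each other. The sketch below models this with `std::array` standing in for a tile; `is_null` and `is_not_null` are hypothetical helpers, not the ct API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of ptrs == nullptr: true where an element is null.
template <typename T, std::size_t N>
std::array<bool, N> is_null(const std::array<T*, N>& ptrs) {
    std::array<bool, N> mask{};
    for (std::size_t i = 0; i < N; ++i) {
        mask[i] = (ptrs[i] == nullptr);
    }
    return mask;
}

// Hypothetical scalar model of ptrs != nullptr: the complement of is_null.
template <typename T, std::size_t N>
std::array<bool, N> is_not_null(const std::array<T*, N>& ptrs) {
    std::array<bool, N> mask{};
    for (std::size_t i = 0; i < N; ++i) {
        mask[i] = (ptrs[i] != nullptr);
    }
    return mask;
}
```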

Footnotes

1: See Arithmetic operations § 5.4.1 of IEEE 754-2019
2: See Sign bit operations § 5.5.1 of IEEE 754-2019
3: See Minimum and maximum operations § 9.6 of IEEE 754-2019
4: See Details of comparison predicates § 5.11 of IEEE 754-2019
5: See Additive operators [expr.add] § 7.6.6 of ISO/IEC 14882:2024
6: See Relational operators [expr.rel] § 7.6.9 of ISO/IEC 14882:2024
7: See Equality operators [expr.eq] § 7.6.10 of ISO/IEC 14882:2024