Assumptions
An assumption introduces constraints on program values that the compiler may use for optimization purposes. If a value violates an assumption, the behavior is undefined. Assumptions are not checked at runtime.
The assumption functions return the same value as their argument. For the compiler to effectively leverage an assumption, the return value of the assumption API should be used in subsequent operations.
Example
The following code tells the compiler to assume that the value of y is greater than
or equal to \(0\). This allows the compiler to execute the first printf without
performing any runtime check on the value of y.
The second if condition checks the value of x. Since x was not
derived from the return value of an assumption, the compiler might not eliminate the runtime
check of x.
__tile__ void foo(int x) {
namespace ct = ::cuda::tiles;
using namespace ct::literals;
int y = ct::assume_bounded_below(x, 0_ic);
if (y >= 0) { // conditional check eliminated by compiler
printf("y is non-negative!\n");
}
if (x >= 0) { // conditional checked at runtime
printf("x is non-negative!\n");
}
}
Warning
The only effect of an assumption is to introduce undefined behavior. If the assumption is violated at runtime, the program’s behavior is undefined.
cuda::tiles::assume_blocked
-
template<ct::shape_like S, ct::tile_like T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_blocked(T a, S = {}) noexcept;
-
Indicates that tile like argument
acan be represented by partitions of shape \(S\) where all elements within a given partition have the same value.The atomic constraint validates that:
\(S\) has no zero length extents.
\(T\) is a pointer tile or integral tile.
Let \(N\) denote the rank of \(T\). A partition \(P_J\) at indices \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as the set of values
\[P_J = \left \{ a(i_0, i_1, \ldots , i_{N-1}) \quad | \quad j_k \cdot S_k \leq i_k < \operatorname{min}((j_k + 1) \cdot S_k, T_k) \quad 0 \leq k < N \right \}\]This operation introduces undefined behavior if the set \(P_J\) has more than one distinct element for any combination of non-negative indices \(J = (j_0, j_1, \ldots, j_{N - 1})\).
Example
The following code specifies partitions of shape \(3 \times 2\) must all have the same elements.
namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_blocked(x, ct::shape{3_ic, 2_ic});
The following \(8 \times 8\) tile satisfies this assumption:
\[\begin{split}\begin{pmatrix} 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 7 & 7 & 8 & 8 & 3 & 3 & -6 & -6 \\ 7 & 7 & 8 & 8 & 3 & 3 & -6 & -6 \\ \end{pmatrix}\end{split}\]
cuda::tiles::assume_bounded
-
template<ct::integral auto LB, ct::integral auto UB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded(T a, ct::integral_constant<LB> = {}, ct::integral_constant<UB> = {}) noexcept;
-
Indicates that the elements of
ainhabit the inclusive range \([ LB, UB ]\).Let \(E\) be the element type of \(T\), let \(L\) be the least value representable in \(E\), and let \(U\) be the greatest value representable in make-signed-t<E>. The atomic constraint validates that:
\(E\) is not
bool\(L \leq LB \leq UB \leq U\)
This operation introduces undefined behavior if any element \(e\) of
asatisfies \(e < LB\) or \(e > UB\).Example
The following example assumes that every element of
xis bounded between \(-10\) and \(100\) inclusive.namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_bounded(x, -10_ic, 100_ic);
cuda::tiles::assume_bounded_above
-
template<ct::integral auto UB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded_above(T a, ct::integral_constant<UB> = {}) noexcept;
-
Indicates that the elements of
aare less than or equal to the upper bound \(UB\).Let \(E\) be the element type of
a, let \(L\) be the least value representable in E and let \(U\) be the greatest value representable by make-signed-t<E>. The atomic constraint validates that:\(E\) is not
bool\(L \leq UB \leq U\)
This operation introduces undefined behavior if any element \(e\) of
asatisfies \(e > UB\).Example
The following example assumes that every element of
xis less than or equal to \(100\).namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_bounded_above(x, 100_ic);
cuda::tiles::assume_bounded_below
-
template<ct::integral auto LB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded_below(T a, ct::integral_constant<LB> = {}) noexcept;
-
Indicates that the elements of
aare greater than or equal to the lower bound \(LB\).Let \(E\) be the element type of
a, let \(L\) be the least value representable by \(E\) and \(U\) be the greatest value representable by \(E\). The atomic constraint validates that:\(E\) is not
bool\(E\) is not an unsigned integral type
\(L \leq LB \leq U\)
This operation introduces undefined behavior if any element \(e\) of
asatisfies \(e < LB\).Example
The following example assumes that every element of
xis greater than or equal to \(-10\).namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_bounded_below(x, -10_ic);
cuda::tiles::assume_divisible
-
template<ct::integral auto Div, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_divisible(T a, ct::integral_constant<Div> = {}) noexcept;
-
Indicates that the elements of
aare divisible by \(\text{Div}\).The atomic constraint validates that \(\text{Div}\) is a power of two.
This operation introduces undefined behavior if any element of
ais not divisible by \(\text{Div}\).Example
The following example assumes that every element of
xis divisible by \(16\).namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_divisible(x, 16_ic);
cuda::tiles::assume_divisible_strided
-
template<
ct::integral auto Div,
ct::integral auto Stride,
ct::integral auto D,
ct::integral_tile T
>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_divisible_strided(T a, ct::integral_constant<Div> = {}, ct::integral_constant<Stride> = {}, ct::integral_constant<D> = {}) noexcept;
-
Indicates that every \(\text{Stride}\) elements along \(D\) is divisible by \(\text{Div}\). Between the divisible elements, the values increase sequentially along \(D\).
The atomic constraint validates that:
\(\text{Div}\) is a power of two
\(\text{Stride} > 0\)
The element type of \(T\) is signed
A strided partition at index \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as a sequence
\[P_k = a(j_0, ..., j_D * \text{Stride} + k, \ldots , j_{N-1}) \quad 0 \leq k < \operatorname{min}(\text{Stride}, T_D - j_D \cdot \text{Stride} )\]This operation introduces undefined behavior if any such partition is not of the format
\[(n, n + 1, n + 2, \ldots, n + m)\]where \(n\) is divisible by \(\text{Div}\).
Example
The following example assumes that the start of each partition is divisible by \(16\). The partitions have stride \(3\) along dimension \(1\).
namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_divisible_strided(x, 16_ic, 3_ic, 1_ic);
The following matrix satisfies these assumptions:
\[\begin{split}\begin{pmatrix} 32 & 33 & 34 & 0 & 1 & 2 & 16 & 17 \\ 64 & 65 & 66 & -16 & -15 & -14 & 0 & 1 \\ \end{pmatrix}\end{split}\]
cuda::tiles::assume_aligned
-
template<ct::integral auto Align, ct::pointer_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_aligned(T a, ct::integral_constant<Align> = {}) noexcept;
-
Indicates that the elements of
ahave alignment \(\text{Align}\).The atomic constraint validates that \(\text{Align}\) is a power of two.
This operation introduces undefined behavior if any element of
ais not aligned to \(\text{Align}\).Example
The following example assumes that every element of
xis \(16\) byte aligned.namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_aligned(x, 16_ic);
cuda::tiles::assume_aligned_strided
-
template<
ct::integral auto Align,
ct::integral auto Stride,
ct::integral auto D,
ct::pointer_tile T
>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_aligned_strided(T a, ct::integral_constant<Align> = {}, ct::integral_constant<Stride> = {}, ct::integral_constant<D> = {}) noexcept;
-
Indicates that every \(\text{Stride}\) elements along \(D\) is a pointer aligned to \(\text{Align}\). Between the aligned elements, the pointer values increase sequentially according to the object size of the pointee type along \(D\).
The atomic constraint validates that:
\(\text{Div}\) is a power of two
\(\text{Stride} > 0\)
A strided partition at index \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as a sequence
\[P_k = a(j_0, ..., j_D * \text{Stride} + k, \ldots , j_{N-1}) \quad 0 \leq k < \operatorname{min}(\text{Stride}, T_D - j_D \cdot \text{Stride} )\]This operation introduces undefined behavior if any such partition is not of the format
\[(p, p + 1, p + 2, \ldots, p + m)\]where \(p\) is a pointer value aligned to \(\text{Align}\).
Example
The following example assumes that the partitions of
xbegin with pointers aligned to \(8\). The partitions have a stride of \(3\) along dimension \(1\).namespace ct = ::cuda::tiles; using namespace ct::literals; auto y = ct::assume_aligned_strided(x, 8_ic, 3_ic, 1_ic);
The following matrix satisfies the assumption for a divisibility of 8, a stride of 3, and a dimension of 1 given some pointer \(p\) aligned to \(8\).
\[\begin{split}\begin{pmatrix} p & p + 1 & p + 2 & p + 8 & p + 9 & p + 10 & p + 16 & p + 17 \\ p + 64 & p + 65 & p +66 & p - 16 & p - 15 & p - 14 & p & p + 1 \\ \end{pmatrix}\end{split}\]