Assumptions

An assumption introduces constraints on program values that the compiler may use for optimization purposes. If a value violates an assumption, the behavior is undefined. Assumptions are not checked at runtime.

The assumption functions return the same value as their argument. For the compiler to effectively leverage an assumption, the return value of the assumption API should be used in subsequent operations.

Example

The following code tells the compiler to assume that the value of y is greater than or equal to \(0\). This allows the compiler to execute the first printf without performing any runtime check on the value of y.

The second if condition checks the value of x. Since x was not derived from the return value of an assumption, the compiler might not eliminate the runtime check of x.

__tile__ void foo(int x) {
  namespace ct = ::cuda::tiles;
  using namespace ct::literals;

  int y = ct::assume_bounded_below(x, 0_ic);

  if (y >= 0) { // conditional check eliminated by compiler
    printf("y is non-negative!\n");
  }

  if (x >= 0) { // conditional checked at runtime
    printf("x is non-negative!\n");
  }
}

Warning

The only effect of an assumption is to introduce undefined behavior. If the assumption is violated at runtime, the program’s behavior is undefined.

cuda::tiles::assume_blocked

template<ct::shape_like S, ct::tile_like T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_blocked(T a, S = {}) noexcept;

Indicates that tile like argument a can be represented by partitions of shape \(S\) where all elements within a given partition have the same value.

The atomic constraint validates that:

  1. \(S\) has no zero length extents.

  2. The rank of \(T\) matches the rank of \(S\).

  3. \(T\) is a pointer tile or integral tile.

Let \(N\) denote the rank of \(T\). A partition \(P_J\) at indices \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as the set of values

\[P_J = \left \{ a(i_0, i_1, \ldots , i_{N-1}) \quad | \quad j_k \cdot S_k \leq i_k < \operatorname{min}((j_k + 1) \cdot S_k, T_k) \quad 0 \leq k < N \right \}\]

This operation introduces undefined behavior if the set \(P_J\) has more than one distinct element for any combination of non-negative indices \(J = (j_0, j_1, \ldots, j_{N - 1})\).

Example

The following code specifies partitions of shape \(3 \times 2\) must all have the same elements.

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_blocked(x, ct::shape{3_ic, 2_ic});

The following \(8 \times 8\) tile satisfies this assumption:

\[\begin{split}\begin{pmatrix} 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 42 & 42 & 5 & 5 & 1 & 1 & -2 & -2 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 3 & 3 & 2 & 2 & 4 & 4 & -5 & -5 \\ 7 & 7 & 8 & 8 & 3 & 3 & -6 & -6 \\ 7 & 7 & 8 & 8 & 3 & 3 & -6 & -6 \\ \end{pmatrix}\end{split}\]

cuda::tiles::assume_bounded

template<ct::integral auto LB, ct::integral auto UB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded(T a, ct::integral_constant<LB> = {}, ct::integral_constant<UB> = {}) noexcept;

Indicates that the elements of a inhabit the inclusive range \([ LB, UB ]\).

Let \(E\) be the element type of \(T\), let \(L\) be the least value representable in \(E\), and let \(U\) be the greatest value representable in make-signed-t<E>. The atomic constraint validates that:

  1. \(E\) is not bool

  2. \(L \leq LB \leq UB \leq U\)

This operation introduces undefined behavior if any element \(e\) of a satisfies \(e < LB\) or \(e > UB\).

Example

The following example assumes that every element of x is bounded between \(-10\) and \(100\) inclusive.

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_bounded(x, -10_ic, 100_ic);

cuda::tiles::assume_bounded_above

template<ct::integral auto UB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded_above(T a, ct::integral_constant<UB> = {}) noexcept;

Indicates that the elements of a are less than or equal to the upper bound \(UB\).

Let \(E\) be the element type of a, let \(L\) be the least value representable in E and let \(U\) be the greatest value representable by make-signed-t<E>. The atomic constraint validates that:

  1. \(E\) is not bool

  2. \(L \leq UB \leq U\)

This operation introduces undefined behavior if any element \(e\) of a satisfies \(e > UB\).

Example

The following example assumes that every element of x is less than or equal to \(100\).

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_bounded_above(x, 100_ic);

cuda::tiles::assume_bounded_below

template<ct::integral auto LB, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_bounded_below(T a, ct::integral_constant<LB> = {}) noexcept;

Indicates that the elements of a are greater than or equal to the lower bound \(LB\).

Let \(E\) be the element type of a, let \(L\) be the least value representable by \(E\) and \(U\) be the greatest value representable by \(E\). The atomic constraint validates that:

  1. \(E\) is not bool

  2. \(E\) is not an unsigned integral type

  3. \(L \leq LB \leq U\)

This operation introduces undefined behavior if any element \(e\) of a satisfies \(e < LB\).

Example

The following example assumes that every element of x is greater than or equal to \(-10\).

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_bounded_below(x, -10_ic);

cuda::tiles::assume_divisible

template<ct::integral auto Div, ct::integral_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_divisible(T a, ct::integral_constant<Div> = {}) noexcept;

Indicates that the elements of a are divisible by \(\text{Div}\).

The atomic constraint validates that \(\text{Div}\) is a power of two.

This operation introduces undefined behavior if any element of a is not divisible by \(\text{Div}\).

Example

The following example assumes that every element of x is divisible by \(16\).

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_divisible(x, 16_ic);

cuda::tiles::assume_divisible_strided

template<
ct::integral auto Div,
ct::integral auto Stride,
ct::integral auto D,
ct::integral_tile T
>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_divisible_strided(T a, ct::integral_constant<Div> = {}, ct::integral_constant<Stride> = {}, ct::integral_constant<D> = {}) noexcept;

Indicates that every \(\text{Stride}\) elements along \(D\) is divisible by \(\text{Div}\). Between the divisible elements, the values increase sequentially along \(D\).

The atomic constraint validates that:

  1. \(\text{Div}\) is a power of two

  2. \(\text{Stride} > 0\)

  3. \(0 \leq D < N\) where \(N\) is the rank of T

  4. The element type of \(T\) is signed

A strided partition at index \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as a sequence

\[P_k = a(j_0, ..., j_D * \text{Stride} + k, \ldots , j_{N-1}) \quad 0 \leq k < \operatorname{min}(\text{Stride}, T_D - j_D \cdot \text{Stride} )\]

This operation introduces undefined behavior if any such partition is not of the format

\[(n, n + 1, n + 2, \ldots, n + m)\]

where \(n\) is divisible by \(\text{Div}\).

Example

The following example assumes that the start of each partition is divisible by \(16\). The partitions have stride \(3\) along dimension \(1\).

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_divisible_strided(x, 16_ic, 3_ic, 1_ic);

The following matrix satisfies these assumptions:

\[\begin{split}\begin{pmatrix} 32 & 33 & 34 & 0 & 1 & 2 & 16 & 17 \\ 64 & 65 & 66 & -16 & -15 & -14 & 0 & 1 \\ \end{pmatrix}\end{split}\]

cuda::tiles::assume_aligned

template<ct::integral auto Align, ct::pointer_tile T>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_aligned(T a, ct::integral_constant<Align> = {}) noexcept;

Indicates that the elements of a have alignment \(\text{Align}\).

The atomic constraint validates that \(\text{Align}\) is a power of two.

This operation introduces undefined behavior if any element of a is not aligned to \(\text{Align}\).

Example

The following example assumes that every element of x is \(16\) byte aligned.

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_aligned(x, 16_ic);

cuda::tiles::assume_aligned_strided

template<
ct::integral auto Align,
ct::integral auto Stride,
ct::integral auto D,
ct::pointer_tile T
>
requires /* atomic constraint */
[[nodiscard]] __tile__ T assume_aligned_strided(T a, ct::integral_constant<Align> = {}, ct::integral_constant<Stride> = {}, ct::integral_constant<D> = {}) noexcept;

Indicates that every \(\text{Stride}\) elements along \(D\) is a pointer aligned to \(\text{Align}\). Between the aligned elements, the pointer values increase sequentially according to the object size of the pointee type along \(D\).

The atomic constraint validates that:

  1. \(\text{Div}\) is a power of two

  2. \(\text{Stride} > 0\)

  3. \(0 \leq D < N\) where \(N\) is the rank of T

A strided partition at index \(J = (j_0, j_1, \ldots, j_{N-1})\) is defined as a sequence

\[P_k = a(j_0, ..., j_D * \text{Stride} + k, \ldots , j_{N-1}) \quad 0 \leq k < \operatorname{min}(\text{Stride}, T_D - j_D \cdot \text{Stride} )\]

This operation introduces undefined behavior if any such partition is not of the format

\[(p, p + 1, p + 2, \ldots, p + m)\]

where \(p\) is a pointer value aligned to \(\text{Align}\).

Example

The following example assumes that the partitions of x begin with pointers aligned to \(8\). The partitions have a stride of \(3\) along dimension \(1\).

namespace ct = ::cuda::tiles;
using namespace ct::literals;

auto y = ct::assume_aligned_strided(x, 8_ic, 3_ic, 1_ic);

The following matrix satisfies the assumption for a divisibility of 8, a stride of 3, and a dimension of 1 given some pointer \(p\) aligned to \(8\).

\[\begin{split}\begin{pmatrix} p & p + 1 & p + 2 & p + 8 & p + 9 & p + 10 & p + 16 & p + 17 \\ p + 64 & p + 65 & p +66 & p - 16 & p - 15 & p - 14 & p & p + 1 \\ \end{pmatrix}\end{split}\]