Tile Operations
This section documents various operations for manipulating tile like objects.
cuda::tiles::full
-
template<ct::tile_like T>
__tile__ T full(ct::tile_element_t<T> x) noexcept;
-
Yields a tile like object of type \(T\) whose elements all have value x.
Example
The code ct::full<ct::tile<int, ct::shape<2, 2>>>(42) produces the following matrix:
\[\begin{split}\begin{pmatrix} 42 & 42 \\ 42 & 42 \end{pmatrix}\end{split}\]
cuda::tiles::iota
-
template<ct::integral_tile T>
requires /* atomic constraint */
__tile__ T iota() noexcept;
-
Yields an integral tile of type \(T\) whose row major arrangement is the sequence \((0, 1, ..., N - 1)\) where \(N\) is the tile size of \(T\).
The atomic constraint validates that \(N - 1\) is representable in the element type of \(T\).
Example
The code ct::iota<ct::tile<int, ct::shape<2, 4>>>() produces the following matrix:
\[\begin{split}\begin{pmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix}\end{split}\]
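The row major arrangement described above can be modelled on the host in standard C++. The sketch below does not use cuda::tiles: the `matrix` alias and the `iota_row_major` helper are hypothetical stand-ins for a rank 2 tile, and the static_assert only approximates the documented representability constraint.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical host-side stand-in for a Rows x Cols tile; not the cuda::tiles API.
template <typename E, std::size_t Rows, std::size_t Cols>
using matrix = std::array<std::array<E, Cols>, Rows>;

// Fills the matrix so that reading it row by row yields 0, 1, ..., Rows * Cols - 1.
template <typename E, std::size_t Rows, std::size_t Cols>
matrix<E, Rows, Cols> iota_row_major() {
    // Approximates the documented constraint: N - 1 must be representable in E.
    static_assert(Rows * Cols >= 1, "the model needs at least one element");
    static_assert(Rows * Cols - 1 <=
                      static_cast<unsigned long long>(std::numeric_limits<E>::max()),
                  "N - 1 is not representable in the element type");
    matrix<E, Rows, Cols> m{};
    for (std::size_t r = 0; r < Rows; ++r)
        for (std::size_t c = 0; c < Cols; ++c)
            m[r][c] = static_cast<E>(r * Cols + c);  // linear row major offset
    return m;
}

// Self-check mirroring the 2 x 4 example above.
inline void iota_example() {
    const matrix<int, 2, 4> m = iota_row_major<int, 2, 4>();
    assert(m[0][0] == 0 && m[0][3] == 3);
    assert(m[1][0] == 4 && m[1][3] == 7);
}
```

In this model the element at row r and column c is simply its row major offset r * Cols + c.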
cuda::tiles::ones
-
template<ct::numeric_tile T>
__tile__ T ones() noexcept;
-
Yields a numeric tile object of type \(T\) whose elements all have value \(1\), or true in the case of bool elements.
Example
The code ct::ones<ct::tile<int, ct::shape<2, 2>>>() produces the following matrix:
\[\begin{split}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\end{split}\]
cuda::tiles::zeros
-
template<ct::numeric_tile T>
__tile__ T zeros() noexcept;
-
Yields a numeric tile object of type \(T\) whose elements all have value \(0\), or false in the case of bool elements.
When \(T\) is a basic floating point tile, each element of the result shall be positive zero (sign bit clear).
Example
The code ct::zeros<ct::tile<double, ct::shape<2, 2>>>() produces the following matrix:
\[\begin{split}\begin{pmatrix} 0.0 & 0.0 \\ 0.0 & 0.0 \end{pmatrix}\end{split}\]
cuda::tiles::isinf
-
template<ct::basic_floating_point_tile T>
__tile__ ct::tile_with_element_t<T, bool> isinf(T x);
-
Performs an elementwise check for infinities in x.
For each element \(a\) of x, the result is true if \(a\) is positive or negative infinity and false otherwise.
Example
The following code checks for infinite values in x:
    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    float xData[4] = {
        1.0f / 0.0f,   // positive inf
        -1.0f / 0.0f,  // negative inf
        0.0f / 0.0f,   // NaN
        1.0f,          // finite
    };
    auto x = ct::load(&xData[0] + ct::iota<i32x4>());
    auto r = ct::isinf(x);
\[\begin{split}\begin{pmatrix} \infty \\ -\infty \\ \text{NaN} \\ 1.0 \end{pmatrix} \rightarrow \begin{pmatrix} \text{true} \\ \text{true} \\ \text{false} \\ \text{false} \end{pmatrix}\end{split}\]
cuda::tiles::isnan
-
template<ct::basic_floating_point_tile T>
__tile__ ct::tile_with_element_t<T, bool> isnan(T x);
-
Performs an elementwise check for NaN values in x.
For each element \(a\) of x, the result is true if \(a\) is not a number and false otherwise.
Example
The following code checks for NaN values in x:
    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    float xData[4] = {
        1.0f / 0.0f,   // positive inf
        -1.0f / 0.0f,  // negative inf
        0.0f / 0.0f,   // NaN
        1.0f,          // finite
    };
    auto x = ct::load(&xData[0] + ct::iota<i32x4>());
    auto r = ct::isnan(x);
\[\begin{split}\begin{pmatrix} \infty \\ -\infty \\ \text{NaN} \\ 1.0 \end{pmatrix} \rightarrow \begin{pmatrix} \text{false} \\ \text{false} \\ \text{true} \\ \text{false} \end{pmatrix}\end{split}\]
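The elementwise classification performed by isinf and isnan can be compared against a host-side model. The sketch below assumes that each element is classified the way std::isinf and std::isnan classify a float; `isinf_each` and `isnan_each` are hypothetical helpers over std::array, not part of cuda::tiles.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical model: classify every element independently with std::isinf.
template <std::size_t N>
std::array<bool, N> isinf_each(const std::array<float, N>& x) {
    std::array<bool, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = std::isinf(x[i]);
    return r;
}

// Hypothetical model: classify every element independently with std::isnan.
template <std::size_t N>
std::array<bool, N> isnan_each(const std::array<float, N>& x) {
    std::array<bool, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = std::isnan(x[i]);
    return r;
}

// Self-check using the same four inputs as the examples above.
inline void classify_example() {
    const float inf = std::numeric_limits<float>::infinity();
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const std::array<float, 4> x{inf, -inf, nan, 1.0f};
    assert((isinf_each(x) == std::array<bool, 4>{true, true, false, false}));
    assert((isnan_each(x) == std::array<bool, 4>{false, false, true, false}));
}
```

The two predicates are mutually exclusive: a NaN is never reported as infinite, and an infinity is never reported as NaN.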
cuda::tiles::reshape
-
template<ct::tile_shape Shape, ct::tile_like Tile>
requires (ct::tile_size_v<Tile> == ct::shape_size_v<Shape>)
__tile__ ct::tile<ct::tile_element_t<Tile>, remove-cv-t<Shape>> reshape(Tile x, Shape = {}) noexcept;
-
Reshapes argument x to match the shape Shape.
The result is a ct::tile object of shape Shape whose row major arrangement of elements matches the row major arrangement of x.
Example
The following example reshapes a \(2 \times 4\) matrix into a \(4 \times 2\) matrix:
    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::iota<ct::tile<int, ct::shape<2, 4>>>();
    auto y = ct::reshape(x, ct::shape{4_ic, 2_ic});
\[\begin{split}\begin{pmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \end{pmatrix} \rightarrow \begin{pmatrix} 0 & 1 \\ 2 & 3 \\ 4 & 5 \\ 6 & 7 \end{pmatrix}\end{split}\]
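Because a reshape preserves the row major arrangement, it can be understood as copying each element to the position with the same linear offset in the new shape. The following host-side sketch illustrates this for rank 2; `matrix` and `reshape_row_major` are hypothetical names and the code does not use cuda::tiles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for an R x C tile; not the cuda::tiles API.
template <typename E, std::size_t R, std::size_t C>
using matrix = std::array<std::array<E, C>, R>;

// Copies x into an R2 x C2 matrix while keeping the row major order of elements.
template <std::size_t R2, std::size_t C2, typename E, std::size_t R1, std::size_t C1>
matrix<E, R2, C2> reshape_row_major(const matrix<E, R1, C1>& x) {
    static_assert(R1 * C1 == R2 * C2, "reshape must preserve the element count");
    matrix<E, R2, C2> y{};
    for (std::size_t k = 0; k < R1 * C1; ++k)
        y[k / C2][k % C2] = x[k / C1][k % C1];  // same linear offset k on both sides
    return y;
}

// Self-check mirroring the 2 x 4 to 4 x 2 example above.
inline void reshape_example() {
    const matrix<int, 2, 4> x{{{0, 1, 2, 3}, {4, 5, 6, 7}}};
    const matrix<int, 4, 2> y = reshape_row_major<4, 2>(x);
    assert(y[0][0] == 0 && y[0][1] == 1);
    assert(y[1][0] == 2 && y[3][1] == 7);
}
```

The static_assert plays the role of the requires clause in the signature above, which demands equal tile size and shape size.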
cuda::tiles::tile_permutation_t
-
template<ct::tile_like T, typename Map>
requires /* atomic constraint */
using tile_permutation_t = /* see below */;
-
Yields the type formed by permuting the dimensions of \(T\) according to the ct::dimension_map Map.
If \(T\) has rank less than \(2\), the result type is \(T\). Otherwise, the result \(U\) is a specialization of ct::tile satisfying the following:
The element type and rank of \(U\) match those of \(T\).
Let \(P_i\) be Map::mapping(i), denoting the permuted dimension at index \(i\). The length of \(U\) at \(i\) is the length of the permuted dimension of \(T\): \(U_i = T_{P_i}\).
The atomic constraint validates that:
Map is a (possibly cv-qualified) specialization of ct::dimension_map.
The rank of \(T\) is equal to Map::rank().
Examples
Example 1
Tile: int
Map: ct::dimension_map<>
Result: int
Example 2
Tile: ct::tile<int, ct::shape<>>
Map: ct::dimension_map<>
Result: ct::tile<int, ct::shape<>>
Example 3
Tile: ct::tile<int, ct::shape<4, 2, 16, 8>>
Map: ct::dimension_map<2, 1, 3, 0>
Result: ct::tile<int, ct::shape<16, 2, 8, 4>>
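The shape rule \(U_i = T_{P_i}\) can be checked with a small runtime model. In the sketch below, shapes and maps are plain std::array values and `permute_shape` is a hypothetical helper; the real facility computes this at compile time on types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical runtime model of the shape rule U_i = T_{P_i}.
template <std::size_t N>
std::array<std::size_t, N> permute_shape(const std::array<std::size_t, N>& shape,
                                         const std::array<std::size_t, N>& map) {
    std::array<std::size_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        assert(map[i] < N);      // each entry must name a dimension of the input
        out[i] = shape[map[i]];  // U_i = T_{P_i}
    }
    return out;
}

// Self-check mirroring Example 3 above.
inline void permute_shape_example() {
    const std::array<std::size_t, 4> shape{4, 2, 16, 8};
    const std::array<std::size_t, 4> map{2, 1, 3, 0};
    const std::array<std::size_t, 4> expected{16, 2, 8, 4};
    assert(permute_shape(shape, map) == expected);
}
```

Passing the map (1, 0, 2, 3) to the same helper swaps only the first two lengths, which is the shape rule of a transpose.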
cuda::tiles::permute
-
template<typename Map, ct::tile_like Tile>
requires /* atomic constraint */
__tile__ ct::tile_permutation_t<Tile, Map> permute(Tile in, Map = {}) noexcept;
-
Yields the permutation of in according to the zero-based permutation vector described by the provided ct::dimension_map.
The atomic constraint validates that:
Map is a specialization of ct::dimension_map.
The rank of Tile is Map::rank().
Example
The following code permutes a \(4 \times 2 \times 2\) tile to a \(2 \times 4 \times 2\) tile according to the dimension map ct::dimension_map<2, 0, 1>:
    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 2, 2>>>();
    auto r = ct::permute(x, ct::dimension_map{2_ic, 0_ic, 1_ic});
\[\begin{split}\begin{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} & \begin{pmatrix} 2 & 3 \end{pmatrix} \\ \begin{pmatrix} 4 & 5 \end{pmatrix} & \begin{pmatrix} 6 & 7 \end{pmatrix} \\ \begin{pmatrix} 8 & 9 \end{pmatrix} & \begin{pmatrix} 10 & 11 \end{pmatrix} \\ \begin{pmatrix} 12 & 13 \end{pmatrix} & \begin{pmatrix} 14 & 15 \end{pmatrix} \\ \end{pmatrix} \rightarrow \begin{pmatrix} \begin{pmatrix} 0 & 2 \end{pmatrix} & \begin{pmatrix} 4 & 6 \end{pmatrix} & \begin{pmatrix} 8 & 10 \end{pmatrix} & \begin{pmatrix} 12 & 14 \end{pmatrix} \\ \begin{pmatrix} 1 & 3 \end{pmatrix} & \begin{pmatrix} 5 & 7 \end{pmatrix} & \begin{pmatrix} 9 & 11 \end{pmatrix} & \begin{pmatrix} 13 & 15 \end{pmatrix} \\ \end{pmatrix}\end{split}\]
The behavior of the permutation is described precisely below:
- tile projection
-
Let \(a\) be a tile like object of rank \(N\), shape \(S\) and index space \(I = [0, S_0) \times [0, S_1) \times ... \times [0, S_{N - 1})\).
We may interpret \(a\) as a function \(a : I \rightarrow \mathbb{E}\) describing the association of indices to elements \(\mathbb{E}\).
The projection of \(a\) along dimension \(0 \leq d < N\) at index \(i \in [0, S_d)\) is a new tile like object of index space \(I_p = [0, S_0) \times [0, S_1) \times ... \times [0, S_{d - 1}) \times [0, 1) \times [0, S_{d + 1}) \times ... \times [0, S_{N - 1})\):
\[\mathrm{Proj}(a, i, d) : I_p \rightarrow \mathbb{E}\]
whose values correspond to \(a\) when dimension \(d\) is fixed at index \(i\):
\[\mathrm{Proj}(a, i, d)(x_0, x_1, \ldots, x_{d-1}, 0, x_{d+1}, \ldots, x_{N-1}) = a(x_0, x_1, \ldots, x_{d-1}, i, x_{d+1}, \ldots, x_{N-1})\]
- iterated tile projection
-
The iterated projection of \(a\) for indices \((i_0, i_1, ..., i_{N-1}) \in I\) at dimensions \((d_0, d_1, ..., d_{N-1})\) is formed by the repeated application of \(\mathrm{Proj}\):
\[\mathrm{Proj}(... \mathrm{Proj}(\mathrm{Proj}(a, i_0, d_0), i_1, d_1), ..., i_{N-1}, d_{N-1})\]
The iterated projection of \(a\) is a singleton tile whose value is the value of \(a\) at index \(i_0\) along dimension \(d_0\), \(i_1\) along \(d_1\), and so on.
- tile permutation
-
Let \(P : [0, N) \rightarrow [0, N)\) be the bijective function describing a permutation of dimensions.
The permutation \(r\) of \(a\) is the unique tile like object whose value at indices \((i_0, i_1, ..., i_{N-1})\) is the value of the iterated projection of \(a\) for indices \((i_0, i_1, ..., i_{N-1})\) at dimensions \((P(0), P(1), ..., P(N-1))\).
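The definition above amounts to reading the source at the index that places \(i_k\) along dimension \(P(k)\). The sketch below models this for rank 3 on the host; `tile3` and `permute3` are hypothetical names, the element type is fixed to int for brevity, and nothing here uses cuda::tiles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in for a rank 3 tile of int; not the cuda::tiles API.
struct tile3 {
    std::array<std::size_t, 3> shape;
    std::vector<int> data;  // elements in row major order

    std::size_t offset(std::size_t i, std::size_t j, std::size_t k) const {
        return (i * shape[1] + j) * shape[2] + k;
    }
};

// r(i0, i1, i2) reads a at the index that places i_k along dimension p[k].
tile3 permute3(const tile3& a, const std::array<std::size_t, 3>& p) {
    tile3 r{{a.shape[p[0]], a.shape[p[1]], a.shape[p[2]]},
            std::vector<int>(a.data.size())};
    for (std::size_t i0 = 0; i0 < r.shape[0]; ++i0)
        for (std::size_t i1 = 0; i1 < r.shape[1]; ++i1)
            for (std::size_t i2 = 0; i2 < r.shape[2]; ++i2) {
                std::array<std::size_t, 3> src{};
                src[p[0]] = i0;
                src[p[1]] = i1;
                src[p[2]] = i2;
                r.data[r.offset(i0, i1, i2)] = a.data[a.offset(src[0], src[1], src[2])];
            }
    return r;
}

// Self-check mirroring the 4 x 2 x 2 example with the map (2, 0, 1).
inline void permute_example() {
    tile3 x{{4, 2, 2}, std::vector<int>(16)};
    std::iota(x.data.begin(), x.data.end(), 0);
    const tile3 r = permute3(x, {2, 0, 1});
    const std::vector<int> expected{0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15};
    assert((r.shape == std::array<std::size_t, 3>{2, 4, 2}));
    assert(r.data == expected);
}
```

With the map (1, 0, 2) the same helper interchanges the first two dimensions and leaves the last one in place.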
cuda::tiles::tile_transpose_t
-
template<ct::tile_like T>
using tile_transpose_t = /* see below */;
-
Yields the result type when interchanging the first two dimensions of an object of type \(T\).
If \(T\) has rank less than \(2\), the resulting type is \(T\). Otherwise, the resulting type \(U\) is a specialization of ct::tile satisfying the following:
The element type and rank of \(U\) match those of \(T\).
The lengths of \(U\) satisfy:
\(U_0 = T_1\)
\(U_1 = T_0\)
\(U_i = T_i\) for each \(2 \leq i < N\) where \(N\) is the rank of \(T\).
Examples
Example 1
Tile: int
Result: int
Example 2
Tile: ct::tile<int, ct::shape<>>
Result: ct::tile<int, ct::shape<>>
Example 3
Tile: ct::tile<int, ct::shape<4, 2, 16, 8>>
Result: ct::tile<int, ct::shape<2, 4, 16, 8>>
cuda::tiles::transpose
-
template<ct::tile_like Tile>
__tile__ ct::tile_transpose_t<Tile> transpose(Tile in) noexcept;
-
Interchanges the first two dimensions of in.
If the rank of in is less than \(2\), in is returned unmodified. Otherwise, the resulting tile is produced as if by invoking:
    ct::permute(in, ct::dimension_map<1, 0, 2, 3, ..., N-1>{});
where \(N\) is the rank of in.
Example
The following code transposes a \(4 \times 2 \times 2\) tile to a \(2 \times 4 \times 2\) tile:
    namespace ct = ::cuda::tiles;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 2, 2>>>();
    auto r = ct::transpose(x);
\[\begin{split}\begin{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} & \begin{pmatrix} 2 & 3 \end{pmatrix} \\ \begin{pmatrix} 4 & 5 \end{pmatrix} & \begin{pmatrix} 6 & 7 \end{pmatrix} \\ \begin{pmatrix} 8 & 9 \end{pmatrix} & \begin{pmatrix} 10 & 11 \end{pmatrix} \\ \begin{pmatrix} 12 & 13 \end{pmatrix} & \begin{pmatrix} 14 & 15 \end{pmatrix} \\ \end{pmatrix} \rightarrow \begin{pmatrix} \begin{pmatrix} 0 & 1 \end{pmatrix} & \begin{pmatrix} 4 & 5 \end{pmatrix} & \begin{pmatrix} 8 & 9 \end{pmatrix} & \begin{pmatrix} 12 & 13 \end{pmatrix} \\ \begin{pmatrix} 2 & 3 \end{pmatrix} & \begin{pmatrix} 6 & 7 \end{pmatrix} & \begin{pmatrix} 10 & 11 \end{pmatrix} & \begin{pmatrix} 14 & 15 \end{pmatrix} \\ \end{pmatrix}\end{split}\]
cuda::tiles::concatenation_compatible
-
template<typename T, typename U, size_t D>
concept concatenation_compatible = ct::tile_like<T> && ct::tile_like<U> && /* atomic constraint */;
-
Indicates whether tile like types \(T\) and \(U\) are concatenation compatible along dimension \(D\).
\(T\) and \(U\) are concatenation compatible along \(D\) if:
\(T\) and \(U\) have the same rank and the same element type
\(0 \leq D < N\) where \(N\) is the rank of \(T\) and \(U\)
The lengths satisfy \(T_i = U_i\) for each \(0 \leq i < N, \quad i \neq D\).
The shape \(S\) formed by \(S_i = T_i\) for \(i \neq D\) and \(S_D = T_D + U_D\) is a tile compatible shape.
Note
Rank 0 tiles are never concatenation compatible.
cuda::tiles::concatenation_t
-
template<ct::tile_like T, ct::tile_like U, size_t Dim>
requires ct::concatenation_compatible<T, U, Dim>
using concatenation_t = /* see below */;
-
Yields the result type of concatenating tile like types \(T\) and \(U\) along dimension \(D\), where \(D\) is the value of Dim.
The result type is a specialization of ct::tile whose element type and rank match those of \(T\) and \(U\) and whose shape \(S\) satisfies:
\(S_i = T_i\) for \(0 \leq i < N, \quad i \neq D\) where \(N\) is the rank of \(T\) and \(U\).
\(S_D = T_D + U_D\)
Examples
Example 1
\(T\): int
\(U\): int
Dimension: \(0\)
Result: Incompatible
Example 2
\(T\): ct::tile<int, ct::shape<2, 4>>
\(U\): ct::tile<int, ct::shape<2, 4>>
Dimension: \(0\)
Result: ct::tile<int, ct::shape<4, 4>>
Example 3
\(T\): ct::tile<int, ct::shape<2, 4>>
\(U\): ct::tile<int, ct::shape<2, 4>>
Dimension: \(1\)
Result: ct::tile<int, ct::shape<2, 8>>
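The compatibility rules and the resulting shape can be modelled at run time. The sketch below is an assumption-laden simplification: `shape_t`, `concat_compatible` and `concat_shape` are hypothetical names, element types are ignored, and the requirement that the result be a tile compatible shape is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using shape_t = std::vector<std::size_t>;  // hypothetical runtime stand-in for ct::shape

// Checks the rank, dimension and length rules for concatenation along d.
bool concat_compatible(const shape_t& t, const shape_t& u, std::size_t d) {
    if (t.size() != u.size()) return false;  // same rank
    if (d >= t.size()) return false;         // 0 <= D < N, so rank 0 never qualifies
    for (std::size_t i = 0; i < t.size(); ++i)
        if (i != d && t[i] != u[i]) return false;  // lengths agree away from D
    return true;
}

// Computes S with S_i = T_i for i != D and S_D = T_D + U_D.
shape_t concat_shape(const shape_t& t, const shape_t& u, std::size_t d) {
    assert(concat_compatible(t, u, d));
    shape_t s = t;
    s[d] = t[d] + u[d];
    return s;
}

// Self-check mirroring Examples 1 to 3 above.
inline void concat_shape_example() {
    assert(!concat_compatible({}, {}, 0));
    assert((concat_shape({2, 4}, {2, 4}, 0) == shape_t{4, 4}));
    assert((concat_shape({2, 4}, {2, 4}, 1) == shape_t{2, 8}));
}
```

Only the length along the concatenation dimension may differ between the two inputs; every other length has to match exactly.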
cuda::tiles::cat
-
template<ct::integral auto D, ct::tile_like T, ct::tile_like U>
requires (D >= 0) && ct::concatenation_compatible<T, U, D>
__tile__ ct::concatenation_t<T, U, D> cat(T x, U y, ct::integral_constant<D> = {}) noexcept;
-
Concatenates x and y along dimension \(D\).
Let \(N\) be the rank of \(T\) and \(U\). The resulting tile object \(r\) has the values defined below:
\[\begin{split}r(i_0, i_1, ..., i_D, ..., i_{N-1}) = \begin{cases} x(i_0, i_1, ..., i_D, ..., i_{N-1}) & i_D < T_D \\ y(i_0, i_1, ..., i_D - T_D, ..., i_{N-1}) & \text{otherwise} \end{cases}\end{split}\]
Example
The following code concatenates a matrix of zeros with a matrix of ones along the second dimension.
    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::full<ct::tile<int, ct::shape<4, 2>>>(0);
    auto y = ct::full<ct::tile<int, ct::shape<4, 2>>>(1);
    auto r = ct::cat(x, y, 1_ic);
\[\begin{split}\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ \end{pmatrix} \quad \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ \end{pmatrix} \rightarrow \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ \end{pmatrix}\end{split}\]
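The case distinction in the definition of \(r\) can be illustrated for rank 2 and \(D = 1\). The following host-side sketch uses a hypothetical `matrix` alias and a hypothetical `cat_columns` helper; it is not the cuda::tiles implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for an R x C tile; not the cuda::tiles API.
template <typename E, std::size_t R, std::size_t C>
using matrix = std::array<std::array<E, C>, R>;

// Concatenates x and y along dimension 1 (columns).
template <typename E, std::size_t R, std::size_t C1, std::size_t C2>
matrix<E, R, C1 + C2> cat_columns(const matrix<E, R, C1>& x, const matrix<E, R, C2>& y) {
    matrix<E, R, C1 + C2> r{};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C1 + C2; ++j)
            r[i][j] = (j < C1) ? x[i][j]        // i_D < T_D: take from x
                               : y[i][j - C1];  // otherwise: take from y, shifted by T_D
    return r;
}

// Self-check mirroring the zeros and ones example above.
inline void cat_example() {
    matrix<int, 4, 2> zeros{};  // value-initialized, so every element is 0
    matrix<int, 4, 2> ones{};
    for (auto& row : ones) row.fill(1);
    const matrix<int, 4, 4> r = cat_columns(zeros, ones);
    for (const auto& row : r) assert((row == std::array<int, 4>{0, 0, 1, 1}));
}
```

The two inputs may have different column counts in this model, as long as their row counts agree.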
cuda::tiles::select
-
template<ct::tile_like T, ct::bool_tile_convertible C>
requires ct::broadcastable_to<C, ct::tile_shape_t<T>>
__tile__ T select(C condition, T lhs, T rhs) noexcept;
-
Performs an elementwise selection of the values from lhs where condition is true and rhs where condition is false.
condition undergoes bool tile conversion followed by broadcast conversion to match the shape of \(T\).
Let \(a\), \(b\) and \(c\) denote corresponding elements of the converted lhs, rhs and condition arguments. The result of each selection is \(a\) if \(c\) is true and \(b\) otherwise.
Example
In the following example, the even-indexed elements of the result are derived from x while the odd-indexed elements are derived from y:
    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    bool cData[4] = {true, false, true, false};
    auto c = ct::load(&cData[0] + ct::iota<i32x4>());
    auto x = ct::iota<i32x4>();
    auto y = -ct::iota<i32x4>();
    auto r = ct::select(c, x, y);
\[\begin{split}\begin{pmatrix} \text{true} \\ \text{false} \\ \text{true} \\ \text{false} \end{pmatrix} \quad \begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \end{pmatrix} \quad \begin{pmatrix} 0 \\ -1 \\ -2 \\ -3 \end{pmatrix} \rightarrow \begin{pmatrix} 0 \\ -1 \\ 2 \\ -3 \end{pmatrix}\end{split}\]
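Once the condition has been converted to the shape of \(T\), the selection is a plain elementwise choice. The sketch below models only that final step on the host; `select_each` is a hypothetical helper and the bool tile and broadcast conversions are assumed to have happened already.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: picks lhs[i] where c[i] is true and rhs[i] otherwise.
template <typename E, std::size_t N>
std::array<E, N> select_each(const std::array<bool, N>& c,
                             const std::array<E, N>& lhs,
                             const std::array<E, N>& rhs) {
    std::array<E, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = c[i] ? lhs[i] : rhs[i];
    return r;
}

// Self-check mirroring the example above.
inline void select_example() {
    const std::array<bool, 4> c{true, false, true, false};
    const std::array<int, 4> x{0, 1, 2, 3};
    const std::array<int, 4> y{0, -1, -2, -3};
    assert((select_each(c, x, y) == std::array<int, 4>{0, -1, 2, -3}));
}
```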
cuda::tiles::extractable_from
-
template<typename S, typename T>
concept extractable_from = ct::tile_shape<S> && ct::tile_like<T> && /* see below */;
-
Determines if tile compatible shape \(S\) is an extractable shape of \(T\). \(S\) is extractable from \(T\) if:
\(S\) and \(T\) have the same rank denoted \(N\).
For each \(i\) in \(0 \leq i < N\), \(T_i\) is divisible by \(S_i\).
Examples
Example 1
\(T\): int
\(S\): ct::shape<>
Extractable: Yes
Example 2
\(T\): ct::tile<int, ct::shape<32, 8>>
\(S\): ct::shape<16, 2>
Extractable: Yes
Example 3
\(T\): ct::tile<int, ct::shape<32, 8>>
\(S\): ct::shape<2, 16>
Extractable: No
cuda::tiles::extract
-
template<ct::tile_shape S, ct::tile_like T, typename ...Indices>
requires ct::extractable_from<S, T> && /* atomic constraint */
__tile__ ct::tile<ct::tile_element_t<T>, S> extract(T x, S, Indices... idx) noexcept;
-
Extracts a partition of shape \(S\) from tile like object x.
The elements of x are partitioned into equally sized sections of shape \(S\) and each partition is assigned a zero-based index. The idx argument specifies the index of the partition that will be returned.
Let \(N\) be the rank of \(T\) and let \(i_0, i_1, \ldots, i_{N-1}\) be the values of the idx pack after conversion to size_t. The resulting extraction \(r\) has the following value:
\[r(k_0, k_1, \ldots, k_{N-1}) = x(i_0 \cdot S_0 + k_0, i_1 \cdot S_1 + k_1, \ldots, i_{N-1} \cdot S_{N-1} + k_{N-1})\]
If the index \(I = (i_0 \cdot S_0, i_1 \cdot S_1, \ldots, i_{N-1} \cdot S_{N-1})\) is not in the index space of x, the behavior is undefined.
The atomic constraint validates that:
The size of the parameter pack Indices is \(N\).
For each type \(U\) in the parameter pack Indices, is-convertible-v<U, size_t> holds.
Example
In the following code, the partition in the top right quadrant of the matrix is extracted:
    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 4>>>();
    auto r = ct::extract(x, ct::shape{2_ic, 2_ic}, 0, 1);
\[\begin{split}\begin{pmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{pmatrix} \rightarrow \begin{pmatrix} 2 & 3 \\ 6 & 7 \end{pmatrix}\end{split}\]
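The index arithmetic of the extraction can be modelled for rank 2 on the host. In the sketch below, `matrix` and `extract_block` are hypothetical names; the static_assert stands in for the divisibility rule of extractable_from, and the runtime assert turns the documented undefined behavior into a checked precondition for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for an R x C tile; not the cuda::tiles API.
template <typename E, std::size_t R, std::size_t C>
using matrix = std::array<std::array<E, C>, R>;

// Returns partition (i0, i1) of shape SR x SC: r(k0, k1) = x(i0*SR + k0, i1*SC + k1).
template <std::size_t SR, std::size_t SC, typename E, std::size_t R, std::size_t C>
matrix<E, SR, SC> extract_block(const matrix<E, R, C>& x, std::size_t i0, std::size_t i1) {
    static_assert(SR > 0 && SC > 0 && R % SR == 0 && C % SC == 0,
                  "each length of x must be divisible by the block length");
    assert(i0 * SR < R && i1 * SC < C);  // the partition origin must lie inside x
    matrix<E, SR, SC> r{};
    for (std::size_t k0 = 0; k0 < SR; ++k0)
        for (std::size_t k1 = 0; k1 < SC; ++k1)
            r[k0][k1] = x[i0 * SR + k0][i1 * SC + k1];
    return r;
}

// Self-check mirroring the top right quadrant example above.
inline void extract_example() {
    matrix<int, 4, 4> x{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c) x[r][c] = static_cast<int>(r * 4 + c);
    const matrix<int, 2, 2> expected{{{2, 3}, {6, 7}}};
    assert((extract_block<2, 2>(x, 0, 1) == expected));
}
```

The partition indices count blocks, not elements, so (0, 1) names the block whose origin is element (0, 2).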
cuda::tiles::broadcast
-
template<ct::tile_shape S, ct::tile_like T>
requires ct::broadcastable_to<T, S>
__tile__ ct::tile<ct::tile_element_t<T>, S> broadcast(T x, S = {}) noexcept;
-
Yields the broadcast conversion of x to shape \(S\).
Example
The following code broadcasts a \(4 \times 1\) tile to a \(4 \times 4\) tile:
    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 1>>>();
    auto r = ct::broadcast(x, ct::shape{4_ic, 4_ic});
\[\begin{split}\begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \end{pmatrix} \rightarrow \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 \end{pmatrix}\end{split}\]
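The example above is consistent with the common rule that a dimension of length 1 is repeated to fill the target length. The sketch below models that rule for rank 2 only. It is an assumption: this section does not define broadcast conversion, so rank extension and any other rules are not covered, and `matrix` and `broadcast_to` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for an R x C tile; not the cuda::tiles API.
template <typename E, std::size_t R, std::size_t C>
using matrix = std::array<std::array<E, C>, R>;

// Repeats every dimension of length 1 so that the result has shape R2 x C2.
template <std::size_t R2, std::size_t C2, typename E, std::size_t R1, std::size_t C1>
matrix<E, R2, C2> broadcast_to(const matrix<E, R1, C1>& x) {
    static_assert((R1 == R2 || R1 == 1) && (C1 == C2 || C1 == 1),
                  "each source length must equal the target length or be 1");
    matrix<E, R2, C2> r{};
    for (std::size_t i = 0; i < R2; ++i)
        for (std::size_t j = 0; j < C2; ++j)
            r[i][j] = x[R1 == 1 ? 0 : i][C1 == 1 ? 0 : j];  // pin stretched dimensions to 0
    return r;
}

// Self-check mirroring the 4 x 1 to 4 x 4 example above.
inline void broadcast_example() {
    const matrix<int, 4, 1> x{{{0}, {1}, {2}, {3}}};
    const matrix<int, 4, 4> r = broadcast_to<4, 4>(x);
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j) assert(r[i][j] == static_cast<int>(i));
}
```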
cuda::tiles::element_cast
-
template<ct::scalar E, ct::tile_like T>
requires ct::tile_convertible_to<T, ct::tile_with_element_t<T, E>>
__tile__ ct::tile_with_element_t<T, E> element_cast(T x) noexcept;
-
Yields the tile conversion of x to the type ct::tile_with_element_t<T, E>.
Example
The following code converts a tile of integers to a tile of doubles:
    namespace ct = ::cuda::tiles;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 1>>>();
    auto r = ct::element_cast<double>(x);
\[\begin{split}\begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \end{pmatrix} \rightarrow \begin{pmatrix} 0.0 \\ 1.0 \\ 2.0 \\ 3.0 \end{pmatrix}\end{split}\]
cuda::tiles::element_bitcast
-
template<ct::scalar E, ct::tile_like T>
requires /* atomic constraint */
__tile__ ct::tile_with_element_t<T, E> element_bitcast(T x) noexcept;
-
Yields the elementwise bitcast of x to the type \(E\).
For each element \(a\) in x, the result of the computation is produced as if by executing std::bit_cast<E>(a) [1].
The atomic constraint validates that the object size of \(E\) is the same as the object size of the element type of \(T\).
Note
The result of this operation depends on the value and object representations of the source and result types and may result in undefined behavior; see § 22.15.3 of ISO/IEC 14882:2024 for details.
Example
    namespace ct = ::cuda::tiles;
    auto x = ct::full<ct::tile<unsigned char, ct::shape<4, 1>>>(255);
    auto r = ct::element_bitcast<signed char>(x);
\[\begin{split}\begin{pmatrix} 255 \\ 255 \\ 255 \\ 255 \end{pmatrix} \rightarrow \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix}\end{split}\]
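The elementwise reinterpretation can be reproduced on the host. The sketch below uses std::memcpy, which copies the object representation in the same way as std::bit_cast for trivially copyable types and also compiles before C++20; `bitcast_each` is a hypothetical helper, and the float check assumes IEEE 754 binary32.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical model: reinterprets the bytes of every element as type To.
template <typename To, typename From, std::size_t N>
std::array<To, N> bitcast_each(const std::array<From, N>& x) {
    static_assert(sizeof(To) == sizeof(From), "element sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both element types must be trivially copyable");
    std::array<To, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        std::memcpy(&r[i], &x[i], sizeof(To));  // copy the object representation
    return r;
}

// Self-check mirroring the unsigned char to signed char example above.
inline void bitcast_example() {
    std::array<unsigned char, 4> x{};
    x.fill(255);
    const std::array<signed char, 4> r = bitcast_each<signed char>(x);
    for (signed char v : r) assert(v == -1);  // all bits set is -1 in two's complement

    // Assumes float is IEEE 754 binary32, where 1.0f has the bit pattern 0x3F800000.
    const std::array<float, 1> f{1.0f};
    assert(bitcast_each<std::uint32_t>(f)[0] == 0x3F800000u);
}
```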
Footnotes
- 1
-
See Function template bit_cast [bit.cast] § 22.15.3 of ISO/IEC 14882:2024