Constant Wrappers

This section describes wrapper types that carry constant values as well as API modifier flags. Using these types, one can pass compile time information to an API through a function argument instead of an explicit template argument.

Example

The following code shows how to use pass compile time values using both the explicit template argument and constant wrapper techniques.

namespace ct = ::cuda::tiles;
using namespace ct::literals;

// Example 1
ct::shape<2, 4> x;
ct::shape       y{2_ic, 4_ic};

// Example 2
ct::dimension_map<2, 0, 1> w;
ct::dimension_map          z{2_ic, 0_ic, 1_ic};

// Example 3
ct::add<ct::rounding_mode::round_toward_zero>(0.0, 1.0, {});
ct::add(0.0, 1, ct::round_toward_negative_t{});

// Example 4
double* ptr{nullptr};
ct::atomic_load<ct::memory_order::relaxed>(ptr);
ct::atomic_load(ptr, ct::memory_order_relaxed_t{});

// Example 5
ct::partition_view P{
  ct::tensor_span{ptr, ct::shape{4_ic, 8_ic}},
  ct::shape{2_ic, 2_ic}};
P.load_masked<ct::view_padding::positive_inf>(0, 1);
P.load_masked(ct::view_padding_positive_inf_t{}, 0, 1);

cuda::tiles::integral_constant

template<ct::integral auto V>
struct integral_constant

A ct::integral_constant is a stateless empty type that encodes a compile time integral value.

Aliases

using type = integral_constant
using value_type = decltype(V)

Member Variables

static constexpr value_type value = V

Conversion Operator

__tile__ __host__ __device__ constexpr operator value_type() const noexcept;

Conversion operator yielding the value V.

Function Call Operator

__tile__ __host__ __device__ constexpr operator()() const noexcept;

Yields the value V.

Overloaded Operators

template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x + y> operator+(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x * y> operator*(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x - y> operator-(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x / y> operator/(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x % y> operator%(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x & y> operator&(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x | y> operator|(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<x ^ y> operator^(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<(x << y)> operator<<(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x, auto y>
__tile__ __host__ __device__ constexpr ct::integral_constant<(x >> y)> operator>>(ct::integral_constant<x>, ct::integral_constant<y>) noexcept;
template<auto x>
__tile__ __host__ __device__ constexpr ct::integral_constant<~x> operator~(ct::integral_constant<x>) noexcept;
template<auto x>
__tile__ __host__ __device__ constexpr ct::integral_constant<+x> operator+(ct::integral_constant<x>) noexcept;
template<auto x>
__tile__ __host__ __device__ constexpr ct::integral_constant<-x> operator-(ct::integral_constant<x>) noexcept;

Overloaded operators for ct::integral_constant. Each operator yields default constructed instance of its return type.

Literal Operator Template

template<char...>
__tile__ __host__ __device__ constexpr ct::integral_constant</* see below */> operator""_ic() noexcept;

A literal operator template 1 for integer literals suffixed with _ic. This function inhabits the cuda::tiles::literals namespace.

Yields a default constructed instance of ct::integral_constant whose constant value matches the value of the provided integer literal. The value type of the result matches the type the integer literal would have had the _ic suffix been omitted 2 except the program is ill-formed if that type would be an extended integer type.

The program is ill-formed if the provided literal is not a decimal integer literal.

Example

The following code shows two _ic literals and their corresponding types.

namespace ct = ::cuda::tiles;
using namespace ct::literals;

// Type is `int`
22_ic;

// Type is either `long int` or `long long int`.
4294967296_ic;

cuda::tiles::dimension_map

template<size_t... Idx>
requires /* atomic constraint */
struct dimension_map;

Stateless empty type representing a permutation of the integers \(0\) through sizeof...(Idx) exclusive. Element \(k\) of the Idx pack indicates which dimension of the source should be placed at dimension \(k\) in the target during a ct::permute() operation. The rank of a ct::dimension_map is the size of the Idx pack.

The atomic constraint validates the Idx pack is a permutation of the integers \(0\) through \(N-1\) inclusive where \(N\) is the rank. That is, each of the Idx values lies in the half open range \([0 N)\) and there are no duplicates. An empty Idx pack represents a valid permutation.

Default Construction

__tile__ __host__ __device__ constexpr dimension_map() noexcept = default;

Default constructs an instance of this type.

Construction From Arguments

template<typename ...Ts>
requires (sizeof...(Ts) > 0)
__tile__ __host__ __device__ explicit constexpr dimension_map(Ts...) noexcept;

Constructs an instance of this type, discarding any arguments that were passed to it. This constructor is used in conjunction with the deduction guide to enable CTAD based construction from ct::integral_constant values.

rank

__tile__ __host__ __device__ static constexpr size_t rank() noexcept;

Yields the size of the Idx pack.

mapping

__tile__ __host__ __device__ static constexpr size_t mapping(rank_type i) noexcept;

Yields the \(i^{th}\) element of pack Idx. The behavior is undefined if \(i \geq N\) where \(N\) is parameter pack size.

Deduction Guide

template<auto... Vs>
requires /* atomic constraint */
dimension_map(integral_constant<Vs>...) -> dimension_map<size_t(Vs)...>;

Deduction guide enabling CTAD from ct::integral_constant arguments. The atomic constraint validates that the expression size_t(Vs) is well formed for each element of the Vs pack.

Example

namespace ct = ::cuda::tiles;
using namespace ct::literals;

// Type: ct::dimension_map<2, 0, 1>
ct::dimension_map{2_ic, 0_ic, 1_ic};

Rounding Mode

APIs for working with round modes.

cuda::tiles::rounding_mode

enum class rounding_mode : /* unspecified */
enumerator round_ties_to_even = 0
enumerator round_toward_zero = 1
enumerator round_toward_negative = 2
enumerator round_toward_positive = 3
enumerator round_approximate = 4
enumerator round_full = 5

Type enumerating the available rounding modes.

cuda::tiles::rounding_mode_constant

template<ct::rounding_mode Value>
requires /* atomic constraint */
struct rounding_mode_constant;

A specialization of rounding_mode_constant encodes a rounding mode in its type. The atomic constraint validates that Value is an enumerator of ct::rounding_mode.

Example

Example usage of rounding_mode_constant to infer the rounding mode non-type template parameter of ct::add() using a function argument.

namespace ct = ::cuda::tiles;
ct::rounding_mode_constant<
    ct::rounding_mode::round_toward_negative> mode;
auto result = ct::add(1.0, 2.5, mode);

Member Aliases

using type = rounding_mode_constant;
using value_type = ct::rounding_mode;

Member Variables

static constexpr ct::rounding_mode value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::rounding_mode() const noexcept;

Conversion operator yielding Value.

__tile__ __host__ __device__ constexpr ct::rounding_mode operator()() const noexcept;

Call operator yielding Value.

Constant Aliases

using round_ties_to_even_t = ct::rounding_mode_constant<ct::rounding_mode::round_ties_to_even>;
using round_toward_zero_t = ct::rounding_mode_constant<ct::rounding_mode::round_toward_zero>;
using round_toward_negative_t = ct::rounding_mode_constant<ct::rounding_mode::round_toward_negative>;
using round_toward_positive_t = ct::rounding_mode_constant<ct::rounding_mode::round_toward_positive>;
using round_full_t = ct::rounding_mode_constant<ct::rounding_mode::round_full>;
using round_approximate_t = ct::rounding_mode_constant<ct::rounding_mode::round_approximate>;
using default_rounding_mode_t = ct::rounding_mode_constant<ct::default_rounding_mode()>;

Convenience type aliases for inferring the rounding mode non-type template parameter of arithmetic and math APIs.

Example

Example usage of the rounding mode constant aliases:

namespace ct = ::cuda::tiles;
auto result = ct::add(1.0, 2.5, ct::round_toward_negative_t{});

cuda::tiles::default_rounding_mode

__tile__ __host__ __device__ constexpr ct::rounding_mode default_rounding_mode() noexcept;

Yields the default rounding mode which is Round Ties to Even.

Subnormals Rounding Mode

APIs for working with subnormals rounding modes.

cuda::tiles::subnormals_rounding_mode

enum class subnormals_rounding_mode : /* unspecified */
enumerator preserve_subnormals = 0
enumerator round_subnormals_to_zero = 1

Type enumerating the available subnormals rounding modes.

cuda::tiles::subnormals_rounding_mode_constant

template<ct::subnormals_rounding_mode Value>
requires /* atomic constraint */
struct subnormals_rounding_mode_constant;

A specialization of subnormals_rounding_mode_constant encodes a subnormals rounding mode in its type. The atomic constraint validates that Value is one of the enumerators of ct::subnormals_rounding_mode.

Example

Example usage of subnormals_rounding_mode_constant to infer the subnormals rounding mode non-type template parameter of ct::add() using a function argument.

namespace ct = ::cuda::tiles;
ct::subnormals_rounding_mode_constant<
   ct::subnormals_rounding_mode::round_subnormals_to_zero> mode;
auto result = ct::add(1.0f, 2.5f, {}, mode);

Member Aliases

using type = subnormals_rounding_mode_constant;
using value_type = ct::subnormals_rounding_mode;

Member Variables

static constexpr ct::subnormals_rounding_mode value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::subnormals_rounding_mode() const noexcept;

Conversion operator yielding Value.

__tile__ __host__ __device__ constexpr ct::subnormals_rounding_mode operator()() const noexcept;

Call operator yielding Value.

Constant Aliases

using preserve_subnormals_t = ct::subnormals_rounding_mode_constant<ct::subnormals_rounding_mode::preserve_subnormals>;
using round_subnormals_to_zero_t = ct::subnormals_rounding_mode_constant<ct::subnormals_rounding_mode::round_subnormals_to_zero>;
using default_subnormals_rounding_mode_t = ct::subnormals_rounding_mode_constant<ct::default_subnormals_rounding_mode()>;

Convenience type aliases for inferring the subnormals rounding mode non-type template parameter of arithmetic and math APIs.

Example

Example usage of the subnormals rounding mode constant aliases:

namespace ct = ::cuda::tiles;
auto result = ct::add(1.0f, 2.5f, {},
                      ct::round_subnormals_to_zero_t{});

cuda::tiles::default_subnormals_rounding_mode

__tile__ __host__ __device__ constexpr ct::subnormals_rounding_mode default_subnormals_rounding_mode() noexcept;

Returns the default subnormals rounding mode which is Preserve Subnormals.

NaN Propagation Mode

APIs for working with NaN propagation modes.

cuda::tiles::nan_propagation_mode

enum class nan_propagation_mode : /* unspecified */
enumerator suppress_nan = 0
enumerator propagate_nan = 1

Type enumerating the available NaN propagation modes.

cuda::tiles::nan_propagation_constant

template<ct::nan_propagation_mode Value>
requires /* atomic constraint */
struct nan_propagation_mode_constant;

A specialization of ct::nan_propagation_mode encodes a NaN propagation modes in its type. The atomic constraint validates that Value is an enumerator of ct::nan_propagation_mode.

Member Aliases

using type = nan_propagation_mode_constant;
using value_type = ct::nan_propagation_mode;

Member Variables

static constexpr ct::nan_propagation_mode value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::nan_propagation_mode() const noexcept;

Yields the propagation mode constant Value.

__tile__ __host__ __device__ constexpr ct::nan_propagation_mode operator()() const noexcept;

Yields the propagation mode constant Value.

Constant Aliases

using suppress_nan_t = ct::nan_propagation_mode_constant<ct::nan_propagation_mode::suppress_nan>;
using propagate_nan_t = ct::nan_propagation_mode_constant<ct::nan_propagation_mode::propagate_nan>;
using default_nan_propagation_mode_t = ct::nan_propagation_mode_constant<ct::default_nan_propagation_mode()>;

Convenience type aliases for inferring the NaN propagation mode non-type template parameter of arithmetic APIs.

Example

Example usage of the NaN propagation mode constant aliases:

namespace ct = ::cuda::tiles;
auto result = ct::max(1.0, 2.5,
                      ct::propagate_nan_t{});

cuda::tiles::default_nan_propagation_mode

__tile__ __host__ __device__ inline constexpr ct::nan_propagation_mode default_nan_propagation_mode();

Yields the default nan propagation mode.

View Padding

APIs for working with view padding values.

cuda::tiles::view_padding

enum class view_padding : /* unspecified */
enumerator zero = 0
enumerator negative_zero = 1
enumerator positive_inf = 2
enumerator negative_inf = 3
enumerator nan = 4

Enumeration of the supported padding values for masked view loads.

cuda::tiles::view_padding_constant

template<ct::view_padding Value>
requires /* atomic constraint */
struct view_padding_constant;

A specialization of ct::view_padding_constant encodes a view padding value in its type. The atomic constraint validates that Value is an enumerator of ct::view_padding.

Member Aliases

using type = view_padding_constant;
using value_type = ct::view_padding;

Member Variables

static constexpr ct::view_padding value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::view_padding() const noexcept;

Yields the value Value.

__tile__ __host__ __device__ constexpr ct::view_padding operator()() const noexcept;

Yields the value Value.

Constant Aliases

using view_padding_zero_t = ct::view_padding_constant<ct::view_padding::zero>;
using view_padding_negative_zero_t = ct::view_padding_constant<ct::view_padding::negative_zero>;
using view_padding_positive_inf_t = ct::view_padding_constant<ct::view_padding::positive_inf>;
using view_padding_negative_inf_t = ct::view_padding_constant<ct::view_padding::negative_inf>;
using view_padding_nan_t = ct::view_padding_constant<ct::view_padding::nan>;
using default_view_padding_t = ct::view_padding_constant<ct::default_view_padding()>;

Convenience aliases for specifying a view padding constant in view operations.

Example

Example usage of the view padding constant aliases:

namespace ct = ::cuda::tiles;
using namespace ct::literals;

int x = 0;
ct::tensor_span t{&x, ct::shape{1_ic}};
ct::partition_view p{t, ct::shape{1_ic}};

p.load_masked(ct::view_padding_zero_t{}, 0);

cuda::tiles::default_view_padding

__tile__ __host__ __device__ inline constexpr ct::view_padding default_view_padding();

Yields the default view padding.

Memory Order

APIs for working with memory orders.

cuda::tiles::memory_order

enum class memory_order : /* unspecified */
enumerator relaxed = 0
enumerator acquire = 1
enumerator release = 2
enumerator acq_rel = 3

Enumeration specifying the supported memory orders.

cuda::tiles::memory_order_constant

template<memory_order Value>
requires /* atomic constraint */
struct memory_order_constant;

A specialization of ct::memory_order_constant encodes a memory order value in its type. The atomic constraint validates that Value is an enumerator of ct::memory_order.

Member Aliases

using type = memory_order_constant;
using value_type = ct::memory_order;

Member Variables

static constexpr ct::memory_order value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::memory_order() const noexcept;

Yields Value.

__tile__ __host__ __device__ constexpr ct::memory_order operator()() const noexcept;

Yields Value.

Constant Aliases

using memory_order_relaxed_t = ct::memory_order_constant<ct::memory_order::relaxed>;
using memory_order_acquire_t = ct::memory_order_constant<ct::memory_order::acquire>;
using memory_order_release_t = ct::memory_order_constant<ct::memory_order::release>;
using memory_order_acq_rel_t = ct::memory_order_constant<ct::memory_order::acq_rel>;

Convenience aliases for specifying a memory order in atomic memory APIs.

Example

Example usage of the memory order constant aliases:

namespace ct = ::cuda::tiles;
ct::atomic_load(ptr, ct::memory_order_relaxed_t{});

cuda::tiles::read_memory_order

template<ct::memory_order Order>
concept read_memory_order = /* atomic constraint */;

Indicates whether Order is a read memory order.

cuda::tiles::write_memory_order

template<ct::memory_order Order>
concept write_memory_order = /* atomic constraint */;

Indicates whether Order is a write memory order.

Thread Scope

APIs for working with thread scopes.

cuda::tiles::thread_scope

enum class thread_scope : /* unspecified */
enumerator system = 0
enumerator device = 1
enumerator block = 2

Enumeration specifying the supported thread scopes.

cuda::tiles::thread_scope_constant

template<ct::thread_scope Value>
requires /* atomic constraint */
struct thread_scope_constant;

A specialization of ct::thread_scope_constant encodes a thread scope value in its type. The atomic constraint validates that Value is an enumerator of ct::thread_scope.

Member Aliases

using type = thread_scope_constant;
using value_type = ct::thread_scope;

Member Variables

static constexpr ct::thread_scope value = Value;

Member Functions

__tile__ __host__ __device__ constexpr operator ct::thread_scope() const noexcept;

Yields Value.

__tile__ __host__ __device__ constexpr ct::thread_scope operator()() const noexcept;

Yields Value.

Constant Aliases

using thread_scope_system_t = ct::thread_scope_constant<ct::thread_scope::system>;
using thread_scope_device_t = ct::thread_scope_constant<ct::thread_scope::device>;
using thread_scope_block_t = ct::thread_scope_constant<ct::thread_scope::block>;
using default_thread_scope_t = ct::thread_scope_constant<ct::default_thread_scope()>;

Convenience aliases for specifying a thread scope in atomic memory APIs.

Example

Example usage of the thread scope constant aliases:

namespace ct = ::cuda::tiles;
ct::atomic_load(ptr,
                ct::memory_order_acquire_t{},
                ct::thread_scope_device_t{});

cuda::tiles::default_thread_scope

__tile__ __host__ __device__ constexpr ct::thread_scope default_thread_scope() noexcept;

Yields the default thread scope for atomic memory APIs.

Footnotes

1

See § 12.6 [over.literal] of ISO/IEC 14882:2024

2

See § 5.13.2 [lex.icon] of ISO/IEC 14882:2024