Undefined Behavior Annex

This section summarizes the scenarios where undefined behavior may occur in Tile C++ APIs. Please see the API reference for details on the specific behavior of any given operation.

  • Undefined behavior may occur due to signed integer overflow in arithmetic addition APIs. This allows the compiler to optimize signed integer operations by assuming the absence of integer overflow. See addition for details.

    Example

    No integer promotion occurs as part of the arithmetic tile conversions. As a result, the addition below is performed in the type signed char and overflows, triggering undefined behavior.

    namespace ct = ::cuda::tiles;
    signed char x = 127;
    signed char y = 1;
    auto z = ct::add(x, y); // UB
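
    The overflow can also be ruled out explicitly before the operation. The following is a minimal sketch in standard C++ and is not part of the Tile C++ API: checked_add is a hypothetical helper that performs the addition in the operand type T, as the tile arithmetic does, and reports overflow instead of triggering it.

    ```cpp
    #include <cassert>
    #include <limits>
    #include <optional>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): adds two signed
    // integers in their own type T and returns std::nullopt on overflow.
    template <class T>
    constexpr std::optional<T> checked_add(T a, T b) {
        static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                      "checked_add expects a signed integer type");
        constexpr T lo = std::numeric_limits<T>::min();
        constexpr T hi = std::numeric_limits<T>::max();
        // The rearranged comparisons cannot overflow: hi - b is evaluated
        // only for b > 0, and lo - b only for b < 0.
        if (b > 0 && a > hi - b) return std::nullopt;
        if (b < 0 && a < lo - b) return std::nullopt;
        return static_cast<T>(a + b);
    }

    // Usage sketch with the values from the example above.
    inline void checked_add_demo() {
        assert(!checked_add<signed char>(127, 1));      // would overflow
        assert(checked_add<signed char>(126, 1).has_value());
    }
    ```

    Subtraction can be guarded the same way by comparing a against hi + b when b is negative and against lo + b when b is positive.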
    
  • Undefined behavior may occur due to signed integer overflow in arithmetic subtraction APIs. This allows the compiler to optimize signed integer operations by assuming the absence of integer overflow. See subtraction for details.

    Example

    No integer promotion occurs as part of the arithmetic tile conversions. As a result, the subtraction below is performed in the type signed char and overflows, triggering undefined behavior.

    namespace ct = ::cuda::tiles;
    signed char x = -128;
    signed char y = 1;
    auto z = ct::sub(x, y); // UB
    
  • Undefined behavior may occur due to signed integer overflow in arithmetic negation APIs. This allows the compiler to optimize signed integer operations by assuming the absence of integer overflow. See negation for details.

    Example

    No integer promotion occurs in the negation APIs. As a result, the negation below is performed in the type signed char and overflows, triggering undefined behavior.

    namespace ct = ::cuda::tiles;
    using i8 = ct::tile<signed char, ct::shape<>>;
    auto x = ct::full<i8>(-128);
    auto y = -x; // UB
    
  • Undefined behavior may occur due to signed integer overflow in multiplication APIs. This allows the compiler to optimize signed integer operations by assuming the absence of integer overflow. See multiplication for details.

    Example

    No integer promotion occurs as part of the arithmetic tile conversions. As a result, the multiplication below is performed in the type signed char and overflows, triggering undefined behavior.

    namespace ct = ::cuda::tiles;
    signed char x = -128;
    signed char y = -1;
    auto z = ct::mul(x, y); // UB
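
    For narrow element types, one way to detect this case is to widen before multiplying. The sketch below is plain standard C++ rather than Tile C++ code, and checked_mul_i8 is a hypothetical name: it computes the product in int, where it cannot overflow, and then checks whether the result fits back into signed char.

    ```cpp
    #include <cassert>
    #include <limits>
    #include <optional>

    // Hypothetical helper (not part of the Tile C++ API): multiplies two
    // signed char values in int and returns std::nullopt if the product is
    // not representable in signed char.
    inline std::optional<signed char> checked_mul_i8(signed char a, signed char b) {
        const int wide = int{a} * int{b};  // |a|, |b| <= 128, so int cannot overflow
        if (wide < std::numeric_limits<signed char>::min() ||
            wide > std::numeric_limits<signed char>::max()) {
            return std::nullopt;
        }
        return static_cast<signed char>(wide);
    }

    // Usage sketch with the values from the example above.
    inline void checked_mul_demo() {
        assert(!checked_mul_i8(-128, -1));        // 128 does not fit
        assert(*checked_mul_i8(-64, 2) == -128);  // fits exactly
    }
    ```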
    
  • Undefined behavior may occur due to signed integer overflow in division, ceiling division, floor division, and remainder APIs. This allows the compiler to optimize signed integer operations by assuming the absence of integer overflow. See division, ceiling division, floor division, and remainder for details.

    Example

    No integer promotion occurs as part of the arithmetic tile conversions. As a result, the division operations below are performed in the type signed char and overflow, triggering undefined behavior.

    namespace ct = ::cuda::tiles;
    signed char x = -128;
    signed char y = -1;
    auto z0 = ct::div(x, y); // UB
    auto z1 = ct::ceildiv(x, y); // UB
    auto z2 = ct::floordiv(x, y); // UB
    auto z3 = ct::remainder(x, y); // UB
    
  • Undefined behavior may occur in integer division, ceiling division, floor division, and remainder APIs when the divisor is zero. See division, ceiling division, floor division, and remainder for details.

    Example

    namespace ct = ::cuda::tiles;
    int x = 10;
    int y = 0;
    auto z0 = ct::div(x, y); // UB
    auto z1 = ct::ceildiv(x, y); // UB
    auto z2 = ct::floordiv(x, y); // UB
    auto z3 = ct::remainder(x, y); // UB
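
    Both division hazards, a zero divisor and the minimum value divided by -1, can be excluded with two comparisons. The following sketch uses only the builtin C++ division operator; checked_div is a hypothetical helper and does not exist in the Tile C++ API.

    ```cpp
    #include <cassert>
    #include <limits>
    #include <optional>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): truncating division
    // that returns std::nullopt for the two inputs with undefined behavior.
    template <class T>
    constexpr std::optional<T> checked_div(T x, T y) {
        static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                      "checked_div expects a signed integer type");
        if (y == 0) return std::nullopt;  // division by zero
        if (x == std::numeric_limits<T>::min() && y == T{-1}) {
            return std::nullopt;          // quotient is not representable in T
        }
        return static_cast<T>(x / y);
    }

    // Usage sketch with the values from the two examples above.
    inline void checked_div_demo() {
        assert(!checked_div<signed char>(-128, -1));
        assert(!checked_div<int>(10, 0));
        assert(*checked_div<int>(10, 3) == 3);
    }
    ```

    The builtin remainder operator has the same two preconditions, so the same guard applies to it.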
    
  • Undefined behavior may occur in left or right bitshift operations when the right-hand side operand is negative. See left bitshift and right bitshift for details.

    Example

    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    i32x4 x = ct::iota<i32x4>();
    i32x4 y0 = x << -2; // UB
    i32x4 y1 = x >> -2; // UB
    
  • Undefined behavior may occur in left or right bitshift operations when the right-hand side operand is greater than or equal to the bit width of the left-hand side type. See left bitshift and right bitshift for details.

    Example

    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    i32x4 x = ct::iota<i32x4>();
    i32x4 y0 = x << 32; // UB
    i32x4 y1 = x >> 32; // UB
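
    The two shift conditions reduce to a single range check on the shift count. The sketch below assumes the tile shift operators follow the builtin C++ rule, under which a count equal to the bit width is already out of range (as the shift by 32 above suggests). shift_count_is_valid is a hypothetical helper, not a Tile C++ API.

    ```cpp
    #include <cassert>
    #include <limits>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): true when `count`
    // is a valid shift amount for an integer of type T under the builtin
    // C++ rule, that is, 0 <= count < bit width of T.
    template <class T>
    constexpr bool shift_count_is_valid(long long count) {
        static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                      "T must be a non-bool integer type");
        // Bit width of T, for example 32 for a 32-bit int.
        constexpr int width = std::numeric_limits<std::make_unsigned_t<T>>::digits;
        return count >= 0 && count < width;
    }

    // Usage sketch covering both shift examples above.
    inline void shift_count_demo() {
        constexpr int width = std::numeric_limits<unsigned int>::digits;
        assert(!shift_count_is_valid<int>(-2));        // negative count
        assert(shift_count_is_valid<int>(width - 1));  // largest valid count
        assert(!shift_count_is_valid<int>(width));     // equal to the bit width
    }
    ```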
    
  • Undefined behavior may occur when computing the absolute value of the minimum value of a signed integer type, because the result is not representable in that type. See ct::abs() for details.

    Example

    namespace ct = ::cuda::tiles;
    signed char x = -128;
    auto y = ct::abs(x); // UB
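
    When the magnitude is all that is needed, one common workaround in standard C++ is to return it in the corresponding unsigned type, which can represent the magnitude of the minimum value. This is a sketch only; unsigned_abs is a hypothetical helper and is not provided by the Tile C++ API.

    ```cpp
    #include <cassert>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): magnitude of a
    // signed integer, returned in the matching unsigned type.
    template <class T>
    constexpr std::make_unsigned_t<T> unsigned_abs(T x) {
        static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                      "unsigned_abs expects a signed integer type");
        using U = std::make_unsigned_t<T>;
        // Unsigned arithmetic wraps modulo 2^N, so 0 - U(x) is well defined
        // even when x is the minimum value of T.
        return x < 0 ? static_cast<U>(U{0} - static_cast<U>(x))
                     : static_cast<U>(x);
    }

    // Usage sketch with the value from the example above.
    inline void unsigned_abs_demo() {
        assert(unsigned_abs<signed char>(-128) == 128);
        assert(unsigned_abs(-5) == 5u);
    }
    ```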
    
  • Undefined behavior may occur in pointer addition and subtraction operations if the resulting pointer does not refer to a valid object in the C++ object model or if other conditions of the builtin pointer arithmetic operations are not satisfied. See pointer addition and pointer subtraction for details.

    Example

    In the following example, the pointer values do not refer to objects in the original C++ array referenced by p. The behavior is undefined even though the pointer values are never dereferenced.

    namespace ct = ::cuda::tiles;
    int x[10];
    auto p = ct::full<ct::tile<int*, ct::shape<4>>>(&x[0]);
    auto q0 = p + 100; // UB
    auto q1 = p - 100; // UB
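
    Under the builtin C++ rule, an offset pointer must still point at an element of the same array or one past its last element. A minimal sketch of that check on indices follows; offset_stays_in_array is a hypothetical helper, and the "other conditions" mentioned above are not covered by it.

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Hypothetical helper (not part of the Tile C++ API): true when a pointer
    // to element `index` of an array with `size` elements may be offset by
    // `delta`. The one-past-the-end position (index == size) is allowed.
    constexpr bool offset_stays_in_array(std::ptrdiff_t size,
                                         std::ptrdiff_t index,
                                         std::ptrdiff_t delta) {
        if (size < 0 || index < 0 || index > size) return false;
        // Compare delta against the remaining room instead of computing
        // index + delta, which could itself overflow.
        return delta >= -index && delta <= size - index;
    }

    // Usage sketch with the array size and offsets from the example above.
    inline void offset_demo() {
        assert(!offset_stays_in_array(10, 0, 100));
        assert(!offset_stays_in_array(10, 0, -100));
        assert(offset_stays_in_array(10, 0, 10));  // one past the end
    }
    ```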
    
  • Undefined behavior may occur in pointer difference operations if the arguments do not refer to the same array object or if other conditions of the builtin pointer difference operation are not satisfied. See pointer difference for details.

    Example

    In the following example, the two pointer arguments refer to different array objects, yielding undefined behavior.

    namespace ct = ::cuda::tiles;
    struct Obj { int x[4]; int y[4]; };
    Obj obj;
    auto p0 = ct::full<ct::tile<int*, ct::shape<4>>>(&obj.x[0]);
    auto p1 = ct::full<ct::tile<int*, ct::shape<4>>>(&obj.y[0]);
    auto q = p1 - p0; // UB
    
  • Undefined behavior may occur when retrieving the size of a shape-like type whose total size cannot be represented in size_t. For details, see ct::shape_size_v.

    Example

    namespace ct = ::cuda::tiles;
    using T = ct::shape<65536, 65536, 65536, 65536>;
    auto s = ct::shape_size_v<T>; // UB
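
    Whether a shape's total size is representable can be checked by multiplying the extents one at a time and testing each step against the maximum of size_t. The sketch below is standard C++ and independent of the Tile C++ API; checked_shape_size is a hypothetical helper. It is conservative in one respect: it reports overflow as soon as any prefix product overflows, even if a later zero extent would make the true product zero.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <initializer_list>
    #include <limits>
    #include <optional>

    // Hypothetical helper (not part of the Tile C++ API): product of the
    // extents, or std::nullopt if a prefix product does not fit in size_t.
    constexpr std::optional<std::size_t> checked_shape_size(
        std::initializer_list<std::size_t> extents) {
        std::size_t total = 1;  // size of a rank-0 shape
        for (std::size_t e : extents) {
            // total * e > max  <=>  total > max / e  (for e != 0)
            if (e != 0 && total > std::numeric_limits<std::size_t>::max() / e) {
                return std::nullopt;
            }
            total *= e;
        }
        return total;
    }

    // Usage sketch with the extents from the example above.
    inline void shape_size_demo() {
        assert(!checked_shape_size({65536, 65536, 65536, 65536}));
        assert(*checked_shape_size({4, 4}) == 16);
    }
    ```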
    
  • Undefined behavior may occur when extracting an out of bounds partition from a tile. For details, see ct::extract().

    Example

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    auto x = ct::iota<ct::tile<int, ct::shape<4, 4>>>();
    auto r = ct::extract(x, ct::shape{2_ic, 2_ic}, 0, 2); // UB
    
  • Undefined behavior may occur when bitcasting the elements of a tile if the resulting value representation does not correspond to a value of the destination type. For details, see ct::element_bitcast().

    Example

    namespace ct = ::cuda::tiles;
    unsigned char x = 2;
    auto r0 = ct::element_bitcast<bool>(x); // UB
    
  • Undefined behavior may occur during a reduction or scan if there exists a possible grouping or order of elements that could trigger undefined behavior in the binary operation.

    Example

    The ct::prod() invocation has undefined behavior because the grouping \(((127 \operatorname{op} 2) \operatorname{op} 0) \operatorname{op} 1\) triggers signed integer overflow: the intermediate product 254 is not representable in signed char.

    The ct::partial_prod() invocation has undefined behavior because the grouping \((127 \operatorname{op} 2) \operatorname{op} 0\) for computing the third element of the scan result triggers the same overflow.

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    signed char xData[4] = {127, 0, 2, 1};
    auto x = ct::load(&xData[0] + ct::iota<ct::tile<int, ct::shape<4>>>());
    auto r0 = ct::prod(x, 0_ic); // UB
    auto r1 = ct::partial_prod(x, 0_ic); // UB
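
    Because any grouping or order must be safe, checking only the left-to-right product is not enough. One sufficient (but not necessary) condition for a signed char product is that the product of the magnitudes of all nonzero elements fits in the type: every intermediate result is then either zero or bounded by that value. The sketch below illustrates this reasoning in standard C++, assuming an 8-bit signed char; product_is_safe_in_any_order is a hypothetical helper and not part of the Tile C++ API.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <limits>

    // Hypothetical helper (not part of the Tile C++ API): conservative test
    // that no grouping or order of the product can overflow signed char.
    // It may return false for inputs that are in fact safe.
    inline bool product_is_safe_in_any_order(const signed char* data, std::size_t n) {
        constexpr int hi = std::numeric_limits<signed char>::max();
        int magnitude = 1;  // product of |v| over the nonzero elements so far
        for (std::size_t i = 0; i < n; ++i) {
            const int v = data[i];
            if (v == 0) continue;           // a partial product containing 0 is 0
            magnitude *= (v < 0 ? -v : v);  // magnitude <= hi here, so int is wide enough
            if (magnitude > hi) return false;
        }
        return true;
    }

    // Usage sketch with the data from the example above.
    inline void product_demo() {
        const signed char bad[4] = {127, 0, 2, 1};  // 127 * 2 overflows
        const signed char ok[4] = {3, 0, 5, -4};    // largest magnitude is 60
        assert(!product_is_safe_in_any_order(bad, 4));
        assert(product_is_safe_in_any_order(ok, 4));
    }
    ```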
    
  • Undefined behavior may occur due to an assumption API that is violated at runtime. For details, see Assumptions.

    Example

    In the following code, the variable x violates the bounds assumption.

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    
    int x = 50;
    int y = ct::assume_bounded_above(x, 10_ic); // UB
    
  • Undefined behavior may occur when accessing the elements of a default constructed ct::tile object. For details, see ct::tile::tile().

    Example

    Undefined behavior occurs at the addition operation because x is default constructed.

    namespace ct = ::cuda::tiles;
    ct::tile<int, ct::shape<4>> x;
    
    auto y = x + 3; // UB
    
  • Undefined behavior may occur when accessing a non-existent extent value via the ct::extents::static_extent() or ct::extents::extent() API.

    Example

    namespace ct = ::cuda::tiles;
    using Exts = ct::extents<uint32_t, 4, ct::dynamic_extent>;
    
    Exts::static_extent(2); // UB
    
    Exts e{8};
    e.extent(2); // UB
    
  • Undefined behavior may occur when accessing a static extent value via the ct::extents::extent() function if the value is not representable in the index type.

    Example

    namespace ct = ::cuda::tiles;
    ct::extents<uint16_t, 65536> e;
    e.extent(0); // UB
    
  • Undefined behavior may occur when accessing a non-existent dimension value via the ct::dimension_map::mapping() API.

    Example

    namespace ct = ::cuda::tiles;
    using Map = ct::dimension_map<1, 0>;
    Map::mapping(2); // UB
    
  • Undefined behavior may occur when accessing a non-existent stride value via the static_stride or stride member functions of layout mappings.

    Example

    namespace ct = ::cuda::tiles;
    using L = ct::layout_right_mapping<ct::extents<uint32_t, 4, 8>>;
    L::static_stride(2); // UB
    L{{}}.stride(2);       // UB
    
  • Undefined behavior may occur if overflow happens while accessing the static_stride or stride member functions of layout mappings.

    Example

    namespace ct = ::cuda::tiles;
    using L = ct::layout_right_mapping<ct::extents<uint32_t, 65536, 65536, 65536, 65536, 65536>>;
    L::static_stride(0); // UB
    L{{}}.stride(0);       // UB
    
  • Undefined behavior may occur when loading or storing an out-of-bounds partition from a ct::partition_view. For details, see partition view loads and stores.

    Example

    In this example, the behavior is undefined even though the load and store are both masked, because each requested partition lies fully out of bounds.

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    ct::tensor_span t{ptr, ct::extents{16, 32}};
    ct::partition_view p{t, ct::shape{4_ic, 2_ic}};
    
    p.load_masked(4, 0); // UB
    p.store_masked(x, 0, 16); // UB
    
  • Undefined behavior may occur when loading or storing a partially out-of-bounds partition from a ct::partition_view if the load or store is not masked.

    Example

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    ct::tensor_span t{ptr, ct::extents{5, 6}};
    ct::partition_view p{t, ct::shape{4_ic, 4_ic}};
    
    p.load(0, 1); // UB
    p.store(x, 1, 0); // UB
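
    The distinction between in-bounds, partially out-of-bounds, and fully out-of-bounds partitions can be made concrete per dimension. The sketch below assumes that partition indices count whole partitions, which is how the fully out-of-bounds example above reads (index 4 with 16 elements in partitions of 4). The coverage enum and classify function are hypothetical and not part of the Tile C++ API.

    ```cpp
    #include <cassert>

    enum class coverage { inside, partial, outside };

    // Hypothetical helper (not part of the Tile C++ API). Assumes partition
    // `index` covers elements [index * part, index * part + part) along one
    // dimension that holds `extent` elements.
    constexpr coverage classify(long long extent, long long part, long long index) {
        if (index < 0) return coverage::outside;
        const long long first = index * part;  // first element covered
        if (first >= extent) return coverage::outside;
        return (first + part <= extent) ? coverage::inside : coverage::partial;
    }

    // Usage sketch for an extent of 6 split into partitions of 4.
    inline void classify_demo() {
        assert(classify(6, 4, 0) == coverage::inside);   // elements 0..3
        assert(classify(6, 4, 1) == coverage::partial);  // elements 4..7
        assert(classify(6, 4, 2) == coverage::outside);  // elements 8..11
    }
    ```

    Under the same assumption, a multi-dimensional partition is fully out of bounds as soon as any dimension is classified outside, and partially out of bounds when no dimension is outside but at least one is partial. For a 5 x 6 span with 4 x 4 partitions, partition (0, 1) is inside along the first dimension and partial along the second.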
    
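  • Undefined behavior may occur when loading or storing through a partition view whose underlying tensor span uses a non-injective layout mapping, that is, a mapping in which distinct indices refer to the same memory location.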

    Example

    namespace ct = ::cuda::tiles;
    using namespace ct::literals;
    ct::layout_strided_mapping m{ct::extents{8, 16}, ct::extents{8, 1}};
    
    ct::tensor_span t{ptr, m};
    ct::partition_view p{t, ct::shape{4_ic, 4_ic}};
    
    p.load(0, 0); // UB
    p.store(x, 0, 0); // UB
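
    For small extents, injectivity of a strided mapping can be checked by brute force: compute the offset of every index and look for duplicates. The following sketch is standard C++ and assumes the usual strided offset formula i * s0 + j * s1; strided_2d_is_injective is a hypothetical helper, not a Tile C++ API.

    ```cpp
    #include <cassert>
    #include <set>

    // Hypothetical helper (not part of the Tile C++ API): true when the
    // mapping (i, j) -> i * s0 + j * s1 assigns a distinct offset to every
    // index in [0, e0) x [0, e1).
    inline bool strided_2d_is_injective(long e0, long e1, long s0, long s1) {
        std::set<long> seen;
        for (long i = 0; i < e0; ++i) {
            for (long j = 0; j < e1; ++j) {
                // insert() reports whether the offset was new.
                if (!seen.insert(i * s0 + j * s1).second) return false;
            }
        }
        return true;
    }

    // Usage sketch with the extents and strides from the example above.
    inline void injective_demo() {
        // (0, 8) and (1, 0) both map to offset 8.
        assert(!strided_2d_is_injective(8, 16, 8, 1));
        // A row stride of 16 gives every index its own offset.
        assert(strided_2d_is_injective(8, 16, 16, 1));
    }
    ```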
    
  • Undefined behavior may occur when providing a negative step value during ct::irange construction.

    Example

    namespace ct = ::cuda::tiles;
    ct::irange x(20, 10, -1); // UB
    
  • Undefined behavior may occur when constructing a ct::irange whose upper bound is close to the maximum integer value representable in the range’s value type. For details, see the constructor ct::irange::irange().

    Example

    namespace ct = ::cuda::tiles;
    ct::irange<unsigned char> x(0, 251, 10); // UB
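
    One plausible reading of "close to the maximum" is that the last increment, which steps past the upper bound, must itself be representable in the value type. The sketch below checks that condition for unsigned value types only. Both the condition and the irange_steps_fit helper are assumptions made for illustration; the authoritative precondition is the one stated for the ct::irange constructor.

    ```cpp
    #include <cassert>
    #include <limits>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): true when the loop
    //     for (T v = lo; v < hi; v += step)
    // never computes a value larger than the maximum of T.
    template <class T>
    constexpr bool irange_steps_fit(T lo, T hi, T step) {
        static_assert(std::is_integral_v<T> && std::is_unsigned_v<T>,
                      "this sketch covers unsigned value types only");
        if (step == 0) return false;  // a zero step would never reach hi
        if (lo >= hi) return true;    // empty range: nothing is incremented
        // Largest value visited by the loop.
        const T last = static_cast<T>(lo + ((hi - 1 - lo) / step) * step);
        // The loop still computes last + step once before it stops.
        return last <= std::numeric_limits<T>::max() - step;
    }

    // Usage sketch with the bounds from the example above.
    inline void irange_demo() {
        // Visits 0, 10, ..., 250; the next value, 260, does not fit.
        assert(!irange_steps_fit<unsigned char>(0, 251, 10));
        // Visits 0, 10, ..., 240; the next value, 250, fits.
        assert(irange_steps_fit<unsigned char>(0, 245, 10));
    }
    ```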
    
  • Undefined behavior may occur during a scalar conversion from a floating point scalar to an integer scalar if the source value, after truncation of its fractional part, is not within the representable range of the target type. This behavior is described in 7.3.11 [conv.fpint] of ISO/IEC 14882:2024.

    Note that this kind of scalar conversion can occur in several APIs, including the ct::tile::tile() constructor, ct::element_cast(), and the tile-to-scalar conversion operator.

    Conversions to bool are handled by 7.3.15 [conv.bool] of ISO/IEC 14882:2024 and are not subject to this undefined behavior.

    Example

    namespace ct = ::cuda::tiles;
    unsigned char x{ct::tile<float, ct::shape<>>{256.0f}}; // UB
    ct::tile<unsigned char, ct::shape<>> y{256.0f}; // UB
    unsigned char z = ct::element_cast<unsigned char>(256.0f); // UB
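
    The precondition from [conv.fpint] can be tested before converting: truncate the source value, then compare it against the limits of the target type. The sketch below is standard C++ and restricted to integer types of at most 32 bits, whose limits are exactly representable in double. checked_float_to_int is a hypothetical helper and not part of the Tile C++ API.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <limits>
    #include <optional>
    #include <type_traits>

    // Hypothetical helper (not part of the Tile C++ API): converts v to T,
    // or returns std::nullopt when the truncated value is out of range.
    template <class T>
    std::optional<T> checked_float_to_int(double v) {
        static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool> &&
                          sizeof(T) <= 4,
                      "sketch covers non-bool integer types up to 32 bits");
        const double t = std::trunc(v);  // drop the fractional part
        const double lo = static_cast<double>(std::numeric_limits<T>::min());
        const double hi = static_cast<double>(std::numeric_limits<T>::max());
        // Written so that NaN, which fails every comparison, is rejected.
        if (!(t >= lo && t <= hi)) return std::nullopt;
        return static_cast<T>(t);
    }

    // Usage sketch with the value from the example above.
    inline void float_to_int_demo() {
        assert(!checked_float_to_int<unsigned char>(256.0f));
        assert(*checked_float_to_int<unsigned char>(255.9f) == 255);
    }
    ```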
    
  • Undefined behavior may occur due to a data race when a single API generates multiple weak memory operations on the same memory location.

    Example

    namespace ct = ::cuda::tiles;
    using i32x4 = ct::tile<int, ct::shape<4>>;
    using ptr64x4 = ct::tile<int*, ct::shape<4>>;
    
    int x = 0;
    
    auto ptrs = ct::full<ptr64x4>(&x);
    ct::store(ptrs, ct::iota<i32x4>()); // UB
    
  • Undefined behavior may occur due to signed integer overflow in the ct::atomic_add() API.

    Example

    namespace ct = ::cuda::tiles;
    int x = 0x7FFFFFFF;
    int y = 1;
    auto z = ct::atomic_add(&x, y, ct::memory_order_relaxed_t{}); // UB
    
  • Undefined behavior may occur due to signed integer overflow in the ct::atomic_sub() API. This includes overflow in the internal unary negation performed on the right-hand side operand.

    Example

    namespace ct = ::cuda::tiles;
    int x = -1;
    int y = 1 << 31;
    auto z = ct::atomic_sub(&x, y, ct::memory_order_relaxed_t{}); // UB