Memory Model

cuTile’s memory model permits the compiler and hardware to reorder operations for performance. Without explicit synchronization, the ordering of memory accesses across blocks is not guaranteed.

To coordinate memory accesses between blocks, cuTile provides two attributes for atomic operations:

  • Memory Order — defines the ordering semantics of an atomic operation.

  • Memory Scope — defines the set of blocks that participate in ordering.

Synchronization operates at per-element granularity: each element in the array participates independently in the memory model.

For further details, see the Memory Model section of the Tile IR documentation.

Memory Order

class cuda.tile.MemoryOrder

Memory ordering semantics of a memory operation.

WEAK = 'weak'

Weak (non-atomic) ordering. The default for load/store operations.

RELAXED = 'relaxed'

Atomic, but with no ordering guarantees relative to other memory accesses. Cannot be used to synchronize between threads.

ACQUIRE = 'acquire'

Acquire semantics. When this operation reads a value written by a release operation, the releasing thread’s prior writes become visible to the acquiring thread. Subsequent reads and writes within the same block cannot be reordered before this operation.

RELEASE = 'release'

Release semantics. When an acquire operation reads the value written by this operation, this thread’s prior writes become visible to the acquiring thread. Prior reads and writes within the same block cannot be reordered after this operation.

ACQ_REL = 'acq_rel'

Combined acquire and release semantics.
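The pairing rule in the ACQUIRE and RELEASE descriptions can be sketched in plain Python. This is a minimal illustration, not the cuTile API: the MemoryOrder class below is a stand-in that only reuses the documented names and string values so the example runs without a GPU, and the helpers has_acquire, has_release, and publishes_prior_writes are hypothetical names introduced here.

```python
from enum import Enum


class MemoryOrder(Enum):
    """Illustrative stand-in for the documented values.

    NOT the real cuda.tile.MemoryOrder; it only mirrors the names and
    string values listed above.
    """
    WEAK = "weak"
    RELAXED = "relaxed"
    ACQUIRE = "acquire"
    RELEASE = "release"
    ACQ_REL = "acq_rel"


def has_acquire(order: MemoryOrder) -> bool:
    # ACQ_REL combines acquire and release, so it counts for both helpers.
    return order in (MemoryOrder.ACQUIRE, MemoryOrder.ACQ_REL)


def has_release(order: MemoryOrder) -> bool:
    return order in (MemoryOrder.RELEASE, MemoryOrder.ACQ_REL)


def publishes_prior_writes(writer: MemoryOrder, reader: MemoryOrder) -> bool:
    """Return True when a reader that observes the writer's value is also
    guaranteed to see the writer's earlier writes.

    Per the descriptions above, this needs release semantics on the
    writing side and acquire semantics on the reading side.
    """
    return has_release(writer) and has_acquire(reader)
```

Under this sketch, publishes_prior_writes(MemoryOrder.RELEASE, MemoryOrder.ACQUIRE) is True, while replacing either side with RELAXED or WEAK makes it False, which matches the statement that RELAXED cannot be used to synchronize.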

Memory Scope

class cuda.tile.MemoryScope

The scope of threads that participate in memory ordering.

NONE = 'none'

No memory scope. Used for load/store operations with WEAK memory ordering.

BLOCK = 'block'

Ordering guarantees apply to threads within the same block.

DEVICE = 'device'

Ordering guarantees apply to all threads on the same GPU.

SYS = 'sys'

Ordering guarantees apply to all threads across the entire system, including multiple GPUs and the host.
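One way to read the three non-trivial scopes is as nested sets: BLOCK is contained in DEVICE, which is contained in SYS. The sketch below picks the smallest scope that still covers every participant, on the assumption that a narrower scope is never more expensive than a wider one. The MemoryScope class here is a plain-Python stand-in that reuses the documented names and values, not the real cuda.tile.MemoryScope, and narrowest_scope is a hypothetical helper.

```python
from enum import Enum


class MemoryScope(Enum):
    """Illustrative stand-in for the documented values.

    NOT the real cuda.tile.MemoryScope; it only mirrors the names and
    string values listed above.
    """
    NONE = "none"
    BLOCK = "block"
    DEVICE = "device"
    SYS = "sys"


def narrowest_scope(same_block: bool, same_device: bool) -> MemoryScope:
    """Pick the smallest documented scope that covers all participants.

    same_block:  every participating thread is in one block.
    same_device: every participating thread is on one GPU.
    """
    if same_block:
        return MemoryScope.BLOCK
    if same_device:
        return MemoryScope.DEVICE
    # Participants span several GPUs or include the host.
    return MemoryScope.SYS
```

For example, a counter shared by blocks on a single GPU maps to narrowest_scope(False, True), which returns MemoryScope.DEVICE, while a flag polled by the host maps to MemoryScope.SYS.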