Memory Model
cuTile employs a memory model that permits the compiler and hardware to reorder operations for performance. As a result, without explicit synchronization, there is no guaranteed ordering of memory accesses across threads.
To coordinate memory accesses among threads, cuTile provides two attributes for atomic operations:
- Memory Order: Defines the memory ordering semantics of an atomic operation.
- Memory Scope: Defines the set of threads that participate in memory ordering.
Synchronization occurs at a per-element granularity. Each element in the array participates independently in the memory model.
For a more detailed explanation, see the Memory Model section in the Tile IR documentation.
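The ordering problem described above can be illustrated with a host-side analogy in plain Python. This is only a sketch of the message-passing pattern, not cuTile code: `threading.Event` stands in for a flag that is written with release semantics and read with acquire semantics, and `message_passing_demo`, `payload`, and `ready` are hypothetical names chosen for illustration.

```python
import threading


def message_passing_demo() -> int:
    """Host-side analogy of release/acquire message passing (not cuTile code)."""
    payload = []               # ordinary data written by the producer
    ready = threading.Event()  # plays the role of the synchronizing flag
    seen = []

    def producer():
        payload.append(42)     # ordinary write that must become visible
        ready.set()            # analogous to a release write of the flag

    def consumer():
        ready.wait()           # analogous to an acquire read that observes the flag
        seen.append(payload[0])  # safe: the producer's prior write is visible

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]
```

Without the flag, the consumer would have no guarantee that the payload had been written before it was read, which is the situation the memory order and memory scope attributes are meant to address.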
Memory Order
- class cuda.tile.MemoryOrder
Memory ordering semantics of an atomic operation.
- RELAXED = 'relaxed'
No ordering guarantees. Cannot be used to synchronize between threads.
- ACQUIRE = 'acquire'
Acquire semantics. When this operation reads a value written by a release operation, the releasing thread’s prior writes become visible to this thread. Subsequent reads/writes within the same block cannot be reordered before this operation.
- RELEASE = 'release'
Release semantics. When an acquire operation reads the value written by this operation, this thread’s prior writes become visible to the acquiring thread. Prior reads/writes within the same block cannot be reordered after this operation.
- ACQ_REL = 'acq_rel'
Combined acquire and release semantics.
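The pairing rule described above can be summarized in a small sketch. The `MemoryOrder` enum below is a stand-in that only mirrors the documented member names and values so the example runs without a GPU; it is not the real `cuda.tile.MemoryOrder`, and `has_acquire`, `has_release`, and `can_synchronize` are hypothetical helpers. The check is deliberately simplified: it assumes the reading operation actually observes the value written by the writing operation and that both use a suitable memory scope.

```python
from enum import Enum


class MemoryOrder(Enum):
    """Stand-in mirroring the documented member names and values."""
    RELAXED = "relaxed"
    ACQUIRE = "acquire"
    RELEASE = "release"
    ACQ_REL = "acq_rel"


def has_acquire(order: MemoryOrder) -> bool:
    # ACQ_REL carries acquire semantics as well as release semantics.
    return order in (MemoryOrder.ACQUIRE, MemoryOrder.ACQ_REL)


def has_release(order: MemoryOrder) -> bool:
    # ACQ_REL carries release semantics as well as acquire semantics.
    return order in (MemoryOrder.RELEASE, MemoryOrder.ACQ_REL)


def can_synchronize(writer: MemoryOrder, reader: MemoryOrder) -> bool:
    """True if the writer's order has release semantics and the reader's
    order has acquire semantics, so the writer's prior writes can become
    visible to the reader."""
    return has_release(writer) and has_acquire(reader)
```

For example, `can_synchronize(MemoryOrder.RELEASE, MemoryOrder.ACQUIRE)` holds, while any pairing that involves `MemoryOrder.RELAXED` does not.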
Memory Scope
- class cuda.tile.MemoryScope
The scope of threads that participate in memory ordering.
- BLOCK = 'block'
Ordering guarantees apply to threads within the same block.
- DEVICE = 'device'
Ordering guarantees apply to all threads on the same GPU.
- SYS = 'sys'
Ordering guarantees apply to all threads across the entire system, including multiple GPUs and the host.
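As a rough guide to choosing among the three scopes, the sketch below picks the narrowest scope that still covers both communicating threads. The `MemoryScope` enum is a stand-in that mirrors the documented member names and values, not the real `cuda.tile.MemoryScope`, and `narrowest_scope` is a hypothetical helper. Whether a narrower scope is actually cheaper on a given GPU is an assumption, not something stated here.

```python
from enum import Enum


class MemoryScope(Enum):
    """Stand-in mirroring the documented member names and values."""
    BLOCK = "block"
    DEVICE = "device"
    SYS = "sys"


def narrowest_scope(same_block: bool, same_gpu: bool) -> MemoryScope:
    """Return the narrowest documented scope that covers both threads.

    Threads in the same block are necessarily on the same GPU, so
    same_block takes precedence over same_gpu.
    """
    if same_block:
        return MemoryScope.BLOCK
    if same_gpu:
        return MemoryScope.DEVICE
    # Different GPUs, or one side is the host.
    return MemoryScope.SYS
```

For instance, two blocks on the same GPU that exchange data through global memory would map to `MemoryScope.DEVICE` under this rule.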