Interoperability#
Machine Representation#
cuTile executes Python tile code on NVIDIA GPUs by translating the Python code into a machine representation that can be executed by CUDA devices. Functions, types, and objects all have a machine representation.
Machine representations are defined in terms of corresponding CUDA C++ entities.
Example: cuda.tile.float16 has the same machine representation as __half in CUDA C++.
Interoperability with SIMT#
Inter-Kernel#
Inter-kernel interoperability refers to all interoperability concerns that do not cross the kernel boundary - everything except mixing tile and SIMT code in a kernel. This includes:
Writing tile and SIMT kernels in the same source file.
Linking tile and SIMT kernels into the same binary.
Passing the same kinds of arrays to both tile and SIMT kernels.
Intra-kernel interoperability will be supported in the future.
JAX FFI#
cuTile kernels can be launched from JAX-traced graphs via
cuda.tile.jax.cutile_call(), which threads buffers and scalar
arguments through the JAX FFI call site so the kernel runs as a regular
op inside a jax.jit-compiled graph.
See cuda.tile.jax.cutile_call() for the full argument convention,
along with cuda.tile.jax.OutputPlaceholder and
cuda.tile.jax.InputOutput for declaring outputs and in-place
updates.