Getting Started

Prerequisites

Hardware:

NVIDIA GPU with Compute Capability 8.x (Ampere), 10.x, 11.x, or 12.x (Blackwell)

Software:

NVIDIA Driver r580 or later
Python 3.10 or later
CUDA Toolkit 13.1 or later
Python packages: cuda-tile[tileiras], cuda-python, cuda-core[cu13], cupy-cuda13x[ctk]
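Before installing, it can help to confirm which of the software prerequisites are already present. The following is a minimal sketch using only the Python standard library. It assumes the distribution names are the package names listed above with the pip extras (the bracketed parts) stripped; the helper names installed_versions and python_is_supported are hypothetical and not part of any NVIDIA package. It does not check the GPU, the driver, or the CUDA Toolkit.

```python
import sys
from importlib import metadata

# Distribution names from the prerequisites list, with pip extras stripped
# (assumption: the extras do not change the distribution name).
REQUIRED_PACKAGES = ["cuda-tile", "cuda-python", "cuda-core", "cupy-cuda13x"]
MIN_PYTHON = (3, 10)


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "ok" if python_is_supported() else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A package reported as NOT INSTALLED only means pip metadata for that name was not found in the current environment.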

Installation

Install the cutile-basic package from GitHub:

pip install git+https://github.com/nvidia/cuda-tile.git@basic-experimental

Quick Start

Clone the repository and check out the basic-experimental branch:

git clone https://github.com/nvidia/cuda-tile.git
cd cuda-tile
git checkout basic-experimental

The repository ships with example BASIC programs in the examples/ directory. Compile one to a .cubin (the compiler prints the path to stdout):

python -m cutile_basic.cli examples/hello.bas

Write the .cubin to a specific path:

python -m cutile_basic.cli examples/hello.bas -o hello.cubin

Compile the vector-addition example the same way:

python -m cutile_basic.cli examples/vector_add.bas -o vector_add.cubin
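The compile commands above can also be driven from a script. The sketch below wraps the same command line with subprocess and returns whatever the compiler printed. It assumes that stdout carries only the .cubin path, as described above, and that a failed compile exits with a non-zero status; the helper names build_compile_command and compile_with_cli are hypothetical.

```python
import subprocess
import sys


def build_compile_command(source_path, output_path=None):
    """Build the argument list for the cutile_basic command-line compiler."""
    # sys.executable keeps the call inside the current Python environment.
    command = [sys.executable, "-m", "cutile_basic.cli", str(source_path)]
    if output_path is not None:
        command += ["-o", str(output_path)]
    return command


def compile_with_cli(source_path, output_path=None):
    """Run the compiler and return its stdout with surrounding whitespace removed.

    Assumption: stdout contains only the path of the generated .cubin.
    """
    completed = subprocess.run(
        build_compile_command(source_path, output_path),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return completed.stdout.strip()
```

For example, compile_with_cli("examples/hello.bas", "hello.cubin") runs the same command as the second example above.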

Run a GPU demo end-to-end:

python examples/vector_add.py

Using the Python API

compile_basic_to_cubin compiles a BASIC program passed as a string and returns a result object:

from cutile_basic import compile_basic_to_cubin

source = """
10 DIM A(1024), B(1024), C(1024)
20 TILE A(128), B(128), C(128)
30 INPUT A(), B()
40 LET C(BID) = A(BID) + B(BID)
50 OUTPUT C
60 END
"""

result = compile_basic_to_cubin(source)
print(result.cubin_path)   # path to the compiled .cubin
print(result.meta)         # kernel metadata (arrays, grid size, etc.)
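Because compile_basic_to_cubin takes the program text rather than a file name, compiling one of the shipped examples from Python means reading the file first. The following is a small sketch of that step. The helper compile_basic_file is hypothetical and not part of cutile_basic; it relies only on the compile_basic_to_cubin(source) call shown above, and it accepts an optional compiler callable so the file handling can be exercised without the package installed.

```python
from pathlib import Path


def compile_basic_file(path, compiler=None):
    """Read a .bas file and pass its text to the compiler callable."""
    if compiler is None:
        # Imported lazily so the helper itself loads without the package.
        from cutile_basic import compile_basic_to_cubin as compiler
    source = Path(path).read_text(encoding="utf-8")
    return compiler(source)


# Usage, from the repository root with cutile_basic installed:
#   result = compile_basic_file("examples/vector_add.bas")
#   print(result.cubin_path)
#   print(result.meta)
```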