4. Binary Format#
The Tile IR bytecode is a binary representation of Tile IR modules including all items (globals, kernels, functions), attributes, and instructions.
The bytecode format is stable and versioned providing part of the stability guarantee of Tile IR. The format can be produced by older compilers, can be read by newer compilers and each driver can accept bytecode up to its supported version.
The remainder of this section describes the Tile IR bytecode format and its encoding.
The bytecode has a few top-level design goals: - To represent a finite but expandable over time set of operations. - To use a minimal set of types for the encoding. - To enable manual construction and inspection by humans. - To allow lazy loading of functions. - To support forward and backward compatibility of the encoding.
The encoding of individual operations is described in Operations.
4.1. Primitives#
The bytecode is composed of a handful of primitive types, each with their own encoding described below. The format uses these primitives to encode more complex data structures for representing the full set of Tile IR items.
4.1.1. Fixed-Width Integers#
Fixed width integers are unsigned integers of a known size (in bytes). The values are stored in little-endian byte order.
Note: All multi-byte values in the Tile IR bytecode format use little-endian encoding, including fixed-width integers, array offsets, and type indices.
byte ::= `0x00`...`0xFF`
4.1.2. Variable-Width Integers#
Variable width integers, or VarInt`s, provide a compact representation for integers.
Each encoded :code:`VarInt consists of one to nine bytes, which together represent a single
64-bit value. The Tile IR bytecode utilizes the PrefixVarInt encoding for VarInt`s.
This encoding is a variant of the :code:`LEB128 (“Little-Endian Base 128”) encoding, where each
byte of the encoding provides up to 7 bits for the value, with the remaining bit used to
store a tag indicating the number of bytes used for the encoding. Small unsigned integers
(less than 2^7) may be stored in one byte, larger unsigned integers (up to 2^14) may
be stored in two bytes, and so on.
varint ::= `0x00`...`0xFF`
4.2. File Structure#
The Tile IR bytecode file is composed of a stream of bytes; which is composed of a header followed by multiple sections. The header contains the 8-byte “magic number”, and a version number.
Each section is expected to appear once within a bytecode file and is identified by a type code, the length, alignment, padding and the payload. This design allows forward compatibility and selective parsing (tools can skip unknown sections, or unneeded sections).
bytecode {
magic: "\x7FTileIR\x00",
version: varint,
sections: section[]
}
4.2.1. Magic Number#
The Tile IR bytecode magic number consumes 8 bytes and is \x7FTileIR\x00. The magic number
must be present at the beginning of the bytecode file to be accepted by the driver.
4.2.2. Version#
Following the magic number is the version of the bytecode format. The version allows both forwards and backwards compatibility of the format.
The version is encoded as three little-endian fixed-width fields immediately following the magic number:
version {
major: uint8_t, // Major version number
minor: uint8_t, // Minor version number
tag: uint16_t // Version tag (little-endian)
}
Each file encodes the version which allows parsers to adapt to the specific version. We do not break old versions of the format, if a newer file uses features an older parser can’t interpret, it skips unknown optional features but will fail gracefully in cases where new features are unsupported. A new producer can also target older versions of the bytecode directly ensuring compatibility with older drivers.
Version Compatibility Guarantees#
Backward Compatibility (a new deserializer can read old bytecode):
The deserializer must support all previous versions of the bytecode format.
The deserializer must handle all previously existing opcodes, types, and sections.
The deserializer must maintain original program semantics.
Forward Compatibility (an old deserializer can read new bytecode):
The deserializer must fail only if it encounters unknown required features.
The deserializer must only read bytecode up to its version.
The deserializer must error with a clear message if there is a version mismatch.
Version Targeting (a serializer can target specific older versions):
The serializer may target one or more specific older versions.
The serializer must validate all features are supported by the requested target version.
The serializer must fail if attempting to serialize unsupported features.
Optional vs Required Features (required changes are clearly documented):
Required changes must be introduced in a new bytecode version and gracefully failure must be designed.
Optional changes must be clearly documented and must preserve the above guarantees.
Examples:
For new ops → Add new opcodes
For changes to existing ops → Typically keep the old format for backward compatibility and add new opcode rather than modifying existing ones
4.2.3. Sections#
The remainder of the bytecode stream is composed of sections. Sections are used to group data within the
bytecode and allow operations on the stream to be per-section enabling out-of-order processing of data and/or lazy-loading.
Each section contains a section ID, whose high bit indicates if the section has alignment requirements, a
length and an optional alignment. A section ID is a 7-bit integer with the high bit indicating if the section
has alignment requirements. When an alignment is present, a variable number of padding bytes (each byte = 0xCB)
may appear before the section data. The alignment of a section must be a power of 2. The alignment is represented
as a VarInt. The padding ensures the section data starts at the specified alignment boundary.
section {
idAndIsAligned: byte // low 7 bits = section ID, high bit = alignment bit
length: varint,
alignment: varint?, // present only if high bit was set
padding: byte[], // bytes of 0xCB as needed
data: byte[]
}
Deserialization#
The following are the implications of the section format on the deserialization process.
The deserializer must read the section ID and length (
idAndIsAligned) as a single byte.If the section ID has the high bit set, the deserializer must read the alignment (
alignment) and apply the alignment by skipping0xCBbytes as needed.The deserializer must read the length (
length) bytes of the payload. The length is aVarInt.The deserializer must move on to the next section until
EOF(end of file).
String Section#
The string section holds all textual names used by the module to avoid repeating them inline.
The section is encoded with the total number of strings, followed by the start
index of each of the individual strings. The remaining encoding contains a single blob containing all the strings
concatenated together. This design allows loading a specific string without reading the whole section. Strings in
the bytecode are stored as raw byte sequences (in UTF-8 encoding) with an associated size. Finding the i-th
string involves jumping to stringStartIndex[i] and reading until the next offset or end of the blob.
strings {
numStrings: varint,
stringStartIndex: uint32[],
stringData: byte[]
}
Function Table Section#
The function table section enumerates the module’s functions and embeds their code inline. First,
numFunctions indicates how many functions follow; for each function i,
nameIndex[i] is an index into the string section naming the function and
signatureIndex[i] is an index into the Type section specifying its parameter, return
types, global flags, etc (so multiple functions with identical signatures can share the
same type entry). The field functionLocIndex[i] is a VarInt referencing an entry
in the Debug Section that describes the function’s definition location (e.g., source
file and line). If functionLocIndex[i] is zero, there is no associated debug
metadata for the function’s definition scope. lengthOfFunction[i] states how many
bytes of code belong to function i, and functionBody[i] contains exactly those
bytes of instruction encodings. This layout avoids a separate code section and makes
parsing each function straightforward: once you read the metadata for function i,
you can either parse its instructions directly or skip them by advancing
lengthOfFunction[i] bytes. The instruction encoding itself allows each operation to
include a slot for its source location, referencing an entry in the debug section.
The instruction encodings (opcodes, operands, etc.) are described later under Operation Opcodes and Encodings Section.
The function table encoding has been updated to include function flags and optional optimization hints:
functionTable {
numFunctions : varint
// for each function i in [0..numFunctions-1]:
nameIndex[i] : varint // References the function's name in the StringSec
signatureIndex[i] : varint // References the extended function signature in the TypeSec
entryFlag[i] : byte // Function flags (visibility, kind, optimization hints)
functionLocIndex[i] : varint // 0 means no debug location
optimizationHints[i]? : self-contained // Present only if HasOptimizationHints flag is set
lengthOfFunction[i] : varint
functionBody[i] : byte[lengthOfFunction[i]]
}
The entryFlag byte encodes the following information:
Bit 0 (0x01): Visibility Flag (0 = Public, 1 = Private)
Bit 1 (0x02): Function Kind Flag (0 = Device Function, 1 = Kernel Entry Point)
Bit 2 (0x04): Has Optimization Hints Flag (0 = No, 1 = Yes)
Bits 3-7: Reserved for future extensions
Constant Data Section#
The constant data section holds large constants (e.g., dense tensors, large arrays) separately from code.
The current implementation does not impose explicit size limits on individual constants.
Constants are stored using uint64_t offsets, allowing for very large constant data.
However, practical limits may be imposed by available memory and any downstream
compiler or runtime limitations (such as CUBIN generation constraints).
As we have done with the string section we move the constants into their own section to avoid bloating the function table section and allow for the lazy loading of specific constants when needed. Individual operations may reference constants by an index into this section. The section is encoded with the total number of constants, followed by the start index of each of the individual constants. The remaining encoding contains a single blob containing all the constants concatenated together.
constant {
numConstants: varint,
constantStartIndex: uint64_t[],
constantData: byte[]
}
Type Section#
The type section stores all type definitions (scalar types, function signatures, parametric
types, etc.) used in the module. The section begins with numTypes specifying how many types
follow, then an array of offsets (typeStartIndex) into the encoded blob (typeData). To load the
i-th type definition, you use the offset contained in typeStartIndex[i] to index into
the typeData blob and parse the definition from there. The typeData blob is a single encoded binary
blob containing the type definitions concatenated together.
type {
numTypes: varint
typeStartIndex: uint32_t[] // array of offsets, length = numTypes
typeData: byte[] // concatenated bytes for all type definitions
}
Each individual type definition will consist of a typeTag followed by a predefined payload
structure specific to that typeTag. This deterministic approach allows parsers to
understand the payload layout based solely on the typeTag.
typeTag : byte
payload : byte[lengthOfPayload]
typeTagindicates the “kind” of type. This can be a simple scalar, a function signature, or a more complex structure like a tensor.payloadis interpreted based ontypeTag. For instance, a function signature might store the number of parameters, references to their types, etc, while a tensor type might store dimensions and an element type.
Example typeTag Values:
|
||||
|---|---|---|---|---|
0x01 | i1 (1-bit int) | None |
||||
0x02 |
i32 (32-bit int) |
None |
||
0x03 |
i64 (64-bit int) |
None |
||
0x04 |
tensor |
|
||
0x10 |
Function |
Extended function signature |
||
… |
Future types |
Depends on the type |
||
Detailed Type Encodings#
The following sections describe the specific encoding format for each type tag:
Integer Types (typeTag = 0x00-0x04)
Integer types require no additional payload data beyond the type tag:
I1: typeTag = 0x00 // 1-bit boolean
I8: typeTag = 0x01 // 8-bit integer
I16: typeTag = 0x02 // 16-bit integer
I32: typeTag = 0x03 // 32-bit integer
I64: typeTag = 0x04 // 64-bit integer
Float Types (typeTag = 0x05-0x0B)
Float types require no additional payload data beyond the type tag:
F16: typeTag = 0x05 // 16-bit IEEE float
BF16: typeTag = 0x06 // 16-bit bfloat
F32: typeTag = 0x07 // 32-bit IEEE float
TF32: typeTag = 0x08 // TensorFloat-32
F64: typeTag = 0x09 // 64-bit IEEE float
F8E4M3FN: typeTag = 0x0A // 8-bit float (E4M3FN format)
F8E5M2: typeTag = 0x0B // 8-bit float (E5M2 format)
Pointer Type (typeTag = 0x0C)
typeTag : byte = 0x0C // Pointer type
pointeeTypeIndex : varint // Index of the pointee type
Tile Type (typeTag = 0x0D)
typeTag : byte = 0x0D // Tile type
elementTypeIndex : varint // Index of the element type
shape : int64_t[] // Shape dimensions (var-length array)
TensorView Type (typeTag = 0x0E)
typeTag : byte = 0x0E // TensorView type
elementTypeIndex : varint // Index of the element type
shape : int64_t[] // Shape dimensions (var-length array)
strides : int64_t[] // Stride values (var-length array)
indexTypeTag : byte // Index type (I32=0x03 or I64=0x04)
PartitionView Type (typeTag = 0x0F)
typeTag : byte = 0x0F // PartitionView type
tileShape : int32_t[] // Tile shape (var-length array)
tensorViewTypeIndex : varint // Index of the TensorView type
dimMap : int32_t[] // Dimension mapping (var-length array)
masked : byte // Masked flag (0x00=false, 0x01=true)
Function Type (typeTag = 0x10)
typeTag : byte = 0x10 // Function Type
numParams : varint // Number of input parameters
paramTypeIndices : varint[] // Array of type indices for each parameter
numResults : varint // Number of results
resultTypeIndices : varint[] // Array of type indices for each return value
The function type encoding stores only the essential type information (input and result types). Argument attributes and other function metadata are stored separately in the function table section.
Token Type (typeTag = 0x11)
typeTag : byte = 0x11 // Token type (no additional payload)
Attribute Encoding#
Attributes in Tile IR bytecode can be encoded in two ways:
Inline encoding - Simple attributes are encoded directly in the instruction stream
Self-contained encoding - Complex attributes include a type tag followed by their data
Self-contained attributes use the following format:
attributeTag : byte
attributeData : byte[] // Format depends on attributeTag
The following attribute tags are supported:
Integer Attribute (attributeTag = 0x01)
attributeTag : byte = 0x01 // Integer attribute
typeIndex : varint // Type index for the integer type
value : varint // Integer value (zero-extended)
Float Attribute (attributeTag = 0x02)
attributeTag : byte = 0x02 // Float attribute
typeIndex : varint // Type index for the float type
value : byte[] // APFloat representation (variable length)
Bool Attribute (attributeTag = 0x03)
attributeTag : byte = 0x03 // Bool attribute
value : byte // 0x00=false, 0x01=true
Type Attribute (attributeTag = 0x04)
attributeTag : byte = 0x04 // Type attribute
typeIndex : varint // Index of the referenced type
String Attribute (attributeTag = 0x05)
attributeTag : byte = 0x05 // String attribute
stringIndex : varint // Index into the string section
Array Attribute (attributeTag = 0x06)
attributeTag : byte = 0x06 // Array attribute
numElements : varint // Number of elements
elements : self-contained[] // Array of self-contained attributes
DenseElements Attribute (attributeTag = 0x07)
attributeTag : byte = 0x07 // DenseElements attribute
typeIndex : varint // Type index for the shaped type
constantIndex : varint // Index into constant section (for numeric data)
// OR for string data:
numStrings : varint // Number of string elements
stringIndices : varint[] // Indices into string section
DivBy Attribute (attributeTag = 0x08)
attributeTag : byte = 0x08 // DivBy attribute
divisor : varint // Divisor value
flags : byte // Bit 0: unsignedInt, Bit 1: hasEvery, Bit 2: hasAlong
every : signed_varint? // Present if Bit 1 set
along : signed_varint? // Present if Bit 2 set
SameElements Attribute (attributeTag = 0x09)
attributeTag : byte = 0x09 // SameElements attribute
values : int64_t[] // Array of int64 values (var-length)
Dictionary Attribute (attributeTag = 0x0A)
attributeTag : byte = 0x0A // Dictionary attribute
numEntries : varint // Number of key-value pairs
entries : dictEntry[] // Array of dictionary entries
dictEntry {
keyStringIndex : varint // Index of key string
value : self-contained // Self-contained attribute value
}
OptimizationHints Attribute (attributeTag = 0x0B)
attributeTag : byte = 0x0B // OptimizationHints attribute
dictionary : dictionary // Dictionary attribute (without tag)
NonNegative Attribute (attributeTag = 0x0C)
attributeTag : byte = 0x0C // NonNegative attribute
// No additional payload - presence indicates non-negative constraint
Global Section#
The global section stores module-level global variables. This section is optional and is only present if the module contains cuda_tile.global operations.
global {
numGlobals: varint
// for each global i in [0..numGlobals-1]:
symbolNameIndex[i] : varint // References the global's symbol name in the StringSec
valueTypeIndex[i] : varint // Type index for the global's value type
constantValueIndex[i] : varint // Index into constant section for the global's initial value
}
Each global variable is encoded with:
symbolNameIndex: Index into the string section for the global’s symbol name
valueTypeIndex: Index into the type section for the global’s type (typically a shaped type like tensor)
constantValueIndex: Index into the constant section containing the global’s initial value
Debug Section#
The debug section stores the serialized debug information (for more details about debug information see Debug Info). This section is optional as certain tools may ignore it and serializers may leave it empty for release builds.
debug {
diOpsNum: varint // Total number of operations with debug info
padding: bytes // Align to 4 bytes
diIndexOffsets: uint32_t[] // Per op offset into the debug info indices
diIndicesNum: varint // Total number of debug info indices
padding: bytes // Align to 4 bytes
diIndices: uint64_t[] // Array of debug indices to debug info attributes
diAttrNum: varint // Total number of debug info attributes
padding: bytes // Align to 4 bytes
diOffsets: uint32_t[] // Per debug info attribute offset into the debug info data
diData: bytes // Data for each debug info attribute
}
The debug section uses a multi-level indirection scheme:
Operations:
diOpsNumoperations have debug info, withdiIndexOffsetspointing into the indices arrayIndices:
diIndicesNumtotal indices indiIndices, referencing debug info attributesAttributes:
diAttrNumdebug info attributes stored indiDatawith offsets indiOffsets
Each debug info attribute begins with a debugEntryType indicating what kind of debug info it is:
0x00= “Unknown”0x01= “DICompileUnit”0x02= “DIFile”0x03= “DILexicalBlock”0x04= “DILoc”0x05= “DISubprogram”0x06= “CallSite”
debugEntryPayload describes line, file, variable name, function index, instruction
offset, etc. Each debugEntryType has a fixed, known structure. If
functionLocIndex[i] or an instruction’s locationIndex is non-zero, it references
debugEntryOffset[...] in this section, whose payload can store file/line info or
other metadata.
Debug Entry Payload Formats#
Each debug entry in diData begins with a debugEntryType byte followed by type-specific data:
debugEntry {
debugEntryType : byte // Identifies the debug info type
entryData : byte[] // Type-specific payload (format below)
}
The following debug entry types are supported:
Unknown Debug Info (debugEntryType = 0x00)
debugEntryType : varint = 0x00 // Unknown debug info
// No additional payload
DICompileUnit (debugEntryType = 0x01)
debugEntryType : varint = 0x01 // DICompileUnit
language : varint // Source language identifier
fileIndex : varint // Index of associated DIFile
producer : varint // String index for compiler producer
optimized : byte // 0x00=false, 0x01=true
emissionKind : varint // Emission kind enumeration
DIFile (debugEntryType = 0x02)
debugEntryType : varint = 0x02 // DIFile
filename : varint // String index for filename
directory : varint // String index for directory
DILexicalBlock (debugEntryType = 0x03)
debugEntryType : varint = 0x03 // DILexicalBlock
line : varint // Line number
column : varint // Column number
scopeIndex : varint // Index of parent scope (DIFile or DISubprogram)
DILoc (debugEntryType = 0x04)
debugEntryType : varint = 0x04 // DILoc (source location)
line : varint // Line number
column : varint // Column number
scopeIndex : varint // Index of scope (DISubprogram, DILexicalBlock, etc.)
inlinedAtIndex : varint // Index of inlined location (0 if not inlined)
DISubprogram (debugEntryType = 0x05)
debugEntryType : varint = 0x05 // DISubprogram (function debug info)
name : varint // String index for function name
linkageName : varint // String index for linkage name
fileIndex : varint // Index of associated DIFile
line : varint // Line number where function is defined
typeIndex : varint // Index of function type
scopeLineIndex : varint // Line number where scope begins
flags : varint // Function flags (visibility, etc.)
unitIndex : varint // Index of associated DICompileUnit
CallSite (debugEntryType = 0x06)
debugEntryType : varint = 0x06 // CallSite location
calleeIndex : varint // Index of called location
callerIndex : varint // Index of calling location
4.3. Operation Opcodes and Encodings#
Each instruction in Tile IR bytecode is represented as:
opcode : byte
locationIndex : varint
instructionSpecificFields : byte[]
In this representation, opcode uniquely identifies the operation. This matches one
of the operations defined in the Tile IR dialect. locationIndex is always
present; if the instruction does not carry debug info, this field is 0. A non-zero value
refers to an entry in the Debug Section that contains file, line, or other source-level
metadata. The instruction-specific fields (operands, attributes, etc.) follow a layout
defined by the opcode.
Additionally, we do not store a resultIndex for each producing instruction. Instead, we
rely on a sequential pass to assign local value indices at parse-time.
Older parsers are expected to skip or reject unknown opcodes. Over time, new operations can be added simply by assigning new opcodes and defining their binary payload formats.
4.4. Operation Encoding Details#
The general structure for operation encoding follows a consistent pattern, but varies based on the operation’s characteristics:
General Operation Structure
opcode : byte // Operation identifier
locationIndex : varint // Debug location (0 = no debug info)
resultTypes : typeIndex[]? // Present for variadic result operations
flags : varint? // Optional flags for operations with optional fields
attributes : encoded_attr[] // Operation-specific attributes
operands : operand_encoding // Operation-specific operand encoding
regions : region_encoding[]? // Present for operations with regions
Flags Field Encoding
For operations with optional attributes or operands, a flags field is used:
flags : varint // Bitfield encoding optional presence
The flags field uses individual bits to indicate the presence of optional attributes and operands:
Bits 0-N: Optional attributes (in declaration order)
Bits N+1-M: Optional operands (in declaration order)
UnitAttr attributes are encoded only in the flags field - no additional data is written.
Operand Encoding Patterns
Operands are encoded differently based on the operation’s operand structure:
Fixed Operands: Written as sequential operand indices
Variadic Operands: Prefixed with operand count, then indices
AttrSizedOperandSegments: Each operand group encoded separately
// Fixed operands (e.g., binary operations)
operand1Index : varint
operand2Index : varint
// Variadic operands (e.g., function calls)
numOperands : varint
operandIndices : varint[numOperands]
// AttrSizedOperandSegments (e.g., operations with optional operand groups)
group1 : optional_operand_group
group2 : optional_operand_group
...
Result Type Encoding
Fixed Results: No result types encoded (inferred from operation)
Variadic Results: Number of results encoded, followed by type indices
// Variadic results
numResults : varint
resultTypeIndices : varint[numResults]
Region Encoding
Operations with regions encode them after operands:
numRegions : varint
regions : region[numRegions]
region {
numBlocks : varint
blocks : block[numBlocks]
}
block {
numArgs : varint
argTypeIndices : varint[numArgs]
numOps : varint
operations : operation[numOps]
}
Common Operation Examples
Arithmetic Operations (e.g., cuda_tile.add)
opcode : byte // e.g., 0x15 for add
locationIndex : varint // Debug location
lhs : varint // Left operand index
rhs : varint // Right operand index
// Result type inferred from operands
Memory Operations (e.g., cuda_tile.load)
opcode : byte // e.g., 0x20 for load
locationIndex : varint // Debug location
resultType : varint // Type index for loaded value
address : varint // Address operand index
// Optional attributes encoded via flags
Control Flow Operations (e.g., cuda_tile.if)
opcode : byte // e.g., 0x30 for if
locationIndex : varint // Debug location
condition : varint // Condition operand index
numRegions : varint // Always 2 for if (then, else)
thenRegion : region // Then block
elseRegion : region // Else block (may be empty)
4.5. Section Ordering#
Tile IR bytecode readers can handle sections in any order due to their flexible parsing design. The reader first discovers all sections and stores their payloads, then processes them in dependency order.
Default Writing Order
Writers typically emit sections in the following order:
Header (magic number + version)
Global Section (Section ID: 0x06) - Optional, only if globals present
Function Table Section (Section ID: 0x02) - Required
Constant Data Section (Section ID: 0x04) - Optional, only if constants present
Debug Section (Section ID: 0x03) - Optional
Type Section (Section ID: 0x05) - Required
String Section (Section ID: 0x01) - Required
End-of-Bytecode Marker (0x00) - Required
However, readers are not dependent on this order and can process sections regardless of their arrangement in the file. This flexibility enables future optimizations and different writing strategies.
Reader Implementation
The reader implements this flexibility by:
Discovery Phase: Reads all section headers and stores their payloads in memory
Processing Phase: Processes sections in dependency order regardless of file order
Lazy Resolution: Resolves forward references (e.g., types, strings) on-demand
This design allows for efficient random access to any section and supports future file format optimizations.
Section Dependencies
Function table references: types, strings, constants, debug info
Global section references: types, strings, constants
Debug section references: strings
All sections may reference the string section