JIT Compilation

cuVS uses Just-in-Time (JIT) compilation with Link-Time Optimization (LTO) to compile certain kernels. When JIT compilation is triggered, cuVS compiles the kernel for the GPU architecture in use and automatically caches the result both in memory and on disk.
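As a rough mental model of this behavior, the sketch below checks an in-memory tier first, then an on-disk tier, and compiles only when both miss. It is a hypothetical model in plain C++, not cuVS code: `KernelKey`, `TwoLevelKernelCache`, and the `compile` callback are invented names, the "disk" tier is just a second map standing in for files in the CUDA cache directory, and a compiled kernel is represented as a string.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical identifier for one JIT-compiled kernel variant.
struct KernelKey {
  std::string kernel_config;  // illustrative label, e.g. "ivf_flat_float_l2"
  int compute_capability;     // e.g. 80 for an sm_80 GPU

  bool operator<(const KernelKey& other) const {
    if (compute_capability != other.compute_capability) {
      return compute_capability < other.compute_capability;
    }
    return kernel_config < other.kernel_config;
  }
};

// Stands in for a compiled kernel image.
using KernelBinary = std::string;

// Conceptual model only: memory tier first, then disk tier, then compile.
class TwoLevelKernelCache {
 public:
  explicit TwoLevelKernelCache(std::function<KernelBinary(const KernelKey&)> compile)
      : compile_(std::move(compile)) {}

  const KernelBinary& get(const KernelKey& key) {
    // 1. In-memory hit: valid for the lifetime of the process.
    auto mem_it = memory_.find(key);
    if (mem_it != memory_.end()) return mem_it->second;

    // 2. On-disk hit; otherwise 3. compile once and persist the result.
    auto disk_it = disk_.find(key);
    if (disk_it == disk_.end()) {
      ++compile_count_;
      disk_it = disk_.emplace(key, compile_(key)).first;
    }

    // Promote into the in-memory tier for later calls in this process.
    return memory_.emplace(key, disk_it->second).first->second;
  }

  // Models a process restart: the in-memory tier is lost, the disk tier is kept.
  void clear_memory() { memory_.clear(); }

  int compile_count() const { return compile_count_; }

 private:
  std::function<KernelBinary(const KernelKey&)> compile_;
  std::map<KernelKey, KernelBinary> memory_;
  std::map<KernelKey, KernelBinary> disk_;  // stands in for the on-disk cache
  int compile_count_ = 0;
};
```

In this model, calling `clear_memory()` mimics starting a new process: the next `get()` for a known key is served from the disk tier without recompiling, while a key with a different compute capability still triggers a fresh compilation.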

The two caches remain valid for different periods:

The in-memory cache is valid for the lifetime of the process.
The on-disk cache is valid until the CUDA driver is upgraded. It can be shared between machines through network or cloud storage, and we recommend keeping it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on JIT compilation. The most relevant environment variables are CUDA_CACHE_PATH and CUDA_CACHE_MAX_SIZE.
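These variables are usually exported in the shell or job script. Where that is inconvenient, a process can set them itself, as in the following sketch. It assumes a POSIX system (for `setenv`) and that the CUDA driver reads these variables when it initializes, so they must be set before the first CUDA or cuVS call in the process. `configure_cuda_jit_cache` is a hypothetical helper, not part of cuVS or CUDA, and the directory and size below are placeholders. CUDA_CACHE_MAX_SIZE is given in bytes.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: point the CUDA JIT cache at a persistent directory.
// Assumption: this runs before any CUDA call in the process, because the
// driver reads these variables when it initializes.
// Values already present in the environment (e.g. exported by a job script)
// are kept, because setenv is called with overwrite = 0.
bool configure_cuda_jit_cache(const std::string& cache_dir,
                              unsigned long long max_size_bytes) {
  bool ok = setenv("CUDA_CACHE_PATH", cache_dir.c_str(), 0) == 0;
  const std::string size = std::to_string(max_size_bytes);
  ok = (setenv("CUDA_CACHE_MAX_SIZE", size.c_str(), 0) == 0) && ok;
  return ok;
}
```

For example, `configure_cuda_jit_cache("/mnt/shared/cuvs-jit-cache", 2ULL << 30);` would request a 2 GiB cache on a shared mount, unless the variables were already set. Keeping existing values means an operator can still override the location without rebuilding the application.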

JIT compilation is a one-time cost for each kernel configuration. Later calls that use the same configuration reuse the cached kernel, so you should not expect a steady-state performance loss. For latency-sensitive workflows, run a warmup step before the actual workload so that the relevant kernels are compiled and cached ahead of time.
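A warmup step can be as simple as calling the workload once before the measured run. The helper below is a hypothetical, library-agnostic sketch: `warmup_then_time_ms` is not a cuVS function, and the callable stands in for something like a call to `cuvs::neighbors::ivf_flat::search()`. It assumes the warmup input exercises the same kernel configuration as the real workload (for example, the same data type and search parameters). Exactly which parameters select a distinct kernel is an assumption here, not something this page specifies.

```cpp
#include <chrono>

// Hypothetical helper: run the workload once on a small, representative
// input so that any JIT compilation happens outside the timed region,
// then time the real run in milliseconds.
template <typename Input, typename Workload>
double warmup_then_time_ms(Workload&& workload,
                           const Input& warmup_input,
                           const Input& real_input) {
  workload(warmup_input);  // first call: may trigger JIT compilation

  const auto start = std::chrono::steady_clock::now();
  workload(real_input);    // same configuration: reuses the cached kernel
  const auto stop = std::chrono::steady_clock::now();

  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For GPU work, the callable should synchronize its stream before returning; otherwise the measured time covers only the asynchronous kernel launch, not the search itself.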

The following cuVS capabilities currently trigger JIT compilation:

IVF-Flat search APIs: cuvs::neighbors::ivf_flat::search()

For implementation details on building JIT LTO kernel fragments and linking them at runtime, see Link-time Optimization.