JIT Compilation

NVIDIA cuVS uses Just-in-Time (JIT) Link-Time Optimization (LTO) to compile certain kernels. When JIT compilation is triggered, NVIDIA cuVS compiles the kernel for your GPU architecture and automatically caches it in memory and on disk.

Each cache remains valid for a different period:

In-memory cache: for the lifetime of the process.
On-disk cache: until the CUDA driver is upgraded.

The on-disk cache can be shared between machines through network or cloud storage, and we recommend storing it in a persistent location. For more details on configuring the on-disk cache, see the CUDA documentation on JIT Compilation. The most relevant environment variables are CUDA_CACHE_PATH and CUDA_CACHE_MAX_SIZE.
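As a minimal sketch of the on-disk cache configuration described above, the two environment variables can be set from inside the process before any CUDA work starts. The helper name configure_jit_cache and the example directory are hypothetical and are not part of cuVS. The sketch assumes a POSIX system (setenv), assumes the driver reads these variables when it initializes, and passes the size in bytes. Setting the variables in the shell or the job launcher is an equally valid choice.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not a cuVS API): points the CUDA driver's on-disk JIT
// cache at `cache_dir` and caps its size at `max_bytes`.
// Assumption: call this before the first CUDA or cuVS call in the process,
// since the driver is expected to read these variables at initialization.
bool configure_jit_cache(const std::string& cache_dir, unsigned long long max_bytes)
{
  assert(max_bytes > 0 && "cache size must be positive");
  if (cache_dir.empty()) { return false; }

  // setenv is POSIX; the final argument 1 overwrites any existing value.
  if (setenv("CUDA_CACHE_PATH", cache_dir.c_str(), 1) != 0) { return false; }

  // CUDA_CACHE_MAX_SIZE is given as a decimal number of bytes.
  const std::string size = std::to_string(max_bytes);
  return setenv("CUDA_CACHE_MAX_SIZE", size.c_str(), 1) == 0;
}
```

For example, calling configure_jit_cache("/mnt/shared/cuvs_jit_cache", 1ULL << 30) at the top of main would request a 1 GiB cache in a shared directory; the path here is a placeholder for whatever persistent location your deployment provides.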

JIT compilation is a one-time cost for a given kernel configuration. After the first compilation, you should not expect a steady-state performance loss. For latency-sensitive workflows, run a warmup step before the actual workload so the relevant kernels are compiled and cached ahead of time.
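The warmup step can be sketched as a small timing harness. The function time_calls below is a hypothetical helper, not a cuVS API, and the workload callable is a placeholder for whatever cuVS calls your application makes. The assumption is that the warmup should use the same kernel configuration as production (for instance the same index type, data types, and search parameters) and should synchronize the stream inside the callable so the measured time covers kernel execution.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `workload` `iterations` times and returns the
// wall-clock time of each call in milliseconds.
// `workload` stands in for the application's own cuVS calls plus a stream
// synchronization; nothing here depends on cuVS itself.
std::vector<double> time_calls(const std::function<void()>& workload, int iterations)
{
  assert(iterations > 0);
  std::vector<double> elapsed_ms;
  elapsed_ms.reserve(static_cast<std::size_t>(iterations));

  for (int i = 0; i < iterations; ++i) {
    const auto start = std::chrono::steady_clock::now();
    workload();  // the first call is where JIT compilation would be paid
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return elapsed_ms;
}
```

Running this once at startup with a small representative batch serves as the warmup. Comparing the first entry with the later ones gives a rough view of how much of the initial latency came from compilation, and on a second process start the first entry should shrink if the on-disk cache was hit.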

The following public NVIDIA cuVS C++ APIs currently trigger JIT compilation. The search entries include single-GPU overloads and multi-GPU overloads where those overloads are exposed.

Custom distance metrics, implemented as user-defined functions (UDFs), for IVF-Flat search also use JIT compilation. See UDF Usage.

For implementation details on building JIT LTO kernel fragments and linking them at runtime, see Link-time Optimization.