Copyright © 2026, NVIDIA Corporation.

JIT Compilation


NVIDIA cuVS compiles certain kernels at runtime using Just-in-Time (JIT) Link-Time Optimization (LTO). When JIT compilation is triggered, NVIDIA cuVS compiles the kernel for your GPU architecture and automatically caches the result both in memory and on disk.

Each cache has a different lifetime:

  1. The in-memory cache is valid for the lifetime of the process.
  2. The on-disk cache is valid until the CUDA driver is upgraded. The cache can be shared between machines through network or cloud storage, and we recommend storing it in a persistent location. For more details on configuring the on-disk cache, see the CUDA documentation on JIT Compilation. The most relevant environment variables are CUDA_CACHE_PATH and CUDA_CACHE_MAX_SIZE.
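
As an illustration of the on-disk cache settings above, here is a minimal Python sketch that points the cache at a persistent directory. The helper name `configure_cuda_jit_cache` and the example path are hypothetical and not part of cuVS. The sketch assumes the variables are set before the CUDA driver is initialized in the process, and that CUDA_CACHE_MAX_SIZE is given in bytes, as described in the CUDA documentation.

```python
import os
from pathlib import Path


def configure_cuda_jit_cache(cache_dir, max_size_bytes):
    """Point the CUDA on-disk JIT cache at a persistent directory.

    Hypothetical helper (not a cuVS API). Call it before any library
    initializes CUDA, since this sketch assumes the variables are read
    when the driver starts up.
    """
    path = Path(cache_dir).expanduser()
    # Create the directory up front so it exists on first use.
    path.mkdir(parents=True, exist_ok=True)
    os.environ["CUDA_CACHE_PATH"] = str(path)
    # The size limit is expressed in bytes.
    os.environ["CUDA_CACHE_MAX_SIZE"] = str(int(max_size_bytes))
    return path


# Example (the path is illustrative only):
#   configure_cuda_jit_cache("/mnt/shared/cuda_jit_cache", 2 * 1024**3)
#   import cuvs  # import GPU libraries only after the cache is configured
```

Setting the same variables in the shell or in a container environment works equally well; doing it in Python is only convenient when the process controls its own startup order.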

JIT compilation is a one-time cost for a given kernel configuration. After the first compilation, you should not expect a steady-state performance loss. For latency-sensitive workflows, run a warmup step before the actual workload so the relevant kernels are compiled and cached ahead of time.
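
The warmup step described above can be sketched as a small helper that calls a search function once or more before real traffic arrives. Everything here is an assumption for illustration: `warm_up` and `search_fn` are hypothetical names, and `search_fn` stands in for whatever wrapper you use around the cuVS search call. Because GPU work may run asynchronously, the sketch assumes `search_fn` waits for its results before returning, so the timings are meaningful.

```python
import time


def warm_up(search_fn, warmup_queries, repeats=2):
    """Call search_fn ahead of the real workload and time each call.

    Hypothetical helper (not a cuVS API). The first call is expected to
    absorb any JIT compilation cost; later calls should reflect
    steady-state latency.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn(warmup_queries)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in for a real search wrapper: slow on the first call only,
    # mimicking a one-time compilation cost.
    state = {"compiled": False}

    def fake_search(queries):
        if not state["compiled"]:
            time.sleep(0.05)
            state["compiled"] = True
        return len(queries)

    first, second = warm_up(fake_search, [[0.0, 1.0]], repeats=2)
    print(f"first call: {first:.4f}s, second call: {second:.4f}s")
```

Since the cost is incurred per kernel configuration, it seems safest to warm up with the same index and search parameters that the production workload will use.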

The following NVIDIA cuVS capabilities currently trigger JIT compilation:

  • IVF-Flat search APIs: cuvs::neighbors::ivf_flat::search()

Custom distance metrics (UDFs) for IVF-Flat search also use JIT compilation. See UDF Usage.

For implementation details on building JIT LTO kernel fragments and linking them at runtime, see Link-time Optimization.
