
# KV Cache Transfer in Disaggregated Serving

In disaggregated serving architectures, KV cache must be transferred between prefill and decode workers. TensorRT-LLM supports two methods for this transfer:

## Default Method: UCX
By default, TensorRT-LLM uses UCX (Unified Communication X) for KV cache transfer between prefill and decode workers. UCX provides high-performance communication optimized for GPU-to-GPU transfers.
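
Because UCX is the default, no extra configuration is required. If you want to pin the transfer backend explicitly, the `TRTLLM_USE_UCX_KVCACHE` environment variable (the same one unset in the NIXL instructions below) can be set before starting the workers. A minimal sketch, assuming any non-empty value selects UCX; verify against your TensorRT-LLM version:

```bash
# Explicitly select UCX for KV cache transfer (the default behavior).
# Assumption: any non-empty value enables UCX; verify for your
# TensorRT-LLM version.
export TRTLLM_USE_UCX_KVCACHE=1
```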

## Beta Method: NIXL
TensorRT-LLM also supports using **NIXL** (NVIDIA Inference Xfer Library) for KV cache transfer. [NIXL](https://github.com/ai-dynamo/nixl) is NVIDIA's high-performance communication library designed for efficient data transfer in distributed GPU environments.

**Note:** NIXL support in TensorRT-LLM is currently beta and may have some sharp edges.

## Using NIXL for KV Cache Transfer

**Note:** The NIXL version shipped with the current Dynamo release is not supported by tensorrt-llm<=1.2.0rc2. To use the NIXL backend for KV cache transfer, you must build the container image with tensorrt-llm>=1.2.0rc3.

To enable NIXL for KV cache transfer in disaggregated serving:

1. **Build the container with NIXL support (tensorrt-llm==1.2.0rc3):**
   ```bash
   ./container/build.sh --framework trtllm \
     --tensorrtllm-pip-wheel tensorrt-llm==1.2.0rc3
   ```

2. **Run the containerized environment:**
   See the [run container](/dynamo/v-0-7-0/components/backends/tensor-rt-llm#run-container) section to learn how to start the container image built in the previous step; a sketch of the full sequence is shown below.

   Within the container, unset the `TRTLLM_USE_UCX_KVCACHE` environment variable so NIXL is used instead of UCX.

   ```bash
   unset TRTLLM_USE_UCX_KVCACHE
   ```
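
   For illustration, the whole sequence might look like the following sketch. The `run.sh` script name and flags are assumptions modeled on `build.sh` above; the linked section has the exact invocation.

   ```bash
   # Hypothetical sketch: start the image built in step 1 (see the linked
   # "run container" section for the exact command and flags).
   ./container/run.sh --framework trtllm -it

   # Inside the container: ensure UCX is not forced, so NIXL is used.
   unset TRTLLM_USE_UCX_KVCACHE
   ```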

3. **Start the disaggregated service:**
   See the [disaggregated serving](/dynamo/v-0-7-0/components/backends/tensor-rt-llm#disaggregated-serving) section to learn how to start the deployment; a sketch follows below.
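
   As a sketch, a disaggregated deployment typically starts a frontend plus separate prefill and decode workers. The launch script name below is an assumption for illustration; follow the linked section for the real entry point.

   ```bash
   # Hypothetical launch script for a disaggregated deployment
   # (frontend + separate prefill and decode workers).
   ./launch/disagg.sh
   ```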

4. **Send the request:**
   See the [client](/dynamo/v-0-7-0/components/backends/tensor-rt-llm#client) section to learn how to send a request to the deployment.
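
   The Dynamo frontend exposes an OpenAI-compatible HTTP API. A minimal request might look like the following; the port and model name are assumptions for illustration:

   ```bash
   # Hypothetical request; substitute the model name you deployed and the
   # port your frontend listens on.
   curl http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
           "model": "Qwen/Qwen3-0.6B",
           "messages": [{"role": "user", "content": "Hello!"}],
           "max_tokens": 64
         }'
   ```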

**Important:** Ensure that the ETCD and NATS services are running before starting the disaggregated service.
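
If you use the Docker Compose file shipped with the Dynamo repository, both can be started with the command below; the compose file path is an assumption, so adjust it to your checkout:

```bash
# Assumed compose file location in the Dynamo repo; brings up ETCD and NATS.
docker compose -f deploy/docker-compose.yml up -d
```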