# Building from Source
This document describes how to build the TensorRT-LLM backend and the Triton TRT-LLM container from source. The Triton container includes TensorRT-LLM, along with the TensorRT-LLM backend and the Python backend.
## Build the TensorRT-LLM Backend from source
Make sure TensorRT-LLM is installed before building the backend. Since the versions of TensorRT-LLM and the TensorRT-LLM backend have to be aligned, it is recommended to use the Triton TRT-LLM container from NGC directly, or to build the whole container from source as described in the Build the Docker Container section below.
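If you are building against a local TensorRT-LLM installation, one quick sanity check is a sketch like the following, assuming TensorRT-LLM was installed as the `tensorrt_llm` Python wheel:

```bash
# Confirm the tensorrt_llm Python package is importable and print its version,
# so you can match it against the backend sources you are about to build.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```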
```bash
cd tensorrt_llm/triton_backend/inflight_batcher_llm
bash scripts/build.sh
```
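As a rough sketch of what happens next, the build produces a `libtriton_tensorrtllm.so` that Triton loads from a `tensorrtllm` backend directory. The output path below is an assumption; check the script output for the actual location:

```bash
# Hypothetical install step: copy the built backend shared library into the
# Triton backends directory so tritonserver can load it as "tensorrtllm".
# Adjust the build output path to wherever build.sh actually places the library.
mkdir -p /opt/tritonserver/backends/tensorrtllm
cp build/libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/
```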
## Build the Docker Container
> [!CAUTION]
> `build.sh` is currently not working and will be fixed in the next weekly update.
### Build via Docker
You can build the container using the instructions in the TensorRT-LLM Docker Build with the `tritonrelease` stage. Make sure to set the `CUDA_ARCHS` flag for your GPU; for example, if the compute capability of your GPU is 8.9:
```bash
cd tensorrt_llm/
make -C docker tritonrelease_build CUDA_ARCHS='89-real'
```
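Once the build finishes, the resulting image appears in your local Docker image list. The exact repository and tag are set by the Makefile, so the tag used below is only an example, not the definitive name:

```bash
# Find the freshly built Triton TRT-LLM image; verify the repository/tag rather
# than assuming it, since the docker Makefile controls the naming.
docker images | grep tensorrt_llm

# Example (hypothetical tag): start the container interactively with GPU access.
docker run --rm -it --gpus all tensorrt_llm/tritonrelease:latest bash
```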