# Alternative Installation Methods

## Pip

```bash
pip install tritonclient

perf_analyzer -m <model>
```
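
To confirm the binary landed on your `PATH` before pointing it at a real model, you can print the help text (a minimal sanity check; `<model>` above must name a model already loaded on a running Triton server):

```bash
# Verify that perf_analyzer was installed and is callable.
perf_analyzer --help
```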

Warning: If any runtime dependencies are missing, Perf Analyzer produces an error naming each one; you will need to install them manually.
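
For example, on an Ubuntu-based system the fix is typically a single `apt` install (the package below is illustrative only; install whichever library the error message actually names):

```bash
# Hypothetical example: the error reported a missing shared library,
# so install the corresponding Ubuntu package.
sudo apt update && sudo apt install -y libb64-0d
```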

## Build from Source

The Triton SDK container is used for building, so some build and runtime dependencies are already installed.

```bash
export RELEASE=<yy.mm> # e.g. to use the release from the end of February of 2023, do `export RELEASE=23.02`

docker pull nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
```
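
The remaining steps all run inside the container. Before building, you can optionally check what the image already provides (this assumes the SDK image ships a prebuilt `perf_analyzer`; the build below produces your own binary either way):

```bash
# Optional: see whether a prebuilt perf_analyzer is already on the PATH.
which perf_analyzer
```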

```bash
# inside container
# prep for installing a newer version of cmake
apt update && apt install -y gpg wget && \
  wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | \
  gpg --dearmor - | tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null && \
  . /etc/os-release && \
  echo "deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ $UBUNTU_CODENAME main" | \
  tee /etc/apt/sources.list.d/kitware.list >/dev/null

# install build/runtime dependencies
apt update && apt install -y cmake-data=3.27.7* cmake=3.27.7* libcurl4-openssl-dev rapidjson-dev

# fetch the client repository, which contains Perf Analyzer
rm -rf client ; git clone --depth 1 https://github.com/triton-inference-server/client

# configure and build the C++ clients, including Perf Analyzer
mkdir client/build ; cd client/build
cmake -DTRITON_ENABLE_PERF_ANALYZER=ON ..
make -j8 cc-clients

# run the freshly built binary
cc-clients/perf_analyzer/perf_analyzer -m <model>
```
- To enable CUDA shared memory, add `-DTRITON_ENABLE_GPU=ON` to the `cmake` command.
- To enable C API mode, add `-DTRITON_ENABLE_PERF_ANALYZER_C_API=ON` to the `cmake` command.
- To enable the TorchServe backend, add `-DTRITON_ENABLE_PERF_ANALYZER_TS=ON` to the `cmake` command.
- To enable the TensorFlow Serving backend, add `-DTRITON_ENABLE_PERF_ANALYZER_TFS=ON` to the `cmake` command.

A configure step combining all of these flags is sketched below.
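
As a minimal sketch, assuming you want every optional feature in a single build (the flag names come from the list above; the rest of the build steps are unchanged):

```bash
# Configure Perf Analyzer with all optional features enabled, then rebuild.
cmake -DTRITON_ENABLE_PERF_ANALYZER=ON \
      -DTRITON_ENABLE_GPU=ON \
      -DTRITON_ENABLE_PERF_ANALYZER_C_API=ON \
      -DTRITON_ENABLE_PERF_ANALYZER_TS=ON \
      -DTRITON_ENABLE_PERF_ANALYZER_TFS=ON ..
make -j8 cc-clients
```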