# Alternative Installation Methods

## pip

```bash
pip install perf-analyzer

perf_analyzer -m <model>
```
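As a quick usage sketch (the model name `resnet50` and a locally running Triton server are assumptions for illustration, not part of the install):

```bash
# Profile a hypothetical model named "resnet50", sweeping request
# concurrency from 1 to 4
perf_analyzer -m resnet50 --concurrency-range 1:4
```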

Warning: If any runtime dependencies are missing, Perf Analyzer will emit errors identifying them; you will need to install them manually.
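If those errors point at unresolved shared libraries, `ldd` can list everything the loader fails to find. This is a generic diagnostic sketch, assuming the installed `perf_analyzer` is a native binary (if pip wrapped it in a Python launcher, point `ldd` at the underlying executable instead); which packages provide the missing libraries varies by distribution:

```bash
# Any "not found" line names a shared library that still needs installing
ldd "$(command -v perf_analyzer)" | grep "not found"
```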

## Build from Source

```bash
docker run --rm --gpus all -it --network host ubuntu:24.04
```

```bash
# Inside the container, install build/runtime dependencies
apt update && apt install -y curl
```

```bash
# Add the Kitware APT repository, which carries recent CMake releases
curl -LsSf https://apt.kitware.com/kitware-archive.sh | sh

# Resolve the full package version string for CMake 3.31.8
CMAKE_VERSION_FULL=$(apt-cache madison cmake | awk '/3.31.8/ {print $3; exit}')

apt update && DEBIAN_FRONTEND=noninteractive apt install -y \
    cmake=${CMAKE_VERSION_FULL} cmake-data=${CMAKE_VERSION_FULL} \
    g++ git libssl-dev nvidia-cuda-toolkit python3 rapidjson-dev zlib1g-dev
```
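An optional sanity check that the pinned CMake was installed:

```bash
cmake --version   # should report 3.31.8
```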

```bash
git clone --depth 1 https://github.com/triton-inference-server/perf_analyzer.git

mkdir perf_analyzer/build

cmake -B perf_analyzer/build -S perf_analyzer

cmake --build perf_analyzer/build --parallel 8
```

```bash
# Put the freshly built binary on PATH
export PATH=$(pwd)/perf_analyzer/build/perf_analyzer/src/perf-analyzer-build${PATH:+:${PATH}}

perf_analyzer -m <model>
```
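To confirm the shell is picking up the freshly built binary (a simple check using standard shell built-ins):

```bash
command -v perf_analyzer
# expected: <repo>/perf_analyzer/build/perf_analyzer/src/perf-analyzer-build/perf_analyzer
```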
Build options (each flag is added to the first `cmake` command; a combined example follows this list):

- To enable OpenAI mode, add `-D TRITON_ENABLE_PERF_ANALYZER_OPENAI=ON`.
- To enable C API mode, add `-D TRITON_ENABLE_PERF_ANALYZER_C_API=ON`.
- To enable the TorchServe backend, add `-D TRITON_ENABLE_PERF_ANALYZER_TS=ON`.
- To enable the TensorFlow Serving backend, add `-D TRITON_ENABLE_PERF_ANALYZER_TFS=ON`.
- To disable CUDA shared memory support, and with it the dependency on the CUDA toolkit libraries, add `-D TRITON_ENABLE_GPU=OFF`.
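As a sketch, enabling OpenAI mode while dropping the CUDA dependency (flags taken from the list above, paths from the earlier steps) would look like this; with `TRITON_ENABLE_GPU=OFF`, the `nvidia-cuda-toolkit` package from the earlier apt step is no longer needed:

```bash
cmake -B perf_analyzer/build -S perf_analyzer \
    -D TRITON_ENABLE_PERF_ANALYZER_OPENAI=ON \
    -D TRITON_ENABLE_GPU=OFF
```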