TensorRT LLM Backend test definitions#

This subfolder contains test definitions for TURTLE (https://gitlab-master.nvidia.com/TensorRT/Infrastructure/turtle), which are used to validate the TensorRT LLM Backend.

Directory structure#

.
└── turtle              # TURTLE-related definitions
    ├── defs            #     Test definitions (pytest functions)
    ├── perf_configs    #     Defines sm_clk and mem_clk used for perf testing
    └── test_lists      #     TURTLE-related test lists
        ├── bloom       #         Test lists used by bloom automation
        └── qa          #         Test lists used by QA

How to run TURTLE tests locally for the TRT-LLM Backend#

Take the gpt-350m inflight batching test case as an example#

  • Download turtle, tekit_backend, and llm-qa-test

mkdir ~/workspace && cd ~/workspace
git clone ssh://git@gitlab-master.nvidia.com:12051/TensorRT/Infrastructure/turtle.git
git clone --recurse-submodules ssh://git@gitlab-master.nvidia.com:12051/ftp/tekit_backend.git
  • Mount data server

mkdir -p ~/workspace/llm_data
sudo mount -o ro 10.117.145.14:/vol/scratch1/scratch.michaeln_blossom ~/workspace/llm_data/
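A quick check can confirm the share actually mounted before it is bind-mounted into the container. This is an optional sketch; the `check_mount` helper is hypothetical and relies on `mountpoint` from util-linux:

```shell
# Hypothetical helper: report whether a directory is an active mount point.
check_mount() {
    local dir="$1"
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "mounted: $dir"
    else
        echo "NOT mounted: $dir"
    fi
}

check_mount ~/workspace/llm_data
```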
  • Launch docker container

sudo docker run --gpus all --shm-size=32g --ulimit memlock=-1 --rm -it \
    -e LLM_MODELS_ROOT=/code/llm-models \
    -v ${PWD}/llm_data/llm-models:/code/llm-models \
    -v ${PWD}/tekit_backend:/code/tekit_backend \
    -v ${PWD}/turtle:/code/turtle \
    urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:tritonserver-24.10-py3-x86_64-ubuntu22.04-trt10.6.0.26-pypi-devel-202411041524-861 \
    bash
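Once inside, it is worth confirming the container actually sees the GPUs; `--gpus all` only works when the NVIDIA Container Toolkit is installed on the host. A hedged sketch (the `show_gpus` helper is hypothetical):

```shell
# Hypothetical helper: list visible GPUs, or explain why none are listed.
show_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L 2>/dev/null || echo "nvidia-smi present but reported an error"
    else
        echo "nvidia-smi not found (is the NVIDIA Container Toolkit installed?)"
    fi
}

show_gpus
```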
  • In the container

    • Set env

    export LLM_BACKEND_ROOT=/code/tekit_backend/
    # Keep built engines after the tests finish instead of cleaning them up
    export SKIP_CLEANUP_ENGINES=True
    
    • Build wheels and install

    cd /code/tekit_backend/tensorrt_llm
    python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt
    pip3 install build/tensorrt_llm-*.whl
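
After installing, a quick import check can catch a broken wheel early. The `check_import` helper below is hypothetical; it assumes only that an installed package is importable by name:

```shell
# Hypothetical helper: import a Python module and print its version
# ("unknown" when the module defines no __version__).
check_import() {
    python3 -c "import $1; print(getattr($1, '__version__', 'unknown'))" 2>/dev/null \
        || echo "not installed: $1"
}

check_import tensorrt_llm
```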
    
    • Build the inflight batching (IFB) lib and deploy it

    cd /code/tekit_backend/inflight_batcher_llm
    bash scripts/build.sh
    mkdir -p /opt/tritonserver/backends/tensorrtllm/
    cp build/libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/
    cp build/trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/
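
A small sanity check (the `check_backend_files` helper is hypothetical, not part of the repo) can confirm both artifacts landed where tritonserver will look for them:

```shell
# Hypothetical helper: check that the expected backend files exist in a directory.
check_backend_files() {
    local dir="$1" f
    for f in libtriton_tensorrtllm.so trtllmExecutorWorker; do
        if [ -e "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
        fi
    done
}

check_backend_files /opt/tritonserver/backends/tensorrtllm
```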
    
    • Run TURTLE tests

    cd /code
    apt-get update && apt-get install -y libffi-dev
    # Run TURTLE with "-k" to match test names, e.g. "-k test_gpt_350m_ib" runs all (sub-)test cases whose names contain "test_gpt_350m_ib".
    ./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ --test-python3-exe /usr/bin/python3 --save-workspace -k test_gpt_350m_ib
    
    # Run TURTLE with "-f" to run a list of tests, e.g. the L0 test list
    ./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ --test-python3-exe /usr/bin/python3 --save-workspace -f tekit_backend/tests/llm-backend-test-defs/turtle/test_lists/bloom/l0_functional.txt
    

Tips#

  • To list all available tests (in container)

cd /code
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ -l
  • To run a perf test (in container)

cd /code
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ \
                    --test-python3-exe /usr/bin/python3 --save-workspace \
                    --perf-log-formats csv \
                    --perf-clock-gpu-configs-file /code/tekit_backend/tests/llm-backend-test-defs/turtle/perf_configs/gpu_configs.yml \
                    --perf \
                    -k "test_perf[gpt_350m-bs:1-input_output_len:128,8-num_runs:10]"