TensorRT LLM Backend test definitions#
This subfolder contains test definitions for TURTLE (https://gitlab-master.nvidia.com/TensorRT/Infrastructure/turtle),
which are used to validate the TensorRT LLM Backend.
Directory structure#
.
└── turtle              # TURTLE-related definitions
    ├── defs            # Test definitions (pytest functions)
    ├── perf_configs    # Defines sm_clk and mem_clk used for perf testing
    └── test_lists      # TURTLE-related test lists
        ├── bloom       # Test lists used by bloom automation
        └── qa          # Test lists used by QA
How to run TURTLE tests locally for TRT-LLM-Backend#
Take the gpt-350m inflight batching test case as an example#
Download turtle, tekit_backend, and llm-qa-test
mkdir ~/workspace && cd ~/workspace
git clone ssh://git@gitlab-master.nvidia.com:12051/TensorRT/Infrastructure/turtle.git
git clone --recurse-submodules ssh://git@gitlab-master.nvidia.com:12051/ftp/tekit_backend.git
Mount data server
mkdir -p ~/workspace/llm_data
sudo mount -o ro 10.117.145.14:/vol/scratch1/scratch.michaeln_blossom ~/workspace/llm_data/
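Before launching the container, it can help to confirm that the mount exposes the llm-models directory that the docker command bind-mounts to /code/llm-models. The following is a minimal sketch; `check_llm_data` is a hypothetical helper name, not part of TURTLE or tekit_backend.

```shell
# Hypothetical helper (not part of TURTLE): check that the mounted data
# directory contains the llm-models subdirectory used by the docker command.
check_llm_data() {
    data_dir="${1:-$HOME/workspace/llm_data}"
    if [ -d "$data_dir/llm-models" ]; then
        echo "ok: $data_dir/llm-models"
    else
        echo "missing: $data_dir/llm-models" >&2
        return 1
    fi
}
```

Calling `check_llm_data` with no argument checks ~/workspace/llm_data; a non-zero return suggests the mount step did not succeed.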
Launch docker container
sudo docker run --gpus all --shm-size=32g --ulimit memlock=-1 --rm -it \
    -e LLM_MODELS_ROOT=/code/llm-models \
    -v ${PWD}/llm_data/llm-models:/code/llm-models \
    -v ${PWD}/tekit_backend:/code/tekit_backend \
    -v ${PWD}/turtle:/code/turtle \
    urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:tritonserver-24.10-py3-x86_64-ubuntu22.04-trt10.6.0.26-pypi-devel-202411041524-861 \
    bash
In Container
Set env
export LLM_BACKEND_ROOT=/code/tekit_backend/
export SKIP_CLEANUP_ENGINES=True
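A quick way to catch a missing variable before the long build steps is to check the environment up front. This is only a sketch; `require_env` is a hypothetical helper, and the variable names passed to it are the ones set in this guide.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Inside the container this could be used as `require_env LLM_MODELS_ROOT LLM_BACKEND_ROOT SKIP_CLEANUP_ENGINES`.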
Build wheels and install
cd /code/tekit_backend/tensorrt_llm
python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt
pip3 install build/tensorrt_llm-*.whl
Build IFB lib and deploy
cd /code/tekit_backend/inflight_batcher_llm
bash scripts/build.sh
mkdir /opt/tritonserver/backends/tensorrtllm/
cp build/libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/
cp build/trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/
Run TURTLE test.
cd /code
apt-get update && apt-get install -y libffi-dev

# Run TURTLE with "-k" to match test names, e.g. "-k test_gpt_350m_ib" runs
# every (sub-)test case whose name contains "test_gpt_350m_ib".
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ --test-python3-exe /usr/bin/python3 --save-workspace -k test_gpt_350m_ib

# Run TURTLE with "-f" to run a list of tests, e.g. the L0 test list.
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ --test-python3-exe /usr/bin/python3 --save-workspace -f tekit_backend/tests/llm-backend-test-defs/turtle/test_lists/bloom/l0_functional.txt
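The two invocations above differ only in the selector (-k for a name pattern, -f for a test list file). A small wrapper can pick the selector automatically. The sketch below is hypothetical (`turtle_cmd` is not a TURTLE command) and only prints the command line instead of running it, using the same flags shown in this guide.

```shell
# Hypothetical wrapper: print the trt_test command for a target.
# An existing file is treated as a test list (-f); anything else is
# treated as a test-name pattern (-k).
turtle_cmd() {
    target="$1"
    defs="tekit_backend/tests/llm-backend-test-defs/turtle/defs/"
    if [ -f "$target" ]; then
        selector="-f"
    else
        selector="-k"
    fi
    printf '%s\n' "./turtle/bin/trt_test -D $defs --test-python3-exe /usr/bin/python3 --save-workspace $selector '$target'"
}
```

For example, `turtle_cmd test_gpt_350m_ib` prints a -k command, and the printed line can be reviewed before being pasted into the container shell under /code.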
Tips#
To list all available tests (in container)
cd /code
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ -l
To run a perf test (in container)
cd /code
./turtle/bin/trt_test -D tekit_backend/tests/llm-backend-test-defs/turtle/defs/ \
--test-python3-exe /usr/bin/python3 --save-workspace \
--perf-log-formats csv \
--perf-clock-gpu-configs-file /code/tekit_backend/tests/llm-backend-test-defs/turtle/perf_configs/gpu_configs.yml \
--perf \
-k "test_perf[gpt_350m-bs:1-input_output_len:128,8-num_runs:10]"
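The perf test name in the example encodes the model, batch size, input/output lengths, and run count. Assuming other perf tests follow the same naming pattern (an assumption drawn from this single example; confirm actual names with the -l listing), a name can be composed as sketched below. `perf_test_name` is a hypothetical helper.

```shell
# Hypothetical helper: compose a perf test name in the pattern shown above.
# Arguments: model, batch size, "input_len,output_len", number of runs.
perf_test_name() {
    model="$1"
    bs="$2"
    io_len="$3"
    runs="$4"
    printf 'test_perf[%s-bs:%s-input_output_len:%s-num_runs:%s]\n' \
        "$model" "$bs" "$io_len" "$runs"
}
```

When passing the result to -k, keep it in double quotes, e.g. `-k "$(perf_test_name gpt_350m 1 128,8 10)"`, so the shell does not treat the square brackets as a glob pattern.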