Dynamo
v1.0.1
Stable
This guide shows how to enable SGLang’s Hierarchical Cache (HiCache) inside Dynamo.
First, start the SGLang worker with HiCache enabled:
$ python -m dynamo.sglang \
    --model-path Qwen/Qwen3-0.6B \
    --host 0.0.0.0 --port 8000 \
    --page-size 64 \
    --enable-hierarchical-cache \
    --hicache-ratio 2 \
    --hicache-write-policy write_through \
    --hicache-storage-backend nixl \
    --log-level debug \
    --skip-tokenizer-init
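For a rough sense of what `--hicache-ratio 2` and `--page-size 64` imply, the sketch below estimates host cache capacity in tokens. It assumes the ratio expresses the host KV cache pool size relative to the device pool, and that capacity is rounded down to whole pages. The function name and the 100,000-token device pool are hypothetical, not values reported by Dynamo or SGLang.

```python
def host_pool_tokens(device_pool_tokens: int, hicache_ratio: float, page_size: int) -> int:
    """Estimate host KV cache capacity in tokens.

    Assumption: host capacity = device capacity * hicache_ratio,
    rounded down to a whole number of pages.
    """
    raw_tokens = int(device_pool_tokens * hicache_ratio)
    whole_pages = raw_tokens // page_size
    return whole_pages * page_size


# Hypothetical device pool of 100,000 tokens with the flags used above.
estimate = host_pool_tokens(device_pool_tokens=100_000, hicache_ratio=2, page_size=64)
print(f"Estimated host pool: {estimate} tokens ({estimate // 64} pages)")
```

The actual device pool size depends on the GPU, the model, and the memory settings, so treat the result as an order-of-magnitude guide only.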
Then, start the frontend:
$ python -m dynamo.frontend --http-port 8000
Once both processes are running, send a test request:
$ curl localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen3-0.6B",
      "messages": [
        {
          "role": "user",
          "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"
        }
      ],
      "stream": false,
      "max_tokens": 30
    }'
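The same request can be sent from Python, which is convenient when scripting repeated prompts to exercise the cache. This is a minimal sketch using only the standard library. It assumes the frontend returns an OpenAI-style chat completion body (`choices[0].message.content`); the helper names `build_payload` and `chat` are illustrative and not part of Dynamo.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches --http-port 8000 used for the frontend


def build_payload(prompt: str, model: str = "Qwen/Qwen3-0.6B", max_tokens: int = 30) -> dict:
    """Build the same JSON body as the curl request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST a chat completion request and return the generated text.

    Assumption: the response follows the OpenAI chat completion layout.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the worker and frontend running, `print(chat("Explain why Roger Federer is considered one of the greatest tennis players of all time"))` should print the model's reply.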
Run the perf script:
$ bash -x $DYNAMO_ROOT/benchmarks/llm/perf.sh \
    --model Qwen/Qwen3-0.6B \
    --tensor-parallelism 1 \
    --data-parallelism 1 \
    --concurrency "2,4,8" \
    --input-sequence-length 2048 \
    --output-sequence-length 256
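To judge whether a sweep like this is likely to push KV cache beyond the device pool, it can help to work out the peak number of tokens in flight at each concurrency level. The sketch below is plain arithmetic over the arguments shown above, not something `perf.sh` computes. It gives an upper bound that assumes every concurrent request reaches its full input plus output length at the same time and shares no prefix with the others; the function name is hypothetical.

```python
def sweep_token_budget(concurrency_csv: str, input_len: int, output_len: int) -> dict:
    """Map each concurrency level to the peak tokens held in KV cache.

    Upper bound: concurrency * (input length + output length).
    """
    tokens_per_request = input_len + output_len
    levels = [int(level) for level in concurrency_csv.split(",")]
    return {level: level * tokens_per_request for level in levels}


budget = sweep_token_budget("2,4,8", input_len=2048, output_len=256)
for level, tokens in budget.items():
    print(f"concurrency={level}: up to {tokens} tokens in flight")
```

For the values above this gives 4608, 9216, and 18432 tokens at concurrency 2, 4, and 8 respectively.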