
Kimi-K2.6
Moonshot / NVIDIA
Multi-target vLLM matrix covering B200 and H200 for chat and agentic traffic, with Eagle3 MLA speculation and KV-aware routing.
vLLM | 4x B200 / 8x H200 | Chat + agentic

Aggregated vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and trace-backed benchmarks.
vLLM | 4x B200 / 8x H200 | Chat + agentic

NVFP4 and FP8 vLLM targets for B200 and H200 chat and agentic traffic with MTP speculative decoding and KV-aware routing.
vLLM | 4x B200 / 4x H200 | Chat + agentic

20x GB200 SGLang P/D recipe for 1K input / 8K output traffic with EAGLE speculative decoding, plus an AWS EFA variant.
SGLang | 20x GB200 | Long output / static ISL-OSL

DeepSeek V3.2 NVFP4
NVIDIA checkpoint / DeepSeek
32x GB200 TensorRT-LLM recipe with P/D split, KV-aware routing, and WideEP for long-context coding traces.
TRT-LLM | 32x GB200 | Long-context reuse | Related benchmark

TensorRT-LLM GB200 recipe pair organized by traffic target: aggregated serving and a disaggregated P/D split.
TRT-LLM | 4x GB200 | Static ISL-OSL

Disaggregated vLLM serving with KV-aware routing on 16x H200, for multi-turn conversational traffic with prefix reuse.
vLLM | 16x H200 | Multi-turn conversation | Related benchmark

16-GPU TensorRT-LLM recipe matrix for Hopper/Blackwell and aggregate/P-D serving.
TRT-LLM | 16x H100/H200 | Static ISL-OSL

FP8 recipe set spanning TensorRT-LLM aggregate, TensorRT-LLM P/D, and vLLM P/D targets.
TRT-LLM | 2-8x GPU | Static ISL-OSL

Llama-3.3-70B FP8
Meta / Red Hat
Llama FP8 topology recipe set for 8K input / 1K output traffic on H100/H200.
vLLM | 4x H100/H200 | Long output / static ISL-OSL | Related benchmark