
Copyright © 2026, NVIDIA Corporation.


Logits Processing


For general TensorRT-LLM features and configuration, see the Reference Guide.


Logits processors let you modify the next-token logits at every decoding step (e.g., to apply custom constraints or sampling transforms). Dynamo provides a backend-agnostic interface and an adapter for TensorRT-LLM so you can plug in custom processors.

How it works

  • Interface: Subclass dynamo.logits_processing.BaseLogitsProcessor, which defines __call__(input_ids, logits). Your implementation modifies logits in-place rather than returning a new tensor.
  • TRT-LLM adapter: Use dynamo.trtllm.logits_processing.adapter.create_trtllm_adapters(...) to convert Dynamo processors into TRT-LLM-compatible processors and assign them to SamplingParams.logits_processor.
  • Examples: See example processors in lib/bindings/python/src/dynamo/logits_processing/examples/ (temperature, hello_world).
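To make the call pattern concrete, here is a minimal pure-Python sketch of the contract described above. It is not Dynamo code: BaseLogitsProcessor below is a local stand-in for dynamo.logits_processing.BaseLogitsProcessor, a plain list of floats stands in for the torch.Tensor of logits, and greedy_decode, BoostToken and toy_model are hypothetical helpers. greedy_decode plays the role of the engine, calling every processor once per decoding step before picking the next token.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence


class BaseLogitsProcessor(ABC):
    """Local stand-in for dynamo.logits_processing.BaseLogitsProcessor (assumption)."""

    @abstractmethod
    def __call__(self, input_ids: Sequence[int], logits: List[float]) -> None:
        """Modify `logits` in place; any return value is ignored."""


class BoostToken(BaseLogitsProcessor):
    """Hypothetical toy processor: add a fixed bonus to one token ID at every step."""

    def __init__(self, token_id: int, bonus: float) -> None:
        self.token_id = token_id
        self.bonus = bonus

    def __call__(self, input_ids: Sequence[int], logits: List[float]) -> None:
        logits[self.token_id] += self.bonus


def greedy_decode(
    prompt_ids: List[int],
    next_logits: Callable[[Sequence[int]], List[float]],
    processors: List[BaseLogitsProcessor],
    max_new_tokens: int,
) -> List[int]:
    """Hypothetical engine loop: run every processor on the logits at each step."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)  # raw next-token scores from the "model"
        for proc in processors:
            proc(ids, logits)  # each processor mutates `logits` in place
        # Greedy pick: index of the highest processed score.
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids[len(prompt_ids):]


def toy_model(ids: Sequence[int]) -> List[float]:
    """Hypothetical model that always prefers token 0 (fresh list each call)."""
    return [1.0, 0.5, 0.0]


print(greedy_decode([7], toy_model, [], 3))                     # [0, 0, 0]
print(greedy_decode([7], toy_model, [BoostToken(2, 10.0)], 3))  # [2, 2, 2]
```

The point the sketch illustrates is the division of labor: a processor never samples; it only edits the scores the engine is about to sample from, at every step.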

Quick test: HelloWorld processor

You can enable a test-only processor that forces the model to respond with “Hello world!”. This is useful to verify the wiring without modifying your model or engine code.

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
export DYNAMO_ENABLE_TEST_LOGITS_PROCESSOR=1
./launch/agg.sh
```
  • When enabled, Dynamo initializes the tokenizer so the HelloWorld processor can map text to token IDs.
  • The chat response should contain “Hello world”.
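The source of the bundled HelloWorld processor is not reproduced here. As an illustration of how forcing a fixed reply can work in principle, the following sketch uses a hypothetical ForceSequenceProcessor that masks every logit except the next token of a target sequence, then forces an end-of-sequence token. It works on plain lists instead of tensors, and the toy vocabulary and token IDs are made up in place of a real tokenizer. It also shows why the tokenizer matters: the target text must be mapped to token IDs before the processor can force them.

```python
import math
from typing import List, Optional, Sequence


class ForceSequenceProcessor:
    """Hypothetical processor that forces `target_ids`, then EOS.

    A real Dynamo processor would subclass BaseLogitsProcessor; that is
    omitted so this sketch runs without Dynamo or torch.
    """

    def __init__(self, target_ids: List[int], eos_id: int) -> None:
        self.target_ids = list(target_ids)
        self.eos_id = eos_id
        self.prompt_len: Optional[int] = None  # learned on the first call

    def __call__(self, input_ids: Sequence[int], logits: List[float]) -> None:
        if self.prompt_len is None:
            # First decoding step: everything seen so far is the prompt.
            self.prompt_len = len(input_ids)
        step = len(input_ids) - self.prompt_len
        forced = self.target_ids[step] if step < len(self.target_ids) else self.eos_id
        # Mask every token except the forced one, in place.
        for i in range(len(logits)):
            if i != forced:
                logits[i] = -math.inf


# Toy vocabulary standing in for a real tokenizer (assumption).
vocab = {"Hello": 3, " world": 4, "!": 5}
EOS = 0
target = [vocab[t] for t in ("Hello", " world", "!")]

proc = ForceSequenceProcessor(target, EOS)
prompt = [9, 8, 7]
ids = list(prompt)
for _ in range(4):
    logits = [0.0] * 10  # uniform "model" scores over a 10-token vocabulary
    proc(ids, logits)
    ids.append(max(range(len(logits)), key=logits.__getitem__))
print(ids[len(prompt):])  # [3, 4, 5, 0]
```

Because the processor keeps per-request state (prompt_len), each request needs its own instance, which is consistent with the per-request limitation listed below.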

Bring your own processor

Implement a processor by subclassing BaseLogitsProcessor and modifying logits in-place. For example, temperature scaling:

```python
from typing import Sequence

import torch

from dynamo.logits_processing import BaseLogitsProcessor


class TemperatureProcessor(BaseLogitsProcessor):
    def __init__(self, temperature: float = 1.0):
        if temperature <= 0:
            raise ValueError("Temperature must be positive")
        self.temperature = temperature

    def __call__(self, input_ids: Sequence[int], logits: torch.Tensor):
        # A temperature of 1.0 leaves the logits unchanged; skip the work.
        if self.temperature == 1.0:
            return
        # div_ scales the tensor in place so the engine sees the change.
        logits.div_(self.temperature)
```
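Dividing the logits by T before the softmax is what makes this a temperature transform: p_i = exp(z_i / T) / Σ_j exp(z_j / T). T < 1 sharpens the distribution, T > 1 flattens it, and T = 1 leaves it unchanged, which is why the processor returns early in that case. The pure-Python sketch below checks this numerically on toy logits; softmax and scale_in_place are illustrative helpers, not part of Dynamo, and a list stands in for the tensor.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale_in_place(logits: List[float], temperature: float) -> None:
    """Same transform as TemperatureProcessor, applied to a list instead of a tensor."""
    if temperature <= 0:
        raise ValueError("Temperature must be positive")
    for i in range(len(logits)):
        logits[i] /= temperature


logits = [2.0, 1.0, 0.0]
base = softmax(logits)

sharp = list(logits)
scale_in_place(sharp, 0.7)  # T < 1: the top token gains probability
flat = list(logits)
scale_in_place(flat, 1.5)   # T > 1: the distribution moves toward uniform

print(softmax(sharp)[0] > base[0] > softmax(flat)[0])  # True
```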

Wire it into TRT-LLM by adapting it and attaching the result to SamplingParams:

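```python
from tensorrt_llm import SamplingParams

from dynamo.logits_processing.examples import TemperatureProcessor
from dynamo.trtllm.logits_processing.adapter import create_trtllm_adapters

# In practice, use the SamplingParams built for the request being served.
sampling_params = SamplingParams()

processors = [TemperatureProcessor(temperature=0.7)]
sampling_params.logits_processor = create_trtllm_adapters(processors)
```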
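The same pattern covers constraints, the other use case mentioned in the introduction. The sketch below is a hypothetical BanTokensProcessor that sets the logits of disallowed token IDs to -inf so their probability after softmax is zero. To stay runnable without Dynamo or torch it does not subclass BaseLogitsProcessor and operates on a list; a real version would subclass BaseLogitsProcessor, write into the tensor in place, and be added to the processors list passed to create_trtllm_adapters.

```python
import math
from typing import Iterable, List, Sequence


class BanTokensProcessor:
    """Hypothetical constraint: never allow the given token IDs.

    A real Dynamo processor would subclass BaseLogitsProcessor; this
    stand-in works on a list so it runs without Dynamo or torch.
    """

    def __init__(self, banned_ids: Iterable[int]) -> None:
        self.banned_ids = sorted(set(banned_ids))

    def __call__(self, input_ids: Sequence[int], logits: List[float]) -> None:
        for token_id in self.banned_ids:
            if 0 <= token_id < len(logits):  # ignore IDs outside the vocabulary
                logits[token_id] = -math.inf  # in place: probability becomes 0


logits = [3.0, 2.5, 1.0, 0.2]
BanTokensProcessor([0, 1])(input_ids=[], logits=logits)
print(logits)  # [-inf, -inf, 1.0, 0.2]
```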

Current limitations

  • Per-request processing only (batch size must be 1); beam width > 1 is not supported.
  • Processors must modify logits in-place and not return a new tensor.
  • If your processor needs tokenization, ensure the tokenizer is initialized (do not skip tokenizer init).
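The in-place rule matters because the caller keeps its own reference to the logits. Assuming, as the limitation implies, that any return value is ignored, rebinding or returning a new object has no effect on what the engine samples from; this is why TemperatureProcessor uses logits.div_ rather than logits / temperature. The sketch below mimics that with a hypothetical run_processor caller and plain lists.

```python
from typing import List, Sequence


def run_processor(processor, input_ids: Sequence[int], logits: List[float]) -> List[float]:
    """Hypothetical caller: invokes the processor and ignores its return value."""
    processor(input_ids, logits)
    return logits


def halve_in_place(input_ids: Sequence[int], logits: List[float]) -> None:
    for i in range(len(logits)):
        logits[i] /= 2  # mutates the caller's object


def halve_returning_copy(input_ids: Sequence[int], logits: List[float]) -> List[float]:
    return [z / 2 for z in logits]  # new object; the caller never sees it


print(run_processor(halve_in_place, [], [4.0, 2.0]))        # [2.0, 1.0]
print(run_processor(halve_returning_copy, [], [4.0, 2.0]))  # [4.0, 2.0] (unchanged)
```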