Logits Processing

For general TensorRT-LLM features and configuration, see the Reference Guide.

Logits processors let you modify the next-token logits at every decoding step (e.g., to apply custom constraints or sampling transforms). Dynamo provides a backend-agnostic interface and an adapter for TensorRT-LLM so you can plug in custom processors.

How it works

Interface: Implement dynamo.logits_processing.BaseLogitsProcessor, which defines __call__(input_ids, logits). Your implementation must modify logits in-place.
TRT-LLM adapter: Call dynamo.trtllm.logits_processing.adapter.create_trtllm_adapters(...) to convert Dynamo processors into TRT-LLM-compatible processors, then assign the result to SamplingParams.logits_processor.
Examples: See the example processors (temperature, hello_world) in lib/bindings/python/src/dynamo/logits_processing/examples/.
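
As a rough illustration of the interface described above, the sketch below drives a processor by hand in a toy greedy-decoding loop. Everything in it is an assumption made for illustration: BanTokenProcessor and toy_greedy_decode are hypothetical names, random logits stand in for a model forward pass, the logits tensor is assumed to be 1-D with shape (vocab_size,), and the processor is a plain class rather than a subclass of BaseLogitsProcessor so that the snippet runs without Dynamo installed.

```python
from typing import List, Sequence

import torch


class BanTokenProcessor:
    """Hypothetical processor: makes one token ID impossible to select."""

    def __init__(self, banned_id: int):
        self.banned_id = banned_id

    def __call__(self, input_ids: Sequence[int], logits: torch.Tensor) -> None:
        # Write into the caller's tensor; nothing is returned.
        logits[self.banned_id] = float("-inf")


def toy_greedy_decode(processors, prompt_ids: Sequence[int], steps: int,
                      vocab_size: int = 8, seed: int = 0) -> List[int]:
    """Toy loop: random logits stand in for a real model forward pass."""
    gen = torch.Generator().manual_seed(seed)
    ids = list(prompt_ids)
    for _ in range(steps):
        # Assumed shape: 1-D tensor of size vocab_size.
        logits = torch.randn(vocab_size, generator=gen)
        for proc in processors:
            proc(ids, logits)  # every processor edits the same tensor
        ids.append(int(torch.argmax(logits)))
    return ids


out = toy_greedy_decode([BanTokenProcessor(banned_id=3)], prompt_ids=[1, 2], steps=5)
print(out)  # the banned token ID 3 never appears among the generated tokens
```

In a real deployment the engine owns this loop; the sketch only shows where a processor sits relative to the model output and token selection.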

Quick test: HelloWorld processor

You can enable a test-only processor that forces the model to respond with “Hello world!”. This is useful to verify the wiring without modifying your model or engine code.

$ cd $DYNAMO_HOME/examples/backends/trtllm
$ export DYNAMO_ENABLE_TEST_LOGITS_PROCESSOR=1
$ ./launch/agg.sh

When this variable is set, Dynamo initializes the tokenizer so the HelloWorld processor can map text to token IDs.
The chat response should contain “Hello world”.
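
To show why a processor like this needs token IDs, here is a minimal sketch of one way to force a fixed output. It is not the shipped HelloWorld processor: ForcedSequenceProcessor is a hypothetical name, the token IDs are made up (a real processor would obtain them from the tokenizer), and the logits tensor is assumed to be 1-D.

```python
from typing import List, Sequence

import torch


class ForcedSequenceProcessor:
    """Sketch: forces generation to follow fixed token IDs, then EOS."""

    def __init__(self, target_ids: List[int], eos_id: int):
        self.target_ids = list(target_ids)
        self.eos_id = eos_id
        self.step = 0  # per-request state: number of tokens forced so far

    def __call__(self, input_ids: Sequence[int], logits: torch.Tensor) -> None:
        # Choose the next forced token, or EOS once the target is exhausted.
        if self.step < len(self.target_ids):
            forced = self.target_ids[self.step]
        else:
            forced = self.eos_id
        self.step += 1
        # Mask every token in-place, then re-enable only the forced one.
        logits.fill_(float("-inf"))
        logits[forced] = 0.0


# Made-up IDs for illustration only.
proc = ForcedSequenceProcessor(target_ids=[5, 2], eos_id=0)
picked: List[int] = []
for _ in range(3):
    logits = torch.randn(8)
    proc(picked, logits)
    picked.append(int(torch.argmax(logits)))

print(picked)  # [5, 2, 0]
```

Because the processor keeps a step counter, this sketch assumes one processor instance per request.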

Bring your own processor

Implement a processor by conforming to BaseLogitsProcessor and modifying logits in-place. For example, temperature scaling:

from typing import Sequence

import torch
from dynamo.logits_processing import BaseLogitsProcessor


class TemperatureProcessor(BaseLogitsProcessor):
    """Scales logits by 1 / temperature."""

    def __init__(self, temperature: float = 1.0):
        if temperature <= 0:
            raise ValueError("Temperature must be positive")
        self.temperature = temperature

    def __call__(self, input_ids: Sequence[int], logits: torch.Tensor):
        # A temperature of 1.0 leaves the distribution unchanged.
        if self.temperature == 1.0:
            return
        # Divide in-place; do not return a new tensor.
        logits.div_(self.temperature)

Wire it into TRT-LLM by adapting and attaching to SamplingParams:

from dynamo.trtllm.logits_processing.adapter import create_trtllm_adapters
from dynamo.logits_processing.examples import TemperatureProcessor

# sampling_params is the request's existing TRT-LLM SamplingParams instance.
processors = [TemperatureProcessor(temperature=0.7)]
sampling_params.logits_processor = create_trtllm_adapters(processors)

Current limitations

Per-request processing only (batch size must be 1); beam width > 1 is not supported.
Processors must modify logits in-place and not return a new tensor.
If your processor needs tokenization, ensure the tokenizer is initialized (do not skip tokenizer init).
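
The in-place requirement is easy to violate by accident. The sketch below contrasts a processor that silently has no effect with two that work, assuming the caller ignores any return value and only reads the tensor it passed in. The function names are hypothetical and the functions are plain callables, not BaseLogitsProcessor subclasses, so the snippet runs with only PyTorch installed.

```python
import torch


def rebinding_processor(input_ids, logits: torch.Tensor) -> None:
    # BUG: the division allocates a new tensor and rebinds the local name;
    # the caller's tensor is left untouched.
    logits = logits / 0.5


def in_place_processor(input_ids, logits: torch.Tensor) -> None:
    # Correct: div_ writes into the caller's tensor.
    logits.div_(0.5)


def copy_back_processor(input_ids, logits: torch.Tensor) -> None:
    # Also correct: compute out of place, then copy the result back in-place.
    new_logits = torch.log_softmax(logits, dim=-1)
    logits.copy_(new_logits)


a = torch.tensor([1.0, 2.0, 3.0])
rebinding_processor([], a)
print(a.tolist())  # [1.0, 2.0, 3.0] (unchanged)

b = torch.tensor([1.0, 2.0, 3.0])
in_place_processor([], b)
print(b.tolist())  # [2.0, 4.0, 6.0]

c = torch.tensor([1.0, 2.0, 3.0])
copy_back_processor([], c)
print(float(c.exp().sum()))  # approximately 1.0, since c now holds log-probabilities
```

When a transform has no in-place variant, the copy_ pattern in the last function is one way to keep the result visible to the caller.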