For general TensorRT-LLM features and configuration, see the Reference Guide.
Logits processors let you modify the next-token logits at every decoding step (e.g., to apply custom constraints or sampling transforms). Dynamo provides a backend-agnostic interface and an adapter for TensorRT-LLM so you can plug in custom processors.
dynamo.logits_processing.BaseLogitsProcessor which defines __call__(input_ids, logits) and modifies logits in-place.dynamo.trtllm.logits_processing.adapter.create_trtllm_adapters(...) to convert Dynamo processors into TRT-LLM-compatible processors and assign them to SamplingParams.logits_processor.lib/bindings/python/src/dynamo/logits_processing/examples/ (temperature, hello_world).You can enable a test-only processor that forces the model to respond with “Hello world!”. This is useful to verify the wiring without modifying your model or engine code.
Implement a processor by conforming to BaseLogitsProcessor and modify logits in-place. For example, temperature scaling:
Wire it into TRT-LLM by adapting and attaching to SamplingParams: