nemo_automodel.components.speculative.dflash.registry


Dispatch registry mapping target architecture -> DFlash draft model.

Mirrors the EAGLE registry (components/speculative/eagle/registry.py). The DFlash draft is a non-causal Qwen3-style stack and is config-driven, so adding a Qwen3-shaped architecture is a one-line append.
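The "one-line append" pattern can be sketched with stand-in objects. This is a minimal illustration, not the module's source: `FakeDraftModel`, `DraftSpec`, `QWEN3_LIKE`, and `REGISTRY` are hypothetical names chosen for this example, and the architecture strings simply mirror the values documented for `_QWEN3_ARCHITECTURES`.

```python
from dataclasses import dataclass


class FakeDraftModel:
    """Hypothetical stand-in for the real draft model class."""


@dataclass
class DraftSpec:
    """Hypothetical stand-in for DFlashDraftSpec: records the draft class."""

    draft_cls: type


# Hypothetical architecture tuple. Supporting another Qwen3-shaped target
# would mean appending one more name here.
QWEN3_LIKE = ("Qwen3ForCausalLM", "Qwen3MoeForCausalLM")

# Every listed architecture maps to the same config-driven draft class.
REGISTRY = {arch: DraftSpec(draft_cls=FakeDraftModel) for arch in QWEN3_LIKE}
```

Because the draft class is shared, the per-architecture differences are assumed to come from the target's config rather than from separate draft implementations.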

Module Contents

Classes

| Name | Description |
| --- | --- |
| DFlashDraftSpec | How to build a DFlash draft model for a particular target architecture. |

Functions

| Name | Description |
| --- | --- |
| resolve_dflash_draft_spec | Return the first registered DFlash draft spec matching any architecture in the list. |

Data

DFLASH_DRAFT_REGISTRY

_QWEN3_ARCHITECTURES

API

class nemo_automodel.components.speculative.dflash.registry.DFlashDraftSpec(
draft_cls: type[transformers.PreTrainedModel]
)
Dataclass

How to build a DFlash draft model for a particular target architecture.

draft_cls: type[PreTrainedModel]
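
Since the spec carries only a class, a consumer would presumably instantiate that class itself. The sketch below shows the idea with toy objects. `ToyConfig`, `ToyDraft`, and `ToySpec` are hypothetical, and building the model by calling `draft_cls(config)` is an assumption based on how `PreTrainedModel` subclasses are normally constructed, not something this page states.

```python
from dataclasses import dataclass


class ToyConfig:
    """Hypothetical config; a real draft would take a transformers config."""

    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size


class ToyDraft:
    """Hypothetical draft model built from a config, PreTrainedModel-style."""

    def __init__(self, config: ToyConfig):
        self.config = config


@dataclass
class ToySpec:
    """Stand-in mirroring the single documented field, draft_cls."""

    draft_cls: type


spec = ToySpec(draft_cls=ToyDraft)

# Assumption: the caller, not the spec, decides how to instantiate the class.
draft = spec.draft_cls(ToyConfig(hidden_size=64))
```
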
nemo_automodel.components.speculative.dflash.registry.resolve_dflash_draft_spec(
architectures: list[str]
) -> nemo_automodel.components.speculative.dflash.registry.DFlashDraftSpec

Return the first registered DFlash draft spec matching any architecture in the list.
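
A lookup with this contract can be sketched as follows. The code is an illustration under stated assumptions, not the function's actual body: `resolve_spec` and `toy_registry` are hypothetical names, scanning the input list in order is one reading of "first ... matching", and raising `ValueError` when nothing matches is an assumption, since the failure behavior is not documented here.

```python
def resolve_spec(architectures, registry):
    """Return the registry entry for the first architecture name found.

    Assumptions of this sketch: the input list is scanned in order, and a
    ValueError is raised when no name is registered.
    """
    for arch in architectures:
        if arch in registry:
            return registry[arch]
    raise ValueError(f"no draft spec registered for {architectures!r}")


# Hypothetical registry using string placeholders instead of spec objects.
toy_registry = {
    "Qwen3ForCausalLM": "qwen3-spec",
    "Qwen3MoeForCausalLM": "qwen3-spec",
}

# Hugging Face configs expose architecture names as a list of strings,
# e.g. config.architectures == ["Qwen3ForCausalLM"].
found = resolve_spec(["UnknownArch", "Qwen3ForCausalLM"], toy_registry)
```

Passing the target config's `architectures` list straight through is the natural call pattern, since that list may hold more than one name.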

nemo_automodel.components.speculative.dflash.registry.DFLASH_DRAFT_REGISTRY: dict[str, DFlashDraftSpec] = {arch: (DFlashDraftSpec(draft_cls=Qwen3DFlashDraftModel)) for arch in _QWEN3_ARC...
nemo_automodel.components.speculative.dflash.registry._QWEN3_ARCHITECTURES: tuple[str, ...] = ('Qwen3ForCausalLM', 'Qwen3MoeForCausalLM')