nemo_curator.stages.math.modifiers.llm_cleanup
Module Contents
Classes
API
Bases: ProcessingStage[DocumentBatch, DocumentBatch]
LLM-based text cleanup stage using vLLM for distributed inference.
This stage uses a VLLMModel wrapper to generate cleaned text from input prompts. It handles filtering, sorting, prompt formatting, and output field management.
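The four responsibilities listed above (filtering, sorting, prompt formatting, output field management) can be pictured with a small, library-free sketch. This is not the stage's actual implementation: the function name clean_batch, the field names text and cleaned_text, the prompt template, and the generate callable standing in for the VLLMModel wrapper are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


def clean_batch(
    rows: List[Dict[str, Optional[str]]],
    generate: Callable[[List[str]], List[str]],
    text_field: str = "text",              # hypothetical input field name
    output_field: str = "cleaned_text",    # hypothetical output field name
    prompt_template: str = "Clean up the following text:\n\n{text}",  # hypothetical
) -> List[Dict[str, Optional[str]]]:
    """Illustrative only: filter, sort, format prompts, generate, write output."""
    # Filtering: keep only rows whose text is non-empty after stripping.
    kept = [i for i, row in enumerate(rows) if (row.get(text_field) or "").strip()]

    # Sorting: order by text length so similarly sized prompts are batched together.
    order = sorted(kept, key=lambda i: len(rows[i][text_field]))

    # Prompt formatting: wrap each text in the template.
    prompts = [prompt_template.format(text=rows[i][text_field]) for i in order]

    # Generation: `generate` stands in for the model wrapper (an assumption).
    outputs = generate(prompts)

    # Output field management: copy rows, default the new field to None,
    # then write each result back to its original position.
    result = [dict(row) for row in rows]
    for row in result:
        row[output_field] = None
    for i, out in zip(order, outputs):
        result[i][output_field] = out
    return result
```

Writing results back by original index means that sorting for batching efficiency does not change the order of rows the caller sees.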
_model_kwargs
model_name
name
resources
Create and initialize the VLLMModel.
Load the tokenizer on each worker. Falls back to full initialization if setup_on_node was not called.
Download weights and initialize vLLM once per node to avoid torch.compile race conditions.
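The split between once-per-node initialization and per-worker setup, including the fallback, can be sketched as follows. This is a toy model under stated assumptions, not the stage's real code: the class SketchCleanupStage, the method name setup, the helper _full_init, and the node_cache set (standing in for downloaded weights and compiled artifacts shared by workers on one node) are hypothetical. Only the name setup_on_node comes from the text above.

```python
from typing import List, Optional, Set


class SketchCleanupStage:
    """Toy model of a node-level init followed by per-worker setup."""

    def __init__(self, model_name: str, node_cache: Set[str]) -> None:
        self.model_name = model_name
        # Hypothetical stand-in for state shared by all workers on a node
        # (for example, weights already on disk).
        self.node_cache = node_cache
        self.tokenizer: Optional[str] = None
        self.events: List[str] = []  # records what this instance did

    def _full_init(self) -> None:
        # Stand-in for downloading weights and initializing the engine.
        self.events.append("full_init")
        self.node_cache.add(self.model_name)

    def setup_on_node(self) -> None:
        # Runs once per node, so concurrent workers do not race on
        # the expensive initialization.
        self._full_init()

    def setup(self) -> None:
        # Runs on every worker. If the node-level step never ran,
        # fall back to doing the full initialization here.
        if self.model_name not in self.node_cache:
            self._full_init()
        self.events.append("load_tokenizer")
        self.tokenizer = f"tokenizer-for-{self.model_name}"


# Usage: one node-level call, then a worker that only loads its tokenizer.
shared = set()
SketchCleanupStage("demo-model", shared).setup_on_node()
worker = SketchCleanupStage("demo-model", shared)
worker.setup()
print(worker.events)
```

In this sketch a worker created without a prior setup_on_node call would record a full initialization before loading its tokenizer, mirroring the fallback described above.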