nemo_curator.stages.interleaved.pdf.nemotron_parse.inference

GPU inference stage for Nemotron-Parse.

Module Contents

Classes

Name | Description
NemotronParseInferenceStage | GPU stage: run Nemotron-Parse inference on pre-rendered page images.

Functions

Name | Description
build_task_prompt | Build the Nemotron-Parse task prompt with the appropriate text-in-pic token.

Data

DEFAULT_MODEL_PATH

PROMPT_BASE

API

class nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage(
model_path: str = DEFAULT_MODEL_PATH,
text_in_pic: bool = False,
task_prompt: str | None = None,
backend: str = 'vllm',
inference_batch_size: int = 4,
max_num_seqs: int = 64,
enforce_eager: bool = False,
name: str = 'nemotron_parse_inference',
resources: nemo_curator.stages.resources.Resources = (lambda: Resources(cpus=4.0...
)
Dataclass

Bases: ProcessingStage[InterleavedBatch, InterleavedBatch]

GPU stage: run Nemotron-Parse inference on pre-rendered page images.

Reads PNG page images from binary_content, runs model inference, and writes raw Nemotron-Parse output into text_content.

Supports two inference backends:

  • "vllm" (recommended): vLLM offline mode with continuous batching. Batching is handled internally by vLLM via max_num_seqs.
  • "hf": HuggingFace Transformers with manual micro-batching via inference_batch_size.
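To illustrate what manual micro-batching on the "hf" backend amounts to, here is a minimal sketch. It is not the stage's actual code: the helper name iter_micro_batches is hypothetical, and only the inference_batch_size parameter and its default of 4 come from this page.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_micro_batches(
    items: Sequence[T], inference_batch_size: int = 4
) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `inference_batch_size` items."""
    if inference_batch_size < 1:
        raise ValueError("inference_batch_size must be >= 1")
    for start in range(0, len(items), inference_batch_size):
        yield items[start : start + inference_batch_size]


# Strings stand in for PIL page images in this toy run.
pages = [f"page-{i}" for i in range(10)]
batches = [list(b) for b in iter_micro_batches(pages, inference_batch_size=4)]
print([len(b) for b in batches])  # [4, 4, 2]
```

With the "vllm" backend no such loop is needed on the caller's side, since vLLM schedules requests internally up to max_num_seqs.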

Parameters

  • model_path: HuggingFace model ID or local path (e.g. nvidia/NVIDIA-Nemotron-Parse-v1.2).
  • text_in_pic: Whether to predict text inside pictures. When True, uses the <predict_text_in_pic> prompt token; when False (default), uses <predict_no_text_in_pic>. Only applies to Nemotron-Parse v1.2+.
  • task_prompt: Override the full prompt string. When set, text_in_pic is ignored.
  • backend: Inference backend: "vllm" or "hf".
  • inference_batch_size: Pages per GPU forward pass (HF backend only).
  • max_num_seqs: Maximum concurrent sequences (vLLM backend only).
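As a rough illustration of how the backend-specific parameters pair up, the sketch below builds constructor keyword arguments for each backend. The helper stage_kwargs is hypothetical and deliberately does not import NeMo Curator; the parameter names and default values are taken from the class signature above.

```python
def stage_kwargs(backend: str = "vllm", text_in_pic: bool = False) -> dict:
    """Return keyword arguments for NemotronParseInferenceStage (hypothetical helper)."""
    if backend not in ("vllm", "hf"):
        raise ValueError(f"unknown backend: {backend!r}")
    kwargs = {
        "model_path": "nvidia/NVIDIA-Nemotron-Parse-v1.2",
        "backend": backend,
        "text_in_pic": text_in_pic,
    }
    if backend == "vllm":
        # vLLM batches internally; only the concurrency cap is relevant.
        kwargs["max_num_seqs"] = 64
    else:
        # The HF backend micro-batches manually.
        kwargs["inference_batch_size"] = 4
    return kwargs


print(stage_kwargs("hf"))
# With NeMo Curator installed, the result could be passed on as
# NemotronParseInferenceStage(**stage_kwargs("hf")).
```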

backend
str = 'vllm'
enforce_eager
bool = False
inference_batch_size
int = 4
max_num_seqs
int = 64
model_path
str = DEFAULT_MODEL_PATH
name
str = 'nemotron_parse_inference'
resources
Resources
task_prompt
str | None = None
text_in_pic
bool = False
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.__post_init__() -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._infer_batch_hf(
images: list[PIL.Image.Image]
) -> list[str]
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._infer_hf(
images: list[PIL.Image.Image]
) -> list[str]
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._infer_hf_single_fallback(
images: list[PIL.Image.Image]
) -> list[str]

Process each image individually when batch inference fails.
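The fallback idea can be sketched as follows. This illustrates the general batch-then-single pattern and is not the method's actual body: infer_with_fallback, run_batch, and the empty-string placeholder for a page that still fails are all assumptions.

```python
from collections.abc import Callable, Sequence


def infer_with_fallback(
    images: Sequence[object],
    run_batch: Callable[[Sequence[object]], list[str]],
) -> list[str]:
    """Try one batched call; on failure, retry each image on its own."""
    try:
        return run_batch(images)
    except Exception:
        results: list[str] = []
        for image in images:
            try:
                results.extend(run_batch([image]))
            except Exception:
                # Assumption: an empty string marks a page that could not be parsed.
                results.append("")
        return results


def flaky_batch(batch: Sequence[object]) -> list[str]:
    """Toy model that rejects batches larger than one and the image 'bad'."""
    if len(batch) > 1 or batch[0] == "bad":
        raise RuntimeError("inference failed")
    return [f"parsed:{batch[0]}"]


outputs = infer_with_fallback(["a", "bad", "c"], flaky_batch)
print(outputs)  # ['parsed:a', '', 'parsed:c']
```

Processing one image at a time keeps a single problematic page from discarding the results of the whole batch, at the cost of throughput.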

nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._infer_vllm(
images: list[PIL.Image.Image]
) -> list[str]
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._initialize_model() -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._reset_vllm() -> None

Tear down and reinitialize the vLLM engine (mirrors Cosmos Curate’s _reset pattern).

nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._setup_hf() -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage._setup_vllm() -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.process(
task: nemo_curator.tasks.InterleavedBatch
) -> nemo_curator.tasks.InterleavedBatch | None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.setup(
worker_metadata: dict | None = None
) -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.setup_on_node(
node_info: dict | None = None,
worker_metadata: dict | None = None
) -> None

Initialize model once per node (serially) to avoid torch.compile race conditions.

nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.NemotronParseInferenceStage.teardown() -> None
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.build_task_prompt(
text_in_pic: bool = False
) -> str

Build the Nemotron-Parse task prompt with the appropriate text-in-pic token.

nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.DEFAULT_MODEL_PATH = 'nvidia/NVIDIA-Nemotron-Parse-v1.2'
nemo_curator.stages.interleaved.pdf.nemotron_parse.inference.PROMPT_BASE = '</s><s><predict_bbox><predict_classes><output_markdown>'
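
One plausible reading of how these pieces fit together is sketched below. It assumes the text-in-pic token is appended directly to PROMPT_BASE, which this page does not confirm. The names sketch_build_task_prompt and resolve_prompt are hypothetical; the token strings and the rule that task_prompt overrides text_in_pic come from the parameter descriptions above.

```python
from typing import Optional

PROMPT_BASE = "</s><s><predict_bbox><predict_classes><output_markdown>"


def sketch_build_task_prompt(text_in_pic: bool = False) -> str:
    """Hypothetical stand-in for build_task_prompt."""
    token = "<predict_text_in_pic>" if text_in_pic else "<predict_no_text_in_pic>"
    return PROMPT_BASE + token  # assumption: the token is appended to the base


def resolve_prompt(text_in_pic: bool = False, task_prompt: Optional[str] = None) -> str:
    """An explicit task_prompt wins; otherwise build one from text_in_pic."""
    if task_prompt is not None:
        return task_prompt
    return sketch_build_task_prompt(text_in_pic)


print(resolve_prompt(text_in_pic=True))
print(resolve_prompt(text_in_pic=True, task_prompt="<custom_prompt>"))
```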