nv_ingest.api.v2 package#

Submodules#

nv_ingest.api.v2.ingest module#

async nv_ingest.api.v2.ingest.fetch_job_v2(
job_id: str,
ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],
)[source]#

V2 fetch that handles parent job aggregation.
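A minimal client-side sketch of polling this endpoint. The /v2/fetch_job/{job_id} path and the 200/202 status convention are assumptions for illustration and are not confirmed by this page:

    # Hypothetical client poll; the route path and status codes are assumptions.
    import requests

    def fetch_result(base_url: str, job_id: str) -> dict | None:
        resp = requests.get(f"{base_url}/v2/fetch_job/{job_id}", timeout=30)
        if resp.status_code == 202:
            return None  # parent job (or one of its child chunks) still processing
        resp.raise_for_status()
        return resp.json()  # aggregated result for the parent job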

nv_ingest.api.v2.ingest.get_pdf_page_count(pdf_content: bytes) → int[source]#

Get the number of pages in a PDF using pypdfium2.
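A minimal sketch of the same idea with pypdfium2 (not necessarily the module's exact implementation):

    import pypdfium2 as pdfium

    def page_count(pdf_content: bytes) -> int:
        # PdfDocument accepts raw bytes; len() returns the number of pages.
        pdf = pdfium.PdfDocument(pdf_content)
        try:
            return len(pdf)
        finally:
            pdf.close()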

nv_ingest.api.v2.ingest.get_pdf_split_page_count(client_override: int | None = None) → int[source]#

Resolve the page chunk size for PDF splitting with client override support.

Priority: client_override (clamped) > env var > default (32). Enforces boundaries: min=1, max=128.
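A sketch of the documented resolution order; the environment variable name used here is a placeholder, not confirmed by this page:

    import os

    DEFAULT_PAGES_PER_CHUNK = 32
    MIN_PAGES, MAX_PAGES = 1, 128

    def resolve_split_page_count(client_override: int | None = None) -> int:
        # NV_INGEST_PDF_SPLIT_PAGE_COUNT is a hypothetical variable name.
        if client_override is not None:
            value = client_override
        else:
            value = int(os.environ.get("NV_INGEST_PDF_SPLIT_PAGE_COUNT", DEFAULT_PAGES_PER_CHUNK))
        return max(MIN_PAGES, min(MAX_PAGES, value))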

nv_ingest.api.v2.ingest.get_qos_tier_for_page_count(page_count: int) str[source]#

Select the QoS tier for a document based on its total page count. Tiers: ‘micro’, ‘small’, ‘medium’, ‘large’, ‘default’. Thresholds can be tuned via environment variables:

  • QOS_MAX_PAGES_MICRO (default: 4)

  • QOS_MAX_PAGES_SMALL (default: 16)

  • QOS_MAX_PAGES_MEDIUM (default: 64)

Anything above MEDIUM is ‘large’. Non-positive page_count returns ‘default’.
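A sketch of this selection logic using the defaults listed above; whether each threshold is inclusive is an assumption:

    import os

    def qos_tier(page_count: int) -> str:
        if page_count <= 0:
            return "default"
        micro = int(os.environ.get("QOS_MAX_PAGES_MICRO", 4))
        small = int(os.environ.get("QOS_MAX_PAGES_SMALL", 16))
        medium = int(os.environ.get("QOS_MAX_PAGES_MEDIUM", 64))
        if page_count <= micro:
            return "micro"
        if page_count <= small:
            return "small"
        if page_count <= medium:
            return "medium"
        return "large"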

nv_ingest.api.v2.ingest.split_pdf_to_chunks(
pdf_content: bytes,
pages_per_chunk: int,
) → List[Dict[str, Any]][source]#

Split a PDF into multi-page chunks using pypdfium2.

Returns a list of dictionaries containing the chunk bytes and page range metadata. Note: this currently buffers each chunk in memory; consider streaming in future upgrades.
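A sketch of multi-page chunking with pypdfium2; the dictionary keys shown (content, start_page, end_page) are assumptions, not the module's documented schema:

    import io
    from typing import Any, Dict, List

    import pypdfium2 as pdfium

    def split_pdf(pdf_content: bytes, pages_per_chunk: int) -> List[Dict[str, Any]]:
        src = pdfium.PdfDocument(pdf_content)
        chunks: List[Dict[str, Any]] = []
        try:
            total = len(src)
            for start in range(0, total, pages_per_chunk):
                end = min(start + pages_per_chunk, total)
                dst = pdfium.PdfDocument.new()
                dst.import_pages(src, pages=list(range(start, end)))
                buf = io.BytesIO()
                dst.save(buf)  # each chunk is buffered in memory
                dst.close()
                chunks.append({
                    "content": buf.getvalue(),
                    "start_page": start,   # zero-based, inclusive
                    "end_page": end - 1,   # zero-based, inclusive
                })
            return chunks
        finally:
            src.close()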

async nv_ingest.api.v2.ingest.submit_job_v2(
request: Request,
response: Response,
job_spec: MessageWrapper,
ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],
)[source]#

Module contents#