nv_ingest.api.v2 package#

Submodules#

nv_ingest.api.v2.ingest module#

async nv_ingest.api.v2.ingest.fetch_job_v2(
job_id: str,
ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],
)[source]#

V2 fetch that handles parent job aggregation.
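A minimal client-side sketch of polling this endpoint. The /v2/fetch_job/{job_id} path and the 200/202 status convention are assumptions for illustration and are not confirmed by this page:

    # Hypothetical client poll; the route path and status codes are assumptions.
    import requests

    def fetch_result(base_url: str, job_id: str) -> dict | None:
        resp = requests.get(f"{base_url}/v2/fetch_job/{job_id}", timeout=30)
        if resp.status_code == 202:
            return None  # parent job (or one of its child chunks) still processing
        resp.raise_for_status()
        return resp.json()  # aggregated result for the parent job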

nv_ingest.api.v2.ingest.get_pdf_page_count(pdf_content: bytes) → int[source]#

Get the number of pages in a PDF using pypdfium2.
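A minimal sketch of the same idea with pypdfium2 (not necessarily the module's exact implementation):

    import pypdfium2 as pdfium

    def page_count(pdf_content: bytes) -> int:
        # PdfDocument accepts raw bytes; len() returns the number of pages.
        pdf = pdfium.PdfDocument(pdf_content)
        try:
            return len(pdf)
        finally:
            pdf.close()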

nv_ingest.api.v2.ingest.get_pdf_split_page_count(client_override: int | None = None) → int[source]#

Resolve the page chunk size for PDF splitting with client override support.

Priority: client_override (clamped) > env var > default (32). Enforces boundaries: min=1, max=128.
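A sketch of the documented resolution order; the environment variable name used here is a placeholder, not confirmed by this page:

    import os

    DEFAULT_PAGES_PER_CHUNK = 32
    MIN_PAGES, MAX_PAGES = 1, 128

    def resolve_split_page_count(client_override: int | None = None) -> int:
        # NV_INGEST_PDF_SPLIT_PAGE_COUNT is a hypothetical variable name.
        if client_override is not None:
            value = client_override
        else:
            value = int(os.environ.get("NV_INGEST_PDF_SPLIT_PAGE_COUNT", DEFAULT_PAGES_PER_CHUNK))
        return max(MIN_PAGES, min(MAX_PAGES, value))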

nv_ingest.api.v2.ingest.get_qos_tier_for_page_count(page_count: int) str[source]#

Select the QoS tier for a document based on its total page count. Tiers: ‘micro’, ‘small’, ‘medium’, ‘large’, ‘default’. Thresholds can be tuned via environment variables:

  • QOS_MAX_PAGES_MICRO (default: 4)

  • QOS_MAX_PAGES_SMALL (default: 16)

  • QOS_MAX_PAGES_MEDIUM (default: 64)

Anything above MEDIUM is ‘large’. Non-positive page_count returns ‘default’.
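A sketch of this selection logic using the defaults listed above; whether each threshold is inclusive is an assumption:

    import os

    def qos_tier(page_count: int) -> str:
        if page_count <= 0:
            return "default"
        micro = int(os.environ.get("QOS_MAX_PAGES_MICRO", 4))
        small = int(os.environ.get("QOS_MAX_PAGES_SMALL", 16))
        medium = int(os.environ.get("QOS_MAX_PAGES_MEDIUM", 64))
        if page_count <= micro:
            return "micro"
        if page_count <= small:
            return "small"
        if page_count <= medium:
            return "medium"
        return "large"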

nv_ingest.api.v2.ingest.split_pdf_to_chunks(
pdf_content: bytes,
pages_per_chunk: int,
) → List[Dict[str, Any]][source]#

Split a PDF into multi-page chunks using pypdfium2.

Returns a list of dictionaries containing the chunk bytes and page range metadata. Note: this currently buffers each chunk in memory; consider streaming in future upgrades.
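A sketch of multi-page chunking with pypdfium2; the dictionary keys shown (content, start_page, end_page) are assumptions, not the module's documented schema:

    import io
    from typing import Any, Dict, List

    import pypdfium2 as pdfium

    def split_pdf(pdf_content: bytes, pages_per_chunk: int) -> List[Dict[str, Any]]:
        src = pdfium.PdfDocument(pdf_content)
        chunks: List[Dict[str, Any]] = []
        try:
            total = len(src)
            for start in range(0, total, pages_per_chunk):
                end = min(start + pages_per_chunk, total)
                dst = pdfium.PdfDocument.new()
                dst.import_pages(src, pages=list(range(start, end)))
                buf = io.BytesIO()
                dst.save(buf)  # each chunk is buffered in memory
                dst.close()
                chunks.append({
                    "content": buf.getvalue(),
                    "start_page": start,   # zero-based, inclusive
                    "end_page": end - 1,   # zero-based, inclusive
                })
            return chunks
        finally:
            src.close()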

async nv_ingest.api.v2.ingest.submit_job_v2(
request: Request,
response: Response,
job_spec: MessageWrapper,
ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],
)[source]#

Module contents#