nv_ingest.api.v2 package
Submodules
nv_ingest.api.v2.ingest module
- async nv_ingest.api.v2.ingest.fetch_job_v2(
- job_id: str,
- ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],
V2 fetch that handles parent job aggregation.
- nv_ingest.api.v2.ingest.get_pdf_page_count(pdf_content: bytes) → int
Get the number of pages in a PDF using pypdfium2.
- nv_ingest.api.v2.ingest.get_pdf_split_page_count(client_override: int | None = None) → int
Resolve the page chunk size for PDF splitting with client override support.
Priority: client_override (clamped) > env var > default (32).
Enforces boundaries: min=1, max=128.
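The priority order above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `resolve_split_page_count`, the `environ` parameter, and the environment variable name `PDF_SPLIT_PAGE_COUNT` are assumptions made for the example (the source does not name the variable), and clamping the environment value as well as the client override is also an assumption.

```python
import os

DEFAULT_PAGES_PER_CHUNK = 32   # default stated in the docs
MIN_PAGES_PER_CHUNK = 1        # lower bound stated in the docs
MAX_PAGES_PER_CHUNK = 128      # upper bound stated in the docs

# Hypothetical variable name, used only for this sketch.
ENV_VAR_NAME = "PDF_SPLIT_PAGE_COUNT"


def resolve_split_page_count(client_override=None, environ=None):
    """Pick a chunk size: client override > env var > default, clamped to [1, 128]."""
    env = os.environ if environ is None else environ

    if client_override is not None:
        raw = client_override
    else:
        try:
            raw = int(env.get(ENV_VAR_NAME, DEFAULT_PAGES_PER_CHUNK))
        except (TypeError, ValueError):
            # Fall back to the default when the env value is not an integer.
            raw = DEFAULT_PAGES_PER_CHUNK

    # Assumption: every source of the value is clamped, not only the override.
    return max(MIN_PAGES_PER_CHUNK, min(MAX_PAGES_PER_CHUNK, raw))
```

Passing a plain dictionary as `environ` keeps the sketch testable without touching the process environment.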
- nv_ingest.api.v2.ingest.get_qos_tier_for_page_count(page_count: int) → str
Select QoS tier for a document based on its total page count.
Tiers: ‘micro’, ‘small’, ‘medium’, ‘large’, ‘default’.
Thresholds can be tuned via environment variables:
QOS_MAX_PAGES_MICRO (default: 4)
QOS_MAX_PAGES_SMALL (default: 16)
QOS_MAX_PAGES_MEDIUM (default: 64)
Anything above MEDIUM is ‘large’. Non-positive page_count returns ‘default’.
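The tier rules above could look roughly like the sketch below. The function name `select_qos_tier` and the `environ` parameter are hypothetical, and treating each threshold as inclusive (a page count equal to the maximum still falls in that tier) is an assumption drawn from the `QOS_MAX_PAGES_*` naming rather than from the source.

```python
import os


def select_qos_tier(page_count, environ=None):
    """Map a page count to a QoS tier using the documented thresholds."""
    env = os.environ if environ is None else environ

    # Non-positive counts are documented to return 'default'.
    if page_count <= 0:
        return "default"

    # Variable names and default values are taken from the docs above.
    max_micro = int(env.get("QOS_MAX_PAGES_MICRO", 4))
    max_small = int(env.get("QOS_MAX_PAGES_SMALL", 16))
    max_medium = int(env.get("QOS_MAX_PAGES_MEDIUM", 64))

    # Assumption: thresholds are inclusive upper bounds.
    if page_count <= max_micro:
        return "micro"
    if page_count <= max_small:
        return "small"
    if page_count <= max_medium:
        return "medium"
    return "large"
```

Under these assumptions, a 5-page document with default thresholds maps to the small tier, and raising `QOS_MAX_PAGES_MICRO` moves it to micro.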
- nv_ingest.api.v2.ingest.split_pdf_to_chunks(
- pdf_content: bytes,
- pages_per_chunk: int,
Split a PDF into multi-page chunks using pypdfium2.
Returns a list of dictionaries containing the chunk bytes and page range metadata.
Note: each chunk is currently buffered in memory; streaming may be considered in a future upgrade.
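To illustrate the page-range side of chunking, the sketch below computes which pages would land in each chunk, leaving the pypdfium2 rendering of chunk bytes aside. The function name `plan_page_chunks` and the dictionary keys `start_page`, `end_page`, and `page_count` are hypothetical; the actual metadata keys returned by `split_pdf_to_chunks` are not shown in the source. Page numbers are 1-based here by assumption.

```python
def plan_page_chunks(total_pages, pages_per_chunk):
    """Return 1-based inclusive page ranges for fixed-size chunks."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")

    chunks = []
    for start in range(0, total_pages, pages_per_chunk):
        # The final chunk may be shorter than pages_per_chunk.
        end = min(start + pages_per_chunk, total_pages)
        chunks.append(
            {
                "start_page": start + 1,   # hypothetical key, 1-based
                "end_page": end,           # hypothetical key, inclusive
                "page_count": end - start,
            }
        )
    return chunks
```

For a 70-page document with 32 pages per chunk, this plan yields three ranges, the last covering pages 65 through 70.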
- async nv_ingest.api.v2.ingest.submit_job_v2(
- request: Request,
- response: Response,
- job_spec: MessageWrapper,
- ingest_service: Annotated[IngestServiceMeta, Depends(dependency=_get_ingest_service, use_cache=True, scope=None)],