nv_ingest_api.internal.extract.pptx.engines package
Submodules
nv_ingest_api.internal.extract.pptx.engines.pptx_helper module
nv_ingest_api.internal.extract.pptx.engines.pptx_helper.format_text(
    text: str,
    bold: bool = False,
    italic: bool = False,
    underline: bool = False,
)
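The signature suggests that the three flags toggle inline emphasis on a run of text. The following is a minimal sketch of how a helper with this shape could behave, assuming Markdown-style markers for bold and italic and an HTML `<u>` tag for underline. The marker choice, the blank-text shortcut, and the name `format_text_sketch` are all assumptions made for illustration; they are not taken from the `format_text` implementation.

```python
def format_text_sketch(
    text: str,
    bold: bool = False,
    italic: bool = False,
    underline: bool = False,
) -> str:
    """Hypothetical stand-in: wrap text in emphasis markers per flag."""
    # Assumption: whitespace-only text is returned untouched so that
    # empty runs do not produce stray markers such as "****".
    if not text.strip():
        return text
    if bold:
        text = f"**{text}**"
    if italic:
        text = f"*{text}*"
    if underline:
        text = f"<u>{text}</u>"
    return text


print(format_text_sketch("Quarterly results", bold=True))
print(format_text_sketch("note", italic=True, underline=True))
```

Applying the flags in a fixed order keeps the output deterministic when several are set at once.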
nv_ingest_api.internal.extract.pptx.engines.pptx_helper.get_bbox(
    presentation_object: Presentation | None = None,
    shape_object: Slide | None = None,
    text_depth: TextTypeEnum | None = None,
)
nv_ingest_api.internal.extract.pptx.engines.pptx_helper.process_shape(
    shape,
    shape_idx,
    slide_idx,
    slide_count,
    pending_images,
    page_nearby_blocks,
    source_metadata,
    base_unified_metadata,
)
Recursively process a shape:
- If the shape is a group, iterate over its child shapes.
- If it is a picture or a placeholder with an embedded image, append it to pending_images.
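The recursion described above can be sketched without any PPTX dependency. The example below is only an illustration of the traversal pattern: `FakeShape` and `collect_images` are hypothetical stand-ins, not python-pptx or nv_ingest_api objects, and the `kind` strings are an assumed simplification of real shape types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeShape:
    """Hypothetical stand-in for a slide shape (not a python-pptx class)."""
    name: str
    kind: str = "other"  # "group", "picture", "placeholder", or "other"
    image: Optional[bytes] = None  # embedded image data, if any
    children: List["FakeShape"] = field(default_factory=list)


def collect_images(shape: FakeShape, pending_images: List[FakeShape]) -> None:
    """Walk a shape tree depth-first and queue every image-bearing shape."""
    if shape.kind == "group":
        # A group has no content of its own here; recurse into each child.
        for child in shape.children:
            collect_images(child, pending_images)
    elif shape.kind == "picture" or (
        shape.kind == "placeholder" and shape.image is not None
    ):
        # Only record the shape; any further handling happens later.
        pending_images.append(shape)


slide_shapes = [
    FakeShape("title"),
    FakeShape("logo", kind="picture", image=b"\x89PNG"),
    FakeShape("outer", kind="group", children=[
        FakeShape("empty-slot", kind="placeholder"),
        FakeShape("photo-slot", kind="placeholder", image=b"\xff\xd8"),
        FakeShape("inner", kind="group", children=[
            FakeShape("chart-img", kind="picture", image=b"GIF8"),
        ]),
    ]),
]

pending: List[FakeShape] = []
for top_level_shape in slide_shapes:
    collect_images(top_level_shape, pending)

print([s.name for s in pending])  # ['logo', 'photo-slot', 'chart-img']
```

Passing `pending_images` in as a mutable list, as the documented signature does, lets nested calls share one accumulator without returning and merging partial results.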
nv_ingest_api.internal.extract.pptx.engines.pptx_helper.python_pptx(
    *,
    pptx_stream: IO,
    extract_text: bool,
    extract_images: bool,
    extract_infographics: bool,
    extract_tables: bool,
    extract_charts: bool,
    extraction_config: dict,
    execution_trace_log: List | None = None,
)
Uses python-pptx to extract text from a PPTX byte stream, deferring the classification of images as tables or charts if requested.
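One way to read "deferring" is as a two-phase flow: images are gathered while the slides are walked, then classified in a single pass afterward, and only when tables or charts were requested. The sketch below illustrates that pattern under stated assumptions. `classify_pending`, the `classify` callback, the `(slide_idx, bytes)` tuples, and the fallback of treating unrequested tables or charts as plain images are all hypothetical; none of them come from nv_ingest_api.

```python
from typing import Callable, Dict, List, Tuple


def classify_pending(
    pending_images: List[Tuple[int, bytes]],
    classify: Callable[[bytes], str],
    *,
    extract_tables: bool,
    extract_charts: bool,
    extract_images: bool,
) -> List[Dict[str, object]]:
    """Label queued images after traversal, honoring the extract flags."""
    results: List[Dict[str, object]] = []
    # Skip the (potentially expensive) classifier when neither tables
    # nor charts were requested.
    needs_classifier = extract_tables or extract_charts
    for slide_idx, image_bytes in pending_images:
        label = classify(image_bytes) if needs_classifier else "image"
        if label == "table" and extract_tables:
            kind = "table"
        elif label == "chart" and extract_charts:
            kind = "chart"
        elif extract_images:
            # Assumption: anything not kept as a table or chart falls
            # back to a plain image when images were requested.
            kind = "image"
        else:
            continue
        results.append({"slide_idx": slide_idx, "type": kind})
    return results


def toy_classifier(image_bytes: bytes) -> str:
    """Hypothetical classifier keyed on the first byte, for demo only."""
    if image_bytes.startswith(b"T"):
        return "table"
    if image_bytes.startswith(b"C"):
        return "chart"
    return "image"


queued = [(0, b"T1"), (0, b"P1"), (1, b"C1")]
labeled = classify_pending(
    queued,
    toy_classifier,
    extract_tables=True,
    extract_charts=False,
    extract_images=True,
)
print([item["type"] for item in labeled])  # ['table', 'image', 'image']
```

Batching the classification after traversal keeps the slide walk cheap and lets the caller skip the classifier entirely when its output would not be used.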