nv_ingest_api.internal.extract.image.image_helpers package

Submodules

nv_ingest_api.internal.extract.image.image_helpers.common module

nv_ingest_api.internal.extract.image.image_helpers.common.convert_svg_to_bitmap(image_stream: BytesIO) → ndarray

Converts an SVG image from a bytestream to a bitmap format.

Parameters:

image_stream (io.BytesIO) – A bytestream of the SVG file.

Returns:

Preprocessed image as a numpy array in bitmap format.

Return type:

np.ndarray
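As a rough illustration of the expected input, the sketch below wraps raw SVG bytes in an io.BytesIO after a basic well-formedness check. make_svg_stream is a hypothetical helper, not part of nv_ingest_api, and the commented call at the end assumes the package is installed.

```python
import io
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "{http://www.w3.org/2000/svg}"


def make_svg_stream(svg_bytes: bytes) -> io.BytesIO:
    """Hypothetical helper: check that the bytes parse as SVG, then wrap them."""
    root = ET.fromstring(svg_bytes)  # raises ET.ParseError on malformed XML
    if root.tag != SVG_NAMESPACE + "svg":
        raise ValueError(f"root element is {root.tag!r}, not <svg>")
    return io.BytesIO(svg_bytes)


svg_bytes = (
    b'<svg xmlns="http://www.w3.org/2000/svg" width="4" height="2">'
    b'<rect width="4" height="2" fill="red"/></svg>'
)
image_stream = make_svg_stream(svg_bytes)

# With nv_ingest_api installed, the stream would then be passed on:
# bitmap = convert_svg_to_bitmap(image_stream)  # documented to return np.ndarray
```

Validating the bytes first is optional; it only makes malformed input fail early, before any rendering is attempted.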

nv_ingest_api.internal.extract.image.image_helpers.common.extract_page_element_images(
annotation_dict: Dict[str, List[List[float]]],
original_image: ndarray,
page_idx: int,
page_elements: List[Tuple[int, CroppedImageWithContent]],
) → None

Handle the extraction of tables and charts from the inference results and run additional model inference.

Parameters:
  • annotation_dict (dict of {str : list of list of float}) – A dictionary containing detected objects and their bounding boxes. Keys should include “table” and “chart”, and each key’s value should be a list of bounding boxes, with each bounding box represented as a list of floats.

  • original_image (np.ndarray) – The original image from which objects were detected, expected to be in RGB format with shape (H, W, 3).

  • page_idx (int) – The index of the current page being processed.

  • page_elements (list of tuple of (int, CroppedImageWithContent)) – A list to which extracted tables and charts will be appended. Each item in the list is a tuple where the first element is the page index, and the second is an instance of CroppedImageWithContent representing a cropped image and associated metadata.

Return type:

None

Notes

This function iterates over detected objects labeled as “table” or “chart”. For each object, it crops the original image according to the bounding box coordinates, then creates an instance of CroppedImageWithContent containing the cropped image and metadata, and appends it to page_elements.
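The crop-and-append loop described in these notes can be sketched in plain Python. This is a simplified illustration, not the library's implementation: CroppedRegion is a hypothetical stand-in for CroppedImageWithContent, a nested list stands in for the NumPy image, and the bounding-box layout [x1, y1, x2, y2, score] with coordinates normalized to [0, 1] is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[int]]  # stand-in for the np.ndarray the real function expects


@dataclass
class CroppedRegion:
    """Hypothetical stand-in for CroppedImageWithContent."""
    label: str
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    pixels: Image


def crop_page_elements(
    annotation_dict: Dict[str, List[List[float]]],
    image: Image,
    page_idx: int,
    page_elements: List[Tuple[int, CroppedRegion]],
) -> None:
    height, width = len(image), len(image[0])
    for label in ("table", "chart"):
        for box in annotation_dict.get(label, []):
            # Assumption: [x1, y1, x2, y2, score], coordinates normalized to [0, 1].
            x1, y1, x2, y2 = box[:4]
            left, top = round(x1 * width), round(y1 * height)
            right, bottom = round(x2 * width), round(y2 * height)
            pixels = [row[left:right] for row in image[top:bottom]]
            page_elements.append(
                (page_idx, CroppedRegion(label, (left, top, right, bottom), pixels))
            )


# 8x8 grayscale "image" whose pixel value encodes its position (row * 8 + col).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
annotations = {
    "table": [[0.0, 0.0, 0.5, 0.5, 0.8]],
    "chart": [[0.5, 0.5, 1.0, 0.75, 0.9]],
}
page_elements: List[Tuple[int, CroppedRegion]] = []
crop_page_elements(annotations, image, 0, page_elements)
```

As in the documented function, results are appended to the caller's list rather than returned, so one list can accumulate elements across several pages.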

Examples

>>> annotation_dict = {"table": [[0.1, 0.1, 0.5, 0.5, 0.8]], "chart": [[0.6, 0.6, 0.9, 0.9, 0.9]]}
>>> original_image = np.random.rand(1536, 1536, 3)
>>> page_elements = []
>>> extract_page_element_images(annotation_dict, original_image, 0, page_elements)
>>> len(page_elements)
2

nv_ingest_api.internal.extract.image.image_helpers.common.extract_page_elements_from_images(
images: List[ndarray],
config: ImageConfigSchema,
trace_info: List | None = None,
) → List[Tuple[int, object]]

Detect and extract tables/charts from a list of NumPy images using YOLOX.

Parameters:
  • images (List[np.ndarray]) – List of images in NumPy array format.

  • config (ImageConfigSchema) – Configuration object containing YOLOX endpoints, auth token, etc.

  • trace_info (Optional[List], optional) – Optional tracing data for debugging/performance profiling.

Returns:

A list of (image_index, CroppedImageWithContent) tuples, each representing a table or chart extracted from the image at that index.

Return type:

List[Tuple[int, object]]
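Because the result is a flat list of (image_index, element) tuples, callers may want to regroup it per image. The sketch below shows one way to do that; group_by_image is a hypothetical helper, and the strings are placeholders for CroppedImageWithContent instances.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_image(results: List[Tuple[int, Any]]) -> Dict[int, List[Any]]:
    """Hypothetical helper: collect extracted elements under their image index."""
    grouped: Dict[int, List[Any]] = defaultdict(list)
    for image_index, element in results:
        grouped[image_index].append(element)
    return dict(grouped)


# Placeholder strings stand in for CroppedImageWithContent instances.
results = [(0, "table-a"), (2, "chart-a"), (0, "chart-b")]
grouped = group_by_image(results)
print(grouped)  # {0: ['table-a', 'chart-b'], 2: ['chart-a']}
```

Images with no detections simply have no key in the grouped result, so use grouped.get(index, []) when iterating over every input image.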

nv_ingest_api.internal.extract.image.image_helpers.common.load_and_preprocess_image(
image_stream: BytesIO,
) → ndarray

Loads and preprocesses a JPEG, JPG, or PNG image from a bytestream.

Parameters:

image_stream (io.BytesIO) – A bytestream of the image file.

Returns:

Preprocessed image as a numpy array.

Return type:

np.ndarray
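Since this function is documented for JPEG and PNG input only, a caller could check the stream's leading bytes before handing it over. The sketch below is one way to do that using the standard file signatures; sniff_image_format is a hypothetical helper, and the commented call assumes nv_ingest_api is installed.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(image_stream: io.BytesIO) -> str:
    """Hypothetical helper: return 'png' or 'jpeg' based on the leading bytes."""
    start = image_stream.tell()
    header = image_stream.read(8)
    image_stream.seek(start)  # leave the stream where we found it
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    raise ValueError("stream does not start with a PNG or JPEG signature")


image_stream = io.BytesIO(PNG_SIGNATURE + b"rest-of-file")
fmt = sniff_image_format(image_stream)

# With nv_ingest_api installed, the stream would then be passed on:
# image = load_and_preprocess_image(image_stream)
```

The helper restores the stream position after reading, so the subsequent load still sees the file from the beginning.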

nv_ingest_api.internal.extract.image.image_helpers.common.unstructured_image_extractor(
*,
image_stream: IO[bytes],
extract_text: bool,
extract_images: bool,
extract_infographics: bool,
extract_tables: bool,
extract_charts: bool,
extraction_config: Dict[str, Any],
extraction_trace_log: Dict[str, Any] | None = None,
) → List[Any]

Extract primitives from an unstructured image bytestream.

This helper function processes an image bytestream according to the provided extraction configuration. It supports extraction of tables, charts, and infographics from the image. Text extraction and additional image extraction are not yet supported for raw images.

Parameters:
  • image_stream (IO[bytes]) – A bytestream (e.g. io.BytesIO) containing the image file data.

  • document_type (str) – Specifies the type of the image document (‘png’, ‘jpeg’, ‘jpg’, ‘svg’, ‘tiff’, ‘bmp’). This is not an argument in the signature above; it is listed among the expected keys of extraction_config.

  • extract_text (bool) – Flag specifying whether to extract text (currently not supported for raw images).

  • extract_images (bool) – Flag specifying whether to extract images (currently not supported for raw images).

  • extract_infographics (bool) – Flag specifying whether to extract infographics.

  • extract_tables (bool) – Flag specifying whether to extract tables.

  • extract_charts (bool) – Flag specifying whether to extract charts.

  • extraction_config (Dict[str, Any]) – A dictionary containing additional extraction parameters and configurations. Expected keys include “document_type”, “row_data”, “metadata_column”, and “image_extraction_config”.

  • extraction_trace_log (Optional[Dict[str, Any]], optional) – An optional dictionary containing trace information for logging or debugging, by default None.

Returns:

A list of extracted data items (e.g., metadata dictionaries) from the image.

Return type:

List[Any]

Raises:
  • ValueError – If the document type is unsupported.

  • Exception – If an error occurs during extraction.
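All parameters of this function are keyword-only (note the leading `*` in the signature). The sketch below assembles a matching set of keyword arguments; build_extractor_kwargs is a hypothetical helper, the values under "row_data", "metadata_column", and "image_extraction_config" are placeholders, and the commented call assumes nv_ingest_api is installed.

```python
import io
from typing import Any, Dict

# Document types listed in the parameter description above.
SUPPORTED_TYPES = {"png", "jpeg", "jpg", "svg", "tiff", "bmp"}


def build_extractor_kwargs(image_bytes: bytes, document_type: str) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for the extractor."""
    if document_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported document type: {document_type!r}")
    return {
        "image_stream": io.BytesIO(image_bytes),
        "extract_text": False,    # documented as not supported for raw images
        "extract_images": False,  # documented as not supported for raw images
        "extract_infographics": True,
        "extract_tables": True,
        "extract_charts": True,
        "extraction_config": {
            "document_type": document_type,
            "row_data": {},                 # placeholder; real contents not shown here
            "metadata_column": "metadata",  # placeholder value
            "image_extraction_config": {},  # placeholder
        },
        "extraction_trace_log": None,
    }


kwargs = build_extractor_kwargs(b"\x89PNG\r\n\x1a\n", "png")

# With nv_ingest_api installed, the call would be keyword-only:
# results = unstructured_image_extractor(**kwargs)
```

Checking the document type up front mirrors the documented ValueError for unsupported types, but fails before any stream is created.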

Module contents