nv_ingest_client.util.file_processing package

Submodules

nv_ingest_client.util.file_processing.extract module

nv_ingest_client.util.file_processing.extract.detect_encoding_and_read_text_file(file_stream: BytesIO) -> str

Detects the encoding of a text file held in a BytesIO object and reads its contents as a string using that encoding.
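
The general idea of detecting an encoding before decoding can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function name read_text_with_detected_encoding is hypothetical, and the detection strategy (byte-order marks first, then UTF-8, then Latin-1 as a last resort) is an assumption. The real function may rely on a dedicated detection library instead.

```python
import codecs
from io import BytesIO

# Byte-order marks and the codec that decodes each one (and strips the BOM).
# UTF-32 must be checked before UTF-16 because the UTF-32 LE BOM begins
# with the same two bytes as the UTF-16 LE BOM.
_BOM_CODECS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def read_text_with_detected_encoding(file_stream: BytesIO) -> str:
    """Hypothetical sketch: guess the encoding of a byte stream and decode it."""
    raw = file_stream.read()

    # 1. A byte-order mark, if present, identifies the encoding directly.
    for bom, codec_name in _BOM_CODECS:
        if raw.startswith(bom):
            return raw.decode(codec_name)

    # 2. Otherwise try strict UTF-8, the most common encoding for text files.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Latin-1 maps every byte to a code point, so it never fails.
        return raw.decode("latin-1")


text = read_text_with_detected_encoding(BytesIO("héllo".encode("utf-16")))
```

Falling back to Latin-1 guarantees that a string is always returned, at the cost of possibly mis-decoding files written in other single-byte encodings.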

nv_ingest_client.util.file_processing.extract.extract_file_content(path: str) -> Tuple[str, DocumentTypeEnum]

Extracts the content of the file at path, supporting different formats, and returns it as a string together with the file's DocumentTypeEnum value.
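
A function with this shape might dispatch on the file type, as in the sketch below. Everything here is an assumption made for illustration: DocType is a hypothetical stand-in for DocumentTypeEnum, read_content is a hypothetical name, and the rule that text files are decoded while other formats are base64-encoded is a guess at one reasonable design, not a statement of what the package does.

```python
import base64
from enum import Enum
from pathlib import Path
from typing import Tuple


class DocType(Enum):
    """Hypothetical stand-in for DocumentTypeEnum."""

    PDF = "pdf"
    TXT = "text"


# Hypothetical extension table; the real package supports more formats.
_SUFFIX_TO_TYPE = {".pdf": DocType.PDF, ".txt": DocType.TXT}


def read_content(path: str) -> Tuple[str, DocType]:
    """Return (content, type): decoded text for text files, base64 otherwise."""
    suffix = Path(path).suffix.lower()
    if suffix not in _SUFFIX_TO_TYPE:
        raise ValueError(f"Unsupported file type: {path}")
    doc_type = _SUFFIX_TO_TYPE[suffix]

    raw = Path(path).read_bytes()
    if doc_type is DocType.TXT:
        # Text content is returned as-is (UTF-8 assumed for this sketch).
        return raw.decode("utf-8"), doc_type
    # Binary content is made string-safe by base64-encoding it.
    return base64.b64encode(raw).decode("ascii"), doc_type
```

Returning the type alongside the content lets the caller decide how to interpret the string without inspecting the path again.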

nv_ingest_client.util.file_processing.extract.get_or_infer_file_type(file_path: str) -> DocumentTypeEnum

Determines the file type by inspecting the file's extension, optionally falling back to MIME type detection if the extension is not recognized.

Parameters:

file_path (str) – The path to the file.

Returns:

An enum value representing the detected file type.

Return type:

DocumentTypeEnum

Raises:

ValueError – If a valid extension is not found and MIME type detection cannot determine a valid type.
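
The extension-first, content-second lookup described above can be sketched as follows. The names DocType and infer_doc_type are hypothetical, the two lookup tables are illustrative rather than the package's real mappings, and checking a few leading magic bytes is a simplified stand-in for real MIME type detection.

```python
from enum import Enum
from pathlib import Path


class DocType(Enum):
    """Hypothetical stand-in for DocumentTypeEnum."""

    PDF = "pdf"
    PNG = "png"
    TXT = "text"


# Illustrative tables only; not the package's actual mappings.
EXTENSION_TO_TYPE = {".pdf": DocType.PDF, ".png": DocType.PNG, ".txt": DocType.TXT}
MAGIC_TO_TYPE = {
    b"%PDF-": DocType.PDF,               # would map to application/pdf
    b"\x89PNG\r\n\x1a\n": DocType.PNG,   # would map to image/png
}


def infer_doc_type(file_path: str) -> DocType:
    """Resolve the type from the extension, then from the file's leading bytes."""
    suffix = Path(file_path).suffix.lower()
    if suffix in EXTENSION_TO_TYPE:
        return EXTENSION_TO_TYPE[suffix]

    # Fallback: sniff the first bytes of the file, if it can be read.
    try:
        with open(file_path, "rb") as handle:
            header = handle.read(8)
    except OSError:
        header = b""
    for magic, doc_type in MAGIC_TO_TYPE.items():
        if header.startswith(magic):
            return doc_type

    raise ValueError(f"Could not determine a valid file type for: {file_path}")
```

Checking the extension first avoids opening the file in the common case; the content check only runs for unrecognized extensions, and a ValueError signals that both strategies failed.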

nv_ingest_client.util.file_processing.extract.serialize_to_base64(file_stream: BytesIO) -> str

Reads a PDF file from a BytesIO object and encodes it in base64.
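
Base64-encoding the contents of a BytesIO object takes only a few lines with the standard library. The sketch below shows the technique under a hypothetical name, to_base64_text; it is not the package's own code, and the choice to decode the encoded bytes as UTF-8 is an assumption.

```python
import base64
from io import BytesIO


def to_base64_text(file_stream: BytesIO) -> str:
    """Hypothetical sketch: read all bytes from the stream and base64-encode them."""
    raw = file_stream.read()
    # b64encode returns bytes; base64 output is pure ASCII, so decoding is safe.
    return base64.b64encode(raw).decode("utf-8")


encoded = to_base64_text(BytesIO(b"hello"))
```

Because read() starts at the stream's current position, a caller that has already consumed part of the stream would need to call seek(0) first to encode the whole file.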

Module contents