nv_ingest_client.util.file_processing package

Submodules

nv_ingest_client.util.file_processing.extract module

class nv_ingest_client.util.file_processing.extract.DocumentTypeEnum(value)

Bases: str, Enum

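An enumeration of document file types. Because the class also derives from str, each member compares equal to its string value. Note that the txt member has the value 'text', not 'txt'.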

bmp = 'bmp'
docx = 'docx'
html = 'html'
jpeg = 'jpeg'
md = 'md'
mp3 = 'mp3'
pdf = 'pdf'
png = 'png'
pptx = 'pptx'
svg = 'svg'
tiff = 'tiff'
txt = 'text'
wav = 'wav'
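
The difference between a member's name and its value matters when converting strings into enum members. The following is a minimal sketch, not the library's code: it defines a local stand-in class with three of the members listed above so that it runs without nv_ingest_client installed. The helper name parse_document_type is hypothetical.

```python
from enum import Enum


class DocumentTypeEnum(str, Enum):
    """Local stand-in mirroring three of the documented members."""

    pdf = "pdf"
    png = "png"
    txt = "text"  # member name and value differ


def parse_document_type(value: str) -> DocumentTypeEnum:
    """Hypothetical helper: look up a member by its value, case-insensitively."""
    return DocumentTypeEnum(value.lower())


# Calling the class looks a member up by value; subscripting looks it up by name.
by_value = DocumentTypeEnum("text")
by_name = DocumentTypeEnum["txt"]
print(by_value is by_name)

# Members are also str instances, so they compare equal to plain strings.
print(DocumentTypeEnum.pdf == "pdf")
```

Under these definitions, DocumentTypeEnum("txt") raises ValueError, because 'txt' is a member name rather than a value.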
nv_ingest_client.util.file_processing.extract.detect_encoding_and_read_text_file(file_stream: BytesIO) → str

Detects encoding and reads a text file from a BytesIO object accordingly.
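
This reference does not say how the encoding is detected. As an illustration of the general idea only, the sketch below tries a short list of candidate encodings in order and returns the first successful decode. The function name read_text_with_fallback and the candidate list are assumptions, and the real function may use a dedicated detection library instead.

```python
from io import BytesIO
from typing import Sequence


def read_text_with_fallback(
    file_stream: BytesIO,
    encodings: Sequence[str] = ("utf-8-sig", "latin-1"),
) -> str:
    """Decode the stream with the first candidate encoding that succeeds.

    Illustrative only: 'utf-8-sig' handles UTF-8 with or without a byte
    order mark, and 'latin-1' accepts any byte sequence, so it acts as a
    last resort that never fails.
    """
    raw = file_stream.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("None of the candidate encodings could decode the stream")


utf8_stream = BytesIO("héllo".encode("utf-8"))
latin1_stream = BytesIO("héllo".encode("latin-1"))
print(read_text_with_fallback(utf8_stream))
print(read_text_with_fallback(latin1_stream))
```

Both calls return the same string even though the underlying bytes differ, which is the behavior a caller would expect from an encoding-aware reader.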

nv_ingest_client.util.file_processing.extract.extract_file_content(path: str) → Tuple[str, DocumentTypeEnum]

Extracts content from the file at path, supporting different formats. Returns a tuple of the content as a string and the file's document type.

nv_ingest_client.util.file_processing.extract.get_or_infer_file_type(file_path: str) → DocumentTypeEnum

Determines the file type by inspecting its extension and optionally falls back to MIME type detection if the extension is not recognized.

Parameters:

file_path (str) – The path to the file.

Returns:

An enum value representing the detected file type.

Return type:

DocumentTypeEnum

Raises:

ValueError – If a valid extension is not found and MIME type detection cannot determine a valid type.
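
The extension-first, fallback-second flow described above can be sketched as follows. This is a self-contained approximation under stated assumptions, not the library's implementation: DocType is a hypothetical stand-in for DocumentTypeEnum, both lookup tables are invented for the example, and a check of the file's leading bytes stands in for MIME type detection.

```python
import os
import tempfile
from enum import Enum


class DocType(str, Enum):
    """Hypothetical stand-in for DocumentTypeEnum (subset of members)."""

    pdf = "pdf"
    png = "png"
    txt = "text"


# Assumed lookup tables; the real mappings are not given in this reference.
EXTENSION_TO_TYPE = {"pdf": DocType.pdf, "png": DocType.png, "txt": DocType.txt}
SIGNATURE_TO_TYPE = {b"%PDF-": DocType.pdf, b"\x89PNG\r\n\x1a\n": DocType.png}


def infer_file_type(file_path: str) -> DocType:
    """Try the extension first, then inspect the leading bytes of the file."""
    extension = os.path.splitext(file_path)[1].lower().lstrip(".")
    if extension in EXTENSION_TO_TYPE:
        return EXTENSION_TO_TYPE[extension]

    # Fallback: a crude content check standing in for MIME type detection.
    with open(file_path, "rb") as handle:
        head = handle.read(8)
    for signature, doc_type in SIGNATURE_TO_TYPE.items():
        if head.startswith(signature):
            return doc_type

    raise ValueError(f"Could not determine a file type for {file_path!r}")


def write_temp_file(data: bytes, suffix: str) -> str:
    """Demo helper: write bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as handle:
        handle.write(data)
        return handle.name


# A recognized extension is resolved without opening the file.
print(infer_file_type("report.PDF"))

# An unrecognized extension triggers the content-based fallback.
pdf_path = write_temp_file(b"%PDF-1.7\n", ".bin")
print(infer_file_type(pdf_path))
os.remove(pdf_path)
```

In this sketch a file with neither a known extension nor a known signature raises ValueError, matching the documented failure mode.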

nv_ingest_client.util.file_processing.extract.serialize_to_base64(file_stream: BytesIO) → str

Reads a PDF file from a BytesIO object and encodes it in base64.
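
Encoding a byte stream as a base64 string needs only the standard library. The sketch below shows one plausible shape for such a function; the name to_base64_text is hypothetical, and decoding the result as UTF-8 text is an assumption about the return format.

```python
import base64
from io import BytesIO


def to_base64_text(file_stream: BytesIO) -> str:
    """Read all bytes from the stream and return them as a base64 string."""
    raw = file_stream.read()
    # b64encode returns bytes; decode so the result is a str.
    return base64.b64encode(raw).decode("utf-8")


payload = b"%PDF-1.7 example bytes"
encoded = to_base64_text(BytesIO(payload))

# Decoding the string recovers the original bytes exactly.
print(base64.b64decode(encoded) == payload)
```

Because base64 output is plain ASCII, the resulting string can be embedded in a JSON payload without further escaping.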

Module contents