nv_ingest_client.util.file_processing package
Submodules
nv_ingest_client.util.file_processing.extract module
- class nv_ingest_client.util.file_processing.extract.DocumentTypeEnum(value)
Bases: str, Enum
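An enumeration of document types. Because it mixes in str, each member compares equal to its string value. Note that the txt member's value is 'text', not 'txt'.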
- bmp = 'bmp'
- docx = 'docx'
- html = 'html'
- jpeg = 'jpeg'
- md = 'md'
- mp3 = 'mp3'
- pdf = 'pdf'
- png = 'png'
- pptx = 'pptx'
- svg = 'svg'
- tiff = 'tiff'
- txt = 'text'
- wav = 'wav'
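Because the enum mixes in str, its members can be compared to plain strings, and they can be looked up either by value or by name. The sketch below uses a local stand-in copy of the members listed above (in real code you would import DocumentTypeEnum from nv_ingest_client.util.file_processing.extract instead) to show the one case where the two lookups differ: txt, whose value is 'text'.

```python
from enum import Enum


# Stand-in copy of the documented members; not imported from nv_ingest_client.
class DocumentTypeEnum(str, Enum):
    bmp = "bmp"
    docx = "docx"
    html = "html"
    jpeg = "jpeg"
    md = "md"
    mp3 = "mp3"
    pdf = "pdf"
    png = "png"
    pptx = "pptx"
    svg = "svg"
    tiff = "tiff"
    txt = "text"
    wav = "wav"


# The str mixin makes members compare equal to their string values.
assert DocumentTypeEnum.pdf == "pdf"

# Lookup by value uses the value string; lookup by name uses the attribute name.
by_value = DocumentTypeEnum("text")
by_name = DocumentTypeEnum["txt"]
print(by_value is by_name)         # True
print(DocumentTypeEnum.txt.value)  # text
```

A consequence worth keeping in mind: DocumentTypeEnum("txt") raises ValueError, so code that maps a ".txt" extension onto this enum has to look it up by name (or special-case it) rather than by value.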
- nv_ingest_client.util.file_processing.extract.detect_encoding_and_read_text_file(file_stream: BytesIO) → str
Detects the encoding of a text file held in a BytesIO object and returns its contents decoded with that encoding.
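The reference does not say how the encoding is detected. As a rough illustration of the idea only, the sketch below defines a hypothetical read_text_with_fallback that uses a simple stdlib-only heuristic (byte-order-mark check, then strict UTF-8, then Latin-1) in place of real encoding detection; it is not the library's implementation.

```python
import codecs
from io import BytesIO


def read_text_with_fallback(file_stream: BytesIO) -> str:
    """Hypothetical stand-in: choose an encoding, then decode the whole buffer."""
    raw = file_stream.getvalue()  # entire buffer, independent of stream position

    # A byte-order mark, when present, identifies the encoding.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the utf-16 codec reads and strips the BOM

    # No BOM: try strict UTF-8, then fall back to Latin-1, which accepts
    # every byte sequence (though not always as the intended characters).
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


print(read_text_with_fallback(BytesIO("café".encode("utf-8"))))    # café
print(read_text_with_fallback(BytesIO("café".encode("latin-1"))))  # café
```

Real detectors score many candidate encodings statistically; the fixed fallback order here is a simplification chosen to keep the example dependency-free.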
- nv_ingest_client.util.file_processing.extract.extract_file_content(path: str, …)
Extracts content from a file, supporting different formats.
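The reference does not document what extract_file_content returns. Purely as an assumption for illustration, the sketch below defines a hypothetical extract_file_content_sketch that returns a (content, extension) pair, decoding text formats as UTF-8 and base64-encoding everything else so binary data can travel as an ASCII string. The return shape and the TEXT_EXTENSIONS grouping are both invented for this example.

```python
import base64
import os
import tempfile
from typing import Tuple

# Hypothetical grouping of extensions treated as text (an assumption).
TEXT_EXTENSIONS = {"txt", "md", "html"}


def extract_file_content_sketch(path: str) -> Tuple[str, str]:
    """Hypothetical: return (content, extension) for the file at `path`."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    with open(path, "rb") as f:
        raw = f.read()
    if extension in TEXT_EXTENSIONS:
        return raw.decode("utf-8"), extension
    # Binary formats: base64-encode so the result is a plain ASCII string.
    return base64.b64encode(raw).decode("ascii"), extension


# Usage: write one small text file and one small binary file, then extract both.
with tempfile.TemporaryDirectory() as tmp:
    md_path = os.path.join(tmp, "notes.md")
    with open(md_path, "w", encoding="utf-8") as f:
        f.write("# Title")
    png_path = os.path.join(tmp, "pixel.png")
    with open(png_path, "wb") as f:
        f.write(b"\x89PNG")
    print(extract_file_content_sketch(md_path))   # ('# Title', 'md')
    print(extract_file_content_sketch(png_path))  # ('iVBORw==', 'png')
```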
- nv_ingest_client.util.file_processing.extract.get_or_infer_file_type(file_path: str, …)
Determines the file type from the file's extension, falling back to MIME type detection if the extension is not recognized.
- Parameters:
file_path (str) – The path to the file.
- Returns:
An enum value representing the detected file type.
- Return type:
- Raises:
ValueError – If a valid extension is not found and MIME type detection cannot determine a valid type.
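The extension-first, MIME-fallback logic described above can be sketched as follows. This is a minimal illustration, not the library's code: infer_file_type_sketch and MIME_TO_TYPE are hypothetical names, the enum is a local stand-in copy of the documented members, and the fallback uses the standard mimetypes.guess_type, which guesses from the file name rather than the file's bytes. In this sketch the fallback therefore only rescues alternative extensions such as .jpg or .htm; content-based detection would need a different tool.

```python
import mimetypes
import os
from enum import Enum


class DocumentTypeEnum(str, Enum):
    # Stand-in copy of the documented members; not imported from nv_ingest_client.
    bmp = "bmp"
    docx = "docx"
    html = "html"
    jpeg = "jpeg"
    md = "md"
    mp3 = "mp3"
    pdf = "pdf"
    png = "png"
    pptx = "pptx"
    svg = "svg"
    tiff = "tiff"
    txt = "text"
    wav = "wav"


# Hypothetical MIME-to-member mapping, consulted only when the extension is unknown.
MIME_TO_TYPE = {
    "application/pdf": DocumentTypeEnum.pdf,
    "image/jpeg": DocumentTypeEnum.jpeg,
    "image/png": DocumentTypeEnum.png,
    "image/tiff": DocumentTypeEnum.tiff,
    "text/html": DocumentTypeEnum.html,
    "text/plain": DocumentTypeEnum.txt,
}


def infer_file_type_sketch(file_path: str) -> DocumentTypeEnum:
    extension = os.path.splitext(file_path)[1].lstrip(".").lower()

    # 1. Extension first. Match member *names* so ".txt" resolves to
    #    DocumentTypeEnum.txt even though that member's value is "text".
    member = DocumentTypeEnum.__members__.get(extension)
    if member is not None:
        return member

    # 2. Fallback: guess a MIME type from the file name and map it to a member.
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type in MIME_TO_TYPE:
        return MIME_TO_TYPE[mime_type]

    raise ValueError(f"Could not determine a supported file type for {file_path!r}")


print(infer_file_type_sketch("report.PDF").name)  # pdf
print(infer_file_type_sketch("notes.txt").name)   # txt
print(infer_file_type_sketch("photo.jpg").name)   # jpeg
```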