nv_ingest_api.util.converters package

Submodules

nv_ingest_api.util.converters.bytetools module

nv_ingest_api.util.converters.bytetools.base64frombytes(bytes_input, encoding='utf-8')

Convert bytes to a base64-encoded string.

Parameters:

bytes_input (bytes) – Raw bytes of object.

Returns:

Base64-encoded string used to store bytes in cuDF.

Return type:

str

nv_ingest_api.util.converters.bytetools.bytesfrombase64(base64_input)

Convert a base64-encoded string to bytes.

Parameters:

base64_input (str) – Base64-encoded string used to store bytes in cuDF.

Returns:

The bytes decoded from the base64-encoded string.

Return type:

bytes
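The two base64 helpers above are inverses of each other. The following is a minimal sketch of that round trip using only Python's standard `base64` module as a stand-in. The names `to_base64_str` and `from_base64_str` are hypothetical, and the sketch only assumes the behavior described above (raw bytes in, text out, and back again); the package's own implementation may differ.

```python
import base64


def to_base64_str(bytes_input: bytes, encoding: str = "utf-8") -> str:
    # Hypothetical stand-in for base64frombytes: encode the raw bytes,
    # then decode the resulting ASCII-only base64 bytes into a string.
    return base64.b64encode(bytes_input).decode(encoding)


def from_base64_str(base64_input: str) -> bytes:
    # Hypothetical stand-in for bytesfrombase64: recover the raw bytes.
    return base64.b64decode(base64_input)


raw = b"\x00\x01binary payload\xff"
encoded = to_base64_str(raw)        # plain text, safe for a string column
decoded = from_base64_str(encoded)  # identical to the original bytes
```

Because the encoded value is ordinary text, it can be placed in a string column and decoded later without loss.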

nv_ingest_api.util.converters.bytetools.bytesfromhex(hex_input)

Convert a hex string to bytes.

Parameters:

hex_input (str) – Hex string used to store bytes in cuDF.

Returns:

The bytes decoded from the hex string.

Return type:

bytes

nv_ingest_api.util.converters.bytetools.hexfrombytes(bytes_input)

Convert bytes to a hex string.

Parameters:

bytes_input (bytes) – Raw bytes of object.

Returns:

Hex string used to store bytes in cuDF.

Return type:

str
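The hex helpers follow the same round-trip pattern as the base64 ones. A minimal sketch, assuming the helpers behave like Python's built-in `bytes.hex()` and `bytes.fromhex()`; the names `to_hex_str` and `from_hex_str` are hypothetical stand-ins, not the package's functions.

```python
def to_hex_str(bytes_input: bytes) -> str:
    # Hypothetical stand-in for hexfrombytes: two hex digits per byte.
    return bytes_input.hex()


def from_hex_str(hex_input: str) -> bytes:
    # Hypothetical stand-in for bytesfromhex: parse hex digits to bytes.
    return bytes.fromhex(hex_input)


raw = b"\xde\xad\xbe\xef"
hex_text = to_hex_str(raw)
restored = from_hex_str(hex_text)
```

Hex encoding doubles the size of the payload, while base64 grows it by roughly a third, so base64 is usually the more compact choice for large binary values.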

nv_ingest_api.util.converters.containers module

nv_ingest_api.util.converters.containers.merge_dict(
defaults: Dict[str, Any],
overrides: Dict[str, Any],
) → Dict[str, Any]

Recursively merges two dictionaries, with values from the overrides dictionary taking precedence.

This function merges the overrides dictionary into the defaults dictionary. If a key in both dictionaries has a dictionary as its value, the function will recursively merge those dictionaries. Otherwise, the value from the overrides dictionary will overwrite the value in the defaults dictionary.

Parameters:
  • defaults (dict of {str: Any}) – The default dictionary that will be updated with values from the overrides dictionary.

  • overrides (dict of {str: Any}) – The dictionary containing values that will override or extend those in the defaults dictionary.

Returns:

The merged dictionary, with values from the overrides dictionary taking precedence.

Return type:

dict of {str: Any}

Examples

>>> defaults = {
...     "a": 1,
...     "b": {
...         "c": 3,
...         "d": 4
...     },
...     "e": 5
... }
>>> overrides = {
...     "b": {
...         "c": 30
...     },
...     "f": 6
... }
>>> result = merge_dict(defaults, overrides)
>>> result
{'a': 1, 'b': {'c': 30, 'd': 4}, 'e': 5, 'f': 6}

Notes

  • The merge_dict function modifies the defaults dictionary in place. If you need to preserve the original defaults dictionary, consider passing a copy instead.

  • This function is particularly useful when combining configuration dictionaries where certain settings should override defaults.
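The in-place behavior described in the notes can be illustrated with a small sketch. `merge_dict_sketch` below is a hypothetical re-implementation written only from the description above (recursive merge, overrides win, defaults mutated); it is not the package's source. It shows how passing a deep copy keeps the original defaults intact.

```python
import copy
from typing import Any, Dict


def merge_dict_sketch(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for merge_dict, following the documented rules.
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(defaults.get(key), dict):
            # Both sides hold a dict: merge the nested dictionaries.
            merge_dict_sketch(defaults[key], value)
        else:
            # Otherwise the override replaces (or adds) the value.
            defaults[key] = value
    return defaults


defaults = {"a": 1, "b": {"c": 3, "d": 4}}
overrides = {"b": {"c": 30}, "f": 6}

# Merge into a deep copy so the original defaults stay unchanged.
merged = merge_dict_sketch(copy.deepcopy(defaults), overrides)
```

A deep copy matters here: with a shallow copy, the nested dictionary under `"b"` would still be shared with the original, so a recursive in-place merge like this sketch would modify it.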

nv_ingest_api.util.converters.datetools module

nv_ingest_api.util.converters.datetools.remove_tz(datetime_obj: datetime) → datetime

Remove timezone and add offset to a datetime object.

Parameters:

datetime_obj (datetime.datetime) – A datetime object with or without the timezone attribute set.

Returns:

A datetime object with the timezone offset added and the timezone attribute removed.

Return type:

datetime.datetime
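The description does not state which reference the offset is applied against. The sketch below assumes one common interpretation: an aware datetime is normalized to UTC and then stripped of its timezone, while a naive datetime is returned unchanged. `remove_tz_sketch` is a hypothetical name, and the UTC choice is an assumption rather than confirmed behavior of `remove_tz`.

```python
from datetime import datetime, timedelta, timezone


def remove_tz_sketch(datetime_obj: datetime) -> datetime:
    # Naive datetimes carry no offset, so there is nothing to apply.
    if datetime_obj.tzinfo is None:
        return datetime_obj
    # Assumption: fold the offset in by converting to UTC, then drop tzinfo.
    return datetime_obj.astimezone(timezone.utc).replace(tzinfo=None)


# 12:00 at UTC+02:00 is the same instant as 10:00 UTC.
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
naive = remove_tz_sketch(aware)
```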

nv_ingest_api.util.converters.datetools.validate_iso8601(date_string: str) → None

Verify that the given date string is in ISO 8601 format.

Parameters:

date_string (str) – A date string in human-readable format, ideally ISO 8601.

Raises:

ValueError – If the date string is not in a valid ISO 8601 format.
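The validate-or-raise contract above can be sketched with the standard library. `validate_iso8601_sketch` is a hypothetical stand-in that assumes `datetime.fromisoformat` is an acceptable parser; the real function may use a different one and accept a different set of formats.

```python
from datetime import datetime


def validate_iso8601_sketch(date_string: str) -> None:
    # Returns None on success and raises ValueError otherwise,
    # mirroring the documented contract.
    try:
        datetime.fromisoformat(date_string)
    except ValueError as exc:
        raise ValueError(f"Not a valid ISO 8601 date string: {date_string!r}") from exc


validate_iso8601_sketch("2024-01-15T10:30:00")  # passes silently

try:
    validate_iso8601_sketch("15/01/2024")
    rejected = False
except ValueError:
    rejected = True
```

Note that `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11, so this sketch is stricter than the standard on older interpreters.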

nv_ingest_api.util.converters.dftools module

class nv_ingest_api.util.converters.dftools.MemoryFiles

Bases: object

open(fn, mode='rb')

nv_ingest_api.util.converters.dftools.cudf_to_json(
gdf: DataFrame,
deserialize_cols: list = [],
) → str

Helper function to convert from cuDF to JSON until apache/arrow#40412 is resolved.

Parameters:
  • gdf (cudf.DataFrame) – A cuDF dataframe.

  • deserialize_cols (list) – A list of columns containing nested data.

Returns:

A JSON-formatted string.

Return type:

str

nv_ingest_api.util.converters.dftools.cudf_to_pandas(
gdf: DataFrame,
deserialize_cols: list = [],
) → DataFrame

Helper function to convert from cuDF to pandas until apache/arrow#40412 is resolved.

Parameters:
  • gdf (cudf.DataFrame) – A cuDF dataframe.

  • deserialize_cols (list) – A list of columns containing nested data.

Returns:

A pandas dataframe.

Return type:

pd.DataFrame

nv_ingest_api.util.converters.dftools.pandas_to_cudf(
df: pandas.core.frame.DataFrame,
deserialize_cols: list = [],
default_cols: dict = {'document_type': str, 'metadata': str},
default_type: type = str,
) → DataFrame

Helper function to convert from pandas to cuDF until apache/arrow#40412 is resolved.

Parameters:

df (pd.DataFrame) – A pandas dataframe.

Returns:

A cuDF dataframe.

Return type:

cudf.DataFrame

nv_ingest_api.util.converters.formats module

nv_ingest_api.util.converters.formats.ingest_json_results_to_blob(result_content)

Parse a JSON string or BytesIO object, combine and sort entries, and create a blob string.

Returns:

The generated blob string.

Return type:

str

nv_ingest_api.util.converters.type_mappings module

nv_ingest_api.util.converters.type_mappings.doc_type_to_content_type(
doc_type: DocumentTypeEnum,
) → ContentTypeEnum

Convert a DocumentTypeEnum to a ContentTypeEnum.

Module contents