nv_ingest_api.util.converters package
Submodules
nv_ingest_api.util.converters.bytetools module
- nv_ingest_api.util.converters.bytetools.base64frombytes(bytes_input, encoding='utf-8')
Convert bytes to a base64-encoded string.
- Parameters:
bytes_input (bytes) – Raw bytes of object.
- Returns:
Base64-encoded string, suitable for storing bytes in cuDF.
- Return type:
str
- nv_ingest_api.util.converters.bytetools.bytesfrombase64(base64_input)
Convert a base64-encoded string to bytes.
- Parameters:
base64_input (str) – Base64-encoded string used to store bytes in cuDF.
- Returns:
The bytes decoded from the base64-encoded string.
- Return type:
bytes
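The round trip between these two helpers can be illustrated with a minimal stand-in built on the standard library. The functions `encode_sketch` and `decode_sketch` below are hypothetical; they assume only the behavior described above (bytes in, base64 text out, and back), and the package's actual implementation may differ.

```python
import base64


def encode_sketch(bytes_input: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in: base64-encode raw bytes and return them as text."""
    return base64.b64encode(bytes_input).decode(encoding)


def decode_sketch(base64_input: str) -> bytes:
    """Hypothetical stand-in: decode a base64 string back into raw bytes."""
    return base64.b64decode(base64_input)


# Arbitrary binary data survives the trip through a plain string column.
raw = b"\x00\x01binary payload\xff"
encoded = encode_sketch(raw)
restored = decode_sketch(encoded)
```

Encoding to text first is what allows arbitrary binary payloads to sit in a string-typed dataframe column.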
nv_ingest_api.util.converters.containers module
- nv_ingest_api.util.converters.containers.merge_dict(defaults: Dict[str, Any], overrides: Dict[str, Any])
Recursively merges two dictionaries, with values from the overrides dictionary taking precedence.
This function merges the overrides dictionary into the defaults dictionary. If a key in both dictionaries has a dictionary as its value, the function will recursively merge those dictionaries. Otherwise, the value from the overrides dictionary will overwrite the value in the defaults dictionary.
- Parameters:
defaults (dict of {str: Any}) – The default dictionary that will be updated with values from the overrides dictionary.
overrides (dict of {str: Any}) – The dictionary containing values that will override or extend those in the defaults dictionary.
- Returns:
The merged dictionary, with values from the overrides dictionary taking precedence.
- Return type:
dict of {str: Any}
Examples
>>> defaults = {
...     "a": 1,
...     "b": {
...         "c": 3,
...         "d": 4
...     },
...     "e": 5
... }
>>> overrides = {
...     "b": {
...         "c": 30
...     },
...     "f": 6
... }
>>> result = merge_dict(defaults, overrides)
>>> result
{'a': 1, 'b': {'c': 30, 'd': 4}, 'e': 5, 'f': 6}
Notes
The merge_dict function modifies the defaults dictionary in place. If you need to preserve the original defaults dictionary, consider passing a copy instead.
This function is particularly useful when combining configuration dictionaries where certain settings should override defaults.
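The in-place behavior described in the Notes can be sketched as follows. `merge_dict_sketch` is a hypothetical re-implementation written only from the description above, not the package's code. It shows how passing a deep copy keeps the original defaults untouched.

```python
import copy
from typing import Any, Dict


def merge_dict_sketch(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical re-implementation of the documented merge behavior."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(defaults.get(key), dict):
            # Both sides hold a dict: merge the nested dictionaries recursively.
            merge_dict_sketch(defaults[key], value)
        else:
            # Otherwise the override value replaces (or adds) the entry.
            defaults[key] = value
    return defaults


defaults = {"a": 1, "b": {"c": 3, "d": 4}}
overrides = {"b": {"c": 30}, "f": 6}

# Merge into a deep copy so that `defaults` itself is left unchanged.
merged = merge_dict_sketch(copy.deepcopy(defaults), overrides)
```

A shallow copy would not be enough here, because the nested dictionary under "b" would still be shared with the original and updated in place.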
nv_ingest_api.util.converters.datetools module
- nv_ingest_api.util.converters.datetools.remove_tz(datetime_obj: datetime) → datetime
Remove timezone and add offset to a datetime object.
- Parameters:
datetime_obj (datetime.datetime) – A datetime object with or without the timezone attribute set.
- Returns:
A datetime object with the timezone offset added and the timezone attribute removed.
- Return type:
datetime.datetime
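One plausible reading of "add offset" is that an aware datetime is first converted to UTC and then stripped of its timezone attribute. That reading is an assumption, and `remove_tz_sketch` below is a hypothetical illustration of it rather than the package's implementation.

```python
from datetime import datetime, timedelta, timezone


def remove_tz_sketch(datetime_obj: datetime) -> datetime:
    """Hypothetical sketch: normalize to UTC, then drop tzinfo (assumed behavior)."""
    if datetime_obj.tzinfo is not None:
        # Apply the UTC offset to the clock time, then discard the timezone.
        datetime_obj = datetime_obj.astimezone(timezone.utc).replace(tzinfo=None)
    return datetime_obj


# 12:00 at UTC+02:00 corresponds to 10:00 once the offset is applied.
aware = datetime(2024, 1, 15, 12, 0, tzinfo=timezone(timedelta(hours=2)))
naive = remove_tz_sketch(aware)
```

Under this reading, a datetime without timezone information passes through unchanged.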
- nv_ingest_api.util.converters.datetools.validate_iso8601(date_string: str) → None
Verify that the given date string is in ISO 8601 format.
- Parameters:
date_string (str) – A date string in human-readable format, ideally ISO 8601.
- Raises:
ValueError – If the date string is not in a valid ISO 8601 format.
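The validate-or-raise contract can be sketched with the standard library. `validate_iso8601_sketch` is a hypothetical stand-in that assumes parsing with `datetime.fromisoformat`; the package may check the format differently, and older Python versions accept only a subset of ISO 8601 through this call.

```python
from datetime import datetime


def validate_iso8601_sketch(date_string: str) -> None:
    """Hypothetical stand-in: return None on success, raise ValueError otherwise."""
    try:
        datetime.fromisoformat(date_string)
    except ValueError as exc:
        raise ValueError(f"Not a valid ISO 8601 date string: {date_string!r}") from exc


# A well-formed timestamp passes silently.
validate_iso8601_sketch("2024-01-15T10:30:00")

# A day/month/year string with slashes is rejected.
try:
    validate_iso8601_sketch("15/01/2024")
    rejected = False
except ValueError:
    rejected = True
```

Because the function returns nothing on success, callers rely on the exception alone to detect bad input.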
nv_ingest_api.util.converters.dftools module
- nv_ingest_api.util.converters.dftools.cudf_to_json(gdf: DataFrame, deserialize_cols: list = [])
Helper function to convert from cudf to json until apache/arrow#40412 is resolved.
- Parameters:
gdf (cudf.DataFrame) – A cuDF dataframe.
deserialize_cols (list) – A list of columns containing nested data.
- Returns:
A JSON-formatted string.
- Return type:
str
- nv_ingest_api.util.converters.dftools.cudf_to_pandas(gdf: DataFrame, deserialize_cols: list = [])
Helper function to convert from cudf to pandas until apache/arrow#40412 is resolved.
- Parameters:
gdf (cudf.DataFrame) – A cuDF dataframe.
deserialize_cols (list) – A list of columns containing nested data.
- Returns:
A pandas dataframe.
- Return type:
pd.DataFrame
- nv_ingest_api.util.converters.dftools.pandas_to_cudf(df: pandas.DataFrame, deserialize_cols: list = [], default_cols: dict = {'document_type': <class 'str'>, 'metadata': <class 'str'>}, default_type: type = <class 'str'>)
Helper function to convert from pandas to cudf until apache/arrow#40412 is resolved.
- Parameters:
df (pd.DataFrame) – A pandas dataframe.
- Returns:
A cuDF dataframe.
- Return type:
cudf.DataFrame
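The `deserialize_cols` parameter in these signatures suggests that columns with nested data travel as serialized text and are decoded on the other side. That interpretation is an assumption. The pandas-only sketch below illustrates the general idea with hypothetical helpers and does not call cuDF or reproduce the package's code.

```python
import json

import pandas as pd


def serialize_nested_sketch(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: JSON-encode nested columns into plain strings."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].apply(json.dumps)
    return out


def deserialize_nested_sketch(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Hypothetical helper: decode JSON strings back into nested objects."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].apply(json.loads)
    return out


df = pd.DataFrame(
    {"document_type": ["pdf"], "metadata": [{"source": {"name": "a.pdf"}}]}
)
as_text = serialize_nested_sketch(df, ["metadata"])
restored = deserialize_nested_sketch(as_text, ["metadata"])
```

Keeping nested values as strings during the conversion sidesteps any limits on nested types in the transfer layer, at the cost of an explicit decode step afterwards.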
nv_ingest_api.util.converters.formats module
nv_ingest_api.util.converters.type_mappings module
- nv_ingest_api.util.converters.type_mappings.doc_type_to_content_type(doc_type: DocumentTypeEnum)
Convert a DocumentTypeEnum to a ContentTypeEnum.