nv_ingest_api.internal.store package

Submodules

nv_ingest_api.internal.store.embed_text_upload module

nv_ingest_api.internal.store.embed_text_upload.store_text_embeddings_internal(
df_store_ledger: DataFrame,
task_config: BaseModel | Dict[str, Any],
store_config: EmbeddingStorageSchema,
execution_trace_log: Dict[str, Any] | None = None,
) → DataFrame

Stores embeddings by uploading content from a DataFrame to MinIO.

This function prepares the necessary parameters for the upload based on the task configuration, invokes the upload routine, and returns the updated DataFrame.

Parameters:
  • df_store_ledger (pd.DataFrame) – DataFrame containing the data whose embeddings need to be stored.

  • task_config (Union[BaseModel, Dict[str, Any]]) – Task configuration. If it is a Pydantic model, it will be converted to a dictionary.

  • store_config (EmbeddingStorageSchema) – Storage configuration (not directly used in the current implementation).

  • execution_trace_log (Optional[Dict[str, Any]], default=None) – Optional dictionary for trace logging information.

Returns:

The updated DataFrame after embeddings have been uploaded and metadata updated.

Return type:

pd.DataFrame

Raises:

Exception – If any error occurs during the storage process, it is logged and re-raised with additional context.
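
The two behaviors described above, converting a Pydantic task configuration to a dictionary and re-raising failures with added context, can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: `normalize_task_config` and `run_storage_step` are hypothetical helpers, the error message wording is invented, and wrapping the failure in `RuntimeError` is an assumption about how "additional context" might be attached.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def normalize_task_config(task_config: Any) -> Dict[str, Any]:
    """Return task_config as a plain dict (hypothetical helper)."""
    if isinstance(task_config, dict):
        return dict(task_config)  # shallow copy so the caller's dict is not mutated
    if hasattr(task_config, "model_dump"):  # Pydantic v2 models
        return task_config.model_dump()
    if hasattr(task_config, "dict"):  # Pydantic v1 models
        return task_config.dict()
    raise TypeError(f"Unsupported task_config type: {type(task_config).__name__}")


def run_storage_step(step: Callable[[Dict[str, Any]], Any], task_config: Any) -> Any:
    """Run `step` with a dict config, logging and re-raising any failure."""
    try:
        return step(normalize_task_config(task_config))
    except Exception as exc:
        message = f"Failed to store embeddings: {exc}"
        logger.error(message, exc_info=True)
        # RuntimeError is an assumption; the real function may raise another type.
        raise RuntimeError(message) from exc
```

Chaining with `from exc` keeps the original exception available as `__cause__`, so the added context does not hide the underlying error.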

nv_ingest_api.internal.store.image_upload module

nv_ingest_api.internal.store.image_upload.store_images_to_minio_internal(
df_storage_ledger: DataFrame,
task_config: Dict[str, Any],
storage_config: Dict[str, Any],
execution_trace_log: List[Any] | None = None,
) → DataFrame

Processes a storage ledger DataFrame to upload images (and structured content) to MinIO.

This function validates the input DataFrame and task configuration, then creates a mask to select rows where the “document_type” is among the desired types specified in the configuration. If matching rows are found, it calls the internal upload function to process and update the DataFrame; otherwise, it returns the original DataFrame unmodified.
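
The row selection described above can be sketched with a boolean mask over the “document_type” column. This is an illustration only: `upload_matching_rows` and its `upload` callback are hypothetical names, and plain strings stand in for whatever document type values the library actually stores in that column.

```python
from typing import Callable, Dict

import pandas as pd


def upload_matching_rows(
    df: pd.DataFrame,
    content_types: Dict[str, bool],
    upload: Callable[[pd.DataFrame, pd.Series], pd.DataFrame],
) -> pd.DataFrame:
    """Call `upload` only if some rows have an enabled document type."""
    # Keep only the document types whose flag is truthy in the configuration.
    enabled = [doc_type for doc_type, wanted in content_types.items() if wanted]
    mask = df["document_type"].isin(enabled)
    if not mask.any():
        # Nothing to upload: hand back the original DataFrame unmodified.
        return df
    return upload(df, mask)
```

Passing the mask to the upload routine, rather than a filtered copy, lets that routine update the selected rows in place while leaving the others untouched.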

Parameters:
  • df_storage_ledger (pd.DataFrame) – The DataFrame containing storage ledger information, which must include at least the columns “document_type” and “metadata”.

  • task_config (Dict[str, Any]) – A flat dictionary containing configuration parameters for image storage. Expected to include the key “content_types” (a dict mapping document types to booleans) along with connection and credential details.

  • storage_config (Dict[str, Any]) – A dictionary reserved for additional storage configuration (currently unused).

  • execution_trace_log (Optional[List[Any]], default=None) – Optional list for capturing execution trace details (currently unused).

Returns:

The updated DataFrame after attempting to upload images for rows with matching document types. Rows that do not match remain unchanged.

Return type:

pd.DataFrame

Raises:

ValueError – If the input DataFrame is missing required columns or if the task configuration is invalid.
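
The validation that leads to this `ValueError` might look roughly like the sketch below. It assumes only what is documented here: the required columns “document_type” and “metadata”, and a “content_types” dict in the task configuration. The `validate_inputs` helper and its error messages are hypothetical, and the real function may check more than this.

```python
from typing import Any, Dict

import pandas as pd

# Columns the documentation above lists as required.
REQUIRED_COLUMNS = ("document_type", "metadata")


def validate_inputs(df: pd.DataFrame, task_config: Dict[str, Any]) -> None:
    """Raise ValueError if the ledger or the task configuration is unusable."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Storage ledger is missing required columns: {missing}")
    if not isinstance(task_config, dict):
        raise ValueError("task_config must be a dictionary")
    if not isinstance(task_config.get("content_types"), dict):
        raise ValueError("task_config must include a 'content_types' dict")
```

Running these checks before any rows are selected means a malformed ledger fails early, before any upload is attempted.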

Module contents