nv_ingest_api.internal.store package
Submodules
nv_ingest_api.internal.store.embed_text_upload module
- nv_ingest_api.internal.store.embed_text_upload.store_text_embeddings_internal(
    df_store_ledger: DataFrame,
    task_config: BaseModel | Dict[str, Any],
    store_config: EmbeddingStorageSchema,
    execution_trace_log: Dict[str, Any] | None = None,
  ) → DataFrame
Stores embeddings by uploading content from a DataFrame to MinIO.
This function prepares the necessary parameters for the upload based on the task configuration, invokes the upload routine, and returns the updated DataFrame.
- Parameters:
df_store_ledger (pd.DataFrame) – DataFrame containing the data whose embeddings need to be stored.
task_config (Union[BaseModel, Dict[str, Any]]) – Task configuration. If it is a Pydantic model, it will be converted to a dictionary.
store_config (EmbeddingStorageSchema) – Storage configuration (not directly used in the current implementation).
execution_trace_log (Optional[Dict[str, Any]], default=None) – Optional dictionary for trace logging information.
- Returns:
The updated DataFrame after embeddings have been uploaded and metadata updated.
- Return type:
pd.DataFrame
- Raises:
Exception – If any error occurs during the storage process, it is logged and re-raised with additional context.
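The conversion of task_config described above can be sketched as follows. This is a minimal illustration, not the library's implementation: normalize_task_config and FakeTaskConfig are hypothetical names, the "bucket" field is invented for the example, and the sketch assumes a Pydantic model exposes model_dump() (Pydantic v2) or dict() (Pydantic v1).

```python
from typing import Any, Dict


def normalize_task_config(task_config: Any) -> Dict[str, Any]:
    """Hypothetical helper: return task_config as a plain dictionary."""
    if isinstance(task_config, dict):
        # Shallow copy so the caller's dictionary is left untouched.
        return dict(task_config)
    # Pydantic v2 models expose model_dump(); v1 models expose dict().
    for method_name in ("model_dump", "dict"):
        method = getattr(task_config, method_name, None)
        if callable(method):
            return dict(method())
    raise TypeError(f"Unsupported task_config type: {type(task_config).__name__}")


class FakeTaskConfig:
    """Stand-in for a Pydantic model so the sketch runs without pydantic installed."""

    def __init__(self, **fields: Any) -> None:
        self._fields = fields

    def model_dump(self) -> Dict[str, Any]:
        return dict(self._fields)


# "bucket" is an invented field name used only for illustration.
as_dict = normalize_task_config({"bucket": "embeddings"})
from_model = normalize_task_config(FakeTaskConfig(bucket="embeddings"))
```

Both calls yield the same plain dictionary, so downstream code can read configuration values with ordinary key lookups regardless of how the caller supplied them.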
nv_ingest_api.internal.store.image_upload module
- nv_ingest_api.internal.store.image_upload.store_images_to_minio_internal(
    df_storage_ledger: DataFrame,
    task_config: Dict[str, Any],
    storage_config: Dict[str, Any],
    execution_trace_log: List[Any] | None = None,
  ) → DataFrame
Processes a storage ledger DataFrame to upload images (and structured content) to MinIO.
This function validates the input DataFrame and task configuration, then creates a mask to select rows where the “document_type” is among the desired types specified in the configuration. If matching rows are found, it calls the internal upload function to process and update the DataFrame; otherwise, it returns the original DataFrame unmodified.
- Parameters:
df_storage_ledger (pd.DataFrame) – The DataFrame containing storage ledger information, which must include at least the columns “document_type” and “metadata”.
task_config (Dict[str, Any]) – A flat dictionary containing configuration parameters for image storage. Expected to include the key “content_types” (a dict mapping document types to booleans) along with connection and credential details.
storage_config (Dict[str, Any]) – A dictionary reserved for additional storage configuration (currently unused).
execution_trace_log (Optional[List[Any]], default=None) – An optional list for capturing execution trace details (currently unused).
- Returns:
The updated DataFrame after attempting to upload images for rows with matching document types. Rows that do not match remain unchanged.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the input DataFrame is missing required columns or if the task configuration is invalid.
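The validation and row-selection behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: select_rows_for_upload is a hypothetical helper, rows are modeled as a list of dictionaries instead of a DataFrame so the example runs without pandas, and the document type names are invented.

```python
from typing import Any, Dict, List

REQUIRED_COLUMNS = ("document_type", "metadata")


def select_rows_for_upload(
    rows: List[Dict[str, Any]], task_config: Dict[str, Any]
) -> List[bool]:
    """Hypothetical helper: return a boolean mask marking rows to upload."""
    # Validate that every row carries the required columns.
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
    # Validate the task configuration.
    content_types = task_config.get("content_types")
    if not isinstance(content_types, dict):
        raise ValueError("task_config must include a 'content_types' dict")
    # Keep only the document types whose flag is truthy.
    enabled = {doc_type for doc_type, wanted in content_types.items() if wanted}
    return [row["document_type"] in enabled for row in rows]


# Document type names below are invented for illustration.
rows = [
    {"document_type": "image", "metadata": {}},
    {"document_type": "text", "metadata": {}},
    {"document_type": "structured", "metadata": {}},
]
task_config = {"content_types": {"image": True, "structured": False}}
mask = select_rows_for_upload(rows, task_config)
```

With a pandas DataFrame, the equivalent mask would typically be built with df["document_type"].isin(enabled). When no entry in the mask is true, the documented behavior is to return the original DataFrame unmodified.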