nv_ingest.stages.embeddings package

Submodules

nv_ingest.stages.embeddings.text_embeddings module

nv_ingest.stages.embeddings.text_embeddings.generate_text_embed_extractor_stage(
    c: Any,
    stage_config: Dict[str, Any],
    task: str = 'embed',
    task_desc: str = 'text_embed_extraction',
    pe_count: int = 1,
)

Generates a multiprocessing stage to perform text embedding extraction from a pandas DataFrame.

Parameters:
  • c (Config) – Global configuration object (annotated as Any in the signature).

  • stage_config (Dict[str, Any]) – Configuration parameters for the text embedding extractor, validated against EmbedExtractionsSchema.

  • task (str, optional) – The task name for the stage worker function (default: “embed”).

  • task_desc (str, optional) – A descriptor used for latency tracing and logging (default: “text_embed_extraction”).

  • pe_count (int, optional) – Number of process engines to use concurrently (default: 1).

Returns:

A configured stage with a worker function that takes a pandas DataFrame, enriches it with embeddings, and returns a tuple of (pandas DataFrame, trace_info dict).

Return type:

MultiProcessingBaseStage
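The Returns entry describes the worker contract: a pandas DataFrame goes in, and a tuple of (enriched DataFrame, trace_info dict) comes out. Below is a minimal sketch of that contract, not nv_ingest's implementation. It assumes a 'text' input column, a toy deterministic embedding function, a 'dim' config key, and trace key names that are all hypothetical. The names toy_embed, embed_worker, ToyStage, and make_toy_embed_stage are illustrative stand-ins; ToyStage only mimics the role of MultiProcessingBaseStage. The sketch requires pandas.

```python
# Toy sketch of the documented stage contract; NOT nv_ingest's implementation.
import time
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Dict, List, Tuple

import pandas as pd  # third-party dependency


def toy_embed(texts: List[str], dim: int) -> List[List[float]]:
    """Hypothetical stand-in for an embedding model: deterministic, not semantic."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for i, ch in enumerate(text):
            vec[i % dim] += ord(ch) / 1000.0  # fold character codes into dim buckets
        vectors.append(vec)
    return vectors


def embed_worker(
    df: pd.DataFrame,
    stage_config: Dict[str, Any],
    task_desc: str,
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """Enrich a copy of df with an 'embedding' column; return (df, trace_info)."""
    entry_ns = time.perf_counter_ns()
    out = df.copy()  # leave the caller's DataFrame untouched
    dim = stage_config.get("dim", 4)  # hypothetical key, not from EmbedExtractionsSchema
    vectors = toy_embed(out["text"].astype(str).tolist(), dim)  # 'text' column is assumed
    out["embedding"] = pd.Series(vectors, index=out.index, dtype=object)
    trace_info = {
        f"{task_desc}_entry_ns": entry_ns,  # hypothetical trace key names
        f"{task_desc}_exit_ns": time.perf_counter_ns(),
    }
    return out, trace_info


@dataclass
class ToyStage:
    """Hypothetical stand-in for MultiProcessingBaseStage."""
    task: str
    pe_count: int  # stored only; this sketch creates no process pool
    worker: Callable[[pd.DataFrame], Tuple[pd.DataFrame, Dict[str, Any]]]


def make_toy_embed_stage(
    stage_config: Dict[str, Any],
    task: str = "embed",
    task_desc: str = "text_embed_extraction",
    pe_count: int = 1,
) -> ToyStage:
    # Bind the configuration now so callers only pass the DataFrame later.
    worker = partial(embed_worker, stage_config=stage_config, task_desc=task_desc)
    return ToyStage(task=task, pe_count=pe_count, worker=worker)


# Usage: build the toy stage and run its worker on a two-row DataFrame.
stage = make_toy_embed_stage({"dim": 3})
df = pd.DataFrame({"text": ["hello", "world"]})
result_df, trace_info = stage.worker(df)
```

Binding the configuration with functools.partial keeps the per-call interface down to a single DataFrame argument, which matches the shape of the documented worker. The real stage also handles concurrency (pe_count) and latency tracing (task_desc) in ways this sketch does not model.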

Module contents