nv_ingest.stages.embeddings package
Submodules
nv_ingest.stages.embeddings.text_embeddings module
nv_ingest.stages.embeddings.text_embeddings.generate_text_embed_extractor_stage(
    c: Any,
    stage_config: Dict[str, Any],
    task: str = 'embed',
    task_desc: str = 'text_embed_extraction',
    pe_count: int = 1,
)
Generates a multiprocessing stage to perform text embedding extraction from a pandas DataFrame.
- Parameters:
c (Config) – Global configuration object.
stage_config (Dict[str, Any]) – Configuration parameters for the text embedding extractor, validated against EmbedExtractionsSchema.
task (str, optional) – The task name for the stage worker function (default: “embed”).
task_desc (str, optional) – A descriptor used for latency tracing and logging (default: “text_embed_extraction”).
pe_count (int, optional) – Number of process engines to use concurrently (default: 1).
- Returns:
A configured stage with a worker function that takes a pandas DataFrame, enriches it with embeddings, and returns a tuple of (pandas DataFrame, trace_info dict).
- Return type:
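
The worker contract described under Returns (a pandas DataFrame in, a tuple of enriched DataFrame and trace_info dict out) can be illustrated with a minimal stand-alone sketch. This is not the nv_ingest implementation: `toy_embed`, `embed_worker`, the `text` and `embedding` column names, and the layout of the trace dictionary are all hypothetical, chosen only to show the shape of the inputs and outputs.

```python
import time
from typing import Any, Dict, List, Tuple

import pandas as pd


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a call to a real embedding model."""
    # Deterministic toy vector built from character codes; not a real embedding.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def embed_worker(df: pd.DataFrame) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """Mimic the documented contract: DataFrame in, (DataFrame, trace_info) out."""
    entry_ns = time.time_ns()
    out = df.copy()  # leave the caller's DataFrame untouched
    # Assumed column names: "text" holds the input, "embedding" the result.
    out["embedding"] = out["text"].map(toy_embed)
    # Assumed trace layout, keyed by a descriptor such as task_desc.
    trace_info = {
        "text_embed_extraction": {"entry_ns": entry_ns, "exit_ns": time.time_ns()}
    }
    return out, trace_info


df = pd.DataFrame({"text": ["hello", "world"]})
enriched, trace = embed_worker(df)
```

In the real stage, the worker is built by `generate_text_embed_extractor_stage` from `stage_config` and runs across `pe_count` processes; the sketch covers only the per-DataFrame input and output shape.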