nv_ingest.stages.nim package
Submodules
nv_ingest.stages.nim.chart_extraction module
- nv_ingest.stages.nim.chart_extraction.generate_chart_extractor_stage(c: Config, stage_config: Dict[str, Any], task: str = 'chart_data_extract', task_desc: str = 'chart_data_extraction', pe_count: int = 1)
Generates a multiprocessing stage to perform chart data extraction from PDF content.
- Parameters:
c (Config) – Morpheus global configuration object.
stage_config (Dict[str, Any]) – Configuration parameters for the chart content extractor, passed as a dictionary validated against the ChartExtractorSchema.
task (str, optional) – The task name for the stage worker function, defining the specific chart extraction process. Default is “chart_data_extract”.
task_desc (str, optional) – A descriptor used for latency tracing and logging during chart extraction. Default is “chart_data_extraction”.
pe_count (int, optional) – The number of process engines to use for chart data extraction. This value controls how many worker processes will run concurrently. Default is 1.
- Returns:
A configured Morpheus stage with an applied worker function that handles chart data extraction from PDF content.
- Return type:
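A minimal sketch of how the call above might be assembled. The helper names `build_chart_stage_kwargs` and `make_chart_stage` are hypothetical and not part of nv_ingest. The keys accepted in `stage_config` are defined by ChartExtractorSchema and are not listed on this page, so the sketch passes the dictionary through unchanged. It also assumes a Morpheus `Config` object has been created elsewhere.

```python
from typing import Any, Dict


def build_chart_stage_kwargs(stage_config: Dict[str, Any], pe_count: int = 1) -> Dict[str, Any]:
    """Collect keyword arguments for generate_chart_extractor_stage (hypothetical helper)."""
    if pe_count < 1:
        raise ValueError("pe_count must be at least 1")
    return {
        "stage_config": dict(stage_config),    # validated later against ChartExtractorSchema
        "task": "chart_data_extract",          # documented default
        "task_desc": "chart_data_extraction",  # documented default
        "pe_count": pe_count,                  # number of process engines
    }


def make_chart_stage(c: Any, stage_config: Dict[str, Any], pe_count: int = 1) -> Any:
    """Build the stage; requires nv_ingest (and Morpheus) to be installed."""
    # Imported lazily so this sketch can be loaded without nv_ingest present.
    from nv_ingest.stages.nim.chart_extraction import generate_chart_extractor_stage

    return generate_chart_extractor_stage(c, **build_chart_stage_kwargs(stage_config, pe_count))
```

Keeping the argument assembly separate from the call makes the chosen values easy to inspect or log before the stage is created.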
nv_ingest.stages.nim.infographic_extraction module
- nv_ingest.stages.nim.infographic_extraction.generate_infographic_extractor_stage(c: Config, stage_config: Dict[str, Any], task: str = 'infographic_data_extract', task_desc: str = 'infographic_data_extraction', pe_count: int = 1)
Generates a multiprocessing stage to perform infographic data extraction from PDF content.
- Parameters:
c (Config) – Morpheus global configuration object.
stage_config (Dict[str, Any]) – Configuration parameters for the infographic content extractor, passed as a dictionary validated against the InfographicExtractorSchema.
task (str, optional) – The task name for the stage worker function, defining the specific infographic extraction process. Default is “infographic_data_extract”.
task_desc (str, optional) – A descriptor used for latency tracing and logging during infographic extraction. Default is “infographic_data_extraction”.
pe_count (int, optional) – The number of process engines to use for infographic data extraction. This value controls how many worker processes will run concurrently. Default is 1.
- Returns:
A configured Morpheus stage with an applied worker function that handles infographic data extraction from PDF content.
- Return type:
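Because `pe_count` sets how many worker processes run concurrently, it can help to bound the value by the CPUs available on the host. The helper below is a hypothetical heuristic, not an nv_ingest policy. The `reserve` parameter, which leaves cores free for the rest of the pipeline, is likewise an assumption made for illustration.

```python
import os


def choose_pe_count(requested: int, reserve: int = 1) -> int:
    """Clamp a requested pe_count to the range [1, available CPUs - reserve]."""
    # os.cpu_count() may return None; fall back to a single CPU in that case.
    available = max(1, (os.cpu_count() or 1) - reserve)
    return max(1, min(requested, available))


# The result could then be passed as pe_count to
# generate_infographic_extractor_stage.
pe_count = choose_pe_count(requested=4)
```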
nv_ingest.stages.nim.table_extraction module
- nv_ingest.stages.nim.table_extraction.generate_table_extractor_stage(c: Config, stage_config: Dict[str, Any], task: str = 'table_data_extract', task_desc: str = 'table_data_extraction', pe_count: int = 1)
Generates a multiprocessing stage to perform table data extraction from PDF content.
- Parameters:
c (Config) – Morpheus global configuration object.
stage_config (Dict[str, Any]) – Configuration parameters for the table content extractor, passed as a dictionary validated against the TableExtractorSchema.
task (str, optional) – The task name for the stage worker function, defining the specific table extraction process. Default is “table_data_extract”.
task_desc (str, optional) – A descriptor used for latency tracing and logging during table extraction. Default is “table_data_extraction”.
pe_count (int, optional) – The number of process engines to use for table data extraction. This value controls how many worker processes will run concurrently. Default is 1.
- Returns:
A configured Morpheus stage with an applied worker function that handles table data extraction from PDF content.
- Return type:
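The three generators on this page share one signature and differ only in their module, function name, and default `task` and `task_desc` values. The sketch below collects those documented values in a lookup table. `StageSpec`, `STAGE_SPECS`, and `resolve_factory` are hypothetical names introduced for illustration, and `resolve_factory` only works where nv_ingest is installed.

```python
import importlib
from typing import Callable, Dict, NamedTuple


class StageSpec(NamedTuple):
    module: str     # dotted module path documented on this page
    factory: str    # generator function name
    task: str       # default task name
    task_desc: str  # default descriptor for tracing and logging


STAGE_SPECS: Dict[str, StageSpec] = {
    "chart": StageSpec(
        "nv_ingest.stages.nim.chart_extraction",
        "generate_chart_extractor_stage",
        "chart_data_extract",
        "chart_data_extraction",
    ),
    "infographic": StageSpec(
        "nv_ingest.stages.nim.infographic_extraction",
        "generate_infographic_extractor_stage",
        "infographic_data_extract",
        "infographic_data_extraction",
    ),
    "table": StageSpec(
        "nv_ingest.stages.nim.table_extraction",
        "generate_table_extractor_stage",
        "table_data_extract",
        "table_data_extraction",
    ),
}


def resolve_factory(kind: str) -> Callable:
    """Import the module for `kind` and return its generator function."""
    spec = STAGE_SPECS[kind]
    module = importlib.import_module(spec.module)  # raises ImportError without nv_ingest
    return getattr(module, spec.factory)
```

A caller could then write `resolve_factory("table")(c, stage_config, pe_count=2)` to build the table stage with its default task names.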