nv_ingest.stages.nim package

Submodules

nv_ingest.stages.nim.chart_extraction module

nv_ingest.stages.nim.chart_extraction.generate_chart_extractor_stage(
c: Config,
stage_config: Dict[str, Any],
task: str = 'chart_data_extract',
task_desc: str = 'chart_data_extraction',
pe_count: int = 1,
)

Generates a multiprocessing stage to perform chart data extraction from PDF content.

Parameters:
  • c (Config) – Morpheus global configuration object.

  • stage_config (Dict[str, Any]) – Configuration parameters for the chart content extractor, passed as a dictionary validated against the ChartExtractorSchema.

  • task (str, optional) – The task name for the stage worker function, defining the specific chart extraction process. Default is “chart_data_extract”.

  • task_desc (str, optional) – A descriptor used for latency tracing and logging during chart extraction. Default is “chart_data_extraction”.

  • pe_count (int, optional) – The number of process engines to use for chart data extraction. This value controls how many worker processes will run concurrently. Default is 1.

Returns:

A configured Morpheus stage whose worker function handles chart data extraction from PDF content.

Return type:

MultiProcessingBaseStage
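Each factory in this package takes a plain stage_config dictionary, validates it against a schema, and returns a stage whose worker function uses the validated settings. The following is a minimal pure-Python sketch of that validate-then-bind step, under stated assumptions: HypotheticalExtractorSchema, its two fields, extract_chart_data, and build_worker are illustrative stand-ins, not nv_ingest or Morpheus APIs, and the sketch does not construct a MultiProcessingBaseStage.

```python
import functools
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict


# Hypothetical stand-in for a schema such as ChartExtractorSchema.
# The field names are illustrative only; they are NOT the real schema's fields.
@dataclass(frozen=True)
class HypotheticalExtractorSchema:
    max_queue_size: int = 1
    raise_on_failure: bool = False

    def __post_init__(self) -> None:
        # Reject invalid values the way a schema validator would.
        if self.max_queue_size < 1:
            raise ValueError("max_queue_size must be >= 1")


def validate_stage_config(stage_config: Dict[str, Any]) -> HypotheticalExtractorSchema:
    """Convert a raw dict into a validated config object, rejecting unknown keys."""
    known = {f.name for f in fields(HypotheticalExtractorSchema)}
    unknown = set(stage_config) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return HypotheticalExtractorSchema(**stage_config)


def extract_chart_data(
    payload: Dict[str, Any], validated_config: HypotheticalExtractorSchema
) -> Dict[str, Any]:
    """Hypothetical worker: marks the payload instead of calling a real model."""
    return {**payload, "chart_extracted": True, "strict": validated_config.raise_on_failure}


def build_worker(stage_config: Dict[str, Any]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    # Validate once at build time, then bind the validated config so each
    # call only needs the per-message payload.
    validated = validate_stage_config(stage_config)
    return functools.partial(extract_chart_data, validated_config=validated)


worker = build_worker({"raise_on_failure": True})
print(worker({"page": 3}))  # {'page': 3, 'chart_extracted': True, 'strict': True}
```

Binding the validated config with functools.partial keeps validation at build time rather than per message. This is one plausible design; the description above states only that the dictionary is validated and that the stage carries a worker function.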

nv_ingest.stages.nim.infographic_extraction module

nv_ingest.stages.nim.infographic_extraction.generate_infographic_extractor_stage(
c: Config,
stage_config: Dict[str, Any],
task: str = 'infographic_data_extract',
task_desc: str = 'infographic_data_extraction',
pe_count: int = 1,
)

Generates a multiprocessing stage to perform infographic data extraction from PDF content.

Parameters:
  • c (Config) – Morpheus global configuration object.

  • stage_config (Dict[str, Any]) – Configuration parameters for the infographic content extractor, passed as a dictionary validated against the InfographicExtractorSchema.

  • task (str, optional) – The task name for the stage worker function, defining the specific infographic extraction process. Default is “infographic_data_extract”.

  • task_desc (str, optional) – A descriptor used for latency tracing and logging during infographic extraction. Default is “infographic_data_extraction”.

  • pe_count (int, optional) – The number of process engines to use for infographic data extraction. This value controls how many worker processes will run concurrently. Default is 1.

Returns:

A configured Morpheus stage whose worker function handles infographic data extraction from PDF content.

Return type:

MultiProcessingBaseStage
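The task and task_desc parameters play different roles. task names the extraction work the stage performs, while task_desc labels latency traces and log output. The sketch below models that split with a hypothetical dispatcher, run_stage. The job layout (a "tasks" list of task names) and the trace dictionary are assumptions made for illustration and do not reflect nv_ingest's actual message format.

```python
import time
from typing import Any, Callable, Dict, Optional


def run_stage(
    job: Dict[str, Any],
    worker: Callable[[Dict[str, Any]], Dict[str, Any]],
    task: str = "infographic_data_extract",
    task_desc: str = "infographic_data_extraction",
    trace: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Hypothetical dispatcher: `task` selects the work, `task_desc` labels its timing."""
    # Assumed job layout: a "tasks" list naming the work requested for this job.
    if task not in job.get("tasks", []):
        # This stage's task was not requested, so pass the job through untouched.
        return job
    start = time.perf_counter()
    result = worker(job)
    if trace is not None:
        # Latency is keyed by the descriptor, not by the task name.
        trace[task_desc] = time.perf_counter() - start
    return result


trace: Dict[str, float] = {}
job = {"tasks": ["infographic_data_extract"], "page": 1}
out = run_stage(job, lambda j: {**j, "infographic_extracted": True}, trace=trace)
print(out["infographic_extracted"], list(trace))  # True ['infographic_data_extraction']
```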

nv_ingest.stages.nim.table_extraction module

nv_ingest.stages.nim.table_extraction.generate_table_extractor_stage(
c: Config,
stage_config: Dict[str, Any],
task: str = 'table_data_extract',
task_desc: str = 'table_data_extraction',
pe_count: int = 1,
)

Generates a multiprocessing stage to perform table data extraction from PDF content.

Parameters:
  • c (Config) – Morpheus global configuration object.

  • stage_config (Dict[str, Any]) – Configuration parameters for the table content extractor, passed as a dictionary validated against the TableExtractorSchema.

  • task (str, optional) – The task name for the stage worker function, defining the specific table extraction process. Default is “table_data_extract”.

  • task_desc (str, optional) – A descriptor used for latency tracing and logging during table extraction. Default is “table_data_extraction”.

  • pe_count (int, optional) – The number of process engines to use for table data extraction. This value controls how many worker processes will run concurrently. Default is 1.

Returns:

A configured Morpheus stage whose worker function handles table data extraction from PDF content.

Return type:

MultiProcessingBaseStage
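The pe_count parameter bounds how many workers run at the same time. The sketch below illustrates that bound with hypothetical helpers (run_with_pe_count, ConcurrencyTracker, fake_table_worker) that are not part of nv_ingest. It uses a thread pool instead of worker processes only so that it runs anywhere without pickling concerns. The stage itself, per the description above, runs worker processes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


def run_with_pe_count(
    payloads: List[Payload], worker: Callable[[Payload], Payload], pe_count: int = 1
) -> List[Payload]:
    """Apply `worker` to every payload with at most `pe_count` calls in flight."""
    if pe_count < 1:
        raise ValueError("pe_count must be >= 1")
    # Threads stand in for the stage's worker processes; the bound is the same idea.
    with ThreadPoolExecutor(max_workers=pe_count) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(worker, payloads))


class ConcurrencyTracker:
    """Records the peak number of simultaneously running calls."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def wrap(self, fn: Callable[[Payload], Payload]) -> Callable[[Payload], Payload]:
        def wrapped(payload: Payload) -> Payload:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return fn(payload)
            finally:
                with self._lock:
                    self.active -= 1

        return wrapped


def fake_table_worker(payload: Payload) -> Payload:
    """Hypothetical worker: sleeps briefly instead of calling a table model."""
    time.sleep(0.01)
    return {**payload, "table_extracted": True}


tracker = ConcurrencyTracker()
pages = [{"page": i} for i in range(8)]
results = run_with_pe_count(pages, tracker.wrap(fake_table_worker), pe_count=2)
print(len(results), tracker.peak <= 2)  # 8 True
```

With pe_count=2, at most two payloads are processed at once. Raising pe_count trades more memory and more concurrent model requests for throughput.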

Module contents