nv_ingest_client.primitives.jobs package#
Submodules#
nv_ingest_client.primitives.jobs.job_spec module#
- class nv_ingest_client.primitives.jobs.job_spec.BatchJobSpec(job_specs_or_files: List[JobSpec] | List[str] | None = None)
Bases:
object
A class used to represent a batch of job specifications (JobSpecs).
This class allows for batch processing of multiple jobs, either from a list of JobSpec instances or from file paths. It provides methods for adding job specifications, associating tasks with those specifications, and serializing the batch to a dictionary format.
- _file_type_to_job_spec#
A dictionary that maps document types to a list of JobSpec instances.
- Type:
defaultdict
- add_job_spec(job_spec: JobSpec)
Adds a JobSpec to the batch.
- Parameters:
job_spec (JobSpec) – The job specification to add.
- add_task(task, document_type=None)[source]#
Adds a task to the relevant job specifications in the batch.
If a document_type is provided, the task will be added to all job specifications matching that document type. If no document_type is provided, the task will be added to all job specifications in the batch.
- Parameters:
task (Task) – The task to add. Must derive from the nv_ingest_client.primitives.Task class.
document_type (str, optional) – The document type used to filter job specifications. If not provided, the document_type is inferred from the task, or the task is applied to all job specifications.
- Raises:
ValueError – If the task does not derive from the Task class.
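The filtering rule described for add_task can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: `Task`, `SpecStub`, and `BatchSketch` are hypothetical names defined here, and the sketch leaves out the case where the document type is inferred from the task itself.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Task:
    """Hypothetical stand-in for nv_ingest_client.primitives.Task."""

    def __init__(self, name: str) -> None:
        self.name = name


class SpecStub:
    """Hypothetical stand-in for JobSpec: records the tasks added to it."""

    def __init__(self, document_type: str) -> None:
        self.document_type = document_type
        self.tasks: List[Task] = []

    def add_task(self, task: Task) -> None:
        self.tasks.append(task)


class BatchSketch:
    """Illustrative only; not the library's BatchJobSpec."""

    def __init__(self) -> None:
        # Maps a document type (for example "pdf") to the specs of that type.
        self._file_type_to_job_spec: Dict[str, List[SpecStub]] = defaultdict(list)

    def add_job_spec(self, spec: SpecStub) -> None:
        self._file_type_to_job_spec[spec.document_type].append(spec)

    def add_task(self, task: Task, document_type: Optional[str] = None) -> None:
        if not isinstance(task, Task):
            raise ValueError("Task must derive from the Task class")
        for doc_type, specs in self._file_type_to_job_spec.items():
            # With no filter, every spec in the batch receives the task.
            if document_type is None or doc_type == document_type:
                for spec in specs:
                    spec.add_task(task)


batch = BatchSketch()
pdf_spec, txt_spec = SpecStub("pdf"), SpecStub("txt")
batch.add_job_spec(pdf_spec)
batch.add_job_spec(txt_spec)

batch.add_task(Task("extract"), document_type="pdf")  # only the PDF spec
batch.add_task(Task("embed"))                         # every spec in the batch
```

After these two calls the PDF spec holds both tasks, while the text spec holds only the unfiltered one.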
- property file_types: List[str]#
Returns the list of unique file types present in the batch.
This property retrieves the document types currently stored in the batch’s job specifications.
- Returns:
A list of document types for the jobs in the batch.
- Return type:
List[str]
- classmethod from_dataset(dataset: str, shuffle_dataset: bool = True)[source]#
Class method to create a BatchJobSpec instance from a dataset.
- Parameters:
dataset (str) – The path to the dataset file.
shuffle_dataset (bool, optional) – Whether to shuffle the dataset files before adding them to the batch, by default True.
- Returns:
A new instance of BatchJobSpec initialized with the dataset files.
- Return type:
BatchJobSpec
- from_files(files: str | List[str]) None [source]#
Initializes the batch by generating job specifications from file paths.
- Parameters:
files (Union[str, List[str]]) – A single file path or a list of file paths to create job specifications from.
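One way to picture how file paths end up grouped by document type is to bucket them by file extension. The helper below is a hypothetical sketch: `group_files_by_type` is not part of nv_ingest_client, and using the lowercase extension as the document type is an assumption, not documented behavior.

```python
import os
from collections import defaultdict
from typing import Dict, List, Union


def group_files_by_type(files: Union[str, List[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: bucket paths by lowercase file extension."""
    if isinstance(files, str):
        files = [files]  # accept a single path as well as a list
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        # Assumption: the extension stands in for the document type.
        extension = os.path.splitext(path)[1].lstrip(".").lower()
        grouped[extension].append(path)
    return dict(grouped)


grouped = group_files_by_type(["a.pdf", "b.PDF", "notes.txt"])
file_types = sorted(grouped)
```

The keys of the resulting dictionary play the role that the file_types property plays for a batch.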
- property job_specs: Dict[str, List[str]]#
A property that returns a dictionary of job specs categorized by document type.
- Returns:
A dictionary mapping document types to job specifications.
- Return type:
Dict[str, List[str]]
- property tasks: Dict[str, List[Task]]#
Returns the tasks associated with the batch.
- Returns:
A dictionary mapping string keys to lists of Task instances.
- Return type:
Dict[str, List[Task]]
- class nv_ingest_client.primitives.jobs.job_spec.JobSpec(payload: str | None = None, tasks: List | None = None, source_id: str | None = None, source_name: str | None = None, document_type: str | None = None, extended_options: Dict | None = None)
Bases:
object
Specification for creating a job for submission to the nv-ingest microservice.
- Parameters:
payload (Dict) – The payload data for the job.
tasks (Optional[List], optional) – A list of tasks to be added to the job, by default None.
source_id (Optional[str], optional) – An identifier for the source of the job, by default None.
job_id (Optional[UUID], optional) – A unique identifier for the job, by default a new UUID is generated.
extended_options (Optional[Dict], optional) – Additional options for job processing, by default None.
- _payload#
Storage for the payload data.
- Type:
Dict
- _tasks#
Storage for the list of tasks.
- Type:
List
- _source_id#
Storage for the source identifier.
- Type:
str
- _job_id#
Storage for the job’s unique identifier.
- Type:
UUID
- _extended_options#
Storage for the additional options.
- Type:
Dict
- add_task(task) None [source]#
Adds a task to the job specification.
- Parameters:
task – The task to add to the job specification. Assumes the task has a to_dict method.
- Raises:
ValueError – If the task does not have a to_dict method.
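The to_dict requirement amounts to a duck-typed check before the task is stored. The sketch below illustrates that check under stated assumptions: `JobSpecSketch` and `SplitTaskStub` are hypothetical names, and the error message is invented for the example.

```python
from typing import Any, Dict, List


class JobSpecSketch:
    """Illustrative only; not the library's JobSpec."""

    def __init__(self) -> None:
        self._tasks: List[Any] = []

    def add_task(self, task: Any) -> None:
        # Duck-typed check: any object exposing a callable to_dict is accepted.
        if not callable(getattr(task, "to_dict", None)):
            raise ValueError("Task must have a to_dict method")
        self._tasks.append(task)


class SplitTaskStub:
    """Hypothetical task used only for this example."""

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "split"}


spec = JobSpecSketch()
spec.add_task(SplitTaskStub())
serialized = [task.to_dict() for task in spec._tasks]

try:
    spec.add_task(object())  # a plain object has no to_dict method
    rejected = False
except ValueError:
    rejected = True
```

Checking for the method rather than for a base class keeps the specification independent of any particular task hierarchy.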
- property document_type: str#
- property job_id: UUID#
- property payload: Dict#
- property source_id: str#
- property source_name: str#
nv_ingest_client.primitives.jobs.job_state module#
- class nv_ingest_client.primitives.jobs.job_state.JobState(job_spec: JobSpec, state: JobStateEnum = JobStateEnum.PENDING, future: Future | None = None, response: Dict | None = None, response_channel: str | None = None, trace_id: str | None = None)
Bases:
object
Encapsulates the state information for a job managed by the NvIngestClient.
- state#
The current state of the job.
- Type:
JobStateEnum
- future#
The future object associated with the job’s asynchronous operation.
- Type:
Future, optional
- response#
The response data received for the job.
- Type:
Dict, optional
- response_channel#
The channel through which responses for the job are received.
- Type:
str, optional
- __init__(self, job_spec: JobSpec, state: JobStateEnum = JobStateEnum.PENDING, future: Optional[Future] = None, response: Optional[Dict] = None, response_channel: Optional[str] = None, trace_id: Optional[str] = None)
Initializes a new instance of JobState.
- property future: Future | None#
Gets the future object associated with the job’s asynchronous operation.
- property job_id: UUID | str#
Gets the job’s unique identifier.
- property response: Dict | None#
Gets the response data received for the job.
- property state: JobStateEnum#
Gets the current state of the job.
- property trace_id: str | None#
Gets the trace_id from the job submission.
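How a state record, a future, and a response fit together can be sketched with the standard library alone. Everything here is an assumption made for illustration: `StateSketch` and `JobStateSketch` are hypothetical, only the PENDING state appears in the documentation above, and COMPLETED is invented for the example.

```python
from concurrent.futures import Future
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class StateSketch(Enum):
    """Hypothetical states; COMPLETED is assumed for illustration."""

    PENDING = auto()
    COMPLETED = auto()


@dataclass
class JobStateSketch:
    """Illustrative only; not the library's JobState."""

    job_id: str
    state: StateSketch = StateSketch.PENDING
    future: Optional[Future] = None
    response: Optional[Dict] = None
    trace_id: Optional[str] = None


job = JobStateSketch(job_id="job-1", future=Future())
initial = job.state

# Simulate the asynchronous operation finishing, then record its result.
job.future.set_result({"status": "ok"})
job.response = job.future.result()
job.state = StateSketch.COMPLETED
```

The record starts in the pending state and is updated once the future resolves.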
Module contents#
- class nv_ingest_client.primitives.jobs.BatchJobSpec(job_specs_or_files: List[JobSpec] | List[str] | None = None)
Bases:
object
A class used to represent a batch of job specifications (JobSpecs).
This class allows for batch processing of multiple jobs, either from a list of JobSpec instances or from file paths. It provides methods for adding job specifications, associating tasks with those specifications, and serializing the batch to a dictionary format.
- _file_type_to_job_spec#
A dictionary that maps document types to a list of JobSpec instances.
- Type:
defaultdict
- add_job_spec(job_spec: JobSpec)
Adds a JobSpec to the batch.
- Parameters:
job_spec (JobSpec) – The job specification to add.
- add_task(task, document_type=None)[source]#
Adds a task to the relevant job specifications in the batch.
If a document_type is provided, the task will be added to all job specifications matching that document type. If no document_type is provided, the task will be added to all job specifications in the batch.
- Parameters:
task (Task) – The task to add. Must derive from the nv_ingest_client.primitives.Task class.
document_type (str, optional) – The document type used to filter job specifications. If not provided, the document_type is inferred from the task, or the task is applied to all job specifications.
- Raises:
ValueError – If the task does not derive from the Task class.
- property file_types: List[str]#
Returns the list of unique file types present in the batch.
This property retrieves the document types currently stored in the batch’s job specifications.
- Returns:
A list of document types for the jobs in the batch.
- Return type:
List[str]
- classmethod from_dataset(dataset: str, shuffle_dataset: bool = True)[source]#
Class method to create a BatchJobSpec instance from a dataset.
- Parameters:
dataset (str) – The path to the dataset file.
shuffle_dataset (bool, optional) – Whether to shuffle the dataset files before adding them to the batch, by default True.
- Returns:
A new instance of BatchJobSpec initialized with the dataset files.
- Return type:
BatchJobSpec
- from_files(files: str | List[str]) None [source]#
Initializes the batch by generating job specifications from file paths.
- Parameters:
files (Union[str, List[str]]) – A single file path or a list of file paths to create job specifications from.
- property job_specs: Dict[str, List[str]]#
A property that returns a dictionary of job specs categorized by document type.
- Returns:
A dictionary mapping document types to job specifications.
- Return type:
Dict[str, List[str]]
- property tasks: Dict[str, List[Task]]#
Returns the tasks associated with the batch.
- Returns:
A dictionary mapping string keys to lists of Task instances.
- Return type:
Dict[str, List[Task]]
- class nv_ingest_client.primitives.jobs.JobSpec(payload: str | None = None, tasks: List | None = None, source_id: str | None = None, source_name: str | None = None, document_type: str | None = None, extended_options: Dict | None = None)
Bases:
object
Specification for creating a job for submission to the nv-ingest microservice.
- Parameters:
payload (Dict) – The payload data for the job.
tasks (Optional[List], optional) – A list of tasks to be added to the job, by default None.
source_id (Optional[str], optional) – An identifier for the source of the job, by default None.
job_id (Optional[UUID], optional) – A unique identifier for the job, by default a new UUID is generated.
extended_options (Optional[Dict], optional) – Additional options for job processing, by default None.
- _payload#
Storage for the payload data.
- Type:
Dict
- _tasks#
Storage for the list of tasks.
- Type:
List
- _source_id#
Storage for the source identifier.
- Type:
str
- _job_id#
Storage for the job’s unique identifier.
- Type:
UUID
- _extended_options#
Storage for the additional options.
- Type:
Dict
- add_task(task) None [source]#
Adds a task to the job specification.
- Parameters:
task – The task to add to the job specification. Assumes the task has a to_dict method.
- Raises:
ValueError – If the task does not have a to_dict method.
- property document_type: str#
- property job_id: UUID#
- property payload: Dict#
- property source_id: str#
- property source_name: str#
- class nv_ingest_client.primitives.jobs.JobState(job_spec: JobSpec, state: JobStateEnum = JobStateEnum.PENDING, future: Future | None = None, response: Dict | None = None, response_channel: str | None = None, trace_id: str | None = None)
Bases:
object
Encapsulates the state information for a job managed by the NvIngestClient.
- state#
The current state of the job.
- Type:
JobStateEnum
- future#
The future object associated with the job’s asynchronous operation.
- Type:
Future, optional
- response#
The response data received for the job.
- Type:
Dict, optional
- response_channel#
The channel through which responses for the job are received.
- Type:
str, optional
- __init__(self, job_spec: JobSpec, state: JobStateEnum = JobStateEnum.PENDING, future: Optional[Future] = None, response: Optional[Dict] = None, response_channel: Optional[str] = None, trace_id: Optional[str] = None)
Initializes a new instance of JobState.
- property future: Future | None#
Gets the future object associated with the job’s asynchronous operation.
- property job_id: UUID | str#
Gets the job’s unique identifier.
- property response: Dict | None#
Gets the response data received for the job.
- property state: JobStateEnum#
Gets the current state of the job.
- property trace_id: str | None#
Gets the trace_id from the job submission.