5. Core Concepts
This section describes the core concepts of the Clara Deploy SDK. At its core, the Clara Deploy SDK provides an engine that enables packaging of an individual step in a medical imaging processing pipeline as a container. In addition, it provides a flexible mechanism for specifying constraints between the steps in a pipeline, as well as artifact management for linking the output of any step to the input of subsequent steps. The ability to containerize individual steps of a pipeline, combined with the power of artifact management, gives medical imaging application developers portability, scalability, and simplicity.
In mathematics, particularly graph theory, a directed acyclic graph (DAG /ˈdæɡ/) is a finite directed graph with no directed cycles. It consists of finitely many vertices and edges, with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently directed sequence of edges that eventually loops back to v again. Equivalently, a DAG is a directed graph that has a topological ordering: a sequence of the vertices such that every edge is directed from earlier to later in the sequence. A Clara pipeline is modeled as a DAG whose vertices are operators and whose edges represent the data dependencies between them.
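To make the idea concrete, the following minimal Python sketch (using the standard library's graphlib module) shows how a three-operator pipeline can be expressed as a DAG and ordered topologically; the operator names are illustrative placeholders, not identifiers defined by the SDK.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each operator maps to the set of operators whose output it depends on.
pipeline_dag = {
    "dicom-reader": set(),
    "ai-inference": {"dicom-reader"},
    "dicom-writer": {"ai-inference"},
}

# A topological ordering guarantees an operator runs only after every
# operator it depends on has completed.
execution_order = list(TopologicalSorter(pipeline_dag).static_order())
print(execution_order)  # ['dicom-reader', 'ai-inference', 'dicom-writer']
```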
A Data Source acts as the origination point of medical imaging data used in applications developed for the Clara Deploy SDK. One common type of input data used for Clara is DICOM objects representing medical images and associated information. Often such DICOM data is organized in a picture archiving and communication system (PACS) that provides storage and convenient access to images from multiple modalities. PACS is the primary data source for applications developed for the Clara Deploy SDK. Medical images and other related information can be transmitted from a PACS Data Source to the DICOM Adapter component of the Clara Deploy SDK using the DICOM networking protocol. In such a scenario, PACS performs the role of a Service Class User (SCU) and the Clara DICOM Adapter acts as a Service Class Provider (SCP).
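As an illustration of this SCU/SCP exchange, the sketch below uses the open-source pydicom and pynetdicom libraries to push a single DICOM object to a Storage SCP; the host name, port, file name, and AE Titles are placeholder assumptions, not values defined by the Clara Deploy SDK.

```python
from pydicom import dcmread
from pynetdicom import AE
from pynetdicom.sop_class import CTImageStorage

# Act as a DICOM Storage SCU and push one image to an SCP
# (here, a Clara DICOM Adapter at a placeholder address).
ae = AE(ae_title="PACS_SCU")
ae.add_requested_context(CTImageStorage)

assoc = ae.associate("clara-dicom-adapter.local", 104, ae_title="CLARA_AE")
if assoc.is_established:
    dataset = dcmread("ct_slice.dcm")        # placeholder DICOM file
    status = assoc.send_c_store(dataset)     # C-STORE request
    if status:
        print(f"C-STORE completed with status 0x{status.Status:04X}")
    assoc.release()
else:
    print("Association with the DICOM Adapter was rejected or timed out")
```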
An operator is an element of a Clara Deploy pipeline. Each operator is typically designed to perform a specific function or analysis on incoming input data. Common examples of such functions are reading images from disk, performing image processing, performing AI inference, and writing images to disk. The Clara Deploy SDK comes with a bundled set of operators. A bundled operator is delivered as a container that packages code and dependencies so that the operator can be executed reliably across computing environments. Users can also build their own custom operators. Operator definitions include the following (an illustrative sketch follows the list):
URI and Tag required to retrieve the operator’s container.
Set of required services needed by the operator.
Set of required resources (CPU, GPU, other) needed to execute the operator.
Time limit applied to the operator, after which it is to be considered non-responsive and its resources reclaimed.
Human-friendly name of the operator.
Human-friendly description of the operator.
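As a rough mental model, the information above can be pictured as a simple record. The Python dataclass below is only an illustrative sketch under that assumption; the field names and defaults are invented for the example and do not represent the actual Clara Pipeline Definition Language.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OperatorDefinition:
    """Illustrative record of what an operator definition carries;
    field names are assumptions, not the actual Clara schema."""
    name: str                                   # human-friendly name
    description: str                            # human-friendly description
    container_uri: str                          # where to pull the container from
    container_tag: str                          # which image tag to use
    services: List[str] = field(default_factory=list)  # required services
    cpu: int = 1                                # required CPU cores
    gpu: int = 0                                # required GPUs
    timeout_seconds: int = 600                  # after this, the operator is
                                                # considered non-responsive

reader = OperatorDefinition(
    name="dicom-reader",
    description="Converts an incoming DICOM series into a volume image",
    container_uri="registry.example.com/dicom-reader",
    container_tag="latest",
)
```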
A Pipeline is a collection of operators configured to work together to execute a medical imaging workflow. Each step in a pipeline is an operator deployed as a container. Using the Pipeline Definition Language, users can define a multi-step pipeline as a sequence of tasks and capture the dependencies between tasks. For example, a pipeline could have the following operators connected in sequence: a DICOM Reader Operator, an AI Inference Operator, and a DICOM Writer Operator. During execution of the pipeline, the Clara Platform engine orchestrates feeding the requisite data to each operator in the pipeline. In this example, incoming DICOM images from a registered Data Source are made available to the DICOM Reader Operator. When the DICOM Reader Operator finishes execution, its output is fed to the AI Inference Operator. Similarly, output from the AI Inference Operator is provided to the DICOM Writer Operator. Finally, the output from the DICOM Writer Operator can be pushed to a PACS DICOM Storage service.
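The sketch below illustrates this artifact-chaining idea in plain Python, assuming each operator is a container that reads from an input directory and writes to an output directory; the image names, paths, and the /input and /output convention are assumptions made for illustration, not the Clara Platform engine's actual orchestration mechanism.

```python
import subprocess
import tempfile
from pathlib import Path

# Placeholder container images for the three pipeline steps.
steps = [
    "example/dicom-reader:latest",
    "example/ai-inference:latest",
    "example/dicom-writer:latest",
]

current_input = Path("/data/incoming-dicom")   # data from the registered Data Source
for image in steps:
    output_dir = Path(tempfile.mkdtemp(prefix="clara-step-"))
    # Run the operator container, mounting the previous step's output
    # as this step's read-only input.
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{current_input}:/input:ro",
         "-v", f"{output_dir}:/output",
         image],
        check=True,
    )
    current_input = output_dir                 # chain the artifact forward
```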
A Service is an abstraction that defines a mechanism to access a computational facility available in the Clara Platform. Operators defined in a Clara pipeline can make use of one or more services at run time. For example, consider a stateless AI inference backend running with multiple replicas. Those replicas are fungible: an application that needs inference as a service does not care which backend it uses. While the actual backends that compose the set may change, the frontend clients should not need to be aware of that, nor should they need to keep track of the set of backends themselves. The Service abstraction enables this decoupling (a brief sketch follows the list below). Service definitions include:
URI and Tag required to retrieve the service’s container.
Set of required resources (CPU, GPU, other) needed to run the service.
Indicator of whether the service is long-lived or should be recycled when the pipeline completes.
The type of connection supplied by the service, and therefore the connection information supplied to any operator requesting it.
Human-friendly name of the service.
Human-friendly description of the service.
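To make the decoupling concrete, the following minimal sketch shows a client calling an inference service through a single stable name, leaving it to the platform to route the request to any healthy replica; the service URL, port, and request payload are hypothetical.

```python
import json
import urllib.request

# Clients address the service by one stable name; the platform routes the
# request to whichever healthy backend replica is available.
SERVICE_URL = "http://ai-inference-service:8000/infer"   # placeholder endpoint

payload = json.dumps({"study_uid": "1.2.3.4"}).encode("utf-8")  # hypothetical body
request = urllib.request.Request(
    SERVICE_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
    print(result)
```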
Once a pipeline is defined, it can be published so that when appropriate data is sent to the Clara Deploy SDK, an instance of the pipeline is instantiated as a Job and the incoming data is processed according to the pipeline's configuration. A Job is an instance of a Pipeline executed for a specific dataset.
Once a pipeline is published, there needs to be a mechanism to trigger it so that a job is created to process appropriate incoming data sent to the Clara Platform. The primary way to trigger a pipeline and instantiate a job is via the DICOM Adapter component of the Clara Deploy SDK. In a typical scenario, images from an external DICOM Store SCU (such as a PACS) are sent to a specific Application Entity (AE) Title of the Clara DICOM Adapter. A Clara application developer can configure the adapter so that an AE Title is linked to a specific pipeline UID managed inside the Clara Platform. The end result is that whenever data is sent to that AE Title, the corresponding linked pipeline is triggered and a job is created to process the incoming data.
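Conceptually, this linkage behaves like a lookup from AE Title to pipeline UID, as in the hypothetical sketch below; the titles, UIDs, and callback function are invented for illustration and are not the DICOM Adapter's actual configuration format.

```python
# Placeholder mapping of AE Titles to published pipeline UIDs.
AE_TITLE_TO_PIPELINE = {
    "LIVER_SEG_AE": "pipeline-uid-0001",
    "LUNG_SEG_AE": "pipeline-uid-0002",
}

def on_study_received(called_ae_title: str, study_path: str) -> None:
    """Hypothetical callback: when a study arrives at an AE Title,
    create a job for the pipeline linked to that title."""
    pipeline_uid = AE_TITLE_TO_PIPELINE.get(called_ae_title)
    if pipeline_uid is None:
        print(f"No pipeline is linked to AE Title {called_ae_title}")
        return
    print(f"Creating a job for pipeline {pipeline_uid} to process {study_path}")
```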
A Data Sink acts as the destination point where final results from Clara pipelines are stored. Often output from a Clara pipeline is packaged in one or more DICOM objects and transferred to a PACS for further consumption by medical image processing applications and viewers. In this scenario, the DICOM Adapter component of the Clara Deploy SDK acts as a Service Class User (SCU) and the receiving PACS acts as a Service Class Provider (SCP).