
8.9. Services

Sometimes operators need assistance, or need access to resources that are too expensive, complex, or restricted to run as a short-lived container execution stage. In these cases, we turn to Clara Deploy SDK Pipeline Services (or service definitions). A good example of a pipeline service is NVIDIA TensorRT Inference Server (TRTIS), also known as Triton, which provides client-server inference services over network connections.

Services are much like operators in that they must declare a container and its image. However, they differ in that pipeline definitions cannot reference services directly. Instead, services are referenced by operator declarations in a pipeline.

Additionally, services have a very different lifecycle and life expectancy than operators. Pipeline operators are expected to run, work, and exit as quickly as possible. Pipeline services are guaranteed to persist for at least the lifetime of the pipeline, if not longer. This means processes with time-consuming or expensive startup and/or shutdown costs can be kept available independently of when individual operators in the pipeline start and exit.

Clara Deploy SDK distinguishes individual services by their container property, specifically by the combination of that property's image, tag, and command values. When a pipeline operator declares that it requires a service, the Clara Deploy SDK checks whether it already knows of a running service with the same combination of image, tag, and command. If it does, it provides that service's connection information to the pipeline operator at execution time. If it does not, it starts the service before running any operator in the pipeline.
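The lookup described above can be pictured with a small Python sketch. This is an illustration only, not the Clara Deploy SDK implementation: `ServiceKey`, `ServiceRegistry`, `resolve`, and the placeholder URIs are hypothetical names invented here, and the only behavior taken from the text is that a service is identified by its image, tag, and command together, and is started only when no running match exists.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ServiceKey:
    """Identity of a service: image, tag, and command taken together."""
    image: str
    tag: Optional[str] = None
    command: Tuple[str, ...] = ()


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry (an assumption, not the SDK's API)."""
    running: Dict[ServiceKey, str] = field(default_factory=dict)
    started: int = 0

    def resolve(self, key: ServiceKey) -> str:
        """Return connection info for `key`, starting the service only if needed."""
        if key not in self.running:
            self.started += 1
            # Placeholder URI standing in for real connection information.
            self.running[key] = f"http://service-{self.started}.example:8000"
        return self.running[key]


registry = ServiceRegistry()
trtis = ServiceKey(image="clara/examples/trtis", tag="1.1.10")

first = registry.resolve(trtis)   # no running match yet, so the service is "started"
second = registry.resolve(trtis)  # same image, tag, and command: the running one is reused
other = registry.resolve(ServiceKey(image="clara/examples/trtis", tag="1.2.0"))
```

In this sketch, changing any one of the three values (here the tag) produces a different key, so a second service instance is started rather than the first one being shared.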

Let’s take a look at an example pipeline definition in which an operator declares a service.

# Example of a pipeline with an operator which requires a service (complex!)
api-version: 0.4.0
name: do-things-with-trtis
operators:
  - name: inference
    container:
      image: clara/examples/inference
    services:
      - name: trtis
        # Services, just like operators, are container based; and the
        # operators::services::container::image is the only required
        # property in a service declaration.
        container:
          image: clara/examples/trtis
          tag: 1.1.10
        # operators::services::connections defines how the service expects to
        # be interacted with. Clara Deploy SDK supports network ("http") and
        # volume ("file") connections.
        connections:
          http:
            # The name of the connection is used to populate an environment
            # variable inside the operator's container during execution.
            # The application inside the container can read the variable to
            # know how to connect to the service.
            - name: TRTIS_URI
        # Some services need a specialized or minimal set of hardware. In this case
        # NVIDIA TensorRT Inference Server [TRTIS] requires a GPU to function.
        requests:
          gpu: 1
    input:
      - path: /in
    output:
      - path: /out
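On the operator side, the application reads the environment variable named by the connection (`TRTIS_URI` in the example above) to find the service. The following Python sketch shows one way that might look. The function name `service_endpoint` is hypothetical, and the exact format of the variable's value is an assumption: the sketch accepts either `host:port` or a full `http://host:port` URI because the text above does not specify which is provided.

```python
import os
from urllib.parse import urlparse


def service_endpoint(variable="TRTIS_URI", environ=None):
    """Return (host, port) parsed from the connection environment variable."""
    env = os.environ if environ is None else environ
    value = env.get(variable)
    if not value:
        raise RuntimeError(f"{variable} is not set; is the service declared?")
    # The value format is an assumption: accept "host:port" or "scheme://host:port".
    parsed = urlparse(value if "://" in value else f"//{value}")
    if parsed.hostname is None:
        raise RuntimeError(f"{variable} has an unexpected value: {value!r}")
    return parsed.hostname, parsed.port


# Usage with an explicit mapping, so the sketch runs outside a pipeline too.
host, port = service_endpoint(environ={"TRTIS_URI": "trtis.example:8000"})
```

Passing `environ` explicitly keeps the function easy to exercise locally; inside an operator container it would be called with no arguments so that it reads `os.environ`.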

Notice: Pipeline services are deprecated as of Clara Deploy SDK v0.7.1, and support for pipeline services, including the ability to utilize them as part of a pipeline job, is expected to be dropped in a future release.