21. Release Notes
21.1.1.What’s New
21.1.1.1.Security Patches
Security patching of libraries within the following components/containers:
Clara Platform container image
Render Server
Clara CLI
NodeMonitor Container image
App Base inference v2
AI Spleen
AI Pancreas Tumor
AI Chest Xray
AI Brain Tumor
AI Lung Tumor
AI Liver Tumor
AI Covid 19
AI Prostate
Results Service
The version of Kubernetes that is installed with Clara (v1.19.4) is affected by CVE-2021-25741. To mitigate this vulnerability without upgrading kubelet, do the following:
# First, find the name of the API-server deployment:
kubectl -n kube-system get deploy | grep apiserver
# Then, edit the deployment:
kubectl -n kube-system edit deploy <API-server Name>
# Add the following flag to the container command spec:
--feature-gates="VolumeSubpath=false"
# Save and quit.
# Next, add the same feature gate to the kubelet startup arguments
# (for example, via KUBELET_EXTRA_ARGS in /etc/default/kubelet):
--feature-gates="VolumeSubpath=false"
# Finally, restart kubelet so the flag takes effect:
sudo systemctl restart kubelet
This mitigates the host-path CVE in place; any pods using the subpath feature will need to be removed or restarted. See the upstream Kubernetes advisory for CVE-2021-25741 for more information.
21.1.1.1.1.Known Issues
21.1.2.Spleen Operator
The spleen-segmentation operator sometimes exits due to an out-of-memory (OOM) condition. The job is subsequently terminated by Clara Platform on reaching the job timeout. However, the spleen operator's status is never set to Stopped, which wrongly gives the impression that the job is still running within Clara Console.
21.2.1.What’s New
21.2.1.1.Ubuntu 20.04 Support
Clara now supports Ubuntu 20.04. Backwards compatibility with Ubuntu 18.04 is still maintained.
21.2.1.2.Kubernetes and Helm Version Update
The Kubernetes version has been upgraded from v1.15 to v1.19, and the version of Helm used has been updated from v2 to v3. Upgrading from prior versions of Clara requires uninstalling existing instances of Clara.
21.2.1.3.Advanced DICOM Integration Pipeline
This new reference pipeline demonstrates how to use the enhanced DICOM Parser to parse a multi-series DICOM study, provided as the input payload to a Clara pipeline, and select one or more specific series from it.
21.2.1.4.Render SDK
The Clara Render SDK contains the rendering technology used by Clara. The volume renderer is provided as a shared library that can be used directly from C++ code, or as a microservice through gRPC from many other programming languages. The functionality to implement remote rendering is also provided in source-code form, so you can create your own render server. The SDK additionally contains examples showing how to connect to the Clara volume renderer from a browser using gRPC or directly through the provided C++ interface, how to implement a remote render server, and how to stream volume data to be rendered.
21.2.1.5.Reference Applications with Model Repository Support
The following reference applications now support registering and loading models using the Model Repository. Please refer to the setup instructions on NGC for more information.
AI Brain Tumor
AI Chest X-Ray
AI Colon Tumor
AI Liver Tumor
AI Spleen Tumor
21.2.1.5.1.Issues Fixed and Enhancements
21.2.2.DICOM Parser Supporting Series Selection Rules
The DICOM Parser has been enhanced to support user-defined series selection rules, supplied either via a configuration file or via a well-known environment variable whose value is the Base64-encoded rules. The selected series and the converted image file names are saved in a well-known JSON file for downstream consumers.
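For illustration, rules authored in a local JSON file can be Base64 encoded with standard tooling before being supplied to the operator. The variable name below is a placeholder; use the well-known name from the DICOM Parser documentation:
# Base64-encode the selection rules on a single line (no wrapping);
# SELECTION_RULES_B64 is a placeholder variable name
export SELECTION_RULES_B64=$(base64 -w 0 series-selection-rules.json)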
21.2.3.Series Selector Supporting Series Selection Rules via Environment Variable
The Series Selector has been enhanced to support passing selection rules, encoded as a Base64 string, via a well-known environment variable.
21.2.4.Operators Supporting Selected Images Metadata
A number of operators have been enhanced to support the selected images metadata through a well-known JSON file generated by the DICOM Parser or Series Selector. These include AI inference operators built on app_base_inference_v2, DICOM Segmentation Writer, DICOM RT Struct Writer, and DICOM Report Writers.
21.2.5.Reference Pipeline Definition Specifying System Memory Request
Due to the Clara Platform Server change to limit a pipeline operator’s system RAM usage to a default of 1 GiB, reference pipeline definitions have been updated to request RAM when an operator’s RAM usage is anticipated to be over the default limit.
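For example, an operator anticipated to need more than the 1 GiB default can declare a request in its pipeline definition, following the resource-request sample shown later in these notes (the operator name here is illustrative; memory is in MB):
operators:
  - name: dicom-reader
    requests:
      memory: 4096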
21.2.6.Fixed: COVID Lung Lesion DICOM Segmentation Image Orientation Issue
The Lesion operator in the COVID-19 pipeline loads the input image, in NIfTI format, as closest canonical, and then creates tensors. The image writer in this operator is then required to revert the orientation of the image generated from the output tensors in order to align with the input DICOM anatomical orientation. The setting to revert to the original orientation must be explicitly set in the writers section of the inference configuration.
It was found that in Release 0.7 the setting to revert to the original orientation was missing in the inference configuration, and as a result the generated DICOM Segmentation of the output image was of wrong orientation.
The issue has been fixed in R0.8.1. The workaround in R0.7 is to remove the "as_closest_canonical": true setting under loadNifti in the config_inference.json of the Lesion operator.
21.2.7.Platform Server Now Performs Clean-up as Part of its Start-up Routine
As part of its start-up routine, Clara Platform Server will shut down, reclaim, and/or delete any pipeline-jobs and pipeline-services which were deployed by a previous instance of the server.
Pipeline-jobs stopped in this manner will be assigned JOB_STATE_STOPPED and JOB_STATUS_EVICTED.
Additionally, any physical payload storage which does not have a corresponding payload record will be removed from the storage device.
21.2.8.Long Running Pipeline-jobs Will be Timed Out and Terminated
Pipeline-jobs which utilize the Clara Orchestrator and which exceed the pipeline timeout will be terminated. Pipeline-jobs which time out will be assigned JOB_STATE_STOPPED and JOB_STATUS_TERMINATED.
Pipeline-job timeout is ten minutes by default and can be configured via the pipelineTimeoutDefault value in the Clara Platform Server Helm chart's values.yaml file.
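A sketch of the override in the chart's values.yaml follows; the placement and units should be confirmed against the shipped chart:
# Clara Platform Server Helm chart, values.yaml (sketch)
pipelineTimeoutDefault: 1200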
21.2.9.Memory and CPU Utilization Limits are Now Applied to Pipelines
Pipeline-jobs have limits applied to the amount of memory they can consume and the amount of CPU resources they can utilize. The default values are 1 GB of memory and 1 CPU core, and can be adjusted via the defaultCpuLimitForOperator and defaultMemoryLimitForOperator values in the Clara Platform Server Helm chart's values.yaml file.
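A sketch of raising the defaults in values.yaml follows (option names as given above; placement within the file should be confirmed against the shipped chart):
# Clara Platform Server Helm chart, values.yaml (sketch)
defaultCpuLimitForOperator: 2        # CPU cores
defaultMemoryLimitForOperator: 2048  # assumed to be in MB, matching the operator memory request units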
When an application attempts to use more CPU cores than allocated, it will be throttled so that the sum of all CPU core utilization does not exceed the allocation. When it exceeds its memory allocation, out-of-memory (OOM) errors will occur in the application; if the application is unable to handle them, it will be terminated.
A sample for resource requests for a pipeline operator is as follows:
operators:
  - name: sample-operator
    requests:
      cpu: 3
      gpu: 1
      memory: 8192
cpu: 3 means the operator will be assigned three CPU cores for its exclusive use.
memory: 8192 means the operator will be allocated 8 GB (= 8192 MB) of memory for its exclusive use. When an application exceeds its memory allocation, it will encounter out-of-memory errors and can be terminated by the operating system.
gpu: 1 means the operator will be assigned one GPU for its exclusive use.
For more details regarding adding resource requests to pipeline definitions, refer to the documentation for Pipeline Definition Language Resource Requests.
21.2.10.Creation of New Pipeline-jobs Will be Rejected When Storage Space is Low
When the available storage space falls below a critical threshold, requests to create new pipeline-jobs will be rejected. Rejected creation requests will receive an error code of -8,450 with a message of "Payload creation failed." from the Platform GRPC API.
The default threshold is 80% of the storage device's capacity and can be adjusted via the maxStorageUsagePercentage value in the Clara Platform Server Helm chart's values.yaml file.
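For example, to reject new pipeline-jobs earlier, lower the threshold in values.yaml (a sketch; confirm placement against the shipped chart):
# Clara Platform Server Helm chart, values.yaml (sketch)
maxStorageUsagePercentage: 70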
21.2.11.Clara Platform Server Removes Stopped Pods More Quickly
Clara Platform Server will delete stopped pipeline-job pods more quickly than it did in previous releases. This is possible because of the availability of the Jobs::ReadLogs API and the clara logs CLI command, which enable users to download pipeline-job logs after the pipeline-job pod has been deleted.
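For example, logs for a single operator remain retrievable after its pod has been removed:
# Fetch logs for one operator of a job, even after the job pod was deleted
clara logs -j <job-id> -o <operator-name>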
21.2.12.Payloads Download Feature in Clara Management Console Can Now Be Disabled
Clara Management Console allows admins to disable the payload download feature for one or more users. Users with the payload download feature disabled will not see links to download payloads in the Jobs View and Job Details View.
21.2.13.Ansible Uninstall Support
Clara Ansible playbooks have been enhanced with an uninstall option to remove Clara and all components added via the playbooks.
21.2.14.Removal of Bootstrap.sh
The Clara Bootstrap install script has been removed in this release; the new default installer is Clara Ansible.
21.2.15.Memory and CPU Utilization Limits are Now Applied to Pipelines
See details in Issues Fixed and Enhancements.
21.2.16.Creation of New Pipeline-jobs Will be Rejected When Storage Space is Low
See details in Issues Fixed and Enhancements.
21.2.17.Platform Pod Eviction Due To Low Ephemeral Storage
When pipeline operators write to folders which are not specified as output folders in their pipeline definition, the data gets written to temporary storage under /var/lib/docker/. When the partition which contains /var/lib/docker/ runs out of space, Kubernetes will begin evicting pods (running containers) in an attempt to free storage space.
If the platformapiserver pod is evicted, any queued pipeline-jobs will be lost, and when a new platform pod comes up, any running jobs will be terminated (faulted). If a pipeline-job pod is evicted, Clara Platform will not be notified by Kubernetes, and any resources associated with the job will remain occupied until the job times out (default is 10 minutes). Additionally, the pipeline-job will be considered faulted by Clara Platform.
21.2.17.1.Recommendations
To prevent such issues, it is recommended to have at least 100 GB of space in the partition containing the /var/lib/docker directory.
It is also recommended that individual operators do not write to any paths except the output paths specified in their pipeline definition.
21.2.18.Clara GPU Profiler
Nsight Systems profiling (for an operator) may fail on systems using a CUDA version other than 10.2.
21.2.19.Clara Pipeline Driver
Access violations related to the signaling mechanism used by the Clara Pipeline Driver can cause a pipeline operator to fail suddenly, leaving the operator unable to signal awaiting operators and the pipeline "stuck".
Platform Server will eventually terminate the pipeline once its timeout has been reached.
This issue has been extremely rare: internal testing has shown it happens less frequently than 1 in 100,000 pipeline-job executions.
If you believe this issue is affecting your pipeline-jobs, you can enable tracing from the Clara Pipeline Driver by adding
variables:
  CLARA_TRACE=2
to each operator in the affected pipeline definition. You can then use the clara logs -j <job-id> -o <operator-name> command to fetch the logs from the job's operator. The logs should contain trace entries from the Clara Pipeline Driver. If you do not see an entry similar to the example below near the end of the operator's logs, then you might be experiencing this issue.
[1621961708.720][driver.cpp#1501](dicom-reader) synchronization point "/waitlock-0098b2ef-dicom-reader" unlocked
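To check for the synchronization entry, the log fetch can be piped through grep (a simple illustration using the CLI command named above):
# Fetch the operator's logs and look for Clara Pipeline Driver synchronization entries
clara logs -j <job-id> -o <operator-name> | grep "synchronization point"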
21.2.20.CLI - State Filter in List jobs command
clara list jobs fails to apply state filters when used without status filters.
# With state flags only, all filtering is ignored
clara list jobs --pending --running --stopped
# With state and status flags, all filtering works as intended
clara list jobs --pending --running --stopped --healthy
clara list jobs --faulted --healthy --canceled --evicted --terminated
clara list jobs --all
21.2.21.Data Corruption when Using Pipeline-Services
Data corruption is a known risk when combining pipeline-services which utilize GPU resources with other pipeline-services which utilize GPU resources, or with Model Repository deployed Triton Inference Servers. The recommendation is to not mix pipelines dependent on pipeline-services with pipelines dependent on Model Repository.
21.3.1.What’s New
21.3.1.1.Clara GPU Profiler
Nsight Systems time series graph: Nsight Systems time series data is extracted and integrated with the Clara Deploy GPU Profiler. The Bone operator is prepared with the Nsight Systems profiler and can be executed via the bone segmentation pipeline; the operator and the pipeline are part of the GPU Profiler. The capability to compare results with and without Clara Platform for the Bone operator is integrated with the profiler.
Clara GPU Profiler Workflow: This version of the Clara Deploy GPU Profiler enables the Profiler Workflow.
Multi-Level Profiling: Multi-level profiling is part of the Profiler Workflow and is integrated with the tool.
Multi-GPU tracking: This version of the profiler supports multi-GPU tracking and monitoring.
21.3.1.2.Render Server - displaying orientation labels in slice views
Render Server now displays orientation labels (Left (L), Right (R), Posterior (P), Anterior (A), Head (H), Foot (F)) in slice views.
21.3.1.3.Jupyter notebook based interactive rendering
Added the Clara Render Server Jupyter Notebook Widget, which provides access to realistic visualization of 3D medical data from a Jupyter Notebook. The widget can either use volume data from provided numpy arrays or visualize data provided by the Dataset Service. It provides the same user interaction (rotate, zoom, pan) as the Clara UI. It is also possible to access all the render settings, such as cameras, transfer functions, or lights.
21.3.1.4.Clara Management Console: Ability to view datasets which were not pre-registered
Users can now successfully visualize input/output artifacts from a Job using Render Server inside Clara Management Console, even if such data artifacts were not registered with the Result Service for rendering.
21.3.1.5.DICOM Adapter Open Source
DICOM Adapter is now an open-source project on GitHub. NVIDIA welcomes contributions, feature requests, and bug reports: please use the Issues section to give us feedback, request new features, or report bugs you may have. For this release, please continue to use Clara DICOM Adapter version 0.7.3; no upgrade is required.
21.3.1.6.Surface Mesh in STL Format in DICOM RTSTRUCT Instance
A surface mesh in STL binary format is now added to the DICOM RTSTRUCT instance generated by the Clara Deploy DICOM RT Structure Set writer. New functionality includes algorithms to generate the surface mesh of the segmentation image and to encode the mesh in binary form compatible with the STL file format. The binary STL data is included in a private tag in the DICOM RT Structure Set: tag (0011,1001) has the VR of OB and its value is the binary data of the STL file, while (0011,0010) is the Private Creator Data Element.
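One way to confirm the private tag is present is DCMTK's dcmdump, which is not part of Clara and is shown here only as an illustration:
# Print the Private Creator element and the private STL tag from the generated RT Structure Set
dcmdump +P 0011,0010 +P 0011,1001 rtstruct.dcm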
21.3.1.6.1.Issues Fixed and Enhancements
21.3.2.Clara Management Console: Enhanced Jobs filtering
Clara Management Console now has custom filters in the Jobs List View to support filtering of jobs on criteria such as Status, Priority, Start Time, and Duration.
21.3.3.Clara Management Console: Auto Log Off
Clara Management Console now supports automatic log-off after detecting 15 minutes of user inactivity.
21.3.4.Using different environment variables for the same pipeline service can confuse Clara Platform API Server
Our reference pipelines use the NVIDIA_CLARA_TRTISURI environment variable name for the connection name of the Triton pipeline service.
Because a pipeline service keeps running after a pipeline job finishes, and Clara Platform reuses the existing pipeline service as long as the signature of the service (Docker image, command, etc.) is unchanged, triggering a pipeline job that uses a different pipeline service connection name may not work as expected: it would populate the old environment variable for the pipeline service connection.
21.3.5.Clara Management Console - Log Viewer may have problems visualizing very large logs
Clara Management Console Log Viewer may fail to visualize logs for an operator if the log file is very large. The workaround is to view the logs using the CLI.
21.3.6.Clara Management Console - Jobs View filters don't get reset on navigating to another view
Clara Management Console supports filtering of jobs in the Jobs View. If filters are applied and the user navigates to a different view without clearing them, the filters remain applied, but some filters do not show their current value. The workaround is to clear the filters by clicking the 'Close' button on the filter elements, or to reload the page.
21.3.7.Digital Pathology Nuclei Segmentation Pipeline doesn't work on A100
You may see the following error message from Triton Server:
[E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running Cast node. Name:'Cast_0' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
The pipeline uses an ONNX model, and the error message appears to be related to a known ONNX Runtime issue. We are looking into solving the problem.
21.4.1.What’s New
21.4.1.1.Support for A100 platform
Clara Deploy AI reference operators and pipelines have been updated to use Triton Inference Server Docker image version 20.07-v1-py3, which is compatible with NVIDIA GPUs with the Ampere architecture (e.g., A100). In addition, Clara Render Server can now be deployed on the A100 platform as well.
21.4.1.2.DICOM Adapter: Enable plug-in development for grouping data and triggering
The new JobProcessorBase base class, found in the Nvidia.Clara.Dicom.API assembly, enables custom data grouping and simplifies the job trigger workflow. Please refer to the DICOM Adapter documentation for details.
21.4.1.3.AI Base Operator V2 Updated
AI Base Operator V2 has been updated to use Triton Client SDK Version 1.15.0 which was released along with Triton Inference Server Docker image 20.07-v1-py3.
21.4.1.4.COVID-19 Pipeline Updated with new AI operators
A new AI operator was introduced to run inference on COVID-19 Ground Glass Opacity (GGO) lesions, as well as a new operator to derive the lesion-to-lung volume ratio. Furthermore, the GGO lesion is reported in a DICOM SEG IOD, and the volume ratio in a DICOM Encapsulated PDF IOD.
21.4.1.5.Render Server - Video Stream Encoding using Software Mode
Until the previous version, Render Server required the presence of the NVENC module to encode data into an H.264 stream. It has been modified to use OpenH264 where the NVENC module is not available.
21.4.1.6.GPU Profiling Tool
The tool now allows any operator/pipeline and any dataset to be registered with the Clara GPU Profiler.
21.4.1.7.Clara Management Console - Job DAG View
Clara Management Console now supports rendering visual representation of the job DAG.
21.4.1.8.Clara Management Console - Basic Authentication
Clara Management Console now offers basic authentication: users must enter a valid username and password to access it. Clara Management Console provides default credentials, which the admin can change after deployment. Web browsers need to have cookies enabled; sign-in will not work otherwise. All data is still served over HTTP, and so is still unencrypted when sent to and from the server.
21.4.1.9.Ansible Scripts
Clara can be installed on remote hosts via a set of Ansible scripts. The Ansible playbook installs NVIDIA drivers, NVIDIA Docker, Kubernetes, Helm, the Clara CLI, and Clara components. A Vagrantfile is included for testing.
21.4.1.10.Platform Server: Enable metadata for jobs, models, pipelines and reusable payloads.
The Platform Server now supports metadata for all platform-related domain entities: jobs, models, pipelines, and reusable payloads. Metadata can be added to each of these objects at creation time or via a separate API, AddMetadata, and removed via the RemoveMetadata API. The status/details API for each object is augmented to enumerate the object's metadata.
21.4.1.11.Platform Server - Clara::Utilization API
Platform Server now has an API to request GPU utilization data for all Clara Platform managed GPUs. The Utilization API has two modes of operation: users can either request a single snapshot of current GPU utilization metrics, or have the data streamed. For every Clara Platform managed GPU, the utilization data contains the name of the node which contains the GPU, the PCIE ID, compute utilization, memory utilization, free and used GPU memory, the timestamp when the data was collected, and all Clara Platform managed processes utilizing the GPU.
21.4.1.11.1.Issues Fixed and Enhancements
21.4.2.Error submitting Clara jobs due to valid job names which cannot be inserted as labels because of sizing constraints
Pod and workflow names in Kubernetes follow different naming constraints from labels, therefore using job names in labels causes failure to submit Argo workflow. Since this label is currently not used anywhere by the Server, it is safe to remove the label. This label can be added again in the future if and when job names follow Kubernetes naming conventions for labels.
21.4.3.Multi-AI with shmem
The Multi-AI pipeline with shared memory may have issues in the register operator. In this scenario, you have to manually copy the results from the "volume-merger" operator to the Render Server for visualization:
1. Locate the payload ID for the job.
2. Navigate to the folder /clara/payloads/<payload-id>/operators/volume-merger/rendering.
3. Create a folder at /clara-io/datasets/<a folder for current job>.
4. Copy the contents from the folder in step 2 to the folder in step 3.
5. Open Render Server and click on the data with the folder name from step 3.
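Steps 2 through 4 amount to the following shell commands (paths from the steps above; <payload-id> and <dataset-name> are yours to fill in):
mkdir -p /clara-io/datasets/<dataset-name>
cp -r /clara/payloads/<payload-id>/operators/volume-merger/rendering/. /clara-io/datasets/<dataset-name>/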
21.4.4.DeepStream Batch Pipeline doesn’t support A100
The current DeepStream Batch Pipeline is based on DeepStream's samples image (4.0.2-19.12-samples) from NGC. However, the latest samples image (5.0-20.07-samples) does not currently support A100, so the DeepStream Batch Pipeline will not support A100 until a new DeepStream samples image is available on NGC.
21.5.1.What’s New
21.5.1.1.DICOM Adapter Now Supports Concurrent Associations
Please refer to the DICOM Adapter section for additional details.
21.5.1.2.Exporting DICOM SR Object
A Structured Report (SR) is a DICOM object used to exchange structured text data. For Clara Deploy, the primary use case is to export classification results, but it can also be extended to generate segmentation results from an AI pipeline in an interoperable way. The design and implementation follow the IHE AI Results Rev 1.1 Trial Implementation.
The same operator can also be configured via an environment variable, NVIDIA_DICOM_REPORT_TYPE := [pdf | sr], to generate a DICOM encapsulated PDF instead of a DICOM SR of the classification result.
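For example, when running the operator container locally for testing, the report type can be selected with the environment variable (an illustration; in a pipeline definition, set it under the operator's variables section):
# Generate a DICOM encapsulated PDF instead of a DICOM SR
export NVIDIA_DICOM_REPORT_TYPE=pdf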
21.5.1.3.COVID-19 Pipeline Support for DICOM Segmentation and Encapsulated PDF Output
The COVID-19 classification pipeline uses the new Clara Deploy DICOM Segmentation writer and DICOM SR writer to output DICOM Segmentation instances (for segmentation results) and Encapsulated PDF instances (for classification results), both of which are DICOM standards compliant.
21.5.1.4.Viewing payloads from Clara Console
Clara Console UI and Render Server UI have been integrated so that users can easily discover the right set of payloads for a job and visualize the relevant datasets using Render Server. In a given job, as long as the output payload of an operator has been registered with the Render Server, users can visualize the output simply by clicking on the “Render” link associated with that output.
21.5.1.5.Viewing Logs for Operators
Users can now also view the log data for operators in the Job Details view.
21.5.1.6.GPU Profiling Tool
Clara Deploy GPU Profiler monitors GPU utilization (compute and memory) in real time while a job is executed on Clara Platform. In this version, the tool supports three pipelines and data in MHD (MetaImage Header) format.
CT Multi AI: This pipeline supports automatic bone segmentation and lung ROI detection with shared memory support. Additionally, the pipeline has four AI models (liver-tumor, spleen, colon-tumor, lung-tumor). The pipeline requires six GPUs for execution on Clara Platform R7.2.
CT Bone: Bone segmentation pipeline is a single operator pipeline that segments the bone from an input CT dataset. This is the only pipeline in the tool that supports the Nsight-systems profiler and detailed operator-level profiling. The pipeline requires an Nsight-systems integrated docker container. The container must be loaded before running this pipeline. Details are on the Setup page.
CT Liver-Tumor: Liver Tumor segmentation pipeline on input CT MHD data.
21.5.1.7.Model Analyzer
Model Analyzer is a tool used with Triton Inference Server to identify the compute and memory requirements of a model given a specified batch size. It is available on NGC via a Helm chart, standalone Docker image, and command line interface.
You can learn more about the Model Analyzer from our recent blog post: Maximizing Deep Learning Inference Performance with NVIDIA Model Analyzer
21.5.1.7.1.Issues Fixed and Enhancements
21.5.2.Fixed Platform Server Crash Issue Caused by Argo
Fixed an issue where the Platform Server would crash with the message Unhandled exception. Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'UnprocessableEntity'.
The crash should no longer happen. The underlying Argo failure has not been resolved at this time, and pipeline jobs that encounter the error will be marked as Faulted and will not be executed.
21.5.3.DICOM Adapter Configuration File Changes
The existing YAML-formatted configuration file has now been consolidated into a single file, appsettings.json. Please refer to the DICOM Adapter section for additional details.
21.5.4.DICOM Adapter Kubernetes Custom Resource Changes
In this release of DICOM Adapter, the following three Custom Resources have been upgraded to v1beta2:
claraaetitles.dicom.clara.nvidia.com
destinations.dicom.clara.nvidia.com
sources.dicom.clara.nvidia.com
Please remove any existing Custom Resources prior to starting DICOM Adapter:
kubectl delete crd claraaetitles.dicom.clara.nvidia.com
kubectl delete crd destinations.dicom.clara.nvidia.com
kubectl delete crd sources.dicom.clara.nvidia.com
Otherwise, you may see the following error message:
Error: rpc error: code = Unknown desc = customresourcedefinitions.apiextensions.k8s.io "claraaetitles.dicom.clara.nvidia.com" already exists
21.5.5.AE Title Job Processor for DICOM Adapter CLI
The DICOM Adapter CLI now supports the AE Title Job Processor.
Please refer to the CLI section for the new commands and arguments.
21.5.6.Platform API JobsService::ClaraStop RPC Has Been Removed
The Platform API JobsService::ClaraStop RPC has been deprecated since Platform Server 0.5.0 in favor of the ClaraService::Stop RPC, and has now been removed.
Please use the new RPC; any attempt to use the removed RPC will result in errors.
21.5.7.Clara Console
Clara Console uses a fixed host and port for the Render Server to visualize payloads: the Render Server host is assumed to be the same as the Clara Console host, with a fixed port of 8080. It is currently not possible to use the measurement tools when visualizing payloads with Render Server from Clara Console; to use the measurement tools, open the Render Server UI at port 8080 and load the dataset.
21.5.8.Render Server fails to start on A100 GPU
When starting the Render Server with clara render start, the command does not finish, and the Kubernetes pod state is 'CrashLoopBackOff'. Root cause: the A100 GPU does not include the video encoding hardware required to encode the video stream produced by the Render Server, so the Render Server Docker image fails to start with an error message. This will be fixed in a future version.
21.5.9.Multi-AI with shmem
The Multi-AI pipeline with shared memory may have issues in the register operator. In this scenario, you have to manually copy the results from the "volume-merger" operator to the Render Server for visualization:
1. Locate the payload ID for the job.
2. Navigate to the folder /clara/payloads/<payload-id>/operators/volume-merger/rendering.
3. Create a folder at /clara-io/datasets/<a folder for current job>.
4. Copy the contents from the folder in step 2 to the folder in step 3.
5. Open Render Server and click on the data with the folder name from step 3.
21.5.10.Containers started by pipeline services are not correctly managed by Platform Server
When deploying a service using pipeline services, the container's GPU assignment is not correctly managed by Platform Server. This means there is no behavior change for containers started by pipeline services from previous versions of Clara Deploy.
This affects all 0.7.2 (R7) reference pipelines which request a TRTIS or Triton Inference Server instance using pipeline services.
21.6.1.What’s New
21.6.1.1.Segmentation Algorithm: Automatic Bone Segmentation for Multi-AI Pipeline
The Multi-AI Pipeline now provides a mechanism to segment bone voxels. Isolation of bone voxels is instrumental in the robust detection of lung ROI (and other organs), even when the lung is infected. A new Multi-AI Pipeline with CUDA-accelerated automatic bone segmentation is included. In this release, bone segmentation is supported with shared memory.
21.6.1.2.Reference Operator: Automatic detection of Lung ROI for Multi-AI Pipeline
In this release, Lung ROI is determined automatically in the multi-AI pipeline to facilitate faster segmentation of Lung voxels. This operator makes use of the Bone segmentation algorithm mentioned above. A new multi-AI pipeline with automatic bone segmentation is included; automatic bone segmentation facilitates automatic detection of lung ROI. In this release, automatic lung ROI detection is supported with shared memory.
21.6.1.3.Reference Pipeline: Nuclei Segmentation Pipeline for Digital Pathology
This pipeline showcases how users of Clara Deploy can tile a whole slide image, pre-process the tiles, use an AI model to perform inference, stitch the resultant tiles, and then visualize the final multi-resolution image.
21.6.1.4.Reference Operator: DICOM Segmentation Object Writer
CAUTION: Limited by Federal (or United States) law to investigational use only; not for diagnostic use.
This research-use-only software has not been cleared or approved by the FDA or any regulatory agency.
This example application creates a DICOM Segmentation object from a segmentation mask image in MetaImage or NIfTI format, along with metadata extracted from the original DICOM Part 10 files. The output is a DICOM instance of modality type SEG, saved in a DICOM Part 10 file.
21.6.1.5.Clara Management Console: Downloading Payloads
With a single click, Clara Management Console users can now download all job payloads and analyze the input/output/intermediate artifacts of the pipeline. Payloads can also be downloaded per operator from the Job Details View.
21.6.1.6.Render Server: Shared Memory for Loading Datasets
Render Server now uses shared memory to load datasets.
21.6.1.7.Python Client
The new Python 3 package can interact with the Clara Platform server to manage jobs, pipelines, and payloads. The package operates through the GRPC API and enables users to develop third-party Python applications with Clara.
21.6.1.8.Platform Server Resource Management
The Platform Server now automatically detects the number and type of GPU devices available in the cluster. GPU devices are individually managed and assigned to pipeline job operators and pipeline services.
Pipeline job operators and services will no longer be able to access GPU resources not assigned to them.
21.6.1.9.Platform Server Job Operator Logs API
Operator logs for jobs are now available through the Jobs Service in Platform through the new ReadLogs API. You can retrieve the logs for a particular operator by providing a job identifier and an operator name to the ReadLogs API.
21.6.1.9.1.Issues Fixed and Enhancements
21.6.2.DICOM Parser generated image slices may be out of order
Root Cause: The DICOM series image reader does not sort instances correctly when a list of DICOM instance files is given.
Fix: Use the DICOM series image file reader to first read all the instance files of the series with its built-in sorting strategy (first image position and orientation (patient), then SOP instance number, then file names) before reading the instances from the sorted list of instance files.
Impact: There is a slight performance impact because all files under the given folder are read multiple times instead of only once. However, for a set of fifty (50) DICOM instance files spanning two (2) studies and three (3) series, the elapsed time for the whole image-reading execution was measured at 25 ms.
21.6.3.DICOM RT STRUCT Writer output misaligned
Not a defect as confirmed by both external and internal teams.
21.6.4.Pipeline job operators and services no longer have access to unassigned GPU resources
Pipeline job operators and services no longer have access to GPU resources not assigned to them by the Platform Server.
Previous versions of Clara Deploy SDK failed to correctly scope GPU resource access to pipeline job operators and services that did not specify any GPU resource requirements. The net result was that pipeline job operators and services that did not specify any GPU resource requirements were able to access all GPU resources on their cluster node.
21.6.5.Configuration availableGpus no longer affects Platform Server
Platform Server now automatically detects the number and type of GPUs available in the cluster. Starting with release v0.7.1, Platform Server no longer honors the availableGpus configuration option in the values.yaml Helm chart.
21.6.6.Pipeline operators and services that need GPU resources must specify GPU resource requirements
Prior to release 0.7.1, when a pipeline job operator or service failed to specify GPU resource requirements, it was still able to utilize GPU resources. This was a bug in the way Kubernetes managed GPU resources.
Starting with 0.7.1, pipeline operators and services that require GPU resources must specify GPU resource requirements in their pipeline definition YAML.
21.6.7.Render Server fails to start on A100 GPU
When starting the Render Server with clara render start, the command does not finish. The Kubernetes pod state is "CrashLoopBackOff".
Root cause: The A100 GPU does not include the video encoding hardware required to encode the video stream produced by the Render Server. The Render Server Docker image fails to start with an error message.
21.6.8.nvidia-smi command inside the container may not work in KVM
The resource manager of the Clara Platform Server uses the NVIDIA_VISIBLE_DEVICES environment variable to select available GPU devices in the container. We found that the nvidia-smi command doesn't work with NVIDIA_VISIBLE_DEVICES (e.g., NVIDIA_VISIBLE_DEVICES=0 on a multi-GPU system) on a Linux Kernel-based Virtual Machine (KVM), showing an error message such as: Unable to determine the device handle for GPU 0000:07:00.0: Unknown Error.
This issue is related to the GPU driver version of the guest OS (particularly version 445.xx). Upgrading or downgrading the NVIDIA GPU driver may solve the issue.
21.6.9.Register results with multi AI + shared memory
The Register Result operator may fail with the Multi-AI pipeline with shared memory enabled. This operator may try to access shared memory that is still under creation, which may result in failure. The register result operator prepares the result of the Multi-AI pipeline for visualization in the render server. If the operator fails, the result of the Multi-AI pipeline must be manually copied into the render server for visualization.
21.6.10.Pipeline jobs which require more GPU resources than are available in the cluster cannot be executed
When a pipeline definition specifies GPU resource requirements that exceed the available GPU resources in the cluster, any pipeline job using the definition will be refused and cannot be executed.
This issue will be resolved in future releases of Clara Deploy SDK.
21.6.11.Clara Orchestrator pipeline operator run times are incorrectly reported
Run times for Clara Orchestrator pipeline operators are incorrectly reported by the Platform API. The value reported is the length of time from when the first operator starts until the reported operator completes.
21.6.12.Job status for jobs using Argo orchestrator is incorrectly reported
For pipelines with API version 0.3.0, clara describe job shows non-zero operators, but the status shows "UNKNOWN" and empty timestamps.
21.6.13.Job status for jobs using Clara Orchestrator without CPDriver is incorrectly reported
For pipelines with API version 0.4.0, clara describe job shows a "PENDING" status even though the job is completed (stopped) if operators in the pipeline do not use CPDriver.
21.6.14.Failure to cancel downloading the operators/jobs payload
Clicking the cancel button while downloading an operator/job payload may not work; the button may flash without canceling the download.
The Clara Deploy SDK provides an extensible platform for designing and deploying AI-enabled medical imaging pipelines. A Clara pipeline consists of a collection of operators configured to work together to execute a workflow. These operators accelerate modality-specific computation and can be reused across different pipelines.
21.7.1.Reference Pipeline: Implementation Update for the Digital Pathology Pipeline
The following items are updated in the pipeline:
An issue where right/bottom corner tiles were broken in the RenderServer UI is fixed.
The Filtering operator provides Zarr-based commands (tile_image_zarr, filter_image_zarr, and stitch_image_zarr), allowing the pipeline to use the Zarr format, instead of JPEG, for intermediate images.
Internal source code has been refactored to facilitate adding various operators with different configurations.
The ArrayFire-based Canny edge filter is supported. You can select the filter method to use in the Filter stage using the -f or --filter option in the operator (canny.canny_itk for the ITK Canny filter and canny.canny_af for the ArrayFire Canny filter), or change the FILTER_METHOD parameter of the pipeline definition.
Note that the ArrayFire Canny edge filter doesn't handle border cases well, so you will see border lines for each tile. We recommend using canny.canny_itk, even though the ITK Canny edge filter is slower than the ArrayFire filter.
21.7.2.Reference Pipeline: Implementation Update for the Multi-AI Pipeline
The Multi-AI pipeline is now integrated with the shared-memory service. A pipeline that supports multi-AI with shared memory (multiAI-mhd-shmem.yaml) is now part of the SDK. The shared memory service with Multi-AI (four operators) has accelerated the runtime of the Multi-AI pipeline by approximately 16% and reduced the disk space requirements by approximately 15% on input data of size 950 MB.
21.7.3.Clara Deploy GPU Profiling Tool
The GPU Profiling tool is a browser-based application that enables monitoring and tracking of GPU utilization while a job is being executed on the Clara platform. In this release, the GPU Profiling tool supports three versions of the Multi-AI pipeline with four operators, including data selection, its execution, and visualization of end results and GPU tracking.
21.7.4.Reference Operator: New Base AI Inference Operator
The base AI inference operator is a customizable base container for deploying Clara Train pre-trained models. The new version of this operator supports Clara Train SDK Version 3: The transform functions and both the simple and scanning window inference logic are the same as those used in Clara Train SDK Version 3. The output writer, however, is still specific to Clara Deploy due to the need to support the Clara Deploy pipeline DICOM Writer and its results registration.
Similar to the previous version, to customize this base container, the inference or validation configuration file used during model training with Clara Train SDK 3.0 must be available. In addition, the trained models must be exported to a format compatible with Triton Inference Server, formerly known as TensorRT Inference Server.
21.7.4.1.Version information
This new base inference operator has the following target environment:
Ubuntu 18.04
Python 3.6
NVIDIA TensorRT Inference Server Release 1.5.0, container version 19.08
21.7.5.Render Server: Area Measurement
Render Server now supports area measurement on 2D views, including original tomographic slices (of modalities such as CT and MR), multiplanar reformatted views, and digital pathology slides. The area and perimeter are provided in physical units if pixel spacing is included with the original data header.
21.7.6.Clara Console: Details for a Job
Clara Console now supports the Job Details View. To open the Job Details View for a job, click on a row for a job in Jobs List View. The Job Details View shows additional information about each job, such as a list of all operators with their current status and duration.
21.7.7.DICOM Adapter CLI
The DICOM Adapter CLI has been updated with new options to create, delete, and list SCP AE Titles and DICOM sources, as well as SCU DICOM destinations. Please refer to the DICOM CLI section for more information.
21.7.8.Clara Platform Server: Pod Cleaners and Job Controllers
Pod and Argo Workflow Cleaners have been added to Platform Server to periodically clean up the job pods that have completed or faulted after the pod reaches a certain age. Logs from deleted pods are retained and remain accessible.
To configure the cleaners, use the "podCleaner" section in the values.yaml file in the platform Helm charts. The cleaners are enabled by default and can be disabled by setting the "podCleanerEnable" option to false.
You can adjust the frequency at which the cleaners attempt to delete the job pods by setting a numerical value for the "podCleanerFrequencyInMinutes" option (default 30), and the time to keep job pods alive before they can be cleaned up via the "podCleanerBufferInMinutes" option (default 90). Both values are expressed in minutes.
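A sketch of the corresponding values.yaml fragment follows (option names as given above; the exact nesting should be confirmed against the shipped chart):
# Platform Helm chart, values.yaml (sketch)
podCleaner:
  podCleanerEnable: true
  podCleanerFrequencyInMinutes: 30
  podCleanerBufferInMinutes: 90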
The Argo Job Controller has also been added to the Platform Server to monitor the status of Argo-orchestrated jobs and update job records accordingly.
21.7.9.Reference Operator DICOM Parser does not support DICOM files .DCM extension
The DICOM Parser operator has been updated to search for DICOM files matching extension .DCM (ignoring case).
21.7.10.Reference Operator DICOM Parser does not extract Series Description tag
The DICOM Parser has been updated to extract the Series Description tag and save the "SeriesDescription" attribute in the metadata JSON file. The attribute value will be a blank string if the original DICOM file(s) do not contain the Series Description metadata tag.
21.7.11.Helm 2.15.2 upgrade
The updated bootstrap.sh script now attempts to upgrade existing versions of Helm older than 2.15.2. Newer versions of Helm are not guaranteed to work with this release: if you have a newer version of Helm, uninstall it prior to executing the bootstrap.sh script.
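To check which Helm version is currently installed before running the script:
# Print the installed Helm client version
helm version --short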
21.7.12.DICOM Adapter
The Timeout Grouping feature has been removed and is no longer supported; this behavior may be accomplished by extending IFileProcessor.
21.8.1.Reference Pipeline: Detection of COVID-19 in CT datasets
This reference pipeline computes the probability of COVID-19 infection from a patient's lung CT scan. It makes use of two new AI operators: the first segments the lung, and the second classifies the segmented regions for the presence or absence of COVID-19. The input to the pipeline is a single axial DICOM series of a lung CT scan. The pipeline generates the following outputs:
A lung segmentation image in MetaImage format
A new DICOM series for the segmentation image, optionally sent to a DICOM device
Probabilities indicating COVID-19 and non-COVID-19 in CSV format
An issue related to the classification operator was fixed for this pipeline in the R6_2 release. The classification operator consumes an original CT dataset and a lung segmentation mask object. When both the original and the segmentation datasets used the same file name, the pipeline did not produce the expected results, even though the datasets were stored in different folders.
21.8.2.Reference Pipeline: Usage of Shared Memory in Multi-AI CT Pipeline
The Shared Memory based FastIO interface is now used in the ROI-generator Split Operator in the Multi-AI Pipeline. The Base AI operator has been updated with the option to read and write shared memory.
21.8.3.Reference Operator: DICOM Parser Operator
This operator parses DICOM instances in the DICOM Part 10 file format to extract and save key DICOM metadata in a JSON file. It also converts DICOM pixel data for applicable series such as CT and MR axial into volume images and saves them in a configurable file format (e.g. MetaImage or NIfTI), along with a JSON file that maps the series instance UID to the file name. The output of this operator should be consumed by the downstream operators in the pipeline in order to intelligently select and arrange image files as input to their processing step (for example, in a multi-channel AI inference operator).
21.8.4.Reference Pipeline: Digital Pathology
This reference pipeline showcases how to tile, process, and stitch together large digital pathology images. The pipeline contains the following operators:
Tiling: Loads a multi-resolution SVS file, tiles it, and then writes out the tiled images to disk.
Filtering: Loads each tile from disk, applies a Canny Edge Detector filter, overlays the resulting edges with blue lines on the original image, and saves the output to disk.
Stitching: Loads filtered tiles, stitches them, and saves the resultant image as a multi-resolution TIFF file.
Registering: Sends the final output data to a configured Render Server.
The pipeline features the following configurable elements:
The Tiling Operator can store the tiled images in JPEG format.
The Filtering Operator provides the following options for loading tiles:
Load each tile serially using the Python Pillow Library and then apply the Canny Edge detection filter to the tile.
Load multiple tiles at a time (multithreading) using the Pillow Library and then apply the Canny Edge detection filter to the tile.
Load multiple tiles at a time using the Pillow Library and the Python multiprocessing package, which utilizes subprocesses instead of threads.
Load tiles using DALI and perform edge detection. DALI makes use of nvJPEG internally to load the tiles.
21.8.5.Reference Pipeline: DICOM Series Selection Pipeline
This pipeline demonstrates how DICOM studies can be parsed with metadata preserved in the standards-based Clara Deploy JSON format, how the relevant series can be matched with simple series-matching rules, and how the converted image file(s) can be selected for the matched series. This pipeline can serve as the basis for developing pipelines that can intelligently validate input DICOM studies and choose the correct DICOM series for processing.
21.8.6.Render Server: Visualization of Multi-Resolution Data
Render Server now supports loading of multi-resolution data encoded in TIFF formats. Currently visible data is loaded on-demand, which makes it possible to view images with a data size larger than the available CPU memory. If a new region of the image needs to be displayed, first the image is scaled up from a coarser level, then previously fetched data is copied for that level, and finally any other missing data is asynchronously fetched.
21.8.7.Render Server: Supporting Color Data Type
The Render Server interface for configuring and specifying data has been generalized. Data order and dimensions are now freely configurable. Density and mask volumes are supported as before. The interface also supports 2D grayscale and color data with various pixel data types.
21.8.8.Render Server: Static Scale
A scale bar has been added to the bottom left of the viewer, enabling visual estimation of the distance between two pixels.
21.8.9.Render Server: Distance Measurement
Render Server now supports distance measurement on 2D views, including original tomographic slices (of modalities such as CT and MR), multi-planar reformatted views, and digital pathology slides. The distance between two pixels is provided in physical units if pixel spacing is included as part of the original data header.
21.8.10.Render Server: Picture-in-Picture
Picture-in-picture support has been added to the top left of the viewer to indicate relative location of the displayed region within the whole image. This allows localization of the subregion of a currently visible portion of an image when a large zoom-in factor is used.
21.8.11.Configuring DICOM Adapter via REST APIs
New RESTful APIs are available to configure Clara (local) DICOM AE Titles, DICOM Source AE Titles, and DICOM Destination AE Titles. Refer to DICOM Adapter documentation for details.
21.8.12.Helm Upgrade
Helm version 2.15.2 is now required to run Clara and is installed as part of the bootstrap.sh script. If your system has Helm installed and the version is less than 2.15.2, uninstall Helm before running bootstrap.sh.
21.8.13.Results Service
The Report Failures API now takes a JSON object instead of a simple Boolean. See the Results Service API for details.
The documentation now uses a default port of 8088 instead of 5000.
21.8.14.Jobs API Reports Incorrect Job State
The Clara Jobs API Details and List RPCs report the incorrect state for jobs. This is due to the Clara Platform job controller becoming out of sync with the Kubernetes CRD for pipeline jobs. Once the controller is out of sync, jobs will be reported as PENDING, HEALTHY regardless of their actual state. The problem primarily affects Argo Workflow based pipeline jobs.
The only available workaround is to restart Clara Platform using clara platform stop followed by clara platform start. Optionally, kubectl get pods will return the state of all Kubernetes pods. This information can be used to determine the state of pipeline jobs if the user querying the information understands how to identify specific jobs: pods with a status of "completed" are STOPPED and HEALTHY, pods with a status of "errored" are STOPPED and FAULTED, and any other state should be considered RUNNING and HEALTHY.
21.8.15.Pipeline Services Conflict with Resource Manager
Non-deterministic behavior can result from deploying a mix of pipelines using both pipeline services and Resource Manager to manage resources. We recommend using one resource-management method across all pipelines for a given deployment of Clara Deploy.
21.8.16.Deploying Triton via Pipeline Services Consumes All GPUs
The behavior of the Triton Inference Server when deployed by pipeline services (i.e., as part of a pipeline definition) does not change: a single deployment of Triton Inference Server will be assigned all available GPU resources in the system. This behavior is expected; however, because Resource Manager does not take into consideration resources consumed by pipeline services, it can oversubscribe individual GPU resources, which can lead to failures in pipeline job operators.
We recommend that pipelines not use both pipeline services and Resource Manager. Additionally, we do not recommend executing pipeline jobs that use Resource Manager in parallel with pipeline jobs that rely on pipeline services.
21.8.17.Triton Inference Server Consumes Memory of Unassigned GPU(s)
Triton Inference Server allocates memory on all GPU resources in the system, even when assigned to a single GPU resource. This can cause memory-pressure issues, including out-of-memory errors. This issue can happen with any setup of Triton Inference Server, but has primarily been reported when Triton is instructed to load models that request more GPU resources than are assigned to Triton. There is currently no workaround for this issue.
21.8.18.Deleted Jobs Listed and/or Deleting Jobs with CLI Can Error
When using the Clara CLI to delete jobs, it is possible to see the following error message: Error: Code: -8452, Failed to cancel job. This error is caused by corruption of the Clara Platform Server's internal state due to job records being overwritten with an incorrect state, which can cause completed or canceled jobs to revert to pending or unknown states.
There is currently no workaround, but the problem should not impact the execution of inference pipelines using Clara Deploy.
21.8.19.Corrupted Tiles in the Output of the Digital Pathology Image Processing Pipeline
Right-edge tiles in the result image (.tif) are corrupted, causing the RenderServer UI to show black lines on the right side. This issue will be addressed in the next release (R6_3).
21.9.1.Platform
21.9.1.1.Strongly Typed Operator Interface
Clara Deploy SDK now supports pipeline composition using operators that conform to a signature, or well-defined interface. This enables the following:
Pre-runtime validation of pipelines
Compatibility of concatenated operators in terms of data type (where specified)
Allocation of memory for the pipeline using FastIO via CPDriver
21.9.1.2.Scheduler
The Clara platform now has a scheduler for managing resources allocated to the platform for executing pipeline jobs, as well as other resources such as render servers. The scheduler is responsible for queuing and scheduling pipeline job requests based on available resources. When the system has insufficient resources to fulfill the requirements of a queued job, the scheduler will retain the pending job until sufficient resources become available.
21.9.1.3.Model Repository
Clara Deploy now offers management of AI models for instances of Inference Server. The following aspects of model management are available:
Store and manage models locally through user inputs.
Create and manage Model Catalogs.
21.9.1.4.Distribution of Clara Deploy on NGC
Clara Deploy can now be easily installed from NGC, allowing for flexible installation options. Once the core components are installed, you can install over twenty reference pipelines easily with the Clara CLI. This eliminates the need to download a 20 GB ZIP file.
21.9.1.5.CLI Load Generator
The load generator simulates hospital workloads by feeding the Clara Platform a serial workload: for example, "feed two CT studies per second to Pipeline xx-yy-zz on the Clara platform". The Load Generator CLI allows users to specify the pipeline(s) used to create the jobs, the dataset(s) used as input for the jobs, and other options:
The number of jobs to create
The frequency at which to create jobs
The type of dataset (sequential or non-sequential)
The priority of jobs
21.9.2.Application
21.9.2.1.Fast I/O Integrated with Clara Platform Driver
The integrated Fast I/O feature provides an interface to memory resources for all operators running in the same pipeline. These memory resources can be used for efficient, zero-copy sharing and passing of data between operators. Fast I/O allocations can be assigned optional metadata to describe the resource, such as data type and array size. The metadata and the allocation it describes can be easily passed between operators using string identifiers.
21.9.2.2.Histopathology Pipeline
The Breast Classification pipeline makes use of a pre-trained model for classification of breast images. This pipeline depends on the Clara Deploy CLI to send a PNG image and trigger a job. After an image is classified, the operator saves the output as a new image with the classification label burnt-in on top of the image. If the class category of a specific image is “cancerous”, the operator burns in the letter “T” to the upper left corner of the output image; otherwise, the letter “F” is written out.
21.9.2.3.Prostate Segmentation Pipeline
This pipeline ingests a single-channel MR dataset of a prostate and provides segmentation of the prostate anatomy. The pipeline generates three outputs:
A DICOM RT Structure Set instance in a new series of the original study, optionally sent to a configurable DICOM device.
A binary mask in a new DICOM series of the original study, optionally sent to the same DICOM device as above.
The original and segmented volumes in MetaImage format, sent to the Clara Deploy Render Server for visualization on the Clara Dashboard.
21.9.2.4.Multi-AI Pipeline
The Multi-AI pipeline takes a single CT volumetric dataset and splits it into multiple Regions of Interest (ROIs). These ROIs are then fed into their respective AI operators, and the results are finally merged into a single volume. Operators for segmenting liver tumor, lung tumor, colon tumor, and spleen are used in this pipeline.
21.9.2.5.De Novo Assembly Pipeline
This is a reference pipeline that makes use of Clara Genomic Analysis tools to assemble genomes with Clara Deploy SDK. These tools exploit the abilities of the GPU to accelerate gene sequencing.
21.9.2.6.3D Cropping Pipeline using Shared Memory
The 3D Image Processing pipeline accepts a volume image in MetaImage format and optionally accepts parameters for cropping. The output is the cropped volume image, and the image is published to the Render Server so that it can be viewed on a web browser. It makes use of shared memory among all operators to pass voxel data around.
21.9.2.7.Deep Stream Batch Pipeline
The DeepStream Batch pipeline makes use of an organ-detection model running on top of the DeepStream SDK, provided as a reference application. It accepts an .mp4 file in H.264 format and performs object detection to find the stomach and intestines in the input video. The output of the pipeline is a rendered video (bounding boxes with labels overlaid on top of the original video) in H.264 format (“output.mp4”), as well as the primary detector output in a modified KITTI metadata format (.txt files).
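For reference, a detection in plain KITTI label form looks like the line below (one object per line: class, truncation, occlusion, alpha, bounding box left/top/right/bottom, unused 3D fields, confidence); the pipeline's modified variant may differ in its exact columns:
stomach 0.0 0 0.0 483.0 251.0 735.0 462.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.92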
21.9.2.8.DICOM Parser Operator
A new DICOM Parser Operator is available in this release. Given an input directory containing one or more DICOM studies, this operator can parse important DICOM attributes from the studies and persist them in a JSON file. In addition, this operator can be configured to export a volumetric file containing pixel data in NIFTI or MHD format.
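As an illustration only (the actual field names in the persisted JSON file may differ), the output metadata might look like:
{
  "studies": [
    {
      "StudyInstanceUID": "1.2.826.0.1.3680043.2.1125.1",
      "series": [
        {
          "SeriesInstanceUID": "1.2.826.0.1.3680043.2.1125.1.1",
          "Modality": "CT",
          "ConvertedFile": "series-1.mhd"
        }
      ]
    }
  ]
}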
21.9.3.Render Server
21.9.3.1.Original Slice Rendering
The Render Server can now display original slices in addition to volume-rendered views.
21.9.3.2.Touch Support for Render Server
On a touch-friendly device, users can now interact with rendered views using gestures.
21.9.3.3.Visualization for Segmentation Masks on Original Slices
Segmentation masks can now be displayed on any rendered view of a volume. The color and opacity of such masks are controlled via the corresponding transfer functions.
21.9.3.4.Oblique Multiplanar Reformatting
This feature enables reformatting of the original slices along an arbitrary plane of orientation. For example, axial slices can be reformatted with sagittal and/or coronal planes. An oblique slice is displayed within the context of a colored axis cube. The view can be rotated and the displayed slice can be interactively modified.
21.9.4.Management Console
This release features a new management console that can be used to administer pipelines and jobs registered with the Clara Deploy Platform. In this release, users can view a list of pipelines containing information such as the pipeline name, date when registered, and number of queued jobs that were instantiated from the pipeline. Similarly, in the Jobs view, users can see a list of jobs containing information such as job status, priority, ID, start time, and duration.
21.9.5.Jobs API Reports Incorrect Job Information for Argo Based Pipelines
The Clara Jobs API returns incorrect timestamp values for pipelines that rely on Argo workflow orchestration. Pipelines with api-version <= 0.3.0, or which specify orchestrator: Argo, rely on Argo workflow orchestration.
21.9.6.Clara Deploy SDK from NGC Only Uses a Single GPU
By default, when deployed from NVIDIA GPU Cloud (NGC, https://ngc.nvidia.com/), an instance of the Clara Deploy SDK is configured to utilize only a single GPU, regardless of how many are physically available.
This can be adjusted by editing the values.yaml file (~/.clara/charts/clara/values.yaml) and changing the availableGpus value from 1 to the number of GPUs the instance should utilize. The value of availableGpus must be an integer; otherwise, the platformapiserver container will fail to start correctly.
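For example, to let the instance use four GPUs, the relevant entry in values.yaml would read (surrounding keys omitted):
availableGpus: 4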
21.9.7.Pipeline Services Conflict with Resource Manager
Deploying a mix of pipelines in which some use pipeline services and others use Resource Manager to manage resources results in non-deterministic behavior. It is recommended that a single strategy be chosen and used across all pipelines for a given deployment of Clara Deploy.
21.9.8.Deploying Triton via Pipeline Services Consumes All GPUs
The behavior of Triton Inference Server when deployed as a pipeline service (that is, as part of a pipeline definition) is unchanged: a single deployment of Triton Inference Server will be assigned all available GPU resources in the system. This behavior is expected; however, because Resource Manager does not take into consideration resources consumed by pipeline services, individual GPU resources can be oversubscribed, which can lead to failures in pipeline job operators.
It is recommended that pipelines not use both pipeline services and Resource Manager. Additionally, it is recommended that pipeline jobs which use Resource Manager not be executed in parallel with pipeline jobs which rely on pipeline services.
21.9.9.Argo Workflow Support is Deprecated
Starting with Clara Deploy R5 (0.5.0), Argo-based workflow orchestration and the corresponding Argo UI are deprecated. In Clara Deploy R6 (0.6.0), support for Argo-based workflow orchestration and the corresponding Argo UI will be removed.
Clara Deploy users are advised to use Clara Pipeline Driver (CPDriver) based orchestration for any new pipelines, and to migrate old pipelines. CPDriver-based orchestration is the default if the pipeline definition specifies an API version of 0.4.0 or higher.
21.9.10.Support for Pipeline Definitions < 0.4.0 is Deprecated
Starting with Clara Deploy R5 (0.5.0), support for pipeline definitions with a specified api-version less than 0.4.0 is deprecated. Support for pipeline definitions with a specified api-version less than 0.4.0 will be removed in Clara Deploy R6 (0.6.0). This is in alignment with the deprecation of support for Argo-based workflow orchestration.
21.9.11.Pipeline Services are Deprecated
Starting with Clara Deploy R5 (0.5.0), support for pipeline services, that is, services defined as part of a pipeline definition, is deprecated. Support for pipeline services is expected to be removed as part of Clara Deploy R6 (0.6.0).
Fixed errors when accessing properties of CP Driver’s driver or payload objects in Python.
The Clara Deploy SDK provides an extensible platform for designing and deploying AI-enabled medical imaging pipelines. A Clara pipeline consists of a collection of operators configured to work together to execute a workflow. These operators accelerate modality-specific computation and can be reused across different pipelines.
21.11.1.Clara Pipeline Driver
Clara Pipeline Driver (CP Driver) provides a new mechanism for pipeline orchestration. Clients can make use of the CP driver using C, C#, and Python. Application developers need to use a base application container to design their operators. They can import the Clara library in their app and provide an implementation for the execute callback to make the operator compliant. An application developer can also query the Clara Pipeline Driver for a list of input/output paths for a given operator.
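A minimal sketch of a CP Driver-compliant Python operator follows; the module, class, and attribute names below are assumptions based on the description above, so consult the SDK reference for the exact API:
# Sketch only: names below are assumptions, not the verified CP Driver API.
from clara import Driver

def execute(driver, payload):
    # The driver can be queried for this operator's input/output paths.
    for entry in payload.input_entries:   # hypothetical attribute name
        print("input :", entry.path)
    for entry in payload.output_entries:  # hypothetical attribute name
        print("output:", entry.path)
    # ... read inputs, run the operator's computation, write outputs ...

driver = Driver(execute_handler=execute)  # register the execute callback
driver.start()                            # hand control to the pipeline driver
driver.wait_for_completion()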
21.11.2.Orchestration in two modes
Clara R3 supported orchestration of pipelines using Argo. Users who prefer to continue using Argo-based orchestration can still do so in this release; this is handy for users who want to visualize pipeline execution in the Argo dashboard. CPDriver-based orchestration is the default if the pipeline definition specifies an API version of 0.4.0 or higher; otherwise, Argo is the default orchestrator. Starting with Clara 0.4.0, the orchestration used can be overridden by specifying the orchestrator and choosing either “Clara” for CPDriver-based orchestration or “Argo” for Argo-based orchestration.
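For example, a pipeline definition can pin the orchestrator explicitly (fragment only; the other required pipeline fields are omitted):
api-version: 0.4.0
orchestrator: Argo   # or "Clara" for CPDriver-based orchestration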
21.11.3.File Adapter
Until R3, Clara only supported starting a job with payloads packaged as DICOM network objects. This release adds support for manually starting a job with payloads that are available as files on disk.
21.11.4.CLI Improvements
This release enables users to download the input, intermediate results, or final outputs of one or more jobs. It also enables manually deleting a job while retaining any important metadata related to the job even after it is deleted.
21.11.5.Centralized Logging
This feature provides a mechanism to log application-level and platform-level data in a unified way. It enables the collection of performance metrics and allows saving metrics after a job finishes executing. It also allows querying metrics based on job ID and application metadata.
21.11.6.Monitoring Performance
This feature provides an interface for collecting system metrics related to GPU, CPU, and disk. It also allows correlating system metrics with operator-execution-specific information.
21.11.7.Shared Memory Context
The feature preview for Shared Memory Context provides an interface to memory resources that are accessible by all operators running in the same pipeline. These memory resources can be used for efficient, zero-copy sharing and passing of data between operators. Fast I/O allocations can optionally be assigned metadata to describe the resource, such as data type and array size, and this metadata and the allocation it describes can be easily passed between operators using string identifiers.
21.11.8.Optimized CT Recon Pipeline
The CT Recon Pipeline ingests a set of CT projection images and configuration parameters as input. It uses GPU-based iterative FDK to reconstruct CT slices, and the output is a CT volumetric series. The implementation is based on the open-source Reconstruction Toolkit (RTK). During R4, this implementation was optimized. Some of the major techniques used to optimize this pipeline are:
Asynchronous transfer
Device to Device memory transfer optimization
Porting Filters to GPU
Extract Image Filter
Ray Box Filter
Divide, Subtract, Multiply filter
Minimize Host to Device communication
Forward projection filter optimization
Back projection filter optimization
Adaptive Input projection size
Optimized memory usage (CNMEM)
21.11.9.Optimized Liver Tumor Segmentation Pipelines
The Liver Tumor Segmentation Pipeline ingests CT images containing tomographic liver slices. It segments the liver anatomy and any tumors found in the liver. The final output is a segmentation mask object. During R4, this implementation was optimized. Some of the major techniques used to optimize this pipeline are:
GPU Optimized Scale Intensity Transform
GPU Optimized Resample Volume Transform
Dynamic Batching using TensorRT Inference Server
Multi-Threaded Inference
Asynchronous Inference with callbacks
21.11.10.Render Server Improvements
In this release, the Clara Deploy Render Server offers better performance and higher image quality. During interactive rendering, it improves image quality by reducing noise, and it supports the OptiX AI denoising filter (requires Volta/Turing, driver >= 435, and Docker >= 19.03). It also uses higher-order interpolation (B-spline) and adaptive refinement during ray marching, and an acceleration structure to skip empty volume regions.
21.11.11.Automatic Payload Cleanup
Clara Deploy can now be configured to automatically clean up any processed data after the pipeline completes. This includes data sent to Clara and data generated by Clara.
21.11.12.New Pipelines
This release comes bundled with the following new reference pipelines.
MR Hippocampus Segmentation
MR Brain Tumor Segmentation
CT Lung Tumor Segmentation
CT Spleen Segmentation
CT Colon Tumor Segmentation
CT Pancreas Tumor Segmentation
Microscopy Malaria Classification
Chest X-Ray Classification
Hardware Qualification
The Clara Deploy SDK 0.2.0 provides the following capabilities:
A new pipeline orchestration engine.
A pipeline definition specification.
A new platform gRPC-based API to create pipelines, trigger jobs, and upload/download payloads.
The DICOM Adapter has been updated to this new platform.
The Render Server has been updated to this new platform.
The Clara Deploy SDK is not backward compatible; pipelines created with previous versions must therefore be migrated to the new pipeline definition.
The Render Server is now enabled in the dashboard.
Installation of pre-requisites no longer deletes the current Docker configuration.
Deployment of Clara Deploy SDK via Helm Charts and Kubernetes.
Pipeline Client API provides integration for containers that need to be part of a pipeline. The Pipeline Client API supports:
Publish Study
Send to TRTIS
The DICOM Adapter provides an integration point for a service such as a PACS server. It reads and writes DICOM data and pushes it to Clara Core to start a pipeline.
Clara Core provides handling of the pipeline. Clara Core supports running a single pipeline at a time. New pipelines require a new deployment. Clara Core supports:
TRTIS as a service
A reference application is available that describes how to run an application in the Clara Deploy SDK.
The Clara Deploy SDK no longer supports running pipelines with docker-compose.
The Clara Deploy SDK no longer supports the Clara Inference Client API.
The following are known issues in this release of the Clara Deploy SDK:
21.16.1.Installation error with message [ERROR Port-XXXXX]: Port XXXXX is in use
It is possible that your machine has minikube or microk8s installed. Please remove those k8s distributions and install the prerequisites (sudo ./bootstrap.sh) again.
If microk8s was installed by snap, you can check whether it is installed by executing snap list:
$ snap list
Name Version Rev Tracking Publisher Notes
...
microk8s v1.15.0 671 stable canonical✓ classic
You can remove it by executing the following commands:
microk8s.reset
sudo snap remove microk8s
21.16.2.Installation error with message /var/lib/etcd is not empty
If a failure occurs during the installation of the prerequisites with the message /var/lib/etcd is not empty, try removing this folder, uninstalling the prerequisites (sudo ./uninstall-prereqs.sh), and re-running the prerequisites installation (sudo ./bootstrap.sh).
21.16.3.Installation error with coreDNS pod failures
If after the installation an error occurs and the coredns pods of Kubernetes are in the CrashLoopBackOff state, the workaround is documented at https://stackoverflow.com/questions/52645473/coredns-fails-to-run-in-kubernetes-cluster.
21.16.4.Installation error with space in the path
If the installer is downloaded to a directory whose path contains a space, the installation will fail. Move the installer to a directory that does not contain a space in its path.
Due to Kubelet’s garbage collection feature, Kubelet (the ‘node agent’ that runs on each node of a k8s cluster) performs garbage collection for containers every minute and for images every five minutes. Once disk usage exceeds the high threshold (default: 85%), Kubelet frees (removes) container images until usage falls below the low threshold (default: 80%). Make sure that disk usage in the VM stays below 85% so that the images needed by the Clara Deploy SDK are not deleted locally.
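To check current disk usage, and, if needed, relax the limits using kubelet's standard image garbage-collection flags (adjust the values to your environment):
df -h /var/lib/docker          # check disk usage on the filesystem holding images
# If necessary, raise kubelet's image GC thresholds (standard kubelet flags):
#   --image-gc-high-threshold=90 --image-gc-low-threshold=85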
If the internet connection is provided through an HTTP proxy server, Docker containers cannot access the internet while building Docker images or running containers.
Even if proxies for Docker are set up properly, Clara operators cannot access other Kubernetes services such as TRTIS, because Kubernetes uses a specific service CIDR (default: 10.96.0.0/12) and Clara Deploy sets up 10.244.0.0/16 as the pod network CIDR of the Kubernetes node (defined in the bootstrap.sh script). Docker needs to be configured not to use proxies for those IP addresses.
To address these issues, add or update the proxies key in the ~/.docker/config.json file (if the file doesn’t exist, create it) as shown below, assuming that the proxy server’s address is http://proxy.xxxx.edu:8080, so that Docker has the proper proxy settings (see https://docs.docker.com/network/proxy/ for detailed information):
{
  "proxies": {
    "default": {
      "httpProxy": "http://proxy.xxxx.edu:8080",
      "httpsProxy": "http://proxy.xxxx.edu:8080",
      "noProxy": "127.0.0.1,10.96.0.0/12,10.244.0.0/16"
    }
  }
}
21.16.5.Recon operator does not exit with the right error code
The Recon operator doesn’t catch its subprocess’s exit code properly, so the execution of the Recon operator is shown as ‘Success’ even if it has failed. This can confuse the user into thinking that a subsequent operator, rather than the Recon operator, is the root cause of the pipeline failure.
Please also check the Recon operator’s log messages when a subsequent operator is shown as failed without any error in its execution log.
This issue will be fixed in the next release.
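In the meantime, the generic pattern for propagating a wrapped subprocess's exit status from a Python operator is sketched below (the command name is hypothetical, and this is not the Recon operator's actual code):
# Generic sketch: surface a wrapped subprocess's failure instead of exiting 0.
import subprocess
import sys

result = subprocess.run(["recon-binary", "--input", "/input"])  # hypothetical command
if result.returncode != 0:
    sys.exit(result.returncode)  # propagate the failure to the platform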
21.16.6.Minimum opacity value of the transfer function editor cannot be changed (since 0.4.0)
The minimum opacity value in the transfer function editor is currently set to 0.1.
For now, the only way to set a value below 0.1 is to manually edit the config_render.json file in the Render Service’s dataset folder (/clara-io/datasets/<dataset name>).
21.16.7.Intensity Range Selectors in Transfer Function Editor are not displayed properly (since 0.4.0)
When the transfer function editor component is scrolled downwards, the intensity range selectors are overlaid onto the other sub-components. This issue will be fixed in the next release.
21.16.8.White empty viewport
After selecting other datasets or after going to <IP of the machine>:8080, the viewport could be empty. Refreshing the browser usually solves the rendering problem.
21.16.9.Session management
The Render Service only supports a single session; the last connected user takes over the session.
21.16.10.Changing datasets results in “Unable to reach RenderServer”
After changing the dataset, the user may see the error message Unable to reach RenderServer! Please restart your container.
If the issue persists after refreshing the browser, the workaround for this issue is to restart the Render Service by running the following command:
clara render stop
clara render start
We are interested in your feedback and any bugs found while using Clara Deploy SDK.
Post questions, feedback, and bug reports in the member-only forums: https://devtalk.nvidia.com/default/board/362/clara-sdk/. Note: New forum accounts may take one business day to reflect new memberships.
For any problems related to this developer program please use the general contact form: https://developer.nvidia.com/contact.