9.26. Digital Pathology Image Processing Operator

9.26.1. Overview

The Digital Pathology Image Processing Operator is a reference application that uses Clara Pipeline Driver and OpenSlide for digital pathology image processing.

This application is packaged as a Docker container and is expected to run under the Clara (CPDriver) orchestrator engine so that it can use FastIO’s features. It therefore does not work as a standalone container run directly with Docker.

The main code is located at /app/main.py and is executed inside the container with parameters, as shown below:

/bin/bash -c 'python -u /app/main.py <command name>'
usage: main.py [-h] [-d DEBUG_LEVEL] [--input-path INPUT_PATH]
                             [--output-path OUTPUT_PATH]
                             [--config-path CONFIG_PATH]
                             [-f {canny.canny_itk,canny.canny_af}]
                             [--variance VARIANCE]
                             [--lower-threshold LOWER_THRESHOLD]
                             [--upper-threshold UPPER_THRESHOLD] [--host HOST]
                             [-w NUM_WORKERS]
                             [--block-size-limit BLOCK_SIZE_LIMIT]
                             [-t TILE_SIZE] [-o OVERLAP]
                             command

positional arguments:
  command               Command to execute

optional arguments:
  -h, --help            show this help message and exit
  -d DEBUG_LEVEL, --debug-level DEBUG_LEVEL
                        Set debug level (e.g., 'INFO', 'DEBUG')
  --input-path INPUT_PATH
                        Input folder path. Default is '/input'
  --output-path OUTPUT_PATH
                        Output folder path. Default is '/output'
  --config-path CONFIG_PATH
                        Config folder path. Default is '/config'
  -f {canny.canny_itk,canny.canny_af}, --filter {canny.canny_itk,canny.canny_af}
                        Filter name (e.g., 'canny.canny_itk' or
                        'canny.canny_af')
  --variance VARIANCE   Variance for Canny filter. Default is 10
  --lower-threshold LOWER_THRESHOLD
                        LowerThreshold for Canny filter. Default is 0.01
  --upper-threshold UPPER_THRESHOLD
                        UpperThreshold for Canny filter. Default is 0.03
  --host HOST           Host address on which the scheduler will listen. If
                        empty, it creates a local cluster and run workers in
                        the same process to avoid inter-worker communication.
                        Default is '0.0.0.0'
  -w NUM_WORKERS, --num-workers NUM_WORKERS
                        Number of workers (default: # of cpus)
  --block-size-limit BLOCK_SIZE_LIMIT
                        Dask block's size limit (e.g., '1kiB', '1 GB', '1e6
                        kB'. default: '100MB')
  -t TILE_SIZE, --tile-size TILE_SIZE
                        Tile size. Default is 224
  -o OVERLAP, --overlap OVERLAP
                        Overlap size. Default is 0

The application performs a different job depending on <command>, and each command acts as a stage in the pipeline.

The --variance, --lower-threshold, and --upper-threshold options set the parameters of the Canny Edge Detection Filter provided by ITK or ArrayFire. The filter used in the filter stage is selected with the -f or --filter option (canny.canny_itk for ITK’s Canny filter and canny.canny_af for ArrayFire’s Canny filter).
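The option list above can be approximated with a small argparse parser. This is only a sketch reconstructed from the usage text, not the code in /app/main.py: the function name build_parser and the numeric types are assumptions, and no defaults are set for options whose defaults are not documented.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented usage text."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("command", help="Command to execute")
    # No default is documented for the debug level, so none is set here.
    parser.add_argument("-d", "--debug-level")
    parser.add_argument("--input-path", default="/input")
    parser.add_argument("--output-path", default="/output")
    parser.add_argument("--config-path", default="/config")
    # No default is documented for the filter either.
    parser.add_argument(
        "-f", "--filter", choices=["canny.canny_itk", "canny.canny_af"]
    )
    # The numeric types below are assumptions inferred from the defaults.
    parser.add_argument("--variance", type=float, default=10)
    parser.add_argument("--lower-threshold", type=float, default=0.01)
    parser.add_argument("--upper-threshold", type=float, default=0.03)
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("-w", "--num-workers", type=int, default=os.cpu_count())
    parser.add_argument("--block-size-limit", default="100MB")
    parser.add_argument("-t", "--tile-size", type=int, default=224)
    parser.add_argument("-o", "--overlap", type=int, default=0)
    return parser


# Example: select the ITK Canny filter and override the variance.
args = build_parser().parse_args(
    ["filter_image_jpg_serial", "-f", "canny.canny_itk", "--variance", "5"]
)
print(args.command, args.filter, args.variance, args.tile_size)
```

Options that are not passed keep the documented defaults, so only the values that differ from them need to appear on the command line.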

9.26.2. Commands

9.26.2.1. process_image_no

This executes all the operations (load/filter/stitch) at once.

9.26.2.1.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .tif or .svs - Input image file

  • config_render.json - Configuration for Render Server

9.26.2.1.2. Output

The following files are stored in the /output folder inside the container:

  • image.tif - Output image file

  • config.meta - Metadata for Render Server

  • config_render.json - Configuration for Render Server

9.26.2.2. tile_image_jpg

This loads a tiled/multi-resolution image and saves the tiles as multiple JPEG files. This command does not use multithreading or multiprocessing to process the image.

9.26.2.2.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .tif or .svs - Input image file

9.26.2.2.2. Output

The following files are stored in the /output folder inside the container:

  • .jpg files - Output tiled image files

    • File name is "tile_l{:02d}_x{:04d}_y{:04d}.jpg".format(level, x, y), where level is the resolution level (0 is the highest resolution) and x/y is the tile’s position inside the image. A tile size of 512 is used.

  • tile_info.json - Metadata used for loading the tiled/chunked images
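As a small illustration of the naming pattern for the .jpg files listed above, the following sketch applies the documented format string to a few positions. The helper name tile_file_name and the sample values are hypothetical.

```python
def tile_file_name(level: int, x: int, y: int) -> str:
    """Build a tile file name using the documented format string."""
    return "tile_l{:02d}_x{:04d}_y{:04d}.jpg".format(level, x, y)


# Hypothetical positions, chosen only to show the zero padding.
for level, x, y in [(0, 0, 0), (0, 3, 12), (2, 1, 0)]:
    print(tile_file_name(level, x, y))
```

The zero padding keeps the file names in a stable order when they are sorted as plain strings.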

9.26.2.3. tile_image_jpg_chunk

This is the same as tile_image_jpg, but the size of each tiled image (the chunk size) is calculated as follows:

the largest value that is a multiple of the tile size (--tile-size) and whose data size does not exceed the block size limit (--block-size-limit)

For example, if the tile size is 224, the chunk size is set to 5600 (= 224 x 25), because 5600 is a multiple of 224 and 5600 x 5600 x 3 (channels) = 94,080,000, which is close to (but not greater than) 100,000,000 (100MB). The next multiple, 5824 (= 224 x 26), would need 101,756,928 bytes and exceed the limit.
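The rule above can be written as a short function. This is a sketch, not the operator's actual code: the name chunk_size is hypothetical, it assumes square chunks with one byte per channel value (as the arithmetic in the example implies), and it falls back to a single tile when even one tile exceeds the limit. A string such as '100MB' would first need to be converted to a byte count, for example with dask.utils.parse_bytes.

```python
from math import isqrt


def chunk_size(tile_size: int, block_size_limit: int, channels: int = 3) -> int:
    """Largest multiple of tile_size whose square chunk fits in the limit.

    Assumes one byte per channel value, so a chunk of side s occupies
    s * s * channels bytes.
    """
    if tile_size <= 0 or channels <= 0:
        raise ValueError("tile_size and channels must be positive")
    # Largest side s such that s * s * channels <= block_size_limit.
    max_side = isqrt(block_size_limit // channels)
    # Round down to a whole number of tiles, but keep at least one tile.
    tiles_per_side = max(1, max_side // tile_size)
    return tiles_per_side * tile_size


print(chunk_size(224, 100_000_000))
```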

9.26.2.4. tile_image_zarr

This loads a tiled/multi-resolution image and saves the tiles into a folder in Zarr format. This command does not use multithreading or multiprocessing to process the image.

9.26.2.4.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .tif or .svs - Input image file

9.26.2.4.2. Output

The following files are stored in the /output folder inside the container:

  • tiled_image.zarr folder - Output Zarr folder

    • levels/<index> group has the chunk data for each level (0 is the highest resolution)

  • tile_info.json - Metadata used for loading the tiled/chunked images

The folder’s structure looks like below:

tiled_image.zarr
└── levels
    ├── 0
    │   ├── 0.0.0
    │   ├── 0.1.0
    │   ├── 0.2.0
    │   ...
    │   └── 4.3.0
    ...
    ├── 2
    │   ├── 0.0.0
    │   └── 1.0.0
    └── 3
        └── 0.0.0
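The entries under each level in the tree above look like dot-separated chunk indices, which is the default chunk naming of a Zarr v2 directory store. The sketch below lists the names expected for a given array shape and chunk shape. It assumes a (row, column, channel) axis order; the function name chunk_keys and the sample shape are hypothetical.

```python
from itertools import product


def chunk_keys(shape, chunks):
    """List dot-separated chunk names for an array split into chunks."""
    # Number of chunks along each axis, rounding up for partial chunks.
    counts = [-(-size // chunk) for size, chunk in zip(shape, chunks)]
    return [
        ".".join(str(i) for i in index)
        for index in product(*(range(n) for n in counts))
    ]


# Hypothetical level of 28000 x 22400 pixels with 3 channels.
keys = chunk_keys((28000, 22400, 3), (5600, 5600, 3))
print(len(keys), keys[0], keys[-1])
```

With these sample values the level has 5 x 4 chunks, so the names run from 0.0.0 to 4.3.0.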

The chunk size is calculated as follows:

the largest value that is a multiple of the tile size (--tile-size) and whose data size does not exceed the block size limit (--block-size-limit)

For example, if the tile size is 224, the chunk size is set to 5600 (= 224 x 25), because 5600 is a multiple of 224 and 5600 x 5600 x 3 (channels) = 94,080,000, which is close to (but not greater than) 100,000,000 (100MB).

9.26.2.5. filter_image_jpg_serial

This loads tiled JPEG images and filters each image with the Canny Edge Detection Filter, then saves each processed tile into the /output folder in JPEG format.

9.26.2.5.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .jpg files - Input tiled image files

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.5.2. Output

The following files are stored in the /output folder inside the container:

  • .jpg files - Output filtered/tiled image files (prefixed with filtered_)

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.6. filter_image_jpg_multithreading

This loads tiled JPEG images and filters each image with the Canny Edge Detection Filter, then saves each processed tile into the /output folder in JPEG format.

It uses multithreading to process the images.
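The general pattern of filtering tiles in a thread pool can be sketched with the standard library. This is not the operator's implementation: filter_tile is a hypothetical placeholder that copies the file instead of running a Canny filter, and the output name simply prepends filtered_ to the input name.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def filter_tile(src: Path, out_dir: Path) -> Path:
    """Placeholder for the real filter: copies the bytes unchanged."""
    dst = out_dir / f"filtered_{src.name}"
    # A real implementation would decode, filter, and re-encode here.
    dst.write_bytes(src.read_bytes())
    return dst


def filter_tiles_threaded(in_dir, out_dir, num_workers=4):
    """Run filter_tile over every .jpg file in in_dir using threads."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tiles = sorted(in_dir.glob("*.jpg"))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the same order as the input tiles.
        return list(pool.map(lambda tile: filter_tile(tile, out_dir), tiles))
```

concurrent.futures.ProcessPoolExecutor exposes the same map() interface, so a multiprocessing variant differs mainly in the executor class, although the worker then has to be a picklable module-level function rather than a lambda.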

9.26.2.6.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .jpg files - Input tiled image files

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.6.2. Output

The following files are stored in the /output folder inside the container:

  • .jpg files - Output filtered/tiled image files (prefixed with filtered_)

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.7. filter_image_jpg_multiprocessing

This loads tiled JPEG images and filters each image with the Canny Edge Detection Filter, then saves each processed tile into the /output folder in JPEG format.

It uses multiprocessing to process the images.

9.26.2.7.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .jpg files - Input tiled image files

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.7.2. Output

The following files are stored in the /output folder inside the container:

  • .jpg files - Output filtered/tiled image files (prefixed with filtered_)

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.8. filter_image_jpg_dali

This loads tiled JPEG images and filters each image with the Canny Edge Detection Filter, then saves each processed tile into the /output folder in JPEG format.

It uses NVIDIA DALI to load the images with multiple threads and in batches, and it processes the images with multiprocessing.

9.26.2.8.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .jpg files - Input tiled image files

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.8.2. Output

The following files are stored in the /output folder inside the container:

  • .jpg files - Output filtered/tiled image files (prefixed with filtered_)

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.9. filter_image_jpg_dali_chunk

This is the same as filter_image_jpg_dali, but it assumes that each input image is larger than the specified tile size (that is, the input is chunked) to avoid very large partitions. Because the chunk size is a multiple of the tile size, each loaded chunk image is partitioned into tile images in memory, and the tiles are passed to the Canny edge detection filter.
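Partitioning a chunk into tiles amounts to enumerating tile-sized windows over the chunk. The sketch below computes those windows as pixel bounds. The function name tile_windows is hypothetical, and clipping the windows at the chunk border is an assumption made here to handle chunks at the image edge that are smaller than the full chunk size.

```python
def tile_windows(height: int, width: int, tile_size: int):
    """Return (top, left, bottom, right) bounds of each tile in a chunk."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [
        # Clip so that a partial tile at the border stays inside the chunk.
        (top, left, min(top + tile_size, height), min(left + tile_size, width))
        for top in range(0, height, tile_size)
        for left in range(0, width, tile_size)
    ]


# A full 5600 x 5600 chunk with 224-pixel tiles holds 25 x 25 tiles.
print(len(tile_windows(5600, 5600, 224)))
```

Each window can then be used to slice the decoded chunk array and hand one tile at a time to the filter.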

9.26.2.10. filter_image_zarr

This loads the Zarr data and filters each tile image with the Canny Edge Detection Filter, then saves each processed tile into the /output folder in the same Zarr format used by the tile_image_zarr command.

It uses multi-processing to process each tile.

9.26.2.10.1. Input

Input requires a folder (mounted at /input folder inside the container) containing the following files:

  • .zarr folder - Input chunked image data in Zarr format

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.10.2. Output

The following files are stored in the /output folder inside the container:

  • filtered.zarr folder - Output Zarr folder

    • levels/<index> group has the chunk data for each level (0 is the highest resolution)

  • tile_info.json - Metadata used for loading the tiled/chunked images

9.26.2.11. stitch_image_jpg

This loads filtered/tiled JPEG images and creates a single large TIFF file (image.tif) by stitching the input images together according to the metadata (tile_info.json).
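Before tiles can be stitched, each file has to be associated with its level and position. The sketch below recovers those values from a file name. It assumes the name follows the tile_l{:02d}_x{:04d}_y{:04d}.jpg pattern documented for tile_image_jpg, optionally prefixed with filtered_; the function name parse_tile_name is hypothetical, and the real command may rely on tile_info.json instead.

```python
import re

# At least 2 or 4 digits, because the format pads but does not truncate.
TILE_NAME = re.compile(r"^(?:filtered_)?tile_l(\d{2,})_x(\d{4,})_y(\d{4,})\.jpg$")


def parse_tile_name(name: str):
    """Return (level, x, y) parsed from a tile file name."""
    match = TILE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a tile file name: {name!r}")
    level, x, y = (int(group) for group in match.groups())
    return level, x, y


print(parse_tile_name("filtered_tile_l00_x0003_y0012.jpg"))
```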

9.26.2.11.1. Input

Input requires two folders (mounted at the /input and /config folders inside the container) containing the following files:

/input folder

  • .jpg files - Input filtered/tiled image files

  • tile_info.json - Metadata used for loading the tiled/chunked images

/config folder

  • config_render.json - Configuration for Render Server

9.26.2.11.2. Output

The following files are stored in the /output folder inside the container:

  • image.tif - Output image file

  • config.meta - Metadata for Render Server

  • config_render.json - Configuration for Render Server

9.26.2.12. stitch_image_jpg_chunk

This is the same as stitch_image_jpg, but it assumes that the input images are chunked images processed by the filter_image_jpg_dali_chunk command.

9.26.2.13. stitch_image_zarr

This loads filtered/chunked image data and creates a single large TIFF file (image.tif) by stitching the input images together according to the metadata (tile_info.json).

9.26.2.13.1. Input

Input requires two folders (mounted at the /input and /config folders inside the container) containing the following files:

/input folder

  • .zarr folder - Input chunked image data in Zarr format

  • tile_info.json - Metadata used for loading the tiled/chunked images

/config folder

  • config_render.json - Configuration for Render Server

9.26.2.13.2. Output

The following files are stored in the /output folder inside the container:

  • image.tif - Output image file

  • config.meta - Metadata for Render Server

  • config_render.json - Configuration for Render Server