10.26. Digital Pathology Image Processing Operator
Digital Pathology Image Processing Operator is a reference application that uses the Clara Pipeline Driver and OpenSlide for digital pathology image processing.
This application is packaged as a Docker container and is designed to run under the Clara orchestrator engine (CPDriver) so that it can use FastIO features. It does not work as a standalone Docker container.
The main code is located at /app/main.py and is executed inside the container with parameters, as follows:
/bin/bash -c 'python -u /app/main.py <command name>'
usage: main.py [-h] [-d DEBUG_LEVEL] [--input-path INPUT_PATH]
               [--output-path OUTPUT_PATH]
               [--config-path CONFIG_PATH]
               [-f {canny.canny_itk,canny.canny_af}]
               [--variance VARIANCE]
               [--lower-threshold LOWER_THRESHOLD]
               [--upper-threshold UPPER_THRESHOLD] [--host HOST]
               [-w NUM_WORKERS]
               [--block-size-limit BLOCK_SIZE_LIMIT]
               [-t TILE_SIZE] [-o OVERLAP]
               command

positional arguments:
  command               Command to execute

optional arguments:
  -h, --help            show this help message and exit
  -d DEBUG_LEVEL, --debug-level DEBUG_LEVEL
                        Set debug level (e.g., 'INFO', 'DEBUG')
  --input-path INPUT_PATH
                        Input folder path. Default is '/input'
  --output-path OUTPUT_PATH
                        Output folder path. Default is '/output'
  --config-path CONFIG_PATH
                        Config folder path. Default is '/config'
  -f {canny.canny_itk,canny.canny_af}, --filter {canny.canny_itk,canny.canny_af}
                        Filter name (e.g., 'canny.canny_itk' or
                        'canny.canny_af')
  --variance VARIANCE   Variance for Canny filter. Default is 10
  --lower-threshold LOWER_THRESHOLD
                        LowerThreshold for Canny filter. Default is 0.01
  --upper-threshold UPPER_THRESHOLD
                        UpperThreshold for Canny filter. Default is 0.03
  --host HOST           Host address on which the scheduler will listen. If
                        empty, it creates a local cluster and run workers in
                        the same process to avoid inter-worker communication.
                        Default is '0.0.0.0'
  -w NUM_WORKERS, --num-workers NUM_WORKERS
                        Number of workers (default: # of cpus)
  --block-size-limit BLOCK_SIZE_LIMIT
                        Dask block's size limit (e.g., '1kiB', '1 GB', '1e6
                        kB'. default: '100MB')
  -t TILE_SIZE, --tile-size TILE_SIZE
                        Tile size. Default is 224
  -o OVERLAP, --overlap OVERLAP
                        Overlap size. Default is 0
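The help text maps directly onto Python's argparse module. The sketch below is a hypothetical reconstruction built only from the flags and defaults documented above. It is not the actual /app/main.py. Three details are assumptions: float types for the Canny parameters, `None` as the default filter (none is documented), and keeping `--block-size-limit` as an unparsed string.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; not the real /app/main.py.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("command", help="Command to execute")
    parser.add_argument("-d", "--debug-level",
                        help="Set debug level (e.g., 'INFO', 'DEBUG')")
    parser.add_argument("--input-path", default="/input")
    parser.add_argument("--output-path", default="/output")
    parser.add_argument("--config-path", default="/config")
    # No default filter is documented, so None here is an assumption.
    parser.add_argument("-f", "--filter",
                        choices=["canny.canny_itk", "canny.canny_af"])
    # Float types for the Canny parameters are assumptions.
    parser.add_argument("--variance", type=float, default=10.0)
    parser.add_argument("--lower-threshold", type=float, default=0.01)
    parser.add_argument("--upper-threshold", type=float, default=0.03)
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("-w", "--num-workers", type=int,
                        default=os.cpu_count())
    # Kept as a size string such as '100MB'; the real tool presumably parses it.
    parser.add_argument("--block-size-limit", default="100MB")
    parser.add_argument("-t", "--tile-size", type=int, default=224)
    parser.add_argument("-o", "--overlap", type=int, default=0)
    return parser


args = build_parser().parse_args(
    ["filter_image_zarr", "-f", "canny.canny_af", "--variance", "5"])
print(args.command, args.filter, args.variance, args.tile_size)
```

Because `-f` is declared with `choices`, argparse rejects any filter name other than the two documented ones before a pipeline stage starts.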
Each <command> performs a different job and acts as a stage in the pipeline.
The --variance, --lower-threshold, and --upper-threshold options change the parameters of the Canny Edge Detection Filter (ITK or ArrayFire).
The filter used in the filter stage is selected with the -f or --filter option: canny.canny_itk for ITK's Canny filter and canny.canny_af for ArrayFire's Canny filter.
10.26.2.1. process_image_no
This executes all the operations (load/filter/stitch) at once.
10.26.2.1.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.tif or .svs - Input image file
config_render.json - Configuration for Render Server
10.26.2.1.2. Output
The following files are stored in the /output folder inside the container:
image.tif - Output image file
config.meta - Metadata for Render Server
config_render.json - Configuration for Render Server
10.26.2.2. tile_image_jpg
This loads a tiled/multi-resolution image and saves the tiles into multiple JPEG files. It does not use multithreading or multiprocessing to process the image.
10.26.2.2.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.tif or .svs - Input image file
10.26.2.2.2. Output
The following files are stored in the /output folder inside the container:
.jpg files - Output tiled image files. Each file is named "tile_l{:02d}_x{:04d}_y{:04d}.jpg".format(level, x, y), where level is the resolution level (0 is the highest resolution) and x/y is the tile's position in the image. A tile size of 512 is used.
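The naming scheme above can be produced and reversed in a few lines. `tile_name` applies the documented format string. `parse_tile_name` and its regular expression are hypothetical helpers, not part of the operator. Because `{:02d}` and `{:04d}` are minimum widths, the pattern accepts wider numbers as well.

```python
import re

# Format string quoted in the documentation above.
TILE_NAME_FORMAT = "tile_l{:02d}_x{:04d}_y{:04d}.jpg"

# Hypothetical inverse pattern; {:02d}/{:04d} are minimum widths, so allow more digits.
TILE_NAME_RE = re.compile(r"^tile_l(\d{2,})_x(\d{4,})_y(\d{4,})\.jpg$")


def tile_name(level: int, x: int, y: int) -> str:
    return TILE_NAME_FORMAT.format(level, x, y)


def parse_tile_name(name: str) -> tuple:
    match = TILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a tile file name: {name!r}")
    level, x, y = (int(group) for group in match.groups())
    return level, x, y


name = tile_name(0, 3, 12)
print(name)  # tile_l00_x0003_y0012.jpg
print(parse_tile_name(name))  # (0, 3, 12)
```

Filtered outputs carry a filtered_ prefix. Strip that prefix before calling `parse_tile_name`.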
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.3. tile_image_jpg_chunk
This is the same as tile_image_jpg, except that each tiled image size (chunk size) is calculated as the largest value that is a multiple of the tile size (--tile-size) and whose data size does not exceed the block size limit (--block-size-limit).
For example, if the tile size is 224, the chunk size is set to 5600 (= 224 x 25). 5600 is a multiple of 224, and 5600 x 5600 x 3 (channels) = 94,080,000 bytes, which does not exceed 100,000,000 bytes (100MB). The next multiple, 5824, would need 101,756,928 bytes.
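The chunk-size rule reduces to an integer square root followed by rounding down to a multiple of the tile size. This is a minimal sketch under three assumptions: square chunks, one byte per channel (8-bit RGB, as in the example), and the limit already converted to bytes. `chunk_size` is a hypothetical name. The fallback to a single tile when even one tile exceeds the limit is also an assumption.

```python
import math


def chunk_size(tile_size: int, block_size_limit: int, channels: int = 3) -> int:
    """Largest multiple of tile_size whose square chunk fits in block_size_limit bytes.

    Assumes square chunks with one byte per channel (8-bit RGB), as in the example.
    """
    # Largest side c with c * c * channels <= block_size_limit.
    max_side = math.isqrt(block_size_limit // channels)
    # Round down to a multiple of the tile size, keeping at least one tile (assumption).
    return max(1, max_side // tile_size) * tile_size


print(chunk_size(224, 100_000_000))  # 5600
```

With a 512 tile size and the same 100MB limit, the rule gives 5632 (= 512 x 11).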
10.26.2.4. tile_image_zarr
This loads a tiled/multi-resolution image and saves the tiles to a folder in Zarr format. It does not use multithreading or multiprocessing to process the image.
10.26.2.4.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.tif or .svs - Input image file
10.26.2.4.2. Output
The following files are stored in the /output folder inside the container:
tiled_image.zarr folder - Output Zarr folder; its levels/<index> group holds the chunk data for each level (0 is the highest resolution)
tile_info.json - Metadata used for loading the tiled/chunked images
The folder structure looks like this:
tiled_image.zarr
└── levels
    ├── 0
    │   ├── 0.0.0
    │   ├── 0.1.0
    │   ├── 0.2.0
    ...
    │   └── 4.3.0
    ...
    ├── 2
    │   ├── 0.0.0
    │   └── 1.0.0
    └── 3
        └── 0.0.0
The chunk size is calculated as the largest value that is a multiple of the tile size (--tile-size) and whose data size does not exceed the block size limit (--block-size-limit).
For example, if the tile size is 224, the chunk size is set to 5600 (= 224 x 25). 5600 is a multiple of 224, and 5600 x 5600 x 3 (channels) = 94,080,000 bytes, which does not exceed 100,000,000 bytes (100MB).
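The file names in the tree (0.0.0, 1.0.0, ...) look like Zarr v2 chunk keys, which are chunk-grid indices joined by ".". The helper below is a hypothetical sketch that enumerates the keys expected for one level. It assumes each level is stored as a (height, width, 3) array with chunks of (chunk, chunk, 3), an assumption inferred from the trailing 0 in every key.

```python
def chunk_keys(height: int, width: int, chunk: int) -> list:
    """Hypothetical helper: Zarr v2 chunk keys for one level.

    Assumes the level is a (height, width, 3) array stored with chunks of
    (chunk, chunk, 3), so the channel chunk index is always 0.
    """
    rows = -(-height // chunk)  # ceiling division
    cols = -(-width // chunk)
    return [f"{row}.{col}.0" for row in range(rows) for col in range(cols)]


print(chunk_keys(11200, 5600, 5600))  # ['0.0.0', '1.0.0']
```

Under these assumptions, a level two chunks tall and one chunk wide yields the 0.0.0 and 1.0.0 entries listed for level 2 in the tree.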
10.26.2.5. filter_image_jpg_serial
This loads tiled JPEG images, filters each image with the Canny Edge Detection Filter, and saves each processed tile into the /output folder in JPEG format.
10.26.2.5.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.jpg files - Input tiled image files
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.5.2. Output
The following files are stored in the /output folder inside the container:
.jpg files - Output filtered/tiled image files (prefixed with filtered_)
tile_info.json - Metadata used for loading the tiled/chunked images
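The serial stage is a plain loop: read each tile, filter it, and write it under a filtered_ prefix. The sketch below illustrates only that control flow and file naming. `placeholder_filter` is a hypothetical stand-in that returns the bytes unchanged. The real stage decodes each JPEG, applies the selected Canny filter, and re-encodes the result. Copying tile_info.json through unchanged is also an assumption.

```python
import shutil
from pathlib import Path


def placeholder_filter(data: bytes) -> bytes:
    # Stand-in only: the real stage decodes the JPEG, applies the Canny filter
    # (ITK or ArrayFire) and re-encodes the result.
    return data


def filter_tiles_serial(input_dir, output_dir, image_filter=placeholder_filter):
    """Filter every tile one after another, writing 'filtered_<name>' outputs."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.glob("*.jpg")):
        dst = output_dir / f"filtered_{src.name}"
        dst.write_bytes(image_filter(src.read_bytes()))
        written.append(dst.name)
    # Assumption: the tile metadata is passed through unchanged.
    shutil.copyfile(input_dir / "tile_info.json", output_dir / "tile_info.json")
    return written
```

For example, `filter_tiles_serial("/input", "/output")` would write filtered_tile_l00_x0000_y0000.jpg and the other outputs next to a copy of tile_info.json.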
10.26.2.6. filter_image_jpg_multithreading
This loads tiled JPEG images, filters each image with the Canny Edge Detection Filter, and saves each processed tile into the /output folder in JPEG format. It uses multithreading to process the images.
10.26.2.6.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.jpg files - Input tiled image files
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.6.2. Output
The following files are stored in the /output folder inside the container:
.jpg files - Output filtered/tiled image files (prefixed with filtered_)
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.7. filter_image_jpg_multiprocessing
This loads tiled JPEG images, filters each image with the Canny Edge Detection Filter, and saves each processed tile into the /output folder in JPEG format. It uses multiprocessing to process the images.
10.26.2.7.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.jpg files - Input tiled image files
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.7.2. Output
The following files are stored in the /output folder inside the container:
.jpg files - Output filtered/tiled image files (prefixed with filtered_)
tile_info.json - Metadata used for loading the tiled/chunked images
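The multithreading and multiprocessing variants differ from the serial stage only in how per-tile work is scheduled. The sketch below uses `concurrent.futures` as an illustrative swap-in. It is not necessarily what the operator uses internally. `placeholder_filter` and `filter_one` are hypothetical, and copying tile_info.json is omitted.

```python
from concurrent.futures import Executor
from pathlib import Path


def placeholder_filter(data: bytes) -> bytes:
    # Stand-in for decode + Canny + encode; kept at module level so that
    # ProcessPoolExecutor can pickle the work function.
    return data


def filter_one(job):
    src, dst = job
    Path(dst).write_bytes(placeholder_filter(Path(src).read_bytes()))
    return Path(dst).name


def filter_tiles_parallel(input_dir, output_dir, executor: Executor):
    """Filter tiles concurrently with any concurrent.futures executor."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(str(src), str(output_dir / f"filtered_{src.name}"))
            for src in sorted(input_dir.glob("*.jpg"))]
    # executor.map returns results in submission order.
    return list(executor.map(filter_one, jobs))
```

Passing a `ThreadPoolExecutor` mirrors the multithreading command. Threads share memory and need no pickling, but Python-level work is limited by the GIL. Passing a `ProcessPoolExecutor`, created under an `if __name__ == "__main__":` guard, mirrors the multiprocessing command. It sidesteps the GIL at the cost of serializing arguments and results between processes.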
10.26.2.8. filter_image_jpg_dali
This loads tiled JPEG images, filters each image with the Canny Edge Detection Filter, and saves each processed tile into the /output folder in JPEG format. It uses NVIDIA DALI to load images with multiple threads and batches, and it processes the images with multiprocessing.
10.26.2.8.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.jpg files - Input tiled image files
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.8.2. Output
The following files are stored in the /output folder inside the container:
.jpg files - Output filtered/tiled image files (prefixed with filtered_)
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.9. filter_image_jpg_dali_chunk
This is the same as filter_image_jpg_dali, except that it assumes the input images are larger than the specified tile size (that is, chunked) to avoid very large partitions.
Since the chunk size is a multiple of the tile size, each loaded chunk is partitioned into tiles in memory, and the tiles are passed to the Canny edge detection filter.
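Partitioning a loaded chunk into tiles amounts to walking a grid of tile-sized boxes over it. The helpers below are hypothetical and work on a list of rows so that they run without extra dependencies. With a NumPy array, each tile would be the slice `chunk[top:bottom, left:right]`. Boxes are clipped at the chunk border, for chunks at the image edge. The --overlap option is ignored here, which is an assumption.

```python
def tile_boxes(chunk_height, chunk_width, tile_size):
    """Yield (top, left, bottom, right) boxes that cover a chunk with tiles.

    Boxes are clipped at the chunk border; --overlap is ignored (assumption).
    """
    for top in range(0, chunk_height, tile_size):
        for left in range(0, chunk_width, tile_size):
            yield (top, left,
                   min(top + tile_size, chunk_height),
                   min(left + tile_size, chunk_width))


def split_chunk(chunk, tile_size):
    """Split a 2-D list-of-rows chunk into tiles (a NumPy array would use
    chunk[top:bottom, left:right] instead)."""
    height = len(chunk)
    width = len(chunk[0]) if height else 0
    return [[row[left:right] for row in chunk[top:bottom]]
            for top, left, bottom, right in tile_boxes(height, width, tile_size)]


chunk = [[r * 4 + c for c in range(4)] for r in range(4)]
print(split_chunk(chunk, 2)[0])  # [[0, 1], [4, 5]]
```

A 5600 x 5600 chunk with 224-pixel tiles splits into 25 x 25 = 625 tiles, with no clipping because 5600 is an exact multiple of 224.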
10.26.2.10. filter_image_zarr
This loads the Zarr data, filters each tile with the Canny Edge Detection Filter, and saves the processed tiles into the /output folder in the same Zarr format used by the tile_image_zarr command.
It uses multiprocessing to process each tile.
10.26.2.10.1. Input
The input is a folder (mounted at /input inside the container) containing the following files:
.zarr folder - Input chunked image data in Zarr format
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.10.2. Output
The following files are stored in the /output folder inside the container:
filtered.zarr folder - Output Zarr folder; its levels/<index> group holds the chunk data for each level (0 is the highest resolution)
tile_info.json - Metadata used for loading the tiled/chunked images
10.26.2.11. stitch_image_jpg
This loads filtered/tiled JPEG images and creates a large TIFF file (image.tif) by stitching the input images together using the metadata (tile_info.json).
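At its core, stitching pastes each tile into a large canvas at an offset derived from its position. The sketch below is hypothetical and uses a list of rows instead of a TIFF writer. It assumes x/y are tile indices (column, row), so a tile's top-left pixel lands at (y * tile_size, x * tile_size). The real layout comes from tile_info.json, whose format is not documented here.

```python
def stitch(tiles, tile_size, height, width, fill=0):
    """Paste tiles into a height x width canvas (a list of rows).

    Assumes tiles maps (x, y) tile indices to 2-D lists of pixel values, so the
    tile's top-left pixel lands at row y * tile_size, column x * tile_size.
    Pixels falling outside the canvas are dropped.
    """
    canvas = [[fill] * width for _ in range(height)]
    for (x, y), tile in tiles.items():
        top, left = y * tile_size, x * tile_size
        for dy, row in enumerate(tile):
            for dx, value in enumerate(row):
                if top + dy < height and left + dx < width:
                    canvas[top + dy][left + dx] = value
    return canvas


tiles = {(0, 0): [[1, 1], [1, 1]], (1, 0): [[2, 2], [2, 2]],
         (0, 1): [[3, 3], [3, 3]], (1, 1): [[4, 4], [4, 4]]}
for row in stitch(tiles, tile_size=2, height=4, width=4):
    print(row)
```

Clipping at the canvas edge handles images whose dimensions are not exact multiples of the tile size.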
10.26.2.11.1. Input
The input consists of two folders (mounted at /input and /config inside the container) containing the following files:
/input folder
.jpg files - Input filtered/tiled image files
tile_info.json - Metadata used for loading the tiled/chunked images
/config folder
config_render.json - Configuration for Render Server
10.26.2.11.2. Output
The following files are stored in the /output folder inside the container:
image.tif - Output image file
config.meta - Metadata for Render Server
config_render.json - Configuration for Render Server
10.26.2.12. stitch_image_jpg_chunk
This is the same as stitch_image_jpg, except that it assumes the input images are chunked images processed by the filter_image_jpg_dali_chunk command.
10.26.2.13. stitch_image_zarr
This loads filtered/chunked image data and creates a large TIFF file (image.tif) by stitching the input images together using the metadata (tile_info.json).
10.26.2.13.1. Input
The input consists of two folders (mounted at /input and /config inside the container) containing the following files:
/input folder
.zarr folder - Input chunked image data in Zarr format
tile_info.json - Metadata used for loading the tiled/chunked images
/config folder
config_render.json - Configuration for Render Server
10.26.2.13.2. Output
The following files are stored in the /output folder inside the container:
image.tif - Output image file
config.meta - Metadata for Render Server
config_render.json - Configuration for Render Server