ai4med.components.data package

class BaseImagePipeline(task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, data_format='channels_first', num_data_dims=3, num_channels=1, num_label_channels=1, label_format=None, shuffle=True, duplicate_count=1, batch_transforms=None, items_per_category=None, category_weights=None)

Bases: ai4med.components.data.data_pipeline.DataPipeline

Base class for ImagePipelines.

Parameters

batch_transforms (list) – List of transforms to be applied to batched data.

get_batched_data(session)
get_next_batch(session)

Get the next batch of data.

Parameters

session – the TF session

Returns: batched data

process_data_list()
class ClassificationImagePipeline(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_format=None, output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None, items_per_category=None, category_weights=None)

Bases: ai4med.components.data.image_pipeline.ImagePipeline

An ImagePipeline for classification tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.
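
A hedged sketch of the expected datalist layout, modeled on the Medical Segmentation Decathlon dataset.json convention (the exact keys required by ai4med and the file paths below are assumptions for illustration):

    import json

    datalist = {
        "training": [
            {"image": "imagesTr/case_001.nii.gz", "label": "labelsTr/case_001.nii.gz"},
            {"image": "imagesTr/case_002.nii.gz", "label": "labelsTr/case_002.nii.gz"}
        ],
        "validation": [
            {"image": "imagesTr/case_003.nii.gz", "label": "labelsTr/case_003.nii.gz"}
        ]
    }
    with open("dataset_0.json", "w") as f:
        json.dump(datalist, f, indent=2)

The data_list_key argument selects one of the top-level lists (for example "training"), and each relative path is resolved against data_file_base_dir.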

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_format – Format of output labels, refer to ai4med.common.label_format

  • output_batch_size (int) – Batch size of output

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This arg specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • batch_transforms (list) – List of transforms to be applied to batched data.
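
A minimal construction sketch. Assumptions: the import path follows this package, the file paths are hypothetical, and real transform objects from the ai4med transform modules would replace the empty list:

    from ai4med.components.data import ClassificationImagePipeline  # import path assumed

    pipeline = ClassificationImagePipeline(
        data_list_file_path="/data/dataset_0.json",  # hypothetical path
        data_file_base_dir="/data",                  # hypothetical path
        data_list_key="training",
        output_crop_size=(96, 96, 96),
        transforms=[],                               # real transforms go here
        output_batch_size=8,
        num_workers=4,
        shuffle=True,
    )

All other arguments keep the defaults shown in the signature above.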

class ClassificationImagePipelineWithCache(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_format=None, output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None, items_per_category=None, category_weights=None)

Bases: ai4med.components.data.image_pipeline_with_cache.ImagePipelineWithCache

An implementation of DataPipeline that uses SmartCache to efficiently generate data for training/testing of classification tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_format – Format of output labels, refer to ai4med.common.label_format

  • output_batch_size (int) – Batch size of output

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This parameter specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory

  • batch_transforms (list) – List of transforms to be applied to batched data.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.
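
A small arithmetic sketch of the rotation rate, using the default values from the signature above:

    num_cache_objects = 10000
    replace_percent = 0.1

    # Per the replace_percent description, this many cached items are
    # swapped for fresh dataset items at every epoch:
    items_replaced_per_epoch = int(num_cache_objects * replace_percent)  # 1000

With these defaults, the cache content therefore rotates through the whole dataset at a rate of 1,000 subjects per epoch.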

class ClassificationKerasImagePipeline(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_format=None, output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None, items_per_category=None, category_weights=None, multiprocessing=False, sampling=None)

Bases: ai4med.components.data.keras_image_pipeline.KerasImagePipeline

An ImagePipeline for classification tasks using the Keras backend.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_format – Format of output labels, refer to ai4med.common.label_format

  • output_batch_size (int) – Batch size of output

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This arg specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • batch_transforms (list) – List of transforms to be applied to batched data.

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

  • sampling (str) – Whether to use weighted sampling for the data. The default sampling=None means no weighted sampling (plain uniform sampling). Options are ‘element’ and ‘automatic’: ‘element’ picks weights from the dataset json; ‘automatic’ calculates them from the number of elements in each class (see the sketch after this parameter list).
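
The exact weighting formula behind ‘automatic’ is not spelled out here; the sketch below assumes simple inverse class frequency, purely for illustration:

    from collections import Counter

    labels = [0, 0, 0, 1, 2, 2]                    # hypothetical class labels
    counts = Counter(labels)                       # {0: 3, 1: 1, 2: 2}
    class_weights = {c: 1.0 / n for c, n in counts.items()}
    sample_weights = [class_weights[c] for c in labels]
    # Rare classes receive larger weights, so weighted sampling draws
    # them more often than their raw frequency would suggest.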

class ClassificationKerasImagePipelineWithCache(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_format=None, output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None, items_per_category=None, category_weights=None, multiprocessing=False, sampling=None)

Bases: ai4med.components.data.keras_image_pipeline_with_cache.KerasImagePipelineWithCache

An implementation of DataPipeline that uses SmartCache to efficiently generate data for training/testing of classification tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_format – Format of output labels, refer to ai4med.common.label_format

  • output_batch_size (int) – Batch size of output

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This parameter specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory

  • batch_transforms (list) – List of transforms to be applied to batched data.

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

  • sampling (str) – Whether to use weighted sampling for the data. The default sampling=None means no weighted sampling (plain uniform sampling). Options are ‘element’ and ‘automatic’: ‘element’ picks weights from the dataset json; ‘automatic’ calculates them from the number of elements in each class.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.

class DataPipeline

Bases: ai4med.common.graph_component.GraphComponent

This class defines the required methods for data pipeline implementations.

A DataPipeline produces data items for training and validation.

Note

DataPipeline is a graph building component. Implementations must implement the build method required by GraphComponent’s interface.

abstract get_data_property()

Get the property of produced data

Returns: DataProperty object

abstract get_dataset_size()

Get the size of the dataset, which is the number of training subjects.

Returns: size of dataset

get_extra_inputs()

Get the placeholder specs of extra data inputs, if any

Returns: list of PlaceholderSpec objects, or None

abstract get_next_batch(session)

Get the next batch of data.

Parameters

session – the TF session

Returns: batched data

abstract initialize_dataset(session, state=-1)

Initializes the dataset.

Note

This method is called at the beginning of each training epoch.

Parameters
  • session – the TF session.

  • state – the current training state. It is usually the epoch number. State -1 means the training has not started.

abstract number_of_subjects_per_batch()

Get the number of subjects used to produce a batch. Depending on how the training data is transformed, training samples in the same batch could be produced from one or more subjects.

Returns: number of subjects used to produce a batch

abstract set_sharding(rank, num_shards, equal_shard_size=True, fixed_shard_data=False)

Computes the parameters for dividing the dataset into multiple partitions (shards). This is used for Horovod-based multi-GPU training. See the sketch after the parameters below.

Parameters
  • rank (int) – the rank of the current process

  • num_shards (int) – total number of shards

  • equal_shard_size (bool) – whether to make all shards equal size (Default: True)

  • fixed_shard_data (bool) – whether to make content of each shard fixed. If not fixed, content of each shard is recomputed randomly across the whole dataset each time the dataset is initialized. (Default: False)
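
A hedged sketch of how a shard can be derived from rank and num_shards (the library’s actual partitioning logic may differ; this only illustrates the relationship between the parameters):

    def shard_indices(dataset_size, rank, num_shards, equal_shard_size=True):
        """Illustrative round-robin split of dataset indices into shards."""
        shard = list(range(dataset_size))[rank::num_shards]
        if equal_shard_size:
            # Trim every shard to the same size so all ranks step through
            # the same number of batches per epoch.
            shard = shard[: dataset_size // num_shards]
        return shard

    # e.g. dataset_size=10, num_shards=4: rank 0 -> [0, 4], rank 1 -> [1, 5], ...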

abstract shutdown()

Shuts down the data pipeline, letting an implementation clean up any resources it used.
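
A minimal skeleton of a custom implementation. Assumptions: the import path follows this package, and the method bodies are illustrative stubs rather than the library’s own logic:

    from ai4med.components.data import DataPipeline  # import path assumed

    class InMemoryPipeline(DataPipeline):
        """Serves pre-built batches from memory (illustration only)."""

        def __init__(self, batches, data_property=None):
            self._batches = batches      # hypothetical list of ready batches
            self._prop = data_property
            self._cursor = 0

        def build(self, build_ctx):      # required by GraphComponent
            pass                         # nothing to add to the TF graph here

        def get_data_property(self):
            return self._prop            # real code returns a DataProperty

        def get_dataset_size(self):
            return len(self._batches)

        def get_next_batch(self, session):
            batch = self._batches[self._cursor % len(self._batches)]
            self._cursor += 1
            return batch

        def initialize_dataset(self, session, state=-1):
            self._cursor = 0             # called at the start of each epoch

        def number_of_subjects_per_batch(self):
            return 1

        def set_sharding(self, rank, num_shards, equal_shard_size=True,
                         fixed_shard_data=False):
            pass                         # single-process sketch: no sharding

        def shutdown(self):
            pass                         # no resources to release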

class ImagePipeline(task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, transforms, data_format='channels_first', num_data_dims=3, num_channels=1, image_dtype='float32', num_label_channels=1, label_format=None, label_dtype='float32', batch_size=10, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None, items_per_category=None, category_weights=None)

Bases: ai4med.components.data.base_image_pipeline.BaseImagePipeline

An implementation of DataPipeline that generates images for training/testing.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Note

This class implements the data pipeline with TF’s Dataset API.

Parameters
  • task (string) – Task to perform

  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • num_data_dims (int) – Number of dimensions of output images

  • num_channels (int) – Number of channels of output images

  • image_dtype (string) – Data type of output images

  • num_label_channels (int) – Number of channels of output label images (for segmentation task)

  • label_format – Format of output labels, refer to ai4med.common.label_format (for classification task)

  • label_dtype (string) – Data type for output labels

  • batch_size (int) – Batch size of output

  • num_workers (int) – Number of worker threads for data transformation

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • extra_inputs – Extra placeholders for data inputs

  • batch_transforms (list) – List of transforms to be applied to batched data.

build(build_ctx: ai4med.common.build_ctx.BuildContext)

Builds TF graph components. It reads and processes the data list file and creates a data property object based on the data list content and init parameters.

Parameters

build_ctx – the build context.

get_batched_data(session)

Get the next batch of data.

Parameters

session – the TF session

Returns: batched data

get_data_property()

Get the property of produced data

Returns: DataProperty object

get_dataset_size()

Get the size of the TF dataset, which is the number of training subjects.

Returns: size of dataset

get_extra_inputs()

Get the placeholder specs of extra data inputs, if any

Returns: list of PlaceholderSpec objects, or None

initialize_dataset(session, state=-1)

Initializes the dataset. Note: this method is called at the beginning of each training epoch.

Parameters
  • session – the TF session.

  • state – the current training state. It is usually the epoch number. State -1 means the training has not started.
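
A hedged epoch-loop sketch showing how these calls fit together (TF 1.x session style, as the API implies; pipeline is assumed to be an already-built ImagePipeline):

    import tensorflow as tf  # TF 1.x assumed

    num_epochs = 5  # illustrative value
    with tf.Session() as sess:
        steps_per_epoch = (pipeline.get_dataset_size()
                           // pipeline.number_of_subjects_per_batch())
        for epoch in range(num_epochs):
            pipeline.initialize_dataset(sess, state=epoch)  # re-init every epoch
            for _ in range(steps_per_epoch):
                batch = pipeline.get_next_batch(sess)
                # ...feed batch into the training step here...
        pipeline.shutdown()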

number_of_subjects_per_batch()

Get the number of subjects used to produce a batch. Depending on how the training data is transformed, training samples in the same batch could be produced from one or more subjects.

Returns: number of subjects used to produce a batch

set_sharding(rank, num_shards, equal_shard_size=True, fixed_shard_data=False)

Computes the parameters for dividing the dataset into multiple partitions (shards). This is used for Horovod-based multi-GPU training.

Parameters
  • rank (int) – the rank of the current process

  • num_shards (int) – total number of shards

  • equal_shard_size (bool) – whether to make all shards equal size

  • fixed_shard_data (bool) – whether to make content of each shard fixed. If not fixed, content of each shard is recomputed randomly across the whole dataset each time the dataset is initialized.

shutdown()

Shut down the image pipeline and clean up dataset resources.

class ImagePipelineWithCache(task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, transforms, data_format='channels_first', num_data_dims=3, num_channels=1, image_dtype='float32', num_label_channels=1, label_format=None, label_dtype='float32', batch_size=10, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None, items_per_category=None, category_weights=None)

Bases: ai4med.components.data.image_pipeline.ImagePipeline

An implementation of DataPipeline that uses SmartCache to efficiently generate data for training/testing.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • task (string) – Task to perform

  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • num_data_dims (int) – Number of dimensions of output images

  • num_channels (int) – Number of channels of output images

  • image_dtype (string) – Data type of output images

  • num_label_channels (int) – Number of channels of output label images (for segmentation task)

  • label_format – Format of output labels, refer to ai4med.common.label_format (for classification task)

  • label_dtype (string) – Data type for output labels

  • batch_size (int) – Batch size of output

  • num_workers (int) – Number of worker processes for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory.

  • batch_transforms (list) – List of transforms to be applied to batched data.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.

class KerasImagePipeline(task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, transforms, data_format='channels_first', num_data_dims=3, num_channels=1, image_dtype='float32', num_label_channels=1, label_format=None, label_dtype='float32', batch_size=10, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None, items_per_category=None, category_weights=None, multiprocessing=False, sampling=None)

Bases: ai4med.components.data.base_image_pipeline.BaseImagePipeline

Implementation of the data pipeline using Keras.

Note

This class uses Keras’s data enqueuer to manage worker threads.

Parameters
  • task (string) – Task to perform

  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • num_data_dims (int) – Number of dimensions of output images

  • num_channels (int) – Number of channels of output images

  • image_dtype (string) – Data type of output images

  • num_label_channels (int) – Number of channels of output label images (for segmentation task)

  • label_format – Format of output labels, refer to ai4med.common.label_format (for classification task)

  • label_dtype (string) – Data type for output labels

  • batch_size (int) – Batch size of output

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • batch_transforms (list) – List of transforms to be applied to batched data.

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

  • sampling (str) – Whether to use weighted sampling for the data. The default sampling=None means no weighted sampling (plain uniform sampling). Options are ‘element’ and ‘automatic’: ‘element’ picks weights from the dataset json; ‘automatic’ calculates them from the number of elements in each class.

begin_generator()
build(build_ctx: ai4med.common.build_ctx.BuildContext)

Builds the Keras pipeline using a Sequence generator and starts the queue operation.

create_data_gen_and_enqueuer()
create_sample_weights()

Adds weights to items list if sampling is enabled.

get_batched_data(session)

Get the next batch of data. In the Keras implementation, the session argument is not used.
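
A brief usage sketch for the Keras variant, under the assumption that passing None for the unused session argument is acceptable:

    pipeline.initialize_dataset(None)        # prepares the generator/enqueuer
    batch = pipeline.get_batched_data(None)  # session argument is ignored
    # ...consume batch...
    pipeline.shutdown()                      # stop the enqueuer's workers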

get_data_property()

Get the property of produced data.

get_dataset_size()

Get the size of the Keras dataset.

get_extra_inputs()

Get placeholder specs for extra inputs

initialize_dataset(session, state=-1)

Initializes the dataset. Note: this method is called at the beginning of each training epoch.

Parameters
  • session – the TF session.

  • state – the current training state. It is usually the epoch number. State -1 means the training has not started.

number_of_subjects_per_batch()

Get the number of subjects used to produce a batch. Depending on how the training data is transformed, training samples in the same batch could be produced from one or more subjects.

Returns: number of subjects used to produce a batch

set_sharding(rank, num_shards, equal_shard_size=True, fixed_shard_data=False)

Computes the parameters for dividing the dataset into multiple partitions (shards). This is used for Horovod-based multi-GPU training.

Parameters
  • rank (int) – the rank of the current process

  • num_shards (int) – total number of shards

  • equal_shard_size (bool) – whether to make all shards equal size

  • fixed_shard_data (bool) – whether to make content of each shard fixed. If not fixed, content of each shard is recomputed randomly across the whole dataset each time the dataset is initialized.

shutdown()

Shut down the image pipeline and clean up dataset resources.

class KerasImagePipelineWithCache(task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, transforms, data_format='channels_first', num_data_dims=3, num_channels=1, image_dtype='float32', num_label_channels=1, label_format=None, label_dtype='float32', batch_size=10, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None, items_per_category=None, category_weights=None, multiprocessing=False, sampling=None)

Bases: ai4med.components.data.keras_image_pipeline.KerasImagePipeline

An implementation of KerasImagePipeline that uses SmartCache to efficiently generate data for training/testing.

Note

This class uses Keras’s data enqueuer to manage worker threads.

Parameters
  • task (string) – Task to perform

  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • num_data_dims (int) – Number of dimensions of output images

  • num_channels (int) – Number of channels of output images

  • image_dtype (string) – Data type of output images

  • num_label_channels (int) – Number of channels of output label images (for segmentation task)

  • label_format – Format of output labels, refer to ai4med.common.label_format (for classification task)

  • label_dtype (string) – Data type for output labels

  • batch_size (int) – Batch size of output

  • num_workers (int) – Number of worker processes for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory.

  • batch_transforms (list) – List of transforms to be applied to batched data.

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

  • sampling (str) – Whether to use weighted sampling for the data. The default sampling=None means no weighted sampling (plain uniform sampling). Options are ‘element’ and ‘automatic’: ‘element’ picks weights from the dataset json; ‘automatic’ calculates them from the number of elements in each class.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.

class SegmentationImagePipeline(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_channels=1, output_label_dtype='float32', output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None)

Bases: ai4med.components.data.image_pipeline.ImagePipeline

An ImagePipeline for segmentation tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_channels (int) – Number of channels of output label images

  • output_label_dtype (string) – Data type for output label images

  • output_batch_size (int) – Batch size of output data

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This arg specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • batch_transforms (list) – List of transforms to be applied to batched data.
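
A construction sketch analogous to the classification example earlier, adding the segmentation-specific label arguments (paths hypothetical; import path assumed; real transforms would replace the empty list):

    from ai4med.components.data import SegmentationImagePipeline  # import path assumed

    pipeline = SegmentationImagePipeline(
        data_list_file_path="/data/dataset_0.json",  # hypothetical path
        data_file_base_dir="/data",                  # hypothetical path
        data_list_key="training",
        output_crop_size=(96, 96, 96),
        transforms=[],                               # real transforms go here
        output_label_channels=1,                     # channels of label images
        output_label_dtype="float32",
        output_batch_size=4,
    )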

class SegmentationImagePipelineWithCache(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_channels=1, output_label_dtype='float32', output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None)

Bases: ai4med.components.data.image_pipeline_with_cache.ImagePipelineWithCache

An implementation of DataPipeline that uses SmartCache to efficiently generate data for training/testing of segmentation tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_channels (int) – Number of channels of output label images

  • output_label_dtype (string) – Data type for output label images

  • output_batch_size (int) – Batch size of output data

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This parameter specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory.

  • batch_transforms (list) – List of transforms to be applied to batched data.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.

class SegmentationKerasImagePipeline(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_channels=1, output_label_dtype='float32', output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, batch_transforms=None, multiprocessing=False)

Bases: ai4med.components.data.keras_image_pipeline.KerasImagePipeline

An ImagePipeline for segmentation tasks using the Keras backend.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_channels (int) – Number of channels of output label images

  • output_label_dtype (string) – Data type for output label images

  • output_batch_size (int) – Batch size of output data

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This arg specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

  • batch_transforms (list) – List of transforms to be applied to batched data.

class SegmentationKerasImagePipelineWithCache(data_list_file_path, data_file_base_dir, data_list_key, output_crop_size, transforms, output_data_format='channels_first', output_data_dims=3, output_image_channels=1, output_image_dtype='float32', output_label_channels=1, output_label_dtype='float32', output_batch_size=10, batched_by_transforms=False, num_workers=4, prefetch_size=20, shuffle=True, repeat=True, duplicate_count=1, extra_inputs=None, num_cache_objects=10000, replace_percent=0.1, caches_data=True, batch_transforms=None, multiprocessing=False)

Bases: ai4med.components.data.keras_image_pipeline_with_cache.KerasImagePipelineWithCache

An implementation of DataPipeline that uses SmartCache to efficiently generate data for training/testing of segmentation tasks.

Note that data_list_file_path must point to a json file that is similar to what you get from http://medicaldecathlon.com/.

Parameters
  • data_list_file_path (string) – The path to the json file

  • data_file_base_dir (string) – The base directory of the dataset

  • data_list_key (string) – The key to get the list of dictionaries to be used

  • output_crop_size (tuple, list) – Crop size of the output data

  • transforms – A list of transforms to be applied to the data

  • output_data_format – Format of the output data. Must be a valid format from DataFormat. See ai4med.common.data_format

  • output_data_dims (int) – Number of dimensions of output images

  • output_image_channels (int) – Number of channels of output images

  • output_image_dtype (string) – Data type of output images

  • output_label_channels (int) – Number of channels of output label images

  • output_label_dtype (string) – Data type for output label images

  • output_batch_size (int) – Batch size of output data

  • batched_by_transforms (bool) – Batching can be done either by transforms or by the TF dataset. This parameter specifies how the batching is done.

  • num_workers (int) – Number of worker threads for data transformation

  • prefetch_size (int) – Number of data subjects to prefetch

  • shuffle (bool) – To shuffle the data or not

  • duplicate_count (int) – Number of times to duplicate the datalist.

  • extra_inputs – Extra placeholders for data inputs

  • num_cache_objects (int) – Number of objects to be cached

  • replace_percent (float) – The percent of cached data to be replaced in every epoch

  • caches_data (bool) – Whether to cache data in memory

  • batch_transforms (list) – List of transforms to be applied to batched data.

  • multiprocessing (bool) – Whether to use the multiprocessing library or Python’s native threading. Default: False.

Note

SmartCache has a content rotation feature that is based on the config parameters, and it determines the content in the cache by dynamically rotating through the whole dataset. Only the data in the cache is used for training.

If you set ‘caches_data’ to False, then you are only using this content rotation feature without incurring any memory consumption.
