nvidia.dali.fn.readers.coco
- nvidia.dali.fn.readers.coco(
- *,
- annotations_file='',
- avoid_class_remapping=False,
- bytes_per_sample_hint=[0],
- dont_use_mmap=False,
- file_root=None,
- image_ids=False,
- images=None,
- include_iscrowd=True,
- initial_fill=1024,
- lazy_init=False,
- ltrb=False,
- num_shards=1,
- pad_last_batch=False,
- pixelwise_masks=False,
- polygon_masks=False,
- prefetch_queue_depth=1,
- preprocessed_annotations='',
- preserve=False,
- random_shuffle=False,
- ratio=False,
- read_ahead=False,
- save_preprocessed_annotations=False,
- save_preprocessed_annotations_dir='',
- seed=-1,
- shard_id=0,
- shuffle_after_epoch=False,
- size_threshold=0.1,
- skip_cached_images=False,
- skip_empty=False,
- stick_to_shard=False,
- tensor_init_bytes=1048576,
- device=None,
- name=None,
- )
Reads data from a COCO dataset that is composed of a directory with images and annotation JSON files.
This reader produces the following outputs:
images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)), (image_ids)
images: Each sample contains image data with layout HWC (height, width, channels).
bounding_boxes: Each sample can have an arbitrary number M of bounding boxes, each described by 4 coordinates: [[x_0, y_0, w_0, h_0], [x_1, y_1, w_1, h_1], ... [x_M, y_M, w_M, h_M]] or in [l, t, r, b] format if requested (see the ltrb argument).
labels: Each bounding box is associated with an integer label representing a category identifier: [label_0, label_1, ..., label_M]
polygons and vertices (Optional, present if polygon_masks is set to True): If polygon_masks is enabled, two extra outputs describe the masks as a set of polygons. Each mask contains an arbitrary number of polygons P, each associated with a mask index in the range [0, M) and composed of a group of V vertices. The output polygons describes the polygons as follows: [[mask_idx_0, start_vertex_idx_0, end_vertex_idx_0], [mask_idx_1, start_vertex_idx_1, end_vertex_idx_1], ... [mask_idx_P, start_vertex_idx_P, end_vertex_idx_P]]
where mask_idx is the index of the mask that the polygon belongs to, in the range [0, M), and start_vertex_idx and end_vertex_idx define the range of indices of vertices, as they appear in the output vertices, belonging to this polygon. Each sample in vertices contains a list of the vertices that compose the different polygons in the sample, as 2D coordinates: [[x_0, y_0], [x_1, y_1], ... [x_V, y_V]]
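The relationship between the polygons and vertices outputs can be easier to follow with a small worked example. The sketch below is plain Python operating on hand-written lists, not on real reader output; the helper name polygons_by_mask and the sample values are hypothetical. It treats end_vertex_idx as exclusive, which matches the [mask_id, 0, 3] example given for a three-vertex polygon in the description of the deprecated masks argument.

```python
def polygons_by_mask(polygons, vertices):
    """Group the vertex lists of all polygons by the mask they belong to."""
    grouped = {}
    for mask_idx, start, end in polygons:
        # start/end index into `vertices`; the end index is treated as exclusive.
        grouped.setdefault(mask_idx, []).append(vertices[start:end])
    return grouped


# Hypothetical sample: mask 0 is a triangle, mask 1 is a quadrilateral.
polygons = [[0, 0, 3], [1, 3, 7]]
vertices = [
    [0.0, 0.0], [4.0, 0.0], [0.0, 3.0],              # vertices 0..2
    [1.0, 1.0], [2.0, 1.0], [2.0, 2.0], [1.0, 2.0],  # vertices 3..6
]

masks = polygons_by_mask(polygons, vertices)
```

Here masks[0] holds one polygon with three vertices and masks[1] holds one polygon with four. A mask made of several disjoint regions would simply contribute several rows with the same mask_idx.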
pixelwise_masks (Optional, present if the pixelwise_masks argument is set to True): Contains image-like data, with the same shape and layout as images, representing a pixelwise segmentation mask.
image_ids (Optional, present if the image_ids argument is set to True): One element per sample, representing an image identifier.
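As a rough illustration of how these outputs are typically consumed, the following sketch wires the reader into a CPU-only DALI pipeline. Several things here are assumptions rather than part of this reference: the function name build_coco_pipeline, the batch size and thread count, the reader name "Reader", and the use of fn.decoders.image on the first output (DALI's published examples pass the reader's image output through an image decoder). The DALI import is placed inside the function so the snippet can be loaded on a machine without DALI installed.

```python
def build_coco_pipeline(file_root, annotations_file, batch_size=8):
    """Return a CPU-only pipeline that yields images, boxes, and labels."""
    # Imported lazily so that defining this function does not require DALI.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=4, device_id=None)
    def coco_pipeline():
        # With default mask/image_ids settings the reader has three outputs.
        encoded, bboxes, labels = fn.readers.coco(
            file_root=file_root,
            annotations_file=annotations_file,
            ltrb=True,       # boxes as [left, top, right, bottom]
            name="Reader",   # hypothetical operator name
        )
        # Assumption: the image output is decoded before further processing.
        images = fn.decoders.image(encoded, device="cpu")
        return images, bboxes, labels

    return coco_pipeline()
```

A caller would then build and run the returned pipeline, for example with pipe.build() followed by pipe.run(), passing paths to its own image directory and annotation file. Enabling polygon_masks, pixelwise_masks, or image_ids adds the corresponding outputs to the tuple returned by the reader, in the order listed above.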
- Supported backends
‘cpu’
- Keyword Arguments:
annotations_file (str, optional, default = ‘’) – List of paths to the JSON annotations files.
avoid_class_remapping (bool, optional, default = False) –
If set to True, class ID values are returned directly as they are defined in the manifest file.
Otherwise, class ID values are mapped to consecutive values in the range from 1 to the number of classes, disregarding the exact values from the manifest (0 is reserved for a special background class).
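To make the default remapping concrete, here is a minimal sketch in plain Python. It is an illustration only: the helper remap_class_ids and the sample IDs are hypothetical, and the order in which the reader assigns new IDs is not specified in this reference, so the sketch simply assumes they are assigned in the order the categories are supplied.

```python
def remap_class_ids(category_ids):
    """Map original category IDs to consecutive IDs starting at 1.

    ID 0 is left unused because it is reserved for the background class.
    Assumption: new IDs follow the order of `category_ids`.
    """
    return {orig: new for new, orig in enumerate(category_ids, start=1)}


# Hypothetical, non-contiguous category IDs from an annotation file.
original_ids = [1, 2, 3, 5, 7, 90]
mapping = remap_class_ids(original_ids)
```

With avoid_class_remapping=True no such mapping is applied and a label of 90 would be returned as 90; with the default setting it would come back as a value between 1 and the number of classes.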
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but for most network file systems it does not provide optimum performance.
file_root (str, optional) –
Path to a directory that contains the data files.
If a file list is not provided, this argument is required.
image_ids (bool, optional, default = False) – If set to True, the image IDs will be produced in an extra output.
images (str or list of str, optional) –
A list of image paths.
If provided, it specifies the images that will be read. The images will be read in the same order as they appear in the list, and in case of duplicates, multiple copies of the relevant samples will be produced.
If left unspecified or set to None, all images listed in the annotation file are read exactly once, ordered by their image id.
The paths to be kept should match exactly those in the annotations file.
Note: This argument is mutually exclusive with preprocessed_annotations.
include_iscrowd (bool, optional, default = True) – If set to True, annotations marked as iscrowd=1 are included as well.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
ltrb (bool, optional, default = False) –
If set to True, bboxes are returned as [left, top, right, bottom].
If set to False, the bboxes are returned as [x, y, width, height].
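The two box layouts carry the same information, as the following plain-Python sketch shows. The helper names are hypothetical, and the sketch assumes the usual COCO convention that x and y denote the top-left corner of the box.

```python
def xywh_to_ltrb(box):
    """Convert [x, y, width, height] to [left, top, right, bottom]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def ltrb_to_xywh(box):
    """Convert [left, top, right, bottom] to [x, y, width, height]."""
    left, top, right, bottom = box
    return [left, top, right - left, bottom - top]


box_xywh = [10.0, 20.0, 30.0, 40.0]
box_ltrb = xywh_to_ltrb(box_xywh)  # [10.0, 20.0, 40.0, 60.0]
```

Setting ltrb=True makes the reader emit the second form directly, which avoids a per-box conversion step for consumers that expect corner coordinates.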
masks (bool, optional, default = False) –
Enable polygon masks.
Warning
Use polygon_masks instead. Note that the polygon format has changed from mask_id, start_coord, end_coord to mask_id, start_vertex, end_vertex, where start_coord and end_coord are total numbers of coordinates, effectively start_coord = 2 * start_vertex and end_coord = 2 * end_vertex. Example: A polygon with vertices [[x0, y0], [x1, y1], [x2, y2]] would be represented as [mask_id, 0, 6] when using the deprecated argument masks, but [mask_id, 0, 3] when using the new argument polygon_masks.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
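One way to picture sharding is as a split of the sample index range into num_shards contiguous slices, with shard_id selecting one of them. The sketch below is plain Python with a hypothetical helper name; the floor-based boundary formula is an assumption made for illustration, since this reference only states that the data is partitioned.

```python
def shard_range(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample index range for one shard.

    Assumption: shards are contiguous and boundaries are rounded down.
    """
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across three shards.
ranges = [shard_range(10, shard_id, 3) for shard_id in range(3)]
# ranges == [(0, 3), (3, 6), (6, 10)]
```

Note that the shards are not all the same size when the dataset size is not divisible by num_shards, which is the situation that pad_last_batch is meant to address.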
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
pixelwise_masks (bool, optional, default = False) – If true, segmentation masks are read and returned as pixel-wise masks. This argument is mutually exclusive with polygon_masks.
polygon_masks (bool, optional, default = False) – If set to True, segmentation mask polygons are read in the form of two outputs: polygons and vertices. This argument is mutually exclusive with pixelwise_masks.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preprocessed_annotations (str, optional, default = ‘’) – Path to the directory with meta files that contain preprocessed COCO annotations.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
ratio (bool, optional, default = False) – If set to True, the returned bbox and mask polygon coordinates are relative to the image dimensions.
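The buffered shuffling described for random_shuffle and initial_fill can be sketched in a few lines of plain Python. This is a generic shuffle-buffer illustration under stated assumptions, not DALI's internal implementation; the function name buffered_shuffle and its seed parameter are hypothetical.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Yield samples in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        # Data is read sequentially into the buffer.
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            # Pick a random buffered sample and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left once the input is exhausted.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


order = list(buffered_shuffle(range(10), initial_fill=4))
```

The sketch makes one property of this scheme visible: a sample can only be emitted once it has been read into the buffer, so a small buffer keeps the output close to the sequential order, and a buffer of size 1 does not shuffle at all.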
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
save_preprocessed_annotations (bool, optional, default = False) – If set to True, the operator saves a set of files containing binary representations of the preprocessed COCO annotations.
save_preprocessed_annotations_dir (str, optional, default = ‘’) – Path to the directory in which to save the preprocessed COCO annotations files.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch.
size_threshold (float, optional, default = 0.1) – If the width or the height, in number of pixels, of a bounding box that represents an instance of an object is lower than this value, the object will be ignored.
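The effect of size_threshold can be expressed as a simple filter over the boxes of a sample. The sketch below is plain Python with a hypothetical helper name and made-up box values; it assumes boxes in [x, y, width, height] form with sizes in pixels.

```python
def keep_box(box_xywh, size_threshold=0.1):
    """Return True if neither side of the box is below the threshold."""
    _, _, width, height = box_xywh
    return width >= size_threshold and height >= size_threshold


boxes = [
    [5.0, 5.0, 20.0, 10.0],   # kept: both sides are large enough
    [3.0, 4.0, 0.05, 12.0],   # ignored: width is below 0.1 pixels
]
kept = [box for box in boxes if keep_box(box)]
```

A box is dropped as soon as either dimension falls below the threshold, so degenerate annotations with a near-zero width or height do not reach the outputs.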
skip_cached_images (bool, optional, default = False) –
If set to True, loading of the data is skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
skip_empty (bool, optional, default = False) – If set to True, the reader skips samples with no object instances in them.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
dump_meta_files (bool) –
Warning
The argument dump_meta_files is a deprecated alias for save_preprocessed_annotations. Use save_preprocessed_annotations instead.
dump_meta_files_path (str) –
Warning
The argument dump_meta_files_path is a deprecated alias for save_preprocessed_annotations_dir. Use save_preprocessed_annotations_dir instead.
meta_files_path (str) –
Warning
The argument meta_files_path is a deprecated alias for preprocessed_annotations. Use preprocessed_annotations instead.
save_img_ids (bool) –
Warning
The argument save_img_ids is a deprecated alias for image_ids. Use image_ids instead.