nvidia.dali.fn.readers.coco(*inputs, **kwargs)

Reads data from a COCO dataset that is composed of a directory with images and annotation JSON files.

This reader produces the following outputs:

images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)), (image_ids)
  • images Each sample contains image data with layout HWC (height, width, channels).

  • bounding_boxes Each sample can have an arbitrary number M of bounding boxes, each described by 4 coordinates:

    [[x_0, y_0, w_0, h_0],
     [x_1, y_1, w_1, h_1],
     ...
     [x_M, y_M, w_M, h_M]]

    or in [l, t, r, b] format if requested (see ltrb argument).

  • labels Each bounding box is associated with an integer label representing a category identifier:

    [label_0, label_1, ..., label_M]
  • polygons and vertices (Optional, present if polygon_masks is set to True) If polygon_masks is enabled, two extra outputs describe the masks as sets of polygons. Each mask contains an arbitrary number of polygons P, each associated with a mask index in the range [0, M) and composed of a group of V vertices. The polygons output describes the polygons as follows:

    [[mask_idx_0, start_vertex_idx_0, end_vertex_idx_0],
     [mask_idx_1, start_vertex_idx_1, end_vertex_idx_1],
     ...
     [mask_idx_P, start_vertex_idx_P, end_vertex_idx_P]]

    where mask_idx is the index of the mask that the polygon belongs to, in the range [0, M), and start_vertex_idx and end_vertex_idx define the range of indices of the vertices, as they appear in the vertices output, that belong to this polygon. Each sample in vertices contains the list of vertices that compose the different polygons in the sample, as 2D coordinates:

    [[x_0, y_0],
     [x_1, y_1],
     ...
     [x_V, y_V]]
  • pixelwise_masks (Optional, present if argument pixelwise_masks is set to True) Contains image-like data, same shape and layout as images, representing a pixelwise segmentation mask.

  • image_ids (Optional, present if argument image_ids is set to True) One element per sample, representing an image identifier.
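
The relationship between the polygons and vertices outputs can be sketched in plain Python. The sample values below are made up rather than read from a dataset, lists stand in for the tensors the reader returns, and the helper name group_polygons_by_mask is hypothetical. The sketch assumes the end index is exclusive, which matches the [mask_id, 0, 3] example given for a three-vertex polygon in the description of the masks argument.

```python
def group_polygons_by_mask(polygons, vertices):
    """Collect the vertex list of every polygon, keyed by mask index.

    Assumes each row of `polygons` is
    [mask_idx, start_vertex_idx, end_vertex_idx] with an exclusive end index.
    """
    grouped = {}
    for mask_idx, start, end in polygons:
        grouped.setdefault(mask_idx, []).append(vertices[start:end])
    return grouped


# Made-up sample: mask 0 is described by two polygons, mask 1 by one.
polygons = [[0, 0, 3], [0, 3, 7], [1, 7, 10]]
vertices = [
    [0.0, 0.0], [4.0, 0.0], [4.0, 3.0],              # polygon 0 (triangle)
    [5.0, 5.0], [9.0, 5.0], [9.0, 9.0], [5.0, 9.0],  # polygon 1 (quadrilateral)
    [1.0, 6.0], [3.0, 6.0], [2.0, 8.0],              # polygon 2 (triangle)
]

masks = group_polygons_by_mask(polygons, vertices)
```

Here masks[0] holds two vertex lists and masks[1] holds one.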

Supported backends
  • ‘cpu’

Keyword Arguments
  • annotations_file (str, optional, default = ‘’) – Path to the JSON annotations file.

  • avoid_class_remapping (bool, optional, default = False) –

    If set to True, class ID values are returned directly as they are defined in the manifest file.

    Otherwise, class ID values are mapped to consecutive values in the range from 1 to the number of classes, disregarding the exact values from the manifest (0 is reserved for a special background class).
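
As a rough illustration of the default remapping, the sketch below maps arbitrary category IDs to consecutive values starting at 1. It assumes the new IDs follow the ascending order of the original IDs, which the description above does not state; the helper name build_class_mapping and the sample IDs are hypothetical.

```python
def build_class_mapping(category_ids):
    """Map arbitrary category IDs to 1..N, leaving 0 free for background.

    Assumption: new IDs are assigned in ascending order of the original IDs.
    """
    unique_sorted = sorted(set(category_ids))
    return {original: new for new, original in enumerate(unique_sorted, start=1)}


# Made-up, non-contiguous category IDs such as an annotations file might define.
mapping = build_class_mapping([1, 2, 7, 13, 90, 7])

# Labels of one sample, before and after remapping.
original_labels = [90, 1, 13]
remapped_labels = [mapping[c] for c in original_labels]
```

With avoid_class_remapping set to True, the labels would instead stay as in original_labels.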

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but on most network file systems it does not provide optimum performance.

  • file_root (str, optional) –

    Path to a directory that contains the data files.

    If a file list is not provided, this argument is required.

  • image_ids (bool, optional, default = False) – If set to True, the image IDs will be produced in an extra output.

  • images (str or list of str, optional) –

    A list of image paths.

    If provided, it specifies the images that will be read. The images will be read in the same order as they appear in the list, and in case of duplicates, multiple copies of the relevant samples will be produced.

    If left unspecified or set to None, all images listed in the annotation file are read exactly once, ordered by their image id.

    The paths to be kept should match exactly those in the annotations file.

    Note: This argument is mutually exclusive with preprocessed_annotations.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • ltrb (bool, optional, default = False) –

    If set to True, bboxes are returned as [left, top, right, bottom].

    If set to False, the bboxes are returned as [x, y, width, height].
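
The two box layouts are related by simple arithmetic. The following sketch converts between them, assuming (x, y) is the top-left corner of the box as in the COCO annotation format; the function names are illustrative and not part of DALI.

```python
def xywh_to_ltrb(box):
    """Convert [x, y, width, height] to [left, top, right, bottom]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def ltrb_to_xywh(box):
    """Convert [left, top, right, bottom] to [x, y, width, height]."""
    left, top, right, bottom = box
    return [left, top, right - left, bottom - top]


box_xywh = [10, 20, 30, 40]
box_ltrb = xywh_to_ltrb(box_xywh)
round_trip = ltrb_to_xywh(box_ltrb)
```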

  • masks (bool, optional, default = False) –

    Enable polygon masks.


    This argument is deprecated. Use polygon_masks instead. Note that the polygon format has changed from mask_id, start_coord, end_coord to mask_id, start_vertex, end_vertex, where start_coord and end_coord count individual coordinates, so effectively start_coord = 2 * start_vertex and end_coord = 2 * end_vertex.

    Example: A polygon with vertices [[x0, y0], [x1, y1], [x2, y2]] would be represented as [mask_id, 0, 6] when using the deprecated argument masks, but as [mask_id, 0, 3] when using the new argument polygon_masks.
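
For code that still holds polygon descriptors in the older coordinate-based layout, the relation above can be applied as in this sketch. The helper name coords_to_vertex_ranges is hypothetical and the input rows are made up.

```python
def coords_to_vertex_ranges(polygons_old):
    """Convert [mask_id, start_coord, end_coord] rows to
    [mask_id, start_vertex, end_vertex] rows.

    Each 2D vertex occupies two coordinates, so offsets are halved.
    """
    converted = []
    for mask_id, start_coord, end_coord in polygons_old:
        if start_coord % 2 or end_coord % 2:
            raise ValueError("coordinate offsets must be even")
        converted.append([mask_id, start_coord // 2, end_coord // 2])
    return converted


# A triangle followed by a quadrilateral, both belonging to mask 0.
old_rows = [[0, 0, 6], [0, 6, 14]]
new_rows = coords_to_vertex_ranges(old_rows)
```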

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.
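
To make the idea of sharding concrete, the sketch below splits a dataset into contiguous index ranges, one per shard. This is one common partitioning scheme shown for illustration only; the description above does not specify the exact boundaries the reader uses, and shard_range is a hypothetical helper.

```python
def shard_range(dataset_size, num_shards, shard_id):
    """Return the half-open [start, end) sample index range of one shard.

    Assumption: shards are contiguous and as evenly sized as possible.
    """
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across three shards, e.g. one shard per GPU.
ranges = [shard_range(10, 3, shard_id) for shard_id in range(3)]
```

Under this scheme every sample belongs to exactly one shard, and shard sizes differ by at most one sample.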

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.


    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • pixelwise_masks (bool, optional, default = False) – If set to True, segmentation masks are read and returned as pixel-wise masks. This argument is mutually exclusive with polygon_masks.

  • polygon_masks (bool, optional, default = False) –

    If set to True, segmentation mask polygons are read in the form of two outputs: polygons and vertices. This argument is mutually exclusive with pixelwise_masks.


    Currently, objects whose annotations have iscrowd=1 are skipped.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preprocessed_annotations (str, optional, default = ‘’) – Path to the directory with meta files that contain preprocessed COCO annotations.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
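
The buffer-based shuffling described above can be approximated in a few lines of Python. This is a simplified sketch of the general technique, not the reader's actual implementation; buffered_shuffle is a hypothetical name, and buffer_size plays the role of initial_fill.

```python
import random


def buffered_shuffle(samples, buffer_size, seed):
    """Yield samples in a randomized order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Fill the buffer by reading sequentially from the source.
    for item in source:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered sample and refill its slot with the next one read.
    for item in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # The source is exhausted: emit what remains in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(20), buffer_size=4, seed=0))
```

A small buffer only mixes samples that are close together in the sequential order, which is why a larger initial_fill gives a more thorough shuffle at the cost of memory.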

  • ratio (bool, optional, default = False) – If set to True, the returned bbox and mask polygon coordinates are relative to the image dimensions.
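
As an illustration of relative coordinates, the sketch below divides horizontal values by the image width and vertical values by the image height. The helper name to_relative and the sample box are hypothetical.

```python
def to_relative(box_xywh, image_width, image_height):
    """Scale an [x, y, width, height] box in pixels to the [0, 1] range."""
    x, y, w, h = box_xywh
    return [x / image_width, y / image_height, w / image_width, h / image_height]


# A made-up box on a 640x480 image.
relative_box = to_relative([64, 48, 128, 96], image_width=640, image_height=480)
```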

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • save_preprocessed_annotations (bool, optional, default = False) – If set to True, the operator saves a set of files containing binary representations of the preprocessed COCO annotations.

  • save_preprocessed_annotations_dir (str, optional, default = ‘’) – Path to the directory in which to save the preprocessed COCO annotations files.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch.

  • size_threshold (float, optional, default = 0.1) – If the width or the height, in number of pixels, of a bounding box that represents an instance of an object is lower than this value, the object will be ignored.
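
The filtering rule can be sketched as follows, using boxes in [x, y, width, height] format with sizes in pixels. The helper name filter_small_boxes and the sample data are hypothetical; the sketch only mirrors the rule as worded above.

```python
def filter_small_boxes(boxes, labels, size_threshold=0.1):
    """Drop objects whose box width or height is below size_threshold."""
    kept_boxes, kept_labels = [], []
    for (x, y, w, h), label in zip(boxes, labels):
        if w < size_threshold or h < size_threshold:
            continue  # the object is ignored
        kept_boxes.append([x, y, w, h])
        kept_labels.append(label)
    return kept_boxes, kept_labels


boxes = [[0, 0, 50, 40], [10, 10, 0.05, 20], [5, 5, 8, 0.0]]
labels = [3, 1, 2]
kept_boxes, kept_labels = filter_small_boxes(boxes, labels)
```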

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data is skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • skip_empty (bool, optional, default = False) – If set to True, the reader skips samples that contain no object instances.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect the accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

  • dump_meta_files (bool) –


    The argument dump_meta_files is a deprecated alias for save_preprocessed_annotations. Use save_preprocessed_annotations instead.

  • dump_meta_files_path (str) –


    The argument dump_meta_files_path is a deprecated alias for save_preprocessed_annotations_dir. Use save_preprocessed_annotations_dir instead.

  • meta_files_path (str) –


    The argument meta_files_path is a deprecated alias for preprocessed_annotations. Use preprocessed_annotations instead.

  • save_img_ids (bool) –


    The argument save_img_ids is a deprecated alias for image_ids. Use image_ids instead.