Sharding¶
Sharding allows DALI to partition the dataset into nonoverlapping pieces on which each DALI pipeline
instance can work. This functionality removes the need for a global, shared state that distributes
training samples among the ranks. By default, after each epoch the DALI pipeline advances to the
next shard to increase the entropy of the data that is seen by this pipeline. You can alter this
behavior by setting the stick_to_shard reader parameter.
This mode of operation, however, leads to problems when the dataset size is not divisible by the
number of pipelines used, or when the shard size is not divisible by the batch size. To address
this issue and adjust the behavior, you can use the pad_last_batch reader parameter.
This parameter asks the reader to duplicate the last sample in the last batch of a shard, which prevents DALI from filling the last batch with data from the next shard when the shard size is not divisible by the batch size. It also ensures that all pipelines return the same number of batches when one shard's size is divisible by the batch size while another shard is bigger by one sample. In effect, every shard is padded to the same size, which is a multiple of the batch size.
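For illustration, here is a minimal sketch of a sharded pipeline. The dataset path and the batch size are hypothetical; the reader parameters shown (shard_id, num_shards, pad_last_batch, name) are the ones discussed above:

    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn

    DATA_DIR = "/data/images"  # hypothetical dataset location

    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def sharded_pipeline(shard_id, num_shards):
        # pad_last_batch=True duplicates the last sample of the shard so that
        # no data leaks in from the next shard; name="Reader" lets a framework
        # iterator locate this reader later via reader_name.
        jpegs, labels = fn.readers.file(
            file_root=DATA_DIR,
            shard_id=shard_id,
            num_shards=num_shards,
            pad_last_batch=True,
            name="Reader",
        )
        images = fn.decoders.image(jpegs, device="mixed")
        return images, labels

    # One pipeline instance per rank; this one handles shard 0 of 2.
    pipe = sharded_pipeline(shard_id=0, num_shards=2)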
Framework iterator configuration¶
DALI is used in deep learning frameworks through dedicated iterators, and these iterators need to be aware of this padding and other reader properties.
Here are the iterator options:
reader_name - Provides the name of the reader that drives the iterator and supplies the necessary parameters (see the iterator sketch after this list).

Note: We recommend that you use this option, so that the next two options (size and last_batch_padded) are obtained automatically from the pipeline configuration. If reader_name is used, size and last_batch_padded should not be provided explicitly to the iterator. This option is also more flexible and accurate, because it takes into account that the shard size for a pipeline can differ between epochs when the shards are rotated.

size - Provides the size of the shard for an iterator or, if there is more than one shard, the sum of all shard sizes for all wrapped pipelines.

last_batch_padded - Determines whether the tail of the data consists of data from the next shard (False) or is duplicated dummy data (True). It is applicable when the shard size is not a multiple of the batch size.

last_batch_policy - Determines the handling of the last batch when the shard size is not divisible by the batch size. It affects only batches that are partially filled with data. See the LastBatchPolicy() enum for possible values.

fill_last_batch - (Deprecated in favour of last_batch_policy) Determines whether the last batch should be full, regardless of whether the shard size is divisible by the batch size.
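As a sketch of the recommended setup, the PyTorch iterator below reuses the hypothetical sharded_pipeline from the earlier example. With reader_name set, size and last_batch_padded are derived from the pipeline and must not be passed explicitly:

    from nvidia.dali.plugin.pytorch import DALIGenericIterator
    from nvidia.dali.plugin.base_iterator import LastBatchPolicy

    pipe = sharded_pipeline(shard_id=0, num_shards=2)  # from the sketch above
    pipe.build()

    # reader_name refers to the reader declared with name="Reader";
    # the iterator queries it for the shard size and padding settings.
    train_iter = DALIGenericIterator(
        [pipe],
        ["images", "labels"],  # output_map: names assigned to pipeline outputs
        reader_name="Reader",
        last_batch_policy=LastBatchPolicy.PARTIAL,
    )

    for data in train_iter:
        images, labels = data[0]["images"], data[0]["labels"]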
Enums¶
class nvidia.dali.plugin.base_iterator.LastBatchPolicy(value)¶

Describes the last batch policy behavior when there are not enough samples in the epoch to fill a whole batch.

FILL - The last batch is filled by either repeating the last sample or by wrapping up the data set. The precise behavior depends on the reader's pad_last_batch argument.

DROP - The last batch is dropped if it cannot be fully filled with data from the current epoch.

PARTIAL - The last batch is partially filled with the remaining data from the current epoch, keeping the rest of the samples empty.
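To make the policies concrete, here is a small sketch that counts batches for a hypothetical shard of 10 samples with a batch size of 4; the numbers follow directly from the definitions above:

    from nvidia.dali.plugin.base_iterator import LastBatchPolicy

    def num_batches(shard_size, batch_size, policy):
        # DROP discards an incomplete tail; FILL and PARTIAL keep it
        # (padded up to a full batch for FILL, truncated for PARTIAL).
        if policy == LastBatchPolicy.DROP:
            return shard_size // batch_size
        return -(-shard_size // batch_size)  # ceiling division

    # FILL -> 3 (last batch padded), DROP -> 2, PARTIAL -> 3 (last batch has 2 samples)
    for policy in (LastBatchPolicy.FILL, LastBatchPolicy.DROP, LastBatchPolicy.PARTIAL):
        print(policy, num_batches(10, 4, policy))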
Shard calculation¶
Here is the formula to calculate the shard size for a shard ID:
floor((id + 1) * dataset_size / num_shards) - floor(id * dataset_size / num_shards)
When the pipeline advances through the epochs and the reader moves to the next shard, the formula needs to be extended to reflect this change:
floor(((id + epoch_num) % num_shards + 1) * dataset_size / num_shards) - floor(((id + epoch_num) % num_shards) * dataset_size / num_shards)
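Both formulas transcribe directly into Python; this is a sketch with hypothetical numbers, relying on integer floor division matching the floor above for nonnegative values:

    def shard_size(shard_id, num_shards, dataset_size):
        # floor((id + 1) * dataset_size / num_shards)
        #     - floor(id * dataset_size / num_shards)
        return ((shard_id + 1) * dataset_size // num_shards
                - shard_id * dataset_size // num_shards)

    def shard_size_at_epoch(shard_id, num_shards, dataset_size, epoch_num):
        # The reader rotates shards every epoch, so shift the shard id first.
        rotated_id = (shard_id + epoch_num) % num_shards
        return shard_size(rotated_id, num_shards, dataset_size)

    # Hypothetical dataset of 10 samples split over 3 shards: sizes 3, 3, 4.
    sizes = [shard_size(i, 3, 10) for i in range(3)]  # [3, 3, 4]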
Because the shard size for a pipeline can differ between epochs (the second formula), providing a
size value once at the beginning of the training works only when the stick_to_shard reader option
is enabled, which prevents DALI from rotating shards. In that case, the first formula applies.
To avoid these complications, use the reader_name parameter and let the iterator handle the
configuration automatically.