TorchData Integration Reference#

DALI Dynamic provides integration with torchdata.nodes to build composable data loading pipelines. The following node classes can be composed with standard torchdata.nodes building blocks such as Prefetcher and Loader.

Reader#

class nvidia.dali.experimental.dynamic.pytorch.nodes.Reader(reader_type, *, batch_size, output_names=None, **kwargs)#

Wraps a reader as a node, yielding dictionaries.

Parameters:
  • reader_type (reader subclass) – The type of the reader to construct.

  • batch_size (int, optional) – The batch size to pass to next_epoch(). If None, the iterator yields individual tensors rather than batches.

  • output_names (iterable of str, optional) – Names of the outputs, used as keys in the output dict. If the reader has exactly two outputs, this argument can be omitted and defaults to ("data", "label").

  • **kwargs – Additional keyword arguments to pass to the reader constructor.
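The default-naming rule for output_names can be illustrated with a small pure-Python sketch; the helper below is hypothetical and only mirrors the behavior described above, it is not part of the DALI API:

```python
def outputs_to_dict(outputs, output_names=None):
    """Map a reader's positional outputs to a dict, mirroring the
    default-naming rule: ("data", "label") for exactly two outputs."""
    if output_names is None:
        if len(outputs) != 2:
            raise ValueError("output_names is required unless the reader "
                             "has exactly two outputs")
        output_names = ("data", "label")
    if len(output_names) != len(outputs):
        raise ValueError("output_names must match the number of outputs")
    return dict(zip(output_names, outputs))
```

For example, `outputs_to_dict(("jpegs", "ids"))` yields `{"data": "jpegs", "label": "ids"}`, while a three-output reader would require explicit names.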

get_metadata()#

Returns the metadata of the underlying reader operator.

get_state()#

Subclasses must implement this method instead of state_dict(). It should only be called by BaseNode.

Returns:

Dict[str, Any] - a state dict that may be passed to reset() at some point in the future

next()#

Subclasses must implement this method instead of __next__. It should only be called by BaseNode.

Returns:

T - the next value in the sequence; raises StopIteration when exhausted

reset(initial_state=None)#

Resets the iterator to the beginning, or to the state passed in through initial_state.

reset() is a good place for expensive initialization, since it is called lazily on the first call to next() or state_dict(). Subclasses must call super().reset(initial_state).

Parameters:

initial_state (Optional[dict]) – a state dict to restore from. If None, resets to the beginning.
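The get_state()/reset()/next() protocol shared by all three node classes can be demonstrated with a minimal toy node. This is a pure-Python sketch of the torchdata BaseNode contract used for checkpointing, not the actual Reader implementation:

```python
class CountingNode:
    """Toy node that yields 0..limit-1 and supports checkpointing."""

    def __init__(self, limit):
        self.limit = limit
        self.reset()

    def reset(self, initial_state=None):
        # Restore from a state dict, or start from the beginning.
        self._i = initial_state["i"] if initial_state is not None else 0

    def get_state(self):
        # A state dict that may later be passed back to reset().
        return {"i": self._i}

    def next(self):
        if self._i >= self.limit:
            raise StopIteration
        value = self._i
        self._i += 1
        return value


node = CountingNode(5)
node.next(), node.next()      # consume two values: 0, 1
state = node.get_state()      # checkpoint: {"i": 2}
node.next()                   # advance past the checkpoint
node.reset(state)             # rewind to the checkpoint
resumed = node.next()         # → 2
```

In a real pipeline the state dict would be obtained through the public state_dict() API and restored on a fresh node, but the rewind semantics are the same.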

DictMapper#

class nvidia.dali.experimental.dynamic.pytorch.nodes.DictMapper(source, map_fn, key='data')#

Applies a transform to a single key in the dict yielded by a source node.

Parameters:
  • source (torchdata.nodes.BaseNode) – The source node to pull from. Yields dictionaries of tensors or batches.

  • map_fn (callable) – The function to apply to the specified key. Must return a tensor or batch.

  • key (str, optional) – The key to apply the function to. Defaults to "data".
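Conceptually, DictMapper replaces only the selected key and leaves all other keys untouched, roughly like the following pure-Python sketch (an illustration of the behavior, not the actual implementation):

```python
def map_one_key(sample, map_fn, key="data"):
    # Copy the dict and transform only the requested key.
    out = dict(sample)
    out[key] = map_fn(out[key])
    return out

sample = {"data": [1, 2, 3], "label": 7}
map_one_key(sample, lambda xs: [x * 2 for x in xs])
# → {"data": [2, 4, 6], "label": 7}
```

In a real pipeline, map_fn receives a DALI tensor or batch rather than a plain list.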

get_state()#

Subclasses must implement this method instead of state_dict(). It should only be called by BaseNode.

Returns:

Dict[str, Any] - a state dict that may be passed to reset() at some point in the future

next()#

Subclasses must implement this method instead of __next__. It should only be called by BaseNode.

Returns:

T - the next value in the sequence; raises StopIteration when exhausted

reset(initial_state=None)#

Resets the iterator to the beginning, or to the state passed in through initial_state.

reset() is a good place for expensive initialization, since it is called lazily on the first call to next() or state_dict(). Subclasses must call super().reset(initial_state).

Parameters:

initial_state (Optional[dict]) – a state dict to restore from. If None, resets to the beginning.

ToTorch#

class nvidia.dali.experimental.dynamic.pytorch.nodes.ToTorch(source, output_stream=None)#

Converts dictionaries of tensors or batches to tuples of torch.Tensor.

Parameters:
  • source (torchdata.nodes.BaseNode) – The source node to pull data from. Yields dictionaries of tensors or batches.

  • output_stream (a compatible stream object, optional) – The CUDA stream on which the output tensors will be used. If provided, ensure that work on this stream will wait for any pending GPU operations before the tensors are consumed. Defaults to the current CUDA stream at the time of construction.
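The dictionary-to-tuple conversion can be pictured as below. The ordering rule shown here — tuple elements follow the dict's key order, as established by the Reader's output_names — is a reasonable assumption for illustration, not a documented guarantee:

```python
def dict_to_tuple(batch):
    # Tuple elements follow the dict's insertion order,
    # e.g. ("data", "label") for the default Reader output names.
    return tuple(batch.values())

dict_to_tuple({"data": "images", "label": "ids"})
# → ("images", "ids")
```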

get_state()#

Subclasses must implement this method instead of state_dict(). It should only be called by BaseNode.

Returns:

Dict[str, Any] - a state dict that may be passed to reset() at some point in the future

next()#

Subclasses must implement this method instead of __next__. It should only be called by BaseNode.

Returns:

T - the next value in the sequence; raises StopIteration when exhausted

reset(initial_state=None)#

Resets the iterator to the beginning, or to the state passed in through initial_state.

reset() is a good place for expensive initialization, since it is called lazily on the first call to next() or state_dict(). Subclasses must call super().reset(initial_state).

Parameters:

initial_state (Optional[dict]) – a state dict to restore from. If None, resets to the beginning.

Usage Pattern#

A typical pipeline composes these nodes with torchdata.nodes utilities:

import nvidia.dali.experimental.dynamic as ndd
import torchdata.nodes as tn

batch_size = 32                  # placeholder values for illustration
data_dir = "/path/to/dataset"    # directory tree with one subdirectory per class

def my_processing_function(images):
    ...                          # e.g. decode and augment the image batch

reader_node = ndd.pytorch.nodes.Reader(
    ndd.readers.File,
    batch_size=batch_size,
    file_root=data_dir,
    random_shuffle=True,
)
mapper_node = ndd.pytorch.nodes.DictMapper(
    source=reader_node,
    map_fn=my_processing_function,
)
torch_node = ndd.pytorch.nodes.ToTorch(mapper_node)
prefetch_node = tn.Prefetcher(torch_node, prefetch_factor=2)
loader = tn.Loader(prefetch_node)

for images, labels in loader:
    # images, labels are torch.Tensors on GPU
    ...

The above snippet defines the following simple linear graph (DALI Dynamic nodes followed by a torchdata Prefetcher):

Reader → DictMapper → ToTorch → Prefetcher
In practice, torchdata.nodes allows composing nodes into more complex graphs. For instance, if we wanted to also apply a transformation to the labels, we could add a second DictMapper node that takes the output of the Reader. torchdata.nodes.Mapper() can then be used to combine the outputs:
Reader → DictMapper ─┐
                     ├→ Mapper → ToTorch → Prefetcher
Reader → DictMapper ─┘
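As a concrete (hypothetical) example of such a label transform, the second DictMapper could be given a map_fn along these lines; a real pipeline would operate on DALI batches or tensors rather than plain lists, and the class mapping below is invented for illustration:

```python
# Hypothetical mapping from fine-grained class ids to coarse categories.
COARSE = {0: "animal", 1: "animal", 2: "vehicle"}

def remap_labels(labels):
    # Collapse fine-grained class ids into coarse categories.
    return [COARSE[label] for label in labels]

remap_labels([0, 2, 1])
# → ["animal", "vehicle", "animal"]
```

This function would be passed as map_fn to a DictMapper constructed with key="label", leaving the "data" entry of each sample untouched.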