This API is experimental and subject to change without notice!
The functional API is designed to simplify the usage of DALI operators in a pseudo-imperative way.
It exposes operators as functions with the same name as the operator class, but converted
to snake_case - for example, ops.FileReader is exposed as fn.file_reader:
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
with pipe:
    files, labels = dali.fn.file_reader(file_root="./my_file_root")
    images = dali.fn.image_decoder(files, device="mixed")
    images = dali.fn.rotate(images, angle=dali.fn.uniform(range=(-45, 45)))
    images = dali.fn.resize(images, resize_x=300, resize_y=300)
    pipe.set_outputs(images, labels)
pipe.build()
outputs = pipe.run()
The use of the functional API does not change other aspects of pipeline definition - the functions
still operate on and return DataNode objects.
Interoperability with operator objects
The functional API is, for the most part, only a wrapper around operator objects - as such, it is inherently compatible with the object-based API. The following example mixes the two, using the object API to pre-configure a file reader and a resize operator:
pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)

reader = dali.ops.FileReader(file_root=".")
resize = dali.ops.Resize(device="gpu", resize_x=300, resize_y=300)

with pipe:
    files, labels = reader()
    images = dali.fn.image_decoder(files, device="mixed")
    images = dali.fn.rotate(images, angle=dali.fn.uniform(range=(-45, 45)))
    images = resize(images)
    pipe.set_outputs(images, labels)
pipe.build()
outputs = pipe.run()
external_source(source=None, num_outputs=None, *, cycle=None, name=None, device='cpu', layout=None, cuda_stream=None, use_copy_kernel=None, **kwargs)
Creates a data node which is populated with data from a Python source. The data can be provided by the
source function or iterable, or it can be provided by
pipeline.feed_input(name, data, layout, cuda_stream) inside iter_setup.
In the case of GPU input, it is the user's responsibility to modify the provided GPU memory content only using the provided stream (DALI schedules a copy on it and all work is properly queued). If no stream is provided, feeding the input blocks until the provided memory is copied to the internal buffer.
To return a batch of copies of the same tensor, use
nvidia.dali.types.Constant(), which is more performant.
source (callable or iterable) – The source of the data. The source is polled for data (via a call to
next(source)) whenever the pipeline needs input for the next iteration. The source can supply one or more data batches, depending on the value of
num_outputs. If num_outputs is not set, the
source is expected to return a single batch. If it is specified, the data is expected to be a tuple or list where each element corresponds to the respective return value of the external_source. If the source is a callable that has a positional argument, the argument is assumed to be the current iteration number and consecutive calls will be
source(0), source(1), etc. If the source is a generator function, it is invoked and treated as an iterable - however, unlike a generator, it can be used with
cycle, in which case the function will be called again when the generator reaches the end of iteration. In the case of GPU input, it is the user's responsibility to modify the provided GPU memory content only using the provided stream (DALI schedules a copy on it and all work is properly queued). If no stream is provided, DALI will use a default one, with a best-effort approach at correctness (see the
cuda_stream argument documentation for details).
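The polling behavior described above can be sketched in plain Python (this is an illustration of the documented semantics, not DALI internals; `poll_source` is a made-up helper name):

```python
import inspect

def poll_source(source, n_iterations):
    """Mimic how a pipeline polls an external source (illustrative only).

    A callable with a positional argument is called with the current
    iteration number: source(0), source(1), ...; an iterable is advanced
    with next() once per iteration.
    """
    batches = []
    if callable(source):
        takes_index = len(inspect.signature(source).parameters) > 0
        for i in range(n_iterations):
            batches.append(source(i) if takes_index else source())
    else:
        it = iter(source)
        for _ in range(n_iterations):
            # StopIteration propagates when the source is exhausted
            batches.append(next(it))
    return batches

# A callable source receiving the iteration number:
poll_source(lambda i: [i] * 3, 2)  # -> [[0, 0, 0], [1, 1, 1]]
```

In the real operator, each returned batch is then copied (or passed, with no_copy) into the pipeline's input buffers.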
num_outputs (int, optional) – If specified, denotes the number of TensorLists produced by the source function.
- Keyword Arguments
cycle (bool) – If
True, the source will be wrapped around when the end of data is reached; otherwise, StopIteration will be raised. This flag requires that
source is either a collection, i.e. an iterable object where
iter(source) returns a fresh iterator on each call, or a generator function. In the latter case, the generator function will be called again when more data is requested than was yielded by the function.
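The cycling behavior can be illustrated with a small stand-alone sketch (pure Python, not DALI code; `cycling_iterator` is a hypothetical name):

```python
def cycling_iterator(source, cycle):
    """Yield batches from ``source``, restarting it when ``cycle`` is True.

    ``source`` must be a collection (so iter(source) gives a fresh
    iterator each time) or a generator function (called again for a
    fresh generator). With cycle=False, iteration simply ends when the
    data runs out, which the consumer observes as StopIteration.
    """
    while True:
        # Generator function -> call it; collection -> iter() it.
        it = source() if callable(source) else iter(source)
        yield from it
        if not cycle:
            return

data = [[1, 2], [3, 4]]
it = cycling_iterator(data, cycle=True)
first_four = [next(it) for _ in range(4)]  # wraps around after two batches
```

Note that a plain generator object would not satisfy the requirement: once exhausted, it cannot be restarted, which is why a collection or a generator function is required.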
name (str, optional) – The name of the data node - used when feeding the data in
iter_setup; can be omitted if the data is provided by source.
layout (layout str or list/tuple thereof) – If provided, sets the layout of the data. When
num_outputs > 1, the layout can be a list containing a distinct layout for each output. If the list has fewer elements than
num_outputs, only the first outputs have the layout set and the rest have it cleared.
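The per-output layout rule can be sketched as a small helper (an illustration of the documented behavior, not DALI internals; `layouts_per_output` is a made-up name, and an empty string stands for "cleared"):

```python
def layouts_per_output(layout, num_outputs):
    """Expand the ``layout`` argument to one entry per output.

    A single string applies to every output; a list shorter than
    num_outputs sets layouts for the first outputs and clears ("")
    the remaining ones.
    """
    if layout is None:
        return [""] * num_outputs
    if isinstance(layout, str):
        return [layout] * num_outputs
    layout = list(layout)
    return layout + [""] * (num_outputs - len(layout))

layouts_per_output(["HWC"], 3)  # -> ["HWC", "", ""]
```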
cuda_stream (optional, cudaStream_t or an object convertible to cudaStream_t, e.g. cupy.cuda.Stream, torch.cuda.Stream) –
The CUDA stream that is going to be used for copying data to the GPU or from a GPU source. If not set, a best effort will be made to maintain correctness - i.e. if the data is provided as a tensor/array from a recognized library (CuPy, PyTorch), the library's current stream is used. This should work in typical scenarios, but advanced use cases (and code using unsupported libraries) may still need to supply the stream handle explicitly.
- Special values:
0 - use default CUDA stream
-1 - use DALI’s internal stream
If the internal stream is used, the call to
feed_input will block until the copy to the internal buffer is complete, since there is no way to synchronize with this stream to prevent overwriting the array with new data in another stream.
use_copy_kernel (optional, bool) – If set to True, DALI will use a CUDA kernel to feed the data (only applicable when copying data to/from GPU memory) instead of cudaMemcpyAsync (default).
blocking (bool, optional) – Whether the external source should block until data is available or just fail when it is not.
no_copy (bool, optional) – Whether DALI should copy the buffer when feed_input is called.
If True, DALI passes the user memory directly to the pipeline, instead of copying it. It is the user's responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.
The buffer can be modified or freed again after the outputs of the relevant iterations have been consumed. Effectively, this happens after
cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.
The provided memory must match the specified device parameter of the operator. For CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For GPU, to avoid any copies, the provided buffer must be contiguous. If the user provides a list of separate Tensors, an additional internal copy will be made.