API Reference#

This page documents the public API of DALI Dynamic. For the list of all available operators, see Operation Reference.

Tensor and Batch objects#

Batch#

class nvidia.dali.experimental.dynamic.Batch(tensors=None, dtype=None, device=None, layout=None, invocation_result=None, copy=False)#

A Batch object.

This class represents a batch of tensors usable with the DALI dynamic API. The tensors in a batch share the same element type, layout and number of dimensions, but can differ in shape.

A Batch can contain:

  • a single buffer and shape, owned by DALI, representing consecutive tensors

  • a list of Tensor objects

  • a result of a lazy evaluation of a DALI operator

In the case of lazy evaluation, the operations are executed only when an attempt is made to access the tensor data or properties that cannot be obtained without running the underlying operation.
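The deferred-execution behavior can be illustrated with a plain-Python sketch (no DALI APIs are used; the `Lazy` class below is a hypothetical stand-in, not part of the library):

```python
# Semantics sketch: a lazily evaluated result records its operation
# and runs it only on first access, then caches the value.
class Lazy:
    def __init__(self, fn):
        self._fn = fn
        self._value = None
        self._done = False

    def evaluate(self):
        if not self._done:
            self._value = self._fn()
            self._done = True
        return self._value

log = []
r = Lazy(lambda: (log.append("ran"), 42)[1])
assert log == []              # nothing has executed yet
assert r.evaluate() == 42     # first access triggers execution
assert r.evaluate() == 42     # repeated access reuses the cached result
assert log == ["ran"]         # the operation ran exactly once
```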

property batch_size#

The number of tensors in the batch.

static broadcast(sample, batch_size, device=None, dtype=None)#

Creates a batch by repeating a single sample batch_size times.

This function returns a batch obtained by repeating sample batch_size times. Optionally, the result may be placed on the specified device (otherwise it inherits the device from the sample argument) or converted to the desired data type.

This function yields a result equivalent to as_batch([tensor(sample, dtype=dtype, device=device)] * batch_size) but is much more efficient.
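The repetition semantics can be sketched with NumPy standing in for DALI objects (this is an illustration of the equivalence above, not the DALI implementation):

```python
import numpy as np

# Semantics sketch: broadcast(sample, batch_size) yields a batch of
# batch_size samples, each equal to the input sample.
sample = np.arange(6, dtype=np.float32).reshape(2, 3)
batch_size = 4
batch = [sample.copy() for _ in range(batch_size)]
assert len(batch) == batch_size
assert all(t.shape == (2, 3) for t in batch)
assert all(np.array_equal(t, sample) for t in batch)
```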

cpu()#

Returns the batch on the CPU. If it’s already there, this function returns self.

property device#

The device on which the batch resides (or will reside, in case of lazy evaluation).

property dtype#

The element type of the tensors in the batch.

evaluate()#

Evaluates the underlying lazy expression, if any.

If the batch is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the batch already contains concrete data, this function has no effect.

The behavior of this function is affected by the current evaluation context and current device. See EvalContext and Device for details.

The function returns self.

gpu(index=None)#

Returns the batch on the GPU. If it’s already there, this function returns self.

If index is not specified, the current CUDA device is used.

property layout#

The layout of tensors in the batch.

The “batch dimension” (commonly denoted as N) is not included - a batch of HWC images will have HWC layout, not NHWC.

property ndim#

The number of dimensions of the samples in the batch.

The “batch dimension” is not included - e.g. a batch of HWC is still a 3D object.

select(sample_range)#

Selects a range of samples.

The result of this function is either a Batch (if sample_range is a slice or a list) or a Tensor if sample_range is a number.

property shape#

The shape of the batch.

Returns the list of shapes of individual samples.

Example:

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> t0 = ndd.tensor(np.zeros((480, 640, 3)))
>>> t1 = ndd.tensor(np.zeros((720, 1280, 1)))
>>> b = ndd.as_batch([t0, t1])
>>> print(b.shape)
[(480, 640, 3), (720, 1280, 1)]

property slice#

Interface for samplewise slicing.

Regular slicing selects samples first and then slices each sample with common slicing parameters.

The samplewise slicing interface allows the slicing parameters to be batches (with the same number of samples as the input); the slicing parameters are then applied to the respective samples.

start = Batch([1, 2, 3])
stop = Batch([4, 5, 6])
step = Batch([1, 1, 2])
sliced = input.slice[start, stop, step]
# the result is equivalent to
sliced = Batch([
    sample[start[i]:stop[i]:step[i]]
    for i, sample in enumerate(input)
])

If the slicing parameters are not batches, they are broadcast to all samples.
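The per-sample semantics from the snippet above can be sketched with plain NumPy arrays standing in for Batch objects:

```python
import numpy as np

# Semantics sketch: each sample is sliced with its own start/stop/step,
# mirroring input.slice[start, stop, step] from the example above.
input_ = [np.arange(10), np.arange(10, 20), np.arange(20, 30)]
start, stop, step = [1, 2, 3], [4, 5, 6], [1, 1, 2]
sliced = [s[start[i]:stop[i]:step[i]] for i, s in enumerate(input_)]
assert [x.tolist() for x in sliced] == [[1, 2, 3], [12, 13, 14], [23, 25]]
```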

property tensors#

Returns an indexable list of Tensor objects that comprise the batch.

to_device(device, force_copy=False)#

Returns the data batch on the specified device.

If the batch already resides on the specified device, the function returns self unless a copy is explicitly requested by passing force_copy=True.

Tensor#

class nvidia.dali.experimental.dynamic.Tensor(data=None, dtype=None, device=None, layout=None, batch=None, index_in_batch=None, invocation_result=None, copy=False)#

A Tensor object.

This class represents a single tensor usable with the DALI dynamic API. It can contain any of the following:

  • tensor data owned by DALI

  • external tensor data wrapped into a DALI tensor

  • a sample taken out of a Batch object

  • a result of a lazy evaluation of a DALI operator

In the case of lazy evaluation, the operations are executed only when an attempt is made to access the tensor data or properties that cannot be obtained without running the underlying operation.

cpu()#

Returns the tensor on the CPU. If it’s already there, this function returns self.

property device#

The device on which the tensor resides (or will reside, in case of lazy evaluation).

property dtype#

The type of the elements of the tensor.

evaluate()#

Evaluates the underlying lazy expression, if any.

If the tensor is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the tensor already contains concrete data, this function has no effect.

The behavior of this function is affected by the current evaluation context and current device. See EvalContext and Device for details.

The function returns self.

gpu(index=None)#

Returns the tensor on the GPU. If it’s already there, this function returns self.

If index is not specified, the current CUDA device is used.

item()#

Returns the only item in the tensor. Useful for scalars (0D tensors).

property itemsize#

The size, in bytes, of a single element.

property layout#

The semantic layout of the tensor, e.g. HWC, CHW.

The layout assigns meaning to the axes. It affects the way in which the data is interpreted by some operators.

Image/video/volume layouts: H - height, W - width, D - depth, C - channels, F - frames

Audio layouts: f - frequency, t - time, C - channels

property nbytes#

The number of bytes required to store all elements in the tensor assuming dense packing.

property ndim#

The number of dimensions of the tensor.

A 0D tensor is a scalar and cannot be empty (it always contains a single value). Tensors with higher ndim can be empty if any of the extents is 0.
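The 0D-versus-empty distinction follows the usual array semantics, shown here with NumPy as a stand-in:

```python
import numpy as np

# Semantics sketch: a 0D array is a scalar holding exactly one value,
# while a higher-dimensional array is empty if any extent is 0.
scalar = np.array(5)
assert scalar.ndim == 0 and scalar.size == 1
empty = np.zeros((0, 3))
assert empty.ndim == 2 and empty.size == 0
```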

property shape#

The shape of the tensor, returned as a tuple of integers.

property size#

The number of elements in the tensor.

to_device(device, force_copy=False)#

Returns the tensor on the specified device.

If the tensor already resides on the specified device, the function returns self unless a copy is explicitly requested by passing force_copy=True.

tensor#

nvidia.dali.experimental.dynamic.tensor(data, dtype=None, device=None, layout=None)#

Copies an existing tensor-like object into a DALI tensor.

Parameters:
  • data (TensorLike, default: None) –

    The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:

    • numpy arrays

    • torch tensors

    • types exposing __dlpack__ or __array__ interface

    • existing Tensor objects

  • dtype (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.

  • device (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.

as_tensor#

nvidia.dali.experimental.dynamic.as_tensor(data, dtype=None, device=None, layout=None)#

Wraps an existing tensor-like object into a DALI tensor.

Parameters:
  • data (TensorLike, default: None) –

    The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:

    • numpy arrays

    • torch tensors

    • types exposing __dlpack__ or __array__ interface

    • existing Tensor objects

  • dtype (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.

  • device (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.
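The practical difference between tensor (copies) and as_tensor (wraps) can be sketched with a NumPy analogy (np.array and np.asarray play the roles of the two DALI functions; this is an illustration, not DALI code):

```python
import numpy as np

# Semantics sketch: tensor() copies its input, so later changes to the
# source are not visible; as_tensor() wraps it, so they are.
src = np.zeros(3)
copied = np.array(src)     # analogous to ndd.tensor(src)
wrapped = np.asarray(src)  # analogous to ndd.as_tensor(src)
src[0] = 7
assert copied[0] == 0   # the copy is unaffected
assert wrapped[0] == 7  # the wrapper sees the change
```

This is also why as_tensor requires the caller to keep the source data alive for as long as the resulting tensor is in use.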

batch#

nvidia.dali.experimental.dynamic.batch(tensors, dtype=None, device=None, layout=None)#

Constructs a Batch object.

Constructs a batch by copying the input tensors and optionally converting them to the desired data type and storing on the specified device.

Parameters:
  • tensors (TensorLike, default: None) –

    The data to construct the batch from. Can be a list of tensors, a TensorList, or other supported types. Supported types are:

    • a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary

    • a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts

    • a tensor-like object; the outermost dimension is interpreted as the batch dimension

    • a dali.backend.TensorListCPU or dali.backend.TensorListGPU

  • dtype (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors are an empty list.

  • device (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.
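The "outermost dimension becomes the batch dimension" rule can be sketched with NumPy standing in for DALI objects:

```python
import numpy as np

# Semantics sketch: a single (4, 480, 640, 3) array passed as input is
# interpreted as a batch of 4 HWC samples of shape (480, 640, 3).
arr = np.zeros((4, 480, 640, 3), dtype=np.uint8)
samples = list(arr)  # like iterating the samples of the resulting Batch
assert len(samples) == 4
assert samples[0].shape == (480, 640, 3)
```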

as_batch#

nvidia.dali.experimental.dynamic.as_batch(tensors, dtype=None, device=None, layout=None)#

Constructs a Batch object, avoiding a copy when possible.

Constructs a batch by viewing the input tensors as a batch. If the input tensors do not reside on the specified device or do not match the desired type, the data will be converted and/or copied, as necessary.

Parameters:
  • tensors (TensorLike, default: None) –

    The data to construct the batch from. It can be a list of tensors, a TensorList, or other supported types. In general, the input tensors must be kept alive by the caller until the batch is no longer needed. Supported types are:

    • a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary

    • a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts

    • a tensor-like object; the outermost dimension is interpreted as the batch dimension

    • a dali.backend.TensorListCPU or dali.backend.TensorListGPU

  • dtype (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors are an empty list.

  • device (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.

Execution context#

Device#

class nvidia.dali.experimental.dynamic.Device(name, device_id=None)#

Device on which data is stored and operators are executed.

The device can be either the CPU or a specific GPU.

static current()#

Returns the device on top of the thread-local device stack.

If the stack is empty, returns the default GPU for the calling thread.

static default_device_id(device_type)#

Returns the default device id for the device type passed as an argument.

For CPU it’s always 0, for GPU it’s the current CUDA device.

static from_dlpack(dlpack_device)#

Creates a Device object from a DLPack device descriptor.

static type_from_dlpack(dev_type)#

Gets the device type string from a DLPack device type enum value.
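A DLPack device descriptor is a (device_type, device_id) pair, which objects supporting the DLPack protocol expose via __dlpack_device__. For example, with NumPy (which reports kDLCPU, value 1 in the DLPack enum):

```python
import numpy as np

# Sketch: a DLPack device descriptor is a (device_type, device_id)
# tuple; for a NumPy array this is (kDLCPU, 0), with kDLCPU == 1.
dev_type, dev_id = np.zeros(3).__dlpack_device__()
assert dev_type == 1  # kDLCPU
assert dev_id == 0
```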

EvalContext#

class nvidia.dali.experimental.dynamic.EvalContext(*, num_threads=None, device_id=None, cuda_stream=None)#

Evaluation context for DALI dynamic API.

This class aggregates state and auxiliary objects that are necessary to execute DALI operators. These include:

  • CUDA device

  • thread pool

  • CUDA stream

EvalContext is a context manager.

cache_results(invocation, results)#

Reserved for future use

cached_results(invocation)#

Reserved for future use

property cuda_stream#

CUDA stream for this EvalContext

static current()#

Returns the currently active EvalContext for the calling thread.

property device_id#

CUDA device ordinal of the device associated with this EvalContext.

evaluate_all()#

Evaluates all pending invocations.

EvalMode#

class nvidia.dali.experimental.dynamic.EvalMode(value)#

Enum defining the available evaluation modes for dynamic mode operations.

default#

Default evaluation mode. TBD.

deferred#

Deferred evaluation mode - operations are evaluated only when their results are needed; error reporting (including input validation) may be delayed until the results are requested. In this mode operations with unused results may be skipped and repeated operations may be merged into one.

eager#

The evaluation starts immediately. Input validation is immediate. The operations may finish asynchronously.

sync_cpu#

Synchronous evaluation mode - evaluation on the CPU finishes before the operation returns.

sync_full#

Fully synchronous evaluation mode - evaluation on all devices finishes before the operation returns.

Random state#

RNG#

class nvidia.dali.experimental.dynamic.random.RNG(seed=None)#

Random number generator for DALI dynamic mode operations.

This RNG can be used to provide reproducible random state to DALI operators.

Parameters:

seed (int, optional) – Seed for the random number generator. If not provided, a random seed is used.

Examples

>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG with a specific seed
>>> my_rng = ndd.random.RNG(seed=1234)
>>>
>>> # Use it with random operators
>>> result = ndd.ops.random.Uniform(device="cpu")(range=(-1, 1), shape=[10], rng=my_rng)

clone()#

Create a new RNG with the same seed.

Returns:

A new RNG instance initialized with the same seed as this one. This allows creating independent RNG streams that produce the same sequence of random numbers.

Return type:

RNG

Examples

>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG
>>> rng1 = ndd.random.RNG(seed=1234)
>>>
>>> # Clone it to create an independent copy
>>> rng2 = rng1.clone()
>>>
>>> # Both will generate the same sequence
>>> for i in range(10):
...     assert rng1() == rng2()

property seed#

Get the seed used to initialize this RNG.