API Reference#

This page documents the public API of DALI Dynamic. For the list of all available operators, see Operation Reference.

Tensor and Batch objects#

Batch#

class nvidia.dali.experimental.dynamic.Batch(
tensors=None,
dtype=None,
device=None,
layout=None,
invocation_result=None,
copy=False,
)#

A Batch object.

This class represents a batch of tensors usable with DALI dynamic API. The tensors in the batch have the same element type, layout and number of dimensions, but can differ in shape.

A Batch can contain:

  • a single buffer and shape, owned by DALI, representing consecutive tensors

  • a list of Tensor objects

  • a result of a lazy evaluation of a DALI operator

In the case of lazy evaluation, the operations are executed only when an attempt is made to access the tensor data or properties that cannot be obtained without running the underlying operation.

property batch_size#

The number of tensors in the batch.

static broadcast(sample, batch_size, device=None, dtype=None)#

Creates a batch by repeating a single sample batch_size times.

This function returns a batch obtained by repeating sample batch_size times. Optionally, the result may be placed on the specified device (otherwise it inherits the device from the sample argument) or converted to the desired data type.

This function yields a result equivalent to as_batch([tensor(sample, dtype=dtype, device=device)] * batch_size), but is much more efficient.
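
Example (an illustrative sketch; assumes a numpy-backed sample placed on the CPU):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> sample = np.zeros((2, 2), dtype=np.float32)
>>> b = ndd.Batch.broadcast(sample, batch_size=4)
>>> print(b.batch_size)
4
>>> print(b.shape)
[(2, 2), (2, 2), (2, 2), (2, 2)]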

cpu()#

Returns the batch on the CPU. If it’s already there, this function returns self.

property device#

The device on which the batch resides (or will reside, in case of lazy evaluation).

property dtype#

The element type of the tensors in the batch.

evaluate()#

Evaluates the underlying lazy expression, if any.

If the batch is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the batch already contains concrete data, this function has no effect.

The behavior of this function is affected by the current evaluation context and current device. See EvalContext and Device for details.

The function returns self.

gpu(index=None)#

Returns the batch on the GPU. If it’s already there, this function returns self.

If index is not specified, the current CUDA device is used.

property layout#

The layout of tensors in the batch.

The “batch dimension” (commonly denoted as N) is not included - a batch of HWC images will have HWC layout, not NHWC.
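
Example (a sketch; assumes numpy input):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> b = ndd.as_batch([np.zeros((480, 640, 3), dtype=np.uint8)], layout="HWC")
>>> print(b.layout)
HWC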

property ndim#

The number of dimensions of the samples in the batch.

The “batch dimension” is not included - e.g. a batch of HWC is still a 3D object.

select(sample_range)#

Selects a range of samples.

The result of this function is either a Batch (if sample_range is a slice or a list) or a Tensor if sample_range is a number.
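
Example (a sketch; assumes numpy input):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> b = ndd.as_batch([np.zeros((2, 2)), np.ones((3, 3)), np.zeros((4, 4))])
>>> t = b.select(1)              # a single Tensor
>>> print(t.shape)
(3, 3)
>>> sub = b.select(slice(0, 2))  # a Batch with samples 0 and 1
>>> print(sub.batch_size)
2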

property shape#

The shape of the batch.

Returns the list of shapes of individual samples.

Example:

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> t0 = ndd.tensor(np.zeros((480, 640, 3)))
>>> t1 = ndd.tensor(np.zeros((720, 1280, 1)))
>>> b = ndd.as_batch([t0, t1])
>>> print(b.shape)
[(480, 640, 3), (720, 1280, 1)]

property slice#

Interface for samplewise slicing.

Regular slicing selects samples first and then slices each sample with common slicing parameters.

The samplewise slicing interface allows the slicing parameters to be batches (with the same number of samples as the input); the slicing parameters are then applied to the respective samples.

start = Batch([1, 2, 3])
stop = Batch([4, 5, 6])
step = Batch([1, 1, 2])
sliced = input.slice[start, stop, step]
# the result is equivalent to
sliced = Batch([
    sample[start[i]:stop[i]:step[i]]
    for i, sample in enumerate(input)
])

If the slicing parameters are not batches, they are broadcast to all samples.

property tensors#

Returns an indexable list of Tensor objects that comprise the batch.

to_device(device, force_copy=False)#

Returns the data batch on the specified device.

If the batch already resides on the device specified, the function will return self unless a copy is explicitly requested by passing force_copy=True.

torch(copy=None, pad=False)#

Returns self as a PyTorch tensor. Requires self to be dense and PyTorch to be installed.

Parameters:
  • copy (bool, optional, default: None) – An optional boolean value indicating how to handle copying: None - avoid a copy, if possible; True - always copy; False - raise an error if the copy cannot be avoided.

  • pad (bool, default: False) – If True, the tensors in the batch will be padded before being stacked into one tensor. If False, an error is raised if the batch has a non-uniform shape.

Tensor#

class nvidia.dali.experimental.dynamic.Tensor(
data=None,
dtype=None,
device=None,
layout=None,
batch=None,
index_in_batch=None,
invocation_result=None,
copy=False,
)#

A Tensor object.

This class represents a single tensor usable with DALI dynamic API. It can contain any of the following:

  • tensor data owned by DALI

  • external tensor data wrapped into a DALI tensor

  • a sample taken out of a Batch object

  • a result of a lazy evaluation of a DALI operator

In the case of lazy evaluation, the operations are executed only when an attempt is made to access the tensor data or properties that cannot be obtained without running the underlying operation.

cpu()#

Returns the tensor on the CPU. If it’s already there, this function returns self.

property device#

The device on which the tensor resides (or will reside, in case of lazy evaluation).

property dtype#

The type of the elements of the tensor.

evaluate()#

Evaluates the underlying lazy expression, if any.

If the tensor is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the tensor already contains concrete data, this function has no effect.

The behavior of this function is affected by the current evaluation context and current device. See EvalContext and Device for details.

The function returns self.

gpu(index=None)#

Returns the tensor on the GPU. If it’s already there, this function returns self.

If index is not specified, the current CUDA device is used.

item()#

Returns the only item in the tensor. Useful for scalars (0D tensors).
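
Example (a sketch; assumes numpy input):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> t = ndd.tensor(np.float32(42))
>>> print(t.ndim)
0
>>> print(t.item())
42.0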

property itemsize#

The size, in bytes, of a single element.

property layout#

The semantic layout of the tensor, e.g. HWC, CHW.

The layout assigns meaning to the axes. It affects the way in which the data is interpreted by some operators.

Image/video/volume layouts: H - height, W - width, D - depth, C - channels, F - frames

Audio layouts: f - frequency, t - time, C - channels

property nbytes#

The number of bytes required to store all elements in the tensor assuming dense packing.

property ndim#

The number of dimensions of the tensor.

A 0D tensor is a scalar and cannot be empty (it always contains a single value). Tensors with higher ndim can be empty if any of the extents is 0.

property shape#

The shape of the tensor, returned as a tuple of integers.

property size#

The number of elements in the tensor.

to_device(device, force_copy=False)#

Returns the tensor on the specified device.

If the tensor already resides on the device specified, the function will return self unless a copy is explicitly requested by passing force_copy=True.

torch(copy=False)#

Returns self as a PyTorch tensor. Requires PyTorch to be installed.

Parameters:

copy (bool, default: False) – Boolean indicating whether to perform a copy.

tensor#

nvidia.dali.experimental.dynamic.tensor(data, dtype=None, device=None, layout=None, pad=False)#

Copies an existing tensor-like object into a DALI tensor.

Parameters:
  • data (TensorLike, default: None) –

    The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:

    • numpy arrays

    • torch tensors

    • types exposing __dlpack__ or __array__ interface

    • existing Tensor objects

  • dtype (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.

  • device (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.

  • pad (bool, optional, default: False) – If True and data is a batch, the batch will be zero-padded. If False and data is a batch of non-uniformly shaped tensors, an error is raised.

as_tensor#

nvidia.dali.experimental.dynamic.as_tensor(data, dtype=None, device=None, layout=None, pad=False)#

Wraps an existing tensor-like object into a DALI tensor.

Parameters:
  • data (TensorLike, default: None) –

    The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:

    • numpy arrays

    • torch tensors

    • types exposing __dlpack__ or __array__ interface

    • existing Tensor objects

  • dtype (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.

  • device (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.

  • pad (bool, optional, default: False) – If True and data is a batch, the batch will be zero-padded. If False and data is a batch of non-uniformly shaped tensors, an error is raised.

batch#

nvidia.dali.experimental.dynamic.batch(tensors, dtype=None, device=None, layout=None)#

Constructs a Batch object.

Constructs a batch by copying the input tensors and optionally converting them to the desired data type and storing on the specified device.

Parameters:
  • tensors (TensorLike, default: None) –

    The data to construct the batch from. Can be a list of tensors, a TensorList, or other supported types. Supported types are:

    • a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary

    • a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts

    • a tensor-like object; the outermost dimension is interpreted as the batch dimension

    • a dali.backend.TensorListCPU or dali.backend.TensorListGPU

  • dtype (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors are an empty list.

  • device (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.
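
Example of constructing a batch from a single tensor-like object, where the outermost dimension becomes the batch dimension (a sketch; assumes numpy input):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> b = ndd.batch(np.zeros((4, 480, 640, 3), dtype=np.uint8), layout="HWC")
>>> print(b.batch_size)
4
>>> print(b.ndim)
3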

as_batch#

nvidia.dali.experimental.dynamic.as_batch(tensors, dtype=None, device=None, layout=None)#

Constructs a Batch object, avoiding the copy.

Constructs a batch by viewing the input tensors as a batch. If the input tensors do not reside on the specified device or do not match the desired type, the data will be converted and/or copied, as necessary.

Parameters:
  • tensors (TensorLike, default: None) –

    The data to construct the batch from. It can be a list of tensors, a TensorList, or other supported types. In general, the input tensors must be kept alive by the caller until the batch is no longer needed. Supported types are:

    • a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary

    • a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts

    • a tensor-like object; the outermost dimension is interpreted as the batch dimension

    • a dali.backend.TensorListCPU or dali.backend.TensorListGPU

  • dtype (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors are an empty list.

  • device (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.

  • layout (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.

Data types#

These are the data type objects that DALI Dynamic uses to indicate the element type of Tensors and Batches. They are typically passed as the dtype argument to request a specific tensor element type. There are also several DALI-specific types, representing DALI enums.

All of the types below are instances of the DType class.

Type – Description

nvidia.dali.experimental.dynamic.int8 – 8-bit signed integer
nvidia.dali.experimental.dynamic.int16 – 16-bit signed integer
nvidia.dali.experimental.dynamic.int32 – 32-bit signed integer
nvidia.dali.experimental.dynamic.int64 – 64-bit signed integer
nvidia.dali.experimental.dynamic.uint8 – 8-bit unsigned integer
nvidia.dali.experimental.dynamic.uint16 – 16-bit unsigned integer
nvidia.dali.experimental.dynamic.uint32 – 32-bit unsigned integer
nvidia.dali.experimental.dynamic.uint64 – 64-bit unsigned integer
nvidia.dali.experimental.dynamic.float16 – 16-bit floating point number (IEEE 754)
nvidia.dali.experimental.dynamic.float32 – 32-bit floating point number (IEEE 754)
nvidia.dali.experimental.dynamic.float64 – 64-bit floating point number (IEEE 754)
nvidia.dali.experimental.dynamic.bfloat16 – Brain Floating Point; a 16-bit floating point number with an 8-bit exponent and a 7-bit mantissa
nvidia.dali.experimental.dynamic.bool – Boolean value
nvidia.dali.experimental.dynamic.DataType – DALI data type. See nvidia.dali.types.DALIDataType.
nvidia.dali.experimental.dynamic.ImageType – Image type. See nvidia.dali.types.DALIImageType.
nvidia.dali.experimental.dynamic.InterpType – Interpolation type. See nvidia.dali.types.DALIInterpType.

Type conversion functions#

nvidia.dali.experimental.dynamic.dtype(*args)#

Returns a DType associated with the given nvidia.dali.types.DALIDataType, string, or numpy datatype (an instance of numpy.dtype).

nvidia.dali.experimental.dynamic.type_id(dtype)#

Returns the nvidia.dali.types.DALIDataType for the given dtype.
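
Example of converting between type representations (a sketch; the exact set of accepted strings is an assumption):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> dt = ndd.dtype(np.dtype(np.float32))  # from a numpy dtype
>>> dt2 = ndd.dtype("float32")            # from a string
>>> tid = ndd.type_id(ndd.float32)        # the corresponding DALIDataType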

DType class#

class nvidia.dali.experimental.dynamic.DType(
kind,
bits,
exponent_bits=None,
significand_bits=None,
bytes=None,
type_id=None,
name=None,
docs=None,
)#

DALI Dynamic Mode data type.

This class is used to represent the data type of a Tensor or Batch. Data types such as uint32 are instances of this class.

Note

This class is used to represent the types supported by DALI. Creation of custom user types is not supported.
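
For instance, the attributes below can be inspected on the predefined type objects (a sketch; the bfloat16 values follow its description as a type with an 8-bit exponent and a 7-bit mantissa):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> print(ndd.float32.bits, ndd.float32.bytes)
32 4
>>> print(ndd.bfloat16.exponent_bits, ndd.bfloat16.significand_bits)
8 7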

bits#

Size of the type, in bits.

Type:

int

bytes#

Size of the type, in bytes.

Type:

int

exponent_bits#

Number of exponent bits in a floating-point number.

Type:

int

kind#

Kind of the type.

Type:

Kind

name#

Name of the type.

Type:

str

significand_bits#

Number of significand bits in a floating-point number.

Type:

int

type_id#

Corresponding DALI data type.

Type:

DALIDataType

class Kind(value)#

Enum defining type kind in DALI dynamic mode.

signed#

Signed integer

unsigned#

Unsigned integer

float#

Floating-point number

bool#

Boolean

enum#

Enumerator type

static from_fw_type(numpy_type)#

Returns a DType associated with the given numpy datatype (an instance of numpy.dtype).

static from_type_id(type_id)#

Returns a DType associated with the given nvidia.dali.types.DALIDataType.

static parse(name)#

Parses a type name into a DType.

Execution context#

Device#

class nvidia.dali.experimental.dynamic.Device(name, device_id=None)#

Device on which data is stored and operators are executed.

The device can be either CPU or (specific) GPU.

static current()#

Returns the device on top of the thread-local device stack.

If the stack is empty, returns the default GPU for the calling thread.

static default_device_id(device_type)#

Returns the default device id for the device type passed as an argument.

For CPU it’s always None, for GPU it’s the current CUDA device.

static from_dlpack(dlpack_device)#

Creates a Device object from a DLPack device descriptor.

static type_from_dlpack(dev_type)#

Gets the device type string from a DLPack device type enum value.

EvalContext#

class nvidia.dali.experimental.dynamic.EvalContext(*, num_threads=None, device_id=None, cuda_stream=None)#

Evaluation context for DALI dynamic API.

This class aggregates state and auxiliary objects that are necessary to execute DALI operators. These include:

  • CUDA device

  • thread pool

  • CUDA stream

EvalContext is a context manager.

cache_results(invocation, results)#

Reserved for future use

cached_results(invocation)#

Reserved for future use

property cuda_stream#

CUDA stream for this EvalContext

Note

In case of the thread’s default context, this value is affected by calls to methods set_default_stream() and set_current_stream().

static current()#

Returns the currently active EvalContext for the calling thread.

static default()#

The default EvalContext for the calling thread.

property device_id#

CUDA device ordinal of the device associated with this EvalContext.

evaluate_all()#

Evaluates all pending invocations.

property num_threads#

The number of thread pool workers in this EvalContext.

If the value was not specified at construction, get_num_threads() is used.

EvalMode#

class nvidia.dali.experimental.dynamic.EvalMode(value)#

Enum defining different evaluation modes for Dynamic Mode operations.

default#

Default evaluation mode, alias of EvalMode.eager.

deferred#

Deferred evaluation mode - operations are evaluated only when their results are needed; error reporting (including input validation) may be delayed until the results are requested. In this mode operations with unused results may be skipped and repeated operations may be merged into one.

eager#

The evaluation starts immediately. Input validation is immediate. The operations may finish asynchronously.

sync_cpu#

Synchronous evaluation mode - evaluation on the CPU finishes before the operation returns.

sync_full#

Fully synchronous evaluation mode - evaluation on all devices finishes before the operation returns.

get_num_threads#

nvidia.dali.experimental.dynamic.get_num_threads()#

Gets the number of threads in the default thread pool.

The value is determined by (in decreasing priority):

  1. The value (not None) passed to set_num_threads()

  2. The value of the DALI_NUM_THREADS environment variable

  3. The number of CPUs in the calling process's affinity list: len(os.sched_getaffinity(0))

set_num_threads#

nvidia.dali.experimental.dynamic.set_num_threads(n)#

Sets (or clears) the number of threads in the default thread pool.

Changing this value will cause all EvalContexts which were constructed without an explicitly given number of threads to recreate their associated thread pools.

Setting None will cause the default value to be used.

The value must be a positive integer and must not exceed 100 threads per CPU.

Warning

This function should be called once, at the beginning of the program. Changing this value later is very costly and should be avoided.
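
Example (a sketch):

>>> import nvidia.dali.experimental.dynamic as ndd
>>> ndd.set_num_threads(4)  # call once, at the beginning of the program
>>> print(ndd.get_num_threads())
4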

CUDA streams#

Stream#

class nvidia.dali.experimental.dynamic.Stream(*, stream=<object object>, device_id=None)#

Wrapper for a CUDA stream object.

This class wraps a CUDA stream object. It can be either a stream created by DALI or a compatible object created by a third-party library.

property device_id#

The CUDA device ordinal associated with this stream.

property handle#

A raw CUDA stream handle, returned as an integer.

synchronize()#

Wait until all work scheduled on this stream is complete.

stream#

nvidia.dali.experimental.dynamic.stream(*, stream=<object object>, device_id=None)#

Wraps an existing object or creates a new stream.

This function wraps a compatible stream object with a DALI Stream class or creates a new stream.

Keyword Arguments:
  • stream (a compatible stream object, None or CreateNew sentinel value, optional) –

    When this parameter contains a compatible stream object, the function returns a Stream object wrapping it. If the value is left unset (the CreateNew sentinel value), a new stream on the specified device will be returned. If None is passed, the function returns None. Compatible objects are:

    • objects exposing __cuda_stream__ interface

    • PyTorch streams

    • raw stream handles

    If stream is not a raw handle but rather a stream object created by a third-party library, it is referenced by the wrapper object returned by this function, thereby prolonging its lifetime.

  • device_id (int or None, optional) – If not None, the function will create a new stream on the device specified or, if stream contains a stream object, the function will verify that the stream is on the device specified in device_id. When stream is None, this value is ignored.

get_default_stream#

nvidia.dali.experimental.dynamic.get_default_stream(device_id=None)#

Gets the default stream.

This stream is used when not overridden by the thread’s current stream (see set_current_stream()).

set_default_stream#

nvidia.dali.experimental.dynamic.set_default_stream(cuda_stream, /, device_id=None)#

Sets the default stream for a CUDA device. If the device id is not specified, the device associated with the stream will be used. If there’s no device associated with the stream, the current CUDA device is used.

Passing None clears the default stream.

Warning

This function is intended to be used once, at the beginning of the program, to set the default stream for DALI operations. Calling it affects all default contexts in all threads that haven’t set their current streams with a call to set_current_stream().

get_current_stream#

nvidia.dali.experimental.dynamic.get_current_stream()#

Gets the stream associated with the calling thread’s default context.

The value returned by this function is equivalent to EvalContext.default().cuda_stream.

set_current_stream#

nvidia.dali.experimental.dynamic.set_current_stream(cuda_stream, /)#

Sets the stream associated with the calling thread’s default context for the current device.

The stream must match the current CUDA device. See Device.

In addition to changing the stream of the current thread’s default context, this method causes newly created EvalContext objects with the current device to use this stream.

Passing None resets the current thread’s default context stream. After that, the value returned by get_current_stream() will either point to the value returned by get_default_stream() or a new stream.

Warning

Setting the current stream doesn’t establish any synchronization between the work previously scheduled and new work.

Random state#

RNG#

class nvidia.dali.experimental.dynamic.random.RNG(seed=None)#

Random number generator for DALI dynamic mode operations.

This RNG can be used to provide reproducible random state to DALI operators.

Parameters:

seed (int, optional) – Seed for the random number generator. If not provided, a random seed is used.

Examples

>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG with a specific seed
>>> my_rng = ndd.random.RNG(seed=1234)
>>>
>>> # Use it with random operators
>>> result = ndd.random.uniform(range=(-1, 1), shape=[10], rng=my_rng, device="cpu")

clone()#

Create a new RNG with the same seed.

Returns:

A new RNG instance initialized with the same seed as this one. This allows creating independent RNG streams that produce the same sequence of random numbers.

Return type:

RNG

Examples

>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG
>>> rng1 = ndd.random.RNG(seed=1234)
>>>
>>> # Clone it to create an independent copy
>>> rng2 = rng1.clone()
>>>
>>> # Both will generate the same sequence
>>> for i in range(10):
...     assert rng1() == rng2()

property seed#

Get the seed used to initialize this RNG.