API Reference#
This page documents the public API of DALI Dynamic. For the list of all available operators, see Operation Reference.
Tensor and Batch objects#
Batch#
- class nvidia.dali.experimental.dynamic.Batch(
- tensors=None,
- dtype=None,
- device=None,
- layout=None,
- invocation_result=None,
- copy=False,
A Batch object.
This class represents a batch of tensors usable with the DALI dynamic API. The tensors in the batch have the same element type, layout and number of dimensions, but can differ in shape.
A Batch can contain:
- a single buffer and shape, owned by DALI, representing consecutive tensors
- a list of Tensor objects
- a result of a lazy evaluation of a DALI operator.
In case of lazy evaluation, the operations are executed only after an attempt is made to access the tensor data or properties which cannot be obtained without running the underlying operation.
- property batch_size#
The number of tensors in the batch.
- static broadcast(sample, batch_size, device=None, dtype=None)#
Creates a batch by repeating a single sample batch_size times.
This function returns a batch obtained by repeating the sample batch_size times. Optionally, the result may be placed on the specified device (otherwise it will inherit the device from the sample argument) or converted to the desired data type.
This function yields a result equivalent to as_batch([tensor(sample, dtype=dtype, device=device)] * batch_size), but is much more efficient.
- property device#
The device on which the batch resides (or will reside, in case of lazy evaluation).
- property dtype#
The element type of the tensors in the batch.
- evaluate()#
Evaluates the underlying lazy expression, if any.
If the batch is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the batch already contains concrete data, this function has no effect.
The behavior of this function is affected by the current evaluation context and current device. See EvalContext and Device for details.
The function returns self.
- gpu(index=None)#
Returns the batch on the GPU. If it’s already there, this function returns self.
If index is not specified, the current CUDA device is used.
- property layout#
The layout of tensors in the batch.
The “batch dimension” (commonly denoted as N) is not included - a batch of HWC images will have HWC layout, not NHWC.
- property ndim#
The number of dimensions of the samples in the batch.
The “batch dimension” is not included - e.g. a batch of HWC is still a 3D object.
- select(sample_range)#
Selects a range of samples.
The result of this function is either a Batch (if sample_range is a slice or a list) or a Tensor (if sample_range is a number).
- property shape#
The shape of the batch.
Returns the list of shapes of individual samples.
Example:
>>> import nvidia.dali.experimental.dynamic as ndd
>>> import numpy as np
>>> t0 = ndd.tensor(np.zeros((480, 640, 3)))
>>> t1 = ndd.tensor(np.zeros((720, 1280, 1)))
>>> b = ndd.as_batch([t0, t1])
>>> print(b.shape)
[(480, 640, 3), (720, 1280, 1)]
- property slice#
Interface for samplewise slicing.
Regular slicing selects samples first and then slices each sample with common slicing parameters.
The samplewise slicing interface allows the slicing parameters to be batches (with the same number of samples); the slicing parameters are then applied to the respective samples.
start = Batch([1, 2, 3])
stop = Batch([4, 5, 6])
step = Batch([1, 1, 2])
sliced = input.slice[start, stop, step]
# the result is equivalent to
sliced = Batch([
    sample[start[i]:stop[i]:step[i]]
    for i, sample in enumerate(input)
])
If the slicing parameters are not batches, they are broadcast to all samples.
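The per-sample semantics can be reproduced with plain NumPy (this illustrates only the indexing logic, not the DALI API):

```python
import numpy as np

samples = [np.arange(10), np.arange(10), np.arange(10)]
start, stop, step = [1, 2, 3], [4, 5, 6], [1, 1, 2]

# Slicing parameters at index i apply to sample i.
sliced = [s[start[i]:stop[i]:step[i]] for i, s in enumerate(samples)]
print([list(x) for x in sliced])  # [[1, 2, 3], [2, 3, 4], [3, 5]]
```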
- to_device(device, force_copy=False)#
Returns the data batch on the specified device.
If the batch already resides on the specified device, the function returns self unless a copy is explicitly requested by passing force_copy=True.
- torch(copy=None, pad=False)#
Returns self as a PyTorch tensor. Requires self to be dense and PyTorch to be installed.
- Parameters:
copy¶ (bool, optional, default: None) – An optional boolean value indicating how to handle copying: None - avoid a copy if possible; True - always copy; False - raise an error if a copy cannot be avoided.
pad¶ (bool, default: False) – If True, the tensors in the batch will be padded before being stacked into one tensor. If False, an error is raised if the batch has a non-uniform shape.
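The pad=True behaviour can be sketched in NumPy terms (illustrative only - the real method produces a PyTorch tensor):

```python
import numpy as np

samples = [np.ones((2, 3)), np.ones((4, 3))]

# Pad each sample with zeros up to the largest extent along each axis,
# then stack along a new leading batch dimension.
max_shape = np.max([s.shape for s in samples], axis=0)
padded = [np.pad(s, [(0, m - e) for e, m in zip(s.shape, max_shape)])
          for s in samples]
stacked = np.stack(padded)
print(stacked.shape)  # (2, 4, 3)
```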
Tensor#
- class nvidia.dali.experimental.dynamic.Tensor(
- data=None,
- dtype=None,
- device=None,
- layout=None,
- batch=None,
- index_in_batch=None,
- invocation_result=None,
- copy=False,
A Tensor object.
This class represents a single tensor usable with the DALI dynamic API. It can contain any of the following:
tensor data owned by DALI
external tensor data wrapped into a DALI tensor
a sample taken out of a Batch object
a result of a lazy evaluation of a DALI operator
In case of lazy evaluation, the operations are executed only after an attempt is made to access the tensor data or properties which cannot be obtained without running the underlying operation.
- property device#
The device on which the tensor resides (or will reside, in case of lazy evaluation).
- property dtype#
The type of the elements of the tensor.
- evaluate()#
Evaluates the underlying lazy expression, if any.
If the tensor is a result of a lazy evaluation, calling evaluate will cause the expression to be evaluated. If the tensor already contains concrete data, this function has no effect.
The behavior of this function is affected by the current evaluation context and current device. See
EvalContextandDevicefor details.The function returns
self.
- gpu(index=None)#
Returns the tensor on the GPU. If it’s already there, this function returns self.
If index is not specified, the current CUDA device is used.
- item()#
Returns the only item in the tensor. Useful for scalars (0D tensors).
- property itemsize#
The size, in bytes, of a single element.
- property layout#
The semantic layout of the tensor, e.g. HWC, CHW.
The layout assigns meaning to the axes. It affects the way in which the data is interpreted by some operators.
Image/video/volume layouts: H - height, W - width, D - depth, C - channels, F - frames
Audio layouts: f - frequency, t - time, C - channels
- property nbytes#
The number of bytes required to store all elements in the tensor assuming dense packing.
- property ndim#
The number of dimensions of the tensor.
A 0D tensor is a scalar and cannot be empty (it always contains a single value). Tensors with higher ndim can be empty if any of the extents is 0.
- property shape#
The shape of the tensor, returned as a tuple of integers.
- property size#
The number of elements in the tensor.
tensor#
- nvidia.dali.experimental.dynamic.tensor(data, dtype=None, device=None, layout=None, pad=False)#
Copies an existing tensor-like object into a DALI tensor.
- Parameters:
data¶ (TensorLike, default: None) –
The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:
numpy arrays
torch tensors
types exposing __dlpack__ or __array__ interface
existing Tensor objects
dtype¶ (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.
device¶ (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.
layout¶ (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.
pad¶ (bool, optional, default: False) – If True and data is a batch, the batch will be zero-padded. If False and data is a batch of non-uniformly shaped tensors, an error is raised.
as_tensor#
- nvidia.dali.experimental.dynamic.as_tensor(data, dtype=None, device=None, layout=None, pad=False)#
Wraps an existing tensor-like object into a DALI tensor.
- Parameters:
data¶ (TensorLike, default: None) –
The data to construct the tensor from. It can be a tensor-like object, a (nested) list, TensorCPU/TensorGPU or other supported type. Supported types are:
numpy arrays
torch tensors
types exposing __dlpack__ or __array__ interface
existing Tensor objects
dtype¶ (DType, default: None) – The desired data type of the tensor. If not specified, the data type is inferred from the input data. If specified, the input data is cast to the desired data type.
device¶ (Device or str, optional, default: None) – The device on which the tensor should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input data.
layout¶ (str, optional, default: None) – The layout string describing the dimensions of the tensor (e.g., “HWC”). If not specified, the layout is inferred from the input data, if possible.
pad¶ (bool, optional, default: False) – If True and data is a batch, the batch will be zero-padded. If False and data is a batch of non-uniformly shaped tensors, an error is raised.
batch#
- nvidia.dali.experimental.dynamic.batch(tensors, dtype=None, device=None, layout=None)#
Constructs a Batch object.
Constructs a batch by copying the input tensors and optionally converting them to the desired data type and storing them on the specified device.
- Parameters:
tensors¶ (TensorLike, default: None) –
The data to construct the batch from. Can be a list of tensors, a TensorList, or other supported types. Supported types are:
a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary
a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts
a tensor-like object; the outermost dimension is interpreted as the batch dimension
a dali.backend.TensorListCPU or dali.backend.TensorListGPU
dtype¶ (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors is an empty list.
device¶ (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.
layout¶ (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.
as_batch#
- nvidia.dali.experimental.dynamic.as_batch(tensors, dtype=None, device=None, layout=None)#
Constructs a Batch object, avoiding a copy when possible.
Constructs a batch by viewing the input tensors as a batch. If the input tensors do not reside on the specified device or do not match the desired type, the data will be converted and/or copied, as necessary.
- Parameters:
tensors¶ (TensorLike, default: None) –
The data to construct the batch from. It can be a list of tensors, a TensorList, or other supported types. In general, the input tensors must be kept alive by the caller until the batch is no longer needed. Supported types are:
a Batch object; the batch is copied and the data is converted and moved to the specified device, if necessary
a list of tensor-like objects; the objects need to have matching numbers of dimensions, data types and layouts
a tensor-like object; the outermost dimension is interpreted as the batch dimension
a dali.backend.TensorListCPU or dali.backend.TensorListGPU
dtype¶ (DType, default: None) – The desired data type of the batch. If not specified, the data type is inferred from the input tensors. If specified, the input tensors are cast to the desired data type. The dtype is required if tensors is an empty list.
device¶ (Device or str, optional, default: None) – The device on which the batch should reside (e.g., “cpu” or “gpu”). If not specified, the device is inferred from the input tensors.
layout¶ (str, optional, default: None) – The layout string describing the dimensions of the batch (e.g., “HWC”). If not specified, the layout is inferred from the input tensors.
Data types#
These are the data type objects that DALI Dynamic uses to indicate the type of elements of Tensors and Batches.
They are typically passed as the dtype argument to request a specific element type.
There are also several DALI-specific types, representing DALI enums.
All of the types below are instances of the DType class.
| Type | Description |
|---|---|
| int8 | 8-bit signed integer |
| int16 | 16-bit signed integer |
| int32 | 32-bit signed integer |
| int64 | 64-bit signed integer |
| uint8 | 8-bit unsigned integer |
| uint16 | 16-bit unsigned integer |
| uint32 | 32-bit unsigned integer |
| uint64 | 64-bit unsigned integer |
| float16 | 16-bit floating point number (IEEE 754) |
| float32 | 32-bit floating point number (IEEE 754) |
| float64 | 64-bit floating point number (IEEE 754) |
| bfloat16 | Brain Floating Point. A 16-bit floating point number with 8-bit exponent and 7-bit mantissa |
| bool | Boolean value |
| DALIDataType | DALI data type (DALI enum) |
| ImageType | Image type (DALI enum) |
| InterpType | Interpolation type (DALI enum) |
Type conversion functions#
- nvidia.dali.experimental.dynamic.dtype(*args)#
Returns a DType associated with the given nvidia.dali.types.DALIDataType, string or numpy data type (instance of numpy.dtype).
- nvidia.dali.experimental.dynamic.type_id(dtype)#
Returns the nvidia.dali.types.DALIDataType for the given dtype.
DType class#
- class nvidia.dali.experimental.dynamic.DType(
- kind,
- bits,
- exponent_bits=None,
- significand_bits=None,
- bytes=None,
- type_id=None,
- name=None,
- docs=None,
DALI Dynamic Mode data type.
This class is used to represent the data type of a Tensor or Batch. Data types such as uint32 are instances of this class.
Note
This class is used to represent the types supported by DALI. Creation of custom user types is not supported.
- bits#
Size of the type, in bits.
- Type:
int
- bytes#
Size of the type, in bytes.
- Type:
int
- exponent_bits#
Number of exponent bits in a floating-point number.
- Type:
int
- name#
Name of the type.
- Type:
str
- significand_bits#
Number of significand bits in a floating-point number.
- Type:
int
- type_id#
Corresponding DALI data type
- Type:
nvidia.dali.types.DALIDataType
- class Kind(value)#
Enum defining type kind in DALI dynamic mode.
- signed#
Signed integer
- unsigned#
Unsigned integer
- float#
Floating-point number
- bool#
Boolean
- enum#
Enumerator type
- static from_fw_type(numpy_type)#
Returns a DType associated with the given numpy data type (instance of numpy.dtype).
- static from_type_id(type_id)#
Returns a DType associated with the given nvidia.dali.types.DALIDataType.
Execution context#
Device#
- class nvidia.dali.experimental.dynamic.Device(name, device_id=None)#
Device on which data is stored and operators are executed.
The device can be either CPU or (specific) GPU.
- static current()#
Returns the device on top of the thread-local device stack.
If the stack is empty, returns the default GPU for the calling thread.
- static default_device_id(device_type)#
Returns the default device id for the device type passed as an argument.
For CPU it’s always None, for GPU it’s the current CUDA device.
- static type_from_dlpack(dev_type)#
Gets the device type string from a DLPack device type enum value.
EvalContext#
- class nvidia.dali.experimental.dynamic.EvalContext(*, num_threads=None, device_id=None, cuda_stream=None)#
Evaluation context for DALI dynamic API.
This class aggregates state and auxiliary objects that are necessary to execute DALI operators. These include:
CUDA device
thread pool
CUDA stream.
EvalContext is a context manager.
- cache_results(invocation, results)#
Reserved for future use
- cached_results(invocation)#
Reserved for future use
- property cuda_stream#
The CUDA stream for this EvalContext.
Note
In case of the thread’s default context, this value is affected by calls to the methods set_default_stream() and set_current_stream().
- static current()#
Returns the currently active EvalContext for the calling thread.
- static default()#
The default EvalContext for the calling thread.
- property device_id#
The CUDA device ordinal of the device associated with this EvalContext.
- evaluate_all()#
Evaluates all pending invocations.
- property num_threads#
The number of thread pool workers in this EvalContext.
If the value was not specified at construction, get_num_threads() is used.
EvalMode#
- class nvidia.dali.experimental.dynamic.EvalMode(value)#
Enum defining different evaluation modes for Dynamic Mode operations.
- default#
Default evaluation mode, alias of EvalMode.eager.
- deferred#
Deferred evaluation mode - operations are evaluated only when their results are needed; error reporting (including input validation) may be delayed until the results are requested. In this mode operations with unused results may be skipped and repeated operations may be merged into one.
- eager#
The evaluation starts immediately. Input validation is immediate. The operations may finish asynchronously.
- sync_cpu#
Synchronous evaluation mode - evaluation on the CPU finishes before the operation returns.
- sync_full#
Fully synchronous evaluation mode - evaluation on all devices finishes before the operation returns.
get_num_threads#
- nvidia.dali.experimental.dynamic.get_num_threads()#
Gets the number of threads in the default thread pool.
The value is determined by (in decreasing priority):
1. The value (not None) passed to set_num_threads()
2. The value of the DALI_NUM_THREADS environment variable
3. The number of CPUs in the calling process affinity list: len(os.sched_getaffinity(0))
set_num_threads#
- nvidia.dali.experimental.dynamic.set_num_threads(n)#
Sets (or clears) the number of threads in the default thread pool.
Changing this value will cause all EvalContexts which were constructed without an explicitly given number of threads to recreate their associated thread pools.
Setting None will cause the default value to be used.
The value must be a positive integer and must not exceed 100 threads per CPU.
Warning
This function should be called once, at the beginning of the program. Changing this value later is very costly and should be avoided.
CUDA streams#
Stream#
- class nvidia.dali.experimental.dynamic.Stream(*, stream=<object object>, device_id=None)#
Wrapper for a CUDA stream object.
This class wraps a CUDA stream object. It can be either a stream created by DALI or a compatible object created by a third-party library.
- property device_id#
The CUDA device ordinal associated with this stream.
- property handle#
A raw CUDA stream handle, returned as an integer.
- synchronize()#
Wait until all work scheduled on this stream is complete.
stream#
- nvidia.dali.experimental.dynamic.stream(*, stream=<object object>, device_id=None)#
Wraps an existing object or creates a new stream.
This function wraps a compatible stream object with a DALI Stream class or creates a new stream.
- Keyword Arguments:
stream¶ (a compatible stream object, None or CreateNew sentinel value, optional) –
When this parameter contains a compatible stream object, the function returns a Stream object wrapping it. If the value is not set (it defaults to the Stream.create_new sentinel), a new stream on the specified device is returned. If None is passed, the function returns None. Compatible objects are:
objects exposing the __cuda_stream__ interface
PyTorch streams
raw stream handles
If stream is not a raw handle but rather a stream object created by a third-party library, it is referenced by the wrapper object returned by this function, thereby prolonging its lifetime.
device_id¶ (int or None, optional) – If not None, the function will create a new stream on the specified device or, if stream contains a stream object, verify that the stream resides on the device specified in device_id. When stream is None, this value is ignored.
get_default_stream#
- nvidia.dali.experimental.dynamic.get_default_stream(device_id=None)#
Gets the default stream.
This stream is used when not overridden by the thread’s current stream (see set_current_stream()).
set_default_stream#
- nvidia.dali.experimental.dynamic.set_default_stream(cuda_stream, /, device_id=None)#
Sets the default stream for a CUDA device. If the device id is not specified, the device associated with the stream will be used. If there’s no device associated with the stream, current CUDA device is used.
Passing None clears the default stream.
Warning
This function is intended to be used once, at the beginning of the program, to set the default stream for DALI operations. Calling it affects all default contexts in all threads that haven’t set their current streams with a call to set_current_stream().
get_current_stream#
- nvidia.dali.experimental.dynamic.get_current_stream()#
Gets the stream associated with the calling thread’s default context.
The value returned by this function is equivalent to EvalContext.default().cuda_stream.
set_current_stream#
- nvidia.dali.experimental.dynamic.set_current_stream(cuda_stream, /)#
Sets the stream associated with the calling thread’s default context for the current device.
The stream must match the current CUDA device. See Device.
In addition to changing the stream of the current thread’s default context, this method causes newly created EvalContext objects with the current device to use this stream.
Passing None resets the current thread’s default context stream. After that, the value returned by get_current_stream() will either point to the value returned by get_default_stream() or to a new stream.
Warning
Setting the current stream doesn’t establish any synchronization between the work previously scheduled and new work.
Random state#
RNG#
- class nvidia.dali.experimental.dynamic.random.RNG(seed=None)#
Random number generator for DALI dynamic mode operations.
This RNG can be used to provide reproducible random state to DALI operators.
- Parameters:
seed¶ (int, optional) – Seed for the random number generator. If not provided, a random seed is used.
Examples
>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG with a specific seed
>>> my_rng = ndd.random.RNG(seed=1234)
>>>
>>> # Use it with random operators
>>> result = ndd.random.uniform(range=(-1, 1), shape=[10], rng=my_rng, device="cpu")
- clone()#
Create a new RNG with the same seed.
- Returns:
A new RNG instance initialized with the same seed as this one. This allows creating independent RNG streams that produce the same sequence of random numbers.
- Return type:
RNG
Examples
>>> import nvidia.dali.experimental.dynamic as ndd
>>>
>>> # Create an RNG
>>> rng1 = ndd.random.RNG(seed=1234)
>>>
>>> # Clone it to create an independent copy
>>> rng2 = rng1.clone()
>>>
>>> # Both will generate the same sequence
>>> for i in range(10):
...     assert rng1() == rng2()
- property seed#
Get the seed used to initialize this RNG.