Python API

Pipeline
class nvidia.dali.pipeline.Pipeline(batch_size=-1, num_threads=-1, device_id=-1, seed=-1, exec_pipelined=True, prefetch_queue_depth=2, exec_async=True, bytes_per_sample=0, set_affinity=False, max_streams=-1, default_cuda_stream_priority=0)

The Pipeline class encapsulates all data required to define and run a DALI input pipeline.

Parameters:
- batch_size (int, optional, default = -1) – Batch size of the pipeline. Negative values are invalid - the default may only be used with a serialized pipeline (the value stored in the serialized pipeline is used instead).
- num_threads (int, optional, default = -1) – Number of CPU threads used by the pipeline. Negative values are invalid - the default may only be used with a serialized pipeline (the value stored in the serialized pipeline is used instead).
- device_id (int, optional, default = -1) – Id of the GPU used by the pipeline. Negative values are invalid - the default may only be used with a serialized pipeline (the value stored in the serialized pipeline is used instead).
- seed (int, optional, default = -1) – Seed used for random number generation. Leaving the default value results in a random seed.
- exec_pipelined (bool, optional, default = True) – Whether to execute the pipeline in a way that enables overlapping of CPU and GPU computation, typically resulting in faster execution at the cost of larger memory consumption.
- prefetch_queue_depth (int or {"cpu_size": int, "gpu_size": int}, optional, default = 2) – Depth of the executor pipeline. A deeper pipeline makes DALI more resistant to uneven execution times across batches, but it also consumes more memory for internal buffers. Specifying a dict { "cpu_size": x, "gpu_size": y } instead of an integer makes the pipeline use the separated-queues executor, with a buffer queue of size x for the CPU stage and y for the mixed and GPU stages. This is not supported when both exec_async and exec_pipelined are set to False. The executor buffers the CPU and GPU stages separately and fills the buffer queues when the first nvidia.dali.pipeline.Pipeline.run() is issued.
- exec_async (bool, optional, default = True) – Whether to execute the pipeline asynchronously. This makes the nvidia.dali.pipeline.Pipeline.run() method run asynchronously with respect to the calling Python thread. To synchronize with the pipeline, call the nvidia.dali.pipeline.Pipeline.outputs() method.
- bytes_per_sample (int, optional, default = 0) – A hint for DALI about how much memory to use for its tensors.
- set_affinity (bool, optional, default = False) – Whether to set the CPU core affinity to the one closest to the GPU being used.
- max_streams (int, optional, default = -1) – Limit on the number of CUDA streams used by the executor. A value of -1 imposes no limit. This parameter is currently unused (an unrestricted number of streams is assumed).
- default_cuda_stream_priority (int, optional, default = 0) – CUDA stream priority used by DALI. See cudaStreamCreateWithPriority in the CUDA documentation.
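A minimal usage sketch: subclass Pipeline, instantiate operators in the constructor, and override define_graph() (documented below) to connect them. The file_root path and the FileReader/ImageDecoder operators are assumptions about the data layout and the installed DALI version.

    import nvidia.dali.ops as ops
    import nvidia.dali.types as types
    from nvidia.dali.pipeline import Pipeline

    class SimplePipeline(Pipeline):
        def __init__(self, batch_size, num_threads, device_id):
            super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
            # Hypothetical data source: a directory tree of JPEG files.
            self.input = ops.FileReader(file_root="/path/to/images", random_shuffle=True)
            # "mixed" decodes on the CPU and places the result in GPU memory.
            self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)

        def define_graph(self):
            jpegs, labels = self.input()
            images = self.decode(jpegs)
            return images, labels

    pipe = SimplePipeline(batch_size=8, num_threads=2, device_id=0)
    pipe.build()                    # required before running standalone
    images, labels = pipe.run()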
property batch_size
Batch size.
build()
Build the pipeline.
A pipeline needs to be built before it can be run standalone. Framework-specific plugins handle this step automatically.
define_graph()
This function is defined by the user to construct the graph of operations for their pipeline. It returns a list of outputs created by calling DALI operators.
deserialize_and_build(serialized_pipeline)
Deserialize and build a pipeline given in serialized form.
Parameters:
- serialized_pipeline (str) – Serialized pipeline.
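A sketch of round-tripping a pipeline through its serialized form (serialize() is documented below); the -1 constructor defaults are valid here precisely because the values stored in the serialized pipeline are used instead:

    serialized = pipe.serialize()      # Protobuf string
    restored = Pipeline()              # defaults resolved from the serialized values
    restored.deserialize_and_build(serialized)
    outputs = restored.run()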
property device_id
Id of the GPU used by the pipeline.
empty()
Checks whether the pipeline is empty, i.e. whether no work is scheduled in the pipeline that has not yet been consumed.
enable_api_check(enable)
Enables or disables the API check at runtime.
epoch_size(name=None)
Epoch size of a pipeline.
If the name parameter is None, returns a dictionary of pairs (reader name, epoch size for that reader). If the name parameter is not None, returns the epoch size for that reader.
Parameters:
- name (str, optional, default = None) – The reader whose epoch size should be obtained.
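For example (the reader instance name "Reader" is hypothetical):

    sizes = pipe.epoch_size()        # dict mapping each reader name to its epoch size
    n = pipe.epoch_size("Reader")    # epoch size of one named reader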
feed_input(ref, data, layout='')
Bind a NumPy array to a tensor produced by an ExternalSource operator. Note that ref should not be overridden with other operator outputs, and this method should be called from inside the iter_setup method (see the sketch after iter_setup below).
iter_setup()
This function can be overridden by a user-defined pipeline to perform any setup needed for each iteration. For example, it can be used to feed input data from NumPy arrays.
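A minimal sketch combining feed_input and iter_setup, assuming the per-iteration data is generated as NumPy arrays (the shapes and the "HWC" layout string are illustrative):

    import numpy as np
    import nvidia.dali.ops as ops
    from nvidia.dali.pipeline import Pipeline

    class ExternalSourcePipeline(Pipeline):
        def __init__(self, batch_size, num_threads, device_id):
            super(ExternalSourcePipeline, self).__init__(batch_size, num_threads, device_id)
            self.source = ops.ExternalSource()

        def define_graph(self):
            # Keep a reference to the ExternalSource output so that iter_setup
            # can feed it; do not overwrite it with other operator outputs.
            self.images = self.source()
            return self.images

        def iter_setup(self):
            batch = [np.random.rand(224, 224, 3).astype(np.float32)
                     for _ in range(self.batch_size)]
            self.feed_input(self.images, batch, layout="HWC")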
property num_threads
Number of CPU threads used by the pipeline.
outputs()
Returns the outputs of the pipeline and releases the previous buffer.
If the pipeline is executed asynchronously, this function blocks until the results become available. It raises StopIteration if the dataset has reached its end - usually when iter_setup cannot produce any more data.
release_outputs()
Release buffers returned by share_outputs calls.
This helps when the result of a share_outputs call has already been consumed (copied), so the buffers can be marked as free before the next call to share_outputs. It gives the user finer control over when to run the pipeline, when to obtain the resulting buffers, and when to return them to the DALI pool once the results have been consumed. Needs to be used together with nvidia.dali.pipeline.Pipeline.schedule_run() and nvidia.dali.pipeline.Pipeline.share_outputs(). Should not be mixed with nvidia.dali.pipeline.Pipeline.run() in the same pipeline.
reset()
Resets the pipeline iterator.
If the pipeline iterator has reached the end, resets its state to the beginning.
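A sketch of a typical epoch loop, assuming run() surfaces the StopIteration described for outputs():

    pipe.build()
    for epoch in range(2):
        try:
            while True:
                images, labels = pipe.run()
                # ... consume the batch ...
        except StopIteration:
            pipe.reset()    # rewind the pipeline iterator for the next epoch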
run()
Run the pipeline and return the result.
If the pipeline was created with the exec_pipelined option set to True, this function will also start prefetching the next iteration for faster execution. Should not be mixed with nvidia.dali.pipeline.Pipeline.schedule_run(), nvidia.dali.pipeline.Pipeline.share_outputs() and nvidia.dali.pipeline.Pipeline.release_outputs() in the same pipeline.
save_graph_to_dot_file(filename, show_tensors=False, show_ids=False, use_colors=False)
Saves the pipeline graph to a file.
Parameters:
- filename (str) – Name of the file to which the graph is written.
- show_tensors (bool) – Show the Tensor nodes in the graph (by default only Operator nodes are shown).
- show_ids (bool) – Add the node id to the graph representation.
- use_colors (bool) – Whether to use colors to distinguish stages.
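For example, to dump the built graph for inspection with Graphviz (the file name is arbitrary):

    pipe.build()
    pipe.save_graph_to_dot_file("pipeline.dot", show_tensors=True, use_colors=True)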
schedule_run()
Run the pipeline without returning the resulting buffers.
If the pipeline was created with the exec_pipelined option set to True, this function will also start prefetching the next iteration for faster execution. It gives the user finer control over when to run the pipeline, when to obtain the resulting buffers, and when to return them to the DALI buffer pool once the results have been consumed. Needs to be used together with nvidia.dali.pipeline.Pipeline.release_outputs() and nvidia.dali.pipeline.Pipeline.share_outputs(); a usage sketch follows the share_outputs entry below. Should not be mixed with nvidia.dali.pipeline.Pipeline.run() in the same pipeline.
serialize()
Serialize the pipeline to a Protobuf string.

share_outputs()
Returns the outputs of the pipeline.
The main difference from nvidia.dali.pipeline.Pipeline.outputs() is that share_outputs does not release the returned buffers; release_outputs needs to be called for that. If the pipeline is executed asynchronously, this function blocks until the results become available. It gives the user finer control over when to run the pipeline, when to obtain the resulting buffers, and when to return them to the DALI pool once the results have been consumed. Needs to be used together with nvidia.dali.pipeline.Pipeline.release_outputs() and nvidia.dali.pipeline.Pipeline.schedule_run(). Should not be mixed with nvidia.dali.pipeline.Pipeline.run() in the same pipeline.
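A sketch of the manual execution cycle these three methods form, continuing the earlier pipeline example:

    pipe.build()
    pipe.schedule_run()                   # start an iteration without blocking
    for _ in range(10):
        outputs = pipe.share_outputs()    # blocks until ready; buffers still owned by DALI
        # ... consume (e.g. copy) the outputs here ...
        pipe.release_outputs()            # return the buffers to the DALI pool
        pipe.schedule_run()               # schedule the next iteration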
Tensor

class nvidia.dali.backend.TensorCPU

copy_to_external(self: nvidia.dali.backend_impl.TensorCPU, ptr: object) → None
Copy to an external pointer in CPU memory.
Parameters:
- ptr (ctypes.c_void_p) – Destination of the copy.

dtype(self: nvidia.dali.backend_impl.TensorCPU) → str
String representing the NumPy type of the Tensor.

layout(self: nvidia.dali.backend_impl.TensorCPU) → nvidia.dali.backend_impl.types.TensorLayout

shape(self: nvidia.dali.backend_impl.TensorCPU) → list
Shape of the tensor.

squeeze(self: nvidia.dali.backend_impl.TensorCPU) → None
Remove single-dimensional entries from the shape of the Tensor.
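A sketch of inspecting a TensorCPU obtained from a pipeline with a single, dense CPU output (as_tensor() is described in the TensorList section below):

    tensor_list, = pipe.run()            # TensorListCPU for a CPU output
    t = tensor_list.as_tensor()          # TensorCPU view of the whole batch
    print(t.shape(), t.dtype())          # e.g. [8, 224, 224, 3] float32
    t.squeeze()                          # drop any single-dimensional axes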
class nvidia.dali.backend.TensorGPU

copy_to_external(self: nvidia.dali.backend_impl.TensorGPU, ptr: object, cuda_stream: object = 0, non_blocking: bool = False) → None
Copy to an external pointer in GPU memory.
Parameters:
- ptr (ctypes.c_void_p) – Destination of the copy.
- cuda_stream (ctypes.c_void_p) – CUDA stream to schedule the copy on (default stream if not provided).
- non_blocking (bool) – Asynchronous copy.

dtype(self: nvidia.dali.backend_impl.TensorGPU) → str
String representing the NumPy type of the Tensor.

layout(self: nvidia.dali.backend_impl.TensorGPU) → nvidia.dali.backend_impl.types.TensorLayout

shape(self: nvidia.dali.backend_impl.TensorGPU) → list
Shape of the tensor.

squeeze(self: nvidia.dali.backend_impl.TensorGPU) → None
Remove single-dimensional entries from the shape of the Tensor.
TensorList

class nvidia.dali.backend.TensorListCPU

as_array(self: nvidia.dali.backend_impl.TensorListCPU) → array
Returns the TensorList as a NumPy array. The TensorList must be dense.

as_reshaped_tensor(self: nvidia.dali.backend_impl.TensorListCPU, arg0: List[int]) → nvidia.dali.backend_impl.TensorCPU
Returns a tensor that is a view of this TensorList cast to the given shape.
This function can only be called if the TensorList is contiguous in memory and the volumes of the requested Tensor and the TensorList match.

as_tensor(self: nvidia.dali.backend_impl.TensorListCPU) → nvidia.dali.backend_impl.TensorCPU
Returns a tensor that is a view of this TensorList.
This function can only be called if is_dense_tensor returns True.

at(self: nvidia.dali.backend_impl.TensorListCPU, arg0: int) → array
Returns the tensor at the given position in the list.

copy_to_external(self: nvidia.dali.backend_impl.TensorListCPU, arg0: object) → None
Copy the contents of this TensorList to an external pointer (of type ctypes.c_void_p) residing in CPU memory.
This function is used internally by plugins to interface with tensors from supported Deep Learning frameworks.

is_dense_tensor(self: nvidia.dali.backend_impl.TensorListCPU) → bool
Checks whether all tensors in this TensorList have the same shape (and so the list itself can be viewed as a tensor).
For example, if the TensorList contains N tensors, each with shape (H, W, C) (with the same values of H, W and C), then the list may be viewed as a tensor of shape (N, H, W, C).

layout(self: nvidia.dali.backend_impl.TensorListCPU) → nvidia.dali.backend_impl.types.TensorLayout
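A sketch of the common access patterns on a TensorListCPU output:

    tl, = pipe.run()                  # TensorListCPU for a CPU output
    sample = tl.at(0)                 # one sample as a NumPy array
    if tl.is_dense_tensor():
        batch = tl.as_array()         # the whole batch as one NumPy array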
class nvidia.dali.backend.TensorListGPU

as_cpu(self: nvidia.dali.backend_impl.TensorListGPU) → nvidia.dali.backend_impl.TensorListCPU
Returns a TensorListCPU object that is a copy of this TensorListGPU.

as_reshaped_tensor(self: nvidia.dali.backend_impl.TensorListGPU, arg0: List[int]) → nvidia.dali.backend_impl.TensorGPU
Returns a tensor that is a view of this TensorList cast to the given shape.
This function can only be called if the TensorList is contiguous in memory and the volumes of the requested Tensor and the TensorList match.

as_tensor(self: nvidia.dali.backend_impl.TensorListGPU) → nvidia.dali.backend_impl.TensorGPU
Returns a tensor that is a view of this TensorList.
This function can only be called if is_dense_tensor returns True.

at(self: nvidia.dali.backend_impl.TensorListGPU, arg0: int) → nvidia.dali.backend_impl.TensorGPU
Returns a tensor at the given position in the list.

copy_to_external(self: nvidia.dali.backend_impl.TensorListGPU, ptr: object, cuda_stream: object = 0, non_blocking: bool = False) → None
Copy the contents of this TensorList to an external pointer residing in GPU memory.
This function is used internally by plugins to interface with tensors from supported Deep Learning frameworks.
Parameters:
- ptr (ctypes.c_void_p) – Destination of the copy.
- cuda_stream (ctypes.c_void_p) – CUDA stream to schedule the copy on (default stream if not provided).
- non_blocking (bool) – Asynchronous copy.

is_dense_tensor(self: nvidia.dali.backend_impl.TensorListGPU) → bool
Checks whether all tensors in this TensorList have the same shape (and so the list itself can be viewed as a tensor).
For example, if the TensorList contains N tensors, each with shape (H, W, C) (with the same values of H, W and C), then the list may be viewed as a tensor of shape (N, H, W, C).

layout(self: nvidia.dali.backend_impl.TensorListGPU) → nvidia.dali.backend_impl.types.TensorLayout
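A sketch of bringing a GPU output back to the host for inspection:

    gpu_tl, = pipe.run()          # TensorListGPU for a GPU output
    cpu_tl = gpu_tl.as_cpu()      # copy to host as a TensorListCPU
    first = cpu_tl.at(0)          # then use the CPU accessors shown above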
Enums

class nvidia.dali.types.DALIDataType
Data type of image

- BOOL = DALIDataType.BOOL
- DATA_TYPE = DALIDataType.DATA_TYPE
- FEATURE = DALIDataType.FEATURE
- FLOAT = DALIDataType.FLOAT
- FLOAT16 = DALIDataType.FLOAT16
- FLOAT64 = DALIDataType.FLOAT64
- IMAGE_TYPE = DALIDataType.IMAGE_TYPE
- INT16 = DALIDataType.INT16
- INT32 = DALIDataType.INT32
- INT64 = DALIDataType.INT64
- INT8 = DALIDataType.INT8
- INTERP_TYPE = DALIDataType.INTERP_TYPE
- NO_TYPE = DALIDataType.NO_TYPE
- PYTHON_OBJECT = DALIDataType.PYTHON_OBJECT
- STRING = DALIDataType.STRING
- TENSOR_LAYOUT = DALIDataType.TENSOR_LAYOUT
- UINT16 = DALIDataType.UINT16
- UINT32 = DALIDataType.UINT32
- UINT64 = DALIDataType.UINT64
- UINT8 = DALIDataType.UINT8
class nvidia.dali.types.DALIInterpType
Interpolation mode

- INTERP_CUBIC = DALIInterpType.INTERP_CUBIC
- INTERP_GAUSSIAN = DALIInterpType.INTERP_GAUSSIAN
- INTERP_LANCZOS3 = DALIInterpType.INTERP_LANCZOS3
- INTERP_LINEAR = DALIInterpType.INTERP_LINEAR
- INTERP_NN = DALIInterpType.INTERP_NN
- INTERP_TRIANGULAR = DALIInterpType.INTERP_TRIANGULAR
class nvidia.dali.types.DALIImageType
Image type

- ANY_DATA = DALIImageType.ANY_DATA
- BGR = DALIImageType.BGR
- GRAY = DALIImageType.GRAY
- RGB = DALIImageType.RGB
- YCbCr = DALIImageType.YCbCr
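A sketch of how these enum values are typically passed to operators as keyword arguments; the exact operators (ImageDecoder, Resize, Cast) and their parameter names are assumptions about the installed DALI version:

    import nvidia.dali.ops as ops
    import nvidia.dali.types as types

    decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)    # DALIImageType
    resize = ops.Resize(device="gpu", resize_x=224., resize_y=224.,
                        interp_type=types.INTERP_TRIANGULAR)            # DALIInterpType
    cast = ops.Cast(device="gpu", dtype=types.FLOAT)                    # DALIDataType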
class nvidia.dali.types.TensorLayout