nvidia.dali.fn.dl_tensor_python_function

nvidia.dali.fn.dl_tensor_python_function(*input, function, batch_processing=False, bytes_per_sample_hint=[0], num_outputs=1, output_layouts=None, preserve=False, seed=-1, synchronize_stream=True, device=None, name=None)

Executes a Python function that operates on DLPack tensors.

The function should not modify input tensors.
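
The function receives each input as a DLPack capsule and is expected to hand its outputs back as DLPack capsules. The following is a minimal CPU-only sketch, assuming PyTorch's torch.utils.dlpack as the DLPack converter and fn.external_source with random data as the input; any framework that can consume and produce DLPack tensors can be used instead, and the sketch disables asynchronous and pipelined execution, as the Python-function family of operators requires.

    import numpy as np
    import torch
    from torch.utils import dlpack as torch_dlpack

    from nvidia.dali import fn, pipeline_def, types


    def scale_and_shift(dlpack_capsule):
        # Wrap the incoming DLPack tensor without copying it...
        sample = torch_dlpack.from_dlpack(dlpack_capsule)
        # ...compute the result into a new tensor (the input must not be modified)...
        result = sample * 2.0 + 1.0
        # ...and return it to DALI as a DLPack capsule.
        return torch_dlpack.to_dlpack(result)


    @pipeline_def(batch_size=4, num_threads=2, device_id=None,
                  exec_async=False, exec_pipelined=False)
    def example_pipeline():
        data = fn.external_source(
            source=lambda: [np.random.rand(8, 8).astype(np.float32) for _ in range(4)],
            dtype=types.FLOAT)
        return fn.dl_tensor_python_function(data, function=scale_and_shift)


    pipe = example_pipeline()
    pipe.build()
    (out,) = pipe.run()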

For the GPU operator, it is the user’s responsibility to synchronize the device code with DALI. There are two ways to do this: either let DALI synchronize its work before the operator call by leaving the synchronize_stream flag enabled (the default) and ensure that all device tasks scheduled by the function have finished before it returns, or execute the GPU code on the CUDA stream used by DALI, which can be obtained by calling the current_dali_stream() function. In the latter case, the synchronize_stream flag can be set to False.
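
As a sketch of the first approach (synchronize_stream left at its default of True), the function below runs its device work on the framework’s own stream and explicitly waits for it to finish before returning. PyTorch is used here only as a convenient DLPack-capable GPU framework, and the operation itself is arbitrary.

    import torch
    from torch.utils import dlpack as torch_dlpack


    def gpu_invert(dlpack_capsule):
        image = torch_dlpack.from_dlpack(dlpack_capsule)  # zero-copy view of the GPU tensor
        result = 255 - image                              # device work on PyTorch's current stream; assumes uint8 input
        torch.cuda.synchronize()                          # ensure the scheduled device tasks are finished
        return torch_dlpack.to_dlpack(result)


    # inside a pipeline definition:
    # inverted = fn.dl_tensor_python_function(images.gpu(), function=gpu_invert, device='gpu')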

Warning

This operator is not compatible with TensorFlow integration.

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Parameters:

__input_[0..255] (TensorList, optional) – This function accepts up to 256 optional positional inputs

Keyword Arguments:
  • function (object) –

    A callable object that defines the function of the operator.

    Warning

    The function must not hold a reference to the pipeline in which it is used. If it does, a circular reference to the pipeline will form and the pipeline will never be freed.

  • batch_processing (bool, optional, default = False) –

    Determines whether the function is invoked once per batch or separately for every sample in the batch.

    If set to True, the function will receive its arguments as lists of DLPack tensors, with one tensor per sample; see the sketch after this parameter list.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_outputs (int, optional, default = 1) – Number of outputs.

  • output_layouts (layout str or list of layout str, optional) –

    Tensor data layouts for the outputs.

    This argument can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have their layouts set; the remaining outputs have no layout assigned.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • synchronize_stream (bool, optional, default = True) –

    Ensures that DALI synchronizes its CUDA stream before calling the Python function.

    Warning

    This argument should be set to False only if the called function schedules device work to the stream that is used by DALI.
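
Below is a minimal sketch of the batch-processing mode referenced in the batch_processing description, combined with num_outputs=2. It again assumes PyTorch’s torch.utils.dlpack utilities; the function and variable names are illustrative. In this sketch, the function receives one list of DLPack tensors per input and returns one list of DLPack tensors per output.

    import torch
    from torch.utils import dlpack as torch_dlpack


    def halve_and_mean(batch):
        # `batch` is a list of DLPack tensors, one capsule per sample in the batch.
        samples = [torch_dlpack.from_dlpack(capsule) for capsule in batch]
        halves = [torch_dlpack.to_dlpack(s * 0.5) for s in samples]
        means = [torch_dlpack.to_dlpack(s.float().mean().reshape(1)) for s in samples]
        # One list of DLPack tensors is returned for each of the two outputs.
        return halves, means


    # inside a pipeline definition:
    # halves, means = fn.dl_tensor_python_function(
    #     data, function=halve_and_mean, batch_processing=True, num_outputs=2)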