nvidia.dali.plugin.numba.fn.experimental.numba_function
- nvidia.dali.plugin.numba.fn.experimental.numba_function(*inputs, **kwargs)
Invokes an njit-compiled Numba function.
The run function should be a Python function that can be compiled in Numba nopython mode. A function taking a single input and producing a single output should have the following definition:
def run_fn(out0, in0)
where out0 and in0 are numpy array views of the output and input tensors, respectively. If the operator is configured to run in batch mode, the first dimension of the arrays is the sample index. Note that the function can take at most 6 inputs and 6 outputs.
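For instance, a minimal per-sample run function for 2D uint8 samples might look like the sketch below (an illustration only; with no setup function, the output shape and type match the input):
def run_fn(out0, in0):
    # Invert every value of a 2D uint8 sample; the output view has the
    # same shape and type as the input.
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[i, j] = 255 - in0[i, j]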
Additionally, an optional setup function can be provided to calculate the shape of the output, so that DALI can allocate memory for it. It should have the following definition:
def setup_fn(outs, ins)
The setup function is invoked once for the whole batch. The first dimension of outs and ins is the output/input index, respectively; the second dimension is the sample index. For example, the shape of the first sample of the second output can be accessed as outs[1][0]. If no setup function is provided, the output shape and data type will be the same as the input.
Note
This operator is experimental and its API might change without notice.
Example 1:
The following example shows a simple setup function which permutes the order of dimensions in the shape.
def setup_change_out_shape(outs, ins):
    out0 = outs[0]
    in0 = ins[0]
    perm = [1, 0, 2]
    for sample_idx in range(len(out0)):
        for d in range(len(perm)):
            out0[sample_idx][d] = in0[sample_idx][perm[d]]
Since the setup function runs for the whole batch, we need to iterate and permute each sample's shape individually. For shapes = [(10, 20, 30), (20, 10, 30)] it will produce an output with shapes = [(20, 10, 30), (10, 20, 30)].
Let's also provide a run function:
def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]
The run function can work per-sample or per-batch, depending on the batch_processing argument.
A run function working per-batch may look like this:
def run_fn(out0_samples, in0_samples):
    for out0, in0 in zip(out0_samples, in0_samples):
        for i in range(in0.shape[0]):
            for j in range(in0.shape[1]):
                out0[j, i] = in0[i, j]
A run function working per-sample may look like this:
def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]
This operator allows sequence inputs and supports volumetric data.
This operator will not be optimized out of the graph.
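A minimal sketch of how the functions above can be passed to the operator in a pipeline is shown below; the external_source callback, batch size, and sample shapes are illustrative assumptions, while run_fn and setup_change_out_shape are the functions from the example above.
import numpy as np
from nvidia.dali import pipeline_def, types
import nvidia.dali.fn as fn
from nvidia.dali.plugin.numba.fn.experimental import numba_function

def get_sample(sample_info):
    # Illustrative source: one HWC uint8 sample per call.
    return np.full((10, 20, 30), sample_info.idx_in_epoch, dtype=np.uint8)

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def numba_pipe():
    data = fn.external_source(source=get_sample, batch=False, dtype=types.UINT8)
    return numba_function(
        data,
        run_fn=run_fn,                    # per-sample run function from the example
        setup_fn=setup_change_out_shape,  # permutes the output shape
        in_types=[types.UINT8], ins_ndim=[3],
        out_types=[types.UINT8], outs_ndim=[3],
        batch_processing=False)

pipe = numba_pipe()
pipe.build()
out, = pipe.run()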
- Supported backends
‘cpu’
‘gpu’
- Parameters:
input0 (TensorList) – Input to the operator.
input[1..5] (TensorList, optional) – This function accepts up to 5 optional positional inputs.
- Keyword Arguments:
in_types (DALIDataType or list of DALIDataType) – Types of inputs.
ins_ndim (int or list of int) – Number of dimensions that the input shapes should have.
out_types (DALIDataType or list of DALIDataType) – Types of outputs.
outs_ndim (int or list of int) – Number of dimensions that the output shapes should have.
run_fn (object) – Function to be invoked. This function must work in Numba nopython mode.
batch_processing (bool, optional, default = False) – Determines whether the function is invoked once per batch or separately for each sample in the batch. When batch_processing is set to True, the function processes the whole batch. This is necessary if the function has to perform cross-sample operations, and it may be beneficial if a significant part of the work can be reused. For other use cases, specifying False and using a per-sample processing function allows the operator to process samples in parallel.
blocks (int or list of int, optional) – 3-item list specifying the number of blocks per grid used to execute a CUDA kernel (see the sketch after this argument list).
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
setup_fn (object, optional) – Setup function setting shapes for outputs. This function is invoked once per batch and must also work in Numba nopython mode.
threads_per_block (int or list of int, optional) – 3-item list specifying the number of threads per block used to execute a CUDA kernel.
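The blocks and threads_per_block arguments only matter when the operator is placed on the GPU backend. A hedged sketch of passing a launch configuration follows; the grid/block sizes, the gpu_run_fn and gpu_setup_fn names, and the data node are placeholders for illustration, not part of this API reference.
out = numba_function(
    data.gpu(),                        # move the input to the GPU backend
    run_fn=gpu_run_fn,                 # hypothetical GPU-compatible run function
    setup_fn=gpu_setup_fn,             # hypothetical setup function
    in_types=[types.UINT8], ins_ndim=[3],
    out_types=[types.UINT8], outs_ndim=[3],
    blocks=[32, 32, 1],                # blocks per grid (3 items)
    threads_per_block=[32, 16, 1],     # threads per block (3 items)
    device='gpu')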