NvMedia Tensor

This topic explains how to use the NvMedia Tensor API.
NvMedia Tensors are multi-dimensional data structures that NvMedia creates in SoC DRAM to store multi-dimensional arrays of a specific data type, such as integer or floating-point values.
This topic assumes a basic understanding of NvSciBuf APIs. See the NvSciBuf User Guide for more information.

Types of Tensors

Currently, NvMedia supports only 4-dimensional tensors.
NvMedia Tensors are used with NvMedia DLA components.
NvMedia Tensors are created by allocating an NvSciBuf object through the NvSciBuf API, using attributes derived from the NvMedia Tensor attributes. Because the NvSciBuf API facilitates data sharing between NvMedia and NVIDIA® CUDA®, tensors allocated this way can be reused by both, as permitted by the NvSciBuf API. For more information, see the NvSciBuf API and use cases.
NvMedia Tensors have two types of attributes:
Tensor format attributes describe a tensor’s order and format in memory.
Tensor allocation attributes describe additional properties of a tensor, such as:
Width, height, channels, and number of tensor surfaces.
CPU access mapping (cached/uncached/unmapped).
Shared memory space across virtual machine partitions.

Tensor Format Attributes

The following sections describe the tensor format attributes that NvMedia Tensors may have.
Specifies the data type of the tensor elements. The value may be:
Indicates a tensor of unsigned integer data.
Indicates a tensor of signed integer data.
Indicates a tensor of floating-point data.
Specifies the layout and order of the tensor elements.
4D Tensor layout includes N, C, H, and W dimensions, where N refers to the number of surfaces (or batch size), C refers to the number of channels in the surface (for example, RGB if the surface type is an image), H refers to the height of the surface, and W refers to the width of the surface.
The following are possible values for 4D tensor formats:
Specifies the number of bits per element. The value can be:
Indicates that each element is 64 bits wide.
Indicates that each element is 32 bits wide.
Indicates that each element is 16 bits wide.
Indicates that each element is 8 bits wide.
Specifies the memory in which the tensor is allocated. The value may be:
Indicates tensor allocation on CVRAM.
Indicates tensor allocation on SoC DRAM.
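The N, C, H, W ordering described in this section can be made concrete with a small index calculation. The following is a minimal sketch, assuming a densely packed buffer with no padding or stride; real NvMedia allocations may add both. `TensorDims4D`, `nchw_index()`, and `ncxhwx_index()` are hypothetical helpers written for illustration and are not part of the NvMedia API. The NCxHWx formula follows the conventional definition of that layout (channels grouped in blocks of x, with the x channels of one pixel stored contiguously), which is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not an NvMedia type):
 * N surfaces, C channels, H rows, W columns. */
typedef struct {
    size_t n, c, h, w;
} TensorDims4D;

/* Element index of (n, c, h, w) in a densely packed NCHW tensor:
 * W varies fastest, then H, then C, then N. */
size_t nchw_index(TensorDims4D d, size_t n, size_t c, size_t h, size_t w)
{
    assert(n < d.n && c < d.c && h < d.h && w < d.w);
    return ((n * d.c + c) * d.h + h) * d.w + w;
}

/* Element index of (n, c, h, w) in a densely packed NCxHWx tensor.
 * Channels are split into ceil(C / x) groups of x channels each;
 * within a group, the x channels of one pixel are adjacent. */
size_t ncxhwx_index(TensorDims4D d, size_t x,
                    size_t n, size_t c, size_t h, size_t w)
{
    assert(x > 0);
    assert(n < d.n && c < d.c && h < d.h && w < d.w);
    size_t groups = (d.c + x - 1) / x; /* channel groups, rounded up */
    return (((n * groups + c / x) * d.h + h) * d.w + w) * x + (c % x);
}
```

For a 2x3x4x5 tensor, channel 1 of the first pixel sits at index 20 in NCHW (one full H x W plane after channel 0), but at index 1 in NCxHWx with x = 2, because the two channels of a group are interleaved per pixel.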

Tensor Allocation Attributes

The following sections describe the tensor allocation attributes that NvMedia Tensors may have.
Specifies the number of tensor surfaces in a tensor. It is required to determine the size of memory to be allocated.
Specifies the number of tensor channels. It is required to determine the size of memory to be allocated.
Specifies the width and height of the tensor. These values are required to determine the size of memory to be allocated.
Specifies the interleaving factor of tensor surfaces (only in NCxHWx tensor ordering). It is required to determine the size of memory to be allocated.
Specifies the coherency policy to use for accesses of the tensor from the CPU. The value may be:
Uncached: specifies that accesses from the CPU never cache data.
Setting this attribute results in the following behavior: when the CPU writes to the tensor buffers using NvMediaTensorLock() and NvMediaTensorUnlock(), NvMedia issues the appropriate memory barriers before handing the tensor buffer over to the hardware engines, to ensure coherency.
Cached: specifies that accesses from the CPU can pass through caches and store buffers.
Setting this attribute results in the following behavior:
While reading the tensor from the CPU using NvMediaTensorLock() and NvMediaTensorUnlock(), caches are invalidated as necessary to ensure that the CPU gets the latest data written by the hardware engines.
While writing the tensor from the CPU using NvMediaTensorLock() and NvMediaTensorUnlock(), caches are flushed as necessary before handing over the tensor buffers to hardware engines to ensure coherency.
In both cases, the tensor memory is mapped and it can be accessed with a mapping into the current process’s virtual address space.
Unmapped: specifies a coherency policy that is the same as for NVM_TENSOR_ATTR_CPU_ACCESS_UNCACHED, except that the tensor is not mapped into the current process’s virtual address space.
If the attribute is not specified, the coherency policy defaults to NVM_TENSOR_ATTR_CPU_ACCESS_UNCACHED.
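The size-related attributes in this section (number of surfaces, channels, width, height, and interleaving factor), together with the bits per element, determine how much memory a tensor needs. The following is a minimal sketch of that arithmetic, assuming a densely packed tensor and assuming that NCxHWx pads the channel count up to a multiple of the interleaving factor. `tensor_min_size_bytes()` is a hypothetical helper, not an NvMedia function, and the size that NvSciBuf actually allocates may be larger because of engine alignment constraints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower bound on the bytes needed for a packed tensor.
 * Pass interleave = x for NCxHWx, or 1 for layouts without interleaving.
 * Overflow checking is omitted to keep the sketch short. */
size_t tensor_min_size_bytes(size_t n, size_t c, size_t h, size_t w,
                             unsigned bits_per_element, size_t interleave)
{
    assert(bits_per_element == 8 || bits_per_element == 16 ||
           bits_per_element == 32 || bits_per_element == 64);
    assert(interleave > 0);

    /* Round the channel count up to a whole number of interleave groups. */
    size_t padded_c = ((c + interleave - 1) / interleave) * interleave;

    return n * padded_c * h * w * (bits_per_element / 8u);
}
```

Under these assumptions, a 1x3x224x224 tensor of 8-bit elements needs at least 150528 bytes without interleaving, while the same dimensions with 16-bit elements and an interleaving factor of 32 need at least 3211264 bytes, because the 3 channels are padded to 32.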

Tensor API Functions

This section describes the NvMedia Tensor API functions that create tensor handles from NvSciBuf objects, and that destroy and manage tensors.

NvMedia Tensor Creation and Destroy Functions

These API functions allow the creation and destruction of tensors.
Creates an NvMedia Tensor handle from an NvSciBuf object created with the NvSciBuf API, after the required NvSciBuf attribute list is prepared.
Every hardware engine in an NVIDIA SoC can have different alignment or stride constraints. Hence, sharing a buffer across several engines requires that the buffer allocation satisfy the constraints of all of the engines that share the buffer. An engine whose constraints are not satisfied may fail to operate on the buffer. The allocation functions provided by the various NvMedia drivers satisfy only the constraints of the engines that are visible to them, and so cannot be used to allocate shared buffers.
NvSciBuf is a buffer allocation module that satisfies a common set of constraints that are compatible with all of the hardware engines. It thus can allocate buffers that are shareable across the hardware engines visible to various drivers.
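To illustrate why a buffer must satisfy the constraints of every engine that shares it, the following sketch combines per-engine alignment requirements into one common requirement. It is a conceptual model only and does not describe how NvSciBuf reconciles attributes internally. `EngineConstraints`, `align_up()`, and `reconcile_constraints()` are hypothetical names, and the sketch assumes all alignments are powers of two, so the largest one satisfies all the others.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-engine constraints; both values are powers of two. */
typedef struct {
    size_t pitch_align; /* row pitch must be a multiple of this */
    size_t base_align;  /* buffer start address must be a multiple of this */
} EngineConstraints;

static int is_pow2(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Round value up to the next multiple of align. */
size_t align_up(size_t value, size_t align)
{
    assert(align > 0);
    return (value + align - 1) / align * align;
}

/* Combine constraints so that the result satisfies every engine.
 * For power-of-two alignments, the maximum is a multiple of all others. */
EngineConstraints reconcile_constraints(const EngineConstraints *list,
                                        size_t count)
{
    EngineConstraints out = {1, 1};
    for (size_t i = 0; i < count; i++) {
        assert(is_pow2(list[i].pitch_align) && is_pow2(list[i].base_align));
        if (list[i].pitch_align > out.pitch_align)
            out.pitch_align = list[i].pitch_align;
        if (list[i].base_align > out.base_align)
            out.base_align = list[i].base_align;
    }
    return out;
}
```

With one engine that needs a 64-byte pitch alignment and another that needs 256 bytes, a 1100-byte row aligned only for the first engine gets a pitch of 1152 bytes, which the second engine cannot use. Aligning to the reconciled value gives 1280 bytes, which both accept.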
This is a typical flow to allocate an NvSciBufObj, which can be mapped to an NvMediaTensor:
1. The application creates an NvSciBufAttrList.
2. The application queries NvMedia to fill the NvSciBufAttrList by passing a set of NvMediaTensor allocation attributes and an NvMediaType as input to NvMediaTensorFillNvSciBufAttrs().
3. The application may set any of the public NvSciBufAttribute values that NvMedia does not set.
For more details on NvSciBuf concepts, terminology, and the API, see the NvSciBuf User Guide.
The following NvSciBuf input attributes are set by NvMedia, and must not be set by the application:
The following attributes are not set by NvMedia and must be set by the application:
4. If the same NvSciBufObj object has to be shared with other user mode drivers (UMDs), the application can get the corresponding NvSciBufAttrList from the respective UMDs.
5. The application asks NvSciBuf to reconcile all of the filled NvSciBufAttrList objects, then allocates an NvSciBuf object.
6. The application asks NvMedia to create an NvMediaTensor from the allocated NvSciBuf object by calling NvMediaTensorCreateFromNvSciBuf().
7. The NvMediaTensor can be passed as input and output to any of the NvMedia API functions that accept an NvMediaTensor as a parameter.
Example: NvMedia Tensor Allocation with NvSciBuf
This is an example of how to allocate an NvMedia Tensor with NvSciBuf:
NvMediaDevice *device;
NvMediaStatus status;
NvSciError err;
NvSciBufModule module;
NvSciBufAttrList attrlist;
NvSciBufAttrList conflictlist;
NvSciBufObj bufObj;
NvMediaTensor *tensor;
/* NvMedia related initialization. */
device = NvMediaDeviceCreate();
status = NvMediaTensorNvSciBufInit();
/* NvSciBuf related initialization. */
err = NvSciBufModuleOpen(&module);
NvSciBufAttrKeyValuePair attr_kvp = {NvSciBufGeneralAttrKey_RequiredPerm, &access_perm,
/* Create NvSciBuf attribute list. */
err = NvSciBufAttrListCreate(module, &attrlist);
err = NvSciBufAttrListSetAttrs(attrlist, &attr_kvp, 1);
/* Initialize tensorAttrs as required. */
NVM_TENSOR_SET_ATTR_4D(tensorAttr, n, c, h, w, NCHW, INT, 8, UNCACHED, NONE, x);
/* Ask NvMedia to fill NvSciBufAttrs corresponding to
tensorAttrs. */
status = NvMediaTensorFillNvSciBufAttrs(device,
/* Reconcile the NvSciBufAttrs and then allocate an NvSciBufObj. */
err = NvSciBufAttrListReconcileAndObjAlloc(&attrlist, 1, &bufObj, &conflictlist);
/* Create NvMediaTensor from NvSciBufObj. */
status = NvMediaTensorCreateFromNvSciBuf(device, bufObj, &tensor);
/* Free the NvSciBufAttrList which is no longer required. */
err = NvSciBufAttrListFree(attrlist);
/* Use the tensor as input or output as supported. */
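/* Free the resources after use. */
/* Destroy NvMediaTensor. */
NvMediaTensorDestroy(tensor);
/* NvMedia related deinit. */
NvMediaTensorNvSciBufDeinit();
NvMediaDeviceDestroy(device);
/* NvSciBuf related deinit. */
NvSciBufObjFree(bufObj);
NvSciBufModuleClose(module);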
Example: Reconcile between NvMediaTensor and NvMediaImage Attributes (Optional)
This is an example of how to reconcile between NvMediaTensor and NvMediaImage attributes.
1. Create NvMediaImage attributes “unreconciled_attrlistImage”. See Surface Allocation Functions for more details.
2. Reconcile NvMediaImage and NvMediaTensor attributes.
attr[0] = unreconciled_attrlistImage;
attr[1] = unreconciled_attrlistTensor;
/* Pass both unreconciled lists; the allocated object is returned in bufObj. */
err = NvSciBufAttrListReconcileAndObjAlloc(attr, 2, &bufObj, &conflictlist);
Destroys a previously allocated NvMedia Tensor object.
if (tensor) {