NvMedia Tensor

NVIDIA DRIVE OS 5.1 Linux SDK

Developer Guide

5.1.9.0 Release

This topic explains how to use the NvMedia Tensor API.

NvMedia Tensors are data structures that NvMedia creates in SoC DRAM to store multi-dimensional arrays of a specific data type, such as integers or floats.

This topic assumes a basic understanding of NvSciBuf APIs. See the NvSciBuf User Guide for more information.

Types of Tensors

Currently, NvMedia only supports 4-dimensional tensors.

NvMedia Tensors are used with NvMedia DLA components.

An NvMedia Tensor is created by allocating an NvSciBuf object through the NvSciBuf API, using attributes derived from the NvMedia Tensor attributes. Because the NvSciBuf API facilitates data sharing between NvMedia and NVIDIA® CUDA®, tensors allocated this way can be reused as permitted by the NvSciBuf API. For more information, see the NvSciBuf API and use cases.

NvMedia Tensors have two types of attributes:

• Tensor format attributes describe a tensor’s order and format in memory.

• Tensor allocation attributes describe additional properties of a tensor, such as:

• Width, height, channels, and number of tensor surfaces.

• CPU access mapping (cached/uncached/unmapped).

• Shared memory space across virtual machine partitions.

Tensor Format Attributes

The following sections describe the tensor format attributes that NvMedia Tensors may have.

NVM_TENSOR_ATTR_DATA_TYPE Attribute

This attribute specifies the data type of the tensor elements. The value may be:

• NVM_TENSOR_ATTR_DATA_TYPE_UINT

Indicates a tensor of unsigned integer elements.

• NVM_TENSOR_ATTR_DATA_TYPE_INT

Indicates a tensor of signed integer elements.

• NVM_TENSOR_ATTR_DATA_TYPE_FLOAT

Indicates a tensor of floating-point elements.

NVM_TENSOR_ATTR_DIMENSION_ORDER Attribute

Specifies the layout and order of the tensor elements.

4D Tensor layout includes N, C, H, and W dimensions, where N refers to the number of surfaces (or batch size), C refers to the number of channels in the surface (for example, RGB if the surface type is an image), H refers to the height of the surface, and W refers to the width of the surface.

The following are possible values for 4D tensor formats:

• NVM_TENSOR_ATTR_DIMENSION_ORDER_NCHW

• NVM_TENSOR_ATTR_DIMENSION_ORDER_NHWC

• NVM_TENSOR_ATTR_DIMENSION_ORDER_NCxHWx
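To make the dimension orders concrete, the following sketch computes the linear index of element (n, c, h, w) in a tightly packed buffer for each order. It is plain C and does not use the NvMedia API. The Dims4D type and the function names are hypothetical, the NCxHWx formula assumes that channels are split into groups of x interleaved channels, and real tensors may carry strides or padding that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative 4D dimensions; this is not an NvMedia type. */
typedef struct {
    size_t n, c, h, w;
} Dims4D;

/* NCHW: the W index varies fastest, then H, then C, then N. */
size_t offset_nchw(Dims4D d, size_t n, size_t c, size_t h, size_t w)
{
    assert(n < d.n && c < d.c && h < d.h && w < d.w);
    return ((n * d.c + c) * d.h + h) * d.w + w;
}

/* NHWC: the C index varies fastest, then W, then H, then N. */
size_t offset_nhwc(Dims4D d, size_t n, size_t c, size_t h, size_t w)
{
    assert(n < d.n && c < d.c && h < d.h && w < d.w);
    return ((n * d.h + h) * d.w + w) * d.c + c;
}

/* NCxHWx (assumed interpretation): channels are divided into groups of x.
 * Within a group, the x channels of one pixel are stored contiguously. */
size_t offset_ncxhwx(Dims4D d, size_t x,
                     size_t n, size_t c, size_t h, size_t w)
{
    assert(x > 0);
    assert(n < d.n && c < d.c && h < d.h && w < d.w);
    size_t groups = (d.c + x - 1) / x; /* channel groups, rounded up */
    return ((((n * groups + c / x) * d.h + h) * d.w + w) * x) + c % x;
}
```

The same logical element therefore lands at different positions depending on the order, which is why the dimension order must match what the consuming engine expects.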

NVM_TENSOR_ATTR_BITS_PER_ELEMENT Attribute

Specifies the number of bits per element. The value can be:

• NVM_TENSOR_ATTR_BITS_PER_ELEMENT_64

Indicates that each element is 64 bits wide.

• NVM_TENSOR_ATTR_BITS_PER_ELEMENT_32

Indicates that each element is 32 bits wide.

• NVM_TENSOR_ATTR_BITS_PER_ELEMENT_16

Indicates that each element is 16 bits wide.

• NVM_TENSOR_ATTR_BITS_PER_ELEMENT_8

Indicates that each element is 8 bits wide.

NVM_TENSOR_ATTR_ALLOC_TYPE Attribute

Specifies the type of memory in which the tensor is allocated. The value may be:

• NVM_TENSOR_ATTR_ALLOC_RESERVED

Indicates tensor allocation on CVRAM.

• NVM_TENSOR_ATTR_ALLOC_NONE

Indicates tensor allocation on SoC DRAM.

Tensor Allocation Attributes

The following sections describe the tensor allocation attributes that NvMedia Tensors may have.

NVM_TENSOR_ATTR_4D_N

Specifies the number of tensor surfaces in a tensor. It is required to determine the size of memory to be allocated.

NVM_TENSOR_ATTR_4D_C

Specifies the number of tensor channels. It is required to determine the size of memory to be allocated.

NVM_TENSOR_ATTR_4D_H and NVM_TENSOR_ATTR_4D_W Attributes

Specify the height and width of the tensor. They are required to determine the size of memory to be allocated.

NVM_TENSOR_ATTR_4D_X

Specifies the interleaving factor of tensor surfaces (only in NCxHWx tensor ordering). It is required to determine the size of memory to be allocated.
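As a rough illustration of how the N, C, H, W, and X values feed into the memory size, the sketch below computes the minimum number of bytes for a tightly packed tensor. The function name packed_tensor_bytes is hypothetical and the code does not call NvMedia. It assumes that, for NCxHWx ordering, the channel count is rounded up to a multiple of x. The actual allocation size is decided by NvSciBuf after reconciliation and may be larger because of alignment and stride constraints.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum bytes for a tightly packed 4D tensor (illustrative only).
 * Pass x = 0 for orders without interleaving (NCHW, NHWC). */
size_t packed_tensor_bytes(size_t n, size_t c, size_t h, size_t w,
                           size_t x, unsigned bits_per_element)
{
    /* The documented element widths are 8, 16, 32, and 64 bits. */
    assert(bits_per_element > 0 && bits_per_element % 8 == 0);

    size_t padded_c = c;
    if (x > 1) {
        /* Assumption: channels are padded up to a whole number of groups. */
        padded_c = ((c + x - 1) / x) * x;
    }
    return n * padded_c * h * w * (bits_per_element / 8);
}
```

For instance, a single 3-channel 224 x 224 tensor of 8-bit elements needs at least 1 * 3 * 224 * 224 = 150528 bytes under these assumptions.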

NVM_TENSOR_ATTR_CPU_ACCESS Attribute

Specifies the coherency policy to use for accesses of the tensor from the CPU. The value may be:

• NVM_TENSOR_ATTR_CPU_ACCESS_UNCACHED

Specifies that accesses from the CPU never cache data.

Setting this attribute results in the following behavior: While writing to the tensor buffers from the CPU using NvMediaTensorLock() and NvMediaTensorUnlock(), NvMedia uses appropriate memory barriers before handing over the tensor buffer to hardware engines to ensure coherency.

• NVM_TENSOR_ATTR_CPU_ACCESS_CACHED

Specifies that accesses from the CPU can pass through caches and store buffers.

Setting this attribute results in the following behavior:

• While reading the tensor from the CPU using NvMediaTensorLock() and NvMediaTensorUnlock(), caches are invalidated as necessary to ensure that the CPU gets the latest data written by the hardware engines.

• While writing the tensor from the CPU using NvMediaTensorLock() and NvMediaTensorUnlock(), caches are flushed as necessary before handing over the tensor buffers to hardware engines to ensure coherency.

In both cases, the tensor memory is mapped and it can be accessed with a mapping into the current process’s virtual address space.

• NVM_TENSOR_ATTR_CPU_ACCESS_UNMAPPED

Specifies a coherency policy that is the same as for NVM_TENSOR_ATTR_CPU_ACCESS_UNCACHED. However, the tensor is not mapped into the current process’s virtual address space.

If the attribute is not specified, the coherency policy defaults to NVM_TENSOR_ATTR_CPU_ACCESS_UNCACHED.

Tensor API Functions

This section describes the NvMedia Tensor API functions that create tensor handles from NvSciBuf objects, destroy tensors, and manage tensors.

NvMedia Tensor Creation and Destroy Functions

These API functions allow the creation and destruction of tensors.

NvMediaTensorCreateFromNvSciBuf()

Creates an NvMedia Tensor handle from an NvSciBuf created with the NvSciBuf API, after the required NvSciBuf attributes list is prepared.

Every hardware engine in an NVIDIA SoC can have different alignment or stride constraints. Hence, sharing a buffer across various engines requires that the buffer allocation satisfy the constraints of all of the engines that share the buffer. An engine whose constraints are not satisfied may fail to operate on the buffer. The allocation functions provided by the various NvMedia drivers only satisfy the constraints of the engines that are visible to them, and so cannot be used to allocate shared buffers.

NvSciBuf is a buffer allocation module that satisfies a common set of constraints that are compatible with all of the hardware engines. It thus can allocate buffers that are shareable across the hardware engines visible to various drivers.

This is a typical flow to allocate an NvSciBufObj, which can be mapped to an NvMediaTensor:

1. The application creates an NvSciBufAttrList.

2. The application queries NvMedia to fill the NvSciBufAttrList by passing a set of NvMediaTensor allocation attributes and an NvMediaType as input to NvMediaTensorFillNvSciBufAttrs().

3. The application may set any of the public NvSciBufAttribute values that NvMedia does not set.

For more details on NvSciBuf concepts, terminology, and the API, see the NvSciBuf User Guide.

The following NvSciBuf input attributes are set by NvMedia, and must not be set by the application:

• NvSciBufGeneralAttrKey_Types

• NvSciBufGeneralAttrKey_NeedCpuAccess

• NvSciBufGeneralAttrKey_EnableCpuCache

• NvSciBufTensorAttrKey_DataType

• NvSciBufTensorAttrKey_NumDims

• NvSciBufTensorAttrKey_SizePerDim

• NvSciBufTensorAttrKey_AlignmentPerDim

• NvSciBufTensorAttrKey_StridesPerDim

• NvSciBufTensorAttrKey_PixelFormat

• NvSciBufTensorAttrKey_BaseAddrAlign

The following attributes are not set by NvMedia and must be set by the application:

• NvSciBufGeneralAttrKey_RequiredPerm

4. If the same NvSciBufObj object has to be shared with other user mode drivers (UMDs), the application can get the corresponding NvSciBufAttrList from the respective UMDs.

5. The application asks NvSciBuf to reconcile all of the filled NvSciBufAttrList objects, then allocates an NvSciBuf object.

6. The application queries NvMedia to create an NvMediaTensor from the allocated NvSciBuf object by calling NvMediaTensorCreateFromNvSciBuf().

7. The NvMediaTensor can be passed as input and output to any of the NvMedia API functions that accept an NvMediaTensor as a parameter.

Example: NvMedia Tensor Allocation with NvSciBuf

This is an example of how to allocate an NvMedia Tensor with NvSciBuf:

NvMediaDevice *device;
NvMediaStatus status;
NvSciError err;
NvSciBufModule module;
NvSciBufAttrList attrlist;
NvSciBufAttrList conflictlist;
NvSciBufObj bufObj;
NvMediaTensor *tensor;
/* Access permission requested by the application (read/write chosen here as an example). */
NvSciBufAttrValAccessPerm access_perm = NvSciBufAccessPerm_ReadWrite;
NVM_TENSOR_DEFINE_ATTR(tensorAttr);

/* NvMedia related initialization. */
device = NvMediaDeviceCreate();
status = NvMediaTensorNvSciBufInit();

/* NvSciBuf related initialization. */
err = NvSciBufModuleOpen(&module);
NvSciBufAttrKeyValuePair attr_kvp = {NvSciBufGeneralAttrKey_RequiredPerm,
                                     &access_perm,
                                     sizeof(access_perm)};

/* Create NvSciBuf attribute list. */
err = NvSciBufAttrListCreate(module, &attrlist);
err = NvSciBufAttrListSetAttrs(attrlist, &attr_kvp, 1);

/* Initialize tensorAttr as required.
   n, c, h, w, and x are application-defined values. */
NVM_TENSOR_SET_ATTR_4D(tensorAttr, n, c, h, w, NCHW, INT, 8, UNCACHED, NONE, x);

/* Ask NvMedia to fill NvSciBufAttrs corresponding to tensorAttr.
   numTensorAttr is the number of entries in tensorAttr. */
status = NvMediaTensorFillNvSciBufAttrs(device,
                                        tensorAttr,
                                        numTensorAttr,
                                        attrlist);

/* Reconcile the NvSciBufAttrs and then allocate an NvSciBufObj. */
err = NvSciBufAttrListReconcileAndObjAlloc(&attrlist, 1, &bufObj, &conflictlist);

/* Create NvMediaTensor from NvSciBufObj. */
status = NvMediaTensorCreateFromNvSciBuf(device, bufObj, &tensor);

/* Free the NvSciBufAttrList which is no longer required. */
err = NvSciBufAttrListFree(attrlist);

/* Use the tensor as input or output as supported. */
....

/* Free the resources after use. */

/* Destroy NvMediaTensor. */
NvMediaTensorDestroy(tensor);

/* NvMedia related deinit. */
NvMediaTensorNvSciBufDeinit();
NvMediaDeviceDestroy(device);

/* NvSciBuf related deinit. */
NvSciBufObjFree(bufObj);
NvSciBufModuleClose(module);

NvMediaTensorDestroy()

Destroys a previously allocated NvMedia Tensor object.

Example:

if (tensor) {
    NvMediaTensorDestroy(tensor);
}