The image module is composed of 3 submodules
The image module contains structures and methods that allow the user to create and set up image handles that are compatible with NVIDIA® DriveWorks modules. An image is represented either generically as a handle, dwImageHandle_t, which can be passed to a DriveWorks module for processing, or more specifically as a C struct. The content of the struct depends on the type of image and its properties. All images share common properties:
The image properties are:
DW_IMAGE_MEMORY_TYPE_DEFAULT, DW_IMAGE_MEMORY_TYPE_PITCH, or DW_IMAGE_MEMORY_TYPE_BLOCK: represents the arrangement of data in memory. Only CUDA and NvMedia images can use both pitch and block layouts; CPU images are strictly pitch and GL images are strictly block. The default memory layout automatically selects the proper layout once the image is given to a DriveWorks module.
Any image can be created by calling dwImage_create(), which should be paired with a call to dwImage_destroy() when the image is no longer needed. Creation is specific to the type of image, and there are 4 supported types. After the image is created, the handle can be passed to DriveWorks modules that accept the opaque handle; otherwise, a struct specific to the image type can be retrieved. The struct gives direct access to the content of the image, and any modification affects the original image.
A CPU image is stored as a pitch memory buffer represented by an array of pointers, an array of pitches, and properties. Its content can be retrieved from a dwImageHandle_t by calling dwImage_getCPU(), which returns a dwImageCPU containing:
dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.
The CPU image is created by specifying the DW_IMAGE_CPU type in the properties and calling
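The pitch-based CPU layout described above (one base pointer and one pitch per plane) can be illustrated with a minimal sketch. The struct `SketchImageCPU` and the function `sketchPixelAt` are hypothetical names introduced for illustration only; they are not the dwImageCPU definition, and the sketch assumes a single interleaved plane where every pixel occupies `bytesPerPixel` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PLANES 3 /* hypothetical limit chosen for this sketch */

/* Hypothetical stand-in for a pitch-linear CPU image; NOT the dwImageCPU struct. */
typedef struct {
    uint8_t* data[SKETCH_MAX_PLANES];  /* one base pointer per plane */
    size_t   pitch[SKETCH_MAX_PLANES]; /* bytes per row, padding included */
    uint32_t width;                    /* pixels per row */
    uint32_t height;                   /* number of rows */
    uint32_t bytesPerPixel;            /* channels * bytes per channel */
} SketchImageCPU;

/* Return the address of pixel (x, y) in the given plane.
 * Rows are 'pitch' bytes apart, which may exceed width * bytesPerPixel. */
static uint8_t* sketchPixelAt(const SketchImageCPU* img,
                              uint32_t plane, uint32_t x, uint32_t y)
{
    assert(img != NULL && plane < SKETCH_MAX_PLANES);
    assert(x < img->width && y < img->height);
    return img->data[plane]
         + (size_t)y * img->pitch[plane]
         + (size_t)x * img->bytesPerPixel;
}
```

The point of the sketch is that row stepping must use the pitch rather than the width in bytes, because rows can carry padding.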
A CUDA image can have two forms: a pitch pointer or a CUDA array. The two forms are allocated in, and occupy, different domains of GPU memory: one is a pitch-linear pointer, the other is a block-memory CUDA array (which can be thought of as a texture). The content can be retrieved by calling dwImage_getCUDA(), which returns a dwImageCUDA struct containing:
dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.
The CUDA image is created by specifying the DW_IMAGE_CUDA type in the properties and calling
Note: CUDA images created with a format listed in the NvMedia Images section below are streamable from CUDA to NvMedia.
A GL image is stored as a GLuint texture present on the GPU. An invalid texture has a texID of 0; a properly created texture has a positive value. The content can be retrieved by calling dwImage_getGL(), which returns a dwImageGL containing:
dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.
The GL image is created by specifying the DW_IMAGE_GL type in the properties and calling
An NvMedia image is stored as a pointer to the low-level NvMedia API image struct. For specific information on NvMedia images, see the following information in NVIDIA DRIVE 5.1 PDK:
The pointer can be accessed by calling dwImage_getNvMedia(), which returns a dwImageNvMedia containing:
dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.
The NvMedia image is created by specifying the DW_IMAGE_NVMEDIA type in the properties and calling dwImage_create(). This creates the handle and also creates an NvMediaImage using low-level NvMedia API calls, based on the properties. Destroying such an image also destroys the NvMediaImage using the low-level NvMedia API. Here is a list of supported formats:
Calling dwImage_createAndBindNvMedia() creates the handle and uses an NvMediaImage created by the user. The function trusts that the user's NvMediaImage matches the specified properties. Destroying such an image only destroys the handle; ownership of the NvMediaImage remains with the user. Note that images created with this API are not streamable to CUDA.
Images can be stored in memory in various formats. One dimension of this variation is interleaved vs planar storage for multi-channel images. For example, an interleaved RGB image has 1 plane with 3 channels. A YUV420 planar image has 3 planes, with 1 channel each.
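The difference between interleaved and planar storage can be sketched by listing the planes each layout produces. The type `SketchPlane` and both functions are hypothetical names for illustration, not part of the DriveWorks API. The half-resolution chroma planes follow the usual 4:2:0 subsampling convention, which is an assumption the text above does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one image plane (illustration only). */
typedef struct {
    uint32_t width;    /* samples per row in this plane */
    uint32_t height;   /* rows in this plane */
    uint32_t channels; /* interleaved channels per sample */
} SketchPlane;

/* Interleaved RGB: one plane, three channels per pixel. Returns the plane count. */
static uint32_t sketchPlanesRGBInterleaved(uint32_t w, uint32_t h, SketchPlane out[3])
{
    out[0] = (SketchPlane){w, h, 3};
    return 1;
}

/* Planar YUV420: a full-resolution Y plane plus U and V planes, each
 * assumed to be subsampled by 2 horizontally and vertically. */
static uint32_t sketchPlanesYUV420Planar(uint32_t w, uint32_t h, SketchPlane out[3])
{
    assert(w % 2 == 0 && h % 2 == 0); /* the sketch only handles even sizes */
    out[0] = (SketchPlane){w, h, 1};
    out[1] = (SketchPlane){w / 2, h / 2, 1};
    out[2] = (SketchPlane){w / 2, h / 2, 1};
    return 3;
}
```

For a 1280x720 image, the interleaved RGB case yields a single 1280x720 plane with 3 channels, while the planar YUV420 case yields three single-channel planes, two of them 640x360.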
Memory layout can be either pitch or block, depending on the type. CPU images are always pitch, GL images are always block, whereas CUDA and NvMedia images can be either.
The image format describes the data type, color space, and arrangement of the pixels.
Images can be converted into a different format, while retaining the same type (for converting type, see Image Streamer). The user must allocate the output image and the conversion will be based on the properties of the input and output images. Only CUDA and NvMedia images support this operation. The converter will not change the size of the image. If all properties are identical, the converter will perform an identical copy.
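The conversion rules above (same type, CUDA or NvMedia only, unchanged size, plain copy when nothing differs) can be summarized as a small precondition check. This is a sketch under stated assumptions: `SketchProps`, `SketchType`, and `sketchClassifyConversion` are hypothetical names, the format is reduced to an opaque integer, memory layout is ignored, and the check does not consult the table of allowed format pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image types and properties (illustration only). */
typedef enum { SKETCH_CPU, SKETCH_GL, SKETCH_CUDA, SKETCH_NVMEDIA } SketchType;

typedef struct {
    SketchType type;
    uint32_t   width;
    uint32_t   height;
    int        format; /* opaque format id in this sketch */
} SketchProps;

typedef enum { SKETCH_REJECT, SKETCH_COPY, SKETCH_CONVERT } SketchAction;

/* Classify a requested conversion from 'in' to a user-allocated 'out'. */
static SketchAction sketchClassifyConversion(const SketchProps* in,
                                             const SketchProps* out)
{
    assert(in != NULL && out != NULL);

    /* Only CUDA and NvMedia images support format conversion. */
    bool convertible = (in->type == SKETCH_CUDA || in->type == SKETCH_NVMEDIA);
    if (!convertible || in->type != out->type) {
        return SKETCH_REJECT; /* changing type is a streamer's job */
    }
    /* The converter does not resize. */
    if (in->width != out->width || in->height != out->height) {
        return SKETCH_REJECT;
    }
    /* Identical properties degenerate to a plain copy. */
    return (in->format == out->format) ? SKETCH_COPY : SKETCH_CONVERT;
}
```

A check of this kind, run before allocating the output image, makes the constraints explicit; the actual converter may enforce additional conditions.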
The following table lists the formats allowed in conversion. The list applies to CUDA images in pitch memory. A subset of these formats is also convertible for NvMedia images, indicated with *.
An image streamer converts an image from a type X to a type Y, preserving the rest of the properties (see note A). All streamers (see note B) must be initialized in order to allocate the resources needed for streaming (for example, an image pool), which depend on the type of streamer. At a low level, streamers differ in behavior and performance, so the choice and number of streamers should be planned carefully. Streaming follows a producer/consumer model.
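The producer/consumer idea can be sketched as a single-slot hand-off: the producer sends an image, the consumer receives and later returns it, and only then can the producer reclaim it. This is a generic illustration and not the DriveWorks streamer implementation; every name below (`SketchStream`, `sketchProducerSend`, and so on) is hypothetical, and a real streamer would typically manage a pool of images rather than one slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot stream (illustration only). */
typedef enum { SLOT_FREE, SLOT_SENT, SLOT_HELD, SLOT_RETURNED } SketchSlotState;

typedef struct {
    SketchSlotState state;
    const void*     payload; /* the image currently in flight */
} SketchStream;

/* Producer hands an image to the stream; fails if one is already in flight. */
static bool sketchProducerSend(SketchStream* s, const void* image)
{
    assert(s != NULL);
    if (s->state != SLOT_FREE) return false;
    s->payload = image;
    s->state = SLOT_SENT;
    return true;
}

/* Consumer takes the pending image, or NULL if nothing was sent. */
static const void* sketchConsumerReceive(SketchStream* s)
{
    assert(s != NULL);
    if (s->state != SLOT_SENT) return NULL;
    s->state = SLOT_HELD;
    return s->payload;
}

/* Consumer signals that it is done with the image. */
static bool sketchConsumerReturn(SketchStream* s)
{
    assert(s != NULL);
    if (s->state != SLOT_HELD) return false;
    s->state = SLOT_RETURNED;
    return true;
}

/* Producer reclaims the image once the consumer has returned it. */
static const void* sketchProducerReclaim(SketchStream* s)
{
    assert(s != NULL);
    if (s->state != SLOT_RETURNED) return NULL;
    const void* image = s->payload;
    s->payload = NULL;
    s->state = SLOT_FREE;
    return image;
}
```

The strict ordering is the design point: the producer must not reuse or destroy an image while the consumer still holds it.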
The following table describes the possible streaming combinations, given by image type (dwImageType).
From (first column) \ To (header row) | CPU | GL | CUDA | NvMedia |
---|---|---|---|---|
CPU | - | X* | X* | X |
GL | X* | - | X | X |
CUDA | X | X* | X (only cross-process) | X |
NvMedia | X | X | X | X (only cross-process) |
CUDA->CPU and vice versa support all formats.
NvMedia->CUDA and vice versa support:
CPU/NvMedia/CUDA->GL and vice versa support only DW_IMAGE_FORMAT_RGBA_UINT8.
Note A: In some cases (CPU->CUDA, CUDA->CPU, NvMedia->CUDA, CUDA->NvMedia) it is possible to stream into an image with a different memory layout, depending on the dwImageMemoryType specified in dwImageProperties.
Note B: Due to technical limitations, the CUDA->GL streamer on dGPU allocates resources beyond those strictly needed and performs extra operations during streaming, which leads to performance penalties.
Note C: NvMedia stores some formats in a different order than the format name suggests. Specifically, for YUV420/422 planar, the UV planes are actually ordered as VU. The order is restored to that of the format name when the image is streamed to either CPU or CUDA.
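To make the plane reordering in Note C concrete, the following sketch swaps the two chroma planes of a three-plane image so that a V-then-U order becomes U-then-V. It assumes the planes are exposed as three pointers with their pitches; `SketchPlanarYUV` and `sketchSwapChromaPlanes` are hypothetical names and do not describe how the streamer performs the reordering internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-plane image: plane 0 is Y, planes 1 and 2 are chroma. */
typedef struct {
    uint8_t* plane[3];
    size_t   pitch[3];
} SketchPlanarYUV;

/* Swap chroma planes 1 and 2 together with their pitches.
 * Only pointers are exchanged; no pixel data is copied. */
static void sketchSwapChromaPlanes(SketchPlanarYUV* img)
{
    assert(img != NULL);

    uint8_t* tmpPlane = img->plane[1];
    img->plane[1] = img->plane[2];
    img->plane[2] = tmpPlane;

    size_t tmpPitch = img->pitch[1];
    img->pitch[1] = img->pitch[2];
    img->pitch[2] = tmpPitch;
}
```

Applying the swap twice restores the original order, so the same helper converts in either direction.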
The following table describes the mechanism for each streaming combination. 'X' indicates the combination is not available.
From (first column) \ To (header row) | CPU | CUDA Pitch | CUDA Block | GL | NvMedia |
---|---|---|---|---|---|
CPU | X | cudaMemcpy2DAsync | cudaMemcpy2DToArrayAsync | glBufferData - GL_STATIC_DRAW | NvSci mapping |
CUDA Pitch | cudaMemcpy2DAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | NvSci mapping |
CUDA Block | cudaMemcpy2DFromArrayAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | NvSci mapping |
GL | glReadPixels | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | X | EGL |
NvMedia | direct map (only for pitch linear) | NvSci mapping | NvSci mapping | EGL | X |
Note: EGL is not available in the safety build and will be discontinued in Drive OS 6.0.
The NvSci streaming mechanism has minimal overhead within the same process. Note also that when images are created, their pointers reside on the GPU that is current at the time of creation; accessing and streaming must therefore be done while the same GPU is current (see dwContext_getCurrentGPU).
The following table gives the streaming performance on the NVIDIA DRIVE AGX Developer Kit. Values are given in microseconds and represent the average of 1000 runs; standard deviation and spike values are in parentheses.
'D' indicates dGPU performance and 'I' iGPU. If 'D' or 'I' is not specified, then the performance is independent of the GPU.
Streamer | RGBA 8bit | RAW 16bit | YUV 420 SP 8bit |
---|---|---|---|
CPU->CUDA | 20 D (4.2, 117) 402 I (38.8, 643) | 20 D (5.1, 160) 364 I (38.0, 804) | 34 D (8.6, 404) 426 I (38.3, 654) |
CPU->GL | 11 (7.9, 263) | NA | NA |
CPU->NvMedia | 19 (3.5, 56) | 690 (4.1, 711) | NA |
CUDA->CPU | 24 D (6.4, 139) 407 I (29.2, 616) | 23 D (4.6, 147) 422 I (35.6, 798) | 41 D (7.1, 168) 449 I (56.1, 632) |
CUDA->GL | 175 (73.9, 1436) | NA | NA |
CUDA->NvMedia | NA | NA | NA |
NvMedia->CPU | 7 (3.9, 71) | 8 (3.1, 35) | 14 (5.3, 138) |
NvMedia->CUDA | 52 D (11.6, 2161) 34 I (7.5, 908) | 49 D (9.4, 2020) 37 I (11.8, 724) | 71 D (12.6, 3786) 36 I (16.5, 923) |
NvMedia->GL | 38 (13.2, 282) | NA | NA |
GL->CPU | 75 (25.1, 784) | NA | NA |
GL->CUDA | 1950 (146.4, 2411) | NA | NA |
GL->NvMedia | 136 (180.9, 1635) | NA | NA |
Note 1: GL-based times were taken on iGPU.
Note 2: Some streamers, especially EGL-based, have spikes for the first few frames, due to hidden optimizations that are performed during the first few iterations. Similar spikes may also occur for CUDA images.
A frame capture has 2 purposes: