The image module is composed of 3 submodules
The image module contains structures and methods that allow the user to create and set image handles that are compatible with NVIDIA® DriveWorks modules. An image is represented generically as a handle, dwImageHandle_t, which can be passed to a DriveWorks module for processing, or more specifically as a C struct whose content differs based on the type of the image and its properties. All images share common properties.
Among the image properties is the memory layout, which can be one of DW_IMAGE_MEMORY_TYPE_DEFAULT, DW_IMAGE_MEMORY_TYPE_PITCH, or DW_IMAGE_MEMORY_TYPE_BLOCK, and represents the arrangement of the data in memory. Only CUDA and NvMedia images can handle both layouts; CPU images are strictly pitch and GL images are strictly block. The default memory layout automatically resolves to the proper layout once the image is given to a DriveWorks module.

Any image can be created by calling dwImage_create() and should be matched by a dwImage_destroy() when the image is no longer needed. The creation is specific to the type of image, and there are 4 supported types. After the image is created, the handle can be passed to DriveWorks modules that accept the opaque handle; otherwise, a struct specific to the image type can be retrieved. The struct allows direct access to the content of the image, and any modification affects the original image.
A CPU image is stored as a pitched memory buffer represented by an array of pointers, an array of pitches, and the image properties. Its content can be retrieved from a dwImageHandle_t by calling dwImage_getCPU(), which returns a dwImageCPU that contains:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The CPU image is created by specifying the DW_IMAGE_CPU type in the properties and calling dwImage_create().
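The lifecycle above can be sketched as follows. This is a minimal, non-authoritative example: it assumes an already initialized DriveWorks context handle `ctx`, and the image dimensions and the DW_IMAGE_FORMAT_RGBA_UINT8 format are arbitrary choices for illustration; exact signatures may vary slightly between DriveWorks releases.

```c
#include <dw/image/Image.h>

// Describe an 800x600 interleaved RGBA CPU image.
dwImageProperties props = {};
props.type         = DW_IMAGE_CPU;
props.width        = 800;
props.height       = 600;
props.format       = DW_IMAGE_FORMAT_RGBA_UINT8;
props.memoryLayout = DW_IMAGE_MEMORY_TYPE_PITCH; // CPU images are always pitch

dwImageHandle_t image = DW_NULL_HANDLE;
dwImage_create(&image, props, ctx);

// Retrieve the type-specific struct for direct pixel access.
dwImageCPU* imageCPU = NULL;
dwImage_getCPU(&imageCPU, image);

// Plane 0 of an interleaved RGBA image: 4 bytes per pixel,
// with a row stride of pitch[0] bytes.
uint8_t* row0 = (uint8_t*)imageCPU->data[0];
row0[0] = 255; // writing through the struct modifies the original image

dwImage_destroy(image);
```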
A CUDA image can have 2 forms: a pitch pointer or a CUDA array. The two forms are allocated in different domains of GPU memory, one being a pitch-linear pointer and the other being a block-memory CUDA array (which can be thought of as a texture). Its content can be retrieved by calling dwImage_getCUDA(), which returns a dwImageCUDA that contains:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The CUDA image is created by specifying the DW_IMAGE_CUDA type in the properties and calling dwImage_create().
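A hedged sketch of creating and accessing a pitch-linear CUDA image (again assuming an initialized context `ctx`; the `dptr`/`pitch` field names follow the dwImageCUDA description in the API headers and should be verified against your release):

```c
#include <dw/image/Image.h>

dwImageProperties props = {};
props.type         = DW_IMAGE_CUDA;
props.width        = 1280;
props.height       = 720;
props.format       = DW_IMAGE_FORMAT_RGBA_UINT8;
props.memoryLayout = DW_IMAGE_MEMORY_TYPE_PITCH; // or DW_IMAGE_MEMORY_TYPE_BLOCK for a CUDA array

dwImageHandle_t image = DW_NULL_HANDLE;
dwImage_create(&image, props, ctx);

dwImageCUDA* imageCUDA = NULL;
dwImage_getCUDA(&imageCUDA, image);

// Pitch-linear form: dptr[] holds device pointers and pitch[] the row
// strides, so plane 0 can be handed directly to CUDA kernels or cudaMemcpy2D*.
void*  devPtr   = imageCUDA->dptr[0];
size_t rowBytes = imageCUDA->pitch[0];
(void)devPtr;
(void)rowBytes;

dwImage_destroy(image);
```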
A GL image is stored as a GLuint texture on the GPU. An invalid texture has a texture ID of 0; a properly created texture has a positive value. Its content can be retrieved by calling dwImage_getGL(), which returns a dwImageGL that contains:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The GL image is created by specifying the DW_IMAGE_GL type in the properties and calling dwImage_create().
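Once the struct is retrieved, the texture can be used with ordinary OpenGL calls. This sketch assumes `image` is a previously created DW_IMAGE_GL handle and that the dwImageGL struct exposes the texture name and target as `glId` and `glTarget` (verify the field names against your headers):

```c
#include <dw/image/Image.h>
#include <GLES3/gl3.h> // or the GL header used by your platform

dwImageGL* imageGL = NULL;
dwImage_getGL(&imageGL, image);

// glId holds the GLuint texture name (0 would indicate an invalid texture);
// glTarget is the texture target, typically GL_TEXTURE_2D.
glBindTexture(imageGL->glTarget, imageGL->glId);
// ... draw with the texture ...
glBindTexture(imageGL->glTarget, 0);
```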
An NvMedia image is stored as a pointer to the low level NvMedia API image struct. For specific information on NvMedia images, see the following information in NVIDIA DRIVE 5.1 PDK:
The pointer can be accessed by calling dwImage_getNvMedia(), which returns a dwImageNvMedia that contains:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The NvMedia image is created by specifying the DW_IMAGE_NVMEDIA type in the properties and calling dwImage_create().
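Accessing the underlying NvMedia image is a matter of reading the struct's pointer to the low-level object. In this sketch, `image` is assumed to be a DW_IMAGE_NVMEDIA handle, and the field name `img` follows the dwImageNvMedia description in the API headers:

```c
#include <dw/image/Image.h>

dwImageNvMedia* imageNvMedia = NULL;
dwImage_getNvMedia(&imageNvMedia, image);

// `img` points to the low-level NvMediaImage, which can be passed
// straight to NvMedia API calls.
NvMediaImage* nvmImage = imageNvMedia->img;
(void)nvmImage;
```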
Images can be stored in memory in various formats. One dimension of this variation is interleaved vs planar storage for multi-channel images. For example, an interleaved RGB image has 1 plane with 3 channels. A YUV420 planar image has 3 planes, with 1 channel each.
Memory layout can be either pitch or block, depending on the type. CPU images are always pitch, GL images are always block, whereas CUDA and NvMedia images can be either.
The image format describes the data type, color space, and arrangement of the pixels.
Images can be converted to a different format while retaining the same type (to convert the type, see Image Streamer). The user must allocate the output image; the conversion is driven by the properties of the input and output images. Only CUDA and NvMedia images support this operation. The converter does not change the size of the image. If all properties are identical, the converter performs a plain copy.
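As a sketch of the allocation-then-convert flow, assuming a CUDA input image `input` whose properties are `inputProps`, and assuming the conversion entry point is dwImage_copyConvert() as in recent DriveWorks releases:

```c
#include <dw/image/Image.h>

// Allocate the output with the desired format, keeping size and type.
dwImageProperties outProps = inputProps;
outProps.format = DW_IMAGE_FORMAT_RGBA_UINT8; // change only the format

dwImageHandle_t output = DW_NULL_HANDLE;
dwImage_create(&output, outProps, ctx);

// The conversion is driven by the input and output properties;
// if they are identical, this degenerates to a plain copy.
dwImage_copyConvert(output, input, ctx);

// ... use `output` ...
dwImage_destroy(output);
```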
An image streamer converts an image from a type X to a type Y, preserving the rest of the properties (see Note A). All streamers (see Note B) need to be initialized in order to allocate the resources necessary for streaming (for example, an image pool), depending on the type of streamer. At a low level, streamers differ in behavior and performance, so the choice and number of streamers should be planned carefully. Streaming follows a producer-consumer model: the producer posts an image of type X, and the consumer receives it as type Y and returns it when done.
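The producer-consumer round trip can be sketched as follows for a CUDA-to-GL streamer. This assumes an initialized context `ctx`, a CUDA image `cudaImage` with properties `cudaProps`, and the dwImageStreamer function names used in recent DriveWorks releases; the 33 ms timeout is an arbitrary illustrative value.

```c
#include <dw/image/Image.h>
#include <dw/interop/streamer/ImageStreamer.h>

// Initialize a CUDA->GL streamer from the producer image's properties.
dwImageStreamerHandle_t streamer = DW_NULL_HANDLE;
dwImageStreamer_initialize(&streamer, &cudaProps, DW_IMAGE_GL, ctx);

// Producer side: post the CUDA image.
dwImageStreamer_producerSend(cudaImage, streamer);

// Consumer side: receive the GL image, use it, then give it back.
dwImageHandle_t glImage = DW_NULL_HANDLE;
dwImageStreamer_consumerReceive(&glImage, 33000, streamer); // timeout in us
// ... render with glImage ...
dwImageStreamer_consumerReturn(&glImage, streamer);

// Producer side: wait for the image to be returned.
dwImageStreamer_producerReturn(NULL, 33000, streamer);

dwImageStreamer_release(streamer);
```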
The following table describes the possible streaming combinations, given by image type (dwImageType).
From (row) \ To (column) | CPU | GL | CUDA | NvMedia |
---|---|---|---|---|
CPU | - | X* | X* | X |
GL | X* | - | X | X |
CUDA | X | X* | X | X |
NvMedia | X | X | X | X (ideal for cross-processing) |
The following table describes image format (dwImageFormat) for each combination of image types (dwImageType).
From (row) \ To (column) | CPU | GL | CUDA | NvMedia |
---|---|---|---|---|
CPU | - | RGBA, R, UINT8 | ALL | RGBA, R, YUV420 p/s, YUV422 p/s, RAW, UINT8, UINT16 |
GL | RGBA, UINT8 | - | RGBA, UINT8 | RGBA, UINT8 |
CUDA | ALL | RGBA, UINT8 | ALL | RGBA, YUV420 p/s, YUV422 p/s, UINT8 |
NvMedia | RGBA, YUV420 p/s, YUV422 p/s, UINT8 | RGBA, UINT8 | RGBA, YUV420 p/s, YUV422 p/s, UINT8 | RGBA, YUV420 p/p, YUV422 p/p, RAW, UINT8, UINT16 |
Note A: In some cases (CPU->CUDA, CUDA->CPU, NvMedia->CUDA, CUDA->NvMedia), it is possible to stream into an image with a different memory layout.
Note B: The NvMedia->CPU streamer is the only one that does not allocate any resources, because it performs a direct mapping between source and destination. For this reason it has some limitations but also provides maximum performance. Due to temporary technical limitations, the CUDA->GL streamer on the dGPU of a DRIVE PX 2 platform allocates extra resources beyond those needed and performs extra operations during streaming, leading to performance penalties.
The following table describes the mechanism for each streaming combination. 'X' indicates the combination is not available.
From (row) \ To (column) | CPU | CUDA Pitch | CUDA Block | GL | NvMedia |
---|---|---|---|---|---|
CPU | X | cudaMemcpy2DAsync | cudaMemcpy2DToArrayAsync | glBufferData - GL_STATIC_DRAW | NvMediaImagePutBits |
CUDA Pitch | cudaMemcpy2DAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | EGL |
CUDA Block | cudaMemcpy2DFromArrayAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | EGL |
GL | glReadPixels | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | X | EGL |
NvMedia | direct map (only for pitch linear) | EGL | EGL | EGL | X |
The following table gives the streaming performance on the NVIDIA DRIVE AGX Developer Kit. Values are given in microseconds and represent the average of 1000 runs; standard deviation and spike values are in parentheses.
'D' indicates dGPU performance and 'I' iGPU. If 'D' or 'I' is not specified, then the performance is independent of the GPU.
RGBA 8bit | RAW 16bit | YUV 420 SP 8bit | |
---|---|---|---|
CPU->CUDA | 20 D (4.2, 117) 402 I (38.8, 643) | 20 D (5.1, 160) 364 I (38.0, 804) | 34 D (8.6, 404) 426 I (38.3, 654) |
CPU->GL | 11 (7.9, 263) | NA | NA |
CPU->NvMedia | 19 (3.5, 56) | 690 (4.1, 711) | NA |
CUDA->CPU | 24 D (6.4, 139) 407 I (29.2, 616) | 23 D (4.6, 147) 422 I (35.6, 798) | 41 D (7.1, 168) 449 I (56.1, 632) |
CUDA->GL | 175 (73.9, 1436) | NA | NA |
CUDA->NvMedia | NA | NA | NA |
NvMedia->CPU | 7 (3.9, 71) | 8 (3.1, 35) | 14 (5.3, 138) |
NvMedia->CUDA | 52 D (11.6, 2161) 34 I (7.5, 908) | 49 D (9.4, 2020) 37 I (11.8, 724) | 71 D (12.6, 3786) 36 I (16.5, 923) |
NvMedia->GL | 38 (13.2, 282) | NA | NA |
GL->CPU | 75 (25.1, 784) | NA | NA |
GL->CUDA | 1950 (146.4, 2411) | NA | NA |
GL->NvMedia | 136 (180.9, 1635) | NA | NA |
Note 1: GL-based times were taken on the iGPU.
Note 2: Some streamers, especially EGL-based, have spikes for the first few frames, due to hidden optimizations that are performed during the first few iterations. Similar spikes may also occur for CUDA images.
A frame capture has 2 purposes: