NVIDIA Tegra
DRIVE 5.0 Linux Open Source Software

Development Guide
5.0.10.3 Release


 
Understanding NvMedia
 
Video Decode
Supported Video Media Formats
Using the Software Video Decoders
Video Encode
Video Mixer
Layer Usage
Deinterlacing Modes
Deinterlacing Examples
Video Capture
The NvMedia Video domain:
Handles Video Surface data (hardware buffers). For example: YUV, RGB, progressive, and interlaced.
Performs hardware synchronization, as necessary.
Video Decode
NvMedia video decode is a frame-level API. NvMedia implementations:
Accept frame data from the bitstream.
Perform all required processing of those frames.
For example, VLD decoding, IDCT, motion compensation, and in-loop deblocking.
The client application is responsible for:
Extracting the slices from the bitstream.
For example, parsing/demultiplexing container formats, and scanning the data to determine frame start positions and frame sizes.
Parsing various bitstream headers and structures.
For example, the sequence header, sequence parameter set, picture parameter set, and entry point structures. Various fields from the parsed header structures must be provided to NvMedia alongside the slice bitstream in a “picture information” structure.
Feeding the buffer containing each frame's bitstream data to the decoder.
Managing the surfaces.
For example, H.264 DPB processing and display re-ordering.
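To make this division of labor concrete, here is a minimal sketch of submitting one parsed H.264 frame to the decoder. The NvMediaVideoDecoderRenderEx parameter list and the NVMEDIA_DECODER_INSTANCE_0 constant are assumptions modeled on the decoder header (nvmedia_viddec.h); verify them against the headers in your release.

    /* Minimal sketch: the application has already parsed the headers into
     * an NvMediaPictureInfoH264 structure and extracted one frame's slice
     * data. Signatures are assumptions; check nvmedia_viddec.h. */
    #include <nvmedia.h>
    #include <nvmedia_viddec.h>

    NvMediaStatus DecodeOneFrame(NvMediaVideoDecoder *decoder,
                                 NvMediaVideoSurface *target,
                                 NvMediaPictureInfoH264 *picInfo,
                                 uint8_t *sliceData, uint32_t sliceBytes)
    {
        /* One bitstream buffer holding the whole frame; multiple array
         * entries (e.g., one per slice) are equally valid. */
        NvMediaBitstreamBuffer buffer;
        buffer.bitstream      = sliceData;
        buffer.bitstreamBytes = sliceBytes;

        /* NvMedia performs VLD decoding, IDCT, motion compensation, and
         * in-loop deblocking; the parsed header fields travel in picInfo. */
        return NvMediaVideoDecoderRenderEx(decoder, target,
                                           (NvMediaPictureInfo *)picInfo,
                                           NULL,        /* no encryption */
                                           1, &buffer,  /* bitstream array */
                                           NULL,        /* no frame stats */
                                           NVMEDIA_DECODER_INSTANCE_0);
    }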
Supported Video Media Formats
The supported video media formats are as follows.
MPEG-1
Includes all slices beginning with start codes 0x00000101 through 0x000001AF. The slice start code must be included for all slices.
H.264
Supports bitstreams with Baseline, Main, and High profiles up to level 5.1.
If desired:
The slice start code prefix may be placed in a bitstream buffer array entry separate from the actual slice data extracted from the bitstream.
Multiple bitstream buffer array entries (e.g., one per slice) may point at the same physical data storage for the slice start code prefix.
H.265
Supports bitstreams with Main profile up to level 5.1, High tier.
VC-1 Simple and Main Profile
Consists of a single slice per picture. Does not use start codes to delimit pictures. The container format must indicate where each picture begins and ends.
Slice start codes must not be included in the data passed to NvMedia; pass in the exact data from the bitstream.
Header information contained in the bitstream must be parsed by the application and passed to NvMedia using the “picture information” data structure. This header information must not be included in the bitstream data passed to NvMedia for this encoding format.
VC-1 Advanced Profile
Includes all slices beginning with start codes 0x0000010D (frame), 0x0000010C (field) or 0x0000010B (slice). The slice start code must be included in all cases.
Some VC-1 advanced profile streams do not contain slice start codes; again, the container format must indicate where picture data begins and ends. In this case, pictures are assumed to be progressive and to contain a single slice. It is highly recommended that applications detect this condition, and add the missing start codes to the bitstream passed to NvMedia. However, NvMedia implementations must allow bitstreams with missing start codes, and act as if a 0x0000010D (frame) start code had been present.
Pictures containing multiple slices, or interlace streams, must contain a complete set of slice start codes in the original bitstream; without them, it is not possible to correctly parse and decode the stream.
The bitstream passed to NvMedia should contain all original emulation prevention bytes present in the original bitstream; do not remove these from the bitstream.
MPEG-4 Part 2
Includes all slices beginning with start codes 0x000001B6. The slice start code must be included for all slices.
VP8
Includes VP8 bitstreams with levels 0-4.
VP9
Includes VP9 bitstreams with profile 0.
Surfaces are not suitable for CPU access.
Using the Software Video Decoders
NvMedia provides two functions to support software video decoders.
The decoder obtains a mapped YUV video surface by calling the NvMediaVideoSurfaceLock function.
During the call, NvMedia waits while the surface is being used by the internal engines; consequently, this is a blocking call. Once the function returns, the surface can be used by the software decoder. The returned parameters are the mapped memory YUV pointers and the associated pitches.
It also returns the width and height in terms of luma pixels.
When the decoder finishes filling the surface, it calls the NvMediaVideoSurfaceUnlock function.
This tells NvMedia that the surface can again be used by the internal engines.
On certain Tegra platforms, the mapping returns a NULL pointer indicating that the surface is not CPU accessible.
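The lock/use/unlock sequence looks as follows. This is a minimal sketch; the NvMediaVideoSurfaceMap field names (pY, pitchY, and so on) are assumptions to check against the video header in your release.

    /* Minimal sketch: CPU access to a video surface via lock/unlock.
     * NvMediaVideoSurfaceMap field names are assumptions; consult the
     * header for the exact layout. */
    #include <nvmedia.h>

    void ProcessSurfaceOnCpu(NvMediaVideoSurface *surface)
    {
        NvMediaVideoSurfaceMap map;

        /* Blocking call: waits until the internal engines release the
         * surface, then returns mapped YUV pointers, pitches, and the
         * luma width/height. */
        if (NvMediaVideoSurfaceLock(surface, &map) != NVMEDIA_STATUS_OK)
            return;

        /* On some Tegra platforms the mapping is NULL: the surface is
         * not CPU accessible and must be handled by hardware paths. */
        if (map.pY != NULL) {
            /* ... the software decoder reads/writes the Y/U/V planes
             * here, using the returned pitches ... */
        }

        /* Hand the surface back to the internal engines. */
        NvMediaVideoSurfaceUnlock(surface);
    }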
Video Encode
NvMedia Video Encode for NVENC supports the following features:
Accepts YUV/RGB frames as input.
Encodes content for Internet streaming and common SD and HD formats, up to dual streams of 1920 x 1080.
Supports H.264 Baseline, Main and High profiles.
Provides frame-only encoding.
For I-frame or I/P-frame encoding, the low-level hardware driver (TVMR) code handles the picture type according to the Group of Pictures (GOP) and IDR period.
For I/B/P encoding, the picture re-ordering is handled in application code by assigning the picture type and sending it to the low-level hardware driver (TVMR) for encoding.
Supports all intra-macroblock types (16 x 16, 8 x 8, 4 x 4, PCM) and prediction types.
Supports inter-macroblock partition sizes from 16 x 16, 16 x 8, 8 x 16 down to 8 x 8, and skip and direct B-modes.
Supports disable, temporal, or spatial direct mode for:
B-picture
One reference picture for P-picture
Two reference pictures for B-picture
Supports multiple rate-control modes including:
Constant QP
Constant Bit Rate (CBR) single-pass
Variable Bit Rate (VBR)
VBR with minimum QP
Supports dynamic slice mode based on byte size and static multiple slices in the frame.
Supports intra-refresh mode with refresh period, and instant refresh P-picture.
Supports adaptive 8x8 transform mode.
Supports VUI and SEI insertion.
Supports CAVLC and CABAC.
Supports rotation/mirroring mode.
Supports dynamic configuration changes at the frame level:
SPS PPS output on next frame
Constrained frame encode
Instant intra refresh P picture
New SEI packet insertion
GOP length and IDR period update
Rate control mode change
For H.265 encoding, the encoder has the following features:
Accepts YUV/RGB frames as input.
Encodes common SD and HD resolutions up to dual streams of 3840 x 2160.
Supports the H.265 Main profile with levels up to 6.0.
Provides frame-only encoding.
For I-frame or I/P-frame encoding, the low-level hardware driver (TVMR) code handles the picture type according to the GOP and IDR period. B-pictures are not supported.
Supports all intra CU types (32 x 32, 16 x 16, 8 x 8, 4 x 4, PCM) and prediction types.
Supports inter CU partition sizes from 32 x 32 down to 8 x 8, with partition modes PART_2Nx2N, PART_2NxN, PART_Nx2N, PART_2NxnU, PART_2NxnD, PART_nLx2N, and PART_nRx2N, plus skip.
Supports one reference picture for P-pictures.
Supports multiple rate-control modes: constant QP, CBR (single-pass), VBR, and VBR with minimum QP. Other modes, such as multi-pass CBR, may be supported in future software releases.
Supports dynamic slice mode based on byte size and static multiple slices in the frame.
Supports intra-refresh mode with refresh period, and instant refresh P-picture.
Supports VUI and SEI insertion.
Supports CABAC only.
Supports dynamic configuration changes at the frame level:
SPS PPS output on next frame
Constrained frame encode
Instant intra refresh P picture
New SEI packet insertion
GOP length and IDR period update
Rate control mode change
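Whether the codec is H.264 or H.265, the per-frame control flow is the same: feed one input surface, then drain the encoded bits. The sketch below is hedged: the NvMediaVideoEncoderFeedFrame, NvMediaVideoEncoderBitsAvailable, and NvMediaVideoEncoderGetBits signatures, plus the instance and blocking-type constants, are assumptions modeled on nvmedia_vep.h and should be verified against the headers in your release.

    /* Minimal sketch: encode one frame and append the bits to a file.
     * Signatures and constants are assumptions modeled on nvmedia_vep.h. */
    #include <stdio.h>
    #include <nvmedia.h>
    #include <nvmedia_vep.h>

    void EncodeOneFrame(NvMediaVideoEncoder *encoder,
                        NvMediaVideoSurface *yuvFrame,
                        NvMediaEncodePicParamsH264 *picParams,
                        FILE *out)
    {
        static uint8_t bits[512 * 1024]; /* size for the configured bitrate */
        uint32_t numBytes = 0;

        /* Submit one input frame. For I/B/P GOPs the application sets the
         * picture type in picParams (re-ordering is done in the
         * application, not in the TVMR driver). */
        if (NvMediaVideoEncoderFeedFrame(encoder, yuvFrame,
                                         NULL /* encode the full frame */,
                                         picParams,
                                         NVMEDIA_ENCODER_INSTANCE_0)
                != NVMEDIA_STATUS_OK)
            return;

        /* Wait for the encoded frame, then drain it. */
        if (NvMediaVideoEncoderBitsAvailable(encoder, &numBytes,
                                             NVMEDIA_ENCODE_BLOCKING_TYPE_IF_PENDING,
                                             1000 /* ms */) == NVMEDIA_STATUS_OK &&
            numBytes <= sizeof(bits) &&
            NvMediaVideoEncoderGetBits(encoder, &numBytes, bits)
                == NVMEDIA_STATUS_OK) {
            fwrite(bits, 1, numBytes, out);
        }
    }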
Video Mixer
The video mixer supports scaling, cropping, transformation, and de-interlacing of a Video Surface, storing the result into an output Video Surface. The input and output surfaces can be YUV or RGB types. If the output type differs from the input type, the mixer performs color space conversion.
The following table describes the features that are supported depending on the input and output surface types.
Conversion Type    Composition    Alpha Blend    Scaling    Cropping    ProcAmp    De-Interlace
YUV to RGB         Yes            Yes            Yes        Yes         Yes        Yes
YUV to YUV         No             No             Yes        Yes         No         Yes
RGB to RGB         No             No             Yes        Yes         No         No
RGB to YUV         No             No             Yes        Yes         No         No
Composition
The NvMediaVideoMixerRenderSurface function supports two layers of images composited in the following order:
Background (optional)
Video
The presence of the background layer is determined at the video mixer creation phase, through the following feature:
NVMEDIA_VMP_FEATURE_BACKGROUND_PRESENT
The background layer supports a single color.
Alpha blending
The NvMediaVideoMixerRenderSurfaceWithAlpha function supports an input video surface and an input alpha surface and produces an alpha-blended output. Regular and pre-multiplied alpha blending are supported. This function does not support background layer composition.
ProcAmp (Processing Amplifier)
The following ProcAmp functions are supported:
Brightness (Luminance)
Contrast (Gain)
Saturation (Amplitude)
Hue (Phase)
Noise reduction
Sharpening
Inverse Telecine
Layer Usage
Background Layer
The background layer is an optional layer.
It can display a solid color.
For color mode, the backgroundColor determines the color.
Video Layer
The video layer is the main video playback layer. NvMediaVideoDesc is used to describe this layer.
The pictureStructure, next, current, previous, and previous2 fields describe the picture type and the video surfaces to be used.
srcRect determines which portion of the source video surface is used.
This rectangle from the source gets zoomed into dstRect.
dstRect determines the rectangle where the video is going to be rendered.
The position of this rectangle is relative to the destination surface.
The destination surface size is determined at NvMediaVideoMixer creation.
Each NvMediaVideoSurface must contain an entire frame's worth of data, irrespective of whether an interlaced or progressive sequence is being decoded.
Depending on the exact encoding structure of the compressed video stream, the application may need to call NvMediaVideoDecoderRenderEx twice to fill a single NvMediaVideoSurface.
When the stream contains an encoded progressive frame, or a “frame coded” interlaced field-pair, a single NvMediaVideoDecoderRenderEx call fills the entire surface. When the stream contains separately encoded interlaced fields, two NvMediaVideoDecoderRenderEx calls are required: one for the top field, and one for the bottom field.
Note:
When NvMediaVideoDecoderRenderEx renders an interlaced field, this operation does not disturb the content of the other field in the surface.
The canonical usage is to call NvMediaVideoMixerRenderSurface once per decoded field, in display order, to yield one post-processed frame for display. For each call to NvMediaVideoMixerRenderSurface, the field to be processed must be provided as the current parameter.
To enable operation of advanced deinterlacing algorithms and/or post-processing algorithms, some past and/or future surfaces must be provided as context. These are provided as the previous2, previous, and next parameters. The NvMediaVideoMixerRenderSurface pictureStructure parameter applies to current.
The picture structure for the other surfaces is automatically derived from that for the current picture. The derivation algorithm is simple; the concatenated list past/current/future is assumed to have an alternating top/bottom pattern throughout. In other words, the concatenated list of past/current/future frames forms a window that slides through the sequence of decoded fields.
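As an illustration, the following sketch fills NvMediaVideoDesc for the top field of a top-field-first stream with Advanced1 full-rate deinterlacing (see the table in Deinterlacing Modes). The field names are those described above; treating srcRect/dstRect as rectangle pointers and the NvMediaVideoMixerRenderSurface parameter order are assumptions to verify against the mixer header.

    /* Minimal sketch: one Advanced1 full-rate call for the top field of a
     * top-field-first stream. The render-call parameter order is an
     * assumption; the NvMediaVideoDesc fields are as documented above. */
    void RenderTopField(NvMediaVideoMixer *mixer,
                        NvMediaVideoSurface *output,
                        NvMediaVideoSurface *past,    /* previous frame   */
                        NvMediaVideoSurface *current, /* frame being shown */
                        NvMediaRect *srcRect, NvMediaRect *dstRect)
    {
        NvMediaVideoDesc video;

        video.pictureStructure = NVMEDIA_PICTURE_STRUCTURE_TOP_FIELD;
        video.previous2 = past;     /* per the Advanced1 full-rate entry */
        video.previous  = past;     /* in the Deinterlacing Modes table  */
        video.current   = current;  /* the field to be processed         */
        video.next      = current;
        video.srcRect   = srcRect;  /* source region, zoomed into dstRect */
        video.dstRect   = dstRect;  /* target region on the output        */

        /* One call per decoded field, in display order. */
        NvMediaVideoMixerRenderSurface(mixer, output,
                                       NULL /* no background */, &video);
    }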
Deinterlacing Modes
The following provides a full reference of the required fields for the different deinterlacing modes:
Progressive
  Picture structure: NVMEDIA_PICTURE_STRUCTURE_FRAME
  Previous2: NULL; Previous: NULL; Current: Current; Next: NULL

Bob
  Picture structure: NVMEDIA_PICTURE_STRUCTURE_TOP_FIELD or NVMEDIA_PICTURE_STRUCTURE_BOTTOM_FIELD
  Previous2: NULL; Previous: NULL; Current: Current; Next: NULL

Advanced1 (Half-rate), Top Field First
  Picture structure: NVMEDIA_PICTURE_STRUCTURE_TOP_FIELD
  Previous2: Past; Previous: Past; Current: Current; Next: Current

Advanced1 (Half-rate), Bottom Field First
  Picture structure: NVMEDIA_PICTURE_STRUCTURE_BOTTOM_FIELD
  Previous2: Past; Previous: Past; Current: Current; Next: Current

Advanced1 (Full-rate), Top Field First
  First call (top field): NVMEDIA_PICTURE_STRUCTURE_TOP_FIELD
    Previous2: Past; Previous: Past; Current: Current; Next: Current
  Second call (bottom field): NVMEDIA_PICTURE_STRUCTURE_BOTTOM_FIELD
    Previous2: Past; Previous: Current; Current: Current; Next: Future

Advanced1 (Full-rate), Bottom Field First
  First call (bottom field): NVMEDIA_PICTURE_STRUCTURE_BOTTOM_FIELD
    Previous2: Past; Previous: Past; Current: Current; Next: Current
  Second call (top field): NVMEDIA_PICTURE_STRUCTURE_TOP_FIELD
    Previous2: Past; Previous: Current; Current: Current; Next: Future
Deinterlacing Examples
This topic provides examples for different deinterlacing types.
General Deinterlacing
If pictureStructure is not NVMEDIA_PICTURE_STRUCTURE_FRAME, deinterlacing is performed. Bob deinterlacing is always available, but Advanced1 deinterlacing (NVMEDIA_DEINTERLACE_TYPE_ADVANCED1) is used if the following conditions are met:
The NvMediaVideoMixer must be created with the NVMEDIA_VMP_FEATURE_DEINTERLACING flag.
The deinterlaceType attribute must be set to NVMEDIA_DEINTERLACE_TYPE_ADVANCED1.
All four source fields must be presented to the NvMediaVideoMixer: next, current, previous, and previous2.
Weave Deinterlacing
Weave deinterlacing is the act of interleaving the lines of two temporally adjacent fields to form a frame for display. To disable deinterlacing for progressive streams, simply specify current as NVMEDIA_PICTURE_STRUCTURE_FRAME; no deinterlacing will be applied. Weave deinterlacing for interlaced streams is identical to disabling deinterlacing, as described immediately above, because each NvMediaVideoSurface already contains an entire frame's worth (i.e., two fields) of picture data. Weave deinterlacing produces one output frame for each input frame. The application should make one NvMediaVideoMixerRenderSurface call per pair of decoded fields, or per decoded frame. Weave deinterlacing requires no entries in the past/future lists.
Bob Deinterlacing
Bob deinterlacing is the act of vertically scaling a single field to the size of a single frame. To achieve bob deinterlacing, simply provide a single field as current, and set pictureStructure appropriately to indicate whether a top or bottom field was provided. Inverse telecine is disabled when using bob deinterlacing. Bob deinterlacing produces one output frame for each input field. The application should make one NvMediaVideoMixerRenderSurface call per decoded field. Bob deinterlacing requires no entries in the past/future lists. Bob deinterlacing is the default when no advanced method is requested and enabled. Advanced deinterlacing algorithms may fall back to bob, e.g., when required past/future fields are missing.
Advanced1 Deinterlacing
This algorithm uses various advanced processing on the pixels of both the current and various past/future fields to determine how best to deinterlace individual portions of the image. Advanced deinterlacing produces one output frame for each input field. The application should make one NvMediaVideoMixerRenderSurface call per decoded field. Advanced deinterlacing requires entries in the past/future lists.
Deinterlacing Rate
For all deinterlacing algorithms except weave, a choice may be made to call NvMediaVideoMixerRenderSurface for either each decoded field, or every second decoded field. If NvMediaVideoMixerRenderSurface is called for every decoded field, the generated post-processed frame rate is equal to the decoded field rate. Put another way, the generated post-processed nominal field rate is equal to twice the decoded field rate. This is standard practice. If NvMediaVideoMixerRenderSurface is called for every second decoded field (say, every top field), the generated post-processed frame rate is half the decoded field rate. This mode of operation is referred to as “half-rate”.
Concatenation of past/current/future surface lists forms a window into the stream of decoded fields. To achieve standard deinterlacing, the window slides through the list of decoded fields, one field at a time, and a call is made to NvMediaVideoMixerRenderSurface for each movement of the window. To achieve half-rate deinterlacing, the window slides through the list of decoded fields, two fields at a time, and a call is made to NvMediaVideoMixerRenderSurface for each movement of the window.
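The window mechanics can be written out directly. In this illustrative sketch, surfaceOf(), structureOf(), and outputFor() are hypothetical helpers (not NvMedia APIs) that map a display-order field index to its containing frame surface, its top/bottom picture structure, and an output surface; a step of one field gives standard operation and a step of two gives half-rate.

    /* Illustrative only: surfaceOf(), structureOf(), and outputFor() are
     * hypothetical helpers, not NvMedia APIs. */
    extern NvMediaVideoSurface    *surfaceOf(int fieldIndex);
    extern NvMediaPictureStructure structureOf(int fieldIndex);
    extern NvMediaVideoSurface    *outputFor(int fieldIndex);

    void DeinterlaceSequence(NvMediaVideoMixer *mixer,
                             int numFields, int halfRate)
    {
        int step = halfRate ? 2 : 1;   /* 2 = half-rate, 1 = standard */

        /* Slide the context window through the decoded fields; one
         * render call per movement of the window. */
        for (int i = 2; i + 1 < numFields; i += step) {
            NvMediaVideoDesc video = {0};

            video.pictureStructure = structureOf(i);
            video.previous2 = surfaceOf(i - 2);   /* two fields back */
            video.previous  = surfaceOf(i - 1);   /* one field back  */
            video.current   = surfaceOf(i);       /* field processed */
            video.next      = surfaceOf(i + 1);   /* one field ahead */

            NvMediaVideoMixerRenderSurface(mixer, outputFor(i),
                                           NULL /* no background */, &video);
        }
    }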
Video Capture
NvMedia Video Capture captures HDMI-to-CSI and CVBS-to-CSI data arriving at the Tegra CSI port.
The supported features include:
Supported capture formats
YUV 4:2:2 (progressive, interlaced)
YUV 4:4:4
RGB 8:8:8
Supported CSI modes
Port AB (x1, x2, x4)
Port CD (x1, x2, x4)
Port EF (x1, x2, x4)
Simultaneous capture (any combination)
Configurable number of capture buffers (latency)
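A capture session follows a create/get-frame pattern. The outline below leans heavily on assumptions: the NvMediaVideoCaptureSettings field names, the CSI interface-type constant, and the NvMediaVideoCaptureCreate/NvMediaVideoCaptureGetFrameEx signatures are modeled on nvmedia_vcp.h and must be verified against your release.

    /* Assumption-heavy sketch of a CSI capture loop; field and enum
     * names are modeled on nvmedia_vcp.h and may differ per release. */
    void CaptureLoop(void)
    {
        NvMediaVideoCaptureSettings settings = {
            .interfaceType  = NVMEDIA_VIDEO_CAPTURE_CSI_INTERFACE_TYPE_CSI_AB,
            .interfaceLanes = 4,    /* port AB in x4 mode */
            .surfaceType    = NvMediaSurfaceType_Video, /* YUV 4:2:2 input */
            .width          = 1280,
            .height         = 720,
            .numBuffers     = 4,    /* capture buffer count (name assumed);
                                     * more buffers hide jitter but add
                                     * latency */
        };

        NvMediaVideoCapture *capture = NvMediaVideoCaptureCreate(&settings);
        if (capture == NULL)
            return;

        for (;;) {
            NvMediaVideoSurface *frame = NULL;

            /* Block up to 100 ms for the next captured frame. */
            if (NvMediaVideoCaptureGetFrameEx(capture, 100, &frame)
                    != NVMEDIA_STATUS_OK)
                break;

            /* ... hand the frame to the mixer, encoder, or display ... */
        }

        NvMediaVideoCaptureDestroy(capture);
    }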
NvMedia Image Domain
 
Image Capture Processing (ICP)
Image Signal Processing (ISP)
Image 2D
Image Encode Processing (IEP)
Image Display Processing (IDP)
Image Processing Pipeline (IPP)
EGLStream Interoperability
NvMedia Image Sensor Control (ISC)
The NvMedia Image domain performs the following processes:
Handles Image Surface data (hardware buffers). For example: YUV, RGB, and RAW (progressive only)
Supports image sensor register data
Supports per-image specific metadata
Handles necessary hardware synchronization
Performs timestamping
Supports surface allocation for CPU access with cached access
Image Capture Processing (ICP)
The NvMedia Image Capture (ICP) component captures the frames coming from the CSI interface. It can capture an individual image coming from a camera, or aggregated images coming from multiple cameras. The output of this component provides the captured images.
The NvMedia ICP component provides the following features.
Supports capture formats including YUV 4:2:2 (8/10-bit), RGB 8:8:8, RAW8, RAW10, and RAW12
Supports CSI modes including: x1, x2, x4 for each of the three ports
Provides external buffer allocation
Supports capture-on-request processing
Enables aggregate image acquisition
Allows embedded line information (image specific metadata)
Image Signal Processing (ISP)
The NvMedia Image Signal Processor (ISP) component processes Bayer images to YUV formatted images. It uses Tegra configurable ISP hardware and supports the following processing operations.
Stuck at Pixel Outlier Removal (SAPOR)
Bad pixel replacement
Spatially varying noise filtering
Areal Processor (AP)
Demosaicer (DM)
Color Artifact Reduction (CAR)
Edge enhancement (EE)
Local Average and Clip (LAC)
Histogram statistics
General Pixel Processor (GPP)
Transfer functions
Affine transform
Auto White Balance (AWB) Gains and black level offset
Bezier lens shading (LS)
Downscaler (DS)
Flicker band detection (FB)
Image 2D
The NvMedia Image 2D component supports image surface processing features such as image copy, scaling, and cropping. It operates on YUV/RGB input and output surfaces. It also performs format conversion between YUV and RGB and supports aggregated image handling.
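As a sketch of typical usage, the blit below copies a source image into a destination image, scaling when the rectangles differ in size and converting formats when the surface types differ. The NvMedia2DBlitEx signature shown is an assumption modeled on nvmedia_2d.h.

    /* Minimal sketch: copy/scale/crop with NvMedia 2D. The signature is
     * an assumption; verify it against nvmedia_2d.h. */
    #include <nvmedia_2d.h>

    NvMediaStatus ScaleImage(NvMedia2D *i2d,
                             NvMediaImage *src, const NvMediaRect *srcRect,
                             NvMediaImage *dst, const NvMediaRect *dstRect)
    {
        /* NULL blit parameters request a plain copy; scaling happens
         * implicitly when srcRect and dstRect differ in size, and format
         * conversion when src and dst surface types differ. */
        return NvMedia2DBlitEx(i2d, dst, dstRect, src, srcRect, NULL, NULL);
    }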
Image Encode Processing (IEP)
The NvMedia Image Encode (IEP) component supports encoding the incoming NvMedia Image (YUV or RGB) inputs to H.264, H.265, or JPEG formats.
Consult Video Encode for a list of supported features.
Image Display Processing (IDP)
The NvMedia Image Display (IDP) component displays YUV and RGB formatted images. It provides mechanisms to render YUV and RGBA surfaces on the display. The display can be selected among the available/connected displays (platform dependent).
Image Processing Pipeline (IPP)
The NvMedia Image Processing Pipeline (IPP) framework provides high dynamic range (HDR) camera processing which outputs images for human and machine vision. It handles individual camera processing or multiple cameras connected to an image aggregator chip.
NvMedia IPP connects individual image components which operate inside the NvMedia Image domain. These components include:
Image Capture Processing (ICP)
Image Signal Processing (ISP)
Image Sensor Control (ISC)
Control Algorithm
NvMedia IPP components interconnect like a graph, forming a processing pipeline. NvMedia IPP uses image buffers and queues between components to send processed images to the next component. Each component maintains its own buffers, called the buffer pool. The IPP framework creates and manages threads for each component.
ISP also outputs image statistics that the Control Algorithm component uses to calculate the proper ISP and sensor exposure settings to achieve proper auto white balance and auto exposure.
EGLStream Interoperability
The NvMedia EGLStream component supports GL/CUDA interoperability with the NvMedia domain. EGLStream provides the interface to post or retrieve raw YUV or RGB images, and provides the channel of communication that connects the NvMedia domain with the GL/CUDA domain. Any NvMedia surface can be rendered on the screen by any type of consumer using EGLStream.
NvMedia EGLStream component provides the following features:
Supports the Khronos EGLStream specification
Maps camera input image data to CPU
Provides images to GL as textures
Provides images to CUDA as cudaPtr or cudaArray
Requires no extra memory copy for GPU processing
Provides multi-threaded or multi-process sharing
Provides ‘mailbox’ or ‘fifo’ mode of transfer
This interoperability feature of NvMedia EGLStream makes it useful in many applications. You can easily implement NvMedia image and video producers/consumers that interact with GL/CUDA producers/consumers.
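As an outline of that wiring, the sketch below connects an NvMedia video producer to an already-created EGLStream whose GL or CUDA consumer is connected. The NvMediaEglStreamProducer call names and the NvMediaSurfaceType_Video constant are assumptions modeled on nvmedia_eglstream.h; fetching eglCreateStreamKHR through eglGetProcAddress is omitted.

    /* Minimal sketch: NvMedia video producer feeding a GL/CUDA consumer.
     * Call names are assumptions modeled on nvmedia_eglstream.h. */
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <nvmedia_eglstream.h>

    void PostFrame(NvMediaDevice *device, EGLDisplay dpy,
                   EGLStreamKHR stream, NvMediaVideoSurface *surface,
                   uint16_t width, uint16_t height)
    {
        /* Connect the NvMedia side as the stream's producer; the GL
         * texture or CUDA consumer must already be connected. */
        NvMediaEGLStreamProducer *producer =
            NvMediaEglStreamProducerCreate(device, dpy, stream,
                                           NvMediaSurfaceType_Video,
                                           width, height);
        if (producer == NULL)
            return;

        /* Post a frame: in 'mailbox' mode the newest frame replaces any
         * unconsumed one; in 'fifo' mode frames are delivered in order. */
        NvMediaEglStreamProducerPostSurface(producer, surface, NULL);

        /* Reclaim surfaces the consumer has released, for reuse. */
        NvMediaVideoSurface *released = NULL;
        NvMediaEglStreamProducerGetSurface(producer, &released, 0 /* ms */);

        NvMediaEglStreamProducerDestroy(producer);
    }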
NvMedia Image Sensor Control (ISC)
The NvMedia ISC provides a framework for image sensor control. This includes programming I2C-controlled external components such as aggregators and image sensors. It provides the following features:
Supports addition of custom drivers
Sends control commands to hardware devices related to the image sensor
Reports errors during image capture
Powers on/off the cameras
Supports debugfs, including power on/off and power status checks
NvMedia ISC exposes a user space interface that supports configuring and controlling the sensors, aggregators, and serializers. In addition, it can turn on and configure the camera inputs and respond to interrupts.