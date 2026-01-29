This chapter explains how to use the PyNvVideoCodec APIs for video decode, encode, and transcode workflows. The chapter also covers how PyNvVideoCodec can exchange video data with popular deep learning frameworks, enabling smooth integration of PyNvVideoCodec into AI and computer-vision pipelines.

Extract encoded video packets from container formats using PyNvVideoCodec's demuxing capabilities.

In this section, we'll learn how to extract encoded video packets from container formats like MP4, MKV, and AVI using PyNvVideoCodec's demuxing APIs. Demuxing is the first step when working with the low-level decoder APIs.

Demuxing (demultiplexing) is the process of extracting encoded video packets from container formats. A container format (like MP4 or MKV) wraps the actual video bitstream along with metadata, audio streams, and other data. The demuxer parses this container and provides individual encoded video packets that can be fed to a decoder.



Demuxing is required when using the low-level CreateDecoder API. If you're using SimpleDecoder or ThreadedDecoder, demuxing is handled automatically for you.

Use explicit demuxing when you need:

Fine-grained control over packet processing

Access to packet-level metadata (PTS, DTS, flags)

Custom streaming or network-based video sources

SEI message extraction during decoding

PyNvVideoCodec provides two ways to demux video data:

File-based demuxing reads directly from video files on disk. This is the simplest approach for processing local files and supports seeking.

Buffer-based demuxing reads from memory buffers via a callback function. This approach is useful for network streaming, encrypted content, or any scenario where video data is already in memory.



Choose the demuxing approach that fits your use case:

Demuxing from File - For processing local video files

Demuxing from Memory - For streaming and custom data sources

Extract encoded video packets from local video files using file-based demuxing.

Example

The following example demonstrates the complete decode pipeline:

Video File → Demuxer → Packets → Decoder → Raw Frames

Step 1: Create the Demuxer

Import PyNvVideoCodec and create a demuxer by passing the path to your video file:

    import PyNvVideoCodec as nvc

    # Create demuxer to read video file
    nv_dmx = nvc.CreateDemuxer(filename="input.mp4")

Step 2: Query Stream Properties

The demuxer exposes stream metadata that you can use to configure the decoder or for display purposes:

    # Query stream properties for decoder setup
    print("FPS:", nv_dmx.FrameRate())
    print("Resolution:", nv_dmx.Width(), "x", nv_dmx.Height())

Step 3: Create the Decoder

Create a hardware decoder using the codec information from the demuxer. The GetNvCodecId() method returns the codec type detected in the video stream:

    # Create decoder using demuxer's codec information
    nv_dec = nvc.CreateDecoder(
        gpuid=0,
        codec=nv_dmx.GetNvCodecId(),
        usedevicememory=True
    )

Step 4: Iterate and Decode

The demuxer is iterable. Loop over it to retrieve packets, then pass each packet to the decoder. The decoder may return zero, one, or multiple frames per packet (due to B-frame reordering):

    # Iterate over packets and decode
    for packet in nv_dmx:
        # Decode returns a list of frames (0 to N depending on B-frame reordering)
        for decoded_frame in nv_dec.Decode(packet):
            # Process frame - access via CUDA Array Interface
            frame_ptr = decoded_frame.cuda()
            # ... process frame data ...





Note

The demuxer uses FFmpeg internally for container parsing.

Seeking accuracy depends on keyframe placement in the video. The demuxer seeks to the nearest keyframe before the requested timestamp.

The decoder may buffer frames internally for B-frame reordering. After processing all packets, call Flush() on the decoder to retrieve remaining buffered frames.

For buffer-based demuxing (streaming, network sources), see Demuxing from Memory.

APIs Used

The following APIs are used in this example:

CreateDemuxer() – Create a demuxer from a video file

Demuxer.FrameRate() – Get the video frame rate

Demuxer.Width() / Height() – Get the video dimensions

Demuxer.GetNvCodecId() – Get the codec identifier for decoder creation

CreateDecoder() – Create a hardware decoder

Demuxer Iterator – Iterate over video packets

Decoder.Decode() – Decode a packet and return frames

Sample Applications

See these sample applications in the samples/advanced/ directory:

decode.py – Basic video decoding using demuxer and native decoder. Demonstrates the complete pipeline from file to raw YUV frames.

decode_with_cuda_control.py – Decoding with explicit CUDA context and stream management for advanced GPU control.

decode_with_low_latency.py – Low-latency decoding modes for real-time applications.

Process video data directly from memory buffers using buffer-based demuxing.

Example

The following example demonstrates buffer-based demuxing where video data is read from memory instead of directly from a file:

Memory Buffer → Data Feeder → Demuxer → Packets → Decoder → Raw Frames

Step 1: Create a Data Feeder Class

Create a class that reads video data into memory and provides a callback method to feed chunks to the demuxer:

    class VideoStreamFeeder:
        """Class to handle feeding video data in chunks to the demuxer."""

        def __init__(self, file_path):
            # Read entire file into memory buffer
            with open(file_path, 'rb') as f:
                self.video_buffer = bytearray(f.read())
            self.current_pos = 0
            self.bytes_remaining = len(self.video_buffer)

        def feed_chunk(self, demuxer_buffer):
            """Feed next chunk of video data to demuxer buffer.

            Returns:
                Number of bytes copied, 0 if no more data (EOF)
            """
            buffer_capacity = len(demuxer_buffer)
            chunk_size = min(self.bytes_remaining, buffer_capacity)
            if chunk_size == 0:
                return 0  # Signal end of stream

            # Copy data to demuxer buffer
            demuxer_buffer[:] = self.video_buffer[self.current_pos:self.current_pos + chunk_size]
            self.current_pos += chunk_size
            self.bytes_remaining -= chunk_size
            return chunk_size

Step 2: Create the Buffer-Based Demuxer

Pass the callback function to CreateDemuxer() instead of a filename. The demuxer will call this function whenever it needs more data:

    import PyNvVideoCodec as nvc

    # Create data feeder with video file loaded into memory
    data_feeder = VideoStreamFeeder("input.mp4")

    # Create demuxer using the callback function
    buffer_demuxer = nvc.CreateDemuxer(data_feeder.feed_chunk)

Step 3: Create the Decoder

Create a hardware decoder using the codec information from the demuxer, the same as file-based demuxing:

    # Create decoder using demuxer's codec information
    buffer_decoder = nvc.CreateDecoder(
        gpuid=0,
        codec=buffer_demuxer.GetNvCodecId(),
        cudacontext=0,
        cudastream=0,
        usedevicememory=True
    )

Step 4: Iterate and Decode

The demuxer is iterable. Loop over it to retrieve packets, then pass each packet to the decoder:

    # Iterate over packets and decode
    for packet in buffer_demuxer:
        for decoded_frame in buffer_decoder.Decode(packet):
            # Process frame - access via CUDA Array Interface
            frame_ptr = decoded_frame.cuda()
            # ... process frame data ...





Note

The callback function receives a pre-allocated buffer from the demuxer and must return the number of bytes copied.

Return 0 from the callback to signal end of stream.

This approach is useful for network streaming, encrypted content, or video data from databases.

The decode pipeline after demuxer creation is identical to file-based demuxing.
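When the data comes from a file-like source rather than a preloaded buffer, the callback contract described in these notes (fill the supplied buffer, return the byte count, return 0 at end of stream) can be sketched as follows. `make_feeder` and `_drain` are hypothetical names: `_drain` only simulates the demuxer side so the snippet runs on its own. The sketch assumes the demuxer passes a writable, sliceable buffer, and it writes only the first `len(chunk)` bytes, which differs from the full-slice assignment used in `VideoStreamFeeder`; which form the real buffer expects should be checked against the library's sample.

```python
import io


def make_feeder(stream):
    """Wrap a binary file-like object as a feed callback.

    The returned function copies the next chunk into the supplied buffer
    and returns the number of bytes copied, or 0 at end of stream.
    """
    def feed_chunk(demuxer_buffer):
        chunk = stream.read(len(demuxer_buffer))
        demuxer_buffer[:len(chunk)] = chunk
        return len(chunk)
    return feed_chunk


def _drain(feed, capacity):
    """Stand-in for the demuxer side (hypothetical): pull until 0 is returned."""
    buffer = bytearray(capacity)
    received = bytearray()
    while True:
        n = feed(buffer)
        if n == 0:
            break
        received += buffer[:n]
    return bytes(received)


payload = bytes(range(256)) * 10           # 2560 bytes of dummy data
feed = make_feeder(io.BytesIO(payload))
result = _drain(feed, capacity=1000)
print(len(result), result == payload)
```

Because the feeder only needs `read()`, the same wrapper could sit in front of any binary stream that supports it, such as a file object wrapping a socket.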

APIs Used

The following APIs are used in this example:

CreateDemuxer(callback) – Create a demuxer using a callback function for buffer-based input

Demuxer.GetNvCodecId() – Get the codec identifier for decoder creation

CreateDecoder() – Create a hardware decoder

Demuxer Iterator – Iterate over video packets

Decoder.Decode() – Decode a packet and return frames

Sample Applications

See this sample application for a complete implementation:

decode_from_memory_buffer.py – Demonstrates buffer-based demuxing with a VideoStreamFeeder class that reads video data into memory and feeds chunks to the demuxer through a callback

Query video stream metadata using PyNvVideoCodec's demuxer and decoder APIs.

PyNvVideoCodec provides APIs to query video stream metadata including resolution, codec, frame rate, duration, and more. This metadata is useful for configuring processing pipelines and understanding video properties.

APIs

The following APIs are available for querying stream metadata:

SimpleDecoder

get_stream_metadata() – Get basic stream metadata (codec, resolution, frame rate, duration)

get_scanned_stream_metadata() – Get accurate metadata by scanning the entire video file

ThreadedDecoder

get_stream_metadata() – Get basic stream metadata

get_scanned_stream_metadata() – Get accurate metadata by scanning

Demuxer

FrameRate() – Get video frame rate

Width() / Height() – Get video dimensions

GetNvCodecId() – Get codec identifier

ChromaFormat() – Get chroma subsampling format

BitDepth() – Get bit depth
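As a rough illustration of how the Demuxer query methods listed above might be gathered in one place, the sketch below collects them into a plain dictionary. `describe_stream` is a hypothetical helper, and `_StubDemuxer` is a stand-in with placeholder return values so the snippet runs without a GPU or a video file; the real methods may return enum or numeric types rather than the strings used here.

```python
def describe_stream(demuxer):
    """Collect the demuxer's stream properties into a plain dict."""
    return {
        "codec": demuxer.GetNvCodecId(),
        "width": demuxer.Width(),
        "height": demuxer.Height(),
        "fps": demuxer.FrameRate(),
        "chroma_format": demuxer.ChromaFormat(),
        "bit_depth": demuxer.BitDepth(),
    }


class _StubDemuxer:
    """Stand-in (hypothetical); return values are placeholders, not real output."""

    def GetNvCodecId(self):
        return "H264"

    def Width(self):
        return 1920

    def Height(self):
        return 1080

    def FrameRate(self):
        return 30.0

    def ChromaFormat(self):
        return "420"

    def BitDepth(self):
        return 8


info = describe_stream(_StubDemuxer())
print(f"{info['width']}x{info['height']} @ {info['fps']} fps, {info['bit_depth']}-bit")
```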

PyNvVideoCodec provides robust hardware-accelerated video decoding capabilities, leveraging NVIDIA GPUs to efficiently decode various video formats. This section introduces three decoder interfaces, each optimized for specific use cases, and explains how to use them for frame sampling and decoding.

Understand the different decoder interfaces available in PyNvVideoCodec and how to choose the right one for your use case.

PyNvVideoCodec provides two high-level decoder interfaces optimized for common use cases. For advanced scenarios requiring fine-grained control, a low-level decoding API is also available.

Available Decoder Interfaces

SimpleDecoder

The SimpleDecoder is a high-level interface designed for ease of use. It provides built-in demuxing, frame indexing, and random access capabilities.

ThreadedDecoder

The ThreadedDecoder is optimized for maximum throughput in batch processing scenarios. It uses internal threading to overlap decoding with frame processing.



Low-Level Decoding API

For advanced scenarios requiring fine-grained control, use CreateDecoder() to create a native decoder. This requires explicit demuxing but offers control over packet processing, SEI message extraction, low-latency modes, and resolution reconfiguration. See Core Decoder for Low-Level Control for details.

Learn how to efficiently sample frames from videos for deep learning training and inference using PyNvVideoCodec's SimpleDecoder.

The SimpleDecoder provides a powerful and flexible interface for frame sampling from video datasets. It supports multiple access patterns optimized for different deep learning workflows, from training data preparation to real-time inference.

Example

The following example demonstrates multi-file video decoding with frame sampling and PyTorch tensor conversion:

Video Files → SimpleDecoder → Frame Sampling → PyTorch Tensors

Step 1: Create the SimpleDecoder

Create a SimpleDecoder with RGB output format for deep learning workflows.

    import PyNvVideoCodec as nvc

    decoder = nvc.SimpleDecoder(
        video_path,
        gpu_id=0,
        use_device_memory=True,
        output_color_type=nvc.OutputColorType.RGB  # RGB format for DL
    )

Step 2: Get Total Frame Count

Use len() to get the total number of frames in the video:

    # Get total frames in the video
    total_frames = len(decoder)
    print(f"Video has {total_frames} frames")

Step 3: Calculate Sample Indices

Create evenly spaced frame indices across the video duration for balanced sampling:

    import numpy as np

    # Sample frames evenly across the video
    num_frames = 16  # Number of frames to sample
    frame_indices = np.linspace(0, total_frames-1, num_frames, dtype=int).tolist()
    print(f"Sampling frames at indices: {frame_indices}")
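One edge case is worth keeping in mind: when a clip has fewer frames than the requested sample count, evenly spaced integer indices repeat. Whether duplicates matter depends on the workflow. The following is a minimal pure-Python sketch of an alternative that returns unique indices; `even_indices` is a hypothetical helper, not part of PyNvVideoCodec, and its rounding can differ slightly from `np.linspace`.

```python
def even_indices(total_frames, num_samples):
    """Return up to num_samples evenly spaced, unique frame indices."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))    # short clip: take every frame once
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    # Integer arithmetic keeps both endpoints exact: first is 0, final is `last`.
    return [i * last // (num_samples - 1) for i in range(num_samples)]


print(even_indices(100, 5))   # [0, 24, 49, 74, 99]
print(even_indices(3, 16))    # [0, 1, 2]
```

The resulting list can be passed wherever a list of frame indices is expected.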

Step 4: Get Batch Frames by Index

Use get_batch_frames_by_index() to retrieve specific frames in one operation:

    # Get batch of frames by indices
    decoded_frames = decoder.get_batch_frames_by_index(frame_indices)

Step 5: Convert to PyTorch Tensors

Convert decoded frames to PyTorch tensors using DLPack for zero-copy transfer:

    import torch

    # Convert frames to torch tensors
    frames_tensor = torch.stack([
        torch.from_dlpack(frame) for frame in decoded_frames
    ])
    print(f"Tensor shape: {frames_tensor.shape}")  # [N, H, W, C]

Step 6: Reconfigure Decoder for Multiple Videos

Reuse the decoder for subsequent videos using reconfigure_decoder():

    # Process multiple video files efficiently
    video_files = ['video1.mp4', 'video2.mp4', 'video3.mp4']

    for i, video_file in enumerate(video_files):
        if i == 0:
            # First video - decoder already created
            pass
        else:
            # Reconfigure decoder for subsequent files
            torch.cuda.current_stream().synchronize()
            decoder.reconfigure_decoder(video_file)

        # Process frames from current video
        total_frames = len(decoder)
        frame_indices = np.linspace(0, total_frames-1, num_frames, dtype=int).tolist()
        decoded_frames = decoder.get_batch_frames_by_index(frame_indices)
        # ... process frames ...





Frame Access Patterns

SimpleDecoder supports multiple frame fetching patterns:

Single Frame: decoder[10] – Access frame at index 10

Slice: decoder[0:100:5] – Get every 5th frame from index 0 up to, but not including, index 100

Sequential Batch: decoder.get_batch_frames(16) – Get 16 consecutive frames

Indexed Batch: decoder.get_batch_frames_by_index([0, 10, 20]) – Get specific frames
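To see which frames a slice such as `decoder[0:100:5]` would select, the slice can be expanded into explicit indices with Python's built-in `slice.indices()`. This sketch assumes SimpleDecoder follows standard Python slice semantics, which this chapter does not state explicitly; `slice_to_indices` is a hypothetical helper.

```python
def slice_to_indices(s, total_frames):
    """Expand a slice into explicit frame indices, clamped to the video length."""
    return list(range(*s.indices(total_frames)))


print(slice_to_indices(slice(0, 100, 5), 300)[:4])   # [0, 5, 10, 15]
print(slice_to_indices(slice(0, 100, 5), 300)[-1])   # 95
print(slice_to_indices(slice(0, 100, 5), 42)[-1])    # 40
```

The last line shows the clamping: for a 42-frame video the slice stops at index 40 rather than running past the end.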

Note

SimpleDecoder requires seekable container formats (MP4, MKV, AVI). Elementary streams are not supported.

Use output_color_type=nvc.OutputColorType.RGBP for planar CHW format (common in PyTorch models).

Call torch.cuda.current_stream().synchronize() before reconfiguring to ensure all GPU operations complete.

APIs Used

The following APIs are used in this example:

SimpleDecoder() – Constructor with all parameters

len(decoder) – Get total frame count

decoder[index] – Single frame and slice access

get_batch_frames() – Get sequential batch of frames

get_batch_frames_by_index() – Get frames by specific indices

seek_to_index() – Seek to specific frame position

get_index_from_time_in_seconds() – Convert time to frame index

get_stream_metadata() – Get basic stream metadata

get_scanned_stream_metadata() – Get accurate metadata by scanning

reconfigure_decoder() – Reconfigure for different video

Sample Applications

PyNvVideoCodec includes sample applications demonstrating SimpleDecoder usage:

simple_decode_sampling.py – Multi-file video decoding with frame sampling and PyTorch tensor conversion

simple_decode_tutorial.ipynb – Interactive Jupyter notebook tutorial covering multiple frame access methods

These samples can be found in the samples/ directory.

Efficiently process multiple video files by reusing decoder instances with SimpleDecoder's built-in caching mechanism.

When processing multiple video files, creating a new decoder for each video introduces significant overhead. SimpleDecoder addresses this with decoder caching - an LRU (Least Recently Used) cache that stores and reuses decoder instances based on video properties.

Example

The following example demonstrates efficient processing of multiple video files using decoder caching:

Video Files → SimpleDecoder (with cache) → Reconfigure → Process Next Video

Step 1: Create SimpleDecoder with Caching Parameters

Configure the decoder with max_width, max_height, and decoder_cache_size to enable caching across multiple videos:

    import PyNvVideoCodec as nvc

    # Create decoder with caching enabled
    decoder = nvc.SimpleDecoder(
        "video1.mp4",
        gpu_id=0,
        use_device_memory=True,
        max_width=2048,
        max_height=2048,
        decoder_cache_size=4  # Cache up to 4 decoder instances
    )

Step 2: Process First Video

Decode frames from the first video using any of SimpleDecoder's access methods:

    # Get total frames and process
    total_frames = len(decoder)
    print(f"Video 1 has {total_frames} frames")

    # Access frames using indexing
    frames = decoder[0:10]  # Get first 10 frames

Step 3: Reconfigure for Next Video

Use reconfigure_decoder() to switch to a new video source. If the new video's properties match a cached decoder, it will be reused:

    # Reconfigure decoder for next video
    decoder.reconfigure_decoder("video2.mp4")

    # Process the new video
    total_frames = len(decoder)
    print(f"Video 2 has {total_frames} frames")
    frames = decoder[0:10]  # Get first 10 frames

Step 4: Process Multiple Videos in a Loop

Efficiently process a batch of video files:

    video_files = ["video1.mp4", "video2.mp4", "video3.mp4"]

    for i, video_file in enumerate(video_files):
        if i == 0:
            # First video - decoder already created
            pass
        else:
            # Reconfigure for subsequent videos
            decoder.reconfigure_decoder(video_file)

        # Process frames from current video
        total_frames = len(decoder)
        frames = decoder[0:16]  # Sample first 16 frames
        print(f"Processed {len(frames)} frames from {video_file}")





Cache Behavior

The decoder cache uses an LRU (Least Recently Used) eviction policy:

Lookup: When reconfiguring, SimpleDecoder checks the cache for a decoder matching the new video's properties

Reuse: If a matching decoder is found (cache hit), it's reused immediately

Create: If no match is found (cache miss), a new decoder is created

Eviction: If the cache is full, the least recently used decoder is removed

Cache Key Properties:

Decoders are matched based on:

Video codec (H.264, HEVC, VP9, AV1)

Bit depth (8-bit, 10-bit, 12-bit)

Chroma format (4:2:0, 4:2:2, 4:4:4)

Resolution within max_width and max_height
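The lookup, reuse, create, and evict steps described above follow the usual LRU pattern, which can be sketched with `collections.OrderedDict`. This is a toy model for illustration only, not SimpleDecoder's actual implementation: `DecoderCache`, `make_decoder`, and the tuple key `(codec, bit_depth, chroma_format)` are assumptions, and a plain `object()` stands in for a decoder.

```python
from collections import OrderedDict


class DecoderCache:
    """Toy LRU cache keyed by (codec, bit_depth, chroma_format); capacity >= 1."""

    def __init__(self, capacity, factory):
        self.capacity = capacity
        self.factory = factory             # called on a cache miss to build a decoder
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)      # hit: mark as most recently used
            return self._entries[key]
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)   # full: evict least recently used
        decoder = self._entries[key] = self.factory(key)
        return decoder


created = []


def make_decoder(key):
    """Record each creation so cache misses are visible."""
    created.append(key)
    return object()


cache = DecoderCache(capacity=2, factory=make_decoder)
cache.get(("H264", 8, "420"))    # miss: created
cache.get(("HEVC", 10, "420"))   # miss: created
cache.get(("H264", 8, "420"))    # hit: reused
cache.get(("AV1", 8, "420"))     # miss: evicts the HEVC entry
print(created)
```

Only three decoders are created for four requests, and the HEVC entry is the one evicted because the H.264 entry was touched more recently.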

Note

Set max_width and max_height to the largest resolution you expect to process for maximum cache reuse.

Increase decoder_cache_size if processing videos with different codecs or bit depths.

Videos with the same codec, bit depth, and chroma format will share cached decoders.

Cache is automatically managed - no manual cleanup required.

APIs Used

The following APIs are used in this example:

SimpleDecoder() – Constructor with caching parameters

len(decoder) – Get total frame count

decoder[index] – Frame access using indexing

reconfigure_decoder() – Switch to a different video source

Sample Applications

See this sample application for a complete implementation:

simple_decode_sampling.py – Multi-file video decoding with decoder reconfiguration

ThreadedDecoder enables background frame decoding on a dedicated thread, ensuring a continuous supply of ready-to-process frames for inference pipelines.

ThreadedDecoder continuously decodes frames in the background and maintains a preloaded buffer of ready-to-use frames. Because decoding runs concurrently with inference, decode latency can be hidden behind inference time.
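The general idea of background prefetching can be illustrated with a bounded queue and a producer thread from the standard library. This is a conceptual sketch only and says nothing about how ThreadedDecoder is implemented internally; `prefetch` is a hypothetical helper, and plain integers stand in for decoded frames.

```python
import queue
import threading

_DONE = object()   # sentinel marking end of stream


def prefetch(source, buffer_size):
    """Yield items from `source`, produced ahead of time on a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)   # bounded: producer blocks when full

    def producer():
        for item in source:
            buffer.put(item)
        buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


# Stand-in for decoded frames: integers instead of GPU surfaces.
for frame in prefetch(range(5), buffer_size=2):
    print("processing", frame)
```

While the consumer is busy with one item, the producer keeps filling the queue up to `buffer_size`, so the next item is usually ready as soon as it is requested.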

Example

The following example demonstrates ThreadedDecoder usage for video analytics pipelines:

Video File → ThreadedDecoder (Background Prefetch) → Batched Frames → PyTorch Tensors

Step 1: Import Required Modules

Import ThreadedDecoder and OutputColorType from PyNvVideoCodec, along with PyCUDA for GPU context management:

    from PyNvVideoCodec import ThreadedDecoder, OutputColorType
    import pycuda.driver as cuda
    from pycuda.autoinit import context
    import torch





Output Color Formats

Choose the output format based on your model requirements:

OutputColorType.RGBP – Planar RGB (CHW format). Preferred for most PyTorch/TensorFlow models.

OutputColorType.RGB – Interleaved RGB (HWC format). Use when your pipeline expects HWC layout.

OutputColorType.NV12 – Native decoder output. Most efficient if your pipeline can handle YUV.

Note

ThreadedDecoder prefetches frames in the background, so get_batch_frames() returns immediately with already-decoded frames.

An empty list from get_batch_frames() indicates end of video.

Use torch.cuda.current_stream().synchronize() before reconfiguring to ensure all GPU operations complete.

For random access patterns, consider SimpleDecoder instead.

Step 2: Create the ThreadedDecoder

Initialize ThreadedDecoder with the video path, buffer size, and output color format. Use OutputColorType.RGBP (planar RGB in CHW format) for deep learning models:

    # Configure decoder parameters
    color_format = OutputColorType.RGBP  # Planar RGB (CHW) for DL models
    batch_size = 3  # Process 3 frames at a time

    # Initialize ThreadedDecoder
    decoder = ThreadedDecoder(
        enc_file_path="input.mp4",  # Input video path
        buffer_size=12,  # Number of frames to prefetch
        gpu_id=0,  # GPU device ID
        use_device_memory=True,  # Keep frames in GPU memory
        output_color_type=color_format
    )

Step 3: Get Stream Metadata

Query the video stream metadata to get the total number of frames and other properties:

    # Get video information
    metadata = decoder.get_stream_metadata()
    num_frames = metadata.num_frames
    print(f"Video has {num_frames} frames")

Step 4: Process Frames in Batches

Use get_batch_frames() to retrieve prefetched frames. Convert to PyTorch tensors using DLPack for zero-copy transfer:

    # Process video frames in batches
    frame_count = 0
    while frame_count < num_frames:
        # Get batch of prefetched frames (returns immediately)
        frames = decoder.get_batch_frames(batch_size)
        if len(frames) == 0:
            break

        # Convert frames to PyTorch tensors
        for frame in frames:
            tensor = torch.from_dlpack(frame)
            # tensor shape: [C, H, W] for RGBP, [H, W, C] for RGB

            # Normalize for model input
            normalized = tensor.float() / 255.0
            # ... run inference with your model ...

        frame_count += len(frames)

Step 5: Reconfigure for Multiple Videos

Reuse the decoder for subsequent videos using reconfigure_decoder():

    # Process multiple video files efficiently
    video_files = ['video1.mp4', 'video2.mp4', 'video3.mp4']

    for i, video_file in enumerate(video_files):
        if i == 0:
            # First video - decoder already created
            pass
        else:
            # Reconfigure decoder for subsequent files
            torch.cuda.current_stream().synchronize()
            decoder.reconfigure_decoder(video_file)

        # Process frames from current video
        metadata = decoder.get_stream_metadata()
        while True:
            frames = decoder.get_batch_frames(batch_size)
            if len(frames) == 0:
                break
            # ... process frames ...





Buffer Size Selection

The buffer_size parameter controls how many frames are prefetched in the background:

Recommended: 2-3x your batch size (for example, with batch_size=4, use a buffer_size between 8 and 12)

Larger buffers provide more cushion for variable inference times but consume more GPU memory

Smaller buffers reduce memory usage but may cause stalls if inference is slower than decoding
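The 2-3x rule of thumb and its memory cost can be put into numbers with a small sketch. Both helpers are hypothetical, and the memory figure is a lower-bound estimate that assumes 8-bit RGB output at 3 bytes per pixel with no row padding; actual GPU allocations may be larger.

```python
def suggested_buffer_sizes(batch_size):
    """Return the (low, high) prefetch sizes for the 2-3x rule of thumb."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 2 * batch_size, 3 * batch_size


def buffer_bytes(buffer_size, width, height, bytes_per_pixel=3):
    """Estimate the memory held by a buffer of unpadded frames."""
    return buffer_size * width * height * bytes_per_pixel


low, high = suggested_buffer_sizes(4)
print(low, high)                                  # 8 12
print(buffer_bytes(high, 1920, 1080) / 1e6, "MB")
```

For a 1080p stream, 12 prefetched RGB frames come to roughly 75 MB under these assumptions, which gives a sense of the trade-off when raising buffer_size.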

APIs Used

The following APIs are used in this example:

ThreadedDecoder() – Constructor with all parameters

get_stream_metadata() – Get video stream metadata

get_batch_frames() – Get batch of prefetched frames

reconfigure_decoder() – Reconfigure for different video

Sample Applications

PyNvVideoCodec includes sample applications demonstrating ThreadedDecoder usage:

object_detection_tutorial.ipynb – Interactive Jupyter notebook demonstrating ThreadedDecoder integration with Faster R-CNN model for real-time object detection

These samples are located in the samples/jupyter/ directory.

The Core Decoder provides direct access to NVDEC hardware for fine-grained control over video decoding operations.

The Core Decoder (also known as the native decoder) is the low-level decoding interface that gives you complete control over the decode pipeline. Unlike SimpleDecoder and ThreadedDecoder which handle demuxing internally, the Core Decoder requires explicit demuxing and packet management.

When to Use Core Decoder

Use the Core Decoder when you need:

SEI message extraction: Access to Supplemental Enhancement Information embedded in the video stream

Low-latency decoding: Control over decode latency modes for real-time applications

Resolution reconfiguration: Switch between videos with different resolutions without recreating the decoder

Packet-level control: Fine-grained control over individual packet processing

Custom streaming sources: Decode from network streams or memory buffers

Decode statistics: Extract QP values, coding-unit types, and motion vectors

Decode Pipeline

The Core Decoder pipeline requires explicit management of each stage:

Video File → Demuxer → Packets → Core Decoder → Raw Frames

You must create a demuxer to extract packets from the container format, then feed those packets to the decoder. This separation provides flexibility but requires more code than the high-level interfaces.



Example

The following example demonstrates the complete Core Decoder workflow:

    import PyNvVideoCodec as nvc

    # Step 1: Create demuxer to read video file
    nv_dmx = nvc.CreateDemuxer(filename="input.mp4")

    # Step 2: Query stream properties
    print(f"Resolution: {nv_dmx.Width()}x{nv_dmx.Height()}")
    print(f"Codec: {nv_dmx.GetNvCodecId()}")
    print(f"FPS: {nv_dmx.FrameRate()}")

    # Step 3: Create Core Decoder using demuxer's codec information
    nv_dec = nvc.CreateDecoder(
        gpuid=0,
        codec=nv_dmx.GetNvCodecId(),
        usedevicememory=True
    )

    # Step 4: Iterate over packets and decode
    frame_count = 0
    for packet in nv_dmx:
        # Decode returns a list of frames (0 to N due to B-frame reordering)
        for decoded_frame in nv_dec.Decode(packet):
            # Access frame via CUDA Array Interface
            frame_ptr = decoded_frame.cuda()
            frame_count += 1
            # ... process frame data ...

    # Step 5: Flush remaining frames from decoder buffer
    for decoded_frame in nv_dec.Flush():
        frame_count += 1

    print(f"Decoded {frame_count} frames")





Resolution Reconfiguration

The Core Decoder supports dynamic resolution changes using setReconfigParams(). This allows you to decode multiple videos with different dimensions using a single decoder instance:

    # Create decoder with max dimensions to accommodate all streams
    nv_dec = nvc.CreateDecoder(
        gpuid=0,
        codec=codec_id,
        usedevicememory=True,
        maxwidth=3840,   # Maximum width across all videos
        maxheight=2160   # Maximum height across all videos
    )

    # Decode first video...

    # Reconfigure for second video with different dimensions
    nv_dec.setReconfigParams(new_width, new_height)

    # Continue decoding second video...
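When the set of input resolutions is known in advance, the maximum width and height can be computed rather than hard-coded. The sketch below uses a hypothetical helper, `max_dimensions`, and made-up resolutions. Note that the two maxima are taken independently, so a mix of landscape and portrait clips can require a bounding size that matches none of the individual videos.

```python
def max_dimensions(resolutions):
    """Return (max_width, max_height) over an iterable of (width, height) pairs."""
    widths, heights = zip(*resolutions)
    return max(widths), max(heights)


streams = [(1920, 1080), (1280, 720), (1080, 1920)]   # last entry is a portrait clip
print(max_dimensions(streams))   # (1920, 1920)
```

The resulting pair would then be supplied as the maximum width and height when the decoder is created.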





APIs Used

The following APIs are used with the Core Decoder:

CreateDemuxer() – Create a demuxer to extract packets

CreateDecoder() – Create the Core Decoder

Decoder.Decode() – Decode a packet and return frames

Decoder.Flush() – Flush remaining buffered frames

Decoder.setReconfigParams() – Reconfigure decoder for new resolution

Sample Applications

See these sample applications demonstrating Core Decoder usage:

decode.py – Basic decoding with Core Decoder

decode_with_cuda_control.py – Explicit CUDA context and stream management

decode_with_low_latency.py – Low-latency decoding modes

decode_reconfigure.py – Dynamic resolution reconfiguration

decode_sei_msg.py – SEI message extraction

Configure decode latency modes for real-time and low-latency video processing applications.

PyNvVideoCodec provides different latency modes for video decoding, which control the timing of when decoded frames are made available to the application. Understanding these modes is crucial for applications that require real-time or low-latency processing.

DisplayDecodeLatencyType Enumeration

The DisplayDecodeLatencyType enumeration defines three possible latency modes:

NATIVE: For a stream with B-frames, there is at least 1 frame latency between submitting an input packet and getting the decoded frame in display order.

LOW: For All-Intra and IPPP sequences (without B-frames), there is no latency between submitting an input packet and getting the decoded frame in display order. Do not use this flag if the stream contains B-frames. This mode maintains proper display ordering.

ZERO: Enables zero latency for All-Intra / IPPP streams. Do not use this flag if the stream contains B-frames. This mode maintains decode ordering.

Understanding Latency in H.264/HEVC Decoding

In H.264 and HEVC, there is an inherent display latency for video content with frame reordering (typically due to B-frames). Even for All-Intra and IPPP sequences, if num_reorder_frames is not explicitly set to 0 in the Video Usability Information (VUI), there can still be display latency. The LOW and ZERO latency modes help eliminate this latency for appropriate content types.
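The selection rules for the three modes can be condensed into a small decision helper. This is a sketch for illustration: `Latency` is a stand-in enum so the snippet runs without PyNvVideoCodec installed, and `choose_latency` is a hypothetical function. Knowing whether a stream contains B-frames is assumed to come from the application, for example from knowledge of the encoder settings.

```python
from enum import Enum


class Latency(Enum):
    """Stand-in for DisplayDecodeLatencyType (hypothetical, for illustration)."""
    NATIVE = "NATIVE"
    LOW = "LOW"
    ZERO = "ZERO"


def choose_latency(has_b_frames, need_display_order=True):
    """Pick a latency mode following the rules described for each mode."""
    if has_b_frames:
        return Latency.NATIVE          # LOW and ZERO must not be used with B-frames
    # Without B-frames: LOW keeps display order, ZERO keeps decode order.
    return Latency.LOW if need_display_order else Latency.ZERO


print(choose_latency(has_b_frames=True).name)                             # NATIVE
print(choose_latency(has_b_frames=False).name)                            # LOW
print(choose_latency(has_b_frames=False, need_display_order=False).name)  # ZERO
```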



Implementing Low-Latency Decoding

To achieve low-latency decoding, you need to:

Set the appropriate DisplayDecodeLatencyType when creating the decoder

For packets containing exactly one frame or field, set the ENDOFPICTURE flag to trigger immediate decode callback

Code Example:

    import PyNvVideoCodec as nvc

    # cuda_ctx, cuda_stream, demuxer, and process_frame are assumed to be
    # defined elsewhere in the application.
    decode_latency = nvc.DisplayDecodeLatencyType.LOW

    # Create a decoder with low latency mode
    nvdec = nvc.CreateDecoder(
        gpuid=0,
        codec=nvc.cudaVideoCodec.H264,
        cudacontext=cuda_ctx.handle,
        cudastream=cuda_stream.handle,
        latency=decode_latency
    )

    # When processing packets in low latency mode
    for packet in demuxer:
        # If using LOW or ZERO latency mode
        # and packet contains exactly one frame
        if decode_latency in (nvc.DisplayDecodeLatencyType.LOW,
                              nvc.DisplayDecodeLatencyType.ZERO):
            # Set flag to trigger decode callback immediately
            packet.decode_flag = nvc.VideoPacketFlag.ENDOFPICTURE

        # Decode the packet
        frames = nvdec.Decode(packet)
        for frame in frames:
            # Process frame here
            process_frame(frame)

Note: The ENDOFPICTURE flag is only effective for content without B-frames (All-Intra or IPPP sequences). For content with B-frames, some inherent latency will remain due to the nature of bidirectional prediction.





Sample Applications

See the following sample application for a complete low-latency decoding implementation:

decode_with_low_latency.py – Demonstrates all three latency modes with proper packet flag handling

Extract and process Supplemental Enhancement Information (SEI) messages from video streams.

SEI (Supplemental Enhancement Information) messages are metadata embedded in video bitstreams that provide additional information such as HDR metadata, timecode data, and custom application-specific data.

Example

The following example demonstrates SEI message extraction from a video file:

Video File → Demuxer → Decoder (SEI enabled) → Decoded Frames → SEI Messages

Step 1: Initialize CUDA Context

Initialize PyCUDA and create a CUDA context for GPU operations:

    import pycuda.driver as cuda
    import PyNvVideoCodec as nvc

    cuda.init()
    cuda_device = cuda.Device(0)
    cuda_ctx = cuda_device.retain_primary_context()
    cuda_ctx.push()
    cuda_stream = cuda.Stream()

Step 2: Create Demuxer

Create a demuxer to read the video file and extract encoded packets:

    # Create demuxer to read video file
    nv_dmx = nvc.CreateDemuxer(filename="input.mp4")
    print(f"FPS = {nv_dmx.FrameRate()}")

Step 3: Create Decoder with SEI Enabled

Create a decoder with enableSEIMessage=1 to enable SEI message extraction:

    # Create decoder with SEI extraction enabled
    nv_dec = nvc.CreateDecoder(
        gpuid=0,
        codec=nv_dmx.GetNvCodecId(),
        cudacontext=cuda_ctx.handle,
        cudastream=cuda_stream.handle,
        usedevicememory=True,
        enableSEIMessage=1  # Enable SEI message extraction
    )

Step 4: Decode and Extract SEI Messages

Iterate over packets, decode frames, and extract SEI messages using getSEIMessage():

    import ctypes

    # Decode and extract SEI messages
    for packet in nv_dmx:
        for decoded_frame in nv_dec.Decode(packet):
            # Get SEI messages from decoded frame
            seiMessage = decoded_frame.getSEIMessage()
            if seiMessage:
                for sei_info, sei_message in seiMessage:
                    sei_type = sei_info["sei_type"]
                    sei_uncompressed = sei_info["sei_uncompressed"]
                    print(f"SEI Type: {sei_type}, Size: {len(sei_message)} bytes")

Step 5: Parse SEI Message Types

Parse different SEI message types using ctypes structures. Common types include timecode, HDR metadata (mastering display, content light level), and alternative transfer characteristics:

    # This fragment runs inside the inner SEI loop from Step 4
    # Parse SEI based on type (when sei_uncompressed == 1)
    if sei_uncompressed == 1:
        buffer = (ctypes.c_ubyte * len(sei_message))(*sei_message)

        # Handle different SEI message types
        if sei_type in (nvc.SEI_TYPE.TIME_CODE_H264, nvc.SEI_TYPE.TIME_CODE):
            # Parse timecode structure
            pass
        elif sei_type == nvc.SEI_TYPE.MASTERING_DISPLAY_COLOR_VOLUME:
            # Parse HDR mastering display info
            pass
        elif sei_type == nvc.SEI_TYPE.CONTENT_LIGHT_LEVEL_INFO:
            # Parse content light level info
            pass
        elif sei_type == nvc.SEI_TYPE.ALTERNATIVE_TRANSFER_CHARACTERISTICS:
            # Parse alternative transfer characteristics
            pass
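As a rough illustration of the ctypes approach described in Step 5, the sketch below copies a byte payload into a small structure. The `ContentLightLevelInfo` layout (two native-endian 16-bit fields) and the helper `parse_sei_struct` are assumptions made for this example, not the layout PyNvVideoCodec guarantees; the decode_sei_msg.py sample is the place to check the actual structure definitions.

```python
import ctypes


class ContentLightLevelInfo(ctypes.Structure):
    # Hypothetical layout, for illustration only
    _fields_ = [
        ("max_content_light_level", ctypes.c_uint16),
        ("max_pic_average_light_level", ctypes.c_uint16),
    ]


def parse_sei_struct(struct_type, sei_message):
    """Copy an SEI payload (bytes or a list of ints) into a ctypes structure."""
    raw = bytes(sei_message)
    if len(raw) < ctypes.sizeof(struct_type):
        raise ValueError("SEI payload is shorter than the expected structure")
    return struct_type.from_buffer_copy(raw)


# Made-up payload standing in for an uncompressed SEI message
payload = bytes(ContentLightLevelInfo(1000, 400))
info = parse_sei_struct(ContentLightLevelInfo, payload)
print(info.max_content_light_level, info.max_pic_average_light_level)
```

Checking the payload length before copying avoids reading past the end of a truncated message, which matters because not every stream carries well-formed SEI data.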





Common SEI Types

PyNvVideoCodec provides constants for common SEI message types via nvc.SEI_TYPE :

TIME_CODE / TIME_CODE_H264 – Frame timing and sequence information

MASTERING_DISPLAY_COLOR_VOLUME – HDR color space and primaries

CONTENT_LIGHT_LEVEL_INFO – HDR brightness metadata

ALTERNATIVE_TRANSFER_CHARACTERISTICS – Transfer function characteristics

Note

SEI extraction requires using CreateDecoder with enableSEIMessage=1.

Not all videos contain SEI messages.

The sei_uncompressed flag indicates if the message can be parsed as a structured type.

For SEI message encoding, see SEI Message Encoding.

APIs Used

The following APIs are used in this example:

CreateDemuxer() – Create a demuxer from a video file

Demuxer.FrameRate() – Get the video frame rate

Demuxer.GetNvCodecId() – Get the codec identifier

CreateDecoder() – Create a hardware decoder with SEI enabled

Decoder.Decode() – Decode a packet and return frames

DecodedFrame.getSEIMessage() – Get SEI messages from decoded frame

Sample Applications

See this sample application for a complete implementation:

decode_sei_msg.py – Demonstrates SEI message extraction and parsing for various SEI types including timecode and HDR metadata

Extract low-level decoding statistics including QP values, coding unit types, and motion vectors for video analysis.

PyNvVideoCodec provides access to detailed decoding statistics. These statistics include QP (Quantization Parameter) values, CU (Coding Unit) types, and motion vectors for each macroblock.

Example

The following example demonstrates decode statistics extraction using SimpleDecoder:

Video File → SimpleDecoder (stats enabled) → Decoded Frames → Statistics

Step 1: Create SimpleDecoder with Statistics Enabled

Create a SimpleDecoder with enableDecodeStats=True to enable statistics collection:

    import PyNvVideoCodec as nvc

    # Create decoder with statistics collection enabled
    simple_decoder = nvc.SimpleDecoder(
        "input.mp4",
        need_scanned_stream_metadata=False,
        use_device_memory=True,
        gpu_id=0,
        enableDecodeStats=True  # Enable statistics collection
    )

Step 2: Get Stream Metadata

Query stream metadata for video information:

    # Get video metadata
    metadata = simple_decoder.get_stream_metadata()
    print(f"Video: {metadata.width}x{metadata.height}")

Step 3: Iterate and Extract Statistics

Iterate over decoded frames and check for available statistics using decode_stats_size:

    # Process frames and extract statistics
    for frame_idx, decoded_frame in enumerate(simple_decoder):
        # Check if statistics are available for this frame
        if hasattr(decoded_frame, 'decode_stats_size') and decoded_frame.decode_stats_size > 0:
            # Parse the statistics
            parsed_stats = decoded_frame.ParseDecodeStats()

            # Access statistics fields
            qp_values = parsed_stats.get("qp_luma", [])
            cu_types = parsed_stats.get("cu_type", [])

            if len(qp_values) > 0:
                avg_qp = sum(qp_values) / len(qp_values)
                print(f"Frame {frame_idx}: Avg QP = {avg_qp:.2f}")

Step 4: Analyze Statistics

The ParseDecodeStats() method returns a dictionary with the following fields:

    # Available statistics fields
    parsed_stats = decoded_frame.ParseDecodeStats()

    # QP Analysis - compression level per macroblock
    qp_luma = parsed_stats["qp_luma"]  # List of QP values (higher = more compression)

    # CU Type Distribution - prediction mode per macroblock
    # 0=INTRA, 1=INTER, 2=SKIP, 3=PCM, 7=INVALID
    cu_type = parsed_stats["cu_type"]

    # Motion Vectors - temporal prediction info
    mv0_x = parsed_stats["mv0_x"]  # L0 reference X component
    mv0_y = parsed_stats["mv0_y"]  # L0 reference Y component
    mv1_x = parsed_stats["mv1_x"]  # L1 reference X component (B-frames)
    mv1_y = parsed_stats["mv1_y"]  # L1 reference Y component (B-frames)
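One way to turn the fields above into per-frame summaries is sketched below. It runs on a synthetic dictionary rather than on real decoder output; the helper name `summarize_decode_stats` is hypothetical, and the motion vector units are whatever the decoder reports, since this section does not specify them.

```python
from collections import Counter
from math import hypot

# CU type codes as listed in this section
CU_TYPE_NAMES = {0: "INTRA", 1: "INTER", 2: "SKIP", 3: "PCM", 7: "INVALID"}


def summarize_decode_stats(parsed_stats):
    """Summarize one frame's statistics dictionary."""
    qp = list(parsed_stats.get("qp_luma", []))
    cu = list(parsed_stats.get("cu_type", []))
    mv_x = list(parsed_stats.get("mv0_x", []))
    mv_y = list(parsed_stats.get("mv0_y", []))

    avg_qp = sum(qp) / len(qp) if qp else None
    cu_counts = Counter(CU_TYPE_NAMES.get(t, "UNKNOWN") for t in cu)
    magnitudes = [hypot(x, y) for x, y in zip(mv_x, mv_y)]
    avg_mv = sum(magnitudes) / len(magnitudes) if magnitudes else None
    return {"avg_qp": avg_qp, "cu_counts": dict(cu_counts), "avg_mv_l0": avg_mv}


# Synthetic stand-in for decoded_frame.ParseDecodeStats()
fake_stats = {
    "qp_luma": [24, 26, 28, 30],
    "cu_type": [0, 1, 1, 2],
    "mv0_x": [0, 3, -3, 0],
    "mv0_y": [0, 4, 4, 0],
}
summary = summarize_decode_stats(fake_stats)
print(summary)
```

Returning `None` for empty lists keeps frames without statistics distinguishable from frames whose average happens to be zero.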





Note

Statistics collection must be enabled at decoder creation time with enableDecodeStats=True.

Enabling statistics incurs a small performance overhead.

Supported codecs: H.264 (AVC) and H.265 (HEVC).

Check decode_stats_size > 0 before calling ParseDecodeStats().

CU types: 0=INTRA (spatial prediction), 1=INTER (temporal prediction), 2=SKIP (copy from reference), 3=PCM (uncompressed).

APIs Used

The following APIs are used in this example:

SimpleDecoder() – Constructor with enableDecodeStats parameter

get_stream_metadata() – Get video stream metadata

decode_stats_size – Property indicating statistics data size (>0 if available)

ParseDecodeStats() – Parse statistics into a dictionary

Sample Applications

See this sample application for a complete implementation:

simple_decode_stats.py – SimpleDecoder-based statistics extraction with formatted output including QP analysis, CU type distribution, and motion vector statistics

This section provides an overview of the key workflows and features for video encoding, from basic frame encoding to advanced runtime configuration and metadata handling.

The encoder accepts raw frames from either CPU memory (numpy arrays) or GPU memory (CUDA buffers) and produces encoded bitstream data that can be written to files or streamed.



Basic Encoding Workflow – Step-by-step guide to encode raw frames to compressed video

Encoder Settings – Configure codec, bitrate, presets, and quality options

Encoder Reconfiguration – Change encoder parameters at runtime without recreating the session

SEI Message Encoding – Embed metadata and custom data in the bitstream

PyNvVideoCodec provides hardware-accelerated video encoding using NVIDIA GPUs. The encoder supports both CPU (host memory) and GPU (device memory) buffer modes.

Basic Encoding Workflow

The following steps demonstrate the complete encoding workflow:

Raw Frames → Buffer Preparation → Encoder → Encoded Bitstream

Step 1: Prepare Buffer for Encoding

Prepare input buffers based on your buffer mode. For CPU buffers, read raw YUV data into a numpy array. For GPU buffers, use CUDA device memory objects.

CPU Buffer Mode:

    import numpy as np

    # Frame dimensions (example values matching the encoder created in Step 2)
    width, height = 1920, 1080

    # Calculate frame size based on format (NV12 = width * height * 1.5 bytes)
    frame_size = int(width * height * 1.5)

    # Read raw YUV frame into numpy array
    with open("input.yuv", "rb") as dec_file:
        chunk = np.fromfile(dec_file, np.uint8, count=frame_size)
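The frame-size arithmetic generalizes to other input formats. The sketch below is a minimal helper, assuming tightly packed frames with no row padding; the format names are taken from the supported-formats note in this section, and `frame_size_bytes` is a hypothetical name, not a PyNvVideoCodec function.

```python
from fractions import Fraction

# Bytes per pixel for tightly packed frames (no pitch/row padding assumed)
BYTES_PER_PIXEL = {
    "NV12": Fraction(3, 2),       # 8-bit 4:2:0, semi-planar
    "YUV420": Fraction(3, 2),     # 8-bit 4:2:0, planar
    "P010": Fraction(3),          # 16-bit samples, 4:2:0
    "YUV444": Fraction(3),        # 8-bit 4:4:4
    "YUV444_16bit": Fraction(6),  # 16-bit samples, 4:4:4
    "ARGB": Fraction(4),          # 8-bit packed
    "ABGR": Fraction(4),          # 8-bit packed
}
SUBSAMPLED = {"NV12", "YUV420", "P010"}


def frame_size_bytes(width, height, fmt):
    """Return the size in bytes of one raw frame in the given format."""
    if fmt not in BYTES_PER_PIXEL:
        raise ValueError(f"unknown format: {fmt}")
    if fmt in SUBSAMPLED and (width % 2 or height % 2):
        raise ValueError("4:2:0 formats need even width and height")
    return int(width * height * BYTES_PER_PIXEL[fmt])


print(frame_size_bytes(1920, 1080, "NV12"))
```

Using exact fractions instead of `1.5` avoids any floating-point rounding when the result is used as a byte count for file reads.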

GPU Buffer Mode:

    # For GPU buffers, use objects implementing CUDA Array Interface
    # The object must expose a cuda() method returning device pointers
    class AppFrame:
        def __init__(self, width, height, fmt):
            self.frameSize = int(width * height * 1.5)  # NV12
            # Allocate CUDA device memory (elided here); the allocation is
            # expected to set self.luma_cuda_interface and
            # self.chroma_cuda_interface

        def cuda(self):
            # Return CUDA Array Interface for each plane
            return [self.luma_cuda_interface, self.chroma_cuda_interface]

    input_frame = AppFrame(width, height, "NV12")

Step 2: Configure and Create Encoder

Create an encoder with CreateEncoder() specifying resolution, format, buffer mode, and encoding parameters. See CreateEncoder API Reference for all available parameters.

    import PyNvVideoCodec as nvc

    # Encoder configuration parameters
    config_params = {
        "gpu_id": 0,
        "codec": "h264",
        # Additional optional parameters (bitrate, preset, etc.)
    }

    # Create encoder: usecpuinputbuffer=True for CPU, False for GPU
    nvenc = nvc.CreateEncoder(
        width=1920,
        height=1080,
        format="NV12",
        usecpuinputbuffer=True,  # True=CPU buffers, False=GPU buffers
        **config_params
    )

Step 3: Encode Frames and Flush

Pass frames to Encode() to get encoded bitstream. After processing all frames, call EndEncode() to flush remaining data from the encoder queue. See Encode API Reference and EndEncode API Reference.

    # Open the raw input and the output bitstream together so that
    # dec_file is still open while frames are being read
    with open("input.yuv", "rb") as dec_file, open("output.h264", "wb") as enc_file:
        # Encode each frame
        for i in range(num_frames):
            chunk = np.fromfile(dec_file, np.uint8, count=frame_size)
            if chunk.size == 0:
                break

            # Encode frame - returns bitstream data
            bitstream = nvenc.Encode(chunk)
            enc_file.write(bytearray(bitstream))

        # Flush encoder queue - REQUIRED to get remaining frames
        bitstream = nvenc.EndEncode()
        enc_file.write(bytearray(bitstream))
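To see why the final flush matters, the sketch below replaces the hardware encoder with a tiny stand-in that holds back a couple of frames, as an encoder with internal queuing would. `FakeEncoder` and `encode_all` are hypothetical names for illustration; only the `Encode()` and `EndEncode()` method names mirror the API described here.

```python
class FakeEncoder:
    """Stand-in encoder that buffers `delay` frames before emitting output."""

    def __init__(self, delay=2):
        self._delay = delay
        self._queue = []

    def Encode(self, frame):
        # Queue the frame; emit the oldest one only once the queue is full
        self._queue.append(frame)
        if len(self._queue) > self._delay:
            return bytes([self._queue.pop(0)])
        return b""

    def EndEncode(self):
        # Drain everything still held in the queue
        out = bytes(self._queue)
        self._queue.clear()
        return out


def encode_all(encoder, frames):
    """Encode every frame, then flush, and return the full bitstream."""
    out = bytearray()
    for frame in frames:
        out += bytearray(encoder.Encode(frame))
    out += bytearray(encoder.EndEncode())
    return bytes(out)


frames = [1, 2, 3, 4, 5]  # single bytes standing in for raw frames
with_flush = encode_all(FakeEncoder(), frames)

enc = FakeEncoder()
without_flush = b"".join(enc.Encode(f) for f in frames)
print(len(with_flush), len(without_flush))
```

Without the flush, the last frames stay inside the encoder and never reach the output file, which is the situation the REQUIRED comment in Step 3 warns about.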

Step 4: Runtime Reconfiguration (Optional)

Change encoder parameters at runtime without recreating the encoder session using Reconfigure(). This is useful for adaptive bitrate streaming or handling network conditions. See Reconfigure API Reference for supported parameters.

    # Get current encoder parameters
    reconfig_params = nvenc.GetEncodeReconfigureParams()

    # Modify parameters (e.g., change bitrate)
    reconfig_params["averageBitrate"] = 5000000  # 5 Mbps

    # Apply new configuration
    nvenc.Reconfigure(reconfig_params)





Note

Supported formats: NV12, ARGB, ABGR, YUV444, YUV420, P010, YUV444_16bit

Supported codecs: H264, HEVC, AV1

For GPU buffer mode, input objects must implement the cuda() method exposing CUDA Array Interface

Always call EndEncode() at the end to flush remaining encoded data

Reconfigurable parameters: rateControlMode, averageBitrate, maxBitRate, vbvBufferSize, frameRateNum, frameRateDen

Sample Applications

See these sample applications for complete implementations:

encode.py – Unified encoding supporting both CPU and GPU buffer modes with configurable codec and format options

API Reference

For complete API specifications, see:

CreateEncoder() – Create an encoder instance

Encode() – Encode a raw frame

EndEncode() – Flush encoder and get remaining data

Reconfigure() – Change encoder parameters at runtime

Detailed explanation of video encoder parameters and configuration options for optimizing encoding quality, performance, and output characteristics.

Overview

PyNvVideoCodec provides hardware-accelerated video encoding with extensive configurability. This section explains the important parameters and values they can take, helping you optimize your encoder for specific use cases.

PyNvVideoCodec is designed for simplified video encoding with appropriate default values. You can also access detailed optional parameters and the full flexibility offered by the NVIDIA video technology stack.



Supported Codecs

NVIDIA GPUs support encoding for H.264, HEVC (H.265), and AV1 codecs. Depending on your hardware generation, not all codecs will be accessible. Refer to the NVIDIA Hardware Video Encoder section for information about supported codecs for each GPU architecture.

Codec Selection Guidelines:

H.264: Best compatibility across all devices and platforms. Suitable for streaming, video conferencing, and general use

HEVC: Better compression efficiency (approximately 50% better than H.264) but requires more powerful decode hardware. Ideal for 4K content, archival, and OTT streaming

AV1: Next-generation codec with superior compression. Best for web streaming and modern devices

Presets

Encoder presets control the quality and performance tradeoff. NVENC offers seven presets from P1 (highest performance) to P7 (highest quality). Using these presets will automatically configure all relevant encoding parameters for the selected tuning information.

| Preset | Speed | Best For |
|---|---|---|
| P1 | Fastest | Real-time streaming, live broadcasts, cloud gaming |
| P2-P3 | Fast | Video conferencing, game streaming, screen capture |
| P4 | Balanced (Default) | General-purpose encoding, transcoding workflows |
| P5-P6 | Slow | High-quality archival, OTT streaming, VOD content |
| P7 | Slowest | Maximum quality archival, master copies, premium content |

Higher presets produce better quality but encode slower. Specific attributes within a preset can be further tuned if required.



Tuning Information

The NVIDIA Encoder Interface exposes different tuning options to optimize the encoder for specific scenarios:

High Quality: Tune presets for latency-tolerant encoding. Suited for high-quality transcoding, video archiving, and encoding for OTT streaming

Low Latency: Tune presets for low latency streaming. Suited for cloud gaming, streaming, video conferencing, and high bandwidth channels with tolerance for bigger occasional frame sizes

Ultra-Low Latency: Tune presets for ultra low latency streaming. Suited for cloud gaming, streaming, and video conferencing in strictly bandwidth-constrained channels

Lossless: Tune presets for lossless encoding. Suited for preserving original video footage for later editing and general lossless data archiving (video or non-video)

Ultra High Quality: Tune presets for latency-tolerant encoding with higher quality. Suited for premium content creation and high-end video production. Only supported for HEVC and AV1 on Turing+ architectures

For low latency use cases (video conferencing), combine LOW_LATENCY tuning with P1 preset and IPP GOP pattern (no B-frames). For high quality archival, use HIGH_QUALITY tuning with P6 preset and IBBBP GOP pattern.



Rate Control and Bitrate

NVENC provides control over various parameters related to the rate control algorithm, allowing it to adapt the bitrate depending on your quality, bandwidth, and performance constraints. NVENC supports the following rate control modes:

| Mode | Description | Best For |
|---|---|---|
| CBR | Constant Bitrate - Maintains steady bitrate throughout the video | Streaming, broadcasting |
| VBR | Variable Bitrate - Adjusts bitrate based on content complexity | File storage, VOD |
| CQP | Constant Quantization Parameter - Fixed quality level regardless of bitrate | Quality testing, research |
| Target Quality | Targets a specific quality level, varying bitrate as needed | Quality-focused encoding |

The bitrate can also be capped to a maximum target value using the maxbitrate parameter. For more information about rate control, refer to the NVENC Video Encoder API Programming Guide.

Rate Control Guidelines:

CBR for streaming: Set codecconfig=CBR with bitrate and maxbitrate equal for strict constant bitrate

VBR for file storage: Set codecconfig=VBR with bitrate as target and maxbitrate higher for peaks

CQP for constant quality: Set codecconfig=CQP with qp parameter (lower = higher quality, typical range: 18-28)
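The three guidelines can be captured in a small helper that builds keyword arguments for encoder creation. This is a sketch under assumptions: it uses the parameter names from the encoder parameter table (`rc`, `bitrate`, `maxbitrate`, `constqp`) rather than the `codecconfig` spelling used in the guidelines, the 1.5x peak factor is an arbitrary example value, and `rate_control_params` is a hypothetical name.

```python
def rate_control_params(mode, bitrate=None, peak_factor=1.5, qp=None):
    """Build rate-control settings following the guidelines in this section."""
    mode = mode.lower()
    if mode in ("cbr", "vbr"):
        if bitrate is None or bitrate <= 0:
            raise ValueError("bitrate must be a positive integer")
        if mode == "cbr":
            # Strict constant bitrate: target and maximum are equal
            return {"rc": "cbr", "bitrate": bitrate, "maxbitrate": bitrate}
        # Variable bitrate: allow peaks above the target
        return {"rc": "vbr", "bitrate": bitrate,
                "maxbitrate": int(bitrate * peak_factor)}
    if mode == "constqp":
        if qp is None or not 0 <= qp <= 51:
            raise ValueError("qp must be in the range 0..51")
        # Lower QP means higher quality
        return {"rc": "constqp", "constqp": qp}
    raise ValueError(f"unsupported rate control mode: {mode}")


print(rate_control_params("cbr", bitrate=4_000_000))
print(rate_control_params("vbr", bitrate=4_000_000))
print(rate_control_params("constqp", qp=23))
```

The resulting dictionary could be merged into the configuration passed to CreateEncoder, once the parameter names have been checked against the API reference for the installed version.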

Surface Formats

PyNvVideoCodec supports various input surface formats for encoding. The surface format is specified using the format parameter when creating an encoder.

| Format | Description |
|---|---|
| NV12 | Semi-Planar YUV [Y plane followed by interleaved UV plane] - Most efficient format |
| YV12 | Planar YUV [Y plane followed by V and U planes] |
| IYUV | Planar YUV [Y plane followed by U and V planes] |
| YUV444 | Planar YUV [Y plane followed by U and V planes] |
| YUV420_10BIT | 10 bit Semi-Planar YUV [Y plane followed by interleaved UV plane]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. |
| YUV444_10BIT | 10 bit Planar YUV444 [Y plane followed by U and V planes]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. |
| ARGB | 8 bit Packed A8R8G8B8. Word-ordered format where a pixel is represented by a 32-bit word with B in the lowest 8 bits, G in the next 8 bits, R in the 8 bits after that and A in the highest 8 bits. |
| ARGB10 | 10 bit Packed A2R10G10B10. Word-ordered format where a pixel is represented by a 32-bit word with B in the lowest 10 bits, G in the next 10 bits, R in the 10 bits after that and A in the highest 2 bits. |
| ABGR | 8 bit Packed A8B8G8R8. Word-ordered format where a pixel is represented by a 32-bit word with R in the lowest 8 bits, G in the next 8 bits, B in the 8 bits after that and A in the highest 8 bits. |
| ABGR10 | 10 bit Packed A2B10G10R10. Word-ordered format where a pixel is represented by a 32-bit word with R in the lowest 10 bits, G in the next 10 bits, B in the 10 bits after that and A in the highest 2 bits. |
| NV16 | Semi-Planar YUV 422 [Y plane followed by interleaved UV plane] |
| P210 | Semi-Planar 10-bit YUV 422 [Y plane followed by interleaved UV plane] |

Notes on Surface Format Usage:

Both 10-bit and 16-bit input frames result in 10-bit encoding

The colorspace conversion matrix can be specified using the colorspace option during CreateEncoder

NV12 format is most efficient and recommended when possible

Not all formats are supported on all GPU architectures; refer to your GPU's documentation for specific support information

GOP Structure

Group of Pictures (GOP) structure defines the pattern of I-frames (Intra-coded), P-frames (Predictive), and B-frames (Bidirectional predictive):

I (Intra): All-I frames. Largest size but best seek-ability and lowest latency

IPP: I and P frames only. Good for low latency, no B-frames

IBP: I, B, and P frames with one B-frame between references

IBBBP: Multiple B-frames between references. Best compression efficiency

Longer GOPs improve compression efficiency but reduce seek-ability. Typical GOP sizes: 30-250 frames.
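The patterns above can be made concrete with a short generator. The sketch follows the `bf` mapping given in the encoder parameter table (0: I only, 1: IPP, 2: IBP, 3: IBBP); promoting a trailing B-frame to P at the end of the GOP is a simplification chosen for this example, and `gop_pattern` is a hypothetical helper, not part of PyNvVideoCodec.

```python
def gop_pattern(gop_length, bf):
    """Return the display-order frame types of one GOP as a string."""
    if gop_length < 1 or bf < 0:
        raise ValueError("gop_length must be >= 1 and bf must be >= 0")
    if bf == 0:
        return "I" * gop_length  # intra-only stream
    frames = ["I"]
    for i in range(1, gop_length):
        # Every bf-th frame is a P reference; frames in between are B
        frames.append("P" if i % bf == 0 else "B")
    if frames[-1] == "B":
        frames[-1] = "P"  # simplification: close the GOP on a reference
    return "".join(frames)


for bf in range(4):
    print(bf, gop_pattern(7, bf))
```

Printing the patterns side by side makes the tradeoff visible: more B-frames per reference means fewer independently decodable anchors for the same GOP length.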



Common Encoding Scenarios

Recommended settings for common use cases:

| Use Case | Codec | Recommended Settings |
|---|---|---|
| Live streaming | H264 | Preset P1, CBR, LOW_LATENCY, GOP=60 |
| Video archival | HEVC | Preset P6, VBR, HIGH_QUALITY, GOP=250 |
| OTT/VOD content | HEVC or AV1 | Preset P4-P5, VBR, HIGH_QUALITY |
| Video conferencing | H264 | Preset P1-P2, CBR, ULTRA_LOW_LATENCY, IPP |
| Screen recording | H264 | Preset P3, VBR or LOSSLESS |
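The recommendations can be kept as data and turned into encoder configuration dictionaries. The sketch below is one possible encoding, assuming the parameter names and lowercase values from the encoder parameter table (`codec`, `preset`, `rc`, `tuning_info`, `gop`); where the table gives a range or a choice, a single value was picked for illustration. `SCENARIO_SETTINGS` and `encoder_config` are hypothetical names.

```python
# One concrete choice per use case; GOP pattern options such as IPP are
# left out and would need to be configured separately
SCENARIO_SETTINGS = {
    "live_streaming": {"codec": "h264", "preset": "P1", "rc": "cbr",
                       "tuning_info": "low_latency", "gop": 60},
    "video_archival": {"codec": "hevc", "preset": "P6", "rc": "vbr",
                       "tuning_info": "high_quality", "gop": 250},
    "ott_vod": {"codec": "hevc", "preset": "P5", "rc": "vbr",
                "tuning_info": "high_quality"},
    "video_conferencing": {"codec": "h264", "preset": "P1", "rc": "cbr",
                           "tuning_info": "ultra_low_latency"},
    "screen_recording": {"codec": "h264", "preset": "P3", "rc": "vbr"},
}


def encoder_config(use_case, gpu_id=0, **overrides):
    """Return a config dict for a use case, with optional overrides."""
    if use_case not in SCENARIO_SETTINGS:
        raise KeyError(f"unknown use case: {use_case}")
    config = {"gpu_id": gpu_id, **SCENARIO_SETTINGS[use_case]}
    config.update(overrides)
    return config


config = encoder_config("live_streaming", bitrate=6_000_000)
print(config)
# Possible use: nvc.CreateEncoder(1920, 1080, "NV12", True, **config)
```

Keeping the table as a dictionary makes it easy to review the defaults in one place and to override a single value per session.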





Building Your Optimized Encoder

To configure NVENC for your specific use case, refer to the Recommended NVENC Settings section in the NVENC Programming Guide.

For advanced parameter tuning and performance optimization, see Advanced Encoding Parameters.



API Reference

For complete parameter documentation, refer to:

Encoder API Reference - Complete list of encoder parameters and their valid values

| Parameter | Type | Valid Values | Default | Parameter Description |
|---|---|---|---|---|
| codec | String | h264, hevc, av1 | h264 | |
| bitrate | Integer | > 0 | 10000000U | |
| fps | Integer | > 0 | 30 | Desired frames per second of the video to be encoded; the default value is 30 |
| initqp | Integer | > 0 | unset option | Initial Quantization Parameter (QP) |
| idrperiod | Integer | > 0 | 250 | Period between Instantaneous Decoder Refresh (IDR) frames |
| constqp | Integer or list of 3 integers | >= 0, <= 51 | | |
| qmin | Integer or list of 3 integers | >= 0, <= 51 | [30,30,30] | |
| gop | Integer or list of 3 integers | > 0 | changes based on other settings | |
| tuning_info | String | high_quality, low_latency, ultra_low_latency, lossless | high_quality | |
| preset | String | P1 to P7 | P4 | |
| maxbitrate | Integer | > 0 | 10000000U | Maximum bitrate used for Variable BitRate (VBR) encoding, allowing the bit rate to adapt dynamically to video content |
| vbvinit | Integer | > 0 | 10000000U | |
| vbvbufsize | Integer | > 0 | 10000000U | Target client Video Buffering Verifier (VBV) buffer size, applicable for vbr |
| rc | String | cbr, constqp, vbr | cbr | Type of Rate Control (RC) chosen between Constant BitRate (CBR), Constant QP or Variable BitRate (VBR) |
| multipass | String | fullres, qres | disabled by default | |
| bf | Integer | >= 0 | varies based on tuning_info and preset | Specifies the GOP pattern as follows: bf = 0: I, 1: IPP, 2: IBP, 3: IBBP |
| max_res | List of 2 integers | > 0 | 4K for H264, 8K for HEVC, AV1 | Resolution not greater than maximum supported by hardware in order to account for dynamic resolution change. For example: [3840, 2160] |
| temporalaq | Integer | 0 or 1 | 0 | |
| lookahead | Integer | > 0 | 0 to 255 | Number of frames to look ahead |
| aq | Integer | 0 or 1 | 0 | |
| ldkfs | Integer | >= 0, < 255 | 0 | Low Delay Keyframe Scale is useful to avoid channel congestion in case an I frame ends up generating a high number of bits |
| colorspace | String | bt601, bt709 | | Specify this option for ARGB/ABGR inputs |
| timingInfo::num_unit_in_ticks | Integer | > 0 | | Specifies the number of time units of the clock (as defined in Annex E of the ITU-T Specification). HEVC and H264 only |
| timingInfo::timescale | Integer | > 0 | | Specifies the frequency of the clock (as defined in Annex E of the ITU-T Specification). HEVC and H264 only |
| slice::mode | Integer | 0 to 3 | 0 | Slice modes for H.264 and HEVC encoding (not available for AV1) which could be 0 (MB based slices), 2 (MB row based slices) or 3 (number of slices) |
| slice::data | Integer | valid range changes based on slice::mode | 0 | Specifies the parameter needed for sliceMode. AV1 does not support slice::data |
| repeatspspps | Integer | 0 or 1 | 0 | Enable writing of Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) for every IDR frame |

Dynamic reconfiguration of encoder parameters during encoding sessions for adaptive encoding workflows.

Overview

PyNvVideoCodec supports runtime reconfiguration of certain encoder parameters without recreating the encoder instance. This capability is essential for adaptive encoding scenarios where encoding parameters need to change dynamically based on content characteristics, network conditions, or application requirements.

Encoder reconfiguration offers significant performance benefits by avoiding the overhead of encoder creation and destruction. It allows seamless parameter changes during an active encoding session, maintaining encoder state and reducing initialization latency.



When to Use Encoder Reconfiguration

Encoder reconfiguration is particularly useful in the following scenarios:

Adaptive Bitrate Streaming: Adjust bitrate dynamically based on available network bandwidth to maintain smooth streaming

Dynamic Quality Adjustment: Change quality settings in response to content complexity or system resource availability

Processing Multiple Videos: Encode multiple videos with different settings without recreating encoder instances, improving efficiency for batch processing

Scene-Based Encoding: Apply different encoding parameters for different scenes within the same video (e.g., higher quality for complex scenes)

Real-Time Encoding: Respond to changing conditions in live streaming or video conferencing applications

Reconfigurable Parameters

The following encoder parameters can be reconfigured during an active encoding session:

Bitrate: Target bitrate and maximum bitrate for rate control

Frame Rate: Output frame rate

GOP Structure: I-frame interval and B-frame configuration

Quality Parameters: QP values, VBV buffer size

Intra Refresh: Periodic intra refresh settings

Note: Some parameters cannot be changed once the encoder is created, including codec type, resolution, and profile. For changes to these parameters, a new encoder instance must be created.
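Before applying a settings change, an application can sort the requested values into those that can be applied at runtime and those that need a new encoder. The sketch below uses the reconfigurable parameter names listed in the Basic Encoding Workflow notes as its allow-list; that list may be narrower than what a given release supports, so treat it as an assumption. `split_changes` is a hypothetical helper.

```python
# Names taken from the "Reconfigurable parameters" note; assumed allow-list
RECONFIGURABLE = frozenset({
    "rateControlMode", "averageBitrate", "maxBitRate",
    "vbvBufferSize", "frameRateNum", "frameRateDen",
})


def split_changes(current, desired):
    """Split changed settings into (runtime, needs_new_encoder) dicts."""
    runtime, recreate = {}, {}
    for key, value in desired.items():
        if current.get(key) == value:
            continue  # unchanged, nothing to do
        target = runtime if key in RECONFIGURABLE else recreate
        target[key] = value
    return runtime, recreate


current = {"codec": "h264", "averageBitrate": 4_000_000, "frameRateNum": 30}
desired = {"codec": "hevc", "averageBitrate": 2_000_000, "frameRateNum": 30}
runtime, recreate = split_changes(current, desired)
print(runtime, recreate)
```

If the second dictionary is non-empty, the change involves something like codec type, resolution, or profile, and a new encoder instance is needed as the note above describes.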



Reconfiguration Workflow

To reconfigure an encoder during encoding, call the Reconfigure() method with the new parameter values. The method accepts parameters like bitrate, framerate, maxbitrate, and other reconfigurable settings.

The typical workflow is:

1. Create encoder with initial settings

2. Encode frames with initial configuration

3. Call Reconfigure() with new parameters when needed

4. Continue encoding with the new settings

Adaptive Bitrate Encoding

Adaptive bitrate encoding adjusts encoder parameters based on network conditions. The application periodically checks available bandwidth and calls Reconfigure() to update bitrate and maxbitrate parameters when significant changes are detected.

Key considerations for adaptive encoding:

Use LOW_LATENCY tuning mode for streaming scenarios

Set an appropriate check interval (e.g., every 30 frames)

Include a buffer margin (e.g., 20%) when setting maxbitrate

Avoid reconfiguring on every frame to minimize overhead
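The considerations above can be combined into a small controller that decides when a reconfiguration is worthwhile. This is a sketch under assumptions: the `reconfigure` callback stands in for a wrapper around Reconfigure(), the `bitrate` and `maxbitrate` keys follow the names used in this section, and the 10% change threshold is an example value. `AdaptiveBitrateController` is a hypothetical class.

```python
class AdaptiveBitrateController:
    """Decide when to reconfigure based on periodic bandwidth checks."""

    def __init__(self, reconfigure, initial_bitrate, check_interval=30,
                 margin_percent=20, threshold_percent=10):
        self._reconfigure = reconfigure
        self.bitrate = initial_bitrate
        self.check_interval = check_interval
        self.margin_percent = margin_percent
        self.threshold_percent = threshold_percent
        self._frames = 0

    def on_frame(self, measure_bandwidth):
        """Call once per encoded frame; returns True if reconfigured."""
        self._frames += 1
        if self._frames % self.check_interval:
            return False  # not a check frame
        bandwidth = measure_bandwidth()
        # Keep the target below the measured bandwidth by the margin
        target = bandwidth * (100 - self.margin_percent) // 100
        change = abs(target - self.bitrate) * 100
        if change < self.threshold_percent * self.bitrate:
            return False  # change too small to justify reconfiguring
        self.bitrate = target
        self._reconfigure({"bitrate": target, "maxbitrate": bandwidth})
        return True


calls = []
samples = iter([5_000_000, 2_500_000, 2_600_000])  # bits per second
controller = AdaptiveBitrateController(calls.append, initial_bitrate=4_000_000)
for _ in range(90):
    controller.on_frame(lambda: next(samples))
print(calls)
```

In this run only the second check triggers a reconfiguration: the first sample matches the current bitrate after the margin, and the third differs by less than the threshold.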

Batch Processing with Reconfiguration

Reconfiguration improves efficiency when processing multiple videos with different encoding requirements. Instead of creating new encoder instances for each video, use Reconfigure() to change parameters between videos.

When planning for batch processing with varying resolutions, specify max_width and max_height during encoder creation to allow reconfiguration up to those limits.



Important Considerations

Flush Before Reconfiguration: In some cases, it may be necessary to flush the encoder before reconfiguring to ensure all pending frames are encoded with previous settings

Parameter Compatibility: Not all parameter combinations can be changed at runtime. Refer to the API documentation for limitations

Performance Impact: While reconfiguration is faster than recreating an encoder, there is still a small performance cost. Avoid reconfiguring on every frame

Resolution Limits: When reconfiguring resolution (if supported), the new resolution must not exceed the max_width and max_height specified during encoder creation

Sample Applications

PyNvVideoCodec includes sample applications demonstrating encoder reconfiguration:

encode_reconfigure.py: Demonstrates dynamic bitrate and frame rate changes during encoding

These samples are located in the samples/ directory.



API Reference

For complete documentation of reconfigurable parameters and method signatures, refer to:

Encoder API Reference - Reconfigure() method documentation

CreateEncoder API Reference - Parameters and their valid ranges

Insert Supplemental Enhancement Information (SEI) messages into encoded video streams for embedding metadata.

SEI messages are metadata containers that can be embedded in H.264/HEVC/AV1 bitstreams. Common uses include HDR metadata, timecodes, closed captions, and custom application data.

Example

The following example demonstrates SEI message insertion during encoding:

Raw Frames + SEI Data → Encoder → Encoded Bitstream with SEI

Step 1: Define SEI Message Data

Create SEI message payloads as byte arrays. For User Data Unregistered (type 5), the payload typically starts with a 16-byte UUID:

    # Define SEI message payloads (16-byte UUID for User Data Unregistered)
    SEI_MESSAGE_1 = [0xdc, 0x45, 0xe9, 0xbd, 0xe6, 0xd9, 0x48, 0xb7,
                     0x96, 0x2c, 0xd8, 0x20, 0xd9, 0x23, 0xee, 0xef]
    SEI_MESSAGE_2 = [0x12, 0x67, 0x56, 0xda, 0xef, 0x99, 0x00, 0xbb,
                     0x6a, 0xc4, 0xd8, 0x10, 0xf9, 0xe3, 0x3e, 0x8f]

Step 2: Create SEI Info Dictionary

Specify the SEI type based on codec. Use type 5 (User Data Unregistered) for H.264/HEVC, or type 6 for AV1:

    import PyNvVideoCodec as nvc

    # Determine SEI type based on codec
    codec = "h264"  # or "hevc", "av1"

    if codec in ["hevc", "h264"]:
        sei_info = {"sei_type": 5}  # User Data Unregistered
    elif codec == "av1":
        sei_info = {"sei_type": 6}  # Metadata OBU for AV1

Step 3: Create SEI Messages List

Combine SEI info and payload into a list of tuples. Multiple SEI messages can be inserted per frame:

    # Create SEI messages list: [(sei_info, payload), ...]
    sei_messages = [
        (sei_info, SEI_MESSAGE_1),
        (sei_info, SEI_MESSAGE_2)
    ]

Step 4: Create Encoder and Encode with SEI

Pass the SEI messages list as the third argument to Encode():

    # Create encoder
    config_params = {"gpu_id": 0, "codec": codec}
    nvenc = nvc.CreateEncoder(1920, 1080, "NV12", False, **config_params)

    # Encode frame with SEI messages
    # Encode(frame, pic_flags, sei_messages)
    bitstream = nvenc.Encode(input_frame, 0, sei_messages)
    enc_file.write(bytearray(bitstream))

    # Flush encoder
    bitstream = nvenc.EndEncode()
    enc_file.write(bytearray(bitstream))
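Steps 1 to 3 can be wrapped in helpers that build the payload and pick the SEI type. The sketch below assumes that application data may follow the 16-byte UUID in a User Data Unregistered payload, as the H.264 and HEVC syntax allows; `user_data_unregistered` and `sei_type_for_codec` are hypothetical helper names, and the `frame=42` content is made up for the example.

```python
import uuid


def user_data_unregistered(app_uuid, data=b""):
    """Return a payload (list of ints): 16-byte UUID followed by user data."""
    return list(app_uuid.bytes) + list(bytes(data))


def sei_type_for_codec(codec):
    """Pick the SEI type used in this section for the given codec."""
    codec = codec.lower()
    if codec in ("h264", "hevc"):
        return 5  # User Data Unregistered
    if codec == "av1":
        return 6  # Metadata OBU
    raise ValueError(f"unsupported codec: {codec}")


# Same 16 bytes as SEI_MESSAGE_1 in Step 1, written as a UUID
APP_UUID = uuid.UUID("dc45e9bd-e6d9-48b7-962c-d820d923eeef")

payload = user_data_unregistered(APP_UUID, b"frame=42")
sei_messages = [({"sei_type": sei_type_for_codec("h264")}, payload)]
print(len(payload), sei_messages[0][0])
```

A fixed UUID lets the receiving side recognize which application wrote the message before it interprets the bytes that follow.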





Common SEI Types

Type 5 (H.264/HEVC) – User Data Unregistered: Custom metadata with 16-byte UUID

Type 4 (H.264/HEVC) – User Data Registered: Closed captions (CEA-608/708)

Type 137 (HEVC) – Mastering Display Color Volume: HDR display metadata

Type 144 (HEVC) – Content Light Level: HDR luminance levels

Type 6 (AV1) – Metadata OBU: Custom metadata for AV1

Note

SEI messages are passed as the third argument to Encode().

Each SEI message is a tuple of (sei_info_dict, payload_bytes).

Multiple SEI messages can be inserted per frame.

To verify SEI insertion, decode the output and extract SEI using SEI Message Decoding.

Sample Applications

See this sample application for a complete implementation:

encode_sei_msg.py – Demonstrates SEI message insertion during encoding with custom user data

API Reference

Encode() – Encode frame with optional SEI messages

CreateEncoder() – Create encoder instance

Extract smaller, meaningful segments from long videos with optimized context management for efficient processing.

Segment-based transcoding is a critical technique in modern video processing pipelines, particularly in workflows that involve deep learning (DL) and AI model training. This approach focuses on extracting smaller, meaningful segments from long videos, allowing for more targeted and efficient processing.

Traditional transcoding workflows typically process entire videos sequentially, often requiring repeated initialization of decoding and encoding contexts. This introduces significant overhead and slows down processing. In contrast, segment-based transcoding minimizes these inefficiencies by avoiding redundant context creation, resulting in faster performance, better resource utilization, and greater overall efficiency.



PyNvVideoCodec addresses these inefficiencies by introducing an optimized approach to segment-based transcoding:

Persistent Context Management: Rather than creating a new decode/encode context for each segment, PyNvVideoCodec maintains a persistent context throughout the transcoding session, significantly reducing overhead.

Shared Context Across Segments and Streams: The same context is reused between segments, eliminating unnecessary reinitialization. This context sharing applies not only within a single bitstream but also across multiple bitstreams, further enhancing performance.

Efficient NVDEC and NVENC Utilization: By keeping GPU resources active and simply switching data buffers, PyNvVideoCodec maximizes throughput and achieves better GPU efficiency compared to traditional FFmpeg-based methods.
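The difference between per-segment and persistent contexts can be illustrated with a toy model. The sketch below is not PyNvVideoCodec code: FakeContext is a hypothetical stand-in that only counts how often a context is constructed, so the two strategies can be compared without a GPU.

```python
class FakeContext:
    """Stand-in for an expensive decode/encode context (not a real API)."""

    created = 0  # number of contexts constructed so far

    def __init__(self):
        FakeContext.created += 1

    def process(self, segment):
        start, end = segment
        return end - start  # pretend "work": the segment length in seconds


def per_segment_contexts(segments):
    # Naive approach: build a fresh context for every segment.
    return [FakeContext().process(seg) for seg in segments]


def persistent_context(segments):
    # Persistent approach: build the context once and reuse it.
    ctx = FakeContext()
    return [ctx.process(seg) for seg in segments]


segments = [(0.0, 10.5), (15.0, 30.0), (45.5, 60.0)]

FakeContext.created = 0
naive_durations = per_segment_contexts(segments)
naive_inits = FakeContext.created

FakeContext.created = 0
reuse_durations = persistent_context(segments)
reuse_inits = FakeContext.created

print(f"naive: {naive_inits} initializations, persistent: {reuse_inits}")
```

Both strategies produce the same results; only the number of initializations differs, and that count grows with the number of segments in the naive case.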

Creating Video Segments – Step-by-step guide to segment extraction

Transcoding Entire Video – Full video transcoding workflow

Extract video segments using PyNvVideoCodec's Transcoder with persistent context management.

PyNvVideoCodec provides the Transcoder class for efficient segment-based transcoding. The transcoder maintains persistent decode/encode contexts across segments, eliminating the overhead of repeated initialization.

Example

The following example demonstrates segment extraction from a video file:

Input Video → Transcoder → Video Segments

Step 1: Get Video Duration

Use SimpleDecoder to get the video metadata for validating segment timestamps:

    import PyNvVideoCodec as nvc

    # Get video duration for validation
    decoder = nvc.SimpleDecoder(input_file_path, gpu_id=0)
    duration = decoder.get_stream_metadata().duration
    print(f"Video duration: {duration:.2f} seconds")

Step 2: Load Transcoder Configuration

Define encoding parameters such as codec, preset, tuning, and bitrate:

    import json

    # Load transcoder configuration from JSON file
    with open(config_file_path) as json_file:
        config = json.load(json_file)

    # Example config structure:
    # {
    #     "codec": "h264",
    #     "preset": "P4",
    #     "tuning_info": "high_quality",
    #     "bitrate": 5000000
    # }
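A configuration file is easier to work with if it is checked as soon as it is loaded. The sketch below writes the example structure to a temporary JSON file and reads it back through a small validating loader. The loader name, the file name, and the decision to treat all four keys as required are assumptions made for illustration; the transcoder itself may accept a different or larger set of options.

```python
import json
import os
import tempfile

# Assumed set of keys, taken from the example config structure.
REQUIRED_KEYS = {"codec", "preset", "tuning_info", "bitrate"}


def load_transcoder_config(path):
    """Load a JSON config and fail early if an expected key is missing."""
    with open(path) as json_file:
        config = json.load(json_file)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    if not isinstance(config["bitrate"], int) or config["bitrate"] <= 0:
        raise ValueError("bitrate must be a positive integer")
    return config


example = {
    "codec": "h264",
    "preset": "P4",
    "tuning_info": "high_quality",
    "bitrate": 5000000,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_file_path = os.path.join(tmp_dir, "transcode_config.json")
    with open(config_file_path, "w") as json_file:
        json.dump(example, json_file, indent=4)
    config = load_transcoder_config(config_file_path)

print(config["codec"], config["bitrate"])
```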

Step 3: Create Transcoder and Extract Segment

Create a Transcoder instance with input/output paths and configuration, then call segmented_transcode() with start and end times:

    # Define segment boundaries (in seconds)
    start_time = 10.0
    end_time = 25.0

    # Create transcoder and extract segment
    transcoder = nvc.Transcoder(
        input_file_path,
        output_file_path,
        gpu_id,
        0,  # cuda_context (0 for default)
        0,  # cuda_stream (0 for default)
        **config
    )

    # Extract the segment
    transcoder.segmented_transcode(start_time, end_time)
    print(f"Created segment: {start_time}s - {end_time}s")

Step 4: Process Multiple Segments

For multiple segments, create a new transcoder for each output file:

    # Define multiple segments as (start, end) tuples
    segments = [
        (0.0, 10.5),
        (15.0, 30.0),
        (45.5, 60.0)
    ]

    for start_time, end_time in segments:
        # Validate against video duration
        if end_time > duration:
            end_time = duration

        # Generate output path with timestamps
        output_path = f"segment_{start_time}_{end_time}.mp4"

        # Create transcoder and extract segment
        transcoder = nvc.Transcoder(input_file_path, output_path, gpu_id, 0, 0, **config)
        transcoder.segmented_transcode(start_time, end_time)
        print(f"Created: {output_path}")
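Clamping only the end time still lets through a segment whose start lies beyond the end of the video. One possible way to tighten the check before any transcoder is created is sketched below. clamp_segments is a hypothetical helper, not part of PyNvVideoCodec, and the 50-second duration is made up for the example.

```python
def clamp_segments(segments, duration):
    """Clamp (start, end) pairs to [0, duration] and drop empty ones."""
    valid = []
    for start, end in segments:
        start = max(0.0, start)
        end = min(end, duration)
        if start < end:  # skip segments that fall entirely out of range
            valid.append((start, end))
    return valid


# Hypothetical 50-second video: the third segment is partly out of range,
# and the one starting at 55 s lies entirely past the end.
segments = [(0.0, 10.5), (15.0, 30.0), (45.5, 60.0), (55.0, 58.0)]
usable = clamp_segments(segments, duration=50.0)
print(usable)
```

The returned list can then be fed to the loop in place of the raw segment list.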





Note

Segment times are specified in seconds (float values).

The transcoder automatically seeks to the nearest keyframe before the start time.

Output files are named with timestamps appended by the API.

For concatenating segments into a single file, use the same transcoder instance with multiple segmented_transcode() calls.
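The keyframe behavior in the note above can be pictured as a lookup in a sorted list of keyframe timestamps. The sketch below is a conceptual illustration only: the timestamps are hypothetical, and it does not describe how the transcoder implements seeking internally.

```python
from bisect import bisect_right


def keyframe_at_or_before(keyframe_times, start_time):
    """Return the latest keyframe timestamp that is <= start_time."""
    index = bisect_right(keyframe_times, start_time) - 1
    if index < 0:
        raise ValueError("start_time precedes the first keyframe")
    return keyframe_times[index]


# Hypothetical stream with a keyframe every 2 seconds.
keyframe_times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

seek_point = keyframe_at_or_before(keyframe_times, 9.3)
print(seek_point)
```

A requested start time that does not coincide with a keyframe therefore implies that decoding begins somewhat earlier in the stream than the requested time.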

APIs Used

The following APIs are used in this example:

SimpleDecoder() – Get video metadata for duration validation

get_stream_metadata() – Get video duration and properties

Transcoder() – Create transcoder with encoding configuration

segmented_transcode() – Extract a segment by start/end times

Sample Applications

See this sample application for a complete implementation:

create_video_segments.py – Demonstrates extracting multiple segments from a video file with configurable start/end times from a segments file

PyNvVideoCodec provides efficient interoperability with popular deep learning frameworks through DLPack, the open-source memory tensor structure for sharing tensors across frameworks. This allows video frames decoded by PyNvVideoCodec to be directly passed to frameworks like PyTorch, TensorFlow, and others without expensive CPU-GPU memory transfers.

DLPack is a standardized in-memory tensor structure that enables zero-copy sharing of tensor data between frameworks. It serves as a common exchange format, so deep learning libraries can hand tensors to each other without data copies or CPU round-trips.

The key benefits of DLPack include:

Zero-copy tensor sharing between different libraries

Standardized memory management protocol

Support for different device types (CPU, CUDA, etc.)

Common representation for tensor metadata (shape, strides, data type)

Proper handling of CUDA stream synchronization

PyNvVideoCodec implements the Python DLPack protocol through __dlpack__() and __dlpack_device__() methods on decoded frames. This allows seamless integration with any framework that supports the DLPack protocol.

    from PyNvVideoCodec import SimpleDecoder, OutputColorType

    # Decode with GPU memory enabled
    decoder = SimpleDecoder(
        "video.mp4",
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP
    )
    frame = decoder[0]

    # DLPack protocol methods are available on the frame object
    device_type, device_id = frame.__dlpack_device__()
    print(f"Device: {device_type}, ID: {device_id}")  # Device: 2 (CUDA), ID: 0

    # The __dlpack__() method is called automatically by from_dlpack()
    # You typically don't call it directly - just use:
    # tensor = torch.from_dlpack(frame)
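The integer returned by __dlpack_device__() is a device type code from the DLPack header, which is why the example above reports 2 for CUDA. A small helper can turn the tuple into something readable. This is a sketch: the enum below lists only a subset of the codes, and FakeFrame is a hypothetical stand-in so the snippet runs without a decoder or GPU.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    """Subset of the device type codes defined in dlpack.h."""

    CPU = 1
    CUDA = 2
    CUDA_HOST = 3      # pinned host memory
    ROCM = 10
    CUDA_MANAGED = 13  # unified memory


def describe_dlpack_device(device):
    """Format the (device_type, device_id) tuple from __dlpack_device__()."""
    device_type, device_id = device
    try:
        name = DLDeviceType(device_type).name
    except ValueError:
        name = f"UNKNOWN({device_type})"
    return f"{name}:{device_id}"


class FakeFrame:
    """Stand-in object so the sketch runs without a GPU or decoder."""

    def __dlpack_device__(self):
        return (2, 0)


print(describe_dlpack_device(FakeFrame().__dlpack_device__()))
```

Checking the device before calling from_dlpack() is a cheap way to catch a decoder that was created without use_device_memory=True.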

The implementation handles important aspects:

Memory ownership: The PyNvVideoCodec frame retains ownership of the underlying memory until the tensor using it is destroyed

Stream synchronization: Proper CUDA stream synchronization is maintained between producer (PyNvVideoCodec) and consumer (e.g., PyTorch)

Tensor metadata: Shape, strides, and data type information are correctly propagated to the DLPack tensor

PyTorch provides the torch.from_dlpack() function to import DLPack tensors directly. The resulting tensor shares the same GPU memory with no data copying.

    import torch
    from PyNvVideoCodec import SimpleDecoder, OutputColorType

    # Create decoder with GPU memory and planar RGB output
    decoder = SimpleDecoder(
        "video.mp4",
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP  # Planar RGB (CHW format)
    )

    # Get a decoded frame
    frame = decoder[0]

    # Convert to PyTorch tensor - zero-copy!
    tensor = torch.from_dlpack(frame)
    print(f"Tensor shape: {tensor.shape}")    # Output: torch.Size([3, 1080, 1920])
    print(f"Tensor device: {tensor.device}")  # Output: cuda:0

    # Normalize for model input
    normalized = tensor.float() / 255.0

The tensor format follows the video pixel format:

RGBP (Planar): Shape is (3, height, width) - preferred for most deep learning models

RGB (Interleaved): Shape is (height, width, 3)

NV12 (Native): Shape depends on the native decoder output format
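The two RGB layouts differ in how elements are ordered in memory, which shows up in the tensor strides. The sketch below computes the expected shape and the element strides for each layout in plain Python. It assumes densely packed memory; decoded frames may carry row padding, in which case the real strides reported through DLPack would differ. Both helper names are hypothetical.

```python
def frame_shape(color_type, height, width):
    """Expected tensor shape for the two RGB layouts listed above."""
    if color_type == "RGBP":
        return (3, height, width)  # planar, channels first (CHW)
    if color_type == "RGB":
        return (height, width, 3)  # interleaved, channels last (HWC)
    raise ValueError(f"no fixed shape assumed for {color_type!r}")


def contiguous_strides(shape):
    """Element strides of a densely packed row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


chw = frame_shape("RGBP", 1080, 1920)
hwc = frame_shape("RGB", 1080, 1920)
print(chw, contiguous_strides(chw))
print(hwc, contiguous_strides(hwc))
```

In the planar case each channel occupies one contiguous block, which is the layout convolutional models usually consume; in the interleaved case the three channel values of a pixel sit next to each other.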

When processing multiple frames for deep learning inference, convert frames to tensors and stack them into a batch:

    import torch
    from PyNvVideoCodec import SimpleDecoder, OutputColorType

    # Create decoder with planar RGB output for CNN models
    decoder = SimpleDecoder(
        "video.mp4",
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP
    )

    batch_size = 4

    # Get multiple frames
    frames = decoder.get_batch_frames(batch_size)

    # Convert each frame to tensor (zero-copy)
    tensors = [torch.from_dlpack(frame) for frame in frames]

    # Stack into batch tensor for inference
    batch = torch.stack(tensors)  # Shape: [batch_size, 3, height, width]

    # Normalize and prepare for model
    batch = batch.float() / 255.0

    # Run inference with your model
    # output = model(batch)
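Although each from_dlpack() call is zero-copy, torch.stack() allocates a new tensor for the batch, and converting to float32 makes it four times larger than the 8-bit frames. A rough estimate of the memory involved helps when choosing a batch size. The helper below is a hypothetical sketch that assumes densely packed 3-channel frames.

```python
def batch_nbytes(batch_size, height, width, channels=3, bytes_per_element=1):
    """Size in bytes of a densely packed [N, C, H, W] batch."""
    return batch_size * channels * height * width * bytes_per_element


# Four 1080p frames as decoded (uint8) and after .float() (float32).
uint8_bytes = batch_nbytes(4, 1080, 1920)
float32_bytes = batch_nbytes(4, 1080, 1920, bytes_per_element=4)

print(f"uint8 batch:   {uint8_bytes / 2**20:.1f} MiB")
print(f"float32 batch: {float32_bytes / 2**20:.1f} MiB")
```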

Using ThreadedDecoder for High-Throughput Inference:

    import torch
    from PyNvVideoCodec import ThreadedDecoder, OutputColorType

    # ThreadedDecoder prefetches frames in background
    decoder = ThreadedDecoder(
        enc_file_path="video.mp4",
        buffer_size=12,
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP
    )

    metadata = decoder.get_stream_metadata()
    batch_size = 4

    while True:
        # get_batch_frames() returns immediately with prefetched frames
        frames = decoder.get_batch_frames(batch_size)
        if len(frames) == 0:
            break

        # Convert and stack
        batch = torch.stack([torch.from_dlpack(f) for f in frames])
        batch = batch.float() / 255.0

        # Run inference - decoding happens in parallel!
        # output = model(batch)
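The prefetching idea behind this loop is a standard producer/consumer arrangement: a background thread fills a bounded buffer while the main thread drains it in batches. The sketch below reproduces that pattern with the Python standard library, using integers in place of decoded frames. It illustrates the general technique only and makes no claim about how ThreadedDecoder is implemented.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def producer(frame_source, buffer):
    """Background thread: push frames into the bounded buffer."""
    for frame in frame_source:
        buffer.put(frame)  # blocks while the buffer is full
    buffer.put(_DONE)


def batches(buffer, batch_size):
    """Consumer side: yield lists of up to batch_size frames."""
    batch = []
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Integers stand in for decoded frames: 10 "frames", buffer of 12 slots.
buffer = queue.Queue(maxsize=12)
worker = threading.Thread(target=producer, args=(range(10), buffer), daemon=True)
worker.start()

collected = list(batches(buffer, batch_size=4))
worker.join()
print(collected)
```

The bounded queue is what keeps memory use in check: the producer stalls once the buffer is full instead of running arbitrarily far ahead of the consumer.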





PyNvVideoCodec's DLPack support works with any framework that supports importing DLPack tensors.

TensorFlow Integration:

Use tf.experimental.dlpack.from_dlpack(frame) to convert decoded frames to TensorFlow tensors. Refer to the TensorFlow DLPack documentation for details and compatibility information.

CuPy Integration:

    import cupy as cp
    from PyNvVideoCodec import SimpleDecoder, OutputColorType

    decoder = SimpleDecoder(
        "video.mp4",
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP
    )
    frame = decoder[0]

    # Convert to CuPy array - zero-copy!
    cupy_array = cp.from_dlpack(frame)
    print(f"CuPy array shape: {cupy_array.shape}")

    # Perform GPU-accelerated operations with CuPy
    normalized = cupy_array.astype(cp.float32) / 255.0

NumPy Integration (requires copy):

    import torch
    import numpy as np
    from PyNvVideoCodec import SimpleDecoder, OutputColorType

    decoder = SimpleDecoder(
        "video.mp4",
        use_device_memory=True,
        output_color_type=OutputColorType.RGBP
    )
    frame = decoder[0]

    # First convert to PyTorch, then to NumPy (copies GPU → CPU)
    tensor = torch.from_dlpack(frame)
    numpy_array = tensor.cpu().numpy()
    print(f"NumPy array shape: {numpy_array.shape}")