utils.decoder_utils#

Module Contents#

Classes#

FrameExtractionPolicy

Policy for extracting frames from video content.

FrameExtractionSignature

Configuration for frame extraction parameters.

FramePurpose

Purpose for extracting frames from video content.

Resolution

Container for video frame dimensions.

VideoMetadata

Metadata for video content including dimensions, timing, and codec information.

Functions#

decode_video_cpu

Decode video frames from a binary stream using PyAV with configurable frame rate sampling.

decode_video_cpu_frame_ids

Decode video using PyAV frame ids.

extract_frames

Extract frames from a video into a numpy array.

extract_video_metadata

Extract metadata from a video file using ffprobe.

find_closest_indices

Find the closest indices in src to each element in dst.

get_avg_frame_rate

Get the average frame rate of a video.

get_frame_count

Get the total number of frames in a video file or stream.

get_video_timestamps

Get timestamps for all frames in a video stream.

sample_closest

Sample src at sample_rate rate and return the closest indices.

save_stream_position

Context manager that saves and restores stream position.

API#

class utils.decoder_utils.FrameExtractionPolicy(*args, **kwds)#

Bases: enum.Enum

Policy for extracting frames from video content.

This enum defines different strategies for selecting frames from a video, including first frame, middle frame, last frame, or a sequence of frames.

Initialization

first#

0

last#

2

middle#

1

sequence#

3

class utils.decoder_utils.FrameExtractionSignature#

Configuration for frame extraction parameters.

This class combines extraction policy and target frame rate into a single signature that can be used to identify and reproduce frame extraction settings.

extraction_policy: utils.decoder_utils.FrameExtractionPolicy#

None

target_fps: float#

None

to_str() str#

Convert frame extraction signature to string format.

Returns: String representation of extraction policy and target FPS.
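As an illustration, a minimal stand-in for this class might look as follows. The dataclass shape and the `to_str` output format are assumptions for illustration, not the library's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum


class FrameExtractionPolicy(Enum):
    # Mirrors the documented enum values.
    first = 0
    middle = 1
    last = 2
    sequence = 3


@dataclass(frozen=True)
class FrameExtractionSignature:
    extraction_policy: FrameExtractionPolicy
    target_fps: float

    def to_str(self) -> str:
        # Hypothetical string format; the real library may use a
        # different separator or encoding.
        return f"{self.extraction_policy.name}-{self.target_fps}"


sig = FrameExtractionSignature(FrameExtractionPolicy.sequence, 2.0)
print(sig.to_str())  # "sequence-2.0" under this assumed format
```

Because the dataclass is frozen, a signature is hashable and can be used as a cache key for reproducing extraction settings.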

class utils.decoder_utils.FramePurpose(*args, **kwds)#

Bases: enum.Enum

Purpose for extracting frames from video content.

This enum defines different purposes for extracting frames from a video, including aesthetics and embeddings.

Initialization

AESTHETICS#

1

EMBEDDINGS#

2

class utils.decoder_utils.Resolution#

Bases: typing.NamedTuple

Container for video frame dimensions.

This class stores the height and width of video frames as a named tuple.

height: int#

None

width: int#

None

class utils.decoder_utils.VideoMetadata#

Metadata for video content including dimensions, timing, and codec information.

This class stores essential video properties such as resolution, frame rate, duration, and encoding details.

audio_codec: str#

None

bit_rate_k: int#

None

fps: float#

None

height: int#

None

num_frames: int#

None

pixel_format: str#

None

video_codec: str#

None

video_duration: float#

None

width: int#

None

utils.decoder_utils.decode_video_cpu(
data: pathlib.Path | str | BinaryIO | bytes,
sample_rate_fps: float,
timestamps: numpy.typing.NDArray[numpy.float32] | None = None,
start: float | None = None,
stop: float | None = None,
endpoint: bool = True,
stream_idx: int = 0,
video_format: str | None = None,
num_threads: int = 1,
) numpy.typing.NDArray[numpy.uint8]#

Decode video frames from a binary stream using PyAV with configurable frame rate sampling.

This function decodes video frames from a binary stream at a specified frame rate. The frame rate does not need to match the input video’s frame rate; the video can be supersampled or undersampled.

Args:
data: An open file, io.BytesIO, or bytes object with the video data.
sample_rate_fps: Frame rate for sampling the video.
timestamps: Optional array of presentation timestamps for each frame in the video. If supplied, this array must be monotonically increasing. If not supplied, timestamps will be extracted from the video stream.
start: Optional start timestamp for frame extraction. If None, the first frame timestamp is used.
stop: Optional end timestamp for frame extraction. If None, the last frame timestamp is used.
endpoint: If True, stop is the last sample. Otherwise, it is not included. Default is True.
stream_idx: PyAV index of the video stream to decode, usually 0.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.
num_threads: Number of threads to use for decoding.

Returns: A numpy array of shape (num_frames, height, width, channels) containing the decoded frames in RGB24 format.

Raises: ValueError: If the sampled timestamps differ from source timestamps by more than the specified tolerance.

utils.decoder_utils.decode_video_cpu_frame_ids(
data: pathlib.Path | str | BinaryIO | bytes,
frame_ids: numpy.typing.NDArray[numpy.int32],
counts: numpy.typing.NDArray[numpy.int32] | None = None,
stream_idx: int = 0,
video_format: str | None = None,
num_threads: int = 1,
) numpy.typing.NDArray[numpy.uint8]#

Decode video using PyAV frame ids.

It is not recommended to use this function directly. Instead, use decode_video_cpu, which is timestamp-based. Timestamps are necessary for synchronizing sensors, like multiple cameras, or synchronizing video with GPS and LIDAR.

Args:
data: An open file, io.BytesIO, or bytes object with the video data.
frame_ids: List of frame ids to decode.
counts: List of counts for each frame id. A frame id can be repeated during supersampling, which can happen in videos with frame drops or due to clock drift between sensors.
stream_idx: PyAV index of the video stream to decode, usually 0.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.
num_threads: Number of threads to use for decoding.

Returns: A numpy array of shape (frame_count, height, width, channels) containing the decoded frames.

utils.decoder_utils.extract_frames(
video: pathlib.Path | str | BinaryIO | bytes,
extraction_policy: utils.decoder_utils.FrameExtractionPolicy,
sample_rate_fps: float = 1.0,
target_res: tuple[int, int] = (-1, -1),
num_threads: int = 1,
stream_idx: int = 0,
video_format: str | None = None,
) numpy.typing.NDArray[numpy.uint8]#

Extract frames from a video into a numpy array.

Args:
video: An open file, io.BytesIO, or bytes object with the video data.
extraction_policy: The policy for extracting frames.
sample_rate_fps: Frame rate for sampling the video.
target_res: The target resolution for the frames.
stream_idx: PyAV index of the video stream to decode, usually 0.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.
num_threads: Number of threads to use for decoding.

Returns: A numpy array of shape (num_frames, height, width, 3) containing the decoded frames in RGB24 format.

utils.decoder_utils.extract_video_metadata(
video: str | bytes,
) utils.decoder_utils.VideoMetadata#

Extract metadata from a video file using ffprobe.

Args:
video: Path to video file or video data as bytes.

Returns: VideoMetadata object containing video properties.

utils.decoder_utils.find_closest_indices(
src: numpy.typing.NDArray[numpy.float32],
dst: numpy.typing.NDArray[numpy.float32],
) numpy.typing.NDArray[numpy.int32]#

Find the closest indices in src to each element in dst.

If an element in dst is equidistant from two elements in src, the left index in src is used.

Args:
src: Monotonically increasing array of numbers to match dst against.
dst: Monotonically increasing array of numbers to search for in src.

Returns: Array of closest indices in src for each element in dst
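The documented behavior can be sketched in pure NumPy (a reference implementation for illustration, not the library's code; assumes src has at least two elements):

```python
import numpy as np


def find_closest_indices_sketch(src, dst):
    # Insertion point of each dst element in the sorted src array,
    # clamped so both neighbors exist.
    pos = np.clip(np.searchsorted(src, dst), 1, len(src) - 1)
    left, right = src[pos - 1], src[pos]
    # Strict inequality keeps the left index on ties, as documented.
    nearer_right = (right - dst) < (dst - left)
    return np.where(nearer_right, pos, pos - 1).astype(np.int32)


src = np.array([0.0, 1.0, 2.0, 4.0], dtype=np.float32)
dst = np.array([0.5, 0.6, 3.0, 10.0], dtype=np.float32)
print(find_closest_indices_sketch(src, dst))  # [0 1 2 3]
```

Note how 0.5 and 3.0 are equidistant from two src elements and resolve to the left index, while 10.0 clamps to the last index.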

utils.decoder_utils.get_avg_frame_rate(
data: pathlib.Path | str | BinaryIO | bytes,
stream_idx: int = 0,
video_format: str | None = None,
) float#

Get the average frame rate of a video.

Args:
data: An open file, io.BytesIO, or bytes object with the video data.
stream_idx: Index of the video stream to decode, usually 0.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.

Returns: The average frame rate of the video.

utils.decoder_utils.get_frame_count(
data: pathlib.Path | str | BinaryIO | bytes,
stream_idx: int = 0,
video_format: str | None = None,
) int#

Get the total number of frames in a video file or stream.

Args:
data: An open file, io.BytesIO, or bytes object with the video data.
stream_idx: Index of the video stream to read from. Defaults to 0, which is typically the main video stream.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.

Returns: The total number of frames in the video stream.

utils.decoder_utils.get_video_timestamps(
data: pathlib.Path | str | BinaryIO | bytes,
stream_idx: int = 0,
video_format: str | None = None,
) numpy.typing.NDArray[numpy.float32]#

Get timestamps for all frames in a video stream.

The file position will be moved as needed to get the timestamps.

Note: the order in which frames appear in a video stream is not necessarily the order in which they will be displayed, so decode-order timestamps are not necessarily monotonically increasing. This can happen when B-frames are present.

This function will return presentation timestamps in monotonically increasing order.

Args:
data: An open file, io.BytesIO, or bytes object with the video data.
stream_idx: PyAV index of the video stream to decode, usually 0.
video_format: Format of the video stream, like “mp4”, “mkv”, etc. None is usually best.

Returns: A numpy array of monotonically increasing timestamps.
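To illustrate the note above: with B-frames, packets arrive in decode order, so the raw PTS values are out of order, and sorting recovers presentation order. The PTS values below are illustrative, not taken from a real stream:

```python
import numpy as np

# Decode order for an illustrative I, P, B, B group of pictures: the P
# frame the B-frames depend on is decoded before them, so its PTS
# appears early in the stream.
decode_order_pts = np.array([0.0, 0.75, 0.25, 0.5], dtype=np.float32)

# Presentation order, monotonically increasing, as this function returns.
presentation_pts = np.sort(decode_order_pts)
print(presentation_pts.tolist())  # [0.0, 0.25, 0.5, 0.75]
```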

utils.decoder_utils.sample_closest(
src: numpy.typing.NDArray[numpy.float32],
sample_rate: float,
start: float | None = None,
stop: float | None = None,
endpoint: bool = True,
dedup: bool = True,
) tuple[numpy.typing.NDArray[numpy.int32], numpy.typing.NDArray[numpy.int32], numpy.typing.NDArray[numpy.float32]]#

Sample src at sample_rate rate and return the closest indices.

This function is meant to be used for sampling monotonically increasing numbers, like timestamps. This function can be used for synchronizing sensors, like multiple cameras, or synchronizing video with GPS and LIDAR.

The first element sampled will be either src[0] or the element closest to start.

The last element sampled will be either src[-1] or the element closest to stop. The last element is included only if it fits into the sampling rate and endpoint=True.

This function intentionally has no policy about distance from the closest elements in src to the sample elements. It will return the index of the closest element to the sample. It is up to the caller to define policy, which is why sample_elements is returned.

Args:
src: Monotonically increasing array of elements.
sample_rate: Sampling rate.
start: Start element (defaults to first element).
stop: End element (defaults to last element).
endpoint: If True, stop can be the last sample, if it fits into the sample rate. If False, stop is not included in the output.
dedup: Whether to deduplicate indices. Repeated indices will be reflected in the returned counts array.

Returns: Tuple of (indices, counts, sample_elements), where counts[i] is the number of times indices[i] was sampled.
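A simplified NumPy sketch of this sampling logic, under stated assumptions: start and stop are fixed to src[0] and src[-1], and ties resolve to the left index. This is an illustration, not the actual implementation:

```python
import numpy as np


def sample_closest_sketch(src, sample_rate, endpoint=True, dedup=True):
    step = 1.0 / sample_rate
    n_steps = int(round((src[-1] - src[0]) / step))
    # With endpoint=True the final sample may coincide with src[-1].
    samples = src[0] + step * np.arange(n_steps + (1 if endpoint else 0))
    # Map each sample to its closest element in src (ties go left).
    pos = np.clip(np.searchsorted(src, samples), 1, len(src) - 1)
    nearer_right = (src[pos] - samples) < (samples - src[pos - 1])
    idx = np.where(nearer_right, pos, pos - 1).astype(np.int32)
    if not dedup:
        return idx, np.ones_like(idx), samples.astype(np.float32)
    uniq, counts = np.unique(idx, return_counts=True)
    return uniq.astype(np.int32), counts.astype(np.int32), samples.astype(np.float32)


# Supersample a 2 fps stream at 4 fps: indices repeat, counts record it.
src = np.array([0.0, 0.5, 1.0, 1.5, 2.0], dtype=np.float32)
idx, counts, samples = sample_closest_sketch(src, 4.0)
print(idx.tolist(), counts.tolist())  # [0, 1, 2, 3, 4] [2, 2, 2, 2, 1]
```

The counts array is what lets a caller materialize repeated frames when supersampling, without decoding the same frame twice.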

utils.decoder_utils.save_stream_position(
stream: BinaryIO,
) collections.abc.Generator[BinaryIO, None, None]#

Context manager that saves and restores stream position.
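Its behavior can be sketched with contextlib (an illustrative re-implementation, not the library's code):

```python
import io
from contextlib import contextmanager


@contextmanager
def save_stream_position_sketch(stream):
    # Remember the current position, hand the stream back to the caller,
    # and seek back to the saved position on exit, even on error.
    pos = stream.tell()
    try:
        yield stream
    finally:
        stream.seek(pos)


buf = io.BytesIO(b"abcdef")
buf.seek(2)
with save_stream_position_sketch(buf) as s:
    s.read()  # consumes the rest of the stream, moving the position
print(buf.tell())  # 2: the position was restored
```

This pattern is what allows metadata probes such as get_frame_count or get_video_timestamps to scan a stream without disturbing the caller's read position.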