bridge.diffusion.data.common.diffusion_sample

Module Contents

Classes

DiffusionSample

Data class representing a sample for diffusion tasks.

API

class bridge.diffusion.data.common.diffusion_sample.DiffusionSample

Bases: megatron.energon.Sample

Data class representing a sample for diffusion tasks.

.. attribute:: video

Video latents (C T H W).

Type:

torch.Tensor

.. attribute:: context_embeddings

Text embeddings (S D).

Type:

torch.Tensor

.. attribute:: context_mask

Mask for text embeddings.

Type:

torch.Tensor

.. attribute:: loss_mask

Mask indicating valid positions for loss computation.

Type:

torch.Tensor

.. attribute:: image_size

Tensor containing image dimensions.

Type:

Optional[torch.Tensor]

.. attribute:: fps

Frame rate of the video.

Type:

Optional[torch.Tensor]

.. attribute:: num_frames

Number of frames in the video.

Type:

Optional[torch.Tensor]

.. attribute:: padding_mask

Mask indicating padding positions.

Type:

Optional[torch.Tensor]

.. attribute:: seq_len_q

Sequence length for query embeddings.

Type:

Optional[torch.Tensor]

.. attribute:: seq_len_q_padded

Sequence length for query embeddings after padding.

Type:

Optional[torch.Tensor]

.. attribute:: seq_len_kv

Sequence length for key/value embeddings.

Type:

Optional[torch.Tensor]

.. attribute:: seq_len_kv_padded

Sequence length for key/value embeddings after padding.

Type:

Optional[torch.Tensor]

.. attribute:: pos_ids

Positional IDs.

Type:

Optional[torch.Tensor]

.. attribute:: latent_shape

Shape of the latent tensor.

Type:

Optional[torch.Tensor]

.. attribute:: video_metadata

Metadata of the video.

Type:

Optional[dict]

video: torch.Tensor

None

context_embeddings: torch.Tensor

None

context_mask: torch.Tensor

None

image_size: Optional[torch.Tensor]

None

loss_mask: torch.Tensor

None

fps: Optional[torch.Tensor]

None

num_frames: Optional[torch.Tensor]

None

padding_mask: Optional[torch.Tensor]

None

seq_len_q: Optional[torch.Tensor]

None

seq_len_q_padded: Optional[torch.Tensor]

None

seq_len_kv: Optional[torch.Tensor]

None

seq_len_kv_padded: Optional[torch.Tensor]

None

pos_ids: Optional[torch.Tensor]

None

latent_shape: Optional[torch.Tensor]

None

video_metadata: Optional[dict]

None

to_dict() → dict

Converts the sample to a dictionary.
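
As an illustration only, the sketch below shows one way a dataclass-style sample could be flattened into a dictionary. `ToySample` is a hypothetical stand-in: it holds plain Python lists instead of tensors, carries only a few of the fields, and does not inherit from `megatron.energon.Sample`. The real `to_dict()` may select, rename, or transform keys differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToySample:
    """Hypothetical stand-in for DiffusionSample (lists instead of tensors)."""

    video: list
    context_embeddings: list
    loss_mask: list
    fps: Optional[int] = None  # optional field; the None default is an assumption

    def to_dict(self) -> dict:
        # Shallow conversion: map each field name to its current value.
        return {f.name: getattr(self, f.name) for f in fields(self)}


sample = ToySample(video=[0.0], context_embeddings=[1.0], loss_mask=[1])
as_dict = sample.to_dict()
```

In this sketch, optional fields that were never set stay in the dictionary with the value `None`, so consumers should check for `None` before use.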

__add__(other: Any) → int

Adds this sample's sequence length to another sample's sequence length or to an integer.

__radd__(other: Any) → int

Handles reverse addition (an integer on the left-hand side), which allows samples to be summed starting from an integer.
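
Reverse addition matters because Python's built-in `sum()` starts from the integer `0`, so its first step evaluates `0 + sample`. The sketch below shows that pattern with a hypothetical `ToySample`. It assumes the sequence length in question is `seq_len_q` and stores it as a plain `int` rather than a tensor; neither detail is confirmed by this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToySample:
    """Hypothetical stand-in; seq_len_q is a plain int here, not a tensor."""

    seq_len_q: int

    def __add__(self, other: Any) -> int:
        # sample + sample and sample + int both yield a plain int.
        if isinstance(other, ToySample):
            return self.seq_len_q + other.seq_len_q
        if isinstance(other, int):
            return self.seq_len_q + other
        return NotImplemented

    def __radd__(self, other: Any) -> int:
        # int + sample, e.g. the initial 0 + sample inside sum().
        if isinstance(other, int):
            return other + self.seq_len_q
        return NotImplemented


samples = [ToySample(3), ToySample(5), ToySample(2)]
total_len = sum(samples)             # every step is int + sample, handled by __radd__
pair_len = samples[0] + samples[1]   # sample + sample, handled by __add__
```

Because both operators return an `int`, the running total is never a sample, which is why `sum()` only ever needs `__radd__` after its first step.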

__lt__(other: Any) → bool

Returns whether this sample's sequence length is less than that of another sample or than an integer.
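
A less-than operator is all that `sorted()` and `min()` need, so samples can be ordered by length, for example when grouping them for sequence packing. The sketch below uses a hypothetical `ToySample` and again assumes that the compared length is `seq_len_q`, held as a plain `int`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToySample:
    """Hypothetical stand-in; seq_len_q is a plain int here, not a tensor."""

    seq_len_q: int

    def __lt__(self, other: Any) -> bool:
        # Compare against another sample's length or against a raw integer.
        if isinstance(other, ToySample):
            return self.seq_len_q < other.seq_len_q
        if isinstance(other, int):
            return self.seq_len_q < other
        return NotImplemented


samples = [ToySample(5), ToySample(2), ToySample(3)]
ordered = [s.seq_len_q for s in sorted(samples)]  # sorted() relies on __lt__
shortest = min(samples).seq_len_q                 # min() relies on __lt__ as well
fits_budget = samples[1] < 4                      # sample compared with an int
```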