nemo_automodel.components.flow_matching.adapters.hunyuan
HunyuanVideo model adapter for the FlowMatching pipeline.
This adapter supports HunyuanVideo 1.5-style models, which use dual text encoders and image embeddings for image-to-video conditioning.
Module Contents
Classes
Functions
Data
API
Bases: ModelAdapter
Model adapter for HunyuanVideo 1.5 style models.
These models use:
- Condition latents concatenated with noisy latents
- Dual text encoders with attention masks
- Image embeddings for image-to-video (i2v) conditioning
Expected batch keys:
- text_embeddings: Primary text encoder output [B, seq_len, dim]
- text_mask: Attention mask for primary encoder [B, seq_len] (optional)
- text_embeddings_2: Secondary text encoder output [B, seq_len, dim] (optional)
- text_mask_2: Attention mask for secondary encoder [B, seq_len] (optional)
- image_embeds: Image embeddings for i2v [B, seq_len, dim] (optional)
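The batch layout above can be checked before a batch reaches the adapter. The following is a minimal sketch, not part of the module: `check_hunyuan_batch` is a hypothetical helper, and pairing each mask with the embedding tensor of the same encoder is an assumption drawn from the key names. It only reads `.shape`, so it accepts `torch.Tensor` values without importing torch.

```python
from typing import Any, Mapping

# Key names follow the documented batch layout. Pairing each mask with the
# embeddings of the same encoder is an assumption made for this sketch.
_EMBEDDING_MASK_PAIRS = (
    ("text_embeddings", "text_mask"),
    ("text_embeddings_2", "text_mask_2"),
)


def _check_embeddings(name: str, value: Any, batch_size: int) -> tuple:
    """Require a [B, seq_len, dim] shape whose B matches the batch size."""
    shape = tuple(value.shape)
    if len(shape) != 3 or shape[0] != batch_size:
        raise ValueError(f"{name} must be [B, seq_len, dim], got {shape}")
    return shape


def check_hunyuan_batch(batch: Mapping[str, Any]) -> int:
    """Hypothetical helper: validate conditioning entries and return B."""
    if "text_embeddings" not in batch:
        # text_embeddings is the only documented key not marked optional.
        raise KeyError("text_embeddings")

    batch_size = tuple(batch["text_embeddings"].shape)[0]

    for emb_key, mask_key in _EMBEDDING_MASK_PAIRS:
        emb = batch.get(emb_key)
        if emb is None:
            continue
        emb_shape = _check_embeddings(emb_key, emb, batch_size)
        mask = batch.get(mask_key)
        # A mask, when present, must cover exactly the [B, seq_len] tokens.
        if mask is not None and tuple(mask.shape) != emb_shape[:2]:
            raise ValueError(
                f"{mask_key} must be {emb_shape[:2]}, got {tuple(mask.shape)}"
            )

    image_embeds = batch.get("image_embeds")
    if image_embeds is not None:
        _check_embeddings("image_embeds", image_embeds, batch_size)

    return batch_size
```

Calling `check_hunyuan_batch(batch)` on a batch that holds only `text_embeddings` succeeds, since every other key is optional.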
Execute forward pass for HunyuanVideo model.
Parameters:
HunyuanVideo model
Dictionary from prepare_inputs()
Returns: torch.Tensor
Model prediction tensor
Generate conditional latents based on task type.
Parameters:
Input latents [B, C, F, H, W]
Task type (“t2v” or “i2v”)
Returns: torch.Tensor
Conditional latents [B, C+1, F, H, W]
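To make the `[B, C+1, F, H, W]` output shape concrete, here is a minimal sketch of one way such condition latents could be built. Only the input and output shapes and the two task types come from this page. The layout is an assumption: all zeros for "t2v", and for "i2v" the first frame's latents copied into the first `C` channels with the extra channel used as a first-frame mask. `make_condition_latents` is a hypothetical name, not the adapter's method.

```python
import torch


def make_condition_latents(latents: torch.Tensor, task_type: str) -> torch.Tensor:
    """Hypothetical sketch: build [B, C+1, F, H, W] condition latents."""
    b, c, f, h, w = latents.shape
    cond = torch.zeros(b, c + 1, f, h, w, dtype=latents.dtype, device=latents.device)

    if task_type == "t2v":
        # Assumed: text-to-video carries no visual conditioning.
        return cond
    if task_type == "i2v":
        # Assumed: the first frame is the conditioning image.
        cond[:, :c, :1] = latents[:, :, :1]
        # Assumed: the extra channel marks which frames are conditioned.
        cond[:, c, 0] = 1.0
        return cond
    raise ValueError(f"unknown task type: {task_type!r}")


latents = torch.randn(2, 4, 3, 8, 8)
cond = make_condition_latents(latents, "i2v")
# Concatenating along the channel axis is also an assumption here.
model_input = torch.cat([latents, cond], dim=1)
```

Under these assumptions, concatenating along the channel axis gives the model `2C + 1` input channels, which matches the "condition latents concatenated with noisy latents" description above.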
Prepare inputs for HunyuanVideo model.
Parameters:
FlowMatchingContext with batch data
Returns: Dict[str, Any]
Dictionary containing:
Patch Diffusers Hunyuan attention to avoid dense mask construction for flash-varlen attention.
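For background on why a dense mask can be avoided: variable-length ("varlen") flash attention kernels generally take cumulative sequence lengths rather than a `[B, 1, S, S]` mask. The sketch below shows that general conversion from a `[B, seq_len]` padding mask. It is not the code of this patch, `mask_to_cu_seqlens` is a hypothetical name, and it assumes a right-padded mask whose valid tokens are contiguous.

```python
import torch
import torch.nn.functional as F


def mask_to_cu_seqlens(mask: torch.Tensor):
    """Hypothetical sketch: turn a [B, seq_len] padding mask into the
    cumulative sequence lengths that varlen attention kernels expect."""
    # Number of valid tokens per sample, shape [B].
    seqlens = mask.to(torch.int32).sum(dim=1)
    # Prefix sums with a leading zero, shape [B + 1]; sample i owns the
    # packed token range cu_seqlens[i]:cu_seqlens[i + 1].
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    return cu_seqlens, int(seqlens.max())


mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=torch.bool)
cu_seqlens, max_seqlen = mask_to_cu_seqlens(mask)
```

This representation needs `B + 1` integers, whereas a dense mask grows with the square of the sequence length.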