nemo_automodel.components.flow_matching.adapters.flux
Flux model adapter for FlowMatching Pipeline.
This adapter supports FLUX.1-style models with:
- T5 text embeddings (text_embeddings)
- CLIP pooled embeddings (pooled_prompt_embeds)
- 2D image latents (treated as 1-frame video: [B, C, 1, H, W])
Module Contents
Classes
API
Bases: ModelAdapter
Model adapter for FLUX.1 image generation models.
Supports the batch format produced by the multiresolution dataloader:
- image_latents: [B, C, H, W] for images
- text_embeddings: T5 embeddings [B, seq_len, 4096]
- pooled_prompt_embeds: CLIP pooled [B, 768]
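The batch format above can be checked at the shape level before any tensors are touched. This is a minimal sketch: the batch is modelled as a plain dict of shape tuples, `check_flux_batch_shapes` is a hypothetical helper (not part of the adapter), and the example sizes (16 latent channels, 512 T5 tokens) are illustrative assumptions.

```python
def check_flux_batch_shapes(shapes: dict) -> int:
    """Validate a batch described by shape tuples and return its batch size.

    Hypothetical helper; keys follow the documented batch format.
    """
    # Tuple unpacking raises ValueError if image_latents is not 4D.
    b, c, h, w = shapes["image_latents"]
    tb, seq_len, t5_dim = shapes["text_embeddings"]
    pb, clip_dim = shapes["pooled_prompt_embeds"]
    if not (b == tb == pb):
        raise ValueError(f"batch sizes differ: {b}, {tb}, {pb}")
    if t5_dim != 4096:
        raise ValueError(f"expected T5 width 4096, got {t5_dim}")
    if clip_dim != 768:
        raise ValueError(f"expected CLIP pooled width 768, got {clip_dim}")
    return b


batch = {
    "image_latents": (2, 16, 64, 64),
    "text_embeddings": (2, 512, 4096),
    "pooled_prompt_embeds": (2, 768),
}
print(check_flux_batch_shapes(batch))  # 2
```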
FLUX model forward interface:
- hidden_states: Packed latents
- encoder_hidden_states: T5 text embeddings
- pooled_projections: CLIP pooled embeddings
- timestep: Normalized timesteps [0, 1]
- img_ids / txt_ids: Positional embeddings
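The interface above can be summarized as the expected argument shapes for latents [B, C, H, W] and a T5 sequence of length seq_len. This is a hedged sketch: `flux_forward_shapes` is hypothetical, timestep is assumed to hold one value per sample, and txt_ids is left out because its shape is not documented here.

```python
def flux_forward_shapes(b: int, c: int, h: int, w: int, seq_len: int) -> dict:
    """Expected shapes of the Flux forward arguments for latents [B, C, H, W].

    Sketch only: timestep is assumed to be [B]; txt_ids is omitted.
    """
    if h % 2 or w % 2:
        raise ValueError("latent H and W must be even for 2x2 packing")
    num_patches = (h // 2) * (w // 2)
    return {
        "hidden_states": (b, num_patches, c * 4),   # packed latents
        "encoder_hidden_states": (b, seq_len, 4096),  # T5 embeddings
        "pooled_projections": (b, 768),               # CLIP pooled embeddings
        "timestep": (b,),                             # assumed one per sample
        "img_ids": (b, num_patches, 3),               # (batch_idx, y, x)
    }


print(flux_forward_shapes(2, 16, 64, 64, 512)["hidden_states"])  # (2, 1024, 64)
```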
Pack latents from [B, C, H, W] to Flux format [B, (H//2)*(W//2), C*4].
Flux uses a 2x2 patch embedding, so each packed token holds the C*4 values of one 2x2 spatial patch of the latent.
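A framework-free sketch of that layout follows. It uses pure Python on nested lists so the index mapping is explicit, and it follows the packing order of diffusers' FluxPipeline: a view to [B, C, H//2, 2, W//2, 2] followed by permute(0, 2, 4, 1, 3, 5). That the adapter uses exactly this ordering is an assumption, and `pack_latents` is an illustrative name.

```python
def pack_latents(x):
    """Pack nested lists shaped [B][C][H][W] into [B][(H//2)*(W//2)][C*4].

    Token t = py * (W//2) + px holds the 2x2 patch at rows 2*py, 2*py+1 and
    columns 2*px, 2*px+1. Within a token, feature c*4 + dy*2 + dx holds
    channel c at offset (dy, dx), so features are channel-major.
    """
    packed = []
    for sample in x:
        num_channels, h, w = len(sample), len(sample[0]), len(sample[0][0])
        if h % 2 or w % 2:
            raise ValueError("H and W must be even for 2x2 packing")
        tokens = []
        for py in range(h // 2):          # patch rows, row-major token order
            for px in range(w // 2):      # patch columns
                tokens.append([
                    sample[c][2 * py + dy][2 * px + dx]
                    for c in range(num_channels)
                    for dy in range(2)
                    for dx in range(2)
                ])
        packed.append(tokens)
    return packed


# B=1, C=1, H=2, W=4: two tokens, one per 2x2 patch
print(pack_latents([[[[0, 1, 2, 3], [4, 5, 6, 7]]]]))  # [[[0, 1, 4, 5], [2, 3, 6, 7]]]
```

Keeping features channel-major means each token is the flattened [C, 2, 2] patch, which is why the feature width is C*4.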
Prepare positional IDs for image latents.
Returns tensor of shape [B, (H//2)*(W//2), 3] containing (batch_idx, y, x).
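A minimal sketch of these IDs on nested lists, taking H and W as latent dimensions and enumerating tokens in the same row-major patch order used for packing. It follows the (batch_idx, y, x) description literally; diffusers' reference FluxPipeline fills the first column with zeros instead, so what the adapter stores there should be checked against its source. `prepare_img_ids` is an illustrative name.

```python
def prepare_img_ids(batch_size: int, h: int, w: int):
    """Build [B][(H//2)*(W//2)][3] ids as (batch_idx, y, x) per patch token.

    h and w are latent dimensions; y and x index the 2x2 patch grid.
    """
    return [
        [[b, y, x] for y in range(h // 2) for x in range(w // 2)]
        for b in range(batch_size)
    ]


print(prepare_img_ids(2, 2, 4))  # [[[0, 0, 0], [0, 0, 1]], [[1, 0, 0], [1, 0, 1]]]
```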
Unpack latents from Flux format back to [B, C, H, W].
Parameters:
Packed latents of shape [B, num_patches, channels]
Original image height in pixels
Original image width in pixels
VAE compression factor (default: 8)
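The inverse mapping can be sketched the same way. The pixel-to-latent arithmetic here, latent H = 2 * (height // (vae_scale_factor * 2)), is the one diffusers' FluxPipeline uses when unpacking; that the adapter computes it identically is an assumption, and `unpack_latents` is an illustrative name.

```python
def unpack_latents(packed, height: int, width: int, vae_scale_factor: int = 8):
    """Unpack [B][num_patches][C*4] back to nested lists [B][C][H][W].

    height/width are pixel sizes; the latent size is the pixel size divided
    by vae_scale_factor, rounded down to an even number.
    """
    h = 2 * (height // (vae_scale_factor * 2))
    w = 2 * (width // (vae_scale_factor * 2))
    out = []
    for tokens in packed:
        if len(tokens) != (h // 2) * (w // 2):
            raise ValueError("num_patches does not match height/width")
        num_channels = len(tokens[0]) // 4
        sample = [[[0.0] * w for _ in range(h)] for _ in range(num_channels)]
        for t, feat in enumerate(tokens):
            py, px = divmod(t, w // 2)  # token index -> patch grid position
            for c in range(num_channels):
                for dy in range(2):
                    for dx in range(2):
                        sample[c][2 * py + dy][2 * px + dx] = feat[c * 4 + dy * 2 + dx]
        out.append(sample)
    return out


# 16x32 pixels with factor 8 -> 2x4 latent grid -> 2 tokens
print(unpack_latents([[[0, 1, 4, 5], [2, 3, 6, 7]]], height=16, width=32))
# [[[[0, 1, 2, 3], [4, 5, 6, 7]]]]
```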
Execute the forward pass of the Flux model.
Returns the unpacked prediction in [B, C, H, W] format.
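The forward flow described above (pack the latents, run the transformer, unpack its prediction) can be sketched end to end. Assumptions: `pack`, `unpack`, `flux_forward` and `identity_model` are hypothetical names, the model is any callable that returns its prediction in the packed layout, and unpacking here takes latent rather than pixel dimensions to keep the sketch short.

```python
from typing import Callable


def pack(x):
    # [B][C][H][W] -> [B][(H//2)*(W//2)][C*4]; tokens row-major over the
    # patch grid, features channel-major within each 2x2 patch
    return [
        [
            [s[c][2 * py + dy][2 * px + dx] for c in range(len(s)) for dy in (0, 1) for dx in (0, 1)]
            for py in range(len(s[0]) // 2)
            for px in range(len(s[0][0]) // 2)
        ]
        for s in x
    ]


def unpack(p, h, w):
    # inverse of pack for an h x w latent grid (latent units, not pixels)
    return [
        [
            [[toks[(y // 2) * (w // 2) + x // 2][c * 4 + (y % 2) * 2 + x % 2] for x in range(w)]
             for y in range(h)]
            for c in range(len(toks[0]) // 4)
        ]
        for toks in p
    ]


def flux_forward(model: Callable, latents, **cond):
    """Sketch of the adapter's forward flow: pack, run model, unpack."""
    h, w = len(latents[0][0]), len(latents[0][0][0])
    pred = model(hidden_states=pack(latents), **cond)  # packed-layout output
    return unpack(pred, h, w)


def identity_model(hidden_states, **_):
    # stand-in for the transformer: returns its packed input unchanged
    return hidden_states


latents = [[[[0, 1, 2, 3], [4, 5, 6, 7]]]]  # B=1, C=1, H=2, W=4
print(flux_forward(identity_model, latents) == latents)  # True
```

With an identity stand-in the round trip is lossless, which is a quick sanity check that pack and unpack are exact inverses.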
Prepare Flux model inputs from a FlowMatchingContext.
Expects 4D image latents: [B, C, H, W]
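The module docstring describes images as 1-frame video latents [B, C, 1, H, W], while this method expects 4D input. If a caller holds the 5D view, the singleton frame axis would presumably have to be dropped first. The shape-level sketch below shows that normalization; `as_image_latent_shape` is hypothetical and is not claimed to be what the adapter does internally.

```python
def as_image_latent_shape(shape: tuple) -> tuple:
    """Return a 4D [B, C, H, W] shape, dropping a singleton frame axis.

    Hypothetical helper: accepts [B, C, H, W] as-is and [B, C, 1, H, W]
    (the 1-frame video view); anything else is rejected.
    """
    if len(shape) == 4:
        return tuple(shape)
    if len(shape) == 5 and shape[2] == 1:
        b, c, _, h, w = shape
        return (b, c, h, w)
    raise ValueError(f"expected [B, C, H, W] or [B, C, 1, H, W], got {tuple(shape)}")


print(as_image_latent_shape((2, 16, 1, 64, 64)))  # (2, 16, 64, 64)
```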