# nemo_automodel.components.flow_matching.adapters.flux

Flux model adapter for FlowMatching Pipeline.

This adapter supports FLUX.1 style models with:

* T5 text embeddings (text\_embeddings)
* CLIP pooled embeddings (pooled\_prompt\_embeds)
* 2D image latents (treated as 1-frame video: \[B, C, 1, H, W])

## Module Contents

### Classes

| Name                                                                                | Description                                       |
| ----------------------------------------------------------------------------------- | ------------------------------------------------- |
| [`FluxAdapter`](#nemo_automodel-components-flow_matching-adapters-flux-FluxAdapter) | Model adapter for FLUX.1 image generation models. |

### API

```python
class nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter(
    guidance_scale: float = 3.5,
    use_guidance_embeds: bool = True
)
```

**Bases:** [ModelAdapter](/nemo-automodel/nemo_automodel/components/flow_matching/adapters/base#nemo_automodel-components-flow_matching-adapters-base-ModelAdapter)

Model adapter for FLUX.1 image generation models.

Supports the batch format produced by the multiresolution dataloader:

* image\_latents: \[B, C, H, W] for images
* text\_embeddings: T5 embeddings \[B, seq\_len, 4096]
* pooled\_prompt\_embeds: CLIP pooled \[B, 768]
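To make the batch layout above concrete, here is a minimal sketch that builds a dummy batch with the documented keys and shapes. The batch size, 16 latent channels, 64x64 latent grid, and sequence length of 512 are illustrative assumptions, not values taken from the adapter.

```python
import torch

# Illustrative sizes only; none of these values are read from the adapter.
batch_size, channels, height, width = 2, 16, 64, 64
seq_len = 512  # assumed T5 sequence length

# Dummy batch using the keys and trailing dimensions listed above.
batch = {
    "image_latents": torch.randn(batch_size, channels, height, width),
    "text_embeddings": torch.randn(batch_size, seq_len, 4096),
    "pooled_prompt_embeds": torch.randn(batch_size, 768),
}

# Collect the shapes so they can be inspected or checked.
shapes = {name: tuple(tensor.shape) for name, tensor in batch.items()}
```

A batch like this is only useful for checking shapes; real latents and embeddings would come from the VAE and text encoders.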

FLUX model forward interface:

* hidden\_states: Packed latents
* encoder\_hidden\_states: T5 text embeddings
* pooled\_projections: CLIP pooled embeddings
* timestep: Normalized timesteps in \[0, 1]
* img\_ids / txt\_ids: Positional IDs for image and text tokens

```python
nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter._pack_latents(
    latents: torch.Tensor
) -> torch.Tensor
```

Pack latents from \[B, C, H, W] to Flux format \[B, (H//2)\*(W//2), C\*4].

Flux uses a 2x2 patch embedding, so latents are reshaped accordingly.
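The 2x2 packing described above can be sketched as follows. This is a standalone illustration of the reshape, assuming the usual view/permute/reshape approach; the function name `pack_latents` is hypothetical and the adapter's private method may differ in detail.

```python
import torch


def pack_latents(latents: torch.Tensor) -> torch.Tensor:
    """Pack [B, C, H, W] latents into [B, (H//2)*(W//2), C*4] patch tokens."""
    b, c, h, w = latents.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must be even for 2x2 patching")
    # Split each spatial axis into (patch index, offset inside the patch).
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    # Bring the patch grid forward; keep channel and 2x2 offsets together.
    x = x.permute(0, 2, 4, 1, 3, 5)
    # Flatten the patch grid into a sequence and the 2x2 patch into channels.
    return x.reshape(b, (h // 2) * (w // 2), c * 4)


latents = torch.arange(2 * 16 * 8 * 8, dtype=torch.float32).reshape(2, 16, 8, 8)
packed = pack_latents(latents)
```

With this ordering, the first four features of each token are the 2x2 patch of channel 0, the next four are channel 1, and so on.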

```python
nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter._prepare_latent_image_ids(
    batch_size: int,
    height: int,
    width: int,
    device: torch.device,
    dtype: torch.dtype
) -> torch.Tensor
```

Prepare positional IDs for image latents.

Returns tensor of shape \[B, (H//2)\*(W//2), 3] containing (batch\_idx, y, x).
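A minimal sketch of how such an ID tensor could be built, following the (batch\_idx, y, x) layout stated above. The function name is hypothetical, and whether the adapter fills the first channel exactly this way is an assumption drawn only from the description.

```python
import torch


def prepare_latent_image_ids(batch_size, height, width, device=None, dtype=torch.float32):
    """Return IDs of shape [B, (H//2)*(W//2), 3] holding (batch_idx, y, x)."""
    h, w = height // 2, width // 2  # size of the 2x2 patch grid
    ids = torch.zeros(batch_size, h, w, 3, device=device, dtype=dtype)
    # Channel 0: batch index (assumed from the documented layout).
    ids[..., 0] = torch.arange(batch_size, device=device, dtype=dtype)[:, None, None]
    # Channel 1: patch row (y).
    ids[..., 1] = torch.arange(h, device=device, dtype=dtype)[None, :, None]
    # Channel 2: patch column (x).
    ids[..., 2] = torch.arange(w, device=device, dtype=dtype)[None, None, :]
    # Flatten the patch grid in row-major order to match the packed sequence.
    return ids.reshape(batch_size, h * w, 3)


ids = prepare_latent_image_ids(batch_size=2, height=8, width=6)
```

Here `height` and `width` are latent dimensions, so an 8x6 latent yields a 4x3 patch grid and 12 IDs per sample.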

```python
nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter._unpack_latents(
    latents: torch.Tensor,
    height: int,
    width: int,
    vae_scale_factor: int = 8
) -> torch.Tensor
```

staticmethod

Unpack latents from Flux format back to \[B, C, H, W].

**Parameters:**

* latents: Packed latents of shape \[B, num\_patches, channels]
* height: Original image height in pixels
* width: Original image width in pixels
* vae\_scale\_factor: VAE compression factor (default: 8)
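The unpacking step can be sketched as the inverse of the 2x2 packing. This is an illustrative version under stated assumptions: the function name is hypothetical, and the latent size is derived from the pixel size by dividing by `vae_scale_factor` and rounding down to an even number, which may not match the adapter exactly.

```python
import torch


def unpack_latents(latents, height, width, vae_scale_factor=8):
    """Unpack [B, num_patches, channels] tokens back to [B, C, H, W] latents."""
    b, num_patches, channels = latents.shape
    # Latent size in each dimension, rounded down to an even number (assumption).
    h = 2 * (height // (vae_scale_factor * 2))
    w = 2 * (width // (vae_scale_factor * 2))
    if num_patches != (h // 2) * (w // 2):
        raise ValueError("num_patches does not match the given height and width")
    # Restore the patch grid and the (channel, 2x2 offset) split.
    x = latents.reshape(b, h // 2, w // 2, channels // 4, 2, 2)
    # Interleave patch indices with in-patch offsets along each spatial axis.
    x = x.permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, channels // 4, h, w)


# A 256x256 image maps to a 32x32 latent, i.e. 16x16 = 256 patches of 64 features.
packed = torch.randn(2, 16 * 16, 64)
unpacked = unpack_latents(packed, height=256, width=256)
```

Packing the result again with the same 2x2 scheme should reproduce the original packed tensor.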

```python
nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter.forward(
    model: torch.nn.Module,
    inputs: typing.Dict[str, typing.Any]
) -> torch.Tensor
```

Execute forward pass for Flux model.

Returns unpacked prediction in \[B, C, H, W] format.

```python
nemo_automodel.components.flow_matching.adapters.flux.FluxAdapter.prepare_inputs(
    context: nemo_automodel.components.flow_matching.adapters.base.FlowMatchingContext
) -> typing.Dict[str, typing.Any]
```

Prepare inputs for Flux model from FlowMatchingContext.

Expects 4D image latents: \[B, C, H, W]