nemo_automodel.components.models.bagel.autoencoder#
Autoencoder used by BAGEL Stage 2 image generation training.
Module Contents#
Classes#
| AutoEncoderParams | Architecture parameters for the BAGEL/FLUX autoencoder. |
| AttnBlock | Single-head spatial attention block used in the VAE bottleneck. |
| ResnetBlock | Residual convolution block used by the autoencoder. |
| Downsample | Stride-2 downsample with explicit asymmetric padding. |
| Upsample | Nearest-neighbor upsample followed by a 3x3 convolution. |
| Encoder | BAGEL/FLUX autoencoder encoder. |
| Decoder | BAGEL/FLUX autoencoder decoder. |
| DiagonalGaussian | Convert latent moments to a Gaussian sample or mean. |
| AutoEncoder | BAGEL Stage 2 autoencoder wrapper. |
Functions#
| swish | Swish activation. |
| default_autoencoder_params | Return the BAGEL-7B-MoT autoencoder architecture parameters. |
| load_bagel_autoencoder | Load the BAGEL autoencoder from ae.safetensors. |
Data#
| logger | |
API#
- nemo_automodel.components.models.bagel.autoencoder.logger#
‘getLogger(…)’
- class nemo_automodel.components.models.bagel.autoencoder.AutoEncoderParams#
Architecture parameters for the BAGEL/FLUX autoencoder.
- resolution: int#
None
- in_channels: int#
None
- downsample: int#
None
- ch: int#
None
- out_ch: int#
None
- ch_mult: list[int]#
None
- num_res_blocks: int#
None
- z_channels: int#
None
- scale_factor: float#
None
- shift_factor: float#
None
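The fields above can be read together as a description of the latent grid. The following is a minimal sketch, not the module's own code: `ParamsSketch`, `latent_shape`, and `implied_downsample` are hypothetical names, the field values are illustrative and are not read from `default_autoencoder_params()`, and the relation between `ch_mult` and `downsample` assumes the FLUX-style layout in which every resolution level except the last halves the spatial size.

```python
from dataclasses import dataclass


@dataclass
class ParamsSketch:
    """Hypothetical stand-in that mirrors the documented AutoEncoderParams fields."""

    resolution: int
    in_channels: int
    downsample: int
    ch: int
    out_ch: int
    ch_mult: list[int]
    num_res_blocks: int
    z_channels: int
    scale_factor: float
    shift_factor: float


def latent_shape(p: ParamsSketch) -> tuple[int, int, int]:
    """Latent (channels, height, width) for a square input of side p.resolution."""
    side = p.resolution // p.downsample
    return (p.z_channels, side, side)


def implied_downsample(p: ParamsSketch) -> int:
    """Assumption: one stride-2 stage per level of ch_mult except the last."""
    return 2 ** (len(p.ch_mult) - 1)


# Illustrative values only (assumed, not taken from the library).
example = ParamsSketch(
    resolution=256, in_channels=3, downsample=8, ch=128, out_ch=3,
    ch_mult=[1, 2, 4, 4], num_res_blocks=2, z_channels=16,
    scale_factor=1.0, shift_factor=0.0,
)
shape = latent_shape(example)
```

With these illustrative values, a 256x256 input maps to a latent with 16 channels on a 32x32 grid.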
- nemo_automodel.components.models.bagel.autoencoder.swish(x: torch.Tensor) torch.Tensor#
Swish activation.
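Swish is conventionally defined as x * sigmoid(x). The scalar sketch below illustrates that formula with the standard library only; the documented function operates on `torch.Tensor` inputs, and the scalar signature here is an assumption made to keep the example dependency-free.

```python
import math


def swish(x: float) -> float:
    """Scalar Swish (also called SiLU): x * sigmoid(x), written as x / (1 + e^-x)."""
    return x / (1.0 + math.exp(-x))


# Negative inputs are damped toward zero, positive inputs pass mostly unchanged.
activated = [swish(v) for v in (-2.0, 0.0, 2.0)]
```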
- class nemo_automodel.components.models.bagel.autoencoder.AttnBlock(in_channels: int)#
Bases: torch.nn.Module
Single-head spatial attention block used in the VAE bottleneck.
Initialization
- attention(h_: torch.Tensor) torch.Tensor#
Apply scaled dot-product attention over flattened image positions.
- forward(x: torch.Tensor) torch.Tensor#
Apply residual attention.
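To make "attention over flattened image positions" concrete, here is a dependency-free sketch that flattens a (C, H, W) feature map into H*W position vectors, applies single-head scaled dot-product attention, and adds the result back to the input. It is an illustration under assumptions: `flatten_positions` and `attention` are hypothetical helpers, and the query, key, and value projections are taken to be the identity, whereas a real block would typically normalize the input and apply learned projections first.

```python
import math


def flatten_positions(feature_map):
    """(C, H, W) nested lists -> list of H*W vectors, each of length C."""
    c, h, w = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[ch][i][j] for ch in range(c)]
            for i in range(h) for j in range(w)]


def attention(q, k, v):
    """Single-head scaled dot-product attention over lists of position vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [wt / total for wt in weights]
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# C=2, H=2, W=2 toy feature map; identity projections are an assumption.
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.5, 0.5], [0.5, 0.5]]]
tokens = flatten_positions(feature_map)
attended = attention(tokens, tokens, tokens)
# Residual connection: the attention output is added to the input.
residual = [[x + a for x, a in zip(tok, row)] for tok, row in zip(tokens, attended)]
```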
- class nemo_automodel.components.models.bagel.autoencoder.ResnetBlock(in_channels: int, out_channels: int)#
Bases: torch.nn.Module
Residual convolution block used by the autoencoder.
Initialization
- forward(x: torch.Tensor) torch.Tensor#
Run the residual block.
- class nemo_automodel.components.models.bagel.autoencoder.Downsample(in_channels: int)#
Bases: torch.nn.Module
Stride-2 downsample with explicit asymmetric padding.
Initialization
- forward(x: torch.Tensor) torch.Tensor#
Downsample spatial dimensions by 2.
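The effect of the asymmetric padding on the output size can be checked with the usual convolution size formula. This sketch assumes a 3x3 kernel with one extra row and column padded only on the bottom and right, as in the FLUX reference autoencoder; the kernel size and padding amounts are assumptions, since this page only states that the stride is 2 and the padding is asymmetric. `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2,
                  pad_before: int = 0, pad_after: int = 1) -> int:
    """Output length of a strided convolution along one spatial axis."""
    return (size + pad_before + pad_after - kernel) // stride + 1


# Padding one element on the trailing edge halves even sizes exactly.
halved = conv_out_size(256)
# Without any padding the same convolution would lose one output position.
unpadded = conv_out_size(256, pad_after=0)
```

Under these assumptions a 256-pixel axis becomes 128, whereas the unpadded convolution would produce 127.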
- class nemo_automodel.components.models.bagel.autoencoder.Upsample(in_channels: int)#
Bases: torch.nn.Module
Nearest-neighbor upsample followed by a 3x3 convolution.
Initialization
- forward(x: torch.Tensor) torch.Tensor#
Upsample spatial dimensions by 2.
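Nearest-neighbor upsampling by a factor of 2 repeats every value along both spatial axes. The sketch below shows that step on a single channel stored as nested lists; `upsample_nearest_2x` is a hypothetical helper, and the 3x3 convolution that follows the upsample in the documented block is omitted.

```python
def upsample_nearest_2x(plane):
    """Repeat each value twice along the width and each row twice along the height."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # separate copy so rows do not alias each other
    return out


upsampled = upsample_nearest_2x([[1, 2],
                                 [3, 4]])
```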
- class nemo_automodel.components.models.bagel.autoencoder.Encoder(resolution: int, in_channels: int, ch: int, ch_mult: list[int], num_res_blocks: int, z_channels: int)#
Bases: torch.nn.Module
BAGEL/FLUX autoencoder encoder.
Initialization
- forward(x: torch.Tensor) torch.Tensor#
Encode an image tensor to Gaussian latent moments.
- class nemo_automodel.components.models.bagel.autoencoder.Decoder(ch: int, out_ch: int, ch_mult: list[int], num_res_blocks: int, in_channels: int, resolution: int, z_channels: int)#
Bases: torch.nn.Module
BAGEL/FLUX autoencoder decoder.
Initialization
- forward(z: torch.Tensor) torch.Tensor#
Decode latents to image tensors.
- class nemo_automodel.components.models.bagel.autoencoder.DiagonalGaussian(sample: bool = True, chunk_dim: int = 1)#
Bases: torch.nn.Module
Convert latent moments to a Gaussian sample or mean.
Initialization
- forward(z: torch.Tensor) torch.Tensor#
Sample from or return the mean of a diagonal Gaussian.
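The conversion from moments to a latent can be sketched as follows: the moments are split into a mean half and a log-variance half, and the output is either the mean or a reparameterized draw mean + exp(0.5 * logvar) * noise. This is a sketch under assumptions: it works on a flat list rather than a tensor split along `chunk_dim`, `diagonal_gaussian` and its `rng` argument are hypothetical, and the mean-then-log-variance ordering follows the common VAE convention rather than anything stated on this page.

```python
import math
import random


def diagonal_gaussian(moments, sample=True, rng=None):
    """Split moments into (mean, logvar) halves and return a sample or the mean."""
    half = len(moments) // 2
    mean, logvar = moments[:half], moments[half:]
    if not sample:
        return list(mean)
    if rng is None:
        rng = random.Random()
    # Reparameterization: std = exp(0.5 * logvar), noise ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)]


moments = [1.0, 2.0, 0.0, 0.0]  # two means followed by two log-variances
mean_only = diagonal_gaussian(moments, sample=False)
draw = diagonal_gaussian(moments, sample=True, rng=random.Random(0))
```

Passing `sample=False` gives a deterministic latent, which is the usual choice when reproducible encodings are needed.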
- class nemo_automodel.components.models.bagel.autoencoder.AutoEncoder( )#
Bases: torch.nn.Module
BAGEL Stage 2 autoencoder wrapper.
Initialization
- encode(x: torch.Tensor) torch.Tensor#
Encode image tensors to scaled latents.
- decode(z: torch.Tensor) torch.Tensor#
Decode scaled latents to image tensors.
- forward(x: torch.Tensor) torch.Tensor#
Encode and decode image tensors.
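"Scaled latents" can be illustrated with the affine convention used by the FLUX reference autoencoder, where encoding applies scale_factor * (z - shift_factor) and decoding inverts it before running the decoder. Whether this wrapper uses exactly that formula is an assumption; `to_scaled` and `from_scaled` are hypothetical helpers and the numeric factors are placeholders, not the BAGEL values.

```python
def to_scaled(z, scale_factor, shift_factor):
    """Map raw latents into the scaled latent space (assumed FLUX convention)."""
    return [scale_factor * (value - shift_factor) for value in z]


def from_scaled(z, scale_factor, shift_factor):
    """Invert to_scaled, recovering raw latents for the decoder."""
    return [value / scale_factor + shift_factor for value in z]


raw = [0.5, -1.0, 2.0]
# Placeholder factors chosen for readability only.
scaled = to_scaled(raw, scale_factor=0.5, shift_factor=0.25)
restored = from_scaled(scaled, scale_factor=0.5, shift_factor=0.25)
```

The two maps are exact inverses, so a round trip through the scaled space leaves the raw latents unchanged up to floating-point error.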
- nemo_automodel.components.models.bagel.autoencoder._log_load_warning(missing: list[str], unexpected: list[str]) None#
- nemo_automodel.components.models.bagel.autoencoder.default_autoencoder_params() nemo_automodel.components.models.bagel.autoencoder.AutoEncoderParams#
Return the BAGEL-7B-MoT autoencoder architecture parameters.
- nemo_automodel.components.models.bagel.autoencoder.load_bagel_autoencoder(local_path: str | None)#
Load the BAGEL autoencoder from ae.safetensors.
- Parameters:
local_path – Local path to ae.safetensors. If None, the module is returned with randomly initialized weights.
- Returns:
The autoencoder module and its architecture parameters.