Mixture of Weather Experts (MoWE) for Weather Forecasting#

This repository contains the implementation of a Mixture of Weather Experts (MoWE) model that combines forecasts from multiple state-of-the-art weather models to produce a more accurate and robust prediction.


🎯 Problem Overview#

Individual weather forecasting models such as Aurora, Pangu-Weather, and FCN3 each have unique strengths and weaknesses.
The MoWE framework leverages this diversity by training a gating network: a neural network that learns to assign weights to each expert's forecast. The gating network takes the forecasts from all experts as input and predicts a set of weights (\(W_i\)) and a bias (\(b\)). The final, synthesized forecast (\(\hat{Y}\)) is then calculated as a weighted sum:
\[\hat{Y} = \sum_{i=1}^{N} (W_i \cdot E_i) + b\]

Where \(E_i\) represents the forecast from the \(i\)-th expert model. This approach allows the MoWE to dynamically emphasize the most reliable expert for a given forecast lead time and atmospheric state, leading to enhanced accuracy and robustness.
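The weighted sum above can be sketched in a few lines of NumPy. This is only an illustration, not the repository's code: the function name `combine_experts` and the choice of per-grid-point weights with shape `(N, C, H, W)` are assumptions made for the example.

```python
import numpy as np


def combine_experts(experts, weights, bias):
    """Return sum_i W_i * E_i + b.

    experts, weights: arrays of shape (N, C, H, W), one slice per expert
                      (assumed layout, not taken from the repository).
    bias:             array of shape (C, H, W).
    """
    experts = np.asarray(experts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Element-wise weighting, then a sum over the expert axis.
    return (weights * experts).sum(axis=0) + bias


# Toy example: 3 experts, 1 variable, 2x2 grid.
rng = np.random.default_rng(0)
experts = rng.normal(size=(3, 1, 2, 2))
weights = np.full((3, 1, 2, 2), 1.0 / 3.0)  # equal weighting
bias = np.zeros((1, 2, 2))
y_hat = combine_experts(experts, weights, bias)
```

With equal weights and zero bias, the result reduces to the plain average of the expert forecasts, which is a useful sanity check for any learned gating.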


🧠 Model Overview#

We provide two variants of the MoWE model:

  • Deterministic: produces a single best estimate.

  • Probabilistic: models uncertainty via ensemble predictions.

Core Components#

  • Backbone: A Diffusion Transformer (DiT) treats the stacked expert forecasts as a multi-channel "image". We use the DiT's conditioning mechanism to inject a lead_time embedding and, for the probabilistic model, a noise vector embedding.

⚠️ Warning - the DiT architecture is experimental and subject to future changes.

  • MoWE wrapper: Concatenates the output channels of the experts and passes the result to the DiT model. The output is then reshaped according to whether the expert weights (probabilities) or the final prediction is requested.
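As a rough illustration of the channel concatenation performed by the wrapper, the sketch below stacks three expert fields along the channel axis. It assumes all experts share the same grid and variable set and uses a `(batch, channels, height, width)` layout; the helper name `stack_expert_channels` is hypothetical and not taken from the repository.

```python
import numpy as np


def stack_expert_channels(expert_fields):
    """Concatenate expert forecasts along the channel axis.

    expert_fields: list of N arrays, each of shape (B, C, H, W).
    Returns an array of shape (B, N * C, H, W).
    """
    shapes = {field.shape for field in expert_fields}
    if len(shapes) != 1:
        raise ValueError(f"experts must share one shape, got {sorted(shapes)}")
    return np.concatenate(expert_fields, axis=1)


# Placeholder fields standing in for Aurora, Pangu-Weather, and FCN3 outputs.
batch, channels, height, width = 2, 4, 8, 16
aurora = np.zeros((batch, channels, height, width))
pangu = np.ones((batch, channels, height, width))
fcn3 = np.full((batch, channels, height, width), 2.0)

stacked = stack_expert_channels([aurora, pangu, fcn3])
```

Here `stacked` has 12 channels, with channels 0 to 3 coming from the first expert, 4 to 7 from the second, and 8 to 11 from the third.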

Inputs to the MoWE#

  • Expert Forecasts: Full output fields from Aurora, Pangu-Weather, and FCN3.

  • Lead Time: Randomly sampled during training between 6 hours and 2 days.

  • Noise Vector: Used only by the probabilistic MoWE; None otherwise.
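Random lead-time sampling of the kind described above could look like the following sketch. The 6-hour step between admissible lead times is an assumption (the text only gives the range of 6 hours to 2 days), and `sample_lead_time` is a hypothetical helper, not a function from the repository.

```python
import random

MIN_LEAD_HOURS = 6
MAX_LEAD_HOURS = 48  # 2 days
STEP_HOURS = 6       # assumed spacing between lead times; not stated in the text


def sample_lead_time(rng):
    """Draw a lead time in hours, uniformly over the assumed 6-hourly steps."""
    n_steps = (MAX_LEAD_HOURS - MIN_LEAD_HOURS) // STEP_HOURS
    # randint is inclusive on both ends, so 48 h can be drawn.
    return MIN_LEAD_HOURS + STEP_HOURS * rng.randint(0, n_steps)


rng = random.Random(0)
samples = [sample_lead_time(rng) for _ in range(1000)]
```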

Outputs of MoWE#

  • Weights (\(W_i\)) and bias (\(b\)) of the MoWE, if return_probabilities = True and bias = True.

  • The final forecast (\(\hat{Y}\)) directly, without the linear combination of experts, if return_probabilities = False.
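To make the first output mode more concrete, the sketch below splits a raw backbone output into per-expert weights and a bias. Several things here are assumptions for illustration only: the channel layout `(B, (N + 1) * C, H, W)`, the softmax normalization over the expert axis, and the helper name `split_gating_output`. The repository's wrapper may organize and normalize its output differently.

```python
import numpy as np


def split_gating_output(raw, n_experts, n_channels, bias=True):
    """Split a raw output into expert weights and an optional bias.

    raw: array of shape (B, (n_experts + bias) * n_channels, H, W) (assumed).
    Returns (weights, offset) with weights of shape (B, N, C, H, W) and
    offset of shape (B, C, H, W), or None when bias is False.
    """
    b, total, h, w = raw.shape
    groups = n_experts + int(bias)
    if total != groups * n_channels:
        raise ValueError(f"expected {groups * n_channels} channels, got {total}")
    grouped = raw.reshape(b, groups, n_channels, h, w)

    # Softmax over the expert axis so the weights sum to 1 (an assumption).
    logits = grouped[:, :n_experts]
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    offset = grouped[:, n_experts] if bias else None
    return weights, offset


# 3 experts, 2 variables, 3x3 grid: (3 + 1) * 2 = 8 raw channels.
raw = np.zeros((1, 8, 3, 3))
weights, offset = split_gating_output(raw, n_experts=3, n_channels=2)
```

In the second output mode (return_probabilities = False) no such split would be needed, since the backbone output is taken as the forecast itself.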


📊 Dataset#

The ERA5 dataset is used for training. We do not provide the ERA5 dataset; however, it can be downloaded with the dataset creation script by changing the DataSource to one of the built-in earth2studio ERA5 sources.

⚠️ Warning - This example requires multiple TBs of disk space.

Training is performed on pre-generated expert forecasts. Earth2Studio is used to generate the forecast dataset for each expert model. To generate the Pangu forecast dataset, run the following from the data folder:

torchrun --standalone --nproc_per_node=8 create_MoWE_dataset.py

This generates a dataset in .zarr format. Convert it to HDF5 format, matching the layout of the ERA5 data, with:

python restructure_dataset.py

To generate data for a different expert model, update the config.yaml file in the data folder. The training dataloader expects the datasets to be structured as follows:

Aurora/               # Aurora forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...

FCN3/                 # FCN3 forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...

Pangu/                # Pangu forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...

ERA5/                 # ERA5 (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...
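Before starting a long training run, it can help to confirm that the folders match this layout. The following is a small sketch using only the standard library; the helper name `missing_year_dirs` is hypothetical, and the check looks only at the year directories, not at the files inside them.

```python
from pathlib import Path

# Top-level folders from the layout above.
SOURCES = ("Aurora", "FCN3", "Pangu", "ERA5")


def missing_year_dirs(root, years, split="train"):
    """Return the relative paths of year directories absent under root."""
    root = Path(root)
    return [
        Path(source) / split / str(year)
        for source in SOURCES
        for year in years
        if not (root / source / split / str(year)).is_dir()
    ]
```

For the layout above, calling `missing_year_dirs(data_root, range(1980, 2016))` should return an empty list, where `data_root` is wherever the four folders live.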

🚀 Training#

Update the data path in the conf/ config files. We provide scripts for both deterministic and probabilistic training. To train the deterministic model on 8 GPUs on a single node:

torchrun --standalone --nproc_per_node=8 train.py --config-name=config_base.yaml

For training the probabilistic model using CRPS loss:

torchrun --standalone --nproc_per_node=8 train_crps.py --config-name=config_base_crps.yaml
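For readers unfamiliar with CRPS (continuous ranked probability score), the sketch below computes the standard ensemble estimator for a single scalar target: the mean absolute error of the members minus half the mean absolute difference between member pairs. It is a plain-Python illustration, not the loss implemented in train_crps.py, which may use a different estimator or a batched tensor formulation.

```python
def ensemble_crps(members, observation):
    """Ensemble CRPS estimate for one scalar observation.

    CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    """
    m = len(members)
    if m == 0:
        raise ValueError("at least one ensemble member is required")
    # Accuracy term: how far the members are from the observation.
    skill = sum(abs(x - observation) for x in members) / m
    # Spread term: how far the members are from each other.
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return skill - 0.5 * spread


score = ensemble_crps([0.0, 2.0], 1.0)
```

With a single member the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE. For gridded forecasts, the score would typically be averaged over grid points and variables.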

References#

Hersbach, H., et al. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society.

Bodnar, C., Bruinsma, W. P., Lucic, A., Stanley, M., Allen, A., Brandstetter, J., … & Perdikaris, P. (2025). A foundation model for the Earth system. Nature, 1-8.

Bonev, B., Kurth, T., Mahesh, A., Bisson, M., Kossaifi, J., Kashinath, K., … & Keller, A. (2025). FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv preprint arXiv:2507.12144.

Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., & Tian, Q. (2022). Pangu-weather: A 3d high-resolution model for fast and accurate global weather forecast. arXiv preprint arXiv:2211.02556.