# Mixture of Weather Experts (MoWE) for Weather Forecasting
This repository contains the implementation of a Mixture of Weather Experts (MoWE) model that intelligently combines forecasts from multiple state-of-the-art weather models to produce a more accurate and robust prediction.
## 🎯 Problem Overview
The MoWE produces its forecast as a weighted combination of the expert forecasts,

\[
\hat{Y} = \sum_i W_i E_i + b,
\]

where \(E_i\) represents the forecast from the \(i\)-th expert model, \(W_i\) are the learned expert weights, and \(b\) is a learned bias. This approach allows the MoWE to dynamically emphasize the most reliable expert for a given forecast lead time and atmospheric state, leading to enhanced accuracy and robustness.
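The weighted combination of experts can be sketched in a few lines. Below is a minimal NumPy illustration with toy shapes; the function name and the per-grid-point weighting are assumptions for illustration, not the repository's code:

```python
import numpy as np

def combine_experts(experts, weights, bias):
    """Linearly combine expert forecasts.

    experts: (n_experts, C, H, W) -- stacked expert forecast fields
    weights: (n_experts, C, H, W) -- per-grid-point expert weights W_i
    bias:    (C, H, W)            -- learned bias field b
    """
    # Weighted sum over the expert axis plus a bias correction
    return (weights * experts).sum(axis=0) + bias

# Toy example: 3 experts, 2 channels, a 4x4 grid
rng = np.random.default_rng(0)
experts = rng.normal(size=(3, 2, 4, 4))
weights = np.full((3, 2, 4, 4), 1.0 / 3.0)  # uniform weighting
bias = np.zeros((2, 4, 4))
pred = combine_experts(experts, weights, bias)
```

With uniform weights and zero bias, the combination reduces to the ensemble mean of the experts; in practice the weights and bias are produced by the trained model.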
## 🧠 Model Overview
We provide two variants of the MoWE model:
- Deterministic – produces a single best estimate.
- Probabilistic – models uncertainty via ensemble predictions.
### Core Components
- Backbone: A Diffusion Transformer (DiT) treats the stacked expert forecasts as a multi-channel "image". We leverage the DiT's conditioning mechanism to inject a lead_time embedding and, for the probabilistic model, a noise-vector embedding.
⚠️ Warning - the DiT architecture is experimental and subject to future changes.
- MoWE wrapper: Concatenates the channels of the expert outputs and passes them to the DiT model. The output is then reshaped depending on whether the probabilities (weights) of the expert models or the final prediction is requested.
### Inputs to the MoWE
- Expert Forecasts: Full output fields from Aurora, Pangu-Weather, and FCN3.
- Lead Time: Randomly sampled during training between 6 hours and 2 days.
- Noise Vector: Used only for the probabilistic MoWE; otherwise None.
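The random lead-time sampling during training can be sketched as follows. This is a minimal illustration, assuming 6-hour forecast steps between 6 h and 48 h; the helper name and step size are assumptions, not the repository's code:

```python
import random

def sample_lead_time_hours(min_h=6, max_h=48, step_h=6):
    """Sample a lead time uniformly from {6, 12, ..., 48} hours."""
    n_steps = (max_h - min_h) // step_h + 1
    return min_h + step_h * random.randrange(n_steps)

# One training sample gets one randomly drawn lead time
lead = sample_lead_time_hours()
```

The sampled value would then be embedded and fed to the DiT as a conditioning signal.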
### Outputs of the MoWE
- Weights (\(W_i\)) and bias (\(b\)) of the MoWE if `return_probabilities=True` and `bias=True`.
- Direct final forecasts (\(\hat{Y}\)), without the linear combination of experts, if `return_probabilities=False`.
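The input/output contract above can be sketched with a toy stand-in for the wrapper. This is a pure-NumPy illustration with a dummy backbone in place of the DiT; all names, shapes, and the softmax normalization of the weights are assumptions, not the repository's implementation:

```python
import numpy as np

class MoWEWrapper:
    """Toy MoWE wrapper sketch: concatenates expert channels, runs a
    backbone, and returns either per-expert weights (and a bias field)
    or a direct forecast, mirroring the `return_probabilities` flag."""

    def __init__(self, n_experts, n_channels, backbone):
        self.n_experts = n_experts
        self.n_channels = n_channels
        self.backbone = backbone  # maps (n_experts*C, H, W) -> (out_ch, H, W)

    def forward(self, experts, return_probabilities=True, bias=True):
        # experts: (n_experts, C, H, W) -- stack experts along the channel axis
        x = experts.reshape(-1, *experts.shape[2:])
        out = self.backbone(x)
        if not return_probabilities:
            # direct final forecast, no explicit linear combination
            return out[: self.n_channels]
        # split the backbone output into per-expert weights W_i ...
        w = out[: self.n_experts * self.n_channels].reshape(experts.shape)
        w = np.exp(w) / np.exp(w).sum(axis=0, keepdims=True)  # softmax over experts
        # ... and an optional bias field b
        b = out[self.n_experts * self.n_channels :] if bias else None
        return w, b

# Dummy backbone: zero-pads its input to the requested channel count
def dummy_backbone(x, out_ch):
    out = np.zeros((out_ch, *x.shape[1:]))
    out[: x.shape[0]] = x
    return out

mowe = MoWEWrapper(3, 2, lambda x: dummy_backbone(x, out_ch=3 * 2 + 2))
experts = np.ones((3, 2, 4, 4))
w, b = mowe.forward(experts, return_probabilities=True, bias=True)
yhat = (w * experts).sum(axis=0) + b  # recover the forecast from weights and bias
```

With `return_probabilities=True` the caller performs the linear combination; with `return_probabilities=False` the backbone output is interpreted directly as the forecast.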
## 📊 Dataset
The ERA5 dataset is used for training. We do not provide the ERA5 dataset; however, ERA5 data can be downloaded using the dataset creation script by changing the `DataSource` to one of the built-in earth2studio ERA5 sources.
⚠️ Warning - This example requires multiple TBs of disk space.
Training is performed on pre-generated expert forecasts. Earth2Studio is used to generate the forecast datasets of the individual models. For example, the Pangu forecast dataset can be generated by navigating to the `data` folder and running:
```bash
torchrun --standalone --nproc_per_node=8 create_MoWE_dataset.py
```
This generates a dataset in `.zarr` format, which is then converted to HDF5 format (matching the ERA5 layout) using:
```bash
python restructure_dataset.py
```
Update the `config.yaml` file in the `data` folder to generate data for different expert models. The datasets must be structured as follows for the training dataloader to work:
```
Aurora/              # Aurora forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...
FCN3/                # FCN3 forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...
Pangu/               # Pangu forecasts (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...
ERA5/                # ERA5 (1980–2015)
└── train/
    ├── 1980/
    ├── 1981/
    └── ...
```
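A dataloader built on this layout would typically pair the expert and ERA5 directories by year. The sketch below only constructs the expected paths; the root path, function name, and year range are assumptions for illustration:

```python
from pathlib import Path

def training_paths(root, experts=("Aurora", "FCN3", "Pangu"),
                   years=range(1980, 2016)):
    """Yield (year, {name: directory}) pairs for each training year,
    covering the three expert datasets plus the ERA5 targets."""
    root = Path(root)
    for year in years:
        dirs = {name: root / name / "train" / str(year)
                for name in (*experts, "ERA5")}
        yield year, dirs

# Example: directories expected for the first training year
year, dirs = next(training_paths("/data"))
```

A real dataloader would additionally open the HDF5 files inside each year directory and align expert forecasts with the matching ERA5 targets.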
## 🚀 Training
Update the data paths in the `conf/` config files. We provide scripts for both deterministic and probabilistic training. To train a model using 8 GPUs on a single node:
```bash
torchrun --standalone --nproc_per_node=8 train.py --config-name=config_base.yaml
```
For training the probabilistic model with the CRPS loss:

```bash
torchrun --standalone --nproc_per_node=8 train_crps.py --config-name=config_base_crps.yaml
```
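For intuition, the CRPS of an ensemble forecast can be estimated with the standard energy-form estimator, \(\mathrm{CRPS} \approx \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|\). The NumPy sketch below illustrates this estimator and is not the repository's loss implementation:

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over ensemble members.

    ensemble: (n_members, ...) forecast samples
    obs:      (...)            observed field
    """
    # Skill term: mean absolute error of each member against the observation
    skill = np.abs(ensemble - obs).mean(axis=0)
    # Spread term: mean pairwise absolute difference between members
    spread = np.abs(ensemble[:, None] - ensemble[None, :]).mean(axis=(0, 1))
    return (skill - 0.5 * spread).mean()

# A degenerate ensemble equal to the observation scores a CRPS of zero
obs = np.zeros((4, 4))
ens = np.zeros((8, 4, 4))
```

Minimizing this quantity rewards ensembles that are both accurate (low skill term) and appropriately spread (the second term penalizes overconfident, collapsed ensembles).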
## References
- Hersbach, H., et al. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society.
- Bodnar, C., Bruinsma, W. P., Lucic, A., Stanley, M., Allen, A., Brandstetter, J., … & Perdikaris, P. (2025). A foundation model for the Earth system. Nature, 1-8.
- Bonev, B., Kurth, T., Mahesh, A., Bisson, M., Kossaifi, J., Kashinath, K., … & Keller, A. (2025). FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale. arXiv preprint arXiv:2507.12144.
- Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., & Tian, Q. (2022). Pangu-Weather: A 3D high-resolution model for fast and accurate global weather forecast. arXiv preprint arXiv:2211.02556.