Install required packages:

    pip install -r requirements.txt

Machine-specific configuration files can be created in the configs directory, defining the following variables:

    isd_path = "<path to isd data>"
    path_to_pretrained = "<path to the pretrained model>"
    path_to_model_state = "<path to model state from a training checkpoint>"
    path_to_hrrr = "<path to Zarr file containing 2017 HRRR data>"
    station_locations = "<path to station_locations_on_grid.nc generated by preprocess_isd.py>"
    path_to_isp = "<path to ISD csv data>"
    val_station_path = "<path to validation station locations generated by val_stations.py>"

See configs/base.py for an example configuration. Both station_locations and val_station_path are checked into the code for simplicity.

HRRR data is used for training and model evaluation. Data from 2018-2020 is used for training, and 2017 is used for model evaluation. The model is trained on a 128x128 section of Oklahoma, offset by (834, 353), using the following channels:

10 metre U wind component (10u)

10 metre V wind component (10v)

Total precipitation (tp)
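As a rough illustration of the crop described above, the sketch below builds the index slices for the 128x128 window. It assumes the offset (834, 353) is ordered as (x, y), which the text does not state, and the names `crop_slices`, `CROP_SIZE`, `X_OFFSET`, and `Y_OFFSET` are hypothetical and not taken from the repository.

```python
# Crop geometry from the text: a 128x128 window offset by (834, 353).
# ASSUMPTION: the offset is ordered (x, y); swap the two if your grid differs.
CROP_SIZE = 128
X_OFFSET, Y_OFFSET = 834, 353

# Channel short names listed above.
CHANNELS = ["10u", "10v", "tp"]


def crop_slices(x_offset=X_OFFSET, y_offset=Y_OFFSET, size=CROP_SIZE):
    """Return (y_slice, x_slice) selecting the training window."""
    return slice(y_offset, y_offset + size), slice(x_offset, x_offset + size)


y_slice, x_slice = crop_slices()
print(y_slice, x_slice)
```

For an array laid out as (..., y, x), indexing with `field[..., y_slice, x_slice]` would then pick out the window.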

The training and inference scripts require the data to be converted into Zarr format before use, with one file per year of data. Example dimensions for 2017.zarr are shown below:

    <xarray.Dataset> Size: 3GB
    Dimensions:   (time: 8760, channel: 6, y: 128, x: 128)
    Coordinates:
      * channel   (channel) object 48B '10u' '10v' 'gust' 'tp' 'sp' 'refc'
      * time      (time) datetime64[ns] 70kB 2017-01-01T01:00:00 ... 2018-01-01
    Dimensions without coordinates: y, x
    Data variables:
        HRRR      (time, channel, y, x) float32 3GB dask.array<chunksize=(1, 6, 128, 128), meta=np.ndarray>
        latitude  (y, x) float32 66kB dask.array<chunksize=(128, 128), meta=np.ndarray>
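The sizes in this summary can be cross-checked with a little arithmetic, which may help when validating a converted file of your own. The sketch below uses only the dimension sizes and dtype shown in the summary. The variable names are illustrative.

```python
# Dimension sizes copied from the 2017.zarr summary above.
dims = {"time": 8760, "channel": 6, "y": 128, "x": 128}
FLOAT32_BYTES = 4

# 365 days of hourly steps gives the 8760 time entries.
assert dims["time"] == 365 * 24

hrrr_bytes = dims["time"] * dims["channel"] * dims["y"] * dims["x"] * FLOAT32_BYTES
latitude_bytes = dims["y"] * dims["x"] * FLOAT32_BYTES

print(f"HRRR: {hrrr_bytes / 1e9:.2f} GB")          # HRRR: 3.44 GB
print(f"latitude: {latitude_bytes / 1e3:.1f} kB")  # latitude: 65.5 kB
```

These values agree with the rounded 3GB and 66kB figures in the summary.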

ISD data is used for inference. ISD data can be obtained from the NOAA Data Search and downloaded in CSV format.

To download multiple stations:

Click “+Select All > Proceed to Cart”.

Enter your email address.

Hit submit.

You will receive an email with a download link.

The data then needs to be gridded to the model grid and interpolated to a common time frequency. This is done using obs/preprocess_isd.py.
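To make the time-interpolation step concrete, here is a minimal sketch that linearly interpolates irregular station observations onto hourly timestamps. It is not the implementation in obs/preprocess_isd.py. The hourly target frequency, the choice of linear interpolation, and the function name `interpolate_hourly` are all assumptions made for illustration, and the sample values are made up. The gridding step is not covered.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def interpolate_hourly(times, values, start, n_hours):
    """Linearly interpolate sorted, irregular observations onto hourly steps.

    Target hours outside the observed time range are returned as None.
    """
    out = []
    for h in range(n_hours):
        t = start + timedelta(hours=h)
        if t < times[0] or t > times[-1]:
            out.append(None)  # no extrapolation beyond the observations
            continue
        i = bisect_left(times, t)  # first index with times[i] >= t
        if times[i] == t:
            out.append(values[i])
            continue
        t0, t1 = times[i - 1], times[i]
        w = (t - t0) / (t1 - t0)  # fractional position between neighbours
        out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


# Hypothetical observations from a single station (illustrative values only).
obs_times = [
    datetime(2017, 1, 1, 0, 20),
    datetime(2017, 1, 1, 1, 40),
    datetime(2017, 1, 1, 3, 0),
]
obs_values = [10.0, 14.0, 12.0]

hourly = interpolate_hourly(obs_times, obs_values, datetime(2017, 1, 1, 0, 0), 4)
print(hourly)  # [None, 12.0, 13.5, 12.0]
```

Returning None for hours outside the observed range keeps missing data explicit instead of extrapolating.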

Training scripts are in the training directory. Configuration is handled via the YAML files in the training/config directory. For example:

    cd training
    python3 train_diffusions.py \
        --outdir /expts \
        --tick 100 \
        --config_file ./config/hrrr.yaml \
        --config_name unconditional_diffusion_downscaling_a2s_v3_1_oklahoma \
        --log_to_wandb True \
        --run_id 0

With log_to_wandb=True, you’ll want to specify your wandb entity and project in train_diffusions.py.

For an example of running inference on both subsampled HRRR and ISD data, run the example_inference.py script. It requires the ISD and HRRR data, as well as the state file of the trained model.

To run inference with the model across an entire year, use the full_inference.py script. The output filename is specified within that script.

The code to reproduce the paper figures is in the paper_figures/ directory. To score the output of full_inference.py, use the score_inference.py script:

    cd paper_figures
    python score_inference.py figure_data/scores/<_my_regen_model_name>
    python score_inference.py -g truth figure_data/scores/hrrr