Pangu-Weather

Overview

Pangu-Weather is an AI model for global weather forecasting. The model is described and available at Pangu-Weather.

Dr. Nuo Chen implemented support for the Pangu-Weather model in DART, based on the CAM-FV DART interface.

The Pangu-Weather model was trained on 0.25-degree ERA5 reanalysis data, so its horizontal dimensions are fixed at (721, 1440) and its 13 vertical levels are fixed at 1000hPa, 925hPa, 850hPa, 700hPa, 600hPa, 500hPa, 400hPa, 300hPa, 250hPa, 200hPa, 150hPa, 100hPa and 50hPa, in exactly that order. All state variables are assumed to be on the mass point.
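The fixed grid described above can be written out as a short Python sketch. The latitude and longitude ordering (90 to -90 degrees, 0 to 359.75 degrees) is an assumption based on the usual ERA5 layout and is not stated in this document; the names used here are illustrative only.

```python
# Sketch of the fixed Pangu-Weather grid described in the text.
# Assumption: ERA5-style ordering (latitude 90 to -90, longitude 0 to 359.75).

RESOLUTION_DEG = 0.25

# The 13 pressure levels in hPa, in the order given in the text.
PRESSURE_LEVELS_HPA = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 50]

# 721 latitudes and 1440 longitudes at quarter-degree spacing.
latitudes = [90.0 - i * RESOLUTION_DEG for i in range(721)]
longitudes = [j * RESOLUTION_DEG for j in range(1440)]


def upper_air_shape():
    """Return the (level, latitude, longitude) shape of one upper-air variable."""
    return (len(PRESSURE_LEVELS_HPA), len(latitudes), len(longitudes))


print(upper_air_shape())  # (13, 721, 1440)
```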

The Pangu DART Interface

How to perform the assimilation?

Files required:

  • Initial ensemble analysis or reanalysis files in your data storage directory

  • Conversion from initial ensemble files to npy files

  • DART obs_seq format observation files

  • landmask.npy quarter-degree landmask created by WPS geogrid.exe

  • terrain.npy quarter-degree terrain file created by WPS geogrid.exe

  • pangu_weather_6.onnx Pangu-Weather model

  • In your working directory:

    • inference_cpu.py or inference_gpu.py, depending on whether you have access to a GPU to run the Pangu-Weather model; modify model_path=$dir_to_pangu_weather_model/pangu_weather_6.onnx

    • input.nml

    • sampling_error_correction_table.nc

    • convert_pgout_to_nc.py; modify landmask = $dir_to_landmask/landmask.npy, terrainmask = $dir_to_terrain/terrain.npy, and work_dir = $dir_to_working_directory/

    • convert_dartout_to_npy.py

    • assimilate.sh

    • run_filter.csh
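Before starting a cycle it can help to confirm that the working directory holds the files listed above. The following is a minimal sketch, not part of the Pangu DART interface; the function name missing_files and the use_gpu flag are hypothetical.

```python
from pathlib import Path

# Working-directory files named in the list above (inference script added below).
WORKDIR_FILES = [
    "input.nml",
    "sampling_error_correction_table.nc",
    "convert_pgout_to_nc.py",
    "convert_dartout_to_npy.py",
    "assimilate.sh",
    "run_filter.csh",
]


def missing_files(work_dir, use_gpu=False):
    """Return the names of required files that are absent from work_dir."""
    inference = "inference_gpu.py" if use_gpu else "inference_cpu.py"
    required = [inference] + WORKDIR_FILES
    return [name for name in required if not (Path(work_dir) / name).is_file()]


if __name__ == "__main__":
    # Report anything missing from the current directory.
    for name in missing_files("."):
        print(f"missing: {name}")
```

The check covers only the working-directory files; the data files (landmask.npy, terrain.npy, pangu_weather_6.onnx) live wherever the paths in the scripts point.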

Steps:

  1. Download pangu_weather_6.onnx from Pangu-Weather.

  2. Create a virtual environment with pangu-dart-cpu.yml or pangu-dart-gpu.yml

  3. Prepare the initial ensemble analysis or reanalysis files and convert them using convert_initial_conditions.py

  4. In $DART/build_template/ choose a mkmf.template.xx that suits your system, then cp mkmf.template.xx mkmf.template

  5. In $DART/models/pangu/work/, run ./quickbuild.sh. It compiles the necessary DART executables based on the mkmf.template that you set. Of the executables generated, ./filter performs the actual assimilation. After compilation, move ./filter to the working directory.

  6. Modify the file locations in inference_cpu.py (or inference_gpu.py) and convert_pgout_to_nc.py as described above

  7. Modify the user defined section in assimilate.sh

  8. Modify the HPC settings in run_filter.csh and assimilate.sh

  9. Modify input.nml. See Namelists for more details.

  10. Run qsub ./assimilate.sh or ./assimilate.sh to perform the assimilation cycle

User Defined variables

| Item | Type | Description |
| --- | --- | --- |
| num_instances | integer | Ensemble size |
| old_date | string | Cycle start date, “yyyy-mm-dd-HH” |
| output_dir | string | Directory to store the output files (e.g. output_mean, output_sd, obs_seq.final and ensemble output files) |
| obs_dir | string | Directory where the observations are located. Observations should be in DART obs_seq format, see DART Observations. |
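As an illustration of the old_date format, the sketch below parses a “yyyy-mm-dd-HH” string and advances it by one assimilation period. The default of 0 days and 21600 seconds matches the values in model_nml; the function name next_cycle_date is hypothetical, and how assimilate.sh itself advances the date is not described here.

```python
from datetime import datetime, timedelta

# strptime pattern matching the "yyyy-mm-dd-HH" format of old_date.
DATE_FORMAT = "%Y-%m-%d-%H"


def next_cycle_date(old_date, period_days=0, period_seconds=21600):
    """Return the date string one assimilation period after old_date."""
    start = datetime.strptime(old_date, DATE_FORMAT)
    step = timedelta(days=period_days, seconds=period_seconds)
    return (start + step).strftime(DATE_FORMAT)


print(next_cycle_date("2020-01-01-18"))  # 2020-01-02-00
```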

Namelists

DART assembles the namelists for all the relevant modules into a single namelist file $DART/models/pangu/input.nml. Namelists start with an ampersand & and terminate with a slash /. Character strings that contain a / must be enclosed in quotes to prevent them from interfering with the namelist structure. Text outside of the & and / pair is ignored.
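The delimiter rules above can be illustrated with a small Python scanner that splits namelist text into groups. This is a simplified sketch for illustration, not how DART or Fortran reads namelists: it ignores Fortran comments, and it would be confused by a stray & in the text outside a group. The function name namelist_groups is hypothetical.

```python
def namelist_groups(text):
    """Return {group_name: body} for each '&name ... /' group in text."""
    groups = {}
    i, n = 0, len(text)
    while i < n:
        if text[i] != "&":
            i += 1  # text outside a group is ignored
            continue
        # Read the group name that follows the ampersand.
        j = i + 1
        while j < n and not text[j].isspace():
            j += 1
        name = text[i + 1:j]
        # Scan for the terminating slash, skipping slashes inside quotes.
        quote = None
        k = j
        while k < n:
            ch = text[k]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch == "/":
                break
            k += 1
        groups[name] = text[j:k].strip()
        i = k + 1
    return groups


sample = """
this comment line is ignored
&model_nml
   cam_template_filename = 'dir/pginput_0001.nc'
   debug_level = 0
/
"""
groups = namelist_groups(sample)
print(list(groups))  # ['model_nml']
```

Note that the quoted path contains a slash but does not end the group, which is why such strings must be quoted.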

Here is a list of the model_nml variables and default values.

 &model_nml
     cam_template_filename               = 'pginput_0001.nc'
     vertical_localization_coord         = 'PRESSURE'
     use_log_vertical_scale              = .false.
     state_variables  =
         'T',     'QTY_TEMPERATURE',         'NA', 'NA', 'UPDATE'
         'U',     'QTY_U_WIND_COMPONENT',    'NA', 'NA', 'UPDATE'
         'V',     'QTY_V_WIND_COMPONENT',    'NA', 'NA', 'UPDATE'
         'Q',     'QTY_SPECIFIC_HUMIDITY',   'NA', 'NA', 'UPDATE'
     assimilation_period_days            = 0
     assimilation_period_seconds         = 21600
     debug_level                         = 0
/

Utilities such as pert_copies, fields_to_perturb and perturbation_amplitude, and options such as no_obs_assim_above_level, model_damping_ends_at_level, no_normalization_of_scale_heights and use_log_vertical_scale, are not currently supported.

model_nml in input.nml

| Item | Type | Description |
| --- | --- | --- |
| cam_template_filename | character(len=128) | Pangu input template file used to provide configuration information, such as the longitude, latitude, land mask, etc. |
| vertical_localization_coord | character(len=128) | The vertical coordinate to which all vertical locations are converted in model_mod. The only valid option is “PRESSURE”. |
| use_log_vertical_scale | logical | Use the log of the vertical distances when interpolating. This is only used for locations having which_vert = VERTISPRESSURE. It should be .true. when vertical_localization_coord = “scaleheight” or “height”. |
| state_variables | character (len=64) dimension(100) | Character string table with five columns: column 1, the Pangu variable names to be read into the state vector; column 2, the corresponding DART QTY (quantity); columns 3 and 4, the minimum and maximum valid values if the quantity is bounded; column 5, the string ‘UPDATE’ to indicate that the updated values should be written back to the output file, or ‘NOUPDATE’ to skip writing this field at the end of the assimilation. |
| assimilation_period_days | integer | With assimilation_period_seconds, sets the assimilation cycle length. They should match the model forecast step. The common global assimilation window is 0 days, 21600 seconds (6 hours). They also set the assimilation window width. |
| assimilation_period_seconds | integer | See assimilation_period_days |
| debug_level | integer | Set this to increasingly larger values to print out more debugging information. Note that this can be very verbose. Use with care. |
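To make the five-column layout of state_variables concrete, the sketch below groups a flat list of strings into one row per variable and checks the update flag against the two values named above. The column labels and the function name state_variable_rows are illustrative and are not DART identifiers.

```python
# Illustrative labels for the five columns of the state_variables table.
COLUMNS = ("name", "quantity", "minimum", "maximum", "update")


def state_variable_rows(flat):
    """Group a flat list of strings into one dict per state variable."""
    if len(flat) % len(COLUMNS) != 0:
        raise ValueError("state_variables needs 5 entries per variable")
    rows = []
    for start in range(0, len(flat), len(COLUMNS)):
        row = dict(zip(COLUMNS, flat[start:start + len(COLUMNS)]))
        if row["update"] not in ("UPDATE", "NOUPDATE"):
            raise ValueError(f"unexpected update flag for {row['name']}")
        rows.append(row)
    return rows


defaults = [
    "T", "QTY_TEMPERATURE", "NA", "NA", "UPDATE",
    "U", "QTY_U_WIND_COMPONENT", "NA", "NA", "UPDATE",
    "V", "QTY_V_WIND_COMPONENT", "NA", "NA", "UPDATE",
    "Q", "QTY_SPECIFIC_HUMIDITY", "NA", "NA", "UPDATE",
]
for row in state_variable_rows(defaults):
    print(row["name"], row["quantity"], row["update"])
```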

Features not implemented and future development plan

  • Allow ensemble generation from single initial condition files.

  • Implement the ability to discard observations at too high or low levels, including damping options.

  • Build support for vertical localization in HEIGHT, SCALEHEIGHT, and LEVEL, in addition to the PRESSURE coordinate.

  • Assimilation of the surface variables (MSLP, U10, V10, T2M)

  • Ability to specify the model pressure level in the namelist or read from the input file