High-level data assimilation workflows in DART ============================================== In this section we present two high-level data assimilation workflows that show the relevant DART programs with their inputs and outputs. These two workflows represent two different types of DA experiments typically run. It is possible to run DART in **Observation System Simulation Experiment (OSSE)** mode. In OSSE mode, a perfect “true” model trajectory is created, and synthetic observations are generated from the “truth” with added noise. This is useful to test the theoretical capability of DA algorithms, observations, and/or models. In this document so far, we have conducted only OSSEs. It is also possible to run DART in a more realistic **Observation System Experiment (OSE)** mode. In an OSE, there is no perfect model truth, which is similar to real-world situations where the true values of the model state will likely never be perfectly known. The observations (which again themselves are noisy and imperfect) are the only way to get a look at the “truth” that is estimated by the model state. In OSE mode, the user must provide observations to DART, which are usually from real-world observation systems (which come with all of their own idiosyncrasies and imperfections). DART can help generate ensemble perturbations, or the user can specify their own. The filtering aspect is the same for both OSSE and OSE experiments, and many of the same tools for data assimilation are available in OSSE and OSE modes. The core difference, therefore, is the existence of the perfect model “truth.” For a simple model such as Lorenz 63 investigated above, DART can typically advance the model time explicitly through a Fortran function call, allowing the filtering to compute all necessary time steps in sequence without exiting the DART program. However, for larger models (or those that DART cannot communicate with through Fortran), a shell-script may be necessary to run the model and advance the time forward. For the largest models, the model state is typically advanced in parallel over many computing nodes on a supercomputer. In this more complex case, DART only considers one step at a time in order to combine the observations and the prior ensemble to find the posterior analysis, which will then be used to restart the model and continue the forecast. For efficiency reasons, data from models with large states may be written in separate files for every ensemble member at every stage of the assimilation process. Data from models with small states may be conveniently be written as variables inside a single netCDF file. Simple model workflow with an OSSE ---------------------------------- The first example DA workflow is for a model that can be advanced by DART with all ensemble members stored in a single file running an OSSE. Details of the executables mentioned below can be found in :doc:`Programs included in DART <../assimilation_code/programs/readme>`. |simple_model_workflow| .. |simple_model_workflow| image:: images/simple_model_workflow.jpg :width: 100% As shown, the program *preprocess* takes the ``input.nml`` namelist file and generates Fortran code for the observations. This code, along with the namelist, is used by all subsequent programs. *create_obs_seq* is used to define a set of observations in ``set_def.out``, which can be replicated through the program *create_fixed_network_seq* to create a ``obs_seq.in`` file. There are two inputs to *perfect_model_obs*: the ``obs_seq.in`` file and ``perfect_input.nc`` (which here is generated by ``perfect_input.cdl`` via *ncgen*). ``obs_seq.in`` provides *perfect_model_obs* with the observation template (i.e. the location and type of observations), while ``perfect_input.nc`` provides the initial state that will be used to advance the model. On output, the “perfect” model state at the final time, which can be used as a restart for running this procedure again, will be written to ``perfect_output.nc`` (i.e. ``perfect_output.nc`` could be renamed to ``perfect_input.nc`` to extend the OSSE), while the entire state trajectory will be stored in ``true_state.nc``. The noisy synthetic observations and noise-free truth (for verification only) will be stored in ``obs_seq.out``. The observation values of ``obs_seq.out`` will be input to *filter* along with the ``filter_input.nc`` (generated by ``filter_input.cdl`` via *ncgen*), which contains the initial state for all the ensemble members. The output of *filter* is ``preassim.nc``, which contains the prior state for all the ensemble members just before applying DA (so including prior inflation if it is being used); ``analysis.nc``, which contains the posterior state for all the ensemble members after assimilation (and including inflation if it is being used); ``filter_output.nc``, which is the final posterior that could be used to restart the OSSE process; and ``obs_seq.final``, which adds the forward-calculated expected values *h*\ (**x**) for each observation. The ``obs_seq.final`` file can be analyzed and binned by the *obs_diag* program, producing the file ``obs_diag_output.nc`` which can be used for diagnostics. Complex model workflow with an OSE ---------------------------------- The second workflow is for a complex model with all ensemble members stored in separate files running an OSE. In this case, DART will only operate on one model output at a time. External programs will advance the model states, generate the observations, and call DART again. Details of DART's internal programs, which are mentioned below, can be found in :doc:`Programs included in DART <../assimilation_code/programs/readme>`. The following diagram in shows the high-level DART flow in this case: |DART_workflow| .. |DART_workflow| image:: images/DART_workflow.png :width: 100% Within a single time step, DART will use the *filter* program to run the “Assimilate” portion of the above diagram and/or the “diagnostics” as follows: |complex_model_workflow| .. |complex_model_workflow| image:: images/complex_model_workflow.jpg :width: 100% The single time-step workflow for an OSE experiment within a single step is slightly simpler than the OSSE equivalent as DART handles less of the process. Like the OSSE case, the namelist and preprocessed observation source files are input to all other DART programs. In the OSE case, however, the user must provide an **obs converter** that will output a ``obs_seq.out`` file. There are many DART utilities to make this process easier, but for the OSE case the ``obs_seq.out`` file is ultimately the user’s responsibility (to avoid duplicating effort, see the list of existing observation types in :doc:`Important capabilities of DART `). Here, the option to run with one file for each ensemble member is demonstrated. There are *k* ensemble members used as input to *filter*, which also outputs *k* members for the prior and posterior. The ``obs_seq.final`` and ``obs_diag_output.nc`` are used in the same way as in the OSSE case. The names of the input files and output files can be controlled by the user through the ``filter_input_list.txt`` and ``filter_output_list.txt`` files, which can contain the user-specified list of the ensemble input or output files, respectively. Another view of the stages of filter is shown in the following diagram: |diag_cycle| .. |diag_cycle| image:: images/diag_cycle.png :width: 100% As shown here, an ensemble forecast is stored in ``forecast.nc`` , to which prior inflation can be applied and stored in ``preassim.nc``. Once assimilation is applied, the output can be stored in ``postassim.nc``, and finally if posterior inflation is applied, the final analysis can be written in ``analysis.nc`` . The model forecast will start from the analysis to advance the model in order to start the cycle over again. .. note:: The “forecast” will be the same as the “preassim” if prior inflation is not used, and the “postassim” will be the same as the “analysis” if posterior inflation is not used. The *stages_to_write* variable in the “&filter_nml” section of the ``input.nml`` namelist controls which stages are output to file. For a multi-file case, the potential *stages_to_write* are “input, forecast, preassim, postassim, analysis, output” while for a single file the same stages are available with the exception of “input.” .. note:: In the above cycling diagram, there will actually be one file per member, which is not shown here in order to simplify the process. .. important:: The decision to store ensemble members as separate files and whether to run an OSSE or OSE are independent. An OSSE can be run with multiple files and an OSE can be run with all ensemble members stored in a single file.