High-level data assimilation workflows in DART
==============================================

In this section we present two high-level data assimilation workflows that show
the relevant DART programs with their inputs and outputs. These two workflows
represent two different types of DA experiments typically run.

It is possible to run DART in **Observation System Simulation Experiment
(OSSE)** mode. In OSSE mode, a perfect “true” model trajectory is created, and
synthetic observations are generated from the “truth” with added noise. This is
useful to test the theoretical capability of DA algorithms, observations, and/or
models. In this document so far, we have conducted only OSSEs.
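
As a minimal sketch of the OSSE idea, and not of DART's own implementation, the
Python snippet below adds Gaussian noise to a known “truth” series to produce
synthetic observations. The function name ``make_synthetic_obs``, the
observation error standard deviation, and the sample values are illustrative
assumptions; in DART this role is played by *perfect_model_obs*.

.. code-block:: python

   import random

   def make_synthetic_obs(truth, obs_error_sd, seed=0):
       """Return noisy observations of a known 'truth' series.

       Each observation is the true value plus zero-mean Gaussian noise.
       """
       rng = random.Random(seed)  # seeded so the experiment is repeatable
       return [x + rng.gauss(0.0, obs_error_sd) for x in truth]

   # Hypothetical truth values standing in for a model trajectory.
   truth = [1.0, 1.5, 2.25, 3.0]
   obs = make_synthetic_obs(truth, obs_error_sd=0.5, seed=42)

   # Because the truth is known, the observation errors can be inspected.
   errors = [o - t for o, t in zip(obs, truth)]

Since the truth is available, the errors of both the observations and the
assimilation result can be measured directly, which is not possible in an OSE.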

It is also possible to run DART in a more realistic **Observation System
Experiment (OSE)** mode. In an OSE, there is no perfect model truth, which is
similar to real-world situations where the true values of the model state will
likely never be perfectly known. The observations, which are themselves noisy
and imperfect, are the only available view of the “truth” that the model state
estimates. In OSE mode, the user must provide observations to DART. These
usually come from real-world observation systems, with all of their own
idiosyncrasies and imperfections. DART can help generate ensemble
perturbations, or the user can specify their own.

The filtering aspect is the same for both OSSEs and OSEs, and many of the same
data assimilation tools are available in both modes. The core difference,
therefore, is the existence of the perfect model “truth.”

For a simple model such as Lorenz 63 investigated above, DART can typically
advance the model time explicitly through a Fortran function call, allowing the
filtering to compute all necessary time steps in sequence without exiting the
DART program. However, for larger models (or those that DART cannot communicate
with through Fortran), a shell script may be necessary to run the model and
advance the time forward. For the largest models, the model state is typically
advanced in parallel over many computing nodes on a supercomputer. In this more
complex case, DART only considers one step at a time in order to combine the
observations and the prior ensemble to find the posterior analysis, which will
then be used to restart the model and continue the forecast.
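
The forecast and analysis cycle described above can be sketched in Python as
follows. This is a toy illustration under stated assumptions: ``advance_model``
stands in for an external model run, and ``assimilate`` is a placeholder that
simply moves each member towards the observation. Neither is DART code, and the
placeholder is not the algorithm used by *filter*.

.. code-block:: python

   def advance_model(state, steps=1):
       """Hypothetical stand-in for an external model run (e.g. a shell script)."""
       for _ in range(steps):
           state = 0.9 * state + 1.0
       return state

   def assimilate(prior, observation, weight=0.5):
       """Placeholder analysis step: shift each member towards the observation.

       This is NOT DART's filter; it only marks where the analysis happens.
       """
       return [x + weight * (observation - x) for x in prior]

   def cycle(ensemble, observations):
       """Alternate forecast and analysis, one observation time per step."""
       for obs in observations:
           prior = [advance_model(member) for member in ensemble]  # forecast
           ensemble = assimilate(prior, obs)  # posterior restarts the model
       return ensemble

   final = cycle([0.0, 1.0, 2.0], observations=[5.0, 8.0])

The point of the sketch is the control flow: the analysis step only ever sees
one prior ensemble and one set of observations, and its output becomes the
starting point of the next forecast.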

For efficiency reasons, data from models with large states may be written in
separate files for every ensemble member at every stage of the assimilation
process. Data from models with small states may conveniently be written as
variables inside a single netCDF file.

Simple model workflow with an OSSE
----------------------------------

The first example DA workflow is an OSSE for a model that DART can advance
directly, with all ensemble members stored in a single file.
Details of the executables mentioned below can be found in
:doc:`Programs included in DART <../assimilation_code/programs/readme>`.

|simple_model_workflow|

.. |simple_model_workflow| image:: images/simple_model_workflow.jpg
   :width: 100%

As shown, the program *preprocess* takes the ``input.nml`` namelist file and
generates Fortran code for the observations. This code, along with the namelist,
is used by all subsequent programs. *create_obs_seq* is used to define a set of
observations in ``set_def.out``, which can be replicated through the program
*create_fixed_network_seq* to create an ``obs_seq.in`` file.

There are two inputs to *perfect_model_obs*: the ``obs_seq.in`` file and
``perfect_input.nc`` (which here is generated from ``perfect_input.cdl`` via
*ncgen*). ``obs_seq.in`` provides *perfect_model_obs* with the observation
template (i.e. the location and type of observations), while
``perfect_input.nc`` provides the initial state that will be used to advance the
model. On output, the “perfect” model state at the final time is written to
``perfect_output.nc``, which can be used as a restart for running this procedure
again (i.e. ``perfect_output.nc`` could be renamed to ``perfect_input.nc`` to
extend the OSSE), while the entire state trajectory is stored in
``true_state.nc``. The noisy synthetic observations and noise-free truth (for
verification only) are stored in ``obs_seq.out``.

The observation values of ``obs_seq.out`` are input to *filter* along with
``filter_input.nc`` (generated from ``filter_input.cdl`` via *ncgen*), which
contains the initial state for all the ensemble members. The outputs of *filter*
are ``preassim.nc``, which contains the prior state for all the ensemble members
just before applying DA (so including prior inflation if it is being used);
``analysis.nc``, which contains the posterior state for all the ensemble members
after assimilation (and including posterior inflation if it is being used);
``filter_output.nc``, which is the final posterior that could be used to restart
the OSSE process; and ``obs_seq.final``, which adds the forward-calculated
expected values *h*\ (**x**) for each observation. The ``obs_seq.final`` file
can be analyzed and binned by the *obs_diag* program, producing the file
``obs_diag_output.nc``, which can be used for diagnostics.
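
To make the file dependencies in this workflow easier to follow, the Python
sketch below records each program with the files it reads and writes, then
checks that every input exists before the program that needs it runs. The
program and file names come from the description above; the data structure and
the helper ``missing_inputs`` are illustrative assumptions, not part of DART.
``input.nml`` and the code generated by *preprocess* are read by every program
and are omitted from the per-program inputs for brevity.

.. code-block:: python

   # Files assumed to be supplied by the user before the workflow starts.
   USER_PROVIDED = {"input.nml", "perfect_input.cdl", "filter_input.cdl"}

   # (program, inputs, outputs), in the order described in the text.
   WORKFLOW = [
       ("create_obs_seq", [], ["set_def.out"]),
       ("create_fixed_network_seq", ["set_def.out"], ["obs_seq.in"]),
       ("ncgen (perfect)", ["perfect_input.cdl"], ["perfect_input.nc"]),
       ("perfect_model_obs", ["obs_seq.in", "perfect_input.nc"],
        ["perfect_output.nc", "true_state.nc", "obs_seq.out"]),
       ("ncgen (filter)", ["filter_input.cdl"], ["filter_input.nc"]),
       ("filter", ["obs_seq.out", "filter_input.nc"],
        ["preassim.nc", "analysis.nc", "filter_output.nc", "obs_seq.final"]),
       ("obs_diag", ["obs_seq.final"], ["obs_diag_output.nc"]),
   ]

   def missing_inputs(workflow, available):
       """Return (program, file) pairs for inputs that do not exist yet."""
       available = set(available)
       problems = []
       for program, inputs, outputs in workflow:
           for name in inputs:
               if name not in available:
                   problems.append((program, name))
           available.update(outputs)  # outputs become available downstream
       return problems

   problems = missing_inputs(WORKFLOW, USER_PROVIDED)

An empty ``problems`` list indicates that the programs, run in this order, each
find the files they need.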

Complex model workflow with an OSE
----------------------------------

The second workflow is an OSE for a complex model, with each ensemble member
stored in a separate file. In this case, DART will only operate on one model
output at a time. External programs will advance the model states, generate the
observations, and call DART again. Details of DART's internal programs,
which are mentioned below, can be found in
:doc:`Programs included in DART <../assimilation_code/programs/readme>`.
The following diagram shows the high-level
DART flow in this case:

|DART_workflow|

.. |DART_workflow| image:: images/DART_workflow.png
   :width: 100%

Within a single time step, DART will use the *filter* program to run the
“Assimilate” portion of the above diagram and/or the “diagnostics” as follows:

|complex_model_workflow|

.. |complex_model_workflow| image:: images/complex_model_workflow.jpg
   :width: 100%

Within a single time step, the OSE workflow is slightly simpler than the OSSE
equivalent because DART handles less of the process.
As in the OSSE case, the namelist and preprocessed observation source files are
input to all other DART programs. In the OSE case, however, the user must
provide an **obs converter** that will output an ``obs_seq.out`` file. There are
many DART utilities to make this process easier, but for the OSE case the
``obs_seq.out`` file is ultimately the user’s responsibility (to avoid
duplicating effort, see the list of existing observation types in :doc:`Important
capabilities of DART <important-capabilities-dart>`). Here, the option to run with one
file for each ensemble member is demonstrated. There are *k* ensemble members
used as input to *filter*, which also outputs *k* members for the prior and
posterior. The ``obs_seq.final`` and ``obs_diag_output.nc`` are used in the same
way as in the OSSE case. The names of the input files and output files can be
controlled by the user through the ``filter_input_list.txt`` and
``filter_output_list.txt`` files, which can contain the user-specified list of
the ensemble input or output files, respectively.
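
As a hedged illustration, the Python sketch below builds the contents of the two
list files for *k* ensemble members. The one-filename-per-line layout and the
``filter_input_0001.nc`` naming pattern are assumptions made for this example;
the names your model and scripts actually use may differ.

.. code-block:: python

   def member_filenames(prefix, k):
       """Return k per-member file names, numbered from 1 (hypothetical pattern)."""
       return [f"{prefix}_{i:04d}.nc" for i in range(1, k + 1)]

   def list_file_text(filenames):
       """Join file names into list-file text, one name per line (assumed layout)."""
       return "\n".join(filenames) + "\n"

   k = 3  # ensemble size chosen for this example
   inputs = member_filenames("filter_input", k)
   outputs = member_filenames("filter_output", k)

   input_list = list_file_text(inputs)    # text for filter_input_list.txt
   output_list = list_file_text(outputs)  # text for filter_output_list.txt

Writing ``input_list`` to ``filter_input_list.txt`` and ``output_list`` to
``filter_output_list.txt`` would then give *filter* one input and one output
file per member.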

Another view of the stages of filter is shown in the following diagram:

|diag_cycle|

.. |diag_cycle| image:: images/diag_cycle.png
   :width: 100%

As shown here, an ensemble forecast is stored in ``forecast.nc``, to which
prior inflation can be applied and stored in ``preassim.nc``. Once assimilation
is applied, the output can be stored in ``postassim.nc``, and finally if
posterior inflation is applied, the final analysis can be written in
``analysis.nc``. The model forecast will start from the analysis to advance the
model in order to start the cycle over again.

.. note::

   The “forecast” will be the same as the “preassim” if prior inflation is not
   used, and the “postassim” will be the same as the “analysis” if posterior
   inflation is not used. The *stages_to_write* variable in the “&filter_nml”
   section of the ``input.nml`` namelist controls which stages are output to
   file. For a multi-file case, the potential *stages_to_write* are “input,
   forecast, preassim, postassim, analysis, output” while for a single file the
   same stages are available with the exception of “input.”
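
The rules in the note above can be summarized in a short Python sketch. The
stage names are taken from the note; the helper functions are hypothetical and
exist only to illustrate the logic, so they do not reflect how DART itself
parses the namelist.

.. code-block:: python

   ALL_STAGES = ["input", "forecast", "preassim", "postassim", "analysis", "output"]

   def allowed_stages(single_file):
       """Stages that may be written; 'input' is excluded for a single file."""
       return [s for s in ALL_STAGES if not (single_file and s == "input")]

   def invalid_stages(stages_to_write, single_file):
       """Return the requested stages that are not available in this mode."""
       allowed = allowed_stages(single_file)
       return [s for s in stages_to_write if s not in allowed]

   def identical_stage_pairs(prior_inflation, posterior_inflation):
       """Return pairs of stages that hold the same data for these settings."""
       pairs = []
       if not prior_inflation:
           pairs.append(("forecast", "preassim"))
       if not posterior_inflation:
           pairs.append(("postassim", "analysis"))
       return pairs

   requested = ["input", "preassim", "analysis"]
   bad_single = invalid_stages(requested, single_file=True)
   bad_multi = invalid_stages(requested, single_file=False)

Checking ``identical_stage_pairs`` before a run can help avoid writing two
copies of the same ensemble when inflation is switched off.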

.. note::
    
   In the above cycling diagram, there will actually be one file per member,
   which is not shown here in order to simplify the process.

.. important:: 

   The decision to store ensemble members as separate files and whether to run
   an OSSE or OSE are independent. An OSSE can be run with multiple files and
   an OSE can be run with all ensemble members stored in a single file.