PDAF - the Parallel Data Assimilation Framework

Advanced data assimilation algorithms for filtering and smoothing applications with state-of-the-art large-scale geophysical models are of increasing interest. The applied algorithms are typically ensemble-based Kalman filters, nonlinear particle filters, or variational methods. Their aim is to estimate the state of a geophysical system (atmosphere, ocean, ...) on the basis of a numerical model and measurements by combining both sources of information. In addition, the ensemble-based algorithms provide an estimate of the error in the computed state estimate.


Dr. Lars Nerger

Further Information

Detailed information about PDAF as well as the software download can be found on the project pages of PDAF.


Efficient ensemble-based data assimilation

Data assimilation with these advanced algorithms and large-scale models is computationally extremely demanding. This motivates the parallelization of the assimilation problem and the use of high-performance computers. The implementation of a data assimilation system on the basis of existing numerical models is complicated by the fact that these models are typically not prepared to be used with data assimilation algorithms. The Parallel Data Assimilation Framework - PDAF - has been developed to facilitate the implementation of parallel data assimilation systems. For this task, the model is typically extended by subroutine calls to PDAF such that a single data assimilation program results.

PDAF makes it possible to combine an existing numerical model with data assimilation algorithms, such as ensemble-based filters, with minimal changes to the model code. Furthermore, PDAF supports the efficient use of parallel computers by creating a parallel data assimilation system. The most costly part of ensemble-based Kalman filters is the integration of an ensemble of model states. Because each ensemble member can be integrated independently of the others, PDAF supports extending a model code so that the program can compute multiple model tasks concurrently. Since the model integrations are typically the most time-consuming part of the data assimilation problem, their parallelization makes ensemble-based Kalman filters highly scalable (so far, PDAF has been used with up to 16800 processor cores). For the assimilation step, which combines the observations and the model state estimate, PDAF includes several filter algorithms. All filters are fully implemented, optimized, and parallelized.
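The idea of concurrent model tasks can be illustrated with a small Python sketch. It is not PDAF code (PDAF itself is written in Fortran and uses MPI); the function names `split_into_model_tasks` and `members_per_task` are hypothetical, and the even block distribution is an assumption made for illustration. The sketch only shows how a pool of processes and an ensemble could be divided among model tasks that run side by side.

```python
def split_into_model_tasks(n_procs, n_tasks):
    """Assign each process rank to one model task, as evenly as possible.

    Returns a list whose entry r is the 0-based task index of rank r.
    Hypothetical helper for illustration; not a PDAF routine.
    """
    if n_tasks < 1 or n_procs < n_tasks:
        raise ValueError("need at least one process per model task")
    base, extra = divmod(n_procs, n_tasks)
    assignment = []
    for task in range(n_tasks):
        # The first `extra` tasks receive one additional process.
        size = base + (1 if task < extra else 0)
        assignment.extend([task] * size)
    return assignment


def members_per_task(n_members, n_tasks):
    """Number of ensemble members each task integrates (one after another)."""
    base, extra = divmod(n_members, n_tasks)
    return [base + (1 if task < extra else 0) for task in range(n_tasks)]


if __name__ == "__main__":
    print(split_into_model_tasks(8, 4))
    print(members_per_task(10, 4))
```

With as many tasks as ensemble members, every member is integrated at the same time; with fewer tasks, each task integrates several members in sequence.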


Components of a data assimilation system

PDAF is based on a logical separation of the assimilation system into three components (see figure above):

  • On one side, the model provides the initialization and integration of all fields considered in the model.
  • On the other side, the measurements provide the observational information. They consist of the information on which variables are observed as well as the observed values and their error estimates.
  • In between the model and observations, the filter algorithm combines the model and observational information. The filter is located in the core of PDAF.

Well-defined interfaces handle the information exchange between the three components. To implement a data assimilation system with PDAF, filter algorithms are usually attached to the model with minimal changes to the model source code itself. The parts of the filter problem that are model-dependent or refer to the measurements are organized as separate call-back subroutines. These routines need to be implemented by the user of the framework like routines of the model, while the core routines of PDAF remain unchanged.

The data assimilation system is controlled by the user-supplied routines. This approach ensures that the driver functionality remains in the model part of the program. In addition, the user-supplied routines can be implemented in the context of the model code. That is, if the model uses Fortran common blocks or modules, they can also be used in the user-supplied routines. These possibilities simplify the implementation of the user-supplied routines, as the users typically know the particularities of their model. The data assimilation system can then be run like the regular model, but with additional options for the data assimilation.
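The call-back structure described above can be sketched in a few lines of Python. This is an illustration only: the names `core_analysis`, `init_obs`, and `obs_op` are hypothetical and do not correspond to PDAF's actual Fortran interface. The update is also deliberately simplified. It applies the Kalman-type correction of the ensemble mean for a single scalar observation and shifts every member by that increment, so the ensemble spread is left unchanged, which a real filter would not do.

```python
def init_obs():
    """User-supplied: return the observed value and its error variance."""
    return 4.0, 2.0


def obs_op(state):
    """User-supplied: map a model state to the observed quantity
    (here simply the first state element)."""
    return state[0]


def core_analysis(ensemble, init_obs_cb, obs_op_cb):
    """Hypothetical framework core. It knows nothing about the model grid
    or the observation type; it obtains both through the call-backs."""
    obs, obs_var = init_obs_cb()
    predicted = [obs_op_cb(state) for state in ensemble]
    n = len(ensemble)
    dim = len(ensemble[0])

    mean_pred = sum(predicted) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / (n - 1)
    means = [sum(state[i] for state in ensemble) / n for i in range(dim)]

    # Gain for each state element: cov(x_i, Hx) / (var(Hx) + R)
    gain = [
        sum((state[i] - means[i]) * (p - mean_pred)
            for state, p in zip(ensemble, predicted))
        / (n - 1) / (var_pred + obs_var)
        for i in range(dim)
    ]

    # Simplification: shift all members by the mean increment only.
    increment = obs - mean_pred
    return [[state[i] + gain[i] * increment for i in range(dim)]
            for state in ensemble]


if __name__ == "__main__":
    forecast = [[1.0, 2.0], [3.0, 4.0]]
    print(core_analysis(forecast, init_obs, obs_op))
```

Replacing the model or the observation type changes only `init_obs` and `obs_op`; the core function stays untouched, which mirrors the separation of components described here.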

PDAF is implemented in Fortran and parallelized using the MPI standard. For efficiency, BLAS and LAPACK libraries are used. PDAF has been tested on different platforms with various compilers.


Optimized and parallelized filter and smoother algorithms

PDAF includes a selection of commonly used filter algorithms. All filters are optimized and parallelized. The filter algorithms currently included in PDAF are:

  • EnKF / LEnKF (Ensemble Kalman Filter / local EnKF)
  • SEEK (Singular "Evolutive" Extended Kalman) filter
  • SEIK / LSEIK (Singular "Evolutive" Interpolated Kalman filter / local SEIK)
  • ETKF / LETKF (Ensemble Transform Kalman filter / local ETKF)
  • ESTKF / LESTKF (Error Subspace Transform Kalman filter / local ESTKF)
  • NETF / LNETF (Nonlinear Ensemble Transform filter / local NETF)
  • PF (Particle filter with importance resampling)

All filters, except SEEK and PF, are provided with and without localization for optimal compute performance in global and localized applications.

The first three filters (EnKF, SEEK, SEIK) are described and compared in Nerger et al. (2005a), while the local SEIK filter is described in Nerger et al. (2006). The ETKF and SEIK filters have been examined in Nerger et al. (2012b), which also introduced the new ESTKF. The NETF has been described in Tödter et al. (2016).
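To make the idea of a particle filter with importance resampling concrete, the following Python sketch shows one analysis step for scalar particles. It is not PDAF's implementation. The Gaussian observation error, the multinomial resampling scheme, and the function names `importance_weights` and `resample` are assumptions chosen for illustration.

```python
import math
import random


def importance_weights(particles, obs, obs_std, obs_op):
    """Normalized weights from a Gaussian observation likelihood."""
    log_w = [-0.5 * ((obs - obs_op(p)) / obs_std) ** 2 for p in particles]
    shift = max(log_w)  # subtract the maximum to avoid underflow in exp
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with replacement,
    each with probability equal to its weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    forecast = [rng.gauss(0.0, 1.0) for _ in range(20)]
    weights = importance_weights(forecast, obs=0.5, obs_std=0.3,
                                 obs_op=lambda p: p)
    analysis = resample(forecast, weights, rng)
    print(len(analysis), min(analysis), max(analysis))
```

After resampling, particles close to the observation tend to appear several times while unlikely particles are dropped, and all resampled particles carry equal weight again.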

In addition to the filter algorithms, the following smoothers are available:

  • EnKS (Ensemble Kalman Smoother)
  • ETKS (Ensemble Transform Kalman Smoother)
  • LETKS (Local Ensemble Transform Kalman Smoother)
  • ESTKS (Error Subspace Transform Kalman Smoother)
  • LESTKS (Local Error Subspace Transform Kalman Smoother)
  • LNETS (Local Nonlinear Ensemble Transform Smoother)

The smoother extension was described in Nerger et al. (2014), which also studied the influence of nonlinearity on the smoothing. The LNETS was studied in Kirchgessner et al. (2017).

A general overview of PDAF is provided in Nerger et al. (2005b) and the implementation strategy used in PDAF as well as its parallel performance have been discussed in Nerger and Hiller (2013). The strategy to implement coupled data assimilation for Earth system models is described in Nerger et al. (2020).


Diagnostics, Ensembles, and Observation Generation


PDAF provides functionality for different statistical diagnostics for ensemble data assimilation, for example rank histograms and the computation of statistical moments.
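As an illustration of these two diagnostics, the Python sketch below computes a rank histogram and the first four sample moments. It does not reproduce PDAF's routines. The function names are hypothetical, ties between members and the verifying value are ignored, and the normalization of skewness and excess kurtosis is one of several common conventions.

```python
def rank_histogram(ensembles, truths):
    """counts[r] is the number of cases in which exactly r ensemble
    members lie below the verifying value."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, truth in zip(ensembles, truths):
        rank = sum(1 for m in members if m < truth)
        counts[rank] += 1
    return counts


def moments(sample):
    """Mean, unbiased variance, skewness, and excess kurtosis."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    std = var ** 0.5
    skew = sum(((x - mean) / std) ** 3 for x in sample) / n
    kurt = sum(((x - mean) / std) ** 4 for x in sample) / n - 3.0
    return mean, var, skew, kurt


if __name__ == "__main__":
    cases = [[1.0, 2.0, 3.0]] * 3
    print(rank_histogram(cases, [0.5, 2.5, 3.5]))
    print(moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A roughly flat rank histogram suggests that the ensemble spread is consistent with the verifying values, whereas a U-shaped histogram points to an ensemble with too little spread.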

Ensemble generation

Ensemble data assimilation requires the initialization of an ensemble of model state realizations. PDAF provides tools for generating such ensembles which represent the uncertainty of the model state.

Generating synthetic observations

Twin experiments are often performed to assess data assimilation methods. These need synthetic observations, which are generated from a model run representing the 'truth'. PDAF provides functionality to easily generate synthetic observations which mimic the distribution of real observations.
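The principle of a twin experiment can be sketched as follows in Python. The function `generate_synthetic_obs` and its arguments are hypothetical and not part of PDAF. The sketch assumes that the observations are direct samples of selected state elements with independent Gaussian errors.

```python
import random


def generate_synthetic_obs(true_state, obs_indices, obs_std, rng):
    """Sample the 'truth' at the observed locations and add Gaussian
    noise with standard deviation obs_std."""
    return [true_state[i] + rng.gauss(0.0, obs_std) for i in obs_indices]


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed for reproducibility
    truth = [10.0, 20.0, 30.0, 40.0]
    observed_at = [0, 2]             # mimic a sparse observing network
    print(generate_synthetic_obs(truth, observed_at, 0.5, rng))
```

Choosing `obs_indices` to match the locations of real measurements is what lets the synthetic data mimic the distribution of a real observing network.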

Toy models

For assessing data assimilation methods with idealized models, the Lorenz-63 and Lorenz-96 models are provided. These allow studying data assimilation with varying degrees of nonlinearity.
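For readers unfamiliar with these toy models, the following Python sketch implements the Lorenz-96 equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices. It is independent of the code shipped with PDAF. The fourth-order Runge-Kutta scheme, the time step of 0.05, the state dimension of 40, and the forcing F = 8 are commonly used choices and are assumptions here.

```python
def lorenz96_tendency(x, forcing=8.0):
    """Right-hand side of the Lorenz-96 model with cyclic boundaries.
    Negative list indices provide the wrap-around at the lower end."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency([a + 0.5 * dt * k for a, k in zip(x, k1)], forcing)
    k3 = lorenz96_tendency([a + 0.5 * dt * k for a, k in zip(x, k2)], forcing)
    k4 = lorenz96_tendency([a + dt * k for a, k in zip(x, k3)], forcing)
    return [a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]


if __name__ == "__main__":
    state = [8.0] * 40
    state[19] += 0.01                # small perturbation of the rest state
    for _ in range(200):
        state = rk4_step(state, 0.05)
    print(min(state), max(state))
```

The uniform state x_i = F is an equilibrium of the model, so a small perturbation, as in the usage example, is needed to start a nontrivial trajectory.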

Model bindings

PDAF provides model binding codes for using PDAF with the MITgcm ocean model and the AWI-CM coupled ocean-atmosphere model.



Open-Source Software

PDAF is currently used in several projects inside and outside of AWI. The framework is available as free open-source software. Further information about PDAF, as well as the possibility to download the source code, is available on the project pages of PDAF. There you will also find a tutorial on how to use PDAF.