pyPDAF provides a Python interface to the established Parallel Data Assimilation Framework (PDAF). The original framework is used with various regional and global climate models including atmosphere, ocean, hydrology, land surface and sea ice models. These models are typically written in Fortran which can be easily used with PDAF. pyPDAF can become useful in the following scenarios:
- With an increasing number of Python-coded numerical models, especially machine learning models, pyPDAF is a convenient tool to implement data assimilation (DA) systems purely in Python.
- Alternatively, pyPDAF can be used to set up offline data assimilation system. In such a system, the model fields in restart files are replaced by analyses generated by pyPDAF. This can be an attractive alternative to the original Fortran implementations considering the simplicity of code implementation and package management in Python.
The interface inherits the efficiency of the data assimilation algorithms in Fortran, and the flexibility to be applied to different models and observations. This means that users of pyPDAF can couple the DA algorithms with any types of model and observations without the need to coding the actual DA algorithms. This allows the users to focus on the specific research problems. The framework includes various ensemble DA algorithms including many variants of ensemble Kalman filters, particle filters and other non-linear filters. It also provides framework for variants of 3DVar. A full list of supported methods can be found here
It is recommended to install pyPDAF via conda
:
conda create -n pypdaf -c conda-forge yumengch::pypdaf==1.0.1
You can also install locally from the source code using pip
by setting up setup.cfg
and cmake
configurations
with examples given in PDAFBuild.
For users without prior experience with PDAF, we highly recommend to start with the tutorial here: .
To construct a parallel ensemble DA system,
in example directory, we provide both online
and offline
examples.
pyPDAF and PDAF both utilise Message Passing Interface (MPI)
parallelisation. Hence, to run the example, it needs to be executed
from commandline using mpiexec
. For example,
mpiexec -n 4 python -u example/online/main.py
will run the example with 4 processes. The example is based on the tutorials of the original PDAF.
The most up-to-date pyPDAF has interface with PDAF-V2.3.1
.
A documentation is provided.
The interface follows the naming convention of PDAF.
One major difference is the localisation functions
in the Observation Module Infrastructure (OMI).
In PDAF, one can simply call PDAFomi_init_dim_obs_l
or PDAFomi_localize_covar
.
In pyPDAF, these subroutines are broken into three functions:
pyPDAF.PDAF.omi_init_dim_obs_l_iso
,
pyPDAF.PDAF.omi_init_dim_obs_l_noniso
,
pyPDAF.PDAF.omi_init_dim_obs_l_noniso_locweights
for isotropic,
non-isotropic and horizontal and vertically separated
non-isotropic localisation. The suffix is applied similarly
to pyPDAF.PDAF.omi_localize_covar
.
Details of the application of these localisation for
PDAFomi_init_dim_obs_l
and
PDAFomi_localize_covar
can be found in PDAF documentation.
We welcome issues, pull requests, feature requests and any other discussions in the issues section.
Yumeng Chen, Lars Nerger
pyPDAF is mainly developed and maintained by National Centre for Earth Observation and University of Reading.