Land surface models (LSMs) are widely used to study the continental part of the water cycle. However, even though their accuracy is increasing, inherent model uncertainties cannot be avoided. Meanwhile, remotely sensed observations of continental water cycle variables such as soil moisture and lake and river elevations are becoming more frequent and accurate. These two types of information can therefore be combined, using data assimilation techniques, to reduce a model's uncertainties in its state variables and/or in its input parameters. The objective of this study is to present a data assimilation platform that assimilates into the large-scale ISBA-CTRIP LSM a pointwise river discharge product, derived from ENVISAT nadir altimeter water elevation measurements and rating curves, over the whole Amazon basin. To deal with the scale difference between the model and the observations, the study also presents an initial development of a localization treatment that limits the impact of each observation to areas close to it and in the same hydrological network. This assimilation platform is based on the ensemble Kalman filter and can correct either the CTRIP river water storage or the discharge. Root mean square error (RMSE) compared to gauge discharges is reduced basin-wide by up to 21 %, and at Óbidos, near the outlet, RMSE is reduced by up to 52 % compared to ENVISAT-based discharge. Finally, it is shown that localization improves results along the main tributaries.

The continental part of the water cycle is commonly studied, at large scale, with hydrological modelling. Such models generally result from the coupling of a land surface model (LSM) with a river routing model (RRM). The LSM determines the water and energy budgets at the surface by partitioning precipitation between the soil and the canopy. Meanwhile, the RRM transfers water mass through the basin to the outlet and provides an estimate of river discharge.

RRMs are mainly based on kinematic

However, even if hydrological models are becoming more and more accurate, inherent model uncertainties are unavoidable. They originate from several sources: simplifications of, and gaps in knowledge about, the real physics; errors induced by numerical schemes and discretization; and uncertainties in the input parameters and forcing. All these uncertainties affect a model's outputs. In the worst case, they could accumulate and cause the model to fail. The model therefore gives only an approximate view of the system's real state.

Observations of the system can be used to calibrate and/or validate the model and reduce its errors. These observations can be obtained from in situ or remote sensing techniques. In situ techniques mainly focus on measuring river water elevations at gauge stations. Another important variable of interest in river hydrology is river discharge, which is measured much more sparsely than water elevation. Based on river discharges and elevations measured at the same time and at the same location, it is possible to build a rating curve that represents the elevation–discharge relationship. This rating curve is then applied to water elevations to produce continuous discharge time series. Institutions delivering in situ data mainly provide discharge. Even though in situ measurements are generally quite accurate and have a high sampling frequency (i.e. sub-daily), their main limitation is their local and spatially sparse coverage of the river network. Furthermore, remotely sensed data from satellite missions are increasingly available and provide useful observations of rivers. The most straightforward and widely used instrument to measure river water elevations is the nadir altimeter.

Altimeters were initially developed to measure ocean topography with
satellite missions GEOS-3 (1975–1978) and SEASAT launched in 1978

DA aims to improve a model's ability to forecast/simulate the evolution of the physical system. To do so, DA techniques focus on correcting either the model's input parameters (parameter estimation) or the model's outputs (state estimation). State estimation (SE) consists of using observations to directly correct the model output state. It is based on the assumption that the model (and the observations) are imperfect. SE therefore aims at correcting model outputs, whose errors result from all the sources of uncertainty previously described.

SE has been widely used in oceanography and meteorology

The objective of the present study is to investigate the contribution of
remotely sensed data, and in particular measurements derived from nadir
altimeters that provide local information, to improve a large-scale RRM via
DA. The scale difference between the observations and the model leads us to
also study the need to use localization methods within our DA framework. We
used an ensemble Kalman filter, to which we added a simple localization
module, to assimilate discharges derived from ENVISAT water surface elevation
measurements. These observations are used to correct the state of the
large-scale Total Runoff Integrated Pathways (TRIP,

In Sect.

The study is focused on the Amazon River basin (see Fig.

The Amazon basin's geology can be divided into three major morpho-structural
units: the western Andean Cordillera, the central Amazon trough and the
shields at the eastern part of the basin (Guiana shield to the north and the
Brazilian shield to the south). The northern and southern regions of the
basin are under a tropical climate with a dry and a wet season, but the
maximum rainfall season for the two parts occurs at different periods during
the year

The ISBA model

The CTRIP RRM is also defined over a regular mesh grid. In this study, it is
run at the same resolution as ISBA (0.5

Only the surface reservoir

The floodplain scheme activates when the water height in the river,

ISBA-CTRIP is run in offline mode. This implies that external atmospheric
data are needed to force the model. Here, the atmospheric data from the
Global Soil Wetness Project 3 (GSWP3,

Map of hydro-geomorphological zones defined over the Amazon basin.

For ISBA-CTRIP, the Amazon basin is composed of a total number of 2028 cells.
A sensitivity analysis (SA) of ISBA-CTRIP has been conducted by

The altimetry-based discharge product used in this study is derived from
water surface elevations measured by the ENVISAT Radar Altimeter-2 (RA-2)
instrument at virtual stations (VSs). A VS is located where the altimeter
ground track crosses the river. The ENVISAT mission operated from September 2002 to
October 2010 on its nominal orbit, which has a 35-day repeat period and an
80 km inter-track distance at the Equator. The water surface elevations
measured over the Amazon basin were initially generated by

Turning water surface elevation measures into an equivalent discharge
requires the use of elevation–discharge rating curves. The rating curves
used in this study have been built and validated by

The MGB-IPH discharges were used by

The quality assurance of the discharge product has been made by constraining
the rating curve coefficients within a physical range of values
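The conversion from water surface elevation to discharge can be illustrated with a short Python sketch. It assumes the power-law rating curve Q = a (Z − Z0)^b that is commonly used for such products, where Z0 is the elevation of zero flow; the function name and the coefficient values in the example are hypothetical and are not those of the ENVISAT-based product.

```python
import numpy as np


def rating_curve_discharge(z, a, b, z0):
    """Convert water surface elevations z (m) into discharge (m3/s).

    Assumes the power law Q = a * (z - z0) ** b (hypothetical form for this
    sketch). Elevations at or below z0 give zero discharge.
    """
    z = np.asarray(z, dtype=float)
    depth = np.clip(z - z0, 0.0, None)  # effective flow depth, never negative
    return a * depth ** b


# Hypothetical coefficients for a single virtual station
q = rating_curve_discharge([10.0, 12.0, 14.0], a=100.0, b=2.0, z0=10.0)
```

Clipping the depth at zero keeps the curve defined for elevations below Z0, where a negative base raised to a non-integer exponent would otherwise be undefined.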

General framework of the DA method at a

Altimetric discharges have then to be compared to ISBA-CTRIP discharges.
However, while the virtual stations are irregularly distributed over the
entire basin, the model is defined over a coarse regular mesh grid of
0.5

The CTRIP river network is compared to a realistic river system (produced with Google Earth) so that ISBA-CTRIP cells can be properly associated with a given tributary in the basin.

Then, each virtual station is coupled with the closest ISBA-CTRIP cell along the same tributary. It may be the cell containing the virtual station or an adjacent cell according to the river network.
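The association rule above can be sketched as a nearest-neighbour search restricted to cells labelled with the same tributary. The cell record, the tributary labels and the use of a great-circle distance to cell centres are assumptions made for illustration; in the study, the association also relied on a visual comparison of the networks.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: int
    lat: float  # cell-centre latitude (degrees)
    lon: float  # cell-centre longitude (degrees)
    tributary: str  # hypothetical tributary label attached to the cell


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r_earth = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(h))


def associate_station(vs_lat, vs_lon, vs_tributary, cells):
    """Return the id of the closest cell lying on the same tributary as the VS."""
    candidates = [c for c in cells if c.tributary == vs_tributary]
    if not candidates:
        raise ValueError(f"no model cell on tributary {vs_tributary!r}")
    best = min(candidates, key=lambda c: haversine_km(vs_lat, vs_lon, c.lat, c.lon))
    return best.cell_id


# The Negro cell is geometrically closest, but it is on another tributary
cells = [
    Cell(1, -3.25, -60.25, "Negro"),
    Cell(2, -3.25, -59.75, "Solimões"),
    Cell(3, -3.75, -60.25, "Solimões"),
]
cell_for_vs = associate_station(-3.4, -60.2, "Solimões", cells)
```

Filtering by tributary before measuring distance is what prevents a station from being attached to an adjacent cell of a different sub-catchment.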

At a national or basin scale, water agencies can share discharge time series,
such as the Agencia Nacional de Agua (ANA,

The purpose of the SE DA is to correct model outputs using observations while
taking into consideration uncertainties in both the model and the
observations. In this work, as observed data correspond to discharge
estimates, we chose to correct model output variables such as discharge or
river storage. Indeed, following the results from the ISBA-CTRIP sensitivity
analysis (SA;

The DA technique implemented in the present study is a sequential EnKF

First of all, the term “assimilation window” used hereafter corresponds
to the period during which a complete assimilation cycle is conducted. It is
delineated by two consecutive observation times and will be denoted by
[

The vector

Unlike hydrodynamic models, which directly solve Saint-Venant equations and
for which discharge is a model state variable (or prognostic variable), the
hydrological model ISBA-CTRIP solves differential equations describing the
time evolution of water stock in the river (
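In TRIP-type routing schemes, the outflow of a cell is commonly diagnosed from its storage through a linear reservoir, Q = (v / L) S, with v the flow velocity and L the river length within the cell. Assuming that form (the exact CTRIP equations are not reproduced here), the Python sketch below shows storage as the prognostic variable, discharge diagnosed from it, and the inversion that maps a discharge back to a storage. All function names are hypothetical.

```python
import numpy as np


def discharge_from_storage(storage, velocity, length):
    """Diagnose outflow discharge (m3/s) from river storage (m3).

    Assumes a linear reservoir Q = (v / L) * S, with v the flow velocity (m/s)
    and L the river length within the cell (m).
    """
    return velocity / length * np.asarray(storage, dtype=float)


def storage_from_discharge(discharge, velocity, length):
    """Invert the linear-reservoir relation: S = (L / v) * Q."""
    return length / velocity * np.asarray(discharge, dtype=float)


def step_storage(storage, inflow, velocity, length, dt):
    """One explicit Euler step of dS/dt = Q_in - Q_out for a single cell.

    dt (s) must be small compared with L / v for this scheme to be stable.
    """
    q_out = discharge_from_storage(storage, velocity, length)
    return storage + dt * (inflow - q_out)


# 5e9 m3 stored in a 50 km reach flowing at 1 m/s gives 1e5 m3/s
q = discharge_from_storage(5e9, velocity=1.0, length=5e4)
s_back = storage_from_discharge(q, velocity=1.0, length=5e4)
s_next = step_storage(5e9, inflow=1e5, velocity=1.0, length=5e4, dt=3600.0)
```

When inflow equals the diagnosed outflow, the storage is at equilibrium, which is why `s_next` is unchanged in this example.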

Therefore, three types of variables can be considered as control variables in
the data assimilation scheme: the discharge

The observation operator

A possible solution consists in inverting Eq. (

The computational cost for this option is the same as for the first option
but, now, the observation operator is defined as

The discharge observations are used to correct the surface water stock at the
time prior to the observation time or, in other words, at the initial time of
the integrating window. Therefore, the observation operator is written as the
composition of the model operator

In the framework of the state estimation, the observation variables, at a
given day within the Amazon basin, are the discharge estimates derived from
ENVISAT water surface elevations at the virtual stations associated with an
ISBA-CTRIP cell. The ENVISAT repeat period is 35 days, and therefore a given
virtual station will provide an observation every 35 days at best. During the
data assimilation experiments, all virtual stations will be used
simultaneously. Because of the ENVISAT orbit, the number of available
observations at a given day will vary between 0 and 15, and these
observations will be assimilated daily via the EnKF. Then, the observation
vector

Measurement errors

Moreover, for a given assimilation cycle and also between different cycles,
the observations are considered uncorrelated in space and time. The
observation error covariance matrix at cycle
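The daily assembly of the observation vector, together with a diagonal observation error covariance matrix reflecting the assumption of uncorrelated errors, can be sketched in Python as follows. The data layout, the function name and the error model (a standard deviation proportional to the observed discharge, through a hypothetical `rel_err`) are assumptions for illustration. The selection matrix `h` is only the observation operator when discharge itself is the control variable.

```python
import numpy as np


def daily_observations(day, obs_by_station, station_cell, n_cells, rel_err=0.1):
    """Assemble the observation vector, a selection operator and a diagonal R.

    obs_by_station maps a station id to {day: discharge}; station_cell maps a
    station id to the index of its associated model cell. rel_err is a
    hypothetical relative error used to build the observation error variance.
    """
    observed = [(s, obs[day]) for s, obs in sorted(obs_by_station.items()) if day in obs]
    y = np.array([q for _, q in observed], dtype=float)
    h = np.zeros((len(observed), n_cells))
    for row, (station, _) in enumerate(observed):
        h[row, station_cell[station]] = 1.0  # picks the discharge of that cell
    r = np.diag((rel_err * y) ** 2)  # uncorrelated errors: R is diagonal
    return y, h, r


# Hypothetical stations with a 35-day revisit for "vs_a"
obs = {"vs_a": {0: 1000.0, 35: 1200.0}, "vs_b": {0: 500.0}, "vs_c": {12: 800.0}}
station_cell = {"vs_a": 4, "vs_b": 1, "vs_c": 7}
y, h, r = daily_observations(0, obs, station_cell, n_cells=10)
y_empty, h_empty, r_empty = daily_observations(5, obs, station_cell, n_cells=10)
```

On days without any observation, the arrays are empty and the analysis step would simply be skipped.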

In the EnKF framework, the model and the observation operator need not be linear.
Therefore, the main idea is to use stochastic ensembles to represent the
control variables PDFs along with the error models

Finally, the EnKF analysis step is applied to each member of the ensemble such that

The particularity of the EnKF is that the Kalman gain (
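The analysis step, with a Kalman gain estimated from the ensemble, can be sketched as a standard stochastic (perturbed-observation) EnKF in Python. This is an illustrative formulation, not necessarily the paper's exact implementation. It assumes a linear observation operator given as a matrix `h`, as when discharge itself is the control variable; when the operator involves a model integration, H(x) would be evaluated member by member instead. The optional `loc` argument applies an element-wise mask to the ensemble covariance.

```python
import numpy as np


def enkf_analysis(xf, y, h, r, rng, loc=None):
    """Stochastic EnKF analysis with perturbed observations.

    xf  : (n, N) forecast ensemble (n control variables, N members)
    y   : (m,) observations;  h : (m, n) linear observation operator
    r   : (m, m) observation error covariance
    loc : optional (n, n) localization mask applied to P^f (Schur product)
    """
    n, n_members = xf.shape
    anomalies = xf - xf.mean(axis=1, keepdims=True)
    pf = anomalies @ anomalies.T / (n_members - 1)  # ensemble estimate of P^f
    if loc is not None:
        pf = loc * pf  # element-wise (Schur) product
    s = h @ pf @ h.T + r
    k = np.linalg.solve(s, h @ pf).T  # K = P^f H^T S^{-1} (P^f and S symmetric)
    # One perturbed copy of the observations per member
    noise = rng.multivariate_normal(np.zeros(len(y)), r, size=n_members).T
    y_pert = y[:, None] + noise
    return xf + k @ (y_pert - h @ xf)


rng = np.random.default_rng(0)
xf = rng.normal(10.0, 2.0, size=(3, 200))  # 3 cells, 200 members
y = np.array([15.0])  # one accurate observation of cell 0
h = np.array([[1.0, 0.0, 0.0]])
r = np.array([[0.01]])
xa = enkf_analysis(xf, y, h, r, rng)
xa_diag = enkf_analysis(xf, y, h, r, rng, loc=np.eye(3))
```

With an identity mask (a diagonal P^f), only the observed cell is corrected; without a mask, sampling noise in P^f spreads the correction to unobserved cells.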

In the framework of state estimation, the sampling error can introduce
artificial correlations into the background/analysis error covariance
matrices, and generate spurious correlations between two distant grid cells
in the mesh
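The effect of sampling error can be illustrated numerically. For cells whose true errors are independent, correlations estimated from a finite ensemble are not zero, and their largest magnitude shrinks only slowly as the ensemble grows (roughly as 1/√N). The Python sketch below is a generic demonstration on synthetic Gaussian data, not on ISBA-CTRIP output; the function name is hypothetical.

```python
import numpy as np


def max_spurious_correlation(n_members, n_cells=50, seed=0):
    """Largest |sample correlation| between independent synthetic cells."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_cells, n_members))  # independent cells: true correlation is 0
    c = np.corrcoef(x)  # rows are variables, columns are ensemble members
    off_diagonal = c[~np.eye(n_cells, dtype=bool)]
    return float(np.abs(off_diagonal).max())


small = max_spurious_correlation(20)  # ensemble size typical of costly models
large = max_spurious_correlation(2000)
```

With a few tens of members, some pairs of unrelated cells appear strongly correlated, which is precisely what localization is meant to suppress.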

There exist two types of localization techniques

However, all these localization techniques described above have been
developed for atmospheric modelling where problems are in two or three
dimensions. The use of localization in hydrology is more limited. Several
studies exist to improve subsurface flow modelling

The localization method used with the CTRIP river routing model is of the B-localization type. However, it cannot simply be defined by a two-dimensional radial function. Indeed, river flow follows several one-dimensional flow paths, modelled by the routing network. The localization technique must therefore take the routing network into account to decorrelate cells that are adjacent on the mesh grid but located in two different sub-catchments. Nevertheless, along the same flow path, the correlation between two distinct cells depends on the distance between them. Therefore, for each assimilation cycle, the localization consists of a mask delimiting an influence area for each observation. These influence areas gather a limited number of neighbouring downstream and upstream cells around the observed cell with respect to the river routing network. We chose a fixed localization scale for simplicity and as a first step in assessing the feasibility of a localization method for a hydrological application.
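A network-aware influence area of this kind can be sketched in Python by walking down the flow path and breadth-first up the tributaries of the observed cell. The downstream-pointer representation of the routing network, the separate downstream and upstream sizes, and the choice of a binary (cells × observations) mask, which would multiply P^f H^T element-wise before the gain is computed, are assumptions for illustration. The actual sizes follow the zone-based criteria defined in this section.

```python
from collections import deque

import numpy as np


def influence_area(cell, downstream, n_down, n_up):
    """Cells within n_down steps downstream and n_up steps upstream of `cell`.

    downstream[i] is the index of the cell receiving water from cell i, or -1
    at the outlet. Only cells connected through the routing network are kept,
    so neighbouring cells in another sub-catchment are excluded.
    """
    area = {cell}
    c = cell
    for _ in range(n_down):  # walk down the flow path
        c = downstream[c]
        if c < 0:
            break
        area.add(c)
    upstream = [[] for _ in downstream]  # reverse graph: which cells drain into which
    for i, d in enumerate(downstream):
        if d >= 0:
            upstream[d].append(i)
    queue = deque([(cell, 0)])
    while queue:  # breadth-first search up the tributaries
        c, depth = queue.popleft()
        if depth == n_up:
            continue
        for u in upstream[c]:
            area.add(u)
            queue.append((u, depth + 1))
    return area


def localization_mask(obs_cells, downstream, n_down, n_up):
    """Binary (n_cells, n_obs) mask: 1 inside each observation's influence area."""
    mask = np.zeros((len(downstream), len(obs_cells)))
    for j, cell in enumerate(obs_cells):
        mask[sorted(influence_area(cell, downstream, n_down, n_up)), j] = 1.0
    return mask


# Toy network: 0 -> 1 -> 2 -> 3 (outlet); 4 -> 5 -> 2; 6 -> 5
downstream = [1, 2, 3, -1, 5, 2, 5]
mask = localization_mask([1, 2], downstream, n_down=1, n_up=1)
```

Because neighbours are found through the routing graph rather than through grid distance, two cells that touch on the mesh but drain to different tributaries never share an influence area.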

To determine the number of cells defining the influence area, the basin
subdivision into nine hydro-geomorphological zones is used with a mean flow
velocity for each zone. The influence area, for a given observed cell, is
given by the criteria below. For an influence area of size

The observed cell.

The

all the cells upstream of the observed one covering

The number of cells within the influence area depends on the mean flow
velocities (averaged over a year of simulation) in the zone in which the
considered cell is situated. Those mean velocities are calculated from the
free run simulation, namely the ISBA-CTRIP simulation performed without any
assimilation step. The ISBA-CTRIP resolution is 0.5

the mean velocity for the cells in a given zone is converted into an equivalent distance in km,

the maximal distance within the zone is kept and rounded to the closest higher multiple of 50,

the number
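The three steps above can be sketched in Python as follows. The conversion assumes that the equivalent distance is the one travelled by water during one daily assimilation window and that one 0.5° cell corresponds to about 50 km (the step size implied by rounding to multiples of 50). Both are assumptions, as is the function name.

```python
import math


def influence_cells(mean_velocities_ms, window_s=86400.0, cell_km=50.0):
    """Number of cells on each side of an observed cell, for one zone.

    mean_velocities_ms: annual mean flow velocities (m/s) of the zone's cells.
    Assumes the equivalent distance is the one travelled during a daily
    assimilation window and that one 0.5-degree cell spans about 50 km.
    """
    distances_km = [v * window_s / 1000.0 for v in mean_velocities_ms]
    d_max = max(distances_km)  # keep the maximal distance within the zone
    d_rounded = math.ceil(d_max / cell_km) * cell_km  # next multiple of 50 km
    return int(d_rounded // cell_km)


# 1.2 m/s is about 104 km/day, rounded up to 150 km, i.e. 3 cells
n_fast = influence_cells([0.4, 0.9, 1.2])
n_slow = influence_cells([0.5])  # about 43 km/day, rounded up to 50 km
```

Taking the zone maximum before rounding gives every cell of the zone the same, conservative influence area.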

The final localization mask is presented as a matrix of size

The background error covariance matrices

We used the Amazon basin division into

The

Firstly, a larger perturbation is applied to cells situated on the river mainstream (zones 2 and 3), as we assume that the uncertainties are larger in those zones. Indeed, discharges in these zones are the highest in the entire basin. Besides, several of these cells are confluence cells and are subject to backwater effects. As ISBA-CTRIP does not model backwater effects, the water stock uncertainties assigned to these cells are increased.
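A minimal Python sketch of a zone-dependent perturbation of the initial river storage is given below. It assumes a multiplicative Gaussian perturbation with a larger relative standard deviation in selected zones. The functional form, the function name and the numerical values in the example are hypothetical; the constants actually used are those listed in the table of values used to generate the background control ensemble.

```python
import numpy as np


def perturb_storage(storage, zone, n_members, sigma_by_zone, sigma_default, rng):
    """Generate an ensemble of river storages by multiplicative perturbation.

    storage : (n_cells,) initial storage; zone : (n_cells,) zone index (1-9).
    sigma_by_zone maps a zone to its relative standard deviation (larger for
    the mainstream zones); other zones use sigma_default. Values are hypothetical.
    """
    sigma = np.array([sigma_by_zone.get(int(z), sigma_default) for z in zone])
    noise = rng.normal(0.0, 1.0, size=(len(storage), n_members))
    ens = storage[:, None] * (1.0 + sigma[:, None] * noise)
    return np.clip(ens, 0.0, None)  # storage cannot be negative


rng = np.random.default_rng(1)
storage = np.full(4, 1e9)
zone = np.array([1, 2, 3, 5])  # cells 1 and 2 lie on the mainstream zones
ens = perturb_storage(storage, zone, 500, {2: 0.3, 3: 0.3}, 0.1, rng)
spread = ens.std(axis=1) / 1e9  # relative ensemble spread per cell
```

A multiplicative form keeps the spread proportional to the storage itself, so large mainstream reservoirs receive proportionally large perturbations.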

Secondly, at the first assimilation cycle, the initial condition before
perturbation

Constant values used to generate the background control ensemble

Size of the influence area for the localization process.

Presentation of the different state estimation experiments. The “SE” acronym stands for “state estimation”, indexes “1”, “2” or “3” are to differentiate the control variables (“1”: initial river storage, “2”: final river storage and “3”: discharge) and the suffixes “direct”, “diag” and “local” indicate the localization scheme (“direct”: without localization, “diag”: diagonal error covariance matrices and “local”: with localization).

During the assimilation experiment, it is necessary to quantify the
assimilation performances. The quality of the assimilation will be evaluated
in a given cell

Based on this definition, the assimilation performance will be estimated at
each cell with the normalized RMSE (RMSEn) defined by

Also, to evaluate the global performance of the assimilation over the entire
basin, a global RMSEn (RMSEn
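As a sketch of the scores above, the Python functions below compute a per-cell normalized RMSE and a basin-wide value. Normalizing by the mean observed discharge and defining the global score as the mean of per-cell values are assumptions, since the exact definitions are not reproduced here. Dates without observations are encoded as NaN and skipped.

```python
import numpy as np


def rmse_n(sim, obs):
    """Normalized RMSE at one cell, ignoring dates without observations (NaN).

    Normalization by the mean observed discharge is assumed here.
    """
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    ok = ~np.isnan(obs)
    rmse = np.sqrt(np.mean((sim[ok] - obs[ok]) ** 2))
    return rmse / np.mean(obs[ok])


def global_rmse_n(sims, obss):
    """Basin-wide score, assumed here to be the mean of the per-cell RMSEn."""
    return float(np.mean([rmse_n(s, o) for s, o in zip(sims, obss)]))


cell_a = rmse_n([100.0, 200.0, 300.0], [110.0, np.nan, 290.0])
basin = global_rmse_n([[100.0, 200.0, 300.0], [50.0, 50.0]],
                      [[110.0, np.nan, 290.0], [40.0, 60.0]])
```

Normalizing makes scores comparable between small tributaries and the mainstream, whose discharges differ by orders of magnitude.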

Besides, the analysis run is available as an ensemble. The statistics will
then be estimated for each member of the ensemble and the mean (see
Eq.

The state estimation experiments have the objective of testing the different
control variables described in Sect.

For all the DA experiments, the observation errors are those described in
Sect.

In the SE1-direct experiment, ENVISAT discharges are assimilated to correct
the initial surface reservoir storage in TRIP (and inherently TRIP simulated
discharges). For this first experiment, a classical EnKF, without any
localization treatment of the error covariance matrices

The current section briefly presents the model performance without
assimilation, referred to as the free run. As all in situ stations and ENVISAT VSs have been
associated with a unique ISBA-CTRIP cell, it is possible to compare observed
discharge at these stations to corresponding ISBA-CTRIP simulated discharge.
To begin with, a sample of 12 in situ stations, spread over the entire basin
(over the mainstream and the main tributaries), is selected. The location and
the name of these stations are represented in
Fig.

Over the majority of cells where there is both an in situ station and a virtual
station, the two discharge time series are similar (but not identical; see
Fig.

A strong difference between the in situ and ENVISAT discharges could
indicate either that the rating curve parameters were not correctly estimated
or that in situ/ENVISAT/MGB-IPH discharges have large errors. As an example,
see Fig.

Finally, in most cases, the free run discharge differs markedly
from the observed discharge. At downstream mainstream stations (at Manacapuru
and Óbidos in Fig.

Map of the 12 in situ stations used to evaluate assimilation performance: (1) São Paulo de Olivenca (Solimões), (2) Manacapuru (Solimões), (3) Óbidos (Amazonas), (4) Ipiranga (Putumayo/Icá), (5) Serrinha (Negro), (6) Uaicás (Branco), (7) Porto Seguro (Jutaí), (8) Santos Dumont (Juruá), (9) Lábrea (Purus), (10) Manicoré (Madeira), (11) Itaituba (Tapajós), and (12) Boa Sorte (Xingu).

Comparison between the ISBA-CTRIP free run (blue line),
ENVISAT-derived observed discharges (green markers) and ANA gauge discharges
(black dots) over the year 2009. For each panel, the

Then, Fig.

RMSEn for the free run simulation compared to the ENVISAT
discharges

Analysis RMSEn for the SE1-direct experiment with respect to

The first series of experiments assimilates ENVISAT discharges to correct the
ISBA-CTRIP initial river stock (see the first three rows in
Table

Global statistics for experiments with different localization schemes.

Figure

Figure

SE1-direct ensemble mean analysis discharge (red line) compared to
the free run discharge (blue line), the ENVISAT observed discharges (green
markers) and the measured gauge discharges (black dots) over the year 2009.
For each panel, the

Nevertheless, the ensemble mean analysis discharge
at all displayed stations exhibits erratic behaviour, with numerous local
minima and maxima. We can assume that this behaviour is present for all CTRIP
cells in the basin. Moreover, for a given cell, most of these sudden
variations are asynchronous with ENVISAT observation dates for this cell. For
example, at Serrinha in the left panel in
Fig.

These abrupt variations are completely artificial and directly result from
the assimilation processing. Indeed, for days with unrealistic
peaks/off-peaks, there are multiple ENVISAT observations available on the
basin, which impact many cells all over the basin, even if they are located
on other sub-catchments or tributaries. This is due to the construction of
the error covariance matrices

In the SE1-diag experiment, the error covariance matrices are forced to be
diagonal. The objective of such processing on the error covariance matrices
is to limit the impact of a given observation only to the observed cell.
According to Table

SE1-local uses the localization treatment presented in
Sect.

Analysis RMSEn difference between SE1-direct and the SE1-local
experiment with respect to

Then, Fig.

The localization mask has been built to avoid the effect of spurious correlations between distant cells or ones situated on different sub-basins. The current localization scheme meets this constraint. Indeed, results for the SE1-local experiment are globally improved compared to the previous experiments.

Nevertheless, along the mainstream, the initial experiment without localization gives better results. This can be explained by the fact that discharge along the mainstream integrates hydrological processes from all the upstream basins. So when, in the SE1-local experiment, we limit the impact of each observation to nearby cells only, we remove part of the information that distant cells bring to mainstream cells.

Local

Therefore, the current localization mask should be improved. The main difficulty here is to determine the size of the influence area for each observation. Currently, this size is predetermined and constant in time, based on the annual mean flow velocity. A potential development is to let the size of the influence area vary in time according to the hydrological season (high-flow/low-flow season). For example, during the high-flow season, the flow velocity is higher, and so would be the size of the influence area. Thus, the error covariance matrices would depend on the river's temporal and spatial dynamics (as if they were defined from a well-sampled and significant ensemble).

In the second series of experiments, all experiments use the localization scheme
(see Sect.

Global statistics for experiments with different types of control variables.

SE1-local ensemble mean analysis discharge (red line) compared to
the free run discharge (blue line), the ENVISAT observed discharges (green
markers) and the measured gauge discharges (black dots) over the year 2009.
For each panel, the

Analysis RMSEn differences between SE1-local and SE2-local
experiments with respect to

Analysis RMSEn differences between SE1-local and SE3-local
experiments with respect to

Table

Figures

From the different approaches tested in this paper, it appears that there is
no single configuration that gives the best results for all rivers,
when compared to both ENVISAT and gauge discharges. Rather, the most
effective configuration depends on the size and location of the rivers. Along
the river mainstream (the Solimões and the Amazon in
Fig.

Nevertheless, among all experiments (see Table

Statistics between analysis and in situ stations for the different assimilation experiments. Italic values indicate the best result among the three experiments for each tested gauge.

Figure

SE3-local ensemble mean analysis discharge (red line) compared to
the free run discharge (blue line), the ENVISAT observed discharges (green
markers) and the measured gauge discharges (black dots) over 8 years
starting on 25 September 2002. In each panel, the

However, despite the use of localization, the analysis discharge still
exhibits rather erratic behaviour, particularly at São Paulo de
Olivenca (Fig.

This study presents, over the Amazon basin, the assimilation of a satellite-derived discharge product into a large-scale hydrological model to correct its state variables. The remotely sensed discharge data are derived from the ENVISAT nadir altimeter and are assimilated into the ISBA-CTRIP model using an ensemble Kalman filter. Five experiments were carried out over the year 2009. For all experiments, the assimilations were able to reduce the modelling errors compared to both observed and gauge discharges.

The first experiments tested different definitions of the background error covariance matrices, in which the influence of a given observation is either restricted to the observed cell only (SE1-diag), limited to a few nearby cells on the hydrological network (SE1-local), or not limited at all and potentially extended to the entire basin (SE1-direct). Results showed that the complete stochastic matrices gave the best results along the mainstream, while the localization treatment appeared necessary along the tributaries. The need for localization is explained by the spurious elements in the error covariance matrices due to the limited ensemble size and the methodology used to generate the ensemble.

The last tests compared the correction of different state variables: the river initial storage (SE1-local), the river final storage (SE2-local), or the river discharge (SE3-local). The main difficulty with these different types of variables lies, on the one hand, in the relationship between the control and the observed variables (gathered in the observation operator) and, on the other hand, in the reciprocal relationship needed to generate inputs for the next DA cycle. Results showed that correcting river discharge gives the best global results over the entire basin, as the link between the observed and corrected variables is the most straightforward. Therefore, the final experiment (SE3-local-long) uses the SE3-local configuration over the whole ENVISAT observation period (from September 2002 to October 2010) and confirms the possibility of using such low-resolution remotely sensed data in a large-scale model.

These experiments offer several perspectives. First, the localization
treatment could be improved by combining the three tested approaches
according to the cell's position on the river: discharge correction for cells
along the mainstream should be impacted by all upriver observations, while
correction for cells on tributaries should be impacted only by close
observations along the same sub-catchment. Moreover, the size of the area of
influence for a given observation could also vary in time according to the
season (high flow/low flow). With further developments of the localization
method, new challenges may appear, such as the risk of imbalance, already
studied in the field of atmospheric DA

A main limitation of assimilating ENVISAT data is their long repeat period
(one observation every 35 days, at best). Indeed, corrected discharges often
present strong sudden variations between unobserved and observed dates, as
the model goes back to its free run when it is not constrained by an
observation. However, there are other satellite altimetry missions with
different repeat periods, for example JASON-2 (10-day repeat period from
June 2008 to October 2016), JASON-3 (10-day repeat period, launched in
January 2016), Sentinel-3A (27-day repeat period, launched in February 2016)
or Sentinel-3B (27-day repeat period, which should be launched in 2018).
Also, the incoming SWOT (Surface Water and Ocean Topography, launch scheduled
for 2021) wide-swath altimetry mission will provide a remotely sensed
discharge product. SWOT will have a 21-day repeat period, with an almost
global spatial coverage thanks to its two 50 km swaths. All these data could
be combined with ENVISAT data (during the overlapping period) within the
assimilation scheme to obtain a denser observation network over the study
domain, to get a better estimate of discharge (similar to a reanalysis) over
a multi-decadal time frame

To improve these DA results, several aspects could be investigated. For
example, one could study whether a more realistic ensemble generation method
could be helpful. In the present study, only the model initial condition and
the precipitation forcing are perturbed to generate the background forecast
ensemble. More uncertainties in this ensemble could be added by also
perturbing CTRIP parameters and/or ISBA outputs. Another DA aspect to look
into is the potential use of a smoothing data assimilation algorithm, such as
the ensemble Kalman smoother

The CTRIP code is open source and is available as a
part of the surface modelling platform
called SURFEX, which can be downloaded at

This Appendix provides more details and the approximation used to derive
Eq. (

The background error covariance matrices

The ensemble of perturbed precipitation
fields

The precipitation field

The precipitation relative error

The fields

For the time correlation, the parameter
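The temporal correlation of the precipitation perturbations can be sketched with a first-order autoregressive (AR(1)) process applied to multiplicative noise. This Python sketch rests on several assumptions: the multiplicative form, the clipping at zero, the parameter names and the omission of spatial correlation are choices made here, not a reproduction of the perturbation scheme described in this Appendix.

```python
import numpy as np


def perturb_precipitation(precip, rel_err, alpha, n_members, rng):
    """Multiplicative precipitation perturbations with AR(1) time correlation.

    precip : (n_times, n_cells) precipitation; rel_err : relative error std;
    alpha : lag-one time correlation (0 <= alpha < 1). Spatial correlation of
    the noise fields is ignored in this sketch.
    """
    n_times, n_cells = precip.shape
    out = np.empty((n_members, n_times, n_cells))
    e = rng.normal(size=(n_members, n_cells))  # unit-variance noise at t = 0
    for t in range(n_times):
        if t > 0:  # AR(1) update that keeps the noise variance equal to 1
            w = rng.normal(size=(n_members, n_cells))
            e = alpha * e + np.sqrt(1.0 - alpha ** 2) * w
        out[:, t, :] = precip[t] * np.clip(1.0 + rel_err * e, 0.0, None)
    return out


rng = np.random.default_rng(2)
precip = np.full((200, 3), 5.0)  # constant 5 mm/day over 3 cells
p = perturb_precipitation(precip, rel_err=0.2, alpha=0.9, n_members=100, rng=rng)
```

Scaling the innovation by sqrt(1 − alpha²) keeps the perturbation variance constant in time, so `alpha` only controls how persistent the errors are.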

Constant values used to perturb the precipitation fields.

The authors declare that they have no conflict of interest.

This work has been performed using HPC resources from CALMIP (grant 2016-P1408). The GSWP3 team is acknowledged for letting the authors use their different forcing fields. This work was supported by the CNES, through a grant from the Terre-Océan-Surfaces Continentales-Atmosphère (TOSCA) committee affiliated with the project entitled “Towards an improved understanding of the global hydrological cycle using SWOT measurements”. The European Space Agency (ESA) is also thanked for providing the scientific community with observations from the RA2 altimeter on board ENVISAT. Charlotte Marie Emery was supported by a CNES/région Midi-Pyrénées PhD grant. This work was done as a private venture and not in the author's capacity as an employee of the Jet Propulsion Laboratory, California Institute of Technology. Edited by: Albrecht Weerts Reviewed by: Rolf Hut and one anonymous referee