DAHITI – an innovative approach for estimating water level time series over inland waters using multi-mission satellite altimetry

Satellite altimetry has been designed for sea level monitoring over open ocean areas. However, for some years, this technology has also been used to retrieve water levels from reservoirs, wetlands and in general any inland water body, although the radar altimetry technique has been especially applied to rivers and lakes. In this paper, a new approach for the estimation of inland water level time series is described. It is used for the computation of time series of rivers and lakes available through the web service “Database for Hydrological Time Series over Inland Waters” (DAHITI). The new method is based on an extended outlier rejection and a Kalman filter approach incorporating cross-calibrated multi-mission altimeter data from Envisat, ERS-2, Jason-1, Jason-2, TOPEX/Poseidon, and SARAL/AltiKa, including their uncertainties. The paper presents water level time series for a variety of lakes and rivers in North and South America featuring different characteristics such as shape, lake extent, river width, and data coverage. A comprehensive validation is performed by comparisons with in situ gauge data and results from external inland altimeter databases. The new approach yields rms differences with respect to in situ data between 4 and 36 cm for lakes and 8 and 114 cm for rivers. For most study cases, more accurate height information than from other available altimeter databases can be achieved.


Introduction
Since the 1990s, monitoring and modelling the water cycle of the Earth system have become a very important task (Stakhiv and Stewart, 2010). In particular, the knowledge of regional changes of water storage in rivers and lakes is fundamental for the risk assessment of natural disasters such as the droughts and floods which have been increasing over the last few decades (Guha-Sapir and Vos, 2011). Despite the growing importance of measurements, the number of in situ stations monitoring river discharge is globally declining. The number of river discharge time series provided by the Global Runoff Data Centre (GRDC) decreased from about 7300 to 1000 stations between 1978 and 2013 (Global Runoff Data Centre, 2013). In order to make a statement about the development of water level gauging stations, an equivalent database such as the GRDC is required. In general, in situ water level data are managed by federal institutions which make data access very difficult. Because of the restricted data access and lack of in situ data for rivers and lakes, there is a strong need for using satellite altimetry to monitor both types of inland water bodies. However, many remote-sensing satellites have been launched in the last few years measuring parameters relevant for the investigation of the water cycle, e.g. precipitation, water level, and gravity.
Among these remote-sensing techniques is satellite altimetry. Besides its main design goal of measuring water levels in the ocean, satellite altimetry can also be used for deriving water levels of inland water bodies, i.e. lakes, reservoirs, rivers, and wetlands (e.g. Birkett, 1995; Crétaux and Birkett, 2006; Crétaux et al., 2011). The advantage of satellite altimetry is its global availability, which allows for estimation of water level time series even in remote areas without local infrastructure. Satellite altimetry can provide water level time series longer than 2 decades.
Published by Copernicus Publications on behalf of the European Geosciences Union.
C. Schwatke et al.: DAHITI – an innovative approach for estimating water level time series
However, because its measurement geometry provides observations only along specific ground tracks, whether a water body is crossed is a matter of chance. Therefore, big water bodies have a higher probability of being crossed than smaller ones. In addition, because of a repeat orbit configuration, the temporal resolution is limited to 35 (ERS-2, Envisat, SARAL/AltiKa) or 10 (TOPEX/Poseidon, Jason-1, Jason-2) days when only single altimeter missions are used. Thus, the combination of different altimeter systems plays a key role in increasing the temporal and spatial resolution as well as the length of the time series.
Satellite altimetry has to cope with different problems over inland water, which are mainly caused by the large pulse-limited footprint of radar altimeters. For altimeter missions using Ku-band such as Envisat, the resulting footprint varies between 2 km over the ocean and up to 16 km over the land (Chelton et al., 2001). Even for SARAL/AltiKa, measuring in Ka-band, the footprint size is still about 8 km (Schwatke et al., 2015).
The major challenge of inland altimetry is the handling of different reflections within the large footprint (water, land, etc.). The shapes of altimeter waveforms vary depending on the different surface reflections. Waveforms reflected from open ocean or large lakes show typical Brown-like shapes (Brown, 1977). In contrast, quasi-specular waveforms defined by one single peak occur mainly over smaller rivers. Both waveform groups are not influenced by land. However, near lake shores and over the remaining inland areas, the land contamination of the radar echo leads to more than one reflection and results in degraded range quality or even in unusable data sets. The problem of non-ocean-like waveform shapes such as quasi-specular shapes over inland waters has to be considered when retracking waveforms. The affected waveforms do not have typical Brown-like shapes and cannot be retracked by using ocean waveform retrackers (MLE (Challenor and Srokosz, 1989), NASA β (Martin et al., 1983), etc.). Therefore, additional retracking can be applied with retracking algorithms such as OCOG (Wingham et al., 1986), improved threshold (Hwang et al., 2006), etc., which are more robust with respect to the geometry of the waveforms and can achieve reliable heights. The choice of retracker depends on the quality of existing altimeter measurements, which varies between investigated inland water bodies because of their extent, shape or ambient topography. The selection of an unsuitable retracking algorithm can also lead to the so-called "hooking" or "off-nadir" effect. This effect arises from off-nadir radar returns when the satellite is still/already over land but receives the main reflection from the off-nadir water areas. This leads to longer ranges visible in a parabolic shape of the resulting height sequence. This effect can be corrected by fitting curves to the resulting water levels (da Silva et al., 2010; Maillard et al., 2015). For each land-water transition a parabola can be fitted to the measurements that can be used to correct the off-nadir effect. In this paper, the off-nadir data are discarded since for all targets enough reliable nadir measurements are available.
The potential of satellite altimetry for the estimation of water level time series and for understanding the terrestrial water cycle was shown by Birkett (1995), Crétaux and Birkett (2006) and Crétaux et al. (2011). In most studies, only single satellite tracks were used for the computation of water level time series. The most popular study areas were the Great Lakes (Ponchaut and Cazenave (1998) used TOPEX/Poseidon) and the Amazon Basin. For the latter, investigations were based on different missions: e.g. TOPEX/Poseidon (de Oliveira Campos et al., 2001; Zakharova et al., 2006), TOPEX/Jason-1/Jason-2 (Seyler et al., 2013) and ERS-2/Envisat (da Silva et al., 2010). In addition to these individual investigations, four global databases have been developed that provide water level time series over inland waters to the international community. The different processing strategies of these four databases are described as follows.
The Hydroweb database was developed by the Laboratoire d'Etudes en Géophysique et Océanographie Spatiales (LEGOS). For the estimation of water level time series over lakes and rivers, a multi-mission approach using satellite altimeter data of TOPEX/Poseidon, ERS-1, ERS-2, Envisat, Jason-1, and GFO is applied. The physical heights are estimated in a track-wise manner and are corrected by the slope of the geoid or mean lake level and by range biases with respect to TOPEX/Poseidon. The final time series are computed by merging the altimeter data on a monthly basis. The applied approaches are published in Crétaux et al. (2011) and da Silva et al. (2010).
The River and Lake database was developed by the European Space Agency and De Montfort University (ESA-DMU). It provides track-wise time series derived from Jason-2 and Envisat for a variety of inland waters. For each track crossing the water body of interest a single time series is processed. The methodology for the estimation uses an expert system which is based on neural networks (Berry et al., 1997).
The Global Reservoir and Lake Monitor (GRLM) is maintained by the Foreign Agricultural Service of the United States Department of Agriculture (USDA). Water level time series of lakes and reservoirs are estimated by using a segment of one single altimeter track over the investigated target. The time series are composed of data from consecutive altimeter missions measured along the same ground track. A combination of contemporaneous missions is not performed. The method for the estimation of water level time series is described in Birkett et al. (2011).
The Database for Hydrological Time Series over Inland Waters (DAHITI) was launched by the Deutsches Geodätisches Forschungsinstitut (DGFI, now DGFI-TUM) in 2013. Currently, DAHITI provides about 250 time series of rivers, lakes, reservoirs, and wetlands. The methodology for the estimation of water level time series in DAHITI is based on an extended outlier rejection and a Kalman filter approach described in detail in the article at hand.
In contrast to the methods already published in the literature, our approach is based on a rigorous combination of a variety of altimeter missions. In addition, extended outlier detection is applied and optional waveform retracking is implemented. Moreover, the processing contains a full error propagation and provides accuracies for each height measurement. Furthermore, correlations between altimeter measurements are considered in order to achieve more reliable errors for each water level height. The current paper provides detailed information on the estimation of water level time series and performs a comprehensive validation by comparing the results with in situ gauging data and time series from other databases (Hydroweb, River and Lake, and GRLM).
The article is structured as follows: in Sect. 2 the altimeter data that serve as input for the processing are described. In Sect. 3 the methodology for the estimation of water level time series from satellite altimeter data using a Kalman filter approach is explained. Section 4 starts with the introduction of the validation areas and data before the resulting water level time series and validation results are presented. The paper concludes with a summary of the results and outlook.

Altimeter data and height estimation
In this paper, altimeter measurements from TOPEX/Poseidon, Jason-1, Jason-2, ERS-2, Envisat, and SARAL/AltiKa are used depending on the data coverage for the inland water bodies under investigation. In principle, data from Geosat, ERS-1, HY-2A, ICESat, and CryoSat-2 can be used. However, these missions are neglected in the current investigations for a number of reasons, i.e. lack of data over land, non/long-repeat cycle, bad data quality, or missing waveform information. The applied missions can be separated into two groups according to their orbit characteristics. TOPEX/Poseidon was launched in 1992 into an orbit with a repeat cycle of 9.9156 days and a track separation at the Equator of about 300 km. The mission was followed by its successors, Jason-1 and Jason-2. These three altimeter satellites can be used for estimating continuous time series over more than 2 decades. The second group starts with ERS-2 (launched in 1995), followed by Envisat and SARAL/AltiKa. The orbit of these missions is defined by a repeat cycle of 35 days and a track separation of about 80 km at the Equator. The data are available for almost 2 decades with a data gap between October 2010 (end of Envisat core mission) and March 2013 (launch of SARAL/AltiKa). The data for Envisat on its drifting orbit (October 2010-April 2012) are not used. ERS-1 is not yet ready for use in DAHITI but will be integrated in the near future. This will enable extensions of the time series back to 1991.
For the estimation of water levels, Sensor Geophysical Data Records (SGDR) altimeter products are used which provide high-frequency ranges as well as altimeter waveforms. The altimeter waveforms allow individual retracking in order to achieve more reliable altimeter ranges, especially for smaller inland water bodies. Table 1 shows a list of the altimeter missions used and provides information about the product, cycle length, frequency, along-track distance between altimeter measurements on the ground, time period, and mean range bias with respect to TOPEX/Poseidon.
Depending on the investigated inland water body, the original ocean ranges in the SGDR are very often corrupted. Especially over small lakes and rivers the altimeter waveforms do not exhibit the typical ocean-like shapes but quasi-specular shapes. Land-contaminated altimeter waveforms are usually more peaky and noisy, leading to flat-patched and complex waveforms (Berry et al., 2005). The quality of the ranges can be improved by retracking these waveforms. In this study, the "improved threshold retracker" (Hwang et al., 2006) with a threshold of 10 % is applied if additional retracking is necessary. In general, all altimeter measurements of smaller lakes and rivers are retracked if the ocean product does not lead to reliable time series because of the influence of land. Testing different thresholds for the retracking of altimeter measurements showed that a threshold of 10 % gives slightly better results for smaller lakes and rivers. In our implementation of the improved threshold retracker the first sub-waveform is always chosen. We do not use a reference height for choosing the sub-waveform, such as the last range over ocean as described in Hwang et al. (2006), since this is difficult in the case of small lakes and rivers. This algorithm is very robust and delivers ranges for all surface types which are more reliable than the original ranges over small inland waters. However, over open water (i.e. larger lakes) the resulting ranges are less precise than ranges derived from retracking algorithms for ocean applications. It is known that switching retracking algorithms along a single satellite track leads to height offsets (Crétaux et al., 2009). To avoid those offsets, all altimeter measurements of an investigated inland water body are retracked with the same algorithm.
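To illustrate the threshold principle behind the retracker, the following minimal Python sketch locates the leading edge of a waveform at a 10 % threshold. It is a plain threshold retracker applied to the full waveform, not the improved threshold retracker of Hwang et al. (2006), which additionally identifies sub-waveforms. The function name, the number of noise gates, and the example waveform are assumptions made for illustration only.

```python
def threshold_retrack(waveform, threshold=0.10, noise_gates=5):
    """Return the fractional gate index where the leading edge crosses the threshold level.

    Simplified sketch: the noise floor is the mean of the first `noise_gates`
    bins and the amplitude is the waveform maximum (both are assumptions,
    not the DAHITI implementation).
    """
    noise = sum(waveform[:noise_gates]) / noise_gates
    amplitude = max(waveform)
    level = noise + threshold * (amplitude - noise)
    for i in range(1, len(waveform)):
        if waveform[i] >= level > waveform[i - 1]:
            # Linear interpolation between the two gates around the crossing.
            return (i - 1) + (level - waveform[i - 1]) / (waveform[i] - waveform[i - 1])
    return None  # no leading edge found


# Hypothetical quasi-specular waveform (power per range gate).
wf = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 21.0, 101.0, 21.0, 1.0]
gate = threshold_retrack(wf)
```

The offset between the retracked gate and the nominal tracking gate, multiplied by the range width of one gate, would then give the correction to the on-board range; both quantities are mission-specific and are not part of this sketch.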
In order to convert the range measurements (original or retracked) to water levels serving as input for our Kalman filter approach, numerous preprocessing steps are necessary.
Equation (1) summarizes the height computation from altimeter products (orbit height $h_{sat}$ and (retracked) altimeter range $r_{alt}$). These processing steps have to be performed for each individual altimeter measurement. The derived normal heights $h_{normal}$ serve as input for the DAHITI approach, described in Sect. 3.
First, the range has to be corrected for geophysical effects. For this purpose, the models and corrections given in Table 2 are applied. It is important to apply identical geophysical corrections for all missions and over the whole time period in order to avoid inconsistencies in the resulting multi-mission time series. To correct the wet ($\Delta h_{wet}$) and dry ($\Delta h_{dry}$) tropospheric delay, products of ECMWF for Vienna Mapping Function 1 (VMF1) (Boehm et al., 2009) are used. The ionospheric delay $\Delta h_{iono}$ is corrected by using the NOAA Ionospheric Climatology 2009 (NIC09) (Scharroo and Smith, 2010) model. The solid Earth tide and pole tide corrections ($\Delta h_{etide}$, $\Delta h_{ptide}$) are applied according to the IERS Conventions 2003 (McCarthy and Petit, 2004). Finally, each single altimeter measurement is corrected for its radial error $\Delta h_{rad}$ in order to account for inter-mission biases. Radial errors are derived from a global multi-mission crossover analysis as described by Bosch et al. (2014). They are computed with the ocean products. Radial errors were interpolated over land to provide range bias corrections for each altimeter measurement over land. This approach works quite well as long as the ocean product is used for the computation of inland water levels. However, as soon as retracking is involved, additional retracker offsets will occur. In order to minimize the relative offsets between different altimeter tracks, we use the same retracker for all measurements over one target. That minimizes the inter-mission biases, which are shown later for selected results in Sect. 4.3, and allows us to use different altimeter missions as a single virtual altimeter system. The average values of the applied range errors are given in Table 1 for each altimeter mission. All data used in this study (the altimeter data as well as all corrections) are extracted from the Open Altimeter Database (OpenADB), the open altimeter database of DGFI-TUM. More information on OpenADB is given in Sect. 3.1. The quality of extracted geophysical corrections is checked, and altimeter measurements are rejected if they do not comply with the valid ranges given in the mission handbooks.
For the computation of water level time series within the Kalman filter approach, normal heights $h_{normal}$ are used as input data, whereas altimetry provides ellipsoidal heights. However, ellipsoidal heights are purely geometrical and do not allow us to predict where the water will flow. We compute normal heights by subtracting a (quasi-)geoid model ($N$) from the ellipsoidal heights. For this purpose, the EIGEN-6c3stat (Förste et al., 2012) model is used, which supplements the EGM2008 geoid model with additional GOCE gravity data.
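The height computation described in this section can be sketched in a few lines of Python. The sketch assumes the common convention that all corrections are added to the measured range before it is subtracted from the orbit height; the sign convention, the function name, and all numerical values are illustrative assumptions and are not taken from the DAHITI processing.

```python
def normal_height(h_sat, r_alt, corrections, geoid_n):
    """Normal height in metres: h_sat - (r_alt + sum of corrections) - N.

    Assumed convention: every entry in `corrections` is added to the range.
    """
    corrected_range = r_alt + sum(corrections.values())
    return h_sat - corrected_range - geoid_n


# Hypothetical values for a single high-frequency measurement (metres).
corrections = {
    "wet": -0.15,    # wet tropospheric delay
    "dry": -2.30,    # dry tropospheric delay
    "iono": -0.05,   # ionospheric delay
    "etide": 0.10,   # solid Earth tide
    "ptide": 0.01,   # pole tide
    "rad": 0.02,     # radial error / inter-mission bias
}
h = normal_height(h_sat=800000.0, r_alt=799820.0, corrections=corrections, geoid_n=8.5)
```

With these made-up numbers the corrections sum to -2.37 m and the resulting height is about 173.87 m.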

DAHITI approach
In order to use altimeter measurements from different tracks and missions a consistent and reliable combination strategy is important. The irregularly spaced observations from different locations must be merged into one time series per target, and the optimal combination of measurements with different uncertainties must be ensured. This requirement is fulfilled by our DAHITI approach, which is based on an extended outlier rejection and a Kalman filter for the estimation of water level time series.
The processing strategy for the estimation of water level time series over inland waters using the DAHITI approach is separated into three steps: preprocessing, Kalman filtering and postprocessing (cf. Fig. 1). The preprocessing step includes all necessary tasks for the preparation of the input altimeter heights such as waveform retracking, applying range corrections, calculation of height errors, and rejection of outliers.
In the Kalman filtering step, the computation of the water levels of the investigated water body is performed. In this paper, we apply Kalman filtering in a single location centred on the investigated water body and obtain one computed water level for each epoch. However, there is also an option for performing Kalman filtering on a grid which can be used for investigation of the surface variability of larger lakes.
In the postprocessing step, all water levels from the previous step are merged to form a single water level time series referring to one reference location if the Kalman filtering was performed on a grid. Subsequent outlier detection can be conducted if necessary. The final time series is stored in DAHITI, accessible via the website.

Preprocessing
OpenADB holds satellite altimeter data and derived high-level products. OpenADB provides satellite altimeter data, geophysical corrections, models, etc., which are also accessible via the website. The data sets from OpenADB used for this study, and the methodology used to derive individual water levels are described in Sect. 2.
In addition to the normal heights of the water levels the Kalman filter requires information on the quality of each measurement. This information is used for the weighting of the individual data sets as well as for the error estimation of water level products. Because of the lack of absolute accuracy, the precision of the heights is computed by analysing the along-track scatter of the measurements.
For this purpose, an "absolute deviation around the median" (ADM) is estimated by using a sliding box along the altimeter track. The size of the sliding box varies for large lakes (±3.5 km), small lakes/large rivers (±1.5 km) and smaller rivers (±0.5 km). The definition of the sliding box in kilometres instead of number of points allows consistent handling of missions with different data rates (10, 20, or 40 Hz) and ensures correct inter-mission weighting. The ADM is calculated by estimating a median of the water heights within the box. Then the median height is subtracted from the current water height and the absolute value of the difference is used as the "error" of the altimeter measurement. Compared with estimated standard deviations, the ADM method is more robust against corrupted water heights and topography near shores and leads to more reliable errors as long as more than half of the altimeter measurements are over water.
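The ADM computation can be illustrated with a short Python sketch. The function name and the sample values are hypothetical; the box half-width of 1.5 km corresponds to the setting quoted above for small lakes and large rivers.

```python
from statistics import median


def adm_errors(along_track_km, heights, half_width_km):
    """Absolute deviation around the median within a sliding along-track box."""
    errors = []
    for x0, h0 in zip(along_track_km, heights):
        # All heights whose along-track position lies inside the box around x0.
        box = [h for x, h in zip(along_track_km, heights) if abs(x - x0) <= half_width_km]
        errors.append(abs(h0 - median(box)))
    return errors


# Hypothetical high-frequency samples; the last point is contaminated by land.
x = [0.0, 0.35, 0.70, 1.05, 1.40, 1.75, 2.10]            # along-track distance (km)
h = [174.02, 174.00, 174.01, 173.99, 174.00, 174.03, 175.20]  # heights (m)
err = adm_errors(x, h, half_width_km=1.5)
```

In this example the contaminated last point receives an error of about 1.19 m, while all other points stay below 5 cm, because the median of each box is still dominated by valid water heights.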
Before Kalman filtering is performed, various user-defined outlier rejections can be applied. Inaccurate water levels must be rejected before Kalman filtering; precise ones are used for the estimation of the resulting water levels. The following outlier criteria can be applied in the preprocessing step:
- latitude thresholds,
- water height thresholds,
- height error threshold,
- backscatter coefficient (rejection of measurements affected by ice),
- along-track outlier detection with support vector regression (SVR).
It is important to note that the criteria for the outlier detection are very flexible and the optimal configuration strongly depends on the investigated water body. As a consequence, the parameters for outlier rejection vary with the study areas. First, three outlier criteria (latitude thresholds, water height thresholds and height error threshold) are applied.
The backscatter coefficients of altimeter measurements provide information about the reflectance of the surface.This information can be used to reject altimeter measurements affected by ice.
Moreover, outlier detection with support vector regression (SVR) (Smola and Schölkopf, 2004) is implemented. This method applies linear regression to each altimeter track to reject altimeter measurements that do not represent the flat water level of the inland water target. SVR is similar to common regression but is more flexible and robust. SVR is an advancement of the support vector machine (SVM) (Boser et al., 1992), which is used as a classification algorithm for applications such as pattern recognition and machine learning. Depending on the mathematical problem, the kernel for the regression varies. One can use linear, polynomial or radial basis functions (Smola and Schölkopf, 2004). In our case, SVR is applied on single altimeter tracks over an inland water body using a linear kernel and zero-slope constraint. Based on the constant representing the flat water level, an interval is defined which separates the data into valid and invalid measurements. Figure 2 shows an example of an altimeter track (Envisat, Pass 80, Cycle 007) crossing Lake Erie, which has an island in the middle. Blue dots indicate valid measurements, red dots indicate rejected data that exceed the ADM threshold of 5 cm (black dotted line), and green dots indicate outliers detected by SVR (with rejection interval of ±5 cm). The threshold of the SVR should be of the order of the noise of high-frequency altimeter measurements. One can see that all heights influenced by land contamination are detected as outliers and the remaining heights represent a flat surface.
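With a linear kernel and the slope constrained to zero, the SVR objective reduces to finding a constant that minimises the ε-insensitive loss. The Python sketch below exploits this to imitate the along-track outlier test without an SVR library. It is an illustration under that simplification, not the DAHITI implementation; the function names, the tolerance, and the sample heights are assumptions.

```python
def fit_flat_level(heights, eps):
    """Constant c minimising the eps-insensitive loss sum(max(0, |h - c| - eps))."""
    def loss(c):
        return sum(max(0.0, abs(h - c) - eps) for h in heights)
    # The loss is convex and piecewise linear in c, so a minimum is attained
    # at one of its breakpoints h - eps or h + eps.
    candidates = [h - eps for h in heights] + [h + eps for h in heights]
    return min(candidates, key=loss)


def flag_outliers(heights, eps=0.05, interval=0.05, tol=1e-9):
    """Return the flat level and a list of booleans (True = outlier)."""
    level = fit_flat_level(heights, eps)
    # `tol` guards against rounding for points lying exactly on the interval edge.
    return level, [abs(h - level) > interval + tol for h in heights]


# Hypothetical track over a lake; the fifth height is affected by an island.
track = [174.00, 174.01, 173.99, 174.02, 174.60, 174.00, 173.98, 173.98]
level, is_outlier = flag_outliers(track)
```

Here only the land-affected height of 174.60 m is flagged. Note that the ε-insensitive loss lets the fitted level move within the tube towards the outlier (about 174.03 m in this example), which is one reason why the threshold should be of the order of the measurement noise.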

Kalman filtering
The method of Kalman filtering is applied for the computation of water level time series in DAHITI. It updates a model by measurement data of different accuracies and predicts the current state to the next time epoch (Kalman, 1960). In contrast to the common least-squares adjustment, the Kalman filter works recursively and the number of input observations per processing step is significantly reduced because of its sequential integration. This also enables real-time applicability.
The Kalman filter performs the estimation of water level time series from the track-wise input heights by combining time-dependent input data available at irregular intervals and, in the case of larger lakes, at different locations. Different modified Kalman filter approaches have been used for geodetic applications (e.g. Yang and Gao, 2006; Eicker et al., 2014; Gruber et al., 2014). In principle, this algorithm realizes a sequential least-squares adjustment by taking into account the accuracies of the input data as well as the deterministic and stochastic behaviour of the system and produces a statistically optimal estimate of the water level time series.

Update interval
The Kalman filter uses input observations to update the current state of the system and predict the model of the following time epoch. This is performed in a continuous loop consisting of two steps (an update and a prediction step) running consecutively for every period of time $t_k$. At the beginning, an initialization is necessary in order to set the starting conditions. The work flow is illustrated in Fig. 3. The time increment of the Kalman filter can be defined arbitrarily. In our case an observation-based update interval instead of a constant one is used. That means that our system is updated each time a new altimeter track is available. Thus, the update interval strongly depends on the size and the data coverage of the investigated water body. It can vary between 35 days (if only an Envisat track crosses the target area) and 1 day (in the case of large lakes covered by different altimeter missions). Time intervals shorter than 1 day are precluded by assigning the individual measurements to full days. The use of an adaptive update interval avoids smoothing effects in the case of data gaps that may occur when a fixed time increment is selected.
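The observation-based update interval amounts to grouping the measurements by full days and running one update step per day that contains data. A minimal Python sketch, with a hypothetical function name and made-up measurements, could look as follows.

```python
from collections import defaultdict
from datetime import datetime


def daily_update_steps(measurements):
    """Group (timestamp, height, error) tuples by calendar day, sorted in time."""
    by_day = defaultdict(list)
    for t, height, error in measurements:
        by_day[t.date()].append((height, error))
    return sorted(by_day.items())


# Hypothetical measurements from two passes on the same day and one later pass.
obs = [
    (datetime(2008, 5, 1, 3, 12), 174.02, 0.03),
    (datetime(2008, 5, 1, 21, 40), 174.05, 0.04),
    (datetime(2008, 5, 11, 1, 5), 174.10, 0.02),
]
steps = daily_update_steps(obs)
# Days between consecutive update steps (the adaptive update interval).
intervals = [(later[0] - earlier[0]).days for earlier, later in zip(steps, steps[1:])]
```

In this example the two passes on 1 May form one update step, and the interval to the next step is 10 days.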

Optional computation grid
All computations can be referred to one location (centre of the target) or performed on a computation grid. The latter is optional and can be applied for special investigations on surface variability of larger lakes. The standard solution, also used for all computations within this study, assumes uniform lake surfaces in balance with gravity and merges all water heights of one update step to one location. Surface differences owing to systematic height, geoid errors or hydrodynamic effects from wind and waves are neglected. In practice, our approach automatically creates a grid by means of a recursive algorithm used on an initial grid node as a reference point. A land water mask provides information on the extent of the water body and the grid. The grid node separation can be chosen manually depending on the extent of the investigated inland water. Thus, normally we define only one grid node over the target. However, in cases where surface differences are expected, a smaller grid node distance can be chosen. The computations will then be performed for all grid nodes and yield different water levels for the whole lake surface.
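One way to picture the grid creation is a flood fill on a land-water mask that starts at the reference node. The Python sketch below is only an assumption about how such an algorithm could look: it uses an explicit stack instead of recursion, 4-connectivity, and a small hypothetical mask.

```python
def build_grid(mask, start):
    """Collect all water cells connected to `start` in a boolean land-water mask."""
    rows, cols = len(mask), len(mask[0])
    nodes, stack = set(), [start]
    while stack:
        r, c = stack.pop()
        inside = 0 <= r < rows and 0 <= c < cols
        if (r, c) in nodes or not inside or not mask[r][c]:
            continue
        nodes.add((r, c))
        # Visit the four direct neighbours (4-connectivity is an assumption).
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return sorted(nodes)


# Hypothetical mask: 1 = water, 0 = land. The cell (2, 2) is a separate pond.
mask = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
grid_nodes = build_grid(mask, start=(0, 0))
```

The resulting grid contains only the cells connected to the starting node, so neighbouring but separate water bodies are excluded.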

Kalman filter equations
In the following, the basic equations of the Kalman filter are introduced. The algorithm consists of an observation model and a dynamic model. The observations for each step $k$ corresponding to epoch $t_k$ are given in vector $l_k$ and their covariances in matrix $\Sigma_{ll,k}$.
The vector length of $l_k$ depends on the number of water levels $m_k$ available at each epoch $t_k$. The unknown grid node heights are compiled in vector $x_k$. For computations using the standard solution, the vector $x_k$ has the length of 1. The $m_k \times n$ design matrix $A_k$ is the core of the observation model and connects the water levels with the computation grid consisting of $n$ grid points ($n = 1$ using only a single grid point). $A_k$ contains ones for those grid nodes where water levels are available. Hereby, each water level height is assigned to the nearest grid node. In the case when the computation is performed on a single grid node all water level heights are merged into it. The vector $v_k$ absorbs the residuals of the observation model.
The uncertainties of the water levels are described in $\Sigma_{ll,k}$. Since there is no information on correlation between individual water levels, the matrix is defined as a diagonal matrix with variances $\sigma_l^2$ from ADM (computed in the preprocessing step) on the main diagonal. These are collected in vector $s_{l,k}$.
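The structure of the design matrix and of the diagonal observation covariance matrix can be sketched as follows. The nearest-node search uses plain Euclidean distance in longitude and latitude, which is a simplifying assumption, and all names and coordinates are hypothetical.

```python
def design_matrix(obs_positions, grid_nodes):
    """m x n matrix with a single 1 per row, placed at the nearest grid node."""
    rows = []
    for lon, lat in obs_positions:
        dist2 = [(lon - gx) ** 2 + (lat - gy) ** 2 for gx, gy in grid_nodes]
        nearest = dist2.index(min(dist2))
        rows.append([1 if j == nearest else 0 for j in range(len(grid_nodes))])
    return rows


def observation_covariance(adm_errors):
    """Diagonal covariance matrix with the squared ADM errors as variances."""
    m = len(adm_errors)
    return [[adm_errors[i] ** 2 if i == j else 0.0 for j in range(m)] for i in range(m)]


# Hypothetical grid with two nodes and three water levels of one epoch.
nodes = [(-81.0, 42.2), (-79.5, 42.7)]
positions = [(-80.9, 42.1), (-79.6, 42.8), (-81.2, 42.3)]
A = design_matrix(positions, nodes)
cov_ll = observation_covariance([0.03, 0.05, 0.02])
```

For the standard solution with a single grid node, the design matrix degenerates to a column of ones.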
The dynamic model of the Kalman filter approach describes the transition of the system state from epoch $t_k$ to $t_{k+1}$.
This includes the prediction step (cf. Fig. 3) for the parameter vector $x_k^+$ as well as for its covariance matrix $\Sigma_{xx,k}^+$. The prediction of the grid node heights is done by the transition matrix $\Phi_k$. In addition, system noise $q_k$ is taken into account and mapped to the grid node heights by $\Gamma_k$. The model uncertainties are predicted by Eq. (5), where the covariance matrix $Q_k$ contains the uncertainties of the system disturbance, i.e. the system noise. Since no information on the temporal evolution of the water level is known in advance, the prediction is based purely on stochastic information. Moreover, the (deterministic) system disturbances in $q_k$ are set to 0. The system noise $\sigma_q^2$ in matrix $Q_k$ is assumed to be 5 cm$^2$ for each grid node (without correlations) because of the average noise of altimeter measurements.
The applied Kalman filter procedure as used in the DAHITI approach is described in detail below.

Initialization
The Kalman filter approach begins with an initialization step which is necessary before starting the recursive loop. The initial state vector $x_k^-$ is filled by setting all elements to the observed water level with the smallest height error in the first epoch $t_k$. The covariance matrix $\Sigma_{xx,k}^-$ is initialized by an identity matrix of size $n \times n$.

Update
In the update step, new altimeter water levels are introduced in order to update the parameters of the current state $x_k^-$ to a new state $x_k^+$. The update is done by comparing the estimated observations (based on the current model; cf. Eq. 2) with the water levels. The weighting of this so-called innovation is described by matrix $K_k$. It can be computed based on the design matrix and the covariance matrices of observations and parameters. The parameter update of vector $x_k^+$ describes the updated water levels for each grid node at the current epoch $t_k$.
In parallel, the corresponding covariance matrix $\Sigma_{xx,k}^+$ of the height estimates is updated using Eq. (8). The uncertainties of new altimeter data are taken into account by applying the Kalman matrix as a weighting matrix. It can easily be seen that the parameter uncertainties become smaller within the updating step.

Prediction
After the parameter vector and the covariance matrix of the current epoch $t_k$ have been updated, the prediction of $x_k^+$ and $\Sigma_{xx,k}^+$ to the next epoch $t_{k+1}$ is performed and $x_{k+1}^-$ and $\Sigma_{xx,k+1}^-$ are computed. The predictions are used as initial parameters for the next update step, and the computation loop then continues until all water levels have been processed. In our case, no additional information about the temporal propagation of the parameter vector and the covariance matrix is introduced. Therefore, no deterministic model is applied and the transition matrices $\Phi_k$ for data and $\Gamma_k$ for disturbances in Eqs. (4) and (5) can be identity matrices. Furthermore, only system noise is taken into account by setting the disturbance value $q_k$ equal to 0 and its uncertainties $Q_k$ to variances of 5 cm$^2$ for each grid node without any correlations.
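For the standard solution with a single grid node, the loop of initialization, update, and prediction can be sketched in plain Python. The update below is written in the information (inverse-variance) form, which for one state parameter and a diagonal observation covariance is algebraically equivalent to the gain-matrix form of the filter. The function name, the units (cm and cm$^2$), and the sample observations are assumptions made for illustration; this is not the DAHITI code.

```python
def kalman_single_node(steps, q_var=5.0):
    """Run the filter for one grid node.

    steps: list of epochs, each a list of (height_cm, sigma_cm) observations.
    Returns one (height, variance) pair per epoch, in cm and cm^2.
    """
    # Initialization: most precise observation of the first epoch, unit variance.
    x = min(steps[0], key=lambda o: o[1])[0]
    var = 1.0
    results = []
    for observations in steps:
        # Update: inverse-variance weighting of the predicted state and all
        # observations of this epoch (design matrix = column of ones).
        weight = 1.0 / var
        weighted_sum = x / var
        for height, sigma in observations:
            weight += 1.0 / sigma ** 2
            weighted_sum += height / sigma ** 2
        var = 1.0 / weight
        x = weighted_sum * var
        results.append((x, var))
        # Prediction: identity transition and zero disturbance keep the state,
        # while the variance grows by the system noise.
        var += q_var
    return results


# Hypothetical input: two observations in the first epoch, one in the second.
epochs = [[(17400.0, 2.0), (17402.0, 4.0)], [(17410.0, 2.0)]]
estimates = kalman_single_node(epochs)
```

Each update reduces the variance relative to the predicted value, and each prediction inflates it again by the system noise, so longer gaps between tracks would have to be represented by repeated or scaled predictions if a time-dependent noise model were desired (not done in this sketch).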

Post-processing
The Kalman filter provides water heights x_k and their formal errors Σ_xx,k for each epoch t_k and grid node.
If Kalman filtering is performed on a single grid node, the final water level and error are immediately available. If it is computed on a grid, a "mean" one-dimensional time series is computed. Instead of simply averaging all grid node heights, we select only the best water levels per epoch. Only water levels whose Kalman filter errors fulfill certain error criteria are selected. In general, the limit for the maximum height error is set to values between 5 and 10 cm. The selected limit depends on the resulting height errors. Therefore, the limit is selected manually in such a manner that only reliable heights are used for the final time series. The remaining water levels are averaged for each epoch by using the formal errors for the weighting factors. Finally, a time series of water levels and their formal errors over the entire period of time is obtained.
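The selection and averaging of grid node heights for one epoch can be sketched as follows. This is an illustration under assumptions: the text states only that the formal errors are used for the weighting factors, so inverse-variance weights are assumed here, and the function name and the default limit of 10 cm (the upper end of the range given above) are chosen for the example.

```python
def combine_grid_nodes(heights, errors, max_error=0.10):
    """Combine the grid node heights of one epoch into a single water level.

    heights, errors: per-node water levels and formal errors in metres.
    max_error: manually chosen limit for the formal error (metres).
    Returns (weighted_mean, formal_error), or None if no node qualifies.
    """
    # Keep only nodes whose formal error satisfies the limit.
    selected = [(h, e) for h, e in zip(heights, errors) if 0.0 < e <= max_error]
    if not selected:
        return None
    # Assumed weighting: inverse variance of the formal errors.
    weights = [1.0 / e ** 2 for _, e in selected]
    total = sum(weights)
    mean = sum(w * h for w, (h, _) in zip(weights, selected)) / total
    return mean, (1.0 / total) ** 0.5


if __name__ == "__main__":
    # The third node exceeds the error limit and is ignored.
    print(combine_grid_nodes([183.40, 183.44, 185.00], [0.05, 0.05, 0.30]))
```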
In a last step, an outlier rejection is performed. The water level time series can still contain outliers because of bad quality of data, ice coverage, orbit manoeuvres, etc. For the detection of those outliers, SVR can be applied again, now on the full time series. Complete tracks showing significant differences with respect to the other points of the water level time series can be rejected. This time, radial basis functions instead of a linear kernel are used to perform the regression since a constant water level over time cannot be assumed. The radial basis function kernel of the SVR allows us to fit the time series including seasonal variations and trends. Figure 4 shows the results of an applied SVR on a 6-year subset of the time series of Lake Erie. The fitted model of the SVR is plotted as a cyan line together with its manually defined confidence interval. The confidence interval, which varies between 7.5 and 100 cm, is selected depending on the noise of the water level time series. Water levels which fulfill the limit of the SVR are kept (blue), whereas outliers are rejected (red).

Results and validation
In this chapter, water level time series resulting from the Kalman approach are presented and validated. Since it is not possible to show results for all inland water bodies, we focus on the selected study areas introduced in Sect. 4.1. Three inland water targets are described in more detail. They represent different target types, i.e. large lakes, small lakes, and rivers. Moreover, results from 16 lakes and 20 river crossings are validated by comparison with in situ data and altimeter time series provided by other groups.

Study areas
For altimetry-derived water level time series, in situ measurements from gauging stations are the most important validation data sets. In order to perform reliable comparisons, only those inland water bodies are selected as study areas for which in situ data are available. Since we have access to many gauging stations in North and South America, we focus our study on these two continents.
Another criterion for the selection of inland water bodies is the availability of external altimetry-derived time series to demonstrate the performance of our Kalman filter method compared with other approaches. Each study case is observed by at least one other group (i.e. Hydroweb, River & Lake, or GRLM). Thus, those targets in North and South America are selected which are best represented by other inland altimetry databases for as long a time period as possible. We end up with the 16 lakes and 20 river crossings illustrated in Fig. 5. For almost all investigated inland water bodies at least one in situ gauging station and one external altimetry-derived time series are available.
The first study areas are the Great Lakes of North America, comprising Lake Superior (82 000 km²), Lake Huron (59 000 km²), Lake Michigan (58 000 km²), Lake Erie (25 000 km²), and Lake Ontario (19 000 km²). The size of these lakes leads to ocean-like conditions, which means that most altimeter measurements are not disturbed by land; only a few measurements near the lake shore are contaminated. The Great Lakes show seasonal variations of about 1 m. They are well-observed inland waters with many in situ stations provided by NOAA's "Tides & Currents" platform. For validation of Lake Superior, in situ stations of Duluth, Grand Marais, Marquette, Ontonagon and Point Iroquois are used. Lake Huron has five stations for validation: Essexville, Harbor Beach, Lakeport, Mackinaw City, and de Tour Village. The stations Calumet Harbor, Holland, Kewaunee, Ludington, Milwaukee, and Port Inland are used for Lake Michigan. Lake Erie has seven stations for validation: Buffalo, Cleveland, Fairport, Fermi Power Plant, Marblehead, Sturgeon Point, and Toledo. For validation of Lake Ontario, the in situ stations of Cape Vincent, Olcott, Oswego, and Rochester are used.
In addition to the Great Lakes, the Great Slave Lake (27 200 km²), Lake Winnipeg (24 000 km²), Lake Athabasca (7800 km²), Lake Winnipegosis (5100 km²), Lake Manitoba (4600 km²), Lake of the Woods (4300 km²), Great Salt Lake (4000 km²), Lake Claire (1400 km²), and Cedar Lake (1300 km²), which are located in Canada and the United States, are investigated. These lakes differ significantly in surface extent, by a factor of up to 20. Estimation of water level time series in the Canadian lakes is made difficult by the winter conditions: several lakes are frozen for several months, which makes the water level computation challenging (Table 3). For validation of the water level time series, in situ data provided by the government of Canada and the US Geological Survey (USGS) are used.
In addition to the lakes in North America, two lakes in the very south of South America are selected for validating our approach. Lake Argentino (1466 km²) and Lake Buenos Aires (1850 km²) are located in Argentina next to the Andes. The lakes are partly surrounded by mountains, which can affect the altimeter measurements. The lakes have a similar shape, with the largest extent in the across-track direction of the satellite's ground track. This leads to rather short track crossings varying between 10 and 15 km. Despite their location in a temperate zone near high mountains, the lakes are not frozen during winter. The seasonal variations of both lakes range between 2.5 and 3.5 m. For validation of Lake Argentino and Lake Buenos Aires, in situ data from the Ministerio de Planificación Federal, República Argentina, are used.
For the analysis of rivers, the Amazon Basin is selected as the study area; it is the largest basin in the world and covers about 7 000 000 km². The region is located in the tropics, and the climate is hot and humid throughout the year. Because of the strong precipitation, the seasonal variations of the water level reach up to 15 m. The Amazon Basin consists of countless rivers which differ in terms of length, width, meanders, and seasonal variations. This variety is very useful for the quality assessment of water level time series from altimetry. For example, the river widths vary from up to 10 km for the Amazon River to a few hundred metres for the Jiparaná River. Moreover, the Amazon Basin is a well-observed area since the Agência Nacional de Águas (ANA) provides data for numerous in situ gauging stations. For validation, water level time series of gauges at the Japurá River, the Solimões River, the Negro River, the Purus River, the Jiparaná River, the Paraguay River, the Madeira River, and the São Lourenço River are used. Another reason why we chose the Amazon Basin is that other groups such as LEGOS and ESA-DMU have also investigated this area.

Validation data sets
Water level time series from gauges have a high relative accuracy, but some points must be kept in mind in the use of in situ data. The absolute comparison of heights from gauges and satellite altimetry is often very difficult since location, reference height and vertical datum of gauges are not always precisely known or may even be unknown. This leads to height offsets between water level time series from gauge and altimetry, which must be considered in the validation step. In particular, the comparison between water levels from altimetry and in situ data over rivers shows in most cases remaining offsets. In general, almost no altimeter satellite track crosses the river at the location of a gauging station, which leads to additional offsets because of the river slope. To avoid handling the uncertainties of in situ data, only relative comparisons with water level time series from altimetry are performed.
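Such a relative comparison can be sketched in a few lines. The example below is an illustration, not the code used for DAHITI: it removes the mean height difference on all common days (the offset handling described for the figures in the selected results), and then computes the rms difference and the squared correlation coefficient. The function name and the dictionary-based input format are assumptions.

```python
from math import sqrt


def relative_comparison(altimetry, gauge):
    """Compare two water level series on their common days.

    altimetry, gauge: dicts mapping a day (any hashable, sortable key)
                      to a water level in metres.
    Returns (offset, rms, r_squared, n_points).
    """
    common = sorted(set(altimetry) & set(gauge))
    n = len(common)
    if n < 2:
        raise ValueError("need at least two common days")
    a = [altimetry[day] for day in common]
    g = [gauge[day] for day in common]

    # Constant offset: mean height difference on the common days.
    offset = sum(x - y for x, y in zip(a, g)) / n
    # rms of the differences after the offset has been removed.
    rms = sqrt(sum((x - offset - y) ** 2 for x, y in zip(a, g)) / n)

    # Squared Pearson correlation coefficient
    # (undefined if one of the series is constant).
    mean_a, mean_g = sum(a) / n, sum(g) / n
    cov = sum((x - mean_a) * (y - mean_g) for x, y in zip(a, g))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_g = sum((y - mean_g) ** 2 for y in g)
    r_squared = cov * cov / (var_a * var_g)
    return offset, rms, r_squared, n


if __name__ == "__main__":
    alt = {1: 10.5, 2: 11.5, 3: 12.5, 5: 99.0}
    obs = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
    print(relative_comparison(alt, obs))
```

Because the offset is removed first, a constant datum difference between gauge and altimetry does not contribute to the rms difference.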
In order to rank our results with respect to other time series derived from altimeter data, we download water levels from three external inland altimeter databases, namely Hydroweb, River & Lake, and GRLM. These results are based on various altimeter missions, and diverse approaches were performed to compute the water level time series. As a consequence, these external time series cover different time periods with temporal resolutions between 10 and 35 days. This has to be kept in mind when the different time series of the four databases are compared.
Figure 6. Water level time series of Lake Superior from DAHITI (1992-2014), Hydroweb (1992-2011), River & Lake (2002-2010) and GRLM (1992-2014) compared with in situ data (Ontonagon, 1992-2014) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Selected results
We choose three of the aforementioned water bodies in order to present detailed results of our DAHITI approach. The targets are selected to represent three diverse inland water body types featuring different characteristics. Lake Superior (Fig. 6) is selected as representative of larger lakes with ocean-like conditions. Lake Athabasca (Fig. 7) is a smaller lake which has to cope with ice coverage in winter, which is the case for most lakes in North America. Finally, the Madeira River (Fig. 8) in the Amazon Basin is selected to show the potential of the DAHITI approach for river monitoring. For all examples, the time series from DAHITI is compared with in situ data and results from Hydroweb, River & Lake, and GRLM.

Lake Superior
Figure 6 shows the water level time series of Lake Superior between 1992 and 2014. The DAHITI result is plotted in blue (subplot a); the in situ data of the Ontonagon station are in red; and external altimetry-derived water levels in green (Hydroweb, subplot b), light blue (River & Lake, subplot c), and orange (GRLM, subplot d). In order to neglect constant offsets between the different solutions, all time series are shifted to the level of in situ data, and only water level changes are compared. The applied offset is estimated by using the average of height differences on all days on which in situ data and time series from altimetry are available. Additionally, differences between water levels from altimetry and in situ data are plotted for each time series. For the DAHITI computation, high-frequency altimeter data of TOPEX/Poseidon, Jason-1, Jason-2, Envisat, ERS-2, and SARAL/AltiKa are used. An additional retracking is not applied. The Kalman filter provides a continuous time series with an irregular near-daily resolution, which shows neither outliers nor inter-mission inconsistencies. In order to achieve reliable water level time series, different outlier criteria are applied. Initially, the number of invalid water levels is reduced by using thresholds for latitude (depending on track length over Lake Superior), height (180 to 185 m) and height error (10 cm). Furthermore, only backscatter coefficients between 10 and 18 dB are selected in order to reject data affected by ice coverage. Then, an SVR using a confidence limit of ±5 cm is applied along the crossing altimeter track to reject water levels near the shore which are affected by land contamination. Finally, an SVR using a confidence limit of ±7.5 cm is applied along the final water level time series to reject remaining outliers. Altogether, the time series is composed of 3449 single points, each representing 1 day with at least one altimeter track crossing the lake. During computation of the final water level time series 24 % of the data are rejected, mostly because of ice coverage.
The DAHITI water levels coincide very well with the daily in situ data of Ontonagon. The squared correlation coefficient R² is 0.95 and the rms difference is 4.4 cm. The alternative computation of the water level time series using a median filter instead of the Kalman filter leads to a slightly worse rms difference of 4.5 cm (see Sect. 4.3.4). In comparison with the DAHITI time series, the other altimetry-derived water levels show significantly reduced temporal resolutions. In addition, the lengths of the time series differ, depending on the missions used by the different groups. In order to rank the DAHITI result compared with other altimetry-derived water levels, we also compare the three external time series with in situ gauging data within the corresponding time intervals. For all three databases this gives higher rms differences and equal or smaller correlations (Hydroweb: rms = 5.7 cm, R² = 0.95, 228 points; River & Lake: rms = 8.2 cm, R² = 0.82, 82 points; and GRLM: rms = 12.1 cm, R² = 0.74, 760 points). For validation, water level time series of the other altimetry-derived water levels are used as they are, without any additional outlier rejection. This leads to higher rms differences than published in Riçko et al. (2013), who applied an additional outlier rejection based on in situ data. The altimetry-derived solutions differ because of varying input data sets and the different approaches. Hydroweb uses a multi-mission approach with a merged monthly resolution, whereas River & Lake relies purely on Envisat with a temporal resolution of 35 days. GRLM applies a multi-mission approach providing a temporal resolution of about 10 days. The time series of Hydroweb and GRLM still show mission-dependent offsets which can be seen in the differences from the in situ data (mainly positive for ERS-2, mainly negative for Envisat). In contrast, mission-dependent offsets are quite small in the water level time series of DAHITI.

Lake Athabasca
Figure 7 shows the water level time series of Lake Athabasca between 1992 and 2014. Once again, water levels from DAHITI (blue) and in situ data of Crackingstone Point (red), Hydroweb (green), River & Lake (light blue), and GRLM (orange) are plotted. The time series of the four altimeter databases are shifted to the level of the in situ data. In principle, Lake Athabasca, whose surface covers 7800 km², should be large enough to provide reliable altimetry-derived water level time series. However, different problems, such as ice coverage because of regular freezing in winter, land contamination and off-nadir effects near lake shores, have to be considered. For the estimation of the water level time series in DAHITI, retracked altimeter data are used, with a 10 % improved threshold retracker (Hwang et al., 2006). For the computation, altimeter data of TOPEX/Poseidon, Jason-1, Jason-2, Envisat, ERS-2 and SARAL/AltiKa are used. In order to achieve reliable water level time series, the same outlier criteria as for Lake Superior but different thresholds are applied. First, outliers are rejected by using thresholds for latitude (depending on track length over Lake Athabasca), height (208 to 212 m) and height error (50 cm). Furthermore, water levels are rejected as affected by ice coverage if their backscatter coefficients are not between 10 and 18 dB. To reject water levels near the shore which are affected by land contamination, an SVR along the crossing altimeter track using a confidence limit of ±5 cm is applied. Finally, an SVR along the final water level time series using a confidence limit of ±50 cm is applied to reject remaining outliers.
The DAHITI water level shows a very good agreement with in situ data in summer, and almost no outliers owing to ice coverage are visible in winter, in contrast to the time series from Hydroweb and River & Lake. The overall consistency with the gauge data yields a correlation coefficient of 0.90 and an rms difference of 15.1 cm using 1279 points in the period between 1992 and 2014. The usage of a median filter leads to a slightly worse rms difference of 15.3 cm for Lake Athabasca. The differences between in situ data and Hydroweb (rms = 32.1 cm, R² = 0.79, 224 points), River & Lake (rms = 80.5 cm, R² = 0.30, 79 points) and GRLM (rms = 55.7 cm, R² = 0.27, 76 points) show higher rms values and smaller correlations. One can clearly see that the problems of altimeter time series occur mostly in winter because of ice coverage. In particular, water level time series of Hydroweb and River & Lake show strong outliers in winter, which are not contained in the time series of DAHITI because of the applied outlier rejection. A new problem with retracker biases arises for time series based on retracked altimeter data. To minimize those effects, all altimeter measurements are retracked using the 10 % improved threshold retracker. However, small retracker biases can also occur if identical retracking algorithms are applied on altimeter missions measuring in different bands, such as Ku-band (Envisat) and Ka-band (SARAL/AltiKa).
Figure 7. Water level time series of Lake Athabasca from DAHITI (1992-2014), Hydroweb (1992-2011), River & Lake (2002-2010) and GRLM (2002-2014) compared with in situ data (Lake Athabasca, 1992-2013) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Madeira River
Figure 8 shows the resulting water level derived from an Envisat and SARAL/AltiKa crossing over the Madeira River. The water level time series from DAHITI (blue), Hydroweb (green) and River & Lake (light blue) are compared with the in situ station Humaitá (red), which is located about 27.6 km upstream. All time series from altimetry are shifted to the water level of the in situ station. At this location the Madeira River is about 2.5 km wide. In order to achieve reliable water level time series over the Madeira River, different outlier criteria are applied. First, thresholds for latitude (depending on track length over the Madeira River), height (30 to 50 m) and height error (100 cm) are applied to reduce the number of invalid water levels. Finally, an SVR along the crossing altimeter track using a confidence limit of ±10 cm and an SVR along the final water level time series using a confidence limit of ±100 cm are applied to reject remaining outliers. In this case, no limit for the backscatter coefficients is applied because no ice coverage exists in the Amazon Basin. In principle, the backscatter coefficient can also be used to distinguish between water and land, but this is not considered here. All altimeter time series reach a temporal resolution of about 1 month since there is only one mission with 35-day temporal resolution at the same time. Altimeter data are available between 2002 and 2014 with a data gap between October 2010 and March 2013. The altimeter data from Envisat on the shifted orbit cannot be used between October 2010 and April 2012 for the current water level time series. Gauging information does not start before 2007. Thus, the comparison with in situ data only comprises a time period of about 3.5 years. For DAHITI another year of SARAL/AltiKa data is available. The Kalman filter result (blue) shows an rms difference of 19.4 cm and a correlation coefficient of 1.00 by using 35 points. The estimation of the water level time series using a median filter leads to an rms difference of 19.6 cm. The rms is comparable to the result for Lake Athabasca, which is even more satisfactory when we take into account the seasonal variations of about 15 m of the Madeira River. The high amplitude is also the reason for the extremely high correlation, which should not be overvalued. The rms differences of Hydroweb and River & Lake with respect to the gauge are more than twice as great, at 45.1 cm (29 points) and 53.2 cm (28 points) respectively. GRLM does not provide information for this virtual station.
Figure 8. Water level time series of the Madeira River from DAHITI (2002-2014), Hydroweb (2002-2010), and River & Lake (2002-2010) compared with in situ data (Humaitá, 2007-2014) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Discussion
The DAHITI time series show good consistency with in situ observations and clear advances over established approaches. However, some problems remain, especially for smaller lakes and rivers. For larger lakes, the assumption of a uniform surface level may no longer be justified. In addition to height differences due to systematic errors in geophysical corrections or the geoid, hydrodynamic effects caused by wind and waves can cause horizontal lake level differences. Currently, these are neglected when combining observations from diverse parts of the lake. Moreover, measurements (altimetry as well as in situ) feature non-uniform accuracies observed over areas with different surface conditions. This effect can be seen when we compare the DAHITI water level time series of Lake Superior with additional gauging stations. The five possible comparisons lead to rms differences varying by about 2 cm (between 4.4 and 6.6 cm; Table 4). The two stations Duluth and Point Iroquois show reduced consistency with altimetry. Both stations are located in smaller bays of the lake and are more affected by wind and waves than the other stations, which leads to noisier in situ time series.
For small lakes and rivers, land contamination of waveforms is the largest problem because nearly all altimeter measurements are affected. For rivers, almost no nadir measurements may occur, and even these can originate from river branches and distort the water level time series from the investigated target. Moreover, the river slope can influence the time series, as well as the comparison with in situ data. The crossings between river and altimeter track can vary slightly (up to 1 km) because of orbit instabilities so that the reflections originate from different areas which do not exhibit the same water level. The most important challenge remaining is the handling of inter-mission biases and retracker biases. The usage of radial errors from a global crossover analysis and the restriction to one common retracker works reasonably well; however, small discrepancies remain in the time series. Moreover, the quality of the single altimeter measurements could surely be further improved by combining different retracking algorithms depending on the waveform shapes. This remains a major challenge and offers enormous potential for future work.
The validation of water level time series of DAHITI for Lake Superior, Lake Athabasca, and the Madeira River compared with in situ data and time series from Hydroweb, River & Lake, and GRLM showed clear improvements. To evaluate the impact of the outlier rejection and Kalman filtering on the improvements of the DAHITI time series, an alternative approach using a simple median filter instead of a Kalman filter was applied.
The resulting rms differences for the three inland waters were only slightly worse, by 0.1 to 0.2 cm, which indicates that the combination strategy has only a moderate effect on the overall accuracy. The strongest improvements are currently due to rigorous outlier detection and data retracking. However, the Kalman filter has considerable potential when upgraded by dynamic modelling and used for real-time applications.

Quality assessment
The results for Lake Superior, Lake Athabasca, and the Madeira River presented in Sect. 4.3 already show the ability of the DAHITI approach to provide reliable and highly accurate time series of inland water levels. Since three results, even if they do represent different inland water types, are not enough to perform a reliable quality assessment of the method, we extend the validation to a larger sample and include all study targets (16 lakes and 20 river crossings) described in Sect. 4.1 in the comparison.
Table 3 gives an overview of the different parameters used for the estimation of water level time series in DAHITI. This information is provided for all investigated lakes and rivers. The first column shows the altimeter missions used, followed by the retracking flag, which indicates if additional retracking is applied. Then the ice flag shows if the water body is affected by ice coverage in winter. This information originates from external sources, e.g. the National Snow and Ice Data Centre (http://nsidc.org/) for Lake Superior. Table 3 also shows which outlier criteria were applied for the different inland water targets to reject erroneous water levels. Consequently, appropriate thresholds for latitude, height, backscatter coefficient, height error, SVR along the pass, and SVR along the final time series can be selected. Finally, the number of data points of the water level time series is shown, which is equal to the number of days on which altimeter data are available. The last column gives the percentage of outliers which were rejected during the computation of the water level time series. Especially for inland water bodies which are ice-covered in winter, the percentage of outliers is strongly increased.
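The threshold-based part of the outlier criteria listed in Table 3 can be illustrated with a minimal sketch. It is not the DAHITI implementation: the field names (`lat`, `height`, `sigma0`, `height_error`), the threshold values, and the sample data are hypothetical, and the two SVR criteria are omitted. The function keeps measurements that lie within every configured range and reports the percentage rejected, analogous to the last column of Table 3.

```python
def reject_outliers(measurements, limits):
    """Keep measurements whose fields lie within every configured range.

    measurements: list of dicts with hypothetical keys such as
                  "lat", "height", "sigma0", "height_error"
    limits:       dict mapping a field name to a (low, high) tuple;
                  None on either side means "no bound"
    Returns (kept_measurements, rejected_percentage).
    """
    def within(value, bounds):
        low, high = bounds
        return (low is None or value >= low) and (high is None or value <= high)

    kept = [m for m in measurements
            if all(within(m[field], bounds) for field, bounds in limits.items())]
    total = len(measurements)
    rejected_pct = 100.0 * (total - len(kept)) / total if total else 0.0
    return kept, rejected_pct


# Hypothetical thresholds for one target (all values are assumptions).
limits = {
    "lat": (41.5, 42.5),          # latitude window over the water body, degrees
    "height": (170.0, 178.0),     # plausible water heights, metres
    "sigma0": (10.0, None),       # minimum backscatter coefficient, dB
    "height_error": (None, 0.5),  # maximum height error, metres
}

# Hypothetical measurements along one pass.
measurements = [
    {"lat": 41.8, "height": 174.1, "sigma0": 25.0, "height_error": 0.05},
    {"lat": 41.9, "height": 174.2, "sigma0": 30.0, "height_error": 0.90},  # error too large
    {"lat": 43.0, "height": 174.0, "sigma0": 28.0, "height_error": 0.04},  # outside latitude window
    {"lat": 42.0, "height": 174.3, "sigma0": 27.0, "height_error": 0.06},
]

kept, rejected_pct = reject_outliers(measurements, limits)
print(len(kept), rejected_pct)
```

Keeping the limits in a per-target dictionary mirrors the idea that each criterion can be switched on or off for an individual lake or river.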
Table 4 summarizes the comparisons of lake level time series from DAHITI, Hydroweb, River & Lake, and GRLM with in situ gauge data. For each target, the rms difference, the squared correlation coefficient, and the number of points (No.) used for validation are provided. Depending on the availability of in situ time series of the investigated water body, more than one comparison is performed for the larger [...] water level time series. Furthermore, distances of tens of kilometres between the in situ station and the nearest crossing altimeter track make it more difficult to prove dependences due to unpredictable river flow effects.
Compared with time series from Hydroweb and River & Lake, the new DAHITI approach can improve the gauge consistency for most of the targets. The improvement can reach several decimetres. Many correlation coefficients in Table 5 are close to 1. This is not necessarily an indication of optimal consistency between altimeter water level and gauging observations but is significantly influenced by the large absolute water level variations (more than 10 m).
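The two validation measures used in Tables 4 and 5 can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the function name `validate`, the synthetic series, and the removal of the mean offset before differencing (mirroring the shift of the altimetry series to the in situ height level) are assumptions. The example applies identical noise to a small-amplitude and a large-amplitude signal to illustrate why a squared correlation close to 1 does not by itself imply better consistency.

```python
import math


def validate(altimetry, gauge):
    """Return (rms_difference, squared_correlation) for paired heights.

    The mean offset between both series is removed before differencing
    (an assumption of this sketch).
    """
    if len(altimetry) != len(gauge) or len(gauge) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(gauge)
    mean_a = sum(altimetry) / n
    mean_g = sum(gauge) / n
    # Differences after removing the constant offset between the series.
    diffs = [(a - mean_a) - (g - mean_g) for a, g in zip(altimetry, gauge)]
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    cov = sum((a - mean_a) * (g - mean_g) for a, g in zip(altimetry, gauge))
    var_a = sum((a - mean_a) ** 2 for a in altimetry)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    if var_a == 0.0 or var_g == 0.0:
        raise ValueError("correlation undefined for a constant series")
    return rms, cov * cov / (var_a * var_g)


# Synthetic data (assumed): the same 5 cm noise on two signal amplitudes.
gauge_small = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1]    # variations of a few decimetres
gauge_large = [50.0 * g for g in gauge_small]    # variations of several metres
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
offset = 1.3                                     # hypothetical datum offset, metres

alt_small = [g + e + offset for g, e in zip(gauge_small, noise)]
alt_large = [g + e + offset for g, e in zip(gauge_large, noise)]

rms_small, r2_small = validate(alt_small, gauge_small)
rms_large, r2_large = validate(alt_large, gauge_large)
print(rms_small, r2_small)
print(rms_large, r2_large)
```

Both cases have the same rms difference, while the squared correlation is much closer to 1 for the large-amplitude series, so the rms difference is the more informative measure when targets with very different water level variations are compared.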

Conclusions
This paper presents a new method for estimating water level time series over inland waters using multi-mission satellite altimetry data. It is based on careful data preprocessing (including waveform retracking), a Kalman filter approach, and a rigorous outlier detection. The introduced method is the basis of DAHITI, an online database for inland water level time series from satellite altimetry observations operated by the Deutsches Geodätisches Forschungsinstitut der Technischen Universität München (DGFI-TUM).
The study demonstrates the performance of the new method for numerous lakes and rivers in North and South America. A comprehensive validation is performed by comparison with time series of water level variations from in situ gauging stations. Moreover, a comparison with external altimetry-derived water level variations is presented based on data from Hydroweb (LEGOS), the River & Lake database (ESA-DMU), and the Global Reservoir and Lake Monitor (USDA).
The lake level data sets computed with the presented approach yield accuracies between 4 and 36 cm depending on the surface extent of the lake and climate conditions (i.e. ice coverage). For rivers, the performance is considerably lower, with rms differences varying between 8 and 114 cm. Here the accuracy mainly depends on the crossing angle of the altimeter track and the surrounding conditions. Other factors, such as topography, quality of waveforms, and their retracked water heights, can also influence the resulting water level time series. Especially in the Amazon Basin, the river meander can also change over the years because of strong seasonal variations.
For most study cases, the new approach yields significant accuracy improvements compared with water level variations provided by established inland altimeter databases, especially for smaller lakes and rivers. In addition, the temporal resolution of the DAHITI lake time series is significantly improved compared with other data sets, allowing for the detection of sub-monthly temporal changes.
The reasons for the improved performance of the presented approach are multiple. First, a larger observation data set is used as input because a multi-mission concept is realized. All available altimeter missions are cross-calibrated and incorporated into the computations. Second, the applied preprocessing consists of a robust outlier elimination and optional retracking. This ensures that only highly accurate data will be used. Moreover, the Kalman filter approach permits the optimal combination of all data sets and also includes the accuracies of the input data for weighting. This also enables rigorous error propagation and the computation of formal errors for each water level height. Further comparisons for the three selected areas show that using the Kalman filter approach instead of a median approach leads to slightly decreased rms differences. This indicates that the major improvements in the water level time series of DAHITI are due to the extended outlier rejection. In future, the Kalman filter approach will also be used for (near-)real-time analysis and integration of altimeter data (with the so-called Operational Geophysical Data Record, OGDR). This enables daily updating of the water level time series and may also be used for short-term predictions. Furthermore, the introduction of a dynamic model in the Kalman filter will increase the temporal resolution of the water level time series. For the development of the dynamic model, external data sets such as GRACE, precipitation, etc. can be used.
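How a Kalman filter uses the accuracies of the input data for weighting and delivers formal errors can be illustrated with a strongly simplified scalar sketch. It is not the DAHITI filter: the single-state random-walk model, the function names, the process noise `q`, and all numbers are assumptions chosen for illustration only.

```python
import math


def kalman_update(x, p, z, r):
    """Scalar Kalman update of state x (variance p) with observation z (variance r)."""
    k = p / (p + r)              # gain: large when the observation is more accurate
    x_new = x + k * (z - x)      # weighted combination of state and observation
    p_new = p * r / (p + r)      # updated (formal) variance, always smaller than p
    return x_new, p_new


def kalman_predict(x, p, q):
    """Random-walk prediction (assumed model): level unchanged, variance grows by q."""
    return x, p + q


# Two hypothetical heights from different missions on the same day:
# (height in metres, standard deviation in metres).
observations = [(10.0, 0.1), (10.5, 0.2)]

x, p = 0.0, 1e6                  # vague initial state (assumption)
for z, sigma in observations:
    x, p = kalman_update(x, p, z, sigma ** 2)

formal_error = math.sqrt(p)
print(x, formal_error)

# Propagate to the next epoch with a hypothetical process noise variance.
x_pred, p_pred = kalman_predict(x, p, 0.001)
```

With a vague initial state, the updated height approaches the inverse-variance weighted mean of the observations, so the more accurate measurement dominates, and the formal error is smaller than that of either input.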
In spite of the improved water level time series of DAHITI compared with results from Hydroweb, River & Lake, and GRLM, there are still some challenging tasks which have to be addressed to make further improvements. Retracking is the most challenging task in using altimeter data for smaller water bodies. The mixture of different waveform shapes (such as ocean-like, specular, and others) makes it difficult to choose a suitable retracking algorithm. Each retracker is optimized for special waveform shapes, but switching the retracking algorithm to achieve the best ranges will lead to retracker biases which have to be taken into account. Furthermore, inter-mission offsets can also arise because of the different characteristics of the measurement systems, e.g. Ku-band (Envisat) and Ka-band (SARAL/AltiKa).

Figure 1. Processing strategy for the computation of water level time series for inland waters in DAHITI in three main steps: preprocessing, Kalman filtering, and postprocessing.

Figure 3. Procedure of Kalman filtering starting with an initialization step followed by a progressive loop containing one update and one prediction step.

Figure 4. Example of applied SVR using radial basis functions for outlier rejection on a resulting water level time series (Lake Erie) of the Kalman filtering step. The estimated regression function (cyan) and its confidence intervals (dotted cyan) are plotted. The result of the regression shows valid (blue) and rejected (red) altimeter heights. Each rejected water level height represents one complete satellite overflight.

Figure 5. Map of selected study areas of lakes (blue) and rivers (red) in North America (left) and South America (right).

Figure 6. Water level time series of Lake Superior from DAHITI (1992-2014), Hydroweb (1992-2011), River & Lake (2002-2010), and GRLM (1992-2014) compared with in situ data (Ontonagon, 1992-2014) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Figure 7. Water level time series of Lake Athabasca from DAHITI (1992-2014), Hydroweb (1992-2011), River & Lake (2002-2010), and GRLM (2002-2014) compared with in situ data (Lake Athabasca, 1992-2013) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Figure 8. Water level time series of the Madeira River from DAHITI (2002-2014), Hydroweb (2002-2010), and River & Lake (2002-2010) compared with in situ data (Humaitá, 2007-2014) and shifted to the water level height of the in situ data. Additionally, differences between heights from altimetry and in situ data are plotted for periods in which both data sets are available.

Table 1. List of all altimeter missions used in this study together with their main characteristics.

Table 2. List of applied models and geophysical corrections.

Figure 2. Example of an outlier detection using error threshold and SVR along a single satellite track over Lake Erie, which contains an island (between approx. 41°44′ and 41°47′). The result of the regression shows valid (blue) and rejected (red, green) water heights. The height errors based on ADM are plotted as grey bars. Thresholds for height errors and SVR are marked by dashed lines (black and cyan respectively).