The use of bias-aware Kalman filters for estimating and correcting observation bias in groundwater head observations is evaluated using both synthetic and real observations. In the synthetic test, groundwater head observations with a constant bias and unbiased stream discharge observations are assimilated in a catchment-scale integrated hydrological model with the aim of updating stream discharge and groundwater head, as well as several model parameters relating to both streamflow and groundwater modelling. The coloured noise Kalman filter (ColKF) and the separate-bias Kalman filter (SepKF) are tested and evaluated for correcting the observation biases. The study found that both methods were able to estimate most of the biases and that using either of the two bias estimation methods resulted in significant improvements over using a bias-unaware Kalman filter. While the convergence of the ColKF was significantly faster than the convergence of the SepKF, the ColKF required a much larger ensemble size, as the estimation of biases would otherwise fail. Real observations of groundwater head and stream discharge were also assimilated, resulting in improved streamflow modelling in terms of an increased Nash–Sutcliffe coefficient, while no clear improvement in groundwater head modelling was observed. Both the ColKF and the SepKF tended to underestimate the biases, which resulted in drifting model behaviour and sub-optimal parameter estimation, but both methods provided better state updating and parameter estimation than using a bias-unaware filter.

Sequential assimilation of observations in models is a widely used method in several fields, including meteorology and hydrology. The method has repeatedly been shown to improve forecasting performance, reduce uncertainty and optimize parameter values, and remains a topic of ongoing research.

Data assimilation in hydrological models has been studied in a number of
settings, from single process models, modelling only a limited part of the
hydrological cycle

Biases in both models and observations pose challenges to data assimilation
in hydrology, and have previously partly been studied

While the EnKF, and any derivation thereof, implicitly accounts for both
model and observation uncertainty in the form of zero-mean white noise,
model and observation biases remain an issue that requires modifications to
the filter. A few methods have been developed that attempt to estimate biases
online, and they have been applied successfully in many settings. With few
exceptions, the bias aware filters can be grouped in two: separate filter
methods and augmented state methods. The separate-bias Kalman filter (SepKF)

This study uses both a synthetic test set-up and real observations to test the
application of bias correction to a data assimilation framework that
assimilates groundwater head and stream discharge observations in an
integrated hydrological model for joint state updating and parameter
estimation. We discuss the challenges associated with observational bias in
hydrological data assimilation for both state updating and parameter
estimation. Two existing methods of estimating observation bias, the SepKF and the augmented state vector approach, are tested and
the results compared. The novelty of the study lies in the focus on data
assimilation bias estimation in a complex, integrated hydrological model as
well as the impact of bias on parameter estimation, both in a synthetic test and
with real-world observations. While each of these aspects has previously
been studied individually, the combination of the aspects creates new
challenges, which require particular attention. This paper shares several
similarities with the preceding

This study uses a transient, spatially distributed hydrological model based
on the MIKE SHE code

An integrated model, which includes groundwater flow, vadose zone flow,
evapotranspiration, surface flow and streamflow, is used in this study. Vertical
groundwater flow components are neglected in the study and groundwater flow
is simulated based on the 2-D Boussinesq equation. Each numerical element of
the groundwater flow model is coupled to a one-dimensional (1-D) model for vertical
flow in the vadose zone. For numerical and computational convenience
capillary forces are neglected and only gravity-driven flow is considered,
which is an option in the MIKE SHE code

A horizontal grid size of 1 km

This study is based on the Karup catchment (Fig.

The Karup catchment with locations of discharge and hydraulic head observations.

The geological model used in this study is a 3-D model, which contains one
dominant geological unit (meltwater sand) and five lenses (clay, quartz sand,
mica clay/silt and limestone), each with assigned parameters of hydraulic
conductivity, specific yield and specific storage. The geological model is in
a preprocessing step converted into a 2-D model by interpolating the parameter
values and gridding them to the computational grid, resulting in a spatially
variable field of hydraulic conductivity. The parameter values of the stream
model are assumed uniform throughout the model domain. The drain level and
drain time constant parameters control the amount of groundwater drained to
the nearest stream once the groundwater table exceeds the drain level, and
are as such linking the groundwater module and the streamflow module of the
model. This models the artificial drain systems installed under most
farmlands as well as the natural drainage processes that often occur in the
topsoil, and the parameters are therefore particularly important for the
drain flow of the river. The leakage coefficient is another coupling
parameter, which represents the hydraulic properties of the thin layer of the
sediments at the bottom of the stream. This parameter is of particular
importance with regard to river base flow. For more details of the model
parameterization, reference is made to

The algorithm used for assimilating data in this study is the ETKF

An

For each state variable, the ensemble is split into two sub-ensembles of
equal size. The sample correlation between the state variable and each
observation state variable is calculated for both sub-ensembles. These
correlation coefficients are then combined using the following expression:

Another localization weight,

Parameters are in this study estimated sequentially using the augmented state
vector approach

In order to compensate for the systematic underestimation of error variance
that is endemic to ensemble-based Kalman filtering, covariance inflation

The ensemble of parameter values is also inflated using Eq. (

Using covariance inflation is, like using localization, inconsistent with the derivation of the filter and only necessary due to inadequate or incorrect noise description and ensemble generation. However, due to the complex nature of the model, generating an ensemble that perfectly represents the uncertainty of the model is difficult and, particularly in the test using real data, outside the scope of this paper.
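The inflation equation itself is not reproduced in this text. As a minimal sketch of the general idea, assuming a simple multiplicative inflation with a constant factor (which may differ from the scheme actually used), each ensemble member can be pushed away from the ensemble mean. The function name, the factor of 1.1 and the head values are hypothetical.

```python
from statistics import mean, pstdev


def inflate(ensemble, factor):
    """Scale each member's deviation from the ensemble mean by `factor`.

    Assumption: plain multiplicative inflation with a constant factor.
    The mean is preserved and the spread grows by `factor`.
    """
    m = mean(ensemble)
    return [m + factor * (x - m) for x in ensemble]


# Hypothetical ensemble of groundwater heads (m) at one grid cell.
heads = [10.0, 10.2, 9.9, 10.1, 9.8]
inflated = inflate(heads, factor=1.1)  # illustrative factor, not from the study
```

The same function could be applied to an ensemble of parameter values, since it only acts on deviations from the mean.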

A simple damping mechanism is implemented in the modelling framework to
reduce the magnitude of the state- and parameter updates and thereby reduce
the shock introduced to the system in the form of instantaneous changes of
model states and parameter values at the time of updating. Furthermore,
damping has the same effect as inflation, as it helps maintain an ensemble
spread and thus combats the tendency for the ensemble to collapse. Damping of
parameter updates is common, and has been studied in

Damping is pragmatically applied post-updating as follows. For each ensemble
member, the post-damping and final state vector is calculated as
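A common way to express such damping, assumed here for illustration, is to move each ensemble member only a fraction of the way from its forecast to its undamped analysis. In the sketch below the name `alpha` and the example values are hypothetical; `alpha = 1` leaves the update untouched and `alpha = 0.1` keeps one tenth of it.

```python
def damp_update(forecast, analysis, alpha):
    """Return forecast + alpha * (analysis - forecast), element by element.

    Assumption: a single scalar damping factor in [0, 1] is applied to the
    whole vector of one ensemble member.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [f + alpha * (a - f) for f, a in zip(forecast, analysis)]


# Hypothetical parameter values of one ensemble member before and after updating.
forecast = [2.0, 5.0]
analysis = [3.0, 4.0]
damped = damp_update(forecast, analysis, alpha=0.1)
```

Different factors can be used for states and parameters by calling the function separately on the two parts of the augmented vector.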

This study compares two different methods for estimating observation bias: the coloured noise Kalman filter (ColKF) and the SepKF.

The ColKF methodology for estimating bias follows that of

This study assumes no bias in discharge observations, meaning that the only biased observations are the groundwater head observations. In real-world observations, discharge observations would usually also be biased, but this bias is generally small compared to the random error of the observations and compared to biases in groundwater head observations.

The method requires an initial bias estimate based on a priori information. Furthermore, as with estimation of parameters, a spread in bias estimates needs to be generated. In this study, the initial estimate of bias in all observation points is generated by sampling from a normal distribution with a standard deviation of 0.6 m and a mean of 0. The standard deviation was chosen based on preliminary testing in the synthetic test environment, which showed that this value generally led to the best estimates of bias.
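The initial sampling described above can be sketched as follows. The function name, the fixed seed and the ensemble dimensions in the usage line are assumptions made for illustration; only the zero mean and the 0.6 m standard deviation come from the text.

```python
import random


def initial_bias_ensemble(n_members, n_obs, std=0.6, seed=0):
    """Draw one initial bias estimate (m) per ensemble member and observation point.

    Each value is sampled independently from N(0, std**2).
    The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(n_obs)] for _ in range(n_members)]


# Illustrative dimensions: 50 members and 24 assimilated head locations.
bias_ensemble = initial_bias_ensemble(n_members=50, n_obs=24)
```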

The implementation of the SepKF in this study is similar to the one derived
and presented in

The bias error covariance is estimated as being proportional to the ensemble
model observation forecast error covariance,

The Kalman gain for the bias filter is then calculated as

The bias Kalman gain is localized as follows:

Finally, the updated states are calculated using the following modification
of Eq. (

The augmented state method has the advantage that it can take any interaction between the bias and the states into account, as the full forecast covariance matrix is used. On the other hand, the SepKF filter ignores any cross-correlation between bias and states.
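To make the separate-filter idea concrete, the sketch below shows one scalar update in which the bias is estimated first and the state is then updated with the bias-corrected observation. It is an illustration under stated assumptions, not the equations of this study: the observation operator is taken as 1, the bias variance is set to a hypothetical factor `gamma` times the forecast variance, and the sign convention is that the observation equals the true state plus the bias plus random noise.

```python
def separate_bias_update(x_f, p_f, y, r, b_f, gamma):
    """One scalar bias-aware update with separate bias and state filters.

    x_f, p_f : forecast head and its error variance (observation operator = 1)
    y, r     : biased observation and the variance of its random error
    b_f      : forecast of the observation bias
    gamma    : hypothetical proportionality factor, bias variance = gamma * p_f
    """
    p_b = gamma * p_f                   # bias error variance
    innovation = y - b_f - x_f          # bias-corrected innovation
    k_b = p_b / (p_f + p_b + r)         # gain of the bias filter
    b_a = b_f + k_b * innovation        # updated bias estimate
    k_x = p_f / (p_f + r)               # gain of the state filter
    x_a = x_f + k_x * (y - b_a - x_f)   # state update using the new bias
    # No cross-covariance between bias and state enters either gain.
    return x_a, b_a


x_a, b_a = separate_bias_update(x_f=10.0, p_f=0.25, y=11.0, r=0.01, b_f=0.0, gamma=0.5)
# For comparison: a bias-unaware update pulls the state almost all the way to y.
x_unaware = 10.0 + 0.25 / (0.25 + 0.01) * (11.0 - 10.0)
```

In an augmented state filter, the bias would instead be appended to the state vector and both would share a single gain built from the full ensemble covariance.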

While ignoring the correlation between state error and bias error may be
problematic where such correlation exists, the price of using the augmented
state method is the increase in the state space that needs to be spanned by
the ensemble. To describe the uncertainty of the augmented state, an
(

Due to the differences in frequency between the two observation types, this
study uses asynchronous assimilation

Similarly, the observation vector is extended to correspond to the ensemble observations. While the asynchronous observations and model observations are saved and used in the filter at the time of updating, they are afterwards discarded and no retrospective updating of states is performed.

In this study, the state vector contains groundwater head, stream discharge and stream water level, all of which are updated at each updating time step. The states are updated every 4 weeks, when groundwater head observations are available. The daily discharge observations available in between updates are included as asynchronous observations while the discharge observations available at the time of updating are assimilated normally.
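The bookkeeping behind this can be sketched as follows: for one ensemble member, the discharge values simulated on each day of the window are stored next to the corresponding observations and stacked with the head values at the update time. The function name, the data layout and the numbers in the usage lines are assumptions made for illustration.

```python
def build_window_vectors(daily_q_obs, daily_q_model, head_obs, head_model):
    """Stack all observations of one assimilation window into single vectors.

    daily_q_obs, daily_q_model : one list of gauge values per day in the window
    head_obs, head_model       : head values at the update time
    Returns the observation vector and the matching model-observation vector
    of one ensemble member.
    """
    obs, model = [], []
    for q_o, q_m in zip(daily_q_obs, daily_q_model):
        obs.extend(q_o)    # discharge observed on that day
        model.extend(q_m)  # discharge this member simulated on that day
    obs.extend(head_obs)
    model.extend(head_model)
    return obs, model


# Hypothetical window: 28 days, 4 discharge gauges, 24 head locations.
q_obs = [[1.0, 2.0, 3.0, 4.0] for _ in range(28)]
q_mod = [[1.1, 1.9, 3.2, 3.8] for _ in range(28)]
obs_vec, mod_vec = build_window_vectors(q_obs, q_mod, [20.0] * 24, [20.5] * 24)
```

Once the update has been computed, the stored daily values can be discarded, in line with the absence of retrospective state updating.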

The horizontal hydraulic conductivities of meltwater sand (HK_mws) and
quartz sand (HK_qs) are estimated, with the vertical conductivities tied
to them at a ratio of 10 : 1. Note that the estimated hydraulic
conductivities are those of the geological units, which are gridded to the
computational grid before further propagation of the ensemble (see
Sect.

In order to evaluate the performance of the data assimilation algorithm for
parameter estimation using real observations, the model is also calibrated
using AutoCal. This allows the parameter estimation through data assimilation
to be compared with parameter estimation through a more common method,
such as inverse modelling. A
multi-objective calibration approach is used, in which both groundwater head
observations and stream discharge observations are aggregated and optimized.
The set-up of parameters is similar to the one used in the data assimilation
approach (see Sect.

Root-mean-square error is used as objective function of both groundwater head
observations and stream discharge observations, and the two are aggregated
using transformation to a common distance scale

Between 1970 and 1990, the Karup catchment was the subject of an extensive
monitoring campaign in which stream discharge and groundwater head were
rigorously measured. As a result, groundwater head observations are available
in 35 locations (Fig.

A twin test approach is used in the first part of this study, meaning that a
“true” model is defined, and that the observations to be assimilated are
generated from the results of this true model. The same model, but with
perturbed parameter values, denoted the base model, forms the basis of the
ensemble that is used for data assimilation. Note that both the true model
and the base model are deterministic models, that is, single, propagated
models without any noise added. The set-up is identical to that of

Four stream discharge observations that coincide with the locations of real
observations are included. The discharge observations are made available on a
daily basis, and are perturbed with normally distributed white noise that is
proportional to the observed value, using a standard deviation of 5 % of the
observed discharge, which is a common error level in real-world
observations of discharge

The states and parameters are updated every time groundwater head observations are available, i.e. every 28 days, and the daily discharge observations available in between updates are assimilated asynchronously. Tests have shown that the length of the assimilation window is of little importance and therefore no other assimilation window was tested.

As in the synthetic test, the same 24 groundwater head observation
locations are chosen for assimilation, while the remaining locations are used
for validation. The real groundwater head observations are available with a
frequency of 14 days

Overview of set-ups studied in the synthetic tests.

Model noise is added to the ensemble through the forcings, i.e. precipitation and reference evapotranspiration, and the parameters. Noise on forcings is added as a Gaussian noise with a standard deviation of 20 % of the observed value, while no spatial correlation of the noise is considered.
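A minimal sketch of such a forcing perturbation is given below. The 20 % relative standard deviation and the absence of spatial correlation follow the text. The function name, the seed and the clipping of negative values to zero are assumptions added for illustration.

```python
import random


def perturb_forcing(values, rel_std=0.20, rng=None):
    """Add independent Gaussian noise with std = rel_std * value to each entry.

    Assumption: negative perturbed values are clipped to zero, since
    precipitation and reference evapotranspiration cannot be negative.
    """
    rng = rng or random.Random()
    perturbed = []
    for v in values:
        noisy = v + rng.gauss(0.0, rel_std * v)
        perturbed.append(max(noisy, 0.0))
    return perturbed


# Hypothetical daily precipitation (mm) for one ensemble member.
precipitation = [0.0, 4.2, 12.5, 0.0, 1.3]
member_forcing = perturb_forcing(precipitation, rng=random.Random(42))
```

Because the noise is proportional to the observed value, dry days stay dry in every ensemble member.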

Noise is added in the form of a Gaussian zero mean distribution to a large number of model parameters relating to all model processes and not just to the estimated parameters. In total, noise is added to 66 parameters, only five of which are estimated. Adding noise to parameters that are not estimated helps maintain the spread of the ensemble even as the spread of the estimated parameters is reduced. Note that the zero mean of parameter noise means that if the filter successfully estimates all of the five included parameters, the ensemble of models is unbiased except for any bias that may have been introduced through the sampling of parameter noise and forcing noise.

For studying the performance of the data assimilation using synthetic
observations, the study includes the seven scenarios listed in
Table

When assimilating real observations, three scenarios are studied:
ColFil, SepFil and NoBiasEst (Table

Scenarios studied in the real data tests.

The model simulation period is from 1 January 1969 to 31 December 1974,
and is divided into the following periods:

1969: warm-up, in which the ensemble is propagated without being updated in order to allow a spread in the ensemble of states to develop. At the end of the year 1969, the spread of the ensemble of groundwater head is between 0.7 and 2.1 m (depending on the location in the catchment), which is considered sufficient for assimilation to commence.

1970: preliminary assimilation of observations, which allows the filter to constrain the states and parameters. The results of this period are not included in the performance evaluation.

1971–1972: assimilation of observations for evaluation. The results of this period are included in the performance evaluation as an indicator for how well the filter performs. In the remainder of the paper, this period is referred to as the “assimilation period”.

1973–1974: validation period, in which the ensemble is propagated but not updated. It is used to assess the improvement in long-term forecasting due to the filter update.
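The division into periods can be written down as a small configuration, sketched below. The names `PHASES`, `phase_of`, `filter_updates_enabled` and `counts_towards_evaluation` are hypothetical and only encode the four periods listed above.

```python
from datetime import date

# (first year, last year, name of the period)
PHASES = [
    (1969, 1969, "warm-up"),
    (1970, 1970, "preliminary assimilation"),
    (1971, 1972, "assimilation"),
    (1973, 1974, "validation"),
]


def phase_of(day):
    """Return the name of the period that contains the given date."""
    for first, last, name in PHASES:
        if first <= day.year <= last:
            return name
    raise ValueError("date lies outside the simulation period")


def filter_updates_enabled(day):
    """The ensemble is only updated in the two assimilation periods."""
    return phase_of(day) in ("preliminary assimilation", "assimilation")


def counts_towards_evaluation(day):
    """Only 1971-1972 enters the assimilation-period indicators."""
    return phase_of(day) == "assimilation"
```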

The performance of the filter when using synthetic observations is measured
using three indicators:

the mean estimated bias error (mean bias error), calculated as the average difference (in all observation points) between the actual bias used to generate the biased observation and the mean of the ensemble of estimated biases at the end of the assimilation period;

the average root-mean-square error of the groundwater head (head RMSE) in all calculation points of the groundwater model domain for the assimilation period;

the Nash–Sutcliffe coefficient of the stream discharge at the outlet of the catchment (“NS”) for the assimilation period.
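The three indicators can be sketched in a few lines. RMSE and the Nash–Sutcliffe coefficient follow their standard definitions. For the mean bias error, the text does not say whether the difference is signed, so the sketch assumes absolute differences. Function names and example values are hypothetical.

```python
from math import sqrt


def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    n = len(observed)
    return sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)


def nash_sutcliffe(simulated, observed):
    """Nash-Sutcliffe coefficient: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def mean_bias_error(true_bias, bias_ensembles):
    """Average over observation points of |true bias - ensemble mean estimate|.

    Assumption: absolute differences are averaged.
    bias_ensembles[i] holds the final ensemble of estimates for point i.
    """
    errors = [
        abs(b - sum(ens) / len(ens)) for b, ens in zip(true_bias, bias_ensembles)
    ]
    return sum(errors) / len(errors)


observed_q = [1.0, 2.0, 3.0]
ns_mean_model = nash_sutcliffe([2.0, 2.0, 2.0], observed_q)
bias_error = mean_bias_error([0.5, -0.2], [[0.4, 0.6], [0.0, 0.2]])
```

A model that always returns the observed mean gives a coefficient of zero, which is why values above zero are read as an improvement over that trivial predictor.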

The performance of the filter when using real observations is measured using
two indicators:

the mean RMSE of all 35 groundwater head observation points for

the assimilation period

the validation period.

the Nash–Sutcliffe coefficient for stream discharge in the outlet of
the catchment for

the assimilation period

the validation period.

Furthermore, a deterministic model with the optimal parameter set (as determined by the data assimilation algorithm) is used to evaluate the estimated parameters. This model is designated “optimal model” and is evaluated using the above indicators. For comparison, the results of the model optimized using AutoCal are included (hereafter designated “AutoCal model”).

The filter set-up that is considered the baseline set-up is ColFilEns50 in
which the ensemble size is 50 and the parameter updates are dampened by a
factor of 0.1, while no damping of the state updating is performed. The
baseline set-up is adopted from

The ColFilEns50 performed poorly in all three performance indicators as seen
in Fig.

Mean bias error, NS and

The poor performance of the ColFilEns50 is unexpected, as an almost identical
set-up was successfully used in

Doubling or quadrupling the ensemble size to 100 and 200 (ColFilEns100 and
ColFilEns200 scenarios, respectively) resulted in major improvements in
almost all indicators (Fig.

The temporal variation of Head RMSE in the synthetic test.

The increased performance, and the reduction in the spikes in head RMSE, supports the hypothesis that the poor performance of the ColFilEns50 set-up is caused primarily by spurious correlation.
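The dependence of spurious correlation on ensemble size can be illustrated with a small experiment that is independent of the hydrological model. Two variables are drawn independently, so their true correlation is zero, and the average magnitude of the sample correlation is computed for different ensemble sizes. The function names, the number of trials and the seed are assumptions made for illustration.

```python
import random
from math import sqrt


def sample_correlation(x, y):
    """Pearson sample correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_spurious_correlation(n_members, n_trials=500, seed=1):
    """Average |correlation| between two independent Gaussian variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        total += abs(sample_correlation(x, y))
    return total / n_trials


spurious_50 = mean_spurious_correlation(50)
spurious_200 = mean_spurious_correlation(200)
```

The sampling error of a correlation coefficient shrinks roughly with the square root of the ensemble size, so quadrupling the ensemble is expected to about halve the typical spurious correlation.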

Dampening the update of groundwater head (ColFilHdamp scenario) had a
profound effect on all the performance indicators (Fig.

Dampening reduces the instant change in groundwater head, and as such reduces
the problems that arise due to the non-linear relationship between states as
well as reducing spurious correlation. Furthermore, it reduces the numerical
effects that come from changing model states and parameters, in which the
model attempts to regain equilibrium. However, dampening the state updates
causes a slower reduction in head RMSE (Fig.

Using the SepKF (scenario SepFil) resulted in significant improvements in
all performance indicators compared to the ColKF
set-up with the same number of ensemble members (ColFilEns50)
(Fig.

Groundwater head as a function of time in four selected observation locations for the year 1972 (synthetic test).

When excluding the discharge observations (scenario SepFilNoQ), the filter
performs worse in all three indicators. Compared to the SepFil scenario, both
the mean bias error and the head RMSE are increased by 58 %, and the NS is
reduced to

Spread of estimated parameters at the final update (synthetic test). Thin blue lines show the total spread of the ensemble and thick blue lines show the 25th and 75th percentile. Dots show the mean of the ensemble. The horizontal lines show the true parameter value (black line) and the base parameter value (magenta line).

Excluding bias estimation from the filter (NoBiasEst scenario) results, as
expected, in significant reductions in filter performance
(Fig.

Model observations versus synthetic observations in selected observation locations. The dashed line indicates the 1 : 1 line when corrected for the applied bias. Note that the plotted model observations are the forecasted model observations, i.e. before the states are updated in the filter.

It is clear that omitting bias estimation when biases are present has a
negative impact on both state updating and parameter estimation. It is
observed that updating the groundwater head to a biased observation level
causes the head to return to an unbiased level when model propagation is
resumed (i.e. it is drifting as seen in Fig.

The time-varying estimated biases using the ColKF and the SepKF for each
observation location are shown in Fig.

Estimated bias in the ColFilEns200 and SepFil scenarios as a function of time in the synthetic tests, compared to the true bias value used to generate the biased observations.

Figure

The improvements gained from using the SepKF filter rather than the ColKF stem from the reduction in uncertainty needed to be described by the ensemble, and thus a smaller ensemble size is required. Ignoring the correlation between the bias and the state reduces the complexity of the system, and if that correlation is negligible, as in this case, there is little advantage in using the ColKF over the SepKF.

The two bias correction methods were also compared in

The Nash–Sutcliffe coefficient for stream discharge and the mean RMSE of
groundwater head can be seen in Fig.

Nash–Sutcliffe coefficient for stream discharge (left panel) and mean RMSE of groundwater head observations (right panel) in the assimilation and validation periods, respectively (real data).

In the NoBiasEst scenario, the model states are forced to match the
observations as any bias is ignored, which results in a lower mean head RMSE
in both the assimilation and the validation period (Fig.

Groundwater head as a function of time in head observation location well 64 (real data).

The ColFil scenario results in higher mean head RMSE and slightly lower Nash–Sutcliffe coefficient than the SepFil, but the ColFil optimal model (i.e. the deterministic model using the parameter set estimated by the filter) performs better than the SepFil optimal model with respect to most indicators.

The ColFil scenario estimates significantly larger biases in most observation
points (Fig.

Estimated bias in the ColFil and SepFil scenarios as a function of time (real data). The black line indicates zero bias.

A bias of approximately zero is estimated in seven observation locations,
while biases of up to 1.8 m are estimated in others. In most locations,
however, the bias appears underestimated, as exemplified by
Fig.

Spread of estimated parameters at the final update (real data). Thin blue lines show the total spread of the ensemble and thick blue lines show the 25th and 75th percentile. Dots show the mean of the ensemble. The horizontal lines show the AutoCal parameter value (black line) and the base parameter value (magenta line).

Comparing the optimal models of the ColFil, the SepFil and the NoBiasEst scenarios with the base model and the AutoCal model reveals a clear difference between the assimilation period and the validation period. While the optimal models produce lower NS for the assimilation period than both the base model and the AutoCal model, there is a clear improvement in the NS in the validation period over both the AutoCal model and the base model. This suggests that AutoCal has produced a biased parameter set, which is not the case using any of the three Kalman filters. However, the value of bias correction for parameter estimation is unclear, as there is no significant difference in the validation NS of the bias-aware Kalman filters and the bias-unaware Kalman filter.

This tendency is not present in head RMSE, where the optimal models perform more poorly in terms of head RMSE than the base model and the AutoCal model. While it is to be expected that the AutoCal model would produce lower head RMSE than both the ColKF and the SepKF since the AutoCal model has been optimized specifically based on the head RMSE, it was expected that the optimal models of the ColKF and SepKF would produce improvements over the base model. However, it should be noted that the evaluation of model performance is based on the possibly biased observed values, and that the estimated biases have not been taken into account in the head RMSE calculations. The lack of clear improvement in the optimal models may be explained by the fact that there is little room for improvement with the current model structure as underlined by the relatively small improvements between the AutoCal model and the base model. It may also in part be explained by the underestimation of the biases in both the ColFil and SepFil scenarios. Improving the model structure and the filter set-ups may improve the potential of estimating parameters, but with the current results the value of data assimilation for parameter estimation is not clear.

Observation bias is a notable challenge in integrated hydrological modelling and needs to be addressed when applying data assimilation to the models. Updating the states of a model to match strongly biased observations will decrease filter performance and may even cause numerical instability. The two methods for correcting observation bias presented in this study can help reduce the bias issue in data assimilation and improve filter performance. Both methods improved the groundwater head and stream discharge of the model, and with varying degrees of success estimated the observation bias when using synthetic observations. When using real observations, both bias estimation methods resulted in improved streamflow modelling, but little improvement was seen in groundwater heads.

The main difference in the bias correction methods analysed is the interaction between the bias and the states. While the ColKF takes advantage of the full covariance matrix, the SepKF only takes into account the interaction that is present from the state to the bias and not the other way around. While this is a limitation of the SepKF, it results in a lower requirement for ensemble members, meaning that for smaller ensembles, the SepKF outperforms the ColKF. To obtain similar results to those of the SepKF when using the ColKF, the ensemble size needed to be doubled or even quadrupled, or the updates of the states needed to be dampened in an attempt to reduce the spurious correlations.

Most of the model parameters were successfully estimated in the synthetic tests, but biased observations introduce issues with equifinality. A biased parameter set may produce unbiased model behaviour (i.e. without drifting) in one or more observations even if the estimated bias is incorrect. As a result, the filter does not update the bias of the observation, and the erroneous parameter set is not corrected. This resulted in significantly different parameter sets estimated by the different filters for both the synthetic tests and the tests using real data.

The study has shown that hydrological observational bias can be corrected in a data assimilation scheme and that it can improve state updating and parameter estimation. With both model bias and observational bias being significant sources of error in hydrological modelling that may function as a road block for the application of data assimilation to hydrological models, these results may act as a stepping stone for the advancement of hydrological data assimilation in large-scale, integrated hydrological models.