Regional groundwater flow strongly depends on groundwater recharge and hydraulic conductivity. Both are spatially variable fields, and their estimation is an ongoing topic in groundwater research and practice. In this study, we use the ensemble Kalman filter as an inversion method to jointly estimate spatially variable recharge and conductivity fields from head observations. The success of the approach strongly depends on the assumed prior knowledge. If the structural assumptions underlying the initial ensemble of the parameter fields are correct, both estimated fields resemble the true ones. However, erroneous prior knowledge may not be corrected by the head data. In the worst case, the estimated recharge field resembles the true conductivity field, resulting in a model that meets the observations but has very poor predictive power. The study exemplifies the importance of prior knowledge in the joint estimation of parameters from ambiguous measurements.

Regional groundwater flow depends on spatially variable properties of the
subsurface, notably the hydraulic conductivity field, and boundary conditions
such as groundwater recharge. In practical groundwater-modeling applications,
parameters of both aquifer properties and boundary conditions are estimated
from measurements of hydraulic heads at a limited number of observation
locations

In engineering practice, the model domain is typically subdivided into a
small number of zones with given geometry, and uniform values of the material
properties are assigned to each zone. Likewise, the land surface is
subdivided into zones with uniform recharge values, reflecting land use, soil
types, and local climate variability. As an alternative, parameter values may
be estimated at a limited number of points and interpolated in between

The estimation of hydraulic conductivity as a continuous field has been
intensively investigated in the past (see the reviews of

In geostatistical inversion, the parameter field to be estimated is assumed
to be an autocorrelated random space function. This prior knowledge is used
in Bayesian inference, where the statistical distribution of the parameters
is conditioned on the measurements of dependent quantities, such as hydraulic
heads. A variety of schemes target a single smooth spatial distribution
approximating the conditional mean of the parameter field using Gauss–Newton-
or conjugate-gradient-type estimation schemes

In groundwater hydrology, sequential data assimilation and Kalman filter
methods have long been used

An important step in setting up an EnKF to estimate parameters is the choice
of the initial ensemble. This choice is the most straightforward way of allowing
prior information – such as ideas about correlation lengths, mean values, or
spatial patterns – to influence the filter process. From a technical point of
view, the issue of initial sampling is how to represent the prior knowledge
in an ensemble that is as small as possible, by, for example, adding ensemble
subspace restriction and requirements on the sampling

In this work we study the impact of prior knowledge when jointly
estimating conductivity and recharge from hydraulic-head data only. We use an EnKF setup in which the
initial ensemble is drawn using different assumptions about the spatial pattern
of the parameters. Section

In regional-scale groundwater-flow problems, we typically rely on the
validity of the Dupuit assumption, stating that variations in hydraulic head
and groundwater velocity are restricted to the horizontal directions. Under
this condition, the depth-averaged, two-dimensional groundwater-flow equation
for a phreatic aquifer reads as

The term

Applying the chain rule of differentiation to the divergence in
Eq. (

The derivation given above exemplifies that the same hydraulic-head field can be obtained with different hydraulic-conductivity fields by modifying recharge and, in the case of transient flow, the specific yield. It is noteworthy that the apparent recharge depends on the gradient of the original transmissivity field. Hence, a large – positive or negative – apparent recharge is expected at locations where the transmissivity changes drastically. Though we have shown that modifications of recharge and specific yield can always replace the conductivity, the opposite case is not guaranteed, because the conductivity has clear physical limitations: notably it cannot be negative.

The fact that conductivity variations can be traded for variations in recharge and specific yield renders the joint estimation of hydraulic conductivity and recharge (and specific yield) an inherently ill-posed problem, even when the hydraulic-head field is known at every point in the domain (and at every point in time).
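
The trade-off between conductivity and recharge can be checked numerically. The following is a minimal sketch, not the model used in this study: it assumes one-dimensional, steady-state flow with a head-independent transmissivity (a confined-type simplification rather than the phreatic case), fixed heads of zero at both ends, and purely hypothetical values for grid spacing, transmissivity, and recharge. The helper `assemble` and all variable names are illustrative.

```python
import numpy as np

def assemble(T_face, dx):
    """Finite-volume matrix of -(T h')' on the interior nodes.

    T_face holds the transmissivity on the n + 1 cell faces surrounding
    n interior nodes; the heads at both domain ends are fixed to zero.
    """
    n = T_face.size - 1
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = (T_face[i] + T_face[i + 1]) / dx**2
        if i > 0:
            A[i, i - 1] = -T_face[i] / dx**2
        if i < n - 1:
            A[i, i + 1] = -T_face[i + 1] / dx**2
    return A

# Hypothetical setup: 99 interior nodes, 10 m spacing
n, dx = 99, 10.0
T_true = np.full(n + 1, 1e-2)   # transmissivity [m^2/s]
T_true[40:60] = 1e-3            # low-transmissivity inclusion
R_true = np.full(n, 1e-8)       # uniform recharge [m/s]

# Heads of the heterogeneous model with uniform recharge
h = np.linalg.solve(assemble(T_true, dx), R_true)

# Homogeneous surrogate: apply the homogeneous operator to the same heads
# to obtain the apparent recharge that reproduces them
A_hom = assemble(np.full(n + 1, 1e-2), dx)
R_app = A_hom @ h
h_hom = np.linalg.solve(A_hom, R_app)

print("max head difference:", np.abs(h - h_hom).max())
print("apparent recharge range:", R_app.min(), R_app.max())
```

In this sketch the heads of the two models agree to numerical precision, while the apparent recharge is about ten times the true recharge inside the inclusion and becomes strongly negative at its edges, where the transmissivity changes abruptly.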

We may illustrate the problem by the example of an unconfined aquifer at
steady state, shown in Fig.

If the inclusion is removed, and the recharge remains the same, the system
shows a perfectly homogeneous behavior (middle column of Fig.

Illustrative example of replacing a heterogeneous conductivity field (left column panels) with a homogeneous conductivity and an effective recharge (right column panels). Please note the different scale on the third recharge plot.

In classical model calibration, the ambiguity between transmissivity and
groundwater recharge may cause problems of ill-posedness, but assuming
known zones with block-wise uniform parameter values restricts the
solution of the inverse problem. As an example, the strong positive and negative
recharge values of the surrogate model in Fig.

In the following we briefly review the basic assumptions underlying the
derivation of the EnKF within a Bayesian framework. While it is
possible to take a much more pragmatic view of the EnKF as an extended
least-squares estimator, we believe that the transparency of the Bayesian
framework with respect to the underlying assumptions is beneficial. In
particular, the Bayesian framework explains the choice of the initial
ensemble as prior knowledge and the conceptual importance of the prior
knowledge in the estimation procedure, whereas a strictly frequentist point
of view is at odds with using prior knowledge at all. For further
transparency, we first explain the extended Kalman filter (see similar
derivations by

We denote the vector of all parameters (recharge values and log-hydraulic
conductivities of all cells)

The vector of simulated hydraulic heads

For convenience, we denote running the model and simulating the observations
(which is here just picking the heads at the observation locations) as

Since we assume multi-Gaussian distributions, finding the best conditional
estimate

By applying rules of matrix identities, it can be shown that linearization
about the prior mean

The scheme described so far is known as extended Kalman filter. It relies on
linearization about the prior mean and has the disadvantages that the full
sensitivity matrix

A popular alternative to the original Kalman filter is the
EnKF
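
As an illustration of an ensemble-based update, the sketch below implements one analysis step in which all covariances are estimated from the ensemble. It assumes the perturbed-observation variant of the filter and uncorrelated measurement errors with a single standard deviation; the function name `enkf_update`, its arguments, and the toy problem are hypothetical and are not taken from the implementation used in this study.

```python
import numpy as np

def enkf_update(params, sim_obs, obs, obs_std, rng):
    """One ensemble Kalman update of a parameter ensemble.

    params  : (n_par, n_ens) parameters, e.g. recharge and log-conductivity stacked
    sim_obs : (n_obs, n_ens) simulated observations of each ensemble member
    obs     : (n_obs,) measured values
    obs_std : standard deviation of the measurement error (assumed uncorrelated)
    """
    n_ens = params.shape[1]
    dP = params - params.mean(axis=1, keepdims=True)
    dY = sim_obs - sim_obs.mean(axis=1, keepdims=True)
    C_py = dP @ dY.T / (n_ens - 1)      # parameter-observation cross-covariance
    C_yy = dY @ dY.T / (n_ens - 1)      # covariance of simulated observations
    R = obs_std**2 * np.eye(obs.size)   # measurement-error covariance
    # Each member is compared with its own noisy copy of the measurements
    perturbed = obs[:, None] + obs_std * rng.standard_normal(sim_obs.shape)
    innovation = perturbed - sim_obs
    return params + C_py @ np.linalg.solve(C_yy + R, innovation)

# Toy usage: three parameters that are observed directly
rng = np.random.default_rng(0)
truth = np.array([1.0, -2.0, 0.5])
prior = rng.standard_normal((3, 2000))   # prior ensemble with zero mean, unit variance
sim_obs = prior.copy()                   # toy "model": observation equals parameter
posterior = enkf_update(prior, sim_obs, truth, 0.05, rng)
print(posterior.mean(axis=1))
```

No sensitivity matrix appears in this sketch; the cross-covariance between parameters and simulated observations is computed directly from the ensemble anomalies.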

It should be noted that the ensemble Kalman filter still relies on the same
assumptions as the original Kalman filter. Notably, the combined vector of
states, parameters, and observations is assumed to be a multi-Gaussian random
variable, which means that

Pumping rates and general model setup

An important constraint is that the scheme, like any other Bayesian method,
depends on the choice of the unconditional mean and covariance structure of
the parameters

Setup of the synthetic test case used for the parameter field estimations.

For testing the possibilities and limitations in jointly estimating
conductivity and recharge, we have set up a synthetic 2-D example of
transient flow in an unconfined aquifer. The model setup is shown in
Fig.

Parameters and properties used for the generation of the synthetic
conductivity and recharge fields

For the estimation of the recharge and conductivity fields, we apply the
ensemble Kalman filter using an ensemble of 2000 members. As this work aims
at exploring which prior knowledge is required for the estimation process,
three different cases of prior knowledge are considered. In the first, the
initial ensemble members are drawn from the same (hence correct) distribution
as the reference (true) field. The second case is identical to the first
apart from the rotation angle of the anisotropy being randomly chosen for
each ensemble member. In the third case, the rotation angle is fixed but
wrong. Here, the recharge is sampled using the rotation angle and correlation
lengths of the true conductivity field and vice versa, creating a rather
problematic initial ensemble. A plot of the three correlation structures can
be found at the bottom of Fig.
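
The three kinds of initial ensemble can be sketched as follows for the log-conductivity field; a recharge ensemble would be drawn analogously. This is only an illustration under stated assumptions: an exponential covariance model, Cholesky-based sampling on a small grid, random angles drawn between 0 and 2π, and hypothetical values for means, variances, correlation lengths, and angles. None of these names or numbers are taken from the test case of this study.

```python
import numpy as np

def anisotropic_cov(coords, variance, len_major, len_minor, angle):
    """Exponential covariance with principal axes rotated by `angle` (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, s], [-s, c]])               # maps lags onto principal axes
    lag = coords[:, None, :] - coords[None, :, :]   # pairwise lag vectors, (n, n, 2)
    lag_rot = lag @ rot.T
    dist = np.sqrt((lag_rot[..., 0] / len_major) ** 2
                   + (lag_rot[..., 1] / len_minor) ** 2)
    return variance * np.exp(-dist)

def draw_ensemble(coords, mean, variance, len_major, len_minor, angles, rng):
    """Draw one Gaussian random field per entry of `angles`."""
    n = coords.shape[0]
    fields = np.empty((n, len(angles)))
    for j, angle in enumerate(angles):
        cov = anisotropic_cov(coords, variance, len_major, len_minor, angle)
        # Small diagonal jitter keeps the factorization numerically stable
        chol = np.linalg.cholesky(cov + 1e-8 * variance * np.eye(n))
        fields[:, j] = mean + chol @ rng.standard_normal(n)
    return fields

rng = np.random.default_rng(1)
x, y = np.meshgrid(np.arange(15) * 100.0, np.arange(15) * 100.0)
coords = np.column_stack([x.ravel(), y.ravel()])
n_ens = 10

# Hypothetical "true" structures of log-conductivity (K) and recharge (R)
angle_K, lengths_K = np.deg2rad(30.0), (800.0, 200.0)
angle_R, lengths_R = np.deg2rad(120.0), (500.0, 300.0)

# Case 1: correct structure
lnK_correct = draw_ensemble(coords, -7.0, 1.0, *lengths_K,
                            np.full(n_ens, angle_K), rng)
# Case 2: rotation angle drawn at random for every member
lnK_random = draw_ensemble(coords, -7.0, 1.0, *lengths_K,
                           rng.uniform(0.0, 2.0 * np.pi, n_ens), rng)
# Case 3: wrong structure, borrowed from the recharge field
lnK_wrong = draw_ensemble(coords, -7.0, 1.0, *lengths_R,
                          np.full(n_ens, angle_R), rng)
```

The Cholesky factorization is recomputed for every member for simplicity; for realistic grid sizes, a spectral or sequential generator would be the more practical choice.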

Normalized root mean square error for the prediction
period

Estimation of stand-alone recharge. Upper panels show the final ensemble
mean after all assimilation steps, and lower plots the covariance function used to generate the initial
ensemble. Please note that the random covariance functions imply drawing the
rotation angle from a uniform distribution between 0 and 2

The quality of the resulting fields is judged in two ways. First, the
ensemble means of the fields are visually compared to the reference fields and
subjectively judged to be similar or not. Second, the normalized root mean
square error (NRMSE) of the simulated heads in the 45 observation wells is computed by:
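
A normalized root mean square error of this kind could be evaluated as in the following sketch. The normalization is an assumption made here for illustration: residuals are scaled by a measurement-error standard deviation `sigma_h` (a hypothetical name), which may differ from the definition used in this study.

```python
import numpy as np

def nrmse(h_sim, h_obs, sigma_h):
    """Root mean square of head residuals scaled by sigma_h.

    h_sim, h_obs : arrays of shape (n_times, n_wells)
    sigma_h      : assumed standard deviation of the head measurement error
    """
    residual = (np.asarray(h_sim) - np.asarray(h_obs)) / sigma_h
    return float(np.sqrt(np.mean(residual**2)))

# Toy usage: 3 time steps, 45 wells, constant misfit of 0.1 m
h_obs = np.zeros((3, 45))
h_sim = h_obs + 0.1
value = nrmse(h_sim, h_obs, sigma_h=0.05)
print(value)
```

With this scaling, a value near one would indicate that the misfit is comparable to the assumed measurement error.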

The NRMSE provides a quantitative metric
for judging the actual performance of the estimated model. We assimilate head
observations from day 50 to day 300, while the remaining 65 days of the
1-year data are used to test the model's predictive capabilities. This
results in an assimilation error for judging how well the assimilation went
and a prediction error for judging the model's predictive power. It should be
noted that, to properly assess the predictive power of the model in a scenario
different from the one used for the assimilation, one of the four wells shown
in Fig.

We have combined the three different prior distributions with three different
estimation problems, namely the estimation of (a) recharge alone,
(b) hydraulic conductivity alone, and (c) recharge and hydraulic conductivity
together, leading to a total of nine different scenarios. In the stand-alone
scenarios, all other parameters and settings are assumed known and are set to
their true values. As can be seen from Fig.

Normalized root mean square error for the assimilation period.

Estimation of stand-alone conductivity. Upper panels show the final ensemble mean after all assimilation steps, and lower plots the covariance function used to generate the initial ensemble. Please note that only a few illustrative examples of the random orientation angle of anisotropy are shown.

The simplest of the estimation problems presented in this study is the
stand-alone estimation of recharge, since the hydraulic heads depend linearly
on recharge. This is reflected in Fig.

It is important to keep in mind that the ensemble size is large, so the
plots of the ensemble means shown in Fig.

Joint estimation of recharge (top row panels) and conductivity (middle row panels). Shown is the final ensemble mean after all assimilation steps and the covariance functions used to generate the initial ensembles (bottom row panels).

In comparison to estimating the recharge fields, the estimation of
conductivity fields alone is more complicated. Here, the nonlinearities of
Eq. (

The prediction errors, listed in Table

As derived in Sect.

Figure

Joint estimation of recharge (top row panels) and conductivity (bottom row panels). Shown is the final ensemble variance after all assimilation steps.

As shown theoretically in Sect.

The inability of the estimates obtained from the random and wrong initial
ensembles to predict heads under conditions not encountered in the
calibration period is documented as OLE and TE-2 in Table

When comparing the errors in Tables

Interesting to note in Table

The values of the total normalized errors (TE-1), also listed in
Table

The issue of low errors in the assimilation period is further illustrated
with an example of two observation wells in Fig.

Two head observations plotted over time for the joint estimation of recharge and conductivity. Shown is the ensemble mean. Assimilation is performed from day 50 to day 300, while the remaining days are considered for prediction.

As in the scenarios in which only recharge or only conductivity was
estimated, the mean joint estimate lacks the extreme values of the reference
fields. As discussed above, such behavior is expected for the smooth best
estimate even in cases where the scheme works perfectly fine. Individual
ensemble members show significantly stronger variability, as can be seen also
from the maps of the estimation variance in Fig.

In the present study we have shown that it is possible to jointly estimate reasonable fields of hydraulic conductivity (or its logarithm) and recharge as spatially fluctuating fields from head observations alone, provided that the statistics of the true fields are fairly well understood. Starting with wrong assumptions about conductivity and recharge patterns can lead to aliasing, in which undetected features of hydraulic conductivity are traded for erroneous fluctuations in recharge.

In real-case applications, the prerequisite of a good prior can pose a severe problem because the true spatial patterns may be largely unknown. From a more technical point of view, it is worth noting that a rather common way of setting up a synthetic groundwater-EnKF test is to generate a large ensemble of realizations and use one of them as the truth and the rest as the initial ensemble. This guarantees that the statistics of the initial ensemble are perfect, and, as shown here, a good result can be expected. Unfortunately, in real-world applications the geostatistics of (log-)hydraulic conductivity are typically quite uncertain, so the good performance of a scheme (involving both the measurement strategy and the inverse method) in a test case that is overly optimistic regarding prior knowledge may not be transferable. We thus highly recommend designing realistic test cases that include potential bias in prior knowledge.

In the present work, we only used head data for data assimilation and
parameter estimation. As shown in Sect.

In real-world applications, vague guesses of the hydraulic conductivity
distribution may exist from drilling logs, slug tests, and pumping tests

In the present work, we consider a rather standard formulation of the
ensemble Kalman filter without iterations, smoothing, or many of the ad hoc
features. For the joint estimation of recharge and conductivity, an iterative
approach such as the dual-state filter

Financial support from the Deutsche Forschungsgemeinschaft (DFG) under CI 26/13-1 in the framework of the research unit FOR 2131 “Data Assimilation for Improved Characterization of Fluxes across Compartmental Interfaces” is gratefully acknowledged. Edited by: M. Giudici