We employ an approach based on the ensemble Kalman filter coupled with stochastic moment equations (MEs-EnKF) of groundwater flow to explore the dependence of conductivity estimates on the type of available information about hydraulic heads in a three-dimensional randomly heterogeneous field where convergent flow driven by a pumping well takes place. To this end, we consider three types of observation devices corresponding to (i) multi-node monitoring wells equipped with packers (Type A), (ii) partially screened wells (Type B), and (iii) fully screened wells (Type C). We base our analysis on a variety of synthetic test cases associated with various configurations of these observation wells. Moment equations are approximated at second order (in terms of the standard deviation of the natural logarithm,

Parameter estimation for groundwater system modeling is a key challenge due to our incomplete knowledge of the spatial distributions of hydrogeological attributes, such as hydraulic conductivity. The ensemble Kalman filter (EnKF; Evensen, 1994) is a powerful approach to parameter estimation in subsurface flow (Hendricks Franssen and Kinzelbach, 2008; Zheng et al., 2019) and solute transport (Liu et al., 2008; Li et al., 2012; Chen et al., 2018; Xu and Gomez-Hernandez, 2018) scenarios. Estimated system parameters can include conductivity (Botto et al., 2018), permeability (Zovi et al., 2017), porosity (Li et al., 2012), specific storage (Hendricks Franssen et al., 2011), dispersivity (Liu et al., 2008), riverbed conductivity (Kurtz et al., 2014), or unsaturated flow characteristic quantities (Zha et al., 2019; Li et al., 2020).

The EnKF can assimilate data sequentially through a real-time updating process. Alternatively, all collected measurements can be assimilated simultaneously, for example, within a typical model calibration framework. In the latter case, the EnKF becomes an ensemble smoother (ES, Van Leeuwen and Evensen, 1996), as it is associated with a smoothing probability density function (PDF) rather than a filtering PDF (Jazwinski, 1967). In the ES, observations in the past and current stages are assimilated only once, thus yielding increased efficiency with respect to the EnKF (Skjervheim et al., 2011). Iterative forms of the EnKF and ES, usually denoted by IEnKF (Gu and Oliver, 2007; Sakov et al., 2012) and IES (Chen and Oliver, 2013; Emerick and Reynolds, 2013; Luo et al., 2015; Chang et al., 2017; Li et al., 2018), have been developed to improve assimilation performance in scenarios characterized by strongly nonlinear behaviors. A variety of studies investigate challenges linked to such (ensemble) data assimilation algorithms, including, e.g., the possibility of coping with non-Gaussian model parameter distributions (Zhou et al., 2011; Li et al., 2018), unphysical results stemming from the estimation workflow (Wen and Chen, 2006; Song et al., 2014), or spurious correlations (Panzeri et al., 2013; Bauser et al., 2018; Luo et al., 2019; Soares et al., 2019). All of these works contribute to improving the robustness of these algorithms for parameter estimation in complex environmental systems.

Recent studies include the work of Xia et al. (2018), who tackle conductivity estimation in a two-dimensional variable-density flow setting using a localized IEnKF to balance central processing unit (CPU) time and estimation accuracy. Bauser et al. (2018) develop an adaptive covariance inflation method for the EnKF to reduce the negative effect of spurious correlations and illustrate an application of the method in a soil hydrology field context. Mo et al. (2019) use a deep-learning-based model as a surrogate of a solute transport model to reduce the CPU time associated with ensemble-based data assimilation through an iterative local update ensemble smoother in a contaminant identification problem considering a synthetic two-dimensional heterogeneous conductivity field. Li et al. (2020) compare benefits and drawbacks of embedding machine-learning-based (artificial neural network, ANN) and physics-based models into an IES for a set of synthetic unsaturated flow scenarios and find that (a) the performance of an IES relying on the Richards' equation is significantly impacted by soil heterogeneity, initial, and boundary conditions, and (b) an IES based on either ANN or Richards' equation can be notably affected by the quality of the measurements.

In this broad framework, it is noted that the accuracy of parameter estimation for a given environmental system is jointly determined by the ability of the mathematical model to describe the system of interest (Sakov and Bocquet, 2018; Alfonzo and Oliver, 2020; Luo, 2019; Evensen, 2019), the ability of the assimilation algorithm used (Emerick and Reynolds, 2013; Bocquet and Sakov, 2014), as well as by the quantity and quality of available observations (Zha et al., 2019; Xia et al., 2018, and references therein).

With reference to a groundwater system, data that are commonly collected in a borehole and then employed for parameter estimation include head (water level or pressure), solute concentration, and/or in some cases fluxes. A well screen opened at multiple depths can provide information associated with preferential pathways of flow and/or solute transport. Hydraulic heads observed in such a setting can be considered to constitute an integrated type of information and to be representative of an average system state (Elci et al., 2001, 2003; Konikow et al., 2009; Zhang et al., 2019). Elci et al. (2001, 2003) conclude that the use of long-screen wells to collect measurements should be approached with caution, as these can yield misleading and ambiguous information concerning, e.g., hydraulic head, solute concentration, location of contaminant source, and plume geometry. These types of monitoring wells can be found in a variety of field settings where head and/or solute concentration data are collected (see, e.g., Elci et al. (2001, 2003), Post et al. (2007), Konikow et al. (2009), Zhang et al. (2019), and references therein). As an alternative, somewhat localized information could be provided through the use of packers. Installing the latter can be costly and in some cases impractical.

Here, we aim to explore the effect that assimilating hydraulic head information collected over time within wells equipped with screens of differing lengths can have on our ability to characterize the spatial distribution of conductivity of a three-dimensional fully saturated heterogeneous aquifer. We consider multi-node wells (Konikow et al., 2009) to represent observation boreholes that can be (a) equipped with packers to mimic pointlike measurements, (b) fully screened, or (c) partially penetrating. To this end, we focus on a convergent flow scenario driven by a partially penetrating pumping well operating in a three-dimensional heterogeneous conductivity field. Hydraulic head information is collected at a network of multi-node wells to represent data associated with screened intervals of differing lengths along the vertical. We consider synthetic scenarios to provide transparent comparative analyses of the extent to which the quality of the estimated conductivity fields is influenced by the type of multi-node wells considered.

Data assimilation is performed by relying on an EnKF coupled with stochastic moment equations (MEs) of transient groundwater flow (e.g., Tartakovsky and Neuman, 1998a, b; Zhang, 2002; Ye et al., 2004). The latter are approximated at second order (in terms of the standard deviation of the natural logarithm of hydraulic conductivity) and are solved by an efficient numerical scheme proposed in this study.

While we refer to Zhang (2002) and Winter et al. (2003) for reviews of MEs in heterogeneous conductivity fields, we recall that MEs of groundwater flow have been previously incorporated into geostatistical inverse modeling approaches (e.g., Hernandez et al., 2003) or stochastic pumping test interpretation (Neuman et al., 2004, 2007) and have been considered in field settings (Riva et al., 2009; Bianchi Janetti et al., 2010; Panzeri et al., 2015). More recent developments have allowed embedding stochastic MEs of steady-state groundwater flow in model reduction strategies (Xia et al., 2020). MEs of transient groundwater flow have also been framed in the context of data assimilation or parameter estimation approaches based on the EnKF approach (Li and Tchelepi, 2006; Panzeri et al., 2013, 2014).

Panzeri et al. (2013, 2014, 2015) present an approach for data assimilation (hereafter termed the MEs-EnKF) that relies on embedding MEs of groundwater flow within an EnKF framework. They (a) demonstrate that the method does not suffer from spurious correlation, thus avoiding resorting to any localization or inflation techniques; (b) document the computational feasibility and accuracy of the approach in two-dimensional synthetic log-conductivity domains; and then (c) explore a first field application to estimate log-transmissivity through assimilation of drawdown data collected during a series of cross-hole pumping tests.
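The idea of updating moments directly, rather than a collection of realizations, can be illustrated with a minimal sketch. It assumes a standard linear Kalman update applied to a mean vector and a covariance matrix for a toy two-component state (one log-conductivity value and one head value). The function name `kalman_update_moments` and all numerical values are hypothetical and chosen only for illustration; in the approach described above, the prior moments would be obtained by solving the MEs.

```python
def kalman_update_moments(mean, cov, obs_index, obs_value, obs_var):
    """Linear Kalman update of a mean vector and covariance matrix
    using one scalar observation of state component `obs_index`."""
    n = len(mean)
    # Innovation variance H P H^T + R, where H selects a single component.
    innovation_var = cov[obs_index][obs_index] + obs_var
    # Kalman gain P H^T / (H P H^T + R): a scaled column of the covariance.
    gain = [cov[i][obs_index] / innovation_var for i in range(n)]
    innovation = obs_value - mean[obs_index]
    new_mean = [mean[i] + gain[i] * innovation for i in range(n)]
    # Covariance update P - K H P.
    new_cov = [[cov[i][j] - gain[i] * cov[obs_index][j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Hypothetical prior moments for the state [log-conductivity, head].
prior_mean = [0.0, 10.0]
prior_cov = [[1.0, -0.4],
             [-0.4, 0.25]]

# Assimilate one (hypothetical) head observation with error variance 0.01.
post_mean, post_cov = kalman_update_moments(
    prior_mean, prior_cov, obs_index=1, obs_value=9.5, obs_var=0.01)

print("updated mean:", post_mean)
print("updated log-conductivity variance:", post_cov[0][0])
```

In this sketch the head observation updates the log-conductivity mean only through the cross-covariance term, and the log-conductivity variance decreases after the update. No sample covariance is estimated from a finite ensemble at any point.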

An aspect that still somewhat limits the advantages of the MEs-EnKF is related to the formulation of MEs in terms of a Green's function approach (see also Ye et al., 2004). One is then required to solve the equation satisfied by a (zero-order mean) Green's function for each node of the numerical grid employed to discretize the computational domain. While one can take advantage of symmetries related to the evaluation of the Green's function, Panzeri et al. (2014) show in their illustrative examples that the CPU time required by the MEs-EnKF is equivalent to performing a classical EnKF relying on a collection of 35 000 Monte Carlo (MC) realizations. The negative impact of this computational scheme could be aggravated in three-dimensional scenarios. Here, we circumvent this issue by solving MEs for three-dimensional transient groundwater flow by relying on the (second-order accurate) approximations of MEs presented by Zhang (2002).

The remainder of the work is structured as follows. Section 2 details the main elements associated with the mathematical background of MEs and multi-node wells. Section 3 introduces the coupling between MEs and the EnKF approach. Section 4 illustrates the synthetic settings we analyze together with the criteria according to which the performance of the MEs-EnKF and the standard Monte Carlo-based EnKF (MC-EnKF) is assessed. Section 5 is devoted to the presentation and analysis of the key results. The main conclusions of this work are presented in Sect. 6.

We consider transient groundwater flow in a three-dimensional domain

The natural logarithm of hydraulic conductivity,

We start by expressing a given random quantity,

Multiplying Eqs. (1)–(4) by

The equation governing the evolution of the (second-order) head covariance
between space–time locations (

We consider three kinds of observation wells, leading to three diverse types of hydraulic head information (see Fig. 1). Type A wells are characterized by packers located at three depths, where pointwise hydraulic head observations are collected. In contrast, Type B and Type C wells represent partially and fully penetrating wells, respectively, and provide hydraulic head values that are averaged along the corresponding screened intervals. Note that, even though there is no pumping from B- and C-wells, there are flux exchanges between these wells and the surrounding aquifer system, as opposed to the setting associated with packers (A-wells). Such flow is related to the difference between the water level within the well and hydraulic head values along the borehole.
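The exchange mechanism described above can be illustrated with a minimal sketch. It assumes, in the spirit of multi-node well formulations with well-loss terms neglected, that the flow between each screened node and the borehole is proportional to the local head difference through a node-specific conductance, and that a non-pumping well has zero net exchange with the aquifer. The function name `well_water_level`, the conductances, and the head values are hypothetical.

```python
def well_water_level(node_heads, conductances, net_pumping=0.0):
    """Water level in a multi-node well, assuming the exchange from node i
    into the well is q_i = c_i * (h_i - h_w) and that the exchanges sum to
    the net withdrawal (zero for a monitoring well)."""
    total_conductance = sum(conductances)
    weighted_heads = sum(c * h for c, h in zip(conductances, node_heads))
    h_w = (weighted_heads - net_pumping) / total_conductance
    # Positive flux: aquifer -> well; negative flux: well -> aquifer.
    fluxes = [c * (h - h_w) for c, h in zip(conductances, node_heads)]
    return h_w, fluxes


# Hypothetical heads (m) at three screened nodes and their conductances (m^2/d).
heads = [9.8, 10.0, 10.4]
conductances = [2.0, 1.0, 1.0]

h_w, fluxes = well_water_level(heads, conductances)
print("water level in the well:", h_w)
print("node exchanges:", fluxes, "net:", sum(fluxes))
```

Under these assumptions the water level is a conductance-weighted average of the heads along the screen. Water enters the borehole where the aquifer head exceeds the water level and leaves it elsewhere, even though the net exchange vanishes. With equal conductances the water level reduces to the arithmetic mean of the node heads.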

Types of monitoring wells: pointwise (Type A), partially penetrating (Type B), and fully penetrating (Type C) observation boreholes.

Following Konikow et al. (2009), neglecting linear (due to skin effects) and
nonlinear (due to turbulent flow) well loss terms, the water level at well

Mean head at well

The covariance between water levels at wells

The cross-covariance between

The cross-covariance between

It is worthwhile to note that covariances and cross-covariances evaluated in Eqs. (13)–(15) depend explicitly on the difference between the mean water level at the monitoring well and the mean hydraulic head along the well screen.

Assuming that the evolution of head at the observation borehole

We solve Eqs. (6)–(9) numerically by approximating the spatial derivatives
through a finite element approach and the temporal derivatives through an
implicit method. As in Xia et al. (2019), moments

For the purpose of our data assimilation workflow, we start by noting that we are interested in computing

Here, we circumvent this issue and obtain high computational efficiency by
directly evaluating

It is further noted that Eqs. (6)–(9) share the same structure: their discretization leads to systems of equations with identical coefficients of the unknown quantities, whereas the corresponding right-hand-side terms (i.e., the forcing terms) are a function of the (ensemble) moment to be solved. In this context, one can resort to a direct solver for each time step. Factorization of the matrix containing the coefficients of the system of equations is then performed only once, and only the right-hand-side term needs to be updated, depending on the moment of interest. This results in a high computational efficiency.
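The benefit of reusing a single factorization can be sketched as follows. The example assumes a small symmetric positive-definite coefficient matrix (a hypothetical tridiagonal system standing in for the discretized flow operator) and uses a plain Cholesky factorization written out explicitly. The right-hand-side labels and values are hypothetical placeholders for the forcing terms of the various moments and do not reproduce the actual discretization.

```python
import math


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix (A = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_with_factor(low, b):
    """Solve A x = b given the Cholesky factor of A (two triangular solves)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Hypothetical coefficient matrix shared by all moment equations at one time step.
n = 5
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 3.0
    if i > 0:
        A[i][i - 1] = -1.0
    if i < n - 1:
        A[i][i + 1] = -1.0

factor = cholesky(A)  # factorization performed once

# Hypothetical forcing terms, one per moment of interest.
rhs = {
    "mean_head": [1.0, 0.0, 0.0, 0.0, 2.0],
    "second_order_correction": [0.0, 0.5, 0.0, -0.5, 0.0],
    "cross_covariance": [0.1, 0.2, 0.3, 0.2, 0.1],
}
solutions = {name: solve_with_factor(factor, b) for name, b in rhs.items()}
for name, x in solutions.items():
    print(name, x)
```

The factorization cost scales with the matrix size and is paid once, while each additional moment requires only two triangular solves. A production code would rely on a sparse direct solver rather than on the dense loops used here.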

With reference to the forcing terms

Otherwise, considering flux exchange processes when representing Type B (or C) wells entails evaluation of the source terms in Eqs. (6)–(9) as

Workflow for the numerical solution of MEs within time interval
[

We start by introducing the mean system state vector

Each data assimilation cycle, corresponding to time interval [

The equations used to evaluate the state updated vector

After the update step,

When moving to a subsequent time interval during the assimilation process,
we follow Panzeri et al. (2013) and (i) use the updated mean head vector

It should be noted that if one neglects flux exchanges between the aquifer
and Type B and/or C monitoring wells (or a Type A well is considered), moments including water level at well (i.e.,

We consider a three-dimensional domain (Fig. 3a) of size

Reference log-conductivity (

Twenty virtual monitoring wells are regularly distributed across the domain
(Fig. 3b). Type A boreholes are mimicked by considering three packers
positioned at three distinct depths, corresponding to layers no. 4, 7, and 10. Type B wells are equipped with three screens (i.e.,

Reference hydraulic head values that are collected at Type B and C wells and employed in the data assimilation procedure are evaluated upon solving the flow problem on the reference hydraulic conductivity fields described in the following. Flux exchanges between the aquifer and monitoring wells are evaluated according to the procedure described in Sects. 2.2 and 2.3, setting the convergence criterion

The effective radius of the monitoring wells is evaluated as (Chen and Zhang, 2009)

We organize our exemplary settings according to the following four groups (for a total of 26 test cases, TCs) collected in Table 1.

Summary of the test cases analyzed.

Group 1. This group includes seven TCs (TC1–TC7) that allow exploring the way conductivity estimates can be influenced by the assimilation of head data collected at diverse types of virtual observation boreholes. Here, we consider a simplified modeling approach where flux exchanges between the aquifer and Type B (or C) monitoring wells are neglected during the data assimilation procedure, and the head observations considered in the data assimilation procedure correspond to depth-averaged values along the corresponding screens. We note that relying on this approach is tantamount to considering an imperfect flow model and would possibly oversimplify the mathematical representation of the system behavior when compared with the one employed for constructing the reference head field. Nevertheless, it has the advantage of requiring a straightforward numerical implementation.
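The simplified treatment of Group 1 can be sketched as a linear observation operator that averages simulated heads over the screened layers of a grid column. The sketch assumes equal weights for all screened layers (i.e., uniform layer thickness), a hypothetical column of 12 layers, and hypothetical head values. The packer layers 4, 7, and 10 are those mentioned in the text, whereas the function names are illustrative only.

```python
def observation_row(n_layers, screened_layers):
    """Row of the observation operator for one well: equal weights over the
    screened layers (1-based layer numbers), zero elsewhere."""
    weight = 1.0 / len(screened_layers)
    return [weight if layer in screened_layers else 0.0
            for layer in range(1, n_layers + 1)]


def apply_row(row, column_heads):
    """Modeled observation: weighted sum of the heads along the grid column."""
    return sum(w * h for w, h in zip(row, column_heads))


n_layers = 12  # hypothetical number of layers
# Hypothetical simulated heads along one grid column (layer 1 first).
column_heads = [10.0 - 0.05 * k for k in range(n_layers)]

# Type A: three pointwise observations at the packer layers.
type_a_obs = [apply_row(observation_row(n_layers, [layer]), column_heads)
              for layer in (4, 7, 10)]

# Type C: a single depth-averaged observation over the fully screened column.
all_layers = list(range(1, n_layers + 1))
type_c_obs = apply_row(observation_row(n_layers, all_layers), column_heads)

print("Type A observations:", type_a_obs)
print("Type C observation:", type_c_obs)
```

In this representation a Type A well contributes three rows to the observation operator, whereas a fully screened well contributes a single row that blends information from all layers and carries no vertical resolution.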

A zero-mean reference

Group 2. This group includes six TCs (TC2#–TC7#) that are variants of those of Group 1 and in which data assimilation is performed without neglecting flux exchanges between virtual monitoring boreholes of Type B and/or C and the aquifer, i.e., data assimilation is performed by considering perfect knowledge of the groundwater flow model, which includes all of the processes underpinning the reference head fields.

Group 3. This group includes seven TCs designed to explore (i) the impacts of the mean and variance of the

To elaborate, here we consider (i) a nearly uniform (while random) zero-mean

Group 4. This group includes six TCs where we explore the effect of inflating the measurement-error covariance matrix on the data assimilation when the latter is performed in a way similar to the corresponding TC2 and TC3 of Group 1. As such, data assimilation is based on an imperfect flow model (where flux exchanges between the aquifer and monitoring boreholes are disregarded). To cope with this, inflation of the measurement-error covariance matrix is applied during data assimilation, the inflation factor being set to

Initial input quantities required to solve moment equations and spatial
fields of

We rely on the criteria illustrated in the following to appraise the quality
of the data assimilation performance. These are (i) the average absolute
difference between the estimated (or updated)

Comparison of results obtained by the MEs-EnKF and MC-EnKF (based on 100, 500, 1000, and 10 000 realizations) for TC1 and TC2#.

In this section we compare the results obtained with our MEs-EnKF approach
and a standard MC-EnKF for two selected test cases, TC1 and TC2#. Table 2
summarizes the outcomes computed via the MEs-EnKF and MC-EnKF (increasing the
number of MC simulations from 100 to 10 000) at the end of the assimilation
process in terms of

As an additional term of comparison, Fig. 4 depicts the spatial distributions of the estimated values of the mean and variance of the log-conductivity field computed with the MEs-EnKF and MC-EnKF relying on 100, 500, 1000, and 10 000 realizations at the end of the data assimilation window at layers 4, 7, and 10 (where the packers are located) in TC1. The reference

Reference

Reference

Figure 6 shows the temporal behavior of

Temporal evolution of

The highest values of

A comparison between the values of

One can note that the scenarios dominated by Type C
boreholes (i.e., TC3 and TC6) are characterized by the lowest values of

The lowest

Temporal evolution of

Figure 7 depicts the temporal behavior of

Different from the results of Group 1, the temporal behavior of

The lowest

Figure 7d depicts relative (percentage) differences between

Temporal evolution of

Temporal evolution of

Uncertainty associated with conductivity estimates increases in TC2#–TC7# as compared with their counterparts in Group 1 (see Fig. 7e, where relative differences of

The temporal evolution of

Figure 9 juxtaposes the temporal variability of

Temporal evolution of

Temporal evolution of

Figure 10 depicts the temporal evolution of

Based on these results, we conclude that the accuracy of conductivity and head estimates is generally improved when inflating the measurement-error
covariance matrix. As stated above, these results are consistent with the
observation that inflating the measurement-error covariance matrix results
in a reduced weight of the mismatch between modeled and observed values
during data assimilation. We recall that using inflation (i.e., setting

Figure 11 shows the temporal behavior of

We draw the following main conclusions based on this study:

The use of packers to collect pointwise head data (Type A wells) yields higher accuracy of conductivity estimates than can be obtained by relying on partially or fully penetrating wells. The lowest values of

Using depth-averaged head data from Type B and C wells leads to comparable results in our settings, in terms of

Neglecting flux exchanges between the aquifer and partially or fully screened monitoring wells in the groundwater flow model can significantly deteriorate the accuracy of conductivity estimates. Applying an inflation technique to the measurement-error covariance matrix can improve conductivity estimates when an imperfect flow model is employed.

The computational feasibility and accuracy of the moment equations-based ensemble Kalman filter (MEs-EnKF) are explored. The MEs-EnKF is as accurate as a typical Monte Carlo (MC)-based ensemble Kalman filter that relies on a large number (on the order of 10 000) of MC realizations. Moreover, the MEs-EnKF is more efficient than its MC-EnKF counterpart, the latter requiring about 20 times the central processing unit (CPU) time of the former, on the basis of our examples.

The water level at well

The FORTRAN code used for solving the moment equations of groundwater flow is available upon request.

All authors contributed to the preparation of the manuscript.

The authors declare that they have no conflict of interest.

This work was supported by the National Natural Science Foundation of China (grant nos. 42002247 and 41530316) and by the Natural Science Foundation of Guangdong Province, China (grant no. 2020A1515111054). Part of the work was developed while Alberto Guadagnini was at the University of Strasbourg with funding from Region Grand-Est and Strasbourg-Eurometropole through the “Chair Gutenberg”. Xiaodong Luo acknowledges financial support from the Research Council of Norway through the Petromaks-2 project DIGIRES (RCN no. 280473) and the industrial partners AkerBP, Wintershall DEA, Vår Energi, Petrobras, Equinor, Lundin, and Neptune Energy. Chuan-An Xia was supported by the International Young Researcher Development Project of Guangdong Province, China.

The main financial support comes from the National Natural Science Foundation of China (grant no. 42002247).

This paper was edited by Brian Berkowitz and reviewed by three anonymous referees.