Contaminant source localization via Bayesian global optimization

. Contaminant source localization problems require efﬁcient and robust methods that can account for geological heterogeneities and accommodate relatively small data sets of noisy observations. As realism commands hi-ﬁdelity simulations, computation costs call for global optimization algorithms under parsimonious evaluation budgets. Bayesian optimization approaches are well adapted to such settings as they allow the exploration of parameter spaces in a principled way so as to iteratively locate the point(s) of global optimum while maintaining an approximation of the objective function with an instrumental quantiﬁcation of prediction uncertainty. Here, we adapt a Bayesian optimization approach to localize a contaminant source in a discretized spatial domain. We thus demonstrate the potential of such a method for hydrogeological applications and also provide test cases for the optimization community. The localization problem is illustrated for cases where the geology is assumed to be perfectly known. Two 2-D synthetic cases that display sharp hydraulic conductivity contrasts and speciﬁc connectivity patterns are investigated. These cases generate highly nonlinear objective functions that present multiple local minima


Introduction
Many hydrogeological processes are governed by nonlinear equations (e.g., unsaturated flow problems, heat and transport problems; De Marsily, 1986).This often results in highly nonlinear and nonconvex objective functions of related optimization problems.Often, however, default optimization algorithms employed in the hydrogeological community, notably concerning contaminant source localization problems, are based on local search principles (using analytical gradients or estimates thereof) (Mahar and Datta, 2000;Ayvaz, 2016).In contrast, derivative-free global optimization methods such as evolutionary algorithms, simulated annealing and others have also become commonplace in the last decades.Yet, these are typically regarded with caution as they do not systematically come with much guarantee and can potentially require large numbers of function evaluations, a situation that is to be avoided in the case Published by Copernicus Publications on behalf of the European Geosciences Union.
where forward runs are CPU-intensive.On the other hand, so-called Bayesian optimization algorithms have been gaining considerable importance in several fields lately as they enjoy a number of practical advantages while having been recently proven to possess desirable consistency properties (Vazquez and Bect, 2010;Bect et al., 2018).One of the greatest strengths of common Bayesian optimization algorithms is that they do not only guide evaluations towards the global optimum but also maintain an approximate representation of the objective function together with a quantification of prediction uncertainty.This enables space exploration with a memory so as to prevent or mitigate evaluations at redundant locations.Recent adaptations of popular Bayesian optimization approaches allow the accommodation of evaluation noise (Picheny and Ginsbourger, 2014), parallel evaluations (Marmin et al., 2015), high dimensions (Wang et al., 2018), nonstationarity (Snoek et al., 2014), gradient observations (Wu et al., 2017) and many more features (see for instance Ginsbourger, 2018, for a broader overview of sequential design algorithms for computer experiments).
Despite the fantastic rise of Bayesian optimization in machine learning and for the design of computer in various communities, its spread in the geosciences remains relatively modest so far, perhaps in part because, contrary to analytical test functions inspired by engineering problems or off-theshelf machine learning algorithms trained on openly available databases, it is often the case (e.g., in heavy-flow simulations) that geoscientific data and/or computer codes cannot be publicly shared or easily handled for one or the other reason.One way around that is to share instead a finite number of evaluation results performed on-site by the authors, so that users do not need to run new simulations when testing their algorithms.Yet, optimization algorithms typically used in hydrogeological inverse and related problems assume a continuous search space.When relying on a discretization of the input space, possible options that come to mind include (i) using discrete optimization algorithms, (ii) using continuous optimization algorithms on a re-interpolated function based on available evaluation results, and (iii) constraining continuous optimization methods by forcing novel evaluation points to remain among the considered finite set.Our approach here, guided by the willingness to share data with both geoscience and optimization communities and produce reproducible research, is to rely on a fine grid of evaluation results that allows any of the aforementioned three approaches to be used.
By remaining in a two-dimensional framework, our discretized 2601-element data set is actually fine enough to capture the complex behavior of the misfit functions considered so that optimization algorithms can be possibly compared by users on high-fidelity approximations of the objective.That being said, we insist throughout the paper on a natural adaption of a popular Bayesian optimization algorithm to the discrete case, hence simultaneously addressing points (i) and (iii) above and thus demonstrating the applicability of this family of techniques both to challenging contaminant localization problems in general and to discrete situations in particular (that could be relevant in a number of practical situations, e.g., well placement).
Coming back to more specific hydrogeological concerns underlying our application test case, contaminant characterization problems are motivated by the fact that the concept of polluter pays (OECD, 1972) holds for groundwater protection laws in many countries (USA, 1972;Swiss Confederation, 1983;European Union, 2000).A polluter can sometimes be identified by a specific chemical signature (Mansuy et al., 1997;Rachdawong and Christensen, 1997;Venkatramanan et al., 2016).However, when the signature is not unique, the ability to localize the contaminant source(s) can make defining responsibilities or reducing decontamination costs easier.Moreover, contaminant transport is dominated by the heterogeneity of the subsurface properties.In particular, it is controlled by the connectivity of geobodies (e.g., characterized by lithofacies or by a range of hydraulic conductivity values) and by the sharpness of geobody property contrasts.
Thus, solving contaminant source localization problems in complex environments characterized by strong property contrasts requires methods that are robust, time-efficient and able to handle input data uncertainty.Several approaches have been proposed in the last 3 decades (Atmadja and Bagtzoglou, 2001;Amirabdollahian and Datta, 2013).Analytical solutions (Ala and Domenico, 1992;Alapati and Kabala, 2000) are limited to homogeneous geological media.Methods able to handle heterogeneous geological media can be classified into two groups: backward-or forward-solverbased approaches.The backward solvers consist of reversing the flow problem (Skaggs and Kabala, 1995;Milnes and Perrochet, 2007;Ababou et al., 2010) and solving the advection dispersion equation backward in time to localize the source and identify the release history.In this group of methods, both the flow field and the contaminant plume are assumed to be perfectly known.Methods using forward solvers are based on an inverse problem formulation (Aral et al., 2001;Yeh et al., 2007;Mirghani et al., 2012), where the source location and release history are inferred from concentration samples.Parameter sets are proposed and used as inputs in a forward solver to simulate concentration breakthrough curves at the sample locations; when the mismatch between the simulated concentrations and the observed ones is within an acceptable level of error, the proposed model is accepted as a solution.The best solution is the one minimizing the error function.In this group of methods, less information about the contaminant plume is required and the method can be adapted to uncertain geology (Zhang et al., 2016).To the best of our knowledge, existing studies using forward-solver-based approaches were limited to homogeneous (Datta et al., 2011;Hansen and Vesselinov, 2016) or multi-Gaussian-like (Aral et al., 2001;Ayvaz, 2016) heterogeneous property fields.
Within this last set of methods, different optimization techniques can be employed.Classical nonlinear optimization techniques following a derivative-based approach (Mahar and Datta, 2000;Datta et al., 2011), such as the Levenberg-Marquardt algorithm (Hansen and Vesselinov, 2016), present the risk of being stuck in local minima.Employing a tabu search algorithm (Yeh et al., 2007) presents the same inconvenience as it iteratively explores neighbor solutions.Combining a gradient descent algorithm with a genetic algorithm (Aral et al., 2001;Ayvaz, 2016) decreases the risk of becoming stuck in local minima, but the genetic algorithm may require longer parameter exploration if the mutations are not guided by a smart rule.Simulated annealing (Amirabdollahian and Datta, 2014) allows for a broader exploration but at a very high computational cost.Bayesian optimization1 is a global approach that considerably limits the risk of being trapped in local minima and does not require the computation of derivatives of the objective function.It smartly explores the parameter space by looking at figures of merit such as the expected improvement (EI) criterion (Mockus, 1989;Jones et al., 1998;Vazquez and Bect, 2010), trading off exploitation of available results and space exploration.To the best of our knowledge, the potential of this class of methods for addressing contaminant source localization problems is still unexplored.
The objective of this paper is threefold.The first objective is to assess the performance of an inverse problem formulation to identify contaminant source characteristics on a synthetic case displaying strong hydrogeological property contrasts and complex connected structures.This is important because in spite of its advantages, inverse problem formulation to identify contaminant source characteristics has been employed only on multi-Gaussiantype heterogeneities, and the type of heterogeneities strongly influences mass transport.
The second objective is to test the efficiency and advantages of a Bayesian optimization algorithm which relies on expected improvement criteria in the formulated contaminant source identification problem.While Bayesian optimization has been applied to a variety of optimization problems, we believe that this is the first time the algorithm has been applied to the contaminant source identification problem.
And last but not least, the third objective is to provide an open-source optimization benchmark case that allows the comparison of different optimization strategies on objective functions defined over a discrete domain and inspired by real applications, which are not currently available in the optimization community.
With these objectives, we propose an original application of an EI algorithm to infer, in a deterministic inverse problem formulation, the contaminant source location in a 2-D heterogeneous aquifer that presents strong property contrasts and complex connected structures.To allow for a comparison between the optimizer exploration and an exhaustive search of the discrete parameter space, the model grid is limited to 2-D to keep computational costs reasonable for flow and transport simulations.The hydraulic conductivity field is generated with the multiple-point statistics algorithm called DeeSse (Straubhaar et al., 2016), from a training image representing the heterogeneous hydrogeological properties of a braided-river aquifer, which was generated by a pseudogenetic algorithm (see Appendix A; Pirot et al., 2015).The hydrogeological properties and flow boundary conditions are assumed to be perfectly known.The flow and transport equations are solved numerically using the Groundwater software (Cornaton, 2007).Because the measurement error and monitoring network affect the objective function, the algorithm performances are tested for different levels of measurement error and for different configurations of monitoring wells.The optimization is performed using the DiceKriging and DiceOptim R packages (Roustant et al., 2012).In addition, we provide a benchmark for optimization algorithms, which relies on an objective function generator that can be customized by choosing between two geological scenarios, two possible locations for the contaminant source and by the selection of observations among 25 monitoring wells.The performance of the EI algorithm is assessed by 100 runs from different initial designs.
The paper is organized as follows.Section 2 describes the synthetic test case and the experimental setup.Section 3 explains the objective function generator.Section 4 details the steps of the EI algorithm.The results are presented in Sect. 5 and are discussed in Sect.6. Conclusions are summed up in Sect.7. The open-source code provided online is listed in the Appendix.

Synthetic test cases
As different geological settings can lead to very different objective functions, and in order to test the robustness of the optimization method, we consider two synthetic cases corresponding to 5 m thick × 600 m long × 300 m wide braided river aquifers.Each aquifer is represented by a unique, supposedly known, 2-D facies model (Fig. 1) of 1 m by 1 m resolution to simplify the problem and to decrease the computing costs related to transport simulations.These 2-D facies models (Fig. 1), which present strong contrasts and realistic spatial structures, are generated by multiple-point statistics (MPS) simulation, using the training image described in Fig. A1.The hydrogeological properties associated to the facies are given in Table 1 and are inspired from analogs described in the literature (Jussel et al., 1994;Bayer et al., 2011).
Note that the contaminant spreading at the scale of this model is assumed to be mainly controlled by the geological heterogeneity.Since there is always some numerical dispersion when solving the advection dispersion equation numerically, we used the smallest possible value for the longitudinal and transverse dispersivities that would stabilize the numerical problem.Another method to obtain 2-D horizontal models of braided river aquifers from 3-D models would have been to vertically integrate the hydraulic conductivity field, but since this smoothes out the hydraulic conductivity, the resulting 2-D models present fewer contrasts and less realistic connected structures.
As boundary conditions for the flow and transport model, we impose a differential head of 2 m on the length of the model (between X = −20 m and X = 580 m) and no flow on the sides (Y = −150 m and Y = 150 m) parallel to the main flow direction.We assume steady-state flow conditions (Fig. 2) to run transport simulations by solving the advection dispersion equation with the finite-element code Groundwater (Cornaton, 2007).
The source of the contaminant is supposed to be unique, parameterized by the coordinates of its initial center of mass, and located within a search zone delimited by a 150 m × 150 m domain whose coordinates belong to [20, 170]× [−75, 75].To test the influence of the source location versus the geology, first on the misfit objective function and second on the ability of the proposed approach to deal with more or less complex objective functions, two reference locations (A and B) were chosen.
Source A is located at (x A s = 89, y A s = −36).Source B is located at (x B s = 100, y B s = 10).Since surface spills usually present some diffusion characteristics in their shape and can cover different geological features, the initial contaminant mass distribution at time 0 is chosen as a multi-Gaussian distribution centered on the source location with a standard deviation (σ x = 2.5 m, σ y = 1.0 m) for a total mass m = 100 kg.The reference concentration curves c obs (i, t) are obtained for i = 1, • • •, 25 groundwater monitoring wells (Fig. 3) and for times t = 1, • • •, T days.Three concentration  breakthrough curves recorded at the well number 2, 16 and 22 are given as examples at the bottom of Fig. 3. Real applications are always characterized by measurement errors.In our practical application of concentration measurements, as for chemical analysis, the errors are mainly due to data acquisition, sampling in the field, dilution procedure, etc.These errors can be assumed either with homogeneous variance or with a standard deviation proportional to the noiseless measurements, e.g., with a proportionality factor supposed to be below 10 % (Ramsey and Argyraki, 1997).We denote by c real (i, t) the actual concentration at wells i and time t (1 ≤ i ≤ 25 and 1 ≤ t ≤ T ), i.e., the one that corresponds with the observed concentration c obs (i, t) in the noiseless case.Now, for c obs , let us assume in the present noisy case that measurements are corrupted with a proportional Gaussian noise, so that observed concentrations become random with where ε(i, t) are independent and identically distributed from N (0, 1) and κ is a constant such that the level of errors does not exceed a certain proportion.The unknown location of the contaminant source is denoted as x = (x s , y s ).We define c sim (x, i, t) as the simulated concentration level obtained at (i, t) when the contaminant source is located at x.The aim is to find the value(s) for x that minimize(s) the following misfit objective function: which corresponds to the p distance between the matrices (c obs (i, t)) i∈{1,...,25},t∈{1,...,T } and (c sim (x, i, t)) i∈{1,...,25},t∈{1,...,T } , where p ≥ 1 is a parameter that can be arbitrarily chosen by the modeler (in our experiments both p = 1 and p = 2 were considered, as mentioned later).At the location of the reference source, the function reaches its minimum.In this synthetic study, we neglect conceptual or numerical errors in c sim that may result from an incomplete knowledge of the hydraulic conductivity field or boundary conditions, which would be important to consider in a real field application.
The search zone is restricted to a discrete domain Z, using a regular grid of 3 m resolution for three reasons.First, in practical applications, the location of the source is often restricted to an area thanks to historical information about industrial activities or accidents.Here, we apply the same prin-ciple but assume a simple geometry.Second, this procedure and geometry allows us to provide an exhaustive computation of the objective function for the research community.Third, it is an interesting problem because most available optimization programs work either on continuous domains or are dedicated to specific classes of optimization problems (integer programming, mixed linear integer programming), and few seem to be available for nonlinear optimization over finite sets beyond metaheuristics used in combinatorial optimization (Rios and Sahinidis, 2013).In the case of our contaminant source localization problem, by the nature of the problem, we have a continuous structure (objective function) where the domain is restricted to grid points.As an exhaustive evaluation of the objective function over Z is computationally expensive (depending on the mesh resolution), the aim of the optimization is to minimize the objective function f in the search zone within a limited number of iterations and for that purpose, we propose using an EI algorithm.

Benchmark case generator
An ensemble of time-varying concentrations at 25 observation wells is provided at a full factorial design of candidate points in the search zone Z, as well as at contaminant source location B (source location A belongs to the factorial design), for two geological geometries.Allowing any combination of observation wells among the 25, or any source location among the full factorial design, leads to 2 2 ×2602×(2 25 −1) possible test functions (i.e., more than 349 × 10 9 test cases).Moreover, any customized source of error can be added in the generation of the objective function.As these functions are known through their respective 51 2 values at the discretized source space Z, they can be re-interpolated (e.g., using splines) for continuous optimization purposes.Here we instead consider the discrete problem of selecting the optimal location among 51 2 candidates and for that goal, we will apply a straightforward discretized version of an EI algorithm as presented in the next section.The data and some R functions to generate benchmarks for any input parameters are provided on GitHub at https://github.com/gpirot/BGICLP(Pirot, 2018).A brief description of the repository is given in Appendix B of this paper.

Optimization methodology
The optimization algorithm used hereafter to minimize f (x) over the domain uses a machine learning approach relying on Gaussian process (GP) models (Rasmussen and Williams, 2006) to improve iteratively the knowledge of f (x) over the domain.It is indeed based on the iterative evaluation of f (x) at locations whose potential to improve the minimum among the evaluated objective function at previously explored locations is the greatest.The following steps give an overview of the proposed algorithm.In what follows, more details are given about the required assumptions, the way to estimate f (x) and the definition of the expected improvement criterion.
The algorithm belongs to a class of Bayesian optimization algorithms (Mockus, 1989;Shahriari et al., 2016).The Bayesian aspect refers to placing a random process prior Y on the unknown function f (possibly computationally expensive) and updating its probability distribution thanks to available evaluation results.The optimization part relies on using conditional distributions of Y to iteratively choose points with the identification of f 's global optimum and/or optimizer(s) in view.The crux is to fit adequate probabilistic models and also to design adapted acquisition functions (a.k.a.infill sampling criteria in surrogate-based optimization) in order to drive algorithms to an efficient optimization.
GPs constitute a very popular class of probabilistic models that are fully specified by a mean function m (x) and a covariance function k x, x (Rasmussen and Williams, 2006).In this work, we use ordinary kriging with a Matérn (ν = 3/2) Algorithm 1 Optimization algorithm overview; n 0 is the number of initial locations used to define the initial knowledge; N is the budget, or the number of time the objective function can be evaluated; n counts the number of times that the objective function has been evaluated.Knowledge initialization: evaluate f (x) at n 0 initial locations defined by an initial design Set n = n 0 while n ≤ N do Based on the current knowledge, compute the expected improvement criterion EI n (x) over the domain Evaluate f (x) where EI n (x) is maximum Increment the knowledge and n end while Return the location where f (x) is minimum over the evaluated locations covariance function (see Roustant et al., 2012, for details) and the covariance parameters are estimated by maximum likelihood using the DiceKriging R package.While it is also possible to use a transformation of the response in GP-based optimization (e.g., Jones et al., 1998), on the considered data it did not lead to substantial differences in optimization performance despite the non-negativity of the misfit.
Denoting training inputs and outputs as X n = (x 1 , x 2 , . .., x n ) and f n = (f (x 1 ) , f (x 2 ) , . .., f (x n )) and assuming a GP prior with a constant unknown mean (endowed with an improper uniform prior) leads to a Gaussian conditional distribution with the following marginal predictive mean and variance: where K = (k(x i , x j )) i,j =1,...,n is the n × n prior covariance matrix (assumed invertible here) of responses at training inputs, The optimization algorithm typically starts with constructing a space-filling design X n 0 = (x 1 , x 2 , . .., x n 0 ) (see, e.g., Dupuy et al., 2015) and evaluating f X n 0 to initialize the knowledge of the algorithm (e.g., n 0 = 9 blue dots in the left panel of Fig. 4a).Here the initial X n 0 is generated based on Latin hypercube sampling (McKay et al., 1979).Then, the algorithm begins its iterations.In each iteration, the ensemble of n available evaluations f n = (f (x 1 ) , f (x 2 ) , . .., f (x n )) is used to train the GP model and make predictions at yet unexplored decision space locations.The predictive distribution is then used to compute the so-called expected improvement criterion (Mockus, 1989), which indicates at every point in the decision space how much the objective function value may be decreased relative to f min = minf n , in expectation of the following: (5) The EI criterion offers a good balance between exploitation of regions with low predictive mean values and exploration of regions with high predictive variance, which provides an efficient optimization search scheme (e.g., red dot in the right panel of Fig. 4a).It turns out that EI can be calculated analytically (Mockus, 1989;Jones et al., 1998).In our discrete settings with moderate number of search points, the EI can be computed at all unevaluated locations of f (e.g., right panels of Fig. 4).The decision space location with the largest EI value is considered as the next point x n+1 (e.g., red dot on right panels of Fig. 4) to evaluate f .The optimization is run using the DiceKriging and DiceOptim R packages developed by Roustant et al. (2012).The number of iterations is fixed in advance (91 in what follows) so that it stops when the maximum number of iterations allowed is reached.Covariance parameters are updated after each iteration by maximum likelihood estimation.

Results
The results for both the noiseless and noisy cases are presented in this section.The main results are presented in Sect.5.1.They rely on using information from all wells, and on noiseless concentration observations for the four configurations engendered by two geological scenarios and two possible sources of contaminant.For completeness, the algorithm sensitivity analysis with the noise added to the objective function and with various well configurations are presented in Sect.5.2.
Note that with an initial space-filling design of n 0 = 9 elements, and a number of iterations of 91, we define here a total budget of N = 100 evaluations of the objective function.

Main results for noiseless cases
Using information from the 25 observation wells, the optimization algorithm is applied over four configurations that depend on the retained geology and on the contaminant source location as described in Table 2, where the noise level κ (of Eq. 1) is set to 0 and the parameter p of the objective function f (x) is set to 2.
Starting from a specific initial design, the explorations of the objective functions by the EI algorithm (aiming at the contaminant source localization) are displayed in Fig. 5 for each scenario.
These objective functions display multiple local minima, narrow valleys and sometimes very flat bottoms.These characteristics make the search for the global minimum challenging, especially for gradient-based techniques.The locations explored by the EI algorithm are plotted over the 3 m × 3 m discretization of the objective function f .The white and blue dots represent respectively the initial and then explored locations where the objective function is evaluated by the algorithm.In most cases, the minimum of the discretized objective function is reached in less than 50 evaluations.The geology seems to be the dominating factor for the global patterns of the objective function.Note that for scenarios 2 and 4, the contaminant source is located at (100, 10), which is not within the discretized grid of the objective function; the closest point on the discretized grid is (101, 9).For scenario 2, the minimum of the objective function is less than 3 m apart from the reference source located at (100, 10).However, for scenario 4, the reference source located at (100, 10) and the minimum of the objective function located at (80, 18) are 25 m apart.
The performance of the optimization algorithm is assessed on 100 runs of the algorithms.Each run is characterized by a specific and uniformly drawn 9-point initial design.Each run is allowed a total budget of 100 evaluations of the objective function.The performance depends on the number of iterations required to locate the minimum of the objective function min x f (x).The performance can be assessed directly by looking at the optimality gap, i.e., the distance between the location of the best estimated minimum f min of the objective function and the location of its minimum min as a function of the number of evaluations of f (Fig. 6a-d).Another possibility is to look at the normalized best minimum misfit found between the true minimum min x f (x) and the best estimated minimum of the objective function f min as a function of the number of evaluations of f (Fig. 6e-h).Both indicators behave similarly.Finally, the performance of the localization algorithms can be assessed by analyzing the distribution of the distance of the explored location that is closest to the true contaminant source over the 100 runs for a given number of iterations (Fig. 7).Independently from the considered scenario, the bin counts for lowest values significantly increase when the number of iterations increase, and the bin counts for distances over 20 m rapidly come down to 0.
Figure 4. Illustration of the first four EI algorithm iterations for scenario 1: the sub-figures in the left column illustrate the prediction mean of f over the two-dimensional decision space at each iteration; the blue dots indicate the decision space locations where f was previously evaluated; the sub-figures in the center column illustrate the prediction variance of f over the two-dimensional decision space at each iteration; the sub-figures in the right column illustrate the expected improvement map over the two-dimensional decision space at each iteration; the red dot denotes the decision space location with the maximum EI value.

Sensitivity of the algorithm performances to errors and to well configuration
In what follows, we show the results of a joint sensitivity analysis of the algorithm performance to proportional measurement errors and to the number of wells retained in the computation of the objective function.Four levels of proportional measurement errors are tested: 0 %, 10 %, 20 % and 40 %.Seven well configurations with 1, 3, 5, 10, 15, 20 or 25 wells are tested.The identification of the wells for each configuration is given in Table 3.The cross-joint sensitivity analysis is then composed of 28 scenarios.The resulting objective functions are illustrated in Fig. D1.One can note that the precision becomes finer around the true minimum of the objective function, when increasing the number of wells.However, the improvement is limited once a line of five wells, orthogonal to the main Table 3. Description of the seven well configurations.

Discussion
Through successive kriging of the misfit between simulated and observed concentrations, guided by the expected improvement criterion, the proposed optimization algorithm localizes efficiently the source of a contaminant in a 2-D geological environment representing realistic patterns and prop-erty contrasts.The algorithm requires approximately 50 evaluations of the objective function in comparison to more than 2600 for an exhaustive evaluation of the discretized search zone (∼ 1.9 %).The total number of candidate points would increase exponentially in the number of dimensions of the parameter space, eliminating exhaustive search as an option, from even moderate dimensions, when assuming a high resolution.
Comparison of the different scenarios reveals that the geology controls the main features of the objective functions, which reinforce the importance of realistic geological structures in contaminant source localization problems.Of course, the shape and location of lower value zones of the objective functions are controlled by the reference location of the contaminant source.The results presented here are based on an objective function f computed with p = 2, which corresponds to an 2 distance between reference and candidate concentration values (see Eq. 2).As the choice of p may substantially influence the flat or deep aspect of valleys (low value zones) of the objective function, we additionally tested the EI algorithm on the four scenarios for objective functions with p = 1.We found that building f onto the 2 distance leads to flatter wide valleys of low values for the objective functions, which might not favor the efficiency of the EI optimizer.However, the results and performances of the EI algorithm are very similar between the two norms tested.This is why we decided not to show the results of the algorithm objective functions built upon the 1 distance.
When proportional measurement errors do not exceed 10 %, 20 %, 30 % or 40 %, the objective function is quasi-identical and the algorithm performance is not affected.It is not surprising as the objective function is a mean of the misfit over several monitoring locations and time, which contributes to filter out the error, except for a positive bias.However, for other applications, the resulting noise in the objective function might require a more specific treatment, e.g., appealing to strategies adapted to deal with noisy function evaluations (see for instance Picheny et al., 2013, andPicheny andGinsbourger, 2014, for an overview and tutorials based on R code).Here we consider measurement errors that are proportional to the actual concentrations.However, it might take a different form.In Appendix C, we propose a more general definition of possible Gaussian measurement errors and derive the resulting objective function covariance matrix.An interesting result is that the number and configuration of wells has a strong impact on the objective function until a "full" line of wells, orthogonal to the main flow direction, is used.Increasing the number of wells on an axis orthogonal to the main flow direction greatly improves the characterization of the objective function, notably around the true contaminant source.It confirms what is often done in practice to catch contaminant plumes.Adding another line of observation wells seems less promising than densifying a column of wells.Of course, densification might be limited in practice by minimum distances between wells to avoid connecting artificially separated flowpaths for instance, but depending on the level of site characterization, a similar algorithm could then be used to optimize well configurations.
By making the source code of the objective function generator available for public use, we provide several benchmark objective functions.These are driven by real hydrogeological applications and can be used for testing and comparing optimization techniques.This benchmark will fill a gap for the community of applied mathematicians and statisticians who develop optimization algorithms and who want to test their tools on realistic objective functions.In addition, hydrogeologists will benefit from the code provided in the GitHub repository (Pirot, 2018) so that they can implement the proposed optimization algorithm in their own applications.For the test case documented here and given the structure of the objective functions that are defined on a discrete domain, it does not seem relevant to apply off-the-shelf combinatorial algorithms.However it would be certainly of interest to compare the proposed approach to evolutionary or related algorithms compatible with such settings.A pragmatic approach here, to enable comparisons with a broader class of derivative-free and also derivative-based algorithms, would be to re-interpolate the data (with a careful inspection of the optima of the interpolator, i.e., a check that it is not perturbing the problem by too many potential artifacts) and conduct a benchmark involving Bayesian optimization (with EI and potentially also other infill sampling criteria) against a selection of state-of-the-art algorithms.
Strong assumptions have been made to localize the contaminant source in the presented application.The hydrogeolological properties and the flow boundary conditions are assumed to be perfectly known and the hydrogeological model is spatially limited to two dimensions.This allowed comparison of the outcome and efficiency of the algorithm with respect to a full grid search of the objective function.Because of their expensive computing costs to assess the objective function at one location of the parameter space, three-dimensional applications will not allow for an exhaustive search of the solution; this is why they may require, in the near future, optimization algorithms such as the one proposed in this paper.Further research should also consider the uncertainty related to hydrogeological property characterization and flow and transport boundary conditions.Some steps have already been made in that direction (Koch and Nowak, 2016), but were limited to common multi-Gaussian conductivity fields.In addition, a regular grid discretization might compromise the ability to accurately locate the contaminant source in the presence of a strong flow path.For example, in a real-world application, the contaminant source has a very low probability of being located on a grid node.This problem could be avoided by using adaptive meshing, which would require more computing resources.

Conclusions
The use of 2-D hydraulic conductivity fields that present sharp contrasts and specific connectivity patterns produces complex objective functions with multiple local minima.The proposed benchmark tool produced from these complex functions offers a challenging real-world test for developers of optimization algorithms.The EI algorithm used in this 2-D study localized efficiently the contaminant source that is located on a grid node.More generally, the proposed algorithm is an interesting approach for combinatorial optimization algorithm.The objective functions and the performance of the algorithms are not affected by proportional measurement errors lower than 10 % (even 40 %).The objective function is strongly determined by the geology and by the monitoring well configuration (number and location).In particular, the characterization of the objective function, on which the performance of the algorithm rely, is greatly improved when a line of monitoring wells orthogonal to the main flow direction is densified.To improve the limitation imposed by a source centered on the nodes of a fixed mesh, which is independent of the optimization algorithm, future research could be conducted on optimization embedding adaptive meshing in flow and transport simulations; another possibility would be to relax the constraint on mass distribution of the initial plume as a way to deal with its related uncertainty.The effective performance of the algorithm on this 2-D case is an encouraging development toward 3-D applications and toward integration of geological uncertainty in contaminant source localization problems.
where ε(i, t) ∼ N (0, 1).Here for the sake of brevity we assume that the ε(i, t) are independent for different (i, t) pairs, but the following can be extended without major difficulty to the case of correlated normals with prescribed correlation matrix.
Note that from the additive formulation above, a multiplicative noise setting can be obtained by taking σ (i, t) to be proportional to c real (i, t).Imposing for instance σ (i, t) = c real (i, t), one indeed obtains c obs (i, t) = c real (i, t)(1 + ε(i, t)).
Let us now focus on the effect of noise on the objective function, and consider for simplicity the squared misfit in the case p = 2, which becomes a random function denoted henceforth by f 2 ε while f 2 stands for the deterministic squared misfit from the noiseless case.We then have (c real (i, t) + σ (i, t)ε(i, t) − c sim (x, i, t)) 2 (C3) = f 2 (x) + (c real (i, t) − c sim (x, i, t)) ε(i, t).(C4) A first important note following the expansion above is that the second term, i.e., 25 i=1 T t=1 σ (i, t) 2 ε(i, t) 2 , does not depend on x so that ignoring it would not affect the behavior of optimization algorithms unless they are sensitive to a global shift (e.g., because of tuning parameters or stopping rules that would depend on the actual values and not solely on relative ones).In our case such a shift is not detrimental, and can even mitigate the potential issue of predicting negative misfits when using GP models without response transformation.For information and, up to a multiplicative constant, the statistical distribution of this shift belongs to the generalized chi-square family (and to the usual chi-square family in the case of homogeneous σ ).On the other hand, the last term of Eq. (C2) does depend both on x and on the noise ε.Denoting η x = 2 25 i=1 T t=1 σ (i, t) (c real (i, t) − c sim (x, i, t)) ε(i, t), it is then easy to show that η defines a centered Gaussian random field indexed by x in the search domain Z, and that the covariance kernel of η boils down to the following: Cov(η x , η x ) = 4 25 i=1 T t=1 σ (i, t) 2 (c real (i, t) − c sim (x, i, t)) c real (i, t) − c sim (x , i, t) . (C5) In other words, in cases like here when c real is actually known and experiments are run for benchmarking purposes, it is possible to propagate the effect of noise corruption on the objective function without needing to appeal to the whole set of c sim values at all times and wells, but rather to a precalculable covariance matrix from which the error affecting f over the grid search can be simulated.Denoting by A x the 25 × T matrix of generic entry (2σ (i, t) (c real (i, t) − c sim (x, i, t))) and by j a vector of ones in dimension j ≥ 1, the covariance kernel of η can be written in compact form as Cov(η x , η x ) = 25 (A x • A x ) T , where • stands for the Hadamard (element-wise) product between matrices of identical dimensions.

Figure 1 .
Figure 1.Experimental setup: 600 m × 300 m 2-D facies model of the aquifer: (a) geology 1 and (b) geology 2. The black square delimits the possible locations for the search of the contaminant source.The two reference source locations are identified by black crosses.

Figure 2 .
Figure 2. Steady-state flow for (a) geology 1 and (b) geology 2. The black square delimits the possible locations for the search of the contaminant source.The two reference source locations are identified by black crosses.

Figure 3 .
Figure 3. Misfit objective function settings: (a) location of the search zone (grey area), of the two reference contaminant sources and of the 25 groundwater monitoring wells (denoted by a circle or a triangle) within the hydrogeological model boundaries.The blue dot denotes the trial location of the contaminant; (b), (c) and (d) misfit components at wells 2, 16 and 22 respectively, resulting from the comparison of the concentration breakthrough curves simulated at the trial location with the recorded ones for reference source A.

Figure 6 .
Figure 6.Performances of the EI optimization algorithm as a function of number of evaluations of the objective function for 100 different initial design: (a, b, c, d) distance of the best solution to the location of the objective function minimum; (e, f, g, h) normalized misfit; (a, e) scenario 1; (b, f) scenario 2; (c, g) scenario 3; (d, h) scenario 4.

Figure 7 .
Figure 7. Distance to the contaminant source distribution for 100 runs for the best solution given by the EI algorithm: row (a) to (d) for scenarios 1 to 4.

Table 2 .
Description of the four configurations.