Bridging the gap between GLUE and formal statistical approaches: approximate Bayesian computation

. In recent years, a strong debate has emerged in the hydrologic literature regarding how to properly treat nontra-ditional error residual distributions and quantify parameter and predictive uncertainty. Particularly, there is strong disagreement whether such uncertainty framework should have its roots within a proper statistical (Bayesian) context using Markov chain Monte Carlo (MCMC) simulation techniques, or whether such a framework should be based on a quite different philosophy and implement informal likelihood functions and simplistic search methods to summarize parameter and predictive distributions. This paper is a follow-up of our previous work published in Vrugt and Sadegh (2013) and demonstrates that approximate Bayesian computation (ABC) bridges the gap between formal and informal statistical model–data ﬁtting approaches. The ABC methodology has recently emerged in the ﬁelds of biology and population genetics and relaxes the need for an explicit likelihood function in favor of one or multiple different summary statistics that measure the distance of each model simulation to the data. This paper further studies the theoretical

Figure 1 provides an overview of possible error sources that affect our ability to correctly describe the physical system, (θ * ), of interest.Forcing data, model parameter, model state, and calibration data error are represented with a probability density function (pdf), whose statistical properties are Published by Copernicus Publications on behalf of the European Geosciences Union.typically unknown.Errors in the modeled (5) output, y t (t>0) , and (6) state, x t (t>0) , dynamics originate from a wide variety of different error sources, including (1) inadequate and/or incomplete knowledge of the model parameters, θ * ; (2) errors in the input (forcing) data, u, and (3) initial states; x 0 (4) structural inadequacies in the model equations and/or improper dimensionality of the state space; and (7) errors in the calibration data, ỹt(t>0) .The mathematical operator (also called "likelihood function") is used to judge the distance between the model predictions and corresponding calibration data.This function should explicitly recognize the contribution and role of each individual error source in determining the error residual, but is very difficult to specify correctly with very weak prior information, and hence the pitchfork symbol is used.
Within the context of hydrologic modeling, measured rainfall depths and estimates of (potential) evapotranspiration typically constitute the main forcing data.These two input variables strongly determine the simulated streamflow at interior points and the catchment outlet, surface runoff, soil moisture fluxes and storage of water in the catchment.Examples of model states are soil moisture, groundwater table depth, and hydraulic heads (amongst others).Their knowledge is beneficial to adequately represent the storage of water in the variably saturated zone and groundwater, and hence ensure an adequate model calibration.Finally, calibration data often involves time series of (spatiallydistributed) streamflow observations or time-lapse measurements of tracer concentrations.Inevitably, each of these data sources is subject to uncertainty, which severely complicates parameter estimation and quantification of model structural errors.
During the past 4 decades much research has been devoted to the development of computer based methods for fitting hydrologic models to calibration data (e.g., streamflow, water chemistry, groundwater table depth, soil moisture, snow water equivalent).That research has primarily focused on six different issues: (1) the development of specialized objective functions that appropriately represent and summarize the errors between model predictions and observations, (2) the search for efficient optimization algorithms that can reliably solve the hydrologic model calibration problem, (3) the determination of the appropriate quantity and most informative kind of data, (4) the selection of an appropriate numerical solver for the partially structured differential and algebraic equation systems of hydrologic models, (5) the representation of uncertainty, and (6) the development of methods for inferring and refining the mathematical structure and process equations of hydrologic models.
Research into error residual distributions had led to the development of a suite of different (hierarchical) likelihood functions for measuring the closeness between the model simulations (predictions) and the corresponding data (Ibbitt and O'Donnell, 1974;Sorooshian and Dracup, 1980;Kuczera, 1983a;Bates and Campbell, 2001;Kavetski et al., 2006a;Marshall et al., 2007;Schoups and Vrugt, 2010a;Smith et al., 2010).Recent work by Schoups and Vrugt (2010a) has resulted in a generalized likelihood function that encapsulates many of the existing likelihood functions in the hydrologic literature, but with additional flexibility to simultaneously account for correlated, heteroscedastic, and nontraditional error residual distributions.
Research into optimization methods has led to the development of a wide variety of different search methods.Whereas initial approaches utilized local search principles that seek iterative improvement of the objective function from a single starting point in the parameter space (Ibbitt, 1972;Johnston and Pilgrim, 1976;Sorooshian and Dracup, 1980;Restrepo, 1982;Kuczera, 1983a, b;Gupta and Sorooshian, 1983;Sorooshian et al., 1983b;Troutman, 1985a, b), problems with parameter insensitivity, curved ridges, local minima, and multiple different regions of attraction has stimulated the development of population based search algorithms that use multiple different points concurrently to locate the global optimum (Wang, 1991;Duan et al., 1992;Yapo et al., 1998;Seibert, 2000;Khu and Madsen, 2005;Chu et al., 2010).In this regard, the shuffled complex evolution global optimization algorithm of Duan et al. (1992) has shown to be effective and efficient in calibrating conceptual watershed models.Recent developments include simple randomized adaptation (Mazi et al., 2004;Tolson and Shoemaker, 2007), multimethod ensemble (Vrugt and Robinson, 2007;Vrugt et al., 2009b), and filtering based (Pauwels, 2008) parameter estimation methods that further improve search efficiency and reliability.
Research into the information content of data has led to the understanding that it is not the length of the data that matters  (Kuczera, 1982;Sorooshian et al., 1983a;Gupta and Sorooshian, 1985;Yapo et al., 1996).Wet and dry periods are both required to make sure that all the different components of the watershed model are excited and the different parameters can be estimated from the calibration data.Post-audit simulations presented in Vrugt et al. (2002) using a Bayesian analysis, adaptive random walk Metropolis resampling, and value of information (VOI) framework has demonstrated that only a few (daily) streamflow data measurements are necessary to reliably calibrate a conceptual hydrologic model.The remaining data contain redundant information and could be used to evaluate the reliability of the actual model structure.
Research into numerical solvers has demonstrated that explicit (Euler based) time-stepping schemes introduce considerable streamflow simulation errors and spurious local minima, "pits", and irregularities in the objective function space (Kavetski et al., 2003(Kavetski et al., , 2006c;;Kavetski and Clark, 2010;Schoups et al., 2010b).These findings provide a deeper understanding of the convergence problems of local search methods and demonstrate a need for implicit solvers that iteratively adjust the integration time step based on the state dynamics.
Finally, research into structural adequacy has resulted in data-based mechanistic (Young, 2002(Young, , 2012)), data assimilation (Vrugt et al., 2005;Smith et al., 2008;Bulygina and Gupta, 2011), and other stochastic techniques (Reichert and Mieleitner, 2009) for inference and iterative refinement of the mathematical structure of conceptual hydrologic models.This has led to the understanding that discharge data contain sufficient information to warrant the identification of a suitable model structure that mimics as closely and consistently as possible the observed watershed behavior at the temporal and spatial scale of measurement.
Most of these developments assume input data and model structural errors to be "negligibly small" or to be somehow "absorbed" into the output error residuals.The residuals are then expected to behave statistically similar to the calibration data measurement error.These assumptions are statistically convenient but typically not borne out of the actual probabilistic properties of the residual errors that may show changing bias, variance (heteroscedasticity), skewness, and correlation structures under different hydrologic conditions (and for different parameter sets).This is in part due to the presence of model structural and forcing (input) data errors whose contribution may, in general, be substantially larger than the (calibration) data measurement error.These errors do not necessarily have any inherent probabilistic properties that can be exploited in the construction of an explicit likelihood function.For linear systems it is known that ignoring such errors will lead to bias in the estimates of parameter values.The strong and generally difficult to justify assumptions about the nature of the errors have led Beven and coworkers to advocate informal statistical approaches using the generalized likelihood uncertainty estimation (GLUE) methodology (Beven and Binley, 1992;Beven, 1993Beven, , 2006Beven, , 2009;;Beven and Freer, 2001;Beven et al., 2008).
The origins of the GLUE method lie in trying to deal with uncertainty estimation problems for which simple explicit (theoretical) likelihood assumptions do not seem appropriate.The GLUE methodology rejects the traditional statistical basis for the likelihood function in favor of finding a set of representations (model inputs, model structures, model parameter sets, model errors) that are behavioral in the sense of being acceptably consistent with the (non-error-free) observations.An informal likelihood measure is used to avoid over conditioning and exclude parts of the model (parameter) space that might provide acceptable fits to the data and be useful in prediction.Since its introduction in 1992, GLUE has found widespread application for uncertainty assessment in many fields of study, including modeling of the rainfallrunoff transformation (Beven and Binley, 1992;Freer et al., 1996;Lamb et al., 1998), soil erosion (Brazier et al., 2001), tracer dispersion in a river reach (Hankin et al., 2001), groundwater and well capture zone delineation (Feyen et al., 2001;Jensen, 2003), unsaturated zone (Mertens et al., 2004), flood inundation (Romanowicz et al., 1996;Aronica et al., 2002), land-surface-atmosphere interactions (Franks et al., 1997), soil freezing and thawing (Hanson and Lundin, 2006), crop yields and soil organic carbon (Wang et al., 2005), and ground radar-rainfall estimation (Tadesse and Anagnostou, 2005).Applications of GLUE are also found in distributed hydrologic modeling (McMichael et al., 2006;Muleta and Nicklow, 2005).
In recent years, a strong debate has emerged in the hydrologic community between proponents that adhere strongly to the underlying philosophy of GLUE and believe that the method is a useful working methodology for assessing parameter and predictive uncertainty in nonideal cases, and researchers and practitioners that strongly oppose incorrect usage of statistics in favor of coherent probabilistic approaches (Gupta et al., 1998;Beven and Young, 2003;Gupta et al., 2003;Christensen, 2004;Montanari, 2005;Mantovan and Todini, 2006;Stedinger et al., 2008;Beven et al., 2008;Beven, 2009;Vrugt et al., 2008b, c).This paper is a followup of our earlier work (Vrugt and Sadegh, 2013) and demonstrates the similarity of likelihood-free inference used in population and evolutionary genetics (Pritchard et al., 1999;Beaumont et al., 2002) and informal statistical approaches such as GLUE.Likelihood-free inference was introduced in the statistical literature about three decades ago (Diggle and Gratton, 1984) for cases where the likelihood is intractable, too expensive to be evaluated, or an explicit formulation is not available.This method is also referred to as approximate Bayesian computation (ABC) (Marjoram et al., 2003;Sisson et al., 2007;Del Moral et al., 2008;Joyce and Marjoram, 2008;Grelaud et et al., 2009;Ratmann et al., 2009) and widens the class of models for which statistical inference can be performed.This paper follows a different line of reasoning and approach than Nott et al. (2012) who demonstrated that GLUE corresponds to a certain approximate Bayesian procedure even when the "generalized likelihood" is not a true likelihood.
The remainder of this paper is organized as follows.Section 2 briefly summarizes the Bayesian approach to model parameter and predictive uncertainty estimation, with particular emphasis on the choice of the likelihood function used to summarize the probabilistic properties of the error residuals.In Sect. 3 we subsequently introduce likelihood-free computation and demonstrate the main elements of the ABC procedure by application to a simple Nash cascade series of three linear reservoirs.This is followed by Sect. 4 in which the conceptual and statistical equivalence of ABC and the limits of acceptability approach of GLUE is demonstrated.Section 5 then proceeds with a comparison between GLUE and ABC using the Sacramento soil moisture accounting (SAC-SMA) model (Burnash et al., 1973) and hmodel (Schoups and Vrugt, 2010a), and discharge data form two contrasting watersheds in the US.Finally, in Sect.6 we summarize our results and discuss the main findings.

Bayesian Inference
Bayes theorem is a simple rule for how to update the prior probability of a certain hypothesis when new, relevant information (data) becomes available.The term "Bayesian" refers to the 18th century mathematician and minister, Reverend Thomas Bayes (1701-1761), who studied how to compute a distribution for the probability parameter of a binomial distribution.If we conveniently assume that the model parameters are the only source of uncertainty, we can write Bayes' rule as follows: where p(θ) (p(θ| Ỹ)) signifies the prior (posterior) parameter distribution, L(θ| Ỹ) ≡ p( Ỹ|θ) denotes the likelihood function, and p( Ỹ) represents the evidence (or normalization constant).
A key task is now to summarize the posterior distribution, p(θ| Ỹ), for example, by the mean, the covariance or percentiles of individual model parameters.When this task cannot be carried out by analytical means or analytical approximation, iterative methods are needed to generate a sample from the posterior distribution.The desired summary is then obtained from this sample.Knowledge of the normalization constant, p( Ỹ), is not required for sampling of the parameters as all our statistical inferences can be made from the unnormalized density.Explicit knowledge of p( Ỹ) is however desired for Bayesian model selection and averaging.
Epistemic uncertainty (model inadequacy due to a lack of knowledge) and forcing data errors can, in principle, be summarized using hyper-parameters (latent variables) and their distribution derived from Bayesian inference.Yet, practical experience suggests that it is very difficult, if not impossible, to disentangle the effects of individual error sources particularly if a single data type, Ỹ, constitutes the only basis for model evaluation.In our previous paper (Vrugt and Sadegh, 2013) we have demonstrated the ability of likelihood-free inference methods to significantly enhance the chances of detecting model structural deficiencies.This approach marks progress towards improving our perceptualconceptual-theoretical view(s) of the world, expressed as model structural hypotheses (assumptions and conjectures).
The scope of the present paper is fundamentally different than that of our previous work published in Vrugt and Sadegh (2013).We show herein that ABC has many elements in common with the limits of acceptability approach of GLUE, but benefits from a much better statistical underpinning.To be consistent with GLUE, we map the input-output uncertainty on the model parameters.Under ideal conditions with an adequate model and perfect forcing data, the error residuals follow a zero-mean Gaussian distribution: σ Ỹ can be specified a priori based on knowledge of the measurements errors, or alternatively its value can be inferred simultaneously with the values of θ (Vrugt et al., 2008b;Bikowski et al., 2012;Laloy and Vrugt, 2012).It is worth noting that the data often come from only a single experiment.So while it is possible to quantify numerical errors, such as those due to discretization (see Kaipio et al., 2004;Nissinen et al., 2009), there is no opportunity to control the boundary conditions of (large-scale) natural systems to obtain data from additional experiments in which some controllable inputs have been varied.The likelihood function, L(•), in Eq. ( 4) is useful for simple regression problems, but the assumption of independent identically distributed Gaussian error residuals cannot be justified in environmental modeling.The presence of input data and model structural errors introduces complex error residual distributions whose probabilistic properties are difficult to describe accurately with classical likelihood functions.The choice of an adequate likelihood function, L(θ| Ỹ), has therefore been the subject of considerable debate in the hydrologic and statistical literature.In response to this, Schoups and Vrugt (2010a) have introduced a generalized likelihood function that better extends the applicability of commonly likelihood functions to situations where residual errors are correlated, heteroscedastic, and non-Gaussian with varying degrees of kurtosis and skewness.Application to daily rainfall-runoff data from a dry and humid basin showed that (1) residual errors are much better described by a heteroscedastic, first-order, auto-correlated error model with a Laplacian distribution function characterized by heavier tails than a Gaussian distribution; and (2) compared to a standard least-squares approach, proper representation of the statistical distribution of residual errors yields tighter predictive uncertainty bands and different parameter uncertainty estimates that are less sensitive to the particular time period used for inference, (3) multiplicative bias factors improve the prediction of peak flow, and (4) near zero-flows are better described with a skewed error distribution.
The generalized likelihood function improves the statistical description of the error residuals, yet it does not separate out the effect of individual error sources.Another, from the viewpoint of this paper, less important deficiency is that the use of a single performance metric, L, no matter how carefully chosen, is inadequate to extract all information from the available calibration data.The use of such "insufficient statistic" promotes equifinality, making it unnecessarily difficult to find the preferred parameter values.This is not desirable and explains why calibration of highly-parameterized models is often found to be very time consuming and difficult.

Approximate Bayesian computation
Whereas traditional Bayesian approaches require us to specify an explicit likelihood function, L(θ| Ỹ), ABC approaches avoid explicit evaluation of the likelihood function in favor of (a set of) summary variables that better extract the information from the available data.The premise behind ABC is that θ should be a sample from the posterior distribution as long as the distance between the observed and simulated data, hereafter referred to as ρ Ỹ, Y(θ ) , is less than some small value, .For sufficiently complex models and large data sets the probability of happening upon a simulation run that yields precisely the same data set as the one observed will be very small, often unacceptably so.So rather than considering the data, Ỹ, itself we consider a summary statistic of the data, S( Ỹ), and use a distance function (Marjoram et al., 2003;Sisson et al., 2007) ρ S( Ỹ), S(Y(θ )) ≤ (5) to decide whether to accept the parameter values, θ , or not.
A pseudo-code of the generic ABC approach is given below.
Algorithm 1 Rejection sampler (ABC-REJ) In words, the ABC algorithm proceeds as follows.First we sample a candidate point, θ , from some prior distribution, p(θ).We then use this proposal to simulate the output of the model, Y ← H(θ |•), and use this n-vector to calculate one or multiple summary metrics.A distance function, ρ S( Ỹ), S(Y(θ )) , is then used to decide whether to accept θ or not.If this distance function is smaller than some predefined tolerance value, , then the simulation is close enough to the observations that the candidate point, θ , has some nonzero probability of being in the approximate posterior distribution, p θ|ρ S( Ỹ), S(Y) ≤ .This algorithm converges to the true posterior p(θ| Ỹ) when → 0, provided that the summary statistic(s), S(•), is (are) near sufficient (Pritchard et al., 1999;Beaumont et al., 2002;Ratmann et al., 2009;Turner and van Zandt, 2012).
The choice of summary metrics is obviously an important consideration in the application of ABC.These criteria (signatures) should be chosen so that the loss of information from the original data is minimized.Information theory can help to determine such (set of) sufficient statistic(s) (Barnes et al., 2011), but this is beyond the scope of the present study (as will soon become evident) and will therefore be a main focus in future publications.
Another issue that deserves careful attention is that ABC can only be used with a stochastic model operator.Otherwise, the posterior parameter distribution will continue to  shrink in the limit of going to zero and eventually converge to a Dirac delta function (single point) if the model is sufficiently adequate.In theory, we should therefore corrupt the output of the deterministic model, Y ← H(θ|•), with a random (measurement) error, but this is deemed unnecessary within the present context.We will revisit this important issue in the penultimate paragraph of this paper.The ABC findings presented in Tables 3-6 and Figs.3-14 thus pertain to deterministic model output only.
To illustrate the ABC methodology, we consider a Nash cascade instantaneous unit hydrograph.This model routes inflow (rainfall) through a series of linear reservoirs that all have the same recession coefficient.Mathematically, this cascade of m linear reservoirs with recession coefficient r can be written as follows (Nash, 1960): where t (days) denotes time, (•) signifies the gamma function, and h t (•) is the modeled response at time t.A 365-day period with synthetic daily streamflow data (in m 3 s −1 ) was generated by driving the Nash cascade model of Eq. ( 6) with an artificial precipitation record.We assume m = 3 reservoirs and a recession constant of r = 2 days.This artificial data set is subsequently perturbed with a heteroscedastic error (nonconstant variance) with standard deviation equal to 20 % of the original simulated discharge values.Figure 2 plots the original simulated discharge time series (blue line) and the corrupted observations (red circles) used in the ABC analysis to derive the posterior distribution of the recession constant.
We are now left with a selection of the summary statistic, S(•) to decide whether a candidate point (model simulation) is behavioral or not.For illustrative purposes we start with the mean of the actual data, to estimate the posterior distribution of the recession constraint.This metric is rather weak and cannot be considered sufficient, but for now it helps to illustrate the main elements of the ABC methodology.A uniform prior with r ∈ [0, 4] was used in all our calculations.To increase computational efficiency, we used an improved variant of the ABC population Monte Carlo (PMC) scheme of Turner and van Zandt (2012), the details of which appear in Appendix A. In short, the PMC sampler starts out as ABC-REJ during the first iteration, j = 1, but using a much larger initial value for .During each successive next step, j = {2, . . ., J }, the value of is decreased and the proposal distribution, q j (θ Through a sequence of successive (multi)normal proposal distributions, the prior sample is thus iteratively refined until a sample of the posterior distribution is obtained.This approach, similar in spirit to the adaptive Metropolis sampler of Haario et al. (1999Haario et al. ( , 2001)), receives a much higher sampling efficiency than ABC-REJ, particularly for cases where the prior sampling distribution p(θ) is a poor approximation of the actual posterior distribution.
The PMC sampler of Turner and van Zandt (2012) assumes that the sequence of values is specified by the user.This does not necessarily lead to the most efficient search.Our sampler therefore adaptively determines the next value of j ; j > 1 from the cumulative distribution function of the ρ(•) values of the N most recent accepted samples.Details of this procedure are given in Appendix B. For the present case study, an initial value of = 1 is used, and this value is adaptively decreased until a value of = 0.05 is reached.Lower values of provide similar posterior estimates, yet unnecessarily increase the computational burden of the ABC analysis (Vrugt and Sadegh, 2013).
Figure 3a presents a histogram of the posterior marginal distribution of r derived from the ABC-PMC analysis using the mean observed flow as summary statistic.The red square denotes the true parameter value used to create the synthetic data.For completeness we also present in the middle panel the results of DREAM (Vrugt et al., 2008a(Vrugt et al., , 2009a) )  likelihood function of Eq. ( 4) with a heteroscedastic measurement error, σ Ỹ = 0.2 Ỹ.
Perhaps not surprisingly, the ABC-derived posterior distribution is poorly defined by calibration against the mean observed discharge value.The behavioral recession constants extend a larger portion of its prior distribution.This suggests that the observed (synthetic) discharge data do not contain information about the recession constant of the three reservoirs.This finding is perhaps not surprising.Many different values of the recession constant, r, can be found with mean simulated discharge value similar to that of the observed data but with poor accuracy of the simulated streamflow dynamics.Indeed, if a classical likelihood function is used (Fig. 3b), the recession constant is much better defined with maximum a posteriori density equal to r = 2 and 95 % posterior parameter uncertainty ranges that vary between 1.95 and 2.05.
Fortunately, nothing prevents us from using more than one summary statistic in the ABC analysis to measure different and complementary parts of model behavior.To be meaningful in practice, such statistics should preferably measure hydrologically relevant signatures of watershed behavior.Such an approach was introduced in our previous work (Vrugt and Sadegh, 2013), and for simplicity we now augment the first metric (mean of the data) with another simple statistic (standard deviation of data) to decide whether the model simulation can be considered behavioral or not.The results of this analysis are shown in Fig. 3c using a minimum value of = 0.05.The recession constant appears much better defined, but the width of the (marginal) posterior distribution is still considerably larger than what can be expected from a classical likelihood function using MCMC simulation with DREAM (Fig. 3b).This simply conveys that our two summary metrics are jointly insufficient and that, if so desired, more powerful metrics should be used.
The ABC methodology allows the use of a wide arsenal of summary metrics and distance functions to judge the distance between the model simulation and observations.Common examples in genetics include the Canberra, Euclidean, and Manhattan distance.Those are readily applied in hydrology as well, including summary statistics such as the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), mean square error (MSE), and others listed in Table 1 of Gupta et al. (1998).Temporal disaggregation of the data and model simulations would preserve the statistical moments of Ỹ such as the mean, median, standard deviation, kurtosis, and skewness.The use of flow duration curves could be beneficial in this regard as characteristic of the watersheds response to rainfall (Vrugt and Sadegh, 2013).

Statistical equivalence of ABC and GLUE
Now that the basic principles of ABC have been discussed in some detail using the simple one-parameter unit hydrograph toy problem, it is not difficult to see the many similarities of GLUE and ABC: 1.The distance function specified in Eq. ( 5) has many elements in common with the informal threshold used in the classical GLUE approach to differentiate between behavioral and nonbehavioral samples.This is perhaps more obvious if we use the following notation where I(A) is a simple indicator function that is "1" if A is true and "0" otherwise, and δ t ; t = {1, . . ., n} constitutes the effective observation error that takes into account multiple sources of error (Beven, 2006).This value is defined a priori by the user.The ABC approach can thus be made mathematically equivalent to the limits of acceptability approach of GLUE if each observation is used as summary statistic.
2. The ABC-REJ sampler is similar to the Latin hypercube sampling strategy used in GLUE to find behavioral solutions.Both methods use a fixed proposal distribution to sample from the prior parameter distribution.If the corresponding simulation falls within the bounds specified by the effective observation error, then the parameter combination will be classified as behavioral; otherwise, the proposal will be rejected.Sampling continues until N behavioral solutions are found.
Following the first proposition, a solution is deemed behavioral with ABC if its simulated discharge time series falls within the interval [ ỹt − δ t , ỹt + δ t ] for t = {1, . . ., n}.This is similar to the limits of acceptability approach of GLUE if a simple discrete (0/1) membership function is used.For the synthetic toy example used herein, we define the effective observation error to be δ = α × σ ỹ with α = 2.This is equivalent to δ t = 0.4 ỹt .The goal of the ABC analysis now becomes finding all those parameter combinations that consistently fall within the effective observation error of the discharge data and hence receive a perfect score of Eq. ( 9) equal to n.This constitutes a maximization problem, differing from a typical implementation of ABC where the distance to the summary statistics and value of is being minimized.In our numerical implementation with the PMC sampler, we therefore adapt Eq. ( 9) and calculate to decide whether a simulation is behavioral or not.For a perfect simulation, ρ • will be zero.However, in most practical applications it is not possible to find a simulation that satisfies = 0.For instance, for the present Nash cascade toy example with α = 2 and thus δ t = 0.4 ỹt , t = {1, . . ., n}, a minimum value of ρ • of about 0.05 is to be expected.This follows directly from statistical theory (about 95 % of the observations are included in the interval of 2 times the standard deviation).
The adaptive updating strategy of in PMC not only guarantees a more efficient search strategy than ABC-REJ (GLUE), but also automatically determines the maximum attainable coverage of the discharge observations within the limits of acceptability.In the first iteration, we set = 0.75 and thus define a behavioral solution as one that contains at least 25 % of the observed discharge data within the interval, [y t − δ t , y t + δ t ] (t={1,...,n}) .During each successive next iteration, the value of is sequentially reduced and the PMC sampler terminates when the difference between two subsequent values is less than 0.02, or in mathematical notation j − j −1 < 0.02.In all our simulations presented herein we request PMC to create N = 1000 behavioral solutions at each different value (iteration).We report our results for ≤ 0.10.Figure 4A presents the results of the ABC-PMC analysis and plots the 95 % streamflow simulation uncertainty ranges (dark grey region) using the ABC-PMC sampler.This result is derived from the N = 1000 posterior solutions using the 2.5 and 97.5 percentile of the simulated discharge values.The artificial discharge observations are indicated with red circles.The simulations nicely track the observed data with uncertainty intervals that appear relatively narrow and encompass about 90 % of the data.The upper panel plots the results for GLUE using the same limits of acceptability.Latin hypercube sampling was used to create 1000 behavioral solutions at = 0.10 using an algorithm virtually identical to that of ABC-REJ.Perhaps not surprising, the results are identical to those presented for ABC.Although the numerical results are identical, the computational efficiency of both methods is not.The ABC-PMC sampler exhibits an acceptance rate of about 53.5 % whereas for GLUE (and hence ABC-REJ) this is about 17.0 %.

Case studies: hydrologic modeling
Now that the ABC method has been discussed in some detail and the theoretical connection of this approach with GLUE has been demonstrated, we proceed with numerical simulation using five years of daily streamflow data from the French Broad river basin at Asheville, North Carolina (1 Januar 1962 to 30 December 1966) and the Leaf River (1 October 1952 to 30 September 1957) north of Collins, Mississippi.These watersheds have been studied extensively in the literature and details of the data can be found in related publications.Two lumped conceptual hydrologic models are used to describe the rainfall-runoff transformation.This includes the 7-parameter hmodel described in detail in Schoups and Vrugt (2010a) and the 13-parameter SAC-SMA model (Burnash et al., 1973).Inputs to these models include mean areal precipitation (MAP) and potential evapotranspiration (PET) while the outputs are estimated evapotranspiration and channel inflow.Numerical, conceptual, and computational details of both models can be found in the cited publications, and so will not be repeated herein.Tables 1 and 2 summarize the parameters of both models and their upper and lower bound values.
Implementation of the limits of acceptability approach requires knowledge of the effective observation error.This error varies dynamically with flow level and constitutes the combined effect of input data, model structural and calibration data measurement error.In practice, the user defines the limits of acceptability for each individual observation, but for convenience we follow a different approach and set the effective observation error as a multiple of the actual discharge measurement error.We follow Vrugt et al. (2005) and use consecutive differences of the calibration observations to calculate the measurement data error (e.g., 7 in Fig. 1): where l denotes the difference operator applied l subsequent times (Rice, 1984;Hall et al., 1990;Seifert et al., 1993;Dette et al., 1998) discharge data.Heteroscedasticity is easily identified by applying the nonparametric estimator locally in the calibration data time series.This provides an n-vector of measurement errors, hereafter referred to as σ Ỹ = { σ ỹ1 , . . ., σ ỹn }.The limits of acceptability in Eq. ( 9) are now defined to be δ t = α× σ ỹt for t = {1, . . ., n} using α = 2.We now summarize the results of GLUE and ABC for both models and watersheds.
Figure 5 plots histograms of the behavioral solutions of an illustrative set of five SAC-SMA model parameters for the French Broad watershed.The PMC sampler terminated its search with ≤ 0.06, corresponding to a coverage of 94% of the discharge data within the effective observation error.The top panel presents the results for GLUE (limits of accept- ability) and the bottom panel shows the corresponding counterparts for ABC.To limit the computational burden, GLUE was terminated after 300 behavioral solutions were found.This is sufficient for comparative purposes.The marginal distribution of the lower zone primary free water depletion rate (LZPK) follows a normal distribution, whereas the histograms of the other parameters deviate considerably from normality and tend to assign the highest probability mass at the lower (PCTIM, ADIMP and LZFSM) or upper bound (LZFPM).The posterior parameter uncertainty ranges appear rather large and essentially cover the entire prior distribution defined previously in   unrealistically large and much larger than what can be expected from an explicit likelihood function, but not surprising given the size of the effective observation error used to define the limits of acceptability.What is most important, however, is the finding that the GLUE and ABC derived posterior parameter distributions are essentially similar.This provides further support for our claim that the limits of acceptability approach of GLUE can be interpreted as a special case of formal Bayes.We will further elaborate on this equivalence in the final section of this paper.Now that the posterior parameter uncertainty has been defined, we focus our attention on the actual discharge simulations.Figure 6 presents the outcome of this analysis and presents the 95 % streamflow uncertainty ranges (gray region) of the GLUE (top panel) and ABC (bottom panel) derived posterior parameter distribution.The simulation uncertainty ranges appear rather large but nicely cover approximately 74 % of the discharge observations.The simulation results of GLUE and ABC are in strong agreement, which is to be expected given the strong similarity of the behavioral samples derived with both methods.Although the numerical results of GLUE and ABC are very similar, the PMC sampler requires only 1/30 (1/8) of the simulations of GLUE to locate N = 1000 posterior solutions for the SAC-SMA (hmodel) (see Table 3).The advantage of PMC is more and more apparent with increasing dimensionality of the parameter space.If the search space is relatively low-dimensional (hmodel) and the space of behavioral solutions relatively large in comparison to the prior parameter space, both sampling methods will rapidly sample N = 1000 behavioral solutions.If, on the contrary, the search space is of higher dimensions (SAC-SMA), or the behavioral solution space is made up of a small portion of the prior parameter space, Latin hypercube sampling (and ABC-REJ) will be rather inefficient, needing many thousands of draws from the prior distribution to find just a handful of good (behavioral) solutions.The PMC sampler achieves a higher sampling efficiency by iteratively reducing the value of during the search and adaptively updating the scale and orientation of the proposal distribution.Note that the PMC and Latin hypercube sampling strategies used herein vary all parameters at a time, and hence further efficiency improvements are to be expected in high-dimensional parameter spaces with the usage of genetic operators such as crossover and mutation.
To provide more insights into the sampled parameter space of the French Broad river basin, please consider Fig. 7, which presents two-dimensional scatter plots of the posterior samples derived with GLUE (top panel) and ABC-PMC (bottom panel) for three selected parameter pairs.The bivariate sample plots appear very similar and confirm our previous findings in Fig. 5 and demonstrate significant scatter with behavioral samples that extend their entire uniform prior ranges.But this does not necessarily mean that the posterior parameter space is badly defined.Instead, large portions of the (A) LZSK-LZPK, (B) PFREE-ADIMP, and (C) ZPERC-LZPK subspaces are virtually empty and thus deemed nonbehavioral.This suggests at least some level of correlation between the posterior parameter samples.The difference in sampling density between both panels is simply due to an insufficient computational budget for GLUE to create N = 1000 behavioral solutions.GLUE was terminated after 300 posterior samples were found.We now proceed with out-of-sample prediction and plot in Fig. 8 the streamflow uncertainty ranges (gray region) of the SAC-SMA model for a three-year portion of the evaluation data set of the French Broad watershed.This period commences immediately after the last day of the calibration data set, with the initial state at t = 0 having been derived from the calibration ensemble.The top panel presents the results of GLUE and the bottom panel plots the corresponding results of ABC.Perhaps not surprisingly, both methods exhibit similar results and provide a discharge ensemble that envelops about 70 % of the observed discharge data (red circles).The strong similarity between the simulation results of the calibration and evaluation sample inspires confidence in the ability of the behavioral parameter set to accurately describe the rainfall-runoff transformation of the French Broad river basin.
Table 4 summarizes the results of GLUE and ABC for the SAC-SMA model and French Broad watershed, presenting the root mean squared error (RMSE) of the posterior mean discharge simulation and associated coverage of the 95 % prediction intervals for the calibration and evaluation period.For completeness, we also list the results of the SAC-SMA model with a formal likelihood function, Eq. ( 4), using the heteroscedastic measurement error σ Ỹ derived from the nonparametric difference operator.The listed statistics summarize our main findings thus far.The results of GLUE and ABC are virtually identical and show a consistent performance during the calibration and evaluation period.The 95 % uncertainty ranges derived with both methods encompass about 70 % of the discharge observations.This coverage of the parameter uncertainty is significantly larger than the approximately 12-17 % derived from a classical likelihood function.This finding has important practical utility, for instance within the context of flood forecasting.The behavioral parameter distribution derived with ABC and GLUE provides a reasonable initial estimate of the total out-of-sample prediction uncertainty.On the contrary, the posterior parameter uncertainty derived from a classical likelihood function only envelops a small percentage of the streamflow observations.It is worth noting that this coverage will be inflated if other sources of uncertainty are considered in the formal Bayesian analysis.Nevertheless, this requires a better understanding of precipitation and model structural errors.
Our main focus thus far has been on the SAC-SMA model, without recourse to the simulation results of the hmodel.Figure 9 shows posterior histograms of five of the hmodel parameters derived with GLUE (top panel) and ABC (bottom panel) using the French Broad calibration data set.The PMC sampler determined a maximum possible coverage of 95 % of the discharge data within the uniform hypercube defined by the effective observation error.The results in Figs.9-11 thus pertain to this coverage level.The marginal posterior parameter distributions derived with GLUE and ABC again demonstrate a strong agreement.Most of the hmodel parameters, with the exception of I max and α E , are reasonably well defined by calibration against the observed discharge data.The parameter Q max is particularly well resolved and favors values close to zero -something that is physically rather unrealistic and likely due to errors in the model formulation and precipitation data.
Figure 10 presents two-dimensional scatter plots of the posterior samples of three selected parameter pairs.The top panel corresponds to GLUE and the bottom panel illustrates the same results for ABC.Each plus symbol depicts a behav- ioral solution.Because of sampling inefficiency, the GLUE calculations were terminated after 100 behavioral samples were identified.This explains the apparent differences in sampling density.Nevertheless, the bivariate plots of the posterior samples derived with both methods are in strong agreement with each other, with behavioral solutions that occupy only a relatively small part of the prior parameter space.This is particularly true for the α F − Q max subspace.The sampled parameter pairs appear rather uncorrelated, which suggests that the different hmodel parameters each control a different part of the simulated watershed response.This simplifies posterior inference, favoring a hierarchical sampling approach in which parameters are estimated sequentially.
We now demonstrate how the hmodel parameter uncertainty translates into streamflow simulation uncertainty.We separately depict the results for the calibration (Fig. 11) and evaluation (Fig. 12) period.As expected, the simulation results derived with GLUE and ABC are in close agreement.The 95 % simulation uncertainty ranges encompass about 67 % of the calibration data observations (see Table 5).This is much higher than the 7 % coverage derived with a classical likelihood function using MCMC simulation with DREAM.Yet between days 205 and 270 the posterior ensemble systematically over predicts the actual discharge observations.This positive bias is likely caused by an error in the measured rainfall data around day 205.This rainfall error accumulates in the simulated state variables and continues to persist until the next significant rainfall event around day 270.Rainfall data correction would seem appropriate (Kavetski et    2006a, b; Vrugt et al., 2008a;Beven, 2009) but is beyond the scope of the present paper.The evaluation data period again highlights the strong operational similarity of GLUE and ABC, but the average width of the 95 % streamflow simulation uncertainty ranges appears somewhat smaller.Indeed, the coverage has reduced to approximately 61 %.Note that the posterior ensemble systematically underestimates the peak flow events.This can be the effect of an increased rainfall intensity during the evaluation period or some sort of epistemic error (Beven, 2006(Beven, , 2009;;Beven et al., 2011).For practical application it would seem most productive to extend the length of the calibration data period to include a number of larger storm events.This would certainly improve the fitting of the peak flow events but not affect the main conclusions of this paper.
We now turn our attention on the Leaf River data set and present in Fig. 13 histograms of the marginal posterior distributions of a representative group of five SAC-SMA parameters.The PMC sampler determined a maximum coverage of about 56 % of the discharge data within the effective observation error used herein.The top panel displays the results for GLUE for this coverage level, whereas the bottom panel shows the corresponding counterparts derived with ABC.The histograms derived with both methods are again strikingly similar, yet the PMC sampler used in ABC is about 30 times more efficient (not tabulated) in sampling the N = 1000 behavioral solutions.Note that SAC-SMA parameters are not particularly well defined by calibration against the limits of acceptability.The marginal posterior uncertainty ranges are rather large, with the exception of the parameter LZPK, which tends to favor values close to zero.Finally, Fig. 14 illustrates the performance of the GLUE and ABC derived posterior parameter distributions for an independent evaluation period.The GLUE (top panel) and ABC (bottom panel) derived 95 % posterior streamflow uncertainty ranges and again appear nearly equivalent (expected), containing about 65 % of the observed discharge values.This is somewhat smaller than the 80 % coverage obtained during the calibration data period (not shown herein).

Discussion and conclusions
In the past two decades, the GLUE methodology of Beven and Binley (1992); Beven and Freer (2001); Beven (2006) has found widespread application and use for model parameter and predictive uncertainty analysis.This method rejects the formal Bayesian paradigm in favor of finding a set of of behavioral solutions that are acceptably close to the non-error-free observations.This avoids over conditioning of the posterior parameter space in nonideal cases with nontraditional error residual distributions.Indeed, Tables 4-6 demonstrate that the GLUE derived 95 % simulation uncertainty ranges encompass a much higher percentage of the discharge observations than the posterior parameter predictive uncertainty intervals derived from a classical likelihood function.Formal likelihood functions that do not adequately describe the probabilistic properties of the error residuals tend to overestimate the actual information content of the data, providing estimates of the posterior parameter uncertainty that are overly optimistic.
Many have criticized the GLUE methodology for being subjective and lacking an appropriate mathematical underpinning.To help bridge the gap between informal and formal Bayesian approaches, this paper introduced likelihoodfree inference to hydrologic modeling and uncertainty analysis.This approach was introduced in the statistical literature about three decades ago (Diggle and Gratton, 1984) for cases when an explicit likelihood (objective) function cannot be justified.Such approaches, also referred to as ABC, use one or multiple (sufficient) statistics to estimate the posterior parameter distribution.The premise behind ABC is that θ should be a sample from the posterior distribution as long as the distance between the observed and simulated summary statistics is smaller than some small value, .An example of this was given in Sect. 3 by calibration of the Nash cascade model against the mean and standard deviation of the discharge data.In the limit of going to zero, the behavioral solution space should converge to the actual posterior distribution, pending the assumption that the chosen summary statistic(s) is (are) near sufficient (Pritchard et al., 1999;Vrugt and Sadegh, 2013).But this was certainly not the case for the Nash cascade example.The mean and standard deviation are rather weak summary metrics, which explains why the marginal posterior distribution of the recession constant was too wide (see Fig. 3c) and did not converge to its expected distribution (Fig. 3b).Thus, there is a clear need for meaningful summary statistics with a compelling diagnostic power.Examples include the annual runoff and baseflow coefficient and the flow duration curve as used in Vrugt and Sadegh (2013).Note that the ABC approach differs from multiple objective calibration methods in that the distance between the observed and simulated summary metrics is jointly minimized.
Numerical simulations presented in Figs.4-14 have shown that, if each observation is treated as a summary variable, then the ABC approach obtains very similar results to the limits of acceptability approach of GLUE.A similar conclusion was drawn in previous work by Nott et al. (2012) but following a different line of reasoning and within the context of the more traditional GLUE methodology presented by Beven and Binley (1992).One issue deserves special attention, which is that within the limit of acceptability framework, the value of needs to be taken much larger than what is deemed statistically adequate.Standard applications of likelihood free inference define a solution to be behavioral if the chosen summary statistics are within a small distance of their observed counterparts.Yet for hydrologic systems with many calibration observations, the probability of happening upon a simulation run that yields exactly the same data set as the one observed will be extremely small.The effective observation error remedies this problem, but the magnitude of this value is typically much larger than the theoretical value of to guarantee converge to the "true" posterior parameter distribution.Thus, although the limits of acceptability approach of GLUE is a special variant of the more generic ABC approach, one should be careful with interpretation of the posterior parameter distribution.
The results in Tables 4-6 demonstrate that the RMSE of the posterior mean ABC (GLUE) simulation is substantially larger than its counterpart derived with DREAM using a formal likelihood function.This finding is not surprising.The likelihood function used in DREAM is specifically designed to minimize the squared distance between the model simulation and corresponding data.This metric poses much stronger constraints on the parameter values than the limits of acceptability used in GLUE or ABC, and hence results in a better compliance of the simulated and observed discharge data.The performance of the posterior mean ABC simulation can be enhanced if the limits of acceptability δ are tightened and/or additional summary metrics are used during model fitting.Ideally, the chosen summary statistic S(•) is sufficient for θ and thus provides as much information for the parameters as the original data set Ỹ itself.However, if the exact (perfect) likelihood function is unknown, it will be difficult to determine a sufficient statistic.One could then use multiple different summary statistics that each capture different aspects/signatures/patterns of the input-output response (Ratmann et al., 2009;Vrugt and Sadegh, 2013).
The findings presented thus far are derived by comparing the SAC-SMA and hmodel simulated streamflow time series with the observed data.This deterministic approach does not generate a random sample at each time step, and therefore violates a basic requirement of ABC.Nevertheless, the use of a deterministic model is adequate within the current context because the limits of acceptability used herein are substantially larger (four times) than the measurement error of the streamflow data.To illustrate this, please consider Fig. 15 that plots the ABC-derived parameter and 95 % posterior simulation uncertainty ranges for the French Broad river basin using deterministic (top panel) and stochastic (bottom panel) SAC-SMA modeling.We limit our display to a 250-day portion of the calibration data period and the marginal posterior distribution of two selected SAC-SMA model parameters (right-hand side).The stochastic simulation is simply derived by adding a random measurement error to the daily modeled SAC-SMA discharge values.As expected, the posterior streamflow uncertainty ranges and posterior histograms appear very similar.This concludes our numerical simulations.
The ABC methodology opens an entire new field of research with infinite scope for (amongst others) (a) inventing new summary metrics that are self-sufficient and properly rooted in hydrologic and information theory, (b) developing methods and guidelines to interpret the outcome of the ABC analysis and help aid in detection of model malfunctioning (Vrugt and Sadegh, 2013), (c) exploring ways to determine the temporal/spatial variability of individual summary metrics for stochastic (Bayesian) analysis, (d) new approaches to diagnostic model selection using stochastic minimization of the mutual information, (e) improving the computational efficiency of ABC sampling (Sadegh and Vrugt, 2013), and (f) practical applications.Challenges lie in the proper selection of summary metrics that properly extract the information from the calibration data, how to deal with input data uncertainty, and how to detect epistemic errors (lack of knowledge).Our current research efforts are devoted to these different topics.

Fig. 1 .
Fig. 1.Explicit recognition of the role of (1) parameter, (2) forcing data, (3) initial state, (4) model structural, (5) output, (6) state, and (7) calibration data uncertainty.The pitchfork symbol illuminates the difficulty with formulating the likelihood function (and prior distribution/parameterization of individual error sources) used to summarize the error residuals.Explicit treatment of individual error sources is required to increase the prospects of explaining the reasons for model inadequacy and learning from the experimental data.

Fig. 2 .
Fig. 2. Synthetic discharge time series (blue line) simulated with the Nash cascade model, and the corrupted observations (red points) used in the GLUE and ABC analysis.

Fig. 3 .
Fig. 3.Posterior distribution of the recession parameter derived from (A) ABC using the mean of the data as summary metric, (B) DREAM using a heteroscedastic measurement error, and (C) ABC using the mean and standard deviation of the data as summary statistics.The true parameter value is indicated with the red symbol.

Fig. 4 .
Fig. 4. 95 % streamflow uncertainty ranges (dark region) derived from GLUE (top panel) and ABC (bottom panel).The red points mark the actual discharge observations.The prediction uncertainty ranges derived with both methods appear virtually identical and nicely capture the desired percentage of streamflow observations.

Fig. 5 .
Fig. 5. Posterior distribution of five randomly chosen SAC-SMA model parameters derived from (A) GLUE and (B) ABC using historical streamflow data from the French Broad river basin.

Fig. 6 .
Fig.6.SAC-SMA derived 95 % streamflow simulation uncertainty ranges (grey region) of the calibration period of the French Broad river basin using (A) GLUE and (B) ABC.The observed discharge data are indicated with the solid red dots.We limit our display to the first 365-days of the calibration data set to simplify graphical interpretation.The simulation uncertainty ranges appear very similar and nicely cover the observed discharge data.

Fig. 7 .
Fig.7.Bivariate scatter plots of the behavioral (posterior) samples of three different (randomly selected) parameter pairs using GLUE (top panels) and ABC (bottom panels): (A) LZSK-LZPK, (B) PFREE-ADIMP, and (C) ZPERC-LZPK.The scatter plots derived with both methods are in close agreement but demonstrate an important difference in sampling density.The computational budget of GLUE was limited to approximately 2 days, and within this time frame the Latin hypercube sampling method located only 300 behavioral solutions.

Fig. 8 .
Fig. 8. SAC-SMA derived 95 % streamflow simulation uncertainty ranges (grey region) for a three-year portion of the evaluation period of the French Broad river basin using the (A) GLUE and (B) ABC derived posterior parameter distribution.The observed discharge data are indicated with the solid red dots.

Fig. 9 .
Fig. 9. Posterior distribution of five randomly selected hmodel parameters derived from the French Broad calibration data set: (A) I max , (B) Q max , (C) α E , (D) α F , and (E) K F .

Fig. 10 .
Fig. 10.Bivariate scatter plots of the behavioral (posterior) samples of three different (randomly selected) hmodel parameter pairs using GLUE (top panels) and ABC (bottom panels):(A) α F − Q max , (B) K F − K S ,and (C) S max − K F .The scatter plots derived with both methods are in close agreement but demonstrate an important difference in sampling density.The computational budget of GLUE was limited to about 3 days, which has resulted in 100 behavioral solutions.

Fig. 11 .
Fig.11.Hmodel derived 95 % streamflow simulation uncertainty ranges (grey region) of the calibration period of the French Broad river basin using (A) GLUE and (B) ABC.The observed discharge data are indicated with the solid red dots.We limit our display to the first 365 days of the calibration data set to facilitate graphical interpretation.The uncertainty ranges appear very similar and nicely cover the observed discharge data.

Fig. 12 .
Fig. 12. Hmodel derived 95 % streamflow simulation uncertainty ranges (grey region) for a three-year portion of the evaluation period of the French Broad river basin using the (A) GLUE and (B) ABC derived posterior parameter distribution.The observed discharge data are indicated with the solid red dots.

Fig. 13 .
Fig. 13.Posterior distribution of five randomly chosen SAC-SMA model parameters derived from GLUE (top panels), and ABC (bottom panels) using historical streamflow data from the Leaf River watershed in Mississippi.

Fig. 14 .
Fig. 14.SAC-SMA derived 95 % streamflow simulation uncertainty ranges (grey region) for a three-year portion of the evaluation period of the Leaf River watershed using the (A) GLUE and (B) ABC derived posterior parameter distribution.The observed discharge data are indicated with the solid red dots.

Fig. 15 .
Fig. 15.Illustration of the effect of measurement data uncertainty on the ABC-derived posterior parameter and streamflow uncertainty for a representative 250-day portion of the French Broad calibration data set: (A) 95 % simulation uncertainty assuming a deterministic, Y ← H(θ |•), and (D) stochastic model operator, Y ← H(θ |•) + N(0, σ Ỹ ) with discharge measurement error derived from the nonparametric estimator in Eq. (11).The observed discharge data are indicated with solid red dots.The histograms at the righthand side of each individual panel plot the corresponding marginal posterior distributions of the SAC-SMA model parameters ADIMP and PCTIM.

Table 1 .
Prior uncertainty ranges of hmodel parameters.

Table 2 .
Description of the SAC-SMA model parameters and their (uniform) prior uncertainty ranges.

Table 3 .
Computational efficiency of GLUE and ABC for the * Derived from linear scaling of FE needed to sample 300 (SAC-SMA) and 100 (hmodel) behavioral solutions.

Table 4 .
Statistics of the GLUE, ABC and DREAM (formal likelihood function) derived posterior parameter distribution for the calibration and evaluation period of the French Broad river basin: root mean square error (RMSE) of the posterior mean SAC-SMA simulation and coverage of the associated 95 % streamflow simulation uncertainty ranges.

Table 5 .
Statistics of the GLUE, ABC and DREAM (formal likelihood function) derived posterior parameter distribution for the calibration and evaluation period of the French Broad river basin: root mean square error (RMSE) of the posterior mean hmodel simulation and coverage (%) of the associated 95 % streamflow simulation uncertainty ranges.

Table 6 .
Statistics of the GLUE, ABC and DREAM (formal likelihood function) derived posterior parameter distribution for the calibration and evaluation period of the Leaf River watershed: root mean square error (RMSE) of the posterior mean SAC-SMA simulation and coverage (%) of the associated 95 % streamflow simulation uncertainty ranges.