The Kling–Gupta efficiency (KGE) is a widely used performance measure because it considers bias, correlation and variability orthogonally. However, most Markov chain Monte Carlo (MCMC) algorithms commonly apply error-based formal likelihood functions. Because KGE is statistically informal, using the original KGE in MCMC methods causes two problems: negative KGE values distort posterior density ratios, and high proposal acceptance rates make parameters less identifiable. In this study we propose adapting the original KGE using a gamma distribution to solve these problems and to apply KGE as an informal likelihood function in the DiffeRential Evolution Adaptive Metropolis DREAM

Markov chain Monte Carlo (MCMC) techniques are extremely useful in uncertainty assessments and parameter estimations of hydrological models (Smith and Marshall, 2008). Among those MCMC methods, Vrugt et al. (2008, 2009) developed a DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm, which has found numerous applications in various fields (Vrugt, 2016). It is an adaptation of the SCEM-UA algorithm (Vrugt et al., 2003a) that can efficiently estimate the posterior probability distribution of model parameters in the presence of high-dimensional and complex response surfaces with multiple local optima.

The formal likelihood function, e.g., mean square error (MSE) or root mean square error (RMSE), obtained from first-order statistical principles based on error series derived from simulations and observations, is commonly used in the DREAM algorithm. The formal likelihood function relies strongly on error assumptions, which can substantially influence the shape of parameter posterior distributions (Beven et al., 2008). Informal likelihood functions, such as the Nash–Sutcliffe efficiency (NSE) and the alternative Kling–Gupta efficiency (KGE), are often used in hydrological studies to indicate the general performance of model simulations (Gupta et al., 2009). These metrics represent an important measure of model performance, the so-called goodness of fit (Pool et al., 2018). They are not directly derived from stochastic error series, but can easily be used to combine different types of data.
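To make the distinction concrete, the following minimal sketch computes one error-based measure (RMSE) and the two efficiency metrics. It assumes the original KGE formulation of Gupta et al. (2009), with correlation r, variability ratio alpha (ratio of standard deviations) and bias ratio beta (ratio of means). The function names are illustrative and not taken from this study.

```python
import math

def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    n = len(obs)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / ss_obs

def kge(sim, obs):
    """KGE after Gupta et al. (2009): 1 - Euclidean distance from the ideal point (1, 1, 1)."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2.0 * o for o in obs]  # perfect timing, but doubled mean and variability
print(rmse(doubled, obs), nse(doubled, obs))
print(kge(doubled, obs))  # about -0.414 (r = 1, alpha = 2, beta = 2)
```

The last line shows that KGE becomes negative even for a perfectly correlated simulation, which matters once the value is interpreted as a probability density.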

Several studies have discussed how to adjust the calculation of NSE in order to overcome the problems of using NSE in MCMC methods. For example, McMillan and Clark (2009) introduced a constant

However, how to properly use KGE in MCMC methods has not been studied. Directly using KGE in MCMC methods, e.g., the DREAM algorithm, may raise difficulties such as incorrect posterior ratios due to negative KGE values, and nonlinearity between model performance and KGE values. These difficulties affect chain evolution, namely the acceptance rate, which indicates how easily a proposal is accepted, and the convergence rate, which denotes how fast a chain converges to a stationary distribution. As a consequence, given the limited number of realizations that is computationally affordable in practice, the informal character of KGE influences the exploration of the posterior parameter distribution and model uncertainty, such as the density of identifiable parameters. Studies showed that using informal likelihood functions in generalized likelihood uncertainty estimation (GLUE) may lead to unsatisfactory posterior distributions of model parameters (Mantovan and Todini, 2006; Stedinger et al., 2008). When NSE is used as the likelihood function, the number of measurements cannot be considered: with increasing numbers of measurements, little information is added to the performance measure, which prevents the improvement of chain evolution (Mantovan and Todini, 2006). To feasibly use KGE in MCMC methods, it is therefore necessary to draw better proposals that avoid a very flat posterior distribution, to account for the influence of observational size (the amount of information included in calibration) on parameter estimation, and to achieve reasonable acceptance and convergence rates.
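The ratio problem can be illustrated with the standard Metropolis acceptance probability for a symmetric proposal, min(1, p_proposal / p_current). The sketch below is a generic Metropolis rule, not the DREAM implementation, and the KGE values are invented for illustration.

```python
def metropolis_accept_prob(p_current, p_proposal):
    """Metropolis acceptance probability for a symmetric proposal distribution."""
    return min(1.0, p_proposal / p_current)

# Case 1: both KGE values are negative and the proposal is worse (-0.8 < -0.2).
# The ratio is 4, so the worse proposal would always be accepted.
print(metropolis_accept_prob(-0.2, -0.8))

# Case 2: the signs differ. The ratio is negative, which is not a valid probability.
print(metropolis_accept_prob(-0.2, 0.5))

# Case 3: negative values truncated to zero, both KGE values positive.
# Moving from KGE = 0.6 to a clearly poorer KGE = 0.5 is still accepted
# with a probability of about 0.83.
print(metropolis_accept_prob(max(0.6, 0.0), max(0.5, 0.0)))
```

Truncating negative values to zero removes the sign problem but, as the third case shows, leaves a nearly flat density over positive KGE values.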

In this study we propose adapting the gamma distribution and KGE to find a
feasible solution for properly using KGE as an informal likelihood function
in DREAM

Kling–Gupta efficiency (KGE) takes account of variability (

DREAM

We choose the easily applied formal likelihood function (lik

Based on abovementioned basics of DREAM

Problem 1: the probability density

Problem 2: the model performance does not linearly increase with the linear increase of KGE. Therefore, directly using positive KGE as the pseudo probability density

Concept of adapting KGE in DREAM

To test the robustness of our new approach, we define three case studies: (1) a virtual experiment in which the true and pseudo-analytical posterior distributions of model parameters are known, and uncertainties in model structure and input data are absent; (2) calibration and evaluation of a rainfall-runoff model with a long observation time series, which allows comparing the performance between three approaches by varying the amount of data in calibration, and for which the parameterization of the system is unknown and there are uncertainties in model structure, input data and observations; and (3) a model calibration combining hydrodynamics and simple solute transport for a more complex karst system with a large subsurface heterogeneity and processes for fast recharge and groundwater discharge from conduit networks, for which the observation period is short and uncertainties exist in model structure, input and observation data, and model parameter estimations.

We generate a virtual experiment using a rainfall-runoff model (the HBV model). We obtain the forcing data, daily mean temperature and daily precipitation (2001–2008), from the German site in Liu et al. (2021). The HBV model represents typical catchment rainfall-runoff processes considering one soil water storage and two groundwater storages (Lindström et al., 1997). In this virtual experiment, we use the model version without snow processes, which contains nine parameters. As our goal is not the model itself but the calibration of model parameters, we only provide descriptions of model parameters (Table 1). For the model structure and equations, refer to Liu et al. (2021).

Names, description, ranges and virtual true values of the HBV model parameters for the virtual experiment.

Note: K0, UZL and MAXBAS are insensitive parameters in this case study and thus are fixed to the true values.

As the analytical posterior distributions of model parameters of a hydrological model are hardly achievable, we use the following procedure to
generate the pseudo-analytical posterior distribution. Firstly, we set the
catchment area of 100 km

In order to test the capability of our approach for a real system with uncertainties in forcing, observations, model structure and model parameters, we select a catchment from the CAMELS-US dataset (Newman et al., 2014, 2015) and simulate the rainfall-runoff processes with the HBV model as case study 2. We also compare the performance of our transformed KGE with the GLUE method. We use the following criteria to select this catchment: (1) catchment area is between 100 and 500 km

Study area of the catchment with the gauging station at North Fork Black Creek near Middleburg, FL, USA, for case study 2.

In order to test the capability of our new approach for a complex system, we
set case study 3 in a karst system that has conduit systems resulting in
fast recharge and discharge. It has uncertainties from the forcing data, the
model structure and observation errors. Daily discharge time series and
weekly solute (

Study area of the Rosario Spring for case study 3. This map is an updated version of the map in Hartmann et al. (2014).

Names, descriptions and ranges of the VarKarst model parameters.

For calibration of the three case studies, we have used the GLUE approach and
DREAM

For case study 1, following a standard calibration procedure we use 2001–2003 as the warm-up period and 2004–2008 as the calibration period. The posterior distribution of model parameters derived from DREAM

Posterior distributions of sensitive model parameters for the
virtual experiment. The red cross symbol denotes the true model parameter
value. KGE

For case study 2, the true model parameters are unknown. We use 25 hydrological years (1 October 1980–30 September 2005) to perform the calibration and evaluation. We choose the Daymet forcing data to drive our hydrological model as these meteorological data have potential evapotranspiration (PET) estimates and were used to calculate the catchment climatic properties (Addor et al., 2017). We use the first 5 years as the warm-up period, then the following 10 years for calibration and the last 10 years for evaluation. We test the performance of 4 approaches (GLUE, formal, formal

For case study 3, the true model parameters are unknown too. We calibrate the hydrodynamics and solute transport simultaneously. For calibrations using the formal likelihood function, we normalize each observation variable by its mean to exclude the influence of units and magnitudes of discharge and solute concentrations. We compare the performance of three approaches: (i) we use the formal likelihood function and use the normalized daily discharge and normalized weekly concentrations of three solutes as the combined observations (“formal

The model performance for calibration and evaluation is examined using KGE and its three components representing variability (

Acceptance rate

When using the original KGE (with negative KGE values set to zero) as the likelihood function, the posterior parameter range is only slightly reduced for all sensitive parameters compared to the prior uniform distribution. In addition, the density around the true values of model parameters is still very flat, indicating that the true model parameters are barely identified (Fig. 4). When applying the adapted KGE (adapting the gamma distribution and KGE to derive probability density), the posterior parameter range is reduced much more strongly and the reduced range is approximately centered at the true values, as shown in Fig. 4. Compared to the pseudo-analytical posterior distributions of all model parameters derived from the formal likelihood function using the special virtual setting, the adapted KGE approach (KGE

As expected, using the original KGE we obtain a very high acceptance rate (ca. 60 %–80 %, Fig. 5a), leading to a very fast convergence (Fig. 5b). This results in a large uncertainty bound in the discharge simulations (Fig. 5c), and the uncertainty of peak discharges is particularly large. With the adapted KGE, the acceptance rate becomes smaller and the convergence gets slower. This can be explained by the nonlinearity introduced by the adapted KGE: probability densities for large and small KGE values are more distinct than with the original KGE. Figure 5a also shows that the acceptance rate of our approach is 5 %–10 %, which is lower than the ca. 20 % of the formal likelihood function. Similarly, the convergence rate of our approach is slower than that of the formal likelihood function (Fig. 5b). This suggests that using the formal likelihood function is more efficient than the approach adapting KGE for calibrations of a system that contains only little uncertainty (only small observation errors in our case). However, when more uncertainties appear, e.g., uncertainties in forcing data and model structures, the convergence rates (efficiency) become similar between the adapted KGE and the formal likelihood function (refer to the following subsections, Fig. 6b). Compared to the width of the discharge uncertainty bound using the original KGE (Fig. 5c), calibrations using the formal likelihood function and the adapted KGE both reduce the average width of the total discharge uncertainty bound by ca. 85 %. As this virtual experiment does not assume uncertainty in the input data and the model structure, the adapted KGE shows a similar performance in the uncertainty estimation to the formal likelihood function, and both can closely reproduce observations.
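The average width of an uncertainty bound, and its relative reduction between two calibrations, could be computed along the lines of the following sketch. The percentile limits (2.5 % and 97.5 %), the function names and the toy ensembles are assumptions for illustration. The text above does not state how the bound was defined.

```python
import math

def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def mean_band_width(ensemble, lower=2.5, upper=97.5):
    """Average over time of the distance between the upper and lower percentile.

    `ensemble` is a list of simulated series, one per posterior parameter set,
    all of equal length.
    """
    n_steps = len(ensemble[0])
    widths = []
    for t in range(n_steps):
        values_at_t = [sim[t] for sim in ensemble]
        widths.append(percentile(values_at_t, upper) - percentile(values_at_t, lower))
    return sum(widths) / n_steps

# Toy ensembles (invented numbers): a wide and a narrow set of discharge simulations.
wide = [[0.0, 1.0, 2.0], [5.0, 7.0, 9.0], [10.0, 12.0, 15.0]]
narrow = [[4.5, 6.0, 8.0], [5.0, 7.0, 9.0], [5.5, 7.5, 9.5]]
reduction = 1.0 - mean_band_width(narrow) / mean_band_width(wide)
print(f"relative reduction of the mean band width: {reduction:.2f}")
```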

Acceptance rate

For calibrating a real system (with uncertainties in forcing, observations, and model structure and parameters), the acceptance rate of the adapted KGE is lower than that of the formal likelihood function, but higher than that of the log-transformation (Fig. 6a) for calibrations using both short and long observations. The convergence rate is almost identical between the formal likelihood function and the adapted KGE (higher than the log-transformation, Fig. 6b). This indicates that our approach has the same efficiency as the formal likelihood function and a higher efficiency than the log-transformation for calibrations of a system with more uncertainties. With more observations in calibration, the parameters K0 and UZL, which are unidentified when using the adapted KGE and the formal likelihood function (Fig. 6c and d), become identified (Fig. 6g and h). The identified parameter values for K0 (Fig. 6g) show a similar distribution for these two approaches, which is different from the log-transformation, while the identified parameter values for UZL (Fig. 6h) differ between the three approaches. For the identified parameter K1, the distribution with 1-year observations in calibration (Fig. 6e) is similar to that with 10-year observations (Fig. 6i) for the adapted KGE and the formal likelihood function, and differs from the log-transformation. The density is higher at the peak when using more observations (Fig. 6i). For the identified parameter MAXBAS, the distribution is similar between the three approaches when using 1-year observations for calibration (Fig. 6f), but it changes after adding more observations into calibration (Fig. 6j), where the adapted KGE approach shows a similar distribution to the log-transformation. This suggests that, due to parameter interactions, using different likelihood functions may lead to different identified model parameters for a system with various uncertainties. More information may be needed to constrain the model parameters.

General performance (KGE), variability (

In this section, we focus on analyzing the performance in the evaluation
period to show the prediction ability of the four approaches. Generally, the
uncertainty of the model performance (represented by the interquartile of
KGE,

General performance (KGE), variability (

For calibration combining discharge and solute concentrations at this heterogeneous karst system with short observation records, the adapted KGE is superior to the formal likelihood function regardless of the weight given to discharge and solutes (Fig. 8). For the general performance measured by KGE, the adapted KGE approach performs best, followed by the formal likelihood function with the same weights for discharge and each solute, and then the calibration with different weights (the number of discharge data is 10 times that of each solute). The performance regarding discharge is similar between the three approaches (the mean KGE is around 0.9) with a slightly higher performance for the adapted KGE approach. However, the adapted KGE approach improves the mean performance regarding

As the formal likelihood function with the same weight for discharge and
solutes (formal

Total uncertainty for discharge and solutes (

Using the original KGE as the likelihood function, model parameters are not easily identifiable, which results in a very large uncertainty in the simulation. This is because directly using the original KGE as the likelihood estimate assumes that probability density increases linearly with KGE. As a result, identifying parameter proposals with good KGE performance becomes difficult and inefficient: the difference between large and small KGE values is not large enough, so the probability of accepting poor proposals is high. This is why we find a very large acceptance rate and a very fast convergence rate. Mantovan and Todini (2006) and Stedinger et al. (2008) also mentioned that using an informal likelihood function, such as the Nash–Sutcliffe efficiency (NSE), as the objective in the generalized likelihood uncertainty estimation (GLUE) cannot find proper posterior distributions of model parameters. Therefore, directly using the original KGE should be avoided, and adaptations such as our approach are needed in MCMC methods to enable the exploration of the posterior distributions.
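The effect of a nonlinear mapping on the acceptance of poor proposals can be sketched as follows. This is only an illustration of the general idea of passing 1 − KGE through a gamma density: the mapping, the shape (1.0) and the scale (0.1) are hypothetical placeholders and are not the formulation or the parameter values used in this study.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma probability density; shape >= 1 is assumed so that x = 0 is well defined."""
    return x ** (shape - 1.0) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def density_linear(kge):
    """Original KGE used directly as pseudo density, negative values set to zero."""
    return max(kge, 0.0)

def density_gamma(kge, shape=1.0, scale=0.1):
    """Hypothetical nonlinear pseudo density: gamma density of the distance 1 - KGE."""
    return gamma_pdf(1.0 - kge, shape, scale)

def accept_prob(p_current, p_proposal):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, p_proposal / p_current)

good, poor = 0.9, 0.5  # current state and a clearly poorer proposal
print(accept_prob(density_linear(good), density_linear(poor)))  # about 0.56
print(accept_prob(density_gamma(good), density_gamma(poor)))    # about 0.018
```

With the linear pseudo density the poorer proposal is accepted more than half of the time, whereas the nonlinear mapping separates the two KGE values much more strongly.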

The adapted KGE provides a good estimate of the pseudo-analytical posterior distributions of model parameters derived from the formal likelihood function in case study 1. This suggests that it is capable of exploring the parameter posterior distributions. The adapted KGE has a lower acceptance rate and convergence rate compared to the formal likelihood function for the virtual experiment (case study 1). A possible reason is that one KGE value can cover multiple error combinations with the same RMSE around the true optimum, which makes the RMSE slightly more efficient at drawing proposals for parameters very close to the true optimum (known parameters in case study 1) for a system that contains only little uncertainty. However, calibrations of real systems usually contain more uncertainties, e.g., uncertainties in forcing (including the spatial averaging), observation data (measurement errors) and model structures. The adapted KGE has a similar convergence rate (efficiency) to the formal likelihood function for the real-world calibrations (case study 2). In particular, the acceptance rate of the adapted KGE is around 20 % for a system where we have good input and observations (case study 2). This is similar to the formal likelihood function and is also close to the theoretically optimal acceptance rate (0.234) in Metropolis algorithms with random walk (Yang et al., 2020).
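How the acceptance rate of a chain is measured, and how it responds to the proposal width, can be seen in a minimal random-walk Metropolis sampler. The sketch below is plain Metropolis on a one-dimensional standard normal target, chosen here purely for illustration. It is not DREAM and does not involve a hydrological model.

```python
import math
import random

def log_target(x):
    """Log density (up to a constant) of a standard normal toy target."""
    return -0.5 * x * x

def acceptance_rate(step, n_iter=5000, seed=0):
    """Run random-walk Metropolis and return the fraction of accepted proposals."""
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian random walk
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
    return accepted / n_iter

for step in (0.1, 2.5, 10.0):
    print(f"step = {step:5.1f}  acceptance rate = {acceptance_rate(step):.2f}")
```

Very small steps are almost always accepted but explore slowly, while very large steps are mostly rejected. This trade-off is why an intermediate acceptance rate is regarded as efficient.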

The uncertainty bound of discharge simulations in case study 1 is almost identical between the adapted KGE and the formal likelihood function. This indicates that our approach behaves similarly to the formal likelihood function concerning discharge uncertainty estimation. For the calibration to the real system, the adapted KGE even has a higher general performance in terms of the mean KGE of the evaluation for the total, low and high flows than the formal likelihood function and the log-transformation. McMillan and Clark (2009) had a similar finding that using another informal likelihood function, NSE, in MCMC methods outperforms the formal likelihood for calibrations with high variability and multiple optima.
Whereas the formal likelihood functions go along with the strong assumption that errors are distributed normally (Vrugt et al., 2008, 2009), the informal likelihood function KGE takes into account more variability without strict assumptions on error sources (Gupta et al., 2009). The adapted KGE performs similarly to the formal likelihood function regarding the correlation between simulations and observations, as shown in case study 2. However, they all have lower performance for the low flow simulations compared to the total and high flow simulations. While the log-transformation works well for low flow (case study 2), the adapted KGE has a good and balanced performance for both high and low flows. It also shows a lower overestimation of bias in low flows shown as the metric

While our approach has a similar performance to the formal likelihood function for discharge simulations, we find similar posterior distributions for certain parameters but also inconsistent posterior distributions for some parameters between the formal and informal approaches. This is because some model processes interplay with other processes such that one parameter compensates for another, i.e., parameter interactions. Adding additional information, e.g., solutes in case study 3, can help to further constrain model parameters (Hartmann et al., 2017) and represents the complexity of real hydrological systems. Our study shows that the adapted KGE approach is superior to the formal likelihood function for simultaneously calibrating model parameters with different types of data. This can improve on model calibration using traditional separate steps, such as firstly calibrating discharge and then solute processes in Liu et al. (2020). Many studies have shown that multi-objective calibrations allow important characteristics of a system to be adequately and properly estimated (Vrugt et al., 2003b; Yapo et al., 1998). Using KGE can provide a feasible way to combine various types of observations as a measure of multi-objective performance and avoid issues regarding data units, scales and frequency.
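The point about units and scales can be checked numerically: KGE is dimensionless, so expressing simulations and observations in different units leaves it unchanged, whereas an error-based measure such as RMSE scales with the unit. The sketch assumes the KGE formulation of Gupta et al. (2009), and the series and the conversion factor are invented for illustration.

```python
import math

def rmse(sim, obs):
    """Root mean square error; carries the unit of the data."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def kge(sim, obs):
    """KGE after Gupta et al. (2009); built from dimensionless ratios only."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r, alpha, beta = cov / (sd_s * sd_o), sd_s / sd_o, mean_s / mean_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

obs = [1.2, 3.4, 2.8, 5.1, 4.0]  # e.g. discharge in m3/s (invented values)
sim = [1.0, 3.9, 2.5, 4.6, 4.4]
factor = 1000.0                  # e.g. conversion from m3/s to L/s

obs_scaled = [factor * o for o in obs]
sim_scaled = [factor * s for s in sim]

print(kge(sim, obs), kge(sim_scaled, obs_scaled))    # identical up to rounding
print(rmse(sim, obs), rmse(sim_scaled, obs_scaled))  # differs by the factor 1000
```

This is why error series of different variables need to be normalized before they are combined in a formal likelihood, while KGE values of different variables are directly comparable.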

Even though our approach adapts the gamma distribution to compute the probability density for KGE, the way we formulate the likelihood function based on KGE is still informal. This means that the likelihood is not derived from a strict theoretical probability framework, which is a limitation of our approach. Nevertheless, our approach provides a feasible and pragmatic alternative that performs close to the formal likelihood function and avoids the pitfalls of directly using the original KGE in MCMC methods. Future work is needed to link probability density and KGE so that KGE can be incorporated in as statistically rigorous a manner as possible.

Our study demonstrates that using the original KGE in DREAM

All data used in this study have been published in Liu et al. (2021) and Hartmann et al. (2014). The publicly available dataset is described by Newman et al. (2015) and Addor et al. (2017) and can be accessed via Newman et al. (2014) (

The supplement related to this article is available online at:

YL conceptualized the study, developed and applied the adapted KGE approach, and visualized the results. YL and JFO wrote the paper, and analyzed the results. MM and AH provided supervision and advice throughout developing the adapted KGE approach and supported the development of this manuscript. All authors contributed to the revision of the manuscript.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Jasper A. Vrugt for his constructive comments and suggestions. We
thank the editor, Jiangjiang Zhang and another anonymous reviewer for their
valuable comments during the peer-review phase. Yan Liu and Andreas Hartmann
were supported by the Emmy-Noether-Programme of the German Research
Foundation (DFG, grant number: HA 8113/1-1, project “Global Assessment of
Water Stress in Karst Regions in a Changing World”). Jaime Fernández-Ortega and Matías Mudarra were supported by the European Project “Karst Aquifer Resources availability and quality in the
Mediterranean Area (KARMA)” PRIMA, ANR-18-PRIM-0005 (PCI2019-103675), and
by the project PID2019-111759RB-I00 funded by the Spanish Research Agency.
Additionally, it is a contribution to the Research Group RNM-308 of Junta de
Andalucía. Jaime Fernández-Ortega was also supported by the Erasmus

This research has been supported by the Deutsche Forschungsgemeinschaft (grant no. HA 8113/1-1), the Agencia Estatal de Investigación (grant no. PID2019-111759RB-I00), and Horizon 2020 (4PRIMA, grant no. 724060). This open-access publication was funded by the University of Freiburg.

This paper was edited by Lelys Bravo de Guenni and reviewed by Jiangjiang Zhang and one anonymous referee.