Pitfalls and a feasible solution for using KGE as an informal likelihood function in MCMC methods: DREAM(ZS) as an example
 ^{1}Chair of Hydrological Modeling and Water Resources, University of Freiburg, 79098 Freiburg, Germany
 ^{2}Department of Geology and Centre of Hydrogeology, University of Málaga (CEHIUMA), 29071 Málaga, Spain
 ^{3}Department of Civil Engineering, University of Bristol, Bristol, UK
Abstract. The Kling-Gupta Efficiency (KGE) is a widely used performance measure because it considers bias, correlation and variability orthogonally. However, most Markov chain Monte Carlo (MCMC) algorithms use error-based formal likelihood functions. Because KGE is statistically informal, using the original KGE in MCMC methods causes two problems: negative KGE values break the posterior density ratios, and high proposal acceptance rates make parameters less identifiable. In this study, we propose adapting the original KGE using a gamma distribution to solve these problems, so that KGE can be applied as an informal likelihood function in the DiffeRential Evolution Adaptive Metropolis algorithm DREAM_{(ZS)}, an advanced MCMC algorithm. We compare our results with those obtained using a formal likelihood function to assess whether our approach robustly and plausibly explores the posterior distributions of model parameters and reproduces discharge behavior. To this end, we set up three case studies involving different sources of uncertainty. Our results show that model parameters cannot be identified and the uncertainty of discharge simulations is large when the original KGE is used directly. Our approach yields posterior distributions of model parameters similar to those obtained with the formal likelihood function. Although the acceptance rate of the adapted KGE is lower than that of the formal likelihood function for some systems, the convergence rate (efficiency) of the two approaches is similar when calibrating real hydrological systems, and both show generally acceptable performance. We also show that both the adapted KGE and the formal likelihood function perform poorly for low flows, with larger overestimation resulting from the formal likelihood function. Furthermore, the adapted KGE approach behaves similarly to the formal likelihood function in terms of the correlation between simulations and observations.
Thus, our study provides a feasible way to use KGE as an informal likelihood in MCMC algorithms and opens possibilities to combine multiple data types for better and more realistic model calibration.
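To make the density-ratio problem concrete, the following minimal Python sketch computes KGE from its three components (correlation r, variability ratio alpha, bias ratio beta) and then applies the Metropolis ratio that would result if the raw KGE were treated as a density. The function names and the use of population statistics are assumptions for illustration. With a negative current KGE, a better proposal receives a negative "acceptance probability", while a worse proposal is always accepted.

```python
import math


def kge(sim, obs):
    """Kling-Gupta Efficiency, KGE = 1 - ED, where ED is the Euclidean
    distance of (r, alpha, beta) from the ideal point (1, 1, 1).

    Uses population statistics; assumes non-constant series and a
    non-zero observed mean.
    """
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    ed = math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return 1.0 - ed


def naive_acceptance(kge_current, kge_proposal):
    """Metropolis acceptance if raw KGE were used as a density (flat prior)."""
    return min(1.0, kge_proposal / kge_current)


obs = [1.0, 2.0, 3.0, 4.0]
print(kge(obs, obs))                    # close to 1 (perfect fit)
print(kge([2 * o for o in obs], obs))   # 1 - sqrt(2), about -0.41
# A better proposal (KGE 0.3) from a negative state gets a negative "probability"
print(naive_acceptance(-0.5, 0.3))
# A worse proposal (KGE -1.0) from a negative state is always accepted
print(naive_acceptance(-0.5, -1.0))
```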
Yan Liu et al.
Status: final response (author comments only)

RC1: 'Comment on hess-2021-514', Anonymous Referee #1, 22 Nov 2021
The authors proposed an informal likelihood function based on KGE (with modifications), and demonstrated its performance against a formal likelihood function based on RMSE in DREAM_ZS with three cases. There are several key questions that were not clearly answered.
- Why should one use the KGE-based informal likelihood function? Why a Gamma distribution? It does not seem to be advantageous over the formal likelihood function in the three case studies. It would be essential to design a case where the formal likelihood function fails while the KGE-based one still works. Simply introducing a new metric (without solving challenging problems) has no significance.
- No theoretical analysis has been provided. At least one case where the analytical form of the posterior is available should be considered to verify whether the new likelihood obtains the right answer.
- The numbers of unknown parameters are generally small. A case with more than 20 unknown parameters (>100 would be better) is suggested to demonstrate its performance in more challenging settings.
- Comparison with other informal likelihood functions (NSE, GLUE, etc.) is lacking.
Minor comments
- Lines 47-48: it is unclear what N refers to.
- Lines 57-60: The proposal should not affect the shape of the posterior if the chain is sufficiently long.
- Line 82: if the types of observations differ and have different magnitudes, how is the ED metric calculated?
- There is no need to include results of KGE_ori, as they are obviously wrong.
- Figure 6 (h-g): the curves of KGE_ori and formal are quite different; why? A synthetic case with similar settings is needed to check which one failed to capture the truth.
- Line 364: capable to → capable of
- What is the equation of the likelihood function based on RMSE? There are also many forms of formal likelihood function (e.g., Table B1 in J. A. Vrugt, Environmental Modelling & Software 75 (2016) 273-316).

AC1: 'Reply on RC1', Yan Liu, 21 Jan 2022
We thank the reviewer for the comments that help us improve our manuscript. In the following text, the reviewer comments are shown in regular font, and our point-by-point replies are shown in italic. Upon revision, we will make the following major changes to the manuscript and Supplementary Material:
- Introduce a case study in which the adapted KGE approach works while the formal likelihood fails, to highlight the advantages of and the need for the adapted KGE;
- Provide an analysis of one of the case studies for which the analytical posterior is known, and compare the results derived using the adapted KGE with it;
- Compare the performance of the adapted KGE with the GLUE framework and with another formal likelihood function.
The authors proposed an informal likelihood function based on KGE (with modifications), and demonstrated its performance against a formal likelihood function based on RMSE in DREAM_ZS with three cases. There are several key questions that were not clearly answered.
1. Why should one use the KGE-based informal likelihood function? Why a Gamma distribution? It does not seem to be advantageous over the formal likelihood function in the three case studies. It would be essential to design a case where the formal likelihood function fails while the KGE-based one still works. Simply introducing a new metric (without solving challenging problems) has no significance.
Response: Our motivation for proposing the adapted KGE is that KGE is widely used as a performance measure in hydrological studies and as an objective function for calibration. However, we have seen flaws when using the original KGE in MCMC-type calibrations. The Gamma distribution is an easily applicable distribution function and solves the two problems of using the original KGE: it (i) ensures that the probability density increases monotonically with KGE, even for negative KGE values, and (ii) achieves an appropriate nonlinearity in how the likelihood increases with KGE. Together, these properties lead to an efficient and proper chain evolution. Moreover, among the functions we tried, the Gamma distribution performed best because it does not introduce additional parameters to calibrate and maintains performance comparable to that of the formal likelihood function. In case studies 2 and 3, the adapted KGE even yields a higher overall performance (the mean KGE in the evaluation period) and a smaller overestimation of low flows than the formal likelihood function. We will discuss the use of the Gamma distribution function further in the revised manuscript.
It is a very good idea to include a case study in which our adapted KGE works while the formal likelihood fails. This will also highlight the need for a KGE-based informal likelihood function. In the revision, we will include such a case study and discuss in more detail why the KGE-based informal likelihood function should be used.
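The two properties named in the response can be illustrated with a minimal sketch. It assumes that the adapted likelihood is the Gamma probability density evaluated at the Euclidean distance ED = 1 - KGE; the exact shape and scale used in the manuscript are not given here, so the default `shape` and `scale` values and the function name `gamma_log_likelihood` are hypothetical. Because ED is never negative, negative KGE values no longer break density ratios, and for a shape parameter in (0, 1] the density decreases monotonically in ED.

```python
import math


def gamma_log_likelihood(kge_value, shape=1.0, scale=0.05):
    """Hypothetical adapted-KGE log-likelihood: log Gamma pdf at ED = 1 - KGE.

    shape and scale are illustrative assumptions, not values from the
    manuscript. For 0 < shape <= 1 the Gamma pdf decreases monotonically on
    (0, inf), so a higher KGE (smaller ED) always gives a higher density,
    also for negative KGE values (ED > 1).
    """
    if not 0.0 < shape <= 1.0:
        raise ValueError("shape must lie in (0, 1] for a monotone density")
    ed = max(0.0, 1.0 - kge_value)  # KGE <= 1, so ED >= 0
    if ed == 0.0 and shape < 1.0:
        return math.inf  # the pdf diverges at zero when shape < 1
    # log f(x) = (k - 1) log x - x / theta - log Gamma(k) - k log theta
    log_density = -ed / scale - math.lgamma(shape) - shape * math.log(scale)
    if shape != 1.0:
        log_density += (shape - 1.0) * math.log(ed)
    return log_density


# Log-densities increase monotonically with KGE, including negative values
for k in (-2.0, -0.5, 0.0, 0.6, 0.9):
    print(k, round(gamma_log_likelihood(k), 2))
```

Under these assumptions, a smaller `scale` sharpens the contrast between good and very good fits, which is one way to obtain the nonlinearity described in point (ii).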
2. No theoretical analysis has been provided. At least one case where the analytical form of the posterior is available should be considered to verify whether the new likelihood obtains the right answer.
Response: In case study 1, the true model parameters are known by design, and we compared the performance of the formal likelihood and our adapted KGE approach there. In the revision, we will include an analysis of one of our case studies for which the analytical posterior is known and compare our results with it.
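A minimal sketch of such a check, on a toy problem that is an assumption and not one of the manuscript's case studies: for Gaussian data with known sigma and a flat prior on the mean mu, the posterior is analytically N(mean(y), sigma^2 / n). A random-walk Metropolis sampler, used here only as a simple stand-in for DREAM_{(ZS)}, should reproduce its mean and standard deviation; the same comparison could be repeated with any candidate likelihood.

```python
import math
import random


def log_posterior(mu, data, sigma):
    """Flat prior on mu, iid Gaussian likelihood with known sigma
    (log density up to an additive constant)."""
    return -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis(data, sigma, n_iter=20000, step=0.3, seed=1):
    """Random-walk Metropolis sampler for the mean mu."""
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, data, sigma)
    samples = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, exp(lp_prop - lp)); 1 - random() is in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        samples.append(mu)
    return samples


rng = random.Random(42)
sigma = 1.0
data = [rng.gauss(2.0, sigma) for _ in range(50)]

samples = metropolis(data, sigma)[5000:]  # discard burn-in
post_mean = sum(samples) / len(samples)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in samples) / len(samples))

# Analytical posterior: N(mean(data), sigma^2 / n)
exact_mean = sum(data) / len(data)
exact_sd = sigma / math.sqrt(len(data))
print(f"MCMC mean {post_mean:.3f} vs exact {exact_mean:.3f}")
print(f"MCMC sd   {post_sd:.3f} vs exact {exact_sd:.3f}")
```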
3. The numbers of unknown parameters are generally small. A case with more than 20 unknown parameters (>100 would be better) is suggested to demonstrate its performance in more challenging settings.
Response: Our approach was developed for lumped or semi-distributed hydrological models, which mostly have fewer than 20 model parameters and to which DREAM_{(ZS)} is usually applied (Liu et al., 2021; Shafii et al., 2014; Vrugt et al., 2008, 2009). Other newly proposed measures are also usually tested with simple analytical models or with models of similar complexity to ours (Knoben et al., 2019; Schwemmle et al., 2020).
4. Comparison with other informal likelihood functions (NSE, GLUE, etc.) is lacking.
Response: In the revision, we will compare our approach with GLUE, using NSE as the objective, in one of our case studies. Additionally, we will compare our approach with another formal likelihood function, such as one based on the log transformation of model errors, as suggested in major comment 1 by Reviewer #2.
Minor comments
1. Lines 47-48: it is unclear what N refers to.
Response: N is the symbol used for a parameter. We will make this clearer in the revision.
2. Lines 57-60: The proposal should not affect the shape of the posterior if the chain is sufficiently long.
Response: We agree that if the chain is long enough, the ‘true’ shape of the posterior can be explored. In practice, however, efficiency must be considered because of the computational cost, so only a limited number of realizations can be performed. With the original KGE, the standard MCMC algorithm differentiates only weakly between very good (e.g., KGE = 0.8) and good (e.g., KGE = 0.6) simulations. This leads to very fast convergence (as indicated by the convergence diagnostic), so that the limited realizations of the converged chains yield a very flat posterior distribution; i.e., the exploration of the posterior shape is strongly affected.
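The weak contrast between KGE = 0.8 and KGE = 0.6 can be quantified through the Metropolis acceptance probability of a move from the better to the worse parameter set. This sketch assumes a flat prior and, for the transformed case, a hypothetical likelihood proportional to exp(-ED / scale) with ED = 1 - KGE (a Gamma density with shape 1) and an assumed scale of 0.05; neither value is taken from the manuscript.

```python
import math


def acceptance_raw_kge(kge_current, kge_proposal):
    """KGE used directly as an (improper) density: ratio of the KGE values."""
    return min(1.0, kge_proposal / kge_current)


def acceptance_transformed(kge_current, kge_proposal, scale=0.05):
    """Hypothetical transform L ~ exp(-ED / scale) with ED = 1 - KGE,
    i.e. a Gamma density with shape 1; scale = 0.05 is an assumed value."""
    delta_ed = (1.0 - kge_proposal) - (1.0 - kge_current)
    return math.exp(min(0.0, -delta_ed / scale))


# Probability of moving from a very good (0.8) to a good (0.6) parameter set
print(acceptance_raw_kge(0.8, 0.6))      # about 0.75
print(acceptance_transformed(0.8, 0.6))  # exp(-4), about 0.018
```

With the raw KGE, roughly three out of four such downhill moves are accepted, which is consistent with the fast but uninformative convergence described above; under the assumed transform, most of them are rejected.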
3. Line 82: if the types of observations differ and have different magnitudes, how is the ED metric calculated?
Response: Since the adapted KGE is informal, we can combine multiple KGEs, each computed for one type of observation (e.g., as a weighted sum). The ED metric is then 1 minus the combined KGE. The weights of the combination reflect the importance that the user assigns to each type of information, similar to a multi-objective approach.
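A minimal sketch of the weighted-sum combination described in this response; the function name, the example weights and the normalisation of the weights to sum to one are illustrative assumptions.

```python
def combined_ed(kge_values, weights):
    """ED = 1 - weighted mean of per-data-type KGEs.

    The weights express user-defined importance and are normalised to sum
    to one here (an assumption made for this sketch).
    """
    if not kge_values or len(kge_values) != len(weights):
        raise ValueError("need one weight per KGE value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined_kge = sum(w * k for w, k in zip(weights, kge_values)) / total
    return 1.0 - combined_kge


# Hypothetical example: discharge KGE 0.8 (weight 0.7), a second data type 0.5 (weight 0.3)
print(combined_ed([0.8, 0.5], [0.7, 0.3]))  # about 0.29
```

Because each KGE is dimensionless, observations with different units and magnitudes can be combined in this way without rescaling the raw data.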
4. There is no need to include results of KGE_ori, as they are obviously wrong.
Response: We wanted to show the problems that arise when using KGE_ori. In the revision, we will minimize the use of KGE_ori results and move the comparison to the Supplement.
5. Figure 6 (h-g): the curves of KGE_ori and formal are quite different; why? A synthetic case with similar settings is needed to check which one failed to capture the truth.
Response: The curves of KGE_ori and formal are quite different because KGE_ori cannot properly explore the posterior. The differences between KGE_formal and KGE_gamma are most probably due to interactions between model parameters. In the revision, we will check the autocorrelations of the model parameters and analyze other factors to explain these differences.
6. Line 364: capable to → capable of
Response: We will change it.
7. What is the equation of the likelihood function based on RMSE? There are also many forms of formal likelihood function (e.g., Table B1 in J. A. Vrugt, Environmental Modelling & Software 75 (2016) 273-316).
Response: We will include the equation in the revision. It is the first entry, “lik = 11”, in Table B1.
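A sketch of how a likelihood of this kind relates to the RMSE. It assumes, consistent with the reply to Referee #2, independent Gaussian residuals $e_t = \tilde{y}_t - y_t(\theta)$ with zero mean and constant variance $\sigma^2$, a flat prior on $\theta$, and our reading that “lik = 11” integrates $\sigma$ out under the prior $p(\sigma) \propto 1/\sigma$; the exact expression should be checked against Table B1 of Vrugt (2016).

```latex
\begin{aligned}
\ell(\theta, \sigma \mid \tilde{Y}) &= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^{2}}\sum_{t=1}^{n} e_t^{2},\\
p(\theta \mid \tilde{Y}) &\propto \int_{0}^{\infty} \sigma^{-(n+1)} \exp\!\left(-\frac{1}{2\sigma^{2}}\sum_{t=1}^{n} e_t^{2}\right) d\sigma \;\propto\; \left(\sum_{t=1}^{n} e_t^{2}\right)^{-n/2},\\
\ell(\theta \mid \tilde{Y}) &= -\frac{n}{2}\log\sum_{t=1}^{n} e_t^{2} + c = -n\log \mathrm{RMSE}(\theta) + c',
\qquad \mathrm{RMSE}(\theta) = \sqrt{\tfrac{1}{n}\textstyle\sum_{t=1}^{n} e_t^{2}}.
\end{aligned}
```

Under these assumptions, ranking parameter sets by this log-likelihood is equivalent to ranking them by RMSE, while the difference between two parameter sets still grows with the number of observations $n$.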
References:
Knoben, W. J. M., Freer, J. E. and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23(10), 4323–4331, doi:10.5194/hess-23-4323-2019, 2019.
Liu, Y., Wagener, T. and Hartmann, A.: Assessing Streamflow Sensitivity to Precipitation Variability in Karst‐Influenced Catchments With Unclosed Water Balances, Water Resour. Res., 57(1), doi:10.1029/2020WR028598, 2021.
Schwemmle, R., Demand, D. and Weiler, M.: Technical note: Diagnostic efficiency – specific evaluation of model performance, Hydrol. Earth Syst. Sci. Discuss., (2008), 1–15, doi:10.5194/hess-2020-237, 2020.
Shafii, M., Tolson, B. and Matott, L. S.: Uncertainty-based multi-criteria calibration of rainfall-runoff models: A comparative study, Stoch. Environ. Res. Risk Assess., 28(6), 1493–1510, doi:10.1007/s00477-014-0855-x, 2014.
Vrugt, J. A., ter Braak, C. J. F., Clark, M. P., Hyman, J. M. and Robinson, B. A.: Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation, Water Resour. Res., 44(12), 1–15, doi:10.1029/2007wr006720, 2008.
Vrugt, J. A., ter Braak, C. J. F., Gupta, H. V. and Robinson, B. A.: Equifinality of formal (DREAM) and informal (GLUE) Bayesian approaches in hydrologic modeling?, Stoch. Environ. Res. Risk Assess., 23(7), 1011–1026, doi:10.1007/s00477-008-0274-y, 2009.

RC2: 'Comment on hess-2021-514', Anonymous Referee #2, 17 Dec 2021
General comments:
This study suggests an approach to adapt the KGE through a transformation with a Gamma distribution so that it can be better used as an informal likelihood function in calibration procedures. The study finds that the results and inference behavior when using this adapted KGE measure are very similar to those obtained when using the RMSE as a likelihood function. In a synthetic case study, it is also shown that the presented approach successfully re-infers the known true parameter values.
The manuscript presents an elegant and innovative approach to a solution for a very relevant problem and could therefore be of high value in many fields. The manuscript is very well written, carefully composed and logically structured, and all in all very convincing.
However, I feel it is a bit too brief in some respects and would need to be extended by some theoretical considerations, among others (see comments below).
Major comments:
It is not clear to me what the “formal likelihood function” is in this case. The authors say that it is the RMSE, but it would be useful to show an equation that explicitly states which assumptions w.r.t. distribution type (I assume the normal distribution) and standard deviation this corresponds to. For example, something along the lines of: using the RMSE is equivalent to assuming independently normally distributed errors at each time step in a formal Bayesian inference approach and assuming that the standard deviation is equal to a certain (which?) value at each time step, ideally including the full equation. As mentioned by the authors, the RMSE is very sensitive to large flows and, in my opinion, would not be a typical measure used in formal likelihood approaches. There are assumptions that usually work better, such as a standard deviation that is proportional to the predicted streamflow. For a comprehensive overview of the different assumptions on the standard deviation of the residual error (and associated transformations) in formal likelihood approaches, see for example McInerney et al. (2017). In my view, it would make more sense to use one of their suggested approaches as the “formal” approach in this study.
On a related note, the standard deviation of the additive error in formal likelihood approaches is an important parameter that needs to be used in prediction as well. The authors infer the posterior distributions of the model parameters and then use these posteriors for prediction. This is fine if only parametric uncertainty is relevant, but in doing so, they completely neglect all other sources of uncertainty. The residual uncertainty (i.e., additive error) is very important since it represents the lumped effect of input uncertainties, model structural uncertainties and observational uncertainties (present here at least in case study 3, as mentioned by the authors). Neglecting all these uncertainties is also the reason for the very narrow distribution of the performance metrics in prediction (Figs. 7 and 9). If actual streamflow predictions including error bands were shown, we would probably see that the observations are not covered at all by the error bands, which is a serious shortcoming if we are interested in reliable predictions.
Technical comments:
Line 44: It is not clear to me what you mean by “they can mimic the weight to small improvements in NSE”.
Line 55: did you mean “unsatisfactory”?
Lines 55-57: I find this sentence incomprehensible.
Line 60: “rates” instead of “rate”
Line 65: replace “theoretically statistical” with “statistically sound”, also in other instances if needed
References
McInerney, D. et al. (2017) ‘Improving probabilistic prediction of daily streamflow by identifying Pareto optimal approaches for modeling heteroscedastic residual errors’, Water Resources Research. American Geophysical Union (AGU), 53(3), pp. 2199–2239. doi: 10.1002/2016wr019168.

AC2: 'Reply on RC2', Yan Liu, 21 Jan 2022
We thank the reviewer for the helpful comments to improve our manuscript. In the following text, the reviewer comments are shown in regular font, and our point-by-point replies are shown in italic. Upon revision, we will make the following major changes to the manuscript and Supplementary Material:
- We will provide the equation showing how the formal likelihood is computed. As suggested, we will compare our adapted KGE with another formal likelihood function, such as one based on log-transformed model errors. In addition, we will compare our approach with the GLUE framework.
- We will compute the total uncertainty and include it in the streamflow predictions. We will discuss the potential influence of parameter uncertainty and total uncertainty on streamflow predictions.
General comments:
This study suggests an approach to adapt the KGE through a transformation with a Gamma distribution so that it can be better used as an informal likelihood function in calibration procedures. The study finds that the results and inference behavior when using this adapted KGE measure are very similar to those obtained when using the RMSE as a likelihood function. In a synthetic case study, it is also shown that the presented approach successfully re-infers the known true parameter values.
The manuscript presents an elegant and innovative approach to a solution for a very relevant problem and could therefore be of high value in many fields. The manuscript is very well written, carefully composed and logically structured, and all in all very convincing.
We thank the reviewer for the positive evaluation of our work.
However, I feel it is a bit too brief in some respects and would need to be extended by some theoretical considerations, among others (see comments below).
Major comments:
It is not clear to me what the “formal likelihood function” is in this case. The authors say that it is the RMSE, but it would be useful to show an equation that explicitly states which assumptions w.r.t. distribution type (I assume the normal distribution) and standard deviation this corresponds to. For example, something along the lines of: using the RMSE is equivalent to assuming independently normally distributed errors at each time step in a formal Bayesian inference approach and assuming that the standard deviation is equal to a certain (which?) value at each time step, ideally including the full equation. As mentioned by the authors, the RMSE is very sensitive to large flows and, in my opinion, would not be a typical measure used in formal likelihood approaches. There are assumptions that usually work better, such as a standard deviation that is proportional to the predicted streamflow. For a comprehensive overview of the different assumptions on the standard deviation of the residual error (and associated transformations) in formal likelihood approaches, see for example McInerney et al. (2017). In my view, it would make more sense to use one of their suggested approaches as the “formal” approach in this study.
Response: We will provide the equation of the formal likelihood function used in DREAM_{(ZS)}. Thank you for the very useful reference discussing different error models. As discussed by McInerney et al. (2017), no single error model fits all catchments and simultaneously optimizes all performance metrics (e.g., for both low and high flows). The standard error model, which assumes Gaussian errors with zero mean and constant variance, is widely used; we therefore compared our adapted KGE to it. To show the robustness of the adapted KGE, however, we will add a comparison, for one of our case studies, with another formal likelihood function, such as the log transformation of model errors, which is also discussed and recommended for perennial catchments in McInerney et al. (2017). We will also compare our approach with the GLUE framework, as suggested in major comment 4 of Reviewer #1.
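As an illustration of why the choice of error model matters, the sketch below scores two synthetic simulations, one that underestimates the peak flow by 10 % and one that doubles the low flows, under two formal likelihoods. Both assume iid Gaussian residuals with sigma integrated out; the second applies this to log(Q + offset). The offset of 0.01, the function names and the data are hypothetical, and the sketch is not the exact formulation of McInerney et al. (2017).

```python
import math


def sls_log_likelihood(sim, obs):
    """iid Gaussian errors with constant variance on the raw flows,
    sigma integrated out (log-likelihood up to an additive constant)."""
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    return math.inf if sse == 0 else -0.5 * len(obs) * math.log(sse)


def log_transformed_log_likelihood(sim, obs, offset=0.01):
    """Same error model applied to log(Q + offset).

    offset is a hypothetical constant that keeps zero flows finite. The
    Jacobian of the transformation depends only on the observations, so it
    is a constant during calibration and is omitted here.
    """
    sse = sum((math.log(o + offset) - math.log(s + offset)) ** 2
              for s, o in zip(sim, obs))
    return math.inf if sse == 0 else -0.5 * len(obs) * math.log(sse)


obs = [0.1, 0.2, 5.0, 10.0, 0.3]
sim_peak_error = [0.1, 0.2, 5.0, 9.0, 0.3]      # peak 10 % too low
sim_lowflow_error = [0.2, 0.4, 5.0, 10.0, 0.6]  # low flows doubled

for name, sim in [("peak error", sim_peak_error), ("low-flow error", sim_lowflow_error)]:
    print(name, round(sls_log_likelihood(sim, obs), 2),
          round(log_transformed_log_likelihood(sim, obs), 2))
```

The constant-variance model prefers the simulation with poor low flows, whereas the log-transformed model prefers the one with the peak error, which mirrors the sensitivity of the RMSE-based likelihood to high flows noted by the referee.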
On a related note, the standard deviation of the additive error in formal likelihood approaches is an important parameter that needs to be used in prediction as well. The authors infer the posterior distributions of the model parameters and then use these posteriors for prediction. This is fine if only parametric uncertainty is relevant, but in doing so, they completely neglect all other sources of uncertainty. The residual uncertainty (i.e., additive error) is very important since it represents the lumped effect of input uncertainties, model structural uncertainties and observational uncertainties (present here at least in case study 3, as mentioned by the authors). Neglecting all these uncertainties is also the reason for the very narrow distribution of the performance metrics in prediction (Figs. 7 and 9). If actual streamflow predictions including error bands were shown, we would probably see that the observations are not covered at all by the error bands, which is a serious shortcoming if we are interested in reliable predictions.
Response: Thank you for the suggestion. In the current manuscript, we only show the uncertainty caused by the parameters. In the revision, we will add the remaining (residual) uncertainty to the predictions and discuss the influence of parameter uncertainty and total uncertainty on streamflow predictions.
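One common way to add the residual error to a parameter-only ensemble is sketched below, assuming an additive Gaussian error model with standard deviation sigma. The sigma values, the ensemble and the function name are hypothetical; in a formal approach sigma would be inferred together with the model parameters. Each posterior simulation is perturbed with random residuals before the 95 % bounds are computed, and setting sigma = 0 recovers parameter-only bounds.

```python
import random
import statistics


def predictive_bounds(posterior_sims, sigma, n_noise=100, seed=0):
    """2.5 % and 97.5 % predictive bounds at each time step.

    posterior_sims: simulated series, one per posterior parameter set
    (parameter uncertainty only). sigma: standard deviation of an assumed
    additive Gaussian residual error (hypothetical value). With sigma = 0
    the bounds reflect parameter uncertainty only.
    """
    rng = random.Random(seed)
    lower, upper = [], []
    for t in range(len(posterior_sims[0])):
        draws = [sim[t] + rng.gauss(0.0, sigma)
                 for sim in posterior_sims for _ in range(n_noise)]
        cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5 %
        lower.append(cuts[0])
        upper.append(cuts[-1])
    return lower, upper


# Hypothetical ensemble of four posterior simulations over three time steps
sims = [[1.0, 2.0, 1.5], [1.2, 2.1, 1.4], [0.9, 1.9, 1.6], [1.1, 2.05, 1.5]]
print(predictive_bounds(sims, sigma=0.0))  # parameter uncertainty only
print(predictive_bounds(sims, sigma=0.3))  # parameter + residual uncertainty
```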
Technical comments:
Line 44: It is not clear to me what you mean by “they can mimic the weight to small improvements in NSE”.
Response: It means that small improvements in NSE can also be identified and drive the chain evolution. We will make this clearer in the revision.
Line 55: did you mean “unsatisfactory”?
Response: Yes, we will change it.
Lines 55-57: I find this sentence incomprehensible.
Response: Here we mean that the number of measurements is not taken into account. Therefore, as the number of measurements increases, little information is added to the performance measure, which hinders the improvement of the chain evolution. In the revision, we will rephrase this sentence to make it more comprehensible.
Line 60: “rates” instead of “rate”
Response: We will change it.
Line 65: replace “theoretically statistical” with “statistically sound”, also in other instances if needed
Response: We will update it.
References
McInerney, D. et al. (2017) ‘Improving probabilistic prediction of daily streamflow by identifying Pareto optimal approaches for modeling heteroscedastic residual errors’, Water Resources Research. American Geophysical Union (AGU), 53(3), pp. 2199–2239. doi: 10.1002/2016wr019168.
