An improved Approximate Bayesian Computation approach for high-dimensional posterior exploration of hydrological models

Liu, Song; She, Dunxian; Zhang, Liping; Xia, Jun

doi:https://doi.org/10.5194/hess-2022-414

31 Jan 2023

Status: this preprint was under review for the journal HESS. A final paper is not foreseen.

Abstract. Approximate Bayesian Computation (ABC) methods provide a powerful tool for sampling from Bayesian posteriors in cases where we can simulate samples but have no access to an explicit expression of the likelihood function. The Simulated Annealing ABC (SABC) algorithm has been proposed to achieve fast convergence to an unbiased approximation of the posterior by adaptively decreasing an initially coarse tolerance value. However, this algorithm uses a rather simplistic random walk Metropolis (RWM) sampler to generate trial moves in a Markov chain and always requires an excessive number of model evaluations to approximate the posterior, which inevitably lowers the sampling efficiency and limits its application in more complex hydrological modelling practice. Inspired by advances in Markov Chain Monte Carlo (MCMC) methods, we incorporated an adaptive Differential Evolution scheme to enhance the efficiency of SABC sampling. This scheme has its roots in Differential Evolution Markov Chains (DE-MC) and additionally uses a self-adaptive randomized subspace sampling strategy to optimally select the parameter dimensions to be updated each time a proposal is generated. The superiority of the modified SABC (mSABC) over the original SABC algorithm was demonstrated through a SAC-SMA application to the Danjiangkou Reservoir region (DRR). The case study results showed that mSABC was far more efficient, with lower computational costs and higher acceptance rates, and achieved higher numerical accuracy than the original SABC algorithm. mSABC also resulted in a better overall prediction of streamflow time series and signatures. Introducing a more advanced MCMC sampler into SABC helps to speed up convergence to the approximate posterior while achieving better model performance, which significantly widens the applicability of SABC to complex posterior exploration problems.
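The abstract describes proposals built from differences between chains (DE-MC) combined with randomized subspace sampling. The following is a minimal Python sketch of one such proposal, assuming a DREAM-style update with jump rate γ = 2.38/√(2δd′) over the d′ selected dimensions, small perturbation terms and a fixed crossover probability `cr`. The function name and default values are illustrative assumptions; the self-adaptation of `cr`, the occasional γ = 1 mode-jumping step and the outlier handling of DREAM are omitted, and this is not the authors' mSABC code.

```python
import numpy as np


def de_subspace_proposal(population, i, cr=0.5, delta=1, c=0.1, c_star=1e-12, rng=None):
    """Hypothetical DREAM-style jump for chain i (sketch, not the mSABC code).

    population : (N, d) array holding the current state of every chain
    i          : index of the chain that receives the proposal
    cr         : crossover probability; each dimension is updated with prob. cr
    delta      : number of chain pairs whose differences form the jump
    """
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population, dtype=float)
    n_chains, d = population.shape
    if n_chains < 2 * delta + 1:
        raise ValueError("need at least 2*delta + 1 chains")

    # Randomized subspace sampling: choose which dimensions to update.
    mask = rng.random(d) < cr
    if not mask.any():
        mask[rng.integers(d)] = True  # always update at least one dimension
    d_star = int(mask.sum())

    # Draw 2*delta distinct partner chains, all different from chain i.
    others = np.delete(np.arange(n_chains), i)
    partners = rng.choice(others, size=2 * delta, replace=False)
    r1, r2 = partners[:delta], partners[delta:]
    diff = (population[r1] - population[r2]).sum(axis=0)

    # Jump rate scaled to the size of the updated subspace.
    gamma = 2.38 / np.sqrt(2 * delta * d_star)
    lam = rng.uniform(-c, c, size=d_star)         # small multiplicative noise
    zeta = rng.normal(0.0, c_star, size=d_star)   # tiny additive noise

    proposal = population[i].copy()
    proposal[mask] += (1.0 + lam) * gamma * diff[mask] + zeta
    return proposal, mask
```

Dimensions outside the mask keep their current values, which is what makes the update a subspace move. Embedding this proposal in SABC would still require SABC's own distance-based acceptance step, which is not shown.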

This preprint has been withdrawn.

Received: 12 Dec 2022 – Discussion started: 31 Jan 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Interactive discussion

Status: closed

RC1:
'Comment on hess-2022-414', Anonymous Referee #1, 03 Apr 2023

The authors are suggesting an improvement of SABC by replacing the RWM update by a differential evolution-type of update.
As they point out, such particle-based update steps tend to be superior for high-dimensional or complicated-shaped posteriors. However, ABC is not suitable for high-dimensional posteriors. Due to the curse of dimensionality, we must restrict ABC to few summary statistics, and hence we can only hope to infer few parameters as well. Furthermore, in the case study presented, I see no indication of a complicated shape. The marginals that are shown are certainly not complicated, and two-dimensional scatterplots are not provided. This makes me wonder why mSABC is then so much better than SABC. The original SABC algorithm suggests using the empirical covariance of the population scaled by a tuning parameter beta<1. Unfortunately, I have not found the authors' choice of the beta parameter, nor of the other tuning parameters: the annealing speed and the rejuvenation step. A proper choice of these parameters is crucial for SABC to work properly, and I'm wondering whether the bad performance of SABC is simply due to bad tuning.
In summary, while I'd be happy to see improvements of the SABC update step, I have not yet been presented with sufficient evidence to believe that the suggested improvement is of practical relevance.
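The referee refers to the original SABC proposal: a random walk whose covariance is the empirical covariance of the particle population, scaled by a tuning parameter β < 1. A minimal sketch of that proposal alone follows, assuming β multiplies the covariance (rather than its square root) and using an illustrative default of 0.8; the annealing schedule and the acceptance step of SABC are not shown.

```python
import numpy as np


def population_rwm_proposal(population, i, beta=0.8, rng=None):
    """Random-walk proposal for particle i with covariance beta * Cov(population).

    population : (N, d) array of current particles
    beta       : scaling of the empirical covariance (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population, dtype=float)
    # Empirical covariance of the parameters across all particles.
    cov = beta * np.atleast_2d(np.cov(population, rowvar=False))
    # Gaussian step centred on the current particle.
    return rng.multivariate_normal(population[i], cov)
```

Because one covariance matrix is shared by all particles, β directly sets the step size of every move, which is why the referee asks for its value to be reported.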
A few more minor comments:
L 98: Sisson et al 2007 present a wrong algorithm, which should no longer be cited.
L 99: I believe such algorithms are referred to as SMC-ABC not ABC-PMC.
Eq (2) and below: Please use standard notation: boldface “a” without index for a vector, and non-boldface “a_i” with index for its components. And I believe it should say “spanning the parameter subspace”, not “stretching”.
Result section: The true posterior, using all the data, is available here and easy to sample from. So why not compare against it?
Fig. 3 (a): This is an unfair comparison, as one mSABC iteration is more costly than one SABC iteration if I understand correctly.
L 458ff: There is a growing body of literature about how to find near-sufficient statistics by means of machine learning (see e.g. Albert et al, SciPost Physics Core 5, 043 (2022) and references therein).

Citation: https://doi.org/10.5194/hess-2022-414-RC1
- AC1: 'Reply on RC1', Song Liu, 06 Apr 2023
  
  The authors are suggesting an improvement of SABC by replacing the RWM update by a differential evolution-type of update.
  As they point out, such particle-based update steps tend to be superior for high-dimensional or complicated-shaped posteriors. However, ABC is not suitable for high-dimensional posteriors. Due to the curse of dimensionality, we must restrict ABC to few summary statistics, and hence we can only hope to infer few parameters as well. Furthermore, in the case study presented, I see no indication of a complicated shape. The marginals that are shown are certainly not complicated, and two-dimensional scatterplots are not provided. This makes me wonder why mSABC is then so much better than SABC. The original SABC algorithm suggests using the empirical covariance of the population scaled by a tuning parameter beta<1. Unfortunately, I have not found the authors' choice of the beta parameter, nor of the other tuning parameters: the annealing speed and the rejuvenation step. A proper choice of these parameters is crucial for SABC to work properly, and I'm wondering whether the bad performance of SABC is simply due to bad tuning.
  In summary, while I'd be happy to see improvements of the SABC update step, I have not yet been presented with sufficient evidence to believe that the suggested improvement is of practical relevance.
  
  Reply: Thank you for your critical suggestion. We believe that, although ABC always operates on a low-dimensional vector of summary statistics, ABC can help to explore both low- and high-dimensional posteriors when the summary statistics are sufficient. The major challenge in high-dimensional posterior exploration lies in the sampling efficiency of ABC in providing accurate parameter estimates. In the present study, we intentionally used the calibration of the 14-parameter SAC-SMA hydrological model. This calibration has been described as a challenging task because of complex posterior surfaces and is therefore frequently used as a benchmark hydrological modelling experiment for validating algorithmic enhancements. For this reason, we did not provide two-dimensional scatterplots in the present paper.
  Finally, we agree that a proper choice of these parameters is crucial for SABC (and mSABC) to work properly. With regard to tuning the algorithmic parameters, we adopted the same inference configuration for the original SABC algorithm as Fenicia et al. (2018), which has been shown to produce satisfactory inference results for real-world HYMOD experiments. Similarly, for mSABC, we used the algorithmic parameter settings recommended by previous applications (e.g., the DREAM algorithm of Vrugt et al. (2009)). In this way, we hope to reduce the influence of inappropriate tuning parameter values on the fairness of the comparison between the two ABC algorithms.
  
  References:
  Fenicia, F., Kavetski, D., Reichert, P., and Albert, C.: Signature‐domain calibration of hydrological models using Approximate Bayesian Computation: Empirical analysis of fundamental properties, Water Resour. Res., 54, 3958-3987, https://doi.org/10.1002/2017WR021616, 2018.
  Vrugt, J. A., ter Braak, C. J. F., Diks, C. G. H., Higdon, D., Robinson, B. A., and Hyman, J. M.: Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, Int. J. Nonlinear Sci. Numer. Simul., 10, 273–290, 2009.
  
  A few more minor comments:
  
  L 98: Sisson et al 2007 present a wrong algorithm, which should no longer be cited.
  Reply: Thank you for pointing out this problem in our manuscript. This incorrect citation will be removed from the revised manuscript.
  
  L 99: I believe such algorithms are referred to as SMC-ABC not ABC-PMC.
  Reply: As stated in Section 2.2 of Sadegh and Vrugt (2014), “These methods are … also referred to as ‘quantum Monte Carlo,’ ‘transfer-matrix Monte Carlo,’ ‘Monte Carlo filter,’ ‘particle filter,’ and ‘sequential Monte Carlo’”. Both terms are therefore correct. Here we preferred the term “ABC-PMC”, as adopted by Sadegh and Vrugt (2014).
  
  Reference: Sadegh, M. and Vrugt, J. A.: Approximate Bayesian Computation using Markov Chain Monte Carlo simulation: DREAM(ABC), Water Resour. Res., 50, 6767–6787, https://doi.org/10.1002/2014WR015386, 2014.
  
  Eq (2) and below: Please use standard notation: boldface “a” without index for a vector, and non-boldface “a_i” with index for its components. And I believe it should say “spanning the parameter subspace”, not “stretching”.
  Reply: Thank you for your detailed suggestions. For Eq (2), we use the same definition as Eq (23) of Vrugt (2016). We agree with the referee's recommendation to replace “stretching” with “spanning”.
  
  Reference: Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environ. Modell. Softw., 75, 273-316, https://doi.org/10.1016/j.envsoft.2015.08.013, 2016.
  
  Result section: The true posterior, using all the data, is available here and easy to sample from. So why not compare against it?
  Reply: We currently restrict the comparison to the original SABC and mSABC algorithms. Although classical MCMC sampling approaches are easy to implement and compare, a fair comparison with ABC algorithms is almost impossible, because the sufficiency of the summary statistics is difficult to satisfy in the present study. The vector of eight randomly selected signatures used in this study is expected to be insufficient to capture all relevant information in the raw time series. This fundamentally lowers the accuracy of the parameter estimates and model predictions obtained by ABC, so a comparison with MCMC methods would not be meaningful. We believe, however, that the sufficiency of summary statistics and its impact on ABC results is a good topic for future publications.
  
  Fig. 3 (a): This is an unfair comparison, as one mSABC iteration is more costly than one SABC iteration if I understand correctly.
  Reply: Thank you for your constructive suggestion. In each iteration, mSABC requires 3 model evaluations, as 3 Markov chains are executed sequentially. It is therefore true that one mSABC iteration is more costly than one SABC iteration. In addition, running DREAM-Core sampling generally takes slightly more time than simple RWM sampling. However, for time-consuming hydrological models, the additional time required by the more complex MCMC sampler is well compensated for by the smaller number of model evaluations. We therefore consider this cost acceptable.
  
  L 458ff: There is a growing body of literature about how to find near-sufficient statistics by means of machine learning (see e.g. Albert et al, SciPost Physics Core 5, 043 (2022) and references therein).
  Reply: Thank you for your good suggestion. There is a substantial literature on how to find near-sufficient statistics by means of partial least squares (Wegmann et al., 2009), information theory (Barnes et al., 2011), and other statistical methods such as machine learning (see e.g. Albert et al., SciPost Physics Core 5, 043 (2022) and references therein). However, their application to complex hydrological models is rarely discussed in the hydrological literature. An example is given by Liu et al. (2022), where information redundancy analysis and discriminatory power analysis are jointly applied in pursuit of approximately sufficient statistics.
  
  References:
  Wegmann, D., Leuenberger, C., and Excoffier, L.: Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood, Genetics, 182, 1207–1218, 2009.
  Barnes, C., Filippi, S., Stumpf, M. P. H., and Thorne, T.: Considerate approaches to achieving sufficiency for ABC model selection, arXiv [stat.CO], 1–21, http://arxiv.org/pdf/1106.6281v2.pdf, 2011.
  Liu, S., She, D., Zhang, L., and Xia, J.: A hybrid time- and signature-domain Bayesian inference framework for calibration of hydrological models: a case study in the Ren River basin in China, Stoch. Environ. Res. Risk A., https://doi.org/10.1007/s00477-022-02282-3, 2022.
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC1
RC2:
'Comment on hess-2022-414', Anonymous Referee #2, 03 Apr 2023

To set the context for my review, I believe that the research hydrology community might benefit by not dwelling on incremental updates to methods for fitting lumped models with small parameter sets. In light of this statement, it is not impossible to do good work in this field, but such work needs to meet a high standard for exposition, code sharing/reproducibility, and completeness of analysis. The writing and level of detail in the introduction provides a good overview of where and how ABC fits into computational hydrology and serves as a nice primer on the topic. However, I would ask the authors to do more for reproducibility by putting their code and data on a public repository. Most of the rest of my comments focus on the motivation and analysis. Given that the coauthors were involved in an adjacent study using the same model and the same dataset, it is even more important that these data and methods are made available.

I am not convinced that approximate Bayesian computation is properly motivated by the rationale provided in this article and related articles on the usage of ABC for hydrology models of modest complexity. The key reason for its popularity in other disciplines such as phylogenetics is its applicability when even computation of the likelihood itself, or of likelihood approximations, is infeasible. Its connection to hydrology is of dubious origin; ABC appears to have been of interest to hydrologists because of its close connections to GLUE, an approach to model calibration and uncertainty quantification which lacked the kinds of guarantees for empirical risk minimization (i.e. expected loss taken with regard to an adequate measure) which are the centerpiece of most modern statistical approaches.

Another conceptual issue which I would like to raise with this article is that this article is fundamentally about comparing an accept/reject scheme versus one which uses differential evolution. Recall that the novelty of the initial research about the DREAM algorithm was that it took a popular heuristic for global optimization, differential evolution, and made it into a proper MCMC algorithm satisfying detailed balance and reversibility. Thus, it seems that a DREAM-inspired ABC algorithm is just bare differential evolution itself, a topic that has been visited before. For example, see Zhang et al. (2008), “Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model”. In this regard, stripping away the MCMC machinery just leaves us with the original DE heuristic.
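The distinction the referee draws between differential evolution and DE-MC rests on the selection rule applied to each proposal. A minimal sketch with illustrative function names contrasts the greedy selection of classical DE with the Metropolis acceptance used by DE-MC, assuming a symmetric proposal so that the proposal densities cancel; the ABC variants discussed here would replace the log-density with a distance-based criterion, which is not shown.

```python
import math
import random


def de_greedy_select(current, proposal, objective):
    """Classical DE: keep the point with the lower objective (deterministic)."""
    return proposal if objective(proposal) <= objective(current) else current


def demc_metropolis_select(current, proposal, log_density, rng=random):
    """DE-MC: accept with probability min(1, p(proposal) / p(current)).

    Assumes a symmetric proposal, so proposal densities cancel; the ratio
    is evaluated in log space to avoid underflow.
    """
    log_ratio = log_density(proposal) - log_density(current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current
```

Greedy selection never moves to a worse point, so the population contracts towards an optimum; the Metropolis rule occasionally accepts worse points, which is what lets the chains sample a distribution instead of locating a single optimum.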

With regard to the paper’s analyses, the comparison of sampling effectiveness is lacking on multiple axes. The authors’ lack of precision in terminology makes it very difficult to understand the sampling efficiency for each algorithm. Under a Markov chain-based sampling scheme, there will typically be some autocorrelation between samples but it is not clear if they are reporting the number of equivalent independent samples (obtained by adjusting the samples drawn by some index of autocorrelation) or simple draws. The discussion of acceptance rate is informative, but not a substitute for a measure of effective sample size. The comparison study is also deficient in the sense that it’s not clear at what point SABC converged to its final distribution; if it turned out to be earlier than the number of iterations reported, then the fact that mSABC reached a similar point with 30% fewer forward evaluations might just be an artifact of the analysis design.
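To make the referee's distinction between raw draws and equivalent independent samples concrete: a common estimator of effective sample size divides the number of draws n by 1 + 2Σρ_k, where ρ_k is the lag-k autocorrelation of the chain. The sketch below truncates the sum at the first non-positive estimated autocorrelation, a simple choice among several (Geyer's initial-sequence estimators are more robust); the function name is illustrative.

```python
import numpy as np


def effective_sample_size(x):
    """Estimate the effective sample size of a 1-D chain.

    ESS = n / (1 + 2 * sum_k rho_k), where rho_k is the estimated lag-k
    autocorrelation; the sum stops at the first non-positive rho_k.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = xc @ xc / n
    if var == 0.0:
        return float("nan")  # constant chain: autocorrelation undefined
    rho_sum = 0.0
    for lag in range(1, n):
        rho = (xc[:-lag] @ xc[lag:]) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

For independent draws the estimate is close to n; for a strongly autocorrelated chain it can be an order of magnitude smaller, which an acceptance rate alone does not reveal.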

To remedy these issues, I recommend that the authors also work with synthetic streamflow data so that they might have access to the ground truth parameter values. I also recommend that they run the SABC algorithm for a much longer timespan than done previously to be reasonably sure that convergence is achieved. If the mSABC and SABC samples do not converge completely when using the same distance function and same priors, then there are deeper issues which must be addressed. The use of further diagnostics like the Gelman-Rubin statistic with multiple chains would also be necessary to ensure that the chains agree and appear to have reached convergence. I also recommend that whenever the authors speak of faster or more efficient progress, they provide a precise definition of exactly what they mean in this regard. I suspect they typically use the number of forward evaluations in this capacity; this number is perfectly suitable for that purpose.
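For reference, a sketch of the original Gelman-Rubin potential scale reduction factor for a single parameter, computed from m chains of equal length after burn-in. It omits the degrees-of-freedom correction and the later split-chain refinement; values close to 1 (thresholds of 1.1 or 1.2 are common in practice) indicate that the chains agree. The function name is illustrative.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one parameter.

    chains : (m, n) array with m chains of n post-burn-in draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two chains of length two")
    b = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return float(np.sqrt(var_hat / w))
```

A chain stuck in a different region inflates the between-chain variance and pushes R-hat well above 1, which is the failure mode the referee wants ruled out.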

Minor comments:

Line 95: The article “an” is missing from before “excessive”

Line 130: This description is actually for the adaptive proposal version of RWM (see Haario et al. 1999, Adaptive proposal distribution for random walk Metropolis algorithm).

Line 138: Strictly speaking, there are many problems for which RWM is adequate. They tend to involve a posterior dimensionality of 10 or so. In fact, I am not convinced the problem case shown calls for something so complicated. The statement that “it always requires an excessive number…” may be a bit of an overreach in this regard. Perhaps just making a statement akin to lines 244-245 is adequate, as this is an accurate characterization of the fundamental flaw with RWM.

Line 225: It’s unclear what the phrase “…as a function of mean fields U of the prior ensemble…” means. Further clarification would be helpful.

Line 380: I would recommend making an attempt at the proof. We have an abundance of results from the 2003 PNAS ABC paper as well as the original DREAM paper which will likely contain the necessary elements (if they exist).

Line 585: “Deteriorate” appears to be unnecessary here.

Citation: https://doi.org/10.5194/hess-2022-414-RC2
- AC2: 'Reply on RC2', Song Liu, 06 Apr 2023
  
  To set the context for my review, I believe that the research hydrology community might benefit by not dwelling on incremental updates to methods for fitting lumped models with small parameter sets. In light of this statement, it is not impossible to do good work in this field, but such work needs to meet a high standard for exposition, code sharing/reproducibility, and completeness of analysis. The writing and level of detail in the introduction provides a good overview of where and how ABC fits into computational hydrology and serves as a nice primer on the topic. However, I would ask the authors to do more for reproducibility by putting their code and data on a public repository. Most of the rest of my comments focus on the motivation and analysis. Given that the coauthors were involved in an adjacent study using the same model and the same dataset, it is even more important that these data and methods are made available.
  
  Reply: We appreciate the referee's approval and encouragement regarding the writing and level of detail in the introduction section. We also agree with the referee's concern about the reproducibility of the current experiment. Although we focused only on a typical hydrological modelling experiment for validating algorithmic enhancements, we would also like to make the relevant code and data available in a public repository (e.g., GitHub). These will be made available in the near future. Interested readers can also email us directly if necessary.
  
  I am not convinced that approximate Bayesian computation is properly motivated by the rationale provided in this article and related articles on the usage of ABC for hydrology models of modest complexity. The key reason for its popularity in other disciplines such as phylogenetics is its applicability when even computation of the likelihood itself, or of likelihood approximations, is infeasible. Its connection to hydrology is of dubious origin; ABC appears to have been of interest to hydrologists because of its close connections to GLUE, an approach to model calibration and uncertainty quantification which lacked the kinds of guarantees for empirical risk minimization (i.e. expected loss taken with regard to an adequate measure) which are the centerpiece of most modern statistical approaches.
  
  Another conceptual issue which I would like to raise with this article is that this article is fundamentally about comparing an accept/reject scheme versus one which uses differential evolution. Recall that the novelty of the initial research about the DREAM algorithm was that it took a popular heuristic for global optimization, differential evolution, and made it into a proper MCMC algorithm satisfying detailed balance and reversibility. Thus, it seems that a DREAM-inspired ABC algorithm is just bare differential evolution itself, a topic that has been visited before. For example, see Zhang et al. (2008), “Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model”. In this regard, stripping away the MCMC machinery just leaves us with the original DE heuristic.
  
  Reply: ABC was originally proposed to handle Bayesian inference problems in which the (formal) likelihood is infeasible or computationally expensive to evaluate. In this sense, ABC is more closely related to GLUE: GLUE uses informal likelihood measures, whereas ABC replaces the computation of the likelihood by the introduction of a few summary statistics. Given sufficient summary statistics, ABC leads to an approximation of the true posterior as the deviation between the measured and modelled summary statistics approaches zero. This constitutes the core rationale of ABC.
  With regard to the current article, we suggested an improvement of SABC by replacing the RWM update with a differential evolution-type update. Differential evolution-type techniques are not new, and a variety of studies have undertaken valuable analyses and discussions on this topic. We fundamentally compared an accept/reject scheme with one that uses differential evolution, but within an ABC framework. The use of differential evolution in SABC helps to extend the original SABC algorithm to high-dimensional posterior exploration. The innovation of this article lies not in the enhancement of the MCMC sampling algorithm itself, but in its practical value in improving current ABC approaches.
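The core argument of this reply, that ABC with sufficient summary statistics approaches the true posterior as the tolerance shrinks, can be illustrated with plain ABC rejection, which is much simpler than SABC or mSABC. The toy problem is an assumption chosen because its answer is known: for normally distributed data with known variance, the sample mean is a sufficient statistic for the mean, so the accepted draws should concentrate around the observed sample mean with a spread near 1/√50 ≈ 0.14. All names, the prior and the tolerance are illustrative choices.

```python
import numpy as np


def abc_rejection(observed, simulate, summary, prior_sample, n_draws, eps, rng):
    """Plain ABC rejection sampler.

    Keeps prior draws whose simulated summary statistics lie within eps
    (Euclidean distance) of the observed summary statistics.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if np.linalg.norm(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return np.array(accepted)


# Toy example with a known answer: Normal(mu, 1) data, uniform prior on mu.
rng = np.random.default_rng(3)
observed = rng.normal(2.0, 1.0, size=50)
posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=50),
    summary=lambda x: np.atleast_1d(x.mean()),  # sufficient for mu
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    n_draws=20000,
    eps=0.05,
    rng=rng,
)
```

Replacing the sample mean with an insufficient statistic (for example, the sample variance alone) would leave the accepted draws spread over much of the prior, which is the practical meaning of the sufficiency condition.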
  
  With regard to the paper’s analyses, the comparison of sampling effectiveness is lacking on multiple axes. The authors’ lack of precision in terminology makes it very difficult to understand the sampling efficiency for each algorithm. Under a Markov chain-based sampling scheme, there will typically be some autocorrelation between samples but it is not clear if they are reporting the number of equivalent independent samples (obtained by adjusting the samples drawn by some index of autocorrelation) or simple draws. The discussion of acceptance rate is informative, but not a substitute for a measure of effective sample size. The comparison study is also deficient in the sense that it’s not clear at what point SABC converged to its final distribution; if it turned out to be earlier than the number of iterations reported, then the fact that mSABC reached a similar point with 30% fewer forward evaluations might just be an artifact of the analysis design.
  
  Reply: We regret the lack of precision in our terminology and the resulting confusion about the sampling efficiency of each algorithm. Although both algorithms apply a Markov chain-based sampling scheme, as stated in Lines 285-289, the starting point of a chain is drawn from the prior ensemble each time a proposal is generated. The autocorrelation analysis typically used for classical Markov chain-based sampling schemes is therefore not suitable in the context of the present study. Secondly, with regard to the convergence of SABC, we adopted the same SABC parameter settings as Fenicia et al. (2018), which have been shown to achieve satisfactory convergence to the approximate posterior in validation with synthetic datasets and real hydrological data. In the absence of the true posteriors, we treated the posterior derived by the original SABC as a benchmark. We hope that, in this way, the influence of an inappropriate number of iterations on the fairness of the comparison is largely mitigated.
  
  Reference: Fenicia, F., Kavetski, D., Reichert, P., and Albert, C.: Signature‐domain calibration of hydrological models using Approximate Bayesian Computation: Empirical analysis of fundamental properties, Water Resour. Res., 54, 3958-3987, https://doi.org/10.1002/2017WR021616, 2018.
  
  To remedy these issues, I recommend that the authors also work with synthetic streamflow data so that they might have access to the ground truth parameter values. I also recommend that they run the SABC algorithm for a much longer timespan than done previously to be reasonably sure that convergence is achieved. If the mSABC and SABC samples do not converge completely when using the same distance function and same priors, then there are deeper issues which must be addressed. The use of further diagnostics like the Gelman-Rubin statistic with multiple chains would also be necessary to ensure that the chains agree and appear to have reached convergence. I also recommend that whenever the authors speak of faster or more efficient progress, they provide a precise definition of exactly what they mean in this regard. I suspect they typically use the number of forward evaluations in this capacity; this number is perfectly suitable for that purpose.
  
  Reply: We appreciate the referee's valuable suggestions for further improving the current article, including the experiment on synthetic datasets, the validation of the convergence of SABC, and a clear and precise presentation of the sampling efficiency. We will address these points in a later revision of the manuscript.
  
  Minor comments:
  
  Line 95: The article “an” is missing from before “excessive”
  Reply: Thank you for pointing out the mistake in our manuscript. And this shall be corrected in the revised manuscript.
  
  Line 130: This description is actually for the adaptive proposal version of RWM (see Haario et al. 1999, Adaptive proposal distribution for random walk Metropolis algorithm).
  Reply: Thank you for your detailed suggestion. The current manuscript does not precisely distinguish between the two versions of RWM, with and without the adaptive proposal. To make the statement more accurate, we would like to restrict it to the adaptive proposal version of RWM, as the referee suggested.
  
  Line 138: Strictly speaking, there are many problems for which RWM is adequate. They tend to involve a posterior dimensionality of 10 or so. In fact, I am not convinced the problem case shown calls for something so complicated. The statement that “it always requires an excessive number…” may be a bit of an overreach in this regard. Perhaps just making a statement akin to lines 244-245 is adequate, as this is an accurate characterization of the fundamental flaw with RWM.
  Reply: Thank you for your good suggestion. We’d like to remove the corresponding statement and restrict it to the fundamental limitation of RWM to make it more accurate.
  
  Line 225: It’s unclear what the phrase “…as a function of mean fields U of the prior ensemble…” means. Further clarification would be helpful.
  Reply: Thank you for your good suggestion. We agree that further clarification is necessary. According to the original paper of Albert et al. (2015), in which the SABC algorithm was first proposed, the phrase “…as a function of mean fields U of the prior ensemble…” means “…as a function of the average of the redefined ABC distance in the prior ensemble”. The exact formulation of this function is given by Eq (32) of Albert et al. (2015).
  
  Reference: Albert, C., Künsch, H. R., and Scheidegger, A.: A simulated annealing approach to approximate Bayes computations, Stat. Comput., 25, 1217-1232, https://doi.org/10.1007/s11222-014-9507-8, 2015.
  
  Line 380: I would recommend making an attempt at the proof. We have an abundance of results from the 2003 PNAS ABC paper as well as the original DREAM paper which will likely contain the necessary elements (if they exist).
  Reply: Thank you for your good suggestion. A formal proof of convergence of the DREAM algorithm (i.e., the DREAM-Core sampling in our study) is available in Vrugt et al. (2009). However, a formal proof of convergence of mSABC using DREAM-Core sampling is difficult, which makes numerical validation of convergence necessary.
  
  Reference: Vrugt, J. A., ter Braak, C. J. F., Diks, C. G. H., Higdon, D., Robinson, B. A., and Hyman, J. M.: Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, Int. J. Nonlinear Sci. Numer. Simul., 10, 273–290, 2009.
  
  Line 585: “Deteriorate” appears to be unnecessary here.
  Reply: Thank you for pointing out the mistake in our manuscript. And this shall be corrected in the revised manuscript.
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC2
RC3:
'Comment on hess-2022-414', Anonymous Referee #3, 04 Apr 2023
General comments
This paper developed a modified Simulated Annealing Approximate Bayesian Computation (mSABC) method for model parameter inference. The mSABC method is tested in a real case study that applies the SAC-SMA model to the Danjiangkou reservoir region. The paper fits the scope of HESS. However, in my view the innovation of this research is insufficient, the results did not prove that the proposed method is better than the SABC method, and the presentation of this paper needs to be improved. Please see below for my detailed comments. I suggest rejecting and encouraging resubmission.
Major comments to the authors
Limited innovation. The authors improve the Simulated Annealing ABC (SABC) method by replacing its original random walk Metropolis (RWM) sampling with adaptive MCMC sampling taken from the existing DREAM algorithm (Vrugt et al., 2019). This is not a substantial contribution to scientific progress.

A more critical issue is that the advantages of mSABC over SABC in parameter determination have not been well proven. In Figure 6, mSABC does not improve the streamflow simulation results compared with SABC. From the probabilistic perspective, mSABC underestimates the peak flow in the high-flow period, though it produces a narrower uncertainty band. From the deterministic perspective, mSABC yields worse RMSE and CC values than SABC. The authors attribute the underestimation to two reasons: inaccurate observations and the choice of hydrologic signatures. However, I think this can also be attributed to a poor posterior parameter distribution.

Moreover, Figure 4 cannot prove that mSABC generates better posterior parameter estimates than SABC. Also, I don’t understand why the authors say SABC fails to correctly infer the target distribution (line 393). I suggest that the authors use a synthetic study in which the true parameter values are known; this will help tell whether mSABC is better than SABC.

Last, I don’t understand how the conclusion of lines 546-548 is derived from Figure 5. I honestly don’t think Figure 5 supports this conclusion.

Experiment design flaw. First, on Line 352, the authors compute the distance metric as the average of eight metrics. By checking Appendix A, it looks like the eight statistic metrics do not share the same unit. If so, it does not make sense to compute their average as the distance metric.

Second, on line 382, the authors state that the SABC method has already been proven to converge to the correct target distribution in previous applications by Fenicia et al. (2018). I recommend that the authors clarify whether Fenicia et al. (2018) used the same case study and inference configurations as yours. If not, there is no evidence that SABC has converged to the correct target distribution in your case study.

The presentation of this paper has not met the publication standard of HESS and needs to be improved. Some sentences are hard to understand or have grammar mistakes (e.g., line 428). Some terms are not well established, for example “simplistic”, “proposal” and “concept” (line 169). Please check the appropriateness of these terms against the literature and use the correct terminology.

Last, the authors used too many abbreviations; readers may get lost and be unable to recall all their meanings before finishing the article. Can the authors please reduce them, especially those that are not used often? For example, GLUE, DE-MC, DRR, PMC.

Minor comments to the authors
Line 106. Is “ABC-REJ” a typo here? If not, please provide the full name of “ABC-REJ” when it appears for the first time.

Line 151. Is “d” needed?

Lines 214-220. Are these sentences model/system dependent? Should they be included in the Method section?

Line 263, “utilized” in the past tense?

Lines 272-274, please rephrase the meaning of “c_*”.
Citation: https://doi.org/10.5194/hess-2022-414-RC3
- AC3:
  'Reply on RC3', Song Liu, 06 Apr 2023
  General comments
  
  This paper developed a modified Simulated Annealing Approximate Bayesian Computation (mSABC) method for model parameter inference. The mSABC method is tested in a real case study that applies the SAC-SMA model to the Danjiangkou reservoir region. The paper fits the scope of HESS. However, the innovation of this research seems insufficient to me; the results do not prove that the proposed method is better than the SABC method; and the presentation of this paper needs to be improved. Please see below for my detailed comments. I suggest rejection and encourage resubmission.
  
  Reply: Thank you for the constructive suggestions and comments for further improving the current work. We have addressed the referee’s comments and questions in detail below.
  
  Major comments to the authors
  
  Innovation limit. The authors improve the Simulated Annealing ABC (SABC) method by replacing its original random walk Metropolis (RWM) sampling with the adaptive MCMC sampling. The adaptive MCMC sampling is from the existing DREAM algorithm (Vrugt et al., 2019). This is not a substantial contribution to scientific progress.
  
  Reply to Comment 1: We acknowledge that the innovation is limited with respect to the algorithmic enhancements of adaptive MCMC sampling: the sampling scheme is not new and has been discussed intensively in past literature. However, we believe that the novelty of this article lies more in its practical value, namely in addressing the dimensionality issue of the original SABC algorithm and improving its efficiency in complex posterior exploration. By incorporating adaptive MCMC sampling into SABC, both the sampling efficiency and the model performance are significantly improved for high-dimensional posterior exploration. This substantially improves the performance of the original SABC algorithm and extends it to a broader range of hydrological modelling applications.
  A more critical issue is that the advantages of mSABC over SABC in parameter determination have not been well proven. In Figure 6, mSABC does not bring improved streamflow simulation results compared with SABC. From the probabilistic perspective, mSABC underestimates the peak flow in the high-flow period, though it produces a narrower uncertainty band. From the deterministic perspective, mSABC gets worse RMSE and CC metric results than SABC. The authors attribute the underestimation to two reasons: inaccurate observations and the choice of hydrologic signatures. However, I think this can also be attributed to the bad posterior parameter distribution.
  
  Moreover, Figure 4 cannot prove that mSABC generates better posterior parameter estimates than SABC. Also, I don’t understand why the authors say SABC fails to correctly infer the target distribution (line 393). I suggest that the authors use a synthetic study in which the true parameter values are known; this will help tell whether mSABC is better than SABC.
  
  Last, I don’t understand how the conclusion of lines 546-548 is derived from Figure 5. I honestly don’t think Figure 5 supports this conclusion.
  
  Reply to Comments 2-4: Firstly, in the absence of a single comprehensive measure of model predictions, we must choose a variety of mutually competing metrics for an overall evaluation of the predictions. The conflict between a narrower uncertainty band and the underestimation of high flows caused by a lower coverage ratio is not rare in the literature (e.g., Xiong et al., 2009). One solution to this issue is to introduce additional evaluation criteria, e.g., PQQ plots and an evaluation of the midpoints of the ensemble predictions. We also note that the slightly poorer RMSE and CC values could be compensated by considerably better RB and RD values. A compromise must be made among these competing evaluation measures.
  Secondly, mSABC generates better posterior parameter estimates than SABC, as mSABC produces sharper marginal distributions, especially for parameters ADIMP, LZPK and σ. This suggests that mSABC has lower parameter uncertainty than SABC. As for Line 393, this can be demonstrated by the final boxplots of the posterior parameter samples: the boxplots of SABC exhibit evident differences from those of mSABC, especially for parameters ADIMP, LZPK and σ, which is consistent with Figure 4. In other words, SABC fails to converge to the approximate posterior derived by mSABC. The boxplots of SABC after 2 million iterations are similar to those of mSABC after only 200k iterations, and far from the final stable boxplots of mSABC. Meanwhile, we agree that a synthetic experiment with known posteriors might be more suitable for comparing these two algorithms.
  Finally, the conclusion of lines 546-548 is supported by the performance of the high-flow-related signatures (e.g., Q₁₀ and HPC) shown in Figure 5. Compared with the posterior distributions of the relative errors of Q₁₀ and HPC obtained by SABC, the proportion of samples that overestimate the measured signature values (i.e., with negative error values) is largely reduced for mSABC. This implies that mSABC performs better than SABC for high-flow events.
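The trade-off between a narrower uncertainty band and a lower coverage ratio can be made concrete with two simple band diagnostics. The sketch below assumes the usual definitions in the spirit of Xiong et al. (2009): the coverage (containing) ratio is the share of observations inside the predictive band, and the relative width is the band width divided by the observation, averaged over time. The function name `band_diagnostics` and the numbers in the usage example are hypothetical; the manuscript's exact definitions may differ.

```python
from typing import Dict, Sequence


def band_diagnostics(observed: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> Dict[str, float]:
    """Coverage ratio and average relative width of a predictive band.

    observed     : measured streamflow, assumed strictly positive
    lower, upper : bounds of the predictive uncertainty band at each time step
    """
    n = len(observed)
    if n == 0 or not (len(lower) == len(upper) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # Fraction of observations that fall inside the band.
    inside = sum(1 for q, lo, up in zip(observed, lower, upper) if lo <= q <= up)
    # Band width relative to the observation, averaged over time steps.
    rel_width = sum((up - lo) / q for q, lo, up in zip(observed, lower, upper)) / n
    return {"coverage_ratio": inside / n, "relative_width": rel_width}


# Hypothetical example: the band is narrow but misses the peak flow of 40.
result = band_diagnostics([10.0, 20.0, 40.0], [8.0, 15.0, 20.0], [12.0, 25.0, 30.0])
```

In this toy case the coverage ratio is 2/3 because the peak lies above the band: a sharper band is rewarded by the width index but penalised by coverage, which is why both need to be reported together.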
  Experiment design flaw. First, on Line 352, the authors compute the distance metric as the average of eight metrics. By checking Appendix A, it looks like the eight statistic metrics do not share the same unit. If so, it does not make sense to compute their average as the distance metric.
  
  Second, on line 382, the authors state that the SABC method has already been proven to converge to the correct target distribution in previous applications by Fenicia et al. (2018). I recommend that the authors clarify whether Fenicia et al. (2018) used the same case study and inference configurations as yours. If not, there is no evidence that SABC has converged to the correct target distribution in your case study.
  
  Reply to Comments 5-6: Firstly, as formulated by Equation (9), the distance metric is computed as the average of the absolute relative errors of all eight signatures, not as the average of the eight signatures themselves. This is also common practice in the literature (e.g., Fenicia et al., 2018).
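Equation (9) is not reproduced in this discussion; the sketch below assumes it is the plain mean of |s_sim − s_obs| / |s_obs| over the signatures, as the reply describes. It illustrates why averaging is meaningful even when the signatures have different units: each term is dimensionless, so rescaling one signature (e.g. by a unit-conversion factor) leaves the distance unchanged. The function name and signature values are hypothetical.

```python
from typing import Sequence


def signature_distance(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute relative error between simulated and observed signatures.

    Each term |s_sim - s_obs| / |s_obs| is dimensionless, so signatures with
    different units can be averaged. Assumes no observed signature is zero.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("signature vectors must be non-empty and of equal length")
    errors = [abs(s - o) / abs(o) for o, s in zip(observed, simulated)]
    return sum(errors) / len(errors)


# Hypothetical signatures, e.g. a flow quantile, a runoff ratio and an event count.
obs = [2.0, 0.5, 10.0]
sim = [2.2, 0.45, 9.0]
d_original = signature_distance(obs, sim)
# Expressing the first signature in other units (factor 1000) changes nothing.
d_rescaled = signature_distance([obs[0] * 1000.0, *obs[1:]], [sim[0] * 1000.0, *sim[1:]])
```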
  Secondly, we understand the referee’s concern about the convergence of SABC. We used the same inference configuration as Fenicia et al. (2018), but with a different real-data experiment. In fact, the code of the original SABC algorithm was obtained directly from the authors of Fenicia et al. (2018).
  The presentation of this paper has not met the publication standard of HESS and needs to be improved. Some sentences are hard to understand or have grammar mistakes (e.g., line 428). Some terms are not well established, for example “simplistic”, “proposal” and “concept” (line 169). Please check the appropriateness of these terms against the literature and use the correct terminology.
  
  Last, the authors used too many abbreviations; readers may get lost and be unable to recall all their meanings before finishing the article. Can the authors please reduce them, especially those that are not used often? For example, GLUE, DE-MC, DRR, PMC.
  
  Reply to Comments 7-8: We regret the imperfect presentation of this paper. We will further improve the writing by simplifying sentences, using the correct terminology and removing seldom-used abbreviations.
  
  Minor comments to the authors
  
  Line 106. Is “ABC-REJ” a typo here? If not, please provide the full name of “ABC-REJ” when it appears for the first time.
  
  Reply: Thank you for your suggestion. The full name of ABC-REJ will be added when it appears for the first time.
  
  Line 151. Is “d” needed?
  
  Reply: Thank you for pointing out the mistake in our manuscript. The “d” will be removed in the revised manuscript.
  
  Lines 214-220. Are these sentences model/system dependent? Should they be included in the Method section?
  
  Reply: Thank you for your concern. We believe that a brief introduction to the rationale of the original SABC algorithm is necessary in the present article. It helps interested readers acquire basic knowledge of SABC and explains the meaning of the tolerance values. We would therefore like to keep these sentences in the revised manuscript.
  
  Line 263, “utilized” in the past tense?
  
  Reply: Thank you for pointing out the mistake in our manuscript. “utilizes” in the present tense should be used instead.
  
  Lines 272-274, please rephrase the meaning of “c*”.
  
  Reply: Thank you for your detailed suggestion. We keep the description of the meaning of “c*” consistent with the original paper of Vrugt (2016).
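To show where c* enters, here is a minimal sketch of a DREAM-type differential-evolution jump with randomized subspace sampling, written from our reading of Vrugt (2016). It is not the authors' mSABC code: the function name `dream_style_jump` is hypothetical, the crossover probability `cr` is fixed rather than self-adapted, c* is treated as the variance of the small additive noise ζ, and the ABC acceptance step is omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def dream_style_jump(chains: Sequence[Sequence[float]], i: int, cr: float = 0.9,
                     delta: int = 1, c: float = 0.1, c_star: float = 1e-12,
                     rng: Optional[random.Random] = None) -> Tuple[List[float], List[int]]:
    """Propose a new state for chain i with a DREAM-type jump (hypothetical sketch).

    c      : half-width of lambda ~ U(-c, c), a random rescaling of the jump size
    c_star : variance of zeta ~ N(0, c_star), a tiny additive perturbation
    """
    rng = rng or random.Random()
    n, d = len(chains), len(chains[i])
    if n < 2 * delta + 1:
        raise ValueError("need at least 2*delta + 1 chains")
    # Randomized subspace sampling: update each dimension with probability cr,
    # but always at least one dimension.
    subspace = [k for k in range(d) if rng.random() < cr] or [rng.randrange(d)]
    gamma = 2.38 / math.sqrt(2 * delta * len(subspace))
    # delta pairs of distinct chains, all different from chain i.
    picks = rng.sample([j for j in range(n) if j != i], 2 * delta)
    a, b = picks[:delta], picks[delta:]
    proposal = list(chains[i])
    for k in subspace:
        diff = sum(chains[aj][k] - chains[bj][k] for aj, bj in zip(a, b))
        lam = rng.uniform(-c, c)
        zeta = rng.gauss(0.0, math.sqrt(c_star))
        proposal[k] = chains[i][k] + zeta + (1.0 + lam) * gamma * diff
    return proposal, subspace


# Usage: three chains in two dimensions; noise switched off to expose the DE jump.
prop, sub = dream_style_jump([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]], 0,
                             cr=1.0, c=0.0, c_star=0.0, rng=random.Random(3))
```

With c = c* = 0 the jump reduces to γ times the difference of two other chains, which is the differential-evolution core; c* only adds a perturbation that is small relative to the width of the target distribution.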
  
  Reference: Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environ. Modell. Softw., 75, 273-316, https://doi.org/10.1016/j.envsoft.2015.08.013, 2016.
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC3

Interactive discussion

Status: closed

RC1:
'Comment on hess-2022-414', Anonymous Referee #1, 03 Apr 2023

The authors are suggesting an improvement of SABC by replacing the RWM update by a differential evolution-type of update.
As they point out, such particle-based update steps tend to be superior for high-dimensional or complicated-shaped posteriors. However, ABC is not suitable for high-dimensional posteriors. Due to the curse of dimensionality, we must restrict ABC to few summary statistics, and hence we can only hope to infer few parameters as well. Furthermore, in the case study presented, I see no indication of a complicated shape. The marginals that are shown are certainly not complicated, and two-dimensional scatterplots are not provided. This makes me wonder why mSABC is then so much better than SABC. The original SABC algorithm suggests using the empirical covariance of the population scaled by a tuning parameter beta<1. Unfortunately, I have not found the authors’ choice of the beta parameter, nor of the other tuning parameters: the annealing speed and the rejuvenation step. A proper choice of these parameters is crucial for SABC to work properly, and I’m wondering whether the bad performance of SABC is simply due to bad tuning.
Summarizing, while I’d be happy to see improvements of the SABC update step, I’m not yet presented with sufficient evidence to believe that the suggested improvement is of practical relevance.
A few more minor comments:
L 98: Sisson et al 2007 present a wrong algorithm, which should no longer be cited.
L 99: I believe such algorithms are referred to as SMC-ABC not ABC-PMC.
Eq (2) and below: Please use standard notation: boldface “a” without index for a vector, and non-boldface “a_i” with index for its components. And I believe it should say “spanning the parameter subspace”, not “stretching”.
Result section: The true posterior, using all the data, is available here and easy to sample from. So why not compare against it?
Fig. 3 (a): This is an unfair comparison, as one mSABC iteration is more costly than one SABC iteration if I understand correctly.
L 458ff: There is a growing body of literature about how to find near-sufficient statistics by means of machine learning (see e.g. Albert et al, SciPost Physics Core 5, 043 (2022) and references therein).

Citation: https://doi.org/10.5194/hess-2022-414-RC1
- AC1: 'Reply on RC1', Song Liu, 06 Apr 2023
  
  The authors are suggesting an improvement of SABC by replacing the RWM update by a differential evolution-type of update.
  As they point out, such particle-based update steps tend to be superior for high-dimensional or complicated-shaped posteriors. However, ABC is not suitable for high-dimensional posteriors. Due to the curse of dimensionality, we must restrict ABC to few summary statistics, and hence we can only hope to infer few parameters as well. Furthermore, in the case study presented, I see no indication of a complicated shape. The marginals that are shown are certainly not complicated, and two-dimensional scatterplots are not provided. This makes me wonder why mSABC is then so much better than SABC. The original SABC algorithm suggests using the empirical covariance of the population scaled by a tuning parameter beta<1. Unfortunately, I have not found the authors’ choice of the beta parameter, nor of the other tuning parameters: the annealing speed and the rejuvenation step. A proper choice of these parameters is crucial for SABC to work properly, and I’m wondering whether the bad performance of SABC is simply due to bad tuning.
  Summarizing, while I’d be happy to see improvements of the SABC update step, I’m not yet presented with sufficient evidence to believe that the suggested improvement is of practical relevance.
  
  Reply: Thank you for this critical comment. We believe that, although ABC always operates on a low-dimensional vector of summary statistics, it can explore both low- and high-dimensional posteriors when the summary statistics are sufficient. The major challenge in high-dimensional posterior exploration then lies in the sampling efficiency with which ABC delivers accurate parameter estimates. In the present study, we intentionally calibrated the 14-parameter SAC-SMA hydrological model. This calibration is considered challenging because of its complex posterior surface, and it is therefore frequently used as a benchmark hydrological modelling experiment for validating algorithmic enhancements. For this reason, we did not provide two-dimensional scatterplots in the present paper.
  Finally, we agree that a proper choice of these tuning parameters is crucial for SABC (and mSABC) to work properly. For the original SABC algorithm, we adopted the same inference configuration as Fenicia et al. (2018), which was shown to produce satisfactory inference results in real-world HYMOD experiments. Similarly, for mSABC, we used the algorithmic parameter settings recommended in previous applications (e.g., the DREAM algorithm of Vrugt et al. (2009)). In this way, we hope to limit the influence of inappropriate tuning parameter values on the comparison of the two ABC algorithms.
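For comparison with the differential-evolution update, the sketch below shows the kind of random-walk proposal the referee describes for SABC: a Gaussian step whose covariance is the empirical covariance of the current population scaled by a tuning parameter beta < 1. We assume beta multiplies the covariance (not the standard deviation); the default beta = 0.8 and all function names are hypothetical and are not the settings used by Fenicia et al. (2018).

```python
import math
import random
from typing import List, Optional, Sequence


def empirical_covariance(population: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of a population of parameter vectors."""
    n, d = len(population), len(population[0])
    means = [sum(x[k] for x in population) / n for k in range(d)]
    return [[sum((x[r] - means[r]) * (x[s] - means[s]) for x in population) / (n - 1)
             for s in range(d)] for r in range(d)]


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for r in range(d):
        for s in range(r + 1):
            acc = sum(low[r][k] * low[s][k] for k in range(s))
            if r == s:
                low[r][r] = math.sqrt(a[r][r] - acc)
            else:
                low[r][s] = (a[r][s] - acc) / low[s][s]
    return low


def rwm_proposal(x: Sequence[float], population: Sequence[Sequence[float]],
                 beta: float = 0.8, rng: Optional[random.Random] = None) -> List[float]:
    """Gaussian random-walk step from x with covariance beta * Cov(population).

    The factor is recomputed on every call for clarity; in practice it would be
    cached once per population update.
    """
    rng = rng or random.Random()
    low = cholesky([[beta * v for v in row] for row in empirical_covariance(population)])
    z = [rng.gauss(0.0, 1.0) for _ in x]
    return [x[r] + sum(low[r][k] * z[k] for k in range(r + 1)) for r in range(len(x))]


pop = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.5], [3.0, 1.5]]
rng = random.Random(0)
steps = [rwm_proposal([0.0, 0.0], pop, beta=0.5, rng=rng) for _ in range(20000)]
```

Because the step covariance is tied to beta, a badly chosen beta directly changes acceptance rates, which is the tuning sensitivity the referee raises; the DE-type jump instead derives its scale from differences between chains.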
  
  References:
  Fenicia, F., Kavetski, D., Reichert, P., and Albert, C.: Signature‐domain calibration of hydrological models using Approximate Bayesian Computation: Empirical analysis of fundamental properties, Water Resour. Res., 54, 3958-3987, https://doi.org/10.1002/2017WR021616, 2018.
  Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Higdon, D., Robinson, B.A., Hyman, J.M., 2009a. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10 (3), 273-290.
  
  A few more minor comments:
  
  L 98: Sisson et al 2007 present a wrong algorithm, which should no longer be cited.
  Reply: Thank you for pointing out the problem in our manuscript. This wrong citation will be removed from the revised manuscript.
  
  L 99: I believe such algorithms are referred to as SMC-ABC not ABC-PMC.
  Reply: As stated in Section 2.2 of Sadegh & Vrugt (2014), these methods are … also referred to as “quantum Monte Carlo,” “transfer-matrix Monte Carlo,” “Monte Carlo filter,” “particle filter,” and “sequential Monte Carlo.” Both terms are therefore correct. Here we prefer the term “ABC-PMC”, as adopted by Sadegh & Vrugt (2014).
  
  Reference: Sadegh, M., and J. A. Vrugt (2014), Approximate Bayesian Computation using Markov Chain Monte Carlo simulation: DREAM(ABC), Water Resour. Res., 50, 6767–6787, doi:10.1002/2014WR015386.
  
  Eq (2) and below: Please use standard notation: boldface “a” without index for a vector, and non-boldface “a_i” with index for its components. And I believe it should say “spanning the parameter subspace”, not “stretching”.
  Reply: Thank you for your detailed suggestions. For Eq (2), we use the same definition as Eq (23) of Vrugt (2016). We agree with the referee’s recommendation to replace “stretching” with “spanning”.
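For reference, the DREAM jump of Vrugt (2016) can be written in the notation Referee #1 requests (boldface vectors, non-boldface indexed components). The block below is our reading of that equation, not a copy of the manuscript's Eq (2). Here d' is the number of dimensions selected by randomized subspace sampling, the jump is applied only in those dimensions, δ is the number of chain pairs, and a_j, b_j index distinct chains different from chain i:

$$
\mathrm{d}\mathbf{x}_i = \boldsymbol{\zeta}_{d'} + \left(\mathbf{1}_{d'} + \boldsymbol{\lambda}_{d'}\right)\gamma(\delta, d') \sum_{j=1}^{\delta}\left(\mathbf{x}_{a_j} - \mathbf{x}_{b_j}\right),
\qquad
\gamma(\delta, d') = \frac{2.38}{\sqrt{2\,\delta\, d'}},
$$

with components $\lambda_k \sim \mathcal{U}(-c, c)$ and $\zeta_k \sim \mathcal{N}(0, c_*)$, and proposal $\mathbf{x}_p = \mathbf{x}_i + \mathrm{d}\mathbf{x}_i$. The difference vectors span the selected parameter subspace, which is why “spanning” is the apt term.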
  
  Reference: Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environ. Modell. Softw., 75, 273-316, https://doi.org/10.1016/j.envsoft.2015.08.013, 2016.
  
  Result section: The true posterior, using all the data, is available here and easy to sample from. So why not compare against it?
  Reply: We currently restrict the comparison to the original SABC and mSABC algorithms. Although classical MCMC sampling approaches are easy to implement, a fair comparison with ABC algorithms is almost impossible here, because the sufficiency of the summary statistics is difficult to satisfy in the present study. The vector of eight randomly selected signatures used here is expected to be insufficient to capture all relevant information in the raw time series, which fundamentally lowers the accuracy of the parameter estimates and model predictions obtained by ABC. A comparison with MCMC methods would therefore not be meaningful, although we believe that the sufficiency of summary statistics and its impact on ABC results is a good topic for future publications.
  
  Fig. 3 (a): This is an unfair comparison, as one mSABC iteration is more costly than one SABC iteration if I understand correctly.
  Reply: Thank you for this constructive suggestion. In each iteration, mSABC requires 3 model evaluations, as 3 Markov chains are executed sequentially. It is therefore true that one mSABC iteration is more costly than one SABC iteration. In addition, DREAM-Core sampling generally takes slightly more time than simple RWM sampling. However, for time-consuming hydrological models, the additional time spent on the more complex MCMC sampling is well compensated by the smaller number of model evaluations. We therefore think that the cost is acceptable.
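Since iterations of the two samplers have different costs, a fairer unit of comparison is the number of forward model runs. A minimal sketch of how such a count could be kept is a thin wrapper around the model; `CountingModel` and the toy model below are hypothetical stand-ins, not part of SAC-SMA or the authors' code.

```python
from typing import Callable, Sequence


class CountingModel:
    """Wrap a forward model so that every run is counted (hypothetical helper)."""

    def __init__(self, model: Callable[[Sequence[float]], float]):
        self.model = model
        self.evaluations = 0

    def __call__(self, theta: Sequence[float]) -> float:
        self.evaluations += 1
        return self.model(theta)


# Toy stand-in for a hydrological model; SAC-SMA itself is not reproduced here.
model = CountingModel(lambda theta: sum(theta))

# One iteration that advances three chains sequentially costs three model runs.
chains = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
for state in chains:
    model(state)
evaluations_per_iteration = model.evaluations
```

Reporting convergence against `model.evaluations` rather than iteration counts puts SABC and mSABC on the same cost axis.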
  
  L 458ff: There is a growing body of literature about how to find near-sufficient statistics by means of machine learning (see e.g. Albert et al, SciPost Physics Core 5, 043 (2022) and references therein).
  Reply: Thank you for this good suggestion. There is a substantial literature on finding near-sufficient statistics by means of partial least squares (Wegmann et al., 2009), information theory (Barnes et al., 2011), and other statistical methods such as machine learning (see e.g. Albert et al, SciPost Physics Core 5, 043 (2022) and references therein). However, applications to complex hydrological models are rarely discussed in the hydrology literature. One example is Liu et al. (2022), where information redundancy analysis and discriminatory power analysis are jointly applied in pursuit of approximately sufficient statistics.
  
  References:
  Wegmann, D., C. Leuenberger, and L. Excoffier (2009), Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood, Genetics, 182(4), 1207–1218.
  Barnes, C., S. Filippi, M. P. H. Stumpf, and T. Thorne (2011), Considerate approaches to achieving sufficiency for ABC model selection, ARXIV stat.CO, 1–21. [Available at http://arxiv.org/pdf/1106.6281v2.pdf.]
  Liu, S., She, D., Zhang, L., and Xia, J.: A hybrid time- and signature-domain Bayesian inference framework for calibration of hydrological models: a case study in the Ren River basin in China, Stoch. Environ. Res. Risk A., https://doi.org/10.1007/s00477-022-02282-3, 2022a.
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC1
RC2:
'Comment on hess-2022-414', Anonymous Referee #2, 03 Apr 2023

To set the context for my review, I believe that the research hydrology community might benefit by not dwelling on incremental updates to methods for fitting lumped models with small parameter sets. In light of this statement, it is not impossible to do good work in this field, but such work needs to meet a high standard for exposition, code sharing/reproducibility, and completeness of analysis. The writing and level of detail in the introduction provides a good overview of where and how ABC fits into computational hydrology and serves as a nice primer on the topic. However, I would ask the authors to do more for reproducibility by putting their code and data on a public repository. Most of the rest of my comments focus on the motivation and analysis. Given that the coauthors were involved in an adjacent study using the same model and the same dataset, it is even more important that these data and methods are made available.

I am not convinced that approximate Bayesian computation is properly motivated by the rationale provided in this article and related articles on the usage of ABC for hydrology models of modest complexity. The key reason for its popularity in other disciplines, such as phylogenetics, is its applicability when even computing the likelihood itself, or an approximation of it, is infeasible. Its connection to hydrology is of dubious origin; ABC appears to have been of interest to hydrologists because of its close connections to GLUE, an approach to model calibration and uncertainty quantification which lacked the kinds of guarantees for empirical risk minimization (i.e. expected loss taken with regard to an adequate measure) which are the centerpiece of most modern statistical approaches.

Another conceptual issue which I would like to raise with this article is that this article is fundamentally about comparing an accept/reject scheme versus one which uses differential evolution. Recall that the novelty of the initial research about the DREAM algorithm was that it took a popular heuristic for global optimization, differential evolution, and made it into a proper MCMC algorithm satisfying detailed balance and reversibility. Thus, it seems that a DREAM-inspired ABC algorithm is just bare differential evolution itself, a topic that has been visited before in the past. For an example, see Zhang et al. 2008, “Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model”. In this regard, stripping away the MCMC machinery just leaves us with the original DE heuristic.

With regard to the paper’s analyses, the comparison of sampling effectiveness is lacking on multiple axes. The authors’ lack of precision in terminology makes it very difficult to understand the sampling efficiency for each algorithm. Under a Markov chain-based sampling scheme, there will typically be some autocorrelation between samples but it is not clear if they are reporting the number of equivalent independent samples (obtained by adjusting the samples drawn by some index of autocorrelation) or simple draws. The discussion of acceptance rate is informative, but not a substitute for a measure of effective sample size. The comparison study is also deficient in the sense that it’s not clear at what point SABC converged to its final distribution; if it turned out to be earlier than the number of iterations reported, then the fact that mSABC reached a similar point with 30% fewer forward evaluations might just be an artifact of the analysis design.
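For an ordinary single Markov chain, the effective sample size mentioned above is commonly estimated from the chain's autocorrelations as n / (1 + 2 Σ ρ_k). The sketch below uses a simple truncation rule (stop at the first non-positive autocorrelation); more robust estimators, such as Geyer's initial-sequence method, exist. Function names are hypothetical, and whether such a diagnostic transfers directly to SABC's population-based scheme is a separate question.

```python
import random
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a chain at a given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        raise ValueError("chain has zero variance")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def effective_sample_size(x: Sequence[float]) -> float:
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation."""
    n = len(x)
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(4000)]
# Strongly autocorrelated AR(1) chain; theoretical ESS is about 4000 * 0.1 / 1.9.
ar = [0.0]
for _ in range(3999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))
```

Both chains contain 4000 draws, but the autocorrelated one carries the information of only a few hundred independent samples, which is why raw draw counts and acceptance rates alone cannot measure sampling efficiency.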

To remedy these issues, I recommend that the authors also work with synthetic streamflow data so that they have access to the ground-truth parameter values. I also recommend that they run the SABC algorithm for a much longer timespan than done previously, to be reasonably sure that convergence is achieved. If the mSABC and SABC samples do not converge completely when using the same distance function and the same priors, then there are deeper issues which must be addressed. Further diagnostics, such as the Gelman-Rubin statistic computed from multiple chains, would also be necessary to ensure that the chains agree and appear to have reached convergence. I also recommend that whenever the authors speak of faster or more efficient progress, they provide a precise definition of exactly what they mean. I suspect they typically use the number of forward evaluations in this capacity; this number is perfectly suitable for that purpose.
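The Gelman-Rubin statistic compares the between-chain and within-chain variance of each parameter; values close to 1 (thresholds such as 1.1 or 1.2 are common) indicate that the chains agree. The sketch below is the classic, non-split version for a single scalar parameter, assuming burn-in has already been discarded; the function name is hypothetical.

```python
import math
import random
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Potential scale reduction factor R-hat for one scalar parameter.

    chains: m >= 2 chains of equal length n (burn-in already discarded).
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# One chain stuck in a different region of parameter space.
stuck = mixed[:3] + [[v + 5.0 for v in mixed[3]]]
```

Four chains sampling the same distribution give R-hat near 1, while a single chain offset by 5 units pushes R-hat well above any usual threshold, flagging non-convergence.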

Minor comments:

Line 95: The article “an” is missing from before “excessive”

Line 130: This description is actually for the adaptive proposal version of RWM (see Haario et al. 1999, Adaptive proposal distribution for random walk Metropolis algorithm).

Line 138: Strictly speaking, there are many problems for which RWM is adequate. They tend to involve a posterior dimensionality of 10 or so. In fact, I am not convinced the problem case shown calls for something so complicated. The statement that “it always requires an excessive number…” may be a bit of an overreach in this regard. Perhaps just making a statement akin to lines 244-245 is adequate, as this is an accurate characterization of the fundamental flaw with RWM.

Line 225: It’s unclear what the phrase “…as a function of mean fields U of the prior ensemble…” means. Further clarification would be helpful.

Line 380: I would recommend making an attempt at the proof. We have an abundance of results from the 2003 PNAS ABC paper as well as the original DREAM paper which will likely contain the necessary elements (if they exist).

Line 585: “Deteriorate” appears to be unnecessary here.

Citation: https://doi.org/10.5194/hess-2022-414-RC2
- AC2: 'Reply on RC2', Song Liu, 06 Apr 2023
  
  To set the context for my review, I believe that the research hydrology community might benefit by not dwelling on incremental updates to methods for fitting lumped models with small parameter sets. In light of this statement, it is not impossible to do good work in this field, but such work needs to meet a high standard for exposition, code sharing/reproducibility, and completeness of analysis. The writing and level of detail in the introduction provides a good overview of where and how ABC fits into computational hydrology and serves as a nice primer on the topic. However, I would ask the authors to do more for reproducibility by putting their code and data on a public repository. Most of the rest of my comments focus on the motivation and analysis. Given that the coauthors were involved in an adjacent study using the same model and the same dataset, it is even more important that these data and methods are made available.
  
  Reply: We appreciate the referee’s approval and encouragement regarding the writing and level of detail in the introduction. We also agree with the referee’s concern about the reproducibility of the current experiment. Although we focused only on a typical hydrological modelling experiment for validating algorithmic enhancements, we will make the relevant code and data available in a public repository (e.g., GitHub) in the near future. Interested readers can also email us directly if necessary.
  
  I am not convinced that approximate Bayesian computation is properly motivated by the rationale provided in this article and related articles on the usage of ABC for hydrology models of modest complexity. The key reason for its popularity in other disciplines, such as phylogenetics, is its applicability when even computing the likelihood itself, or an approximation of it, is infeasible. Its connection to hydrology is of dubious origin; ABC appears to have been of interest to hydrologists because of its close connections to GLUE, an approach to model calibration and uncertainty quantification which lacked the kinds of guarantees for empirical risk minimization (i.e. expected loss taken with regard to an adequate measure) which are the centerpiece of most modern statistical approaches.
  
  Another conceptual issue which I would like to raise with this article is that this article is fundamentally about comparing an accept/reject scheme versus one which uses differential evolution. Recall that the novelty of the initial research about the DREAM algorithm was that it took a popular heuristic for global optimization, differential evolution, and made it into a proper MCMC algorithm satisfying detailed balance and reversibility. Thus, it seems that a DREAM-inspired ABC algorithm is just bare differential evolution itself, a topic that has been visited before in the past. For an example, see Zhang et al. 2008, “Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model”. In this regard, stripping away the MCMC machinery just leaves us with the original DE heuristic.
  
  Reply: ABC was originally proposed to handle Bayesian inference problems in which the (formal) likelihood is infeasible or computationally expensive to evaluate. In this sense, ABC is more closely related to GLUE: GLUE uses informal likelihood measures, whereas ABC replaces the computation of the likelihood by the use of a few summary statistics. Given sufficient summary statistics, ABC yields an approximation of the true posterior that becomes exact as the deviation between the measured and modelled summary statistics approaches zero. This is the core rationale of ABC.
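The rationale above (a sufficient statistic plus a vanishing tolerance) can be checked on a toy problem where the exact posterior is known. The sketch below is plain ABC rejection, not SABC: for data from N(μ, 1) the sample mean is sufficient for μ, so with a flat prior on (-5, 5) the accepted draws should approach approximately N(ȳ, 1/n). The function name, prior bounds and tolerance are hypothetical choices for illustration.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def rejection_abc(y_obs: Sequence[float], n_draws: int, eps: float,
                  prior: Tuple[float, float] = (-5.0, 5.0),
                  rng: Optional[random.Random] = None) -> List[float]:
    """ABC rejection for the mean mu of N(mu, 1) data under a uniform prior.

    The sample mean is a sufficient statistic for mu here, so as eps -> 0 the
    accepted draws approach the exact posterior, approximately N(mean(y_obs), 1/n).
    """
    rng = rng or random.Random()
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(*prior)                          # draw from the prior
        y_sim = [rng.gauss(mu, 1.0) for _ in range(n)]    # simulate a data set
        if abs(statistics.fmean(y_sim) - s_obs) <= eps:   # compare summaries
            accepted.append(mu)
    return accepted


rng = random.Random(42)
y_obs = [rng.gauss(1.5, 1.0) for _ in range(10)]
post = rejection_abc(y_obs, n_draws=60_000, eps=0.05, rng=rng)
# Exact posterior: mean close to mean(y_obs), standard deviation 1/sqrt(10).
print(len(post), statistics.fmean(post), statistics.stdev(post))
```

With an insufficient statistic (say, only the sample maximum) the same procedure would converge to a different, wider distribution as eps shrinks, which is the sufficiency concern discussed elsewhere in this exchange.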
  With regard to the current article, we suggested an improvement of SABC by replacing the RWM update with a differential evolution-type update. Differential evolution-type techniques are not new, and a variety of studies have analysed and discussed this topic. We fundamentally compared an accept/reject scheme with one that uses differential evolution, but within an ABC framework. The use of differential evolution in SABC helps to extend the original SABC algorithm to high-dimensional posterior exploration. The innovation of this article lies not in the enhancement of the MCMC sampling algorithm itself, but in its practical value for improving current ABC approaches.
  
  With regard to the paper’s analyses, the comparison of sampling effectiveness is lacking on multiple axes. The authors’ lack of precision in terminology makes it very difficult to understand the sampling efficiency for each algorithm. Under a Markov chain-based sampling scheme, there will typically be some autocorrelation between samples but it is not clear if they are reporting the number of equivalent independent samples (obtained by adjusting the samples drawn by some index of autocorrelation) or simple draws. The discussion of acceptance rate is informative, but not a substitute for a measure of effective sample size. The comparison study is also deficient in the sense that it’s not clear at what point SABC converged to its final distribution; if it turned out to be earlier than the number of iterations reported, then the fact that mSABC reached a similar point with 30% fewer forward evaluations might just be an artifact of the analysis design.
  
  Reply: We regret the lack of precision in terminology and the resulting difficulty in understanding the sampling efficiency of each algorithm. Although both algorithms apply a Markov chain-based sampling scheme, as stated in Lines 285-289, the starting point of a chain is drawn from the prior ensemble each time a proposal is generated. The autocorrelation analysis typically used for classical Markov chain-based sampling schemes is therefore not suitable in the context of the present study. Secondly, with regard to the convergence of SABC, we adopted the same SABC parameter settings as Fenicia et al. (2018), which were shown in that study to achieve satisfactory convergence to the approximate posterior in validation with synthetic datasets and real hydrological data. In the absence of the true posteriors, we treated the posterior derived by the original SABC as a benchmark. We therefore hope that the effect of an inappropriate number of iterations on the fairness of the comparison is largely mitigated.
  
  Reference: Fenicia, F., Kavetski, D., Reichert, P., and Albert, C.: Signature‐domain calibration of hydrological models using Approximate Bayesian Computation: Empirical analysis of fundamental properties, Water Resour. Res., 54, 3958-3987, https://doi.org/10.1002/2017WR021616, 2018.
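
For reference, the effective sample size mentioned by the referee is commonly estimated from the autocorrelation of a single sequential chain as N/(1 + 2Σρ_k). The Python sketch below truncates the sum at the first non-positive sample autocorrelation, a deliberately simple rule chosen for brevity (more robust estimators exist). Whether this measure applies to the SABC/mSABC samples, whose chains restart from the prior ensemble as described in the reply, is the point under discussion and is not settled by this example.

```python
import numpy as np


def effective_sample_size(x):
    """Autocorrelation-based ESS of a single scalar chain.

    ESS = N / (1 + 2 * sum_k rho_k), where the sum is truncated at the
    first lag whose sample autocorrelation is not positive (a simple rule).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = np.dot(xc, xc) / n
    if var == 0.0:
        raise ValueError("constant chain: ESS is undefined")
    rho_sum = 0.0
    for k in range(1, n):
        rho = np.dot(xc[:-k], xc[k:]) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Compare independent draws with a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(1)
n, phi = 10_000, 0.9
iid = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = rng.normal() / np.sqrt(1.0 - phi**2)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + rng.normal()
print(effective_sample_size(iid), effective_sample_size(ar1))
```

For an AR(1) chain with coefficient 0.9 the theoretical value is N(1 - 0.9)/(1 + 0.9), about 0.053N, so 10,000 correlated draws carry roughly as much information as 500 independent ones.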
  
  To remedy these issues, I recommend that the authors also work with synthetic streamflow data so that they might have access to the ground truth parameter values. I also recommend that they run both the SABC algorithm for a much longer timespan than done previously to be reasonably sure that convergence is achieved. If the mSABC and SABC samples do not converge completely when using the same distance function and same priors, then there are deeper issues which must be addressed. The use of further diagnostics like the Gelman-Rubin statistic with multiple chains would also be necessary to ensure that the chains agree and appear to have reached convergence. I also recommend that whenever the authors speak of faster or more efficient progress, they provide a precise definition of exactly what they mean in this regard. I suspect they typically use the number of forward evaluations in this capacity; this number is perfectly suitable for that purpose.
  
  Reply: We appreciate the referee’s valuable suggestions for further improving the current article, including the experiment with synthetic datasets, the validation of the convergence of SABC, and a clear and precise presentation of the sampling efficiency. We will carry these out in the revised manuscript.
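
The Gelman-Rubin diagnostic recommended by the referee compares the within-chain and between-chain variances of several independently started chains; values of R-hat close to 1 (thresholds such as 1.1 or 1.2 are common) indicate that the chains agree. A minimal Python sketch of the original, non-split statistic for one scalar parameter, assuming equal-length chains with burn-in already removed; the two test cases are synthetic.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    `chains` has shape (m, n): m independently started chains, each with n
    draws kept after burn-in. This is the original, non-split version.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * w + b / n               # pooled variance estimate
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(2)
agreeing = rng.normal(size=(4, 2000))                              # all chains sample the same target
disagreeing = agreeing + np.array([0.0, 0.0, 3.0, 3.0])[:, None]   # two chains stuck elsewhere
print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```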
  
  Minor comments:
  
  Line 95: The article “an” is missing from before “excessive”
  Reply: Thank you for pointing out the mistake in our manuscript. This will be corrected in the revised manuscript.
  
  Line 130: This description is actually for the adaptive proposal version of RWM (see Haario et al. 1999, Adaptive proposal distribution for random walk Metropolis algorithm).
  Reply: Thank you for your detailed suggestion. The current manuscript does not precisely distinguish between the two versions of RWM, with and without the adaptive proposal. To make the statement more accurate, we will restrict it to the adaptive proposal version of RWM, as the referee suggested.
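
To make the distinction concrete, the following Python sketch shows a random walk Metropolis sampler whose Gaussian proposal covariance is periodically re-estimated from a sliding window of recent states and scaled by 2.4²/d, in the spirit of the adaptive proposal of Haario et al. (1999). The window length, update interval, initial covariance and target are illustrative assumptions rather than values from the manuscript or from Haario et al.; note also that adapting the proposal from past states means the resulting chain is not a standard time-homogeneous Markov chain.

```python
import numpy as np


def adaptive_proposal_rwm(log_target, x0, n_iter, rng, window=200,
                          update_every=100, init_var=0.1, eps=1e-8):
    """Random walk Metropolis with an adaptive proposal covariance.

    The Gaussian proposal covariance is re-estimated every `update_every`
    steps from the last `window` states and scaled by 2.4**2 / d.
    """
    x = np.array(x0, dtype=float)
    d = x.size
    scale = 2.4**2 / d
    cov = init_var * np.eye(d)          # initial proposal covariance (assumed)
    lp = log_target(x)
    chain = np.empty((n_iter, d))
    n_accepted = 0
    for t in range(n_iter):
        if t >= window and t % update_every == 0:
            recent = chain[t - window:t]
            # eps * I keeps the covariance positive definite
            cov = scale * np.atleast_2d(np.cov(recent, rowvar=False)) + eps * np.eye(d)
        proposal = rng.multivariate_normal(x, cov)
        lp_prop = log_target(proposal)
        # Metropolis acceptance for a symmetric proposal
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accepted += 1
        chain[t] = x
    return chain, n_accepted / n_iter


# Correlated 2-D Gaussian target (assumed for illustration).
precision = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
chain, acc_rate = adaptive_proposal_rwm(
    lambda z: -0.5 * z @ precision @ z, np.zeros(2), 5000, np.random.default_rng(3))
print(chain[1000:].mean(axis=0), acc_rate)
```

Keeping `cov` fixed at its initial value (no window updates) turns this into plain RWM, the non-adaptive variant the referee distinguishes from the adaptive one.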
  
  Line 138: Strictly speaking, there are many problems for which RWM is adequate. They tend to involve a posterior dimensionality of 10 or so. In fact, I am not convinced the problem case shown calls for something more complicated. The statement that “it always requires an excessive number…” may be a bit of an overreach in this regard. Perhaps just making a statement akin to lines 244-245 is adequate, as this is an accurate characterization of the fundamental flaw with RWM.
  Reply: Thank you for your good suggestion. We’d like to remove the corresponding statement and restrict it to the fundamental limitation of RWM to make it more accurate.
  
  Line 225: It’s unclear what the phrase “…as a function of mean fields U of the prior ensemble…” means. Further clarification would be helpful.
  Reply: Thank you for your good suggestion. We agree that further clarification is necessary. Following the original paper of Albert et al. (2015), in which the SABC algorithm was first proposed, the phrase “…as a function of mean fields U of the prior ensemble…” means “…as a function of the average of the redefined ABC distance in the prior ensemble”. The exact formulation of the function is given in Eq. (32) of Albert et al. (2015).
  
  Reference: Albert, C., Künsch, H. R., and Scheidegger, A.: A simulated annealing approach to approximate Bayes computations, Stat. Comput., 25, 1217-1232, https://doi.org/10.1007/s11222-014-9507-8, 2015.
  
  Line 380: I would recommend making an attempt at the proof. We have an abundance of results from the 2003 PNAS ABC paper as well as the original DREAM paper which will likely contain the necessary elements (if they exist).
  Reply: Thank you for your good suggestion. A formal proof of convergence of the DREAM algorithm (i.e., the DREAM-Core sampling in our study) is available in Vrugt et al. (2009). However, a formal proof of convergence of mSABC using the DREAM-Core sampling is difficult to establish, which makes numerical validation of convergence necessary.
  
  Reference: Vrugt, J. A., ter Braak, C. J. F., Diks, C. G. H., Higdon, D., Robinson, B. A., and Hyman, J. M.: Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, Int. J. Nonlin. Sci. Num., 10, 273-290, 2009.
  
  Line 585: “Deteriorate” appears to be unnecessary here.
  Reply: Thank you for pointing out the mistake in our manuscript. This will be corrected in the revised manuscript.
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC2
RC3:
'Comment on hess-2022-414', Anonymous Referee #3, 04 Apr 2023
General comments
This paper developed a modified Simulated Annealing Approximation Bayesian Computation (mSABC) method for model parameter inference. The mSABC method is tested by a real case study that applies the SAC-SMA model to the Danjiangkou reservoir region. The paper fits the scope of HESS. However, the innovation of this research is insufficient to me; the results did not prove that the proposed method is better than the SABC method; and the presentation of this paper needs to be improved. Please see below for my detailed comments. I suggest rejecting and encouraging resubmission.
Major comments to the authors
Innovation limit. The authors improve the Simulated Annealing ABC (SABC) method by replacing its original random walk Metropolis (RWM) sampling with the adaptive MCMC sampling. The adaptive MCMC sampling is from the existing DREAM algorithm (Vrugt et al., 2019). This is not a substantial contribution to scientific progress.

A more critical issue is that the advantages of mSABC over SABC in parameter determination have not been well proven. In Figure 6, mSABC does not bring improved streamflow simulation results compared with SABC. From the probabilistic perspective, mSABC underestimates the peak flow in the high-flow period, though it produces a narrower uncertainty band. From the deterministic perspective, mSABC gets worse RMSE and CC metric results than SABC. The authors attribute the underestimation to two reasons: inaccurate observations and the choice of hydrologic signatures. However, I think this can also be attributed to the bad posterior parameter distribution.

Moreover, Figure 4 cannot prove that mSABC generates better posterior parameter estimates than SABC. Also, I don’t understand why the authors say SABC fails to correctly infer the target distribution (line 393). I suggest that the authors use a synthetic study in which the true parameter inference results are known; this will help tell whether mSABC is better than SABC.

Last, I don’t understand how the conclusion of lines 546-548 is derived from Figure 5. I honestly don’t think Figure 5 supports this conclusion.

Experiment design flaw. First, on Line 352, the authors compute the distance metric as the average of eight metrics. By checking Appendix A, it looks like the eight statistic metrics do not share the same unit. If so, it does not make sense to compute their average as the distance metric.

Second, on line 382, the authors state that the SABC method is already proven to converge to the correct target distribution in previous applications by Fenicia et al (2018). I recommend that the authors clarify whether Fenicia et al (2018) used the same case study and inference configurations as yours. If not, there is no evidence to tell that SABC has converged to the correct target distribution in your case study.

The presentation of this paper has not met the publication standard of HESS and needs to be improved. Some sentences are hard to understand or have grammar mistakes (e.g., line 428). Some terms are not well established, for example, “simplistic”, “proposal”, and “concept” (line 169). Please check the appropriateness of them, check the literature, and use the correct terminology.

Last, the authors used too many abbreviations; readers may get lost and cannot recall all their meanings before finishing the article. Can the authors please reduce some of them, especially if the abbreviations are not used often? For example, GLUE, DE-MC, DRR, PMC.

Minor comments to the authors
Line 106. Is “ABC-REJ” a typo here? If not, please provide the full name of “ABC-REJ” when it appears for the first time.

Line 151. Is “d” needed?

Lines 214-220. Are these sentences model/system dependent? Should they be included in the Method section?

Line 263, “utilized” in the past tense?

Lines 272-274, please rephrase the meaning of “c*”.
Citation: https://doi.org/10.5194/hess-2022-414-RC3
- AC3:
  'Reply on RC3', Song Liu, 06 Apr 2023
  General comments
  
  This paper developed a modified Simulated Annealing Approximation Bayesian Computation (mSABC) method for model parameter inference. The mSABC method is tested by a real case study that applies the SAC-SMA model to the Danjiangkou reservoir region. The paper fits the scope of HESS. However, the innovation of this research is insufficient to me; the results did not prove that the proposed method is better than the SABC method; and the presentation of this paper needs to be improved. Please see below for my detailed comments. I suggest rejecting and encouraging resubmission.
  
  Reply: Thank you for the constructive suggestions and comments on further improving the current work. We address the referee’s comments and questions in detail below.
  
  Major comments to the authors
  
  Innovation limit. The authors improve the Simulated Annealing ABC (SABC) method by replacing its original random walk Metropolis (RWM) sampling with the adaptive MCMC sampling. The adaptive MCMC sampling is from the existing DREAM algorithm (Vrugt et al., 2019). This is not a substantial contribution to scientific progress.
  
  Reply to Comment 1: We acknowledge that the innovation is limited as far as the algorithmic enhancement of adaptive MCMC sampling is concerned: the sampling scheme is not new and has been intensively discussed in past literature. However, we believe that the novelty of this article lies more in its practical value in addressing the dimensionality issue of the original SABC algorithm and improving its efficiency in complex posterior exploration. By incorporating adaptive MCMC sampling into SABC, both the sampling efficiency and the model performance are significantly improved in high-dimensional posterior exploration. This considerably improves the performance of the original SABC algorithm and extends it to a broader range of applications in hydrological modelling practice.

  A more critical issue is that the advantages of mSABC over SABC in parameter determination have not been well proven. In Figure 6, mSABC does not bring improved streamflow simulation results compared with SABC. From the probabilistic perspective, mSABC underestimates the peak flow in the high-flow period, though it produces a narrower uncertainty band. From the deterministic perspective, mSABC gets worse RMSE and CC metric results than SABC. The authors attribute the underestimation to two reasons: inaccurate observations and the choice of hydrologic signatures. However, I think this can also be attributed to the bad posterior parameter distribution.
  
  Moreover, Figure 4 cannot prove that mSABC generates better posterior parameter estimates than SABC. Also, I don’t understand why the authors say SABC fails to correctly infer the target distribution (line 393). I suggest that the authors use a synthetic study in which the true parameter inference results are known; this will help tell whether mSABC is better than SABC.
  
  Last, I don’t understand how the conclusion of lines 546-548 is derived from Figure 5. I honestly don’t think Figure 5 supports this conclusion.
  
  Reply to Comments 2-4: Firstly, in the absence of a single comprehensive measure of model predictions, we must use a variety of mutually competing metrics for an overall evaluation of the predictions. The conflict between a narrower uncertainty band and the underestimation of high flows caused by a lower coverage ratio is not rare in the literature (e.g., Xiong et al., 2009). One solution to this issue is to introduce additional evaluation criteria, e.g., PQQ plots and evaluation of the middle points of ensemble predictions. We also note that the slightly poorer RMSE and CC are offset by the considerably better RB and RD. A compromise among these competing evaluation measures is therefore necessary.
  Secondly, mSABC generates better posterior parameter estimates than SABC, in that mSABC produces sharper marginal distributions, especially for parameters ADIMP, LZPK and σ. This suggests that mSABC has lower parameter uncertainty than SABC. Regarding line 393, this can be demonstrated by the final boxplots of posterior parameter samples: the boxplots of SABC exhibit evident differences from those of mSABC, especially for parameters ADIMP, LZPK and σ, which is consistent with Figure 4. In other words, SABC fails to converge to the approximate posterior derived by mSABC: the boxplots of SABC after 2 million iterations are similar to those of mSABC after only 200k iterations and remain far from the final, stable boxplots of mSABC. At the same time, we agree that a synthetic experiment with known posteriors might be more suitable for comparing these two algorithms.
  Finally, the conclusion of lines 546-548 is supported by the performance of the high-flow-related signatures (e.g., Q₁₀ and HPC) shown in Figure 5. Compared with the posterior distributions of the relative errors of Q₁₀ and HPC obtained by SABC, the proportion of samples that overestimate the measured signature values (i.e., with negative error values) is largely reduced for mSABC. This implies that mSABC achieved better performance for high-flow events than SABC.

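
The trade-off described in this reply between band width and coverage can be quantified with band-based indices. The Python sketch below assumes that RB and RD denote the average relative bandwidth and the average relative deviation amplitude of Xiong et al. (2009), and that the coverage ratio is their containing ratio (CR); if the manuscript defines these indices differently, the formulas must be adapted. The 5-95% percentile band and the tiny ensemble are illustrative.

```python
import numpy as np


def band_indices(ensemble, q_obs, lower_pct=5.0, upper_pct=95.0):
    """Containing ratio (CR), average relative bandwidth (RB) and average
    relative deviation amplitude (RD) of a predictive band.

    `ensemble` has shape (n_samples, n_times); `q_obs` has shape (n_times,)
    and must be strictly positive because RB and RD divide by it.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    lower = np.percentile(ensemble, lower_pct, axis=0)
    upper = np.percentile(ensemble, upper_pct, axis=0)
    inside = (q_obs >= lower) & (q_obs <= upper)
    return {
        # share of observations that fall inside the band
        "CR": float(inside.mean()),
        # band width relative to the observation
        "RB": float(np.mean((upper - lower) / q_obs)),
        # distance of the band centre from the observation, relative to it
        "RD": float(np.mean(np.abs(0.5 * (upper + lower) - q_obs) / q_obs)),
    }


# Two-member toy ensemble over three time steps; with the 0th/100th
# percentiles the band is simply the ensemble min/max.
ensemble = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(band_indices(ensemble, [2.0, 5.0, 4.0], lower_pct=0.0, upper_pct=100.0))
```

Narrowing the band lowers RB but can also lower CR when observations, for example high-flow peaks, fall outside it, which is the conflict the reply describes.
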
  Experiment design flaw. First, on Line 352, the authors compute the distance metric as the average of eight metrics. By checking Appendix A, it looks like the eight statistic metrics do not share the same unit. If so, it does not make sense to compute their average as the distance metric.
  
  Second, on line 382, the authors state that the SABC method is already proven to converge to the correct target distribution in previous applications by Fenicia et al (2018). I recommend that the authors clarify whether Fenicia et al (2018) used the same case study and inference configurations as yours. If not, there is no evidence to tell that SABC has converged to the correct target distribution in your case study.
  
  Reply to Comments 5-6: Firstly, as formulated in Equation (9), the distance metric is computed as the average of the absolute relative errors of all eight signatures, not as the average of the eight signatures themselves. Because relative errors are dimensionless, the differing units of the signatures do not affect this average. This is also common practice in the literature (e.g., Fenicia et al., 2018).
  Secondly, we understand the referee’s concern regarding the convergence of SABC. We used the same inference configurations as Fenicia et al. (2018), but with a different real-data experiment. In fact, the code of the original SABC algorithm was obtained directly from the authors of Fenicia et al. (2018).

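
A minimal Python sketch of the distance described in this reply, the mean absolute relative error over the signatures. It assumes relative errors are defined as (observed - simulated)/observed, so that negative values indicate overestimation as in the reply to Comments 2-4; the signature values are hypothetical, and the exact form of Equation (9), for instance any weighting, is not reproduced here.

```python
import numpy as np


def relative_errors(s_obs, s_sim):
    """Relative error of each signature, (observed - simulated) / observed.

    With this sign convention a negative value means the model
    overestimates the measured signature.
    """
    s_obs = np.asarray(s_obs, dtype=float)
    return (s_obs - np.asarray(s_sim, dtype=float)) / s_obs


def signature_distance(s_obs, s_sim):
    """Mean absolute relative error over all signatures (dimensionless)."""
    return float(np.mean(np.abs(relative_errors(s_obs, s_sim))))


# Three hypothetical signatures expressed in different units.
d1 = signature_distance([10.0, 0.5, 200.0], [12.0, 0.5, 150.0])
# Rescaling the third signature (e.g. a unit change) leaves the distance unchanged.
d2 = signature_distance([10.0, 0.5, 200_000.0], [12.0, 0.5, 150_000.0])
print(d1, d2)
```

Because each term is a ratio of two quantities with the same unit, rescaling any one signature leaves the distance unchanged, which is why averaging signatures with different units is not a problem for this metric (provided no observed signature is zero).
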
  The presentation of this paper has not met the publication standard of HESS and needs to be improved. Some sentences are hard to understand or have grammar mistakes (e.g., line 428). Some terms are not well established, for example, “simplistic”, “proposal”, and “concept” (line 169). Please check the appropriateness of them, check the literature, and use the correct terminology.
  
  Last, the authors used too many abbreviations; readers may get lost and cannot recall all their meanings before finishing the article. Can the authors please reduce some of them, especially if the abbreviations are not used often? For example, GLUE, DE-MC, DRR, PMC.
  
  Reply to Comments 7-8: We regret the imperfect presentation of this paper. We will further improve the writing by simplifying sentences, using correct terminology and reducing rarely used abbreviations.
  
  Minor comments to the authors
  
  Line 106. Is “ABC-REJ” a typo here? If not, please provide the full name of “ABC-REJ” when it appears for the first time.
  
  Reply: Thank you for your suggestion. The full name of ABC-REJ will be added when it appears for the first time.
  
  Line 151. Is “d” needed?
  
  Reply: Thank you for pointing out the mistake in our manuscript. The “d” will be removed in the revised manuscript.
  
  Lines 214-220. Are these sentences model/system dependent? Should they be included in the Method section?
  
  Reply: Thank you for your concern. We believe that a brief introduction to the rationale of the original SABC algorithm is necessary in the present article. It helps interested readers acquire basic knowledge of SABC and explains the meaning of the tolerance values. We would therefore like to keep these sentences in the revised manuscript.
  
  Line 263, “utilized” in the past tense?
  
  Reply: Thank you for pointing out the mistake in our manuscript. “utilizes” in the present tense should be used instead.
  
  Lines 272-274, please rephrase the meaning of “c*”.
  
  Reply: Thank you for your detailed suggestion. We keep the description of the meaning of “c*” consistent with the original paper of Vrugt (2016).
  
  Reference: Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environ. Modell. Softw., 75, 273-316, https://doi.org/10.1016/j.envsoft.2015.08.013, 2016.
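
To make the meaning of c* more concrete, the following Python sketch generates one differential-evolution proposal with randomized subspace sampling, in the spirit of the DREAM proposal described by Vrugt (2016). In our reading, c sets the half-width of a uniform multiplicative perturbation of the jump, while c* sets the scale of a tiny additive Gaussian term commonly motivated as ensuring ergodicity. The function name `dream_style_proposal` is hypothetical, and the default values (c = 0.1, c* = 1e-12), the use of c* as a standard deviation, and the 20% chance of a unit jump rate are illustrative assumptions; only the proposal step is shown, not the mSABC acceptance rule.

```python
import numpy as np


def dream_style_proposal(X, i, rng, delta=1, cr=0.5, c=0.1, c_star=1e-12,
                         p_unit_jump=0.2):
    """One differential-evolution proposal with randomized subspace sampling.

    X is the current population (n_chains x d); chain i is perturbed by the
    sum of `delta` difference vectors of other, randomly chosen chains.
    """
    n_chains, d = X.shape
    others = [k for k in range(n_chains) if k != i]
    picks = rng.choice(others, size=2 * delta, replace=False)
    a, b = picks[:delta], picks[delta:]
    # Randomized subspace: each dimension is updated with probability cr,
    # and at least one dimension is always updated.
    mask = rng.uniform(size=d) < cr
    if not mask.any():
        mask[rng.integers(d)] = True
    d_star = int(mask.sum())
    gamma = 2.38 / np.sqrt(2 * delta * d_star)     # jump rate for d_star dimensions
    if rng.uniform() < p_unit_jump:
        gamma = 1.0                                # occasional unit jump rate (assumed schedule)
    diff = (X[a] - X[b]).sum(axis=0)[mask]
    lam = rng.uniform(-c, c, size=d_star)          # multiplicative perturbation set by c
    zeta = rng.normal(0.0, c_star, size=d_star)    # tiny additive jitter; c* used as a std. dev. (assumption)
    proposal = X[i].copy()
    proposal[mask] += zeta + (1.0 + lam) * gamma * diff
    return proposal


rng = np.random.default_rng(4)
population = rng.normal(size=(10, 5))                        # 10 chains, 5 parameters
one_dim = dream_style_proposal(population, 0, rng, cr=0.0)   # forces d* = 1
all_dims = dream_style_proposal(population, 0, rng, cr=1.0)  # updates every dimension
print(np.sum(one_dim != population[0]), np.sum(all_dims != population[0]))
```

With `cr = 0` exactly one parameter is updated (d* = 1), whereas with `cr = 1` all parameters move; the jump rate 2.38/sqrt(2δd*) shrinks accordingly as more dimensions are updated.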
  
  Citation: https://doi.org/10.5194/hess-2022-414-AC3


Viewed

Total article views: 1,037 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
684	306	47	1,037	48	65


Views and downloads (calculated since 31 Jan 2023)

Month	HTML	PDF	XML	Total
Jan 2023	46	7	1	54
Feb 2023	140	31	3	174
Mar 2023	36	16	0	52
Apr 2023	86	35	11	132
May 2023	9	9	0	18
Jun 2023	9	3	0	12
Jul 2023	24	17	2	43
Aug 2023	15	11	1	27
Sep 2023	18	24	1	43
Oct 2023	8	12	1	21
Nov 2023	8	5	1	14
Dec 2023	15	16	1	32
Jan 2024	11	7	1	19
Feb 2024	9	7	1	17
Mar 2024	10	6	1	17
Apr 2024	14	6	7	27
May 2024	19	9	3	31
Jun 2024	34	6	2	42
Jul 2024	13	2	1	16
Aug 2024	15	2	0	17
Sep 2024	13	1	0	14
Oct 2024	9	5	0	14
Nov 2024	8	6	1	15
Dec 2024	3	3	0	6
Jan 2025	9	6	0	15
Feb 2025	20	4	0	24
Mar 2025	22	3	1	26
Apr 2025	10	9	5	24
May 2025	20	13	1	34
Jun 2025	25	20	1	46
Jul 2025	6	5	0	11


Viewed (geographical distribution)

Total article views: 1,002 (including HTML, PDF, and XML), of which 1,002 have a defined geographic origin and 0 an unknown origin.

Cited

2 citations as recorded by crossref (latest update: 10 Jul 2025).

Short summary

Quantifying the uncertainty in streamflow predictions is a major challenge, with research and operational significance. This study advances the field of catchment-scale hydrological modelling by developing an improved uncertainty analysis technique that provides more reliable and accurate probabilistic streamflow predictions. This finding provides hydrologists with robust modelling tools for handling hydrological modelling uncertainties in engineering practices.