This work is distributed under the Creative Commons Attribution 4.0 License.
Using simulation-based inference to determine the parameters of an integrated hydrologic model: a case study from the upper Colorado River basin
Abstract. High-resolution, spatially distributed, process-based models are a well-established tool to explore complex watershed processes and how they may evolve under a changing climate. While these models are powerful, calibrating them can be difficult because they are costly to run and have many unknown parameters. To solve this problem, we need a state-of-the-art, data-driven approach to model calibration that can scale to the high-compute, high-dimensional hydrologic simulators that drive innovation in our field today. Simulation-Based Inference (SBI) uses deep learning methods to learn a probability distribution of simulation parameters by comparing simulator outputs to observed data. The inferred parameters can then be used to run calibrated model simulations. This approach has pushed boundaries in simulator-intensive research in cosmology, particle physics, and neuroscience, but is less familiar to hydrology. The goal of this paper is to introduce SBI to the field of watershed modeling by benchmarking and exploring its performance in a set of synthetic experiments. We use SBI to infer two common physical parameters of hydrologic process-based models, Manning's coefficient and hydraulic conductivity, in a snowmelt-dominated catchment in Colorado, USA. We employ a process-based simulator (ParFlow), streamflow observations, and several deep learning components to confront two recalcitrant issues related to calibrating watershed models: 1) the high cost of running enough simulations to do a calibration; and 2) finding 'correct' parameters when our understanding of the system is uncertain or incomplete. In a series of experiments, we demonstrate the power of SBI to conduct rapid and precise parameter inference for model calibration. The workflow we present is general-purpose, and we discuss how it can be adapted to other hydrology-related problems.
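For readers new to the approach, the loop the abstract describes (draw parameters from a prior, run simulations, train a neural density estimator on the parameter-output pairs, then sample a posterior conditioned on an observed series) can be sketched with the open-source `sbi` Python package. The stand-in simulator, parameter ranges, and sample sizes below are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative SBI workflow (not the paper's actual setup): infer two
# parameters from a simulated streamflow series using neural posterior
# estimation from the open-source `sbi` package.
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Prior over Manning's coefficient and hydraulic conductivity
# (hypothetical log10 ranges).
prior = BoxUniform(low=torch.tensor([-6.0, -2.0]),
                   high=torch.tensor([-4.0, 2.0]))

def simulator(theta):
    """Stand-in for ParFlow or its surrogate: parameters -> 365-day flow."""
    t = torch.linspace(0.0, 1.0, 365)
    signal = theta[:, 0:1] * torch.sin(torch.pi * t) + theta[:, 1:2] * t
    return signal + 0.01 * torch.randn(theta.shape[0], 365)

theta = prior.sample((5000,))               # parameter sets from the prior
x = simulator(theta)                        # one simulation per parameter set

inference = SNPE(prior=prior)               # neural posterior estimation
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

x_o = simulator(prior.sample((1,)))         # a synthetic 'observation'
samples = posterior.sample((1000,), x=x_o)  # calibrated parameter samples
```

Because the posterior network is amortized, new observations only require a fresh call to `posterior.sample`, not a new batch of simulations.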
Withdrawal notice
This preprint has been withdrawn.
Preprint (2079 KB)
Interactive discussion
Status: closed
RC1: 'Comment on hess-2022-345', Keith Beven, 07 Nov 2022
This is a highly sophisticated study, involving considerable work, that aims to introduce methods of simulation-based inference to hydrological models, methods that it seems have been used very successfully in other fields such as cosmology, particle physics and neuroscience. I have not taken the time to look at what has been done in those fields because it is clear from the current study that in hydrology no great advance has been made. The methodology effectively involves two important steps: 1. a dynamic emulator of a complex hydrological model (ParFlow) (here an LSTM), and 2. a method of identifying a conditional joint parameter distribution (here a form of neural network). In both cases the aim is to greatly increase the efficiency of model calibration, and in the latter case to avoid the explicit specification of a likelihood function (albeit using a prior assumption of what that distribution should look like – here a simple bivariate Gaussian – which thereby implicitly implies a form of likelihood function or measure, though this is not discussed).
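To make the reviewer's two steps concrete, here is a minimal sketch of the first: an LSTM surrogate trained to reproduce simulated streamflow given meteorological forcing and the candidate parameters. All dimensions and training tensors are hypothetical placeholders, not the architecture reported in the manuscript.

```python
# Minimal LSTM surrogate sketch (hypothetical dimensions, placeholder data):
# emulate a process-based model's streamflow given forcing and parameters.
import torch
import torch.nn as nn

class StreamflowEmulator(nn.Module):
    def __init__(self, n_forcing=5, n_params=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_forcing + n_params, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, forcing, params):
        # forcing: (batch, time, n_forcing); params: (batch, n_params).
        # Repeat the static parameters along the time axis.
        p = params.unsqueeze(1).expand(-1, forcing.shape[1], -1)
        out, _ = self.lstm(torch.cat([forcing, p], dim=-1))
        return self.head(out).squeeze(-1)          # (batch, time)

model = StreamflowEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
forcing = torch.randn(32, 365, 5)    # placeholder met forcing
params = torch.rand(32, 2)           # placeholder sampled parameters
target = torch.randn(32, 365)        # placeholder simulator streamflow
for _ in range(100):                 # fit the surrogate to stored runs
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(forcing, params), target)
    loss.backward()
    opt.step()
```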
There are alternatives to both steps. Those potential alternatives are not compared in terms of efficiency, but that is not the real problem with this study. The real problem is that it tells us nothing at all about simulating the Taylor River Basin in the upper Colorado Basin, because the study uses only simulated data. The title of the paper is therefore already misleading. Indeed, all the problems of structural error or mis-specification of the model (both in terms of the process representations and their application at a 1 km grid scale) and disinformation in the observations (especially since snowmelt is important in this region) are totally neglected. This is perhaps why there is no mention at all of any of the very many papers I have written about the limitations of physically-based models, the complex nature of response surfaces (which, however defined, are NOT multi-Gaussian with real data), disinformation in observations used for model calibration, and defining limits of acceptability.
While I certainly have no need of further citations, the more important point is that there is absolutely no point in publishing a paper that compares only model generated data with an emulator (particularly in just a 2-parameter space) without any resort to real observations. This is indicated by the threshold for the determinant of the posterior of 10⁻⁶. Look at any response surface plots for actual model applications, or dotty plots within GLUE applications, to see that this would be overwhelmed by model mis-specification and observation errors.
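For context on the 10⁻⁶ figure: the determinant of the posterior covariance is one way to quantify how tightly the inferred joint parameter distribution has concentrated. Assuming the threshold is applied to the sample covariance of posterior draws (the exact definition is not given in this discussion), the check would look like:

```python
# Sketch of a posterior-tightness criterion: the determinant of the sample
# covariance of posterior draws shrinks as the joint distribution
# concentrates. The 2-parameter draws here are synthetic stand-ins.
import numpy as np

draws = 0.03 * np.random.default_rng(0).normal(size=(1000, 2))
det = np.linalg.det(np.cov(draws, rowvar=False))
print(f"det(posterior covariance) = {det:.2e}")  # ~ (0.03**2)**2 ≈ 8e-7
print("tight enough:", det < 1e-6)
```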
This situation could, of course, be easily remedied by having a two part paper in which this first part is followed up by a second part that actually applies the method to the observational data. I suspect that this is already in preparation, but this part is not worth reviewing further without the 2nd part. I suggest this paper be rejected but that the authors be asked to resubmit in that 2-part form.
Keith Beven
Citation: https://doi.org/10.5194/hess-2022-345-RC1
AC1: 'Reply on RC1', Robert Hull, 22 Nov 2022
We respect and appreciate the referee's remarks. They are thoughtfully considered, evidence genuine interest in the manuscript, and will make this work better. We see three main critiques by Dr. Beven.

The first critique is that we don't compare the results of the method we are presenting – which we term surrogate-informed Simulation-Based Inference (SBI) – to existing alternatives. We would like to point out that we do discuss some of the new and long-standing approaches to efficient inference, model calibration, and uncertainty estimation in our field [lines 43-52], though of course not all of them; in particular, we regret omitting the 1992 GLUE paper. The brevity of our discussion is not intended as a dig against those alternatives; it is just that a detailed exposition of what constitutes robust, efficient parameter determination (not to mention the significance of the likelihood function) has been given elsewhere [Beven, 2016; Nearing et al., 2016].

Our work is unique in its use of specialized ML components – in particular, a conditional density estimation approach to inference [Bishop, 1994; Papamakarios and Murray, 2016] – and we think that this novelty warrants its own publication. Still, we understand Dr. Beven's point. We initially considered presenting the results of our shiny method alongside traditional (for hydrology) approaches used for parameter determination in the face of uncertainty (i.e., GLUE and Approximate Bayesian Computation (ABC)). In the end, we felt we couldn't do justice to such an analysis and fit it in this manuscript. We stand by this, even though it needs to be done somewhere.

The second critique is that the framing of our work is misleading. Specifically, the reviewer notes that it "tells us nothing at all about simulating the Taylor River Basin in the Upper Colorado Basin". We hope the referee trusts us that it was not our intent to mislead. The title, abstract, and introduction speak to the larger context in which this study was conducted. This context is reflected by a collection of studies (some written by members of this group of authors) that apply ML methodologies alongside process understanding for hydrologic prediction. Much of that work has centered on the headwaters of the Colorado River, an important fixture of water in the American West. We felt this context was relevant for human and hydrological reasons; in our view, the study of hydrology should never be separated from place, no matter how 'theoretical'. We discuss some potential changes to this framing in the concluding paragraph.

The third critique, and in Dr. Beven's view the fatal one, is that this study "uses only simulated data". We agree with Charles Peirce (a quoted authority in some of the referee's work) that the goal of scientific inquiry is truth, and that truth dwells in the realm of 'real' observable phenomena. But to say "there is absolutely no point in publishing a paper that compares only model generated data ... without any resort to real observations" seems to us a bit dogmatic. There is no shortage of studies that utilize mostly or only synthetic data to demonstrate proof-of-concept in hydrology and other simulator-intensive fields.

So we disagree that this 'flaw' is fatal, or even a flaw at all. Our purpose here is to rigorously present and evaluate a method for parameter inference given well-defined constraints. The challenge of this goal is real and relevant; in fact, this work seems to show an upper bound for the performance of SBI where undiagnosed structural error exists [lines 578-615]. Comparing to observations would instead shift our focus from the quality of a method to the quality of the underlying hydrologic model. Because we leave observations for later, we have a more generalizable, model-agnostic (i.e., not just about ParFlow) paper.

We applaud Dr. Beven for suggesting a follow-on paper with observations. Such an effort requires an expanded model and additional concerns about structural and observational errors, as has been noted. In the spirit of walking before you can run, we are happy to have laid the groundwork for an effort focusing on observations – though not necessarily to be done here or in a companion paper.

We propose the following changes to address Dr. Beven's concerns while remaining true to our intended purpose with this manuscript:
- Reframe the abstract and introduction to make it clearer that the system under study is synthetic, and that comparisons are not directly extendable to 'real' hydrologic systems; and make a related title change.
- Give a more detailed overview of the methods used for parameter determination in the face of uncertainty in hydrology, such as GLUE and ABC, in the background section; though again, there is not space to conduct a comparison in our study.
- More clearly state the sources of uncertainty in each experiment, and how they relate to the limitations of physically-based models, the complex nature of 'real' response surfaces, the influence of disinformation in observations, and the challenge of defining limits of acceptability.

References:
Bishop, C. M. (1994). Mixture density networks.
Papamakarios, G., & Murray, I. (2016). Fast ε-free inference of simulation models with Bayesian conditional density estimation. Advances in Neural Information Processing Systems, 29.
Beven, K. (2016). Facets of uncertainty: epistemic uncertainty, non-stationarity, likelihood, hypothesis testing, and communication. Hydrological Sciences Journal, 61(9), 1652-1665. DOI: 10.1080/02626667.2015.1031761
Nearing, G. S., Tian, Y., Gupta, H. V., Clark, M. P., Harrison, K. W., & Weijs, S. V. (2016). A philosophical basis for hydrological uncertainty. Hydrological Sciences Journal, 61(9), 1666-1678. DOI: 10.1080/02626667.2016.1183009

Citation: https://doi.org/10.5194/hess-2022-345-AC1
RC2: 'Comment on hess-2022-345', Keith Beven, 08 Nov 2022
P.S. As a further thought, I would consider it important that, in a second part of the paper, the application include parameters from the CLM (so as not to fix the water balance for ParFlow before running the procedure outlined in this paper) and also include multiple years, so that variations in the onset of snowmelt and the impacts of year-to-year variations in albedo on the simulations are allowed for. It is important to consider what realistic purpose the model would be used for in such a study (see our recent papers on model invalidation in Hydrological Processes).
Citation: https://doi.org/10.5194/hess-2022-345-RC2
RC3: 'Comment on hess-2022-345', Anonymous Referee #2, 22 Jan 2023
This work is an entirely synthetic study to show that what is called "Simulation-based Inference" (SBI) can retrieve synthetic parameters, even with some added noise. While there is some value in showing that a parameter inversion procedure works, it should only be a small part of a proof-of-concept paper. This paper has quite a few flaws and was not marketed accurately.
1. The purely synthetic nature of the study greatly reduced the value of the work. There is a chasm between observations and model space. When going from a synthetic dataset to a real dataset, you are faced with model mechanism errors and parameter compensations. This may be where machine-learning approaches could help, but this paper left these challenges completely untouched. It makes the readers wonder if this approach would ever succeed – I am not saying it could not, but you should demonstrate that you can address the major issues. History has shown that inversion problems can be ill-posed, and a workflow showing perfect performance for a synthetic case can fail spectacularly in real-world problems. It is most unsatisfying to see multiple unaddressed roadblocks in a "proof-of-concept" paper.
2. The abstract and introduction were written in such a way that readers may be led to think the model was calibrated against observed flows and that the advocated work represents a major breakthrough. However, the lack of true observations undermines many of the statements. Overall, much of the sales language throughout the paper should be significantly revised. To give some examples:
"calibrating them can be difficult" --> it is not calibration if no observations are used.
"confront two recalcitrant issues related to calibrating watershed models" --> both issues remain unaddressed with this paper: the surrogate model isn't perfect and we still don't know how to get *correct* parameters.
"While SBI for parameter determination has shown promise in particle physics","the applications in hydrology have been limited". --> it sounds like SBI is a major solution to our problems. However, many earlier Bayesian methods were proposed in a similar way as SBI. I wonder if it is really necessary to market SBI as a whole or more precisely emphasize what is novel about it. The main differences from previous methods seem to be (i) previous methods carry distributional assumption while here you have a neural network to generate samples. (however, in reality you still use a Gaussian mixture model in Eq. 4); (ii) you go directly from discharge to these parameter distributions. You can market the NN directly.3. The authors cited "equifinality". Can the method represent multimodal parameter distributions that can produce the same discharge output?
4. There is again a major gap between the original process-based model and the surrogate model. In this work, even the data for the "posterior predictive check" was generated by the surrogate rather than the original model. This leaves everything to the surrogate model. It is well known in surrogate-model research that surrogate models are never perfect, as the authors themselves later showed. However, the authors did not do anything to actually address the issue of a deteriorating surrogate model. The authors argued that we never know the true conductivity values, but at least they can show how the parameters behave in the original ParFlow model.
Overall, this paper did not comfort me with respect to the potential success of the SBI method going forward. For a proof-of-concept paper, I do not mind a simple case, but at least it needs to be demonstrated that the major roadblocks can be tackled. We do not want to lead the community down the wrong path! I would only consider this paper for publication in HESS when they have a real-world case.
Citation: https://doi.org/10.5194/hess-2022-345-RC3
AC2: 'Reply on RC3', Robert Hull, 08 Mar 2023
We thank the anonymous referee for the time and care taken to review this paper on simulation-based inference using a synthetic dataset from the Colorado River. As with the previous referee, many of the critiques are philosophical in nature, with little room for direct response. We therefore respond at a high level first and propose changes to the manuscript at the end.
Responding to the comment that "the purely synthetic nature of the study greatly reduced the value of the work" (point 1): this is of course a matter of opinion, and one that we thought carefully about when we designed our study. As SBI is a new tool in hydrology, we believe the synthetic scenarios explored in this manuscript add value by giving more pointed insight into how SBI works. The controlled experiments demonstrate the method's strengths and weaknesses in a setting where we can pinpoint the source of mismatches.
We agree that there is "a chasm" between observations and model space, in part from "model mechanism errors" (what we term mis-specification). However, the nature of this chasm is specific to the physical model being used and the domain it is applied to. If we were to apply the approach explored in this work to a non-synthetic dataset, the focus would be on the specific relationship between our modeling platform, our chosen parameter set, and observations. We agree that this is an important next step but would point out that addressing it is a challenge unto itself. We feel we have been up front about what our study is and what it isn't, and have dedicated most of the discussion section [section 5] to this issue.
We fundamentally disagree with the reviewer's argument that only real-world studies are worth publishing. We think the analysis we present here is critically important for pinpointing behaviors and understanding the strengths and weaknesses of the underlying method before we move to real-world case studies. This is a matter of opinion, but we hope the reviewer will agree to disagree on this point.
It is also not true that our manuscript leaves "the challenge [of mis-specification] untouched"; instead, we confront it with a controlled experiment [section 4.2] in which the mis-specification results in erroneous parameter estimates, and then field a possible solution to parameter inference in the face of mis-specification [section 4.3]. The idea of utilizing additional machine-learning approaches to "compensate for" the parameter inference bias, as the referee hints at, is interesting. We would love to hear a more detailed suggestion.
Responding to the comment that the paper was incorrectly "marketed" as "a major breakthrough" (point 2): our goal was to provide a measured first step in understanding how this approach could help with hydrologic problems, and to provide an accessible translation of a method that has been used in other fields. We have no intention of marketing it as a magic bullet.
To be honest, some of the specific comments leveled by the referee on this second point seem off base. For example, to say (in response to [line 19]) "it is not calibration if no observations are used" underplays the intimate relationship between parameter selection and model calibration. In our view, calibration is the process of selecting models that are ‘consistent’ (however defined) with data-generating processes, a topic explored thoroughly in this paper.
Still, the referee is correct that we do not have a 'solution' to the 'recalcitrant' issues [line 30] of calibrating inevitably mis-specified watershed models. A close reading of our paper shows we are very candid that our work is not a panacea [line 662]. Does acknowledging the biggest problems in hydrology and thinking critically about how to apply a method to them constitute misleading or irrelevant science? We don't think so. Solving an issue that has plagued the field for decades seems like a high bar for any paper to meet, especially without a solid foundation. We think it is worthwhile to explore new methods and to treat them in a rigorous way. We are not arguing that our paper has solved the problem; we are arguing that it is a valuable first step.
We do recognize that what the reviewers interpreted as "sales language" could be misleading and strike the wrong tone. We discuss a major revision to the framing in our prior response to Dr. Beven and at the end of this response, which we think will make us more transparent about what this work is and what it isn't.
Responding to the referee's concern that the method does not address the problem of equifinality (point 3): neural density estimators can in fact represent multimodal parameter distributions; see Figure 2 of Lueckmann et al. (2017), https://arxiv.org/pdf/1711.01861.pdf. We think this is a promising area and another reason why SBI should be explored more in our field.
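This claim is easy to check on a toy problem. Assuming a simulator x = θ² + noise (so that +θ and -θ are equifinal), a neural posterior estimator, sketched here with the open-source `sbi` package purely for illustration, should recover both modes:

```python
# Toy equifinality check (illustrative, not the manuscript's experiment):
# x = theta**2 makes +theta and -theta indistinguishable from the output,
# so the learned posterior at x_o = 1 should be bimodal at -1 and +1.
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

prior = BoxUniform(low=torch.tensor([-2.0]), high=torch.tensor([2.0]))

def simulator(theta):
    return theta**2 + 0.05 * torch.randn_like(theta)

theta = prior.sample((10000,))
x = simulator(theta)

inference = SNPE(prior=prior)
posterior = inference.build_posterior(
    inference.append_simulations(theta, x).train())

samples = posterior.sample((5000,), x=torch.tensor([[1.0]]))
print((samples > 0).float().mean())  # ~0.5 if both modes are captured
```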
Responding to the referee's concern that "the data for the posterior predictive check was generated by the surrogate rather than the original model" (point 4): this simply is not true. The primary experiment [section 4.2] uses synthetic observations derived from process-based model simulations outside the surrogate model's training set. The issue of a "deteriorating surrogate model" is indeed addressed using a multi-model strategy analogous to dropout [section 4.3], one of several strategies we could have implemented.
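For readers unfamiliar with the diagnostic, a posterior predictive check of the kind described here pushes posterior parameter draws back through the forward model and compares the resulting streamflow ensemble to the held-out synthetic observation. Everything below, including the `run_parflow` stand-in, is a hypothetical placeholder:

```python
# Generic posterior predictive check sketch (all names and data are
# hypothetical placeholders, not the manuscript's experiment).
import numpy as np

def run_parflow(params):
    """Stand-in for a ParFlow run returning a 365-day streamflow series."""
    manning, conductivity = params
    t = np.linspace(0.0, 1.0, 365)
    return manning * np.sin(np.pi * t) + conductivity * t

rng = np.random.default_rng(1)
posterior_draws = rng.normal([0.5, 1.0], 0.05, size=(200, 2))
ensemble = np.stack([run_parflow(p) for p in posterior_draws])  # (200, 365)

obs = run_parflow([0.5, 1.0])  # held-out synthetic 'observation'
lo, hi = np.percentile(ensemble, [5, 95], axis=0)
coverage = np.mean((obs >= lo) & (obs <= hi))
print(f"fraction of days inside the 90% predictive band: {coverage:.2f}")
```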
Based on the referee's remarks, we propose the following changes to the manuscript:
- Reframe the abstract and introduction to make it clearer that the system under study is synthetic, and that comparisons are not directly extendable to ‘real’ hydrologic systems; and a related title change.
- Remove any “sales language” and more precisely emphasize what is novel about our implementation of SBI, which includes:
- i) an 'emulator' used as a surrogate for a physics-based model to rapidly explore probable parameter distributions using a density-based neural network;
- ii) fewer assumptions about parameter distributions from the density-based neural network (to be clear, a Masked Autoencoder for Distribution Estimation, not a Gaussian mixture model as suggested by the referee) compared to other methods of inference;
- iii) the ability to "market" the density estimator directly, going from discharge to parameter distributions instead of needing to generate a new set of simulations every time observations become available.
- Add an example exploring the ability of this method to return multi-modal parameter distributions, thereby explicitly addressing the referee's concerns about equifinality (after Experiment 1).
- Add a demonstration of methods to 'compensate' for parameter bias (after Experiment 3), to help address the reviewer's concerns about 'major roadblocks' to implementing SBI.
Thank you for your time and consideration,
Robert 'Quinn' Hull
Citation: https://doi.org/10.5194/hess-2022-345-AC2
Model code and software
This is a repository for conducting simulation-based inference in the Taylor Basin. Robert Hull, https://github.com/rhull21/sbi_taylor
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 799 | 309 | 45 | 1,153 | 30 | 31 |