Using simulation-based inference to determine the parameters of an integrated hydrologic model: a case study from the upper Colorado River basin
- 1Hydrology and Atmospheric Sciences, University of Arizona, Tucson, AZ, USA
- 2High Meadows Environmental Institute, Princeton University, Princeton, NJ, USA
- 3Civil & Environmental Engineering, Princeton University, Princeton, NJ, USA
- 4Atmospheric Sciences & Global Change Division, Pacific Northwest National Laboratory, Richland, WA, USA
- 5Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, USA
- 6Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA
- 7Integrated Ground Water Modeling Center, Princeton University, Princeton, NJ, USA
Abstract. High-resolution, spatially distributed, process-based models are a well-established tool to explore complex watershed processes and how they may evolve under a changing climate. While these models are powerful, calibrating them can be difficult because they are costly to run and have many unknown parameters. To solve this problem, we need a state-of-the-art, data-driven approach to model calibration that can scale to the high-compute, high-dimensional hydrologic simulators that drive innovation in our field today. Simulation-Based Inference (SBI) uses deep learning methods to learn a probability distribution of simulation parameters by comparing simulator outputs to observed data. The inferred parameters can then be used to run calibrated model simulations. This approach has pushed boundaries in simulator-intensive research in cosmology, particle physics, and neuroscience, but is less familiar to hydrology. The goal of this paper is to introduce SBI to the field of watershed modeling by benchmarking and exploring its performance in a set of synthetic experiments. We use SBI to infer two common physical parameters of hydrologic process-based models, Manning's coefficient and hydraulic conductivity, in a snowmelt-dominated catchment in Colorado, USA. We employ a process-based simulator (ParFlow), streamflow observations, and several deep learning components to confront two recalcitrant issues in calibrating watershed models: 1) the high cost of running enough simulations to perform a calibration; and 2) finding 'correct' parameters when our understanding of the system is uncertain or incomplete. In a series of experiments, we demonstrate the power of SBI to conduct rapid and precise parameter inference for model calibration. The workflow we present is general-purpose, and we discuss how it can be adapted to other hydrology-related problems.
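For readers new to SBI, the two-step idea in the abstract (build an ensemble of simulator runs, then learn a parameter distribution by comparison with observations) can be sketched in toy form. Everything below is illustrative and is not the paper's implementation: an exponential recession curve stands in for ParFlow, and a simple rejection step with a Gaussian summary stands in for the LSTM surrogate and neural density estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the expensive process-based simulator (ParFlow in
# the paper): maps two parameters (think Manning's n and hydraulic
# conductivity K) to a short synthetic "streamflow" series.
def simulator(theta):
    n, K = theta
    t = np.linspace(0.0, 1.0, 50)
    return K * np.exp(-n * t) + 0.01 * rng.standard_normal(t.size)

# Step 1: sample the prior and run the simulator to build a training
# ensemble (in the paper, this ensemble trains an LSTM surrogate).
prior_lo, prior_hi = np.array([0.5, 0.5]), np.array([5.0, 5.0])
thetas = rng.uniform(prior_lo, prior_hi, size=(2000, 2))
sims = np.array([simulator(th) for th in thetas])

# Step 2: approximate p(theta | x_obs). Here a crude rejection step
# plus a Gaussian summary stands in for the paper's neural density
# estimator: keep the parameter draws whose outputs best match the
# "observed" series and summarize them.
theta_true = np.array([2.0, 3.0])
x_obs = simulator(theta_true)
dist = np.linalg.norm(sims - x_obs, axis=1)
accepted = thetas[np.argsort(dist)[:100]]
post_mean, post_cov = accepted.mean(axis=0), np.cov(accepted.T)

print(post_mean)  # should lie near theta_true = [2.0, 3.0]
```

At scale, the neural components replace both the simulator loop (via the surrogate) and the rejection step (via a learned conditional density), which is what makes the approach tractable for high-compute simulators.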
Robert Hull et al.
Status: final response (author comments only)
-
RC1: 'Comment on hess-2022-345', Keith Beven, 07 Nov 2022
This is a highly sophisticated study, involving considerable work, that aims to introduce methods of simulation-based inference to hydrological models, methods that it seems have been used very successfully in other fields such as cosmology, particle physics and neuroscience. I have not taken the time to look at what has been done in those fields because it is clear from the current study that in hydrology no great advance has been made. The methodology effectively involves two important steps: 1. a dynamic emulator of a complex hydrological model (ParFlow) (here an LSTM), and 2. a method of identifying a conditional joint parameter distribution (here a form of neural network). In both cases the aim is to greatly increase the efficiency of model calibration, and in the latter case to avoid the explicit specification of a likelihood function (albeit using a prior assumption of what that distribution should look like – here a simple bivariate Gaussian – which thereby implicitly implies a form of likelihood function or measure, though this is not discussed).
There are alternatives to both steps. Those potential alternatives are not compared in terms of efficiency, but that is not the real problem with this study. The real problem is that it tells us nothing at all about simulating the Taylor River Basin in the upper Colorado Basin because the study uses only simulated data. The title of the paper is therefore already misleading. Indeed, all the problems of structural error or mis-specification of the model (both in terms of the process representations and their application at a 1 km grid scale) and disinformation in the observations (especially since snowmelt is important in this region) are totally neglected. This is perhaps why there is no mention at all of any of the very many papers I have written about the limitations of physically-based models, the complex nature of response surfaces (which, however defined, are NOT multi-Gaussian with real data), disinformation in observations used for model calibration, and defining limits of acceptability.
While I certainly have no need of further citations, the more important point is that there is absolutely no point in publishing a paper that compares only model-generated data with an emulator (particularly in just a 2-parameter space) without any resort to real observations. This is indicated by the threshold for the determinant of the posterior of 10⁻⁶. Look at any response surface plots for any actual model applications, or dotty plots within GLUE applications, to see that this would be overwhelmed by model mis-specification and observation errors.
This situation could, of course, be easily remedied by having a two-part paper in which this first part is followed up by a second part that actually applies the method to the observational data. I suspect that this is already in preparation, but this part is not worth reviewing further without the second part. I suggest this paper be rejected but that the authors be asked to resubmit in that two-part form.
Keith Beven
-
AC1: 'Reply on RC1', Robert Hull, 22 Nov 2022
We respect and appreciate the referee’s remarks. They are thoughtfully considered, evidence genuine
interest in the manuscript, and will make this work better. We see three main critiques by Dr. Beven.

The first critique is that we don't compare the results of the method we are presenting – which we term
surrogate-informed Simulation-Based Inference (SBI) – to existing alternatives. We would like to point
out that we do discuss some of the new and long-standing approaches to efficient inference, model
calibration, and uncertainty estimation in our field [lines 43-52], though of course not all of them; in
particular, we regret omitting the 1992 GLUE paper. The shortness of our discussion is not intended as a
dig against those alternatives; it's just that a detailed exposition of what constitutes robust, efficient parameter determination (not to mention the significance of the likelihood function) has been done
elsewhere [Beven, 2016; Nearing et al., 2016].

Our work is unique in its use of specialized ML components – in particular, a conditional density estimation approach to inference [Bishop, 1994; Papamakarios and Murray, 2016] – and we think that
this novelty warrants its own publication. Still, we understand Dr. Beven’s point. We initially considered
presenting the results of our shiny method alongside traditional (for hydrology) approaches used for
parameter determination in the face of uncertainty (i.e. GLUE and Approximate Bayesian Computation
(ABC)). In the end, we felt we couldn’t do justice to such an analysis and fit it in this manuscript. We
stand by this, even though it needs to be done somewhere.

The second critique is that the framing of our work is misleading. Specifically, the reviewer notes that it
“tells us nothing at all about simulating the Taylor River Basin in the Upper Colorado Basin”. We hope
the referee trusts us that it was not our intent to mislead. The title, abstract, and introduction speak to
the larger context in which this study was conducted. This context is reflected by a collection of studies
(some written by members of this group of authors) that apply ML methodologies alongside process
understanding for hydrologic prediction. Much of that work has centered on the headwaters of the
Colorado River, an important fixture of water in the American West. We felt this context was relevant
for human and hydrological reasons; in our view, the study of hydrology should never be separated from
place, no matter how ‘theoretical’. We discuss some potential changes to this framing in the concluding
paragraph.

The third critique, and in Dr. Beven's view the fatal one, is that this study "uses only simulated data". We
agree with Charles Peirce (a quoted authority in some of the referee's work) that the goal of scientific
inquiry is truth, and that truth dwells in the realm of 'real' observable phenomena. But to say "there is
absolutely no point in publishing a paper that compares only model generated data .... without any
resort to real observations” seems to us a bit dogmatic. There is no shortage of studies that utilize
mostly or only synthetic data to demonstrate proof-of-concept in hydrology and other simulator-
intensive fields.So we disagree that this ‘flaw’ is fatal, or even a flaw at all. Our purpose here is to rigorously present and
evaluate a method for parameter inference given well-defined constraints. The challenge of this goal is
real and relevant; in fact, this work seems to show an upper bound for the performance of SBI where
undiagnosed structural error exists [lines 578-615]. Comparing to observations would instead shift our
focus from the quality of a method to the quality of the underlying hydrologic model. Because we leave
observations for later, we have a more generalizable, model-agnostic (i.e. not just about ParFlow) paper.

We applaud Dr. Beven for suggesting a follow-on paper with observations. Such an effort requires an
expanded model and additional concerns about structural and observational errors, as has been noted.
In the spirit of walking before running, we are happy to have laid the groundwork for an effort
focusing on observations – though not necessarily to be done here or in a companion paper.
We propose the following changes to address Dr. Beven’s concerns while remaining true to our intended
purpose with this manuscript:

- Reframe the abstract and introduction to make it clearer that the system under study is synthetic, and that comparisons are not directly extendable to 'real' hydrologic systems; and a related title change.
- Give a more detailed overview of the methods used for parameter determination in the face of uncertainty in hydrology, such as GLUE and ABC, in the background section; though again, there is not space to conduct a comparison in our study.
- More clearly state the sources of uncertainty in each experiment, and how they relate to the limitations of physically-based models, the complex nature of 'real' response surfaces, the influence of disinformation in observations, and the challenge of defining limits of acceptability.
References:
Bishop, C. M. (1994). Mixture density networks.
Papamakarios, G., & Murray, I. (2016). Fast ε-free inference of simulation models with Bayesian conditional density estimation. Advances in Neural Information Processing Systems, 29.
Beven, K. (2016). Facets of uncertainty: epistemic uncertainty, non-stationarity, likelihood, hypothesis testing, and communication. Hydrological Sciences Journal, 61(9), 1652-1665. DOI: 10.1080/02626667.2015.1031761
Nearing, G. S., Tian, Y., Gupta, H. V., Clark, M. P., Harrison, K. W., & Weijs, S. V. (2016). A philosophical basis for hydrological uncertainty. Hydrological Sciences Journal, 61(9), 1666-1678. DOI: 10.1080/02626667.2016.1183009
-
RC2: 'Comment on hess-2022-345', Keith Beven, 08 Nov 2022
P.S. As a further thought, I would consider it important that in a 2nd part of the paper, the application included parameters from the CLM (so as not to fix the water balance for ParFlow before running the procedure outlined in this paper) and also included multiple years, so that variations in the onset of snowmelt and the impacts of year-to-year variations in albedo on the simulations were allowed for. It is important to consider what realistic purpose the model would be used for in such a study (see our recent papers on model invalidation in Hydrological Processes).
-
RC3: 'Comment on hess-2022-345', Anonymous Referee #2, 22 Jan 2023
This work is an entirely synthetic study to show that what is called "Simulation-Based Inference" (SBI) can retrieve synthetic parameters, even with some added noise. While there is some value in showing that a parameter inversion procedure works, it should only be a small part of a proof-of-concept paper. This paper has quite a few flaws and was not marketed correctly.
1. The purely synthetic nature of the study greatly reduces the value of the work. There is a chasm between observation and model space. When going from a synthetic dataset to a real dataset, you are faced with model mechanism errors and parameter compensations. This may be where machine-learning approaches could help, but this paper completely left these challenges untouched. It makes the readers wonder if this approach would ever succeed --- I am not saying it could not, but you should demonstrate you can address the major issues. History has shown that inversion problems can be ill-posed, and a workflow showing perfect performance for a synthetic case can fail spectacularly in real-world problems. It is most unsatisfying to see multiple unaddressed roadblocks in a "proof-of-concept" paper.
2. The abstract and introduction were written such that readers may be led to think the model was calibrated against observed flows, and that the advocated work represents a major breakthrough. However, the lack of true observations undermines many of the statements. Overall, much of the sales language throughout the paper should be significantly revised. To give some examples:
"calibrating them can be difficult" --> it is not calibration if no observations are used.
"confront two recalcitrant issues related to calibrating watershed models" --> both issues remain unaddressed in this paper: the surrogate model isn't perfect and we still don't know how to get *correct* parameters.
"While SBI for parameter determination has shown promise in particle physics", "the applications in hydrology have been limited". --> it sounds like SBI is a major solution to our problems. However, many earlier Bayesian methods were proposed in a similar way as SBI. I wonder if it is really necessary to market SBI as a whole, or more precisely to emphasize what is novel about it. The main differences from previous methods seem to be (i) previous methods carry distributional assumptions while here you have a neural network to generate samples (however, in reality you still use a Gaussian mixture model in Eq. 4); (ii) you go directly from discharge to these parameter distributions. You can market the NN directly.

3. The authors cited "equifinality". Can the method represent multimodal parameter distributions that can produce the same discharge output?
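As an aside on the equifinality question: a Gaussian mixture of the kind referenced as Eq. 4 is, in principle, able to represent a multimodal parameter posterior once it has more than one component. A minimal numpy illustration, with invented modes and weights (this is not the paper's fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)

# A two-component 1-D Gaussian mixture of the kind a mixture density
# network produces: two distinct parameter modes that could each
# explain the same discharge (equifinality).
weights = np.array([0.5, 0.5])
means = np.array([1.0, 4.0])   # two equifinal parameter values
sigmas = np.array([0.3, 0.3])

def mixture_pdf(theta):
    comps = weights / (sigmas * np.sqrt(2.0 * np.pi)) * \
        np.exp(-0.5 * ((theta - means) / sigmas) ** 2)
    return comps.sum()

def sample(n):
    ks = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[ks], sigmas[ks])

draws = sample(10_000)
near_1 = np.mean(np.abs(draws - 1.0) < 0.6)  # mass around the first mode
near_4 = np.mean(np.abs(draws - 4.0) < 0.6)  # mass around the second mode
print(near_1, near_4)
```

With a single component the same density would be forced unimodal; allowing multiple components (or a more flexible estimator such as a normalizing flow) is what lets the posterior express two equifinal solutions.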
4. There is again a major gap between the original process-based model and the surrogate model. In this work, even the data for the "posterior predictive check" were generated by the surrogate rather than the original model. This leaves everything to the surrogate model. It is well known in surrogate-based modeling research that surrogate models are never perfect, as the authors themselves later showed. However, the authors did not do anything to actually address the issue of a deteriorating surrogate model. The authors argued we never know the true conductivity values, but at least they can show how the parameters behave in the original ParFlow model.
Overall, this paper did not reassure me about the potential success of the SBI method going forward. For a proof-of-concept paper, I do not mind a simple case, but at least it needs to be demonstrated that the major roadblocks can be tackled. We do not want to lead the community down the wrong path! I would only consider this paper for publication in HESS when they have a real-world case.
Model code and software
This is a repository for conducting simulation-based inference in the Taylor Basin, by Robert Hull: https://github.com/rhull21/sbi_taylor