Why do our rainfall-runoff models keep underestimating the peak flows?
András Bárdossy
Faizan Anwar
Abstract. In this paper we investigate how the spatial interpolation of precipitation from gauge networks of varying density affects rainfall-runoff model discharge when all other input variables are kept constant. This was done using a physically based model, driven by a reconstructed spatially variable precipitation, as the reference, and a conceptual model calibrated to match the reference model output. Both models were run with distributed and lumped inputs. Results showed that all considered interpolation methods underestimated the total precipitation volume and that the underestimation was directly proportional to the precipitation amount. The underestimation was very severe for low observation densities and disappeared only for very dense precipitation observation networks. This result was confirmed using observed precipitation at different observation densities. Model runoff performance was worst for the highest discharges. Using lumped model inputs also degraded peak-flow performance, even when simulated precipitation was used.
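To make the smoothing mechanism behind this result concrete, here is a minimal sketch, assuming a synthetic precipitation field and plain inverse-distance weighting rather than the paper's interpolation methods; all names and numbers are illustrative, not the authors' code.

```python
# Sketch: interpolating a spatially variable precipitation field from sparse
# gauges underestimates both the areal volume and the local maxima.
import numpy as np

rng = np.random.default_rng(0)

# "True" precipitation on a 100 km x 100 km domain: a uniform background plus
# a few sharp convective cells (all numbers illustrative).
cells = rng.uniform(10.0, 90.0, size=(5, 2))

def true_field(px, py):
    z = 5.0 + np.zeros_like(px, dtype=float)
    for cx, cy in cells:
        z = z + 40.0 * np.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2.0 * 5.0 ** 2))
    return z

def idw(gx, gy, gz, qx, qy, power=2.0):
    """Inverse-distance-weighted estimate at (qx, qy) from gauges (gx, gy, gz)."""
    d = np.hypot(qx[..., None] - gx, qy[..., None] - gy)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w * gz).sum(axis=-1) / w.sum(axis=-1)

x, y = np.meshgrid(np.linspace(0.0, 100.0, 101), np.linspace(0.0, 100.0, 101))
truth = true_field(x, y)

for n_gauges in (5, 20, 100, 500):
    gx, gy = rng.uniform(0.0, 100.0, size=(2, n_gauges))
    est = idw(gx, gy, true_field(gx, gy), x, y)
    print(f"{n_gauges:4d} gauges: volume bias {est.mean() - truth.mean():+6.2f} mm, "
          f"peak bias {est.max() - truth.max():+6.2f} mm")
```

Because the interpolated value is a convex combination of the gauge values, the estimate can never exceed the sampled maximum, so the peak bias stays negative until a gauge happens to fall on a cell maximum, which becomes increasingly likely as the network densifies.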
Status: final response (author comments only)
-
CC1: 'Comment on hess-2022-281', Abdolreza Bahremand, 07 Aug 2022
Thanks for this very good research. I think the effect of the objective function used during the auto-calibration procedure on the larger values should be discussed further, if not quantified. Its effect on the results, or on the uncertainty of the conclusions, should be discussed.
Best regards,
Abdolreza Bahremand
Citation: https://doi.org/10.5194/hess-2022-281-CC1 -
RC1: 'Comment on hess-2022-281', Keith Beven, 12 Aug 2022
I am always a bit wary of papers that compare models with models in order to constrain the uncertainties involved in the modelling process. I fully understand why they do so – it is so difficult to get any grip on the real uncertainties in model inputs and “observed” stream discharges and other sources of epistemic uncertainty. But the question is whether what is learned is more than what is lost by not addressing the real problems of model uncertainty directly.
Here the reference model is the SHETRAN model driven by a relatively dense network of raingauges. How snow accumulation and melt are handled in the observations is not made clear. SHETRAN is described as a physically-based model, but its physical basis is wrong, particularly at the 1 km grid scale used (Richards equation with no account of sub-grid variability, effects of assuming effective parameters at the grid scale). It was already described as a lumped conceptual model at the grid scale nearly 35 years ago (Beven, JH 1989). It is not made clear how the daily time step is implemented for the reference model. It would have to have a smaller internal time step, so are the inputs simply averaged over the day? And any surface runoff production surely occurs at much smaller scales than 1 km? Does that not already suggest an expectation that the reference will underestimate peaks in unrealistic ways (unless compensations exist, for example in the routing)? As a virtual reality this does not affect the current study (there was no need to see whether this was a valid model of the catchments) – but as an example of the hydrological expectations we might have BEFORE making such a study, it is surely relevant.
And talking of expectations, it is stated that the subcatchments of the Neckar used were large enough to allow for daily time step simulations. The hydrographs shown in Figures 9, 10 and 13 really do not support this. Again, I understand the use of a daily time step so as to maximise the precipitation stations available, but the discretisation effects would suggest that this is surely not properly reflecting the hydrology of these catchments. Equally, HBV is run at the 1 km scale, but it appears as if the outputs from each grid square are simply added, with no account taken of distance from the outlet (and the day-to-day carry-over this might produce).
So we are left with the conclusion that a model might get better (in representing an incomplete virtual reality) as the number of gauges to define the inputs increases. A model using spatially distributed inputs will generally do better than one using an average input (all other things being spatially equal, which of course, in general they are not). But we knew that already, so have we really learned anything new about WHY our models “keep underestimating peak flows”? Except that, as shown in Figure 13, they do not – the global recalibration of HBV, given a set of inputs, can overcompensate for some storms.
We know there are subtle interactions between different types of model calibration, time steps, routing methods, parameter sets and their complex interactions, and sources of epistemic uncertainty, including both inputs and the rating curve (this study eliminates the discharge uncertainties by design, but these can be important in practice). So if we already have an expectation that, in general, catchment-averaged or poorly sampled inputs will lead to a tendency to underpredict peaks, then the question that the paper should be addressing is what to do about that in real applications (where the discharge uncertainties also come into play). We cannot invent input data, so we will normally use as much as is available (which would be even more than the 150 gauges in this study). We know that the calibration will depend on, and compensate for, the limitations of the observed data that are available – but the smaller the sample, the greater the resulting uncertainties in the predicted discharges might be (see Figure 13 again). This paper does not address those uncertainties (except in comparing the 5 samples at each density). It does not even make use of the uncertainties associated with the kriging interpolation (which would seem to be a good reason for using the kriging interpolator, despite the necessary assumptions and the requirement to already have many gauges to estimate variograms).
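For concreteness, a minimal ordinary-kriging sketch, assuming an exponential variogram with made-up parameters and synthetic gauges (not code from the paper), showing that the interpolator delivers an estimation variance alongside the estimate:

```python
# Minimal ordinary kriging with estimation variance (illustrative only).
import numpy as np

def exp_variogram(h, sill=1.0, corr_len=20.0, nugget=0.0):
    """Exponential variogram model gamma(h); parameters are illustrative."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-h / corr_len))

def ordinary_kriging(gx, gy, gz, qx, qy):
    """Return (estimate, estimation variance) at one query point (qx, qy)."""
    n = len(gx)
    h_gg = np.hypot(gx[:, None] - gx, gy[:, None] - gy)  # gauge-gauge distances
    h_gq = np.hypot(gx - qx, gy - qy)                    # gauge-query distances
    A = np.ones((n + 1, n + 1))        # kriging system, with the Lagrange
    A[:n, :n] = exp_variogram(h_gg)    # multiplier in the last row/column
    A[n, n] = 0.0
    b = np.append(exp_variogram(h_gq), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return w @ gz, w @ b[:n] + mu      # estimate and kriging variance

rng = np.random.default_rng(1)
gx, gy = rng.uniform(0.0, 100.0, size=(2, 10))  # 10 synthetic gauges
gz = rng.gamma(2.0, 5.0, size=10)               # synthetic daily totals [mm]
z_hat, s2 = ordinary_kriging(gx, gy, gz, 50.0, 50.0)
print(f"estimate {z_hat:.2f} mm, kriging std {np.sqrt(s2):.2f} mm")
```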
So I do not think the paper can be published in this form. As far as I can see we cannot use any of the results to improve practical applications (other than a general exhortation to use as many input gauges as possible and try to take account of spatial patterns – but even then the common problem of not having gauges at higher elevations has not been mentioned, and there is a lack of information about how snow is handled). I would suggest it needs to be more complete and more ambitious and address the question of, IF we only have a certain density of gauges available (remembering that in practice we cannot resample from a larger set), then how should the modelling workflow compensate to get better estimates of the (uncertain) peak flows? Does calibration provide sufficient compensation? Do the acceptable parameter sets change with the input scenarios? Can the authors suggest “uncertainty multipliers” as the density of inputs decreases (but that requires addressing the simulation uncertainties more directly)? Does this vary between models (though I understand the problem of calibrating SHETRAN if it were used in the comparison)?
There are many other comments on the manuscript.
Keith Beven
-
AC1: 'Reply on RC1', Faizan Anwar, 17 Aug 2022
Thank you for the comments, Prof. Beven. Our responses are attached.
-
RC2: 'Reply on AC1', Keith Beven, 17 Aug 2022
So to continue the discussion…
Just a couple of points arising from your comments. You say that: “We don’t use just any model, this model has precipitation inverted in order to match the observed flows.” Inverted rainfall is not mentioned at all in the paper – only that the gridded rainfall is produced so as to match the observed point rainfalls. Nor does it seem to be necessary, as the “observed” flows are produced by SHETRAN (with its deficiencies and the need for some clarifications I mentioned).
Secondly, I am not sure I mentioned parameter uncertainty anywhere in my comments (except to ask if the acceptable (or in your case calibrated) parameters might vary systematically with the input resolution). I am not sure that the hydrological community is obsessed with parameter uncertainty to the neglect of other sources, but in practical applications (where we would normally use ALL the raingauges available) it is a simple way of generating potentially behavioural models. Different model structures can easily be incorporated as well in a GLUE or Bayesian weighting framework. Input realisations are somewhat more problematic in that it is usually not obvious how to construct them (as you point out Kriging and its variance estimates depend on some rather strong assumptions, and gauges can be spaced more widely than raincells).
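A hedged sketch of the GLUE-style behavioural weighting mentioned above, assuming a toy one-parameter linear-reservoir model, NSE as the informal likelihood, and an arbitrary acceptability threshold of 0.7 (none of these choices come from the discussion):

```python
# Hedged GLUE sketch: Monte Carlo sampling of one parameter, NSE as the
# informal likelihood, behavioural threshold 0.7 (all choices illustrative).
import numpy as np

rng = np.random.default_rng(2)

def linear_reservoir(p, k):
    """Toy daily runoff model: a single linear reservoir with rate constant k."""
    q, s = np.zeros_like(p), 0.0
    for t, pt in enumerate(p):
        s += pt
        q[t] = k * s
        s -= q[t]
    return q

p = rng.gamma(0.5, 8.0, size=365)                      # synthetic daily rainfall
q_obs = linear_reservoir(p, 0.3) + rng.normal(0.0, 0.2, size=365)

ks = rng.uniform(0.05, 0.95, size=2000)                # sampled parameter sets
sims = np.array([linear_reservoir(p, k) for k in ks])
nse = 1.0 - ((sims - q_obs) ** 2).sum(axis=1) / ((q_obs - q_obs.mean()) ** 2).sum()

behavioural = nse > 0.7                                # acceptability threshold
w = nse[behavioural] / nse[behavioural].sum()          # likelihood weights
q_weighted = w @ sims[behavioural]                     # weighted mean prediction
print(f"{behavioural.sum()} behavioural parameter sets out of {len(ks)}")
```

The spread of the behavioural ensemble (e.g. per-time-step quantiles of sims[behavioural]) is what would carry the implicit input and structural uncertainty into the predictions.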
Your quick and temporary solution suggests a number of ideas but I am not sure they help much towards what we should do about the problem in practice. You ask for ideas – my analysis of the problem is that we know that interpolated rainfalls might be biased towards underestimation (especially where upland gauges are missing) and that traditionally model calibration (alone) or including rainfall multiplier parameters in the calibration (with a history back to the Stanford Watershed Model) has been a simple way of trying to compensate implicitly for that. That expectation of compensation extends to any errors there might be in the discharges observations (particularly at the highest and lowest flows), conditional on the model or models used. As noted above, allowing that an ensemble of behavioural models might result from that compensation is a further extension that can be used to produce some range of predicted outcomes that (again implicitly) reflect both sources of uncertainty. The analysis of runoff coefficients in the Inexact Science paper is another reflection of the joint uncertainties (there are a couple of follow-up papers in review that make use of that approach).
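And a toy sketch of the rainfall-multiplier compensation with that history, assuming the same linear-reservoir model and a made-up 20 % interpolation bias. Because the toy model is linear, calibration recovers the bias exactly; a nonlinear model with event-to-event variability would not behave so cleanly.

```python
# Toy sketch: compensating an input bias with a calibrated rainfall
# multiplier. The 20 % bias, the model and all numbers are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def linear_reservoir(p, k=0.3):
    """Same toy daily model as in the previous sketch."""
    q, s = np.zeros_like(p), 0.0
    for t, pt in enumerate(p):
        s += pt
        q[t] = k * s
        s -= q[t]
    return q

rng = np.random.default_rng(4)
p_true = rng.gamma(0.5, 8.0, size=365)   # synthetic "true" areal precipitation
p_interp = 0.8 * p_true                  # interpolated input, biased low by 20 %
q_obs = linear_reservoir(p_true)         # discharge driven by the true input

def sse(m):
    """Sum of squared errors as a function of the rainfall multiplier m."""
    return ((linear_reservoir(m * p_interp) - q_obs) ** 2).sum()

res = minimize_scalar(sse, bounds=(0.5, 2.0), method="bounded")
print(f"calibrated multiplier: {res.x:.3f}")  # ~1.25, undoing the input bias
```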
But, as you demonstrate, the compensation of calibration does not always produce an underestimation of flows. BATEA etc. have revealed how this is (as you note) an ill-posed problem, with resulting huge (and unrealistic?) variations in event-to-event rainfall multipliers when the compensation starts to allow for model structural deficiencies and consequent antecedent conditions from event to event. These are certainly all forms of epistemic uncertainties – or, to put it another way, even if we use all the raingauges available we cannot know what effect the interpolation will have on individual events. We can only make assumptions about what that effect might be. So to pose a hypothesis relevant to your paper: are there any systematic biases that can be detected in the reduced rainfall networks (or their effect on the discharges) that might be used to inform the model simulations (or, more generally, constrain their uncertainties)?
This is a more challenging, but also more important, problem. I think I would approach it somewhat differently. I would eliminate the SHETRAN virtual reality – yes it is mass-conserving but that is not really relevant for practical applications for which we are trying to reproduce the observed discharge. I would try to assess the uncertainty in the observed discharges and allow for that in the model evaluations in some way. I would generate different rainfall realisations based on the samples of raingauges and compare both the rainfall biases (ie. no compensation by calibration) and the predicted discharge biases (ie. with implicit compensation by calibration) with results using the full network. For the kriging case the realisations could reflect the grid estimation variances (though there is still the issue of what minimum network numbers you need to determine a variogram).
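A sketch of that last suggestion, assuming kriged mean and variance fields are already available (placeholders below) and, as a strong simplification, spatially uncorrelated Gaussian errors truncated at zero; a proper treatment would use conditional simulation.

```python
# Sketch: rainfall realisations from a kriged mean field and its estimation
# variance. z_mean and z_var are placeholders for the outputs of a kriging
# step; real fields would replace them.
import numpy as np

rng = np.random.default_rng(3)
z_mean = rng.gamma(2.0, 4.0, size=(101, 101))    # placeholder kriged means [mm]
z_var = rng.uniform(0.5, 4.0, size=(101, 101))   # placeholder kriging variances

n_real = 50
real = z_mean + np.sqrt(z_var) * rng.standard_normal((n_real, *z_mean.shape))
real = np.clip(real, 0.0, None)  # precipitation cannot be negative

# Spread of the catchment-average precipitation across realisations:
print(real.mean(axis=(1, 2)).round(2))
```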
The reduction in the variability of both inputs and simulated flows as the sample of gauges increases would still be revealed, surely. It might be considered as representative of what to expect in areas with a similar distribution of gauges with elevation (as demonstrated in the hypsometric curves in your reply). I am not sure it would apply in our Cumbrian catchments, where we are much more deficient in higher-elevation gauges (again there is a paper in review on this looking at different interpolation methods – though this was not extended to the type of sampling study you have done). So that might provide a range of potential outcomes for other applications with similar sampling densities but fewer gauges (the only problem being that we would not know where in that distribution the actual particular sample lies... though perhaps looking at the effect on the simulated outputs might help there, even after the compensatory effect of calibration – that is something that you could look at).
This might be a quick and more enduring way of going beyond demonstrating the problem towards what we might do about it. I think we would certainly agree that there is no better solution than getting more raingauge (and discharge) observations but, certainly for historical data, we are often constrained to limited networks, so we need some practical solutions.
KB
Citation: https://doi.org/10.5194/hess-2022-281-RC2
-
AC2: 'Reply on RC2', Faizan Anwar, 03 Jan 2023
-
RC3: 'Review on hess-2022-281', Anonymous Referee #2, 07 Dec 2022
The manuscript proposes a virtual experiment to support the hypothesis that interpolating precipitation leads to an underestimation of peak discharge.
The topic is more than interesting, and the manuscript, in general, is well written and pleasant to read. While the (outstanding) discussion raised by the other reviewer and the authors is stimulating and full of hydrological truths, I am in favor of the submitted manuscript.
Indeed, I am in favor of providing practical analyses to support theoretical hypotheses. Of course, above all in the case of virtual experiments, it is difficult to reach general conclusions, since the obtained results are constrained by the chosen models.
In the present case, the proposed analysis has an interesting and clear message that is well supported by the provided results, so I am glad to suggest its publication with minor revisions.
Comments
Sections 3.1, 3.2, and 3.3 should be enriched with plots illustrating the proposed analyses. The language is fluent and understandable; however, the reader could get lost.
Section 3.4. The role of temperature and evapotranspiration should be specified. It becomes clear in the following sections, but it is better to clarify here that this information is used by the rainfall-runoff model.
Section 6. The conclusions should be structured to answer the four questions posed in the Introduction, and it should be stressed that they are limited to the virtual-experiment conditions and the models adopted.
Citation: https://doi.org/10.5194/hess-2022-281-RC3
-
AC3: 'Reply on RC3', Faizan Anwar, 03 Jan 2023
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
779 | 271 | 20 | 1,070 | 7 | 6