1. The authors have made substantial changes to their paper in response to many of the points raised in the reviews, and overall the manuscript is much clearer than the original version.
However, the authors have simply side-stepped some of the reviewers' most important points, with the result that some fundamental problems identified in the reviews have not been sufficiently addressed (or maybe even recognized).
2. Here is perhaps the most significant example of this issue. The major motivation of the paper is that precipitation data are often fraught with both random and systematic errors. The paper proposes that inferring rainfall from runoff could give better whole-catchment precipitation estimates than the instrumental measurements themselves. In principle this is a promising idea, and the same concept has also motivated previous attempts at inverse hydrology. The central problem here, however, is that the proposed model is calibrated to match the instrumental precipitation record, including its errors. Then how can the inverse model give a substantially better (and therefore substantially different) precipitation estimate than the (presumably erroneous) instrumental record that it was already calibrated to?
This is a potentially fatal issue, which Reviewer #3 raises rather directly in his item 2. The authors' entire response is only "See above answer to referee #3." The "above answer" simply states that hydrological models are inevitably calibrated, and therefore require rainfall and runoff data. That entirely misses the point.
Consider a possible scenario, in which the rain gauge is sited in an unrepresentative location, such that the precipitation measurements overestimate whole-catchment rainfall by 20%. When the model is calibrated, it will presumably adjust the evapotranspiration parameters so that these erroneously high precipitation rates are made consistent with the measured discharge (by making ET correspondingly high). Then when the model is run in its inverse mode, it will presumably predict precipitation rates that are consistent with the precipitation measurements that it was calibrated to – that is, it will match the measurements, and therefore will match their 20% bias in relation to the true whole-catchment precipitation. Thus it seems that the proposed approach will not meet its stated objective of overcoming the "major errors" in precipitation measurements that are mentioned in the abstract.
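To make the circularity concrete, here is a toy annual water-balance calculation (a minimal sketch; the single-bucket balance and all numbers are hypothetical, not taken from the manuscript):

```python
# Toy annual water balance: P = Q + ET (storage change neglected).
# Suppose true catchment rainfall is 1000 mm/yr and runoff is 600 mm/yr,
# so true ET is 400 mm/yr. A poorly sited gauge reads 20% high.
true_P, Q = 1000.0, 600.0
true_ET = true_P - Q                   # 400 mm/yr

gauge_P = 1.2 * true_P                 # 1200 mm/yr, 20% overestimate

# Calibration forces the model to close the balance with the biased gauge,
# which inflates ET by the same 200 mm:
calibrated_ET = gauge_P - Q            # 600 mm/yr

# Inverse mode then reconstructs precipitation from Q and the calibrated ET:
inferred_P = Q + calibrated_ET         # 1200 mm/yr

bias = (inferred_P - true_P) / true_P  # the 20% gauge bias is reproduced
print(f"inferred P = {inferred_P} mm/yr, bias vs truth = {bias:.0%}")
```

The inverse estimate matches the biased gauge exactly, not the true whole-catchment rainfall, which is precisely the concern raised above.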
The authors cannot just dance around this issue. They either need to prove that their method gives better precipitation estimates than the measurements it is calibrated to, or they need to remove any claims – explicit or implicit – that the proposed method estimates mean areal rainfall better than instrumental measurements do... or even that it estimates mean areal rainfall at all (since in fact it is just matching the instrumental measurements, which are often not representative of catchment-averaged precipitation). Since that is the main rationale for the paper, this is a fundamental challenge that the authors cannot and should not dodge.
3. Because evapotranspiration is crucial to the precipitation inversions, the manuscript must be absolutely clear about how ET is estimated and implemented in the model. The authors claim that two key parameters, ETVEGCOR and INTMAX, were not calibrated but instead "estimated a priori". But these parameters are shown in Table 2 as having a significant range of possible values (meaning that they are apparently NOT fixed), and the manuscript never explains how they are "estimated a priori". This undermines the technical credibility of the manuscript.
(As far as I can tell the function f() is not specified anywhere either, but maybe I missed it).
4. Reviewer #3 pointed out that the inverse model is guaranteed to do well over long periods of time, simply because it conserves mass, so that (average) precipitation must equal (average) streamflow plus evapotranspiration. In response, the authors say only that this is not correct, because ET is significant and "ETa from the model reflects the complex interplay and temporal dynamics of the system states of the different parts of the model." But one must remember that whatever these "complex interplay and temporal dynamics of the system states" are, ET in the model has been calibrated so that inputs and outputs are forced to match. Thus the cumulative rainfall curves shown in Figures 9 and 10 represent exceptionally weak tests of the model (see more on this in point 6 below).
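A toy calculation shows why cumulative curves are such a weak test (hypothetical monthly values; any mass-conserving, calibrated model behaves this way over long periods):

```python
# Cumulative rainfall curves test little more than mass conservation.
# Hypothetical "true" monthly precipitation (mm) over one year:
true_P = [80, 70, 60, 50, 40, 30, 30, 40, 50, 60, 70, 80]          # total 660

# An inverse estimate with badly wrong timing but the same total mass,
# as any calibrated mass-conserving model will deliver in the long run:
inferred_P = [10, 10, 10, 150, 150, 10, 10, 150, 10, 10, 130, 10]  # total 660

def cumulative(xs):
    out, s = [], 0.0
    for x in xs:
        s += x
        out.append(s)
    return out

ct, ci = cumulative(true_P), cumulative(inferred_P)
print(f"final cumulative totals: true={ct[-1]}, inferred={ci[-1]}")

# The curves end at exactly the same point even though the monthly
# estimates are wildly wrong:
worst = max(abs(a - b) for a, b in zip(true_P, inferred_P))
print(f"worst monthly error: {worst} mm")
```

The two cumulative curves converge to the same endpoint by construction, so their agreement says almost nothing about the monthly (let alone hourly) skill of the inversion.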
5. Two reviewers pointed out that the inverse model will be very sensitive to errors in the streamflow data, potentially magnifying them by orders of magnitude in the precipitation estimates. In response, the authors have introduced the "Exp4" simulation, in which (apparently) all the measured discharges are increased by 10 percent. This, however, does not address the issue that was posed. The problem is that the inferred rate of precipitation will strongly depend on the time derivative of discharge, and thus will be particularly sensitive to short-term errors (such as random noise) in discharge measurements. Re-scaling all the discharges by a constant does not provide a meaningful test of this issue.
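The distinction is easy to demonstrate with a synthetic hydrograph (a minimal sketch; the recession curve and the 10% figure are purely illustrative): a constant multiplicative bias merely rescales the time derivative of discharge, whereas random noise of the same magnitude overwhelms it.

```python
import random

random.seed(0)

# Hypothetical smooth recession hydrograph (hourly discharge, m^3/s):
Q = [100.0 * 0.99 ** t for t in range(200)]

def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

dQ = diff(Q)

# Exp4-style constant rescaling: the derivative is rescaled by the same 10%.
dQ_scaled = diff([1.1 * q for q in Q])

# Random +/-10% noise of the same magnitude: the derivative is overwhelmed.
Q_noisy = [q * (1 + random.uniform(-0.1, 0.1)) for q in Q]
dQ_noisy = diff(Q_noisy)

def max_abs_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

print("constant 10% bias, derivative error:", max_abs_err(dQ, dQ_scaled))
print("random +/-10% noise, derivative error:", max_abs_err(dQ, dQ_noisy))
```

The derivative error under random noise is more than an order of magnitude larger than under the constant bias, which is why Exp4 does not test the sensitivity that the reviewers asked about.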
6. The reviewers pointed out that the tests of the method were very weak. In response, the revised manuscript adds a second catchment, and several new "experiments". Skeptical readers will notice that the second catchment is similar to the first, and exhibits very similar behavior. This is not the kind of comparison that the reviewers were asking for. The reviewers were specifically asking for evidence that the model can correctly simulate behavior that is clearly different from the calibration data (for example, different seasons of the year). Instead we have just two very similar catchments, simulated for multiple summers, but each with about 600-800 mm of precipitation.
Furthermore, Tables 5 and 6 now reveal that the Nash-Sutcliffe and bias statistics are calculated for the entire period 2006-2009, which includes both the "calibration" and "validation" years!
This violates the fundamental distinction between validation and calibration which underlies all model testing. Of the 24 cells in Table 5 (6 experiments times 4 years), 15 are calibrations. Thus nearly two-thirds of the data used to "validate" the approach actually consist of calibration data. And for four of the six "experiments", that fraction rises to three-fourths.
This is not the way that model testing normally goes; you cannot (or at least you should not) test a model against data that it has already been calibrated with. The approach should be much more rigorously tested, for example by calibrating to only one year at a time, and validating against all three of the other years (and, of course, excluding all calibration data from the validation statistics!). Skeptical readers will wonder why more rigorous testing has not been done.
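The suggested design is straightforward to implement; a minimal sketch follows (the functions `calibrate` and `simulate` are hypothetical stand-ins for the authors' calibration and inverse-simulation procedures, not code from the manuscript):

```python
# Sketch of the leave-one-year-out test design suggested above.
years = [2006, 2007, 2008, 2009]

def leave_one_year_out(calibrate, simulate, data):
    """Calibrate to one year at a time; validate on the other three only."""
    results = {}
    for cal_year in years:
        params = calibrate(data[cal_year])
        # Validation statistics must exclude the calibration year:
        val_years = [y for y in years if y != cal_year]
        results[cal_year] = {y: simulate(params, data[y]) for y in val_years}
    return results
```

Every performance statistic then comes from data the model has never been calibrated to, which is the minimum standard for a credible validation.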
(A small technical note: Exp3 and Exp4 in Table 6 show that the model generates the same NSE when the correct discharge is used and when discharge that is wrong by 10% is used. This suggests that perhaps r^2 has been calculated rather than NSE.)
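This suspicion is easy to check numerically (a minimal sketch with hypothetical data): r^2 is invariant under a constant rescaling of the simulated series, while NSE is not.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def r2(obs, sim):
    """Squared Pearson correlation coefficient."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (vo * vs)

obs = [1.0, 3.0, 2.0, 5.0, 4.0]       # hypothetical observed discharge
sim = [1.1, 2.8, 2.2, 4.7, 4.1]       # a reasonable simulation
sim_biased = [1.1 * s for s in sim]   # the same simulation, 10% high

print("r2  :", r2(obs, sim), r2(obs, sim_biased))    # identical
print("NSE :", nse(obs, sim), nse(obs, sim_biased))  # NSE drops with bias
```

If the statistic in Table 6 really were NSE, the 10% discharge error of Exp4 would have to change it; identical values are exactly what one would expect from r^2.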
7. The reviewers pointed out that the "virtual experiment" in 2.3.1 presents an exceptionally weak test, in which the model is run as a forward simulation to generate runoff, and then this same simulated runoff forms the basis of an inverse simulation (with the same model, and exactly the same parameter values) to reproduce the original rainfall input. Reviewer #3 pointed out that this does not even demonstrate numerical stability, in any sense that really matters. But the revised manuscript not only retains this "virtual experiment", it even adds a figure showing the mathematically inevitable 1:1 relationship between the original precipitation input and the one obtained through this forward-backward procedure.
The analysis that Reviewer #3 suggested, which involved perturbing the streamflow time series, the model parameters, or the model structure, has not been carried out, and no satisfactory explanation has been given. Instead the manuscript says that the virtual experiments "enable a rigorous evaluation of the inverse calculations, neglecting uncertainties concerning measurement errors in runoff, model structure or model parameters". These are precisely the uncertainties and errors that the reviewers say should not be neglected.
8. Section 4.3.4 says that up to 9 months is needed for the effects of the startup "cold system state" to be forgotten. But the simulations presented here are for only three or four months! How is this supposed to work, in practice?
9. One would have thought that in response to the reviewers' comments, the revised manuscript would be more careful about the claims that it makes for the inverse modeling method. Instead, the opposite has happened; the claims have become even bolder (but without more substantial evidence to support them). For example, section 5 now says that the model can be used to "update system states" in real-time flood forecasting. No clear evidence is presented to show that this works as intended, or that it improves flood forecasts; instead the reader is simply told that the system states in the inverse model will always guarantee that the simulated runoff is identical to the observations. This may be true, but it does not demonstrate that those system states are the right ones, particularly because the system states are generated by the entire time series of (presumably flawed) precipitation and discharge measurements. So even if the simulated runoff is identical to the observations, this result could arise not because the system states are correct, but instead because the model can adjust the assumed rainfall rate to compensate for the system states being wrong.
The reader is also told that the method could be used to generate "nowcasting fields" of rainfall for short-term flood forecasting. Never mind the rather clear circularity involved (one needs measurements of discharge in order to estimate rainfall, in order to predict discharge, which has already been measured anyway). In any case the reader is not shown any evidence that this works, in any way that would be useful for forecasting. Thus what has been presented is simply speculation, but appears in the manuscript's "summary and conclusions".
Indeed, Figure 12 shows that estimated rainfall rates during extreme events can be wrong by a factor of two or more. This result would seem to argue rather clearly against the claims that are advanced starting on line 712.
10. The manuscript completely side-steps the issue of parameter uncertainty and equifinality, saying that the issue is important but is outside the scope of the paper. If the issue is important, why not make space for it? One can understand that a full exploration of parameter uncertainty in a 12-parameter model would be a substantial undertaking, but there is no good reason for avoiding the topic entirely, and not even presenting some illustrative results.
11. In summary: the general idea presented here is interesting, but it should be rigorously tested. The results of those tests should be openly and fairly presented, and only claims that can be rigorously supported should be made. If the paper is published, the source code, data sets, and all numerical results should be made available as online supplementary information, so that other researchers can verify the findings.