# General Comments
First of all, I would like to commend the authors for their hard work and the resulting great improvements to the manuscript. When reading the answers to my first review, I was not sure whether the authors had understood my critique; however, their revisions prove me wrong. What I am still missing is a discussion of why a “forecasting” setting was chosen over a “simulation” setting for the examination, given that the former somewhat undermines the importance of the rainfall. I believe this would be a perfect addition to the final discussion in the conclusions, where the authors already examine some of the limitations of their work.
The other major point, which is probably clear to the authors and most likely results from my own oversight or poor memory, is the following question: why are the results reported in this version so much worse than those in the first version of the manuscript?
Apart from these two points, my remaining comments mainly concern imprecise statements and unclear explanations, and I am sure the authors will be able to resolve them with ease.
The brackets of all, or almost all, in-line citations are set incorrectly.
L. 57ff: The authors write: “Among these, LSTM has garnered more attention of researchers due to its suitability for processing and predicting events with very long intervals and delays in time series.” I think that statement needs at least a reference to back it up. While the LSTM does have the capacity to handle very long time series, I am not sure that this is the main reason why it has enjoyed more attention than feedforward approaches. In hydrology, at least, this does not seem to be the case. Naively, I would assume that its main selling point is its ability to provide high-quality simulations (not forecasts). However, my own assertion would also require further inquiry.
L. 60ff: Following on from the comment on L. 57ff: I fail to see why these are examples of researchers' interest in long intervals and delays in time series. Perhaps further explanation is required.
L. 74: For the sake of responsible communication, I think it is necessary to tell the reader that the separate setting should NOT be seen as an equivalent approach. As currently stated, readers could get the wrong impression that both approaches are just a matter of taste and setting, while in reality the second approach is the current state of the art.
L. 76: The statement that the regional LSTM received “a lot of attention because it can increase the amount of training data” could easily be misinterpreted. Perhaps an active reformulation along the following lines would be useful here:
“The regional setting is of particular interest because it allows the model to encapsulate different hydrological processes by learning from more data and situations.”
L. 79ff: This is wrong. ‘Many to one’ does not designate a next-time-step prediction. Similarly, ‘many to many’ does not imply the prediction of multiple future steps from the past. Also, given the terminological issues with prediction and forecasting, this sentence should probably use the term “forecasting”.
L. 84ff: This is a great goal statement. The only thing perhaps missing is that you focus only on the “local LSTM setting” (or the “separate LSTM”, as you coined it).
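A minimal NumPy sketch may make the distinction concrete. It uses a plain tanh RNN rather than an LSTM, with made-up sizes and random weights, purely to show the shapes involved: ‘many to one’ and ‘many to many’ describe how many outputs are read from one input sequence, not how far ahead those outputs lie. Whether the single output is aligned with the same day's discharge (simulation) or the next day's (forecast) is decided by how the targets are aligned, not by the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_rnn(x, W, U, b):
    """Plain tanh RNN over x of shape (T, n_x); returns all hidden states, shape (T, n_h)."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes: one year of daily forcings with 5 input variables.
T, n_x, n_h = 365, 5, 8
x = rng.normal(size=(T, n_x))
W = rng.normal(scale=0.1, size=(n_h, n_x))  # input-to-hidden weight matrix
U = rng.normal(scale=0.1, size=(n_h, n_h))  # hidden-to-hidden weight matrix
b = np.zeros(n_h)
v = rng.normal(size=n_h)                    # linear read-out weights

H = run_rnn(x, W, U, b)

# Many to one: a single output read from the final hidden state.
many_to_one = H[-1] @ v
# Many to many: one output per input time step (aligned with the inputs, not future steps).
many_to_many = H @ v
```

The last entry of `many_to_many` coincides with `many_to_one`; in both cases the outputs refer to time steps inside the input window.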
L. 103ff (also in reference to L. 361ff): It is not clear why only the rainfall was distributed spatially, and not the remaining meteorological variables, such as temperature. I assume the intention was to isolate the influence of the spatial rainfall distribution, but I then wondered whether the same would happen if other variables were distributed as well. I do not want to argue that the authors have to examine that question too, but readers deserve at least an explanation: while models that use some meteorological variables in a lumped way and others in a spatially distributed way are known to hydrologists, the choice made here seems unusual.
L. 125ff: I think readers will not understand the description of the main RNN problem. I certainly did not.
L. 159ff: For the sake of readability, I would recommend explaining all variables before, or as soon as possible after, their first appearance in an equation. For example, readers not familiar with this particular set of equations might not be able to follow the exposition until they are told what the weight vectors are.
L. 159: This is wrong: W denotes weight matrices, not vectors!
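To illustrate the point, in one common formulation of the LSTM (the symbols below are an assumption and may differ from the manuscript's notation), the forget gate reads

$$
f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), \qquad W_f \in \mathbb{R}^{n_h \times n_x},\; U_f \in \mathbb{R}^{n_h \times n_h},\; b_f \in \mathbb{R}^{n_h},
$$

where $x_t \in \mathbb{R}^{n_x}$ is the input, $h_{t-1} \in \mathbb{R}^{n_h}$ the previous hidden state, and $\sigma$ the element-wise logistic sigmoid. Only the bias $b_f$ is a vector; $W_f$ and $U_f$ must be matrices, because they map an $n_x$-dimensional and an $n_h$-dimensional vector, respectively, onto the $n_h$-dimensional gate.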
L. 188: Why are two regional models used rather than one? To be clear, I am not arguing that the choice of two models is wrong here, only that it needs some explanation, since it does undermine the purported advantage of using more data.
L. 197: Was a hyper-parameter search conducted? If not (or only informally), please state explicitly that none was undertaken (or that only an informal exploration was). This is fine; it just needs to be mentioned.
L. 272: The description of the length standardization of the inputs/rainfall should not be part of the results/discussion section; it should be explained in the methods/data section. Furthermore, the explanation itself needs to be much clearer. As it stands, it takes a long time to understand what the “length of 20” refers to. It is also not clear why the inputs are restricted to a length of 20 and not, say, 25, or why a fusing method is used at all. I also wonder whether zero padding is sensible here, given that, with the implied normalization of the input data, a value of 0 would correspond to the average rainfall.
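A minimal sketch of the padding concern, assuming z-score standardization of the rainfall and padding at the end of the sequence up to a length of 20 (both are assumptions about what the manuscript does, and the rainfall values are made up), shows why a padded 0 is not “no rain”:

```python
import numpy as np

# Made-up daily rainfall sequence in mm/day (illustration only).
rain = np.array([0.0, 0.0, 12.5, 3.2, 0.0, 25.1, 0.4, 0.0])

# Assumed z-score standardization of the input data.
mu, sigma = rain.mean(), rain.std()

def standardize(x):
    return (x - mu) / sigma

def destandardize(z):
    return z * sigma + mu

target_len = 20  # assumed padded sequence length
z = standardize(rain)
n_pad = target_len - len(z)

# Padding with 0 in the standardized space ...
pad_zero = np.pad(z, (0, n_pad), constant_values=0.0)
# ... versus padding with the standardized value of a dry day.
pad_dry = np.pad(z, (0, n_pad), constant_values=standardize(0.0))

# Back in physical units, the padded 0 equals the mean rainfall,
# whereas the alternative padding value equals (numerically) 0 mm/day.
padded_zero_mm = destandardize(pad_zero[-1])
padded_dry_mm = destandardize(pad_dry[-1])
```

Which padding value is appropriate depends on what the padded time steps are meant to represent, which is one more reason to explain the procedure in the methods section.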
# Minor Comments
L. 33: Space missing after “change”.
L. 42: The abbreviation SHE is not introduced (VIC is).
L. 57: The abbreviation LSTM is not introduced (ANN is).
L. 67: The abbreviation RNN is not introduced.
L. 67: “The” should not be capitalized, and a space is missing.
L. 92: Do not use all caps, and introduce the CAMELS abbreviation properly.
L. 96: I don’t think it is acceptable to nonchalantly judge whether Daymet is the better or worse product (as a matter of fact, there are studies showing that none of the products dominates all of the others). Maybe just “..., which is higher than the other two”.
L. 119: The introduction of the abbreviation RMSE is inconsistent: the other abbreviations were introduced with capitalization.
L. 122: The first statement probably needs a reference.
L. 130: The date 2012 is oddly specific and would need a reference.
L. 136: It is not clear what a “flow” is here.
L. 142: The notation “f(t)” is not consistent. Also, it does not map onto Equation 1.
L. 165: I would recommend changing “generally used” to “often used” as it is less judgmental. But this might be a matter of taste and can be ignored.
L. 167f: The symbol f was already used for the forget gate. Also, I would like to note that the superscript time-index notation is at odds with the previously introduced subscript notation.
L. 231, Table 3: Please describe what the bold numbers in Table 3 indicate. Even if it becomes clear on inspection, it should be stated in the table caption in the first place.
L. 359: LSTM -> LSTMs
L. 359ff: The sentence seems to be missing words.
L. 376f: I think what is written here is true, but the claim that the change is “more significantly” usually calls for statistical tests and/or a description of what is meant by significant here.
L. 378: I would suggest replacing “show” with “suggest”, since performance on two catchments alone can hardly be considered conclusive evidence.