In this paper, entitled "How can we benefit from regime information to make use of LSTM runoff models more effectively?", Hashemi et al. develop long short-term memory (LSTM) models to assess their capability for runoff modeling, examining how memory length (the lookback hyperparameter) depends on hydrological regimes (i.e. on information existing up to the annual time scale) and on how the models are trained (local, regional or "national"-scale training), and, in the end, to answer the question "what is the most effective way of using LSTM for making runoff predictions?" (quite a broad question).
This paper, which has already undergone a number of modifications by the authors, is overall very well written and organized, with clear objectives. This type of paper certainly deserves to be brought to the hydrological community. I have a few concerns, though, that I think should be addressed before the paper is considered for final publication. They overlap, to some extent, with those already expressed previously by one reviewer. The authors will decide whether these comments can be addressed by modifying the text alone or whether additional trials are needed.
In my opinion, it would probably have been better to explore the parameter space a little more deeply (as emphasized by reviewer 2 previously). At the very least, should the paper be published, it is mandatory to explain why some important parameters were kept constant and what the rationale behind this decision is: otherwise, my feeling is that the paper will not be of sufficient help to readers and potential users wishing to build on this work to develop their own models.
I am not saying the selected parameter values are unsuitable, but without any *strong rationale* (physical or otherwise) supporting this choice, it is difficult, in the framework of ML/DL approaches, to justify selecting just a few values of a limited number of hyperparameters.
For instance, it is not clear why the batch size was kept at 128, or why only 64, 128 and 256 hidden units were eventually selected: why not fewer, and why nothing in between? Incidentally, is there any specific reason for choosing powers of two? I do not think any numerical constraint requires this in the present context, and the gaps between successive values are large...
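To make this concrete, here is a minimal sketch of the kind of wider, cheaper exploration I have in mind. The value ranges and configuration names (`batch_size`, `hidden_units`) are my own illustrative assumptions, not the authors' setup; each sampled configuration would be fed into the authors' existing training/validation pipeline and the spread of scores reported.

```python
import random

random.seed(0)

# Hypothetical wider search space (illustrative values only): intermediate
# batch sizes and hidden-unit counts, not just powers of two.
space = {
    "batch_size": [32, 64, 96, 128, 192, 256],
    "hidden_units": [48, 64, 96, 128, 192, 256],
}

def sample_config(space):
    # Draw one random configuration from the space; for a fixed compute
    # budget, random search typically covers the space better than a
    # coarse grid restricted to a handful of values.
    return {name: random.choice(values) for name, values in space.items()}

# Each configuration below would then be trained and validated with the
# authors' pipeline, so the sensitivity to these choices can be reported.
configs = [sample_config(space) for _ in range(20)]
for cfg in configs[:3]:
    print(cfg)
```

Even a modest random sample like this would allow the authors to state whether the results are robust to the batch size and hidden-unit count, which is all I am asking for here.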
Also, I wonder whether it would have been interesting to use sequence lengths (lookback) of up to, say, 4 years: I have not seen what the streamflow time series look like, but for some of them with strong baseflow and high multi-annual variability (as visible in some regimes of fig. 4), useful information may still be present further back in time (even more than 2 years), and the annual scale (the "regime") does not necessarily contain all the useful information by itself (a substantial body of work has been published on this topic in the past decade).
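A cheap preliminary check, before committing to longer lookbacks, would be to examine streamflow autocorrelation at multi-year lags. Below is a minimal sketch of what I mean, run on a synthetic series (the data, the periods and the lag values are purely illustrative; the real test would of course use the authors' catchment series):

```python
import numpy as np

def autocorr(x, lag):
    # Pearson correlation between x[t] and x[t + lag], after removing
    # the global mean of the series.
    x = np.asarray(x, dtype=float) - np.mean(x)
    a, b = x[:-lag], x[lag:]
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Synthetic daily "streamflow": an annual cycle plus a slow multi-annual
# component, standing in for a baseflow-dominated catchment.
t = np.arange(4 * 365)
q = 1.0 + 0.5 * np.sin(2 * np.pi * t / 365) + 0.3 * np.sin(2 * np.pi * t / (3 * 365))

for lag_years in (1, 2, 3):
    r = autocorr(q, lag_years * 365)
    print(f"lag {lag_years} yr: r = {r:.2f}")
```

If correlations at lags beyond one year remained non-negligible for some regimes, that would be a direct argument for testing lookbacks longer than the annual scale.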
Without such an exploration, it will probably be difficult to provide a meaningful answer to question Q4 about "[...] the most effective way of using LSTM for making runoff predictions", in my opinion...
- Introduction section, line 52: I think that conflating the hysteretic behavior and the memory length of a catchment is not strictly correct: the former relates mainly to the lagged response to the input, the latter to the time taken by the system to dissipate the information of the input.
- Introduction section, line 54: remove "ground" (!?) and just keep "aquifers".
- Section 3.2: I understand the arguments supporting the choice of classical standardization instead of the usual min-max scaling. Yet, it would be interesting to indicate whether both types of scaling were actually tested (from the text it seems that no trial was made using min-max scaling, but this should be stated explicitly).
- The caption of fig. 9 seems to contradict the legend at the top of the figure (which says, for instance, "solid = mean", while the caption says "solid = training").
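Regarding the Section 3.2 comment above: a quick numerical illustration of why classical standardization can behave better than min-max scaling on skewed, flood-dominated series (the sample values below are invented purely for illustration):

```python
import numpy as np

def standardize(x):
    # Classical z-score standardization (zero mean, unit variance).
    return (x - x.mean()) / x.std()

def minmax(x):
    # Min-max scaling to [0, 1]; the extremes of the sample set the range,
    # so a single flood peak compresses everything else.
    return (x - x.min()) / (x.max() - x.min())

# A skewed, flood-like sample: one extreme value squeezes the bulk of the
# min-max-scaled data near zero, while z-scores preserve relative spread.
q = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 50.0])
print(np.round(minmax(q), 3))
print(np.round(standardize(q), 3))
```

Whether this actually matters for the authors' series is an empirical question, which is precisely why a min-max trial (or a statement that it was made) would strengthen the paper.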