the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Incremental learning for rainfall-runoff simulation on deep neural networks
Abstract. Rainfall-runoff simulation based on deep learning always costs plenty of time for training with large datasets. This may affect quick decision making in some flood emergency decision-making situations. To address this issue, this study proposes an incremental learning method to accelerate rainfall-runoff simulation with deep learning model. The method consists of two components, regular training and incremental operation. In regular training phase, the model is regularly trained using historical data. In the incremental operation phase, the method selects representative samples from historical data with distribution estimation metrics and time series similarity metrics, then updates the regularly trained model with the sampled data and recent data in case of emergency. The proposed method was tested using ten hydrological observation stations in the Yangtze River and Han River drainage basin, with three different modified Recurrent Neural Networks. The results show that the incremental learning method achieves a training efficiency acceleration of over 4 times, with only a little increase in percentage error and decrease in Nash-Sutcliffe efficiency coefficient. The results also illustrate the robustness of this method for different models in different places, as well as during continuous incremental scenarios. The findings indicate that the incremental learning method has great potential applications in rapid rainfall-runoff simulation for flood emergency decision-making.
- Preprint
(4117 KB) - Metadata XML
- BibTeX
- EndNote
Status: closed
-
CC1: 'Comment on hess-2024-56', John Ding, 31 Mar 2024
Source of the NSE (Equation 22)
I've read with curiosity the contribution from Wuhan, PRC, on application of LSTMs on the Yangtze and Han Rivers using an NSE as a performance metric, Chen et al. (2024). Their Equation 22 is called the NSE, Nash-Sutcliffe efficiency coefficient, but I can't find a reference in the manuscript.
Equation 22 is same as Equation 1 in Bassi et al. (2024, and CC1 therein). Both are of same form as the coefficient of determination, R^2, in Ding (1974, Equations 40, 47) and the NDE (Nash-Ding efficiency) in Duc and Sawada (2023, Equation 3).
Is Equation 22 an NSE in name, but an NDE in fact?
References
Bassi, A., Höge, M., Mira, A., Fenicia, F., and Albert, C.: Learning Landscape Features from Streamflow with Autoencoders, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-47, in review, 2024.
Chen, Z., Li, J., Xiao, C., and Chen, N.: Incremental learning for rainfall-runoff simulation on deep neural networks, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-56, in review, 2024.
Ding, J.Y., 1974. Variable unit hydrograph. J. Hydrol., 22: 53--69.
Duc, L. and Sawada, Y.: A signal-processing-based interpretation of the Nash–Sutcliffe efficiency, Hydro. Earth Syst. Sci., 27, 1827–1839, https://doi.org/10.5194/hess-27-1827-2023, 2023.
Citation: https://doi.org/10.5194/hess-2024-56-CC1 -
AC3: 'Reply on CC1', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC3-supplement.pdf
-
AC3: 'Reply on CC1', Changjiang Xiao, 01 Jul 2024
-
RC1: 'Comment on hess-2024-56', Anonymous Referee #1, 24 Apr 2024
This paper describes a method for increasing the speed of rainfall-runoff recurrent neural network (RNN) model training to reduce the time taken during emergency decision-making situations. It is based on regular training of an RNN plus an incremental training operation. The method is tested on 10 gauging stations on the Yangtze/Han basin using the LSTM and GRU forms of RNN. The method is reported to increase the speed of model training whilst not significantly decreasing model performance.
My general comments on this paper are:
- The novelty of the method is not sufficiently described. Currently, it is common practice to train a model on most of the data and then ‘finetune’, or update the model, using a smaller selection of previously unseen data (see the NeuralHydrology package as an example: https://neuralhydrology.readthedocs.io/en/latest/tutorials/finetuning.html ). As this finetuning would currently be the method performed in practise under the situation described here (updating a model during an emergency situation to avoid the time needed for complete retraining), the novelty of this paper appears to be in the selection of the finetuning data. However, the need for the new selection method is not made clear.
- There is a lack of reference to current state-of-the-art rainfall-runoff modelling with RNNs. The rather significant body of work around rainfall-runoff modelling with the types of RNNs used here (LSTMs, GRUs) is not mentioned at all. Nor is the machine learning practice of ‘finetuning’, which the proposed method is based on, and the use of it in previous rainfall-runoff applications. Citations relating to rainfall-runoff modelling appear to be from the local area, and do not reflect the significant global developments in the field of rainfall-runoff modelling with machine learning.
- The baseline conditions being compared to are not appropriate. Here, the baseline is an RNN trained with the entire dataset. Whereas, more appropriately, the baseline conditions should be a model that has been trained and then finetuned on a smaller selection of data. This is the current method that would be used in the case of wanting to update a model quickly based on a small amount of newly acquired data, which is the premise of this paper. When comparing the proposed method to a baseline method, the current state-of-the-art (a finetuned model) should be used as baseline. This would then presumably be compared to one finetuned with data selected by the proposed method.
- Model setup and training is not performed to currently accepted standards. Hyperparameter tuning, a basic necessity of any machine learning model training procedure, is not performed at all. Instead, values are merely copied from other unrelated studies. In Lines 337 and 358, it is indicated that the study results are demonstrated ‘under proper hyperparameter settings’ which is apparently not the case. Also, the data appears not to have been split into training, validation and testing sets, to ensure that the reported test metrics are obtained on data that was not used during model training. Data splitting is a staple necessity of machine learning model training to avoid data cross-contamination. If the models are not setup and trained to best practice, why would readers trust the results? There’s no indication that the results would hold when readers apply them to rigorously setup and trained models.
- The method is not described well enough to follow. I was unable to see how the novel contribution – the selection of ‘partial data’ – was obtained. There is not sufficient explanation to understand this, there is little flow between sentences in the methods section, and many sentences are incomprehensible.
- The stated results are not supported by the reported metrics (that I can tell). The tables of results are visually difficult to comprehend and I am unable to find the stated conclusions within them.
- The overall presentation of the paper is poor. Many sentences are incomprehensible. Confusing terms are used that are not explained and appear to not be related to the proposed method. Much editing is required to ensure sentences are clearly formed and meaningful.
Abstract
The basis for this study as mentioned in the first sentence, that ‘deep learning always costs plenty of time for training’, is too vague. Why would it not be optimal to use a pre-trained model, as is current best practice? The need for the proposed method is not made clear.
Introduction
The ‘significant consumption of time’ of training a regular model that makes this method necessary is not described. Is this using an HPC cluster? Or a laptop? Why would one train a model from scratch during an emergency situation? This all needs further clarification.
The two-page long single paragraph (obviously far too long) beginning ‘Incremental learning…’ appears to consist of random sentences from other papers (with appropriate citations given). For example from line 64: ‘The reparameterization leads to a factorized rotation of the parameter space and makes the diagonal Fisher information matrix assumption more applicable’ - most of these terms have not been used before in this paper and will not be used again. The flow does not make sense and much of it seems irrelevant.
The description of incremental learning in line 40 as ‘….learning from…different tasks or domains to solve the future problem with historical experience’ sounds like a description of transfer learning. In line 57, ‘It is found that neural network required fewer training epochs to reach a target error on a new task after having learned other similar tasks’ also describes transfer learning or finetuning, with no reference to either of these well-established machine learning methods. If the proposed method is based on these methods, they should be discussed.
Many sentences are undecipherable, for example:
- Line 40: ‘The main goal of incremental learning can be described as performing well both in historical tasks.’
- Line 46: ‘In raw replay methods, a buffer is usually set to store part of the historical data, which avoids frequent data selection when incremental data come while adds memory overhead.’
- Line 66: ‘…learning is slowed down by weights that are important to the previous task. Specifically, the learning of the important weights that are important to the previous tasks is slowed down.’
- Line 105: ‘Owing to the temporal characters of rainfall-runoff data, the similarity measurements for time series can be integrated to partial representative replayed data selection standards of the incremental learning method.’ (??)
Many terms are used in an unclear and unexplained manner, for example: catastrophic forgetting (line 56), important weight (line 69), path integrals (line 73), SI (line 74), ICARL (line 79), etc. These are not explained and the relevance to the paper is not well-defined.
Method
The overall benefit of the method - including historical data in the incremental learning process - is not made clear. Why not just finetune with the new data?
A reader could not recreate this experiment given the information here. The method section appears to consist of random sentences pieced together.
- There is no sufficient explanation of how historical data is combined (ie. Line 165)
- Line 164: ‘…the weights of the calculation result of the difference are assigned and the replay scores are obtained’ does not clearly describe how the replay scores are obtained.
- New terms are introduced and not explained: ‘depth model parameters’, ‘moment model’, ‘incremental meta sample’. These are only used once and never referred to again, increasing confusion.
Line 152: ‘Our method is based on regular network training, and as a result, the amount of calculation is significantly reduced, resulting in a notable acceleration of the training process.’ Why is it a result that the amount of calculation is reduced if using regular network training?
Line 158 refers to this method handling the ‘error problem of the network’ when this error problem is never described.
Figures:
- Figure 2: the image of a feed-forward network used repeatedly here is confusing, when the paper is about recurrent neural networks.
- Figure 4: no mention of what the letters on the diagram refer to (eg. R, FC).
Line 214: ‘..part of the changed hyperparameters’. What is changed about the hyperparameters? And changed from what, as hyperparameters haven’t been mentioned yet?
Time is apparently one of the main metrics of this study and there is no description of how it is calculated (eg. start point, end point, what are the computing conditions, etc.).
Hyperparameter tuning is non-existent. This should be performed with a grid search (or other accepted method) and the choices made should be documented. Borrowing values from other unrelated studies, as described in Lines 282-286 (currently in Results section, should be in Method section), is not adequate. Python packages exist to do this quickly or it is simple to code for yourself. Selection of network size, lookback length and learning rate need to be included.
Results
More evidence (clearly presented) should be documented to support the findings. Some numbers are given indicating good performance but there is little evidence to support them. Moreover, the stated results often appear to contradict themselves:
- Line 293: ‘The incremental learning method is seen to have a higher RMSE and lower NSE than the baseline method at all stations, indicating better performance in forecasting.’ Would a lower RMSE and higher NSE not indicate better performance?
- Line 294: ‘the RMSE of the incremental learning method increases by 6.8% to 17.9% compared to the baseline method’ followed by Line 297: ‘These results suggest that the incremental learning method is effective in improving the efficiency of hydrological forecasting… while maintaining an acceptable range of model training errors.’ How is it determined that an increase of 17.9% is within an acceptable range? In Line 344 this is referred to as a ‘smaller error compared to the baseline model’. This needs more explanation.
- Line 302: ‘….with an error range of around 1%.’ There is no description of how this 1% is calculated or what it means.
- Line 310: ‘It can be obviously concluded from table 1 and table 2 that when the data size is at 20% of the entire dataset, if the model training time increases by more than 4 times, and the difference in error is less than 5%, and the difference in ratio-based metrics is less than 0.08.’ This sentence is unclear. I do not know what the 20% refers to, and it is very difficult to find the 4 times increase and difference in error (what error?) of 5% on the tables. What is ratio-based metrics – these have not been described and it is unclear how the value of 0.08 has been determined?
- Line 323: ‘Specifically, the run-time difference reaches over 4 times, the PE increase less than 3%, the NSE decease less than 0.05.’ Again, support for this is claim cannot be found in the results.
- Line 344: ‘However, it is notable that the baseline model and the incremental learning method had a higher error in the Han River basin than in the Yangtze River basin, likely due to the similar climatic conditions and rainfall patterns between the two regions.’ If conditions are similar, why are errors expected to be different?
The tables would perhaps be more effective if displayed as graphs. The reader needs to be able to compare and comprehend the difference between the values, not just the values themselves, as it is the differences that are referred to as the main conclusions.
Figures:
- Figure 5: what are the units on the y-axis of the Time plot? These results appear suspiciously close together, in the range [4.1-4.5], for all of the stations and all of the models. How is this explained?
- Figures 6 and 7 should be combined into one figure.
Many phrases are incomprehensible, for example Line 329: ‘…which imply that when the incremental data are taken as continuously input, the incremental learning method gives the deep learning models the ability to continuous incremental learning.’ and Line 360: ‘Besides, the similar increase intensity of evaluation metrics differences shows that….’
Again, terms are used that are not described and are not used again, eg.: ‘distribution rules’, ‘weak self-adaptivity’, etc.
Conclusion
The three listed conclusions are unclear and unsupported. In the second point, the claim that the proposed method ‘…guarantee percentage error increase and NSE decrease less than 5%’ has not been clearly demonstrated.
Citation: https://doi.org/10.5194/hess-2024-56-RC1 -
AC1: 'Reply on RC1', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC1-supplement.pdf
-
RC2: 'Comment on hess-2024-56', Anonymous Referee #2, 30 May 2024
The paper describes an incremental learning method to accelerate rainfall/runoff simulation with recurrent neural networks (RNN) to speed up decision-making situations during extreme events. The model is trained on historical data and regularly updated with data coming from emergency situations. The model was tested on 10 stations in the Yangtze and Han River in China with three different kinds of RNNs. The authors claim that the proposed method speeds up training efficiency at cost of little performance degradation.
General comments:
- The explanation of the proposed incremental learning method lacks clarity, and the novelty is arguable due to insufficient referencing of prior and similar work. Specifically, the proposed method seems to innovate in how incoming data is selected, yet the lack of references makes it difficult to judge. The explanation provided in section 2.2 is not made sufficiently clear and no reference is provided from line 140 to 215. Moreover, the distinctions between the three proposed incremental learning scenarios are not well defined.
- The study's regional focus limits its broader applicability. A model based on studies from a larger area might offer better generalization capabilities and could serve as a more robust baseline to refine.
- The paper deviates from standard dataset division into training, validation, and test splits, common in deep learning, which allows hyperparameter selection and model generalization. Hyper-parameter selection is not done at all and most of the hyperparameters are not reported (e.g. the length of training time series, number of LSTM memory cells, training epochs etc. ). The dataset’s parameters are vague: where and when is it trained on? The reported results seem to be from training data only. Therefore, the relevance of the results is questionable: if, for instance, all the models are overfitting the data, it is not surprising that the performance is similar when training with about 20 % of the data, which of course accelerates training significantly.
- The paper's text is difficult to follow due to its disorganized structure and numerous grammatical errors, which disrupt the reader’s understanding.
Specific comments:
- A lot of concepts are defined in the introduction and never repeated, e.g.
- Generative Adversarial Network (GANs), line 48, without reference.
- Elastic Weight Consolidation (EWC), line 65.
- Memory Aware Synapses (MAS), line 69.
- Remanian (do you mean Riemannian?) Walk, line 72.
- ICARL, line 79.
- ….
What is the relevance of these citations for the proposed method?
- Many sentences are not clear, e.g.
- “The main goal of incremental learning can be described as performing well both in historical tasks.”, lines 40-41. In historical tasks and where?
- “Preliminary conclusion can be drawn from mentioned methods and related literatures is that the similarity/dissimilarity of time series depends on the target of utilizing the similarity, that so far most of the researches propose various measurement methods from time and global or local structural features based on relatively small dataset and that among the methods the most common methods such as Euclidean distance and DTW show high performance with relatively simple idea.”, lines 101-105. This phrase is too long, it has some grammatical mistakes, and it is generally difficult to comprehend.
- “We combine data distribution estimation, temporal similarity, and regularization methods to improve.”, lines 142-143. To improve what?
- “Skewness and Kurtosis are selected as the distribution estimation metrics and standardized Euclidean distance works as the time series similarity metric, the calculation process can be formulated as the following.”, lines 182-184.
- “Then calculating the importance for each parameter in the network is attached to the loss function of the network, as regularization constraint.”, lines 198-199. Which is the subject here?
- “However, the results show relatively weak self-adaptivity lower the ability of the online learning of the incremental learning method hard to handle the incremental data with rapidly changeable distribution.” Lines 323-325
- ….
- Figures:
- Figure 2: for clarity define DNN here (defined on main text at line 23)
- Figure 3: This figure is quite confusing: how is slicing performed here? Are we measuring the similarity between what? How is new data selected?
- Figure 4: FC and R are not defined here and everywhere in the text. Why repeating R if it is a RNN? The picture of the LSTM does not referee the memory cell. Shouldn´t it be C_{t-1} instead of h__{t-1} ?
- Figures 5: is this plot relative to the attention-LSTM? And the other DNN models?
- Line 203: “𝜃𝜃∗𝑃𝑃,𝑖𝑖 is the standard to evaluate the parameter, which represents the difference between the previous and incremental meta sample”, what does it mean? What is a meta sample here?
- Line 208: “ When the incremental data come at some time, both baseline and the incremental learning method are performed.” Was the baseline not trained once and for all with all the data available?
- Line 210: “…three incremental tasks…”. Where are these tasks defined and discussed?
- Line 270: the NSE is not referenced.
- Tables: the tables show only the results for attention-LSTM. Where are the results for attention-GRU and attention-RNN?
- Line 312: do you mean “Good ability on continuous incremental learning?”
- Lines 357-358. The hyper-parameters are not reported, and the results are not robust due to the lack of validation and test splits.
- ….
Citation: https://doi.org/10.5194/hess-2024-56-RC2 -
AC2: 'Reply on RC2', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC2-supplement.pdf
Status: closed
-
CC1: 'Comment on hess-2024-56', John Ding, 31 Mar 2024
Source of the NSE (Equation 22)
I've read with curiosity the contribution from Wuhan, PRC, on application of LSTMs on the Yangtze and Han Rivers using an NSE as a performance metric, Chen et al. (2024). Their Equation 22 is called the NSE, Nash-Sutcliffe efficiency coefficient, but I can't find a reference in the manuscript.
Equation 22 is same as Equation 1 in Bassi et al. (2024, and CC1 therein). Both are of same form as the coefficient of determination, R^2, in Ding (1974, Equations 40, 47) and the NDE (Nash-Ding efficiency) in Duc and Sawada (2023, Equation 3).
Is Equation 22 an NSE in name, but an NDE in fact?
References
Bassi, A., Höge, M., Mira, A., Fenicia, F., and Albert, C.: Learning Landscape Features from Streamflow with Autoencoders, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-47, in review, 2024.
Chen, Z., Li, J., Xiao, C., and Chen, N.: Incremental learning for rainfall-runoff simulation on deep neural networks, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-56, in review, 2024.
Ding, J.Y., 1974. Variable unit hydrograph. J. Hydrol., 22: 53--69.
Duc, L. and Sawada, Y.: A signal-processing-based interpretation of the Nash–Sutcliffe efficiency, Hydro. Earth Syst. Sci., 27, 1827–1839, https://doi.org/10.5194/hess-27-1827-2023, 2023.
Citation: https://doi.org/10.5194/hess-2024-56-CC1 -
AC3: 'Reply on CC1', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC3-supplement.pdf
-
AC3: 'Reply on CC1', Changjiang Xiao, 01 Jul 2024
-
RC1: 'Comment on hess-2024-56', Anonymous Referee #1, 24 Apr 2024
This paper describes a method for increasing the speed of rainfall-runoff recurrent neural network (RNN) model training to reduce the time taken during emergency decision-making situations. It is based on regular training of an RNN plus an incremental training operation. The method is tested on 10 gauging stations on the Yangtze/Han basin using the LSTM and GRU forms of RNN. The method is reported to increase the speed of model training whilst not significantly decreasing model performance.
My general comments on this paper are:
- The novelty of the method is not sufficiently described. Currently, it is common practice to train a model on most of the data and then ‘finetune’, or update the model, using a smaller selection of previously unseen data (see the NeuralHydrology package as an example: https://neuralhydrology.readthedocs.io/en/latest/tutorials/finetuning.html ). As this finetuning would currently be the method performed in practise under the situation described here (updating a model during an emergency situation to avoid the time needed for complete retraining), the novelty of this paper appears to be in the selection of the finetuning data. However, the need for the new selection method is not made clear.
- There is a lack of reference to current state-of-the-art rainfall-runoff modelling with RNNs. The rather significant body of work around rainfall-runoff modelling with the types of RNNs used here (LSTMs, GRUs) is not mentioned at all. Nor is the machine learning practice of ‘finetuning’, which the proposed method is based on, and the use of it in previous rainfall-runoff applications. Citations relating to rainfall-runoff modelling appear to be from the local area, and do not reflect the significant global developments in the field of rainfall-runoff modelling with machine learning.
- The baseline conditions being compared to are not appropriate. Here, the baseline is an RNN trained with the entire dataset. Whereas, more appropriately, the baseline conditions should be a model that has been trained and then finetuned on a smaller selection of data. This is the current method that would be used in the case of wanting to update a model quickly based on a small amount of newly acquired data, which is the premise of this paper. When comparing the proposed method to a baseline method, the current state-of-the-art (a finetuned model) should be used as baseline. This would then presumably be compared to one finetuned with data selected by the proposed method.
- Model setup and training is not performed to currently accepted standards. Hyperparameter tuning, a basic necessity of any machine learning model training procedure, is not performed at all. Instead, values are merely copied from other unrelated studies. In Lines 337 and 358, it is indicated that the study results are demonstrated ‘under proper hyperparameter settings’ which is apparently not the case. Also, the data appears not to have been split into training, validation and testing sets, to ensure that the reported test metrics are obtained on data that was not used during model training. Data splitting is a staple necessity of machine learning model training to avoid data cross-contamination. If the models are not setup and trained to best practice, why would readers trust the results? There’s no indication that the results would hold when readers apply them to rigorously setup and trained models.
- The method is not described well enough to follow. I was unable to see how the novel contribution – the selection of ‘partial data’ – was obtained. There is not sufficient explanation to understand this, there is little flow between sentences in the methods section, and many sentences are incomprehensible.
- The stated results are not supported by the reported metrics (that I can tell). The tables of results are visually difficult to comprehend and I am unable to find the stated conclusions within them.
- The overall presentation of the paper is poor. Many sentences are incomprehensible. Confusing terms are used that are not explained and appear to not be related to the proposed method. Much editing is required to ensure sentences are clearly formed and meaningful.
Abstract
The basis for this study as mentioned in the first sentence, that ‘deep learning always costs plenty of time for training’, is too vague. Why would it not be optimal to use a pre-trained model, as is current best practice? The need for the proposed method is not made clear.
Introduction
The ‘significant consumption of time’ of training a regular model that makes this method necessary is not described. Is this using an HPC cluster? Or a laptop? Why would one train a model from scratch during an emergency situation? This all needs further clarification.
The two-page long single paragraph (obviously far too long) beginning ‘Incremental learning…’ appears to consist of random sentences from other papers (with appropriate citations given). For example from line 64: ‘The reparameterization leads to a factorized rotation of the parameter space and makes the diagonal Fisher information matrix assumption more applicable’ - most of these terms have not been used before in this paper and will not be used again. The flow does not make sense and much of it seems irrelevant.
The description of incremental learning in line 40 as ‘….learning from…different tasks or domains to solve the future problem with historical experience’ sounds like a description of transfer learning. In line 57, ‘It is found that neural network required fewer training epochs to reach a target error on a new task after having learned other similar tasks’ also describes transfer learning or finetuning, with no reference to either of these well-established machine learning methods. If the proposed method is based on these methods, they should be discussed.
Many sentences are undecipherable, for example:
- Line 40: ‘The main goal of incremental learning can be described as performing well both in historical tasks.’
- Line 46: ‘In raw replay methods, a buffer is usually set to store part of the historical data, which avoids frequent data selection when incremental data come while adds memory overhead.’
- Line 66: ‘…learning is slowed down by weights that are important to the previous task. Specifically, the learning of the important weights that are important to the previous tasks is slowed down.’
- Line 105: ‘Owing to the temporal characters of rainfall-runoff data, the similarity measurements for time series can be integrated to partial representative replayed data selection standards of the incremental learning method.’ (??)
Many terms are used in an unclear and unexplained manner, for example: catastrophic forgetting (line 56), important weight (line 69), path integrals (line 73), SI (line 74), ICARL (line 79), etc. These are not explained and the relevance to the paper is not well-defined.
Method
The overall benefit of the method - including historical data in the incremental learning process - is not made clear. Why not just finetune with the new data?
A reader could not recreate this experiment given the information here. The method section appears to consist of random sentences pieced together.
- There is no sufficient explanation of how historical data is combined (ie. Line 165)
- Line 164: ‘…the weights of the calculation result of the difference are assigned and the replay scores are obtained’ does not clearly describe how the replay scores are obtained.
- New terms are introduced and not explained: ‘depth model parameters’, ‘moment model’, ‘incremental meta sample’. These are only used once and never referred to again, increasing confusion.
Line 152: ‘Our method is based on regular network training, and as a result, the amount of calculation is significantly reduced, resulting in a notable acceleration of the training process.’ Why is it a result that the amount of calculation is reduced if using regular network training?
Line 158 refers to this method handling the ‘error problem of the network’ when this error problem is never described.
Figures:
- Figure 2: the image of a feed-forward network used repeatedly here is confusing, when the paper is about recurrent neural networks.
- Figure 4: no mention of what the letters on the diagram refer to (e.g. R, FC).
Line 214: ‘..part of the changed hyperparameters’. What is changed about the hyperparameters? And changed from what, as hyperparameters haven’t been mentioned yet?
Time is apparently one of the main metrics of this study, yet there is no description of how it is calculated (e.g. start point, end point, computing conditions, etc.).
Hyperparameter tuning is non-existent. This should be performed with a grid search (or another accepted method) and the choices made should be documented. Borrowing values from other, unrelated studies, as described in Lines 282-286 (currently in the Results section; this belongs in the Method section), is not adequate. Python packages exist to do this quickly, or it is simple to code yourself. Selection of network size, lookback length and learning rate needs to be included.
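As an illustration of the kind of documented search requested here, a minimal grid search over network size, lookback length and learning rate could look like the sketch below. The value grids and the toy objective are placeholders, not values from the paper; in a real study the objective would train the RNN with the given configuration and return a validation error:

```python
import itertools

def grid_search(objective, grid):
    """Evaluate every hyperparameter combination and keep the best.

    `objective` must return a validation score to minimise;
    `grid` maps hyperparameter names to lists of candidate values.
    """
    best_score, best_params = float("inf"), None
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = objective(**params)
        if score < best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Placeholder objective: in practice this would train the network
# and return the validation RMSE for the given configuration.
def toy_objective(hidden_size, lookback, learning_rate):
    return abs(hidden_size - 64) + abs(lookback - 24) + learning_rate

params, score = grid_search(toy_objective, {
    "hidden_size": [32, 64, 128],   # network size
    "lookback": [12, 24, 48],       # input sequence length
    "learning_rate": [1e-2, 1e-3],
})
```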
Results
More evidence (clearly presented) should be documented to support the findings. Some numbers are given indicating good performance but there is little evidence to support them. Moreover, the stated results often appear to contradict themselves:
- Line 293: ‘The incremental learning method is seen to have a higher RMSE and lower NSE than the baseline method at all stations, indicating better performance in forecasting.’ Would a lower RMSE and higher NSE not indicate better performance?
- Line 294: ‘the RMSE of the incremental learning method increases by 6.8% to 17.9% compared to the baseline method’ followed by Line 297: ‘These results suggest that the incremental learning method is effective in improving the efficiency of hydrological forecasting… while maintaining an acceptable range of model training errors.’ How is it determined that an increase of 17.9% is within an acceptable range? In Line 344 this is referred to as a ‘smaller error compared to the baseline model’. This needs more explanation.
- Line 302: ‘….with an error range of around 1%.’ There is no description of how this 1% is calculated or what it means.
- Line 310: ‘It can be obviously concluded from table 1 and table 2 that when the data size is at 20% of the entire dataset, if the model training time increases by more than 4 times, and the difference in error is less than 5%, and the difference in ratio-based metrics is less than 0.08.’ This sentence is unclear. I do not know what the 20% refers to, and it is very difficult to find the 4-times increase and the 5% difference in error (what error?) in the tables. What are ‘ratio-based metrics’? These have not been described, and it is unclear how the value of 0.08 was determined.
- Line 323: ‘Specifically, the run-time difference reaches over 4 times, the PE increase less than 3%, the NSE decease less than 0.05.’ Again, support for this claim cannot be found in the results.
- Line 344: ‘However, it is notable that the baseline model and the incremental learning method had a higher error in the Han River basin than in the Yangtze River basin, likely due to the similar climatic conditions and rainfall patterns between the two regions.’ If conditions are similar, why are errors expected to be different?
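For reference, the two metrics under discussion are defined as follows: RMSE is an error measure (lower is better), and NSE (Nash and Sutcliffe, 1970) compares the model error to the variance of the observations (higher is better, with 1 a perfect fit). A minimal sketch:

```python
import math

def rmse(obs, sim):
    """Root-mean-square error: lower is better."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE/SST.

    1 is a perfect fit; 0 means the model is no better than
    predicting the mean of the observations.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst
```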
The tables would perhaps be more effective if displayed as graphs. The reader needs to be able to compare and comprehend the difference between the values, not just the values themselves, as it is the differences that are referred to as the main conclusions.
Figures:
- Figure 5: what are the units on the y-axis of the Time plot? These results appear suspiciously close together, in the range [4.1-4.5], for all of the stations and all of the models. How is this explained?
- Figures 6 and 7 should be combined into one figure.
Many phrases are incomprehensible, for example Line 329: ‘…which imply that when the incremental data are taken as continuously input, the incremental learning method gives the deep learning models the ability to continuous incremental learning.’ and Line 360: ‘Besides, the similar increase intensity of evaluation metrics differences shows that….’
Again, terms are used that are not described and never appear again, e.g. ‘distribution rules’, ‘weak self-adaptivity’, etc.
Conclusion
The three listed conclusions are unclear and unsupported. In the second point, the claim that the proposed method ‘…guarantee percentage error increase and NSE decrease less than 5%’ has not been clearly demonstrated.
Citation: https://doi.org/10.5194/hess-2024-56-RC1
AC1: 'Reply on RC1', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC1-supplement.pdf
RC2: 'Comment on hess-2024-56', Anonymous Referee #2, 30 May 2024
The paper describes an incremental learning method that accelerates rainfall-runoff simulation with recurrent neural networks (RNNs) to speed up decision making during extreme events. The model is trained on historical data and regularly updated with data coming from emergency situations. The model was tested on 10 stations in the Yangtze and Han River basins in China with three different kinds of RNNs. The authors claim that the proposed method speeds up training at the cost of a small performance degradation.
General comments:
- The explanation of the proposed incremental learning method lacks clarity, and the novelty is arguable due to insufficient referencing of prior and similar work. Specifically, the proposed method seems to innovate in how incoming data is selected, yet the lack of references makes it difficult to judge. The explanation provided in section 2.2 is not made sufficiently clear and no reference is provided from line 140 to 215. Moreover, the distinctions between the three proposed incremental learning scenarios are not well defined.
- The study's regional focus limits its broader applicability. A model based on studies from a larger area might offer better generalization capabilities and could serve as a more robust baseline to refine.
- The paper deviates from the standard dataset division into training, validation, and test splits, common in deep learning, which allows hyperparameter selection and assessment of model generalization. Hyperparameter selection is not done at all, and most of the hyperparameters are not reported (e.g. the length of the training time series, the number of LSTM memory cells, the number of training epochs, etc.). The dataset's parameters are vague: what data is the model trained on, and over which period? The reported results seem to be from training data only. Therefore, the relevance of the results is questionable: if, for instance, all the models are overfitting the data, it is not surprising that the performance is similar when training with about 20% of the data, which of course accelerates training significantly.
- The paper's text is difficult to follow due to its disorganized structure and numerous grammatical errors, which disrupt the reader’s understanding.
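For clarity, the standard division referred to above is, for time series, a chronological split, so that validation and test data lie strictly after the training period. A minimal sketch (the 70/15/15 ratios are illustrative, not from the paper):

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split a time series into train/validation/test without shuffling.

    Validation and test data are strictly later than the training
    data, avoiding leakage of future information across boundaries.
    """
    n = len(series)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return series[:i_train], series[i_train:i_val], series[i_val:]
```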
Specific comments:
- A lot of concepts are defined in the introduction and never repeated, e.g.
- Generative Adversarial Network (GANs), line 48, without reference.
- Elastic Weight Consolidation (EWC), line 65.
- Memory Aware Synapses (MAS), line 69.
- Remanian (do you mean Riemannian?) Walk, line 72.
- ICARL, line 79.
- ….
What is the relevance of these citations for the proposed method?
- Many sentences are not clear, e.g.
- “The main goal of incremental learning can be described as performing well both in historical tasks.”, lines 40-41. In historical tasks and where?
- “Preliminary conclusion can be drawn from mentioned methods and related literatures is that the similarity/dissimilarity of time series depends on the target of utilizing the similarity, that so far most of the researches propose various measurement methods from time and global or local structural features based on relatively small dataset and that among the methods the most common methods such as Euclidean distance and DTW show high performance with relatively simple idea.”, lines 101-105. This phrase is too long, it has some grammatical mistakes, and it is generally difficult to comprehend.
- “We combine data distribution estimation, temporal similarity, and regularization methods to improve.”, lines 142-143. To improve what?
- “Skewness and Kurtosis are selected as the distribution estimation metrics and standardized Euclidean distance works as the time series similarity metric, the calculation process can be formulated as the following.”, lines 182-184.
- “Then calculating the importance for each parameter in the network is attached to the loss function of the network, as regularization constraint.”, lines 198-199. What is the subject here?
- “However, the results show relatively weak self-adaptivity lower the ability of the online learning of the incremental learning method hard to handle the incremental data with rapidly changeable distribution.” Lines 323-325
- ….
- Figures:
- Figure 2: for clarity, define DNN here (it is defined in the main text at line 23)
- Figure 3: This figure is quite confusing: how is slicing performed here? Are we measuring the similarity between what? How is new data selected?
- Figure 4: FC and R are not defined here or anywhere in the text. Why repeat R if it is an RNN? The picture of the LSTM does not show the memory cell. Shouldn't it be C_{t-1} instead of h_{t-1}?
- Figure 5: does this plot refer to the attention-LSTM? What about the other DNN models?
- Line 203: “θ*_{P,i} is the standard to evaluate the parameter, which represents the difference between the previous and incremental meta sample”, what does it mean? What is a meta sample here?
- Line 208: “ When the incremental data come at some time, both baseline and the incremental learning method are performed.” Was the baseline not trained once and for all with all the data available?
- Line 210: “…three incremental tasks…”. Where are these tasks defined and discussed?
- Line 270: the NSE is not referenced.
- Tables: the tables show only the results for attention-LSTM. Where are the results for attention-GRU and attention-RNN?
- Line 312: do you mean “Good ability on continuous incremental learning?”
- Lines 357-358. The hyper-parameters are not reported, and the results are not robust due to the lack of validation and test splits.
- ….
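For reference, the distribution and similarity metrics quoted above (manuscript lines 182-184) are standard quantities; the following is a minimal sketch of how they might be computed on plain feature vectors, as an illustration rather than the authors' implementation:

```python
import math

def skewness(x):
    """Sample skewness: third standardized central moment."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def kurtosis(x):
    """Sample kurtosis: fourth standardized central moment
    (equals 3 for a normal distribution)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / m2 ** 2

def standardized_euclidean(a, b, stds):
    """Euclidean distance with each dimension scaled by its standard
    deviation, so high-variance dimensions do not dominate the
    comparison of two time-series windows."""
    return math.sqrt(sum(((x - y) / s) ** 2
                         for x, y, s in zip(a, b, stds)))
```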
Citation: https://doi.org/10.5194/hess-2024-56-RC2
AC2: 'Reply on RC2', Changjiang Xiao, 01 Jul 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-56/hess-2024-56-AC2-supplement.pdf
Viewed

| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 457 | 208 | 40 | 705 | 31 | 27 |