This work is distributed under the Creative Commons Attribution 4.0 License.
Incremental learning for rainfall-runoff simulation on deep neural networks
Abstract. Rainfall-runoff simulation based on deep learning typically requires substantial training time on large datasets, which can hinder quick decision making in flood emergency situations. To address this issue, this study proposes an incremental learning method to accelerate rainfall-runoff simulation with deep learning models. The method consists of two components: regular training and an incremental operation. In the regular training phase, the model is trained periodically on historical data. In the incremental operation phase, the method selects representative samples from the historical data using distribution estimation metrics and time series similarity metrics, then updates the regularly trained model with the sampled data and recent data in case of emergency. The proposed method was tested on ten hydrological observation stations in the Yangtze River and Han River drainage basins, with three different modified Recurrent Neural Networks. The results show that the incremental learning method accelerates training by more than a factor of four, with only a small increase in percentage error and a small decrease in the Nash-Sutcliffe efficiency coefficient. The results also illustrate the robustness of the method across models and locations, as well as under continuous incremental scenarios. The findings indicate that the incremental learning method has great potential for rapid rainfall-runoff simulation in flood emergency decision-making.
Status: open (until 17 May 2024)
CC1: 'Comment on hess-2024-56', John Ding, 31 Mar 2024
Source of the NSE (Equation 22)
I've read with curiosity the contribution from Wuhan, PRC, on the application of LSTMs to the Yangtze and Han Rivers using an NSE as a performance metric, Chen et al. (2024). Their Equation 22 is called the NSE, the Nash-Sutcliffe efficiency coefficient, but I can't find a reference for it in the manuscript.
Equation 22 is the same as Equation 1 in Bassi et al. (2024, and CC1 therein). Both are of the same form as the coefficient of determination, R^2, in Ding (1974, Equations 40, 47) and the NDE (Nash-Ding efficiency) in Duc and Sawada (2023, Equation 3).
Is Equation 22 an NSE in name, but an NDE in fact?
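For concreteness, the conventional form of the NSE, with observed flows Q_o^t, simulated flows Q_s^t and observed mean flow over the evaluation period, is:

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(Q_o^t - Q_s^t\right)^2}{\sum_{t=1}^{T}\left(Q_o^t - \bar{Q}_o\right)^2}
```

Whether the manuscript's Equation 22 matches this form or the NDE variant is exactly the question raised above.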
References
Bassi, A., Höge, M., Mira, A., Fenicia, F., and Albert, C.: Learning Landscape Features from Streamflow with Autoencoders, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-47, in review, 2024.
Chen, Z., Li, J., Xiao, C., and Chen, N.: Incremental learning for rainfall-runoff simulation on deep neural networks, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-56, in review, 2024.
Ding, J. Y.: Variable unit hydrograph, J. Hydrol., 22, 53–69, 1974.
Duc, L. and Sawada, Y.: A signal-processing-based interpretation of the Nash–Sutcliffe efficiency, Hydrol. Earth Syst. Sci., 27, 1827–1839, https://doi.org/10.5194/hess-27-1827-2023, 2023.
Citation: https://doi.org/10.5194/hess-2024-56-CC1
RC1: 'Comment on hess-2024-56', Anonymous Referee #1, 24 Apr 2024
This paper describes a method for increasing the speed of rainfall-runoff recurrent neural network (RNN) model training to reduce the time taken during emergency decision-making situations. It is based on regular training of an RNN plus an incremental training operation. The method is tested on 10 gauging stations on the Yangtze/Han basin using the LSTM and GRU forms of RNN. The method is reported to increase the speed of model training whilst not significantly decreasing model performance.
My general comments on this paper are:
- The novelty of the method is not sufficiently described. Currently, it is common practice to train a model on most of the data and then ‘finetune’, or update the model, using a smaller selection of previously unseen data (see the NeuralHydrology package as an example: https://neuralhydrology.readthedocs.io/en/latest/tutorials/finetuning.html ). As this finetuning is what would currently be performed in practice in the situation described here (updating a model during an emergency to avoid the time needed for complete retraining), the novelty of this paper appears to be in the selection of the finetuning data. However, the need for the new selection method is not made clear.
- There is a lack of reference to current state-of-the-art rainfall-runoff modelling with RNNs. The rather significant body of work around rainfall-runoff modelling with the types of RNNs used here (LSTMs, GRUs) is not mentioned at all. Nor is the machine learning practice of ‘finetuning’, which the proposed method is based on, and the use of it in previous rainfall-runoff applications. Citations relating to rainfall-runoff modelling appear to be from the local area, and do not reflect the significant global developments in the field of rainfall-runoff modelling with machine learning.
- The baseline conditions being compared to are not appropriate. Here, the baseline is an RNN trained with the entire dataset; more appropriately, the baseline should be a model that has been trained and then finetuned on a smaller selection of data. This is the method that would currently be used when a model needs to be updated quickly with a small amount of newly acquired data, which is the premise of this paper. When comparing the proposed method to a baseline, the current state of the art (a finetuned model) should be used as the baseline, presumably against a model finetuned with data selected by the proposed method.
- Model setup and training is not performed to currently accepted standards. Hyperparameter tuning, a basic necessity of any machine learning model training procedure, is not performed at all; instead, values are merely copied from other unrelated studies. In Lines 337 and 358, it is indicated that the study results are demonstrated ‘under proper hyperparameter settings’, which is apparently not the case. Also, the data appears not to have been split into training, validation and testing sets, to ensure that the reported test metrics are obtained on data that was not used during model training. Data splitting is a staple necessity of machine learning model training to avoid data cross-contamination. If the models are not set up and trained to best practice, why would readers trust the results? There is no indication that the results would hold when readers apply the method to rigorously set-up and trained models.
- The method is not described well enough to follow. I was unable to see how the novel contribution – the selection of ‘partial data’ – was obtained. There is not sufficient explanation to understand this, there is little flow between sentences in the methods section, and many sentences are incomprehensible.
- The stated results are not supported by the reported metrics (that I can tell). The tables of results are visually difficult to comprehend and I am unable to find the stated conclusions within them.
- The overall presentation of the paper is poor. Many sentences are incomprehensible. Confusing terms are used that are not explained and appear to not be related to the proposed method. Much editing is required to ensure sentences are clearly formed and meaningful.
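On the data-splitting point raised above, a chronological (unshuffled) split is the minimal safeguard for time series; a sketch, where the array and the split fractions are purely illustrative:

```python
import numpy as np

def chrono_split(x, frac_train=0.6, frac_val=0.2):
    """Chronological train/validation/test split (no shuffling), so
    that test metrics are computed on data the model never saw."""
    n = len(x)
    i_tr, i_va = int(n * frac_train), int(n * (frac_train + frac_val))
    return x[:i_tr], x[i_tr:i_va], x[i_va:]

runoff = np.arange(100.0)  # stand-in for a daily discharge record
train, val, test = chrono_split(runoff)
```

With streamflow data, shuffling before splitting would leak temporally correlated samples into the test set, so the split must respect time order.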
Abstract
The basis for this study as mentioned in the first sentence, that ‘deep learning always costs plenty of time for training’, is too vague. Why would it not be optimal to use a pre-trained model, as is current best practice? The need for the proposed method is not made clear.
Introduction
The ‘significant consumption of time’ of training a regular model that makes this method necessary is not described. Is this using an HPC cluster? Or a laptop? Why would one train a model from scratch during an emergency situation? This all needs further clarification.
The two-page long single paragraph (obviously far too long) beginning ‘Incremental learning…’ appears to consist of random sentences from other papers (with appropriate citations given). For example from line 64: ‘The reparameterization leads to a factorized rotation of the parameter space and makes the diagonal Fisher information matrix assumption more applicable’ - most of these terms have not been used before in this paper and will not be used again. The flow does not make sense and much of it seems irrelevant.
The description of incremental learning in line 40 as ‘….learning from…different tasks or domains to solve the future problem with historical experience’ sounds like a description of transfer learning. In line 57, ‘It is found that neural network required fewer training epochs to reach a target error on a new task after having learned other similar tasks’ also describes transfer learning or finetuning, with no reference to either of these well-established machine learning methods. If the proposed method is based on these methods, they should be discussed.
Many sentences are undecipherable, for example:
- Line 40: ‘The main goal of incremental learning can be described as performing well both in historical tasks.’
- Line 46: ‘In raw replay methods, a buffer is usually set to store part of the historical data, which avoids frequent data selection when incremental data come while adds memory overhead.’
- Line 66: ‘…learning is slowed down by weights that are important to the previous task. Specifically, the learning of the important weights that are important to the previous tasks is slowed down.’
- Line 105: ‘Owing to the temporal characters of rainfall-runoff data, the similarity measurements for time series can be integrated to partial representative replayed data selection standards of the incremental learning method.’ (??)
Many terms are used in an unclear and unexplained manner, for example: catastrophic forgetting (line 56), important weight (line 69), path integrals (line 73), SI (line 74), ICARL (line 79), etc. These are not explained and the relevance to the paper is not well-defined.
Method
The overall benefit of the method - including historical data in the incremental learning process - is not made clear. Why not just finetune with the new data?
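For contrast, plain finetuning is simply a warm-start update on the new data alone. A toy sketch with a linear surrogate standing in for the RNN (all names, sizes and data here are illustrative, not taken from the manuscript):

```python
import numpy as np

def fit(X, y, w=None, lr=0.01, epochs=500):
    """Least-squares training of a toy linear surrogate by gradient
    descent. Passing pretrained weights `w` makes this a finetuning
    (warm-start) update rather than training from scratch."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])           # ground-truth response
X_hist = rng.normal(size=(1000, 3))           # historical record
w_pre = fit(X_hist, X_hist @ true_w)          # regular offline training
X_new = rng.normal(size=(50, 3))              # small batch of recent data
w_fine = fit(X_new, X_new @ true_w, w=w_pre, epochs=50)  # quick finetune
```

The finetuning step uses an order of magnitude fewer epochs and samples than the offline phase, which is the efficiency baseline the proposed method needs to be compared against.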
A reader could not recreate this experiment given the information here. The method section appears to consist of random sentences pieced together.
- There is no sufficient explanation of how historical data is combined (ie. Line 165)
- Line 164: ‘…the weights of the calculation result of the difference are assigned and the replay scores are obtained’ does not clearly describe how the replay scores are obtained.
- New terms are introduced and not explained: ‘depth model parameters’, ‘moment model’, ‘incremental meta sample’. These are only used once and never referred to again, increasing confusion.
Line 152: ‘Our method is based on regular network training, and as a result, the amount of calculation is significantly reduced, resulting in a notable acceleration of the training process.’ Why is it a result that the amount of calculation is reduced if using regular network training?
Line 158 refers to this method handling the ‘error problem of the network’ when this error problem is never described.
Figures:
- Figure 2: the image of a feed-forward network used repeatedly here is confusing, when the paper is about recurrent neural networks.
- Figure 4: no mention of what the letters on the diagram refer to (eg. R, FC).
Line 214: ‘..part of the changed hyperparameters’. What is changed about the hyperparameters? And changed from what, as hyperparameters haven’t been mentioned yet?
Time is apparently one of the main metrics of this study and there is no description of how it is calculated (eg. start point, end point, what are the computing conditions, etc.).
Hyperparameter tuning is non-existent. This should be performed with a grid search (or other accepted method) and the choices made should be documented. Borrowing values from other unrelated studies, as described in Lines 282-286 (currently in Results section, should be in Method section), is not adequate. Python packages exist to do this quickly or it is simple to code for yourself. Selection of network size, lookback length and learning rate need to be included.
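A minimal grid search over exactly those three hyperparameters could look like the following; the search space and the validation-scoring stub are illustrative placeholders, not values from the study:

```python
from itertools import product

# Hypothetical search space for the hyperparameters the study
# should have tuned: hidden size, lookback window, learning rate.
grid = {
    "hidden_size": [32, 64, 128],
    "lookback":    [30, 60, 90],
    "lr":          [1e-2, 1e-3],
}

def validation_nse(cfg):
    """Placeholder: in practice, train on the training split with
    `cfg` and return NSE on the validation split. Stubbed here so
    the search loop itself is runnable."""
    return -abs(cfg["hidden_size"] - 64) - abs(cfg["lr"] - 1e-3)

best_cfg, best_score = None, float("-inf")
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = validation_nse(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

Each configuration would be trained on the training split and scored on the validation split, with the test split held out entirely; the chosen configuration should then be documented in the paper.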
Results
More evidence (clearly presented) should be documented to support the findings. Some numbers are given indicating good performance but there is little evidence to support them. Moreover, the stated results often appear to contradict themselves:
- Line 293: ‘The incremental learning method is seen to have a higher RMSE and lower NSE than the baseline method at all stations, indicating better performance in forecasting.’ Would a lower RMSE and higher NSE not indicate better performance?
- Line 294: ‘the RMSE of the incremental learning method increases by 6.8% to 17.9% compared to the baseline method’ followed by Line 297: ‘These results suggest that the incremental learning method is effective in improving the efficiency of hydrological forecasting… while maintaining an acceptable range of model training errors.’ How is it determined that an increase of 17.9% is within an acceptable range? In Line 344 this is referred to as a ‘smaller error compared to the baseline model’. This needs more explanation.
- Line 302: ‘….with an error range of around 1%.’ There is no description of how this 1% is calculated or what it means.
- Line 310: ‘It can be obviously concluded from table 1 and table 2 that when the data size is at 20% of the entire dataset, if the model training time increases by more than 4 times, and the difference in error is less than 5%, and the difference in ratio-based metrics is less than 0.08.’ This sentence is unclear. I do not know what the 20% refers to, and it is very difficult to find the 4 times increase and difference in error (what error?) of 5% on the tables. What is ratio-based metrics – these have not been described and it is unclear how the value of 0.08 has been determined?
- Line 323: ‘Specifically, the run-time difference reaches over 4 times, the PE increase less than 3%, the NSE decease less than 0.05.’ Again, support for this claim cannot be found in the results.
- Line 344: ‘However, it is notable that the baseline model and the incremental learning method had a higher error in the Han River basin than in the Yangtze River basin, likely due to the similar climatic conditions and rainfall patterns between the two regions.’ If conditions are similar, why are errors expected to be different?
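On the Line 293 quote above: lower RMSE and higher NSE do indicate better performance, as a small numeric check confirms (the observation and simulation values here are illustrative):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error: lower is better."""
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 for a perfect fit, 0 for a model
    no better than predicting the observed mean, negative below that."""
    return float(1 - np.sum((obs - sim) ** 2)
                   / np.sum((obs - obs.mean()) ** 2))

obs = np.array([2.0, 4.0, 6.0, 8.0])
good = obs + 0.1               # small error: low RMSE, NSE near 1
bad = np.full(4, obs.mean())   # mean predictor: NSE exactly 0
```

So a method with *higher* RMSE and *lower* NSE than the baseline, as quoted, performs worse, not better.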
The tables would perhaps be more effective if displayed as graphs. The reader needs to be able to compare and comprehend the difference between the values, not just the values themselves, as it is the differences that are referred to as the main conclusions.
Figures:
- Figure 5: what are the units on the y-axis of the Time plot? These results appear suspiciously close together, in the range [4.1-4.5], for all of the stations and all of the models. How is this explained?
- Figures 6 and 7 should be combined into one figure.
Many phrases are incomprehensible, for example Line 329: ‘…which imply that when the incremental data are taken as continuously input, the incremental learning method gives the deep learning models the ability to continuous incremental learning.’ and Line 360: ‘Besides, the similar increase intensity of evaluation metrics differences shows that….’
Again, terms are used that are not described and are not used again, eg.: ‘distribution rules’, ‘weak self-adaptivity’, etc.
Conclusion
The three listed conclusions are unclear and unsupported. In the second point, the claim that the proposed method ‘…guarantee percentage error increase and NSE decrease less than 5%’ has not been clearly demonstrated.
Citation: https://doi.org/10.5194/hess-2024-56-RC1