|Interactive comment on “Hydrologically Informed Machine Learning for Rainfall-Runoff Modelling: Towards Distributed Modelling”|
By Herath Mudiyanselage Viraj Vidura Herath, Jayashree Chadalawada, Vladan Babovic
In this manuscript the authors use a machine learning method (Genetic Programming) to identify optimal model structures from the building blocks of two flexible rainfall-runoff modelling frameworks, i.e. FUSE and SUPERFLEX, under a semi-distributed catchment setting. In this way the authors aim to eliminate the subjectivity in model structure selection. Further, the authors apply the GLUE methodology to conduct a parameter uncertainty analysis for the selected optimal model structures. The proposed approach was evaluated using data from the Red Creek Catchment, United States. This is very interesting research that deserves encouragement, as it lies within an area where extensive research is needed, i.e. machine learning applications in hydrology. However, I have some comments on both the content and structure of the manuscript that need to be addressed before the manuscript is accepted for final publication.
The authors have provided an extensive literature review on the subject matter. However, some of the topics are less relevant and may lead the reader astray from the main subject. For example, it could suffice to present the literature on machine learning applications in water resources in a single paragraph within the introduction, rather than in a dedicated literature review section (Section 3). Similarly, the sub-section on lumped and distributed models can be removed from the manuscript, since this is a well-established topic in hydrology. On the other hand, less coverage was given to the details of certain methodologies followed in this research. For example, it would have been more helpful to provide the reader with further details on the set-up and components of Genetic Programming (GP), FUSE and SUPERFLEX by removing the literature review on less relevant topics, including the sub-section on Artificial Neural Networks (ANN), since ANN was not used in this research. Generally, with the exception of the sub-topics on GP, FUSE, and SUPERFLEX, the remaining contents of Section 2 (Fundamental approaches in Hydrological modelling) and Section 3 (Machine Learning in Water Resources) can be either omitted, or placed under the introduction or discussions sections in a concise and relevant form.
The methodology and scientific background appear blended in many sections of this manuscript. Thus, I would recommend a separate Methodology section containing only those methodologies actually followed in your study. Similarly, a 'Discussions' section is missing, and some of the paragraphs in Sections 2 to 6 appear better suited to such a section. Under it, you may compare and contrast previous research with yours with regard to the methodologies followed and the results obtained.
Although the authors have effectively applied machine learning methods for model structure identification under a distributed setting, I am a bit skeptical about some of the conclusions arrived at in relation to the methodologies followed. For example, the available dataset was divided into four categories, i.e. spin-up, calibration, validation, and test. From the manuscript it can be seen that both the calibration and validation datasets were used in training the hydrological and machine learning models (e.g. L448 and L464 in Section 5.3). Thus, out of the total length of the dataset (i.e. 11 years), only one year was allocated for model testing (2013-2014), i.e. for the actual validation of the hydrological model. The question is then whether we can conclude that the proposed methodology achieved the intended goal. Applying the hydrological model to a single hydrologic year may allow an assessment of the dominant hydrologic processes in relation to the climatic and physiographic conditions prevailing in that particular year, but it would have been more helpful to use multiple single testing years, for example through a cross-validation technique. This way the reader would gain better insight into the resulting model structures and the model test results under different conditions, and thereby a more robust model evaluation. Similarly, the uncertainty analysis procedure lacks information on how the parameter bounds and the threshold for behavioral models were set. A threshold NSE value of 0.6 was used in this study, which I think is very low for many practical applications of a hydrological model. The capability of the prediction bounds to bracket the observed values is inversely related to the threshold NSE value. This may have yielded the low modelling uncertainty (high percentage of bracketed observations) of the selected model structures reported in this manuscript.
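To make the suggested evaluation concrete, the multiple-single-testing-years idea can be sketched as a leave-one-year-out loop. All function names here (`calibrate`, `simulate`) are hypothetical placeholders for the authors' identification and simulation routines, not their actual code:

```python
# Leave-one-year-out sketch: each hydrologic year is held out once as the
# test period, and the model is identified on the remaining years. With an
# 11-year record this yields 11 independent test scores instead of one.

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated vs observed discharge."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def leave_one_year_out(years, calibrate, simulate):
    """Return {held-out year: test NSE} for a model-identification routine.

    `calibrate(train_years)` is assumed to perform the structure and
    parameter identification; `simulate(model, year)` is assumed to return
    (observed, simulated) discharge series for the held-out year.
    """
    scores = {}
    for test_year in years:
        train_years = [y for y in years if y != test_year]
        model = calibrate(train_years)          # identification on the rest
        obs, sim = simulate(model, test_year)   # independent test simulation
        scores[test_year] = nse(obs, sim)
    return scores
```

The spread of the resulting per-year NSE values would indicate how robust the identified structures are across contrasting hydrologic years.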
It would also be easier for readers who are less familiar with GP to follow the manuscript if the GP terminology were re-written in a hydrological context. For example, what do we mean by 'initial population' here? Is it a particular hydrological model component from FUSE/SUPERFLEX, or is it a set of hydrological model parameters?
L6- ‘limited use in scientific fields’ seems too broad an area to comment on.
Consider rephrasing it, e.g. 'in rainfall-runoff modelling' (accompanied by a relevant reference).
L15- rephrase ‘decreasing meaningfulness of lumped models’.
Lumped models might be preferable under certain conditions, e.g. for very small catchments in data scarce areas where distributed or semi-distributed model settings might be less practical.
L20- ‘without any subjectivity in model selection’ seems unrealistic, since all model selection algorithms involve a certain level of subjectivity, albeit to varying degrees. In your case, for example, setting the model parameter bounds (for the hydrological model) and many of the constants and assumptions related to the set-up of the machine learning model shown in Table 3 involve a certain level of subjectivity.
L35- ‘Therefore, the final goal of any successful hydrological model must be based on a physically meaningful model architecture along with a good predictive performance’
But the measure of success for a hydrological model may vary from one model to another depending on the specific purpose for which it is developed. For example, physically based models might be tailored to enhance our understanding of the underlying physical system, while conceptual models might be expected to capture the processes only partially, with the main purpose being to yield predictions of acceptable accuracy for the intended application. Further, black-box models, though offering little or no understanding of the underlying physical system, still have their own merits when the main goal of the modeler is simply to get acceptable outputs from the set of inputs, as you have mentioned in L212.
L36- ‘Data science models’. Do you mean data-driven models? Provide a reference for this sentence.
Section 2.3 – remove this section, or take selected points from it and discuss them concisely in relation to your methodology, results or conclusions (under the Discussions section).
L167-174 – provide references
L212- ‘Certainly, if we are only interested in better forecasting results then, the machine learning models might be the preferred choice over the conceptual or process-based models due to their better predictive capability’.
But can we make this generalization in light of the multiple factors affecting the relative performance of machine learning models, including the length of the training dataset and the nature of the training algorithm? Provide a reference.
L215- ‘actionable models’. Rephrase in hydrological context
L222- ‘Further, data science models…’ Do you mean: However, data…
This seems to contradict your previous statement in L218: ‘… offer two reasons for the limited success of data driven models’.
L286-288- sentence not clear. Re-write, for example, as: Individuals with better performance (based on the objective function values) are assigned higher probability of selection and thereby given the chance to create offspring through genetic operators (crossover, mutation, and elitism).
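The fitness-proportionate selection described in the suggested rewording can be sketched as a standard roulette-wheel draw. This is an illustrative, generic implementation, not the authors' own; in their hydrological context, the 'individuals' would be candidate model structures assembled from FUSE/SUPERFLEX building blocks, and the fitness would be the objective-function value (e.g. NSE) on the calibration data:

```python
# Roulette-wheel (fitness-proportionate) selection: an individual's chance
# of becoming a parent is proportional to its fitness, so better-performing
# model structures are more likely to create offspring via crossover,
# mutation, and elitism.
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one parent with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off
```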
L306 – mentioned?
L306-310 – long sentence, re-write with shorter sentences
L329- regularized? regular?
L355- ‘GP has been selected as the machine learning technique here due to its ability to optimize both model configuration and model parameters together’.
Was GP used to simultaneously optimize both the hydrological model structure and the parameters in this study? If so, how was GP used for parameter optimization of the hydrological model? (The illustrated procedure focused on model structure.) If not, how were the hydrological model parameters optimized before conducting the uncertainty analysis (UA)?
L377- how was the shape parameter value (2.5) of the Gamma-distribution based routing function determined?
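To clarify why the shape parameter matters here: a Gamma-distribution routing function spreads runoff over time, and the shape parameter controls when the kernel peaks. The following sketch is purely illustrative (the discretisation, time step, and renormalisation are my assumptions, not the manuscript's implementation):

```python
# Discrete unit-hydrograph ordinates from a Gamma pdf. With shape k and
# scale theta the kernel peaks at t = (k - 1) * theta, so the choice of
# shape (e.g. 2.5 in the manuscript) directly fixes the routing lag.
import math

def gamma_unit_hydrograph(shape, scale, n_ordinates, dt=1.0):
    """Return non-negative routing weights renormalised to sum to 1."""
    def pdf(t):
        if t <= 0.0:
            return 0.0
        return (t ** (shape - 1.0) * math.exp(-t / scale)
                / (math.gamma(shape) * scale ** shape))
    # evaluate at interval midpoints, then renormalise
    weights = [pdf((i + 0.5) * dt) * dt for i in range(n_ordinates)]
    total = sum(weights)
    return [w / total for w in weights]
```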
L425- Elaborate on how the number of independent runs and the other algorithm settings of your framework (Table 3) were determined. For example, why 20 independent runs instead of 10 or 30? Similarly, why a generation number of 50, or a population size of 2000, etc.?
L471 - The selection of the cross-sample entropy parameters (e.g. r) is quite critical for the evaluation result. How were these values determined in your study?
L487- What are these model parameters? Elaborate and provide a reference.
L498- ‘…is changed uniformly…’. Do you mean: …are generated … ?
L500- how many and which model parameters were allowed to vary and how were these parameters selected out of the total number of model parameters?
L500-‘ …while keeping the remaining model parameters at their calibrated values.’.
How were these model parameters calibrated before conducting the UA? Was GP applied for that purpose? It would be helpful for the reader if you could clarify this in relation to the comment under L355.
L501- why was this threshold NSE value of 0.6 chosen?
L502- The term ‘behavioral models’ in this case refers to the parameter sets rather than to their NSE or discharge values.
L512- ‘…to measure the uncertainty estimation capability of the selected optimal model’.
Do you mean: … to measure the level of modelling uncertainty of the selected optimal model structure? In your study, the GLUE methodology was used to estimate the level of uncertainty, while the selected optimal model structure was itself the subject of the uncertainty analysis rather than an uncertainty estimation tool.
L513- ‘If the uncertainty estimation capabilities are satisfactory, the model performance of the optimal model is tested for an independent time frame (2013/01/01 to 2014/12/31) which is not used in model selection or identification stages’.
Re-write this sentence as well in accordance with the previous comment. What was your criterion for a satisfactory level of modelling uncertainty (or, as in your text, a satisfactory uncertainty estimation capability)? It seems that both the calibration and validation periods were actually used for model identification (selection) and not for hydrological model validation or testing, and a single year of model testing is quite a short period on which to base a conclusion. Thus, if you face data limitations for additional periods, or if some of the hydrological model identification (selection) years cannot be moved to model testing (validation), you may consider alternative model evaluation techniques such as leave-one-out or other cross-validation techniques. This way you would obtain more validation (test) results that can help you arrive at a relatively robust conclusion.
L626-629 – ‘Out of the 33 model parameters only 5 parameters can be identified as sensitive parameters. … This demonstrates a lesser dependency on model parameters compared to the total model performance in semi-distributed modelling owing to the large number of model parameters.’.
Among other factors, sensitivity analysis results depend on the minimum and maximum values of each parameter dimension. How were these values fixed in your study?
L629-‘FUSE_TOPO_M1 results in high value (94%) for the percentage of measured streamflow data within the confidence interval bands and hence shows a significant capability of estimating associated uncertainty.’
Re-write this sentence in accordance with the comment provided under L512.
The percentage of observations bracketed by the uncertainty bounds is highly dependent on the threshold value used during behavioral model identification. The threshold NSE used in this study (0.6) seems very low compared to the reported calibration and validation results of the optimal model. Given this low threshold NSE, a high percentage of observations falling within the uncertainty bounds is to be expected. Thus, please justify the adopted threshold NSE value (under the methodology section).
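The mechanism behind this comment can be shown with a toy sketch of the relevant GLUE step (all data and names are illustrative, not taken from the manuscript): parameter sets scoring above the NSE threshold are retained as behavioral, and the prediction bounds at each time step come from the behavioral ensemble. Lowering the threshold admits more parameter sets, widening the bounds and inflating the bracketing percentage:

```python
# GLUE-style bracketing percentage using a simple min-max envelope of the
# behavioral ensemble. A lower behavioral threshold admits more simulations,
# so the envelope widens and more observations fall inside it.

def bracketing_percentage(ensemble_sims, nse_scores, obs, threshold):
    """Percent of observations inside the envelope of behavioral runs."""
    behavioral = [sim for sim, score in zip(ensemble_sims, nse_scores)
                  if score >= threshold]
    if not behavioral:
        return 0.0
    inside = 0
    for t, o in enumerate(obs):
        lower = min(sim[t] for sim in behavioral)
        upper = max(sim[t] for sim in behavioral)
        if lower <= o <= upper:
            inside += 1
    return 100.0 * inside / len(obs)
```

With the same ensemble, dropping the threshold from, say, 0.9 to 0.6 can move the bracketing percentage from near zero to 100%, which is why the reported 94% needs to be interpreted jointly with the chosen threshold.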