Impact of spatial distribution information of rainfall in runoff simulation using deep learning method

Wang, Yang; Karimi, Hassan A.

doi:https://doi.org/10.5194/hess-26-2387-2022

Articles | Volume 26, issue 9

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 09 May 2022

Final revised paper (published on 09 May 2022)
Preprint (discussion started on 24 Aug 2021)

Interactive discussion

Status: closed

RC1:
'Comment on hess-2021-371', Anonymous Referee #1, 05 Oct 2021
In my view, this publication sets out to examine an interesting and indeed important research question: how much information can LSTM-based forecast models leverage from spatially distributed inputs?

There are many minor issues in the paper that I would like to discuss with the authors, but as it stands now the manuscript does not meet HESS standards. It simply cannot be judged properly. This state results from very basic choices of setup and exposition, which I'll summarize in the following three points:

Setting: To me it is not clear why the authors chose two specific basins in a forecasting setting. Regarding the former: current best practice is to train the models on many basins, since single-basin training does not yield particularly good results (see, for example, Gauch, Mai & Lin, 2021). The authors do not explain why they move away from this practice. I suspect it was done to reduce workload, but this is only an inference, since the manuscript does not explain this choice. Regarding the latter: why was a forecasting setting chosen over a simulation setting? The inclusion of runoff in the features will unavoidably explain away (Wellman & Henrion, 1993) potential influences of the distributed input, since it already integrates over the past. I can see a potential reason in trying to avoid the decrease in performance of a model when using only two basins instead of many. However, the choice seems to lie in direct contradiction to the goals laid out by the authors (i.e. understanding the influence of spatially distributed inputs).

Method: The setup is unclear. I was not able to understand how the HRUs have been delineated and how the distributed meteorological inputs have been obtained. If the standard CAMELS data is used, as indicated in the code and data availability paragraph, then a lumped input was somehow disaggregated to match the HRUs. How is this done? A naive way would perhaps be to simply weight the meteorological inputs according to the area of the HRUs. If that is the case, the authors would need to show that their new model is not simply better because it has more parameters than the baselines. They would also need to show how the results change if different weightings are used, since it is not clear a priori why the specific HRU delineation improves the result (if it does so). These are all interesting questions. But, as things are currently explained, I was not even able to infer how the authors did this and which data was really used. I believe that a clarification here would be of great worth to the readers.

Results: Forecasting (not simulation) models are compared on the basis of two basins with one and a half years of data each (arbitrarily chosen from June 6, 2010 to December 23, 2011). The resulting outcomes are reported with 5 digits of precision. I suppose this is done to make the models comparable, given that forecast models tend to produce very accurate predictions. However, given the high measurement uncertainties and the potentially large runoff variability between years, this is not possible! There are many other studies (the authors even cite some of them) that report fewer digits while evaluating their data-driven approaches on hundreds of basins using multiple years of data. Given the circumstances of the setup, I would suspect that only one or two digits should be reported. And, given the close results, I would think that some basic statistical tests are necessary. It would also be good to provide some error bounds related to how much the results would change with longer or different data and with repeated model runs (best both).
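
As an illustration of the error bounds requested above, the following minimal sketch summarizes scores from repeated model runs with a mean and a spread, and reports the paired difference between two model variants with its standard error. The NSE values, the number of runs, and the names `nse_lumped` and `nse_distributed` are hypothetical and are not taken from the manuscript. The calculation is a descriptive summary, not a formal statistical test.

```python
import statistics

# Hypothetical NSE scores from five repeated training runs (different random
# seeds) of two model variants; the numbers are made up for illustration.
nse_lumped = [0.71, 0.69, 0.73, 0.70, 0.72]
nse_distributed = [0.74, 0.70, 0.75, 0.73, 0.73]


def summarize(scores):
    """Return (mean, sample standard deviation), both rounded to two decimals."""
    return round(statistics.mean(scores), 2), round(statistics.stdev(scores), 2)


# Paired differences: run i of both variants is assumed to share the same seed.
differences = [d - l for d, l in zip(nse_distributed, nse_lumped)]
mean_diff = statistics.mean(differences)
# Standard error of the mean difference across the repeated runs.
std_err = statistics.stdev(differences) / len(differences) ** 0.5

print("lumped (mean, std):", summarize(nse_lumped))
print("distributed (mean, std):", summarize(nse_distributed))
print("mean difference: %.2f +/- %.2f" % (mean_diff, std_err))
```

Reporting two decimals together with a spread makes it visible whether a difference between two models is larger than the run-to-run variation.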

There are many other minor points that I'd like to discuss with the authors (for example, I do not see why a calibration period needs to be defined after a training period. Perhaps this is meant to be a validation period?). Before that I would however like to see the manuscript adjusted so that at least these basic points are met and the manuscript can be judged properly.

References:

Gauch, M., Mai, J., & Lin, J. (2021). The proper care and feeding of CAMELS: How limited training data affects streamflow prediction. Environmental Modelling & Software, 135, 104926.

Wellman, M. P., & Henrion, M. (1993). Explaining 'explaining away'. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3), 287-292.
Citation: https://doi.org/10.5194/hess-2021-371-RC1
- AC1: 'Reply on RC1', Yang Wang, 09 Oct 2021
  
  First of all, thank you very much for participating in the discussion of the paper!
  1. Response for Setting:
  The points you raise here are correct. Traditional hydrological models are physically based, so different parameters, and in effect different models, are needed for different areas. One potential benefit of using deep learning models for rainfall-runoff simulation is that data from multiple watersheds can be used for training, which makes the trained models suitable for watersheds in different areas. We think this is also a possible direction to look at in the future. Since this paper compares the effect of inputs with and without spatial information on the performance of the deep learning model, comparing individual watersheds provides a more direct indication of the difference in the results. This also reduces the workload, since a different experiment must be set up for each watershed.
  Most studies apply rainfall and previous discharges with different time steps and combinations as inputs^1,2. One of the features of the LSTM model is that it finds relationships between sequences. The previous runoff is used as input to discover the intrinsic relationship of the time series data. From the perspective of a data-driven model, we include the previous runoff in the input as additional information to get a better result. The two data-driven models differ only in whether the input contains spatial information. Many factors related to runoff, such as rainfall and temperature, are characterized by uneven spatial distribution. The purpose of this study is also to show the importance of spatial distribution information. As one potential future research direction, instead of runoff we may include topography and evaporation, among other features with spatial distribution information, and evaluate the contribution of each to the simulation results.
  2. Response for Method:
  CAMELS contains a total of 671 catchments with minimal anthropogenic disturbance in the contiguous United States. For each catchment, CAMELS provides the catchment-mean (lumped) forcing dataset: (i) daily cumulative rainfall, (ii) daily minimum air temperature, (iii) daily maximum air temperature, (iv) mean short-wave radiation, and (v) vapor pressure. In our research the daily cumulative rainfall is treated as the catchment mean rainfall data without spatial distribution information for each catchment. The CAMELS dataset also includes the average rainfall for each hydrologic response unit (HRU) of a given catchment. We use the catchment mean daily cumulative rainfall as the input precipitation without spatial information, and use the combination of average rainfall from each HRU in the catchment as the input precipitation with spatial information. For example, as Catchment 1 had a total of 64 HRUs in the dataset, we used a vector of size 64 to represent the rainfall data with spatial distribution information. CAMELS calculates the average rainfall in a catchment by weighting it by the area of each HRU. Our rainfall data with spatial information does not consider weights because we are portraying the distribution of rainfall in the basin by combining rainfall from different HRUs.
  3. Response for Results:
  
  Thank you very much for the suggestion. Each experiment was performed with a different look-back window and look-forward window to test how much the results would change with different data. We think adding some basic statistical tests is a good idea.
  1. Van, S. P. et al. Deep learning convolutional neural network in rainfall-runoff modelling. J. Hydroinformatics 22, 541–561 (2020).
  
  2. Xiang, Z., Yan, J. & Demir, I. A Rainfall-Runoff Model With LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res. 56, (2020).
  
  Citation: https://doi.org/10.5194/hess-2021-371-AC1
- AC2: 'Reply on RC1', Yang Wang, 11 Oct 2021
  
  Please use this authors’ response to this reviewer’s comment as the posted text on Friday (Oct 8) was unedited.
  Authors’ response for Setting
  The points you raise here are correct. Traditional hydrological models are physically based and require different parameters, and thus different models, for different areas. One advantage of using deep learning models for rainfall-runoff simulation is that training data can be drawn from multiple watersheds. This makes the trained models suitable for watersheds in different areas. We think this is also a possible direction to look at in the future. Since this paper compares the effect of inputs with and without spatial information on the performance of the deep learning model, comparing individual watersheds provides a more direct indication of the difference in the results.
  Most studies apply rainfall and previous discharges with different time steps and combinations as inputs^1,2. One of the features of the LSTM model is that it finds relationships between sequences. The previous runoff is used as input to discover the intrinsic relationship of the time series data. From the perspective of a data-driven model, we include the previous runoff in the input as additional information to get a better result. The two data-driven models differ only in whether the input contains spatial information or not. Many factors related to runoff, such as rainfall and temperature, are characterized by uneven spatial distribution. The purpose of this study is also to show the importance of spatial distribution information. As one potential future research direction, instead of runoff we may include topography and evaporation, among other features with spatial distribution information, and evaluate the contribution of each to the simulation results.
  Authors’ response for Method:
  CAMELS contains a total of 671 catchments with minimal anthropogenic disturbance in the contiguous United States. For each catchment, the CAMELS dataset includes: (i) daily cumulative rainfall, (ii) daily minimum air temperature, (iii) daily maximum air temperature, (iv) mean short-wave radiation, and (v) vapor pressure. In our work the daily cumulative rainfall is treated as the catchment mean rainfall data without spatial distribution information for each catchment. The CAMELS dataset also includes the average rainfall for each hydrologic response unit (HRU) of a given catchment. We use the catchment mean daily cumulative rainfall as the input precipitation without spatial information, and use the combination of average rainfall from each HRU in the catchment as the input precipitation with spatial information. For example, as Catchment 1 had a total of 64 HRUs in the dataset, we used a vector of size 64 to represent the rainfall data with spatial distribution information. CAMELS calculates the average rainfall in a catchment by weighting it by the area of each HRU. Our rainfall data with spatial information does not consider weights because we are portraying the distribution of rainfall in the basin by combining rainfall from different HRUs.
  Authors’ response for Result:
  Thank you very much for the suggestion. Each experiment was performed with a different look-back window and look-forward window to test how much the results would change with different data. We think adding some basic statistical tests is a good idea.
  1. Van, S. P. et al. Deep learning convolutional neural network in rainfall-runoff modelling. J. Hydroinformatics 22, 541–561 (2020).
  
  2. Xiang, Z., Yan, J. & Demir, I. A Rainfall-Runoff Model With LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res. 56, (2020).
  
  Citation: https://doi.org/10.5194/hess-2021-371-AC2
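
The two rainfall inputs described in the authors' Method response can be contrasted with a minimal sketch. It assumes three hydrologic response units with made-up rainfall values and areas; the function names `lumped_rainfall` and `distributed_rainfall` are hypothetical and do not come from the authors' code or from the CAMELS tooling.

```python
# Hypothetical daily rainfall (mm/d) for three HRUs; values are illustrative only.
hru_rainfall = [4.0, 0.0, 10.0]
# Hypothetical HRU areas (km^2).
hru_area = [10.0, 30.0, 10.0]


def lumped_rainfall(rain, area):
    """Area-weighted catchment mean: one scalar per day, no spatial information."""
    total_area = sum(area)
    return sum(r * a for r, a in zip(rain, area)) / total_area


def distributed_rainfall(rain):
    """Unweighted per-HRU vector: one value per HRU, spatial pattern preserved."""
    return list(rain)


lumped = lumped_rainfall(hru_rainfall, hru_area)    # a single number for the day
distributed = distributed_rainfall(hru_rainfall)    # one entry per HRU

print("lumped input:", lumped)
print("distributed input:", distributed)
```

The lumped value hides the fact that one HRU received no rain while another received heavy rain, which is the information the distributed vector keeps.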
CC1:
'Comment on hess-2021-371', Qingtai Qiu, 24 Oct 2021

First of all, I think the authors' study is significant at this point in time. With the development of deep learning techniques, more and more studies are combining deep learning models with hydrology-related issues. However, many of these studies do not take into account the special conditions and theories of hydrology. As the authors investigate in their paper, when driving deep learning models with hydrological data, we need to consider the impact of spatial distribution information.
1. You mentioned gridded precipitation data in your summary. It seems that gridded precipitation data can describe the spatial distribution of rainfall better than the spatial information provided by rainfall from HRUs. Why did you not use gridded precipitation data in your study?
2. According to your results, short-term spatially distributed rainfall data combined with long-term runoff data can give good simulation results. Would it be more appropriate to use the model for single rainfall event simulations instead of long series?
Thank you.

Citation: https://doi.org/10.5194/hess-2021-371-CC1
- AC3: 'Reply on CC1', Yang Wang, 27 Oct 2021
  
  Authors’ response:
  Thank you for the comments.
  Response to comment #1. We did not use the raster type precipitation data for the following two reasons:
  (a) Data quality. Currently, the commonly used raster-type precipitation data are mainly derived from climate models. These data have errors compared with the actual rainfall measurements and often need to be adjusted before they can be used, which was not the focus of the study. Using measured rainfall data to describe spatial distribution information can reduce the impact of rainfall errors on the final results.
  (b) Raster precipitation data resolution. Considering the relatively small size of our chosen study area, if we used raster data, the number of grid cells covering the study area would be smaller than the number of HRUs.
  As we mention in the conclusion section, with high-resolution, high-quality raster-type precipitation data, we can use, for example, a CNN to process the precipitation to obtain spatial distribution information, which may improve our model. This is work we want to perform in the future.
  Response to comment #2. This study was conducted on a long series of rainfall-runoff relationships. If a single rainfall event is simulated, the time interval of the rainfall data may need to be set to hours. In this case we need to consider the rainfall for the whole rainfall period. The look-back window would be different for different rainfall events. We think this is an idea worth trying, which requires a different setup in our model. However, our conclusions are still valid for the simulation of a single rainfall event. That is, we can improve the simulation results by adding the spatial distribution information of rainfall when performing single rainfall event simulation.
  
  Citation: https://doi.org/10.5194/hess-2021-371-AC3
RC2:
'Comment on hess-2021-371', Anonymous Referee #2, 12 Nov 2021
The authors have presented the novelty of using information of spatial distribution of rainfall in rainfall runoff modelling of two basins using deep learning. The study is interesting and is very relevant for HESS. The manuscript mainly suffers from lack of clarity on different aspects. Some of them are listed here but I am afraid that based on the responses more suggestions may follow.

Tables 2, 3, 4 and 5 will be much easier to read with numbers correct up to 2 decimal places. You do not need to present RMSE/ NSE numbers correct up to 6 decimal places. Is there any reason behind it?

RMSE in the text should be described with unit (mm/d?). Please also provide the average value so that the reader can interpret the quality of the model from the RMSE values.
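
For readers less familiar with the two scores discussed here, the following is a minimal sketch of RMSE and the Nash-Sutcliffe efficiency (NSE) using their standard definitions, printed with two decimals, a unit, and the mean observed flow for context. The discharge values are hypothetical and the function names `rmse` and `nse` are chosen only for this illustration.

```python
import math


def rmse(observed, simulated):
    """Root mean square error, in the units of the inputs (e.g. mm/d)."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is as good as the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical daily discharge series (mm/d); values are illustrative only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 2.0]

mean_observed = sum(observed) / len(observed)
print(f"RMSE = {rmse(observed, simulated):.2f} mm/d "
      f"(mean observed flow {mean_observed:.2f} mm/d)")
print(f"NSE  = {nse(observed, simulated):.2f}")
```

Stating the mean observed flow next to the RMSE lets a reader judge whether an error of a given size is small or large for the catchment.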

Data splitting: In my opinion the data splitting is very unfair. The authors have used 40 years’ data for training whereas 1 year for testing. Considering the climatic variability the models need to be tested over a longer period of time. Very often a 65-25-10 split for training-testing-cross validation is used. Any wider deviation needs to be explained. The naming of the datasets (training as well as calibration) is also confusing.

Data splitting: Some data plots/ description and statistics (mean and standard deviations) of the 3 datasets will be good. The authors need to show that data from the 3 partitions are comparable.

Data splitting: If the results are provided based on the testing data then how sure are the authors that the conclusions for the 4 experiments will be similar for other years (dry/ wet/average/..) as well?
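
The data-splitting points above can be made concrete with a minimal sketch of a chronological 65-25-10 split that also prints the mean and standard deviation of each partition, so that their comparability can be checked. The synthetic series, the function name `chronological_split`, and the partition labels are assumptions made for this illustration and do not reproduce the manuscript's setup.

```python
import statistics


def chronological_split(series, fractions=(0.65, 0.25, 0.10)):
    """Split a time-ordered series into three consecutive blocks without shuffling."""
    n = len(series)
    first = int(n * fractions[0])
    second = first + int(n * fractions[1])
    # The last block takes whatever remains, so no sample is dropped.
    return series[:first], series[first:second], series[second:]


# Hypothetical daily discharge record (mm/d); a short synthetic series.
discharge = [float(i % 7) for i in range(100)]

parts = chronological_split(discharge)
# Labels follow the order "training-testing-cross validation" used in the comment.
for name, part in zip(("training", "testing", "validation"), parts):
    print(name, len(part),
          round(statistics.mean(part), 2),
          round(statistics.stdev(part), 2))
```

If the means or standard deviations of the partitions differ strongly, for example because the test block covers an unusually dry or wet period, conclusions drawn from the test block alone may not carry over to other years.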

Look-back windows: There is not much discussion on the selection of the look-back windows. Presumably, the selection of the window depends on the catchment properties and, as a result, may vary from catchment to catchment. How were they selected?

Look-back windows: Look-back windows of up to several days will be very important (e.g. from a catchment wetness point of view). However, the look-back windows of 180 and 365 days are a bit confusing. What information do they carry?
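
To make the terminology concrete, the sketch below shows one common way to cut a daily series into samples with a look-back window (past values used as input) and a look-forward window (future values to be predicted). The function `make_windows` and its parameter names are hypothetical, and this is not necessarily how the authors built their samples.

```python
def make_windows(series, look_back, look_forward):
    """Pair each run of `look_back` past values with the next `look_forward` values."""
    samples = []
    last_start = len(series) - look_back - look_forward
    for start in range(last_start + 1):
        past = series[start:start + look_back]
        future = series[start + look_back:start + look_back + look_forward]
        samples.append((past, future))
    return samples


# Ten hypothetical daily values, numbered so the windows are easy to read.
daily_values = list(range(10))

samples = make_windows(daily_values, look_back=3, look_forward=2)
print(len(samples), "samples; first:", samples[0], "last:", samples[-1])

# A window longer than the record produces no samples at all.
print(make_windows(daily_values, look_back=365, look_forward=1))
```

A longer look-back window gives each sample more history but reduces the number of samples that can be cut from a record of fixed length.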

Process description: The manuscript has almost no description on the catchment processes. What are the sizes of the two catchments? What kind of hydrological processes are there? Do you expect snow melt? Do you observe very strong seasonal variation? Flood/ droughts?

Figures 4, 5, 6, 7, 8, 9, 10, 11 and 12: These figures have low resolution. Font sizes of axis labels are too small. What can we learn from these figures? It is impossible to distinguish between the lines. The authors may consider zooming in on selected periods of high flows and low flows to highlight the differences.

Equations 1 to 6: Please check that you have explained all the terms used in the equations.

Lines 174, 176: Perhaps the authors want to say ‘rain gauge’ instead of ‘stream gauge’.

Line 179: ‘activation’ instead of ‘activate’.
Citation: https://doi.org/10.5194/hess-2021-371-RC2
- AC4: 'Reply on RC2', Yang Wang, 19 Nov 2021
  
  Authors’ response:
  
  1. There is no special reason to use 6 decimal places. Considering the amount of data in each table, we will reduce the number of places in order to make it easier to read.
  
  2. Yes, RMSE is described with unit mm/d. We show the units in each of the tables and will add average value to help the reader get a more intuitive feeling for each type of model.
  
  3-5. The goal of this study is not to find the best-trained model for rainfall-runoff simulation, but rather to investigate the impact of spatial distribution information in the simulation process. Sufficient training data is the key to building deep learning models, which is why we use most of the data for training. However, we strongly agree that the factors mentioned in the comment are something we have to be aware of when applying deep learning models to hydrological simulations. For example, traditional hydrological models perform differently in different regions. A hydrological model constructed for a humid region may give worse results for a semi-arid region. The results of Kratzert et al. (2019) show that if data from different regions are used to train the LSTM simultaneously, the models can achieve good results in different types of regions. As mentioned in the comment, the question of how to integrate the understanding of hydrology into deep learning models, e.g., whether models need to be trained separately for different types of rainfall, needs continued research. We think our conclusions are still valid for future research. That is, we should pay attention to the spatial distribution information when performing rainfall-runoff simulation.
  
  6-7. We strongly agree that look-back windows of up to several days are more meaningful when considering the rainfall-runoff mechanism, as they directly influence the subsequent runoff generation. Look-back windows of 180 and 365 days are often used in other studies that apply deep learning models to hydrology. Considering that RNN models, as data-driven models, discover the changing patterns of time series data, we can assume that 180 and 365 days as look-back windows help the model learn the correlation between long series of rainfall, runoff, and other factors. On the one hand, shorter windows conform to the rainfall-runoff mechanism; on the other hand, data-driven models can handle longer windows, which can provide more information. How to choose look-back windows is a question that needs to be further investigated. This is the reason why we compare different windows in the paper.
  
  8. We will add a table describing basic information about the two watersheds, such as area, average rainfall, average flow, etc.
  
  9. Thank you for the suggestion as zooming in on selected periods of high flows and low flows in the graph would make the figure easier to read.
  
  10. Thank you for the reminder. All the variables and functions are explained in the corresponding places.
  
  11. It should be “rain gauge” or “rain station”. Thank you for the suggestion.
  
  12. It should be “activation function”. Thank you for the suggestion.
  
  Citation: https://doi.org/10.5194/hess-2021-371-AC4
EC1: 'Comment on hess-2021-371', Dimitri Solomatine, 30 Nov 2021

I think the referees have provided very thoughtful and constructive comments. In their replies the authors have shown how to address these comments in the subsequent revision of the paper. This discussion was useful. I can recommend the paper to proceed to the next stage, and wish the authors success.

Citation: https://doi.org/10.5194/hess-2021-371-EC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (30 Nov 2021) by Dimitri Solomatine

AR by Yang Wang on behalf of the Authors (03 Jan 2022)

ED: Referee Nomination & Report Request started (06 Jan 2022) by Dimitri Solomatine

RR by Anonymous Referee #1 (17 Jan 2022)

Suggestions for revision or reasons for rejection

# General Comments

First of all, I would like to commend the authors for the hard work and, as a result, the great improvements they could achieve on the manuscript. When reading the provided answers to my first review I was not sure if the authors had understood my critique. However, their revisions prove me wrong. What I am still missing is a discussion about why a “forecasting” setting was chosen over a “simulation” setting for the examination, even though the former somewhat undermines the importance of the rainfall. I believe this would be a perfect addition to the final discussion provided in the conclusions, where the authors already examine some of the limits of their work.
The other larger thing, which is probably clear in general, and most likely just a result of my oversight or bad memory is the following question: Why are the reported results in this version so much worse than in the first manuscript version?
Given those two points, the rest of my comments are mainly directed to imprecise statements and unclear explanations, and I am sure the authors will be able to resolve them with ease.

All or almost all brackets around in-line citations are set incorrectly.

L. 57ff: The authors write: “Among these, LSTM has garnered more attention of researchers due to its suitability for processing and predicting events with very long intervals and delays in time series.” I think that statement needs at least a reference to back it up. While the LSTM does indeed have the capacity to handle very long time series, I am not sure if that is the main reason why it has enjoyed more attention than feedforward approaches. In hydrology at least, this does not seem to be the case. Naively I would assume that its main selling point was its ability to provide high-quality simulations (not forecasts). However, my own assertion would demand further inquiry too.

L. 60ff: Following the comment on L. 57ff: I fail to see why these are examples for the interest of researchers in long intervals and delays in time series. Maybe further explanations are required.

L. 74: In the sense of responsible communication, I think it is necessary to tell the reader that the separate setting should NOT be seen as an equivalent approach. As it is stated now readers could get the wrong impression that both approaches are just a matter of taste and setting, while in reality the second approach is the current state-of-the-art.

L. 76: The statement that the regional LSTM received “a lot of attention because it can increase the amount of training data” could be easily misinterpreted. Maybe an active reformulation of the following kind would be useful here:

“The regional setting is of particular interest because it allows the model to encapsulate different hydrological processes by learning from more data and situations.”

L. 79ff: This is wrong. ‘Many to one’ does not designate a next time-step prediction, similarly ‘many to many’ does not imply the prediction of future multiple steps using the past. Also, given the terminological issues with prediction and forecasting this sentence should probably use the term “forecasting”.

L. 84ff: This is a great goal statement. Only thing perhaps missing is that you also just focus on the “local LSTM setting” (or maybe “separate LSTM” as you coined it).

L. 103ff (also in reference to lines 361ff): It does not become clear why it was chosen to distribute only the rainfall in a spatial manner, and not the remainder of the meteorological variables, such as the temperature. I guess the argument was to separate out the spatial influence of the rainfall distribution, but then I was wondering if the same would happen if other variables were distributed. I don’t want to argue that the authors have to examine that question too, but the readers deserve at least an explanation: models that use some meteorological variables in a lumped way and others in a spatially distributed way are known to hydrologists, but the choice here seems special.

L. 125ff: I think readers will not understand the description of the main RNN problem. I certainly did not.

L. 159ff: For the sake of readability I would recommend explaining all the variables when they are introduced, before, or as soon as possible after their appearance in an equation. For example, readers not familiar with these particular sets of equations might not follow the exposition until they get an explanation of what the weight vectors are.

L. 159: This is wrong, W denotes weight matrices, not vectors!

L. 188: Why are two regional models used and not one? To be clear, I am not arguing that the choice of two models is wrong here, just that the choice needs some explanation, since it does undermine the purported advantage of using more data.

L. 197: Was a hyper-parameter search conducted? If not (or just informally), please mention that none (or just an informal exploration) has been undertaken – which is fine, it just should be mentioned explicitly.

L.272: The description of the length-standardization of the inputs/rainfall should not be part of the results/discussion section. It should be explained in the method/data section. Furthermore, the provided explanation as such needs to be much clearer. Right now it takes very long to understand what the “length of 20” is referring to. It is also not clear why the setting restricts them to 20, and not, say, to 25 or not use a fusing method at all. I also was wondering if the 0 padding is sensible here, given that with the implied normalization of the input-data it should correspond to the average rainfall.
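
The concern about zero padding can be illustrated with a minimal sketch. It assumes z-score normalization, since the manuscript's normalization is only implied in this comment, and pads a short rainfall vector to a fixed length of 20. The rainfall values and the helper names `zscore` and `pad` are hypothetical.

```python
import statistics

TARGET_LENGTH = 20  # the fixed input length referred to in the comment above


def zscore(values):
    """Assumed z-score normalization; returns the scaled values with mean and std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def pad(values, length=TARGET_LENGTH, fill=0.0):
    """Right-pad a vector with `fill` until it has `length` entries."""
    if len(values) > length:
        raise ValueError("vector is longer than the target length")
    return list(values) + [fill] * (length - len(values))


# Hypothetical rainfall (mm/d) for a catchment with only four HRUs.
rainfall = [0.0, 2.0, 4.0, 6.0]

normalized, mean, std = zscore(rainfall)
padded = pad(normalized)

# Mapping a padded zero back to physical units gives the mean rainfall, not "no rain".
implied_rainfall = padded[-1] * std + mean
print(len(padded), "entries; a padded zero corresponds to", implied_rainfall, "mm/d")
```

Under this assumption the padded entries represent average rainfall rather than missing or absent HRUs, which is the ambiguity the comment raises.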

# Minor Comments

L. 33: A space is missing after “change”.
L. 42: The SHE abbreviation is not introduced (VIC is).
L. 57: The LSTM abbreviation is not introduced (ANN is).
L. 67: The RNN abbreviation is not introduced.
L. 67: “The” should not be capitalized and a space is missing.
L. 92: Do not use all caps, and introduce the CAMELS abbreviation properly.
L. 96: I don’t think it is ok to nonchalantly judge whether or not Daymet is the better or worse product (as a matter of fact, there exist studies which show that none of the products dominates all of the others). Maybe just “..., which is higher than the other two”.
L. 119: The introduction of the abbreviation RMSE is inconsistent (the others use capitalization).
L. 122: The first statement probably needs a reference.
L. 130: The date 2012 is oddly specific and would need a reference.
L. 136: It is not clear what a “flow” is here.
L. 142: The notation “f(t)” is not consistent. Also, it does not map onto equation 1.
L. 165: I would recommend changing “generally used” to “often used” as it is less judgmental. But this might be a matter of taste and can be ignored.
L. 167f: The symbol f was already used for the forget gate. Also, I’d like to note that the superscript time-index notation is at odds with the previously introduced subscript notation.
L. 231, Table 3: Please describe what the bold numbers in Table 3 are. Even if it becomes clear from inspection, it should be part of the table caption in the first place.
L. 359: LSTM -> LSTMs
L. 359ff: The sentence seems to be missing words.
L. 376f: I think what is written here is true, but the claim that the change is “more significant” usually calls for statistical tests and/or a description of what is understood by significant here.
L. 378: I would suggest replacing “show” with “suggest”, since performance on two catchments alone can hardly be considered conclusive evidence.

RR by Anonymous Referee #2 (24 Feb 2022)

ED: Publish subject to minor revisions (review by editor) (18 Mar 2022) by Dimitri Solomatine

AR by Yang Wang on behalf of the Authors (24 Mar 2022)

ED: Publish as is (08 Apr 2022) by Dimitri Solomatine

AR by Yang Wang on behalf of the Authors (09 Apr 2022) Manuscript

Short summary

We found that rainfall data with spatial information can improve the model's performance, especially when simulating discharge several days ahead. We did not observe that the regional LSTM achieved better results than the LSTM trained as an individual model. This conclusion applies to both one-day and multi-day simulations. However, we found that using spatially distributed rainfall data can reduce the difference between the individual LSTM and the regional LSTM.