Flood forecasting with machine learning models in an operational framework
- ¹Google Research, Yigal Alon 96, Tel-Aviv 6789141, Israel
- ²Hebrew University of Jerusalem, Safra Campus, Jerusalem 91904, Israel
Abstract. Google’s operational flood forecasting system was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecasting, inundation modeling, and alert distribution. Machine learning is used for two of the subsystems. Stage forecasting is modeled with Long Short-Term Memory (LSTM) networks and Linear models. Flood inundation is computed with the Thresholding and the Manifold models, where the former computes inundation extent and the latter computes both inundation extent and depth. The Manifold model, presented here for the first time, provides a machine-learning alternative to hydraulic modeling of flood inundation. When evaluated on historical data, all models achieve performance metrics sufficiently high for operational use. The LSTM showed higher skill than the Linear model, while the Thresholding and Manifold models achieved similar performance metrics for modeling inundation extent. During the 2021 monsoon season, the flood warning system was operational in India and Bangladesh, covering flood-prone regions around rivers with a total area of 287,000 km², home to more than 350M people. More than 100M flood alerts were sent to affected populations, to relevant authorities, and to emergency organizations. Current and future work on the system includes extending coverage to additional flood-prone locations, as well as improving modeling capabilities and accuracy.
Sella Nevo et al.
Status: final response (author comments only)
RC1: 'Comment on hess-2021-554', Anonymous Referee #1, 21 Dec 2021
This paper presents a complete workflow for an operational flood forecasting and mapping case study.
The paper details all the critical steps in an operational system for data-scarce regions, including remote sensing data integration, forecasting, inundation mapping, and communicating forecasts to the public.
Comparing model performance against linear regression and simple hydraulic models is not a very challenging benchmark compared to evaluating it against the better-performing ML and physical models cited in the paper.
The spatial and temporal resolution of the input data for rainfall, elevation, and other datasets is rather low for achieving comparable forecasts, but the approach is still useful in data-scarce regions with limited resources.
Some details, such as how AOI pixels are defined in the flooding regions, are not clear in the paper.
- AC1: 'Reply on RC1', Efrat Morin, 09 Feb 2022
RC2: 'Comment on hess-2021-554', Anonymous Referee #2, 13 Jan 2022
This paper presents ML models that, respectively, (i) directly predict the river stage (rather than predicting discharge and then translating it to stage); (ii) predict whether pixels are wet or dry depending on the gauge stage; and (iii) estimate flood inundation depth. Among these, (i) was trained on historical stream gauge data and near-real-time upstream gauge data, and (ii) was trained using historical satellite data and coincident stream gauge height data. (iii) was not really a model per se, but an interpolation procedure.
I think the paper demonstrates strong performance from a completely data-driven model. It highlights the idea of directly simulating stream gauge height, which breaks many barriers. Otherwise, the authors would need to simulate discharge and then resolve the relationship between discharge and stage height, which is highly variable in space; most of the time we cannot resolve it. In the authors’ case, there are no discharge data to begin with, so directly tackling gauge height is a good and necessary idea (but it also leads to some issues I will discuss below). The paper also demonstrates a very efficient forecasting scheme based on upstream gauge data, and as a whole it shows how to stack different models together. The authors also present a unique flood inundation component that is accurate. The work is very useful for hundreds of millions of people, and it takes a lot of courage to take on such a responsibility.
While there are many reasons why I like this paper and I encourage its publication, I also noticed a few major issues. They are raised here in the hope of making the manuscript more balanced and comprehensive.
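Component (ii), and the reviewer's later question about "pixel-specific" thresholds, can be illustrated with a minimal per-pixel thresholding sketch. It assumes that each pixel's threshold is the historical gauge stage that minimizes the number of misclassified wet/dry observations for that pixel; the paper's actual fitting objective may differ, and `fit_pixel_thresholds` and `predict_extent` are hypothetical names. The output is one threshold per pixel, i.e. a threshold map tied to a single gauge.

```python
import numpy as np


def fit_pixel_thresholds(stages, wet_maps):
    """Fit one gauge-stage threshold per pixel (hypothetical helper).

    stages:   (T,) gauge stage at the acquisition time of each historical image
    wet_maps: (T, H, W) boolean wet/dry maps derived from satellite imagery
    Returns an (H, W) threshold map; np.inf means "never predicted wet".
    Assumption: the threshold minimizes misclassified historical images.
    """
    stages = np.asarray(stages, dtype=float)
    wet = np.asarray(wet_maps, dtype=bool).astype(int)            # (T, H, W)
    # Candidate thresholds: every observed stage, plus +inf.
    candidates = np.append(np.unique(stages), np.inf)             # (C,)
    pred = (stages[None, :] >= candidates[:, None]).astype(int)   # (C, T)
    # Disagreements per candidate and pixel: pred + wet - 2 * pred * wet, summed over T.
    errors = (pred.sum(axis=1)[:, None, None]
              + wet.sum(axis=0)[None, :, :]
              - 2 * np.tensordot(pred, wet, axes=([1], [0])))     # (C, H, W)
    # argmin returns the first minimum, so ties go to the lowest threshold.
    return candidates[np.argmin(errors, axis=0)]


def predict_extent(stage, thresholds):
    """Wet/dry map for a (forecast) gauge stage: wet where stage >= threshold."""
    return stage >= thresholds


# Toy example: 4 historical images of a 1x2 area.
stages = np.array([1.0, 2.0, 3.0, 4.0])
wet_maps = np.zeros((4, 1, 2), dtype=bool)
wet_maps[2:, 0, 0] = True  # pixel (0, 0) is wet once the stage reaches 3.0
thresholds = fit_pixel_thresholds(stages, wet_maps)
print(thresholds.tolist())                        # [[3.0, inf]]
print(predict_extent(3.5, thresholds).tolist())   # [[True, False]]
```

Because every pixel gets its own threshold from the same gauge record, the fitted map cannot be reused for a different gauge or reach without refitting.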
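The upstream-gauge forecasting idea can be sketched as a linear regression, in the spirit of the Linear baseline. This is a minimal sketch under stated assumptions: features are the last `lags` stage readings at the target gauge and at each upstream gauge, the target is the stage `horizon` steps ahead, and the fit is ordinary least squares. The function names and feature layout are hypothetical, not the system's actual configuration.

```python
import numpy as np


def _lagged_rows(series, times, lags):
    # Each row stacks the last `lags` readings of every series up to time t.
    return np.array([series[t - lags + 1:t + 1].ravel() for t in times])


def fit_linear_forecaster(target, upstream, lags=3, horizon=1):
    """Least-squares fit of stage(t + horizon) at the target gauge (hypothetical).

    target:   (T,) historical stage at the gauge of interest
    upstream: (T, G) historical stages at G upstream gauges
    """
    series = np.column_stack([target, upstream])          # (T, 1 + G)
    times = range(lags - 1, len(target) - horizon)
    X = _lagged_rows(series, times, lags)
    X = np.column_stack([X, np.ones(len(X))])              # intercept column
    y = np.asarray(target)[[t + horizon for t in times]]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast(coef, target, upstream, lags=3):
    """Forecast `horizon` steps past the last available reading (same lags as the fit)."""
    series = np.column_stack([target, upstream])
    x = np.append(series[-lags:].ravel(), 1.0)
    return float(x @ coef)


# Toy data: the target gauge repeats the upstream signal one step later.
rng = np.random.default_rng(0)
upstream = rng.normal(size=(50, 1))
target = np.empty(50)
target[0] = 0.0
target[1:] = 0.5 * upstream[:-1, 0] + 1.0
coef = fit_linear_forecaster(target, upstream, lags=1, horizon=1)
print(forecast(coef, target, upstream, lags=1))  # approx 0.5 * upstream[-1, 0] + 1.0
```

The coefficients are specific to one target gauge, so this kind of model can only be fitted where a historical record for that gauge exists.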
(a) There should be some discussion of the potential scientific limitations of the approach (even if they are caused by practical data availability) and of the conditions under which it is applicable. As far as I can see, all the models were posed in a highly case-specific way. The gauge height LSTM model has weights that are shared across multiple gauges, but it also needs gauge-specific weights tuned to local data with a particular configuration. (How much worse would it get without those gauge-specific weights?) The inundation extent model is tied to the gauge and to the particular river bathymetry downstream of that gauge. In other words, it seems these models can only be applied where gauge data are available for training. The trained relationship is not portable anywhere else (if so, this imposes a requirement on the available data records). Don’t get me wrong: I think the model is highly useful operationally, and in India there are many places where it is applicable. However, if these limitations hold, it would make sense for the authors to discuss where and when this model formulation is valid, so that readers can more easily judge whether these algorithms suit their purpose. Perhaps they could even develop a more uniform model and show its accuracy. This point also contradicts the authors’ claim that the model is highly scalable: you cannot take the model to a new terrain and directly apply it. In addition, the learned relationships may not always hold. What if there is heavy rainfall in the region between the upstream gauge and the gauge of interest? It seems the model cannot account for such forcings (this may not matter much for large-scale Indian monsoons, but it could be important elsewhere).
This means that, while the model is fast to run, it is not scalable in the sense of expanding to new areas: you must spend the time and effort to collect the data and train the model in every new area of interest, and that is assuming you are lucky enough to have the data. Hence, it is unclear how the authors intend to apply the model over large areas.
It also constrains which sites are eligible. Because a site-specific model has to be trained, only sites with long-enough records can be used. The model cannot be large, and information from other sites does not help with a particular gauge of interest.
If my understanding is incorrect, I stand to be corrected, and the authors could show a test case where the model is applied to an “ungauged” location.
(b) The training datasets for the models were not clearly described. For the inundation extent model, the numbers of events included as training and test images should be reported.
(c) It is not clear if the model accuracy drops as we go further downstream from the gauge. Some exploration here will be useful.
(d) Regarding the authors’ criticism of the hydraulic model: are we sure it was fed the best parameters and inputs? There is no description of calibration. Returning to point (a), in a region without past observations the hydraulic model may still function but the ML inundation model may not, which means these models have their own use cases. If I’m wrong, please correct me.
(e) There seems to be no description of the network configuration, such as hyperparameters, hidden size, minibatch size (if minibatches are used), number of training epochs, etc.
(g) Does it make sense to average precipitation over a drainage area > 100,000 km²?
(h) We have no intuitive understanding of what the F metrics mean. Could you show some observed vs. simulated maps for different values of the F metric?
(i) The flood depth model was never tested, so we do not know its accuracy. Can you discuss its value in the real world? Also, low resolution could introduce discontinuities.
(j) Can this study be reproduced at all? It seems that little of the study can be reproduced, or even compared against, in terms of data. All the code and data are either proprietary or unavailable. We are simply told what the system can do, with no possible path to trying most of the steps here.
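Regarding point (h), one common way to score an inundation-extent map is the F-beta score over wet pixels. The sketch below assumes that definition (precision is the fraction of predicted-wet pixels that are observed wet, recall is the fraction of observed-wet pixels that are predicted wet); the exact F metric used in the manuscript is not reproduced in this discussion, and `f_beta` is a hypothetical helper.

```python
import numpy as np


def f_beta(observed, simulated, beta=1.0):
    """F-beta score of a simulated wet/dry map against an observed one (hypothetical helper)."""
    obs = np.asarray(observed, dtype=bool)
    sim = np.asarray(simulated, dtype=bool)
    tp = np.sum(obs & sim)     # wet in both maps
    fp = np.sum(~obs & sim)    # predicted wet, observed dry (false alarm)
    fn = np.sum(obs & ~sim)    # observed wet, predicted dry (miss)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return float((1 + b2) * precision * recall / (b2 * precision + recall))


observed = [[1, 1, 0], [0, 0, 0]]
simulated = [[1, 0, 1], [0, 0, 0]]
print(f_beta(observed, simulated))  # 0.5
```

Here the simulated map has one correct wet pixel, one false alarm, and one miss, so precision and recall are both 0.5 and F1 = 0.5; a beta above 1 weights misses more heavily than false alarms.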
Some minor points:
Line 158. What does “State handoff” mean?
Line 190. This should be “quasi-steady state” to be more precise.
Line 196. “Discarded”: see my point above; could you use a gentler word?
Line 198-199. “when the target gauge exceeds a (pixel-specific) threshold water stage.” I am a bit confused here. A gauge sits at a single location, so why is there a pixel-specific threshold linked to a gauge? If the threshold is pixel-specific, do you end up with a map of different thresholds? Should it be image-specific thresholding?
Line 219. Maybe I’m missing something, but although the thresholding model does not need a DEM, it is tied to a particular gauge and to the particular terrain and floodplain characteristics. It needs to be trained for each domain of interest using historical inundation extent and gauge height data, so it is not clear to me that it can be deployed to a new region without effort.
Line 375. What happened with the flood, and how effective was the alert? The text raises concern but does not report any outcome.
- AC2: 'Reply on RC2', Efrat Morin, 09 Feb 2022
RC3: 'Comment on hess-2021-554', Anonymous Referee #3, 14 Jan 2022
Dear authors,
I found your manuscript very well organized and for sure impressive for the amount of information managed.
To be honest, the scientific advances you present are not described in enough detail to make this paper useful to the community. In my opinion, this manuscript would be a perfect contribution to NHESS and could be published there almost as is, but it partly fails to meet the standard of scientific innovation I expect in HESS.
See attached supplement for further comments.
- AC3: 'Reply on RC3', Efrat Morin, 09 Feb 2022
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
1,067 | 514 | 23 | 1,604 | 53 | 17 | 14 |