This work is distributed under the Creative Commons Attribution 4.0 License.
Probabilistic Hierarchical Interpolation and Interpretable Configuration for Flood Prediction
Abstract. The last few years have witnessed the rise of Neural Network (NN) applications for hydrological time series modeling. By virtue of their capabilities, NN models can achieve unprecedented levels of performance when learning increasingly complex rainfall-runoff processes from data, making them pivotal for computational hydrologic tasks such as flood prediction. To be considered practical, NN models should provide a probabilistic understanding of the model mechanisms and predictions, as well as hints on what could perturb the model. In this paper, we developed two probabilistic NN models, i.e., Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS) and Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS), and benchmarked them against a long short-term memory (LSTM) network for flood prediction across two headwater streams in Georgia and North Carolina, USA. To generate probabilistic predictions, a Multi-Quantile Loss was used to assess the 95 % prediction uncertainty (95PPU) of multiple flooding events. We conducted extensive flood prediction experiments demonstrating the advantages of hierarchical interpolation and interpretable architectures, where both N-HiTS and N-BEATS provided an average accuracy improvement of almost 5 % (NSE) over the LSTM benchmarking model. On a variety of flooding events with different timing and magnitudes, both N-HiTS and N-BEATS demonstrated significant performance improvements over the LSTM benchmark and showcased their probabilistic predictions by specifying a likelihood parameter.
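As context for the Multi-Quantile Loss mentioned in the abstract, the sketch below shows the underlying pinball (quantile) loss averaged over several quantile levels. The function name and array layout are illustrative assumptions, not the authors' implementation; fitting quantile levels such as 0.025 and 0.975 is one common way to obtain a 95 % prediction band (95PPU).

```python
import numpy as np

def multi_quantile_loss(y_true, y_pred, quantiles):
    """Average pinball loss over a set of quantile levels.

    y_true: (n,) observed values
    y_pred: (n, len(quantiles)) predicted quantiles, one column per level
    """
    y_true = np.asarray(y_true, dtype=float)[:, None]
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)[None, :]
    err = y_true - y_pred
    # Pinball loss: weight q on under-prediction, (1 - q) on over-prediction.
    loss = np.maximum(q * err, (q - 1.0) * err)
    return float(loss.mean())
```

With quantile levels [0.025, 0.5, 0.975], the outer columns of `y_pred` bound the 95 % interval while the middle column is the median forecast.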
Status: open (until 05 Dec 2024)
CC1: 'Comment on hess-2024-261', Nima Zafarmomen, 18 Oct 2024
reply
The paper introduces a novel application of deep learning architectures, specifically the N-HiTS and N-BEATS models, for flood prediction. This is a pioneering approach in the hydrological domain, demonstrating how advanced neural networks can be adapted to model complex environmental systems. The use of these architectures represents a significant advancement in flood prediction, highlighting their ability to capture intricate rainfall-runoff processes and providing more accurate forecasts compared to traditional models.
One of the key strengths of the paper is its focus on probabilistic predictions through the use of the Multi-Quantile Loss (MQL) function. By incorporating uncertainty quantification, the paper enhances the reliability and interpretability of its flood predictions, which is crucial for decision-makers managing flood risks.
The research is also commendable for its comprehensive benchmarking against long short-term memory (LSTM) models, a standard in time series forecasting. The study clearly demonstrates that the N-HiTS and N-BEATS models outperform LSTM, particularly for short-term flood predictions, with a notable 5% improvement in accuracy (NSE metric).
I am highly interested in the models introduced in this paper and intend to use N-HiTS and N-BEATS in my future research endeavors. I strongly recommend publishing this paper as it offers a well-structured methodology, comprehensive benchmarking against established models like LSTM, and rigorous sensitivity and uncertainty analyses.
Citation: https://doi.org/10.5194/hess-2024-261-CC1 -
RC1: 'Comment on hess-2024-261', Anonymous Referee #1, 04 Nov 2024
reply
Saberian et al. applied two new neural networks to flood prediction at two headwater watersheds. The new approaches have the advantage of providing an uncertainty assessment of the prediction. They also compared the results with an LSTM, which shows improved prediction performance. This study is novel and important. The manuscript is generally well written, and the structure is well organized. I have several questions as follows:
- Are precipitation, temperature, and humidity enough as input variables for your neural networks?
- The forcing station is a single point in the watershed, while runoff generation is driven by water converging from a large area of the watershed. Do you think a single station can represent these complex processes over large areas?
- You mentioned your models predict one hour ahead. Is this meaningful for flood prediction? In other words, is one hour enough time to escape once people know the flood will arrive?
- Did you train each NN model for each watershed? Trained based on one watershed and then transferred to the other one? Or trained both watersheds together?
Citation: https://doi.org/10.5194/hess-2024-261-RC1 -
CC2: 'Reply on RC1', Mostafa Saberian, 05 Nov 2024
reply
Thank you for your valuable comments and constructive feedback. I have attached a file with detailed responses to the questions.
-
AC1: 'Reply on RC1', Vidya Samadi, 06 Nov 2024
reply
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-261/hess-2024-261-AC1-supplement.pdf
-
RC2: 'Comment on hess-2024-261', Anonymous Referee #2, 06 Nov 2024
reply
In their study, Saberian et al. present two innovative neural network models, N-HiTS and N-BEATS, aimed at advancing flood prediction capabilities across two headwater watersheds in the southeastern United States. The authors emphasize the interpretability of these models, alongside their ability to quantify prediction uncertainty—a valuable aspect in flood forecasting. By benchmarking against the LSTM model, the study demonstrates notable performance gains in short-term flood prediction accuracy. There are areas where additional clarity and methodological detail would strengthen the findings. I offer the following questions and suggestions for improvement:
1. Methodology and Models
- Interpretability and Model Complexity: The paper claims that N-HiTS and N-BEATS models offer interpretability. However, further elaboration on how these models achieve interpretability would strengthen the paper. Including visual examples or providing a more explicit breakdown of how interpretability manifests in model outputs could clarify this for readers who may be less familiar with these architectures.
- Hyperparameter Selection: The selection process for critical hyperparameters like the lookback window size is not fully justified. Lookback windows are crucial in sequence-based forecasting, and this choice should either be explored as a hyperparameter or explained in greater detail, particularly given the model's dependency on residuals for subsequent window predictions. Additionally, since a 24-hour lookback window is used, further elaboration on how this length captures relevant hydrological features, like seasonality or trends, would enhance clarity.
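To make the role of the lookback window concrete, a generic construction of 24-hour input windows for one-hour-ahead prediction might look like the following (a hypothetical helper, not the authors' preprocessing code):

```python
import numpy as np

def make_windows(series, lookback=24, horizon=1):
    """Slice a 1-D hourly series into (lookback -> horizon) supervised pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1  # number of complete windows
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, y
```

Any pattern longer than the 24-hour window (e.g. seasonal baseflow) is invisible to the model unless supplied through additional covariates, which is why the choice of lookback length deserves explicit justification.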
- Metrics Selection: While NSE, RMSE, and MAE are utilized, the omission of the Kling-Gupta Efficiency (KGE) index is notable. KGE is especially relevant for flood forecasting as it provides insights into peak flow timing, magnitude, and correlation. Including KGE would add robustness to the evaluation by capturing aspects critical to hydrological modeling.
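For reference, NSE and KGE can both be computed in a few lines; the sketch below follows the standard definitions (Nash-Sutcliffe efficiency and the 2009 Kling-Gupta formulation) and is independent of the paper's code.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency: combines correlation, variability, and bias."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Because KGE decomposes into correlation, variability, and bias terms, it can reveal whether a model's errors come from mistimed peaks or systematically under-predicted magnitudes, which NSE alone conflates.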
2. Model Evaluation
- Interpretability in Model Outputs: Although the paper claims interpretability for both N-HiTS and N-BEATS, the explanation is somewhat abstract. Providing visual aids or case studies that illustrate interpretability in flood prediction contexts would be beneficial. Specifically, the paper mentions that projections onto harmonic and trend bases improve prediction accuracy, but further clarification on the physical interpretability of these projections would help. Given the use of a 24-hour window, it would be helpful to explain whether trends, network depth, or some other feature captures seasonality and why this choice is appropriate for flood prediction.
- Uncertainty Analysis: The application of Maximum Likelihood Estimation (MLE) for uncertainty quantification is intriguing. However, more details on how MLE is applied in this context would improve reproducibility. A clearer formulation of MLE within the training process or its integration with multi-quantile loss could better inform readers about the strengths and limitations of this approach. Additionally, bootstrapping methods could help quantify uncertainty and assess whether observed performance differences between models are statistically significant, providing a more robust comparison.
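The bootstrapping suggestion could be implemented along the lines below. Note that this resamples individual time steps, which ignores streamflow autocorrelation (a block bootstrap would be more defensible for hydrological series); the function name and metric are illustrative assumptions.

```python
import numpy as np

def bootstrap_diff_ci(obs, sim_a, sim_b, metric, n_boot=1000, seed=0):
    """95 % bootstrap CI for metric(obs, sim_a) - metric(obs, sim_b).

    If the interval excludes zero, the performance difference between
    the two models is unlikely to be a sampling artifact.
    """
    rng = np.random.default_rng(seed)
    obs, sim_a, sim_b = (np.asarray(x, float) for x in (obs, sim_a, sim_b))
    n = len(obs)
    diffs = [
        metric(obs[idx], sim_a[idx]) - metric(obs[idx], sim_b[idx])
        for idx in (rng.integers(0, n, n) for _ in range(n_boot))
    ]
    return tuple(np.percentile(diffs, [2.5, 97.5]))
```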
3. Data and Experimentation
- Separate Model Training for Each Catchment: Each model was trained separately for each catchment, rather than training a single model on both catchments. This approach limits the assessment of the models' generalizability across different hydrological conditions. Training a unified model on data from both catchments would provide insights into the model's adaptability and robustness across diverse environments, which is crucial for broader flood prediction applications. I recommend including an analysis of a single model trained across both catchments to evaluate cross-catchment performance.
- Data Splits for Training, Validation, and Testing: It appears the observational data up to October 1, 2022, was used for training, and data from October 1, 2022, to March 28, 2023, was used for validation. However, the absence of an unseen test set to demonstrate generalization capabilities raises concerns. Dividing the dataset into three splits (training, validation, and testing) would allow for hyperparameter optimization on the validation set and final results on an unseen test set, demonstrating the model's generalization. Including metrics like loss curves for the training and validation sets or evaluation metrics on a test set would help assess model performance and detect overfitting, thereby enhancing reliability.
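The suggested three-way split is straightforward for time-ordered data; a generic sketch is shown below (the fractions are illustrative, not the paper's actual dates):

```python
def chronological_split(n, train_frac=0.7, val_frac=0.15):
    """Index ranges for a time-ordered train/validation/test split.

    Keeping the splits chronological avoids leaking future information
    into training, and the final block serves as the unseen test set.
    """
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return range(0, i_train), range(i_train, i_val), range(i_val, n)
```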
4. Suggestions for Improvement
- Model Reproducibility: Simplifying the explanation of the Multi-Quantile Loss (MQL) function could make the methodology more accessible. Additionally, code availability or pseudocode in an appendix would enhance reproducibility and facilitate further exploration by other researchers.
Additional Comments
- Input Sensitivity Inconsistency (Line 568-569): The statement here suggests that the models are indeed sensitive to input conditions, especially during extreme events. However, in the following section, the paper concludes that the models are not sensitive to input data, which presents an inconsistency. This contradiction should be addressed.
Citation: https://doi.org/10.5194/hess-2024-261-RC2 -