Hydrology and Earth System Sciences An interactive open-access journal of the European Geosciences Union

Journal metrics

IF value: 5.153
IF 5-year value: 5.460
CiteScore value: 7.8
SNIP value: 1.623
IPP value: 4.91
SJR value: 2.092
Scimago H index value: 123
h5-index value: 65
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

  09 Oct 2020


Review status
This preprint is currently under review for the journal HESS.

Resampling and ensemble techniques for improving ANN-based high streamflow forecast accuracy

Everett Snieder, Karen Abogadil, and Usman T. Khan
  • Department of Civil Engineering, York University, 4700 Keele St, Toronto ON, Canada, M3J 1P3

Abstract. Data-driven flow forecasting models, such as Artificial Neural Networks (ANNs), are increasingly used for operational flood warning systems. However, flow distributions are highly imbalanced, resulting in poor prediction accuracy on high flows, both in terms of amplitude and timing error. Resampling and ensemble techniques have been shown to improve model performance on imbalanced datasets such as streamflow. In this research, we systematically evaluate and compare three resampling techniques: random undersampling (RUS), random oversampling (ROS), and SMOTER; and four ensemble techniques: randomised weights and biases, bagging, adaptive boosting (AdaBoost), and least squares boosting (LSBoost); on their ability to improve high flow prediction accuracy using ANNs. The methods are implemented both independently and in combined, hybrid techniques. While some of these combinations have been explored in the broader machine learning literature, this research contains many of the first instances of these algorithms being used to address the imbalance problem inherent in flood and high flow forecasting models. Specifically, the implementation of ROS, and new approaches for SMOTER, LSBoost, and SMOTER-AdaBoost are presented in this research. Data from two Canadian watersheds (the Bow River in Alberta, and the Don River in Ontario), representing distinct hydrological systems, are used as the basis for the comparison of the methods. The models are evaluated on overall performance and on high flows. The results of this research indicate that resampling produces marginal improvements to high flow prediction accuracy, whereas ensemble methods produce more substantial improvements, with or without a resampling method. Ensemble methods are therefore recommended over simple ANN flow forecast models to reduce the amplitude and timing error in highly imbalanced flow datasets.
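As a rough illustration of the two simplest resampling ideas named in the abstract, the sketch below applies random undersampling and random oversampling to a synthetic, right-skewed series. It is not the authors' implementation: the 80th-percentile split between typical and high flows, the function names, the lognormal stand-in data, and the choice to balance the two groups exactly are all assumptions made for this example. SMOTER and the ensemble methods are not shown.

```python
import numpy as np


def high_flow_mask(y, quantile=0.8):
    """Flag samples whose target exceeds the given quantile (assumed threshold)."""
    return y > np.quantile(y, quantile)


def random_undersample(X, y, high, rng):
    """RUS: randomly discard typical-flow samples until both groups are equal in size."""
    high_idx = np.flatnonzero(high)
    low_idx = np.flatnonzero(~high)
    keep_low = rng.choice(low_idx, size=high_idx.size, replace=False)
    idx = np.concatenate([keep_low, high_idx])
    return X[idx], y[idx]


def random_oversample(X, y, high, rng):
    """ROS: randomly duplicate high-flow samples until both groups are equal in size."""
    high_idx = np.flatnonzero(high)
    low_idx = np.flatnonzero(~high)
    extra = rng.choice(high_idx, size=low_idx.size - high_idx.size, replace=True)
    idx = np.concatenate([low_idx, high_idx, extra])
    return X[idx], y[idx]


rng = np.random.default_rng(0)

# Synthetic right-skewed series standing in for observed flows (hypothetical data).
flow = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
X = flow[:-1].reshape(-1, 1)  # flow at the previous time step as the only input
y = flow[1:]                  # flow at the next time step as the target

high = high_flow_mask(y, quantile=0.8)
X_rus, y_rus = random_undersample(X, y, high, rng)
X_ros, y_ros = random_oversample(X, y, high, rng)

print("original: ", y.size, "samples,", int(high.sum()), "high flows")
print("after RUS:", y_rus.size, "samples")
print("after ROS:", y_ros.size, "samples")
```

Either resampled set could then be used to train a forecasting model in place of the original training data. Undersampling shrinks the training set, while oversampling enlarges it by repeating high-flow samples.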


Interactive discussion

Status: open (until 04 Dec 2020)



Total article views: 261 (including HTML, PDF, and XML)
  • HTML: 210
  • PDF: 48
  • XML: 3
  • Total: 261
  • BibTeX: 2
  • EndNote: 1
Views and downloads (calculated since 09 Oct 2020)

Viewed (geographical distribution)

Total article views: 241 (including HTML, PDF, and XML), of which 238 have a defined geography and 3 are of unknown origin.



Latest update: 26 Oct 2020
Short summary
Streamflow distributions are highly skewed (a problem known as an imbalanced domain), resulting in low prediction accuracy for high flows when using Artificial Neural Network flow forecasting models. In this research, we investigate the use of three resampling, four ensemble, and 12 hybrid techniques to address the problem of imbalanced datasets, and evaluate the improvement in high flow prediction accuracy for two Canadian watersheds.