Volume 25, issue 5
https://doi.org/10.5194/hess-25-2543-2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.
Resampling and ensemble techniques for improving ANN-based high-flow forecast accuracy
Everett Snieder
Department of Civil Engineering, York University, 4700 Keele St, Toronto ON, M3J 1P3, Canada
Karen Abogadil
Department of Civil Engineering, York University, 4700 Keele St, Toronto ON, M3J 1P3, Canada
Usman T. Khan (corresponding author)
Department of Civil Engineering, York University, 4700 Keele St, Toronto ON, M3J 1P3, Canada
Short summary
Flow distributions are highly skewed, which results in low prediction accuracy for high flows when artificial neural networks are used for flood forecasting. We investigate the use of resampling and ensemble techniques to address skewed datasets and improve high-flow prediction. The methods are implemented both independently and in combination, as hybrid techniques. This research presents the first analysis of the effects of combining these methods on high-flow prediction accuracy.
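The combination of resampling and ensembling described above can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating training samples whose flow exceeds a chosen threshold) inside each member of a bagged ensemble (bootstrap resampling with replacement, then averaging the member predictions). To keep the example runnable with the Python standard library alone, an ordinary least-squares line stands in for an ANN. The function names, the `threshold` and `factor` parameters, the linear base learner, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import random
import statistics


def oversample_high_flows(xs, ys, threshold, factor, rng):
    """Random oversampling (hypothetical helper): duplicate (x, y) pairs with
    y > threshold so the high-flow subset grows to about `factor` times its size."""
    data = list(zip(xs, ys))
    high = [pair for pair in data if pair[1] > threshold]
    n_extra = int((factor - 1) * len(high))
    if high:
        data += [rng.choice(high) for _ in range(n_extra)]
    return data


def fit_line(data):
    """Least-squares fit of y = a + b*x; a stand-in for training an ANN."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b = sxy / sxx if sxx else 0.0  # guard against a degenerate sample
    return my - b * mx, b


def bagged_ensemble(xs, ys, n_members, threshold, factor, seed=0):
    """Bagging with oversampling: each member is trained on a bootstrap sample
    (drawn with replacement) whose high flows are then oversampled."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        data = oversample_high_flows(boot_x, boot_y, threshold, factor, rng)
        members.append(fit_line(data))
    return members


def predict(members, x):
    """Ensemble prediction: the mean of the member predictions."""
    return statistics.fmean(a + b * x for a, b in members)


# Usage on a small synthetic, right-skewed series (hypothetical data).
if __name__ == "__main__":
    rng = random.Random(1)
    rain = [rng.expovariate(1.0) for _ in range(200)]
    flow = [0.5 + 3.0 * r + rng.gauss(0.0, 0.2) for r in rain]
    high_cut = sorted(flow)[int(0.9 * len(flow))]  # roughly the 90th-percentile flow
    ens = bagged_ensemble(rain, flow, n_members=10, threshold=high_cut, factor=3)
    print(round(predict(ens, 4.0), 2))
```

In this sketch, averaging over bootstrap members targets variance, while oversampling shifts each member's fit toward the sparse high-flow region. Whether such a hybrid actually improves high-flow accuracy for a given catchment and model is an empirical question.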