References

HESS

Hydrology and Earth System Sciences

Hydrol. Earth Syst. Sci.

1607-7938

Copernicus Publications

Göttingen, Germany

10.5194/hess-14-1931-2010

Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

A. Elshorbagy¹, G. Corzo², S. Srinivasulu¹, and D. P. Solomatine²,³

¹ Centre for Advanced Numerical Simulation (CANSIM), Department of Civil and Geological Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada

² Department of Hydroinformatics and Knowledge Management, UNESCO-IHE Institute for Water Education, Delft, The Netherlands

³ Water Resources Section, Delft University of Technology, Delft, The Netherlands

Published: 14 October 2010

Hydrol. Earth Syst. Sci., 14(10), 1931–1941, 2010

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/3.0/

This article is available from https://hess.copernicus.org/articles/14/1931/2010/hess-14-1931-2010.html

The full text article is available as a PDF file from https://hess.copernicus.org/articles/14/1931/2010/hess-14-1931-2010.pdf

A comprehensive data-driven modeling experiment is presented in a two-part paper. This first part proposes the experiment and discusses the most important concerns regarding the way data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn. A concise review of key articles that compare various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe, representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process, are proposed for the modeling experiment and described in the second paper. Twelve different realizations (groups) of each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset, so that both the prediction accuracy and the uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, the results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
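The grouping step described in the abstract can be illustrated with a minimal sketch. It only assumes what the abstract states: 12 realizations per dataset, each split at random into training, cross-validation, and testing subsets. The function name `make_realizations`, the 50/25/25 split fractions, and the fixed seed are hypothetical choices made for illustration; the abstract does not give the actual sampling procedure, which may differ.

```python
import random


def make_realizations(n_samples, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Build n_groups random splits of the indices 0..n_samples-1.

    Each group holds three disjoint index lists: training,
    cross-validation, and testing. The fractions and the seed are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    n_train = int(fractions[0] * n_samples)
    n_cv = int(fractions[1] * n_samples)
    groups = []
    for _ in range(n_groups):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random ordering for every realization
        groups.append({
            "training": idx[:n_train],
            "cross_validation": idx[n_train:n_train + n_cv],
            "testing": idx[n_train + n_cv:],  # whatever remains
        })
    return groups


groups = make_realizations(n_samples=100)
print(len(groups), [len(v) for v in groups[0].values()])  # 12 [50, 25, 25]
```

Fitting the same technique to each of the 12 groups and comparing its test errors across groups would give one value for typical accuracy and one for its spread, which is the sense in which the abstract speaks of evaluating both accuracy and uncertainty.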

References

Abrahart, R., See, L., and Solomatine, D. (Eds.): Practical hydroinformatics. Computational intelligence and technological developments in water applications, Springer-Verlag, Berlin, Heidelberg, Germany, 505 pp., 2008.

Abrahart, R., See, L., and Dawson, C.: Neural Network hydroinformatics: Maintaining scientific rigour, in: Practical hydroinformatics. Computational intelligence and technological developments in water applications, edited by: Abrahart, R., See, L., and Solomatine, D., Springer-Verlag, Berlin, Heidelberg, Germany, 33–47, 2008.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology: Artificial neural networks in hydrology, I: Preliminary concepts, J. Hydrol. Eng., 5(2), 115–123, 2000.

Babovic, V. and Keijzer, M.: Rainfall-runoff modelling based on genetic programming, Nord. Hydrol., 33(5), 331–346, 2002.

Babovic, V. and Keijzer, M.: Genetic Programming as Model Induction Engine, J. Hydroinform., 2(1), 35–60, 2000.

Banzhaf, W., Nordin, P., Keller, R. E., and Francone, F. D.: Genetic programming-an introduction: on the automatic evolution of computer programs and its applications, Morgan Kaufmann Publishers, Inc., 470 pp., 1998.

Behzad, M., Asghari, K., Eazi, M., and Palhang, M.: Generalization performance of Support Vector Machines and Neural Networks in Runoff Modeling, Expet. Syst. Appl., 36(4), 7624–7629, https://doi.org/10.1016/j.eswa.2008.09.053, 2008.

Bouckaert, R. R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., and Scuse, D.: WEKA Manual for version 3.6.0. University of Waikato, Hamilton, New Zealand, 2008.

Brown, M. and Harris, C.: Neurofuzzy adaptive modeling and control, Prentice Hall: New York, 508 pp., 1994.

Cherkassky, V., Krasnopolsky, V., Solomatine, D., and Valdes, J.: Computational intelligence in earth sciences and environmental applications: Issues and challenges, Neural Networks, 19, 113–121, 2006.

Cherkassky, V. S. and Mulier, F.: Learning from data: concepts, theory, and methods, 2nd Edn., John Wiley & Sons, Inc., Hoboken, New Jersey, 2007.

Çimen, M.: Estimation of Daily Suspended Sediments using Support Vector Machines, Hydrolog. Sci. J., 53(3), 656–666, 2008.

Dibike, Y. B., Velickov, S., Solomatine, D. P., and Abbott, M. B.: Model induction with support vector machines: introduction and applications, ASCE J. Comput. Civil Eng., 15(3), 208–216, 2001.

Dibike, Y. B. and Solomatine, D. P.: River Flow Forecasting Using Artificial Neural Networks, Journal of Physics and Chemistry of the Earth, Part B: Hydrology, Oceans and Atmosphere, 26(1), 1–8, 2001.

Doglioni, A., Giustolisi, O., Savic, D. A., and Webb, B. W.: An investigation on stream temperature analysis based on evolutionary computing, Hydrol. Process., 22(3), 315–326, https://doi.org/10.1002/hyp.6607, 2008.

Elshorbagy, A., Corzo, G., Srinivasulu, S., and Solomatine, D. P.: Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: Application, Hydrol. Earth Syst. Sci., 14, 1943–1961, https://doi.org/10.5194/hess-14-1943-2010, 2010.

Elshorbagy, A. and El-Baroudy, I.: Investigating the capabilities of evolutionary data-driven techniques using the challenging estimation of soil moisture content, J. Hydroinform., 11(3–4), 237–251, 2009.

Elshorbagy, A. and Parasuraman, K.: Toward bridging the gap between data-driven and mechanistic models: cluster-based neural networks for hydrologic processes, in: Practical hydroinformatics. Computational intelligence and technological developments in water applications, edited by: Abrahart, R., See, L., and Solomatine, D., Springer-Verlag, Berlin, Heidelberg, Germany, 389–403, 2008.

Giustolisi, O., Doglioni, A., Savic, D. A., and Webb, B. W.: A multi-model approach to analysis of environmental phenomena, Environ. Modell. Softw., 22(5), 674–682, 2007.

Evans, D. and Jones, A. J.: A proof of the gamma test, P. Roy. Soc. A-Math. Phy., 458, 2759–2799, 2002.

Giustolisi, O. and Savic, D. A.: A symbolic data-driven technique based on evolutionary polynomial regression, J. Hydroinform., 8(3), 207–222, https://doi.org/10.2166/hydro.2006.020, 2006.

Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn., MacMillan, New York, USA, 1999.

Jayawardena, A. W., Muttil, N., and Lee, J. H. W.: Comparative Analysis of Data-Driven and GIS-Based Conceptual Rainfall-Runoff Model, J. Hydrol. Eng., 11(1), 1–11, 2006.

Jayawardena, A. W., Muttil, N., and Fernando, T. M. K. G.: Rainfall-runoff modelling using genetic programming, MODSIM 2005 International Congress on Modelling and Simulation, in: Modelling and Simulation Society of Australia and New Zealand, edited by: Zerger, A. and Argent, R. M., December 2005, 1841–1847, ISBN:0-9758400-2-9, 2005.

Jones, A. J., Margetts, S., and Durrant, P.: The winGamma™ User Guide, University of Wales, Cardiff, 2001.

Khan, M. S. and Coulibaly, P.: Application of Support Vector Machine in Lake Water Level Prediction, J. Hydrol. Eng., 11(3), 199–205, 2006.

Koza, J. R.: Genetic programming: On the programming of computers by means of natural selection, The MIT Press, Cambridge, MA, 1992.

Laucelli, D., Berardi, L., and Doglioni, A.: Evolutionary polynomial regression toolbox: version 1.SA., Department of Civil and Environmental Engineering, Technical University of Bari, Bari, Italy, available at: http://www.hydroinformatics.it/prod02.htm, last access: March 2008, 2005.

Maier, H. and Dandy, G.: Neural networks for the prediction and forecasting of water resources variables: A review of modeling issues and applications, Environ. Modell. Softw., 15(1), 101–124, 2000.

Makkeasorn, A., Chang, N. B., and Zhou, X.: Short-term streamflow forecasting with global climate change implications – A comparative study between genetic programming and neural network models, J. Hydrol., 352, 336–354, 2008.

Mattera, D. and Haykin, S.: Support vector machines for dynamic reconstruction of a chaotic system, in: Advances in Kernel Methods – Support Vector Learning, edited by: Schölkopf, B., Burges, C. J. C., and Smola, A. J., 211–242, Cambridge, MA, MIT Press, 1999.

Minns, A. W. and Hall, M. J.: Artificial neural networks as rainfall-runoff models, Hydrolog. Sci. J., 41, 399–417, 1996.

Müller, K. R., Smola, A., Rätsch, G., Schölkopf, B., Kohlmorgen, J., and Vapnik, V.: Predicting time series with support vector machines, in: Artificial Neural Networks – ICANN'97, edited by: Gerstner, W., Germond, A., Hasler, M., and Nicoud, J. D., 999–1004, Berlin, Springer Lecture Notes in Computer Science, 1327, 1997.

Karlsson, M. and Yakowitz, S.: Nearest neighbour methods for nonparametric rainfall-runoff forecasting, Water Resour. Res., 23(7), 1300–1308, 1987.

Parasuraman, K. and Elshorbagy, A.: Cluster-based hydrologic prediction using genetic algorithm-trained neural networks, J. Hydrol. Eng., ASCE, 12(1), 52–62, 2007.

Parasuraman, K., Elshorbagy, A., and Carey, S. K.: Modelling dynamics of the evapotranspiration process using genetic programming, Hydrolog. Sci. J., 53(3), 563–578, 2007a.

Parasuraman, K., Elshorbagy, A., and Si, B. C.: Estimating saturated hydraulic conductivity using genetic programming, Soil Sci. Soc. Am. J., 71, 1676–1684, 2007b.

Rabuñal, J. R., Puertas, J., Suárez, J., and Rivero, D.: Determination of the unit hydrograph of a typical urban basin using genetic programming and artificial neural networks, Hydrol. Process., 21, 476–485, 2007.

Savic, D. A., Giustolisi, O., Berardi, L., Shepherd, W., Djordjevic, S., and Saul, A.: Sewers failure analysis using evolutionary computing, Water Management J., 159(2), 111–118, https://doi.org/10.1680/wama.2006.159.2.111, 2006.

Silva, S.: GPLAB – a genetic programming toolbox for MATLAB, available at: http://gplab.sourceforge.net, 2005.

Sivapragasam, C., Vincent, P., and Vasudevan, G.: Genetic programming model for forecast of short and noisy data, Hydrol. Process., 21, 266–272, 2007.

Solomatine, D. P. and Dulal, K. N.: Model trees as an alternative to neural networks in rainfall-runoff modelling, Hydrolog. Sci. J., 48(3), 399–411, 2003.

Solomatine, D. P., Maskey, M., and Shrestha, D. L.: Instance-based learning compared to other data-driven methods in hydrological forecasting, Hydrol. Process., 22, 275–287, 2008.

Solomatine, D. P. and Siek, M. B.: Modular learning models in forecasting natural phenomena, Neural Networks, 19, 225–235, 2006.

Solomatine, D. P. and Xue, Y.: M5 Model Trees and Neural Networks: Application to Flood Forecasting in the Upper Reach of the Huai River in China, J. Hydrol. Eng., 9(6), 491–501, 2004.

Solomatine, D. P. and Ostfeld, A.: Data-driven modelling: some past experiences and new approaches, J. Hydroinform., 10(1), 3–22, 2008.

Stefánsson, A., Končar, N., and Jones, A. J.: A note on the Gamma test, Neural Comput. Appl., 5, 131–133, 1997.

Stravs, L. and Brilly, M.: Development of a low-flow forecasting model using the M5 machine learning method, Hydrolog. Sci. J., 52(3), 466–477, 2007.

Vapnik, V.: The Nature of statistical learning theory, Springer, N.Y., USA, 1995.

Witten, I. H. and Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn., Morgan Kaufmann, San Francisco, USA, 2005.

Wu, C. L., Chau, K. W., and Li, Y. S.: River Stage Prediction based on a Distributed Support Vector Regression, J. Hydrol., 358, 96–111, 2008.

Wu, W., Wang, X., Xie, D., and Liu, H.: Soil Water Content Forecasting by Support Vector Machine in Purple Hilly Region, Computer and Computing Technologies in Agriculture, 1, 223–230, 2008.

Zhang, B. and Govindaraju, S.: Prediction of watershed runoff using Bayesian concepts and modular neural networks, Water Resour. Res., 36(3), 753–762, 2000.