Articles | Volume 30, issue 4
https://doi.org/10.5194/hess-30-1077-2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Sensitivity of hydrological machine learning prediction accuracy to information quantity and quality
Minhyuk Jeung
Department of Rural & Biosystems Engineering (Brain Korea 21), Chonnam National University, Gwangju 61186, Republic of Korea
Younggu Her
Department of Agricultural and Biological Engineering/Tropical Research and Education Center, University of Florida, Homestead, Florida 33186, USA
Sang-Soo Baek
Department of Environmental Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
Kwangsik Yoon
CORRESPONDING AUTHOR
Department of Rural & Biosystems Engineering (Brain Korea 21), Chonnam National University, Gwangju 61186, Republic of Korea
Related authors
Minhyuk Jeung, Younggu Her, Sang-Soo Baek, and Kwangsik Yoon
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2024-284, 2024
Revised manuscript not accepted
Short summary
Machine learning (ML) techniques have become widely used due to the availability of large data repositories and advancements in computing resources and methods. Our study explored the connection between a model’s accuracy and the information content of input data. Results showed that the accuracy of three ML models significantly improved when high-quality input data were included. These findings highlight the importance of data quality in ML model training.