Articles | Volume 30, issue 8
https://doi.org/10.5194/hess-30-2373-2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Never Train a Deep Learning Model on a Single Well? Revisiting Training Strategies for Groundwater Level Prediction
Institute for Applied Geosciences (AGW), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Tanja Liesch
Institute for Applied Geosciences (AGW), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Related authors
Tanja Liesch and Marc Ohmer
Hydrol. Earth Syst. Sci., 30, 1877–1890, https://doi.org/10.5194/hess-30-1877-2026, 2026
Short summary
We studied how to add site information to deep learning models that predict groundwater levels at many wells at once. Using data from Germany, we compared four simple ways to combine time varying weather with time invariant site characteristics. All methods gave similar average accuracy. Repeating site data at each time step was slightly best but used more computer power. The informativeness of site information mattered more than the method, guiding future model design.
Marc Ohmer, Tanja Liesch, Bastian Habbel, Benedikt Heudorfer, Mariana Gomez, Patrick Clos, Maximilian Nölscher, and Stefan Broda
Earth Syst. Sci. Data, 18, 77–95, https://doi.org/10.5194/essd-18-77-2026, 2026
Short summary
We present a public dataset of weekly groundwater levels from more than 3000 wells across Germany, spanning 32 years. It combines weather data and site-specific environmental information to support forecasting groundwater changes. Three benchmark models of varying complexity show how data and modeling approaches influence predictions. This resource promotes open, reproducible research and helps guide future water management decisions.
Marc Ohmer, Tanja Liesch, and Andreas Wunsch
Hydrol. Earth Syst. Sci., 26, 4033–4053, https://doi.org/10.5194/hess-26-4033-2022, 2022
Short summary
We present a data-driven approach to select optimal locations for groundwater monitoring wells. The approach can optimize the number and location of wells for network reduction (by ranking wells by their information content and removing redundant ones), for network extension (by finding sites with high information gain), or for both. It can include a cost function to account for more or less suitable areas for new wells and can help to obtain maximum information content for a given budget.
Fabienne Doll, Tanja Liesch, Maria Wetzel, Stefan Kunz, and Stefan Broda
Geosci. Model Dev., 19, 2657–2675, https://doi.org/10.5194/gmd-19-2657-2026, 2026
Short summary
With the growing use of machine learning for groundwater level (GWL) prediction, proper performance estimation is crucial. This study compares three validation strategies—blocked cross-validation (bl-CV), repeated out-of-sample (repOOS), and out-of-sample (OOS)—for 1D-CNN and LSTM models using meteorological inputs. Results show that bl-CV offers the most reliable performance estimates, while OOS is the most uncertain, highlighting the need for careful method selection.
Raoul A. Collenteur, Ezra Haaf, Mark Bakker, Tanja Liesch, Andreas Wunsch, Jenny Soonthornrangsan, Jeremy White, Nick Martin, Rui Hugman, Ed de Sousa, Didier Vanden Berghe, Xinyang Fan, Tim J. Peterson, Jānis Bikše, Antoine Di Ciacca, Xinyue Wang, Yang Zheng, Maximilian Nölscher, Julian Koch, Raphael Schneider, Nikolas Benavides Höglund, Sivarama Krishna Reddy Chidepudi, Abel Henriot, Nicolas Massei, Abderrahim Jardani, Max Gustav Rudolph, Amir Rouhani, J. Jaime Gómez-Hernández, Seifeddine Jomaa, Anna Pölz, Tim Franken, Morteza Behbooei, Jimmy Lin, and Rojin Meysami
Hydrol. Earth Syst. Sci., 28, 5193–5208, https://doi.org/10.5194/hess-28-5193-2024, 2024
Short summary
We show the results of the 2022 Groundwater Time Series Modelling Challenge; 15 teams applied data-driven models to simulate hydraulic heads, and three model groups were identified: lumped, machine learning, and deep learning. For all wells, reasonable performance was obtained by at least one team from each group. There was not one team that performed best for all wells. In conclusion, the challenge was a successful initiative to compare different models and learn from each other.
Andreas Wunsch, Tanja Liesch, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 28, 2167–2178, https://doi.org/10.5194/hess-28-2167-2024, 2024
Short summary
Seasons have a strong influence on groundwater levels, but the relationships are complex and partly unknown. Using data from wells in Germany and an explainable machine learning approach, we showed that summer precipitation is the key factor that controls the severity of a low-water period in fall; high summer temperatures do not per se cause stronger decreases. Preceding winters have only a minor influence on such low-water periods in general.
Benedikt Heudorfer, Tanja Liesch, and Stefan Broda
Hydrol. Earth Syst. Sci., 28, 525–543, https://doi.org/10.5194/hess-28-525-2024, 2024
Short summary
We build a neural network to predict groundwater levels from monitoring wells. We predict all wells at the same time, by learning the differences between wells with static features, making it an entity-aware global model. This works, but we also test different static features and find that the model does not use them to learn exactly how the wells are different, but only to uniquely identify them. As this model class is not actually entity aware, we suggest further steps to make it so.
Guillaume Cinkus, Naomi Mazzilli, Hervé Jourde, Andreas Wunsch, Tanja Liesch, Nataša Ravbar, Zhao Chen, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023
Short summary
The Kling–Gupta Efficiency (KGE) is a performance criterion extensively used to evaluate hydrological models. We conduct a critical study on the KGE and its variant to examine counterbalancing errors. Results show that, when assessing a simulation, concurrent over- and underestimation of discharge can lead to an overall higher criterion score without an associated increase in model relevance. We suggest that one carefully choose performance criteria and use scaling factors.
Guillaume Cinkus, Andreas Wunsch, Naomi Mazzilli, Tanja Liesch, Zhao Chen, Nataša Ravbar, Joanna Doummar, Jaime Fernández-Ortega, Juan Antonio Barberá, Bartolomé Andreo, Nico Goldscheider, and Hervé Jourde
Hydrol. Earth Syst. Sci., 27, 1961–1985, https://doi.org/10.5194/hess-27-1961-2023, 2023
Short summary
Numerous modelling approaches can be used for studying karst water resources, which can make it difficult for a stakeholder or researcher to choose the appropriate method. We conduct a comparison of two widely used karst modelling approaches: artificial neural networks (ANNs) and reservoir models. Results show that ANN models are very flexible and seem great for reproducing high flows. Reservoir models can work with relatively short time series and seem to accurately reproduce low flows.
Andreas Wunsch, Tanja Liesch, Guillaume Cinkus, Nataša Ravbar, Zhao Chen, Naomi Mazzilli, Hervé Jourde, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 26, 2405–2430, https://doi.org/10.5194/hess-26-2405-2022, 2022
Short summary
Modeling complex karst water resources is difficult enough, but often there are no or too few climate stations available within or close to the catchment to deliver input data for modeling purposes. We apply image recognition algorithms to time-distributed, spatially gridded meteorological data to simulate karst spring discharge. Our models can also learn the approximate catchment location of a spring independently.
Short summary
We compared global and local deep learning models for groundwater level prediction using ~3,000 wells across Germany. Unlike surface water, groundwater is complex and data-scarce. The results show no systematic accuracy advantage of global models over local ones. Data similarity matters more than data quantity for better predictions. Successful groundwater modeling requires strategies tailored to these unique complexities, not just larger datasets.