Articles | Volume 29, issue 21 
            
                
                    
            
            
            https://doi.org/10.5194/hess-29-5871-2025
                    © Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Unveiling the limits of deep learning models in hydrological extrapolation tasks
                                            Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
                                        
                                    Daniel Klotz
                                            Interdisciplinary Transformation University Austria, Linz, Austria
                                        
                                    
                                            Google Research, Vienna, Austria
                                        
                                    Eduardo Acuña Espinoza
                                            Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
                                        
                                    András Bárdossy
                                            Institut für Wasser- und Umweltsystemmodellierung, Universität Stuttgart, Stuttgart, Germany
                                        
                                    Ralf Loritz
                                            Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
                                        
                                    Related authors
Maria Staudinger, Anna Herzog, Ralf Loritz, Tobias Houska, Sandra Pool, Diana Spieler, Paul D. Wagner, Juliane Mai, Jens Kiesel, Stephan Thober, Björn Guse, and Uwe Ehret
                                    Hydrol. Earth Syst. Sci., 29, 5005–5029, https://doi.org/10.5194/hess-29-5005-2025, 2025
                                    Short summary
                                            
                                                Three process-based and four data-driven hydrological models are compared using different training data. We found that process-based models perform better with small datasets but stop learning soon, while data-driven models learn longer. The study highlights the importance of memory in data and the impact of different data sampling methods on model performance. The direct comparison of these models is novel and provides a clear understanding of their performance under various data conditions.
                                            
                                            
                                        Judith Nijzink, Ralf Loritz, Laurent Gourdol, Davide Zoccatelli, Jean François Iffly, and Laurent Pfister
                                        Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-482, 2025
                                    Preprint under review for ESSD
                                    Short summary
                                            
                                                The CAMELS-LUX dataset (Catchment Attributes and MEteorology for Large-sample Studies – LUXembourg) contains hydrological and meteorological time series, along with atmospheric time series relevant to thunderstorm formation, for 56 Luxembourgish catchments (2004–2021). These catchments are characterized by a large physiographic variety on a relatively small scale in a homogeneous climate. The dataset can be applied for (regional) hydrological analyses.
                                            
                                            
                                        Manuel Álvarez Chaves, Eduardo Acuña Espinoza, Uwe Ehret, and Anneli Guthke
                                        EGUsphere, https://doi.org/10.5194/egusphere-2025-1699, 2025
                                    Short summary
                                            
                                                This study evaluates hybrid hydrological models that combine physics-based and data-driven components, using Information Theory to measure their relative contributions. When testing conceptual models with LSTMs that adjust parameters over time, we found performance primarily comes from the data-driven component, with physics constraints adding minimal value. We propose a quantitative tool to analyse this behaviour and suggest a workflow for diagnosing hybrid models.
                                            
                                            
                                        Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Deborah Cohen, and Oren Gilon
                                        EGUsphere, https://doi.org/10.5194/egusphere-2025-1224, 2025
                                    Short summary
                                            
                                                Missing input data are one of the most common challenges when building deep learning hydrological models. We present and analyze different methods that can produce predictions when certain inputs are missing during training or inference. Our proposed strategies provide high accuracy while allowing for more flexible data handling and being robust to outages in operational scenarios.
                                            
                                            
                                        Eduardo Acuña Espinoza, Frederik Kratzert, Daniel Klotz, Martin Gauch, Manuel Álvarez Chaves, Ralf Loritz, and Uwe Ehret
                                    Hydrol. Earth Syst. Sci., 29, 1749–1758, https://doi.org/10.5194/hess-29-1749-2025, 2025
                                    Short summary
                                            
                                                Long short-term memory (LSTM) networks have demonstrated state-of-the-art performance for rainfall-runoff hydrological modelling. However, most studies focus on predictions at a daily scale, limiting the benefits of sub-daily (e.g. hourly) predictions in applications like flood forecasting. In this study, we introduce a new architecture, multi-frequency LSTM (MF-LSTM), designed to use inputs of various temporal frequencies to produce sub-daily (e.g. hourly) predictions at a moderate computational cost.
                                            
                                            
                                        Abbas El Hachem, Jochen Seidel, and András Bárdossy
                                    Hydrol. Earth Syst. Sci., 29, 1335–1357, https://doi.org/10.5194/hess-29-1335-2025, 2025
                                    Short summary
                                            
                                                The influence of climate change on areal precipitation extremes is examined. After an upscaling of reference observations, the climate model data are corrected, and a downscaling to a finer spatial scale is done. For different temporal durations and spatial scales, areal precipitation extremes are derived. The final result indicates an increase in the expected rainfall depth compared to reference values. However, the increase varied with the duration and area size.
                                            
                                            
                                        Eduardo Acuña Espinoza, Ralf Loritz, Frederik Kratzert, Daniel Klotz, Martin Gauch, Manuel Álvarez Chaves, and Uwe Ehret
                                    Hydrol. Earth Syst. Sci., 29, 1277–1294, https://doi.org/10.5194/hess-29-1277-2025, 2025
                                    Short summary
                                            
                                                Data-driven techniques have shown the potential to outperform process-based models in rainfall–runoff simulations. Hybrid models, combining both approaches, aim to enhance accuracy and maintain interpretability. Expanding the set of test cases to evaluate hybrid models under different conditions, we test their generalization capabilities for extreme hydrological events.
                                            
                                            
                                        Daniel Klotz, Peter Miersch, Thiago V. M. do Nascimento, Fabrizio Fenicia, Martin Gauch, and Jakob Zscheischler
                                        Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-450, 2025
                                    Revised manuscript under review for ESSD
                                    Short summary
                                            
                                                Data availability is central to hydrological science. It is the basis for advancing our understanding of hydrological processes, building prediction models, and anticipatory water management. We present a data-driven daily runoff reconstruction product for natural streamflow. We name it EARLS: European aggregated reconstruction for large-sample studies. The reconstructions represent daily simulations of natural streamflow across Europe and cover the period from 1953 to 2020.
                                            
                                            
                                        Ashish Manoj J, Ralf Loritz, Hoshin Gupta, and Erwin Zehe
                                        Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2024-375, 2024
                                    Revised manuscript accepted for HESS
                                    Short summary
                                            
                                                Traditional hydrological models typically operate in a forward mode, simulating streamflow and other catchment fluxes based on precipitation input. In this study, we explored the possibility of reversing this process, that is, inferring precipitation from streamflow data, to improve flood event modelling. We then used the generated precipitation series to run hydrological models, resulting in more accurate estimates of streamflow and soil moisture.
                                            
                                            
                                        Ralf Loritz, Alexander Dolich, Eduardo Acuña Espinoza, Pia Ebeling, Björn Guse, Jonas Götte, Sibylle K. Hassler, Corina Hauffe, Ingo Heidbüchel, Jens Kiesel, Mirko Mälicke, Hannes Müller-Thomy, Michael Stölzle, and Larisa Tarasova
                                    Earth Syst. Sci. Data, 16, 5625–5642, https://doi.org/10.5194/essd-16-5625-2024, 2024
                                    Short summary
                                            
                                                The CAMELS-DE dataset features data from 1582 streamflow gauges across Germany, with records spanning from 1951 to 2020. This comprehensive dataset, which includes time series of up to 70 years (median 46 years), enables advanced research on water flow and environmental trends and supports the development of hydrological models.
                                            
                                            
                                        Abbas El Hachem, Jochen Seidel, Tess O'Hara, Roberto Villalobos Herrera, Aart Overeem, Remko Uijlenhoet, András Bárdossy, and Lotte de Vos
                                    Hydrol. Earth Syst. Sci., 28, 4715–4731, https://doi.org/10.5194/hess-28-4715-2024, 2024
                                    Short summary
                                            
                                                This study presents an overview of open-source quality control (QC) algorithms for rainfall data from personal weather stations (PWSs). The methodology and usability of each QC algorithm are presented, along with technical and operational guidelines for its use. All three QC algorithms are available for users to explore in the OpenSense sandbox. They were applied in a case study using PWS data from the Amsterdam region in the Netherlands. The results highlight the necessity for data quality control.
                                            
                                            
                                        Amy C. Green, Chris Kilsby, and András Bárdossy
                                    Hydrol. Earth Syst. Sci., 28, 4539–4558, https://doi.org/10.5194/hess-28-4539-2024, 2024
                                    Short summary
                                            
                                                Weather radar is a crucial tool in rainfall estimation, but radar rainfall estimates are subject to many error sources, with the true rainfall field unknown. A flexible model for simulating errors relating to the radar rainfall estimation process is implemented, inverting standard processing methods. This flexible and efficient model performs well in generating realistic weather radar images visually for a large range of event types.
                                            
                                            
                                        Frederik Kratzert, Martin Gauch, Daniel Klotz, and Grey Nearing
                                    Hydrol. Earth Syst. Sci., 28, 4187–4201, https://doi.org/10.5194/hess-28-4187-2024, 2024
                                    Short summary
                                            
                                                Recently, a special type of neural-network architecture became increasingly popular in hydrology literature. However, in most applications, this model was applied as a one-to-one replacement for hydrology models without adapting or rethinking the experimental setup. In this opinion paper, we show how this is almost always a bad decision and how using these kinds of models requires the use of large-sample hydrology data sets.
                                            
                                            
                                        Andreas Auer, Martin Gauch, Frederik Kratzert, Grey Nearing, Sepp Hochreiter, and Daniel Klotz
                                    Hydrol. Earth Syst. Sci., 28, 4099–4126, https://doi.org/10.5194/hess-28-4099-2024, 2024
                                    Short summary
                                            
                                                This work examines the impact of temporal and spatial information on the uncertainty estimation of streamflow forecasts. The study emphasizes the importance of data updates and global information for precise uncertainty estimates. We use conformal prediction to show that recent data enhance the estimates, even if only available infrequently. Local data yield reasonable average estimations but fall short for peak-flow events. The use of global data significantly improves these predictions.
                                            
                                            
                                        Daniel Klotz, Martin Gauch, Frederik Kratzert, Grey Nearing, and Jakob Zscheischler
                                    Hydrol. Earth Syst. Sci., 28, 3665–3673, https://doi.org/10.5194/hess-28-3665-2024, 2024
                                    Short summary
                                            
                                                The evaluation of model performance is essential for hydrological modeling. Using performance criteria requires a deep understanding of their properties. We focus on a counterintuitive aspect of the Nash–Sutcliffe efficiency (NSE) and show that if we divide the data into multiple parts, the overall performance can be higher than all the evaluations of the subsets. Although this follows from the definition of the NSE, the resulting behavior can have unintended consequences in practice.
                                            
                                            
                                        Eduardo Acuña Espinoza, Ralf Loritz, Manuel Álvarez Chaves, Nicole Bäuerle, and Uwe Ehret
                                    Hydrol. Earth Syst. Sci., 28, 2705–2719, https://doi.org/10.5194/hess-28-2705-2024, 2024
                                    Short summary
                                            
                                                Hydrological hybrid models promise to merge the performance of deep learning methods with the interpretability of process-based models. One hybrid approach is the dynamic parameterization of conceptual models using long short-term memory (LSTM) networks. We explored this method to evaluate the effect of the flexibility given by LSTMs on the process-based part.
                                            
                                            
                                        András Bárdossy and Faizan Anwar
                                    Hydrol. Earth Syst. Sci., 27, 1987–2000, https://doi.org/10.5194/hess-27-1987-2023, 2023
                                    Short summary
                                            
                                                This study demonstrates that models underestimate large river flows and that this underestimation is inversely related to the number of locations where precipitation is recorded, independent of the model used. The higher the number of points where the amount of precipitation is recorded, the better the estimate of the river flows.
                                            
                                            
                                        Abbas El Hachem, Jochen Seidel, Florian Imbery, Thomas Junghänel, and András Bárdossy
                                    Hydrol. Earth Syst. Sci., 26, 6137–6146, https://doi.org/10.5194/hess-26-6137-2022, 2022
                                    Short summary
                                            
                                                This work presents a methodology to identify outliers in intense precipitation data. The results show the presence of several suspicious observations that strongly differ from their surroundings. Many identified outliers did not have unusually high values but disagreed with their neighboring values at the corresponding time steps. Weather radar and discharge data were used to distinguish between single events and false observations.
                                            
                                            
                                        Grey S. Nearing, Daniel Klotz, Jonathan M. Frame, Martin Gauch, Oren Gilon, Frederik Kratzert, Alden Keefe Sampson, Guy Shalev, and Sella Nevo
                                    Hydrol. Earth Syst. Sci., 26, 5493–5513, https://doi.org/10.5194/hess-26-5493-2022, 2022
                                    Short summary
                                            
                                                When designing flood forecasting models, it is necessary to use all available data to achieve the most accurate predictions possible. This manuscript explores two basic ways of ingesting near-real-time streamflow data into machine learning streamflow models. The point we want to make is that when working in the context of machine learning (instead of traditional hydrology models that are based on bio-geophysics), it is not necessary to use complex statistical methods for injecting sparse data.
                                            
                                            
                                        Ralf Loritz, Maoya Bassiouni, Anke Hildebrandt, Sibylle K. Hassler, and Erwin Zehe
                                    Hydrol. Earth Syst. Sci., 26, 4757–4771, https://doi.org/10.5194/hess-26-4757-2022, 2022
                                    Short summary
                                            
                                                In this study, we combine a deep-learning approach that predicts sap flow with a hydrological model to improve soil moisture and transpiration estimates at the catchment scale. Our results highlight that hybrid-model approaches, combining machine learning with physically based models, are a promising way to improve our ability to make hydrological predictions.
                                            
                                            
                                        Juliane Mai, Hongren Shen, Bryan A. Tolson, Étienne Gaborit, Richard Arsenault, James R. Craig, Vincent Fortin, Lauren M. Fry, Martin Gauch, Daniel Klotz, Frederik Kratzert, Nicole O'Brien, Daniel G. Princz, Sinan Rasiya Koya, Tirthankar Roy, Frank Seglenieks, Narayan K. Shrestha, André G. T. Temgoua, Vincent Vionnet, and Jonathan W. Waddell
                                    Hydrol. Earth Syst. Sci., 26, 3537–3572, https://doi.org/10.5194/hess-26-3537-2022, 2022
                                    Short summary
                                            
                                                Model intercomparison studies are carried out to test various models and compare the quality of their outputs over the same domain. In this study, 13 diverse model setups using the same input data are evaluated over the Great Lakes region. Various model outputs – such as streamflow, evaporation, soil moisture, and amount of snow on the ground – are compared using standardized methods and metrics. The basin-wise model outputs and observations are made available through an interactive website.
                                            
                                            
                                        Thomas Lees, Steven Reece, Frederik Kratzert, Daniel Klotz, Martin Gauch, Jens De Bruijn, Reetik Kumar Sahu, Peter Greve, Louise Slater, and Simon J. Dadson
                                    Hydrol. Earth Syst. Sci., 26, 3079–3101, https://doi.org/10.5194/hess-26-3079-2022, 2022
                                    Short summary
                                            
                                                Despite the accuracy of deep learning rainfall-runoff models, we are currently uncertain of what these models have learned. In this study we explore the internals of one deep learning architecture and demonstrate that the model learns about intermediate hydrological stores of soil moisture and snow water, despite never having seen data about these processes during training. Therefore, we find evidence that the deep learning approach learns a physically realistic mapping from inputs to outputs.
                                            
                                            
                                        Dhiraj Raj Gyawali and András Bárdossy
                                    Hydrol. Earth Syst. Sci., 26, 3055–3077, https://doi.org/10.5194/hess-26-3055-2022, 2022
                                    Short summary
                                            
                                                In this study, different extensions of the degree-day model were calibrated on snow-cover distribution against freely available satellite snow-cover images. The calibrated models simulated the distribution very well in Baden-Württemberg (Germany) and Switzerland. In addition to reliable identification of snow cover, the melt outputs from the calibrated models were able to improve the flow simulations in different catchments in the study region.
                                            
                                            
                                        Daniel Klotz, Frederik Kratzert, Martin Gauch, Alden Keefe Sampson, Johannes Brandstetter, Günter Klambauer, Sepp Hochreiter, and Grey Nearing
                                    Hydrol. Earth Syst. Sci., 26, 1673–1693, https://doi.org/10.5194/hess-26-1673-2022, 2022
                                    Short summary
                                            
                                                This contribution evaluates distributional runoff predictions from deep-learning-based approaches. We propose a benchmarking setup and establish four strong baselines. The results show that accurate, precise, and reliable uncertainty estimation can be achieved with deep learning.
                                            
                                            
                                        Alexander Sternagel, Ralf Loritz, Brian Berkowitz, and Erwin Zehe
                                    Hydrol. Earth Syst. Sci., 26, 1615–1629, https://doi.org/10.5194/hess-26-1615-2022, 2022
                                    Short summary
                                            
                                                We present a (physically based) Lagrangian approach to simulate diffusive mixing processes on the pore scale beyond perfectly mixed conditions. Results show the feasibility of the approach for reproducing measured mixing times and concentrations of isotopes over pore sizes and that typical shapes of breakthrough curves (normally associated with non-uniform transport in heterogeneous soils) may also occur as a result of imperfect subscale mixing in a macroscopically homogeneous soil matrix.
                                            
                                            
                                        Erwin Zehe, Ralf Loritz, Yaniv Edery, and Brian Berkowitz
                                    Hydrol. Earth Syst. Sci., 25, 5337–5353, https://doi.org/10.5194/hess-25-5337-2021, 2021
                                    Short summary
                                            
                                                This study uses the concepts of entropy and work to quantify and explain the emergence of preferential flow and transport in heterogeneous saturated porous media. We found that the downstream concentration of solutes in preferential pathways implies a downstream declining entropy in the transverse distribution of solute transport pathways. Preferential flow patterns with lower entropies emerged within media of higher heterogeneity – a stronger self-organization despite a higher randomness.
                                            
                                            
                                        Jieru Yan, Fei Li, András Bárdossy, and Tao Tao
                                    Hydrol. Earth Syst. Sci., 25, 3819–3835, https://doi.org/10.5194/hess-25-3819-2021, 2021
                                    Short summary
                                            
                                                Accurate spatial precipitation estimates are important in various fields. An approach to simulate spatial rainfall fields conditioned on radar and rain gauge data is proposed. Unlike the commonly used Kriging methods, which provide a Kriged mean field, the output of the proposed approach is an ensemble of estimates that represents the estimation uncertainty. The approach is robust to nonlinear error in radar estimates and is shown to have some advantages, especially when estimating the extremes.
                                            
                                            
                                        Frederik Kratzert, Daniel Klotz, Sepp Hochreiter, and Grey S. Nearing
                                    Hydrol. Earth Syst. Sci., 25, 2685–2703, https://doi.org/10.5194/hess-25-2685-2021, 2021
                                    Short summary
                                            
                                                We investigate how deep learning models use different meteorological data sets in the task of (regional) rainfall–runoff modeling. We show that performance can be significantly improved when using different data products as input and further show how the model learns to combine those meteorological inputs differently across time and space. The results are carefully benchmarked against classical approaches, showing the supremacy of the presented approach.
                                            
                                            
                                        Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Jimmy Lin, and Sepp Hochreiter
                                    Hydrol. Earth Syst. Sci., 25, 2045–2062, https://doi.org/10.5194/hess-25-2045-2021, 2021
                                    Short summary
                                            
                                                We present multi-timescale Long Short-Term Memory (MTS-LSTM), a machine learning approach that predicts discharge at multiple timescales within one model. MTS-LSTM is significantly more accurate than the US National Water Model and computationally more efficient than an individual LSTM model per timescale. Further, MTS-LSTM can process different input variables at different timescales, which is important as the lead time of meteorological forecasts often depends on their temporal resolution.
                                            
                                            
                                        Alexander Sternagel, Ralf Loritz, Julian Klaus, Brian Berkowitz, and Erwin Zehe
                                    Hydrol. Earth Syst. Sci., 25, 1483–1508, https://doi.org/10.5194/hess-25-1483-2021, 2021
                                    Short summary
                                            
                                                The key innovation of the study is a method to simulate reactive solute transport in the vadose zone within a Lagrangian framework. We extend the LAST-Model with a method to account for non-linear sorption and first-order degradation processes during unsaturated transport of reactive substances in the matrix and macropores. Model evaluations using bromide and pesticide data from irrigation experiments under different flow conditions on various timescales show the feasibility of the method.
                                            
                                            
                                        András Bárdossy, Jochen Seidel, and Abbas El Hachem
                                    Hydrol. Earth Syst. Sci., 25, 583–601, https://doi.org/10.5194/hess-25-583-2021, 2021
                                    Short summary
                                            
                                                In this study, the applicability of data from private weather stations (PWS) for precipitation interpolation was investigated. Due to unknown errors and biases in these observations, a two-step filter was developed that uses indicator correlations and event-based spatial precipitation patterns. The procedure was tested and cross validated for the state of Baden-Württemberg (Germany). The biggest improvement is achieved for the shortest time aggregations.
                                            
                                            
                                        Ralf Loritz, Markus Hrachowitz, Malte Neuper, and Erwin Zehe
                                    Hydrol. Earth Syst. Sci., 25, 147–167, https://doi.org/10.5194/hess-25-147-2021, 2021
                                    Short summary
                                            
                                                This study investigates the role and value of distributed rainfall in the runoff generation of a mesoscale catchment. We compare the performance of different hydrological models at different periods and show that a distributed model driven by distributed rainfall yields improved performances only during certain periods. We then step beyond this finding and develop a spatially adaptive model that is capable of dynamically adjusting its spatial model structure in time.
                                            
                                            
                Short summary
            This study evaluates the extrapolation performance of long short-term memory (LSTM) networks in rainfall–runoff modeling, specifically under extreme precipitation conditions. The findings reveal that the LSTM cannot predict discharge values beyond a theoretical limit and that this limit is well below the extremity of its training data. This behavior results from the LSTM's gating structures rather than saturation of the cell states alone.
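            The existence of such an output ceiling can be illustrated with a minimal sketch. It is not the code used in the study; it assumes a generic single-layer LSTM followed by a linear output layer, with randomly drawn (hypothetical) weights and no target normalization. Because the hidden state is h = o * tanh(c), with the output gate o in (0, 1), every entry of h stays inside (-1, 1), so the linear head w·h + b can never exceed sum(|w|) + b, however large the forcing becomes. The names `lstm_step` and `head_bound` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function via the tanh identity.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell; gates are stacked as [i, f, g, o]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])             # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate cell update
    o = sigmoid(z[3 * n:])         # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)     # each entry lies in [-1, 1]
    return h_new, c_new

def head_bound(w_out, b_out):
    """Largest value a linear head w_out·h + b_out can reach for |h_k| <= 1."""
    return float(np.abs(w_out).sum() + b_out)

# Hypothetical, randomly initialised weights (assumption, not a trained model).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 8
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = rng.normal(size=4 * n_hid)
w_out = rng.normal(size=n_hid)
b_out = 0.5

bound = head_bound(w_out, b_out)
peaks = []
for scale in (1.0, 10.0, 1e3, 1e6):    # increasingly extreme synthetic forcing
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    peak = -np.inf
    for _ in range(50):
        x = scale * np.abs(rng.normal(size=n_in))
        h, c = lstm_step(x, h, c, W, U, b)
        peak = max(peak, float(w_out @ h + b_out))
    peaks.append(peak)

print(f"ceiling of the linear head: {bound:.3f}")
for scale, peak in zip((1.0, 10.0, 1e3, 1e6), peaks):
    print(f"input scale {scale:>9.0e}: peak output {peak:.3f}")
```

            In this sketch the peak output stays at or below the ceiling for every input scale. How close a trained model comes to its ceiling depends on the learned gate behaviour, which the sketch does not attempt to reproduce.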