Articles | Volume 28, issue 15
https://doi.org/10.5194/hess-28-3665-2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.
Technical Note: The divide and measure nonconformity – how metrics can mislead when we evaluate on different data partitions
Daniel Klotz
Department of Compound Environmental Risks, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
Martin Gauch
Google Research, Zurich, Switzerland
Frederik Kratzert
Google Research, Vienna, Austria
Grey Nearing
Google Research, Mountain View, California, USA
Jakob Zscheischler
Department of Compound Environmental Risks, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
Department of Hydro Sciences, TUD Dresden University of Technology, Dresden, Germany
Related authors
Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Deborah Cohen, and Oren Gilon
EGUsphere, https://doi.org/10.5194/egusphere-2025-1224, 2025
Short summary
Missing input data are one of the most common challenges when building deep learning hydrological models. We present and analyze different methods that can produce predictions when certain inputs are missing during training or inference. Our proposed strategies provide high accuracy while allowing for more flexible data handling and being robust to outages in operational scenarios.
Eduardo Acuña Espinoza, Frederik Kratzert, Daniel Klotz, Martin Gauch, Manuel Álvarez Chaves, Ralf Loritz, and Uwe Ehret
Hydrol. Earth Syst. Sci., 29, 1749–1758, https://doi.org/10.5194/hess-29-1749-2025, 2025
Short summary
Long short-term memory (LSTM) networks have demonstrated state-of-the-art performance for rainfall-runoff hydrological modelling. However, most studies focus on predictions at a daily scale, limiting the benefits of sub-daily (e.g. hourly) predictions in applications like flood forecasting. In this study, we introduce a new architecture, multi-frequency LSTM (MF-LSTM), designed to use inputs of various temporal frequencies to produce sub-daily (e.g. hourly) predictions at a moderate computational cost.
Eduardo Acuña Espinoza, Ralf Loritz, Frederik Kratzert, Daniel Klotz, Martin Gauch, Manuel Álvarez Chaves, and Uwe Ehret
Hydrol. Earth Syst. Sci., 29, 1277–1294, https://doi.org/10.5194/hess-29-1277-2025, 2025
Short summary
Data-driven techniques have shown the potential to outperform process-based models in rainfall–runoff simulations. Hybrid models, combining both approaches, aim to enhance accuracy and maintain interpretability. Expanding the set of test cases to evaluate hybrid models under different conditions, we test their generalization capabilities for extreme hydrological events.
Sanika Baste, Daniel Klotz, Eduardo Acuña Espinoza, Andras Bardossy, and Ralf Loritz
EGUsphere, https://doi.org/10.5194/egusphere-2025-425, 2025
Short summary
This study evaluates the extrapolation performance of Long Short-Term Memory (LSTM) networks in rainfall-runoff modeling, specifically under extreme conditions. The findings reveal that the LSTM cannot predict discharge values beyond a theoretical limit, which is well below the extremity of its training data. This behavior results from the LSTM's gating structures rather than saturation of cell states alone.
Daniel Klotz, Peter Miersch, Thiago V. M. do Nascimento, Fabrizio Fenicia, Martin Gauch, and Jakob Zscheischler
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-450, 2025
Preprint under review for ESSD
Short summary
Data availability is central to hydrological science. It is the basis for advancing our understanding of hydrological processes, building prediction models, and anticipatory water management. We present a data-driven daily runoff reconstruction product for natural streamflow. We name it EARLS: European aggregated reconstruction for large-sample studies. The reconstructions represent daily simulations of natural streamflow across Europe and cover the period from 1953 to 2020.
Frederik Kratzert, Martin Gauch, Daniel Klotz, and Grey Nearing
Hydrol. Earth Syst. Sci., 28, 4187–4201, https://doi.org/10.5194/hess-28-4187-2024, 2024
Short summary
Recently, a special type of neural-network architecture became increasingly popular in hydrology literature. However, in most applications, this model was applied as a one-to-one replacement for hydrology models without adapting or rethinking the experimental setup. In this opinion paper, we show how this is almost always a bad decision and how using these kinds of models requires the use of large-sample hydrology data sets.
Andreas Auer, Martin Gauch, Frederik Kratzert, Grey Nearing, Sepp Hochreiter, and Daniel Klotz
Hydrol. Earth Syst. Sci., 28, 4099–4126, https://doi.org/10.5194/hess-28-4099-2024, 2024
Short summary
This work examines the impact of temporal and spatial information on the uncertainty estimation of streamflow forecasts. The study emphasizes the importance of data updates and global information for precise uncertainty estimates. We use conformal prediction to show that recent data enhance the estimates, even if only available infrequently. Local data yield reasonable average estimations but fall short for peak-flow events. The use of global data significantly improves these predictions.
Grey S. Nearing, Daniel Klotz, Jonathan M. Frame, Martin Gauch, Oren Gilon, Frederik Kratzert, Alden Keefe Sampson, Guy Shalev, and Sella Nevo
Hydrol. Earth Syst. Sci., 26, 5493–5513, https://doi.org/10.5194/hess-26-5493-2022, 2022
Short summary
When designing flood forecasting models, it is necessary to use all available data to achieve the most accurate predictions possible. This manuscript explores two basic ways of ingesting near-real-time streamflow data into machine learning streamflow models. The point we want to make is that when working in the context of machine learning (instead of traditional hydrology models that are based on bio-geophysics), it is not necessary to use complex statistical methods for injecting sparse data.
Juliane Mai, Hongren Shen, Bryan A. Tolson, Étienne Gaborit, Richard Arsenault, James R. Craig, Vincent Fortin, Lauren M. Fry, Martin Gauch, Daniel Klotz, Frederik Kratzert, Nicole O'Brien, Daniel G. Princz, Sinan Rasiya Koya, Tirthankar Roy, Frank Seglenieks, Narayan K. Shrestha, André G. T. Temgoua, Vincent Vionnet, and Jonathan W. Waddell
Hydrol. Earth Syst. Sci., 26, 3537–3572, https://doi.org/10.5194/hess-26-3537-2022, 2022
Short summary
Model intercomparison studies are carried out to test various models and compare the quality of their outputs over the same domain. In this study, 13 diverse model setups using the same input data are evaluated over the Great Lakes region. Various model outputs – such as streamflow, evaporation, soil moisture, and amount of snow on the ground – are compared using standardized methods and metrics. The basin-wise model outputs and observations are made available through an interactive website.
Thomas Lees, Steven Reece, Frederik Kratzert, Daniel Klotz, Martin Gauch, Jens De Bruijn, Reetik Kumar Sahu, Peter Greve, Louise Slater, and Simon J. Dadson
Hydrol. Earth Syst. Sci., 26, 3079–3101, https://doi.org/10.5194/hess-26-3079-2022, 2022
Short summary
Despite the accuracy of deep learning rainfall-runoff models, we are currently uncertain of what these models have learned. In this study we explore the internals of one deep learning architecture and demonstrate that the model learns about intermediate hydrological stores of soil moisture and snow water, despite never having seen data about these processes during training. Therefore, we find evidence that the deep learning approach learns a physically realistic mapping from inputs to outputs.
Daniel Klotz, Frederik Kratzert, Martin Gauch, Alden Keefe Sampson, Johannes Brandstetter, Günter Klambauer, Sepp Hochreiter, and Grey Nearing
Hydrol. Earth Syst. Sci., 26, 1673–1693, https://doi.org/10.5194/hess-26-1673-2022, 2022
Short summary
This contribution evaluates distributional runoff predictions from deep-learning-based approaches. We propose a benchmarking setup and establish four strong baselines. The results show that accurate, precise, and reliable uncertainty estimation can be achieved with deep learning.
Frederik Kratzert, Daniel Klotz, Sepp Hochreiter, and Grey S. Nearing
Hydrol. Earth Syst. Sci., 25, 2685–2703, https://doi.org/10.5194/hess-25-2685-2021, 2021
Short summary
We investigate how deep learning models use different meteorological data sets in the task of (regional) rainfall–runoff modeling. We show that performance can be significantly improved when using different data products as input and further show how the model learns to combine those meteorological inputs differently across time and space. The results are carefully benchmarked against classical approaches, showing the superiority of the presented approach.
Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Jimmy Lin, and Sepp Hochreiter
Hydrol. Earth Syst. Sci., 25, 2045–2062, https://doi.org/10.5194/hess-25-2045-2021, 2021
Short summary
We present the multi-timescale long short-term memory (MTS-LSTM), a machine learning approach that predicts discharge at multiple timescales within one model. MTS-LSTM is significantly more accurate than the US National Water Model and computationally more efficient than an individual LSTM model per timescale. Further, MTS-LSTM can process different input variables at different timescales, which is important as the lead time of meteorological forecasts often depends on their temporal resolution.
Lily-belle Sweet, Christoph Müller, Jonas Jägermeyr, and Jakob Zscheischler
EGUsphere, https://doi.org/10.5194/egusphere-2025-3006, 2025
This preprint is open for discussion and under review for Geoscientific Model Development (GMD).
Short summary
This study presents a method to identify climate drivers of an impact, such as agricultural yield failure, from high-resolution weather data. The approach systematically generates, selects and combines predictors that generalise across different environments. Tested on crop model simulations, the identified drivers are used to create parsimonious models that achieve high predictive performance over long time horizons, offering a more interpretable alternative to black-box models.
Lou Brett, Christopher J. White, Daniela I. V. Domeisen, Bart van den Hurk, Philip Ward, and Jakob Zscheischler
Nat. Hazards Earth Syst. Sci., 25, 2591–2611, https://doi.org/10.5194/nhess-25-2591-2025, 2025
Short summary
Compound events, where multiple weather or climate hazards occur together, pose significant risks to both society and the environment. These events, like simultaneous wind and rain, can have more severe impacts than single hazards. Our review of compound event research from 2012–2022 reveals a rise in studies, especially on events that occur concurrently, hot and dry events, and compounding flooding. The review also highlights opportunities for research in the coming years.
Gab Abramowitz, Anna Ukkola, Sanaa Hobeichi, Jon Cranko Page, Mathew Lipson, Martin G. De Kauwe, Samuel Green, Claire Brenner, Jonathan Frame, Grey Nearing, Martyn Clark, Martin Best, Peter Anthoni, Gabriele Arduini, Souhail Boussetta, Silvia Caldararu, Kyeungwoo Cho, Matthias Cuntz, David Fairbairn, Craig R. Ferguson, Hyungjun Kim, Yeonjoo Kim, Jürgen Knauer, David Lawrence, Xiangzhong Luo, Sergey Malyshev, Tomoko Nitta, Jerome Ogee, Keith Oleson, Catherine Ottlé, Phillipe Peylin, Patricia de Rosnay, Heather Rumbold, Bob Su, Nicolas Vuichard, Anthony P. Walker, Xiaoni Wang-Faivre, Yunfei Wang, and Yijian Zeng
Biogeosciences, 21, 5517–5538, https://doi.org/10.5194/bg-21-5517-2024, 2024
Short summary
This paper evaluates land models – computer-based models that simulate ecosystem dynamics; land carbon, water, and energy cycles; and the role of land in the climate system. It uses machine learning and AI approaches to show that, despite the complexity of land models, they do not perform nearly as well as they could given the amount of information they are provided with about the prediction problem.
Claudia Färber, Henning Plessow, Simon Mischel, Frederik Kratzert, Nans Addor, Guy Shalev, and Ulrich Looser
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-427, 2024
Revised manuscript accepted for ESSD
Short summary
Large-sample datasets are essential in hydrological science to support modelling studies and advance process understanding. Caravan is a community initiative to create a large-sample hydrology dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. This dataset is a subset of hydrological discharge data and station-based watersheds from the Global Runoff Data Centre (GRDC), which are covered by an open data policy.
Beijing Fang, Emanuele Bevacqua, Oldrich Rakovec, and Jakob Zscheischler
Hydrol. Earth Syst. Sci., 28, 3755–3775, https://doi.org/10.5194/hess-28-3755-2024, 2024
Short summary
We use grid-based runoff from a hydrological model to identify large spatiotemporally connected flood events in Europe, assess extent trends over the last 70 years, and attribute the trends to different drivers. Our findings reveal a general increase in flood extent, with regional variations driven by diverse factors. The study not only enables a thorough examination of flood events across multiple basins but also highlights the potential challenges arising from changing flood extents.
Derrick Muheki, Axel A. J. Deijns, Emanuele Bevacqua, Gabriele Messori, Jakob Zscheischler, and Wim Thiery
Earth Syst. Dynam., 15, 429–466, https://doi.org/10.5194/esd-15-429-2024, 2024
Short summary
Climate change affects the interaction, dependence, and joint occurrence of climate extremes. Here we investigate the joint occurrence of pairs of river floods, droughts, heatwaves, crop failures, wildfires, and tropical cyclones in East Africa under past and future climate conditions. Our results show that, across all future warming scenarios, the frequency and spatial extent of these co-occurring extremes will increase in this region, particularly in areas close to the Nile and Congo rivers.
Louise J. Slater, Louise Arnal, Marie-Amélie Boucher, Annie Y.-Y. Chang, Simon Moulds, Conor Murphy, Grey Nearing, Guy Shalev, Chaopeng Shen, Linda Speight, Gabriele Villarini, Robert L. Wilby, Andrew Wood, and Massimiliano Zappa
Hydrol. Earth Syst. Sci., 27, 1865–1889, https://doi.org/10.5194/hess-27-1865-2023, 2023
Short summary
Hybrid forecasting systems combine data-driven methods with physics-based weather and climate models to improve the accuracy of predictions for meteorological and hydroclimatic events such as rainfall, temperature, streamflow, floods, droughts, tropical cyclones, or atmospheric rivers. We review recent developments in hybrid forecasting and outline key challenges and opportunities in the field.
Shijie Jiang, Emanuele Bevacqua, and Jakob Zscheischler
Hydrol. Earth Syst. Sci., 26, 6339–6359, https://doi.org/10.5194/hess-26-6339-2022, 2022
Short summary
Using a novel explainable machine learning approach, we investigated the contributions of precipitation, temperature, and day length to different peak discharges, thereby uncovering three primary flooding mechanisms widespread in European catchments. The results indicate that flooding mechanisms have changed in numerous catchments over the past 70 years. The study highlights the potential of artificial intelligence in revealing complex changes in extreme events related to climate change.
Natacha Le Grix, Jakob Zscheischler, Keith B. Rodgers, Ryohei Yamaguchi, and Thomas L. Frölicher
Biogeosciences, 19, 5807–5835, https://doi.org/10.5194/bg-19-5807-2022, 2022
Short summary
Compound events threaten marine ecosystems. Here, we investigate the potentially harmful combination of marine heatwaves with low phytoplankton productivity. Using satellite-based observations, we show that these compound events are frequent in the low latitudes. We then investigate the drivers of these compound events using Earth system models. The models share similar drivers in the low latitudes but disagree in the high latitudes due to divergent factors limiting phytoplankton production.
Sella Nevo, Efrat Morin, Adi Gerzi Rosenthal, Asher Metzger, Chen Barshai, Dana Weitzner, Dafi Voloshin, Frederik Kratzert, Gal Elidan, Gideon Dror, Gregory Begelman, Grey Nearing, Guy Shalev, Hila Noga, Ira Shavitt, Liora Yuklea, Moriah Royz, Niv Giladi, Nofar Peled Levi, Ofir Reich, Oren Gilon, Ronnie Maor, Shahar Timnat, Tal Shechter, Vladimir Anisimov, Yotam Gigi, Yuval Levin, Zach Moshe, Zvika Ben-Haim, Avinatan Hassidim, and Yossi Matias
Hydrol. Earth Syst. Sci., 26, 4013–4032, https://doi.org/10.5194/hess-26-4013-2022, 2022
Short summary
Early flood warnings are one of the most effective tools to save lives and goods. Machine learning (ML) models can improve flood prediction accuracy but their use in operational frameworks is limited. The paper presents a flood warning system, operational in India and Bangladesh, that uses ML models for forecasting river stage and flood inundation maps and discusses the models' performances. In 2021, more than 100 million flood alerts were sent to people near rivers over an area of 470 000 km².
Jonathan M. Frame, Frederik Kratzert, Daniel Klotz, Martin Gauch, Guy Shalev, Oren Gilon, Logan M. Qualls, Hoshin V. Gupta, and Grey S. Nearing
Hydrol. Earth Syst. Sci., 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022
Short summary
The most accurate rainfall–runoff predictions are currently based on deep learning. There is a concern among hydrologists that deep learning models may not be reliable in extrapolation or for predicting extreme events. This study tests that hypothesis. The deep learning models remained relatively accurate in predicting extreme events compared with traditional models, even when extreme events were not included in the training set.
Alexandre Tuel, Bettina Schaefli, Jakob Zscheischler, and Olivia Martius
Hydrol. Earth Syst. Sci., 26, 2649–2669, https://doi.org/10.5194/hess-26-2649-2022, 2022
Short summary
River discharge is strongly influenced by the temporal structure of precipitation. Here, we show how extreme precipitation events that occur a few days or weeks after a previous event have a larger effect on river discharge than events occurring in isolation. Windows of 2 weeks or less between events have the most impact. Similarly, periods of persistent high discharge tend to be associated with the occurrence of several extreme precipitation events in close succession.
Elisabeth Tschumi, Sebastian Lienert, Karin van der Wiel, Fortunat Joos, and Jakob Zscheischler
Biogeosciences, 19, 1979–1993, https://doi.org/10.5194/bg-19-1979-2022, 2022
Short summary
Droughts and heatwaves are expected to occur more often in the future, but their effects on land vegetation and the carbon cycle are poorly understood. We use six climate scenarios with differing extreme occurrences and a vegetation model to analyse these effects. Tree coverage and associated plant productivity increase under a climate with no extremes. Frequent co-occurring droughts and heatwaves decrease plant productivity more than the combined effects of single droughts or heatwaves.
Roberto Villalobos-Herrera, Emanuele Bevacqua, Andreia F. S. Ribeiro, Graeme Auld, Laura Crocetti, Bilyana Mircheva, Minh Ha, Jakob Zscheischler, and Carlo De Michele
Nat. Hazards Earth Syst. Sci., 21, 1867–1885, https://doi.org/10.5194/nhess-21-1867-2021, 2021
Short summary
Climate hazards may be caused by events which have multiple drivers. Here we present a method to break down climate model biases in hazard indicators into the bias caused by each driving variable. Using simplified fire and heat stress indicators driven by temperature and relative humidity as examples, we show how multivariate indicators may have complex biases and that the relationship between driving variables is a source of bias that must be considered in climate model bias corrections.
Jun Li, Zhaoli Wang, Xushu Wu, Jakob Zscheischler, Shenglian Guo, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 25, 1587–1601, https://doi.org/10.5194/hess-25-1587-2021, 2021
Short summary
We introduce a daily-scale index, termed the standardized compound drought and heat index (SCDHI), to measure the key features of compound dry-hot conditions. SCDHI can not only monitor long-term compound dry-hot events but can also capture such events at the sub-monthly scale and reflect the related impacts on vegetation activity. The index provides a new tool to quantify sub-monthly characteristics of compound dry-hot events, which are vital for issuing early and timely warnings.
Natacha Le Grix, Jakob Zscheischler, Charlotte Laufkötter, Cecile S. Rousseaux, and Thomas L. Frölicher
Biogeosciences, 18, 2119–2137, https://doi.org/10.5194/bg-18-2119-2021, 2021
Short summary
Marine ecosystems could suffer severe damage from the co-occurrence of a marine heat wave with extremely low chlorophyll concentration. Here, we provide a first assessment of compound marine heat wave and low-chlorophyll events in the global ocean from 1998 to 2018. We reveal hotspots of these compound events in the equatorial Pacific and in the Arabian Sea and show that they mostly occur in summer at high latitudes and their frequency is modulated by large-scale modes of climate variability.
Johannes Vogel, Pauline Rivoire, Cristina Deidda, Leila Rahimi, Christoph A. Sauter, Elisabeth Tschumi, Karin van der Wiel, Tianyi Zhang, and Jakob Zscheischler
Earth Syst. Dynam., 12, 151–172, https://doi.org/10.5194/esd-12-151-2021, 2021
Short summary
We present a statistical approach for automatically identifying multiple drivers of extreme impacts based on LASSO regression. We apply the approach to simulated crop failure in the Northern Hemisphere and identify which meteorological variables, including climate extreme indices, and which seasons are relevant for predicting crop failure. The presented approach can help unravel compounding drivers in high-impact events and could be applied to other impacts such as wildfires or flooding.
Jakob Zscheischler, Philippe Naveau, Olivia Martius, Sebastian Engelke, and Christoph C. Raible
Earth Syst. Dynam., 12, 1–16, https://doi.org/10.5194/esd-12-1-2021, 2021
Short summary
Compound extremes such as heavy precipitation and extreme winds can lead to large damage. To date it is unclear how well climate models represent such compound extremes. Here we present a new measure to assess differences in the dependence structure of bivariate extremes. This measure is applied to assess differences in the dependence of compound precipitation and wind extremes between three model simulations and one reanalysis dataset in a domain in central Europe.
Andreia Filipa Silva Ribeiro, Ana Russo, Célia Marina Gouveia, Patrícia Páscoa, and Jakob Zscheischler
Biogeosciences, 17, 4815–4830, https://doi.org/10.5194/bg-17-4815-2020, 2020
Short summary
This study investigates the impacts of compound dry and hot extremes on crop yields, namely wheat and barley, over two regions in Spain dominated by rainfed agriculture. We provide estimates of the conditional probability of crop loss under compound dry and hot conditions, which could be an important tool for responsible authorities to mitigate the impacts magnified by the interactions between the different hazards.
Cited articles
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.
Beven, K.: Benchmarking hydrological models for an uncertain future, Hydrol. Process., 37, e14882, https://doi.org/10.1002/hyp.14882, 2023.
Clark, M. P., Vogel, R. M., Lamontagne, J. R., Mizukami, N., Knoben, W. J., Tang, G., Gharari, S., Freer, J. E., Whitfield, P. H., Shook, K. R., and Papalexiou, S. M.: The abuse of popular performance metrics in hydrologic modeling, Water Resour. Res., 57, e2020WR029001, https://doi.org/10.1029/2020WR029001, 2021.
Duc, L. and Sawada, Y.: A signal-processing-based interpretation of the Nash–Sutcliffe efficiency, Hydrol. Earth Syst. Sci., 27, 1827–1839, https://doi.org/10.5194/hess-27-1827-2023, 2023.
Feng, D., Beck, H., Lawson, K., and Shen, C.: The suitability of differentiable, physics-informed machine learning hydrologic models for ungauged regions and climate change impact assessment, Hydrol. Earth Syst. Sci., 27, 2357–2373, https://doi.org/10.5194/hess-27-2357-2023, 2023.
Gauch, M., Kratzert, F., Gilon, O., Gupta, H., Mai, J., Nearing, G., Tolson, B., Hochreiter, S., and Klotz, D.: In Defense of Metrics: Metrics Sufficiently Encode Typical Human Preferences Regarding Hydrological Model Performance, Water Resour. Res., 59, e2022WR033918, https://doi.org/10.1029/2022WR033918, 2023.
Good, I. J. and Mittal, Y.: The amalgamation and geometry of two-by-two contingency tables, Ann. Stat., 15, 694–711, https://doi.org/10.1214/aos/1176350369, 1987.
Highleyman, W. H.: The design and analysis of pattern recognition experiments, Bell Syst. Tech. J., 41, 723–744, 1962.
Klotz, D.: Acompaning code for Technical Note: The divide and measure nonconformity, GitHub [code], https://github.com/danklotz/a-damn-paper/tree/main, last access: 7 August 2024.
Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331, https://doi.org/10.5194/hess-23-4323-2019, 2019.
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, https://doi.org/10.5194/hess-23-5089-2019, 2019.
Kratzert, F., Gauch, M., Nearing, G., and Klotz, D.: NeuralHydrology – A Python library for Deep Learning research in hydrology, J. Open Sour. Softw., 7, 4050, https://doi.org/10.21105/joss.04050, 2022.
Lamontagne, J. R., Barber, C. A., and Vogel, R. M.: Improved Estimators of Model Performance Efficiency for Skewed Hydrologic Data, Water Resour. Res., 56, e2020WR027101, https://doi.org/10.1029/2020WR027101, 2020.
Larson, S. C.: The shrinkage of the coefficient of multiple correlation, J. Educ. Psychol., 22, 45–55, https://doi.org/10.1037/h0072400, 1931.
Mai, J., Shen, H., Tolson, B. A., Gaborit, É., Arsenault, R., Craig, J. R., Fortin, V., Fry, L. M., Gauch, M., Klotz, D., Kratzert, F., O'Brien, N., Princz, D. G., Rasiya Koya, S., Roy, T., Seglenieks, F., Shrestha, N. K., Temgoua, A. G. T., Vionnet, V., and Waddell, J. W.: The Great Lakes Runoff Intercomparison Project Phase 4: the Great Lakes (GRIP-GL), Hydrol. Earth Syst. Sci., 26, 3537–3572, https://doi.org/10.5194/hess-26-3537-2022, 2022.
Matejka, J. and Fitzmaurice, G.: Same stats, different graphs: generating datasets with varied appearance and identical statistics through simulated annealing, in: Proceedings of the 2017 CHI conference on human factors in computing systems, Denver, Colorado, USA, 6–11 May 2017, 1290–1294, https://doi.org/10.1145/3025453.3025912, 2017.
Mayr, A., Klambauer, G., Unterthiner, T., Steijaert, M., Wegner, J. K., Ceulemans, H., Clevert, D.-A., and Hochreiter, S.: Large-scale comparison of machine learning methods for drug target prediction on ChEMBL, Chem. Sci., 9, 5441–5451, 2018.
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019.
Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I – A discussion of principles, J. Hydrol., 10, 282–290, 1970.
Nearing, G. S., Mocko, D. M., Peters-Lidard, C. D., Kumar, S. V., and Xia, Y.: Benchmarking NLDAS-2 soil moisture and evapotranspiration to separate uncertainty contributions, J. Hydrometeorol., 17, 745–759, 2016.
Nearing, G. S., Ruddell, B. L., Clark, M. P., Nijssen, B., and Peters-Lidard, C.: Benchmarking and process diagnostics of land models, J. Hydrometeorol., 19, 1835–1852, 2018.
Newman, A. J., Clark, M. P., Sampson, K., Wood, A., Hay, L. E., Bock, A., Viger, R. J., Blodgett, D., Brekke, L., Arnold, J. R., Hopson, T., and Duan, Q.: Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance, Hydrol. Earth Syst. Sci., 19, 209–223, https://doi.org/10.5194/hess-19-209-2015, 2015.
Seibert, J.: On the need for benchmarks in hydrological modelling, Hydrol. Process., 15, 1063–1064, https://doi.org/10.1002/hyp.446, 2001.
Shen, H., Tolson, B. A., and Mai, J.: Time to update the split-sample approach in hydrological model calibration, Water Resour. Res., 58, e2021WR031523, https://doi.org/10.1029/2021WR031523, 2022.
Simpson, E. H.: The interpretation of interaction in contingency tables, J. Roy. Stat. Soc. B, 13, 238–241, 1951.
Stone, M.: Cross-validatory choice and assessment of statistical predictions, J. Roy. Stat. Soc. B, 36, 111–133, 1974.
Sweet, L.-b., Müller, C., Anand, M., and Zscheischler, J.: Cross-validation strategy impacts the performance and interpretation of machine learning models, Artific. Intel. Earth Syst., 2, e230026, https://doi.org/10.1175/AIES-D-23-0026.1, 2023.
Vapnik, V.: Principles of risk minimization for learning theory, Adv. Neural Inform. Process. Syst., 4, 831–838, 1991.
Wagener, T., McIntyre, N., Lees, M., Wheater, H., and Gupta, H.: Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic identifiability analysis, Hydrol. Process., 17, 455–476, 2003.
Wagner, C. H.: Simpson's paradox in real life, Am. Stat., 36, 46–48, 1982.
Wayland, J.: Jon Wayland: What is Simpson's Paradox, https://www.quora.com/What-is-Simpsons-paradox/answer/Jon-Wayland (last access: 13 December 2023), 2018.
Winkler, R. L.: A decision-theoretic approach to interval estimation, J. Am. Stat. Assoc., 67, 187–191, 1972.
Wright, D. P., Thyer, M., and Westra, S.: Influential point detection diagnostics in the context of hydrological model calibration, J. Hydrol., 527, 1161–1172, 2015.
Short summary
The evaluation of model performance is essential for hydrological modeling. Using performance criteria requires a deep understanding of their properties. We focus on a counterintuitive aspect of the Nash–Sutcliffe efficiency (NSE) and show that if we divide the data into multiple parts, the overall performance can be higher than all the evaluations of the subsets. Although this follows from the definition of the NSE, the resulting behavior can have unintended consequences in practice.
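The behavior described in this summary can be reproduced with a few numbers. The following is a minimal sketch, not the authors' accompanying code: the helper `nse` and the toy values for the two partitions are hypothetical and chosen only for illustration. Within each partition the simulation is no better than the partition mean (NSE of 0), yet the NSE over the combined data is close to 1, because the combined observations vary far more around their overall mean.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / squared deviations of obs from its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical toy data (assumption, not taken from the paper):
# partition A has low values, partition B has high values.
obs_a, sim_a = [1.0, 2.0, 3.0], [2.0, 2.0, 2.0]
obs_b, sim_b = [11.0, 12.0, 13.0], [12.0, 12.0, 12.0]

nse_a = nse(obs_a, sim_a)                    # simulation equals the mean of A
nse_b = nse(obs_b, sim_b)                    # simulation equals the mean of B
nse_all = nse(obs_a + obs_b, sim_a + sim_b)  # both partitions evaluated together

print(f"NSE partition A: {nse_a:.3f}")
print(f"NSE partition B: {nse_b:.3f}")
print(f"NSE combined:    {nse_all:.3f}")
```

In this toy case the squared error is the same whether the data are split or not; only the reference variance in the denominator changes, which is why the combined score exceeds both partition scores.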