the Creative Commons Attribution 4.0 License.
How well do hydrological models simulate streamflow extremes and drought-to-flood transitions?
Eduardo Muñoz-Castro
Bailey J. Anderson
Paul C. Astagneau
Daniel L. Swain
Pablo A. Mendoza
Manuela I. Brunner
Flood impacts can be enhanced when they occur shortly after droughts. Hydrological models are useful tools to better understand the underlying processes and mechanisms driving the response of floods occurring in close succession to streamflow drought. However, it remains unclear how well hydrological models capture these compound extreme events and which modeling decisions are most important for model performance. To address this research gap, we calibrated four conceptual bucket-style hydrological models with different structures (GR4J, GR5J, GR6J, and TUW) for 63 catchments located in Chile and Switzerland using different calibration strategies. Specifically, we assessed the relative importance of different methodological choices in simulating and detecting observed drought-to-flood transitions, including model structure, streamflow transformation, and the Kling–Gupta efficiency (KGE) formulation and weights used to calibrate the model parameters. We demonstrate that good model performance, as expressed by the KGE, does not guarantee good performance in detecting streamflow extremes and their transitions. Further, we show that a model's performance with respect to capturing extreme events primarily depends on how well it captures streamflow timing. Our results also highlight that model structure, catchment characteristics, and meteorological forcings play a key role in the detection of transitions. Overall, we find that model representation of drought-to-flood transitions is generally poor, especially in semi-arid and high-mountain catchments compared to humid low-elevation catchments. Ultimately, our study provides insights for further model improvements to simulate and better understand drought-to-flood transitions and to identify regions prone to this hazard.
Hydrological extreme events such as streamflow droughts and floods are expected to become more frequent, severe, and persistent in a warming climate (e.g., Gu et al., 2023; Asadieh and Krakauer, 2017; Martin, 2018; Tabari et al., 2021). This can lead to severe impacts on infrastructure, agriculture, water supply, and hydropower generation (e.g., McClymont et al., 2020; McMartin et al., 2018; Lehner et al., 2006; Sivakumar, 2011; Wasti et al., 2022), as well as social and political systems (e.g., Doocy et al., 2013; Hurlbert and Gupta, 2017; Kiem and Austin, 2013; Visconti, 2022).
Studies focusing on hydrological extreme events and their impacts often assume temporal and/or spatial independence between them, neglecting that extremes may be multivariate phenomena (Banfi and De Michele, 2022; Brunner, 2023). However, the impacts of floods can be enhanced when they occur during or shortly after dry periods (e.g., Barendrecht et al., 2024; Swain et al., 2018; He and Sheffield, 2020; Rashid and Wahl, 2022). For instance, Handwerger et al. (2019) and Valenzuela et al. (2022) have demonstrated an increase in the occurrence of landslides in California and Chile due to shifts from meteorological drought to intense precipitation. Similarly, Dietze et al. (2022) have shown that the 2018–2020 drought in Europe enhanced debris mobilisation during the 2021 flood in the Eifel region of western Germany and Belgium. In 2017, intense precipitation broke the 2012–2016 drought in California and led to severe flooding, the activation of the emergency spillway of the Lake Oroville dam for the first time in its history, and the declaration of emergency (Griffin and Anchukaitis, 2014; Robeson, 2015; Wang et al., 2017). Despite evidence of the interactions between drought and flood events, they are still most frequently studied independently (e.g., Ward et al., 2020; Quesada-Montano et al., 2018; Di Baldassarre et al., 2017).
The transition from drought to flood can occur within hours or days, whereas the transition from floods to droughts can range from weeks to years, leading to different water management challenges and reaction times for decision-makers (Hammond et al., 2025). Due to the inherent asymmetry in spatiotemporal characteristics and underlying drivers, as has been recently reviewed by Swain et al. (2025) from both meteorological and hydrological perspectives, drought-to-flood transitions often have more severe impacts than flood-to-drought transitions.
Both hydrological droughts and floods are linked to meteorological conditions such as precipitation surplus/deficit or low/high evapotranspiration rates. However, it has been shown that meteorological dry-to-wet spells are only weakly associated with hydrological drought-to-flood transitions, with a propagation rate of just 10 % within a 30 d period, and that wet spells are less likely to lead to floods than dry spells are to cause droughts (Brunner et al., 2025). Consequently, the occurrence and drivers of these compound events are not yet fully understood (e.g., Matanó et al., 2022, 2024; Brunner, 2023; Götte and Brunner, 2024; Hammond et al., 2025; Brunner et al., 2025). Similarly, it is yet unclear how increasing hydrological volatility in a warming climate (Swain et al., 2025) will translate to changes in drought-to-flood transition behavior.
Process-based hydrological models can provide valuable insights into how streamflow and/or other hydrological fluxes and states react to variations in meteorological and climate inputs (Hrachowitz and Clark, 2017). In recent decades, several efforts have been made to improve the realism of hydrological models in terms of spatial variability (e.g., Dembélé et al., 2020), the simulation of low (e.g., Garcia et al., 2017) and high flows (e.g., Mizukami et al., 2019), or the representation of flood-triggering mechanisms and spatiotemporal coherence (e.g., Brunner et al., 2020, 2021), under current and changing climatic conditions (e.g., Fowler et al., 2018). However, modeling hydrological extreme events such as droughts and floods is still challenging (e.g., Mizukami et al., 2019; Bruno et al., 2024), especially when multiple variables are involved. Such cases include, for example, modeling the dependence between flood peaks and volumes (Brunner and Sikorska-Senoner, 2019), or modeling the spatial dependence of floods happening in different locations (Brunner et al., 2021). This complexity suggests that capturing consecutive drought-to-flood events might not be trivial either. As model evaluations targeted at compound extremes have not yet been performed, it is still unclear how well hydrological models can, in fact, capture drought-to-flood transitions.
Hydrological modeling involves making decisions about model structure (i.e., process representations and parameterizations), spatial discretization, meteorological forcings, and parameter estimation approach (e.g., calibration/evaluation periods, hydrological target variables or signatures used in objective functions), which affect hydrological simulations and whose importance might vary depending on the modeling purpose (e.g., Mendoza et al., 2016; Mizukami et al., 2016; Baez-Villanueva et al., 2021; Guo et al., 2017; Melsen et al., 2019). Previous studies have highlighted that such modeling decisions can substantially influence simulated hydrological extremes and their uncertainties (e.g., Alexander et al., 2023; Melsen and Guse, 2019; Melsen et al., 2019). They have also shown that the choice of objective function for model calibration, model structure, and spatial discretization (forcings and domain) are the most influential decisions on modeling outcomes. While these previous studies have focused on analyzing the impacts of modeling decisions on drought and flood attributes (e.g., severity, duration), they have not looked at how these decisions influence event detection, i.e. whether or not a model can capture extreme events below or above a certain threshold. Moreover, previous work has focused on individual extremes instead of looking at them in a multivariate setting (Brunner, 2023). As such, it is unclear how individual modeling decisions might influence the representation of hydrological transitions.
Hydrological modeling often relies on a calibration process to find parameter values that minimize discrepancies between observations and simulations of a target variable (e.g., streamflow). The calibration process requires defining an objective function to measure the similarity between observations and simulations. In general, these objective functions are defined based on “least squares” formulations such as the widely used Nash-Sutcliffe Efficiency (NSE; Nash and Sutcliffe, 1970) and the Kling–Gupta Efficiency (KGE; Gupta et al., 2009). Although alternative objective functions have been proposed to enhance the robustness of calibrated parameters and hydrological consistency (e.g., Fowler et al., 2018; Yilmaz et al., 2008; McMillan, 2020), KGE and NSE remain widely used for model calibration and evaluation (e.g., Klemeš, 1986; Motavita et al., 2019; Seibert et al., 2019; Beven, 2025).
The Kling–Gupta Efficiency (KGE), originally proposed by Gupta et al. (2009), has been one of the most popular performance metrics used in hydrology over the last decades. Because it can be disaggregated into its three components (bias, variability, and correlation), KGE provides interpretability and diagnostic power. It has been applied for many modeling purposes, including the analysis of streamflow extremes (e.g., Gu et al., 2023; Hirpa et al., 2018). Calibrations are often considered successful if the KGE performance exceeds a certain threshold during both the calibration and evaluation periods (e.g., KGE > 0.4). It is also often assumed that the KGE criterion can be used as a proxy for how well a model represents streamflow properties such as extreme events (e.g., Lema et al., 2025; Cinkus et al., 2023; Zhao et al., 2025). However, model evaluations often do not explicitly evaluate how drought or flood events are represented at the event scale. As a consequence, it remains unclear how suitable KGE, its alternative formulations (Gupta et al., 2009; Kling et al., 2012; Pool et al., 2018; Tang et al., 2021; Pizarro and Jorquera, 2024), and its adaptations (e.g., transformations and weights; Garcia et al., 2017; Wu et al., 2025; Mizukami et al., 2019) are as proxies for model accuracy in terms of extreme events and their consecutive occurrence.
In summary, it is unclear how different modeling decisions, such as the choice of the hydrological model, objective function, and streamflow transformations, affect drought-to-flood transition simulations and how well overall performance metrics, such as KGE, relate to a model's ability to capture streamflow extremes. It remains to be explored which modeling choices are most suitable for capturing these compound hydrological extreme events without compromising hydrological consistency (i.e., representation of different hydrological processes or properties). To address these research gaps, we investigate the extent to which hydrological models can represent consecutive drought-to-flood transitions and the impact of model structure and calibration choices on their representation. Specifically, we address the following research questions:
- How suitable is the KGE for calibrating models aimed at jointly simulating streamflow droughts and floods?
- Which modeling choices (e.g., model structure, KGE formulation, etc.) are most important for simulating droughts, floods, and their transitions?
- In which catchments are drought-to-flood transitions most challenging to model and detect?
To address these questions, we performed several calibration experiments with four conceptual bucket-type hydrological models (GR4J, GR5J, GR6J, and TUW) across 63 catchments in Chile and Switzerland. First, we tested different configurations of the Kling–Gupta efficiency (KGE) to assess their performance in simulating and detecting observed transitions. These configurations included five KGE formulations (Table 1), two streamflow transformations (i.e., untransformed streamflow, Q, and inverse streamflow, 1/Q) and their linear combination, and four different weights applied to the variability term of the KGE. Second, we assessed the relative importance of each methodological choice for detecting events and ensuring hydrological consistency. Finally, we explored the link between model performance and hydrometeorological and physiographic catchment descriptors.
Table 1. KGE formulations used as objective functions: Gupta et al. (2009), Kling et al. (2012), Pool et al. (2018), Tang et al. (2021), and Pizarro and Jorquera (2024).
The study domain encompasses 24 and 39 near-natural catchments in Chile (CL; Fig. 1a) and Switzerland (CH; Fig. 1b), respectively. These catchments are selected based on the availability of complete daily streamflow records between 1981 and 2020 for at least 30 years, with a complete year being defined as one in which all months had information for at least 90 % of the days. We characterize the hydroclimatology of the catchments in our study domain by the wetness index, runoff coefficient, p-seasonality and q-seasonality index, and fraction of precipitation falling as snow (fsnow) over the period 1985–2020. The p-seasonality index (Woods, 2009; Berghuijs et al., 2014), as well as its analogue, q-seasonality, describes the seasonality of precipitation (or streamflow) and the degree of synchronization with the temperature seasonality. The fsnow is computed according to the formulation proposed by Woods (2009) and ranges from 0 (all precipitation falls as rain) to 1 (all precipitation falls as snow).
Figure 1. Study domain and hydroclimatic characteristics computed for the period 1985–2020 using data retrieved from CAMELS Chile (CL) and Switzerland (CH). Location of catchments across the study domain in (a) Chile and (b) Switzerland, (c) relationship between wetness index, runoff coefficient, and mean annual precipitation, and (d) relationship of seasonality of precipitation and streamflow and fraction of precipitation falling as snow. For p-seasonality and q-seasonality, positive (negative) values indicate summer (winter) dominated precipitation or streamflow, while values close to zero indicate a uniform distribution across the year.
This characterization shows that the selected catchments span a wide range of hydroclimatic characteristics (Fig. 1c), from energy- to water-limited, and different hydrological regimes (Fig. 1d), from snowmelt-dominated (e.g., p-seasonality < −0.5 and q-seasonality > 0.5) to rainfall-dominated (e.g., p-seasonality < −0.5 and q-seasonality < −0.5). Some catchments are positioned above the water limit or below the energy limit (Fig. 1c), which suggests either an underestimation of precipitation, which might require correcting for precipitation undercatch (e.g., Newman et al., 2015; Stisen et al., 2012; Hughes et al., 2021), or a surplus of streamflow due to, e.g., uncertainties in stage-discharge relationships or glacier and/or groundwater contributions.
The CAMELS Chile (CL; Alvarez-Garreton et al., 2018a) and Switzerland (CH; Höge et al., 2023a) datasets are used to obtain the meteorological forcings, streamflow records, snow water equivalent (SWE) estimates, and catchment boundaries for the catchments in the two study domains. The meteorological forcings of both datasets, CR2Met version 2.5 for Chile (Boisier, 2023) and RhiresD version 2 for Switzerland (MeteoSwiss, 2021b, a), are based on local gridded observation-based products, while SWE products are based on a snow cover model and data assimilation (for more detail refer to Cortés and Margulis, 2017; Magnusson et al., 2014). We prefer these local products over global ones such as ERA5 (Hersbach et al., 2020) because of their reliance on observations and high horizontal resolutions (approximately 5 km × 5 km for CR2Met and 2 km × 2 km for RhiresD) that enable a better representation of precipitation patterns in the complex topography of our study domains. Further, these products have been widely used for hydrological studies in Chile (e.g., Vásquez et al., 2021; Alvarez-Garreton et al., 2021; Araya et al., 2023) and Switzerland (e.g., Peleg et al., 2020; Fatichi et al., 2015; Tuel et al., 2022). Streamflow records available through the CAMELS datasets were acquired from the national agencies in each country (i.e., the General Directorate of Water of Chile – DGA and the Swiss Federal Office for the Environment – FOEN). We compute topographic characteristics and hypsometric curves, which are needed to set up the snow routines, using the catchment outlines from CAMELS and the Multi-Error-Removed Improved-Terrain (MERIT) digital elevation model (Yamazaki et al., 2019). Additionally, we retrieve time series of actual evapotranspiration (ET) from the satellite- and reanalysis-based GLEAM v3.8a dataset (Miralles et al., 2011), which are spatially aggregated to the catchment scale and used to complement the model performance assessment.
Our methodological approach is illustrated in Fig. 2. Four hydrological models were calibrated against observed streamflow records, using five different formulations of the Kling–Gupta efficiency (KGE) as objective functions. In addition, we tested three streamflow transformations and four different weights applied to the KGE variability term. This calibration experiment resulted in 60 optimal parameter sets per model and catchment (i.e., 5 formulations × 3 transformations × 4 weights). We evaluated model performance based on (1) general goodness-of-fit metrics such as the NSE (Legates and McCabe, 1999; Althoff and Rodrigues, 2021), (2) simulation of extreme events and transitions between them using categorical indices, and (3) hydrological consistency in different processes related to streamflow, snow, and evapotranspiration by comparing simulated time series of these variables with observations or reference products. In this paper, we use the term “formulation” to refer to a specific definition of the KGE (Table 1), “case” to refer to the application of KGE weights or flow transformations, and “configuration” to refer to the combination of a specific KGE formulation and a specific case using certain weights and a specific streamflow transformation. The cases without weights and/or the linear combination between streamflow without (i.e., Q) and with low-flow transformations (i.e., 1/Q) were used as a reference for the comparison of the results. To assess the statistical significance of the differences between, e.g., the ability to capture streamflow extreme events across models, as well as other configurations tested in this study, we applied the Wilcoxon test (Wilcoxon, 1945) at a 5 % significance level and provided p-values where possible. The Wilcoxon test is a nonparametric test used to determine whether two groups differ statistically, without making any specific assumptions about their distributions (e.g., normality).
The following sections provide a detailed description of the different methodological steps.
3.1 Streamflow extremes characterization
We detected droughts, floods, and drought-to-flood transitions using the methods proposed by Götte and Brunner (2024). Their approach identifies periods of negative streamflow anomalies (i.e., droughts) using a daily varying threshold based on a 30 d rolling quantile of the daily streamflow data and high streamflow events (i.e., floods) using a fixed threshold based on a quantile of the annual maximum streamflow values. We further required that all drought events have a minimum duration of 30 d, and we merged droughts separated by fewer than 15 d (Van Loon and Van Lanen, 2012; Fleig et al., 2006; Tallaksen et al., 1997) to limit the detection of minor events. The thresholds for droughts (30th percentile of the smoothed daily flow) and floods (40th percentile of the annual maxima series) were set to ensure roughly one streamflow extreme event of each type (i.e., drought and flood) per year on average for each catchment (see Fig. S7 in the Supplement). This target was set in order to identify a statistically representative number of extreme events, comparable to the sample size that would be obtained by the commonly used annual maximum approach (e.g., Meylan et al., 2012). In a second step, we used the identified flood and drought events to identify transition events. Rapid (within 14 d) and seasonal (within 90 d) transitions are defined based on the number of days between the end of the drought and the onset of the flood, following Götte and Brunner (2024). With this definition and the adopted thresholds, we identified one transition event every 4 years on average for each catchment. Figure 3 illustrates the detection of droughts, floods, and their transitions for two catchments within the study domain.
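The event post-processing described above can be illustrated with a minimal Python sketch. It is not the implementation of Götte and Brunner (2024): the function names are hypothetical, the threshold comparison is assumed to have been done beforehand (the input is a boolean series of below-threshold days), and pooling is assumed to be applied before the minimum-duration filter, which the text does not specify.

```python
def runs(mask):
    """Return inclusive (start, end) index pairs for each run of True values."""
    events, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(mask) - 1))
    return events


def pool_and_filter(events, max_gap=15, min_duration=30):
    """Merge events separated by fewer than max_gap days, then drop short ones.

    Assumption: pooling happens before the duration filter.
    """
    pooled = []
    for start, end in events:
        if pooled and start - pooled[-1][1] - 1 < max_gap:
            pooled[-1] = (pooled[-1][0], end)
        else:
            pooled.append((start, end))
    return [(s, e) for s, e in pooled if e - s + 1 >= min_duration]


def classify_transitions(droughts, floods, rapid=14, seasonal=90):
    """Label each drought by the lag (days) to the next flood onset after its end."""
    result = []
    for d_start, d_end in droughts:
        onsets = [f_start for f_start, _ in floods if f_start > d_end]
        if not onsets:
            continue
        lag = min(onsets) - d_end
        if lag <= rapid:
            result.append(((d_start, d_end), lag, "rapid"))
        elif lag <= seasonal:
            result.append(((d_start, d_end), lag, "seasonal"))
    return result
```

In this sketch, two below-threshold spells of 40 d separated by a 10 d gap are pooled into a single 90 d drought, and a flood starting 6 d after the drought ends is labeled a rapid transition.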
3.2 Modeling approach
3.2.1 Hydrological models
We use four conceptual bucket-style rainfall-runoff hydrological models: GR4J (Perrin et al., 2003), GR5J (Le Moine, 2008; Pushpalatha et al., 2011), GR6J (Pushpalatha et al., 2011), all coupled to the snow accumulation-ablation module CemaNeige (Valéry et al., 2014a, b), and TUWmodel (Parajka et al., 2007), which is based on the HBV model (Bergström and Forsman, 1973). All models have been widely used within the hydrological community during the last decades (Seibert and Bergström, 2022). GR4J, GR5J, and GR6J (with 6, 7, and 8 parameters coupled with CemaNeige, respectively; see Table S1 in the Supplement) were chosen to explore how slight changes in model structure affect simulated streamflow extremes, and the TUW model (with 15 parameters; see Table S2) was selected to explore how more complex models, in terms of the snow routine and the representation of the processes occurring in the production storage, simulate these phenomena.
Bucket-type conceptual models often include parameters and functions that allow for non-conservative adjustments to the water balance (i.e., artificially adding or leaking water). While they can help correct potential mismatches, e.g., between topographical and underground catchments, they can also compensate for biases in the forcing. To explicitly correct for biases in the meteorological forcings (as illustrated in Fig. 1c), two parameters were included in the calibration process in addition to the original setup for each hydrological model. Specifically, a multiplicative parameter for precipitation (dP) and an additive parameter for temperature (dT) were included to adjust systematic biases in precipitation and temperature.
The GR4J, GR5J, and GR6J models – hereafter referred to collectively as GRXJ for simplicity – share the same genealogy, meaning that they are based on the same core structure. These models can be coupled to the snow module CemaNeige, which partitions precipitation into liquid and solid precipitation and simulates snow accumulation and melt (rainfall and snowmelt enter the GRXJ structures). The basic structure of the GRXJ family corresponds to the GR4J model, which includes a parameter for production storage capacity, representing surface processes, and a parameter for routing storage capacity, representing subsurface processes. Additionally, GR4J includes an intercatchment exchange parameter and a unit hydrograph parameter that represents the delay between precipitation and streamflow. GR5J adds an additional parameter to the GR4J structure to improve the intercatchment exchange function, while GR6J includes a parameter for exponential storage in parallel to the routing storage included in GR4J and GR5J to improve the representation of groundwater processes (i.e., slow runoff). It is important to note that the original structure of GR4J cannot be recovered by setting the parameter X5 equal to zero in GR5J, nor can GR5J be obtained by setting parameter X6 = 0.01 (the minimum value that can be adopted) in GR6J. This is because, e.g., in GR5J the routing function differs from GR4J, whereas in GR6J the effect of the exponential storage (defined by X6) cannot be canceled. Thus, despite having the same core structure, the models are intrinsically different from each other.
The TUW model consists of snow, soil, groundwater (subsurface flow), and routing routines, similar to the HBV model (Bergström and Forsman, 1973). One of the major differences between the HBV and TUW models lies in their snow routines. The TUW model does not allow for meltwater or rainfall to be retained within the snowpack, nor does it account for the refreezing of liquid water. The snow routine partitions between liquid and solid precipitation and estimates snow accumulation and melt. Rainfall and snowmelt enter the soil routine, where actual evaporation, soil moisture, and recharge are estimated. Then, the recharge flow goes to the groundwater routine, represented by two storages that produce surface runoff and quick flow (upper), and baseflow (lower). The sum of these flows is delayed in the routing routine using a triangular transfer function. Unlike the GRXJ models, which follow a water balance approach to characterize the production storage, TUW estimates evapotranspiration and recharge based on an explicit conceptualization of soil moisture content.
While both CemaNeige and the snow routine implemented in the TUW model follow a degree-day factor approach, there are differences in (i) the characterization of the precipitation phase (TUW allows the existence of a mixed partition between rain and snow), (ii) the conditions for snowmelt (free parameter in the TUW model and set to 0 °C for CemaNeige), and (iii) the presence (or absence) of a parameter to correct for snowfall undercatch (not available in CemaNeige). These differences also explain the number of parameters that each of the snow routines has (two and five for CemaNeige and the snow routine in the TUW model, respectively).
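To make the degree-day idea concrete, the following Python sketch shows a generic single-layer degree-day step with an optional mixed rain/snow partition. It reproduces neither CemaNeige nor the TUW snow routine; all function and parameter names (`t_snow`, `t_rain`, `t_melt`, `ddf`) are illustrative assumptions, and refinements such as snowpack thermal state or snowfall undercatch correction are omitted.

```python
def snow_fraction(temp, t_snow, t_rain):
    """Fraction of precipitation falling as snow.

    All snow at or below t_snow, all rain at or above t_rain, and a linear
    mix in between. Setting t_snow == t_rain gives a single threshold.
    """
    if temp <= t_snow:
        return 1.0
    if temp >= t_rain:
        return 0.0
    return (t_rain - temp) / (t_rain - t_snow)


def degree_day_step(swe, precip, temp, ddf, t_melt=0.0, t_snow=0.0, t_rain=0.0):
    """Advance the snowpack by one day.

    swe    : snow water equivalent at the start of the day (mm)
    precip : daily precipitation (mm)
    temp   : daily mean air temperature (deg C)
    ddf    : degree-day factor (mm per deg C per day)
    Returns the updated SWE and the liquid water leaving the routine (mm).
    """
    snowfall = snow_fraction(temp, t_snow, t_rain) * precip
    rainfall = precip - snowfall
    swe += snowfall
    # Potential melt grows linearly with temperature above the melt threshold
    # and is capped by the snow that is actually available.
    melt = min(swe, ddf * max(temp - t_melt, 0.0))
    swe -= melt
    return swe, rainfall + melt
```

In a semi-distributed setup, a step like this would be applied to each elevation band with its own temperature and precipitation.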
Despite their structural differences and conceptualizations (for further details refer to Astagneau et al., 2021b), these models provide simplified representations of some hydrological states, fluxes, and processes at the catchment scale using precipitation (P), mean temperature (T), and potential evapotranspiration (PET) at daily time steps as inputs. To estimate PET, we used the approach proposed by Oudin et al. (2005), which is based on temperature and requires latitude and the day of the year as a proxy for extraterrestrial radiation. Additionally, as the snow module CemaNeige can be configured in a semi-distributed way, we discretized each catchment into equal-area elevation bands based on the hypsometric curve by considering 10 elevation bands for all evaluated model structures. To make simulations comparable across model structures, precipitation and temperature inputs for the TUW model were extrapolated through 10 elevation bands following the approach implemented in the GRXJ models, based on the orographic gradients defined by Valéry et al. (2010).
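The temperature-based PET estimate can be sketched as follows. This is a minimal illustration of the Oudin et al. (2005) formula, assuming the FAO-56 expressions for extraterrestrial radiation (the radiation scheme used in the study's software is not stated here) and standard constants for the latent heat of vaporization and water density; the function names are hypothetical.

```python
import math


def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation in MJ m-2 d-1 (FAO-56 expressions, assumed)."""
    phi = math.radians(lat_deg)
    inv_dist = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)
    # Clamp the argument so polar day and polar night do not break acos.
    arg = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
    sunset = math.acos(arg)
    solar_constant = 0.0820  # MJ m-2 min-1
    return (24.0 * 60.0 / math.pi) * solar_constant * inv_dist * (
        sunset * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(sunset)
    )


def oudin_pet(temp_c, lat_deg, doy):
    """Potential evapotranspiration in mm d-1 following Oudin et al. (2005)."""
    latent_heat = 2.45   # MJ kg-1
    rho_water = 1000.0   # kg m-3
    if temp_c + 5.0 <= 0.0:
        return 0.0
    ra = extraterrestrial_radiation(lat_deg, doy)
    # ra / (latent_heat * rho_water) is in m d-1; multiply by 1000 for mm d-1.
    return 1000.0 * ra / (latent_heat * rho_water) * (temp_c + 5.0) / 100.0
```

For a latitude of 47° N around the summer solstice and a mean temperature of 15 °C, this sketch gives roughly 3.4 mm per day. If an additive temperature adjustment such as dT is calibrated, PET would be recomputed from the adjusted temperature.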
3.2.2 Calibration strategy
The parameters of each model structure, as well as the forcing adjustment parameters introduced, were calibrated using daily streamflow records and the Dynamically Dimensioned Search (DDS; Tolson and Shoemaker, 2007) over the period 2000–2020. This calibration period was defined to capture the current hydroclimatic conditions in the modeling setup. Note that, because the temperature-adjustment parameter was incorporated, potential evapotranspiration was recalculated at each evaluation run within the calibration algorithm to maintain consistency between the two variables. Additionally, following the traditional calibration approach proposed for GRXJ models (e.g., Pelletier and Andréassian, 2022), a parameter-space transformation is applied to improve the search process during calibration (details in Table S3).
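For readers unfamiliar with DDS, the search logic can be sketched in a few lines of Python. This is a simplified illustration of the algorithm described by Tolson and Shoemaker (2007), not the code used in this study: the function name, the default perturbation size `r = 0.2`, and the minimization convention (e.g., minimizing 1 − KGE) are assumptions of the sketch.

```python
import math
import random


def dds(objective, lower, upper, x0, max_iter=500, r=0.2, seed=0):
    """Minimize `objective` within box bounds using a DDS-style greedy search.

    max_iter must be larger than 1.
    """
    rng = random.Random(seed)
    n = len(x0)
    best_x, best_f = list(x0), objective(x0)
    for i in range(1, max_iter + 1):
        # The probability of perturbing each parameter shrinks with iterations,
        # so the search moves from global to increasingly local.
        p = 1.0 - math.log(i) / math.log(max_iter)
        dims = [j for j in range(n) if rng.random() < p]
        if not dims:
            dims = [rng.randrange(n)]  # always perturb at least one parameter
        cand = list(best_x)
        for j in dims:
            span = upper[j] - lower[j]
            v = best_x[j] + r * span * rng.gauss(0.0, 1.0)
            # Reflect at the bounds; fall back to the bound if still outside.
            if v < lower[j]:
                v = lower[j] + (lower[j] - v)
                if v > upper[j]:
                    v = lower[j]
            elif v > upper[j]:
                v = upper[j] - (v - upper[j])
                if v < lower[j]:
                    v = upper[j]
            cand[j] = v
        f = objective(cand)
        if f <= best_f:  # greedy acceptance: keep only non-worsening candidates
            best_x, best_f = cand, f
    return best_x, best_f
```

In a calibration setting, `objective` would run the hydrological model with the candidate parameters and return the value to be minimized.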
Different objective functions based on the KGE configuration were used to calibrate each model. In its most general form, the KGE (Eq. 1) compares simulations to a reference based on three components, i.e., dynamics (e.g., correlation), variability (e.g., standard deviation), and bias (e.g., mean). KGE values range from negative infinity to one, which represents the optimum. How each component is defined depends on which KGE formulation is used. To the best of our knowledge, there exist five such formulations in the literature (Gupta et al., 2009; Kling et al., 2012; Pool et al., 2018; Tang et al., 2021; Pizarro and Jorquera, 2024; more details in Table 1). Additionally, different scaling factors or weights (i.e., c1, c2, and c3 in Eq. 1) can be used to put more emphasis on some of the components of the KGE, and different streamflow transformations can be used to give more weight to specific parts of the flow distribution (e.g., Thirel et al., 2024; Mizukami et al., 2019). To emphasize low flows, for example, flows can be transformed to the inverse of streamflow (i.e., 1/Q; e.g., Garcia et al., 2017; Wu et al., 2025). Further, linear combinations of the KGE applied to flows without and with transformation (i.e., Q and 1/Q, respectively) have been presented as useful objective functions to find a good compromise between high and low flows (e.g., Araya et al., 2023; Knoben et al., 2020; Muñoz-Castro et al., 2023).
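Eq. (1) is not reproduced in this text. As a reading aid, the general scaled form can be written following the scaled Euclidean distance of Gupta et al. (2009). The symbols D, V, and B are placeholders introduced here for the dynamics, variability, and bias components, and the assignment of c1, c2, and c3 to these components in the order they are listed in the text is an assumption:

```latex
\mathrm{KGE} = 1 - \sqrt{\left[c_1\,(D - 1)\right]^2 + \left[c_2\,(V - 1)\right]^2 + \left[c_3\,(B - 1)\right]^2}
```

In the original formulation of Gupta et al. (2009), D is the linear correlation coefficient between simulations and observations, V is the ratio of simulated to observed standard deviation, and B is the ratio of simulated to observed mean; with all scaling factors equal to one, the unweighted KGE is recovered.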
For each hydrological model and catchment, 60 different objective functions were implemented based on the possible combinations of the following methodological choices: (i) 5 KGE formulations (Table 1), (ii) 3 streamflow transformation cases (High, Low, High-Low), and (iii) 4 weights applied to the variability term of the KGE (Eq. 1). For the low-flow transformation (Low; i.e., using 1/Q), a constant equal to 1 % of the mean streamflow was added to the series to avoid zero-flow problems (see e.g., Pushpalatha et al., 2012; Garcia et al., 2017; Knoben et al., 2020). To simplify the notation for the streamflow transformations tested here, we refer to the case as “Hi” (High) when a certain formulation of KGE was applied to untransformed streamflow (i.e., Q), while “Lo” (Low) refers to the case where a low-flow transformation was applied (i.e., 1/Q). We refer to the linear combination of both cases as “HiLo”.
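The structure of these objective functions can be sketched in Python as follows. The sketch uses the Gupta et al. (2009) components only and makes several assumptions that are not confirmed by the text: the variability weight enters as a scaling factor inside the squared term, the 1 % constant is taken from the observed mean, the HiLo case is an equal-weight average of the Hi and Lo values, and the weight labels are placeholders because the actual weight values are not given here.

```python
import math
from itertools import product


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def kge(sim, obs, w_var=1.0):
    """KGE (Gupta et al., 2009) with a scaling factor on the variability term.

    Undefined for constant series (zero standard deviation).
    """
    ms, mo = _mean(sim), _mean(obs)
    ss, so = _std(sim), _std(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (ss * so)   # dynamics: linear correlation
    alpha = ss / so       # variability: ratio of standard deviations
    beta = ms / mo        # bias: ratio of means
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (w_var * (alpha - 1.0)) ** 2 + (beta - 1.0) ** 2)


def inverse_flow(q, reference_mean):
    """Low-flow transformation 1/(Q + eps), with eps = 1 % of a reference mean flow."""
    eps = 0.01 * reference_mean
    return [1.0 / (v + eps) for v in q]


def kge_lo(sim, obs, w_var=1.0):
    """KGE on inverse flows; eps is taken from the observed mean (assumption)."""
    mo = _mean(obs)
    return kge(inverse_flow(sim, mo), inverse_flow(obs, mo), w_var)


def kge_hilo(sim, obs, w_var=1.0):
    """Equal-weight average of the Hi and Lo cases (assumed combination)."""
    return 0.5 * (kge(sim, obs, w_var) + kge_lo(sim, obs, w_var))


# Enumerate the 5 x 3 x 4 = 60 configurations; weight labels are placeholders.
FORMULATIONS = ["Gupta2009", "Kling2012", "Pool2018", "Tang2021", "PizarroJorquera2024"]
TRANSFORMATIONS = ["Hi", "Lo", "HiLo"]
WEIGHT_LABELS = ["w1", "w2", "w3", "w4"]
CONFIGURATIONS = list(product(FORMULATIONS, TRANSFORMATIONS, WEIGHT_LABELS))
```

A simulation that doubles every observed flow has perfect correlation but variability and bias ratios of 2, so the unweighted value is 1 − √2 ≈ −0.41; a larger variability weight penalizes the same simulation more strongly.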
3.3 Model accuracy assessment
We assessed model accuracy both in terms of general model performance and the ability of the model to capture extreme events and the transitions between them. We followed a traditional split-sample test approach (Klemeš, 1986; Beven, 2025) to assess the general model accuracy over two time periods defined as (i) calibration (2000–2020) and (ii) evaluation (1985–1999). To test for general accuracy and hydrological consistency across the calibration experiments tested here, we computed several goodness-of-fit metrics (e.g., KGE) and hydrological signatures (e.g., seasonality, low and high flows). First, we assessed model performance across the 60 configurations by comparing the values obtained for each objective function during calibration. Second, we assessed the predictive skill of our calibrated models by comparing their performance during calibration with that of a simple daily mean flow benchmark. This benchmark is defined as the mean flow for each day of the year, calculated from all instances of that day over the calibration period (referred to as BM05 in Knoben, 2024). Third, we assessed model performance by looking at biases in a set of hydrological signatures, including seasonality, statistical properties (mean, variance), flow duration curve-derived signatures (e.g., mid-segment slope), and annual extremes (see Table S5). We conducted this analysis in two steps: (i) we analyzed the models' ability to reproduce the seasonal timing (seasonality) of streamflow (Q), snow water equivalent (SWE), and actual evapotranspiration (ET); and (ii) we computed biases in streamflow-derived signatures. The results of this general model performance assessment are presented in Supplement Sect. S1.
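The daily mean flow benchmark can be sketched as a day-of-year climatology. The following Python snippet is a minimal illustration with hypothetical function names; it may differ in detail from the BM05 definition in Knoben (2024), for example in how leap days are handled.

```python
from collections import defaultdict


def daily_mean_benchmark(doy, flow):
    """Map each day of year to the mean of all flows recorded on that day."""
    sums, counts = defaultdict(float), defaultdict(int)
    for d, q in zip(doy, flow):
        sums[d] += q
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sums}


def benchmark_series(doy, climatology):
    """Expand the climatology into a series aligned with the given days of year."""
    return [climatology[d] for d in doy]
```

The resulting series can be scored with the same objective function as the calibrated model; a model that does not outperform it adds little skill beyond the observed seasonal cycle.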
To assess the model's capability to detect streamflow extremes and their transitions, we used the Critical Success Index (CSI; Eq. 2), which is formulated based on the number of hits (H; events identified both in the reference/observations and the simulations), misses (M; events only identified in the reference/observations), and false alarm events (F; events identified only in the simulations). The CSI values vary between zero and one, with one representing the optimum. We defined hits as simulated events overlapping at least 50 % of the time window with their observed counterparts. Additionally, a tolerance window of 30 and 5 d was defined before the onset and after the end of an observed drought and flood event, respectively. In short, we aimed to evaluate the models' ability to capture streamflow extremes and their transitions rather than their characteristics (e.g., cumulative deficit during the drought period, flow peak, etc.).
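The CSI takes its standard form, CSI = H / (H + M + F). The sketch below illustrates one way of counting hits, misses, and false alarms from lists of observed and simulated events. The matching rule is an assumption made for illustration: a simulated event counts as a match when at least 50 % of its days fall inside the observed event extended by the tolerance window on both sides. Events are hypothetical `(start, end)` tuples of inclusive day indices.

```python
def csi(hits, misses, false_alarms):
    """Critical Success Index: H / (H + M + F); NaN if no events exist at all."""
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


def count_outcomes(obs_events, sim_events, tol_days, min_overlap=0.5):
    """Count hits, misses, and false alarms under the assumed matching rule."""
    matched_obs = set()
    matched_sim = set()
    for i, (obs_start, obs_end) in enumerate(obs_events):
        win_start = obs_start - tol_days  # tolerance before the onset
        win_end = obs_end + tol_days      # tolerance after the end
        for j, (sim_start, sim_end) in enumerate(sim_events):
            overlap = min(win_end, sim_end) - max(win_start, sim_start) + 1
            sim_length = sim_end - sim_start + 1
            if overlap > 0 and overlap / sim_length >= min_overlap:
                matched_obs.add(i)
                matched_sim.add(j)
    hits = len(matched_obs)
    misses = len(obs_events) - hits
    false_alarms = len(sim_events) - len(matched_sim)
    return hits, misses, false_alarms
```

For example, `count_outcomes(obs, sim, tol_days=30)` would correspond to the drought tolerance and `tol_days=5` to the flood tolerance mentioned above.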
3.4 Assessment of the relative importance of modeling decisions
To assess the relative importance of modeling decisions on the detection of streamflow extremes and their transitions, we conducted an analysis of variance (ANOVA; Fisher, 1992; Kaufmann and Schering, 2014). The ANOVA enabled us to examine the relationship between different modeling decisions (e.g., choice of structure and different decisions related to calibration) and quantify their relative importance in explaining the total variance in the target variable (e.g., CSI). Thus, by dividing the total variance into different groups, genuine sources of variation that are not explained by chance can be identified. We assumed that the total variance (TV) in the target variable can be mainly explained by the differences between hydrological models (HM), KGE formulations (KGEf), streamflow transformations (QTR), and KGE component weights (W). If, for example, weights do not have a significant impact on the detection of streamflow extremes, we would expect a low value for the term “W”, that is, a lower relative importance (i.e., a smaller share of TV) for explaining the total variance with respect to other decisions. Based on this conceptualization and considering a residual term (RS) that groups all the interactions between decisions and the variance that we cannot explain from them, the ANOVA can be expressed as follows: TV = HM + KGEf + QTR + W + RS. (3)
We also analyzed the relative importance of the differences between catchments by including them in the ANOVA test. However, we ultimately removed this component from the explanatory variables because its influence sometimes dominated the results, thereby hiding the contribution of the intrinsic modeling decisions being tested to the variability observed in the CSI values.
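The variance partitioning into the terms HM, KGEf, QTR, W, and RS can be illustrated with a main-effects decomposition of sums of squares. The sketch below is a simplified illustration under stated assumptions, not the procedure used in the study: it assumes a balanced design (every combination of decisions is present once), treats each decision as a categorical factor, and assigns all interactions and unexplained variance to the residual. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def relative_importance(records, factors, target="csi"):
    """Share of the total sum of squares attributed to each factor's main effect."""
    values = [rec[target] for rec in records]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("target variable has no variance to partition")
    shares = {}
    for factor in factors:
        groups = defaultdict(list)
        for rec in records:
            groups[rec[factor]].append(rec[target])
        ss_factor = sum(
            len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
        )
        shares[factor] = ss_factor / ss_total
    # Interactions and unexplained variance are lumped into the residual term
    shares["RS"] = 1.0 - sum(shares.values())
    return shares
```

A decision with no influence on the target (for instance, a weight that leaves the CSI unchanged) receives a share close to zero in this decomposition.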
3.5 Identification of important processes in simulating drought-to-flood transitions
To identify the most important processes in simulating drought-to-flood transitions, we assessed which model parameters explain the detection of events. To do so, we analyzed the relative importance of each model parameter in estimating the CSI through an ANOVA test applied per catchment. This analysis, expressed by Eq. (4), considers the 60 alternative configurations (i.e., parameter sets) available per model and catchment and uses the total variance explained (TV) by each parameter (θi; where i = 1, …, Np, and Np is the number of parameters) as a proxy for the importance of the associated variable/process in explaining event detection. This approach can become unreliable if the parameters do not vary enough between the different configurations. However, despite the similarities in the configurations used for calibration, almost all the parameters show high variability among the calibrated parameter sets per catchment (see Fig. S15).
4.1 General model performance assessment
Before looking at model performance in terms of capturing extreme events, we assessed the overall performance of the four models used. For this, we independently evaluated the calibration results for each configuration. Our results show comparable performance across the hydrological models evaluated here (Fig. S1). For instance, all configurations outperform the defined daily mean flow benchmark (see Fig. S2), indicating that our models have greater predictive power with respect to the long-term observed streamflow series. Our more detailed analyses show that the seasonality of variables such as streamflow, SWE, and ET is simulated accurately, with median performance values across catchments and configurations between 0.79 and 0.98 (with 1.0 being the optimum). However, our evaluation shows that using weights for the variability term of KGE greater than 2 can be detrimental to the overall performance of the model, both in terms of representing the seasonality of the aforementioned variables (Fig. S3) and some hydrological signatures such as the high- and low-segments of the slope in the flow duration curve (FDC, Fig. S4). In general, the use of flow transformations yields values that are consistent with what the application seeks to capture (e.g., low-flows are better simulated with “Lo” transformation and high-flows are better simulated without transformation; see Fig. S5). There is little difference between different models and KGE formulations when weights and the HiLo transformation are used (Fig. S6). Considering those configurations with comparable performance (i.e., removing those relying on weights greater than 2), average accuracy across configurations falls within 0.87–0.92, 0.88–0.93, and 0.75–0.85 for the high-, mid-, and low-segment of the slope of the FDC, respectively. Further details on overall model performance are presented in Sect. S1.
The results presented subsequently are based on the simulations with the HiLo (i.e., ) configuration, except for specific cases for which all 30 configurations (i.e., removing weights greater than two) per catchment were used (e.g., ANOVA tests). Previous studies have already shown that the use of this configuration results in a good compromise between simulating low and high flows (e.g., Garcia et al., 2017; Thirel et al., 2024; Lema et al., 2025).
4.2 Suitability of KGE for calibrating models aimed at simulating drought-to-flood transitions
Next, we assess how strongly the general model performance described by the KGE is linked to the capability of the model in detecting extreme events. To do so, we compared the objective function value retrieved for one of our calibration configurations – the original KGE formulation configured with unweighted HiLo (i.e., c2=1 and HiLo = ), which is later used as a reference – with the performance in detecting droughts, floods and their transitions based on the CSI (Fig. 4). Our comparison clearly shows that model performance varies across catchments and model structures for both the KGE and CSI. While the overall performance described by the KGE can potentially be used as a proxy for a model's performance in capturing droughts for some catchments (e.g., points close to the optimal values for both KGE and CSI, i.e., 1, and CSI ranges from 0.18 to 0.74 for GR4J and from 0.18 to 0.78 for TUW), this link between general model performance and event detection is neither generalizable to floods and transitions, nor to all the models tested here. Rather, a high KGE does not necessarily imply a high CSI for these two types of events.
Figure 4. Comparison between the Kling–Gupta Efficiency (KGE) for the calibration period and the Critical Success Index (CSI) for droughts, floods, and transitions, based on the simulations with the models (a) GR4J, (b) GR5J, (c) GR6J, and (d) TUW calibrated with the unweighted original KGE formulation as the objective function. The dispersion bars are associated with the 10th and 90th percentiles across catchments, while the central shape indicates the 50th percentile. Transparent circles show results for each catchment. For both KGE and CSI, the optimal value is 1.
While KGE is not necessarily a good proxy for how well a model captures extreme events (especially floods and transitions), some specific KGE formulations might be better suited for this task than others. We evaluate this in a next step by exploring to what extent different adjustments in the “basic” configuration used for the analysis presented above can (or cannot) improve the performance in detecting streamflow extreme events and, particularly, drought-to-flood transitions.
4.3 Impacts of KGE configurations on drought-to-flood transition simulations
To assess the added value of the application of weights to the variability term of the KGE as well as the use of different KGE formulations for detecting independent extreme events and their transitions, we use the GR4J model as an example to quantify differences in CSI between the unweighted original KGE (reference) and alternative cases (e.g., weights and/or KGE formulations; Fig. 5). We find that, in the context of a large-sample study, weighting the variability term of the KGE does not consistently enhance model performance in detecting streamflow extremes and their transitions (median difference is centered around 0 in both cases) and may even be detrimental. Further, weighting the variability term can substantially worsen flood detection (e.g., GRXJ models, Fig. S10). Additionally, we find that using a modified KGE formulation, rather than the original, does not substantially improve model performance. In short, the use of weights and the choice of the KGE formulation do not play a dominant role in the overall performance of the model over the study domain. These findings are consistent across the other model structures tested (see Fig. S10).
Figure 5. Difference in the CSI for GR4J simulations obtained using model calibrations with no weights and the original KGE (reference) versus different weights and KGE formulations (alternative) for (a) droughts, (b) floods, and (c) transitions. Values above (below) 0 indicate better (worse) performance of the reference compared to the alternative. Each boxplot displays the information of 63 values (i.e., one per catchment).
4.4 Importance of model structure
Our results show that drought detection is typically more reliable than that of floods and transitions between the two (Fig. 6). However, there are no significant differences in the detection rate of droughts, floods and their transitions, across the hydrological models. While the CSI median values slightly improve when switching from the GR4J model to the other GRXJ and TUW versions for droughts (0.49 to 0.55), this is not the case for floods and transitions. For instance, for floods, GR5J stands out among the GRXJ models (CSI = 0.30, compared to 0.22 and 0.18 for GR4J and GR6J, respectively), while for transitions, GR6J shows a better performance compared to the simpler models (CSI = 0.30, compared to 0.25 and 0.29 for GR4J and GR5J, respectively). This suggests that adding more parameters does not necessarily lead to improved model performance when detecting extreme streamflow events.
Figure 6. Critical Success Index (CSI) for (a) droughts, (b) floods, and (c) drought-to-flood transitions, based on the simulations with GR4J, GR5J, GR6J, and TUW (different colors) calibrated with the unweighted HiLo original KGE formulation as the objective function. Each boxplot displays the information of 63 values (i.e., one per catchment).
These results hold independently of the country considered (see Fig. S11 for a comparison between Swiss and Chilean catchments). However, the detection of extreme events is more challenging in catchments located in Chile compared to those located in Switzerland, with differences in the median CSI between countries (i.e., CSI_CH − CSI_CL) lying around 0.28, 0.12, and 0.16 for droughts, floods, and drought-to-flood transitions, respectively.
Different model structures can result in similar streamflow simulations even though they represent hydrological fluxes and states in different ways. To illustrate this, we compare simulated fluxes obtained for an observed seasonal drought-to-flood transition in the Dischma river in Switzerland across the four hydrological models (Fig. 7). While three out of four models capture the transition event successfully (GR6J fails in capturing its timing) and show similar temporal patterns of ET, snowmelt, and SWE, the contribution of baseflow (presented as a percentage of total runoff) varies strongly among them. Consequently, the analysis of the drivers associated with such transition events will vary depending on which model structure is analyzed. Although there is a high agreement between the models in terms of the detection of the event in this sample case (i.e., 3 out of 4), such agreement is not necessarily apparent for all events and catchments (Fig. 4).
Figure 7. Example of how different hydrological fluxes and states – such as runoff (Q), baseflow, actual evapotranspiration (ET), snowmelt, and snow water equivalent (SWE) – are simulated for an observed drought-to-flood transition in the Dischma river (Switzerland) with the GR4J, GR5J, GR6J, and TUW hydrological models calibrated with the unweighted HiLo original KGE formulation.
4.5 Relative importance of different modeling decisions
Our previous results showed no significant differences when pooling results by model (Fig. 6). However, when it comes to the relative importance in explaining the total variance of the detection skill, the results of the ANOVA show that the most important modeling decision in simulating extreme events and their transitions is the choice of a suitable model structure, followed by the choice of the streamflow transformation (Fig. 8). In contrast, the choices of KGE formulation and weights do not have a strong impact on the performance in simulating streamflow extremes. For floods, the transformation is more important because of the loss of performance in representing high flows when the model is calibrated with a low-flow transformation (Fig. S5). This highlights the importance of selecting the appropriate transformation according to the modeling objectives. Additionally, when catchment characteristics are included as an explanatory variable, they strongly influence drought detection, while they have little effect on flood detection (see Fig. S13). The relative importance of the methodological choices is similar when analyzing other categorical indices, such as the probability of detection, false alarm ratio, and frequency bias (see Fig. S12).
4.6 Model accuracy depends on catchment characteristics
We further explore the relationship between model accuracy and catchment characteristics using Spearman's rank correlation coefficient. To this end, we focus on the CSI obtained for the different types of extreme events of interest (droughts, floods, and transitions) generated with the GR4J and TUW models calibrated with the unweighted HiLo original KGE formulation (Fig. 9; extended version including all models in Fig. S14). Drought-to-flood transitions are more difficult to capture in semi-arid (negative correlation between aridity index and CSI), high-mountain (negative correlation between mean elevation and CSI), and flashy (negative correlation between the slope of the flow duration curve and CSI) catchments than in humid low-elevation catchments with high streamflow elasticity to precipitation (Fig. 9). This result is generalizable to the other models and the different KGE formulations tested (see Fig. S14).
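For reference, Spearman's rank correlation between a catchment attribute and the CSI can be computed as the Pearson correlation of the ranked values. The sketch below is a minimal standard-library illustration with hypothetical function names; ties receive average ranks, and no p-value is computed (a statistical library would be needed for the significance test shown in Fig. 9).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A negative value for, e.g., mean elevation against the transition CSI would indicate that detection skill tends to decrease with elevation across the catchment sample.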
Figure 9. Spearman's rank correlation coefficient between different catchment attributes and the CSI for (a) droughts, (b) floods, and (c) drought-to-flood transitions, based on the simulations with GR4J and TUW calibrated using the unweighted HiLo original KGE formulation as the objective function. The circles with thick outlines indicate correlation coefficients with p-values lower than 0.05.
4.7 Linking model performance to hydrological processes during streamflow extremes
We conducted an ANOVA test to analyze the relative importance of different model parameters in detecting streamflow extremes and their transitions (Fig. 10; the extended version with rapid and seasonal transitions is presented in Fig. S18). We show that some model parameters are relatively more important than others (e.g., X4 for floods in GRXJ models), but that the relative importance of a given parameter can vary substantially across catchments. In all of the hydrological models, the parameters intended to adjust the forcings are highly important (i.e., dP and dT for all the models as well as SCF in the TUW model, which seeks to correct for snow undercatch). For the GRXJ models, X3 (routing store capacity) and X4 (unit hydrograph time constant) are most important in the simulation of low and high flows compared to the rest of the parameters, which is accentuated when more complexity is added to the base structure (i.e., GR6J). In the TUW model, which has more parameters than the GRXJ structures, the relative importance of each parameter is more uniform and generally low, except for the parameter k0 (storage coefficient for very fast response), which becomes more important for flood detection in comparison to, e.g., drought detection.
5.1 Simulation of compounding streamflow extreme events
We find that the hydrological models tested are better at detecting droughts (median CSI across catchments and KGE formulations: 0.50–0.58) than floods (median CSI across catchments and KGE formulations: 0.13–0.34), and their performance in detecting drought-to-flood transitions is closely related to (and likely limited by) their performance in detecting floods (0.25–0.33; Fig. 4). This difference in drought and flood simulation performance can be attributed to the different timescales associated with these two types of extreme events: while droughts vary in duration from months to years (or decades), floods develop, and may also subside, in a matter of hours or days. This is consistent with the poor performance of all the models tested in capturing rapid transitions (i.e., occurring within 14 d). Moreover, we obtained CSI values greater than zero for rapid transitions only in 13 basins (not shown). Overall, our analyses highlight that these fast processes are rather difficult to capture in conceptual rainfall-runoff models like GRXJ and TUW.
5.2 Good general model performance does not imply that extremes are well detected
Our results highlight that a good general model performance in terms of KGE does not necessarily imply a good performance in detecting streamflow extremes. Even models with high accuracy, measured by traditional metrics such as KGE, struggle to capture extreme events, particularly floods and transitions from drought to flood (Fig. 4). These findings are aligned with previous studies discussing the potential of KGE to represent high-flow values or capture flashy streamflow dynamics (e.g., Astagneau et al., 2022; Brunner et al., 2021; Mathevet et al., 2020). For instance, from a modeling perspective, Astagneau et al. (2021a) demonstrated that the relationship between KGE values and a model's capability in simulating summer floods is weak. Similarly, Bruno et al. (2024) showed that, during extreme low-flow conditions, model performance is usually lower than during normal flow conditions. Spieler and Schütze (2024) showed that the KGE lacks the capacity to provide information about detailed processes, leading to gaps between model accuracy (i.e., how well a model matches simulations with observations) and adequacy (i.e., how well a model captures key processes and behaviors of the observed system). These findings suggest that the traditional evaluation of hydrological models through goodness-of-fit metrics such as KGE or NSE must be accompanied by an explicit examination of their capability to simulate and detect streamflow extreme events, e.g., by using metrics such as the CSI.
From a process perspective, hydrological model underperformance can be linked to oversimplified or poorly represented (or understood) processes (Beven, 2019; Clark et al., 2017; Hrachowitz et al., 2014; McMillan et al., 2018). For instance, in the context of drought-to-flood transitions, prolonged dry conditions can alter soil properties, such as cracking (Gimbel et al., 2016; dos Santos et al., 2016), water repellency (Doerr et al., 2007; Leighton-Boyce et al., 2007), and macropore connectivity (Or et al., 2013), changing the infiltration-runoff partitioning and potentially intensifying catchment responses to precipitation. Despite their importance for flood generation, these soil and near-surface processes remain poorly understood and, consequently, are rarely represented in conceptual or even physically-based hydrological models (Brunner, 2023; Barendrecht et al., 2024; Blöschl et al., 2019). This limits their ability to reproduce streamflow extremes, such as droughts and floods, and rapid shifts between them. Therefore, it is important to improve our understanding of the processes behind transitions and how they are represented in hydrological models.
5.3 The importance of different modeling decisions for simulating streamflow extremes and their transitions
Our results show that model structure is the most important modeling decision for capturing extreme events and their transitions (Fig. 8), which is consistent with previous studies focused on the independent analysis of extreme events (e.g., Alexander et al., 2023; Melsen and Guse, 2019; van Kempen et al., 2021). Among the structures tested, slight but non-significant differences were found in terms of their performance in representing streamflow extreme events (Fig. 6). However, there is an evident decrease in performance in detecting floods compared to droughts, which generalizes across the four models tested. These deficiencies in flood simulation performance translate to deficiencies in capturing drought-to-flood transitions (Figs. 4 and 6). These findings suggest that the lack of an explicit structural component allowing for the simulation of floods that occur under dry and low soil moisture conditions could explain the poor performance associated with this type of compound event. Indeed, Astagneau et al. (2022) highlighted that conditioning the storages and fluxes of a lumped conceptual hourly-timestep model on rainfall intensities could benefit model performance in catchments with a fast response to precipitation (i.e., flashy-catchments). For droughts, van Kempen et al. (2021) have shown that the magnitudes of the low-flow events are significantly affected by alterations in the architecture of the upper and lower storages, which is consistent with the changes in performance among the GRXJ models (Fig. 6), where small structural modifications lead to changes in the detection of these events.
We demonstrated that the capability of a model to identify streamflow extreme events and their transitions in simulations varies depending on its structure. In contrast, model accuracy in detecting extreme events does not necessarily depend on the number of model parameters (Fig. 6). Several studies have highlighted that including a more detailed representation of hydrological processes in models does not necessarily imply better accuracy (e.g., Orth et al., 2015; Valéry et al., 2014a). This is because more realistic representations require more detailed information to characterize the system of interest (e.g., land cover maps, distributed forcings, a high-resolution digital terrain model, soil properties), which are not always available. Recently, Santos et al. (2025) found that models with varying complexity can lead to similar robustness issues, stressing the need to improve strategies for diagnosing the suitability of model structures to improve the understanding of specific hydrological processes (e.g., Spieler and Schütze, 2024; Knoben et al., 2020).
The results presented here show that the choice of objective function is relatively less important compared to the choice of model structure (Fig. 8). However, choosing an appropriate transformation can be an important decision for improving a model's ability to capture flood events. Model performance can be optimized both in terms of general performance and the representation of extreme events (CSI) under (a) the application of equal weights to all components of the KGE (Fig. 5) and (b) the application of a streamflow transformation that focuses on both high and low flows (Figs. S1–S6). Our comparison also highlights that the potential benefit from adjusting these choices (e.g., using other weights or other transformations) varies widely between catchments (Fig. 5). This is in line with the findings of Mizukami et al. (2019), who found that the influence of weights on model performance depends on model structure and catchment characteristics. While none of the tested modifications in the objective function consistently improve the simulation of streamflow extremes across all catchments in the study domain, some of the alternative KGE formulations could improve the simulation of certain variables in certain catchments.
The high relative importance of the forcing adjustment parameters for event detection (i.e., dP and dT for all the models as well as SCF in TUW model, which seeks to correct the snow undercatch; Fig. 10) suggests that the meteorological forcings can have a major impact on the detection of streamflow extremes and their transitions. Several studies have shown that errors in meteorological forcing are a key challenge in hydrological modeling (e.g., Brunner, 2023; Döll et al., 2016) due to, e.g., their large influence on the simulation of snow processes (e.g., Tang et al., 2023; Günther et al., 2019), or significant impacts on the partitioning between evaporation and runoff (e.g., Nasonova et al., 2011). Here, we attempt to reduce this effect by (1) preferring local meteorological products over global ones (e.g., Clerc-Schwarzenbach et al., 2024), and (2) incorporating adjustment factors to account for potential systematic biases associated with them (e.g., Hughes, 2024; Probst and Mauser, 2022). However, introducing forcing adjustment factors could artificially compensate for some model deficiencies by modifying the inputs (e.g., Tang et al., 2023, 2025). This is evidenced by the high dispersion of forcing adjustment factors within each configuration (Fig. S16) where, e.g., catchments with higher precipitation falling as snow tend to have higher values in precipitation adjustment. We acknowledge that the incorporation of forcing adjustment parameters could have an impact on the partitioning of precipitation between ET and runoff. However, this problem also occurs when working with different parameter sets, which may come from different calibration functions. We have evaluated the impact of incorporating these additional parameters on the identifiability of the original model parameters, showing that there are no meaningful impacts (see Fig. S19). 
In light of these findings, we surmise that improvement in the spatiotemporal representation of precipitation and temperature, as well as of the potential interactions between these variables, might represent a critical step towards improved representations of compound streamflow extreme events in hydrological models.
5.4 Limitations and recommendations for future work
Our model calibration experiments focused on the simulation of extreme streamflow events, which required the choice of specific event definitions. Here, we defined hydrological droughts and floods using threshold-based approaches, and the thresholds were adjusted in a way to identify, on average, one event per year and catchment. Because this methodological choice does, to a certain degree, affect the outcomes of the comparison, we tested different thresholds for defining streamflow extreme events. The results of this sensitivity analysis indicate that using more flexible thresholds to define droughts (i.e., higher percentiles) can enhance the detection of these events, as more instances are identified, and they tend to be less severe compared to more restrictive thresholds (see Fig. S8). The improvement in drought detection when the threshold is relaxed can be explained by the fact that models generally struggle during more extreme hydrological drought periods (e.g., Bruno et al., 2024), which are relatively less frequent if the threshold is raised. However, we did not find such an effect for floods and transitions, for which we obtained similar model performances regardless of the thresholds used (see Fig. S8). Similar results are obtained when the overlap window used to identify the hits is modified (Fig. S9). While our study shows that the choice of threshold does not substantially affect model accuracy in terms of transition events, the method used to define streamflow extreme events can have a major impact on the characteristics of the transition events identified.
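To make the threshold-based event definitions concrete, the sketch below extracts events as consecutive runs of days below (droughts) or above (floods) a fixed threshold and pairs them into transitions. It is a simplified illustration under stated assumptions: thresholds are passed as constants rather than tuned to yield one event per year, no pooling or minimum-duration rules are applied, and a transition is assumed to occur when a flood starts within `max_gap_days` after the end of a drought. All function names are hypothetical.

```python
def events_from_mask(mask):
    """Return (start, end) index pairs for each consecutive run of True values."""
    events = []
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(mask) - 1))
    return events


def events_below(q, threshold):
    """Drought-like events: runs of days with flow below the threshold."""
    return events_from_mask([x < threshold for x in q])


def events_above(q, threshold):
    """Flood-like events: runs of days with flow above the threshold."""
    return events_from_mask([x > threshold for x in q])


def transitions(drought_events, flood_events, max_gap_days):
    """Pair each drought with floods starting within max_gap_days of its end."""
    return [
        (d, f)
        for d in drought_events
        for f in flood_events
        if 0 <= f[0] - d[1] <= max_gap_days
    ]
```

Raising the drought threshold in `events_below` identifies more and milder events, which is the kind of sensitivity examined in Fig. S8.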
To support our analysis, we tested four bucket-type hydrological models used within the hydrological modeling community (Addor and Melsen, 2019). Even though these models are at the lower end in terms of model complexity (Hrachowitz and Clark, 2017), and three of them share the same core structure, they allowed us to perform a comprehensive analysis of different model structures at a lower computational cost than when using models with more complex structures (e.g., Clark et al., 2017; Orth et al., 2015; Poncelet et al., 2017). Furthermore, previous studies have also shown that more complexity does not necessarily imply better performance (Fig. 6; e.g., Li et al., 2015; Merz et al., 2022).
These models have been calibrated based on daily streamflow records, using different objective functions derived from KGE formulations, and considering the set of parameters with the best performance as the optimum. However, it is important to acknowledge that potential compensations for biases in meteorological forcings or model deficiencies can make the “optimal” parameter sets less identifiable (e.g., Clark and Vrugt, 2006; Vrugt et al., 2005; Beven, 2025). Here, we explore the (dis)agreement between the optimal parameters for each configuration (Fig. S17), showing overall agreement indices of around 0.5 (i.e., the parameters have a range of variation of approximately 50 % of the parameter space). Based on the evaluation of the models' performance, we were able to verify that, despite the dispersion of optimal parameters, the simulations are consistent with the products used to evaluate the models (Figs. S3 and S6). To (i) complement model assessment, (ii) better define the parameter exploration range, and (iii) lead to parameter sets that ensure reliability and fidelity in representing hydrological processes, hydrological variables other than streamflow, such as SWE or ET, can provide useful information to improve hydrological modeling.
Our results provide insights on possible avenues of future research that could benefit drought-to-flood transitions modeling, which include: (1) exploring the use of modular platforms and a multi-model ensemble approach to quantify model uncertainty and identify more suitable model structures (e.g., Saavedra et al., 2022); (2) improving our understanding of the role of the spatial variability of precipitation for accurate flood simulations (e.g., Macdonald et al., 2025; Astagneau et al., 2022); (3) assessing the benefits of model runs at a subdaily timestep (e.g., hourly); and (4) exploring alternative data-driven modeling approaches such as long short-term memory (LSTM) networks (e.g., Frame et al., 2022; Acuña Espinoza et al., 2025; Kratzert et al., 2018).
We performed a modeling intercomparison study to (i) explore to what extent hydrological models can simulate drought-to-flood transitions and (ii) identify suitable modeling choices aimed at capturing these compound extreme events. For this intercomparison, we calibrated four conceptual bucket-type hydrological models (GR4J, GR5J, GR6J, and TUW) for 63 catchments in Chile and Switzerland using 60 different configurations of the Kling–Gupta Efficiency (KGE) as objective functions, based on five KGE formulations, four scaling factors, and three streamflow transformations. Based on the results of this intercomparison, we draw the following conclusions:
- A satisfactory general model performance, as expressed by the KGE, does not guarantee a good performance in terms of detecting streamflow extremes and their transitions. While KGE can serve as a rough proxy for low-flow performance, it cannot do so for high flows and drought-to-flood transitions. Consequently, assessments of the suitability of hydrological models for simulating extreme events and their transitions should use metrics that directly quantify how well extreme events are captured, such as the critical success index (CSI).
- The most important modeling decision when it comes to simulating floods, droughts, and their transitions is the choice of a suitable model structure. However, in a large-sample context, we demonstrate that the four models tested here (i.e., GR4J, GR5J, GR6J, and TUW) have similar performance, showing that adding more parameters does not necessarily improve the representation of extreme events.
- In contrast, the choice of the objective function and its exact configuration is, overall, less important, although it still plays a large role for floods. The choice of a suitable streamflow transformation can improve the simulation of extreme events to a certain degree. Specifically, a joint focus on high and low flows by equally weighting the two streamflow transformations in the objective function (referred to as HiLo in our analysis) can improve model performance without compromising its ability to capture streamflow extremes. However, the choice of the exact KGE formulation and the use of weights for the variability term of the KGE do not substantially affect the simulation of extreme events, and the direction of any effect depends on the catchment.
- Drought-to-flood transitions are more difficult to capture in semi-arid, high-mountain catchments than in humid low-elevation catchments.
- Overall, simulation of both high and low streamflow extremes (i.e., those associated with floods and droughts), as well as transitions between them, remains challenging. This appears to be especially true for floods and drought-to-flood transitions. This may complicate interpretation of hydrologic response to increasingly volatile hydroclimate forcings in a warming world, and suggests that new modeling methods may be required to better understand extremes and their transitions amid climate change.
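As a brief illustration of the detection metric recommended in the first conclusion, the Python sketch below computes the critical success index (CSI) as hits divided by the sum of hits, misses, and false alarms. It assumes a simplified day-by-day comparison of threshold exceedances; the threshold, the toy series, and the function name are hypothetical, and the event identification and matching used in this study may differ.

```python
import numpy as np


def critical_success_index(obs_event, sim_event):
    """CSI = hits / (hits + misses + false alarms) for paired boolean flags.

    Returns NaN when neither series contains any event, because the index
    is undefined in that case.
    """
    obs_event = np.asarray(obs_event, dtype=bool)
    sim_event = np.asarray(sim_event, dtype=bool)

    hits = np.sum(obs_event & sim_event)           # observed and simulated
    misses = np.sum(obs_event & ~sim_event)        # observed, not simulated
    false_alarms = np.sum(~obs_event & sim_event)  # simulated, not observed

    denominator = hits + misses + false_alarms
    if denominator == 0:
        return float("nan")
    return float(hits / denominator)


# Toy daily streamflow (hypothetical values, arbitrary units)
obs_q = np.array([1.0, 1.2, 5.0, 6.0, 1.1, 0.9, 4.8, 1.0])
sim_q = np.array([1.1, 4.9, 5.2, 1.3, 1.0, 1.0, 5.1, 4.7])
threshold = 4.5  # hypothetical high-flow threshold

csi = critical_success_index(obs_q > threshold, sim_q > threshold)
# 2 hits, 1 miss, 2 false alarms -> CSI = 2 / 5 = 0.4
```

Because correct negatives do not enter the CSI, the many days without an extreme event cannot inflate the score. This is why a simulation can have a high KGE and still a low CSI.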
This methodological intercomparison highlights that simulating streamflow extremes and their transitions is not a trivial modeling task and continued research is needed to improve model accuracy for compound events. The results of this intercomparison study suggest that time is best invested when focusing on improving model structures rather than calibration procedures. Specifically, hydrological model development should focus on improving the representation of processes and components associated with the temporal dynamics of discharge, such as routing or the soil response to intense snowmelt and rainfall. Additionally, the strong link between model accuracy and parameters aimed at correcting precipitation inputs suggests that the representation of extreme events can be improved by investing in the quality of meteorological forcing datasets. Investments in improving the simulation of extreme events and their transitions are crucial because hydrological models can not only support process understanding related to compounding streamflow extremes, but can also be used to forecast such events at short time scales and to project future changes in the occurrence of drought-to-flood transitions. Such applications are critical to ensure society's preparedness for these types of hydrometeorological extreme events.
The R-scripts and data used to produce the results shown in this paper – such as parameter sets used to generate the simulations and performance metrics – are publicly available through Zenodo (https://doi.org/10.5281/zenodo.14803500, Muñoz-Castro et al., 2025). CAMELS-CL (Alvarez-Garreton et al., 2018a) is available on PANGAEA (https://doi.org/10.1594/PANGAEA.894885, Alvarez-Garreton et al., 2018b) and https://camels.cr2.cl/ (last access: 9 February 2026) while CAMELS-CH (Höge et al., 2023a) can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.7784633, Höge et al., 2023b). The GLEAM3.8a dataset (Miralles et al., 2011) is available upon request at https://www.gleam.eu/ (last access: 9 February 2026).
The supplement related to this article is available online at https://doi.org/10.5194/hess-30-825-2026-supplement.
EMC, BJA, and MIB conceptualized and designed the study. EMC conducted all the model simulations, analyses of results, and created the figures. All authors contributed to discussing the methodology and results and to reviewing and editing the manuscript.
At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
We thank the Swiss National Science Foundation (SNSF) for supporting this project. PM acknowledges the support from ANID-PIA Project AFB230001 (AMTC).
This research has been supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. 200021_214907).
This paper was edited by Fabrizio Fenicia and reviewed by Wouter Knoben and one anonymous referee.
Acuña Espinoza, E., Loritz, R., Kratzert, F., Klotz, D., Gauch, M., Álvarez Chaves, M., and Ehret, U.: Analyzing the generalization capabilities of a hybrid hydrological model for extrapolation to extreme events, Hydrol. Earth Syst. Sci., 29, 1277–1294, https://doi.org/10.5194/hess-29-1277-2025, 2025. a
Addor, N. and Melsen, L. A.: Legacy, Rather Than Adequacy, Drives the Selection of Hydrological Models, Water Resources Research, 55, 378–390, https://doi.org/10.1029/2018WR022958, 2019. a
Alexander, A. A., Kumar, D. N., Knoben, W. J. M., and Clark, M. P.: Evaluating the parameter sensitivity and impact of hydrologic modeling decisions on flood simulations, Advances in Water Resources, 181, 104560, https://doi.org/10.1016/j.advwatres.2023.104560, 2023. a, b
Althoff, D. and Rodrigues, L. N.: Goodness-of-fit criteria for hydrological models: Model calibration and performance assessment, Journal of Hydrology, 600, 126674, https://doi.org/10.1016/j.jhydrol.2021.126674, 2021. a
Alvarez-Garreton, C., Mendoza, P. A., Boisier, J. P., Addor, N., Galleguillos, M., Zambrano-Bigiarini, M., Lara, A., Puelma, C., Cortes, G., Garreaud, R., McPhee, J., and Ayala, A.: The CAMELS-CL dataset: catchment attributes and meteorology for large sample studies – Chile dataset, Hydrol. Earth Syst. Sci., 22, 5817–5846, https://doi.org/10.5194/hess-22-5817-2018, 2018a. a, b
Alvarez-Garreton, C., Mendoza, P. A., Boisier, J. P., Addor, N., Galleguillos, M., Zambrano-Bigiarini, M., Lara, A., Puelma, C., Cortes, G., Garreaud, R., McPhee, J., and Ayala, A.: Links to files – The CAMELS-CL dataset, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.894885, 2018b. a
Alvarez-Garreton, C., Boisier, J. P., Garreaud, R., Seibert, J., and Vis, M.: Progressive water deficits during multiyear droughts in basins with long hydrological memory in Chile, Hydrol. Earth Syst. Sci., 25, 429–446, https://doi.org/10.5194/hess-25-429-2021, 2021. a
Araya, D., Mendoza, P. A., Muñoz-Castro, E., and McPhee, J.: Towards robust seasonal streamflow forecasts in mountainous catchments: impact of calibration metric selection in hydrological modeling, Hydrol. Earth Syst. Sci., 27, 4385–4408, https://doi.org/10.5194/hess-27-4385-2023, 2023. a, b
Asadieh, B. and Krakauer, N. Y.: Global change in streamflow extremes under climate change over the 21st century, Hydrol. Earth Syst. Sci., 21, 5863–5874, https://doi.org/10.5194/hess-21-5863-2017, 2017. a
Astagneau, P. C., Bourgin, F., Andréassian, V., and Perrin, C.: When does a parsimonious model fail to simulate floods? Learning from the seasonality of model bias, Hydrological Sciences Journal, 66, 1288–1305, https://doi.org/10.1080/02626667.2021.1923720, 2021a. a
Astagneau, P. C., Thirel, G., Delaigue, O., Guillaume, J. H. A., Parajka, J., Brauer, C. C., Viglione, A., Buytaert, W., and Beven, K. J.: Technical note: Hydrology modelling R packages – a unified analysis of models and practicalities from a user perspective, Hydrol. Earth Syst. Sci., 25, 3937–3973, https://doi.org/10.5194/hess-25-3937-2021, 2021b. a
Astagneau, P. C., Bourgin, F., Andréassian, V., and Perrin, C.: Catchment response to intense rainfall: Evaluating modelling hypotheses, Hydrological Processes, 36, e14676, https://doi.org/10.1002/hyp.14676, 2022. a, b, c
Baez-Villanueva, O. M., Zambrano-Bigiarini, M., Mendoza, P. A., McNamara, I., Beck, H. E., Thurner, J., Nauditt, A., Ribbe, L., and Thinh, N. X.: On the selection of precipitation products for the regionalisation of hydrological model parameters, Hydrol. Earth Syst. Sci., 25, 5805–5837, https://doi.org/10.5194/hess-25-5805-2021, 2021. a
Banfi, F. and De Michele, C.: Compound flood hazard at Lake Como, Italy, is driven by temporal clustering of rainfall events, Communications Earth & Environment, 3, 1–10, https://doi.org/10.1038/s43247-022-00557-9, 2022. a
Barendrecht, M. H., Matanó, A., Mendoza, H., Weesie, R., Rohse, M., Koehler, J., de Ruiter, M., Garcia, M., Mazzoleni, M., Aerts, J. C. J. H., Ward, P. J., Di Baldassarre, G., Day, R., and Van Loon, A. F.: Exploring drought-to-flood interactions and dynamics: A global case review, WIREs Water, 11, e1726, https://doi.org/10.1002/wat2.1726, 2024. a, b
Berghuijs, W. R., Sivapalan, M., Woods, R. A., and Savenije, H. H. G.: Patterns of similarity of seasonal water balances: A window into streamflow variability over a range of time scales, Water Resources Research, 50, 5638–5661, https://doi.org/10.1002/2014WR015692, 2014. a
Bergström, S. and Forsman, A.: Development of a conceptual deterministic rainfall-runoff model, Nordic Hydrol., 4, 147–170, https://doi.org/10.2166/nh.1973.0012, 1973. a, b
Beven, K.: How to make advances in hydrological modelling, Hydrology Research, 50, 1481–1494, https://doi.org/10.2166/nh.2019.134, 2019. a
Beven, K. J.: A short history of philosophies of hydrological model evaluation and hypothesis testing, WIREs Water, 12, e1761, https://doi.org/10.1002/wat2.1761, 2025. a, b, c
Blöschl, G., Hall, J., Viglione, A., Perdigão, R. A. P., Parajka, J., Merz, B., Lun, D., Arheimer, B., Aronica, G. T., Bilibashi, A., Boháč, M., Bonacci, O., Borga, M., Čanjevac, I., Castellarin, A., Chirico, G. B., Claps, P., Frolova, N., Ganora, D., Gorbachova, L., Gül, A., Hannaford, J., Harrigan, S., Kireeva, M., Kiss, A., Kjeldsen, T. R., Kohnová, S., Koskela, J. J., Ledvinka, O., Macdonald, N., Mavrova-Guirguinova, M., Mediero, L., Merz, R., Molnar, P., Montanari, A., Murphy, C., Osuch, M., Ovcharuk, V., Radevski, I., Salinas, J. L., Sauquet, E., Šraj, M., Szolgay, J., Volpi, E., Wilson, D., Zaimi, K., and Živković, N.: Changing climate both increases and decreases European river floods, Nature, 573, 108–111, https://doi.org/10.1038/s41586-019-1495-6, 2019. a
Boisier, J. P.: CR2MET: A high-resolution precipitation and temperature dataset for the period 1960-2021 in continental Chile, Zenodo [data set], https://doi.org/10.5281/zenodo.7529682, 2023. a
Brunner, M. I.: Floods and droughts: a multivariate perspective, Hydrol. Earth Syst. Sci., 27, 2479–2497, https://doi.org/10.5194/hess-27-2479-2023, 2023. a, b, c, d, e
Brunner, M. I. and Sikorska-Senoner, A. E.: Dependence of flood peaks and volumes in modeled discharge time series: Effect of different uncertainty sources, Journal of Hydrology, 572, 620–629, https://doi.org/10.1016/j.jhydrol.2019.03.024, 2019. a
Brunner, M. I., Gilleland, E., Wood, A., Swain, D. L., and Clark, M.: Spatial Dependence of Floods Shaped by Spatiotemporal Variations in Meteorological and Land-Surface Processes, Geophysical Research Letters, 47, e2020GL088000, https://doi.org/10.1029/2020GL088000, 2020. a
Brunner, M. I., Melsen, L. A., Wood, A. W., Rakovec, O., Mizukami, N., Knoben, W. J. M., and Clark, M. P.: Flood spatial coherence, triggers, and performance in hydrological simulations: large-sample evaluation of four streamflow-calibrated models, Hydrol. Earth Syst. Sci., 25, 105–119, https://doi.org/10.5194/hess-25-105-2021, 2021. a, b, c
Brunner, M. I., Anderson, B., and Muñoz-Castro, E.: Meteorological and hydrological dry-to-wet transition events are only weakly related over European catchments, Environmental Research Letters, 20, 084013, https://doi.org/10.1088/1748-9326/ade72c, 2025. a, b
Bruno, G., Avanzi, F., Alfieri, L., Libertino, A., Gabellani, S., and Duethmann, D.: Hydrological model skills change with drought severity; insights from multi-variable evaluation, Journal of Hydrology, 634, 131023, https://doi.org/10.1016/j.jhydrol.2024.131023, 2024. a, b, c
Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023. a
Clark, M. P. and Vrugt, J. A.: Unraveling uncertainties in hydrologic model calibration: Addressing the problem of compensatory parameters, Geophysical Research Letters, 33, https://doi.org/10.1029/2005GL025604, 2006. a
Clark, M. P., Bierkens, M. F. P., Samaniego, L., Woods, R. A., Uijlenhoet, R., Bennett, K. E., Pauwels, V. R. N., Cai, X., Wood, A. W., and Peters-Lidard, C. D.: The evolution of process-based hydrologic models: historical challenges and the collective quest for physical realism, Hydrol. Earth Syst. Sci., 21, 3427–3440, https://doi.org/10.5194/hess-21-3427-2017, 2017. a, b
Clerc-Schwarzenbach, F., Selleri, G., Neri, M., Toth, E., van Meerveld, I., and Seibert, J.: Large-sample hydrology – a few camels or a whole caravan?, Hydrol. Earth Syst. Sci., 28, 4219–4237, https://doi.org/10.5194/hess-28-4219-2024, 2024. a
Cortés, G. and Margulis, S.: Impacts of El Niño and La Niña on interannual snow accumulation in the Andes: Results from a high-resolution 31 year reanalysis, Geophysical Research Letters, 44, 6859–6867, https://doi.org/10.1002/2017GL073826, 2017. a
Dembélé, M., Hrachowitz, M., Savenije, H. H. G., Mariéthoz, G., and Schaefli, B.: Improving the Predictive Skill of a Distributed Hydrological Model by Calibration on Spatial Patterns With Multiple Satellite Data Sets, Water Resources Research, 56, e2019WR026085, https://doi.org/10.1029/2019WR026085, 2020. a
Di Baldassarre, G., Martinez, F., Kalantari, Z., and Viglione, A.: Drought and flood in the Anthropocene: feedback mechanisms in reservoir operation, Earth Syst. Dynam., 8, 225–233, https://doi.org/10.5194/esd-8-225-2017, 2017. a
Dietze, M., Bell, R., Ozturk, U., Cook, K. L., Andermann, C., Beer, A. R., Damm, B., Lucia, A., Fauer, F. S., Nissen, K. M., Sieg, T., and Thieken, A. H.: More than heavy rain turning into fast-flowing water – a landscape perspective on the 2021 Eifel floods, Nat. Hazards Earth Syst. Sci., 22, 1845–1856, https://doi.org/10.5194/nhess-22-1845-2022, 2022. a
Döll, P., Douville, H., Güntner, A., Müller Schmied, H., and Wada, Y.: Modelling Freshwater Resources at the Global Scale: Challenges and Prospects, Surveys in Geophysics, 37, 195–221, https://doi.org/10.1007/s10712-015-9343-1, 2016. a
Doerr, S. H., Ritsema, C. J., Dekker, L. W., Scott, D. F., and Carter, D.: Water repellence of soils: new insights and emerging research needs, Hydrological Processes, 21, 2223–2228, https://doi.org/10.1002/hyp.6762, 2007. a
Doocy, S., Daniels, A., Murray, S., and Kirsch, T. D.: The Human Impact of Floods: a Historical Review of Events 1980–2009 and Systematic Literature Review, PLoS Currents, 5, https://pmc.ncbi.nlm.nih.gov/articles/PMC3644291/ (last access: 9 February 2026), 2013. a
dos Santos, J. C. N., de Andrade, E. M., Guerreiro, M. J. S., Medeiros, P. H. A., de Queiroz Palácio, H. A., and de Araújo Neto, J. R.: Effect of dry spells and soil cracking on runoff generation in a semiarid micro watershed under land use change, Journal of Hydrology, 541, 1057–1066, https://doi.org/10.1016/j.jhydrol.2016.08.016, 2016. a
Fatichi, S., Rimkus, S., Burlando, P., Bordoy, R., and Molnar, P.: High-resolution distributed analysis of climate and anthropogenic changes on the hydrology of an Alpine catchment, Journal of Hydrology, 525, 362–382, https://doi.org/10.1016/j.jhydrol.2015.03.036, 2015. a
Fisher, R. A.: Statistical Methods for Research Workers, in: Breakthroughs in Statistics: Methodology and Distribution, edited by: Kotz, S. and Johnson, N. L., Springer, New York, NY, 66–70, https://doi.org/10.1007/978-1-4612-4380-9_6, 1992. a
Fleig, A. K., Tallaksen, L. M., Hisdal, H., and Demuth, S.: A global evaluation of streamflow drought characteristics, Hydrol. Earth Syst. Sci., 10, 535–552, https://doi.org/10.5194/hess-10-535-2006, 2006. a
Fowler, K., Peel, M., Western, A., and Zhang, L.: Improved Rainfall-Runoff Calibration for Drying Climate: Choice of Objective Function, Water Resources Research, 54, 3392–3408, https://doi.org/10.1029/2017WR022466, 2018. a, b
Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V., and Nearing, G. S.: Deep learning rainfall–runoff predictions of extreme events, Hydrol. Earth Syst. Sci., 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022. a
Garcia, F., Folton, N., and Oudin, L.: Which objective function to calibrate rainfall–runoff models for low-flow index simulations?, Hydrological Sciences Journal, 62, 1149–1166, https://doi.org/10.1080/02626667.2017.1308511, 2017. a, b, c, d, e
Gimbel, K. F., Puhlmann, H., and Weiler, M.: Does drought alter hydrological functions in forest soils?, Hydrol. Earth Syst. Sci., 20, 1301–1317, https://doi.org/10.5194/hess-20-1301-2016, 2016. a
Götte, J. and Brunner, M. I.: Hydrological Drought-To-Flood Transitions Across Different Hydroclimates in the United States, Water Resources Research, 60, e2023WR036504, https://doi.org/10.1029/2023WR036504, 2024. a, b, c
Griffin, D. and Anchukaitis, K. J.: How unusual is the 2012–2014 California drought?, Geophysical Research Letters, 41, 9017–9023, https://doi.org/10.1002/2014GL062433, 2014. a
Gu, L., Yin, J., Slater, L. J., Chen, J., Do, H. X., Wang, H.-M., Chen, L., Jiang, Z., and Zhao, T.: Intensification of Global Hydrological Droughts Under Anthropogenic Climate Warming, Water Resources Research, 59, e2022WR032997, https://doi.org/10.1029/2022WR032997, 2023. a, b
Günther, D., Marke, T., Essery, R., and Strasser, U.: Uncertainties in Snowpack Simulations – Assessing the Impact of Model Structure, Parameter Choice, and Forcing Data Error on Point-Scale Energy Balance Snow Model Performance, Water Resources Research, 55, 2779–2800, https://doi.org/10.1029/2018WR023403, 2019. a
Guo, D., Westra, S., and Maier, H. R.: Impact of evapotranspiration process representation on runoff projections from conceptual rainfall-runoff models, Water Resources Research, 53, 435–454, https://doi.org/10.1002/2016WR019627, 2017. a
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, Journal of Hydrology, 377, 80–91, https://doi.org/10.1016/j.jhydrol.2009.08.003, 2009. a, b, c, d, e
Hammond, J., Anderson, B., Simeone, C., Brunner, M., Muñoz-Castro, E., Archfield, S., Magee, E., and Armitage, R.: Hydrological Whiplash: Highlighting the Need for Better Understanding and Quantification of Sub-Seasonal Hydrological Extreme Transitions, Hydrological Processes, 39, e70113, https://doi.org/10.1002/hyp.70113, 2025. a, b
Handwerger, A. L., Huang, M.-H., Fielding, E. J., Booth, A. M., and Bürgmann, R.: A shift from drought to extreme rainfall drives a stable landslide to catastrophic failure, Scientific Reports, 9, 1569, https://doi.org/10.1038/s41598-018-38300-0, 2019. a
He, X. and Sheffield, J.: Lagged Compound Occurrence of Droughts and Pluvials Globally Over the Past Seven Decades, Geophysical Research Letters, 47, e2020GL087924, https://doi.org/10.1029/2020GL087924, 2020. a
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Quarterly Journal of the Royal Meteorological Society, 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020. a
Hirpa, F. A., Salamon, P., Beck, H. E., Lorini, V., Alfieri, L., Zsoter, E., and Dadson, S. J.: Calibration of the Global Flood Awareness System (GloFAS) using daily streamflow data, Journal of Hydrology, 566, 595–606, https://doi.org/10.1016/j.jhydrol.2018.09.052, 2018. a
Höge, M., Kauzlaric, M., Siber, R., Schönenberger, U., Horton, P., Schwanbeck, J., Floriancic, M. G., Viviroli, D., Wilhelm, S., Sikorska-Senoner, A. E., Addor, N., Brunner, M., Pool, S., Zappa, M., and Fenicia, F.: CAMELS-CH: hydro-meteorological time series and landscape attributes for 331 catchments in hydrologic Switzerland, Earth Syst. Sci. Data, 15, 5755–5784, https://doi.org/10.5194/essd-15-5755-2023, 2023a. a, b
Höge, M., Kauzlaric, M., Siber, R., Schönenberger, U., Horton, P., Schwanbeck, J., Floriancic, M. G., Viviroli, D., Wilhelm, S., Sikorska-Senoner, A. E., Addor, N., Brunner, M., Pool, S., Zappa, M., and Fenicia, F.: Catchment attributes and hydro-meteorological time series for large-sample studies across hydrologic Switzerland (CAMELS-CH), Zenodo [data set], https://doi.org/10.5281/zenodo.7784633, 2023b. a
Hrachowitz, M. and Clark, M. P.: HESS Opinions: The complementary merits of competing modelling philosophies in hydrology, Hydrol. Earth Syst. Sci., 21, 3953–3973, https://doi.org/10.5194/hess-21-3953-2017, 2017. a, b
Hrachowitz, M., Fovet, O., Ruiz, L., Euser, T., Gharari, S., Nijzink, R., Freer, J., Savenije, H. H. G., and Gascuel-Odoux, C.: Process consistency in models: The importance of system signatures, expert knowledge, and process complexity, Water Resources Research, 50, 7445–7469, https://doi.org/10.1002/2014WR015484, 2014. a
Hughes, D., Birkinshaw, S., and Parkin, G.: A method to include reservoir operations in catchment hydrological models using SHETRAN, Environmental Modelling & Software, 138, 104980, https://doi.org/10.1016/j.envsoft.2021.104980, 2021. a
Hughes, D. A.: Adjusting rainfall inputs to hydrological models in a data-scarce area of Southern Africa, Hydrological Sciences Journal, 69, 1186–1197, https://doi.org/10.1080/02626667.2024.2360971, 2024. a
Hurlbert, M. and Gupta, J.: The adaptive capacity of institutions in Canada, Argentina, and Chile to droughts and floods, Regional Environmental Change, 17, 865–877, https://doi.org/10.1007/s10113-016-1078-0, 2017. a
Kaufmann, J. and Schering, A.: Analysis of Variance ANOVA, in: Wiley StatsRef: Statistics Reference Online, John Wiley & Sons, Ltd, https://doi.org/10.1002/9781118445112.stat06938, 2014. a
Kiem, A. S. and Austin, E. K.: Drought and the future of rural communities: Opportunities and challenges for climate change adaptation in regional Victoria, Australia, Global Environmental Change, 23, 1307–1316, https://doi.org/10.1016/j.gloenvcha.2013.06.003, 2013. a
Klemeš, V.: Operational testing of hydrological simulation models, Hydrological Sciences Journal, 31, 13–24, https://doi.org/10.1080/02626668609491024, 1986. a, b
Kling, H., Fuchs, M., and Paulin, M.: Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, Journal of Hydrology, 424–425, 264–277, https://doi.org/10.1016/j.jhydrol.2012.01.011, 2012. a, b, c
Knoben, W. J. M.: Setting expectations for hydrologic model performance with an ensemble of simple benchmarks, Hydrological Processes, 38, e15288, https://doi.org/10.1002/hyp.15288, 2024. a
Knoben, W. J. M., Freer, J. E., Peel, M. C., Fowler, K. J. A., and Woods, R. A.: A Brief Analysis of Conceptual Model Structure Uncertainty Using 36 Models and 559 Catchments, Water Resources Research, 56, e2019WR025975, https://doi.org/10.1029/2019WR025975, 2020. a, b, c
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, https://doi.org/10.5194/hess-22-6005-2018, 2018. a
Le Moine, N.: Le bassin versant de surface vu par le souterrain: une voie d'amélioration des performances et du réalisme des modèles pluie-débit?, PhD thesis, Doctorat Géosciences et Ressources Naturelles, Université Pierre et Marie Curie Paris VI, 348 pp., https://hal.inrae.fr/tel-02591478 (last access: 9 February 2026), 2008. a
Legates, D. R. and McCabe Jr., G. J.: Evaluating the use of “goodness-of-fit” Measures in hydrologic and hydroclimatic model validation, Water Resources Research, 35, 233–241, https://doi.org/10.1029/1998WR900018, 1999. a
Lehner, B., Döll, P., Alcamo, J., Henrichs, T., and Kaspar, F.: Estimating the Impact of Global Change on Flood and Drought Risks in Europe: A Continental, Integrated Analysis, Climatic Change, 75, 273–299, https://doi.org/10.1007/s10584-006-6338-4, 2006. a
Leighton-Boyce, G., Doerr, S. H., Shakesby, R. A., and Walsh, R. P. D.: Quantifying the impact of soil water repellency on overland flow generation and erosion: a new approach using rainfall simulation and wetting agent on in situ soil, Hydrological Processes, 21, 2337–2345, https://doi.org/10.1002/hyp.6744, 2007. a
Lema, F., Mendoza, P. A., Vásquez, N. A., Mizukami, N., Zambrano-Bigiarini, M., and Vargas, X.: Technical note: What does the Standardized Streamflow Index actually reflect? Insights and implications for hydrological drought analysis, Hydrol. Earth Syst. Sci., 29, 1981–2002, https://doi.org/10.5194/hess-29-1981-2025, 2025. a, b
Li, H., Xu, C. Y., and Beldring, S.: How much can we gain with increasing model complexity with the same model concepts?, Journal of Hydrology, 527, 858–871, https://doi.org/10.1016/j.jhydrol.2015.05.044, 2015. a
Macdonald, E., Merz, B., Nguyen, V. D., and Vorogushyn, S.: Heavy-tailed flood peak distributions: what is the effect of the spatial variability of rainfall and runoff generation?, Hydrol. Earth Syst. Sci., 29, 447–463, https://doi.org/10.5194/hess-29-447-2025, 2025. a
Magnusson, J., Gustafsson, D., Hüsler, F., and Jonas, T.: Assimilation of point SWE data into a distributed snow cover model comparing two contrasting methods, Water Resources Research, 50, 7816–7835, https://doi.org/10.1002/2014WR015302, 2014. a
Martin, E. R.: Future Projections of Global Pluvial and Drought Event Characteristics, Geophysical Research Letters, 45, 11913–11920, https://doi.org/10.1029/2018GL079807, 2018. a
Matanó, A., de Ruiter, M. C., Koehler, J., Ward, P. J., and Van Loon, A. F.: Caught Between Extremes: Understanding Human-Water Interactions During Drought-To-Flood Events in the Horn of Africa, Earth's Future, 10, e2022EF002747, https://doi.org/10.1029/2022EF002747, 2022. a
Matanó, A., Berghuijs, W. R., Mazzoleni, M., de Ruiter, M. C., Ward, P. J., and Van Loon, A. F.: Compound and consecutive drought-flood events at a global scale, Environmental Research Letters, 19, 064048, https://doi.org/10.1088/1748-9326/ad4b46, 2024. a
Mathevet, T., Gupta, H., Perrin, C., Andréassian, V., and Le Moine, N.: Assessing the performance and robustness of two conceptual rainfall-runoff models on a worldwide sample of watersheds, Journal of Hydrology, 585, 124698, https://doi.org/10.1016/j.jhydrol.2020.124698, 2020. a
McClymont, K., Morrison, D., Beevers, L., and Carmen, E.: Flood resilience: a systematic review, Journal of Environmental Planning and Management, 63, 1151–1176, https://doi.org/10.1080/09640568.2019.1641474, 2020. a
McMartin, D. W., Hernani Merino, B. H., Bonsal, B., Hurlbert, M., Villalba, R., Ocampo, O. L., Upegui, J. J. V., Poveda, G., and Sauchyn, D. J.: Limitations of Water Resources Infrastructure for Reducing Community Vulnerabilities to Extremes and Uncertainty of Flood and Drought, Environmental Management, 62, 1038–1047, https://doi.org/10.1007/s00267-018-1104-8, 2018. a
McMillan, H. K.: Linking hydrologic signatures to hydrologic processes: A review, Hydrological Processes, 34, 1393–1409, https://doi.org/10.1002/hyp.13632, 2020. a
McMillan, H. K., Westerberg, I. K., and Krueger, T.: Hydrological data uncertainty and its implications, WIREs Water, 5, e1319, https://doi.org/10.1002/wat2.1319, 2018. a
Melsen, L. A. and Guse, B.: Hydrological Drought Simulations: How Climate and Model Structure Control Parameter Sensitivity, Water Resources Research, 55, 10527–10547, https://doi.org/10.1029/2019WR025230, 2019. a, b
Melsen, L. A., Teuling, A. J., Torfs, P. J. J. F., Zappa, M., Mizukami, N., Mendoza, P. A., Clark, M. P., and Uijlenhoet, R.: Subjective modeling decisions can significantly impact the simulation of flood and drought events, Journal of Hydrology, 568, 1093–1104, https://doi.org/10.1016/j.jhydrol.2018.11.046, 2019. a, b
Mendoza, P. A., Clark, M. P., Mizukami, N., Gutmann, E. D., Arnold, J. R., Brekke, L. D., and Rajagopalan, B.: How do hydrologic modeling decisions affect the portrayal of climate change impacts?, Hydrological Processes, 30, 1071–1095, https://doi.org/10.1002/hyp.10684, 2016. a
Merz, R., Miniussi, A., Basso, S., Petersen, K.-J., and Tarasova, L.: More Complex is Not Necessarily Better in Large-Scale Hydrological Modeling: A Model Complexity Experiment across the Contiguous United States, Bulletin of the American Meteorological Society, 103, E1947–E1967, https://doi.org/10.1175/BAMS-D-21-0284.1, 2022. a
MeteoSwiss: Documentation of MeteoSwiss Grid-Data Products. Daily Mean, Minimum and Maximum Temperature: TabsD, TminD, TmaxD, Technical Report, Bundesamt für Meteorologie und Klimatologie MeteoSchweiz, Zürich, Switzerland, https://www.meteoschweiz.admin.ch/dam/jcr:818a4d17-cb0c-4e8b-92c6-1a1bdf5348b7/ProdDoc_TabsD.pdf (last access: 19 December 2025), 2021a. a
MeteoSwiss: Documentation of MeteoSwiss Grid-Data Products. Daily Precipitation (final analysis): RhiresD, Technical Report, Bundesamt für Meteorologie und Klimatologie MeteoSchweiz, Zürich, Switzerland, https://www.meteoschweiz.admin.ch/dam/jcr:4f51f0f1-0fe3-48b5-9de0-15666327e63c/ProdDoc_RhiresD.pdf (last access: 19 December 2025), 2021b. a
Meylan, P., Favre, A.-C., and Musy, A.: Predictive Hydrology: A Frequency Analysis Approach, CRC Press, ISBN 9780429066603, https://doi.org/10.1201/b11832, 2012. a
Miralles, D. G., Holmes, T. R. H., De Jeu, R. A. M., Gash, J. H., Meesters, A. G. C. A., and Dolman, A. J.: Global land-surface evaporation estimated from satellite-based observations, Hydrol. Earth Syst. Sci., 15, 453–469, https://doi.org/10.5194/hess-15-453-2011, 2011. a, b
Mizukami, N., Clark, M. P., Gutmann, E. D., Mendoza, P. A., Newman, A. J., Nijssen, B., Livneh, B., Hay, L. E., Arnold, J. R., and Brekke, L. D.: Implications of the Methodological Choices for Hydrologic Portrayals of Climate Change over the Contiguous United States: Statistically Downscaled Forcing Data and Hydrologic Models, Journal of Hydrometeorology, 17, 73–98, https://doi.org/10.1175/JHM-D-14-0187.1, 2016. a
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019. a, b, c, d, e
Motavita, D., Chow, R., Guthke, A., and Nowak, W.: The comprehensive differential split-sample test: A stress-test for hydrological model robustness under climate variability, Journal of Hydrology, 573, 501–515, https://doi.org/10.1016/j.jhydrol.2019.03.054, 2019. a
Muñoz-Castro, E., Mendoza, P. A., Vásquez, N., and Vargas, X.: Exploring parameter (dis)agreement due to calibration metric selection in conceptual rainfall–runoff models, Hydrological Sciences Journal, 68, 1754–1768, https://doi.org/10.1080/02626667.2023.2231434, 2023. a
Muñoz-Castro, E., Anderson, B., and Brunner, M. I.: Implementation of four conceptual rainfall-runoff models to simulate drought-to-flood transitions in Chile and Switzerland, Zenodo [data set], https://doi.org/10.5281/zenodo.14803500, 2025. a
Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I – A discussion of principles, Journal of Hydrology, 10, 282–290, https://doi.org/10.1016/0022-1694(70)90255-6, 1970. a
Nasonova, O. N., Gusev, Y. M., and Kovalev, Y. E.: Impact of uncertainties in meteorological forcing data and land surface parameters on global estimates of terrestrial water balance components, Hydrological Processes, 25, 1074–1090, https://doi.org/10.1002/hyp.7651, 2011. a
Newman, A. J., Clark, M. P., Sampson, K., Wood, A., Hay, L. E., Bock, A., Viger, R. J., Blodgett, D., Brekke, L., Arnold, J. R., Hopson, T., and Duan, Q.: Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance, Hydrol. Earth Syst. Sci., 19, 209–223, https://doi.org/10.5194/hess-19-209-2015, 2015. a
Or, D., Lehmann, P., Shahraeeni, E., and Shokri, N.: Advances in Soil Evaporation Physics – A Review, Vadose Zone Journal, 12, vzj2012.0163, https://doi.org/10.2136/vzj2012.0163, 2013. a
Orth, R., Staudinger, M., Seneviratne, S. I., Seibert, J., and Zappa, M.: Does model performance improve with complexity? A case study with three hydrological models, Journal of Hydrology, 523, 147–159, https://doi.org/10.1016/j.jhydrol.2015.01.044, 2015. a, b
Oudin, L., Hervieu, F., Michel, C., Perrin, C., Andréassian, V., Anctil, F., and Loumagne, C.: Which potential evapotranspiration input for a lumped rainfall–runoff model?: Part 2 – Towards a simple and efficient potential evapotranspiration model for rainfall–runoff modelling, Journal of Hydrology, 303, 290–306, https://doi.org/10.1016/j.jhydrol.2004.08.026, 2005. a
Parajka, J., Merz, R., and Blöschl, G.: Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435–446, https://doi.org/10.1002/hyp.6253, 2007. a
Peleg, N., Skinner, C., Fatichi, S., and Molnar, P.: Temperature effects on the spatial structure of heavy rainfall modify catchment hydro-morphological response, Earth Surf. Dynam., 8, 17–36, https://doi.org/10.5194/esurf-8-17-2020, 2020. a
Pelletier, A. and Andréassian, V.: On constraining a lumped hydrological model with both piezometry and streamflow: results of a large sample evaluation, Hydrol. Earth Syst. Sci., 26, 2733–2758, https://doi.org/10.5194/hess-26-2733-2022, 2022. a
Perrin, C., Michel, C., and Andréassian, V.: Improvement of a parsimonious model for streamflow simulation, Journal of Hydrology, 279, 275–289, https://doi.org/10.1016/S0022-1694(03)00225-7, 2003. a
Pizarro, A. and Jorquera, J.: Advancing objective functions in hydrological modelling: Integrating knowable moments for improved simulation accuracy, Journal of Hydrology, 634, 131071, https://doi.org/10.1016/j.jhydrol.2024.131071, 2024. a, b, c
Poncelet, C., Merz, R., Merz, B., Parajka, J., Oudin, L., Andréassian, V., and Perrin, C.: Process-based interpretation of conceptual hydrological model performance using a multinational catchment set, Water Resources Research, 53, 7247–7268, https://doi.org/10.1002/2016WR019991, 2017. a
Pool, S., Vis, M., and Seibert, J.: Evaluating model performance: towards a non-parametric variant of the Kling-Gupta efficiency, Hydrological Sciences Journal, 63, 1941–1953, https://doi.org/10.1080/02626667.2018.1552002, 2018. a, b, c
Probst, E. and Mauser, W.: Evaluation of ERA5 and WFDE5 forcing data for hydrological modelling and the impact of bias correction with regional climatologies: A case study in the Danube River Basin, Journal of Hydrology: Regional Studies, 40, 101023, https://doi.org/10.1016/j.ejrh.2022.101023, 2022. a
Pushpalatha, R., Perrin, C., Le Moine, N., Mathevet, T., and Andréassian, V.: A downward structural sensitivity analysis of hydrological models to improve low-flow simulation, Journal of Hydrology, 411, 66–76, https://doi.org/10.1016/j.jhydrol.2011.09.034, 2011. a, b
Pushpalatha, R., Perrin, C., Le Moine, N., and Andréassian, V.: A review of efficiency criteria suitable for evaluating low-flow simulations, Journal of Hydrology, 420–421, 171–182, https://doi.org/10.1016/j.jhydrol.2011.11.055, 2012. a
Quesada-Montano, B., Di Baldassarre, G., Rangecroft, S., and Van Loon, A. F.: Hydrological change: Towards a consistent approach to assess changes on both floods and droughts, Advances in Water Resources, 111, 31–35, https://doi.org/10.1016/j.advwatres.2017.10.038, 2018. a
Rashid, M. M. and Wahl, T.: Hydrologic risk from consecutive dry and wet extremes at the global scale, Environmental Research Communications, 4, 071001, https://doi.org/10.1088/2515-7620/ac77de, 2022. a
Robeson, S. M.: Revisiting the recent California drought as an extreme value, Geophysical Research Letters, 42, 6771–6779, https://doi.org/10.1002/2015GL064593, 2015. a
Saavedra, D., Mendoza, P. A., Addor, N., Llauca, H., and Vargas, X.: A multi-objective approach to select hydrological models and constrain structural uncertainties for climate impact assessments, Hydrological Processes, 36, e14446, https://doi.org/10.1002/hyp.14446, 2022. a
Santos, L., Andréassian, V., Sonnenborg, T. O., Lindström, G., de Lavenne, A., Perrin, C., Collet, L., and Thirel, G.: Lack of robustness of hydrological models: a large-sample diagnosis and an attempt to identify hydrological and climatic drivers, Hydrol. Earth Syst. Sci., 29, 683–700, https://doi.org/10.5194/hess-29-683-2025, 2025. a
Seibert, J. and Bergström, S.: A retrospective on hydrological catchment modelling based on half a century with the HBV model, Hydrol. Earth Syst. Sci., 26, 1371–1388, https://doi.org/10.5194/hess-26-1371-2022, 2022. a
Seibert, J., Staudinger, M., and van Meerveld, H. J. I.: Validation and Over-Parameterization – Experiences from Hydrological Modeling, in: Computer Simulation Validation: Fundamental Concepts, Methodological Frameworks, and Philosophical Perspectives, edited by: Beisbart, C. and Saam, N. J., Springer International Publishing, Cham, 811–834, https://doi.org/10.1007/978-3-319-70766-2_33, 2019. a
Sivakumar, B.: Global climate change and its impacts on water resources planning and management: assessment and challenges, Stochastic Environmental Research and Risk Assessment, 25, 583–600, https://doi.org/10.1007/s00477-010-0423-y, 2011. a
Spieler, D. and Schütze, N.: Investigating the Model Hypothesis Space: Benchmarking Automatic Model Structure Identification With a Large Model Ensemble, Water Resources Research, 60, e2023WR036199, https://doi.org/10.1029/2023WR036199, 2024. a, b
Stisen, S., Højberg, A. L., Troldborg, L., Refsgaard, J. C., Christensen, B. S. B., Olsen, M., and Henriksen, H. J.: On the importance of appropriate precipitation gauge catch correction for hydrological modelling at mid to high latitudes, Hydrol. Earth Syst. Sci., 16, 4157–4176, https://doi.org/10.5194/hess-16-4157-2012, 2012. a
Swain, D. L., Langenbrunner, B., Neelin, J. D., and Hall, A.: Increasing precipitation volatility in twenty-first-century California, Nature Climate Change, 8, 427–433, https://doi.org/10.1038/s41558-018-0140-y, 2018. a
Swain, D. L., Prein, A. F., Abatzoglou, J. T., Albano, C. M., Brunner, M., Diffenbaugh, N. S., Singh, D., Skinner, C. B., and Touma, D.: Hydroclimate volatility on a warming Earth, Nature Reviews Earth & Environment, 6, 35–50, https://doi.org/10.1038/s43017-024-00624-z, 2025. a, b
Tabari, H., Hosseinzadehtalaei, P., Thiery, W., and Willems, P.: Amplified Drought and Flood Risk Under Future Socioeconomic and Climatic Change, Earth's Future, 9, e2021EF002295, https://doi.org/10.1029/2021EF002295, 2021. a
Tallaksen, L. M., Madsen, H., and Clausen, B.: On the definition and modelling of streamflow drought duration and deficit volume, Hydrological Sciences Journal, 42, 15–33, https://doi.org/10.1080/02626669709492003, 1997. a
Tang, G., Clark, M. P., and Papalexiou, S. M.: SC-Earth: A Station-Based Serially Complete Earth Dataset from 1950 to 2019, Journal of Climate, 34, 6493–6511, https://doi.org/10.1175/JCLI-D-21-0067.1, 2021. a, b, c
Tang, G., Clark, M. P., Knoben, W. J. M., Liu, H., Gharari, S., Arnal, L., Beck, H. E., Wood, A. W., Newman, A. J., and Papalexiou, S. M.: The Impact of Meteorological Forcing Uncertainty on Hydrological Modeling: A Global Analysis of Cryosphere Basins, Water Resources Research, 59, e2022WR033767, https://doi.org/10.1029/2022WR033767, 2023. a, b
Tang, G., Clark, M. P., Knoben, W. J. M., Liu, H., Gharari, S., Arnal, L., Wood, A. W., Newman, A. J., Freer, J., and Papalexiou, S. M.: Uncertainty Hotspots in Global Hydrologic Modeling: The Impact of Precipitation and Temperature Forcings, Bulletin of the American Meteorological Society, https://doi.org/10.1175/BAMS-D-24-0007.1, 2025. a
Thirel, G., Santos, L., Delaigue, O., and Perrin, C.: On the use of streamflow transformations for hydrological model calibration, Hydrol. Earth Syst. Sci., 28, 4837–4860, https://doi.org/10.5194/hess-28-4837-2024, 2024. a, b
Tolson, B. A. and Shoemaker, C. A.: Dynamically dimensioned search algorithm for computationally efficient watershed model calibration, Water Resources Research, 43, https://doi.org/10.1029/2005WR004723, 2007. a
Tuel, A., Schaefli, B., Zscheischler, J., and Martius, O.: On the links between sub-seasonal clustering of extreme precipitation and high discharge in Switzerland and Europe, Hydrol. Earth Syst. Sci., 26, 2649–2669, https://doi.org/10.5194/hess-26-2649-2022, 2022. a
Valenzuela, R., Garreaud, R., Vergara, I., Campos, D., Viale, M., and Rondanelli, R.: An extraordinary dry season precipitation event in the subtropical Andes: Drivers, impacts and predictability, Weather and Climate Extremes, 37, 100472, https://doi.org/10.1016/j.wace.2022.100472, 2022. a
Valéry, A., Andréassian, V., and Perrin, C.: Regionalization of precipitation and air temperature over high-altitude catchments – learning from outliers, Hydrological Sciences Journal, 55, 928–940, https://doi.org/10.1080/02626667.2010.504676, 2010. a
Valéry, A., Andréassian, V., and Perrin, C.: ‘As simple as possible but not simpler’: What is useful in a temperature-based snow-accounting routine? Part 1 – Comparison of six snow accounting routines on 380 catchments, Journal of Hydrology, 517, 1166–1175, https://doi.org/10.1016/j.jhydrol.2014.04.059, 2014a. a, b
Valéry, A., Andréassian, V., and Perrin, C.: ‘As simple as possible but not simpler’: What is useful in a temperature-based snow-accounting routine? Part 2 – Sensitivity analysis of the Cemaneige snow accounting routine on 380 catchments, Journal of Hydrology, 517, 1176–1187, https://doi.org/10.1016/j.jhydrol.2014.04.058, 2014b. a
van Kempen, G., van der Wiel, K., and Melsen, L. A.: The impact of hydrological model structure on the simulation of extreme runoff events, Nat. Hazards Earth Syst. Sci., 21, 961–976, https://doi.org/10.5194/nhess-21-961-2021, 2021. a, b
Van Loon, A. F. and Van Lanen, H. A. J.: A process-based typology of hydrological drought, Hydrol. Earth Syst. Sci., 16, 1915–1946, https://doi.org/10.5194/hess-16-1915-2012, 2012. a
Vásquez, N., Cepeda, J., Gómez, T., Mendoza, P. A., Lagos, M., Boisier, J. P., Álvarez Garretón, C., and Vargas, X.: Catchment-Scale Natural Water Balance in Chile, in: Water Resources of Chile, edited by: Fernández, B. and Gironás, J., Springer International Publishing, Cham, 189–208, https://doi.org/10.1007/978-3-030-56901-3_9, 2021. a
Visconti, G.: After the Flood: Disasters, Ideological Voting and Electoral Choices in Chile, Political Behavior, 44, 1985–2004, https://doi.org/10.1007/s11109-022-09814-1, 2022. a
Vrugt, J. A., Diks, C. G. H., Gupta, H. V., Bouten, W., and Verstraten, J. M.: Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation, Water Resources Research, 41, https://doi.org/10.1029/2004WR003059, 2005. a
Wang, S.-Y. S., Yoon, J.-H., Becker, E., and Gillies, R.: California from drought to deluge, Nature Climate Change, 7, 465–468, https://doi.org/10.1038/nclimate3330, 2017. a
Ward, P. J., de Ruiter, M. C., Mård, J., Schröter, K., Van Loon, A., Veldkamp, T., von Uexkull, N., Wanders, N., AghaKouchak, A., Arnbjerg-Nielsen, K., Capewell, L., Carmen Llasat, M., Day, R., Dewals, B., Di Baldassarre, G., Huning, L. S., Kreibich, H., Mazzoleni, M., Savelli, E., Teutschbein, C., van den Berg, H., van der Heijden, A., Vincken, J. M. R., Waterloo, M. J., and Wens, M.: The need to integrate flood and drought disaster risk reduction strategies, Water Security, 11, 100070, https://doi.org/10.1016/j.wasec.2020.100070, 2020. a
Wasti, A., Ray, P., Wi, S., Folch, C., Ubierna, M., and Karki, P.: Climate change and the hydropower sector: A global review, WIREs Climate Change, 13, e757, https://doi.org/10.1002/wcc.757, 2022. a
Wilcoxon, F.: Individual Comparisons by Ranking Methods, Biometrics Bulletin, 1, 80–83, https://doi.org/10.2307/3001968, 1945. a
Woods, R. A.: Analytical model of seasonal climate impacts on snow hydrology: Continuous snowpacks, Advances in Water Resources, 32, 1465–1481, https://doi.org/10.1016/j.advwatres.2009.06.011, 2009. a, b
Wu, S., Zhao, J., and Sivapalan, M.: A parsimonious daily water balance model based on the proportionality hypothesis, Journal of Hydrology, 649, 132414, https://doi.org/10.1016/j.jhydrol.2024.132414, 2025. a, b
Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., and Pavelsky, T. M.: MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Dataset, Water Resources Research, 55, 5053–5073, https://doi.org/10.1029/2019WR024873, 2019. a
Yilmaz, K. K., Gupta, H. V., and Wagener, T.: A process-based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model, Water Resources Research, 44, https://doi.org/10.1029/2007WR006716, 2008. a
Zhao, F., Nie, N., Liu, Y., Yi, C., Guillaumot, L., Wada, Y., Burek, P., Smilovic, M., Frieler, K., Buechner, M., Schewe, J., and Gosling, S. N.: Benefits of Calibrating a Global Hydrological Model for Regional Analyses of Flood and Drought Projections: A Case Study of the Yangtze River Basin, Water Resources Research, 61, e2024WR037153, https://doi.org/10.1029/2024WR037153, 2025. a