Multi-model approach in a variable spatial framework for streamflow simulation

Thébault, Cyril; Perrin, Charles; Andréassian, Vazken; Thirel, Guillaume; Legrand, Sébastien; Delaigue, Olivier

doi:https://doi.org/10.5194/hess-28-1539-2024

Articles | Volume 28, issue 7

Research article

04 Apr 2024

Abstract

Accounting for the variability of hydrological processes and climate conditions between catchments and within catchments remains a challenge in rainfall–runoff modelling. Among the many approaches developed over the past decades, multi-model approaches provide a way to consider the uncertainty linked to the choice of model structure and its parameter estimates. Semi-distributed approaches make it possible to account explicitly for spatial variability while maintaining a limited level of complexity. However, these two approaches have rarely been used together. Such a combination would allow us to take advantage of both methods. The aim of this work is to answer the following question: what is the possible contribution of a multi-model approach within a variable spatial framework compared to lumped single models for streamflow simulation?

To this end, a set of 121 catchments with limited anthropogenic influence in France was assembled, with precipitation, potential evapotranspiration, and streamflow data at the hourly time step over the period 1998–2018. The semi-distribution set-up was kept simple by considering a single downstream catchment defined by an outlet and one or more upstream sub-catchments. The multi-model approach was implemented with 13 rainfall–runoff model structures, three objective functions, and two spatial frameworks, for a total of 78 distinct modelling options. A simple averaging method was used to combine the various simulated streamflow at the outlet of the catchments and sub-catchments. The lumped model with the highest efficiency score over the whole catchment set was taken as the benchmark for model evaluation.

Overall, the semi-distributed multi-model approach yields better performance than the different lumped models considered individually. The gain is mainly brought about by the multi-model set-up, with the spatial framework providing a benefit on a more occasional basis. These results, based on a large catchment set, evince the benefits of using a multi-model approach in a variable spatial framework to simulate streamflow.

Received: 24 Mar 2023 – Discussion started: 13 Apr 2023 – Revised: 20 Jan 2024 – Accepted: 15 Feb 2024 – Published: 04 Apr 2024

1 Introduction

1.1 Uncertainty in rainfall–runoff modelling

A rainfall–runoff model is a numerical tool based on a simplified representation of a real-world system, namely the catchment (Moradkhani and Sorooshian, 2008). It usually computes streamflow time series from climatic data, such as rainfall and potential evapotranspiration. Many rainfall–runoff models have been developed according to various assumptions in order to meet specific needs (e.g. water resources management, flood and low-flow forecasting, hydroelectricity), with choices and constraints concerning the following (Perrin, 2000):

the temporal resolution, i.e. the way variables and processes are aggregated over time;
the spatial resolution, i.e. the way spatial variability is taken into account more or less explicitly in the model;
the description of dominant processes.

Different models will necessarily produce different streamflow simulations. Intuitively, one often expects that working at a finer spatio-temporal scale should allow for a better description of the processes (Atkinson et al., 2002). However, this generally leads to additional complexity, i.e. a larger number of parameters, which requires more information to be estimated and often yields more uncertain results (Her and Chaubey, 2015).

Uncertainty in rainfall–runoff models depends on the assumptions made regarding the choice of the general structure and also on the parameter estimates. The variety of model structures and equations results in a large variability of streamflow simulations (Ajami et al., 2007). The spatial and temporal resolutions also result in different streamflow simulations. Due to the complexity of the real system and the lack of information to parameterize the various equations over the whole catchment, parameter estimates must be set. Usually, these parameters are determined for each entity of interest by minimizing the error induced by the simulation compared to an observation. The choice of the optimization algorithm, the objective function, and the streamflow transformation is therefore also a source of uncertainty. Since input data are used to derive model structures and parameters, the uncertainty associated with these data also contributes to the overall model uncertainty (Beven, 1993; Liu and Gupta, 2007; Pechlivanidis et al., 2011; McMillan et al., 2012).

Various approaches aim to improve models by taking uncertainties into account, among which are multi-model approaches, which are the main topic of our research.

1.2 Multi-model approach

The multi-model approach consists in using several models in order to take advantage of the strengths of each one. This concept has been gaining momentum in hydrology since the end of the 20th century for simulation (e.g. Shamseldin et al., 1997) and forecasting (e.g. Loumagne et al., 1995). In this section, we distinguish between probabilistic and deterministic approaches.

A probabilistic multi-model approach seeks an explicit quantification of the uncertainty associated with simulations or forecasts through statistical methods. The ensemble concept has commonly been applied in meteorology for several decades and has subsequently been widely used in hydrology to improve prediction (i.e. simulation or forecast). The international Hydrologic Ensemble Prediction Experiment initiative (Schaake et al., 2007) fostered the work on this topic. The ensemble concept has also been adapted to rainfall–runoff models in order to reduce modelling bias: Duan et al. (2007) used multiple predictions made by several rainfall–runoff models using the same hydroclimatic forcing variables. An ensemble consisting of nine different models (from three different structures and parameterizations) was constructed and applied to three catchments in the United States. The predictions were then combined through a statistical procedure (Bayesian model averaging or BMA), which assigns larger weights to models with a higher probabilistic likelihood measure. The authors showed that the probabilistic multi-model approach improves flow prediction and quantifies model uncertainty compared to using a single rainfall–runoff model. Block et al. (2009) coupled both multiple climate and multiple rainfall–runoff models, increasing the pool of streamflow forecast ensemble members and accounting for cumulative sources of uncertainty. In their study, 10 scenarios were built for each of the three climatic models and applied to two rainfall–runoff models, i.e. 60 different forecasts. This super-ensemble was applied to the Iguatu catchment in Brazil and showed better performance than the hydroclimatic or rainfall–runoff model ensembles studied separately. Note that the authors tested three different combination methods: pooling, linear regression weighting, and a kernel density estimator. They found that the last technique seems to perform best. Velázquez et al. (2011) showed that the combination of different climatic scenarios with several models in a forecasting context leads to a reduction in uncertainty, particularly when the forecast horizon increases. However, such methods generate a large number of scenarios and can therefore become time-consuming and difficult to analyse. The probabilistic combination of simulations remains a major topic in the scientific community (see Bogner et al., 2017).

A deterministic multi-model approach seeks to define a single best streamflow time series, which often consists in a combination of the simulations of individual models. Shamseldin et al. (1997) tested three methods in order to combine model outputs: a simple average, a weighted average, and a non-linear neural network procedure. Their study was conducted on a sample of 11 catchments mainly located in southeast Asia using five different lumped models operating at the daily time step and showed that multiple models perform better than models applied individually. Similar conclusions were reached in the Distributed Model Intercomparison Project (DMIP) (Smith et al., 2004) conducted by Georgakakos et al. (2004) in simulation or by Ajami et al. (2006) for forecasting. In both articles, 6 to 10 rainfall–runoff models were applied at the hourly time step over a few catchments in the United States. These studies showed that a model that performs poorly individually can contribute positively to the multi-model set-up. Winter and Nychka (2010) specify that the composition of the multi-model set-up is important. Indeed, using 19 global climate models, the authors have shown that simple – or weighted – average combinations are more efficient if the individual models used produce very different results. Studies combining rainfall–runoff models by machine learning techniques led to the same conclusions (see, for example, Zounemat-Kermani et al., 2021, for a review).

All of the aforementioned multi-model approaches only focus on the structural aspect of rainfall–runoff models. Some authors have also combined streamflow generated from different parameterizations of the same rainfall–runoff model. Oudin et al. (2006) proposed combining two outputs obtained with a single model (GR4J) from two calibrations, one adapted to high flows and the other to low flows, by weighting each of the simulations on the basis of a seasonal index (filling rate of the production reservoir). Such a method makes it possible to provide good efficiency in both low and high flows, whereas usually an a priori modelling choice must be made to focus on a specific streamflow range. More recently, Wan et al. (2021) used a multi-model approach based on four rainfall–runoff models calibrated with four objective functions on a large set of 383 Chinese catchments. The authors showed that methods based on weighted averaging outperform the ensemble members, except in low-flow simulation. They also highlighted the benefit of using several structures with different objective functions. The size of the ensemble was also studied, and it was found that using more than nine ensemble members does not further improve performance. Note that different results for optimal size can be found in the literature (Arsenault et al., 2015; Kumar et al., 2015).

The aforementioned studies were carried out within a fixed spatial framework (e.g. lumped, semi-distributed, distributed), i.e. considering that the model structures implemented are relevant over the whole modelling domain. Implicitly, the underlying assumption is that a fixed rainfall–runoff model can capture the main hydrological processes affecting streamflow in a catchment (and its sub-catchments). However, this may not be true. Introducing a variable spatial modelling framework into the multi-model approach could help to overcome this issue.

1.3 Scope of the paper

This study intends to test whether streamflow simulation can be improved through a multi-model approach. More precisely, we aim here to deal with the uncertainty stemming from (i) the spatial dimension (e.g. catchment division, aggregation of hydroclimatic forcing, boundary conditions), (ii) the general structure of the model (e.g. formulation of water storages, filling/draining equations), and (iii) the parameter estimation (e.g. calibration algorithm, objective function, calibration period). However, we decided not to quantify these uncertainties individually (as could be done with a probabilistic ensemble); instead, we focus on the aggregated impact of all uncertainties by comparing the deterministic averaging combination of several models with a single one. Ultimately, our aim is to answer the following question: what is the possible contribution of a multi-model approach within a variable spatial framework compared to lumped single models for streamflow simulation?

This study follows on from the work of Squalli (2020), who carried out exploratory multi-model tests on lumped and semi-distributed configurations at a daily time step. The remainder of the paper is organized as follows: first, the catchment set, the hydroclimatic data, the spatial framework, and the rainfall–runoff models used for this work are presented. The multi-model methodology and the calibration/evaluation procedure are described. Then we present, analyse, and discuss the results. Last, we summarize the main conclusions of this work and discuss its perspectives.

2 Material and methods

2.1 Catchments and hydroclimatic data

This study was conducted at an hourly time step using precipitation, potential evapotranspiration and streamflow time series over the period 1998–2018 (Delaigue et al., 2020). Precipitation (P) was extracted from the radar-based COMEPHORE re-analysis produced by Météo-France (Tabary et al., 2012), which provides information at a 1 km² resolution and which has already been extensively used in hydrological studies (Artigue et al., 2012; van Esse et al., 2013; Bourgin et al., 2014; Lobligeois et al., 2014; Saadi et al., 2021).

Potential evapotranspiration (E₀) is calculated with the formula proposed by Oudin et al. (2005). This equation was chosen for its simplicity, as the only input required is daily air temperature (from the SAFRAN re-analysis of Météo-France; see Vidal et al., 2010) and extra-terrestrial radiation (which only depends on the Julian day and the latitude). Once calculated, the daily potential evapotranspiration was disaggregated to the hourly time step using a simple parabola (Lobligeois, 2014). These steps for converting daily temperature data into hourly potential evapotranspiration are directly possible in the airGR software (Coron et al., 2017, 2021; developed using the R programming language; R Core Team, 2020), which was used for this work. We did not use any gap-filling method since all climatic data were complete during the study period.
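As an illustration of the daily step of this calculation, the sketch below implements the temperature-based formula as it is commonly stated for Oudin et al. (2005): potential evapotranspiration is proportional to extra-terrestrial radiation and to (T + 5)/100, and is zero when T + 5 is not positive. The constants (latent heat of 2.45 MJ kg⁻¹, water density of 1000 kg m⁻³) and the function name are assumptions to be checked against the original reference; extra-terrestrial radiation is supplied by the caller, and the hourly parabolic disaggregation is not reproduced. This is not the airGR implementation.

```python
def oudin_daily_pe(extraterrestrial_radiation, air_temperature):
    """Daily potential evapotranspiration in mm/day (sketch, constants assumed).

    extraterrestrial_radiation: MJ m-2 day-1
    air_temperature: mean daily air temperature in degrees Celsius
    """
    latent_heat = 2.45      # MJ kg-1 (assumed value)
    water_density = 1000.0  # kg m-3
    if air_temperature + 5.0 <= 0.0:
        return 0.0
    pe_m_per_day = (extraterrestrial_radiation / (latent_heat * water_density)
                    * (air_temperature + 5.0) / 100.0)
    return pe_m_per_day * 1000.0  # convert m/day to mm/day


# Hypothetical summer day: 24.5 MJ m-2 day-1 and 15 degrees Celsius.
pe_summer = oudin_daily_pe(24.5, 15.0)
pe_frost = oudin_daily_pe(24.5, -10.0)
```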

Streamflow time series (Q) were extracted from the national streamflow archive Hydroportail (Dufeu et al., 2022). This archive makes available the data produced by the hydrometric services of the regional environmental agencies in charge of measuring flows in France, as well as by other data producers (e.g. hydropower companies and dam managers). Before being archived, flow data undergo quality control procedures applied by data producers, with corrections when necessary. Quality codes are also available, although this information is not uniformly provided for all stations. These data are freely available on the Hydroportail website and are widely used in France for hydraulic and hydrological studies.

Here, we focus on simulating streamflow at the main catchment outlet, addressing the issue from a large-sample-hydrology (LSH) perspective (Andréassian et al., 2006; Gupta et al., 2014), in which many catchments are used. For this study, 121 catchments spread over mainland France with limited human influence were selected (Fig. 1). The first criterion used to select catchments is based on streamflow availability. Here, a threshold of 10 % maximum gaps per year over the whole period was considered (1999–2018). However, this criterion may be slightly too restrictive (e.g. removal of a station installed in 2000 and presenting continuous data since then). In order to overcome this problem, we decided to allow this threshold to be exceeded for a maximum of 3 years over the whole period considered. It is therefore a compromise between having a large number of catchments for the study and having a long enough period for model calibration and evaluation. The catchment selection also considered the level of human influence. In France, the vast majority of catchments have human influence (e.g. dams, dikes, irrigation, or urbanization). Here, streamflow with limited human influence corresponds to gauged stations where the streamflow records have a hydrological behaviour considered close enough to a natural streamflow (e.g. low water withdrawals, influences far enough upstream to be sufficiently diluted downstream) not to strongly limit model performance. This was based on numerical indicators on the influence of dams and local expertise. Although snow-dominant or glacial regimes were rejected (due to lack of data or anthropogenic influence), the various catchments selected offer a wide hydroclimatic variability (Table 1).
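The availability criterion described above can be sketched as follows. The function name and the dictionary of yearly gap fractions are hypothetical; only the two thresholds (10 % of gaps per year, exceeded in at most 3 years) come from the text.

```python
def passes_availability(gap_fraction_by_year, max_gap=0.10, max_bad_years=3):
    """Return True if at most `max_bad_years` years exceed `max_gap` missing data."""
    bad_years = sum(1 for fraction in gap_fraction_by_year.values()
                    if fraction > max_gap)
    return bad_years <= max_bad_years


# Hypothetical station installed in 2000: the year 1999 is entirely missing,
# all later years are complete.
gaps = {year: 0.0 for year in range(1999, 2019)}
gaps[1999] = 1.0
station_kept = passes_availability(gaps)
```

With a strict 10 % rule this station would be rejected because of 1999 alone; the 3-year tolerance keeps it in the sample.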

https://hess.copernicus.org/articles/28/1539/2024/hess-28-1539-2024-f01

Figure 1. Boundaries (in red) and outlets (black dots) of the 121 catchments selected for this study.

Table 1. Minimum, median, and maximum values of some characteristics of the 121 catchments (P stands for mean annual precipitation, E₀ for mean annual potential evapotranspiration, Q for mean annual flow).

2.2 Principle of catchment spatial discretization

In this work, two spatial frameworks are used: lumped and semi-distributed. A lumped model considers the catchment as a single entity, while the semi-distribution seeks to divide this catchment into several sub-catchments in order to partly take into account the spatial variability of hydroclimatic forcing and physical characteristics within the catchment.

Generally, the division of a catchment is defined on the basis of expertise and requires good knowledge of its characteristics (hydrological response units based on geology or land use). From a large-sample hydrology perspective, an automatic definition of semi-distribution was needed. To this end, we simplified the problem by looking at a first-order distribution, i.e. a single downstream catchment defined by an outlet and one or more upstream sub-catchments. The underlying assumption is therefore that a second-order distribution (i.e. further dividing the upstream sub-catchments into a few smaller sub-catchments) will have a more limited impact on model behaviour than the first, when considering the main downstream outlet. This assumption is based on the work of Lobligeois et al. (2014) which showed that a multitude of sub-basins of approximately 4 km² provide limited gain compared to a few sub-catchments of 64 km². Under these hypotheses, we developed an automatic procedure to select semi-distributed configurations nested in each other, which we termed “Matryoshka doll”. This approach consists in creating different simple and distinct combinations of upstream–downstream gauged catchments starting from the main downstream station and progressively moving upstream.

Specifically, the Matryoshka doll selection approach (Fig. 2) was implemented as follows:

1. Select a downstream station defining a catchment with one or more gauged internal points.
2. Restrict the upstream sub-catchment partitioning to a first-order split, i.e. going back only to the nearest upstream station(s) without going back to the stations further upstream and respecting a size criterion to avoid sensitivity issues which may result from a too-small or too-large downstream catchment (in this study, we limited the area of the upstream sub-catchments to a value between 10 % and 70 % of the area of the total catchment). This step creates a combination of stations defining a single downstream catchment (which receives the upstream contributions).
3. If the upstream catchments have one or more internal gauged points, repeat step 1 and consider them as a downstream catchment.

The Matryoshka doll approach allows us to create distinct configurations (i.e. there cannot be two different semi-distributed configurations for the same downstream catchment) and therefore avoids over-sampling issues.
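A minimal sketch of this nested selection is given below, under simplifying assumptions that are not stated in the text: the size criterion is applied to each upstream sub-catchment separately, stations failing the criterion are simply discarded, and the station network is given as a dictionary of nearest upstream stations. All names and the example network are hypothetical.

```python
def matryoshka(station, upstream_of, area, lower=0.10, upper=0.70):
    """Return a list of (downstream station, [first-order upstream stations])."""
    configurations = []
    # First-order split: only the nearest upstream stations are considered,
    # and only those satisfying the (assumed per-station) area criterion.
    kept = [u for u in upstream_of.get(station, [])
            if lower <= area[u] / area[station] <= upper]
    if kept:
        configurations.append((station, kept))
        # Each retained upstream station becomes a downstream station in turn.
        for u in kept:
            configurations.extend(matryoshka(u, upstream_of, area, lower, upper))
    return configurations


# Hypothetical network: areas in km2, A is the main outlet.
area = {"A": 1000.0, "B": 400.0, "C": 50.0, "D": 100.0}
upstream_of = {"A": ["B", "C"], "B": ["D"]}
configs = matryoshka("A", upstream_of, area)
```

Here station C is dropped (5 % of the area of A), so two nested configurations remain, one at A with B upstream and one at B with D upstream. Each downstream station appears in at most one configuration, which is the property the approach relies on to avoid over-sampling.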

https://hess.copernicus.org/articles/28/1539/2024/hess-28-1539-2024-f02

Figure 2. Illustration of the Matryoshka doll approach to the Vézère River at Larche. The steps of the method are shown in the columns and the discretization levels in the rows. From this initial catchment (top left), three semi-distributed configurations were obtained (number of rows). For each semi-distributed configuration, the boundary of the catchment considered is in red, the first-order upstream catchments are filled in dark grey, and the downstream catchment is in light grey.

The semi-distributed approach consists in performing lumped modelling in each sub-catchment and linking the sub-catchments through a hydraulic routing scheme. Thus, we need to distinguish between the first-order upstream catchment (Fig. 2, dark grey), where we applied a lumped rainfall–runoff model, and the downstream catchment (Fig. 2, light grey), where the rainfall–runoff model was applied after integrating the upstream inflows using a runoff–runoff model (hydraulic routing scheme). It is therefore important to differentiate between the routing part of hydrological models (which distributes in time the quantity of water contributing to the streamflow in the sub-catchment of interest, i.e. the intra-sub-catchment propagation) and the hydraulic routing scheme (which propagates the streamflow simulated at one outlet to the downstream catchment, i.e. the inter-sub-catchment propagation). For this study, a single hydraulic routing scheme was applied. It consists of a time lag between the upstream and downstream outlets, as done by Lobligeois et al. (2014). In order to reduce the computation time, the authors propose calculating a lumped parameter C₀ corresponding to the average flow velocity over the downstream catchment. Since the hydraulic lengths d_i (i.e. the distance between the downstream outlet and each upstream sub-catchment) are known, the transit time T_i can be calculated as follows:

\begin{matrix} (1) & T_{i} = \frac{d_{i}}{C_{0}} . \end{matrix}

This approach is fairly simple but offers performance comparable to that of more complex routing models such as lag and route schemes (with linear or quadratic reservoirs) that account for peak-shaving phenomena (results not shown for the sake of brevity).
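As a minimal sketch of this pure-lag routing, the code below computes the transit time of Eq. (1) and delays an upstream series accordingly. The units (kilometres, metres per second, hourly time step), the rounding of the lag to a whole number of time steps, and the function names are assumptions made for the example; the scheme used in the study may handle fractional lags differently.

```python
def transit_time_hours(distance_km, velocity_m_s):
    """Eq. (1), T_i = d_i / C_0, with the result expressed in hours."""
    return distance_km * 1000.0 / velocity_m_s / 3600.0


def lag_series(q_upstream, lag_steps):
    """Delay a series by a whole number of time steps, padding with NaN."""
    n = len(q_upstream)
    lag_steps = min(lag_steps, n)
    return [float("nan")] * lag_steps + list(q_upstream[: n - lag_steps])


# Hypothetical upstream outlet located 36 km from the downstream outlet,
# with an average velocity C_0 of 1 m/s over the downstream catchment.
lag = round(transit_time_hours(distance_km=36.0, velocity_m_s=1.0))
q_routed = lag_series([1.0, 2.0, 3.0, 4.0], lag_steps=2)
```

The lagged upstream series would then be added to the inflows of the downstream rainfall–runoff model.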

2.3 Models

In the context of this study, a model is defined as a configuration composed of a model structure and an associated set of parameters (i.e. which may vary according to the objective function selected for calibration). These models will be applied independently in a lumped or a semi-distributed modelling framework.

For this study, the airGRplus software (Coron et al., 2022), based on the works of Perrin (2000) and Mathevet (2005), was used. It includes various rainfall–runoff model structures running at the daily time step. airGRplus is an add-on to airGR (Coron et al., 2017, 2021). An adaptation of the work made by Perrin and Mathevet was carried out to use these structures at the hourly time step (mostly ensuring consistency of parameter ranges when changing simulation time steps and changing fixed time-dependent parameters). Finally, a set of 13 structures available in airGRplus, already widely tested in France and adapted to the hourly time step, was selected (Table 2). They are simplified versions of original rainfall–runoff models taken from the literature (except GR5H, which corresponds to the original version). To avoid confusion with the original models, a four-letter abbreviation was used here. Since the various catchments used for this study do not experience much snowfall, no snow module was implemented.

Table 2. List of rainfall–runoff models available in the airGRplus software at the hourly time step and used for this work.

The objective function used for parameter calibration is the Kling–Gupta efficiency (KGE) (Gupta et al., 2009), defined by

\begin{matrix} (2) & KGE = 1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}}, \end{matrix}

with r the correlation, α the ratio between standard deviations, and β the ratio between the means (i.e. the bias) of the observed and simulated streamflow.
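For illustration, Eq. (2) can be computed as in the sketch below, written with the standard library only. Following Gupta et al. (2009), the ratios are taken as simulated over observed; the function name is hypothetical and missing values are assumed to have been removed beforehand.

```python
from math import sqrt


def kge(obs, sim):
    """Kling-Gupta efficiency between observed and simulated series (Eq. 2)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_s = sum((s - mean_s) ** 2 for s in sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / sqrt(var_o * var_s)        # linear correlation
    alpha = sqrt(var_s) / sqrt(var_o)    # ratio of standard deviations
    beta = mean_s / mean_o               # ratio of means (bias)
    return 1.0 - sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
kge_perfect = kge(observed, observed)
kge_doubled = kge(observed, [2.0 * q for q in observed])
```

A simulation that doubles every observed value keeps r = 1 but gives α = β = 2, so its KGE is 1 − √2.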

Thirel et al. (2023) showed that streamflow transformations are adapted to a specific modelling objective (e.g. low flows, floods). However, they highlighted that it is difficult to represent a wide range of streamflow with a single transformation. Following this study, we selected three transformations: one targeting high flows (Q^+0.5), one targeting low flows (Q^−0.5), and one which is intermediate (Q^+0.1).

The algorithm used for model calibration comes from Michel (1991) and is available in the airGR package (Coron et al., 2017, 2021). It combines a global and a local optimization approach. First, a coarse screening of the parameter space is performed using either a rough predefined grid or a list of parameter sets. Then a steepest descent local search algorithm is performed, starting from the result of the screening procedure. Such a calibration (over 10 years of hourly data) takes about 0.5 to 6 min (depending mainly on the catchment considered and the number of free parameters) and gives a single parameter set for a chosen objective function. Thus, we did not focus explicitly here on parameter uncertainty; i.e. we did not use multiple parameter sets for a single objective function, as can be done with Monte Carlo simulations, for example. Such an approach would be an interesting perspective for this work but is not covered here owing to computation time constraints. In a semi-distributed context, the calibration is carried out sequentially, i.e. in each sub-catchment from upstream to downstream. Note that the calibration takes slightly more time in the downstream catchment due to the additional free parameter of the routing function.
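The two-stage logic (coarse screening, then local refinement) can be sketched as follows. This is not the algorithm of Michel (1991) nor the airGR implementation: the local stage is replaced here by a simple coordinate-wise search with step halving, and the objective is a toy function standing in for a KGE score. All names and values are hypothetical.

```python
import itertools


def calibrate(objective, grid, step=1.0, min_step=1e-3):
    """Maximize `objective`: grid screening, then coordinate-wise local search."""
    # Stage 1: coarse screening over every combination of the grid values.
    best = list(max(itertools.product(*grid), key=objective))
    best_score = objective(best)
    # Stage 2: try +/- step on each parameter; halve the step when stuck.
    while step > min_step:
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[k] += delta
                score = objective(candidate)
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:
            step /= 2.0
    return best, best_score


def toy_score(params):
    """Toy objective with a single optimum at (1.3, -0.7)."""
    x, y = params
    return -((x - 1.3) ** 2 + (y + 0.7) ** 2)


best_params, best_score = calibrate(toy_score, grid=[(-2, 0, 2), (-2, 0, 2)])
```

As in the procedure described above, the search returns a single parameter set per objective function, so it gives no information on parameter uncertainty.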

Overall, 13 structures and three objective functions were used, resulting in 39 models. Applied over two different spatial frameworks, a total of 78 distinct modelling options were available for this study.

2.4 Multi-model methodology

The multi-model approach consists in running various rainfall–runoff models. More specifically, here, we are interested in a deterministic combination of the different streamflow simulations. Let us recall that for our study, a model corresponds to the association of a structure and an objective function. By definition, a model is imperfect. Indeed, the different structures have been designed to meet different objectives (e.g. water resources management, forecasting, and climate change) in different geographical or geological contexts (e.g. high mountains, karstic zone, and alluvial plain). The calibration settings (e.g. optimization algorithm, objective function, streamflow transformations) selected to optimize the parameters are also choices that will eventually impact the simulation. The hypothesis made here is that the multi-model approach makes it possible to take advantage of the strengths of each model.

In the lumped framework, we consider every model in each catchment. In the semi-distributed framework, we consider every model in each sub-catchment. As the calibration is sequential, the various models are first applied to each upstream sub-catchment, and then their simulated streamflow is propagated to the downstream catchment to be modelled. However, transferring every upstream possibility to the downstream catchment is excessively time consuming. Therefore, the simulated streamflow in each upstream sub-catchment was first set with an a priori choice, whatever the model used, and then transferred to the downstream catchment (this choice is discussed in Sect. 4.4).

The multi-model framework enables these different streamflow simulations to be combined in each catchment and sub-catchment in order to create multiple additional simulations. At the downstream outlet, we will consider mixed combinations, using streamflow simulations from lumped and semi-distributed modelling (Fig. 3). To this end, deterministic averaging methods were used. Here, we will focus on a simple average combination (SAC), i.e. giving an equal weight to all models combined, defined by

\begin{matrix} (3) & Q_{SAC} = \frac{\sum_{i = 1}^{n} Q_{i}}{n}, \end{matrix}

with Q_SAC the streamflow from a simple average combination and Q_i the simulated streamflow with a model i selected among the n models.
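Eq. (3) amounts to averaging the simulated series time step by time step, as in this minimal sketch (the function name and the example values are hypothetical, and all series are assumed to share the same time axis):

```python
def simple_average_combination(simulations):
    """Equal-weight average of n simulated series, time step by time step (Eq. 3)."""
    n = len(simulations)
    return [sum(values) / n for values in zip(*simulations)]


# Two hypothetical simulations at the same outlet over three time steps.
q_sac = simple_average_combination([[1.0, 2.0, 3.0],
                                    [3.0, 4.0, 5.0]])
```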

Note that a weighted average combination (WAC) was also tested but did not significantly change the mean results and was therefore not used further (discussed in Sect. 4.3).

The number of possible combinations on a given outlet from the total number of available streamflow simulations increases exponentially and can be computed by

\begin{matrix} (4) & n_{c} = \sum_{i = 2}^{n_{sim}} \binom{n_{sim}}{i}, \end{matrix}

with i the number of streamflow simulations to choose from the total number of available streamflow simulations n_sim.

As an indication, there are approximately 1000 combinations for a streamflow ensemble simulated by 10 models, but there are over 1 000 000 solutions for 20 models in a lumped framework. Although a single combination is quick to perform (between 0.1 and 0.2 s), the number of combinations quickly becomes a limiting factor in terms of computation time. For this study, combinations will be set to a maximum of four different streamflow time series among the total number of models available, i.e. approximately 1 500 000 different combinations (discussed in Sect. 4.2):

\begin{matrix} (5) & n_{c} = \sum_{i = 2}^{4} \binom{78}{i} \approx 1\,500\,000 . \end{matrix}
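The orders of magnitude quoted above can be checked by reading the summed term of Eqs. (4) and (5) as a binomial coefficient, i.e. the number of ways to choose i simulations among n_sim. The function name is hypothetical.

```python
from math import comb


def n_combinations(n_sim, max_size=None):
    """Number of combinations of 2 to `max_size` simulations among `n_sim`."""
    upper = n_sim if max_size is None else min(max_size, n_sim)
    return sum(comb(n_sim, i) for i in range(2, upper + 1))


n_10_models = n_combinations(10)            # all sizes, 10 simulations
n_20_models = n_combinations(20)            # all sizes, 20 simulations
n_study = n_combinations(78, max_size=4)    # at most four among 78
```

These give 1013, 1 048 555, and 1 505 504 combinations, respectively, consistent with the approximate figures given in the text.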

The objective of these combinations is to create a large set of simulations from which the best multi-model approach will be selected. Here we aim to obtain simulations that can perform well over a wide range of streamflow, and that can be applied to a large number of French catchments. Therefore, the best models (and multi-model approach) correspond to those which will achieve the highest performance in each catchment on average during the evaluation periods.

2.5 Testing methodology

A split-sample test (Klemeš, 1986), commonly used in hydrology, was implemented. This practice consists in separating a streamflow time series into two distinct periods, the first for calibration and the second for evaluation, and then exchanging these two periods. The two periods chosen are 1999–2008 and 2009–2018. An initialization period of at least 2 years was used before each test period to avoid errors attributable to the wrong estimation of initial conditions within the rainfall–runoff model.
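The exchange of calibration and evaluation periods can be written down explicitly, as in the short sketch below. The names are hypothetical and the initialization (warm-up) period is left out.

```python
PERIODS = ((1999, 2008), (2009, 2018))


def split_sample_plan(periods):
    """Return the two calibration/evaluation pairings of a split-sample test."""
    first, second = periods
    return [
        {"calibration": first, "evaluation": second},
        {"calibration": second, "evaluation": first},
    ]


plan = split_sample_plan(PERIODS)
```

Each model is thus calibrated twice, and every year of the record is used once for evaluation.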

For this study, results will only be analysed for evaluation (i.e. over the two untrained periods). Model performance was evaluated on two levels.

With a general criterion. Model performance was evaluated with a composite criterion focusing on a wide range of streamflow, defined as follows:
\begin{matrix} (6) & KGE_{comp} = \frac{KGE(Q^{+0.5}) + KGE(Q^{+0.1}) + KGE(Q^{-0.5})}{3} . \end{matrix}
With event-based criteria. Model performance was evaluated with several criteria characterizing floods and low flows. In a context of high flows (5447 events selected), the timing of the peak (i.e. the date at which the flood peak was reached), the flood peak (i.e. the maximum streamflow value observed during the flood), and the flood flow (i.e. mean streamflow during the event) were analysed. In a context of low flows (1332 events selected), the annual low-flow duration (i.e. number of low-flow days) and severity (i.e. largest cumulative streamflow deficit) were studied. Table 3 provides typical ranges of values of flood and low-flow characteristics over the catchment set. Please refer to Appendices A and B for more details on the event selection method.
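As a sketch of the general criterion, the composite score of Eq. (6) can be computed by applying the KGE to power-transformed flows and averaging the three values. The code assumes strictly positive flows (the exponent −0.5 is undefined at zero, and how zero flows are handled in the study is not stated here); function names and example values are hypothetical.

```python
from math import sqrt


def kge(obs, sim):
    """Kling-Gupta efficiency (simulated-over-observed ratios)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_s = sum((s - mean_s) ** 2 for s in sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / sqrt(var_o * var_s)
    alpha = sqrt(var_s / var_o)
    beta = mean_s / mean_o
    return 1.0 - sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def kge_comp(obs, sim, exponents=(0.5, 0.1, -0.5)):
    """Mean of the KGE computed on Q**p for each exponent p (Eq. 6)."""
    scores = [kge([o ** p for o in obs], [s ** p for s in sim])
              for p in exponents]
    return sum(scores) / len(scores)


observed = [1.0, 2.0, 4.0, 8.0]
simulated = [1.5, 2.0, 3.5, 9.0]
score_perfect = kge_comp(observed, observed)
score_sim = kge_comp(observed, simulated)
```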

Table 3. Minimum, median, and maximum values of flood and low-flow characteristics over the 121 catchments.


In a multi-model framework, the best model or combination of models (i.e. the one giving the highest performance over the evaluation periods) can be determined for each catchment. This model or combination of models will therefore differ from one catchment to another. For this work, we chose as a benchmark a lumped one-size-fits-all model (i.e. the same model whatever the catchment), which is the approach usually used in hydrological modelling.
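The difference between a catchment-specific choice and a one-size-fits-all benchmark can be illustrated with a small sketch. The score matrix and model names below are entirely hypothetical, and the use of the median over catchments to rank models is an assumption made for this example.

```python
import numpy as np

# Hypothetical KGE_comp scores: rows are catchments, columns are models.
scores = np.array([
    [0.80, 0.85, 0.70],
    [0.90, 0.82, 0.75],
    [0.60, 0.88, 0.92],
    [0.78, 0.86, 0.81],
])
model_names = ["model_A", "model_B", "model_C"]  # placeholder names

# One-size-fits-all benchmark: the single model with the highest median
# score over the whole catchment set (assumed ranking statistic).
benchmark_idx = int(np.argmax(np.median(scores, axis=0)))

# Catchment-specific choice: the best model may change from row to row.
best_per_catchment = np.argmax(scores, axis=1)

print("benchmark:", model_names[benchmark_idx])
print("best per catchment:", [model_names[i] for i in best_per_catchment])
```

In this made-up example the benchmark is the best model in only two of the four catchments, which is the gap a catchment-specific selection can exploit.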

3 Results

Results are presented from lumped (L) single models (SMs), i.e. run individually, to more complex semi-distributed (SD) multi-model (MM) approaches (see Fig. 3). The mixed (M) multi-model approach allows for a variable spatial framework combining both lumped and semi-distributed approaches. The aim of this section is to present the results obtained with each modelling framework and their intercomparison.

https://hess.copernicus.org/articles/28/1539/2024/hess-28-1539-2024-f03

Figure 3. Summary of the different approaches tested. Q is the target streamflow at the main catchment outlet; black dots show gauging stations used. The different colours represent different model structures. The variations of the same colour indicate different parameterizations. Red links represent the combination of streamflow.

3.1 Lumped single models (LSMs)

In this part, each model was run individually in a lumped mode (see Fig. 3). Parameters of the 13 structures were calibrated successively with the three objective functions, resulting in 39 lumped models.

Figure 4 shows the distribution of the performance of lumped single models over the 121 downstream outlets and over the evaluation periods. As a reminder, the KGE_comp used for the evaluation is a composite criterion which considers different transformations in order to provide an overall picture of model performance for a wide range of streamflow (Eq. 6). Overall, lumped single models give median KGE_comp values between 0.70 and 0.88. This upper value is reached with the GR5H structure calibrated with a generalist objective function (KGE applied to Q^+0.1) and will be used in the paper as a benchmark. Since efficiency criteria values depend on the variety of errors found in the evaluation period (see, for example, Berthet et al., 2010), this may impact the significance of performance differences between models and ultimately their comparison. Therefore, we tried to quantify the sampling uncertainty in KGE scores. The bootstrap–jackknife methodology proposed by Clark et al. (2021) was applied over our sample of 121 catchments for the 39 lumped models. It showed a median sampling uncertainty in KGE scores of 0.02 (Appendix C). The objective function applied during the calibration phase seems to have a variable impact on performance depending on the structure. For example, GR5H shows a similar performance regardless of the transformation applied, whereas TAN0 shows a large variation. The strong decrease in the 25 % quantile of the latter is linked to the great difficulty this structure has in representing the low-flow component of KGE_comp when it is calibrated with more weight on high flows (KGE applied to Q^+0.5). The reverse is also true, since a structure optimized with more weight on low flows (KGE applied to Q^−0.5) will have more difficulty representing the high-flow component of KGE_comp (e.g. NAM0 or GARD). Although the differences remain limited, the highest KGE_comp scores are achieved with a more generalist objective function (KGE applied to Q^+0.1). These results confirm the conclusions reached by Thirel et al. (2023).

https://hess.copernicus.org/articles/28/1539/2024/hess-28-1539-2024-f04

Figure 4. Distribution of the performance (KGE_comp score) of the 39 lumped single models over the 121 catchments and over the evaluation periods. The box plots represent the 10 %, 25 %, 50 %, 75 % and 90 % quantiles. The dashed red line represents the optimal KGE value. Each colour represents a structure, and each geometric pattern represents the power transformation applied to the streamflow during the calibration.
