Transferring Global Uncertainty Estimates from Gauged to Ungauged Catchments

Predicting streamflow hydrographs in ungauged catchments is challenging, and accompanying the estimates with realistic uncertainty bounds is an even more complex task. In this paper, we present a method to transfer global uncertainty estimates from gauged to ungauged catchments and we test it over a set of 907 catchments located in France, using two rainfall–runoff models. We evaluate the quality of the uncertainty estimates based on three expected qualities: reliability, sharpness, and overall skill. The robustness of the method to the availability of information on gauged catchments is also evaluated using a hydrometrical desert approach. Our results show that the method is promising, providing reliable and sharp uncertainty bounds at ungauged locations in a majority of cases.


Predicting streamflow in ungauged catchments with uncertainty estimates
Predicting the entire runoff hydrograph in ungauged catchments is a challenge that has attracted much attention during the last decade. In this context, the use of suitable conceptual rainfall-runoff models has proved to be useful, and because traditional calibration approaches based on observed discharge data cannot be applied to ungauged catchments, other approaches are required. Various methods have been proposed for the estimation of rainfall-runoff model parameters in ungauged catchments, as reported by the recent summary of the Prediction in Ungauged Basins (PUB) decade (Blöschl et al., 2013; Hrachowitz et al., 2013; Parajka et al., 2013).
The estimation of predictive uncertainty is deemed good practice in any environmental modelling activity (Refsgaard et al., 2007). In hydrological modelling, the topic has been widely discussed for years, and no general agreement has yet been reached on how to adequately quantify uncertainty. In practice, various methodologies are currently used.
While many methods have been proposed for gauged catchments, only a few have been proposed for the estimation of predictive uncertainty on ungauged catchments. McIntyre et al. (2005) presented a GLUE-type approach consisting of transferring ensembles of parameter sets obtained on donor (gauged) catchments to target (ungauged) catchments. More recently, a framework based on constrained parameter sets was applied in several studies (Yadav et al., 2007; Zhang et al., 2008; Winsemius et al., 2009; Kapangaziwiri et al., 2012). It is a two-step procedure. The first step consists in estimating with uncertainty various summary metrics of the hydrograph, also called "signatures" of the catchments, or gathering other "soft" or "hard" information at the target ungauged catchment. The second step is the selection of an ensemble of model parameter sets: "acceptable" or "behavioural" parameter sets are those that yield sufficiently close simulated summary metrics compared to the regionalised metrics obtained during the first step. A Bayesian approach can also be used (Bulygina et al., 2011, 2012). The parameter sets are given a relative weight depending on the proximity of their summary metrics compared to regionalised metrics and depending on a priori information. The reader can refer to Wagener and Montanari (2011) for a comprehensive description of both formal and informal methods belonging to this framework.
F. Bourgin et al.: Transferring global uncertainty estimates from gauged to ungauged catchments
One difficulty of the above-mentioned approaches lies in the interpretation of the uncertainty bounds obtained from the parameter ensemble predictions. As noted by McIntyre et al. (2005) and Winsemius et al. (2009), the uncertainty bounds cannot easily be interpreted as confidence intervals, and therefore it remains difficult to use them in practice. In addition, relying solely on an ensemble of model parameter sets to quantify total predictive uncertainty is often insufficient when imperfect rainfall-runoff models are used.
A pragmatic alternative consists in addressing the parameter estimation and the global uncertainty estimation issues separately. It has been argued by several authors (Montanari and Brath, 2004; Andréassian et al., 2007; Ewen and O'Donnell, 2012) that a posteriori characterisation of modelling errors of a "best" or "optimal" simulation can yield valid uncertainty bounds at gauged locations. In earlier studies, the terms "total uncertainty", "global uncertainty" and "post-processing" approach have been used interchangeably to refer to this approach. The various sources of uncertainty are indeed lumped into a unique error term with the goal of estimating uncertainty bounds for model outputs.
As stated by Solomatine and Shrestha (2009), "The historical model residuals (errors) between the model prediction ŷ and the observed data y are the best available quantitative indicators of the discrepancy between the model and the real-world system or process, and they provide valuable information that can be used to assess the predictive uncertainty."
Similarly, one could argue that the model residuals between the model's prediction and the observed data at neighbouring gauged locations are, perhaps, the best available indicators of the discrepancy between the model and the real-world system at the target ungauged location, under the condition that the increased uncertainty introduced by the regionalisation step compared to the calibration step is adequately taken into account.
The only attempt to apply a global uncertainty estimation approach at ungauged locations that we are aware of is the one presented by Roscoe et al. (2012). They quantified uncertainty for river stage prediction at ungauged locations by first estimating the residual errors at ungauged locations based on residual errors at gauged locations, and then applying quantile regression to these estimated errors.

Scope of the paper
The aim of this paper is to provide an estimation of the global uncertainty affecting runoff prediction at ungauged locations when a rainfall-runoff model and a regionalisation scheme are used.
To our knowledge, a framework based on residual errors and global uncertainty quantification has not yet been extensively tested in the context of prediction in ungauged catchments. This paper contributes to the search for methods able to provide uncertainty estimates when runoff predictions in ungauged catchments are sought.

Data and methods
Our objective is not to develop a new parameter regionalisation approach. Therefore, we purposely chose to use ready-to-use materials and methods and only focus on the uncertainty quantification issue. This study can be considered as a follow-up of the work by Oudin et al. (2008) on the comparison of regionalisation approaches. We only provide here an overview of the data set, the rainfall-runoff models and the parameter calibration and regionalisation approach, since the details can be found in Oudin et al. (2008).

Data set
A database of 907 French catchments was used. They represent various hydrological conditions, given the variability in climate, topography and geology in France. This set includes fast-responding Mediterranean catchments with intense precipitation as well as larger, groundwater-dominated catchments. Some characteristics of the data set are given in Table 1. Catchments were selected to have limited snow influence, since no snowmelt module was used in the hydrological modelling. Daily rainfall, runoff, and potential evapotranspiration (PE) data series over the 1995-2005 period were available. Meteorological inputs originate from Météo-France SAFRAN reanalysis (Vidal et al., 2010). PE was estimated using the temperature-based formula proposed by Oudin et al. (2005). Hydrological data were extracted from the HYDRO national archive (http://www.hydro.eaufrance.fr).

Rainfall-runoff models
Two daily, continuous lumped rainfall-runoff models were used.
- The GR4J rainfall-runoff model, an efficient and parsimonious daily lumped continuous rainfall-runoff model described by Perrin et al. (2003).
- The TOPMO rainfall-runoff model, inspired by TOPMODEL (Beven and Kirkby, 1979). This version was tested on large data sets and showed performance comparable to that of the GR4J model, while being quite different (Michel et al., 2003; Oudin et al., 2008, 2010).
Using these two models rather than a single one makes it possible to draw more general conclusions. The two models use a soil moisture accounting procedure as well as routing stores. However, they differ markedly in the formulation of their functions. While the GR4J model uses two non-linear stores and a unit hydrograph, the TOPMO model uses a linear and an exponential store, and a pure time delay.
The GR4J and TOPMO models have four and six free parameters, respectively. On gauged catchments, parameter estimation is performed using a local gradient search procedure, applied in combination with pre-screening of the parameter space (Mathevet, 2005; Perrin et al., 2008). This optimisation procedure has proved to be efficient in past applications for the conceptual models used here. As an objective function, we used the Nash and Sutcliffe (1970) criterion computed on square-root-transformed flows (NSVQ). This criterion was shown to yield a good compromise between different objectives (Oudin et al., 2006).
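As an illustration of the objective function, the following minimal sketch computes a Nash-Sutcliffe efficiency on square-root-transformed flows. The function name `nse_sqrt` and the use of plain Python lists are assumptions made for this example; the optimisation procedure itself is not reproduced here.

```python
import math


def nse_sqrt(obs, sim):
    """Nash-Sutcliffe efficiency on square-root-transformed flows (hypothetical helper)."""
    sqrt_obs = [math.sqrt(q) for q in obs]
    sqrt_sim = [math.sqrt(q) for q in sim]
    mean_obs = sum(sqrt_obs) / len(sqrt_obs)
    # Sum of squared errors between transformed observations and simulations
    sse = sum((o - s) ** 2 for o, s in zip(sqrt_obs, sqrt_sim))
    # Spread of the transformed observations around their mean
    spread = sum((o - mean_obs) ** 2 for o in sqrt_obs)
    return 1.0 - sse / spread
```

A perfect simulation gives a value of 1, and a simulation no better than the mean of the transformed observations gives 0. The square-root transform reduces the weight of the largest flows in the criterion.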

Regionalisation approach
By definition, no discharge data are available for calibrating parameter sets at ungauged locations. Therefore, other strategies are needed to estimate the parameters of hydrological models for prediction in ungauged catchments. Oudin et al. (2008) assessed the relative performance of three classical regionalisation schemes over a set of French catchments: spatial proximity, physical similarity and regression. Several options within each regionalisation approach were tested and compared. Based on their results, the following choices were made here for the regionalisation approach, as they offered the best regionalisation solution.
- Poorly modelled catchments were discarded as potential donors: only catchments with a performance criterion NSVQ in calibration above 0.7 were used as possible donors.
- The spatial proximity approach was used. It consists in transferring parameter sets from neighbouring catchments to the target ungauged catchment. The proximity of the catchments to the gauged catchments was quantified by the distances between catchment centroids.
- The output averaging option was chosen. It consists in computing the mean of the streamflow simulations obtained on the ungauged catchment with the set of parameters of the donor catchments.
- The number of neighbours was set to four and seven catchments for GR4J and TOPMO, respectively, following the work reported by Oudin et al. (2008).
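The regionalisation choices listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tuple layout `(name, x, y, nsvq)` and the use of Euclidean distance between centroid coordinates are hypothetical, and running the rainfall-runoff model with each donor parameter set is left out.

```python
import math


def select_donors(target_xy, candidates, n, min_nsvq=0.7):
    """Return the names of the n nearest well-modelled gauged catchments.

    candidates: list of (name, x, y, nsvq) tuples, where x and y are
    centroid coordinates and nsvq is the calibration performance.
    """
    # Discard poorly modelled catchments (NSVQ in calibration not above 0.7)
    eligible = [c for c in candidates if c[3] > min_nsvq]
    # Rank the remaining catchments by centroid-to-centroid distance
    eligible.sort(key=lambda c: math.dist(target_xy, (c[1], c[2])))
    return [c[0] for c in eligible[:n]]


def output_average(simulations):
    """Average several simulated discharge series time step by time step."""
    return [sum(values) / len(values) for values in zip(*simulations)]
```

With this layout, `output_average` would be given one simulated series per donor parameter set, all produced with the climatic inputs of the ungauged catchment.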
Proposed approach: transfer of relative errors by flow groups

Description of the method
Transferring calibrated model parameters from gauged catchments to ungauged catchments is a well-established approach when parameters cannot be inferred from available data. The method presented here extends the parameter transfer approach to the domain of uncertainty estimation.
The main ideas underlying the proposed approach are to (i) treat each donor as if it were ungauged (simulating flow through the above-described regionalisation approach), (ii) characterise the empirical distribution of relative errors (understood as the ratio between observed and simulated flows, i.e. considering a multiplicative model error) for each of these donors, and (iii) transfer global uncertainty estimates to the ungauged catchment.
The methodology used to transfer global uncertainty estimates can be described by the following steps, illustrated in Fig. 1.

1. Selection of catchments
Here we consider a target ungauged catchment (TUC). This catchment has $n$ neighbouring gauged catchments, called $\mathrm{NGC}_1$, $\mathrm{NGC}_2$, ..., $\mathrm{NGC}_n$, which will be considered as donors for the TUC. For the $i$th catchment $\mathrm{NGC}_i$, one can also select $n$ neighbouring catchments, denoted $\mathrm{NGC}_{i1}$, $\mathrm{NGC}_{i2}$, ..., $\mathrm{NGC}_{in}$, which can be considered as donors for $\mathrm{NGC}_i$. Obviously, the TUC is excluded from this set of second-order donor catchments.
2. Application of the parameter regionalisation scheme to the donor catchments $\mathrm{NGC}_i$
a. Apply the parameter regionalisation scheme to obtain a simulated discharge time series for each $\mathrm{NGC}_i$ using neighbours $\mathrm{NGC}_{ij}$ (with $j$ between 1 and $n$).
b. Compute the relative errors of discharge reconstitution (i.e. the ratio between observed and simulated discharges) by comparing simulated and observed discharge series for catchment $\mathrm{NGC}_i$ and create 10 groups of relative errors according to the magnitude of the simulated discharge. To ensure that each group contains the same number of points, we split the simulated discharge range into 10 subgroups of equal size, using the deciles of the simulated discharge distribution. Using several flow groups allows taking into account the possible variability of model error characteristics.
3. Computation of the multiplicative coefficients applicable to simulated discharge
a. Put together all the relative errors from the donors $\mathrm{NGC}_i$ (with $i$ between 1 and $n$) according to the group they belong to; i.e. all relative errors of group $k$ of the $n$ donors are assembled into a master group $k$. This is done for $k$ between 1 and 10.
b. Compute the empirical quantiles of the relative error distribution within each master group $k$ (with $k$ between 1 and 10). Since relative errors were computed (i.e. the ratio of observed to simulated discharge values), each quantile of relative errors can be considered a multiplicative coefficient applicable to the simulated discharge. These multiplicative coefficients will be used to obtain the prediction bounds.
4. Computation of the uncertainty bounds for the TUC
a. Apply the parameter regionalisation scheme to obtain a simulated discharge time series for the TUC using the parameter sets of the $n$ neighbouring gauged catchments $\mathrm{NGC}_1$, $\mathrm{NGC}_2$, ..., $\mathrm{NGC}_n$.
b. Multiply the simulated discharge by the set of multiplicative coefficients obtained at Step 3b to obtain the uncertainty bounds. The coefficients calculated for the group $k$ are used when the simulated discharge belongs to the group $k$.
Note that we based our approach on multiplicative errors and not on additive errors because using multiplicative coefficients yields prediction bounds for discharge that are always positive, whereas this might not always be the case with additive errors.
Finally, we mention that the choice to use 10 groups reflects a trade-off between the number of points available to obtain reasonable estimates of empirical quantiles computed for each group and an adequate treatment of the variability of the characteristics of errors with the magnitude of simulated discharge. A larger (lower) number of groups could obviously be used if more (fewer) data are available (see discussion in Sect. 5.3) or based on the analysis of the statistical properties of errors.
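The four steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study. All function names are hypothetical, the linear-interpolation quantile rule is an assumption (the text does not specify one), time steps with zero simulated discharge are skipped, and the flow group of each time step at the target catchment is assumed to be defined by the deciles of the target's own simulated discharge.

```python
from bisect import bisect_right


def quantile(values, p):
    """Empirical quantile with linear interpolation (assumed rule)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def group_edges(sim, n_groups=10):
    """Inner thresholds splitting simulated flows into equal-size groups."""
    return [quantile(sim, k / n_groups) for k in range(1, n_groups)]


def errors_by_group(obs, sim, n_groups=10):
    """Step 2b: relative errors obs/sim of one donor, binned by flow group."""
    edges = group_edges(sim, n_groups)
    groups = [[] for _ in range(n_groups)]
    for o, s in zip(obs, sim):
        if s > 0:  # the ratio is undefined for zero simulated discharge
            groups[bisect_right(edges, s)].append(o / s)
    return groups


def pooled_coefficients(donor_series, n_groups=10, level=0.90):
    """Step 3: pool errors of all donors per group, then take quantiles.

    donor_series: list of (obs, sim) pairs, where sim was obtained by
    treating each donor as ungauged (Step 2a, not shown here).
    """
    master = [[] for _ in range(n_groups)]
    for obs, sim in donor_series:
        for k, errs in enumerate(errors_by_group(obs, sim, n_groups)):
            master[k].extend(errs)
    tail = (1.0 - level) / 2.0
    return [(quantile(g, tail), quantile(g, 1.0 - tail)) for g in master]


def uncertainty_bounds(sim_target, coefficients):
    """Step 4b: multiply simulated discharge by the group coefficients."""
    edges = group_edges(sim_target, len(coefficients))
    bounds = []
    for s in sim_target:
        lower_coef, upper_coef = coefficients[bisect_right(edges, s)]
        bounds.append((lower_coef * s, upper_coef * s))
    return bounds
```

Because the coefficients are quantiles of positive ratios, the resulting bounds stay positive. The sketch assumes every master group receives at least one relative error, which holds when the donor series are long enough.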

Why should donors be considered as ungauged?
The first step deserves a brief explanation. The choice to treat donors as ungauged is related to the well-known fact that the performance of rainfall-runoff models decreases when they are applied at ungauged locations with a regionalisation scheme, compared to when local data are available for parameter estimation. The quadratic efficiency criterion used here is the C2M (Mathevet et al., 2006), a bounded version of the Nash and Sutcliffe (1970) efficiency (NSE) criterion. The criterion is based solely on the simulated discharges of the deterministic rainfall-runoff model and is completely independent of the application of the uncertainty method. The equations are

$$\mathrm{C2M} = \frac{\mathrm{NSE}}{2 - \mathrm{NSE}} \tag{1}$$

and

$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left(Q^{\mathrm{obs}}_t - Q^{\mathrm{sim}}_t\right)^2}{\sum_{t=1}^{T} \left(Q^{\mathrm{obs}}_t - \mu_o\right)^2},$$

where $T$ is the total number of time steps, $Q^{\mathrm{obs}}_t$ and $Q^{\mathrm{sim}}_t$ are the observed and simulated discharge, respectively, at time step $t$, and $\mu_o$ is the mean of the observed discharges. This bounded version has the advantage of avoiding large negative values which are difficult to interpret.
Figure 3 illustrates the general performance decrease for both models on our catchment set when a regionalisation scheme is used instead of a parameter estimation based on local data. As a consequence, we should expect predictive uncertainty at ungauged locations to be larger than predictive uncertainty at gauged locations, i.e. when the rainfall-runoff model is calibrated with observed discharge data. That is why donors must be considered as ungauged. We will come back to this important point in Sect. 5.
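For readers who wish to reproduce this kind of comparison, a minimal sketch of the bounded criterion is given below. It assumes the usual definition C2M = NSE / (2 − NSE); the function names are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency on untransformed discharge."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def c2m(obs, sim):
    """Bounded version of NSE, taking values in (-1, 1] (assumed definition)."""
    e = nse(obs, sim)
    return e / (2.0 - e)
```

A perfect simulation gives 1, a simulation equivalent to the observed mean gives 0, and very poor simulations tend towards −1 instead of taking arbitrarily large negative values.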

Quantitative evaluation of uncertainty bounds
We assessed the relevance of the 90 % uncertainty bounds by focusing on three characteristics: reliability, sharpness and overall skill. A general introduction to probabilistic evaluation can be found in Gneiting et al. (2007) and Wilks (2011), and in Franz and Hogue (2011) for a more hydrological perspective.
Reliability refers to the statistical consistency of the uncertainty estimation with the observation, i.e. a 90 % prediction interval is expected to contain approximately 90 % of the observations if prediction errors are adequately characterised by the uncertainty estimation. To estimate reliability, we used the coverage ratio (CR) index, computed as the percentage of observations contained in the prediction intervals.
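A minimal sketch of the coverage ratio is given below. The function name is hypothetical, and observations lying exactly on a bound are counted as covered, which is an assumption of this example.

```python
def coverage_ratio(obs, lower, upper):
    """Percentage of observations falling inside the prediction intervals."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return 100.0 * inside / len(obs)
```

For 90 % prediction bounds, values close to 90 would indicate reliable intervals.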
Sharpness refers to the concentration of predictive uncertainty. The average width (AW) of the uncertainty bounds is widely used to quantify sharpness:

$$\mathrm{AW} = \frac{1}{T} \sum_{t=1}^{T} \left(Q^{u}_t - Q^{l}_t\right),$$

where $Q^{l}_t$ and $Q^{u}_t$ are, respectively, the lower and upper bounds of the prediction interval $[Q^{l}_t, Q^{u}_t]$ at time step $t$. To ease comparison between catchments, we used the width of the 90 % interval

$$\mathrm{AW}_{\mathrm{clim}} = Q_{95} - Q_{5},$$

where $Q_5$ and $Q_{95}$ are the 5th and 95th percentiles of the flow duration curve. This value characterises the natural variability of the flows for a given catchment and has the same unit as the average width of the uncertainty bounds. It can be viewed as the average width of the uncertainty bounds of a climatological prediction, where the uncertainty bounds are constant over time and defined by the interval $[Q_5, Q_{95}]$. A graphical illustration is given in Fig. 2.
Comparing the two values AW and $\mathrm{AW}_{\mathrm{clim}}$ leads to the following dimensionless criterion called the average width index (AWI):

$$\mathrm{AWI} = 1 - \frac{\mathrm{AW}}{\mathrm{AW}_{\mathrm{clim}}}.$$

It is positive if the uncertainty obtained by applying the rainfall-runoff model and the methodology presented here is reduced compared to the climatology, and negative otherwise.
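The sharpness criterion can be sketched as follows, taking AWI as one minus the ratio of AW to the climatological width. The function names and the linear-interpolation rule used for the 5th and 95th percentiles of observed flows are assumptions of this example.

```python
def quantile(values, p):
    """Empirical quantile with linear interpolation (assumed rule)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def average_width(lower, upper):
    """Mean width of the prediction intervals over the time series."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)


def average_width_index(obs, lower, upper):
    """Relative reduction of interval width compared to [Q5, Q95]."""
    aw_clim = quantile(obs, 0.95) - quantile(obs, 0.05)
    return 1.0 - average_width(lower, upper) / aw_clim
```

A value of 0.5 means that the prediction intervals are on average half as wide as the climatological interval.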
Uncertainty bounds that are as sharp as possible and reasonably reliable are sought: sharp intervals that would consistently miss the target would be misleading, while overly large intervals that would successfully cover the observations at the expense of sharpness would be of limited value for decision making.
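To complete the assessment of the prediction bounds, we used the interval score (Gneiting and Raftery, 2007). The interval score (IS) accounts for both the width of an uncertainty bound and the position of the observed value compared to the uncertainty bound. The scoring rule of the interval score at time step $t$ is defined as

$$S_t = \left(Q^{u}_t - Q^{l}_t\right) + \frac{2}{1-\beta}\left(Q^{l}_t - Q^{\mathrm{obs}}_t\right)\mathbb{1}\{Q^{\mathrm{obs}}_t < Q^{l}_t\} + \frac{2}{1-\beta}\left(Q^{\mathrm{obs}}_t - Q^{u}_t\right)\mathbb{1}\{Q^{\mathrm{obs}}_t > Q^{u}_t\},$$

where $Q^{\mathrm{obs}}_t$ is the value observed at time step $t$, $Q^{l}_t$ and $Q^{u}_t$ are the lower and upper bounds of the prediction interval, $\mathbb{1}\{\cdot\}$ is the indicator function, and $\beta$ is equal to 90 % since a 90 % interval is sought here. See Fig. 2 for an illustration of how $S$ is computed.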
Figure 2. Illustration of the evaluation of the uncertainty bounds. $Q_5$ and $Q_{95}$ are the 5th and 95th percentiles of the flow duration curve. $S$ is the interval score defined at one time step for the situation where the observed value is above the upper limit of the uncertainty bound, with the penalty factor $k = 2/(1-\beta)$ equal to 20 because a 90 % interval is given. See the text for further details.
IS is the average value of $S_t$ over the time series:

$$\mathrm{IS} = \frac{1}{T} \sum_{t=1}^{T} S_t.$$

To ease comparison between catchments and evaluate the skill of the prediction bounds, we used the 90 % interval $[Q_5, Q_{95}]$ as a benchmark, similar to what we did for the sharpness index. The climatological prediction gives uncertainty bounds that are constant in time and defined by the interval $[Q_5, Q_{95}]$, where $Q_5$ and $Q_{95}$ are the 5th and 95th percentiles of the flow duration curve. Thus we computed the interval skill score:

$$\mathrm{ISS} = 1 - \frac{\mathrm{IS}}{\mathrm{IS}_{\mathrm{clim}}},$$

where $\mathrm{IS}_{\mathrm{clim}}$ is the interval score obtained with the 90 % interval $[Q_5, Q_{95}]$. Using skill scores is a very common approach in probabilistic forecasting. Dimensionless scores can thus be obtained, in much the same way as the computation of the well-known NSE criterion for assessing deterministic performance.
The interval skill score (ISS) is positive when the prediction bounds are more skilful than climatology, and negative otherwise.The best value that can be achieved is equal to 1.
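The interval score and the associated skill score can be sketched as below, following the form of the interval score given by Gneiting and Raftery (2007) with a penalty factor of 2/(1 − β). The function names are hypothetical, and the climatological bounds `q5` and `q95` are passed in as precomputed values.

```python
def interval_score(obs, lower, upper, beta=0.90):
    """Mean interval score: interval width plus penalties for misses."""
    k = 2.0 / (1.0 - beta)  # about 20 for a 90 % interval
    total = 0.0
    for o, lo, up in zip(obs, lower, upper):
        score = up - lo
        if o < lo:
            score += k * (lo - o)  # observation below the lower bound
        elif o > up:
            score += k * (o - up)  # observation above the upper bound
        total += score
    return total / len(obs)


def interval_skill_score(obs, lower, upper, q5, q95, beta=0.90):
    """Skill relative to the constant climatological interval [q5, q95]."""
    n = len(obs)
    is_model = interval_score(obs, lower, upper, beta)
    is_clim = interval_score(obs, [q5] * n, [q95] * n, beta)
    return 1.0 - is_model / is_clim
```

Lower interval scores are better, so the skill score increases when the prediction bounds are narrower than the climatological interval while still covering the observations.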

Reliability, sharpness and overall skill
Figure 4 shows the distributions of the three criteria used to evaluate the uncertainty bounds on the 907 catchments. Boxplots (5th, 25th, 50th, 75th and 95th percentiles) are used to summarise the variety of scores over the 907 catchments of the data set.

Figure 3. Impact of the regionalisation scheme on deterministic performance (panels: GR4J, TOPMO), as quantified by the bounded C2M efficiency criterion. Note that in a very few cases, the performance obtained with the regionalisation scheme is better than the performance obtained with calibration. This is possible because of the output averaging option used by the regionalisation scheme.

Reliability
For both models, half of the catchments (from the lower quartile to the upper quartile) have CR values between 80 and 95 %. The median values are equal to 89 and 90 % for GR4J and TOPMO, respectively. Since a value of 90 % is expected for 90 % prediction bounds, these results suggest that the prediction bounds are in a majority of cases able to reflect the magnitude of errors when predicting runoff hydrographs in ungauged catchments, even though it is clear that the perfect value of 90 % is not reached in most cases.
The CR values fall below 70 % for around 14 % of the catchments with GR4J and 13 % with TOPMO, which indicates cases where the proposed approach yields predictive bounds that are clearly too narrow or biased (i.e. not well centred on the observations). Note that we did not find any guidance on how to properly evaluate the CR values in the literature. The results presented here may be used as a benchmark to comparatively assess the ranges of values that would be obtained in future studies.

Sharpness
Regarding sharpness, it can be seen that for GR4J, half of the catchments (from the lower quartile to the upper quartile) have AWI values between 39 and 67 %, while for TOPMO corresponding values are equal to 38 and 65 %. The median values are equal to 57 and 55 % for GR4J and TOPMO, respectively. The higher the AWI values, the lower the predictive uncertainty is. Since it would be unrealistic to expect that no errors will be made when predicting runoff hydrographs for ungauged catchments, we could consider here uncertainty reduction values between 30 and 80 % as quite satisfactory, even though we recognise that this statement is arbitrary since there are no widely agreed values to base our analysis on.
Note that negative values are seen for 7 % of the catchments with both GR4J and TOPMO, which indicates cases where the approach yields prediction intervals whose average width is larger than the width of the $[Q_5, Q_{95}]$ interval ($Q_5$ and $Q_{95}$ are the 5th and 95th percentiles of the flow duration curve).

Overall skill
Finally, Fig. 4c shows that the predictive skill for both models is positive for most catchments. For both models, half of the catchments (from the lower quartile to the upper quartile) have ISS values between 40 and 70 %. The median values are equal to 61 and 59 % for GR4J and TOPMO, respectively. While it might be argued that the unconditional climatology is not a very challenging benchmark, we consider that it is still a positive and reassuring result.

Do we need to treat the donor catchments as ungauged?
As mentioned earlier, a critical step of the proposed approach is to apply the regionalisation scheme to obtain a simulated discharge time series for each donor catchment (Step 2a). To assess the impact of this methodological choice, we applied the methodology described earlier to transfer uncertainty estimates, but this time the donor catchments are treated as gauged. Similar to Fig. 4, Fig. 5 shows the distributions of the three criteria obtained in the two cases: whether or not the donor catchments are treated as ungauged. We can observe a drop in reliability for both models, whereas sharpness increases. This is because the relative errors are smaller when the donor catchments are treated as gauged, yielding narrower but less reliable prediction bounds for the target catchment. Interestingly, this results in skill scores that are quite similar: improvements in terms of sharpness compensate for decreases in terms of reliability.
Note that reliability is generally considered as a prevailing characteristic over sharpness, since it reflects the ability of the uncertainty method to adequately reflect the magnitude of errors we might expect at locations for which prediction is done. Therefore, the benefit of treating the donor catchments as ungauged clearly appears in Fig. 5a, illustrating the theoretical argument presented in the methodological section.

Do we need to use groups of relative errors?
Another critical step of the proposed approach is to use 10 groups of relative errors. The groups are defined according to the magnitude of the simulated discharge (Step 2b). This was done to take into account the fact that the characteristics of errors usually change according to the magnitude of the simulated discharge. To assess the impact of this methodological choice, we again applied the methodology described earlier to transfer global uncertainty estimates, but this time using only one group instead of 10.
Figure 6 shows the distributions of the three criteria obtained in the following two cases: whether 10 groups or only one group of relative errors are used. For both models, reliability slightly increases when going from 10 groups to a single group, whereas both sharpness and skill decrease. It appears that improvements in terms of reliability are not sufficient to compensate for decreases in terms of sharpness when overall skill is considered. This can be understood by the fact that considering a single group instead of a few groups widens the uncertainty bounds on average, since the errors are generally heteroscedastic.
Obviously, although it appears that a single group is not enough to account for the variability of properties of relative errors, 10 groups may not provide significant performance gains and a compromise may be sought. Inspection of scatter plots between relative errors and simulated discharge reveals that the shapes can be very different between catchments, hence potentially requiring different numbers of groups. Moreover, the simulation objectives, e.g. simulating intermediate or extreme flows, may also be considered when choosing the number of flow groups. Hence it appears that the number of groups may need further trial-and-error tests in specific applications to obtain the best compromise.
Although our tests reveal that the number of groups is a sensitive setting of the method, further research would be needed to evaluate whether different numbers of groups can be advised for specific objectives or conditions.

How does the performance of the rainfall-runoff models relate to the characteristics of uncertainty bounds?
To gain insights into the possible relationships between the performance of the deterministic rainfall-runoff simulations and the characteristics of the uncertainty bounds at ungauged locations, the three criteria used to characterise the uncertainty bounds are plotted in Fig. 7 as a function of a quadratic efficiency criterion for the 907 catchments, the C2M defined in Eq. (1). A trend appears between deterministic performance and characteristics of the prediction bounds at ungauged locations, for the two rainfall-runoff models. The reliability index exhibits greater variability compared to the sharpness index, and the strongest link is seen for the skill score. Reliability is relatively less affected by the poor deterministic performance of the simulation at an ungauged location because there are cases where poor performance at neighbouring locations leads (through the transfer of relative errors) to wide prediction bounds that are able to cover the observed values.
We can also observe that skill scores and C2M scores are strongly related, which indicates that when the transfer of model parameters performs well, the transfer of global uncertainty estimates also performs well.

How does the method perform in data-sparse conditions?
The results presented so far were obtained with a dense network of gauging stations. To investigate the impact of the network density on our results, we applied a demanding test called the hydrometrical desert. It consists in excluding potential donors that are closer to the target ungauged catchment than a given threshold. For example, a threshold distance of 100 km means that the closest donor catchment must be at least 100 km away from the ungauged target catchment. This test results in a notable decrease of deterministic performance, as shown in Table 2, where the mean of the C2M efficiency criterion over the 907 catchments is reported for both models. Note that this is a more demanding test than a decrease of network density, because with a sparser network some catchments would still retain the possibility of having close donors.
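The donor exclusion rule of the hydrometrical desert can be sketched as follows. The function name, the tuple layout `(name, x_km, y_km)` and the use of Euclidean distance between centroid coordinates expressed in kilometres are assumptions of this example.

```python
import math


def desert_donors(target_xy, candidates, n, threshold_km):
    """Return the n nearest donors located at least threshold_km away.

    candidates: list of (name, x_km, y_km) tuples for gauged catchments.
    """
    ranked = []
    for name, x, y in candidates:
        distance = math.dist(target_xy, (x, y))
        # Donors closer than the threshold are excluded from the transfer
        if distance >= threshold_km:
            ranked.append((distance, name))
    ranked.sort()
    return [name for _, name in ranked[:n]]
```

Increasing `threshold_km` forces the transfer to rely on more distant donors, which mimics a region without nearby gauging stations around the target catchment.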
Figure 8 shows the distributions of the three criteria obtained by applying the hydrometrical desert with threshold values of 10, 20, 50, 100 and 200 km, respectively. A clear decrease in all three criteria appears with increasing distances. While we should expect that the sharpness of the uncertainty bounds decreases because of larger errors, and that this situation leads to a decrease of skill, the results in terms of reliability reveal the limitation of the method. With increasing distances, the method becomes less able to transfer the appropriate magnitude of the larger errors.

Conclusions
Runoff hydrograph prediction in ungauged catchments is notoriously difficult, and attempting to estimate the predictive uncertainty in that context is a further challenge. We have proposed a method allowing the transfer of global uncertainty estimates from gauged to ungauged catchments. The method extends the parameter transfer approach to the domain of global uncertainty estimation. We evaluated the approach over a large set of 907 catchments by assessing three expected qualities of the uncertainty estimate: reliability, sharpness and overall skill. We applied two different rainfall-runoff models (GR4J and TOPMO) to ensure that the results presented are not model-specific. Nonetheless, the following limitations to the study can be mentioned.
1. Although the approach seems promising on average on the large catchment set we used, it is not able to adequately quantify the predictive uncertainty for some catchments and it failed in some cases.
2. The method might not perform well in regions with sparser gauging networks than the one used here, as revealed by the application of a demanding test called the hydrometrical desert.
3. We only tested the 90 % prediction intervals, whereas the method could be applied to obtain other prediction intervals.We made this choice to keep the article as simple as possible, but further work could be done in that direction.
4. We also noted that the number of flow groups used in the approach may be a sensitive setting of the method, and further research would be needed to provide more detailed guidance on this point depending on the structure of the model errors and the modelling objectives.
It is worth stressing that although we used a transfer based on spatial proximity, the approach presented in this article is not only independent of the rainfall-runoff model but also of the regionalisation scheme used to obtain deterministic prediction at ungauged locations. Any other similarity measure could be a basis for transferring residual errors, including physically based similarity measures. Accordingly, the regionalisation settings, including the output averaging option, could be adapted if deemed more appropriate.
Since we believe that uncertainty quantification has to be considered in any modelling study, further work should be devoted to the search for similarity measures that not only perform well in allowing the transfer of parameter sets from donor to target catchments, but also allow transferring modelling error characteristics.
Last, we would like to stress that the results presented in this study are expressed in terms of dimensionless measures. As such, they can provide a basis for future comparisons when prediction in ungauged catchments with uncertainty estimates is performed.

Figure 1. Illustration of the proposed approach, in the case of n = 4 donors. Red catchments are first-level donors while green catchments are second-level donors. For Step 2b, the simulated discharge variable (x axis) is split into 10 equal-size groups. In Step 3, white dots represent the values of the upper and lower multiplicative coefficients for each group. See the text for the description of the four steps.

Figure 4. Distributions of the three performance criteria. Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

Figure 5. Distributions of the three performance criteria, obtained in two cases: (i) when the donor catchments are treated as ungauged (solid lines) and (ii) when the donor catchments are treated as gauged (dashed lines). Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

Figure 6. Distributions of the three performance criteria, obtained in two cases: (i) when 10 groups of relative errors are used (solid lines) and (ii) when only one group is used (dashed lines). Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

Figure 7. Impact of deterministic performance, as quantified by the bounded C2M quadratic criterion, on the three performance criteria for the 907 catchments. Note that for easier visualisation, the lower limits of the AWI (b) and ISS (c) values are set to −100 % but lower AWI values are obtained in seven cases for both models, and lower ISS values are obtained in 18 and 22 cases for GR4J and TOPMO, respectively.

Figure 8. Impact of the hydrometrical desert on the distributions of the three performance criteria. Potential donor catchments are not retained as donors when their distance to the target catchment is below 10, 20, 50, 100 and 200 km. Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.