Predicting streamflow hydrographs in ungauged catchments is challenging, and accompanying the estimates with realistic uncertainty bounds is an even more complex task. In this paper, we present a method to transfer global uncertainty estimates from gauged to ungauged catchments and test it over a set of 907 catchments located in France, using two rainfall–runoff models. We evaluate the quality of the uncertainty estimates based on three expected qualities: reliability, sharpness, and overall skill. We also evaluate the robustness of the method to the availability of information on gauged catchments using a hydrometrical desert approach. Our results show that the method is promising, providing reliable and sharp uncertainty bounds at ungauged locations in a majority of cases.

Predicting the entire runoff hydrograph in ungauged catchments is
a challenge that has attracted much attention during the last
decade. In this context, the use of suitable conceptual rainfall–runoff
models has proved to be useful, and because traditional calibration
approaches based on observed discharge data cannot be applied to ungauged
catchments, other approaches are required. Various methods have been
proposed for the estimation of rainfall–runoff model parameters in
ungauged catchments, as reported by the recent summary of the Prediction
in Ungauged Basins (PUB) decade

The estimation of predictive uncertainty is deemed good practice in any
environmental modelling activity

For gauged catchments, the methodologies include Bayesian approaches

While many methods have been proposed for gauged catchments, only a few
have been proposed for the estimation of predictive uncertainty on
ungauged catchments.

One difficulty of the above-mentioned approaches lies in the
interpretation of the uncertainty bounds obtained from the parameter
ensemble predictions. As noted by

A pragmatic alternative consists in addressing the parameter
estimation and the global uncertainty estimation issues separately.
It has been argued by several authors

As stated by

The historical model residuals (errors) between the model prediction

Similarly, one could argue that the model residuals between the model's
prediction and the observed data at

The only attempt to apply a global uncertainty estimation
approach at ungauged locations that we are aware of
is the one presented by

The aim of this paper is to provide an estimation of the global uncertainty affecting runoff prediction at ungauged locations when a rainfall–runoff model and a regionalisation scheme are used.

To our knowledge, a framework based on residual errors and global uncertainty quantification has not yet been extensively tested in the context of prediction in ungauged catchments. This paper contributes to the search for methods able to provide uncertainty estimates when runoff predictions in ungauged catchments are sought.

Our objective is not to develop a new parameter regionalisation approach.
Therefore, we purposely chose to use ready-to-use materials and methods
and only focus on the uncertainty quantification issue. This study can be
considered as a follow-up of the work by

A database of 907 French catchments was used. They represent various
hydrological conditions, given the variability in climate, topography and
geology in France. This set includes fast-responding Mediterranean
catchments with intense precipitation as well as larger,
groundwater-dominated catchments.
Some characteristics of the data set are given in Table

Characteristics of the 907 catchments.

Two daily, continuous lumped rainfall–runoff models were used.

The GR4J rainfall–runoff model, an efficient and parsimonious daily
lumped continuous rainfall–runoff model described by

The TOPMO rainfall–runoff model, inspired by TOPMODEL

Using these two models rather than a single one makes it possible to draw more general conclusions. The two models use a soil moisture accounting procedure as well as routing stores. However, they differ markedly in the formulation of their functions. While the GR4J model uses two non-linear stores and a unit-hydrograph, the TOPMO model uses a linear and an exponential store, and a pure time delay.

The GR4J and TOPMO models have four and six free parameters, respectively.
On gauged catchments, parameter estimation is performed using a local
gradient search procedure, applied in combination with pre-screening of
the parameter space

By definition, no discharge data are available for calibrating parameter sets at ungauged locations. Therefore, other strategies are needed to estimate the parameters of hydrological models for prediction in ungauged catchments.

Poorly modelled catchments were discarded as potential donors: only catchments with a performance criterion NSVQ in calibration above 0.7 were used as possible donors.

The spatial proximity approach was used. It consists in transferring parameter sets from neighbouring catchments to the target ungauged catchment. The proximity of the target catchment to the gauged catchments was quantified by the distance between catchment centroids.

The output averaging option was chosen. It consists in computing the mean of the streamflow simulations obtained on the ungauged catchment with the set of parameters of the donor catchments.
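The donor screening, spatial proximity selection and output averaging described above can be sketched as follows. This is only an illustrative sketch: the dictionary keys (`centroid`, `nsvq`, `params`), the function names and the `run_model` callable standing in for a rainfall–runoff model are hypothetical, and centroid coordinates are assumed to be expressed in a projected system so that Euclidean distances are meaningful.

```python
import math


def nearest_donors(target_xy, donors, n_neighbours, min_nsvq=0.7):
    """Return the closest donors (by centroid distance) that pass the screening."""
    # Poorly modelled catchments are discarded as potential donors.
    eligible = [d for d in donors if d["nsvq"] > min_nsvq]
    # Rank the remaining donors by distance between catchment centroids.
    eligible.sort(key=lambda d: math.dist(target_xy, d["centroid"]))
    return eligible[:n_neighbours]


def output_average(simulations):
    """Average the donor-based simulations time step by time step."""
    n = len(simulations)
    return [sum(values) / n for values in zip(*simulations)]


def simulate_ungauged(target_xy, forcing, donors, run_model, n_neighbours):
    """Run the model on the target with each donor parameter set, then average."""
    chosen = nearest_donors(target_xy, donors, n_neighbours)
    simulations = [run_model(d["params"], forcing) for d in chosen]
    return output_average(simulations)
```

Here `run_model(params, forcing)` is assumed to return a list of simulated discharges, one per time step. Averaging the outputs rather than the parameters avoids combining parameter values that were calibrated jointly on different catchments.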

The number of neighbours was set to four and seven catchments for GR4J and
TOPMO, respectively, following the work reported by

Transferring calibrated model parameters from gauged catchments to ungauged catchments is a well-established approach when parameters cannot be inferred from available data. The method presented here extends the parameter transfer approach to the domain of uncertainty estimation.

The main ideas underlying the proposed approach are to (i) treat each donor as if it were ungauged (simulating flow through the above-described regionalisation approach), (ii) characterise the empirical distribution of relative errors (understood as the ratio between observed and simulated flows, i.e. considering a multiplicative model error) for each of these donors, and (iii) transfer global uncertainty estimates to the ungauged catchment.

Illustration of the proposed approach, in the case of

The methodology used to transfer global uncertainty estimates can be
described by the following steps, illustrated in Fig.

Step 1. Selection of catchments. Here we consider a target ungauged catchment (TUC).
This catchment has

Step 2. Application of the parameter regionalisation scheme to the donor
catchments NGC

Step 2a. Apply the parameter regionalisation scheme to obtain a simulated
discharge time series for each NGC

Step 2b. Compute the relative errors of discharge reconstitution
(i.e. the ratio between observed and simulated discharges)
by comparing simulated and observed discharge series for catchment NGC

Step 3. Computation of the multiplicative coefficients applicable to simulated discharge

Step 3a. Put together all the relative errors from the donors
NGC

Step 3b. Compute the empirical quantiles of the relative error distribution
within each master group

Step 4. Computation of the uncertainty bounds for the TUC

Step 4a. Apply the parameter regionalisation scheme to obtain a simulated
discharge time series for the TUC using the parameter sets of the

Step 4b. Multiply the simulated discharge by the set of multiplicative
coefficients obtained at Step 3b to obtain the uncertainty bounds.
The coefficients calculated for the group

Note that we based our approach on multiplicative errors and not on additive errors because using multiplicative coefficients yields prediction bounds for discharge that are always positive, whereas this might not always be the case with additive errors.

Finally, we mention that the choice to use 10 groups reflects a trade-off between the number of points available to obtain reasonable estimates of the empirical quantiles computed for each group and an adequate treatment of the variability of error characteristics with the magnitude of simulated discharge. A larger (smaller) number of groups could be used if more (fewer) data are available (see discussion in Sect. 5.3) or based on the analysis of the statistical properties of errors.
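The computation of the multiplicative coefficients and of the resulting bounds can be sketched as below. Several choices are assumptions made for illustration rather than details confirmed by the text: the groups are taken to contain equal numbers of points after sorting by simulated discharge, quantiles use linear interpolation, time steps with zero simulated discharge are skipped, and a target discharge above the largest pooled value falls back to the last group. The function names are hypothetical.

```python
def empirical_quantile(sorted_values, prob):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = prob * (len(sorted_values) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_values) - 1)
    return sorted_values[low] + (pos - low) * (sorted_values[high] - sorted_values[low])


def fit_multipliers(sim, obs, n_groups=10, probs=(0.05, 0.95)):
    """Pool donor (simulated, observed) pairs and derive coefficients per flow group.

    Returns one (upper simulated discharge, (lower coef, upper coef)) per group.
    """
    # Relative error = observed / simulated (multiplicative error model).
    pairs = sorted((s, o / s) for s, o in zip(sim, obs) if s > 0)
    if len(pairs) < n_groups:
        raise ValueError("not enough data for the requested number of groups")
    groups = []
    for g in range(n_groups):
        # Assumption: equal-count groups ordered by simulated discharge.
        start = g * len(pairs) // n_groups
        end = (g + 1) * len(pairs) // n_groups
        chunk = pairs[start:end]
        ratios = sorted(r for _, r in chunk)
        coefs = tuple(empirical_quantile(ratios, p) for p in probs)
        groups.append((chunk[-1][0], coefs))
    return groups


def prediction_bounds(sim_target, groups):
    """Multiply each simulated discharge by the coefficients of its flow group."""
    bounds = []
    for q in sim_target:
        # First group whose upper limit covers q; otherwise the last group.
        coefs = next((c for upper, c in groups if q <= upper), groups[-1][1])
        bounds.append((q * coefs[0], q * coefs[1]))
    return bounds
```

With the default `probs=(0.05, 0.95)` the two coefficients delimit a 90 % interval. Because the coefficients are ratios of non-negative discharges, the resulting bounds cannot become negative.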

The first step deserves a brief explanation. The choice to treat donors
as ungauged is related to the well-known fact that the performance of
rainfall–runoff models decreases when they are applied at ungauged
locations with a regionalisation scheme, compared to when local
data are available for parameter estimation. The quadratic efficiency
criterion used here is the C2M

Figure

We assessed the relevance of the 90 % uncertainty bounds by
focusing on three characteristics: reliability, sharpness and overall
skill. A general introduction to probabilistic evaluation can be found in

Reliability refers to the statistical consistency of the uncertainty estimation with the observation, i.e. a 90 % prediction interval is expected to contain approximately 90 % of the observations if prediction errors are adequately characterised by the uncertainty estimation. To estimate reliability, we used the coverage ratio (CR) index, computed as the percentage of observations contained in the prediction intervals.
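As a minimal sketch, the coverage ratio can be computed as follows. The function name is hypothetical, and observations lying exactly on a bound are counted as covered, which is an assumption.

```python
def coverage_ratio(obs, lower, upper):
    """Percentage of observations falling inside the prediction intervals."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return 100.0 * inside / len(obs)
```

For 90 % prediction bounds, values close to 90 indicate reliable intervals.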

Sharpness refers to the concentration of predictive uncertainty. The
average width (AW) of the uncertainty bounds is widely used to quantify sharpness:

To ease comparison between catchments, we used the width of the
90 % interval [

Comparing the two values AW and AW

Uncertainty bounds that are as sharp as possible and reasonably reliable are sought: sharp intervals that consistently miss the target would be misleading, while overly wide intervals that cover the observations at the expense of sharpness would be of limited value for decision making.
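The sharpness measures can be illustrated with the short sketch below. The index is written here as one minus the ratio of the average width to the width of a reference climatological interval, expressed as a percentage. This form is an assumption inferred from the way the index is interpreted in the results (higher values mean sharper bounds, negative values mean bounds wider than the reference), and the function names are hypothetical.

```python
def average_width(lower, upper):
    """Mean width of the prediction intervals over all time steps."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)


def average_width_index(aw, aw_clim):
    """Relative reduction in width compared with the reference interval, in %.

    Assumed form: 100 * (1 - AW / AW_clim).
    """
    return 100.0 * (1.0 - aw / aw_clim)
```

Here `aw_clim` is the width of the reference interval derived from observed flows, which has to be supplied by the user.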

To complete the assessment of the prediction bounds, we used the
interval score

Illustration of the evaluation of the uncertainty bounds.

Impact of the regionalisation scheme on deterministic performance, as quantified by the bounded C2M efficiency criterion. Note that in a very few cases, the performance obtained with the regionalisation scheme is better than the performance obtained with calibration. This is possible because of the output averaging option used by the regionalisation scheme.

Distributions of the three performance criteria. Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

IS is the average value of

To ease comparison between catchments and evaluate the skill of the
prediction bounds, we used the 90 % interval [

The interval skill score (ISS) is positive when the prediction bounds are more skilful than climatology, and negative otherwise. The best value that can be achieved is equal to 1.
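For illustration, the sketch below uses the standard form of the interval score for a central (1 − α) prediction interval: the interval width plus a penalty of 2/α times the distance by which an observation falls outside the bounds. The skill score is written as one minus the ratio to a reference score, which is consistent with a best value of 1 and a positive value when the bounds beat the reference. Both forms and the function names are assumptions, since the original equations are not reproduced here.

```python
def interval_score(obs, lower, upper, alpha=0.1):
    """Mean interval score; alpha=0.1 corresponds to a 90 % interval."""
    total = 0.0
    for o, lo, up in zip(obs, lower, upper):
        total += up - lo  # sharpness term: interval width
        if o < lo:
            total += (2.0 / alpha) * (lo - o)  # observation below the interval
        elif o > up:
            total += (2.0 / alpha) * (o - up)  # observation above the interval
    return total / len(obs)


def interval_skill_score(is_pred, is_ref):
    """Skill relative to a reference score (assumed form: 1 - IS / IS_ref)."""
    return 1.0 - is_pred / is_ref
```

The score rewards narrow intervals but penalises misses heavily, so it summarises reliability and sharpness in a single number. Lower interval scores are better.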

Figure

For both models, half of the catchments (from the lower quartile to the upper quartile) have CR values between 80 and 95 %. The median values are equal to 89 and 90 % for GR4J and TOPMO, respectively. Since a value of 90 % is expected for 90 % prediction bounds, these results suggest that the prediction bounds are in a majority of cases able to reflect the magnitude of errors when predicting runoff hydrographs in ungauged catchments, even though it is clear that the perfect value of 90 % is not reached in most cases.

The CR values fall below 70 % for around 14 % of the catchments with GR4J and 13 % with TOPMO, which indicates cases where the proposed approach yields predictive bounds that are clearly too narrow or biased (i.e. not well centred on the observations). Note that we did not find any guidance on how to properly evaluate the CR values in the literature. The results presented here may be used as a benchmark to comparatively assess the ranges of values that would be obtained in future studies.

Regarding sharpness, for GR4J half of the catchments (from the lower quartile to the upper quartile) have AWI values between 39 and 67 %, while for TOPMO the corresponding values are 38 and 65 %. The median values are equal to 57 and 55 % for GR4J and TOPMO, respectively. The higher the AWI value, the lower the predictive uncertainty. Since it would be unrealistic to expect that no errors will be made when predicting runoff hydrographs for ungauged catchments, we consider uncertainty reduction values between 30 and 80 % quite satisfactory, even though we recognise that this range is arbitrary since there are no widely agreed values to base our analysis on.

Distributions of the three performance criteria, obtained in two cases, (i) when the donor catchments are treated as ungauged (solid lines) and (ii) when the donor catchments are treated as gauged (dashed lines). Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

Note that negative values are seen for 7 % of the catchments with
both GR4J and TOPMO, which indicates cases where the approach yields
prediction intervals whose average width is larger than the width of the
[

Finally, Fig.

As mentioned earlier, a critical step of the proposed approach is to apply the regionalisation scheme to obtain a simulated discharge time series for each donor catchment (Step 2a). To assess the impact of this methodological choice, we applied the methodology described earlier to transfer uncertainty estimates, but this time the donor catchments are treated as gauged.

Similar to Fig.

Note that reliability is generally considered to take precedence over
sharpness, since it reflects the ability of the uncertainty method to
capture the magnitude of errors we might expect at locations for which
predictions are made. Therefore, the benefit of treating the donor
catchments as ungauged clearly appears in
Fig.

Distributions of the three performance criteria, obtained in two cases: (i) when 10 groups of relative errors are used (solid lines) and (ii) when only one group is used (dashed lines). Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.

Impact of deterministic performance, as quantified by the bounded
C2M quadratic criterion, on the three performance criteria for the
907 catchments. Note that for easier visualisation, the lower limits of the
AWI

Another critical step of the proposed approach is to use 10 groups of relative errors. The groups are defined according to the magnitude of the simulated discharge (Step 2b). This was done to take into account the fact that the characteristics of errors usually change according to the magnitude of the simulated discharge. To assess the impact of this methodological choice, we again applied the methodology described earlier to transfer global uncertainty estimates, but this time using only one group instead of 10.

Figure

Although a single group is clearly not enough to account for the variability of the properties of relative errors, 10 groups may not provide significant performance gains, and a compromise may be sought. Visual inspection of scatter plots of relative errors against simulated discharge reveals that the shapes can be very different between catchments, hence potentially requiring different numbers of groups. Moreover, the simulation objectives, e.g. simulating intermediate or extreme flows, may also be considered when choosing the number of flow groups. Hence, choosing the number of groups may require trial-and-error tests in specific applications to obtain the best compromise.

Although our tests reveal that the number of groups is a sensitive setting of the method, further research would be needed to evaluate whether different numbers of groups can be advised for specific objectives or conditions.

To gain insights into the possible relationships between the
performance of the deterministic rainfall–runoff simulations and
the characteristics of the uncertainty bounds at ungauged locations, the
three criteria used to characterise the uncertainty bounds are plotted
in Fig.

Mean C2M values over the 907 catchments of the data set, with calibration (CAL), regionalisation (REGIO), and with the hydrometrical desert (HD) defined by increasing distance (10, 20, 50, 100 and 200 km).

A trend appears between deterministic performance and the characteristics of the prediction bounds at ungauged locations for the two rainfall–runoff models. The reliability index exhibits greater variability than the sharpness index, and the strongest link is seen for the skill score. Reliability is relatively less affected by poor deterministic performance of the simulation at an ungauged location because there are cases where poor performance at neighbouring locations leads (through the transfer of relative errors) to wide prediction bounds that are able to cover the observed values. We can also observe that skill scores and C2M scores are strongly related, which indicates that when the transfer of model parameters performs well, the transfer of global uncertainty estimates also performs well.

The results presented so far were obtained with a dense network of gauging
stations. To investigate the impact of the network density on our results,
we applied a demanding test called the hydrometrical desert. It consists in
excluding potential donors that are closer to the target ungauged catchment
than a given threshold. For example, a threshold distance of 100 km means
that the closest donor catchment must be at least 100 km away from the
ungauged target catchment. This test results in a notable decrease of
deterministic performance, as shown in Table

Impact of the hydrometrical desert on the distributions of the three performance criteria. Potential donor catchments are not retained as donors when their distance to the target catchment is below 10, 20, 50, 100 and 200 km. Boxplots (5th, 25th, 50th, 75th and 95th percentiles) summarise the variety of scores over the 907 catchments of the data set.
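The donor selection under the hydrometrical desert test can be sketched as follows. The dictionary key `centroid` and the function name are hypothetical, and centroid coordinates are assumed to be expressed in kilometres in a projected system.

```python
import math


def desert_donors(target_xy, donors, min_distance_km, n_neighbours):
    """Select the closest donors located at least min_distance_km from the target."""
    # Exclude potential donors that are closer than the threshold distance.
    far_enough = [
        d for d in donors
        if math.dist(target_xy, d["centroid"]) >= min_distance_km
    ]
    # Among the remaining donors, keep the nearest ones.
    far_enough.sort(key=lambda d: math.dist(target_xy, d["centroid"]))
    return far_enough[:n_neighbours]
```

Setting `min_distance_km` to 0 recovers the ordinary spatial proximity selection, so the same routine can be reused for every threshold distance tested.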

Figure

Runoff hydrograph prediction in ungauged catchments is notoriously difficult, and attempting to estimate the predictive uncertainty in that context is a further challenge. We have proposed a method allowing the transfer of global uncertainty estimates from gauged to ungauged catchments. The method extends the parameter transfer approach to the domain of global uncertainty estimation.

We evaluated the approach over a large set of 907 catchments by assessing three expected qualities of the uncertainty estimate: reliability, sharpness and overall skill. We applied two different rainfall–runoff models (GR4J and TOPMO) to ensure that the results presented are not model-specific. These results demonstrate that the method is generally able to reflect model errors at ungauged locations and provide reasonable reliability.

Nonetheless, the following limitations to the study can be mentioned.

Although the approach seems promising on average over the large catchment set we used, it fails to adequately quantify the predictive uncertainty for some catchments.

The method might not perform well in regions with sparser gauging networks than the one used here, as revealed by the application of a demanding test called the hydrometrical desert.

We only tested the 90 % prediction intervals, whereas the method could be applied to obtain other prediction intervals. We made this choice to keep the article as simple as possible, but further work could be done in that direction.

We also noted that the number of flow groups used in the approach may be a sensitive setting of the method, and further research would be needed to provide more detailed guidance on this point depending on the structure of the model errors and the modelling objectives.

It is worth stressing that although we used a transfer based on spatial proximity, the approach presented in this article is independent not only of the rainfall–runoff model but also of the regionalisation scheme used to obtain deterministic predictions at ungauged locations. Any other similarity measure could be a basis for transferring residual errors, including physically based similarity measures. Accordingly, the regionalisation settings, including the output averaging option, could be adapted if deemed more appropriate.

Since we believe that uncertainty quantification has to be considered in any modelling study, further work should be devoted to the search for similarity measures that not only perform well in allowing the transfer of parameter sets from donor to target catchments, but also allow transferring modelling error characteristics.

Last, we would like to stress that the results presented in this study are expressed in terms of dimensionless measures. As such, they can provide a basis for future comparisons when prediction in ungauged catchments with uncertainty estimates is performed.

The authors thank Météo-France for providing the meteorological data and Banque HYDRO for the hydrological data. The financial support of SCHAPI to the first author is also gratefully acknowledged. The authors also thank the editor Ross Woods, who handled the manuscript, and the five reviewers, including Alberto Viglione and Denis Hughes, for their constructive comments, which helped improve the manuscript. Edited by: R. Woods