A recurrent problem in hydrology is the absence of streamflow data to calibrate rainfall–runoff models. A commonly used approach in such circumstances conditions model parameters on regionalized response signatures. While several different signatures are often available to be included in this process, an outstanding challenge is the selection of signatures that provide useful and complementary information. Different signatures do not necessarily provide independent information and this has led to signatures being omitted or included on a subjective basis. This paper presents a method that accounts for the inter-signature error correlation structure so that regional information is neither neglected nor double-counted when multiple signatures are included. Using 84 catchments from the MOPEX database, observed signatures are regressed against physical and climatic catchment attributes. The derived relationships are then utilized to assess the joint probability distribution of the signature regionalization errors that is subsequently used in a Bayesian procedure to condition a rainfall–runoff model. The results show that the consideration of the inter-signature error structure may improve predictions when the error correlations are strong. However, other uncertainties such as model structure and observational error may outweigh the importance of these correlations. Further, these other uncertainties cause some signatures to appear repeatedly to be misinformative.

In many areas of the world the absence of past observational streamflow time
series to calibrate rainfall–runoff models limits the ability to apply such
models reliably to predict streamflow and inform effective water resources
management. Whilst a large and increasing number of regions across the world
are insufficiently gauged

A commonly applied approach is to use response signatures (e.g., the runoff
ratio and the base flow index), which can provide insight into the
hydrological functional behavior of a catchment

Different ways of incorporating the regionalized information into a catchment
model have been suggested in the literature. This includes set-theoretic
approaches

Conditioning a rainfall–runoff model on multiple independent signatures would
reflect a spectrum of processes and, in principle, lead to an accurate
prediction of flow time series

Formally, in a Bayesian context, it is necessary to distinguish between
correlated signatures and correlated signature errors. It is the
correlation between the errors that should be accounted for in the likelihood
function to avoid double counting of information. It is possible to have two
highly correlated signatures that are derived from independent information
sources and therefore have uncorrelated errors. In that case, it would be
valid to include both signatures in the likelihood function without
accounting for correlation. This principle is well established when
considering Bayesian calibration to a time series of flow observations, where
flow values are typically strongly autocorrelated – but it is the
observation error autocorrelation that is relevant to the likelihood function
derivation
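The distinction can be illustrated with a minimal sketch, assuming zero-mean Gaussian errors for two signatures. The function names and the numerical values are hypothetical and are not taken from the study. With zero correlation the joint log-likelihood reduces to the sum of the marginals; with strongly correlated errors, two residuals that agree are penalized less than they would be if they were treated as two independent pieces of evidence.

```python
import math


def normal_logpdf(e, sigma):
    """Log-density of a zero-mean normal error with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * (e / sigma) ** 2


def misfit_penalty(e1, e2, sigma1, sigma2, rho):
    """Quadratic term of the bivariate normal log-density (the part that
    changes from one parameter set to another)."""
    z1, z2 = e1 / sigma1, e2 / sigma2
    return 0.5 * (z1 ** 2 - 2.0 * rho * z1 * z2 + z2 ** 2) / (1.0 - rho ** 2)


def bivariate_normal_logpdf(e1, e2, sigma1, sigma2, rho):
    """Log-density of two zero-mean normal errors with correlation rho."""
    log_norm = math.log(2.0 * math.pi * sigma1 * sigma2 * math.sqrt(1.0 - rho ** 2))
    return -log_norm - misfit_penalty(e1, e2, sigma1, sigma2, rho)


# Hypothetical case: both standardized signature errors equal one.
penalty_correlated = misfit_penalty(1.0, 1.0, 1.0, 1.0, rho=0.9)
penalty_independent = misfit_penalty(1.0, 1.0, 1.0, 1.0, rho=0.0)
print(penalty_correlated, penalty_independent)
```

In this hypothetical case the independent form applies the full penalty twice, whereas the correlated form recognizes that the second residual adds little new information.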

In this paper, we introduce and test a method that considers multiple
regionalized signatures, explicitly accounting for the signature error
correlations. By formally accounting for the error covariance, we hypothesize
that the accuracy of flow predictions will generally improve and that a greater number
of signatures can usefully be included without introducing avoidable bias
related to the duplication of information. This should allow the modeler to
use all signatures available without having to select, on a more or less
subjective basis, the most relevant (independent) signatures. The objective
is thus to explore how to get fuller value out of a set of regionalized
information than has been achieved in past applications. The method is
applied to a set of 84 United States catchments with a broad range of
hydrometeorological characteristics, obtained from the Model Parameter
Estimation Experiment (MOPEX) data set

Using a simple least-squares regression, observed signatures of catchments'
functional responses are related to physical and climatic attributes of the
catchments. Assuming that the same catchment attributes are available for an
ungauged location, it is possible to obtain an estimate of the set of
signatures for the location. Further, the parametric distribution of
regression errors can be directly translated into a likelihood function for
the response signature(s). The likelihood function can then be used to update
the prior knowledge about model parameters via Bayes' law, which is
expressed as
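In generic notation, Bayes' law takes the familiar form below. The symbols are assumptions, since the original equation is not reproduced in this text: $\theta$ denotes the model parameters, $s^{*}$ the regionalized signature values, $p(\theta)$ the prior, and $L(s^{*} \mid \theta)$ the likelihood derived from the regionalization errors.

```latex
p(\theta \mid s^{*})
  = \frac{L(s^{*} \mid \theta)\, p(\theta)}
         {\int L(s^{*} \mid \theta)\, p(\theta)\, \mathrm{d}\theta}
  \;\propto\; L(s^{*} \mid \theta)\, p(\theta)
```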

To apply Bayes' law (Eq.

The likelihood functions are defined using joint distributions of respective
signature errors obtained from the regionalization model. Errors introduced
by the regionalization procedure may come from at least five sources. First,
errors are introduced by the fact that the regression model is estimated
using a specific sample of catchments rather than the entire population;
second, differences may exist between the observed and the true value of the
response signature due, for example, to factors such as the discharge record
length and time period of record used in the computation

1. Considering all available gauged catchments, stepwise regression is applied to each signature independently to determine which predictors to include. The predictors are then fixed for the remaining steps.

2. One catchment is left out, and the remaining catchments are used to fit the regression models for each signature.

3. The regression models obtained in step 2 are used to estimate the signature values for the omitted catchment.

4. The error for each signature is calculated for the omitted catchment by comparing the regionalized and observed signature values.

5. Steps 2 to 4 are repeated for all catchments.

6. A parametric joint probability distribution is fitted to all the computed errors. The errors are also tested for independence; where independence holds, the joint distribution can be (approximately) factorized into a product of marginal distributions.
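The leave-one-out part of this procedure can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: it assumes a single, already selected predictor per signature (so the stepwise selection is not shown), defines the error as observed minus regionalized (the sign convention is an assumption), and uses hypothetical data in which both signatures share the same noise.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def leave_one_out_errors(x, y):
    """Omit each catchment in turn, refit, and record observed minus predicted."""
    errors = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (a + b * x[i]))
    return errors


def error_covariance(e1, e2):
    """Sample covariance matrix of two error series."""
    n = len(e1)
    m1, m2 = sum(e1) / n, sum(e2) / n
    c11 = sum((a - m1) ** 2 for a in e1) / (n - 1)
    c22 = sum((b - m2) ** 2 for b in e2) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / (n - 1)
    return [[c11, c12], [c12, c22]]


# Hypothetical catchment attribute and two signatures with shared noise.
attribute = [0.2, 0.5, 0.9, 1.3, 1.8, 2.4, 3.0, 3.7]
shared_noise = [0.05, -0.03, 0.02, -0.06, 0.04, 0.01, -0.05, 0.02]
signature_a = [0.3 + 0.10 * x + d for x, d in zip(attribute, shared_noise)]
signature_b = [0.8 - 0.15 * x + d for x, d in zip(attribute, shared_noise)]

errors_a = leave_one_out_errors(attribute, signature_a)
errors_b = leave_one_out_errors(attribute, signature_b)
cov = error_covariance(errors_a, errors_b)
rho = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
print(rho)
```

The sample means of the errors together with `cov` would define a bivariate normal error model under the Gaussian assumption. Because the noise is shared by construction in this example, the error correlation is close to one even though the two signatures themselves trend in opposite directions.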

To avoid masking the potential value of the regionalized signatures with
model structure and observational errors, a “perfect model” is first
employed. This involves using the preselected rainfall–runoff model and the
observed forcing data to generate the “observed” catchment signatures. The
Nash–Sutcliffe criteria (NSE)

A set of 84 medium-sized United States catchments (242 to 8657 km

The 84 catchments are hydrologically varied with a selection of properties
summarized in Table

Summary of general catchment properties and response signatures of the 84 MOPEX catchments.

Five response signatures are considered: runoff ratio (RR), base flow index
(BFI), streamflow elasticity (SE), slope of flow duration curve (SFDC), and
high pulse count (HPC) (Table

RR reflects the amount of precipitation that becomes streamflow over a
certain area and time. It is determined as the ratio of streamflow at the
catchment outlet to catchment-average precipitation over the 10 years used in
this study. BFI gives the proportion of streamflow that is considered to be base
flow. A simple one-parameter single-pass digital filter method is used to
derive BFI
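As an illustration, RR and BFI can be computed from daily series as sketched below. The filter equation is not given in this text, so the sketch assumes a Lyne and Hollick type one-parameter recursive filter applied in a single forward pass, with a hypothetical filter parameter `c = 0.925` and the quick flow initialized at zero. Streamflow and precipitation are assumed to be in the same depth units (e.g., mm per day).

```python
def runoff_ratio(streamflow, precipitation):
    """Ratio of total streamflow to total precipitation over the record."""
    return sum(streamflow) / sum(precipitation)


def base_flow_index(streamflow, c=0.925):
    """Share of streamflow attributed to base flow by a one-parameter,
    single-pass recursive filter (assumed Lyne and Hollick form)."""
    quick_prev = 0.0                 # assumed initial quick flow
    base_total = streamflow[0]       # first value is treated as base flow
    for t in range(1, len(streamflow)):
        q, q_prev = streamflow[t], streamflow[t - 1]
        quick = c * quick_prev + 0.5 * (1.0 + c) * (q - q_prev)
        quick = min(max(quick, 0.0), q)   # keep 0 <= quick flow <= total flow
        base_total += q - quick
        quick_prev = quick
    return base_total / sum(streamflow)


# Hypothetical short record containing one storm event.
flow = [1.0, 1.0, 5.0, 3.0, 2.0, 1.5, 1.2, 1.0]
rain = [0.0, 12.0, 20.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(runoff_ratio(flow, rain), base_flow_index(flow))
```

A constant flow series yields a BFI of one under this filter, since no quick flow is ever generated.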

The probability distributed moisture (PDM) model

Employing Bayes' law (Eq.

Two metrics are used to assess the effectiveness of the parameter
conditioning procedure: (1) the Bayes factor

The Bayes factor (BF) is defined as the ratio between two marginal
distributions of the data

When using synthetic streamflow data (“perfect model” approach), with the
streamflow time series generated by a preselected parameter set,

The probabilistic Nash–Sutcliffe efficiency NSEprob

For model validation, we use a jack-knife approach (or leave-one-out
strategy), commonly employed in regionalization studies

The regionalization error probability distributions (that define the
likelihoods) are generated following steps 2–6 in Sect. 2.2.2 and are
shown in Fig.

Distributions of individual signature residuals (res) are
approximated as histograms and normal distributions. The scatterplots and
correlation coefficients (

This section considers the role of inter-signature error correlation in model parameter estimation when pairs of signatures are used. First, different imposed error variances and correlations, together with synthetic streamflow data, are employed to test the impact of inter-signature error correlation without the impact of model structural error. Then, the results obtained using the observation-based error structure, for both synthetic and observed streamflow data, are analyzed.

Synthetic streamflow data are generated as described in
Sect.

Tested variance values for the data-based and imposed error structures.

The ten possible pairs of the five response signatures are used in parameter
conditioning, and the median Bayes factor over the 84 MOPEX catchments is
calculated for each pair. The Bayes factor (Eq.

Reference table showing the 95 % confidence interval for the
median Bayes factor. The correlation coefficient

Figure

The Bayes factor for the 10 pairs of signatures over the 84
catchments when the observation-based error structure is used with

The signature pair [SFDC, HPC] shows the strongest correlation between errors
(

Nevertheless, it is clear from Fig.

Multiple signatures are used for parameter constraining and flow prediction. The information value of multiple signatures and its dependence on inter-signature error correlations is explored in this section.

Figure

Boxplots representing the distribution of the Bayes factor for each combination of signatures for synthetic streamflow data. The colored boxplots correspond to the results obtained when inter-signature error correlations are considered in the likelihood function, whereas the grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent.

To better evaluate whether the incorporation of additional sources of information improves parameter identification, one-sided Kolmogorov–Smirnov tests are applied between any combination of certain signatures (e.g., [SE, SFDC]) and any other combination that contains the same signatures and a new one (e.g., [SE, SFDC, HPC]). It is found that adding more signatures improves parameter identification in 82.5 % of the cases (66 out of 80 cases) at a 95 % confidence level.
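A one-sided two-sample Kolmogorov–Smirnov comparison of this kind can be sketched as below. The sketch uses only the leading-order asymptotic approximation of the p value, which is an assumption about the test variant and may differ from the exact calculation used in the study. The sample values are hypothetical.

```python
import math
from bisect import bisect_right


def one_sided_ks(reference, candidate):
    """One-sided two-sample KS test of whether `candidate` values tend to be
    larger than `reference` values.

    Returns (D_plus, p_value), with D_plus = sup_x [F_reference(x) - F_candidate(x)]
    and the asymptotic p value exp(-2 * D_plus^2 * n * m / (n + m)).
    """
    ref, cand = sorted(reference), sorted(candidate)
    n, m = len(ref), len(cand)
    d_plus = 0.0
    for x in ref + cand:
        f_ref = bisect_right(ref, x) / n      # empirical CDF of the reference
        f_cand = bisect_right(cand, x) / m    # empirical CDF of the candidate
        d_plus = max(d_plus, f_ref - f_cand)
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value


# Hypothetical performance values for a signature pair and for the same
# pair with one additional signature.
pair_values = [float(v) for v in range(1, 21)]
triple_values = [v + 10.0 for v in pair_values]
d_plus, p_value = one_sided_ks(pair_values, triple_values)
print(d_plus, p_value, p_value < 0.05)
```

A p value below 0.05 would indicate, at the 95 % confidence level, that the values obtained with the additional signature are stochastically larger than those obtained without it.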

Figure

Boxplots representing the distribution of NSEprob values for each combination of signatures for synthetic streamflow data. The colored boxplots correspond to the results obtained when inter-signature error correlations are considered in the likelihood function, whereas the grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent.

It is worth noting that very similar results (not shown here) are obtained
when “observed” signatures are used instead of regionalized signatures but
with the same error derived from regionalization. This suggests that the
uncertainty around the regionalized signature values, as well as the signature
information content, are the key factors leading to the results shown in
Fig.

Figure

Boxplots representing the distribution of the Bayes factor for each combination of signatures for observed streamflow data. The colored boxplots correspond to the results obtained when inter-signature error correlations are considered in the likelihood function, whereas the grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent.

Further, by comparing Fig.

Figure

Boxplots representing the distribution of NSEprob values for each combination of signatures for observed streamflow data. The colored boxplots correspond to the results obtained when inter-signature error correlations are considered in the likelihood function, whereas the grey dashed boxplots correspond to the results obtained assuming that the inter-signature errors are independent.

Figure

In summary, unless there is no model structural error, an all-round performance improvement is not guaranteed by adding more signatures. Furthermore, model structure uncertainty seems to have a much bigger effect on the performance than the explicit inclusion of the inter-signature error correlations.

The main feature of the method suggested in this paper lies in the
possibility of allowing a large number of signatures to be added to the
conditioning process, without worrying about double counting of information
or degree of uncertainty in signature estimates and avoiding subjective
decisions about removal of possibly non-independent information. Although the
proposed framework can be applied to any number of signatures, the limited
sample size (i.e., number of gauged catchments available) can have an impact
on the definition of the likelihood distribution. For this specific study 83
samples were available to define that distribution. When a single response
signature is used to condition the hydrological model, this sample size is
likely to be sufficient to confidently judge whether the normal distribution
assumption is adequate. However, when moving to multidimensional problems,
in which various signatures may be used simultaneously to condition the
hydrological model, it is increasingly difficult to judge the adequacy of any
multivariate parametric distribution and to judge which catchments are
outliers. This implies that the more signatures are used simultaneously in the
conditioning of the hydrological model, the more gauged catchments should be
used to define the likelihood function. As stressed by

While the work presented in this paper addresses a number of issues
associated with model regionalization, it is important to highlight some
additional areas for future research. An important source of uncertainty
comes from model structure error

Some of the results presented may be sensitive to the response signatures
used. The relationship between the value of signatures and catchment type
remains ambiguous, and how this value depends on catchment type would be an
interesting aspect for later evaluation. Other aspects that are worth
further research include whether a similar framework could be applied to
different types of information source, e.g., can some discharge measurements
be added into the model conditioning process? While

Uncertainty in streamflow estimation in ungauged catchments originates not only from the traditional sources of error generally identified in rainfall–runoff modeling (i.e., model structural, parameter, and data errors) but also from errors introduced by the transposition of information from data-rich areas and the use of this information to condition model simulations. To identify which and how many types of signatures can usefully be included in model conditioning, it is critical to understand the effects of all these uncertainties. Moreover, when multiple signatures are used simultaneously to condition model simulations, inter-signature error dependencies may also introduce uncertainty and affect decisions about the value of information. While error and uncertainty analyses are quite common in regionalization studies, the question of how much information can be taken from a set of uncertain signatures, and of how many and which signatures should be used given their error dependencies, has not been extensively studied.

The method suggested in this paper allows the specification of a signature error structure. A common reason for not including large numbers of signatures in regionalization studies is the potential for underestimation of uncertainty due to duplication of information. This study helps to justify the inclusion of larger sets of signatures in the regionalization procedure if their error correlations are formally accounted for and thus enables a more complete use of all available information. The results show that adding response signatures to constrain the hydrological model, while accounting for inter-signature error correlations, can contribute to a stronger identification of the optimum parameter set when the error correlations between different sources of information are strong. Furthermore, the results show that assuming independence of errors does not result in significant deterioration in model performance, unless the error correlation is very strong. The results also show that the effect of error correlations is likely to be overwhelmed by model structure and observation errors. The method suggested here can therefore become more relevant if observational and structural errors are reduced. In addition, it is illustrated that using more signatures, with and without considering their error correlations, may lead to deterioration in performance. In our case, there were particular problems when adding the slope of the flow duration curve and/or the high pulse count. As this is likely to be specific to the rainfall–runoff model used, the selected performance criteria and the set of catchments, it is recommended that misinformative sources of information are identified as part of any regionalization study, in a manner similar to that used here.

A schematic representation of the model structure used in this study is shown
in Fig.

The parameter ranges (Table

Schematic representation of the rainfall–runoff conceptual model structure used.

Conceptual model prior parameter ranges.

When evaluating the impact of inter-signature error correlations on model
parameter identification, results are assessed in terms of Bayes factor

For a given hypothesis

The above integral can be numerically approximated as
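One common numerical approximation, given here as a sketch under stated assumptions rather than as the formula used in the study, is the Monte Carlo average of the likelihood over parameter sets sampled from the distribution that defines the hypothesis. The Bayes factor then follows as the ratio of two such estimates. The likelihood, the parameter ranges, and the evenly spaced samples (a deterministic stand-in for random draws) are all hypothetical.

```python
import math


def log_marginal_likelihood(log_likelihood, samples):
    """Estimate log p(D | H) as the log of the mean likelihood over
    parameter samples drawn under hypothesis H (log-sum-exp for stability)."""
    logs = [log_likelihood(theta) for theta in samples]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / len(logs))


def bayes_factor(log_likelihood, samples_h1, samples_h2):
    """Ratio of the estimated marginal likelihoods under H1 and H2."""
    return math.exp(
        log_marginal_likelihood(log_likelihood, samples_h1)
        - log_marginal_likelihood(log_likelihood, samples_h2)
    )


def uniform_samples(low, high, n):
    """Evenly spaced midpoints on [low, high]."""
    return [low + (high - low) * (i + 0.5) / n for i in range(n)]


def signature_log_likelihood(theta, target=0.3, sigma=0.1):
    """Hypothetical Gaussian error model: the simulated signature equals theta."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((theta - target) / sigma) ** 2


narrow = uniform_samples(0.2, 0.4, 1000)   # H1: parameters close to the target
wide = uniform_samples(0.0, 1.0, 1000)     # H2: the full prior range
print(bayes_factor(signature_log_likelihood, narrow, wide))
```

In this example a Bayes factor above one indicates that the data are better supported by the hypothesis that concentrates the parameters near the target value.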

In a “perfect model” study, data

While other choices can be made, two cases are considered in this paper.
First, the two distributions in Eq. (

The authors would like to acknowledge the support of Fundação para a Ciência e a Tecnologia (FCT), Portugal, sponsor of the PhD program of S. Almeida at Imperial College London, under the grant SFRH/BD/65522/2009. This work was also partially supported by the Natural Environment Research Council [Consortium on Risk in the Environment: Diagnostics, Integration, Benchmarking, Learning and Elicitation (CREDIBLE); grant number NE/J017450/1]. The authors would like to thank Keith Sawicz for advice and support relating to the data used in this study. The authors also thank the editor Vazken Andréassian, who handled the manuscript, and the three anonymous reviewers for their useful comments.

Edited by: V. Andréassian