The widespread application of deterministic hydrological models in research and practice calls for suitable methods to describe their uncertainty. The errors of those models are often heteroscedastic, non-Gaussian and correlated due to the memory effect of errors in state variables. Still, residual error models are usually highly simplified, often neglecting some of the mentioned characteristics. This is partly because general approaches to account for all of those characteristics are lacking, and partly because the benefits of more complex error models in terms of achieving better predictions are unclear. For example, the joint inference of autocorrelation of errors and hydrological model parameters has been shown to lead to poor predictions. This study presents a framework for likelihood functions for deterministic hydrological models that considers correlated errors and allows for an arbitrary probability distribution of observed streamflow. The choice of this distribution reflects prior knowledge about non-normality of the errors. The framework was used to evaluate increasingly complex error models with data of varying temporal resolution (daily to hourly) in two catchments. We found that (1) the joint inference of hydrological and error model parameters leads to poor predictions when conventional error models with stationary correlation are used, which confirms previous studies; (2) the quality of these predictions worsens with higher temporal resolution of the data; (3) accounting for a non-stationary autocorrelation of the errors, i.e. allowing it to vary between wet and dry periods, largely alleviates the observed problems; and (4) accounting for autocorrelation leads to more realistic model output, as shown by signatures such as the flashiness index. Overall, this study contributes to a better description of residual errors of deterministic hydrological models.

Deterministic hydrological models are widely applied
in research and decision-making processes. The quantification of their
associated uncertainties is therefore an important task with high relevance
for the scientific learning process, as well as for operational decisions
with respect to water management. The total output uncertainty of those
models is a combination of (i) propagated input uncertainty

Various studies have investigated error models that consider correlation,
heteroscedasticity and non-normality of errors of deterministic hydrological
models. A typical approach, which is also applied in this study, is to
describe total output uncertainty in a lumped way

Heteroscedasticity is often considered in weighted least-squares error models by parameterising the
variance of the normal distribution as a function of the streamflow

Typically, residual errors are represented as a stationary
process. The issue of stationarity has been the subject of recent
debate

A probabilistic model to deal with unequally spaced data was
proposed by

Non-negativity of streamflow can be addressed by truncating the error probability density function so
that it does not extend to negative streamflow. This leads to zero
probability for zero streamflow, which may not always be
adequate. The truncation approach is seldom followed, and in most applications the truncation occurs “in
prediction only”

First, there is a lack of general approaches that can deal with all the above-mentioned characteristics of error models simultaneously. One general error
model that can accommodate various characteristics is the probabilistic model
proposed by

Second, there is limited guidance on the choice of a particular error
model for a given application. In the past, the choice has generally
been ad hoc, with limited justification. Only recently have more
systematic comparisons and tests resulted in some
general recommendations. For example,

Third, previous experience has shown that more realistic error models, which
are more complex, do not always result in better predictions. The additional
parameters of some of the more complex error models were found to have
undesirable interactions with the parameters of the hydrological model,
leading to unrealistic parameter values and poor predictions. For example,
particularly in dry catchments, accounting for autocorrelation produces worse
predictions than omitting it

Fourth, the potential advantages of more complex error models are
under-appreciated by the hydrological community. For relatively simple
uncertainty analysis, like the plotting of uncertainty bands around
hydrographs, the use of simplified error models may appear
justified. However, there are
several applications that go beyond this task, and for which a
simplified error model may lead to poor results. For example, assuming
uncorrelated errors may lead to unrealistic extrapolations

The goals of this study are the following:

Develop a flexible framework for likelihood functions for hydrological models that accounts for the following major characteristics of their errors: non-normality (heteroscedasticity, skewness and excess kurtosis), autocorrelation, non-stationarity regarding wet and dry periods, unequally spaced observation time points, and non-negativity of streamflow.

Use the flexible framework to do controlled experiments by
varying some of the assumptions and by performing joint inference
of a hydrological model with error models of increasing complexity. Investigate the effect of the various assumptions on
the quality of the predictive distributions. In particular, with case studies in two catchments, we investigate the
following questions:

Can we confirm previous findings about the problems related to joint inference of hydrological and error model parameters?

What are the causes of the problems encountered in joint inference of hydrological and error model parameters?

Can we improve the joint inference by introducing non-stationarity by allowing the autoregressive parameter to change between wet and dry periods?

Does the consideration of autocorrelation lead to more realistic predictions (e.g. in terms of better representation of hydrograph signatures such as the flashiness index)?

Can parameters controlling the shape of the distribution of the errors be inferred jointly with the hydrological model parameters to account for non-normality?

The paper is structured as follows. The theoretical framework for the
probabilistic model, corresponding to Goal 1, is presented in
Sect.

Suppose we choose the distribution

We assume that

Accounting
for temporal correlation requires some additional conceptualisations. Consider the transformation function

To describe autocorrelation in the deviations of

In summary, to transfer information between
time points, we transform the distribution

Note that, for a constant time step

The likelihood is then obtained by building
the product of the conditional probabilities in Eq. (
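As a minimal sketch of how such a product of conditional probabilities could be evaluated, the following Python function assumes (i) that each observation is mapped to a standard normal variable through the cumulative distribution function of the chosen streamflow distribution, and (ii) that this variable follows an autoregressive process whose coefficient decays as exp(-dt/tau) with the time step. Both are illustrative assumptions, and the names `log_likelihood`, `marginals` and `tau` are hypothetical. Any object exposing `cdf` and `pdf` can serve as a marginal; `statistics.NormalDist` is used only for convenience.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def log_likelihood(times, q_obs, marginals, tau):
    """Log-likelihood of observed streamflow for an AR(1)-type process in eta-space.

    times     : increasing observation times (may be unequally spaced)
    q_obs     : observed streamflow at those times
    marginals : one distribution per time point with .cdf(q) and .pdf(q)
                (hypothetical stand-in for the chosen streamflow distribution)
    tau       : characteristic correlation time, assumed constant in this sketch
    """
    loglik = 0.0
    eta_prev = None
    for i, (q, dist) in enumerate(zip(q_obs, marginals)):
        # Map the observation to a standard normal variable eta.
        # Note: inv_cdf fails if the cdf evaluates to exactly 0 or 1.
        eta = STD_NORMAL.inv_cdf(dist.cdf(q))
        # Jacobian d(eta)/dq = f_D(q) / f_N(eta), on the log scale.
        log_jac = math.log(dist.pdf(q)) - math.log(STD_NORMAL.pdf(eta))
        if eta_prev is None:
            mean, sd = 0.0, 1.0  # first point: stationary N(0, 1)
        else:
            dt = times[i] - times[i - 1]
            phi = math.exp(-dt / tau) if tau > 0 else 0.0
            mean, sd = phi * eta_prev, math.sqrt(1.0 - phi * phi)
        loglik += math.log(NormalDist(mean, sd).pdf(eta)) + log_jac
        eta_prev = eta
    return loglik
```

With `tau = 0` the conditional terms collapse to the marginal densities, so the result equals the log-likelihood of independent errors, which is a convenient consistency check.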

As a basis for subsequent applications, we set

Example of skewed Student's

The standard deviation of

Overview of the error models applied in this study, their
assumptions regarding correlation and the
distribution of streamflow and their corresponding
parameters (SKT: skewed Student's

If

Table

Consider that for any practical case of inference or
prediction, we will have a finite series of time points of interest

Given a suggested parameter vector

Using

As the likelihood (Eq.

For prediction, stochastic realisations of model output are obtained
by inverting Eq. (

Randomly draw a parameter vector

Using

Using

Use
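The core of such a prediction procedure, generating one correlated streamflow realisation for a single parameter vector, might be sketched as follows. The sketch assumes that the autocorrelated variable is standard normal, that its correlation decays as exp(-dt/tau), and that tau switches between a "wet" and a "dry" value depending on whether precipitation is positive at the time step. The function name, the arguments and the use of `statistics.NormalDist` as the marginal distribution are hypothetical choices for illustration; drawing parameters from the posterior and running the hydrological model are not shown.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def simulate_streamflow(times, marginals, precip, tau_dry, tau_wet, rng):
    """Draw one stochastic streamflow realisation.

    marginals : per-time-point distributions exposing .inv_cdf(p) (hypothetical)
    precip    : precipitation per time point; a value > 0 marks a wet step
                (assumed switching rule)
    rng       : random.Random instance, passed in for reproducibility
    """
    eta = rng.gauss(0.0, 1.0)  # start from the stationary N(0, 1)
    realisation = []
    for i, dist in enumerate(marginals):
        if i > 0:
            tau = tau_wet if precip[i] > 0 else tau_dry
            dt = times[i] - times[i - 1]
            phi = math.exp(-dt / tau) if tau > 0 else 0.0
            # The innovation is scaled so that eta keeps unit variance.
            eta = phi * eta + math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0)
        # Back-transform eta to streamflow through the marginal distribution.
        realisation.append(dist.inv_cdf(STD_NORMAL.cdf(eta)))
    return realisation
```

Repeating the call for many posterior parameter samples would give an ensemble from which predictive bands and signatures can be computed.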

How can the performance of empirical error
models, such as those presented in this study, be quantified? We argue
that the performance of an error model in joint inference with a hydrological
model should be judged according to the following criteria: (a) good reproduction
of observed dynamic fluctuations by individual model realisations, (b) good
overall predictive marginal distribution of streamflow, and (c) small absolute
deviance between model output and observations. The flashiness index
(Sect.

The function to calculate the flashiness index
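For illustration, a flashiness index can be computed as in the following Python sketch. It assumes the commonly used Richards-Baker form, i.e. the sum of absolute changes between consecutive streamflow values divided by the sum of streamflow over the same steps; the exact normalisation applied in this study may differ, and the function name is hypothetical.

```python
def flashiness_index(q):
    """Sum of absolute step-to-step changes divided by total streamflow.

    Assumes the Richards-Baker convention: the denominator sums the
    streamflow values from the second time point onwards.
    """
    if len(q) < 2:
        raise ValueError("need at least two streamflow values")
    total_change = sum(abs(b - a) for a, b in zip(q, q[1:]))
    total_flow = sum(q[1:])
    return total_change / total_flow
```

A constant hydrograph yields an index of zero, whereas rapidly fluctuating realisations, such as those produced by uncorrelated errors, yield large values.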

Reliability is defined similarly to

The relative spread is an indicator for the width of the
predictive distributions over all time points, and was proposed by

The Nash–Sutcliffe efficiency
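In its standard form, the Nash–Sutcliffe efficiency is one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean. A minimal Python sketch with hypothetical argument names is:

```python
def nash_sutcliffe(q_obs, q_sim):
    """Nash-Sutcliffe efficiency of simulated against observed streamflow."""
    if len(q_obs) != len(q_sim):
        raise ValueError("series must have equal length")
    mean_obs = sum(q_obs) / len(q_obs)
    # Sum of squared errors between observations and simulation.
    sse = sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    # Sum of squared deviations of the observations from their mean.
    ssd = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - sse / ssd
```

A value of 1 indicates a perfect fit, and a value of 0 means that the simulation performs no better than the mean of the observations.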

As a measure of systematic over- or under-prediction of streamflow, we
calculate the relative error in total cumulative streamflow:
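A minimal Python sketch of such a metric is given below. It assumes the sign convention "simulated minus observed, divided by observed", so that positive values indicate over-prediction; the convention and the function name are assumptions made for illustration.

```python
def relative_cumulative_error(q_obs, q_sim):
    """Relative error in total cumulative streamflow (as a fraction).

    Assumed sign convention: positive values mean over-prediction.
    """
    total_obs = sum(q_obs)
    total_sim = sum(q_sim)
    return (total_sim - total_obs) / total_obs
```

Multiplying the result by 100 gives the error in percent.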

The probabilistic framework developed in
Sect.

The Maimai experimental catchments are a set of small headwater catchments
with a long history of hydrological research. They are located on a deeply
incised hillslope on the South Island of New Zealand. The area is forested
and the climate is considerably more humid than in the Murg catchment
(Table

Properties of the two case study catchments.

While the resolution of the original data was hourly, we produced data sets
with 6-hourly and daily resolution by aggregation for both catchments. This
set-up allows us to systematically investigate the effect of the temporal
resolution of the data on the joint inference of hydrological and error model
parameters. This could contribute to the identification of the cause of
previously encountered problems in joint inference (Goal 2b specified in
Sect.
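The aggregation step can be illustrated with a short Python sketch. It assumes that values expressed as rates are averaged over consecutive blocks of 6 or 24 hourly values and that an incomplete final block is dropped; both choices, as well as the function name, are assumptions rather than a description of the processing actually applied.

```python
def aggregate_mean(values, factor):
    """Average consecutive blocks of `factor` values.

    An incomplete block at the end of the series is dropped (assumption).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(values) // factor
    return [
        sum(values[k * factor:(k + 1) * factor]) / factor
        for k in range(n_blocks)
    ]
```

For example, `aggregate_mean(hourly, 6)` would give a 6-hourly series and `aggregate_mean(hourly, 24)` a daily series from the same hourly record.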

The hydrological model used throughout this study
is a simple, lumped bucket model with two reservoirs
(Fig.

Structure of the deterministic hydrological model used in
this study.

The prior distribution of the parameters was assumed to be composed of
independent normal or log-normal distributions with relatively large standard deviations (see Table

Prior distributions of the hydrological and error model parameters applied in all the cases where the respective parameter was used. N = Gaussian normal; LN = log-normal. Where lower and upper boundaries are listed, the distribution is truncated at those values.

After providing some general results, this section contains a more
detailed summary of the results for each of the tested error models. The complete analysis covered additional
error models and performance metrics, which are reported in
Appendix

Figure

Performance of the error models
with respect to the flashiness index, reliability and relative spread for both
catchments and all temporal resolutions.

Performance of the error models
in terms of the relative cumulative error in streamflow,

E1 tends to strongly overestimate the true flashiness in the case of
high temporal resolutions in both catchments
(Fig.

With the constant correlation assumption made in E2,

E3 generally overestimates the true flashiness; i.e.

When inferring

In the Murg catchment, on the other hand,
we see a deteriorating performance of E3a with increasing measurement frequency, with values of

The stochastic model realisations with E4 tend to overestimate the true
flashiness index; the difference between

E4a results in

Streamflow predictions with hourly resolution in the Maimai catchment in a part of the validation period (1993) obtained with error models E1 (a), E2 (b) and E3a (c). Deterministic predictions with the parameter values at the maximum posterior density are shown together with the 90 %-confidence bands and one single stochastic streamflow realisation for each of the error models.

Error model E3, which accounts for reduced correlation of errors during the
precipitation events, leads to an overall improvement in the investigated
performance metrics (except

Transformed residuals,

Figure

In the Murg catchment, inferring

Marginal posterior densities of

Relationship between the fixed correlation time during
precipitation events,

Relaxing the assumption of normality by inferring

Regarding the location of

Assumptions about the presence (E2) and absence (E1) of autocorrelation in

Accounting for the fact that

Figure

What is the physical explanation for non-stationary autocorrelation of the
errors

A very simple way of considering this reduced correlation (E3) provides strongly improved results compared to
the assumption of stationary correlation
(Sect.

Time series of

To challenge this hypothesis, one could argue that the improved performance
of E3 (compared to E2) might also be achieved when reducing

One could also argue that the improved performance of E3 compared to E2 is
primarily due to assuming reduced autocorrelation during periods with strong
outliers (i.e. storm events) and that those outliers (visible in
Fig.

It is still unclear what the optimal parameterisation of a time-dependent
correlation could be. Using the input to directly inform the correlation
structure of the output requires knowledge of how the catchment transforms
the signal. For example, there could be a significant time lag between precipitation
and streamflow, which would have to be taken into account in
Eq. (

The fact that

These findings call for additional investigations into the issue of
non-stationary correlation, potentially exploring other relationships between

Relaxing the assumption of marginal normality of

The ranking in performance of the two options to either place the mean or the
mode of

Regarding the choice of the type of the distribution

We presented and evaluated a flexible framework for probabilistic model formulations (i.e. likelihood functions) to describe the total uncertainty of the output of deterministic hydrological models. This framework allows us to consider heteroscedastic errors with non-stationary correlation, non-equidistant observations and zero probability for negative streamflow. It does so by allowing for arbitrary and explicit marginal distributions for the observed streamflow at each point in time. For experts, it is easier to parameterise these marginal streamflow distributions than the distribution characterising the autoregressive model or some non-intuitive transformations like the Box–Cox transformation. The consistent implementation of this framework was successfully checked with a synthetic case study.

Using a simple deterministic hydrological bucket model and two case
study catchments, the flexible framework was used to
systematically test different error models on real-world data. Those error models represented various assumptions about the statistical
properties of the errors in terms of autocorrelation, skewness and
kurtosis. The assumptions were found to have a profound effect on
the quality of the predictions. The key findings are as follows:

We confirmed that, as shown in previous work by various authors, accounting for autocorrelation with conventional approaches (represented by model E2) can lead to worse predictions than omitting autocorrelation (model E1). For example, model E2 had errors in cumulative streamflow of 76 % in the Murg catchment and 96 % in the Maimai catchment for hourly resolution in the calibration period. With model E1, in comparison, those errors were 1 % and 19 %, respectively. However, this result is unsatisfactory, as the residuals show clearly visible autocorrelation, which invalidates model E1.

We showed that the predictions of conventional approaches to deal with autocorrelation worsen significantly as the temporal resolution increases. For example, the performance of model E2 in terms of the Nash–Sutcliffe efficiency decreases from 0.76 to 0.09 in the calibration period when moving from daily to hourly data resolution. In comparison, the performance of model E1 remains relatively stable (Nash–Sutcliffe efficiency decreases from 0.83 to 0.79).

Since rapid changes in a catchment's storage change its memory, errors in streamflow are expected to show different correlations during precipitation events and dry weather. Based on the hypothesis that this non-stationarity increases when going from daily to hourly resolution, neglecting non-stationarity of correlation is the likely cause of finding 2.

Accounting for non-stationarity in autocorrelation significantly alleviated the observed problems of finding 2. In particular, allowing for the autocorrelation to be lower during wet than during dry periods (models E3 and E4) led to more stable behaviour across time resolutions. For example, volume errors for model E3 in the Murg catchment were not larger than 3 % for all three investigated temporal resolutions. However, inferring the characteristic correlation time during precipitation events (model E3a) provided good results in only one of the two investigated catchments. Keeping that correlation fixed (model E3) could be seen as a pragmatic option with stable performance.

If the problems mentioned in finding 1 can be avoided, accounting for autocorrelation results in more realistic characteristics of model output than omitting autocorrelation, which confirms previous work. In particular, signatures such as the flashiness index are much better represented when including autocorrelation. For example, for an observed value of the flashiness index of 0.13 in the Maimai catchment in the calibration period, model E3a provided a value of 0.13, whereas model E1 resulted in a much larger value of 0.56.

Inferring the skewness and kurtosis of a skewed
Student's

These results contribute to a better characterisation of the residual errors
of deterministic hydrological models. However, some questions remain. It
still has to be shown to what degree the findings of this study are
generalisable to a larger and more diverse set of catchments and to different
hydrological models. A comparison of the presented approach to existing
frameworks based on different assumptions, like the generalized likelihood
framework, would yield further insights. Furthermore, it is still unclear how
the non-stationary autocorrelation should ideally be parameterised. The chosen
approach, where we alternate between two values of the autoregressive
parameter based on whether there is precipitation or not, might lead to
problems in catchments with strong lags between precipitation and streamflow.
In those cases, defining the autoregressive parameter as a function of
modelled streamflow might be more suitable. Furthermore, future studies could
investigate different approaches to describe non-stationary correlation or
distributions other than the Gaussian and the skewed Student's

The data of the Maimai catchment can be obtained from
Jeffrey McDonnell (Associate Director at Global Institute for Water Security
and Professor at the School of Environment and Sustainability at the
University of Saskatchewan,

To derive the conditional distribution of

In simplified notation (which makes it easier to grasp the key idea without
getting lost in notational details), we get the following:

With explicit notation of functions and arguments, we get

Murg: summary of the predictions in the calibration and the
validation period made with error models E1–E4 for different temporal
resolutions of the hydrological data. Values are medians (and standard
deviations) of the quality indices of the deterministic model output for the
maximum posterior parameters, as well as those of 500 streamflow realisations
produced with the full posterior parameter distributions. Recall that smaller
values of

Maimai: summary of the predictions in the calibration and
the validation period made with error models E1–E4 for
different temporal resolutions of the hydrological
data. Values are medians (and standard deviations) of the quality indices of the deterministic model output for the maximum posterior parameters, as well as
those of 500 streamflow realisations produced with the full posterior
parameter distributions. Recall that smaller values of

And the mean and the variance of the skewed rescaled distribution are as follows:

The supplement related to this article is available online at:

PR conceptualized the general theory with contributions from LA and FF. LA developed the conceptual adaptations and improvements of the suggested approaches with contributions from PR and FF. All authors designed the experiments and FF and LA selected the test cases. LA did the implementation, data compilation, and testing. LA wrote the paper with contributions from FF and PR.

The authors declare that they have no conflict of interest.

This study was funded by the Swiss National Science Foundation (grant 200021_163322). The authors thank MeteoSwiss (Swiss Federal Office of Meteorology and Climatology) for the meteorological data concerning the Murg catchment, Massimiliano Zappa for the preprocessing of these data and Jeffrey McDonnell for the hydrological data of the Maimai catchment. Lorenz Ammann thanks Omar Wani for the inspiring discussions and exchange of ideas. Dmitri Kavetski provided valuable feedback on a draft of this paper. The authors also thank Alberto Montanari, Jasper Vrugt and two anonymous referees for their feedback and their help in improving this paper.

This paper was edited by Erwin Zehe and reviewed by Jasper Vrugt and two anonymous referees.