Decades of hydrograph separation studies have estimated the proportions of recent precipitation in streamflow using end-member mixing of chemical or isotopic tracers. Here I propose an ensemble approach to hydrograph separation that uses regressions between tracer fluctuations in precipitation and discharge to estimate the average fraction of new water (e.g., same-day or same-week precipitation) in streamflow across an ensemble of time steps. The points comprising this ensemble can be selected to isolate conditions of particular interest, making it possible to study how the new water fraction varies as a function of catchment and storm characteristics. Even when new water fractions are highly variable over time, one can show mathematically (and confirm with benchmark tests) that ensemble hydrograph separation will accurately estimate their average. Because ensemble hydrograph separation is based on correlations between tracer fluctuations rather than on tracer mass balances, it does not require that the end-member signatures are constant over time, or that all the end-members are sampled or even known, and it is relatively unaffected by evaporative isotopic fractionation.

Ensemble hydrograph separation can also be extended to a multiple regression that estimates the average (or “marginal”) transit time distribution (TTD) directly from observational data. This approach can estimate both “backward” transit time distributions (the fraction of streamflow that originated as rainfall at different lag times) and “forward” transit time distributions (the fraction of rainfall that will become future streamflow at different lag times), with and without volume-weighting, up to a user-determined maximum time lag. The approach makes no assumption about the shapes of the transit time distributions, nor does it assume that they are time-invariant, and it does not require continuous time series of tracer measurements. Benchmark tests with a nonlinear, nonstationary catchment model confirm that ensemble hydrograph separation reliably quantifies both new water fractions and transit time distributions across widely varying catchment behaviors, using either daily or weekly tracer concentrations as input. Numerical experiments with the benchmark model also illustrate how ensemble hydrograph separation can be used to quantify the effects of rainfall intensity, flow regime, and antecedent wetness on new water fractions and transit time distributions.

For nearly 50 years, chemical and isotopic tracers have been used to quantify the relative contributions of different water sources to streamflow following precipitation events (Pinder and Jones, 1969; Hubert et al., 1969); see also reviews by Buttle (1994) and Klaus and McDonnell (2013), and references therein. As reviewed by Klaus and McDonnell (2013), chemical and isotopic hydrograph separation studies have led to many important insights into runoff generation. Foremost among these has been the realization that even at stormflow peaks, stream discharge is often composed primarily of “old” catchment storage rather than “new” recent precipitation (Sklash et al., 1976; Sklash, 1990; Neal and Rosier, 1990; Buttle, 1994). The previous dominant paradigm, based on little more than intuition, had held that because streamflow responds promptly to rainfall, the storm hydrograph must consist primarily of precipitation that reaches the channel quickly. Isotope hydrograph separations showed that this intuition is often wrong, because the isotopic signatures of stormflow often resemble baseflow or groundwater rather than recent precipitation. These observations have not only overthrown the previous dominant paradigm, but also launched decades of research aimed at unraveling the paradox of how catchments store water for weeks or months, but release it within minutes following the onset of rainfall (Kirchner, 2003).

The foundations of conventional two-component hydrograph separation are
straightforward. If one assumes that streamflow is a mixture of two
end-members of fixed composition, which I will call for simplicity “new” and
“old” water, then at any time

In typical applications, the new water is recent precipitation and the tracer signature of the old water is obtained from pre-event baseflow, which is generally assumed to originate from long-term groundwater storage.
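
The two-component mass balance described above can be sketched in a few lines. This is only an illustration of the standard textbook calculation, not code from this study: the function name, the `min_contrast` guard, and the tracer values are all hypothetical.

```python
import numpy as np

def two_component_new_fraction(c_stream, c_new, c_old, min_contrast=1.0):
    """Fraction of "new" water from a two-end-member tracer mass balance.

    Solves c_stream = F * c_new + (1 - F) * c_old for F. `min_contrast` is a
    hypothetical guard (in tracer units) against end-members that are too
    similar to be separated.
    """
    c_stream = np.asarray(c_stream, dtype=float)
    contrast = c_new - c_old
    if abs(contrast) < min_contrast:
        raise ValueError("end-member signatures are too similar to separate")
    return (c_stream - c_old) / contrast

# Illustrative deuterium values (per mil): pre-event baseflow at -60,
# event precipitation at -90, and three stream samples during a storm.
f_new = two_component_new_fraction([-63.0, -66.0, -75.0], c_new=-90.0, c_old=-60.0)
print(f_new)
```

A stream sample that falls outside the range spanned by the two end-members yields a fraction below 0 or above 1, which is one way that violations of the assumptions listed below show up in practice.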

The assumptions underlying conventional hydrograph separation can be
summarized as follows:

1. Streamflow is a mixture formed entirely from the sampled end-members; contributions from other possible streamflow sources (such as vadose zone water or surface storage) are negligible.

2. The samples of the end-members are representative (e.g., the sampled precipitation accurately reflects all precipitation, and the sampled baseflow reflects all pre-event water).

3. The tracer signatures of the end-members are constant through time, or their variations can be taken into account.

4. The tracer signatures of the end-members are significantly different from one another.

As reviewed by Rodhe (1987), Sklash (1990), Buttle (1994), and Klaus and McDonnell (2013), each of these
assumptions can be problematic in practice:

Hydrograph separation studies often lead to implausible (including negative) inferred contributions of new water, and such anomalous results are sometimes attributed to contributions from un-sampled end-members (e.g., von Freyberg et al., 2017). In such cases, assumption no. 1 is clearly not met.

The isotopic composition of precipitation can vary considerably within an event, both spatially and temporally, even in small catchments (e.g., McDonnell et al., 1990; McGuire et al., 2005; Fischer et al., 2017; von Freyberg et al., 2017). Likewise, the isotopic signature of the baseflow or groundwater end-member has been shown to vary in space and time during snowmelt and rainfall events (e.g., Hooper and Shoemaker, 1986; Rodhe, 1987; Bishop, 1991; McDonnell et al., 1991). In these cases, assumptions no. 2 and 3 are not met. Various schemes have been proposed to address this spatial and temporal variability by weighting the isotopic compositions of individual samples, but the validity of these schemes typically rests on strong assumptions about the nature of the runoff generation process and the heterogeneity to be averaged over.

When the difference between

Here I propose a new method for using isotopes and other conservative tracers to quantify the origins of streamflow. This method is based on statistical correlations among tracer fluctuations in streamflow and one or more candidate water sources, rather than mass balances. As such, it exploits the temporal variability in candidate end-members, rather than requiring them to be constant. It also does not require strict mass balance and thus is relatively insensitive to the presence of unmeasured end-members. Because this method quantifies the average proportions of source waters in streamflow across an ensemble of events or time steps, it does not answer the same question that traditional hydrograph separation does (namely, how fractions of new and old water change over time during individual storm events). Instead, it can answer new and different questions, such as how the average fractions of new and old water vary with stream discharge or precipitation intensity, antecedent moisture, etc. The proposed method is designed to provide insights into stormflow generation from regularly sampled time series, even if those time series have gaps and even if they are sampled at frequencies much lower than the storm response timescale of the catchment.

The purpose of this paper is to describe the method, document its mathematical foundations, and test it against a benchmark model, in which the method's results can be verified by age tracking. Applications to real-world catchments will follow in future papers. Because the proposed method is new and thus must be fully documented, several parts of the presentation (most notably Sect. 4.2–4.4 and Appendix B) necessarily contain strong doses of math. The math can be skipped, or lightly skimmed, by those who only need a general sense of the analysis. A table of symbols is provided at the end of the text.

Here I propose a new type of hydrograph separation based on correlations between tracer fluctuations in streamflow and in one or more end-members. This new approach to hydrograph separation does not have the same goal as conventional hydrograph separation. It does not estimate the contributions of end-members to streamflow for each time step (as in Eq. 3). Instead, it estimates the average end-member contributions to streamflow over an ensemble of time steps – hence its name, ensemble hydrograph separation. The ensemble of time steps may be chosen to reflect different catchment conditions and thus used to map out how those catchment conditions influence end-member contributions to streamflow.

I will first illustrate this approach with a simple example of a
time-varying mixing model. Let us assume that we have measured tracer
concentrations in streamflow, and in at least one contributing end-member,
over an ensemble of time intervals

The lagged concentration

The ensemble hydrograph separation approach is based on the observation that
Eq. (8) is almost equivalent to the conventional linear regression equation,

However, astute readers will notice an important difference between Eqs. (8) and
(9): in Eq. (9), the regression slope

The slope of the relationship between

Points with large leverage in the scatterplot (i.e., with

As expected for typical sampling and measurement errors, the error term

The least-squares solution of Eq. (9) can be expressed in several equivalent
ways. For consistency with the analysis that will be developed in Sect. 4
below, I will use the following formulation, which is mathematically
equivalent to those more commonly seen:

The uncertainty in

The meaning of the new water fraction

In all of these cases,

Periods without precipitation will inherently lack same-day (or same-week)
precipitation in streamflow. Thus we can calculate the average fraction of
new water in streamflow during all time steps, including those without
precipitation, as

The regression derived through Eqs. (4)–(9) gives each time interval

If, instead, one wants to estimate the new water fraction in all discharge
(during periods with and without precipitation), following the approach in
Sect. 2.4 one simply rescales this regression slope by the sum of discharge
during time steps with precipitation, divided by total discharge:

Because the volume-weighting will typically be uneven, the effective sample
size will typically be smaller than

One can also express the flux of new water as a fraction of precipitation
rather than discharge. Recently, von Freyberg et al. (2018)
have noted, in the context of conventional hydrograph separation, that
expressing event water as a proportion of precipitation rather than
discharge may lead to different insights into catchment storm response.
Analogously, within the ensemble hydrograph separation framework we can
estimate the new water fraction of precipitation, denoted

This yields a linear regression similar to Eq. (9), but with

The approaches represented by Eqs. (21) and (22)–(23) are not equivalent.
Equation (21) is based on the ad hoc assumption – which is verified by the
benchmark tests in Sect. 3.3–3.5 – that the average of

But although Eqs. (22), (25), and (26) are algebraically equivalent, their
statistical behavior is different when they are used as regression equations
to estimate the average value of

The precise interpretation of

Readers should keep in mind that although

The new water fraction of precipitation as estimated by Eq. (21) is a
time-weighted average, in which each day with precipitation counts equally.
One may also want to estimate the volume-weighted new water fraction of
precipitation, which we can denote as

Schematic diagram of the benchmark model

To test the methods outlined in Sect. 2 above, I use synthetic data
generated by a simple two-box lumped-parameter catchment model. This model
is documented in greater detail in Kirchner (2016a) and will be
described only briefly here. As shown in Fig. 1a, drainage

The model operates at a daily time step, with the storage evolution of the
lower box calculated by a weighted combination of the partly implicit
trapezoidal method (for greater accuracy) and the fully implicit backward
Euler method (for guaranteed stability). Unlike in Kirchner (2016a), here the storage evolution of the upper box is calculated
by forward Euler integration at 50 sub-daily time steps of 0.02 days
(roughly 30 min) each. At this time step, forward Euler integration is
stable across the entire parameter ranges used in this paper and is more
accurate than daily time steps of trapezoidal or backward Euler integration
(which are still adequate for the lower box, where storage volumes change
more slowly). Following Kirchner (2016a), the model is driven with
three different real-world daily rainfall time series, representing a range
of climatic regimes: a humid maritime climate with frequent rainfall and
moderate seasonality (Plynlimon, Wales; Köppen climate zone Cfb), a
Mediterranean climate marked by wet winters and very dry summers (Smith
River, California, USA; Köppen climate zone Csb), and a humid temperate
climate with very little seasonal variation in average rainfall (Broad
River, Georgia, USA; Köppen climate zone Cfa). Synthetic daily
precipitation tracer (deuterium) concentrations are generated randomly from
a normal distribution with a standard deviation of 20 ‰ and a
lag-1 serial correlation of 0.5, superimposed on a seasonal cycle with an
amplitude of 10 ‰. The model is initialized at the equilibrium storage
levels

For the simulations shown here, the drainage exponents

I illustrate the behavior of the model using two particular parameter sets,
one that gives damped response to precipitation
(

The model also simulates the sampling process and its associated errors. I
assume that tracer concentrations cannot be measured when precipitation
rates are below a threshold of

Panels b–i of Fig. 1 show 2 years of simulated daily behavior driven by the Smith River daily precipitation record applied to the damped and flashy catchment parameter sets. The simulated stream discharge responds promptly to rainfall inputs, and unsurprisingly the discharge response is larger in the flashy catchment (Fig. 1b, c). The streamflow isotopic response is strongly damped in both catchments, with isotope ratios between events returning to a relatively stable baseline value composed mostly of discharge from the lower box (Fig. 1d, e). Like the stream discharge and the isotope tracer time series, the instantaneous new water fractions (determined by age tracking within the model) also exhibit complex nonstationary dynamics (Fig. 1f–i). Despite the complexity of the modeled time-series behavior, ensemble hydrograph separation (Eqs. 14, 18, 21, and 28) accurately predicts the averages of these new water fractions, both unweighted and volume-weighted, as can be seen by comparing the dashed and solid lines (which sometimes overlap) in Fig. 1f–i.

It should be emphasized that the ensemble hydrograph separation and the benchmark model are completely independent of one another. The ensemble hydrograph separation does not know (or assume) anything about the internal workings of the benchmark model; it knows only the input and output water fluxes and their isotope signatures. This is crucial for it to work in the real world, where any particular assumptions about the processes driving runoff could potentially be violated. Likewise, the benchmark model is not designed to conform to the assumptions underlying the ensemble hydrograph separation method. It would be relatively trivial to model a tracer time series assuming that new water constituted a fixed fraction of discharge, and then demonstrate that this fraction can be retrieved from the tracer behavior. What Fig. 1 demonstrates is much less obvious, and more important: that even when the new water fraction is highly dynamic and nonstationary, an appropriate analysis of tracer behavior can accurately estimate its mean.
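
The closing point above can be illustrated with a toy that is far simpler than the benchmark model. The sketch below assumes that the old-water signature can be approximated by the previous time step's streamflow concentration, and that the new water fraction is the slope of a regression of streamflow tracer fluctuations on precipitation tracer fluctuations. The function name, the synthetic mixing rule, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def ensemble_new_water_fraction(c_precip, c_stream):
    """Slope (and ordinary least-squares standard error) of streamflow tracer
    fluctuations regressed on precipitation tracer fluctuations, both taken
    relative to the lagged streamflow concentration."""
    c_precip = np.asarray(c_precip, dtype=float)
    c_stream = np.asarray(c_stream, dtype=float)
    y = c_stream[1:] - c_stream[:-1]
    x = c_precip[1:] - c_stream[:-1]
    ok = np.isfinite(x) & np.isfinite(y)      # skip steps with missing data (NaN)
    x, y = x[ok], y[ok]
    xc, yc = x - x.mean(), y - y.mean()
    slope = np.sum(xc * yc) / np.sum(xc * xc)
    resid = yc - slope * xc
    stderr = np.sqrt(np.sum(resid**2) / (x.size - 2) / np.sum(xc * xc))
    return slope, stderr

# Toy mixing series in which the new water fraction changes at every step.
rng = np.random.default_rng(0)
n = 5000
c_precip = rng.normal(-60.0, 20.0, n)
f_true = rng.uniform(0.0, 0.2, n)
c_stream = np.empty(n)
c_stream[0] = -60.0
for j in range(1, n):
    c_stream[j] = f_true[j] * c_precip[j] + (1.0 - f_true[j]) * c_stream[j - 1]

slope, stderr = ensemble_new_water_fraction(c_precip, c_stream)
print(f"estimated {slope:.3f} +/- {stderr:.3f}; true mean {f_true[1:].mean():.3f}")
```

The standard error here is the plain least-squares one. It ignores serial correlation in the residuals, so it should be read as a lower bound for real tracer records.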

This result holds not just for the two parameter sets shown in Fig. 1, but
throughout the parameter ranges that are tested in the benchmark model. The
scatterplots in Fig. 2 compare new water fractions estimated by ensemble
hydrograph separation with the true average new water fractions
determined by age tracking in the benchmark model, for 1000 random parameter
sets spanning the parameter ranges described in Sect. 3.1. Figure 2 shows
that ensemble hydrograph separation yields reasonably accurate estimates of
average event new water fractions (Fig. 2a, b), new water fractions of
discharge (Fig. 2c) and precipitation (Fig. 2d), and volume-weighted new
water fractions (Fig. 2e, f). Estimates derived from single years of data
(Fig. 2b) understandably exhibit greater scatter than those derived from
5 years of data (Fig. 2a), but in all of the plots shown in Fig. 2 there
is no evidence of significant bias (the data clouds cluster around the

New water fractions predicted from tracer dynamics using ensemble
hydrograph separation, compared to averages of time-varying new water
fractions determined from age tracking in the benchmark model. Diagonal
lines show perfect agreement. Each scatterplot shows 1000 points, each of
which represents an individual catchment, with its own individual random set
of model parameters (i.e., catchment characteristics), randomly generated
precipitation tracer time series, and random set of measurement errors and
missing values (see Sect. 3.1). The daily precipitation amounts are the same
(Smith River time series; Mediterranean climate) in each case. The event
new water fraction

Mean transit times have often been estimated in the catchment hydrology literature, typically under the assumption that they should also be correlated with other timescales of catchment transport and mixing. This naturally leads to the question, in the context of the present study, of whether there is a systematic relationship between mean transit times and new water fractions, such that they could potentially be predicted from one another. The benchmark model allows a direct test of this conjecture, because it tracks mean water ages as well as new water fractions. Figure 3a shows that, across the 1000 random parameter sets from Fig. 2, the relationship between new water fractions and mean transit times is a nearly perfect shotgun blast: mean transit times vary from about 40 to 400 days and new water fractions vary from nearly zero to nearly 0.1, with almost no correlation between them. Both of these quantities are estimated from age tracking in the benchmark model, so their lack of any systematic relationship does not arise from difficulties in estimating either of them from tracer data. It instead arises because the upper tails of transit time distributions (reflecting the amounts of streamflow with very old ages) exert strong influence on mean transit times, but have no effect on new water fractions (reflecting same-day streamflow).
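
The last point can be made concrete with two toy transit time distributions that share the same same-day fraction but differ in their tails. The distribution shape (a lag-0 spike plus a discrete exponential tail), the function name, and the numbers are illustrative assumptions, not output of the benchmark model.

```python
import numpy as np

def toy_ttd(new_fraction, tail_scale_days, n_lags=20000):
    """Daily transit time distribution: `new_fraction` at lag 0, with the
    remainder spread over a discrete exponential tail (lags 1, 2, ...)."""
    lags = np.arange(n_lags)
    tail = np.exp(-lags[1:] / tail_scale_days)
    tail *= (1.0 - new_fraction) / tail.sum()   # tail carries the rest of the mass
    return lags, np.concatenate(([new_fraction], tail))

results = {}
for tail_scale in (40.0, 400.0):
    lags, ttd = toy_ttd(0.05, tail_scale)
    mean_transit_time = float(np.sum(lags * ttd))
    results[tail_scale] = (float(ttd[0]), mean_transit_time)
    print(f"tail scale {tail_scale:5.0f} d: new water fraction {ttd[0]:.2f}, "
          f"mean transit time {mean_transit_time:.0f} d")
```

Both distributions have an identical lag-0 weight, yet their means differ by roughly an order of magnitude, because the mean is controlled almost entirely by the tail.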

I have recently proposed the “young water fraction”, the fraction of
streamflow younger than about 2.3 months, as a more robust metric of water
age than the mean transit time (Kirchner, 2016b). Figure 3b shows
that, like the mean transit time, the young water fraction is also a poor
predictor of the new water fraction, beyond the obvious constraint that new
water (

Average new water fractions (same-day precipitation in streamflow)
for the 1000 simulated catchments (i.e., 1000 model parameter sets) shown in
Fig. 2, compared to the catchment mean transit time and the young water
fraction

Many long-term water isotope time series have been sampled at weekly intervals. Can new water fractions be estimated reliably from such sparsely sampled records? To find out, I aggregated the benchmark model's daily time series to weekly intervals, volume-weighting the isotopic composition of precipitation to simulate the effects of weekly bulk precipitation sampling, and subsampling streamflow isotopes every seventh day to simulate weekly grab sampling. I then performed ensemble hydrograph separation on the aggregated weekly data, using the methods presented in Sect. 2.
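
The aggregation step described above might look like the following sketch. It assumes that rainless days are stored with zero precipitation (their tracer value may be NaN) and that the weekly grab sample is taken on the last day of each week. The function name and these conventions are assumptions, not details taken from the benchmark code.

```python
import numpy as np

def aggregate_to_weekly(precip, c_precip, c_stream, days_per_week=7):
    """Weekly bulk precipitation (volume-weighted tracer) and weekly grab
    samples of streamflow from daily series of equal length."""
    precip = np.asarray(precip, dtype=float)
    c_precip = np.asarray(c_precip, dtype=float)
    c_stream = np.asarray(c_stream, dtype=float)
    n_weeks = precip.size // days_per_week
    n = n_weeks * days_per_week                  # drop an incomplete final week
    p = precip[:n].reshape(n_weeks, days_per_week)
    c = c_precip[:n].reshape(n_weeks, days_per_week)
    c = np.where(p > 0, c, 0.0)                  # rainless days carry no tracer
    p_week = p.sum(axis=1)
    tracer_week = (p * c).sum(axis=1)            # NaN if a rainy day was not sampled
    c_week = np.full(n_weeks, np.nan)            # stays NaN for rainless weeks
    np.divide(tracer_week, p_week, out=c_week, where=p_week > 0)
    c_stream_week = c_stream[:n][days_per_week - 1::days_per_week]
    return p_week, c_week, c_stream_week

# Two weeks of made-up daily data: one wet week, one rainless week.
nan = np.nan
precip = [0, 2, 0, 0, 6, 0, 0] + [0] * 7
c_precip = [nan, -40, nan, nan, -80, nan, nan] + [nan] * 7
c_stream = np.arange(14, dtype=float)
p_week, c_week, c_stream_week = aggregate_to_weekly(precip, c_precip, c_stream)
print(p_week, c_week, c_stream_week)
```

A week with no precipitation keeps a NaN concentration, so it drops out of any later regression instead of being treated as a sample.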

Illustrative simulations of weekly water fluxes, deuterium
concentrations, and new water fractions. The benchmark model, precipitation
forcing, and parameter values are identical to those in Fig. 1. Although the
isotope tracer concentrations and new water fractions exhibit complex
nonstationary dynamics, ensemble hydrograph separation yields reasonable
estimates of the average backward and forward weekly new water
fractions, as shown in

New water fractions estimated from weekly tracer dynamics using
ensemble hydrograph separation, compared to averages of time-varying new
water fractions determined from age tracking in the benchmark model. Plots
are similar to those in Fig. 2, except here they are derived from simulated
weekly sampling of tracer concentrations in precipitation and streamflow.
Diagonal lines show perfect agreement. Each scatterplot shows 1000 points,
each representing an individual random set of parameters, a randomly
generated precipitation tracer time series, and a random set of measurement
errors and missing values (see Sect. 3.1). The daily precipitation amounts
are the same (Smith River time series) in each case. The event new water
fraction

Figure 4 shows the behavior of the benchmark model at weekly resolution for both the damped and flashy catchments. At the weekly timescale, the benchmark model exhibits complex nonstationary dynamics in discharge (panels a, b), water isotopes (panels c, d), and new water fractions (panels e–h). Nonetheless – and even though the weekly sampling timescale is much longer than the timescales of hydrologic response in the system – ensemble hydrograph separation yields reasonable estimates for the mean new water fractions of both precipitation and discharge (both unweighted and flow-weighted), as one can see by comparing the dashed and solid lines in Fig. 4e–h.

A comparison of Figs. 1 and 4 shows that the isotopic signature of
precipitation is less variable among the weekly samples than among the daily
samples, reflecting the fact that the weekly bulk samples of precipitation
will inherently average over the sub-weekly variability in daily rainfall.
By contrast, the weekly grab samples of streamflow lose all information
about what is happening on shorter timescales. The new water fractions
calculated from the weekly data are distinctly higher than those calculated
from the daily data, owing to the fact that the definition of new water
depends on the sampling frequency: the proportion of water

Figure 5 shows scatterplots comparing new water fractions estimated by ensemble hydrograph separation and those determined by age tracking in the benchmark model, analogous to Fig. 2 but for weekly instead of daily sampling. The weekly new water fractions are larger than the daily ones, for the reasons described above, and exhibit more scatter because they are based on fewer data points than their daily counterparts are. A small overestimation bias is visually evident in Fig. 5d and an even smaller underestimation bias is evident in Fig. 5c. These reservations notwithstanding, Fig. 5 shows that ensemble hydrograph separation can reliably predict new water fractions of both discharge and precipitation, with and without volume-weighting, based on weekly tracer samples.

Ensemble hydrograph separation does not require continuous data as input, so
it can be used to estimate

Variations in new water fractions across ranges of discharge.

Variations in new water fractions across ranges of precipitation.

If, instead, we split the time series shown in Fig. 1 into subsets
reflecting ranges of precipitation rates rather than discharge, we obtain
Fig. 7. Figure 7 is a counterpart to Fig. 6, but with

Three conclusions can be drawn from Figs. 6 and 7. First, in these model
catchments, new water fractions vary dramatically between low flows and high
flows, and between low and high precipitation rates, with the event new
water fraction

Thus the patterns describing how new water fractions change with
precipitation and discharge may be useful as signatures of catchment
transport behavior and can be estimated directly from tracer time series
using ensemble hydrograph separation. These observations raise the question
of whether any of these signatures of behavior, as inferred from the
patterns in these plots (if not the individual numerical values), might imply
something useful about the characteristics of the catchments themselves,
ideally in a way that is not substantially confounded by precipitation
climatology. A comprehensive answer is not possible within the scope of this
paper, since it focuses mostly on just two parameter sets and three
precipitation records. But as a first approach, one can try superimposing
the results in Figs. 6 and 7 on consistent axes (note that the axes in these
figures' various panels differ from one another in order to show the full
range of behavior). Doing so yields Fig. 8, which overlays the age-tracking
results from Figs. 6c–h and 7c–h in its left- and right-hand panels,
respectively. In Fig. 8, catchments with the damped and flashy parameter
sets are denoted by green and blue curves, respectively, with different
levels of brightness corresponding to the three different precipitation
climatologies. The key question is: are there patterns in

Effects of precipitation climatology and catchment properties on
discharge dependence and precipitation dependence of new water fractions. The lines
plotted here superimpose the model age-tracking results (solid lines) from
Figs. 6 and 7. Panels

The behavior summarized in Figs. 6–8 shows that, in general, new water fractions are functions of both catchment characteristics and precipitation climatology. Moreover, new water fractions will depend on the sequence of precipitation events, not just on their frequency distribution, because they will depend on antecedent wetness. Thus, although the ensemble hydrograph separation approach does not require continuous data and can therefore be applied to time series with data gaps, any inferred new water fractions will obviously represent only the particular time intervals that are included in the analysis.
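
Restricting the ensemble to particular time steps, and tolerating gaps, can be sketched as follows. The sketch assumes the same lagged-streamflow regression form used for the event new water fraction. The synthetic flow record, the 80th-percentile threshold, and the function name are hypothetical choices made only to show the mechanics.

```python
import numpy as np

def subset_new_water_fraction(c_precip, c_stream, keep):
    """Regression slope using only time steps j for which keep[j] is True
    and both tracer fluctuations are available."""
    y = c_stream[1:] - c_stream[:-1]
    x = c_precip[1:] - c_stream[:-1]
    ok = keep[1:] & np.isfinite(x) & np.isfinite(y)
    slope, _intercept = np.polyfit(x[ok], y[ok], 1)
    return slope

rng = np.random.default_rng(1)
n = 6000
discharge = rng.lognormal(mean=0.0, sigma=1.0, size=n)      # hypothetical flows
high_flow = discharge > np.quantile(discharge, 0.8)
# New water fraction is larger, on average, at high flow (an assumed toy rule).
f_true = np.where(high_flow, 0.30, 0.05) * rng.uniform(0.5, 1.5, n)
c_precip = rng.normal(-60.0, 20.0, n)
c_stream = np.empty(n)
c_stream[0] = -60.0
for j in range(1, n):
    c_stream[j] = f_true[j] * c_precip[j] + (1.0 - f_true[j]) * c_stream[j - 1]
c_precip[rng.random(n) < 0.1] = np.nan                      # 10 % sampling gaps

f_low = subset_new_water_fraction(c_precip, c_stream, ~high_flow)
f_high = subset_new_water_fraction(c_precip, c_stream, high_flow)
print(f"low flow: {f_low:.3f}   high flow: {f_high:.3f}")
```

Each estimate describes only the time steps that passed the mask, which is the caveat stated above: a subset with gaps or with a different mix of conditions will yield a different average.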

Seasonality in new water fractions under Mediterranean climate
precipitation forcing.

One implication of the foregoing considerations is that seasonal differences
in storm size and frequency should also be reflected in seasonal variations
in new water fractions. Figure 9a shows a scatterplot of tracer fluctuations
in streamflow and precipitation, color-coded by season, for the flashy
catchment simulation shown in Fig. 1. The regression lines (whose slopes
define the event new water fractions

Any analysis based on water isotopes must deal with the potential effects of isotopic fractionation due to evaporation (e.g., Laudon et al., 2002; Taylor et al., 2002; Sprenger et al., 2017; Benettin et al., 2018). A detailed treatment of evaporative fractionation would necessarily be site-specific and thus beyond the scope of this paper. Nonetheless, it is possible to make a simple first estimate of how much evaporative fractionation could affect new water fractions estimated from ensemble hydrograph separation. The benchmark model does not explicitly simulate evapotranspiration and its effects on the catchment mass balance, but the issue to be addressed here is different: how much could evaporative fractionation alter the isotope values measured in streamflow, and how could this affect the resulting estimates of new water fractions?

To explore this question, I first adjusted the isotope values of infiltration entering the model in Fig. 1 to mimic the effects of seasonally varying evaporative fractionation. I assumed that evaporative fractionation was a sinusoidal function of the time of year, ranging from zero in midwinter to 20 ‰ in midsummer. Thus I assumed that evaporative fractionation effectively doubled the seasonal isotopic cycle in the water entering the model catchment (but not in the sampled rainfall itself, since any fractionation that occurs before the rainfall is sampled will not distort the ensemble hydrograph separation). I then calculated new water fractions based on the time series of sampled precipitation tracer concentrations and of streamflow tracer concentrations (altered by the lagged and mixed effects of evaporative fractionation), and compared these to the true new water fractions calculated by age tracking within the model.
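
The seasonal adjustment described above can be sketched as below. The sketch assumes that the fractionation offset is in phase with the seasonal isotopic cycle (which is what doubling the cycle implies) and that midsummer falls near day 196 of the year. The peak day and both function names are assumptions.

```python
import numpy as np

def seasonal_cycle(day_of_year, amplitude=10.0, peak_day=196.0):
    """Seasonal isotopic cycle in sampled precipitation (per mil)."""
    return amplitude * np.cos(2.0 * np.pi * (day_of_year - peak_day) / 365.0)

def evaporative_offset(day_of_year, max_offset=20.0, peak_day=196.0):
    """Sinusoidal enrichment: zero in midwinter, `max_offset` in midsummer."""
    phase = 2.0 * np.pi * (day_of_year - peak_day) / 365.0
    return 0.5 * max_offset * (1.0 + np.cos(phase))

days = np.arange(365.0)
sampled = seasonal_cycle(days)                      # what the collector records
infiltrating = sampled + evaporative_offset(days)   # what enters the catchment

range_sampled = sampled.max() - sampled.min()
range_infiltrating = infiltrating.max() - infiltrating.min()
print(f"seasonal range: sampled {range_sampled:.1f}, "
      f"infiltrating {range_infiltrating:.1f} per mil")
```

Only the infiltrating series is shifted. The sampled precipitation series is left unchanged, so the regression sees a mismatch between the measured inputs and the water that actually reaches the stream.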

The results are shown in Fig. 10, which compares 1000 Monte Carlo trials with
evaporative fractionation (the blue dots) and another 1000 Monte Carlo trials
without evaporative fractionation (the gray dots). One can see that, in these
simulations, evaporative fractionation leads to a slight tendency to
underestimate new water fractions. Nonetheless, the blue and gray dots
largely overlap, and both generally follow the

Effects of seasonally varying evaporative fractionation on new
water fractions estimated by ensemble hydrograph separation. Points show new
water fractions predicted from tracer fluctuations in precipitation and
streamflow (on the vertical axis), compared to averages of time-varying new water
fractions determined by age tracking in the benchmark model (on the horizontal axis).
Blue points show 1000 model runs in which precipitation undergoes seasonally
varying evaporative fractionation ranging from zero in winter to 20 ‰
in summer. Gray background points show 1000 model runs without evaporative
fractionation (analogous to Fig. 2). Each model run has a different random
set of model parameters, measurement errors, and missing values, but the
precipitation driver (Smith River daily precipitation) is the same in all
cases. The blue data clouds closely follow the

A natural extension of the approach outlined in Sect. 2 would be to quantify the contributions of precipitation to streamflow over a range of lag times: to quantify, in other words, the catchment transit time distribution. In principle this should be straightforward, although in practice several challenges must be overcome. Below, I describe these issues and outline techniques for addressing them. Readers who are not interested in the methodological details can proceed directly from Sect. 4.1 to 4.5, skipping over Sect. 4.2–4.4.
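
As a rough preview of the idea, and not the formulation developed in the sections that follow, a lagged multiple regression can be sketched for an idealized case with rain on every time step, no measurement error, and no gaps. The sketch assumes that water older than the maximum lag is represented by the streamflow concentration one step beyond that lag. The function name, the synthetic convolution, and the coefficients are all hypothetical.

```python
import numpy as np

def lagged_regression(c_precip, c_stream, max_lag):
    """Least-squares coefficients for precipitation lags 0..max_lag, with
    streamflow max_lag + 1 steps earlier as the reference for older water."""
    m = max_lag
    rows = np.arange(m + 1, c_stream.size)
    ref = c_stream[rows - m - 1]
    y = c_stream[rows] - ref
    lagged = [c_precip[rows - k] - ref for k in range(m + 1)]
    X = np.column_stack(lagged + [np.ones(rows.size)])   # last column: intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1]                                      # drop the intercept

# Synthetic, stationary test case with known lag weights.
rng = np.random.default_rng(2)
n, m = 2000, 4
beta_true = np.array([0.10, 0.06, 0.04, 0.02, 0.01])
c_precip = rng.normal(-60.0, 20.0, n)
c_stream = np.full(n, -60.0)
for j in range(m + 1, n):
    recent = np.sum(beta_true * c_precip[j - np.arange(m + 1)])
    c_stream[j] = recent + (1.0 - beta_true.sum()) * c_stream[j - m - 1]

beta_hat = lagged_regression(c_precip, c_stream, m)
print(np.round(beta_hat, 3))
```

Real records violate every simplification made here, most importantly through rainless time steps and missing samples. Handling those is exactly what the remainder of this section addresses.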

I assume that catchment inputs and outputs are sampled at the same fixed
time interval

In practice, precipitation fluxes are typically measured as averages over
discrete time intervals, and tracer concentrations in precipitation are
likewise volume-averaged over discrete intervals (such as a day or a week)
during which the sample accumulates in the precipitation collector. By
contrast, discharge fluxes are typically measured instantaneously, and
discharge tracer concentrations are typically measured in instantaneous grab
samples. In most of what follows, I will assume that

I now outline the fundamentals of the ensemble hydrograph separation
approach to estimating transit time distributions. Conservation of water
mass requires that the discharge at time step

Because tracing contributions to streamflow from all previous time steps
would be impractical, it will be necessary to truncate the summation in Eq. (31)
at some maximum lag, which I will denote as

Dividing Eq. (33) by

Analogous to the approach in Sect. 2, here I account for the concentration of
older inputs

When appropriately rescaled as described in Sect. 4.5–4.7 below, the
coefficients

Using

Astute readers will immediately notice a fundamental problem with applying
Eqs. (40) or (41) in practice, namely that they require precipitation tracer
concentrations

So-called “missing data problems” arise frequently in the statistical
literature, and several approaches have been proposed for handling them (Little, 1992). One approach, termed “listwise deletion” or
“complete-case analysis”, involves discarding all cases (meaning all rows

A second class of approaches to the missing data problem involves imputing values to the missing data (Little, 1992). In our case, however, many of the missing data are not simply unmeasured, but cannot exist at all (because rainless days have no rainfall concentrations), so it is not obvious how to impute the missing values.

A third approach, termed “pairwise deletion” or “available-case analysis”,
first proposed by Glasser (1964), entails evaluating each of the
covariances in Eq. (40) using any cases for which the necessary pairs of
observations exist. Thus the covariances in Eq. (40) are replaced by

Glasser's approach can potentially handle the problem of tracer measurements
that are missing at random due to sampling or analysis failures. However, it
will not correctly handle the problem of tracer concentrations that are
missing due to a lack of sufficient precipitation, because it assumes that
the missing values occur randomly and therefore that Eqs. (42)–(43) are
unbiased estimators of the covariances that one would obtain if no samples
were missing. But when little or no precipitation falls on the catchment, it
delivers little or no tracer to subsequent streamflow, and thus its
contribution to the covariance between precipitation and streamflow
concentrations will be nearly zero. Therefore different handling is required
for precipitation tracer concentrations that are missing because they were
not measured, versus those that are missing because they never existed at
all (because no rain fell). As shown in Appendix B, periods without
precipitation must be taken into account with weighting factors on the
off-diagonal elements of the covariance matrix (because the tracer
covariances will be less strongly coupled to one another, the less
frequently precipitation falls). When the approach outlined in Appendix B is
combined with Glasser's method for estimating each of the covariances, the
end result is

In the calculations presented here, I have assumed a precipitation threshold
of 1 mm day⁻¹

Gaps in the underlying data imply that, unlike covariance matrices in
conventional multiple regressions, the covariance matrix in Eq. (40) is not
guaranteed to be positive definite (and thus may not be invertible). Even
when the covariance matrix is invertible, it may be ill-conditioned, making
its inversion unstable. This issue arises frequently in inversion problems
whenever different combinations of lagged inputs will have nearly equivalent
effects on the output, making it difficult for the inversion to decide among
them (this is the multidimensional analogue to nearly dividing by zero in
Eq. 10). In minimizing the sum of squared deviations from the observations,
inversions like Eq. (40) can potentially yield wildly oscillating solutions,
with huge negative values of

A standard therapy for this disease is Tikhonov–Phillips regularization (Phillips, 1962; Tikhonov, 1963). This technique (also known by
many other names, including Tikhonov regularization, Tikhonov–Miller
regularization, and the Phillips–Twomey method) is commonly used to solve
ill-conditioned geophysical inversion problems (Zhdanov, 2015) but is
less widely known in hydrology. Whereas conventional least-squares inversion
finds the set of parameters

The form of

This approach, also called “ridge regression” because it adds a “ridge” of
extra weight along the diagonal of the covariance matrix, was Tikhonov's
original regularization criterion and is widely used in geophysical
inversions (including unit hydrograph estimation). In our case, however, it
would have the undesirable effect of creating a systematic underestimation
bias in our estimates of recent contributions to streamflow, by always
making the

A second possible criterion is consistency: minimize the variance of the

Like Eq. (47), this minimum-variance criterion is also widely used and has the
advantage that, unlike Eq. (47), it does not lead to systematic biases in the
average

A third possible criterion is smoothness: minimize the mean square
of the second derivatives of the

This criterion, first used by Phillips (1962), has the advantage of
strongly suppressing rapid oscillations in the

The solution to Eq. (46) will depend on the value of the parameter

As one can see from Eq. (50), when

The question remains as to what the most appropriate value of

in practice several values
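A minimal sketch of the smoothness criterion, assuming a generic least-squares problem rather than the covariance form of Eq. (40). The penalty is the sum of squared second differences of the coefficients. The penalty weight `lam` is a hypothetical parameter whose scaling need not match the regularization parameter of Eq. (46), and the synthetic inputs, the number of lags, and all function names are illustrative only.

```python
import numpy as np

def second_difference_matrix(m):
    """Rows compute beta[i] - 2*beta[i+1] + beta[i+2] for a length-m vector."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def regularized_lstsq(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||D b||^2 (lam = 0 is ordinary least squares)."""
    D = second_difference_matrix(X.shape[1])
    A = X.T @ X + lam * (D.T @ D)
    return np.linalg.solve(A, X.T @ y)

def roughness(beta):
    """Sum of squared second differences of the coefficients."""
    return float(np.sum(np.diff(beta, n=2) ** 2))

# Synthetic, strongly autocorrelated input, so that lagged columns are nearly collinear
rng = np.random.default_rng(0)
n, m = 400, 8
x = np.zeros(n + m)
for t in range(1, n + m):
    x[t] = 0.95 * x[t - 1] + rng.normal()

# Column k holds the input lagged by k time steps
X = np.column_stack([x[m - k: m - k + n] for k in range(m)])
true_beta = 0.3 * np.exp(-np.arange(m) / 2.0)  # assumed smooth "true" coefficients
y = X @ true_beta + rng.normal(scale=0.5, size=n)

beta_ols = regularized_lstsq(X, y, lam=0.0)
beta_reg = regularized_lstsq(X, y, lam=50.0)
print(roughness(beta_ols), roughness(beta_reg))
```

Because the regularized solution minimizes the misfit plus the penalty, its roughness cannot exceed that of the unregularized solution. How much smoother it becomes, and how much misfit is traded away, depends on `lam`.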

In conventional multiple regression analysis, calculating the uncertainties
in the

It may seem that calculating Eq. (51) is impossible in our case, because values
of

In conventional multiple regression, the covariance matrix of the
coefficients

Benchmark data sets verify that Eqs. (53) and (54) perform as they should:
the root-mean-square averages of the calculated

There is one important caveat to this generalization, however: it holds only
if the assumptions underlying the regularization criterion are actually true. For example, if
the true

The coefficients

To estimate the average contribution

Transit time distributions of discharge estimated by ensemble
hydrograph separation based on both daily and weekly tracer sampling, versus
true transit time distributions determined by benchmark model age tracking
(light blue curves). Panels

These transit time distributions can be tested by comparing them to time-averaged streamwater age distributions calculated by age tracking in the benchmark model (Sect. 3.1). Figure 11 shows the results of several such tests, using both daily and weekly tracer data as input (left and right columns, respectively). The light blue curves indicate the true time-averaged transit time distribution (determined from age tracking in the benchmark model), the dark blue symbols show transit time distributions estimated from one tracer time series, and the gray data clouds show 200 more transit time distributions from the same model with different realizations of the random inputs. The weekly TTDs are larger, in absolute terms, than the daily TTDs, because streamflow will always contain at least as much water that originated as precipitation during the previous week as during the previous day (for the simple reason that the previous day is part of the previous week). Figure 11 shows that ensemble hydrograph separation correctly estimates the general shapes of the TTDs and their quantitative values. Furthermore, the gray data clouds show that none of the 200 additional TTD estimates deviates wildly from the age-tracking curves.

Transit time distributions (TTDs) of discharge estimated by
ensemble hydrograph separation based on daily sampling, compared to true
TTDs determined by benchmark model age tracking (light blue curves), for
four model parameter sets yielding diverse patterns of transport behavior.
Dark blue symbols show transit time distributions estimated from one time
series. Data clouds show ensemble hydrograph separation results (slightly
jittered on the horizontal axis) from 200 different realizations of random
precipitation tracer values, random missing data, and random measurement
errors. Vertical axis scales differ greatly. Ensemble hydrograph separation
correctly reveals the shapes of the TTDs and also quantifies their values
at most lags. However, panels

Real-world transit time distributions could potentially have different
shapes from those shown in Fig. 11. To test whether ensemble hydrograph
separation can correctly estimate transit time distributions with more
widely varying shapes, I explored the benchmark model's parameter space, in
some cases venturing beyond the nominal parameter ranges outlined in Sect. 3.1. As Fig. 12 illustrates,
widely differing time-averaged (or marginal)
transit time distributions generated by the benchmark model (solid lines)
are well approximated by the ensemble hydrograph separation estimates (blue
dots) calculated from the tracer time series. The standard errors are
overestimated for humped TTDs, which generate strongly autocorrelated time
series. The reason appears to be that when the benchmark model's parameters
generate a strongly autocorrelated tracer time series, the residuals will
also be strongly autocorrelated; thus the effective sample size

The transit time distributions defined in Eq. (55) are ensemble averages in
which each day counts equally; that is, for a given lag

For many purposes, it would be useful to estimate the temporal origins of an
average liter of discharge instead, that is, the
volume-weighted TTD, which we can denote

To estimate the volume-weighted TTD, we must average over all discharge
(including discharge after time steps with no precipitation). Thus the
coefficients

Volume-weighted transit time distributions (TTDs) of discharge
estimated by ensemble hydrograph separation (Eq. 60) compared to benchmark
model age tracking. Panels

In addition to the backward transit time distributions

Forward transit time distributions are less straightforward to estimate from
tracers than backward distributions are, for the simple reason that although
streamflow is a mixture of contributions from previous precipitation events,
the converse does not hold: that is, precipitation cannot be expressed as a
mixture of subsequent streamflows. Although it is algebraically
straightforward to rewrite Eq. (35) as either

Instead, by analogy to Eq. (21), we can estimate the forward transit time
distribution from the regression coefficients

It should be emphasized that

Forward transit time distributions (the fraction of precipitation
that leaves the catchment within one time step, two time steps, and so on)
estimated by ensemble hydrograph separation (Eq. 63) compared to benchmark
model age tracking. Panels

The volume-weighted forward transit time distribution

Because the benchmark model in Fig. 1 has no evaporative losses and thus

Like the new water fraction

For example, we can choose to subdivide the data set according to the
discharge time

Transit time distributions

In a Mediterranean climate (as depicted by, for example, the Smith River precipitation record shown in Fig. 1), one would intuitively expect rainy-season streamflow to have larger contributions from recent precipitation. Conversely, one would expect that dry-season streamflow will have much smaller contributions from recent rainfall (because there is so little of it, among other reasons). But how big are the differences between rainy-season and dry-season transit time distributions? As an illustration of what may be possible with real-world data, I took the 5-year daily and weekly time series for the benchmark model driven by the Mediterranean climate (Smith River) precipitation record, separated them into summer (dry) and winter (wet) seasons, and analyzed the two seasons separately. Figure 16 shows that, as expected, the contributions of recent precipitation to streamflow are much larger during the wet season than the dry season. But more importantly, Fig. 16 also shows that these differences can be accurately quantified, directly from data.

Backward and forward transit time distributions (

The examples above are based on subdividing the data set according to the
discharge time

One question that can be explored by subdividing the time series according to precipitation is whether larger rainfall events propagate faster through catchments. Intuition suggests that intense rainfall should lead to larger contributions to streamflow from faster flow paths. But how much larger? Figure 17 illustrates how this kind of question could potentially be explored. In Fig. 17, the forward transit time distributions of the highest 20 % of precipitation are compared to the average transit time distributions of all precipitation events, for the damped and flashy parameter sets and all three precipitation climatologies. One can see that large rain events are associated with much larger amounts of water reaching the stream quickly, but this effect largely disappears after about 2–3 days. Moreover, the magnitude and timing of this effect are nearly the same in the estimates derived from ensemble hydrograph separation and benchmark model age tracking, suggesting that they could also be reliably estimated from real-world data.
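The subsetting idea can be sketched in its simplest form, for a single lag (a new water fraction) rather than a full TTD. The sketch assumes the regression of streamflow tracer changes on the difference between precipitation and previous-step streamflow concentrations. It also assumes a toy mixing model in which the true new water fraction is 0.4 on the wettest 20 % of time steps and 0.1 otherwise. All names and numbers are hypothetical and are not taken from the benchmark model.

```python
import numpy as np

def new_water_fraction(c_p, c_q, select):
    """Regression slope of (C_Q[j] - C_Q[j-1]) on (C_P[j] - C_Q[j-1]),
    using only the time steps j flagged in the boolean array `select`."""
    y = c_q[1:] - c_q[:-1]
    x = c_p[1:] - c_q[:-1]
    keep = select[1:]
    x, y = x[keep], y[keep]
    return float(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

rng = np.random.default_rng(2)
n = 5000
precip = rng.exponential(5.0, size=n)    # synthetic precipitation (rain every step)
c_p = rng.normal(-10.0, 3.0, size=n)     # synthetic precipitation tracer values

# Assumed behavior: large events deliver a larger fraction of new water
big = precip >= np.quantile(precip, 0.8)
f_true = np.where(big, 0.4, 0.1)

c_q = np.empty(n)
c_q[0] = -10.0
for j in range(1, n):
    c_q[j] = f_true[j] * c_p[j] + (1.0 - f_true[j]) * c_q[j - 1]

f_all = new_water_fraction(c_p, c_q, np.ones(n, dtype=bool))
f_big = new_water_fraction(c_p, c_q, big)
print(f_all, f_big)
```

In this noise-free toy case the regression restricted to the wettest time steps recovers their new water fraction, while the regression over all time steps returns a value between the two regimes. With real data, each subset would also carry larger sampling uncertainty than the full record.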

Forward transit time distributions

Antecedent wetness has been recognized as a controlling factor in catchment
storm response (e.g., Detty and McGuire, 2010; Merz et al., 2006; Penna
et al., 2011), but its effects on solute transport at the catchment scale
have rarely been quantified, outside of the context of calibrated simulation
models (e.g., van der Velde et al., 2012; Heidbüchel et al., 2012;
Harman, 2015; Rodriguez et al., 2018). To assess whether the antecedent
moisture dependence of solute transport might be measurable directly from
field data, I binned the benchmark model time series into ranges of
antecedent moisture (as measured by the upper-box storage values

To visualize how high antecedent moisture affects transit time
distributions, I isolated the discharge times

One can even question why one would expect a backward TTD to help in understanding
the effects of antecedent moisture at all, given that the backward TTD will
mostly reflect precipitation inputs that came before and, in some cases, created
the antecedent moisture conditions themselves. A forward TTD, on the other hand,
might help in quantifying how antecedent moisture affects the transmission
of future precipitation to streamflow. I therefore isolated the precipitation
times

Effects of antecedent moisture on new water fractions and transit
time distributions

Over 20 years ago, Rodhe et al. (1996) wrote that transit times, despite their importance to modeling discharge, were “impractical to determine experimentally except in rare manipulative experiments where catchment inputs can be adequately controlled.” Despite over two decades of effort, including increasingly elaborate theoretical discussions of transit time distributions, the problem identified by Rodhe et al. remains: how can we measure transit times, and transit time distributions, of real-world catchments under real-world conditions? And how can we verify whether the estimates we get are realistic ones? The theory and benchmark tests presented in Sects. 2–4 aim to provide a partial answer.

Particularly because their names are similar, it is important to recognize
how ensemble hydrograph separation contrasts with conventional hydrograph
separation. Although one could view Eq. (9) as an algebraic rearrangement
of the conventional hydrograph separation equation (Eq. 3), with both sides
multiplied by (

Conventional hydrograph separation estimates the time-varying new water fraction

Conventional hydrograph separation assumes that the end-member tracer signatures are constant, but ensemble hydrograph separation assumes them to be time-varying; indeed, it exploits their variability through time as its main source of information.

Conventional hydrograph separation requires that all end-members that contribute to streamflow must be identified, sampled, and measured. Ensemble hydrograph separation, by contrast, requires tracer measurements only from streamflow and any end-members whose contributions to streamflow are to be estimated. There is no need to assume that all end-members have been identified and measured, just that tracer fluctuations in any unmeasured end-members are not strongly correlated with those in measured end-members and in streamflow.

Conventional hydrograph separation requires that the end-members' tracer
concentrations are distinct from one another; otherwise the solution to Eq. (3) becomes unstable because the denominator is nearly zero. By contrast,
ensemble hydrograph separation estimates the new water fraction by
regression, and points where the new water and old water concentrations
overlap will have almost no leverage on the regression slope (they
correspond to points near zero on the

Conventional hydrograph separation is vulnerable to biases in tracer
measurements, such as could arise from isotopic evaporative fractionation.
By contrast, these same biases should have relatively little effect on
ensemble hydrograph separation (e.g., Sect. 3.6), because it is based on
regressions between tracer fluctuations, and regression slopes are
unaffected by constant offsets on either the

In the convolution approach, the functional form of the transit time
distribution must be assumed (although shape parameters often allow the
shape of the TTD to be fitted, within a given family of distributions). By
contrast, the ensemble hydrograph separation approach makes no assumption
about the shape of the distribution; instead, the TTD values at each lag

Ensemble hydrograph separation quantifies the transit time distribution out
to a maximum lag

Convolution approaches are based on convolution integrals, and thus errors in the input terms accumulate over time. By contrast, the ensemble hydrograph separation approach is based on local derivatives of the stream tracer concentrations and their covariances with fluctuations in the input tracer concentrations at various lags; as a result, errors in the input terms do not accumulate.

Missing input data pose a fundamental problem for convolution integrals, whereas they can be readily accommodated in the ensemble hydrograph separation approach (Sect. 4.2).
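As a rough illustration of how gaps can be accommodated, the following sketch computes a covariance by pairwise deletion, using every case in which both values exist. It assumes that missing values are coded as NaN and that the means are taken over the same paired cases, which is one of several possible conventions. It does not include the separate weighting needed for time steps without precipitation (Appendix B), and the function name is hypothetical.

```python
import numpy as np

def pairwise_cov(x, y):
    """Covariance of x and y over the cases where both are present (not NaN).
    Returns the covariance and the number of pairs used."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~np.isnan(x) & ~np.isnan(y)
    n_pairs = int(ok.sum())
    if n_pairs < 2:
        return np.nan, n_pairs
    dx = x[ok] - x[ok].mean()
    dy = y[ok] - y[ok].mean()
    return float(np.sum(dx * dy) / (n_pairs - 1)), n_pairs

# Two short series with gaps in different places
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, np.nan, 10.0])
cov_xy, n_used = pairwise_cov(x, y)
print(cov_xy, n_used)
```

Each covariance may rest on a different set of cases, which is why a covariance matrix assembled this way is not guaranteed to be positive definite.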

Another approach that is coming into more frequent use is to calibrate a conceptual or physically based model to reproduce, as closely as possible, the observed hydrograph and streamflow tracer time series, and then infer the catchment transit time distribution or SAS function from particle tracking within the model (e.g., Benettin et al., 2013, 2015; Remondi et al., 2018). For these inferences to be valid, the model must not only be a good predictor of the calibration data, but its underlying processes must also be the correct ones. In other words, the model must get the right answers for the right reasons, and it will generally be difficult to verify whether this is the case. Thus it will be difficult to know how much the inferred transit times are determined by the tracer data or by the structural assumptions of the underlying model. Nor does a good fit to the observational data verify the correctness of the model and the inferences drawn from it, because a good fit can imply either that the model is doing everything correctly or that it is doing multiple things wrong, in offsetting ways.

One can argue that every data analysis approach also implies some underlying model, and one might therefore claim that ensemble hydrograph separation is based on the (implausible) assumption that the transit time distribution is time-invariant. Such a claim would be mistaken. As I have shown, ensemble hydrograph separation neither assumes nor requires that the transit time distribution is stationary (see Appendices A and B). Instead, ensemble hydrograph separation quantifies the ensemble average of a catchment's time-varying transit time distribution, even when that distribution is highly dynamic.

Considerable effort has been devoted to benchmark tests of the methods proposed in Sects. 2 and 4. One may naturally ask: why bother? Why not just describe how ensemble hydrograph separation works, and apply it to several field data sets, and see whether it gives reasonable results? One answer is that whether the results seem reasonable only reflects whether they agree with our preconceptions, not whether they (or our preconceptions) are correct. A second answer is that only through properly designed benchmark tests can we assess how accurate the method is, and what factors might affect its accuracy. Yet another answer is that the benchmark model gives the analysis method a precise target to hit, thus better revealing its strengths and weaknesses.

Benchmark tests also have a role to play in the day-to-day application of data analysis methods like those proposed here. Users may wonder: will this approach work with data from my catchment? Given the data I have, how accurately can I estimate the ensemble average transit time distribution? What kinds of tracer data will be needed to distinguish between two different conceptualizations of catchment-scale storage and transport? Carefully designed benchmark tests with synthetic data can be helpful in addressing questions such as these.

It should be emphasized that, in the tests presented here, the benchmark model knows nothing about how the analysis method works; in fact, its nonlinearity and nonstationarity rather badly violate the assumptions underlying the analysis. Conversely, the analysis method knows nothing about the inner workings of the benchmark model. It knows the model inputs and outputs (the water fluxes and tracer concentrations in rainfall and streamflow), but it does not know – and, importantly, it does not need to know – how those outputs were generated. This is important because, for ensemble hydrograph separation to be useful in real-world catchments, its validity must not depend on the particular mechanisms that regulate flow and transport at the catchment scale.

Likewise, its validity must not depend on having unrealistically accurate or complete data. For this reason, the benchmark tests include substantial measurement errors and substantial numbers of missing values (Sect. 3.1).

Thus these benchmark tests are much stricter than many in the literature. For example, some benchmark tests generate the benchmark data set using the same assumptions that underlie the analysis method itself (e.g., Klaus et al., 2015). Such tests usually generate very nice-looking results, but they are guaranteed to succeed because they are performing the same calculations twice (first forwards, then backwards). At the same time, such tests are not realistic, because they would only be relevant to real-world cases where all of the assumptions underlying the analysis method were exactly true. Such cases are unlikely to exist.

One could argue that the benchmark model presented here would be more realistic if it were (for example) a spatially distributed three-dimensional model based on Richards' equation, calibrated to a particular research watershed. However, the benchmark model's purpose is to generate a wide variety of targets for the analysis method to hit, with each target precisely defined, rather than to realistically mimic any particular catchment. All that is essential is that it must generate realistically complex patterns of behavior and exactly compute the true new water fractions and transit time distributions by age tracking. The relatively simple two-box conceptual model that has been used here was chosen because it fulfills both criteria, not because it embodies a particular mechanistic view of flow and transport. Likewise, consistency with the assumptions underlying ensemble hydrograph separation was not one of the criteria, nor should it be.

For the same reason, it should be clear that real-world catchments may not necessarily exhibit similar patterns of behavior to those of the benchmark model, as shown in Figs. 6–9 and 15–18. Thus the analyses presented here do not necessarily mean, for example, that we should expect new water fractions in real-world catchments to be roughly linear functions of discharge (Fig. 6), precipitation (Fig. 7), or antecedent moisture (Fig. 18). These patterns of behavior reflect the properties of the benchmark model and its precipitation forcing. Whether real-world catchments behave similarly or differently is an open question. The benchmark tests demonstrate that these analyses are reliable (which cannot be demonstrated with real-world data because we cannot know independently what the right answer is), but they should not be taken as examples of what the real-world results would necessarily look like.

The analysis methods outlined in Sects. 2 and 4 include explicit procedures for estimating the uncertainties (as quantified by standard errors) in both new water fractions (Eqs. 11, 15, and 20) and transit time distributions (Eqs. 54, 55, 60, 63, and 66). These uncertainties are generally realistic predictors of how much the ensemble hydrograph separation estimates deviate from the true benchmark values determined from age tracking: the scatter in Figs. 2 and 5, for example, is consistent with the estimated standard errors, and the error bars in Figs. 6, 7, 9, and 11–18 (1 standard error in all cases) are usually reasonable estimates of the deviations from the benchmark values (exceptions include the humped transit time distributions in Fig. 12, where the uncertainties are overestimated).

Unsurprisingly, the standard errors scale with the scatter (error variance) in the data and inversely with the square root of the effective number of degrees of freedom. Thus the uncertainties will be larger when the data set is sparse and noisy, and when the new water fraction and/or transit time distribution explains only a small fraction of its variance. It should also be noted that the relative standard error can be large, for example when the TTD is small at long lags.
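The dependence on the effective number of degrees of freedom can be illustrated with a short sketch. It assumes the common first-order approximation in which the effective sample size is the nominal sample size multiplied by (1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation of the residuals. This is a generic approximation used here for illustration, and the function names are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(resid):
    """Lag-1 serial correlation of a residual series."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r * r))

def effective_sample_size(resid):
    """Approximate n_eff = n * (1 - r1) / (1 + r1) for AR(1)-like residuals."""
    r1 = lag1_autocorrelation(resid)
    return len(resid) * (1.0 - r1) / (1.0 + r1)

rng = np.random.default_rng(3)
n = 5000
white = rng.normal(size=n)          # uncorrelated residuals

red = np.empty(n)                   # strongly autocorrelated residuals
red[0] = white[0]
for t in range(1, n):
    red[t] = 0.8 * red[t - 1] + white[t]

n_eff_white = effective_sample_size(white)
n_eff_red = effective_sample_size(red)

# Factor by which a standard error grows if n is replaced by n_eff
inflation = np.sqrt(n / n_eff_red)
print(n_eff_white, n_eff_red, inflation)
```

For uncorrelated residuals the effective sample size stays close to the nominal one, whereas strong serial correlation reduces it severalfold and inflates the standard error accordingly.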

Because ensemble hydrograph separation does not require continuous input data, it can facilitate comparisons among various subsets of a catchment time series, as demonstrated in Sects. 3.5 and 4.7. However, it should be kept in mind that, as one cuts the data set into more (and thus smaller) pieces, the statistical sampling variability among the data points remaining in each piece will become more and more influential, and the inferences drawn on each piece will become correspondingly more uncertain. Thus there will be practical limits to the granularity of the subsampling that can be applied in real-world cases.

One should also keep these considerations in mind when choosing

In some TTDs, the last few lags exhibit unusually large deviations from the
true TTD curves derived from age tracking (e.g., Figs. 12b, 13a, c, 14c,
16b, d, and 17b, d; in several of these cases the last point is below
zero and thus does not appear on the plot). As noted in Sect. 4.5, I suspect
(but cannot prove) that this is an aliasing effect that arises when the
effects of fluxes beyond the longest measured lag are not adequately
accounted for by the reference concentration

Because ensemble hydrograph separation is based on correlations among tracer fluctuations, it is relatively insensitive to systematic biases that produce persistent offsets in the underlying data. For example, isotope ratios in precipitation often vary with altitude, leading to potential biases in precipitation tracer samples (depending on the sampling location). To the extent that these biases are constant, however, they should not alter regression slopes between tracer fluctuations in precipitation and streamflow (e.g., Figs. A1d, 6a, b, and 7a, b), or their multidimensional counterparts that determine the TTD. The same applies to randomly fluctuating precipitation tracer biases, unless they are large compared to the standard deviation of the tracer concentrations themselves – i.e., unless the fluctuating biases account for most of the variability in the precipitation tracer measurements. Likewise, confounding by any unmeasured end-members should be small, unless the unmeasured end-members are correlated with the measured ones, and have a strong influence on stream tracer concentrations.
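The insensitivity to constant offsets can be checked with a small sketch. It assumes a toy mixing model with a constant new water fraction of 0.2 and the regression of streamflow tracer changes on the difference between precipitation and previous-step streamflow concentrations. The offset of 1.5 and all variable names are hypothetical.

```python
import numpy as np

def regression_slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

rng = np.random.default_rng(1)
n = 2000
f_new = 0.2                              # assumed constant new water fraction
c_p = rng.normal(-10.0, 3.0, size=n)     # synthetic precipitation tracer values

# Streamflow as a mixture of same-step precipitation and previous streamflow
c_q = np.empty(n)
c_q[0] = -10.0
for j in range(1, n):
    c_q[j] = f_new * c_p[j] + (1.0 - f_new) * c_q[j - 1]

y = c_q[1:] - c_q[:-1]
x = c_p[1:] - c_q[:-1]
slope_unbiased = regression_slope(x, y)

# Add a constant bias to every precipitation tracer measurement
offset = 1.5
x_biased = (c_p[1:] + offset) - c_q[:-1]
slope_biased = regression_slope(x_biased, y)
print(slope_unbiased, slope_biased)
```

The constant offset shifts the regression intercept but leaves the slope unchanged, which is why a persistent bias in the precipitation samples does not alter the estimated new water fraction in this idealized case.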

The uncertainties calculated here, like all error propagation results,
depend on the assumptions underlying the analysis (in this case, ensemble
hydrograph separation). Under different assumptions, the errors in
estimating the average

The techniques proposed here quantify the timescales over which catchments store and transport water, and quantify how those timescales change with precipitation, discharge, and antecedent moisture. Such descriptive methods are often grouped under the heading of “catchment characterization”. One should keep in mind, however, that a catchment's storage and transport behavior also depends on its external forcing. If its precipitation climatology were wetter (or drier), for example, its timescales of storage and transport would decrease (or increase) accordingly. Thus transport and storage timescales are not characteristics of the catchment alone, but rather of the catchment and its particular precipitation climatology. By mapping out how a catchment's storage and transport behavior changes with hydrologic forcing (e.g., Figs. 6, 7, 15, 17, and 18), however, ensemble hydrograph separation can contribute to a more complete picture of catchment response. Alternatively, these patterns of response to hydrologic forcing can be considered as catchment characteristics in their own right.

Because new water fractions and transit time distributions from ensemble hydrograph separation closely match benchmark model age tracking, one might consider using them as a model for catchment transport processes. This will usually be a bad idea. One must remember that ensemble hydrograph separation quantifies ensemble averages, which will not be good models of catchment processes unless the real-world transit time distribution is approximately time-invariant. That is unlikely to be the case.

This observation raises an important point. Ensemble hydrograph separation
yields inferences that are phenomenological, not mechanistic. It quantifies how catchments behave,
but does not, by itself, explain how they work. It can nonetheless contribute to
mechanistic understanding by precisely quantifying catchment transport
behavior, and thus facilitating more incisive comparisons with models.
Examples of possible comparisons include

Do the model and the real-world catchment have similar new water fractions and forward new water fractions (Figs. 2 and 5)?

Do these new water fractions change similarly as functions of precipitation and discharge (Figs. 6 and 7)?

Do they exhibit similar seasonal patterns (Fig. 9)?

Do the model and the real-world catchment have similar transit time distributions, including forward transit time distributions (Figs. 11–14)?

Do these transit time distributions change similarly as functions of precipitation, discharge, antecedent moisture, and seasonality (Figs. 15–18)?

The analysis methods developed here can potentially be extended in several ways. For example, these methods could potentially be applied to infer transit times in other catchment fluxes, such as groundwater seepage or evapotranspiration. They could also be applied to other systems where transit times could be inferred from the propagation of fluctuating tracer inputs; potential examples include not only lakes, oceans, and aquifers, but also the atmosphere and perhaps even organisms.

The multiple regression analysis presented in Sect. 4 demonstrates that one
can quantify the contributions of multiple end-members using a single
conservative tracer. This is not possible in conventional end-member mixing
analysis, which assumes that the end-members are constant and consequently
requires that the number of end-members cannot exceed the number of tracers
plus one. But because ensemble hydrograph separation is based on
correlations of tracer fluctuations, one tracer can potentially identify
many end-members as long as their fluctuations are not too tightly
correlated. This is potentially useful, because hydrologists typically have
very few truly conservative tracers to work with (arguably only one, in the
case of stable isotopes, because

Last but not least, the approach presented here can also, with some modifications, be applied to rainfall and streamflow rates in order to quantify the time lags in catchments' hydraulic response to precipitation (reflecting the celerity of hydraulic potentials, as distinct from the velocity of water transport). A follow-up paper describing this “ensemble unit hydrograph” analysis is currently in preparation.

The analysis codes and benchmark model used here will be published separately in more
user-friendly form. The Plynlimon rain gauge data were provided by the Centre
for Ecology and Hydrology (UK), and the Smith River and Broad River
precipitation data are reanalysis products from the MOPEX (Model Parameter
Estimation Experiment) project (Duan et al., 2006;

A conventional linear regression equation has the form

In many practical situations, the unknown constant

Thus, in environmental work, regression equations are often used to estimate “constants” that are not known to be constant, or, even more pointedly, “constants” that we know are not constant. Regression equations are nonetheless used, under the assumption that the result will provide a useful estimate of some central tendency of the non-constant “constant”. The basis for this assumption and its range of validity are unclear.

The problem at hand can be stated like this: if the unknown coefficient

The single-underlined terms in Eq. (A5) cancel each other, and the
double-underlined terms are zero because primed quantities will always
average to zero (although products of two or more primed quantities usually
will not). Removing all underlined terms, multiplying by

The double-underlined term in the numerator of Eq. (A6) is zero, because the
inner average is a constant and therefore just rescales

Equation (A7) cannot be evaluated in practice, because the true coefficients

The second term says that the linear regression coefficient

The third term can be viewed as a weighted average of the deviations of the

The fourth term says that

It should be noted that the means, variances, and covariances in Eqs. (A2)–(A7)
are sample statistics calculated over the sample cases

To illustrate the analysis outlined above, I conducted a simple numerical
experiment based on ensemble hydrograph separation. I created a synthetic
data set based on the mixing equation

Values of

In the simulations shown in Fig. A1,

Benchmark test of regression estimates of mean new water
fractions, using data from a simple two-component mixing model. In that
mixing model (Eq. A8), a randomly varying new water fraction
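
A benchmark test of this general kind can be sketched in a few lines. The sketch below is not the code used for Fig. A1, and it does not reproduce Eq. (A8); it assumes a simplified mixing relation in which the stream tracer contrast `y` equals a time-varying new water fraction `f_new` times the new-minus-old tracer contrast `x`, plus noise. The variable names, the uniform distribution of `f_new`, and the noise level are all assumptions. The sketch covers only the case in which `f_new` varies independently of `x`.

```python
import numpy as np

def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return float(np.sum(x_dev * y_dev) / np.sum(x_dev ** 2))

rng = np.random.default_rng(42)
n = 20000
x = rng.normal(size=n)                   # assumed tracer contrast, new minus old water
f_new = rng.uniform(0.0, 0.4, size=n)    # new water fraction, varying independently of x
y = f_new * x + rng.normal(scale=0.02, size=n)  # assumed stream tracer contrast

slope = regression_slope(x, y)
mean_f_new = float(f_new.mean())
print(slope, mean_f_new)
```

Under these assumptions the regression slope should lie close to the average of the varying new water fraction, even though no single value of `f_new` describes the whole data set.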

Assume a
multiple linear regression equation with non-constant unknown coefficients,

For simplicity, and without loss of generality, assume that the

Multiplying the left and right sides of Eq. (B3) by the transpose of

Dividing through by the

One can see that Eq. (B7) is identical in form to Eq. (40), with the
addition of weighting factors on the off-diagonal elements of the covariance
matrix. One consequence of these leading terms is that the weighted
covariance matrix will usually not be completely symmetrical, because (for
example)

It bears emphasis that Eq. (B7) accounts for gaps in precipitation, but not for precipitation or streamflow samples that are missing due to sampling and measurement failures. A gap in precipitation means that the corresponding tracer values never existed at all and had no effect on streamflow, whereas tracer values that are missing due to sampling and measurement failures actually did affect streamflow, but are unknown. Equation (B7) accounts for the fact that the less frequently precipitation falls, the less strongly the tracer covariances will be coupled to one another. Glasser's method, by contrast, estimates the covariances themselves from all available pairs of observations, but says nothing about how they are related to one another. Therefore we can account for both kinds of missing data using Eq. (B7), with the covariances between pairs of variables estimated using Glasser's method (Eqs. 42–43). That approach results in Eq. (44).
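
The idea of estimating each covariance from all available pairs of observations can be sketched as follows. This is a generic pairwise-available covariance and is not claimed to reproduce Eqs. (42)–(43) exactly: taking the means over the jointly available cases and dividing by the number of pairs are assumptions of the sketch, as is the helper name `pairwise_covariance`. Missing values are coded as NaN.

```python
import numpy as np

def pairwise_covariance(data):
    """Covariance matrix in which each element is computed from the rows
    where BOTH variables are present (NaN marks a missing value)."""
    n_vars = data.shape[1]
    cov = np.full((n_vars, n_vars), np.nan)
    for k in range(n_vars):
        for l in range(k, n_vars):
            both = ~np.isnan(data[:, k]) & ~np.isnan(data[:, l])
            if both.sum() < 2:
                continue  # too few pairs; leave this element undefined
            xk = data[both, k]
            xl = data[both, l]
            # Assumption: means are taken over the jointly available cases.
            cov[k, l] = cov[l, k] = np.mean((xk - xk.mean()) * (xl - xl.mean()))
    return cov

# Small illustrative data set with two missing values.
data = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 1.0],
    [3.0, np.nan, 2.0],
    [4.0, 8.0, 4.0],
])
cov = pairwise_covariance(data)
print(cov)
```

Because each element uses a different subset of rows, a matrix assembled this way is not guaranteed to be positive semidefinite, which is one reason to treat it with some care when it is used to solve a regression.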

Astute readers may notice that Eq. (B3) is equivalent to the normal
equations of conventional multiple regression, with the cases of missing
precipitation replaced by

Normalize

Replace any values that are missing due to lack of precipitation with zeroes.

Solve for the

Multiply the standard errors of the
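
The first three steps above can be sketched as follows, under loudly stated assumptions. The sketch takes "normalize" to mean subtracting each variable's mean, codes time steps without precipitation as NaN, and uses a hypothetical helper named `zero_filled_regression`. The final step, rescaling the standard errors, is not implemented, because the scaling factor is not reproduced here. The synthetic data, coefficients, and gap frequency are illustrative only.

```python
import numpy as np

def zero_filled_regression(x_lagged, y):
    """Multiple regression with precipitation gaps replaced by zeroes.

    x_lagged: (n, m) array of lagged input tracer values, NaN where no
              precipitation fell (hypothetical layout).
    y:        (n,) array of the response variable.
    """
    # Step 1 (assumed meaning of "normalize"): subtract the means.
    x_centered = x_lagged - np.nanmean(x_lagged, axis=0)
    y_centered = y - np.nanmean(y)
    # Step 2: replace values missing due to lack of precipitation with zeroes.
    x_filled = np.where(np.isnan(x_centered), 0.0, x_centered)
    # Step 3: solve for the coefficients by conventional least squares.
    keep = ~np.isnan(y_centered)
    coeffs, *_ = np.linalg.lstsq(x_filled[keep], y_centered[keep], rcond=None)
    return coeffs

rng = np.random.default_rng(1)
n, m = 5000, 3
true_coeffs = np.array([0.3, 0.2, 0.1])   # assumed coefficients
x_full = rng.normal(size=(n, m))
no_precip = rng.random((n, m)) < 0.3       # about 30 % of inputs never existed
# Inputs that never existed contribute nothing to the response.
y = np.where(no_precip, 0.0, x_full) @ true_coeffs + rng.normal(scale=0.02, size=n)
x_lagged = np.where(no_precip, np.nan, x_full)

coeffs = zero_filled_regression(x_lagged, y)
print(coeffs)
```

In this synthetic case the zero-filled regression should recover coefficients close to the assumed ones, because the absent inputs genuinely had no effect on the response; samples that existed but were not measured would need different handling.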

There remains one last important detail. In transitioning from Eq. (B2) to (B3), I made the simplifying assumption that all of the coefficients

If we have a variable

This work was motivated by discussions with Chris Soulsby and Doerthe Tetzlaff during long walks in the Scottish countryside. I also thank Jana von Freyberg, Andrea Rücker, Julia Knapp, Wouter Berghuijs, Paolo Benettin, and Greg Quenell for helpful discussions; Riccardo Rigon, Nicolas Rodriguez, and an anonymous reviewer for their comments; and Melissa Heyer for proofreading assistance.

Edited by: Thom Bogaard

Reviewed by: Riccardo Rigon and one anonymous referee