Ensemble hydrograph separation has recently been proposed as a technique for using passive tracers to estimate catchment transit time distributions and new water fractions, introducing a powerful new tool for quantifying catchment behavior. However, the technical details of the necessary calculations may not be straightforward for many users to implement. We have therefore developed scripts that perform these calculations on two widely used platforms (MATLAB and R), to make these methods more accessible to the community. These scripts implement robust estimation techniques by default, making their results highly resistant to outliers. Here we briefly describe how these scripts work and offer advice on their use. We illustrate their potential and limitations using synthetic benchmark data.

What fraction of streamflow is composed of recent precipitation? Conversely, what fraction of precipitation becomes streamflow promptly? What is the age distribution of streamwater? What is the “life expectancy” of precipitation as it enters a catchment? And how do all of these quantities vary with catchment wetness, precipitation intensity, and landscape characteristics? Questions like these are fundamental to understanding the hydrological functioning of landscapes and characterizing catchment behavior. Ensemble hydrograph separation (EHS) has recently been proposed as a new tool for quantifying catchment transit times, using time series of passive tracers like stable water isotopes or chloride. Benchmark tests using synthetic data have shown that this method should yield quantitatively accurate answers to the questions posed above (Kirchner, 2019), and initial applications to real-world data sets (e.g., Knapp et al., 2019) have demonstrated the potential of this technique.

However, it has become clear over the past year that the equations of Kirchner (2019, hereafter denoted K2019) may be difficult for many users to implement in practically workable calculation procedures or computer codes. It has also become clear that robust estimation methods would be a valuable addition to the ensemble hydrograph separation toolkit, given the likelihood of outliers in typical environmental data sets. The present contribution is intended to fill both of these needs, by presenting user-friendly scripts that perform EHS calculations in either MATLAB or R and that implement robust estimation by default.

Here we demonstrate these scripts using synthetic data generated by the benchmark model of K2019, which in turn was adapted from the benchmark model of Kirchner (2016). We use these benchmark data instead of real-world observations, because age-tracking in the model tells us what the correct answers are, so that we can verify how accurately these EHS scripts infer water ages from the synthetic tracer time series. The benchmark model consists of two nonlinear boxes coupled in series, with a fraction of the discharge from the upper box being routed directly to streamflow, and the rest being routed to the lower box, which in turn discharges to streamflow (for further details, see Kirchner, 2016, and K2019). It should be emphasized that the benchmark model and the ensemble hydrograph separation scripts are completely independent of one another. The benchmark model is not based on the assumptions that underlie the ensemble hydrograph separation method. Likewise, the EHS scripts do not know anything about the internal workings of the benchmark model; they only know the input and output water fluxes and their isotope signatures. Thus the analyses presented here are realistic analogues to the real-world problem of trying to infer the internal functioning of catchments from only their inputs and outputs of water and tracers.

Benchmark model daily water fluxes

Figure 1a and b show the simulated daily water fluxes and isotope ratios used in most of the analyses below. The precipitation fluxes are averages over the previous day (to mimic the effects of daily time-integrated precipitation sampling), and the streamflow values are instantaneous values at the end of each day (to mimic the effects of daily grab sampling). We also aggregated these daily values to simulate weekly sampling, using weekly volume-weighted average tracer concentrations in precipitation and weekly spot values in streamflow (representing grab samples taken at the end of each week). Five percent of the simulated tracer time series were randomly deleted to mimic sampling and measurement failures, and a small amount of random noise was added to mimic measurement errors.
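The weekly aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the code used to generate the benchmark data. Three choices are assumptions made for this sketch: the helper name `aggregate_weekly`, the use of `None` for missing samples, and weighting only those days that have both precipitation and a measured concentration.

```python
from typing import List, Optional, Tuple


def aggregate_weekly(
    p_vol: List[float],
    c_p: List[Optional[float]],
    c_q: List[Optional[float]],
    days_per_week: int = 7,
) -> Tuple[List[float], List[Optional[float]], List[Optional[float]]]:
    """Hypothetical helper: aggregate daily series into weekly ones.

    Precipitation: weekly total volume and volume-weighted mean concentration,
    computed from days with both a positive volume and a measured value.
    Streamflow: the spot value on the last day of each week (a grab sample).
    Missing values are None; an incomplete final week is dropped.
    """
    p_week: List[float] = []
    cp_week: List[Optional[float]] = []
    cq_week: List[Optional[float]] = []
    for w in range(len(p_vol) // days_per_week):
        lo, hi = w * days_per_week, (w + 1) * days_per_week
        # (volume, concentration) pairs that can contribute to the weighted mean
        pairs = [(v, c) for v, c in zip(p_vol[lo:hi], c_p[lo:hi])
                 if c is not None and v > 0]
        sampled = sum(v for v, _ in pairs)
        p_week.append(sum(p_vol[lo:hi]))
        cp_week.append(sum(v * c for v, c in pairs) / sampled if sampled > 0 else None)
        cq_week.append(c_q[hi - 1])  # end-of-week grab sample
    return p_week, cp_week, cq_week
```

For instance, a week with precipitation volumes of 1 and 3 at concentrations of -10 and -6 yields a weekly precipitation concentration of -7, which is the volume-weighted mean.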

To illustrate the need for robust estimation techniques, and to demonstrate the effectiveness of the robust estimation methods employed in our scripts, we also randomly corrupted the synthetic isotope data with outliers (Fig. 1c). These outliers are intentionally large; for comparison, the entire range of the outlier-free data shown in Fig. 1b lies between the two dashed lines in Fig. 1c. The outliers are also strongly biased (they all deviate downward from the true values), making them harder to detect and eliminate. We make no claim that the size of these outliers and their frequency in the data set reflect outlier prevalence and magnitude in the real world (which would be difficult to estimate in practice, without replicate sampling or other independent reference data). Instead, these outliers were simply chosen to be large enough, and frequent enough, that they will substantially distort the results of non-robust analyses. They thus provide a useful test for the robust estimation methods described below.

The simplest form of ensemble hydrograph separation seeks to estimate the
fraction of streamflow that is composed of recent precipitation.
Conventional hydrograph separation uses end-member mixing to estimate the
time-varying contributions of “event water” and “pre-event water” to
streamflow. By contrast, ensemble hydrograph separation seeks to estimate
the

Regression relationship (Eq. 1) used to estimate the
event new water fraction

Although ensemble hydrograph separation is rooted in assumptions that are
similar to end-member mixing, mathematically speaking it is based on
correlations between tracer fluctuations rather than on tracer mass
balances. As a result, it does not require that the end-member signatures
are constant over time or that all the end-members are sampled or even
known, and it is relatively unaffected by evaporative isotopic fractionation
or other biases in the underlying data (see Sect. 3.6 of K2019). Even when
new water fractions are highly variable over time, one can show
mathematically (and confirm with benchmark tests) that ensemble hydrograph
separation will accurately estimate their average (see Sect. 2 and Appendix A of K2019). As Fig. 2a shows, higher discharges (indicating wetter
catchment conditions) may be associated with larger new water fractions and
thus stronger coupling between tracer fluctuations in precipitation and
streamflow. Nonetheless, the regression slope in Fig. 2a averages over these
variations, yielding an event new water fraction (0.164

The lagged streamflow tracer concentration

As explained in Sect. 2 of K2019, there are three main types of new water
fractions. First, as noted above, the event new water fraction

In our scripts, new water fractions are calculated by the function
EHS_Fnew. Users supply EHS_Fnew with vectors of evenly spaced data for the water fluxes

The linear regression in Eq. (1), like any least-squares technique, is potentially vulnerable to outliers. Because potential outliers are often present in environmental data, practical applications of ensemble hydrograph separation would benefit from a robust method for estimating new water fractions. Such a method should not only be insensitive to outliers; ideally it should also be statistically efficient (i.e., it should yield reasonable estimates from small samples), and it should be asymptotically unbiased (i.e., it should converge to the conventional regression results when outliers are absent, with a bias near zero for large samples).
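The sketch below makes this vulnerability concrete by computing an ordinary least-squares slope for a regression of the kind in Eq. (1). For illustration, it assumes that Eq. (1) regresses y_j = C_Q,j - C_Q,j-1 on x_j = C_P,j - C_Q,j-1, using the lagged streamflow concentration as the reference. The helper names `eq1_pairs` and `ols_slope` are hypothetical and are not functions from the EHS scripts.

```python
from typing import List, Optional, Tuple


def eq1_pairs(c_p: List[Optional[float]],
              c_q: List[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Build regression pairs, assuming the lagged streamflow value is the reference:
    x_j = C_P,j - C_Q,j-1 and y_j = C_Q,j - C_Q,j-1. Incomplete time steps are skipped."""
    xs: List[float] = []
    ys: List[float] = []
    for j in range(1, len(c_q)):
        if c_p[j] is None or c_q[j] is None or c_q[j - 1] is None:
            continue
        xs.append(c_p[j] - c_q[j - 1])
        ys.append(c_q[j] - c_q[j - 1])
    return xs, ys


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys on xs (regression with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

As a quick synthetic check, let the streamflow concentration move exactly 20 % of the way toward each new precipitation value; the fitted slope is then 0.2. Shifting a single precipitation value far downward creates one high-leverage point, which pulls the slope well below 0.2.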

Figure 2 shows ensemble hydrograph separation plots of the outlier-free
benchmark data (Fig. 2a, estimated from the time series shown in Fig. 1b)
and the outlier-corrupted benchmark data (Fig. 2b, estimated from the time
series shown in Fig. 1c). On these axes – precipitation and streamflow
tracer fluctuations on the

By contrast, outliers substantially distort the ensemble hydrograph
separation plot in Fig. 2b; they extend well beyond the range of the
outlier-free data indicated by the gray rectangle and inflate the estimate
of

Many robust estimation methods will not be effective against outliers like
those shown in Fig. 2b, which create points that have great leverage on the
slope of the fitted line. This leverage can allow the outliers to pull the
line close enough to themselves that they will not be readily detected as
outliers. To address this problem, our robust estimation procedure has two
parts. The first step is to identify extreme values of both precipitation
and streamflow tracer concentrations at the outset and exclude them by
setting them to NA (thus treating them as missing values). This will
effectively prevent outliers from exerting strong leverage on the solution.
Because the exclusion criterion must itself be insensitive to outliers, we
define extreme values as those that lie farther from the median than 6
times

As a second step, we use iteratively reweighted least squares
(IRLS: Holland and Welsch, 1977) to estimate the regression slope
and thus the event new water fraction
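The two steps can be sketched as follows. This sketch rests on several assumptions and is not the code in EHS_Fnew:

- The spread measure in the exclusion step is assumed to be the interquartile range.
- The IRLS step uses Tukey's bisquare weight function, with the conventional tuning constant 4.685 and a scale estimate based on the median absolute residual.
- The regression pairs assume that Eq. (1) regresses C_Q,j - C_Q,j-1 on C_P,j - C_Q,j-1.

All function names are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def exclude_extremes(values: List[Optional[float]], k: float = 6.0) -> List[Optional[float]]:
    """Step 1: set values farther than k * IQR from the median to None (missing).
    Using the interquartile range (IQR) as the spread measure is an assumption."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    limit = k * (q3 - q1)
    return [None if v is None or abs(v - med) > limit else v for v in values]


def eq1_pairs(c_p, c_q) -> Tuple[List[float], List[float]]:
    """Assumed Eq. (1) pairs: x_j = C_P,j - C_Q,j-1 and y_j = C_Q,j - C_Q,j-1."""
    xs, ys = [], []
    for j in range(1, len(c_q)):
        if any(v is None for v in (c_p[j], c_q[j], c_q[j - 1])):
            continue
        xs.append(c_p[j] - c_q[j - 1])
        ys.append(c_q[j] - c_q[j - 1])
    return xs, ys


def weighted_fit(xs, ys, w) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    b = sxy / sxx
    return my - b * mx, b


def irls_fit(xs, ys, c: float = 4.685, max_iter: int = 50, tol: float = 1e-10):
    """Step 2: IRLS with Tukey bisquare weights, starting from ordinary least squares."""
    w = [1.0] * len(xs)
    a, b = weighted_fit(xs, ys, w)
    for _ in range(max_iter):
        resid = [y - a - b * x for x, y in zip(xs, ys)]
        # robust residual scale: median absolute residual, rescaled to a normal sd
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break  # at least half of the points fit exactly; nothing to reweight
        cutoff = c * scale
        w = [(1.0 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0 for r in resid]
        a_new, b_new = weighted_fit(xs, ys, w)
        done = abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, w


def robust_fnew(c_p, c_q) -> float:
    """Two-step sketch: exclude extremes in both series, then fit the slope by IRLS."""
    xs, ys = eq1_pairs(exclude_extremes(c_p), exclude_extremes(c_q))
    return irls_fit(xs, ys)[1]
```

As a test, take synthetic data in which a few precipitation values are shifted 60 units downward. The exclusion step removes those values before they can exert leverage, and IRLS then recovers the slope used to generate the data. Vertical outliers that survive the first step receive zero weight in the second.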

Visual comparisons of the different discharge ranges shown by different
colors in Fig. 2a indicate that in these benchmark data, higher discharges
are associated with stronger coupling between tracer concentrations in
precipitation and streamflow, implying that streamflow contains a larger
fraction of recent precipitation. This observation implies that by
estimating

Although this can be achieved by applying a series of point filter vectors
to isolate each ensemble, here we provide a function,

Profiles illustrating how new water fractions of
discharge change with discharge regime, estimated using robust and
non-robust methods (dark and light blue symbols, respectively; error bars
indicate 1 standard error) applied to synthetic benchmark tracer data
with different percentages of outliers. In profiles generated from
outlier-free data

Profiles illustrating how “forward” new water fractions
(new water fractions of precipitation, i.e., fractions of precipitation
leaving as streamflow during the current sampling interval) change with
precipitation regime, estimated using robust and non-robust methods (dark
and light blue symbols, respectively; error bars indicate 1 standard
error) applied to synthetic benchmark tracer data without outliers

Figures 3 and 4 show example profiles created by EHS_profile
from the benchmark model time series, with and without outliers. The gray
lines in Fig. 3 show how new water fractions (the fractions of streamflow
that entered the catchment as precipitation during the same sampling
interval, as determined by age tracking in the benchmark model) vary as a
function of discharge rates. The gray lines in Fig. 4 show the similar age
tracking results for “forward” new water fractions (the fractions of
precipitation that leave as streamflow during the same sampling interval),
as a function of precipitation rates. These age tracking results are
compared to profiles of the new water fraction

Astute readers will note that the robust estimates of new water fractions almost exactly match the benchmark age tracking data in the profiles shown here, whereas they underestimated the same age tracking data by roughly 25 % in Sect. 2.1 above, where the data were not separated into distinct ranges of discharge or precipitation rates. The difference between these two cases is illuminating. Individual discharge ranges exhibit well-defined relationships between tracer fluctuations in precipitation and streamflow; that is, the individual colored discharge ranges in Fig. 2a show roughly linear scatterplots with well-constrained slopes. Thus, for these individual discharge ranges, the robust estimates agree with the benchmark “true” values (and the non-robust estimates do too, if the underlying data are free of outliers). However, when these different discharge ranges are superimposed, the robust estimation procedure down-weights the high-discharge points because they follow a different trend from the rest of the data, resulting in an underestimate of the new water fraction averaged over all discharges. Thus users should be aware that our robust estimation procedure (like any such procedure) can be confounded by data in which some points exhibit different behavior than the rest and are therefore excluded or down-weighted as potential anomalies.
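A small sketch illustrates the contrast between range-specific and pooled slopes. It bins hypothetical (x, y) regression pairs by discharge rank and fits an ordinary least-squares slope within each bin. The helper `slope_profile` and its equal-count binning are assumptions for illustration and do not describe how EHS_profile works.

```python
from typing import List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs (regression with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_profile(q: Sequence[float], xs: Sequence[float], ys: Sequence[float],
                  n_bins: int = 3) -> List[float]:
    """Hypothetical helper: OLS slope within each equal-count bin of discharge rank.
    The last bin absorbs any remainder when len(q) is not divisible by n_bins."""
    order = sorted(range(len(q)), key=lambda i: q[i])
    size = len(order) // n_bins
    slopes = []
    for b in range(n_bins):
        stop = (b + 1) * size if b < n_bins - 1 else len(order)
        idx = order[b * size:stop]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes
```

Consider synthetic pairs whose slope rises from 0.1 to 0.2 to 0.4 across three discharge bins. Each bin recovers its own slope, whereas a single pooled regression returns one intermediate value.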

One can estimate catchment transit time distributions (TTDs) from tracer time
series by extending Eq. (1) to a multiple regression over a series of lag
intervals
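A minimal sketch of such a multiple regression follows. For illustration only, it assumes a specific form: each lag k uses the predictor C_P,j-k - C_Q,j-m-1 and the response C_Q,j - C_Q,j-m-1, so that m = 0 reduces to a single regression with the lagged streamflow concentration as reference. The sketch simply drops time steps with missing values and solves the normal equations directly. It omits robust weighting, uncertainty estimation, and any conversion of the coefficients into a normalized TTD. Function names are hypothetical.

```python
from typing import List, Optional


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def ttd_regression(c_p: List[Optional[float]], c_q: List[Optional[float]],
                   m: int) -> List[float]:
    """Multiple regression over lags 0..m (assumed form):
    y_j = C_Q,j - C_Q,j-m-1 regressed on x_jk = C_P,j-k - C_Q,j-m-1, plus an intercept.
    Returns one regression coefficient per lag k = 0..m."""
    rows: List[List[float]] = []
    ys: List[float] = []
    for j in range(m + 1, len(c_q)):
        ref = c_q[j - m - 1]
        needed = [c_q[j], ref] + [c_p[j - k] for k in range(m + 1)]
        if any(v is None for v in needed):
            continue  # this sketch simply drops incomplete time steps
        rows.append([1.0] + [c_p[j - k] - ref for k in range(m + 1)])
        ys.append(c_q[j] - ref)
    p = m + 2  # intercept plus one coefficient per lag
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * y for r, y in zip(rows, ys)) for u in range(p)]
    return solve(xtx, xty)[1:]  # drop the intercept
```

If streamflow is generated exactly from lag coefficients 0.3, 0.2 and 0.1 under this assumed form, the sketch returns those coefficients to numerical precision. Solving the normal equations directly is adequate for a handful of lags. For long lag sequences it can become ill-conditioned, and a QR-based solver would be the safer choice.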

The function

The function

In

Transit time distributions of discharge (

This robust estimation procedure yields transit time distributions that are
highly resistant to outliers (Fig. 5). The gray lines in Fig. 5 show the
true transit time distributions of discharge (

The benchmark tests shown in Figs. 2–5 above, like most of those presented
in K2019, are based on a benchmark model simulation that yields “L-shaped”
TTDs, that is, those in which the peak occurs at the shortest lag. In this
section we explore several phenomena associated with the analysis of
distributions that are “humped”, that is, those that peak at an intermediate
lag. Where tracer data are sufficient to constrain the shapes of
catchment-scale TTDs, they suggest that humped distributions are rare
(Godsey et al., 2010). They are also not
expected on theoretical grounds, since precipitation falling close to the
channel should reach it quickly and with little dispersion, leading to TTDs
that peak at very short lags (Kirchner et al., 2001; Kirchner
and Neal, 2013). Nonetheless, humped distributions could potentially arise
in particular catchment geometries (Kirchner et al., 2001) or in
circumstances where tracers are introduced far from the channel but not
close to it. Thus we have re-run the benchmark model with parameters that
generate humped TTDs (

Humped transit time distributions (

Figure 6 shows both forward and backward humped transit time distributions,
as estimated by EHS_TTD from the benchmark model daily tracer
time series, with their standard errors. (Here, as in the other analyses
presented in this note,

Comparison of observed and fitted streamflow tracer time
series (gray dots and dark blue line, respectively, shown relative to their
lagged reference values as in the left hand side of Eq. 2) and fitting
residuals (dark blue dots), for the nonstationary benchmark model with a
humped time-averaged TTD

Thus it appears that the TTD error estimates are generally conservative
(i.e., they overestimate the true error), but with humped distributions the
uncertainties are greatly exaggerated. Numerical experiments (Fig. 7) reveal
that this problem arises from the nonstationarity of the transit times in
the benchmark model (and, one may presume, in real-world catchment data as
well). K2019 (Sect. 4 and Appendix B) showed that ensemble hydrograph
separation correctly estimates the average of the benchmark model's
nonstationary (i.e., time-varying) TTD, as one can also see in Figs. 6 and
8. When this (stationary) average TTD is used to predict streamflow tracer
concentrations (which is necessary to estimate the error variance and thus
the standard errors), however, it generates nearly the correct patterns of
values but not with exactly the right amplitudes or at exactly the right
times (see Fig. 7a). This is the natural consequence of estimating a
nonstationary process with a stationary (i.e., time-invariant) statistical
model. As a result, the residuals are larger, with much stronger serial
correlations, than they would be if the underlying process were stationary
(compare Fig. 7a and b), resulting in much larger calculated standard
errors of the TTD coefficients. These tendencies are even stronger for
humped TTDs, which introduce stronger serial correlations in the multiple
regression fits that are used to estimate the TTD itself. Serial
correlations in the residuals reduce the effective number of degrees of
freedom by a factor of approximately
(

Since the exaggerated standard errors in Fig. 6 arise primarily from the nonstationarity in the benchmark model's transit times, one might intuitively suspect that this problem could be at least partly resolved by dividing the time series into separate subsets (representing, for example, wet conditions with shorter transit times and dry conditions with longer transit times) and then estimating TTDs for each subset separately using the methods described in Sect. 4.4 below. Benchmark tests of this approach were unsuccessful, however. In theory, this approach might work if the “wet” and “dry” states persisted long enough that tracers both entered and left the catchment while it remained either “wet” or “dry”. Under more realistic conditions, however, many different precipitation events and many changes in catchment conditions will be overprinted on each other between the time that tracers enter in precipitation and leave in streamflow, making this approach infeasible.

Transit time distributions (

A somewhat counterintuitive approach that shows more promise is to use
lower-frequency tracer data to estimate humped TTDs. Figure 8 shows that if
the same TTDs as those shown in Fig. 6 are estimated from weekly data rather
than daily data, the standard errors more accurately approximate the
mismatch between the TTD estimates and the true values (i.e., the difference
between the blue dots and the gray curves). Weekly sampling yields much more
reasonable standard errors in this case, because the multiple regression
residuals are much less serially correlated (see Fig. 7c;

Profiles of new water fractions (

Figure 9 shows profiles of new water fractions (

In principle the distortions arising from the correlations in the
precipitation tracer data could potentially be alleviated by calculating
TTDs for individual precipitation and discharge ranges using the methods
outlined in Sect. 4.4 below and then estimating

Instead, benchmark tests suggest that a practical cure for the biases shown
in Fig. 9 may be, counterintuitively, to estimate profiles of

Profiles of new water fractions (

Transit time distributions are typically constructed from the entire
available tracer time series for a catchment, as in Figs. 5, 6, and 8. Such
TTDs can be considered as averages of catchments' nonstationary transport
behavior, as shown in Sect. 4.2 above. However, ensemble hydrograph
separation can also be used to calculate TTDs for filtered subsets of the
full catchment time series, focusing on either discharge or precipitation
time steps that highlight particular conditions of interest. (In Appendix B
we describe the new procedure that

For example, we can map out the nonlinearities that give rise to catchments'
nonstationary behavior, by comparing TTDs from subsets of the original time
series that represent different catchment conditions (Fig. 11). Larger
precipitation events in our benchmark model result in forward transit time
distributions with peaks that are higher, earlier, and narrower (Fig. 11a).
A similar progression in peak height, timing, and width is observed in
forward TTDs (Fig. 11b) obtained from the benchmark tracer time series by
setting the point filter

Non-stationary transit time distributions of
precipitation

The ensemble hydrograph separation TTDs do not perfectly match the age tracking results shown by the dotted gray lines in Fig. 11b and d, particularly for the smallest fractions of the precipitation and discharge distributions, where fewer data points are available. Nonetheless, although the TTDs differ in detail from the age tracking results, they exhibit very similar progressions in peak height and shape, reflecting the nonlinearity in the benchmark model storages, which have shorter effective storage times at higher storage levels and discharges. Although the particular results shown in Fig. 11 are generated by a synthetic benchmark model, they illustrate how similar analyses could be used to infer nonlinear transport processes from real-world catchment data. Comparing TTDs representing different levels of antecedent catchment wetness, for example, could potentially be used to determine how much more precipitation bypasses catchment storage during wet conditions. Similarly, TTDs representing different levels of subsequent precipitation (over the following day or week, for example) could potentially be used to determine how effectively such precipitation mobilizes previously stored water. Thus Fig. 11 illustrates how TTDs from carefully selected subsets of catchment tracer time series can be used as fingerprints of catchment response and as a basis for inferring the mechanisms underlying catchment behavior.
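In practice, selecting such a subset amounts to constructing a point filter: a boolean vector aligned with the time series. As a hypothetical example, the sketch below flags time steps whose antecedent discharge exceeds the median, which is one simple way to separate wetter from drier antecedent conditions. Both helpers are illustrative assumptions and are not part of the scripts.

```python
import statistics
from typing import List, Optional


def antecedent(values: List[Optional[float]], lag: int = 1) -> List[Optional[float]]:
    """Shift a series so that index j holds the value from step j - lag (requires lag >= 1)."""
    return [None] * lag + values[:-lag]


def above_median_filter(values: List[Optional[float]]) -> List[bool]:
    """Hypothetical point filter: True where a non-missing value exceeds the median."""
    med = statistics.median([v for v in values if v is not None])
    return [v is not None and v > med for v in values]
```

For example, a discharge series [1, 5, 2, 8, 3] gives antecedent values [missing, 1, 5, 2, 8] and a mask that selects the third and fifth time steps. Time steps where the mask is False would be left out of the regression for that subset.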

Prospective users of ensemble hydrograph separation may naturally wonder
what sample sizes and sampling frequencies are needed to estimate new water
fractions and transit time distributions. The answers will depend on many
different factors, including the timescales of interest to the user, the
desired precision of the

L-shaped

Beyond these generalizations, it is difficult to offer concrete advice. We
can, however, report our recent experience applying ensemble hydrograph
separation to weekly and 7-hourly isotope time series at Plynlimon, Wales
(Knapp et al., 2019). We were generally able to estimate TTDs out to lags of
about 3 months based on 4 years of weekly sampling. The same 4
years of weekly samples yielded about 100 precipitation–discharge sample
pairs (after samples corresponding to below-threshold precipitation were
removed), which were sufficient to estimate weekly event new water fractions
with an uncertainty of about 1 % (e.g.,

Another obvious question for users is the number of lags over which the TTD
should be estimated. Here, too, there is no fixed rule; the answer will
depend on the timescales of interest, the length of the available tracer
time series, and the shape of the TTD itself (which of course will not be
known in advance). An empirical approach is to compare the results for
several different maximum lags

A further observation from Fig. 12 is that TTD estimates from weekly tracer data may be at least as accurate as, if not more accurate than, those calculated from daily tracer data. This may seem surprising, particularly because the time series underlying Fig. 12 are all 5 years long; thus the daily time series contain 7 times as many individual tracer measurements as the weekly time series. There are, however, several reasons why fewer data points can yield more stable estimates in this case. First, in these numerical experiments the precipitation tracer concentrations are serially correlated (as they often are in the real world); thus there is more redundancy among the daily tracer inputs than among the weekly tracer inputs. Second, the precipitation volumes are less variable (in percentage terms) from week to week than they are from day to day, meaning that the weekly calculations use fewer input concentrations that are accompanied by very small water volumes (and that therefore could not have much influence on the real-world catchment). Third, lower sampling frequencies entail TTDs with coefficients at more widely spaced lags, which are less redundant with one another and can therefore be constrained individually more tightly. Of course, with lower-frequency sampling one loses the short-lag tail of the TTD, which may be of particular interest. But in cases where this information is not crucial (or where only lower-frequency data are available), it appears that TTDs can be reliably estimated from weekly samples, and perhaps from even less frequent samples.
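A standard rule of thumb helps quantify the redundancy argument. For a series whose serial correlation resembles a first-order autoregressive (AR(1)) process with lag-1 correlation φ, the effective number of independent values is roughly n(1 - φ)/(1 + φ). The sketch below applies this approximation. It is a generic statistical heuristic, not a calculation taken from the scripts, and the function names are hypothetical.

```python
from typing import Sequence


def lag1_autocorrelation(r: Sequence[float]) -> float:
    """Sample lag-1 serial correlation of a series."""
    n = len(r)
    mean = sum(r) / n
    d = [v - mean for v in r]
    return sum(d[i] * d[i - 1] for i in range(1, n)) / sum(v * v for v in d)


def effective_n(r: Sequence[float]) -> float:
    """Approximate effective sample size under an AR(1) assumption: n (1 - phi) / (1 + phi)."""
    phi = lag1_autocorrelation(r)
    return len(r) * (1.0 - phi) / (1.0 + phi)
```

Consider a sawtooth series of length 100 (0, 1, ..., 9 repeated). It has a lag-1 correlation of about 0.48, so its effective sample size is roughly 35 rather than 100. Positive serial correlation thus shrinks the information content of a series below what its length suggests.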

In this short contribution, we have presented scripts that implement the ensemble hydrograph separation approach. We have also illustrated some of its quirks and limitations using synthetic data. These issues have been revealed through benchmark tests that are substantially stricter than many in the literature. One should not assume that other methods have fewer quirks and limitations, unless those methods have been tested with equal rigor.

For example, many benchmark data sets are generated using the same assumptions that underlie the analysis methods that they are used to test. Although the results of such tests often look nice, they are unrealistic because those idealized assumptions are unlikely to hold in real-world cases. The TTD methods presented here, for instance, would work very well if they were tested against benchmark data generated from a stationary TTD (see Fig. 7b), but this is hardly surprising, since the regression in Eq. (2) assumes stationarity. Such a test, however, is far removed from the real world, in which tracer data typically come from nonstationary catchment systems. Tests with nonstationary benchmarks yield results that are less (artificially) pleasing but more realistic (e.g., Fig. 7a). These tests also demonstrate an important point, by showing how well the TTD method estimates the average of the time-varying TTDs that are likely to arise in real-world cases (see also Sect. 4 and Appendix B of K2019).

Although these scripts have been tested against several widely differing benchmark data sets (both here and in K2019), we encourage users to test them with their own benchmark data to verify that they are behaving as expected. As the examples presented here show, ensemble hydrograph separation can potentially be applied not only to the high-frequency tracer data sets that are now becoming available, but also to longer-term, lower-frequency tracer data that have been collected through many environmental monitoring programs. We hope that these scripts facilitate new and interesting explorations of the transport behavior of many different catchment systems.

Ensemble hydrograph separation estimates transit time distributions by a
multiple regression of streamflow tracer fluctuations against current and
previous precipitation tracer fluctuations (Eq. 2, which is the counterpart
to Eq. (1) over multiple lag intervals

Equation (2) in the main text has the form of a multiple linear regression
equation,

The conventional least-squares solution to such a multiple regression is
usually expressed in matrix form as

Equation (A4) cannot be applied straightforwardly to real-world catchment
data, because it cannot be solved when values of

Practical experience since the publication of K2019 has revealed at least
three important limitations in the approach outlined above (and detailed in
Sect. 4.2 and Appendix B of K2019). First, although this approach can work
well if values of

Here, rather than removing the missing values and using Glasser's (1964) error
variance formula, we fill in the missing values and calculate the
residuals directly by inverting Eq. (A1), thus facilitating both robust
estimation using IRLS and direct calculation of the error variance for
purposes of uncertainty estimation. The key to this approach is that we
subtract the means from

Broadly speaking, the solution proceeds similarly to Sect. 4.2–4.4 of K2019,
with several important differences. One is that the covariance matrix now
requires different prefactors than the

A sketch of the solution procedure is as follows. First we identify and
remove outliers in the precipitation and streamflow tracer concentrations

If

We then calculate the (weighted) covariances as

To estimate the uncertainties in the regression coefficients

The system of equations that is used to estimate transit time distributions
in ensemble hydrograph separation (Eq. A1) can be represented as a matrix
equation of the form

Benchmark tests verify that the approach outlined in Eq. (B5) yields much
more accurate estimates of

The R and MATLAB scripts and benchmark data sets are available at

JWK wrote the R scripts and JLAK translated them into MATLAB. JWK conducted the benchmark tests, drew the figures, and wrote the first draft of the manuscript. Both authors discussed the results and revised the manuscript.

The authors declare that they have no conflict of interest.

This paper was edited by Josie Geris and reviewed by two anonymous referees.