Kirchner (2016a) demonstrated that aggregation errors due to spatial heterogeneity, represented by two homogeneous subcatchments, could cause severe underestimation of the mean transit times (MTTs) of water travelling through catchments when simple lumped parameter models were applied to interpret seasonal tracer cycle data. Here we examine the effects of such errors on the MTTs and young water fractions estimated using tritium concentrations in two-part hydrological systems. We find that MTTs derived from tritium concentrations in streamflow are just as susceptible to aggregation bias as those from seasonal tracer cycles. Likewise, groundwater wells or springs fed by two or more water sources with different MTTs will also have aggregation bias. However, the transit times over which the biases are manifested are different because the two methods are applicable over different time ranges, up to 5 years for seasonal tracer cycles and up to 200 years for tritium concentrations. Our virtual experiments with two water components show that the aggregation errors are larger when the MTT differences between the components are larger and the amounts of the components are each close to 50 % of the mixture. We also find that young water fractions derived from tritium (based on a young water threshold of 18 years) are almost immune to aggregation errors as were those derived from seasonal tracer cycles with a threshold of about 2 months.

Environmental tracers are commonly used to obtain transit time distributions (TTDs) in groundwater systems (Małoszewski and Zuber, 1982) or catchments (McDonnell et al., 2010). Transit time is the time it takes for rainfall to travel through a system from recharge to emergence in a well, spring, or stream. TTDs provide important information about transport, mixing, and storage of water in systems and therefore on the retention and release of pollutants. In addition, mean transit times (MTTs) determined from these distributions provide practical information for various aspects of water resources management. For example, MTTs have been used to estimate the volume of groundwater storage providing baseflow in catchments (Morgenstern et al., 2010; Gusyev et al., 2016) and to predict lag times and life expectancies of contaminants in the subsurface (Hrachowitz et al., 2016). In New Zealand, the security of drinking water wells is assessed in part by the absence of water with a travel time of less than 1 year, as required by the New Zealand drinking water quality standard (Ministry of Health, 2008). As useful as they are, TTDs cannot be measured directly in the field and have to be inferred from age-dependent tracer concentrations with the use of lumped parameter models (LPMs).

Catchments are inherently heterogeneous on various scales. Point-scale properties vary greatly from place to place, while streams integrate the various catchment outputs. The top-down approach uses catchment outputs, such as streamflow and stream chemistry, to infer or predict catchment TTDs. The hope is that these average out local heterogeneities allowing one simple LPM to provide a good fit and its parameters to be representative of the catchment. But individual areas within catchments can vary greatly because of geology, geography, aspect, etc. Groundwater systems also show heterogeneity. Kirchner (2016a) showed by means of virtual experiments that aggregating subcatchments with different TTDs can lead to severe underestimation of the composite MTT when simple LPMs were applied to interpret seasonal tracer cycles. This is because the smoothing out of the seasonal cycles is a non-linear process which acts more rapidly on the younger water components thereby causing underestimation of the composite MTT. He also found that the young water fraction was a much more robust metric than the MTT against aggregation error. These results raise an important question: are tritium-derived MTTs also susceptible to aggregation error due to spatial heterogeneity? This work aims to answer this question.

Seasonal tracer cycle and tritium-based MTTs are determined by different methods and have given very different results in catchments. The seasonal tracer cycle method depends on damping of input cycles on passing through a system into the output, whereas the tritium method depends on radioactive decay of tritium between input and output (with a half-life of 12.32 years). Effects of mixing within systems need to be accounted for in both cases (Małoszewski and Zuber, 1982). Results from seasonal tracer cycles have given MTTs up to about 5 years, at which point the input cycles in homogeneous systems are completely damped within tracer measurement errors, while results from tritium measurements have shown that large proportions of the flow in many streams have MTTs of 1–2 decades or more (Stewart et al., 2010; Seeger and Weiler, 2014; Michel et al., 2015). Aggregation errors due to the non-linearity of the damping of the seasonal tracer cycles in time (noted above) add to this loss of signal in seasonal tracer cycles, thereby increasing the underestimation of the real MTTs in streams. Similarly, radioactive decay of tritium is a non-linear process and therefore spatial aggregation errors are expected when water components with different MTTs are combined (Bethke and Johnson, 2008).
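The non-linearity of radioactive decay can be illustrated with a minimal sketch. It assumes the simplest possible case, which is chosen here for illustration and is not taken from the paper: two piston-flow waters, a constant tritium input, and hypothetical component ages of 2 and 50 years. The function names are likewise hypothetical. Because decay is exponential in age, the age inferred from the mixed concentration falls below the flow-weighted mean age.

```python
import math

HALF_LIFE = 12.32                     # tritium half-life in years (value from the text)
LAMBDA = math.log(2) / HALF_LIFE      # decay constant, 1/yr


def decayed(c_in, age):
    """Tritium concentration after piston-flow transit lasting `age` years."""
    return c_in * math.exp(-LAMBDA * age)


def apparent_age(c_out, c_in):
    """Piston-flow age inferred from an output concentration."""
    return -math.log(c_out / c_in) / LAMBDA


c_in = 2.0            # assumed constant input concentration, TU
ages = (2.0, 50.0)    # hypothetical component ages, years

# 1:1 mixture of the two components
c_mix = 0.5 * decayed(c_in, ages[0]) + 0.5 * decayed(c_in, ages[1])
true_mean = 0.5 * ages[0] + 0.5 * ages[1]
inferred = apparent_age(c_mix, c_in)

print(f"true mean age {true_mean:.1f} yr, apparent age {inferred:.1f} yr")
```

The young component dominates the tritium content of the mixture, so the inferred age is pulled towards it.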

Calibration of LPMs using environmental radioisotope and stable isotope data has been the subject of study for many years (see Małoszewski and Zuber, 1982, and early work summarised therein). If a catchment outflow is a mixture of two or more components of different water ages, it can be difficult to calibrate an LPM uniquely when we only have data for tracers. For example, for springs in Czatkowice, Poland, a unique answer based on tritium measurements could be found only when the proportions in which the water components (water fluxes) were mixed were known (Grabczak et al., 1984; Małoszewski and Zuber, 1993). In heterogeneous catchments, it is always helpful (i) to measure a variable tracer periodically, and (ii) to combine those data with water fluxes in the inputs and outputs to separate “fast” and “slow” components; see for example studies at Lainbach Valley, Germany (Małoszewski et al., 1983), and Schneealpe, Austria (Małoszewski et al., 2002). The choice of LPM, or equivalently the TTD function, must be based on the hydrogeological situation rather than on artificial mathematical (fitting) considerations. Consideration of hydrological parameters known independently (e.g. mean thickness of the water-bearing layers in the catchment) is required for model validation in order to examine whether the model is likely to be applicable to the real situation. We can have a very well-calibrated model in terms of tracer data being fitted by an LPM, but the MTT can be far from the hydrological reality.

The aim of this paper is to examine the aggregation effects of spatially
heterogeneous catchments and groundwater systems on MTTs and young water
fractions determined using tritium concentrations. We conducted our
investigation by combining two dissimilar water components in virtual
experiments and comparing the true mixed MTTs with the tritium-inferred
apparent MTTs, as Kirchner (2016a) did with seasonal tracer cycles. Our
experiments did not include examination of non-stationary hydrological
systems, for which Kirchner (2016b) had found similar underestimation of
MTTs with seasonal tracer cycles. We also examined aggregation effects for
young water fractions estimated using tritium. Our calculations are based on
the gamma LPM with shape factors (

The varied flow paths of water through the subsurface of catchments imply that outflows contain mixtures of water with different transit times. That is, the water in the stream does not have a discrete transit time, but has a TTD. This distribution is often described by a conceptual flow or mixing model, which reflects the average (steady-state) conditions in the catchment or groundwater system.

Rainfall incident on a catchment is affected by immediate surface/near-surface runoff and longer-term evapotranspiration loss. The remainder
constitutes recharge to the subsurface water stores. Tracer inputs to the
subsurface water stores (i.e. seasonal tracer cycles and tritium
concentrations in the recharge water) are modified during passage through
the hydrological system by mixing of water with different transit times
(represented by the flow model) and radioactive decay in the case of tritium
before appearing in the output. The convolution integral and an appropriate
flow model are used to relate the tracer input and output. The convolution
integral is given by
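For reference, a form of the convolution integral widely used with lumped parameter models for a decaying tracer (e.g. Małoszewski and Zuber, 1982) is sketched here. The symbols are assumptions introduced for this sketch and may differ from the authors' notation: C_in and C_out are the input and output tracer concentrations, t is the time of observation, τ is the transit time, h(τ) is the TTD given by the flow model, and λ is the decay constant (λ = ln 2 / half-life for tritium, zero for stable tracers).

```latex
C_{\mathrm{out}}(t) = \int_{0}^{\infty} C_{\mathrm{in}}(t-\tau)\, h(\tau)\, e^{-\lambda \tau}\, \mathrm{d}\tau
```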

Tritium concentrations in precipitation were different in each hemisphere,
and are proxies for tritium recharge concentrations (

The curves also show smaller variations due to annual peaks in tritium concentrations caused by increased stratospheric leakage during spring in each hemisphere, and possibly small longer-term variations related to sunspot cycles. Tritium concentrations are expected to remain at the present cosmogenic levels for the foreseeable future, and this means that multiple age solutions are becoming less of a problem (Stewart et al., 2012; Stewart and Morgenstern, 2016; Gusyev et al., 2016). However, the minimal variation will mean that tritium will not be effective for identifying flow models in the future.

Tritium concentrations (TU) in monthly precipitation samples at Kaitoke, New Zealand, in the Southern Hemisphere, and Trier, Germany, in the Northern Hemisphere.

Several simple flow models are commonly used in tracer studies. The piston
flow model (PFM) describes systems in which all of the water in the output
has the same transit time (MTT or

The exponential model (EM) is given by

The gamma model (GM) has TTDs based on the gamma distribution:
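A common parameterisation of the gamma distribution is sketched here with assumed symbols that may not match the authors' notation: shape factor α, scale parameter β, transit time τ, and gamma function Γ.

```latex
h(\tau) = \frac{\tau^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\, e^{-\tau/\beta},
\qquad \mathrm{MTT} = \alpha\beta
```

In this parameterisation, α = 1 gives the exponential distribution with mean β.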

The exponential piston flow model (EPM) combines a volume with exponential
transit times followed by a piston flow volume to give a model with two
parameters (Małoszewski and Zuber, 1982). The TTD is given by

The dispersion model (DM) assumes a tracer transport which is controlled by
advection and dispersion processes (Małoszewski and Zuber, 1982), with a
TTD of

This paper makes a particular distinction between

Compound LPMs have generally only been explored for more complicated systems
or when simple LPMs have given poor fits to data (such as seasonal tracer
cycles or tritium concentrations) (e.g. Małoszewski et al., 1983;
Stewart and Thomas, 2008; Blavoux et al., 2013; Morgenstern et al., 2015).
The binary parallel LPM is given by

To estimate the effects of spatial aggregation on mean transit times (MTTs),
we perform virtual experiments by combining two homogeneous subsystems. Each
subsystem or water component is described by a simple LPM (a GM
with assumed parameters

To determine the “apparent” MTT, the tritium concentrations of the water
components from 1940 to the present are calculated from the GMs
applying to each component using the convolution process described above
(Eq. 1). The input function was first assumed to be constant at 2 TU for the
calculations given in Sect. 3.1.1; then the Kaitoke or Trier input
functions (Fig. 1) were used for the calculations in Sect. 3.1.2 and
3.1.3. In all cases, the tritium concentrations of the mixed system
(

The young water fraction (

The transit time distributions of the three cases of the GM
investigated in this work are illustrated in Fig. 2a, as normalised
PDFs (i.e.

The other simple flow models are compared with the GM in Table 1
and Fig. 2b–c. The standard deviation (SD) and Nash–Sutcliffe efficiency
(NSE) are used to quantify the goodness of fit between the GM
(GM

Aggregation errors when the tritium input concentration is assumed to be constant at 2 TU. Mean transit times (MTTs) are inferred from tritium concentrations in mixed runoff from two subcatchments with different tritium concentrations and MTTs (shown by red dots) using a range of GMs and the PFM. The relationships between MTTs and tritium concentrations given by the simple models (black curves) are strongly non-linear causing marked differences between the true and apparent MTTs.

Comparison of the shapes of the gamma (GM), exponential piston flow (EPM), and dispersion (DM) model transit time distributions. The shape parameters of the best-fitting versions of the other models and the goodness of fit (standard deviation, SD; Nash–Sutcliffe efficiency, NSE) between them and the GM are given.

We first demonstrate the relationships between mean transit time and tritium concentration for mixed systems (Fig. 3) by assuming a constant annual input tritium concentration of 2 TU over time, i.e. only natural background concentrations are present, without the bomb pulse of the nuclear age. This simplifying assumption is necessary to allow for the analysis shown in Fig. 3; with the real peaked input the figures would be much more complicated. The assumption of a constant tritium input function is, however, becoming increasingly realistic in the Southern Hemisphere, as the bomb tritium from 50 years ago fades away, provided there are no more large-scale releases of tritium to the atmosphere. This assumption is not limited to tritium but would also be valid for all radioactive tracers with constant input such as carbon-14 and argon-39.
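With a constant input, the convolution of a gamma TTD with exponential decay has a closed form, which makes the aggregation effect easy to reproduce. The following is a minimal sketch, not the authors' code. It assumes a gamma model with shape factor `alpha` and MTT = alpha × beta, and uses the standard result that the output equals the input multiplied by (1 + λβ)^(−α). The function names, the component MTTs of 5 and 100 years, and the shape factors in the loop are hypothetical choices for illustration.

```python
import math

LAMBDA = math.log(2) / 12.32   # tritium decay constant, 1/yr


def gm_output(c_in, mtt, alpha):
    """Steady-state gamma-model output concentration for a constant input."""
    beta = mtt / alpha                       # scale parameter, MTT = alpha * beta
    return c_in * (1.0 + LAMBDA * beta) ** (-alpha)


def gm_apparent_mtt(c_out, c_in, alpha):
    """Invert gm_output: MTT of the single gamma model that reproduces c_out."""
    return alpha * ((c_in / c_out) ** (1.0 / alpha) - 1.0) / LAMBDA


def aggregation_case(mtt1, mtt2, frac1, alpha, c_in=2.0):
    """Return (true MTT, apparent MTT) for a two-component mixture."""
    c_mix = (frac1 * gm_output(c_in, mtt1, alpha)
             + (1.0 - frac1) * gm_output(c_in, mtt2, alpha))
    true_mtt = frac1 * mtt1 + (1.0 - frac1) * mtt2
    return true_mtt, gm_apparent_mtt(c_mix, c_in, alpha)


# Hypothetical components (MTTs of 5 and 100 years) mixed 1:1
for alpha in (0.3, 1.0, 3.0):
    true_mtt, apparent = aggregation_case(5.0, 100.0, 0.5, alpha)
    print(f"alpha={alpha}: true MTT {true_mtt:.1f} yr, apparent MTT {apparent:.1f} yr")
```

Because the output concentration is a convex, decreasing function of MTT, the mixed concentration always lies above the value a single model would give at the true MTT, so the apparent MTT cannot exceed the true one in this constant-input setting.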

Figure 3a shows the relationship for the GM with shape factor

Figure 3b–d show the same calculations applied to the GMs with

Figure 4 shows the effect of changing the fraction of the young component (

The true versus the apparent MTTs calculated using the real tritium input
function from Kaitoke (expressed as annual values) are given in Fig. 5. The
calculations were structured so that the two water components were initially
assumed to have the same MTTs (i.e.

Effect of changing

Aggregation effects for tritium MTTs for GMs with different
values of

Aggregation effects for tritium MTTs using the Trier input function. Symbols as in Fig. 5.

The different values of

Using the Trier (Northern Hemisphere) tritium input function (Fig. 1) results in very similar aggregation biases for tritium MTTs (Fig. 6) compared to those obtained with the Kaitoke input (Fig. 5). Using Northern Hemisphere or Southern Hemisphere tritium input functions makes only slight differences to the curves. Note that the problem of multiple age solutions often experienced using tritium with the Northern Hemispheric input function (e.g. Stewart et al., 2012) does not arise here because we calculate around 75 tritium values (one for each year) and this constrains the final “apparent” fitting to a single unique solution. However, the fitting errors for the apparent MTTs with the Trier input function are much larger than those determined with the Kaitoke input function.

Some of the calculation results are replotted in Fig. 7 to compare results
for the Northern Hemisphere and Southern Hemisphere. This figure shows the possible
aggregation error (expressed as percentage deviation of the apparent from
the true MTTs) versus the MTT of component 2 (MTT2) for the GM with

The effect of combining two different water components on the true and
apparent young water fractions (

Comparison of maximum aggregation effects for Southern Hemisphere (Kaitoke) and
Northern Hemisphere (Trier) for the GM with

True versus apparent tritium young water fractions for GMs
with different values of

Tritium young water fractions using the Trier, Germany, tritium input function. Symbols as in Fig. 8.

Aggregation effects on MTTs determined using seasonal tracer cycles
for the GM with

For stable isotopes, Kirchner (2016a) reported a young water threshold range
from 0.1 to 0.25 years (or approximately 2 months) for the GM
shape factor

Young water fractions evaluated using tritium are of practical interest for various threshold ages – for example 1 year for assessing drinking water security of groundwater wells (water mixtures without any fraction of water of less than 1 year are regarded as secure in terms of potential for pathogen contamination; Close et al., 2000; Ministry of Health, 2008), or 60 years to assess the fraction of water that has already been impacted by high-intensity industrial agriculture starting after WWII (e.g. Morgenstern et al., 2015).
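For a gamma TTD, the young water fraction below a chosen threshold age is the cumulative distribution evaluated at that threshold. The sketch below is an illustration under stated assumptions, not the authors' procedure. It evaluates the regularised lower incomplete gamma function by its series expansion using only the standard library, and applies it to the threshold ages mentioned above (1, 18 and 60 years). The function names and the component MTTs of 5 and 100 years are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_terms=100000):
    """Regularised lower incomplete gamma function P(a, x) by series expansion."""
    if x <= 0.0:
        return 0.0
    total = 0.0
    for n in range(max_terms):
        # n-th term: exp(-x) * x**(a+n) / Gamma(a+n+1), evaluated in log space
        term = math.exp(-x + (a + n) * math.log(x) - math.lgamma(a + n + 1.0))
        total += term
        if n > x and term < tol:   # terms decrease monotonically once n > x
            break
    return min(total, 1.0)


def young_water_fraction(threshold, mtt, alpha):
    """Fraction of water younger than `threshold` for a gamma TTD."""
    beta = mtt / alpha             # scale parameter, MTT = alpha * beta
    return reg_lower_gamma(alpha, threshold / beta)


# Hypothetical components with MTTs of 5 and 100 years, exponential case (alpha = 1)
for threshold in (1.0, 18.0, 60.0):
    f1 = young_water_fraction(threshold, 5.0, 1.0)
    f2 = young_water_fraction(threshold, 100.0, 1.0)
    f_mix = 0.5 * f1 + 0.5 * f2    # true fraction of a 1:1 mixture
    print(f"threshold {threshold:4.0f} yr: components {f1:.3f}, {f2:.3f}; mixture {f_mix:.3f}")
```

The true young water fraction of a mixture is the flow-weighted average of the component fractions, which is the reference value against which an apparent fraction would be compared.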

Aggregation effects for seasonal tracer cycles have been determined by the
methods of Kirchner (2016a) for comparison with the tritium effects. The
rainfall input variation has been approximated as a sine wave with a
1-year period to imitate the seasonal tracer cycle, and the sine wave has
been traced through the convolution using the gamma distribution. Figure 10
shows the aggregation effects for the GM with
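A minimal sketch of this kind of seasonal-cycle calculation is given here, under assumptions that are not taken from the paper. It treats a gamma TTD as a linear filter with complex gain (1 + iωβ)^(−α) on a sine wave of angular frequency ω, mixes the two component outputs as complex numbers so that their phase difference is respected, and infers the apparent MTT from the amplitude ratio alone. The function names and the component MTTs of 0.2 and 3 years are hypothetical.

```python
import math

OMEGA = 2.0 * math.pi   # angular frequency of a 1-year cycle, rad/yr


def gm_transfer(mtt, alpha):
    """Complex gain of a gamma TTD acting on the annual sine wave."""
    beta = mtt / alpha                       # scale parameter, MTT = alpha * beta
    return (1.0 + 1j * OMEGA * beta) ** (-alpha)


def mtt_from_amplitude(ratio, alpha):
    """MTT of the single gamma model that gives this output/input amplitude ratio."""
    beta = math.sqrt(ratio ** (-2.0 / alpha) - 1.0) / OMEGA
    return alpha * beta


def seasonal_aggregation(mtt1, mtt2, frac1, alpha):
    """Return (true MTT, apparent MTT) for a two-component mixture."""
    h_mix = (frac1 * gm_transfer(mtt1, alpha)
             + (1.0 - frac1) * gm_transfer(mtt2, alpha))
    true_mtt = frac1 * mtt1 + (1.0 - frac1) * mtt2
    return true_mtt, mtt_from_amplitude(abs(h_mix), alpha)


true_mtt, apparent = seasonal_aggregation(0.2, 3.0, 0.5, 1.0)
print(f"true MTT {true_mtt:.2f} yr, apparent MTT {apparent:.2f} yr")
```

In this example the young component retains most of its seasonal amplitude while the old component is almost fully damped, so the mixed signal resembles that of a much younger system.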

The analysis of Sect. 3.1 and 3.2 has shown that tritium-derived MTTs are
just as susceptible to aggregation bias as those derived from seasonal
tracer cycles when flows from dissimilar parts of catchments are combined
and interpreted using simple LPMs.
Likewise, groundwater wells or springs fed by two or more water sources with
different MTTs will also show aggregation bias. However, the transit times
over which the biases are manifested are different because the two methods
are applicable to different time ranges, up to 5 years for seasonal tracer
cycles and up to 200 years for tritium concentrations (based on appropriate
mixing models). Note particularly that the bias applies

The calculations have been made for extreme cases to highlight the
aggregation bias. Firstly, the heterogeneity is assumed to be represented by
just two homogeneous but different areas of hydrological systems. This is
the worst type of heterogeneity for aggregation bias. Secondly, the water
components from these areas are assumed to combine in the proportions of 1 : 1
in the outlet. This causes close to the maximum aggregation bias for a given
pair of waters, since it ranges from zero at

Both simple and compound LPMs can be free of aggregation error or conversely
be affected by aggregation error depending on whether or not they capture
the nature of the heterogeneity in the catchment or groundwater system
relevant to the error. Simple LPMs have fewer parameters, but have no
ability to capture heterogeneity because of their underlying perceptual
model (i.e. the assumption of homogeneity), and therefore would be expected
to underestimate MTTs because of aggregation error if there

We therefore suggest that the answer to the question in the title of this section may be what has often been practised in the past, even though the term “aggregation error” was not used (e.g. Małoszewski et al., 1983; Uhlenbrook et al., 2002; Stewart and Thomas, 2008; Morgenstern and Taylor, 2009; Stewart et al., 2010; Blavoux et al., 2013; Morgenstern et al., 2015). This ideally involves evaluation of many types of information about a hydrological system (geological, hydrological, hydrochemical, tritium, and other isotopes) to establish a perceptual model, and experiments with simple and compound LPMs in harmony with the derived perceptual model to fit tritium data (and, if available, other types of chemical or isotopic data). Compound LPMs in harmony with the perceptual model would be expected to yield MTTs with less aggregation error than simple LPMs, because the former have the ability to separate young and old water components while the latter do not. Comparison of MTTs from simple and compound models should then show whether there is much aggregation error. Parameters yielded by best-fitting models have been used in the past, but they may not be the most appropriate ones if the parameters are to be used in other contexts. There is also risk of missing less apparent (alternative) parameter solutions if there are any elsewhere in the parameter space. Gallart et al. (2016) applied a GLUE-based uncertainty assessment method which used Monte Carlo searching of the parameter space of the EPM to estimate MTTs from tritium. This allowed the uncertainties of the parameters to be quantified.

MTT estimations based on tritium concentrations show very similar aggregation effects to those for seasonal tracer variations. Our virtual experiments with two water components show that the aggregation errors are largest when the MTT differences between the two components are largest and the amounts of the two components are roughly equal. We also find that young water fractions derived from tritium based on a young water threshold of 18 years are almost immune to aggregation errors, as were those derived from seasonal tracer cycles with a threshold of about 2 months. We conclude with a discussion of the implications of aggregation bias on tritium MTTs and detection of aggregation errors in past studies.

No data sets were used in this paper. Readers can consult with the authors regarding the methods used in the virtual experiments.

The authors declare that they have no conflict of interest.

We thank the editor and reviewers. The New Zealand Ministry of Business, Innovation and Employment is acknowledged for funding via the SSIF research programme Groundwater Resources of New Zealand.

Edited by: Markus Hrachowitz
Reviewed by: five anonymous referees