Understanding uncertainties when inferring mean transit times of water trough tracer-based lumped-parameter models in Andean tropical montane cloud forest catchments

Weekly samples from surface waters, springs, soil water and rainfall were collected in a 76.9 km 2 ountain rain forest catchment and its tributaries in southern Ecuador. Time series of the stable water isotopes δ18O andδ2H were used to calculate mean transit times (MTTs) and the transit time distribution functions (TTDs) solving the convolution method for seven lumped-parameter models. For each model setup, the generalized likelihood uncertainty estimation (GLUE) methodology was applied to find the best predictions, behavioral solutions and parameter identifiability. For the study basin, TTDs based on model types such as the linear–piston flow for soil waters and the exponential– piston flow for surface waters and springs performed better than more versatile equations such as the gamma and the two parallel linear reservoirs. Notwithstanding both approaches yielded a better goodness of fit for most sites, but with considerable larger uncertainty shown by GLUE. Among the tested models, corresponding results were obtained for soil waters with short MTTs (ranging from 2 to 9 weeks). For waters with longer MTTs differences were found, suggesting that for those cases the MTT should be based at least on an intercomparison of several models. Under dominant baseflow conditions long MTTs for stream water ≥ 2 yr were detected, a phenomenon also observed for shallow springs. Short MTTs for water in the top soil layer indicate a rapid exchange of surface waters with deeper soil horizons. Differences in travel times between soils suggest that there is evidence of a land use effect on flow generation.


Introduction
The mean transit time (MTT) of waters provides a valuable primary description of the hydrological (Fenicia et al., 2010) and biochemical systems (Wolock et al., 1997) of a catchment and its sensitivity to anthropogenic factors (Landon et al., 2000;Turner et al., 2006;Tetzlaff et al., 2007;Darracq et al., 2010).Whereas the MTT describes the average time it takes for any given water parcel to leave the catchment, the transit time distribution function (TTD) describes the retention behavior of all those water parcels as a frequency function over time (McGuire and McDonnell, 2006).Together with the physical characteristics of the catchment, the MTT and TTD (for the particular case of soil water, MTT should be more properly understood as mean residence time, and TTD as residence time distribution function) allow inferring the recharge of aquifers (Rose et al., 1996), the bulk water velocities through its compartments (Rinaldo et al., 2011), and the interpretation of the water chemistry (Maher, 2011), all of which supports the design of prevention, control, remediation and restoration techniques.Additionally, MTT and TTD data are useful to reduce the uncertainty of results and improve input parameter identifiability for either hydrologic modeling studies (Weiler et al., 2003;Vache and McDonnell, 2006;McGuire et al., 2007;Capell et al., 2012) or solute movement analyses through soil and aquifers using mixing models (Iorgulescu et al., 2007;Barthold et al., 2010).

E. Timbe et al.: Understanding uncertainties when inferring mean transit times of water
The stable water isotopes δ 18 O and δ 2 H are commonly used as environmental tracers for a preliminary assessment of the transport of water in watersheds with transit times less than 5 yr (Soulsby et al., 2000(Soulsby et al., , 2009;;Rodgers et al., 2005;Viville et al., 2006).For longer MTTs of up to 200 yr (Stewart et al., 2010), tritium radioisotopes are used to analyze the storage and flow behavior in surface water and shallow groundwater systems (Kendall and McDonnell, 1998), while, for example, carbon isotopes are employed for analyzing the dynamics of deep groundwater with ages of hundreds to thousands of years (Leibundgut et al., 2009).
Since Barnes and Bonell (1996), researchers in tracer hydrology use quasi-distributed and conceptual models to encompass the non-linearity of the processes related to the transit states of the soil moisture dynamics (Botter et al., 2010;Fenicia et al., 2010).However, the use of such modeling approaches is only advisable after basic inferences about the underlying mixing processes and the way water is routed through the system have been drawn.In this sense, insights can be provided by applying lumped TTD functions as the models proposed by Maloszewski andZuber (1982, 1993), which are based on quasi-linearity and steady-state conditions.These models include the exponential (EM), piston (PM), or linear (LM) models, in which the MTT of the tracer is the only unknown variable, and also combinations of models such as the exponential-piston flow (EPM) and the linear-piston flow (LPM) models.Among the two-parameter lumped models, the dispersion model (DM), which considers simplifications of the general advection-dispersion equation, has been applied in environmental tracer studies (Maloszewski et al., 2006;Viville et al., 2006;Kabeya et al., 2006).For almost one-and-a-half decades, other lumped models have been exploited such as the two-parameter gamma model (GM) proposed by Kirchner et al. (2000), which is a more general and flexible version of the exponential model, and the two parallel linear reservoirs model (TPLR), a three-parameter function that combines two parallel reservoirs, each one represented by a single-exponential distribution (Weiler et al., 2003).The use of these models for estimating the MTT in the compartments of a catchment has become a standard practice for the preliminary assessment of the catchment functioning.The advantage of the latter functions relies on the fact that they allow the representation of different mixing processes in different system components, such as soil and groundwater.In contrast, simpler models assume instantaneous and complete mixing over the entire model domain (Hrachowitz et al., 2013).Regarding lumped-parameter models, McGuire and McDonnell (2006) presented in their study a compilation of the most frequently used models for deriving MTTs.Under the condition that a particular model ought to be concordant with the physical characteristics of the aquifer system, this condition hinders the applicability of lumped-parameter models to poorly gauged catchments with scarce or no information on the physical characteristics of the system.For these cases the authors believe that it is better to use an ensemble of models in order to be certain that the results or the inferences point in the same direction, or, if not, to have a better idea of the uncertainties.
Particularly for tropical zones the knowledge of hydrological functioning is still limited, and investigation of system descriptors such as TTD and MTT is key to improving our understanding of catchment responses (Murphy and Bowman, 2012;Brehm et al., 2008).This is especially the case for tropical mountain rainforest systems.In this study we focus on the San Francisco River basin, an Andean mesoscale headwater catchment in Ecuador.Notwithstanding the recent characterization of the climate (Bendix et al., 2006), soils (Wilcke et al., 2002), water chemistry (Buecker et al., 2011) and hydrology (Plesca et al., 2012) of the basin, we are still lacking a perceptual model that explains the observations of chemical, hydrometric and isotopic variables and related processes (Crespo et al., 2012).
To enhance the understanding of the hydrological functioning of the San Francisco basin, this study focuses on the (i) estimation of the MTT in the different compartments of the catchment; (ii) characterization of the dominant TTD functions; and (iii) evaluation of the performance and uncertainty of the models used to derive the MTTs and TTDs.Translated into hypotheses the study reported in this paper aimed to test if 1. the diversity of the sampling sites allows evaluating the spatial variability in catchment hydrology, identifying the dominant processes, and screening the performance of the TTD models; 2. the multi-model approach and the identifiability of their parameters enable identification of the respective TTDs and MTTs.
The hypotheses are based on the following assumptions: 1. the used tracers are conservative, there are no stagnant flows in the system, and the tracer's mean transit time τ represents the MTT of water (e.g., McGuire and McDonnell, 2006); 2. stationary conditions are dominant in the basin, and lumped equations based on linear or quasi-linear behaviors are applicable (Heidbüchel et al., 2012); 3. from insights derived from related studies (Soulsby et al., 2009;McGuire and McDonnell, 2006;Rodgers et al., 2005), considering the drainage areas, the steepness of the topography and the shallow depth of the soil layers, the transit times of the sampling sites are less than 5 yr, making it possible to use δ 2 H and δ 18 O as tracers.2. Framed image shows the zoomed area of the lower part of the catchment.

Study area
The San Francisco tropical mountain cloud forest catchment (Fig. 1, Table 1), 76.9 km 2 in size, is located in the foothills of the Andean cordillera in southern Ecuador, between Loja and Zamora, and drains into the Amazonian River system.Hourly meteorological data recorded at the Estación Científica San Francisco (ECSF, 1957 m a.s.l.),El Tiro (2825 m a.s.l.), Antenas (3150 m a.s.l.) and TS1 (2660 m a.s.l.) climate stations are available from the DFG funded Research Unit FOR816 (www.tropicalmountainforest.org).Monthly averages of the main meteorological parameters for the period 1998-2012 allow a description of their spatial and interannual variation.Mean annual temperature ranges from 15 • C in the lower part of the study area (1957 m a.s.l.) to 10 • C on the ridge (3150 m a.s.l.), with an altitude gradient of −0.57• C per 100 m, without marked monthly variability.The wind velocities of the prevailing southeasterlies reach average maximum daily values of 10 m s −1 between June and September, while wind velocities in the middle and lower catchment areas are fairly constant, equal to 1 m s −1 .The humid regime of the catchment is comparatively constant with the relative humidity varying from 84.5 % in the lower parts to 95.5 % at the ridges.Among all meteorological parameters, precipitation shows the largest spatial variability, with an average gradient of 220 mm per 100 m (Bendix et al., 2008b).However, this gradient is not constant throughout the catchment and shows substantial spatial variability (Breuer et al., 2013).Recent estimation of horizontal rainfall revealed its significance, contributing 5 to 35 % of measured tipping-bucket rainfall, respectively, to the lower and ridge areas of the catchment (Rollenbeck et al., 2011).Rainfall is marked by low rainfall intensities, generally less than 10 mm h −1 , and high spatial variability.Annual rainfall is uni-modally distributed with a peak in the period April-June.Using the Thiessen method and considering horizontal rainfall, the precipitation depth amounted 2321 mm in the period August 2010-July 2011, and 2505 mm in the period August 2011-July 2012.A more detailed description of the weather and climate of the study area is given in Bendix et al. (2008a).In line with findings of Crespo et al. (2012) in the same area, baseflow accounts for 85 % of the total runoff (Table 1), notwithstanding the rapid and marked response of flows to extreme rainfall events.In just a few hours peak discharges are several times higher than baseflows (Fig. 2a), carrying considerable amounts of sediment and accompanied by drastic changes in some of the cross sections.
Major soil types are Histosols associated with Stagnasols, Cambisols and Regosols, while Umbrisols and Leptosols are present to a lesser degree (Liess et al., 2009).The geology is reasonable similar throughout the study area, consisting of sedimentary and metamorphic Paleozoic rocks of the Chiguinda unit with contacts to the Zamora batholith (Beck et al., 2008).The topography is characterized by steep valleys with an average slope of 63 %, situated in the altitudinal range of 1725 to 3150 m a.s.l.(Table 1).Protected by the Podocarpus National Park, the southern part of the catchment is covered by pristine primary forest and sub-páramo.In the northern part, particular during the last two decades, land is being converted to grassland.Presently 68 % of the catchment is covered by forest, 20 % is sub-páramo, 6.5 % is used as pasture and 3 % is degraded grassland covered with shrubs (Goettlicher et al., 2009;Plesca et al., 2012).Landslides are present in the catchment, especially along the paved road between the cities Loja and Zamora.

Catchment composition and discharge measurements
The San Francisco catchment was subdivided into seven subcatchments with areas ranging between 0.7 and 34.9 km 2 , characterized by different land uses varying from pristine forest and sub-páramo to pasture areas (Fig. 1 and Table 1).
In order to define baseflow conditions, each sub-catchment was equipped with a water level sensor (mini-diver, Schlumberger Water Services, Delft, NL).Reference discharge measurement, using the salt dilution method, were made frequently during the time of sampling.However, due to the high variability of the river bed for the sites Pastos (QP), Zurita (QZ) and Ramon (QR), only continuous records for sub-catchments Francisco Head (FH), Navidades (QN), Milagro (QM) and Cruces (QC) and for the main outlet Planta (PL) were considered as reliable to calculate stage-discharge curves and the hydrographs, as shown in Fig. 2a for PL (Abreviations of names for all study sites are defined in Table 2).For the remaining sites, discharge measured at the moment of sampling was used.

Isotope sampling and analyses
Weekly water samples for isotope analysis were collected manually in the main river (Fig. 2b), its tributaries, creeks and springs in the period August 2010 to mid-August 2012 and later for soil water starting in September/November 2010 (Table 2), using 2 mL amber glass bottles.Soil water sampling was performed along two altitudinal transects covered by forest and pasture (Table 2), at 6 sites (Fig. 1) and 3 depths (0.10, 0.25 and 0.40 m) using wick samplers.Wick samplers were designed and installed as described by Mertens et al. (2007).Woven and braided 3/8-inch fiberglass wicks (Amatex Co., Norristown, PA, US) were unraveled over a length of 0.75 m and spread over a 0.30 m × 0.30 m × 0.01 m square plastic plate.The plate enveloped with fiberglass was covered with fine soil particles of the parent material and then set in contact with the undisturbed soil, respectively at the bottom of the organic horizon (0.10 m below surface), a transition horizon (0.25 m below surface) and a lower mineral horizon (0.40 m below surface).The low constant tension in the wick samplers guarantees sampling of the mobile phase of soil water, avoiding isotope fractionation (Landon et al., 1999).
Along with the weekly sampling, event-based rainfall samples for isotope analyses were collected manually in 1 L bottles using a Ø 25 cm funnel at 1900 m a.s.l.(Fig. 1).After every event, the sample bottles were covered with a lid and stored for analysis within a week in 2 mL amber glass bottles.Only sample volumes > 2 mL were suitable for permanent storage and measurements.Events with a sample volume below 2 mL were discarded.The end of a single rainfall event was marked by a time span of 30 min without rainfall, whereby a total of 946 samples were collected with an average duration of 3.2 h (varying from 0.25 to 19 h with up to 11 events per day).Since the solving of the convolution equation needs a continuous time step of input data (Maloszewski and Zuber, 1982), the time resolution of the input series was set to 7 days (Fig. 2c).In this sense, weekly mean isotopic signatures for smaller rainfall events during longer dry periods (only 5 among 104 weeks had no rainfall event > 2 mL sampling volume) were interpolated using antecedent and precedent measurements.
The final isotope signature used for the models represents -for rainfall water, the weighted mean of all events during each week (Sundays to Saturdays) using the rainfall data recorded at the nearby meteorological station (400 m to ECSF), -for soil water samples, the weekly average isotope signal for each soil depth, and -for stream, creek and spring water samples, an instantaneous isotopic concentration in time.These samples were not flux-weighted.For stream waters, only isotope samples from designated baseflow conditions were later considered (see Sect. 2.5).
The stable isotopes signatures of δ 18 O and δ 2 H are reported in per mil relative to the Vienna Standard Mean Ocean Water (VSMOW) (Craig, 1961).The water isotopic analyses a Sampling campaign was completed by mid-August 2012.b There are three wick samplers per site (i.e., A1 = 0.10 m, A2 = 0.25 m and A3 = 0.40 m below surface).c The letter Q for the site codes comes from "Quebrada" which stands for "Stream" in English.
were performed using a compact wavelength-scanned cavity ring down spectroscopy (WS-CRDS)-based isotope analyzer with a precision of 0.1 ‰ for δ 18 O and 0.5 for δ 2 H (Picarro L1102-i, CA, US).

Isotopic gradient of rainfall
Throughout the catchment, the recorded rainfall time series from meteorological stations are correlated (r 2 was at least 0.6, based on weekly precipitation data).As the models in question are only driven by the isotope signal and not by the actual amount of incoming precipitation on site, a flux weighting based on a single station within the catchment (ECSF) was sufficient.However, given the large altitudinal gradient in the San Francisco basin, it is to be expected that the input isotopic signal of rainfall for every sub-catchment varies according to its elevation (Dansgaard, 1964).In this regard, Windhorst et al. (2013) estimated this variation for the main transect of the catchment: −0.22 ‰ δ 18 O, −1.12 ‰ δ 2 H and 0.6 ‰ deuterium excess per 100 m elevation gain.Applying this altitude gradient to the flux-weighted isotope signal under the assumption that the incoming rainfall signal is the sole source of water, thereby excluding any unlikely source of water from outside the topographic catchment boundaries with a different isotope signal, it was possible to derive the recharge elevation and localized input signal in each sub-catchment.The derived recharge elevations were used to crosscheck that they are inside the topographic boundaries of every sub-catchment and comparable to their mean elevations.
The justification to adopt only the mentioned gradient to extrapolate the isotope signals was based on previous studies on spatial and temporal variation of stable isotopes of rainfall in the same catchment, which revealed that only the altitude effect is significant and that in this factor there is no influence of temperature, relative humidity and precipitation amount or intensity (Windhorst et al., 2013).
Since no marked fractionation was observed for all analyzed waters, it is highly probable that similar estimations of MTT are derived either using δ 18 O or δ 2 H (Fig. 3).Therefore, in this study δ 18 O was selected for further analysis.2.

Mean transit time estimation and transit time distribution
Mean transit times were calculated based on stationary conditions.In the case of stream water this condition was fulfilled by considering only baseflow conditions (Heidbüchel et al., 2012), which were dominant in the catchment during the 2 yr observation period (Fig. 2a and b depict this characteristic for the main outlet), accounting for 85 % of total runoff volume.Baseflow separations for streamflow were obtained through parameter fitting to the slope of the recessions in the observed hourly flows using the Water Engineering Time Series PROcessing tool (WETSPRO), developed by Willems (2009).To account for samples taken at baseflow conditions at sites where hydrometric records were not available, the specific discharges of the closer catchments with similar characteristics in terms of land use, size, and observed hydrologic behavior were used.In this sense, QZ, QR and QP were considered similar to QN, QM and QC, respectively (Table 1).In contrast, all spring and creek water samples were included in the analysis since their isotopic signatures were less influenced by particular rain events (as inferred from the smooth shape of the observed isotope signal) in the San Francisco catchment.In regard to soil water, we considered all samples, since each sample represents a volume-weighted weekly average signature (isotopic signatures of particular high-rainfall events are smoothed at a weekly time span).
For the calculation of MTTs, we used the lumpedparameter approach.In this, the aquifer system is treated as an integral unit and the flow pattern is assumed to be constant as outlined in Maloszewski and Zuber (1982) for the special case of constant tracer concentration in time-invariant systems.In this case the transport of a tracer through a catchment is expressed mathematically by the convolution integral.The tracer output C out (t) and input C in (t) are related as a function of time: (1) In the convolution integral, the stream outflow composition C out at a time t (time of exit) consists of a tracer C in that falls uniformly on the catchment in a previous time step t (time of entry), and C in becomes lagged according to its transit time distribution g(t − t ); the factor exp [−λ(t − t )] is used to correct for decay if a radioactive tracer is used (λ = tracer's radioactive decay constant).For stable tracers (λ = 0), and considering that the time span t − t is the tracer's transit time τ , Eq. ( 1) can be simplified and re-expressed as where the weighting function g(τ ) or tracer's TTD describes the normalized distribution function of the tracer injected instantaneously over an entire area (McGuire and McDonnell, 2006).As it is hard to obtain this function by experimental means, the most common way to apply this lumped approach is to adopt a theoretical distribution function that better fits the studied system.In general meaning, any type of a weighting function is understood as a model.In accordance, seven lumped-parameter models to infer the MTTs for diverse water storages (stream, springs, creeks and soil water) were applied in this study.Results were evaluated on the basis of the best matches to a predefined objective function, their magnitude of uncertainty and the number of observations in the range of behavioral solutions.The equations for each of the lumped-parameter models used are shown in Table 3. EM and LM reflect simpler transitions where the tracer's mean transit time τ is the only unknown variable.More flexible models consider a mixture of two different types of distribution.EPM includes piston and exponential flows, while the LPM accounts for piston and linear flows.In both cases the equations are integrated by the parameter η indicating the percentage contribution of each flow type distribution.
The DM, derived from the general equation of advectiondispersion, is also one of the common models used in hydrologic systems (Maloszewski et al., 2006).In this model the fitting parameter D p , known as the dispersion parameter, is related to the transport process of the tracer (Kabeya et al., 2006).In the GM, the product of the shape parameter α and The TPLR model (Weiler et al., 2003) is based on the parallel combination of two single exponential reservoirs (despite its name TPLR follows exponential and not linear assumption), representing fast and slow flows: τ f and τ s , respectively.The flow partition between the two reservoirs is denoted by the parameter φ.

Convolution equation resolution
Due to the similarities between the seasonal isotopic fluctuations of the sampled effluents and rainfall signal, a constant interannual recharge of the aquifers was assumed.For each sampling site, the 2 yr isotopic data series were used as input for the models.To get stable results between two consecutive periods, these input isotope time series were repeated 20 times in a loop: an approach similar to the methodology presented by Munoz-Villers and McDonnell (2012) resulting in an artificial time series of 40 yr.It is common practice to extend the time series artificially by duplicating it (Hrachowitz et al., 2010(Hrachowitz et al., , 2011)).This does not change the results; it rather gives the model more room to find stable results.Data of the last loop were considered for statistical treatment and analysis.The repetition of the input isotopic signal implies that the interannual variation is negligible; an acceptable assumption for the San Francisco catchment considering the high degree of similarity between the same months along the analyzed 2 yr period (Fig. 4).Comparable monthly isotopic seasonality of rainfall has been described by Goller et al. (2005) for the same study area and for nearby regions with similar climatic conditions, e.g., Amaluza GNIP station (http://www.iaea.org/water).Modeled output results are available for the weekly time span chosen for the input function (an average signal of rainfall was distributed for every week on Wednesdays at 12:00 LT).These results were interpolated in order to perform statistical comparisons with instantaneous observed data.For soil waters, direct comparisons were performed between predictions and observed data.

Evaluation of model performance
The search for acceptable model parameters for each site was conducted through statistical comparisons of 10 000 simulations based on the Monte Carlo method, considering a uniform random distribution of the variables involved in each model.For each site and model its performance was calculated using the Nash-Sutcliffe efficiency (NSE).Quantification of errors and deviations from the observed data were respectively calculated by the root mean square error (RMSE) and the bias.MatLab version 7 was used for data handling and solving the convolution equation.
When looking for the optimum parameter range, we first set a wide range (maybe even unrealistic) to be sure to cover all possible solutions (Table 3).By checking the plots of these preliminary results we were able to identify the convergence of model solutions (we used NSE as the objective function for all model parameters), thereby making it possible, for a second simulation, to narrow down the parameter range for each variable.Once the variation ranges were identified and bounded, according to the largest solution peak for every site and for every variable, all the solutions 5 % below the top NSE were selected.For these behavioral efficiencies, weighted quantiles between 0.05 and 0.95 (90 % prediction limits) were calculated in order to refine limits of behavioral solutions for every variable.Using these limits, a final simulation for each site and model was performed (at this stage the 10 000 simulations were allowed to vary only for the corresponding final solution ranges).Results are shown in Tables 4 and 5, as well as in Parts 1 and 2 of the Supplement.
The aforementioned approach is based on the generalized likelihood uncertainty estimation (GLUE; Beven and Freer, 2001).The GLUE approach considers that several likely solutions are valid as long as efficiency of a particular simulation is above a pre-set, but subjective, threshold.In this sense, considering the large number of sites and models used, a lower limit dependent of the top efficiency was set for each case.Only for the analysis and intercomparison of results we considered that a prediction was poor whenever NSE < 0.45.
The following three criteria were used to select the best solutions of MTTs and TTDs from the final model runs: (1) NSE; (2) magnitude of the uncertainty of the prediction, expressed as a percent of the predicted MTT value; and (3) percentage of observations covered by the range of behavioral solutions defined according to the second criterion.Linear model (LM) Linear-piston flow model (LPM) Two parallel linear reservoirs (TPLR) τ = tracer's mean transit time; η = parameter that indicates the percentage of contribution of each flow type; D p = dispersion parameter; α = shape parameter and β = scale parameter; τ f and τ s = transit time of fast and slow flows; φ = flow partition parameter between fast and slow flow reservoirs.Units for parameters and their respective ranges are dimensionless except for τ and β, which have units of time (in this table they are given in weeks).

Soil water
Of all predictions the best matches of the models, with respect to the NSE objective function, ranged between 0.64 and 0.91 (Fig. 5a).When only the best goodness of fit was considered, the GM and the EPM performed best in most of the sampled sites (13 from 18), followed by the DM, LM and LPM (Fig. 5b).Only these models were considered for further mutual comparison.Even when the derived MTT values were similar among the models that best fitted the objective function (Fig. 6a, Table 4 and Part 1 of the Supplement), the LPM performed best taking into consideration additional selection criteria, as shown in Fig. 6b and c. Figure 7 depicts, for the LPM applied to site C2, the uncertainty and the range of behavioral solutions for the two model parameters.
Considering results from the LPM (Table 4), differences between observed and predicted values described by the RMSE are up to 1.72 ‰, and the larger absolute bias accounts for 0.181 ‰.Bearing in mind the ranges of behavioral solution, MTT results were between 2.3 and 6.3 weeks for pasture soils and between 3.7 and 9.2 weeks for forested soils, while parameterizations for η (ratio of the total volume to the volume in which linear flow applies) ranged from 0.84 to 2.23 and from 0.76 to 1.61, respectively.
Regarding the shapes of the distribution functions, Fig. 8 shows the best-matching results for two representative and comparable sampling sites (C2 for pastures and E2 for forest) for each lumped model (results for LM are not included since best-matching results for LPM were achieved with η ≈ 1; see Table 4).These probability (PDFs) and cumulative density functions (CDFs) depict how water is routed through the system.In this sense, pasture sites generally show a faster and higher response of the tracer peak when compared to forest sites (Fig. 8a and c).The CDFs (Fig. 8b and d) of all models are quite similar for the major part of the flows, even including the linear function LPM that averages the shape of the peaks described by the other models.Models based on exponential functions (EPM, DM, or GM in Fig. 8b and d) predict a small portion of the flow with an exponentially delayed tail, which is larger for forested sites than for pastures.Best distribution function results (based on highest NSEs) for all sampled sites, according to the type of land cover, are shown in Fig. 9a and b for the LPM and GM applied to pasture sites, and in Fig. 9c and d for forest sites.Considering the range of possible or behavioral solutions (e.g., shaded area represents range of solutions for C2 site in Fig. 9a and b, and for E2 in   Fig. 9c and d), distributions functions for each type of model and land cover are very similar between each sampled site.

River and tributaries
Considering all sites and models the criterion NSE > 0.45 was exceeded in 41 of the 63 predictions (9 sites per 7 models, Fig. 5a).Among the analyzed sites the TPLR model yielded the best matches for PL, SF, FH, QZ, QN, QM and QC, while the EPM for the QR and QP sites (Fig. 5b).The GM reached closest efficiencies when compared to the best match for every site.Consequently only the TPLR, EPM and GM were further considered.Differences between MTT predictions for all sites are depicted in Fig. 10a, and results from retained models in Table 5 and Part 2 of the Supplement.Although MTT results according to the best NSEs were reached using the TPLR model, compared to the GM or the EPM, these predictions also showed the largest uncertainties (Fig. 10b) and at the same time depicted the lowest number of observations inside the predicted range of behavioral solutions (Fig. 10c).Considering these additional selection criteria, EPM performed better.For stream water at the main outlet, Figs.11-13 show the parameter uncertainties and behavioral solutions for the TPLR, GM and EPM, respectively.
Considering results from the EPM (Table 5, Fig. 10a), the fitting efficiencies reached a maximum NSE of 0.56 for the main stream, and NSEs between 0.48 and 0.58 for the main tributaries (Fig. 5a).The predicted MTT at catchment outlet was 2.0 yr with a η parameter of 1.84 (a similar value was estimated for the main river at the SF sampling site, MTT = 2.0 yr and η = 1.85) and varied from 2.0 (QM, η = 1.85) to 3.9 yr (QC, η = 1.97) for the main tributaries.Uncertainties of MTT predictions between sites were similar with a maximum range between 14.1 and 20.4 % of the predicted MTT, as derived for the FH and QM sites (Table 5).Similarly, η ranged from 1.61 (QZ) to 2.21 (QP); the average value of η = 1.85 implies a 54 % of volume portion of exponential flow and a 46 % volume of piston flow; the uncertainty for the η parameter was 25 % on average.
Figure 14a and b show the shape of the TTD for the main river outlet (PL), corresponding to the highest NSEs for the EPM, GM and TPLR models.The curve for EPM shows a delayed peak that is not accounted in the GM or TPLR models (Fig. 14a), which in turn are very similar between them (at least after a short initial time since GM tends to infinity for times closes to zero).Furthermore, the latter models show a more delayed flow tail when compared to EPM, which shows in general a faster transit time (Fig. 14b).Differences between stream water TTDs from the main sub-catchments considering EPM and GM are shown in Fig. 15a and b.For comparison of the degree of similarities between sites, these plots include the range of behavioral solutions for the main outlet (PL), thereby being clear that apart from QC or QP the remaining sites have similar (EPM or GM) transit time distribution functions.

Springs and creeks
Of 35 predictions (7 models for 5 sites) the criterion NSE > 0.45 was fulfilled in 20 cases.Sites with reduced isotope signal (small σ ) yielded lower efficiencies (Fig. 5a, Table 5 and Part 2 of the Supplement).Apart from Pasto's tributary (TP) and Ramon's spring (QRS), in the remaining sites the criterion NSE > 0.45 was reached at least by 5 models.TP and springs located near to Planta (PLS) and San Francisco (SFS) sampling sites were best described by using a TPLR model (Fig. 5b).In this regard, GM and EPM were the second-and third-best models.Figure 10a shows the MTT results predicted by the three models, while detailed information is given in Table 5 and Part 2 of the Supplement.As for stream waters, the EPM performed best when looking at the uncertainties and the number of observed data inside the range of behavioral solutions (Fig. 10b and c).
Considering EPM, MTTs of 4.5 yr (NSE = 0.49, η = 1.74) for TP and 2.1 yr (NSE = 0.65, η = 1.84) for Q3 were estimated; while for springs, 2.0 yr (NSE = 0.69, η = 1.85) for PLS and 3.3 yr (NSE = 0.47, η = 1.42) for SFS.Results for the QRS site showed poor reliability due to the reduced amplitude of δ 18 O in the observed data (Table 5), the lowest among the observed sites (σ = 0.17).Estimations of MTTs for this site were larger than 5 yr, and therefore beyond the level of applicability of the method for natural isotopic tracers.
Figure 14c and d show the TTD results of EPM, GM and TPLR models, for a representative site with long MTT (creek TP).This site show a distinctive more-delayed time to the peak (for EPM) and longer duration of flow tails compared to stream water (Fig. 14a and b).In Fig. 15c and d, the TTDs for all spring and creek sampled sites are shown for the EPM and GM.In these figures, it is noticeable that the sites Q3 and PLS show the same patterns described previously for most of the stream waters (Fig. 15a and b), while some differences related to more-delayed flow responses can be accounted for SFS, TP or QRS sites (Fig. 15c and d), which are more similar to QP and QC stream waters.

Discussion
For each soil water site, similar MTT results of a few weeks to months were obtained regardless of the lumped-parameter model used (Fig. 6a, Table 4 and Part 1 of the Supplement).Although the LPM did not yield predictions with the highest efficiencies (Fig. 5a), it provided smaller ranges of uncertainty (Fig. 6b) and a larger number of observations inside them (Fig. 6c), advantages that could not be inferred by using only the best matches to NSE, for which GM and EPM performed better than others (Fig. 5b).Using a LPM, suitable to describe a partially confined aquifer with increasing thickness (Maloszewski and Zuber, 1982), we found MTTs varying from 2.3 to 6.3 weeks for pasture sites and from 3.7 to 9.2 weeks for forested soils.If we consider that only the top soil horizon was sampled (maximum sampled depth was 0.4 m), these results are comparable to values between 7.5 and 31 weeks found in 2.0 m soil columns of typical Bavarian soil using the DM (Maloszewski et al., 2006).When analyzing the distribution function for soil waters, similarities between model results are evident (Figs. 8  and 9).Considering the range of possible solutions of each site (shaded areas in Fig. 9a-d), it is noticeable that the major part of the flow's transit can be described similarly by all models, even using the simpler function (LPM).For these sites, when considering exponential models (EPM, GM or DP), a small portion of the flow is depicted as having a delayed tail; however, compared to the magnitude of the total volume, an LPM distribution could still be considered as a reliable method to estimate MTTs.
Considering the LPM results for MTTs of soil water from pastures (4.3 weeks on average) and forest sites (5.9 weeks  on average) as independent data sets, a two-tailed p value of 0.0075 for a Student's t test was calculated, meaning that the difference between the two groups was statistically significant, although physical characteristics, like length, slope, altitude and meteorological conditions of the respective hill slopes were more or less similar.Land use effects, affecting soil hydraulic properties controlling the infiltration and flow of water, were detected in previous studies within the research area (Huwe et al., 2008).Confirming findings in other tropical catchments were published by Zimmermann et al. (2006) and by Roa-García and Weiler (2010), who stated that under grazing the hydraulic conductivity decreased, overland and near-surface flows increased, and the storage capacity of the soil matrix declined, with feedbacks on the MTT of soil water.Similar insights were found by Tetzlaff et al. (2007) comparing two small catchments in the central Scotland Highlands of different land use.
For larger MTTs (≥ 2 yr), as derived for sampled surface waters and shallow springs, there were differences when predicted results among models were compared (Fig. 10a, Table 5 and Part 2 of the Supplement), especially for sites with strong damped signals of measured δ 18 O (e.g., QRS and TP sites).When considering uncertainties, the EPM performed significantly better when compared to the GM or TPLR models (Fig. 10b and c), although the latter two performed best for most of the sampled surface waters according to the NSE objective function (Fig. 5a and b).
When analyzing results from different models, dotty plots of model parameter uncertainty are very useful to display not only the magnitude of uncertainty but also its tendency.Similarly, the uncertainty bands of behavioral solutions can help to account for the sensitivity of the parameter uncertainty on δ 18 O modeled results.For example, when predicted results for the PL site are compared, larger parameter uncertainty and skewness are notorious for TPLR than for EPM or GM (Fig. 11a-c for TPLR; Fig. 12a-c for GM; Fig. 13a and b for EPM).At the same time EPM shows the highest sensitivity in modeled results (Figs. 11d,12d,13c).In order to contrast the signature of the effluent with younger waters such as rainfall, Figs.11e, 12e, or 13d show the damped observed (and predicted) δ 18 O signatures at the main outlet: a characteristic present in all analyzed surface waters.Considering the efficiencies reached by the predictions, we should keep in mind that ranges of behavioral solutions derived from a fixed 5 % of the top NSE are generally smaller than a predefined lower limit for all waters; e.g., a predefined lower efficiency limit of 0.30 and 0.45 were used by Speed et al. (2010) and Capell et al. (2012), respectively.
For stream waters, as for springs and creeks, the main differences between EPM and GM (or TPLR) results consisted first in a delayed response of the tracer signal in the outlet, modeled by a parameter η > 1 (Table 5), while for GM or TPLR the response of the flow occurred instantaneously after the spread of the tracer along the catchment (Figs. 14 and 15, Part 2 of the Supplement); and secondly by a comparatively smaller exponential flow tails, which also means that in general the flow transport is faster considering EPM than GM or TPLR models.For these cases, regardless of the degree of efficiencies or uncertainties, the decision on which TTD is more reliable would depend on the conceptual knowledge of the functioning of the catchment.For the San Francisco catchment this can be gained through additional field experiments in selected sites or sub-catchments using either higher-resolution samples from the effluents in order to analyze non-steady conditions (Botter et al., 2011) or considering different mixing assumptions (Hrachowitz et al., 2013).Another approach could be to analyze longer time series of stable isotopes, or even to include radioactive isotopes as tritium, which would help to crosscheck results, as it has been claimed that, in some cases, the inferences of the processes using solely stables isotopes underestimate the delayed part of the flow (Stewart et al., 2010).
Regardless of the model used, efficiencies of MTT for stream waters were lower than for soil waters.This was somehow expected, since the dampening effect on a catchment to sub-catchment scale generates a smoother signal filtering/averaging the heterogeneity observed at a single point along a precise transect.Since for most of the cases MTTs for soil waters showed an increasing trend according to increasing soil depth, longer MTTs corresponding to deeper soil layers are to be expected.Soil water below 0.4 m was not monitored within this study, given the shallow soil depth and the increasing fraction of rock material with depth, preventing the use of wick samplers.
The similarities and differences between models for sites with MTTs ≥ 2 yr, as for stream and spring waters, gave insights about the importance of accounting for a proper TTD, defined according to the conceptual knowledge of the catchment's functioning, before calculating MTT.In this regard, the use of a multi-model approach and uncertainty analysis is believed essential as to be able to define which functions describe in a better way the parameter identifiability and bounds of behavioral solutions.By considering best matches to NSE for stream waters, best predictions were obtained with the TPLR, EPM and GM -being more flexible versions of a pure exponential distribution function (i.e., EM), which helps to account for non-linearities of the system.The same distribution functions were identified as good predictors of observed data in a related study by Weiler et al. (2003).When comparing the TPLR to EPM or GM, the latter two take the non-linearity of the flow without splitting it in two reservoirs with different exponential behaviors, therefore yielding more identifiable results.However, findings by Weiler et al. (2003) suggest that the TPLR distribution function could achieve better predictions for runoff events generated by mixed fast and slow flows.In related studies using multiple models, the EPM yielded the best predictions for surface and spring waters (Viville et al., 2006).Considering this model, in the San Francisco catchment the average η = 1.85 value for stream waters (similar values were found for creeks and springs: η = 1.79 and η = 1.64, respectively) implies that a significant portion of old water (46 %) is released prior to the new one (54 %).The η value in this study is larger than the η value found in studies for stream water in temperate small headwaters catchments (η = 1.09,Kabeya et al., 2006;η = 1.28, McGuire et al., 2002;η = 1.37, Asano et al., 2002), and close to results published by Katsuyama et al. (2009) for two riparian groundwater systems (η = 1.6 and 1.7).
Regarding the gamma model, it was also identified as an applicable distribution function in headwater montane catchments with dominant baseflow in a temperate climate (Hrachowitz et al., 2009(Hrachowitz et al., , 2010;;Dunn et al., 2010).For our study area, a characteristic shape parameter α < 1 (e.g., Fig. 12b and Part 2 of the Supplement) was found in all stream and spring sites meaning that an initial peak or a significant part of the flow was quickly transported to the river.Similar results were found recently for mountain catchments of comparable size in Scotland by Kirchner et al. (2010), who also stated the importance of accounting for the best distribution shape, which is usually assumed as purely exponential (α = 1).MTTs derived without the use of observed data, using a purely exponential model, frequently led to an overestimation of α and consequently an underestimation of MTTs.The higher flexibility of the GM permits accounting for the non-linearity in the behavior of a catchment system (Hrachowitz et al., 2010).

Conclusions
The research revealed that looking for the best TTD and its derived MTT is not only a matter of accounting for the best fit to a predefined objective function; instead, it is recommended to at least (1) include in the analysis several potential TTD models, (2) assess the uncertainty range of predictions and (3) account for the parameter identifiability.Although the uncertainty ranges increases for MTTs ≥ 2 yr (e.g., compared to short residence times of water for soils) using simpler models that still yield acceptable fits to an objective function can help to reduce the uncertainty associated with the predictions.In this sense, using the best predictions from models like LPM for soil waters and EPM for surface and spring waters yielded a more reliable range of MTT inferences through lowering the uncertainty associated in the predictions of certain models.Sites that showed substantial differences in predictions between models (e.g., QRS or TP) were related to a strong reduction of the isotopic signal yielding larger uncertainties and extended MTT predictions getting close to the limitations of the used method.Considering the high uncertainties for the cases where MTTs predictions were larger than the observed period (≈ 2 yr), it is recommended to interpret these results with care, even to not consider them until longer time series of isotopic data are available.
The diversity of sampling sites and uncertainty analysis, based on the best fits to the objective function NSE and the identifiability of the parameters of the convolution equations of seven conceptual models, allowed to define the ranges of variation of the mean transit times, their uncertainties, and the probable distribution functions for the main hydrological compartments of the San Francisco catchment.Pure exponential distributions (i.e., EM) provided the poorest predictions in all sites, suggesting non-linearities of the processes, as produced by preferential or bypass flow.On the other hand, models such as EPM or GM which have a better performance in terms of considering the non-linearity, in most cases yielded better fits to the observed data and at the same time better identifiability of its variables (τ , η or α).
For baseflow conditions, which are annually dominant in the catchment area, stream water at the main outlet (PL) and five tributaries (FH, QZ, QN, QR, QM) yielded similar MTT estimations, ranging from 1.8 to 2.5 yr, including uncertainty ranges, while the MTT estimation for two tributaries (QP and QC) were between 3.5 and 4.4 yr.Despite the similar contribution areas, two small creeks described contrasting transit times, TP between 4.2 and 5.1 yr, and Q3 between 1.9 and 2.2 yr.Springs showed a longer variation range, from 2.0 yr for PLS to larger than 5 yr for QRS.Considering the predominance of the stream water characteristics of the larger sub-catchments and the higher variability of smaller tributaries (creeks and springs), there is a clear indication that the heterogeneity of the small-scale aquifers is averaged in large areas.In this sense, an in-depth analysis on individual functioning or intercomparison between analyzed sites, which was beyond the scope of this paper, should be performed in selected areas using longer time series.
Two transects based on land cover characteristics showed differences in MTTs.Pastures have shorter ranges (2.3-6.3 weeks) than forested (3.7-9.2 weeks) areas.Considering the characteristics of the sampling sites (Table 1), results suggest a possible regulatory effect of land use on water movement.Although the representativeness of the sampled sites is low in comparison to the total catchment area, findings point out the potential of environmental tracer methods for estimating the effects of changes in vegetation, a task usually difficult to accomplish by conventional hydrometric methods.
(SENESCYT).Last but not least, the authors thank the useful remarks provided by Markus Hrachowitz and from another anonymous reviewer.

Fig. 1 .
Fig. 1.San Francisco catchment with sampling locations and delineation of drainage area.Acronyms in bold are defined in Table2.Framed image shows the zoomed area of the lower part of the catchment.

Fig. 2 .
Fig. 2. (a) Time series of rainfall for ECSF meteorological station, hourly discharge and baseflows at the catchment outlet (PL); (b) weekly δ 18 O and δ 2 H of stream water at PL for baseflow and high flow conditions; and (c) weekly δ 18 O and δ 2 H at the ECSF rainfall sampling collector; light blue bubbles indicate daily δ 18 O and relative volume of daily rainfall.

Fig. 3 .
Fig. 3. Shaded area depicts the expected variation range of the local meteorological water line of rainfall (LMWL) considering the altitudinal range of the catchment (1725-3150 m a.s.l.) and estimated d-excess gradient.Symbols in colors depict weekly values of some of the catchment's waters.Acronyms are defined in Table2.

Fig. 5 .
Fig. 5. (a) Best NSE for each of the seven lumped-parameter models; (b) MTT estimation according the best NSE per site: symbols represent MTT corresponding to the best-matching result among seven models considering the NSE criteria shown in (a), while the vertical line represents uncertainty bounds according the GLUE methodology for the selected model.

Fig. 6 .
Fig. 6.Intercomparison of models for soil sites according to their (a) estimated mean transit times; (b) uncertainty ranges expressed in percentage of its respective MTT estimation; and (c) number of observations inside the range of behavioral solutions.

Fig. 7 .
Fig. 7. Fitted results of the LPM compared to observed data for soil water of a pasture site (C2).Sub-plots (a) and (b) show the uncertainty analysis of 10 000 simulations and the feasible range of behavioral solutions of model parameters as a 5 % of the top best prediction.Black filled circles in (c) represent the observed data; the black line and the shaded area represent the best possible solution and its range of variation according to the 5-95 % confidence limits of the behavioral solutions shown in (a); and the gray dashed line with crosses represents the weekly rainfall variation as an input function for the model.

Fig. 8 .
Fig. 8. Comparative characteristic shapes of residence time distribution functions corresponding to the best NSE using four lumpedparameter models (DM, EPM, GM and LPM): (a) and (b) for the soil site C2 located in a pasture land cover; (c) and (d) for the soil site E2 located in a forest land cover.

Fig. 9 .
Fig. 9. Comparative results between LPM and GM of soil water residence time distribution functions corresponding to the best NSE for every sampling site: (a) pasture sites using LPM; (b) pasture sites using GM; (c) forest sites using LPM; (d) forest sites using GM.Gray shaded area in each plot corresponds to the range of possible shapes of the distribution function for one of the sampling sites: C2 in (a) and (b), and E2 in (c) and (d).

Fig. 10 .
Fig. 10.Intercomparison of models for surface waters and springs according to their (a) estimated mean transit times; (b) uncertainty ranges expressed in percentage of its respective MTT estimation; and (c) number of observations inside the range of behavioral solutions.

Fig. 11 .
Fig. 11.Uncertainty ranges for outlet stream water (PL site) using a TPLR distribution function.(a), (b) and (c) show the modeled parameter uncertainties of 10 000 random simulations and the feasible range of behavioral solutions taking a lower limit of 5 % from the best solution.Black filled circles in (d) and (e) represent the observed data; the black line and shaded area depict the best possible solution and its range of variation according to the 5-95 % confidence limits of the behavioral solutions shown in (b); and the gray dashed line with crosses in (e) represents the weekly rainfall variation as an input function for the model.

Fig. 12 .
Fig. 12. Uncertainty ranges for outlet stream water (PL site) using a GM distribution function.(a), (b) and (c) show the modeled parameters uncertainties of 10 000 simulations and the feasible range of behavioral solutions taking a lower limit of 5 % from the best solution.Black filled circles in (d) and (e) represent the observed data; the black line and the shaded area represent the best possible solution and its range of variation according to the 5-95 % confidence limits of the behavioral solutions shown in (a); and the gray dashed line with crosses in (e) represents the weekly rainfall variation as an input function for the model.

Fig. 13 .
Fig. 13.Uncertainty ranges for outlet stream water (PL site) using an EPM distribution function.(a) and (b) show the modeled parameters uncertainties of 10 000 simulations and the feasible range of behavioral solutions taking a lower limit of 5 % from the best solution.Black filled circles in (c) and (d) represent the observed data; the black line and the shaded area represent the best possible solution and its range of variation according the 5-95 % confidence limits of the behavioral solutions shown in (a); and the gray dashed line with crosses in (d) represents the weekly rainfall variation as an input function for the model.

Fig. 14 .
Fig. 14.Comparative characteristic shapes of the transit time distribution functions corresponding to the best NSE using three lumpedparameter models (EPM, GM and TPLR): (a) and (b) for the stream water sampled at the main outlet PL; (c) and (d) for the small creek TP.

Fig. 15 .
Fig. 15.Comparative results between the EPM and GM of soil water transit time distribution functions corresponding to the best NSE for every sampling site: (a) stream water of main outlet and subcatchments using EPM and (b) using GM; (c) spring waters and creeks using LPM and (d) using GM.Gray shaded area in each plot corresponds to the range of possible shapes of the distribution function for one of the sampling sites: the main outlet (PL) in (a) and (b) and TP creek in (c) and (d).

Table 1 .
Main characteristics of the San Francisco catchment and its tributaries.Acronyms of sampled sites are defined in Table2.

Table 2 .
Applied sampling strategy in the San Francisco catchment.

Table 3 .
Lumped-parameter models used for estimation of mean transit times and transit time distribution functions of water in the San Francisco catchment.

Table 4 .
Main statistical parameters of observed δ 18 O and predicted results for soil waters using a LPM distribution function.Statistical parameters of modeled results: RMSE, bias, mean and σ correspond to the best-matching value of the objective function NSE.Uncertainty bounds of modeled parameters (τ and η), calculated through generalized likelihood uncertainty estimation (GLUE), are shown in parenthesis.

Table 5 .
Main statistical parameters of observed δ 18 O and predicted results for surface and spring waters using an EPM distribution function.Statistical parameters of modeled results: RMSE, bias, mean and σ correspond to the best-matching value of the objective function NSE.Uncertainty bounds of modeled parameters (τ and η), calculated through generalized likelihood uncertainty estimation (GLUE), are shown in parenthesis.