Influence of input and parameter uncertainty on the prediction of catchment-scale groundwater travel time distributions

Groundwater travel time distributions (TTDs) provide a robust description of the subsurface mixing behavior and hydrological response of a subsurface system. Lagrangian particle tracking is often used to derive the groundwater TTDs. The reliability of this approach is subjected to the uncertainty of external forcings, internal hydraulic properties, and the interplay between them. Here, we evaluate the uncertainty of catchment groundwater TTDs in an agricultural catchment using a 3-D groundwater model with an overall focus on revealing the relationship between external forcing, internal hydraulic properties, and TTD predictions. Eight recharge realizations are sampled from a high-resolution dataset of land surface fluxes and states. Calibration-constrained hydraulic conductivity fields (Ks fields) are stochastically generated using the null-space Monte Carlo (NSMC) method for each recharge realization. The random walk particle tracking (RWPT) method is used to track the pathways of particles and compute travel times. Moreover, an analytical model under the random sampling (RS) assumption is fit against the numerical solutions, serving as a reference for the mixing behavior of the model domain. The StorAge Selection (SAS) function is used to interpret the results in terms of quantifying the systematic preference for discharging young/old water. The simulation results reveal the primary effect of recharge on the predicted mean travel time (MTT). The different realizations of calibrationconstrained Ks fields moderately magnify or attenuate the predicted MTTs. The analytical model does not properly replicate the numerical solution, and it underestimates the mean travel time. Simulated SAS functions indicate an overall preference for young water for all realizations. The spatial pattern of recharge controls the shape and breadth of simulated TTDs and SAS functions by changing the spatial distribution of particles’ pathways. In conclusion, overlooking the spatial nonuniformity and uncertainty of input (forcing) will result in biased travel time predictions. We also highlight the worth of reliable observations in reducing predictive uncertainty and the good interpretability of SAS functions in terms of understanding catchment transport processes.


Introduction
Travel/transit time distributions (TTDs) of groundwater provide a description of how aquifers store and release water and pollutants under external forcing conditions, which has significant implications for interdisciplinary environmental studies.
For example, remarkable time-lags of the reaction of streamflow with outer forcings and considerable amounts of "old water" (i.e., water with an age of decades or longer) in streamflow have been observed in many studies (Howden et al., 2010;Stewart 5 et al., 2012). Moreover, the legacy nitrogen in groundwater storage may dominate the annual nitrogen loads in agricultural basins (Wang et al., 2016;Van Meter et al., 2016;Van Meter et al., 2017). Groundwater TTDs offer important insights into the vulnerability of aquifers to pollution spreading, and they are critically important for the environmental assessment of nonpoint-source agricultural contamination (Böhlke and Denver, 1995;Böhlke, 2002;Molnat and Gascuel-Odoux, 2002;Eberts et al., 2012). TTDs shed light on the quantification of the long-term influence of agricultural contamination, which is crucial 10 for water quality and sustainability.
The accurate quantification of groundwater travel time at a regional scale is extremely challenging. A primary difficulty is that the complex geometric, topographic, meteorologic, and hydraulic properties of hydrologic systems control the flow and mixing processes and therefore define the unique shape of the travel time distribution (TTD) (Leray et al., 2016;Hale and McDonnell, 2016;Engdahl et al., 2016). The other difficulty is that the groundwater system is intricately and tightly 15 coupled to land surface hydrologic processes. The fundamental characteristics and the coupled nature determine the response of a catchment to outer forcings such as anthropogenic climate change, artificial abstraction, and agricultural and chemical contamination (Tetzlaff et al., 2014;van der Velde et al., 2015;Heße et al., 2017).
The techniques for determining groundwater TTDs can be categorized into two groups: geochemical approaches and numerical modeling approaches (McCallum et al., 2014). In geochemical approaches, the lumped parameter models are often 20 used to interpret the catchment-scale observation of an environmental tracer concentration. Environmental tracer datasets can be divided into those representing the concentration distribution of young water (e.g., 3 H, SF 6 , 85 K, and CFCs) and those representing the concentration distribution of old water (e.g., 36 Cl, 4 He, 39 Ar, and 14 C). Additionally, the analytical StorAge Selection (SAS) function is a cutting-edge tool for characterizing transport processes in lumped, time-varying hydrologic systems at the hillslope/catchment scale Rinaldo et al., 2011;Van Der Velde et al., 2012;Harman, 2015;25 Danesh-Yazdi et al., 2018). This framework provides a clear distinction between the travel time (the time spent by a water parcel or a solute from its entrance to the control volume until its exit) and the residence time (the age of the water parcel or the solute existing in the control volume at a particular time). The SAS function has been successfully applied to interpret environmental tracer data through some assumptions of the mixing mechanism (Benettin et al., 2015(Benettin et al., , 2017. However, analytical approaches fall short in representing the dispersion of transport processes caused by catchment heterogeneity. Strong 30 heterogeneity leads to significant aggregation error of mean travel times (MTTs) when using analytical models to interpret the tracer data (Kirchner, 2016;Stewart et al., 2016).
In contrast to such an analytical approach, physically based numerical models can explicitly describe the geometry, topography, and geological structures, and they can represent the flow paths of individual water particles. Physically based numerical models are structurally complex and computationally expensive and often have more parameters than lumped parameter models do. These models can be classified as Eulerian approaches or Lagrangian approaches (Leray et al., 2016). The Eulerian approach directly solves the partial differential equations (PDEs) derived from mass conservation with "age mass" as the primary variable (Goode, 1996;Ginn, 2000;Engdahl et al., 2016). The Lagrangian approach, including the smoothed particle hydrodynamics (SPH) approach and the random walk particle tracking (RWPT) approach, is numerically robust and less re-5 strictive on time-step size in solving advection-dominated problems (Tompson and Gelhar, 1990). Consequently, Lagrangian methods are more promising in simulating complex real-world transport processes, as they avoid spurious mixing error in gridfixed Eulerian methods (Benson et al., 2017). Therefore, the Lagrangian approach has been widely used to simulate large-scale reactive transport and biogeochemical problems (Park et al., 2008;de Rooij et al., 2013;Selle et al., 2013).
A reliable application of groundwater transport modeling is subject to many sources of uncertainty, including measurement, 10 model structural, and parameter uncertainty (Beven, 1993). Specifically, the reliability of model prediction suffers from the uncertainty of external forcings, the uncertainty of internal hydraulic characteristics, and the interplay between them (Ajami et al., 2007). The spatially sparse measurements of recharge lead to a biased characterization of spatiotemporal patterns of recharge (Healy and Scanlon, 2010;Cheng et al., 2017). On the other hand, the spatial scarcity of hydrogeological data always hampers the right characterization of aquifer properties such as porosity and permeability, thus allowing a range of various 15 realistic parameter values. The best-fit parameter may suffer from a fitting error caused by overparameterization and equifinality (Schoups et al., 2008). Such biased parameters cause uncertain predictions because parameter error may compensate for model structural defects (Doherty, 2015). Accordingly, predictive uncertainty can be hardly assessed in a precise way.
Biased characterization of the hydrodynamic system and oversimplified assumptions will lead to a problematic prediction of TTDs. Many past studies offer insights into the influence of recharge and hydrogeological configuration on the prediction of 20 TTDs. For example, some research studies have been devoted to the development of analytical solutions for the idealized catchment (or aquifer) under some essential assumptions and simplifications (Neuman;Haitjema, 1995;Engdahl et al., 2016;Leray et al., 2016). Among them, Haitjema (1995) derived an analytical solution in an idealized groundwatershed under steady-state conditions and the Dupuit-Forchheimer assumption and found that the groundwater mean travel time appears to be only dependent on recharge, saturated aquifer thickness, and porosity, provided that the hydraulic conductivity is locally homogeneous. 25 Basu et al. (2012) evaluated analytical, GIS, and numerical approaches for the prediction of groundwater TTDs and found that the simulated TTDs show a moderate difference. Many recent studies have reported the dependency of transient TTDs on the temporal variability of input forcings (Benettin et al., 2015;Yang et al., 2018;Remondi et al., 2018;Kaandorp et al., 2018), but the dependency of TTDs (as well as SAS functions) on the spatial variability of input forcings has rarely been studied.
Although studies on catchment-scale groundwater TTDs are numerous, comprehensive uncertainty analysis that aims to 30 unveil the different roles of external forcing and internal hydrostratigraphic structure using both a numerical model and SAS functions is scarce. In this regard, two important questions are the following: (1) How does the uncertainty of recharge (including its spatial nonuniformity) and hydraulic conductivities affect the TTD predictions in a mesoscale agricultural catchment, provided that the model is constrained to reality and groundwater head observations? (2) How does the uncertainty of inputs (forcings) and parameters influence the prediction of systematic preference for young/old water?

35
In this paper, we aim to answer these questions through a detailed (uncertainty) analysis of an example application in a mesoscale real-world catchment. In doing so, we establish a detailed groundwater model coupled to a random walk particle tracking system for predicting groundwater TTDs. The groundwater model OpenGeoSys (OGS) is used to simulate the groundwater flow, while the input forcing is fed by the mesoscale hydrologic model (mHM) via the coupling interface mHM-OGS (Jing et al., 2018). The numerical model follows the steady-state assumption of groundwater flow systems. This assumption is 5 made because at the regional scale, the groundwater flow process has a much larger time scale than that of the high-frequency oscillation of recharge, which essentially dampens the effect of recharge oscillation (Leray et al., 2016). An ensemble of simulations using multiple recharge realizations and multiple equifinal hydraulic conductivity (K s ) fields is established. An analytical model is used as a reference for unveiling the mixing mechanism of the system. The StorAge Selection function is also used to interpret the simulation results of the numerical model, with an overall aim to quantify the predictive uncertainty 10 of systematic preference for young/old water.

Site Description
The candidate site in this paper is the Nägelstedt catchment, located in central Germany (see Figure 1). With an area of approximately 850 km 2 , the Nägelstedt catchment is a headwater catchment of the Unstrut river. The terrain elevation of this area varies from 164 m to 516 m a.m.s.l. (above mean sea level). It is a subcatchment of the Unstrut basin, one of the 15 most intensively used agricultural regions in Germany. About 88% of the land in this site is marked as arable land, which is significantly higher than the average level of Thuringian (Wechsung et al., 2008). The agricultural nitrogen input has varied over the years and locations, from 5 -24 kg/ha in the soils of the lowlands to 2 -30 kg/ha in the feeding area (Wechsung et al., 2008). The mean annual precipitation is approximately 660 mm.
The dominating sediment in the study area is the Muschelkalk (Middle Triassic). The Muschelkalk has an overall thickness 20 of about 220 m, and it has been divided into three subgroups according to mineral composition: Upper Muschelkalk (mo), Middle Muschelkalk (mm), and Lower Muschelkalk (mu). The Upper Muschelkalk (mo) is mainly composed of limestone, marlstone and claystone, and it forms fractured aquifers (Seidel, 2003;Jochen et al., 2014;Kohlhepp et al., 2017). The Middle Muschelkalk (mm) deposits are composed of evaporites, including dolomit marlstone, gypsum, dolomit limestone and eroded salt layers. The Lower Muschelkalk (mu) is composed of massive limestone (Seidel, 2003;McCann, 2008;Jochen et al., 25 2014). The Muschelkalk formation consists of limestone sediments, which may form fractured and karst aquifers. A recent study demonstrated that karstification and the development of conduits are limited at the base of the Upper Muschelkalk at the Hainich critical zone in the Nägelstedt catchment (Kohlhepp et al., 2017). In the middle of the study area, Keuper deposits, including Middle Keuper (km) and Lower Keuper (ku), overlay the Muschelkalk formation. The Lower Keuper formation forms the low permeable aquitard on top of the Upper Muschelkalk aquifer (Seidel, 2003;Kohlhepp et al., 2017). Lithologically, the Muschelkalk aquifer system is a "layer-cake" aquifer system that contains interbedded marlstone aquitards (Aigner, 1982;Merz, 1987;Kohlhepp et al., 2017).  (Jing et al., 2018). a) An overview of the Nägelstedt catchment and the locations of the monitoring wells used in this study. b) 3-D view highlighting the arrangement of alluvium and soil and cross-sectional view of the study area. c) 3-D view highlighting the zonation of the sedimentary aquifer-aquitard system. Note that the Muschelkalk layers (mo, mm and mu) are divided into more permeable subunits (mo1, mm1 and mu1) and less permeable subunits (mo2, mm2, and mu2).
Eighteen monitoring wells distributed in this area are used to calibrate the model (Figure 1a, in which the well W0 is abandoned in this study due to the proximity to the outer edge). The geological layers to which the wells belong are listed as follows: 5 wells in Middle Keuper (km), 4 in Lower Keuper (ku), 6 in Upper Muschelkalk (mo), 2 in Middle Muschelkalk (mm), and 1 in alluvium.

Numerical model
We use the coupled model mHM-OGS, proposed by Jing et al. (2018), to simulate terrestrial hydrological processes. This coupled model was developed for extending the predictive capability of mHM from land surface processes to the subsurface flow and transport processes. Specifically, mHM (Samaniego et al., 2010;Kumar et al., 2013) is used to partition water budget 5 components, while the porous media simulator OGS (Kolditz et al., 2012) is used to compute groundwater flow and transport processes by using mHM-generated recharge as driving forces. For details on the coupled model mHM-OGS, please refer to Jing et al. (2018).
The catchment water storage is conceptually partitioned into soil zone storage and deep groundwater storage; the two corresponding components are computed by mHM and OGS, respectively. The soil zone dynamics of TTD has been well studied 10 using mHM in a previous work (Heße et al., 2017). Hence, in this paper, we perform explicit forward modeling of the saturatedzone TTD through a 3-D OGS groundwater model by using the mHM-generated recharge as the external forcing.
In this study, we focus on the travel times in the saturated zone. Saturated groundwater flow is characterized by the continuity equation and Darcy's law: where S is the specific storage coefficient in confined aquifers, or the specific yield in unconfined aquifers We use the RWPT method to track the particle movement. The RWPT method is embedded in the source codes of OGS 20 (Kolditz et al., 2012;Park et al., 2008). Derived from stochastic physics, RWPT is under the assumption that the advection process is deterministic and the diffusion-dispersion process is stochastic. The theoretical background of the RWPT method is described in detail in Appendix A.
3.2 Numerical model setup

Boundary conditions 25
The steady-state model configuration is achieved using a temporally averaged recharge of the simulated daily recharges over a long period . The gridded recharges estimated by mHM are interpolated and then assigned to each grid node on the upper surface of the OGS mesh using a bilinear interpolation approach. No-flow boundaries are assumed at the outer edges that are defined by catchment divides, except for the northwestern and northeastern edges, where fixed-head boundaries are applied (Wechsung et al., 2008). The streams are assigned with fixed-head boundaries, wherein the heads are equal to the 30 long-term averaged water levels. For the steady-state system and the one-way coupled model, baseflow component generated Upper limit 9.0 × 10 −4 2.0 × 10 −3 9.0 × 10 −4 8.5 × 10 −5 8.0 × 10 −4 9.1 × 10 −4 2.0 × 10 −5 Lower limit 5.0 × 10 −5 4.5 × 10 −6 1.0 × 10 −6 9.6 × 10 −7 9.0 × 10 −7 3.1 × 10 −7 2.0 × 10 −8 Figure 2. Recharge realizations used in this study (unit: mm). They were sampled from a high-resolution dataset of land surface fluxes for Germany .
by OGS proves to be consistent with the baseflow estimated by mHM, implying that the water budget in the subsurface system is essentially closed (Jing et al., 2018). Neumann boundaries are prescribed for 7 drinking water production wells. However, the amount of pumping makes up only around 3% of the total amount of outflow, and it therefore has a marginal influence on the water budget (Wechsung et al., 2008).

5
The numerical experiment to explore the uncertainty of TTDs is performed through the following workflow: 1. Eight spatially distributed recharge realizations are sampled from a high-resolution dataset of land surface fluxes for Germany, in which mHM is used to simulate land surface hydrological processes. The details of the dataset and the sampling method are described in the following section.
2. For each recharge realization, a series of equally probable realizations of K s fields are generated using the null-space The NSMC method takes advantage of the hybrid Tikhonov-TSVD method in the parameter estimation code PEST to produce Monte Carlo realizations of parameters (Tonkin and Doherty, 2009;Doherty and Hunt, 2010). This approach is able to efficiently generate an ensemble of parameter fields that are conditioned to expert knowledge and measurements.
Here, the observations of groundwater levels from 18 spatially distributed monitoring wells are used to calibrate the OGS model (the locations of monitoring wells are illustrated in Figure 1a). Before generating parameter sets, we calibrate 5 the model to obtain the best-fit hydraulic conductivities, as well as a covariance matrix of the parameter probability distributions. On the basis of this information, many fully distributed K s fields are randomly generated from a uniform distribution of hydraulic conductivity values (Doherty, 2015). As shown in Table 1, the range of hydraulic conductivities is predefined based on values obtained from a geological survey (Wechsung et al., 2008). As a result, a total of 400 parameter sets conditioned on both observations and reality are generated for the uncertainty analysis. 4. An ensemble of forward simulations using the RWPT method is performed over all realizations of K s fields.
In each realization of the ensemble parameter sets, forward simulations of particle tracking are performed. In this study, we focus on the predictive uncertainty within the convection process. Therefore, the molecular diffusion coefficients are 20 universally set to 0 for all ensemble simulations. The porosity of the study domain is set to 0.2 universally. Through the above procedures, the flow paths and the corresponding residence times can be fully traced in the model at random times and locations, facilitating the detailed characterization of TTDs.
In parallel to this analysis, a sensitivity analysis for the spatial variability of recharge is also performed. Two different recharge scenarios are compared for this purpose: (1) the spatially distributed recharge generated by mHM and (2) the uniform 25 recharge that is equal to the spatial average of the distributed recharge. Other parameters, including the porosity and the hydraulic conductivity, remain identical in these two recharge scenarios.

Recharge realizations
A high-resolution dataset of land surface fluxes and states across Germany is used for sampling recharge scenarios. This dataset was established on the basis of a daily simulation with 4 km spatial resolution using mHM for a time span of 60 years 30 (1951-2010) . This dataset consists of an ensemble (100 realizations) of land surface variables, including evapotranspiration, groundwater recharge, soil moisture and discharge, with a spatial resolution of 4 km. A total of 100 re-  The uppermost 10 m of the mesh has been separated as a soil layer, while an alluvium layer consisting of high-permeability sandy gravel is set at the nodes beneath and near streams (Figure 1). Each of the Muschelkalk layers is further divided into two categories: the more permeable parts (mo1, mm1, and mu1) and the less permeable parts (mo2, mm2, and mu2) (see Figure   20 1). For each of the Muschelkalk units, the permeability of the less permeable part is tied to the corresponding more permeable part with a factor of 0.1. The equivalent porous medium approach is applied to characterize the karst aquifer of the Upper Muschelkalk (mo). This approach translates the parameters describing highly heterogeneous hydraulic properties at the point are tied with the respective more permeable subunits (mo1, mm1, and mu1) with a factor of 0.1.
scale to the equivalent homogeneous medium at the regional scale to avoid adding redundant parameters and therefore avoid overfitting.

Parameter uncertainty
Multiple calibration-constrained K s fields were stochastically generated for each recharge realization. Figure 4 shows the box-plot of generated hydraulic conductivities in all realizations categorized by geological unit. The hydraulic conductivity 5 of the Lower Muschelkalk (mu) has the highest uncertainty (10 −8 -10 −5 m/s) because the conductivity of mu is insensitive to groundwater head observations, given that it is the deepest geological layer and that no monitoring well is located in this layer (Table B1). The other parameters fluctuate moderately and are constrained within one order of magnitude in most of the recharge realizations. Hydraulic conductivities of several permeable layers (mo, mm, alluvium, and soil) increase from R1 to R8, which is not surprising because the hydraulic conductivity increases with increasing recharge and constant groundwater head. Moreover, the hydraulic conductivities of the above layers are roughly linearly correlated to the corresponding recharge in each recharge realization. Figure 5 shows the simulated and observed groundwater heads for all 400 realizations. All of the 5 400 realizations are well constrained to observations, with the root mean square error (RMSE) of groundwater level residuals being lower than 4.6 m in all of the considered recharge realizations.

Theory of analytical StorAge Selection function
The travel time is defined as the time spent by a moving element (either a water particle or a solute) in a control volume of a hydrologic system. In principle, the control volume can be defined at arbitrary spatial scales (i.e., from the molecular scale 10 to the regional scale). Considering a hydrologic system in which the input flux (J) and the output fluxes (Q 1 , Q 2 , ..., Q n ) are known, each parcel of water within the system is tagged using its current age τ . The age-ranked storage S T = S T (T, t) is defined as the mass of water in the system with age τ < T . The backward form of the master equation (ME) for TTD in a control volume can be expressed as follows Van Der Velde et al., 2012;Harman, 2015): 15 with boundary condition S T (0, t) = 0, where T is the residence time of the oldest water parcel in storage S T ; t is the chronological time; ← − P Qj (T, t) is the cdf of the backward travel time distribution of output flux Q j ; J(t) is the input flux at time t; and Q j (t) is the output flux at time t. Specifically in this study, J is the groundwater recharge, and Q is composed of two components: the stream baseflow and the abstraction at production wells.
The StorAge Selection (SAS) function describes the fraction of water parcels leaving the control volume at time t, which is 5 selected from the age-ranked storage S T . Following the above definition, the SAS function can be linked with the backward travel time distribution ← − P Q (T, t) (Harman, 2015): for S T = S T (T, t). Ω Q is the cumulative form of the StorAge Selection (SAS) function.
Three instances of SAS functions using gamma distribution are shown in Figure 9a. In case the age distribution of each 10 outflow is uniformly selected from all water storages with various ages, the outflux TTDs turn into a random sample of the storage residence time distribution (RTD). The random sampling (RS) is equivalent to the uniform SAS function (Figure 9a).
Many past studies have also considered the random sampling as a proper description of the sampling behavior for heterogeneous catchments (Cartwright and Morgenstern, 2015;Benettin et al., 2015;Heße et al., 2017). Eq. (4) in this case has the analytical solution (Harman, 2015;Danesh-Yazdi et al., 2018): where p S (T, t) is the pdf of the residence time distribution and S(t) is the storage at time t. Specifically, in the case of a steady-state hydrodynamic system, Eq. (5) is further simplified into an exponential form: Eq. (6) is the analytical solution of backward TTD under the RS assumption. 20 In the idealized saturated groundwater aquifer, Eq. 6 is equivalent to the analytical solution derived by Haitjema (1995).
Based on Dupuit-Forcheimer's assumption, Haitjema (1995) derived a formula about the frequency distribution of residence time: 25 provided that nH/J is constant over the entire domain, the recharge is spatially uniform, and the aquifer is locally homogeneous, where n is the porosity, H is the saturated aquifer thickness, and T is the weighted mean travel time in the aquifer.

Linking the SAS functions to the physically based numerical model
Danesh-Yazdi et al. (2018) developed an approach to link the analytical SAS functions to the fully distributed numerical model.
Although differing in numerical model and particle tracking scheme, we apply the same approach to link the SAS functions 30 and the numerical model in this study. Eq. (3) under the steady-state condition can be further simplified as: By combining Eq. (4) and Eq. (9), the age-ranked storage S T can be calculated directly under the steady-state assumption: Combining Eq. 10 with Eq. 4, the SAS function can be directly computed from the simulated backward TTD using numerical 5 model.

Predictive uncertainty of TTDs
The theoretical framework of predictive uncertainty in this paper is based on Doherty (2015). As indicated in Bayes' theorem, the parameters of a model retain uncertainty, given that they have been adjusted to best-fit values achieved during calibration.
Nevertheless, the uncertainty of parameters is subject to constraints. One of the constraints resides in the fixed adjustable range 10 of parameters, in which expert knowledge must be respected. Another constraint is exerted by the parameterization process.
While the computationally expensive Bayesian approach offers a complete theoretical framework for predictive uncertainty evaluation, practical modeling efforts are often based on model calibration and a subsequent analysis of error or uncertainty in post-calibration predictions (Doherty, 2015). Ideally, the best-fit parameters achieved through calibration can reduce the predictive error to a minimum, with the minimum predictive error being the inherent uncertainty. However, the best-fit param- 15 eter is always biased from the true parameter because the essential imperfection of the model may facilitate or hamper the achievement of the minimum. Therefore, the motivation of uncertainty analysis in this study is to quantify and minimize the predictive uncertainty of travel time distributions, given that the parameters are plausible and that the model can reproduce well the groundwater heads.

20
For the sake of clarity, we number the recharge realizations from R1 (with the lowest recharge rate) to R8 (with the highest recharge rate). For each recharge realization, 50 K s fields are numbered from K1 to K50. Accordingly, R1K1 represents the K s field K1 in the recharge realization R1.

Uncertainty of TTD predictions
Flow paths of particle tracers in a random parameter realization (R5K1) are displayed in Figure 6, serving as a visual reference 25 for the regional groundwater flow pattern and the residence time distributions. In this realization, the deep low-permeability geological layers (e.g., mm2 and mu2) act as low-permeability aquitards. Therefore, the majority of streamlines do not enter these geological layers ( Figure 6).   Figure 7. Note that if the number of parameter realizations is sufficiently large, the ensemble average of MTTs (µ) will converge to the simulation result using the best-fit parameters achieved through model calibration (Doherty, 2015). Noticeable variability of 5 TTDs can be observed with respect to different recharge realizations. Generally, the µ values show a decreasing trend from 166.5 yrs in recharge realization R1 to 110.9 yrs in recharge realization R8, with only two exceptions (R3 and R6), which is not surprising based on the inversely linear dependency between recharge J and µ, indicated by Eq. 8. In each recharge realization, the different realizations of K s fields manipulate the mean travel time. c v varies from 7.81% (for R5) to 15.56% (for R3), indicating a modest degree of uncertainty propagated from K s estimation to TTD prediction. 10 The exponential model under the RS assumption is fitted to the ensemble averaged TTD of numerical solutions (see black lines in Figure 7) using Eq. 6. As shown in Figure 7, the shape of numerically simulated TTDs significantly deviate from the exponential distribution under the RS assumption, indicating a nonuniform sampling behavior of different water ages. The TTDs of numerical simulations are more right-skewed than the analytical TTDs under the RS assumption. This phenomenon reveals that the catchment TTD cannot be replicated by the single random sampling store. 15 Based on Eq. 8, we can approximate the "effective volume" of water involved in the transport process in the aquifer. The effective volume of groundwater storage related to the transport process is calculated and shown in Table 2. The effective volumes of storage (S eff ) estimated by the numerical solutions range from 9.8 m to 12.0 m, whereas the S eff estimated by the analytical solution range from 6.8 m to 7.5 m. The groundwater storage that contributes to the streamflow is significantly Seff (analytical) 6.9 6.9 7.5 7.1 6.8 6.9 6.9 7.2 7.0 smaller than the total groundwater storage (48.3 m). This difference is because most of the released particles only exist in the upper permeable layers rather than spread evenly over the whole aquifer/aquitard system. We are aware that this is only a first-order approximation because the analytical solution is only rigorously valid for the idealized homogeneous aquifer system (Haitjema, 1995).
Moreover, we assess the propagation to the MTT predictions from input and parameter uncertainty yielded by the 8 recharge

Uncertainty of young/old water preference
Figure 9a provides an intuitive illustration of the relationship between the cumulative rank SAS functions and the systematic preference for discharging water with different ages. Figure 9b shows the cumulative rank SAS functions of all ensemble simulations (obtained from 400 realizations of K s fields in 8 recharge scenarios). The figure is categorized into 8 groups by 15 different colors and line styles, each representing a recharge realization. Generally, the system has a weak preference to select younger water as discharge, despite different recharge realizations and K s realizations. Nevertheless, a moderate variation in SAS functions for the ensemble simulations can be observed. The variation of interest is introduced by the spatial distribution and velocity of flow pathlines that are controlled by different K s fields. For example, a more permeable shallow aquifer layer will activate more flow pathways in this layer and thus will introduce a stronger preference for young water. Particularly, the 20 significant variation in hydraulic conductivities in the deepest geological layer (e.g., Lower Muschelkalk) has a pronounced impact on the discharge of old water. With a thickness of up to 100 m, the hydraulic conductivity of this layer controls how many water parcels can enter into this layer and how deep the flow paths can develop. This effect can be evidenced by a large difference in the SAS functions related to old ages and a relatively smaller difference in those related to young ages in Figure   9b.   early period. Additionally, the simulated MTT using the uniform recharge appears to be smaller than that using the spatially distributed recharge. Figure 10b indicates that the simulation using uniform recharge has a consistently stronger preference for sampling young water than the simulation using spatially distributed recharge. Nevertheless, both scenarios show a general preference for young water.
The difference in TTDs and SAS functions is not induced by the variability in internal hydraulic properties, since the two simulations share the same K s field. Rather, it is mainly induced by the different flow paths of particle tracers in the two recharge scenarios. The spatially distributed recharge simulated by mHM reveals that the upstream mountainous area has higher 5 recharge rates than those in the lowland plain. By contrast, the uniform recharge scenario neglects this spatial nonuniformity.
This difference results in the following: (a) under a uniform recharge scenario, more particle tracers enter the system from locations near the streams at lowland plains (Figure 3b), indicating that more particle tracers are transported in the local flow system rather than in the regional flow system (Toth, 1963); and (b) higher recharge rates at lowland plains accelerate the particles' movement in this area and shorten their travel times. As such, local particle flow paths within the shallow aquifer 10 layer at lowland plains are activated, leading to a stronger preference for sampling local flow paths in shallow aquifer layers and therefore a stronger preference for young ages. Our findings are in line with the observations by Kaandorp et al. (2018), wherein the authors found a relatively higher preference for the selection of older water in the upstream area than that in the downstream area of the Springendalse Beek catchment.

Uncertainty of external forcing, internal property, and TTD predictions
In the idealized aquifers where groundwater flow is Dupuit-Forchheimer type, the recharge is uniform, and the aquifer is locally homogeneous. TTD is controlled by recharge, saturated aquifer thickness, and porosity, and it is independent of hydraulic conductivity (Haitjema, 1995). In a real-world catchment with complex geometry and topography, stratigraphic aquifers, and nonuniform recharge, our numerical exploration demonstrates that the groundwater TTD is dependent on both recharge and the hydrostratigraphic K s field.
The mechanisms behind the effects of the recharge rate and the K s field are different. Given that the spatial pattern of recharge remains the same, a higher recharge rate intensifies the fluxes for the complete range of flow pathways. This process 5 forces water downward in the recharge zones and upward to the discharge area. As such, flow rates through all flow pathways are increased equally, and the spatial distribution of flow pathways is not changed. The corresponding SAS function is also not changed (Botter et al., 2010;Cartwright and Morgenstern, 2015;Kaandorp et al., 2018). In contrast, a different K s field activates flow pathways in more permeable layers, deactivates flow pathways in less permeable layers, changes the spatial distribution of flow pathways, and therefore changes the shape of the SAS function (Harman et al.; Kim et al.). 10 We also underline the value of observational data in reducing predictive uncertainty in simulated TTDs. In this study, the majority of model parameters can be adequately conditioned by spatially distributed groundwater head observations ( Figure   4). Provided that most of the hydraulic conductivities are constrained to the model-to-measurement misfit and reality, the TTD predictions can also be effectively bounded. This is evidenced by Figure 7, from which moderate values of c v ranging from 7.81% to 15.56% in different recharge realizations can be observed. The ensemble averaged MTTs for different recharge real- 15 izations also have a high c v (15.70%), implying that the TTD prediction appears highly sensitive to recharge. Our findings are in line with Danesh-Yazdi et al. (2018), in which the interplay between recharge and subsurface heterogeneity was investigated and a strong dependency of TTDs on the recharge was observed.

Analytical model and SAS functions
The analytical solution of TTD, assuming a random sampling of water, cannot properly replicate the TTD of numerical simu-20 lation in the study domain. In the stratigraphic aquifer with complex topography and diffuse recharge, the analytical solution using Eq. 6 underestimates the MTT. Note that this conclusion holds when the simulated TTD has a relatively larger long-tail behavior than the exponential distribution. Such observations have also been reported for other real-world aquifers (Basu et al., 2012;Eberts et al., 2012;Kaandorp et al., 2018). This finding can be seen as an extension of Basu et al. (2012), in which the authors found a moderate discrepancy between the analytical solution using Eq. 6 and the numerical solution in a small 25 catchment.
It is obvious that the analytical solution under the RS assumption cannot explicitly include the impact of the distributed hydraulic properties of stratigraphic aquifers and the spatially nonuniform recharge. The above limitations of analytical models may introduce a significant predictive error for the TTD predictions, as shown in Figure 7. Moreover, the effective volume of storage related to the transport process S eff is calculated utilizing the numerical solutions of distributed models. As a first-order 30 approximation of effective storage volume, it suggests that only a small fraction of total groundwater volume is involved in the active hydrologic cycle.
The SAS function provides a good interpretability for simulation results using the fully distributed model in terms of characterizing the preference for releasing water of different ages. We find that the SAS functions are weakly dependent on the K s fields in the stratigraphic aquifer system, but the overall preference for young water does not change. This weak dependency can be explained by the fact that different realizations of K s fields modify the spatial distribution of particle flow paths. The overall tendency for young groundwater in the saturated aquifer has been reported by many past studies (Danesh-Yazdi et al., 2018;Kaandorp et al., 2018;Yang et al., 2018). Our study links the explicit simulations of travel times and the analytical SAS functions and offers original insights into the uncertainty propagated from recharge and the K s fields to the SAS functions. 5

Dependency of TTDs and SAS functions on the spatial pattern of forcings
The sensitivity of the TTDs and SAS functions to the spatial pattern of recharge forcings can be explained mainly by the different flow paths of particle tracers, resulting primarily from the spatially heterogeneous fields of recharge across the study catchment. For the regional groundwater system, the spatial variation of recharge determines the distribution of starting points of the flow pathlines of tracer particles. For example, more particles will be injected from recharge zones that are typically 10 located in high-elevation regions, resulting in a higher weight of flowlines starting from high-elevation regions. The pronounced spatial variability of recharge also controls the systematic (water age) preference for particles existing from the system (to river discharge) that originated from different regions and therefore exerts a strong control on the shape of the SAS function.
In the study catchment, an oversimplified spatially uniform recharge results in a smaller MTT and a stronger preference for discharging young water compared to those taking the spatial variability of recharge. Such observations are conditioned to 15 site-specific features of the study catchment, which is noticed only when the following apply: (a) a site is located in a headwater catchment under a humid climate condition; (b) the recharge rate in areas close to the drainage network is generally lower than that in areas far away from the drainage network; and (c) the system is under (near) natural conditions, meaning that artificial drainage and pumping do not dominate the groundwater budget.
The assumption of spatially uniform input forcing has been widely applied in regional-scale subsurface hydrologic models 20 (Zghibi et al., 2015;Yang et al., 2018). Our study indicates that a reasonable characterization of spatial pattern of recharge is crucial for reliable TTD prediction. Unfortunately, it is quite challenging to confidently quantify the groundwater recharge at the regional scale under today's technique due to the lack of reliable measurements (Healy and Scanlon, 2010;Cheng et al., 2017;Zink et al., 2017). Appropriate techniques should be chosen to estimate groundwater recharge according to the study goals and the spatial and temporal scales. 25

Implications for the applied groundwater modeling
Uncertainty limits the applicability of groundwater models. Most of the applied groundwater models are deterministic models that use direct values of inputs and parameters instead of probabilistic distributions of them. Specifically, both the model inputs and the inversion process are deterministic, leading to a deterministic best-fit parameter set achieved during model calibration. Our study reveals limitations of the above modeling procedure and suggests that the probabilistic distribution 30 of inputs and parameters should be considered for the applied modeling. The main limitation is that the single exclusive assignment of recharge is inadequate for the simulation of transport processes because the error can be propagated from inputs to the conditioned parameters (e.g., hydraulic conductivities) through the calibration process ( Figure 4). This accumulated error will further lead to a seriously biased prediction of travel times. Additionally, the modeling workflow used in this study is computationally more efficient than the Bayesian approach and is suitable for complex real-world applications.
The degree of predictive uncertainty is highly dependent on the parameterization scheme. Some highly parameterized models are potentially ill-posed due to the paucity of data and therefore cannot be constrained by the available calibration dataset. In this case, the predictive uncertainty of TTDs is potentially high (Weissmann et al., 2002;Danesh-Yazdi et al., 2018). Stratigraphic 5 aquifer models with zoned parameters are still widely used for applied groundwater modeling because the field representation of local-scale heterogeneity is difficult. Given that the aquifer model is stratigraphic and the number of parameters is less than the number of observations, most of the adjustable parameters can be effectively bounded. In this case, the uncertainty of input data (e.g., recharge) appears to have a primary influence on the TTD predictions. Note that here, we do not account for the error caused by model structural deficiency. 10

Conclusions
In this study, we explore the relationship between the uncertainty of recharge, calibration-constrained hydraulic conductivity realizations, and predictions of groundwater TTDs. Using both a physically based numerical model and a lumped analytical model, a comprehensive case study is performed in an agricultural catchment (the Nägelstedt catchment). The RWPT method is used to track the water samples through the modeling domain and compute their travel times. Moreover, the analytical model 15 is fitted against the numerical solutions to provide a reference for the effective storage and sampling behavior of the system.
Based on this study, the following conclusions are made: 1. In the Nägelstedt catchment model, the simulated MTTs are strongly dependent on the recharge rate and weakly dependent on the postcalibrated K s field. We highlight the importance of recharge quantification and the worth of reliable data in reducing the predictive uncertainty of TTDs. 20 2. The framework of the SAS function provides a good interpretability of simulated TTDs in terms of characterizing the systematic preference for sampling young/old water as outflow. On the basis of this framework, we find that the ensemble simulations have a consistent preference for young water, despite the different recharge and hydraulic conductivity realizations. Our study provides a novel modeling framework to explore the effect of input uncertainty and parameter equifinality on TTDs and SAS functions through a combination of calibration-constrained Monte Carlo parameter 25 generation, a numerical model, and a SAS function framework.
3. Both the shape and the breadth of catchment groundwater TTDs and SAS functions are sensitive to the spatial distribution of recharge. Therefore, a reasonable characterization of the spatial pattern of recharge is crucial for the reliable TTD prediction in the catchment-scale groundwater models.
Appendix A: Random walk particle tracking Random walk particle tracking solves a diffusion equation at local Lagrangian coordinates rather than the classical advectiondiffusion equation, which can be expressed as: x(t i ) = x(t i−1 ) + v(x(t i−1 ))∆t + Z 2D(x(t i−1 )∆t (A1) where x denotes the coordinates of the particle location, ∆t denotes the time step size, and Z denotes a Gaussian random 5 number, with the mean being zero and the variance being unity.
The velocity v in Eq. (A1) is replaced by v * i to maintain consistency with the classical advection-dispersion equation (Kinzelbach, 1986). The expressions of v * i and the hydrodynamic dispersion tensor D ij are: where δ ij denotes the Kronecker symbol, α L denotes the longitudinal dispersion length, α T denotes the transverse dispersion length, D d ij denotes the tensor of the molecular diffusion coefficient, and v i denotes the mean pore velocity component in the ith direction.
Appendix B: Composite parameter sensitivity 20 The PEST algorithm calculates the sensitivity with respect to each parameter of all observations (with the latter weighted as per user-assigned weights), namely the "composite sensitivity" (Doherty, 2015). The composite sensitivity of parameter i is defined as csp i = [J t QJ] 1/2 ii n , where J denotes the Jacobian matrix that includes the sensitivities of all predictions to all model parameters, Q is the weight matrix, and n is the number of observations with nonzero weights. In this study, all weights assigned to observations are equally set to 1.