Estimating groundwater recharge from groundwater levels using non-linear transfer function noise models and comparison to lysimeter data

The application of non-linear transfer function noise (TFN) models using impulse response functions is explored to estimate groundwater recharge and simulate groundwater levels. A non-linear root zone model that simulates recharge is developed and implemented in a TFN model, and is compared to a more commonly used linear recharge model. An additional novel aspect of this study is the use of an autoregressive-moving average noise model so that the remaining noise fulfills the statistical conditions to reliably estimate parameter uncertainties and compute the confidence intervals of the recharge 5 estimates. The models are calibrated on groundwater level data observed at the Wagna hydrological research station in the southeastern part of Austria. The non-linear model improves the simulation of groundwater levels compared to the linear model. The annual recharge rates estimated with the non-linear model are comparable to the average seepage rates observed with two lysimeters. The recharges estimates from the non-linear model are also in reasonably good agreement with the lysimeter data at the smaller time scale of recharge per 10 days. This is an improvement over the results from previous studies 10 that used comparable methods, but only reported annual recharge rates. The presented framework requires limited input data (precipitation, potential evaporation, and groundwater levels) and can easily be extended to support applications in different hydrogeological settings than those presented here.

level data as model input. An additional advantage of the WTF method is that no assumptions are made about the actual recharge processes, for example the existence of preferential flow paths. This can also be considered a disadvantage, as no relationship between precipitation and recharge is established. This makes the method unsuitable for future projections of groundwater recharge when precipitation patterns change, for example in climate change impact studies.
In a review paper on the topic, Healy and Cook (2002) suggested that "approaches based on transfer function noise (TFN) 25 models should be further explored" for the estimation of recharge. TFN models can be used to translate one or more input series (e.g., precipitation and potential evaporation) into an output series (e.g., groundwater levels) and have been adopted in many branches of hydrology (Hipel and McLeod, 1994). An early example of the use of these models for recharge estimation is given in Besbes and De Marsily (1984), through the deconvolution of groundwater levels with an aquifer unit hydrograph obtained from a groundwater model. The study showed how the recharge flux can be related to rainfall by using an additional 30 unit hydrograph for the unsaturated zone. Their proposed method required a calibrated groundwater model and a good estimate of the infiltration, making the method relatively laborious and less applicable in practice. O'Reilly (2004) developed a waterbalance/transfer-function model to simulate recharge, using the WTF method to obtain recharge estimates to calibrate the model parameters.
In recent decades, the use of a specific type of TFN models using predefined response functions (von Asmuth et al., 2002) 35 has gained popularity for the analysis of groundwater levels . In this method, impulse response functions are used to describe how groundwater levels react to different drivers such as precipitation, evaporation, and pumping.
An important goal for these models has traditionally been to accurately describe the observed groundwater level fluctuations.
For shallow water tables (up to a few meters depth) this can often be achieved using a simple linear relationship between precipitation and evaporation (e.g., Berendrecht et al., 2003;von Asmuth et al., 2008). In a large scale case study for the 40 Netherlands, Zaadnoordijk et al. (2019) obtained good results with this method for areas with shallow groundwater depths. For the simulation of deeper groundwater levels the linear relationship was shown to be less appropriate and non-linear models may be used to accurately simulate the groundwater levels (e.g., Berendrecht et al., 2006;Peterson and Western, 2014;Shapoori et al., 2015).
More recently, efforts have been made to explore the use of TFN models to estimate groundwater recharge, as suggested by study site. Given these conditions, the groundwater level fluctuations are assumed to be the exclusive result of changes in the groundwater recharge from infiltrating precipitation water.
The study site is equipped with two weighable scientific field lysimeters operated by JR-AquaConSol since 2005 (von Unold 90 and Fank, 2008). The first lysimeter is operated under conventional farming practices (Sciencelys 1), while the second lysimeter (Sciencelys 2) was organically farmed until 2014, when it was also converted to conventional farming. A crop-rotation scheme is used for the lysimeters, with crops changing every growing season. The soils in the area are rather heterogeneous, with the thickness of the sandy loamy top layer varying greatly over short distances. The underlying sand and gravel deposits start at a depth between 50-120 cm. Both lysimeters have an area of 1 m 2 and are 2 meters deep. Seepage to the groundwater is 95 measured near the bottom of the lysimeters at 1.8 meters depth, where suction cups are installed that apply a water potential that is similar to the potential measured with tensiometers just outside the lysimeters. Both lysimeters are identical in their technical setup, and a detailed description of the lysimeters is provided in von Unold and Fank (2008) and Klammler and Fank (2014).
As the recharge is not measured at the water table itself, a certain time lag between the recharge measured with the lysimeters 100 and the corresponding groundwater level rise exists. Only a limited time lag is expected as the ±2 m thick percolation zone consist mostly of highly conductive gravel layers. It is noted that the recharge measurements are local measurements for the area of the lysimeter, and are influenced by prevailing soil conditions, vegetation and the degree of soil sealing. The groundwater levels, measured at approximately 12 m distance from the lysimeters at Wagna test site, may also be influenced by different recharge rates from other land-use types in the surrounding area (e.g., grassland or residential areas). As such, the measured 105 4 https://doi.org/10.5194/hess-2020-392 Preprint. Discussion started: 27 August 2020 c Author(s) 2020. CC BY 4.0 License.
recharge rates from the lysimeters are -for the purpose of this paper -used as an indicative rather than an exact estimate of recharge. Considering the above, the average recharge from the two lysimeters (shown in Fig. 1c (Reszler and Fank, 2016). The study concluded that the seepage and water content dynamics in the lower gravel zone inside the lysimeters could not be matched using the Richards equation and a Van Genuchten-Mualem approach, suggesting the existence of preferential flow paths below the root zone.
Following Bakker et al. (2008), a four-parameter impulse response function is used to translate the recharge flux into groundwater level fluctuations: -], and n [-] are shape parameters. For n > 1 the four-parameter function simulates a delayed response of the groundwater levels to recharge, while for n ≤ 1 and b = 0 the groundwater levels respond instantaneously to a recharge pulse. If n = 1 and b = 0, Eq. (3) reduces to an exponential response function with only two parameters: The parameters A, a, n, b, and d are estimated by fitting Eq.
(1) to observed data. Depending on the hydrogeological setting and the model used to compute the recharge either a four-parameter or an exponential response function is used here to translate the recharge flux R into groundwater level fluctuations. The main question that remains is how to estimate the recharge R(t) from observed hydrometeorological data. The following two sections introduce the two models used in this study to compute the recharge flux R(t).

The Linear model
A common approach to approximate the recharge flux R in Eq.
(2) is a simple linear function of precipitation P [LT −1 ] and potential evaporation E p [LT −1 ] (e.g., Berendrecht et al., 2003;von Asmuth et al., 2008): where f [-] is a parameter that is calibrated. The grass-reference evaporation computed using the Penman-Monteith equation 150 (Allen et al., 1998) is used as potential evaporation E p here. A clear interpretation of the parameter f is not available. While Berendrecht et al. (2003) referred to f as a crop factor, von Asmuth et al. (2008) noted that the value of f "depends on the soil and land cover" instead of a single crop and also incorporates the "average reduction of the evaporation due to actual soil water shortages". Here, f is referred to as the evaporation factor, following the terminology suggested by Obergfell et al. (2019).
From Eq. (5) it is clear that the flux R can be negative for periods when evaporation (f E p ) exceeds precipitation. As Eq. (5) 155 does not include a storage term, the temporal distribution of recharge that may result from storage in the unsaturated zone has to be captured by the impulse response function. The four-parameter response function is therefore used to translate the computed recharge into groundwater level fluctuations for the linear model. As such, the response function simulates the behavior of the

The Non-linear model
While the linear model depends on the response function to simulate the effects of the root zone on the groundwater recharge, the non-linear model uses a soil-water storage concept to account for the temporal storage of water in the root zone. The nonlinear recharge model developed here is loosely based on the FLEX conceptual modeling framework used in rainfall-runoff modeling (Fenicia et al., 2006). The model is conceptualized as two connecting reservoirs: the first for interception and the The general functioning of the model is as follows. Precipitation water is intercepted in the first reservoir until the interception capacity S i,max [L] is exceeded. The intercepted water can evaporate from the first reservoir as interception evaporation (E i [LT −1 ]). This process forms the first barrier for precipitation to become groundwater recharge (Savenije, 2004), and creates To allow the model to adjust the input potential evaporation (E p ) to an evaporation flux that better represents the vegetation-175 dependent actual evaporation, a maximum potential evaporation flux E max [LT −1 ] is computed first: where k v [-] is a vegetation coefficient that needs to be calibrated. This approach is similar to, for example, the Ecohydrological Streamflow model developed by Viola et al. (2014). The parameter k v is interpreted as a vegetation coefficient, highlighting the idea that the groundwater recharge may be affected by different types of vegetation instead of a single type of crop.

180
The water balance for the interception reservoir is: where where S i [L] is the amount of water stored in the interception reservoir. The maximum storage capacity of the interception 185 reservoir is determined by the parameter S i,max [L]. Intercepted water is evaporated from the interception reservoir, limited by the amount of maximum potential evaporation E max (energy-limited) or the amount of water available for evaporation S i (water-limited). Any precipitation water exceeding the interception capacity S i,max will continue to the root zone reservoir as effective precipitation P e .
The water balance for the root zone reservoir is: where S r [L] is the amount of water in the root zone reservoir, E t,s [LT −1 ] is a combined evaporation flux constituting both soil evaporation and transpiration by vegetation, and R is the recharge to the groundwater. The maximum storage capacity of the root zone reservoir is determined by the parameter S r,max [L]. The saturation at t = 0 is set to S r (t = 0) = 0.5S r,max . The evaporation flux E t,s is limited by the amount of water available in the root zone as follows: where the parameter l p [-] determines at what fraction of S r,max the evaporation flux is limited by the availability of soil water. The relationship between the saturation of the root zone (S r /S r,max ) and the fraction of the potential evaporation that is evaporated through the root zone (E t,s /E max ) is shown in Fig. 3a. It is noted that the maximum potential evaporation is decreased by the amount of evaporation that already took place as interception evaporation. The actual evaporation as simulated 200 by the non-linear model is calculated as E a = E t,s + E i .
Relationships between the saturation of the root zone (Sr/Sr,max) and the evaporation (a) and the drainage from the root zone reservoir (b). The saturated hydraulic conductivity is set to ks = 1.
Recharge to the groundwater R is computed using Campbell's approximation for unsaturated hydraulic conductivity (Campbell, 1974).
where k s [LT −1 ] is the saturated hydraulic conductivity and γ [-] is a parameter that determines how non-linear this flux is 205 with respect to the saturation of the unsaturated zone. Equation (11) reduces to the equation used in the FLEX models (Eq. (4) in Fenicia et al., 2006) when γ = 1 and is similar to that used by Peterson and Western (2014). The relationship between the saturation of the root zone and the recharge flux for different values of γ is shown in Figure 3b.
In the preliminary phase of this study it was found that the use of an exponential response function yields similar results as the four-parameter response function for the non-linear model. For reasons of model parsimony, the exponential response 210 function (Eq. (4)) was adopted for the non-linear model to translate the recharge R into groundwater levels.
In total the non-linear recharge model has 6 parameters that need to be estimated: k v , S i,max , S r,max , k s , γ, and l p . Some of these parameters may be fixed to sensible values based on experience and literature values (Savenije, 2010), decreasing the number of parameters that need to be calibrated. Here, the interception capacity S i,max was set to 2 mm and l p was fixed to 0.25. The parameter S r,max was fixed to 250 mm (e.g., Gao et al., 2014), as it was found to have a strong correlation with k s 215 in the preliminary phase of this study and thus hard to calibrate. This leaves a total of 6 parameters to be calibrated: k v , k s , and γ of the non-linear recharge model, A and a of the response function (Eq. (4)), and the base level of the model d (Eq. (1)).

The Lysimeter model
For comparison, a third model is constructed where the recharge measured with the lysimeters is used as the flux R in Eq. (2).
Similar to the non-linear model, an exponential response function is used to translate this recharge into groundwater levels.

220
Assuming that the recharge measured with the lysimeters is a good estimate of the (unknown) real recharge, the groundwater levels simulated with this model provide an indication of the fit that may potentially be obtained with the other models.

Noise modeling
The residuals r(t) of TFN models applied to groundwater level data (see Eq. (1)) often show considerable autocorrelation. To allow statistical inferences with the model (e.g., the estimation of confidence intervals of the simulated recharge) it is necessary 225 to transform the residuals series into a noise series that is approximately white noise. For groundwater levels time series this generally means that the autocorrelation needs to be removed from the residuals. An autoregressive model of order one (AR (1)) is commonly used for this purpose (e.g., von Asmuth et al., 2002): where υ is called the noise series here, ∆t i is the time step between two residuals r(t i ) and r(t i−1 ), and α [T] is the AR 230 parameter.
In the preliminary phase of this study, the models were calibrated using daily groundwater level observations. It was found that the noise series from these models still exhibited significant autocorrelation, despite the use of the AR(1) noise model. This result may in fact not be that surprising, considering the slow processes governing groundwater flow systems and the model structure used to simulate these. The former can for example be quantified by calculating the autocorrelations of the observed 235 groundwater levels, which in this study are higher than 0.95 for measurements up to 13 days apart and only drop below 0.5 for measurements 100 days apart. The latter is more general, where autocorrelated errors are a result from the model structure.
Errors in the input data propagate through the TFN model and are likely to result in autocorrelated errors, due to the use of a reservoir model (Kavetski et al., 2003) and the convolution with an impulse response function.
As a practical solution, the time step between groundwater level observations was systematically increased through removal In a further attempt to remove the autocorrelation from the residuals, the AR(1) model was extended with a Moving-Average part of order one (MA(1)) to form an ARMA(1,1) noise model as follows: where β is the parameter of the moving average part of the noise model. The parameters α and β are estimated during model 250 calibration. The first value of the noise series at t = 0 is set to the first value of the residuals, υ(t 0 ) = r(t 0 )), as it is not possible  to compute υ(t = 0) from the previous residual. The MA(1) process can correct for individual shocks in the system, quickly reducing the error over one time step, whereas the AR(1) part deals with an error whose effect exponentially decreases over multiple time steps. Note that the time step ∆t in Eq. (13) may be irregular, but in this study only time series with a regular time step are used. Additional research is necessary to make this noise model fully applicable to irregular time steps, as was 255 done for the AR(1) model (von Asmuth and Bierkens, 2005).
Rerunning the previous analysis of the Durbin-Watson statistic for for an increasing time interval between observations using the ARMA(1,1) noise model, shows that this noise model is better capable of removing the autocorrelation at the first time lag (Fig. 4b). The autocorrelation decreases with increasing time interval and the DW value stabilizes for time intervals of around 6 days and larger. A lack of autocorrelation in the noise series at larger time lags was also confirmed using the Ljung-Box 260 test, although the autocorrelation at lags around one year become significant for time intervals below 10 days. Based on this analysis, groundwater level time series with a 10 day time interval were used for model calibration. The final autocorrelation plots are shown in Fig. A1 in the Appendices.

Parameter estimation and confidence intervals
The previous three sections described the TFN models used in this study, which include a recharge model, a response function, Minimization of the objective function is done using the Trust Region Reflective algorithm, as implemented in Scipy's least squares method (Virtanen et al., 2020, version 1.4.0) and Lmfit (Newville et al., 2016, version 1.0). Note that this is not the default option in Lmfit. The standard errors of the parameters are computed from the covariance matrix that is estimated during optimization. An important assumption underlying this approach is that the minimized noise series (υ in Eq. (13)) 275 behaves as normally distributed white noise with no significant autocorrelation, a constant variance (homoscedastic), and a mean of zero. These assumptions were checked through visual inspection of the results and the use of various statistical tests for autocorrelation as already shown in the previous section.
The 95% confidence intervals of the simulated recharge are computed through a Monte Carlo simulation (N =100,000).
Parameter sets are drawn from a multivariate normal distribution computed using the estimated covariance matrix. If one of the 280 parameters in a parameter set is outside the parameter boundaries, the set is discarded from the sample and a new parameter set is drawn. This procedure is repeated until N parameter sets are available for the Monte Carlo simulation. The model is run with the N different parameters sets and the 95% confidence intervals are computed from the ensemble of simulated recharge fluxes.   Fig. 6a, along with the estimated recharge fluxes (Fig. 6b and c) and the measured recharge ( Fig. 6d). As the models are calibrated on groundwater level observations with a 10 day time step, only recharge rates summed over 10 day intervals are presented here. The blue shadings denote the 95% confidence intervals of the recharge estimates. The step responses characterizing how the groundwater levels respond to a sudden unit recharge event for each of the models are 310 shown in the inset plot at the top of Fig. 6a. The values of the calibrated parameters can be found in Table A1 in the Appendix. All three TFN models are able to capture the major groundwater dynamics and simulate the observed groundwater levels reasonably well. For the calibration period, the non-linear model shows the best simulation of the groundwater levels as quantified by the four goodness-of-fit metrics used in this study (see Table 1). The linear model performs better than the lysimeter model in terms of NSE and KGE metrics, but the lysimeter model shows similar or better performance according to the MAE 315 and R 2 , respectively. For the validation period, the differences in the metrics are not as clear, and no model outperforms the other models. For example, the lysimeter model captures the single peak in groundwater levels during the validation period better than the other models, but shows the worst simulation of the low groundwater levels that follow this peak. The linear model performs better during that period with low groundwater levels, but overestimates the low groundwater levels observed   recharge occurring primarily as individual events, interluded by extended periods of reduced recharge. The behavior of eventbased recharge was also found in other studies (Groenendijk et al., 2014;Reszler and Fank, 2016), and suggests that recharge   limited, as any analysis that uses this estimate as input data would have large uncertainties in its outcomes due to the uncertainty in the input data. The non-linear model performs much better in this respect.

Parameter estimation and consistency of model output
The results presented so far are based on the calibration of the models using only 1 out of every 10 groundwater level measure-400 ments. The use of only a selection of the available groundwater level measurements during calibration allowed for a further investigation into the consistency of the modeling results, by calibrating the models to 10 different selections derived of the original time series as a type of split-sample test. This way, it is possible to assess the consistency of the estimated parameters and the impact on the simulated groundwater levels and the annual recharge estimates for this particular time series.
in the Appendix. The results show that both the simulated groundwater levels and the estimated recharge fluxes are consistent between the different calibrations for all models. This should in fact not be that surprising, considering that the time series used for calibration originate from the same groundwater level time series.
What may be more surprising, however, are the differences in the estimated parameters between the 10 different calibrations (see Fig. A3 in the Appendix). The parameter values for all models are of the same order of magnitude and model performance 410 measured in R 2 is relatively stable, but the optimal parameter values can differ significantly from each-other between calibrations (e.g., for the non-linear model k s ranges between 100 and 250 mm d −1 ) even though the estimated confidence intervals overlap for the most part. The results of this split sample test raise the question of parameter identifiability. Given the similarity in simulated groundwater levels and annual recharge estimates, it is clear that different combinations of parameters yield similar results. This split sample test shows that caution is needed when interpreting values of individual (optimal) parameters.

415
Further research is necessary on the identification of parameters, for example through testing the models on large samples of groundwater time series (similar to, e.g., Perrin et al., 2001).
5 Applicability of the methodology

Hydrogeological setting
The presented approach was tested on relatively shallow groundwater levels (±4 m depth to the water table), for which no 420 feedback between the groundwater and root zone was expected. In this setting an exponential response function was used in the non-linear model and the computed flux R could be directly interpreted as groundwater recharge. The use of an exponential function may not be appropriate for deeper groundwater bodies with thicker unsaturated zones. A response function that accounts for this could then be used (e.g., the four-parameter response function), but the estimated flux R should then be interpreted as drainage from the root zone to the groundwater and not as recharge occurring at the water table. Peterson and 425 Fulton (2019) suggested that the flux could be "averaged over a period greater than the time lag" to provide an estimate of gross recharge in this case. This approach was applied here for the presentation of the annual recharge rates, where also the recharge estimates from the linear model were considered.
To make the methods applicable in other hydrogeological settings than those presented here, additional hydrological processes (e.g., snow melt, surface runoff) and variables (e.g., pumping, river levels) may be included in the model. In the current this approach for groundwater levels that were also influenced by groundwater pumping. To make the recharge models applicable in different settings, additional processes may be implemented in the root zone model, for example precipitation entering 435 the system as snow or leaving as surface runoff before infiltrating into the soil. For ideas on how to include such processes one can draw from the vast number of concepts already available in conceptual rainfall-runoff modeling (e.g., Beven, 2011).
There may also be possibilities for knowledge transfer in the reverse direction, fading the boundaries between hydrologists and hydrogeologists (e.g., Staudinger et al., 2019). Groundwater levels are not often considered in conceptual rainfall-runoff models, although it has been shown that groundwater level time series can be used to further constrain parameters in these 440 models (e.g., Seibert, 2000). This study showed how groundwater levels may be used to calibrate the parameters of a root zone module, which is based on the conceptualization taken from a rainfall-runoff model (Fenicia et al., 2006). It would be interesting to see new attempts to constrain the parameter estimation in rainfall-runoff models using groundwater levels, in particularly using the concepts of impulse responses to improve the simulation of groundwater levels without adding many parameters. Conversely, the results of such analyses may also help to further constrain the non-linear recharge models used in 445 the TFN models presented here.

Noise modeling and quantifying uncertainties
In this study, model parameters are estimated by fitting simulated groundwater levels to observed groundwater levels. The recharge is an intermediate model result that is not calibrated for, and it is recommended to quantify the impact of parameter uncertainties on the recharge estimates by computing their confidence intervals. To obtain reliable estimates of the parameter 450 standard errors of in the presented framework, it is important to remove the autocorrelation from the minimized noise series by using an appropriate noise model. Here, an AR(1) did not suffice for this purpose and an ARMA(1,1) noise model was used instead. While this model was more successful in transforming the residual series into a noise time series that is approximately white noise, some autocorrelation still remained in the noise series. As a practical solution, the time interval between groundwater level measurements was increased to 10 days by removing measurements from the time series. It should be noted here 455 that the optimal time interval is likely to be site specific and should be investigated per individual time series. The approach shown in Section 3.5 can be helpful in determining the optimal time step size used for model calibration. The modeling of the residuals and the choice of an appropriate noise model and time interval should be considered an iterative process, as also suggested in Smith et al. (2015).

Time series requirements 460
The presented method requires time series of groundwater levels, precipitation, and potential evaporation as model input data.
The precipitation and evaporation time series should have a regular (daily) time step to compute the recharge to the groundwater.  (2017) recommended a 10 to 20 year period for simulating groundwater levels with reliable credible intervals. More research is needed to determine the effect of time series length on the recharge estimation.

Right answers for the right reasons
The non-linear root zone model presented in this study is only one of many similar alternatives. Comparable results were obtained using a similar root zone model developed by Berendrecht et al. (2006), which is also available in the Pastas software (Collenteur et al., 2019). It is expected that other comparable non-linear model setups (e.g. the models of Peterson and Western, 2014) perform in a similar manner. Such non-linear models are better capable of simulating true system dynamics which are commonly not measured, such as groundwater recharge and actual evaporation. This suggests that the improved groundwater level simulation is the result of a better representation of the hydrological processes, rather than merely the result of added 475 mathematical complexity. As such, the use of non-linear root zone models in TFN models is a promising step in the effort "to get the right answers for the right reasons", as advocated by, e.g. Kirchner (2006).

Conclusions
The application of linear and non-linear transfer function noise (TFN) models using predefined impulse response functions was explored to estimate recharge and simulate groundwater levels. The methods were tested on groundwater levels observed at the is not calibrated for in the first two models.
The results show that both the linear and non-linear TFN models are capable of simulating the groundwater levels reasonably well, with the non-linear model slightly outperforming the linear model. The use of a more complex response function was required to obtain satisfactory results with the linear model, as the response function also had to simulate the effects of the root zone. However, this response function was not able to compensate for all hydrological conditions, and in particular during 490 periods of low soil-moisture levels the lack of soil moisture dynamics in the linear model leads to larger errors in the simulation of the groundwater levels. These findings confirm those from other studies (Berendrecht et al., 2006;Peterson and Western, 2014) and advocate a more widespread adoption of non-linear recharge models in TFN modeling of groundwater levels. The use of such models does not necessarily imply a higher number of calibration parameters; i.e., the linear and non-linear models used in this study have the same number of calibration parameters.

495
The use of a non-linear root zone model to compute the recharge in the TFN model improved the estimation of groundwater recharge significantly. For annual recharge rates it was found that the non-linear model provides good estimates with relatively small deviations from the recharge measured with the lysimeters, while the linear models shows significantly larger deviations and a structural overestimation of annual recharge rates. The non-linear model also provided reasonable estimates for recharge summed over 10-day periods, suggesting that this model may be used to obtain recharge estimates at smaller time scales than 500 reported so far (e.g., Obergfell et al., 2019;Peterson and Fulton, 2019). Using detailed information from the lysimeters present at the study site, deviations in the recharge estimate could be linked to errors in the simulation of the actual evaporation, highlighting the value of field data from lysimeters and the importance of the evaporative flux in the estimation of recharge.
The methods developed in this paper were tested on a single groundwater time series and are presented as a proof-ofconcept. Additional research is needed using larger groundwater level data sets to investigate the general applicability of the 505 method under different hydrogeological settings. The methods can also be extended to estimate recharge in settings where other hydrological stresses cause groundwater fluctuations (e.g., river levels and pumping) or when other processes (e.g., snow melt, surface runoff) influence recharge generation. To support and encourage such applications and future research, the models are included and documented in the open-source software Pastas and all scripts used for this study are made available.
Code and data availability. All code used in this manuscript is available through the Python package Pastas (Collenteur et al., 2019). The  Linear Non-linear Lysimeter Figure A3. Parameter values and their 95% confidence interval for the three models calibrated on 10 time series. Note that when possible the y-axis is shared between the different columns for comparison purposes. The bottom right subplot shows the model performance for each calibration, measured as R 2 between the observed and simulated groundwater levels.