Determination of vadose and saturated-zone nitrate lag times using long-2 term groundwater monitoring data and statistical machine learning 3

In this study, we explored the use of statistical machine learning and long-term groundwater nitrate monitoring data to 12 estimate vadose-zone and saturated-zone lag times in an irrigated alluvial agricultural setting. Unlike most previous statistical 13 machine learning studies that sought to predict groundwater nitrate concentrations within aquifers, the focus of this study was to 14 leverage available groundwater nitrate concentrations and other environmental variable data to determine mean vertical velocities 15 (transport rates) of water and solutes in the vadose zone and saturated zone (3.50 m/year and 3.75 m/year, respectively). Although 16 a saturated-zone velocity that is greater than a vadose-zone velocity would be counterintuitive in most aquifer settings, the statistical 17 machine learning results are consistent with two contrasting primary recharge processes in this aquifer: (1) diffuse recharge from 18 irrigation and precipitation across the landscape, and (2) focused recharge from leaking irrigation conveyance canals. The vadose19 zone mean velocity yielded a mean recharge rate (0.46 m/year) consistent with previous estimates from groundwater age-dating in 20 shallow wells (0.38 m/year). The saturated zone mean velocity yielded a recharge rate (1.31 m/year) that was more consistent with 21 focused recharge from leaky irrigation canals, as indicated by previous results of groundwater age-dating in intermediate-depth 22 wells (1.22 m/year). Collectively, the statistical machine-learning model results are consistent with previous observations of 23 relatively high-water fluxes and short transit times for water and nitrate in the aquifer. Partial dependence plots from the model 24 indicate a sharp threshold where high groundwater nitrate concentrations are mostly associated with total travel times of seven 25 years or less, possibly reflecting some combination of recent management practices and a tendency for nitrate concentrations to be 26 higher in diffuse infiltration recharge than in canal leakage water. Limitations to the machine learning approach include potential 27 non-uniqueness when comparing model performance for different transport rate combinations and highlight the need to corroborate 28 statistical model results with a robust conceptual model and complementary information such as groundwater age. 29 30 31 32 33 34 35 https://doi.org/10.5194/hess-2020-169 Preprint. Discussion started: 5 May 2020 c © Author(s) 2020. CC BY 4.0 License.


Introduction 36
Lag times for movement of non-point source nitrate contamination through the subsurface are widely recognized (Böhlke,  Regulators and stakeholders in agricultural landscapes are increasingly in need of more precise and local lag time information to 42 better evaluate and apply regulations and best management practices for the reduction of groundwater nitrate concentrations (e.g., 43 Eberts et al., 2013). 44 Field-based studies of lag times commonly use expensive groundwater age-dating techniques and/or vadose-zone 45 sampling to estimate nitrate transport rates moving into and through aquifers (Böhlke et al., , 2007Böhlke and Denver, 1995 estimated potential application efficiency ("measure of the fraction of the total volume of water delivered to the farm or field to 114 that which is stored in the root zone to meet the crop evapotranspiration needs," per Irmak et al. (2011)) of 45% to 65%, compared 115 to center pivot sprinklers at 75% to 85% (Irmak et al., 2011). Based on improved irrigation efficiency (between 10-40%), average 116 precipitation throughout growing season (29.5 cm for 15 April to 13 October (Yonts, 2002)), and average water requirements for 117 corn (69.2 cm (Yonts, 2002)), converting furrow irrigated fields to center pivot over the aforementioned 14,253 hectares could 118 represent a difference of 1 x 10 7 m 3 to 6 x 10 7 m 3 in water applied. Those (roughly approximated) differences in water volumes are 119 equivalent to 6-28% of average annual precipitation applied over the Dutch Flats area, suggesting the change in irrigation practice 120 does have potential to alter the water balance in the area. 121 The hypothesis of lower recharge due to changes in irrigation technology was investigated by Wells  Other long-term changes to the landscape have included statistically significant reductions in mean fertilizer application 128 rates (1987-1999 vs. 2000-2012) and volume of water diverted into the Interstate Canal (1983-1999 vs. 2000-2016), while a 129 significant increase in area of planted corn was found (1983-1999 vs. 2000-2016). Precipitation was also evaluated, and though 130 the mean has decreased over a similar time period, it was not statistically significant. 131

Statistical Machine Learning Modelling Framework 132
Statistical machine learning uses algorithms to assess and identify complex relationships between variables. Learned 133 relations can be used to uncover nonlinear trends in data that might otherwise be overshadowed when using simple regression 134 techniques (Hastie et al., 2009). In this study we used Random Forest Regression, where Random Forests are created by combining 135 hundreds of unskilled regression trees into one model ensemble, or "forest", which collectively produce skilled and robust 136 predictions (Breiman, 2001). Predictors used in the model represent site-specific explanatory variables (e.g., precipitation, vadose-137 zone thickness, depth to bottom of screen, etc.) that may impact the response variable, groundwater [NO3 -]. 138

Random Forest Application 139
Random Forest regression models of groundwater [NO3 -] were developed using five-fold cross validation (Hastie et al.,140 2009), where four folds were used to build the model (training data), and one fold was held out (testing data). The maximum and 141 minimum of the [NO3 -] and each predictor were determined and placed into each fold for training models to eliminate the potential 142 for extrapolation during validation. Each fold was used as training data four times, and testing data once. This process was repeated 143 five times to create a total of 25 models, similar to the approach used by Nelson et al. (2018). The four folds designated to build 144 the model underwent a nested five-fold cross validation, as specified in the trainControl function within the caret (Classification 145 and Regression Training) R package (Kuhn, 2008;R Core Team, 2017). Functions in caret were used to train the Random Forest 146 models. 147 To evaluate model performance, Nash-Sutcliffe Efficiency (NSE), permutation importance, and partial dependence were 148 quantified. NSE indicates the degree to which observed and predicted values deviate from a 1:1 line, and ranges from negative 149 infinity to 1 (Nash and Sutcliffe, 1970). 150 https://doi.org/10.5194/hess-2020-169 Preprint. Discussion started: 5 May 2020 c Author(s) 2020. CC BY 4.0 License. For each tree, a random bootstrapped sample is extracted from the dataset (Efron, 1979), as well as a random subset of 157 predictors to consider fitting at each split. Thus, each tree is grown from a bootstrap sample and random subset of predictors, 158 making trees random and grown independent of the others. Observations not used as bootstrap samples are termed out-of-bag data 159 with a corresponding depth to groundwater record, of which the most recent record was used, were selected (2,651 observations 181 from 172 wells). Over this period, several wells were sampled much more frequently than others (e.g., monthly sampling, over a 182 short period of record), especially during a USGS National Water-Quality Assessment (NAWQA) study from 1995 to 1999. In Predictors were divided into two categories; static and dynamic (Table 1). Static predictors are those that either do not 190 change over the period of record, or annual records were limited. DO, for example, could potentially experience slight annual 191 variations, but data were not available to assign each nitrate sample a unique DO value. Instead, observations for each well were 192 assigned the average DO value observed from the well. This approximation was considered reasonable because nitrate isotopic 193 composition and DO data collected in the 1990s and by Wells et al. (2018) did not indicate any major changes to biogeochemical 194 processes over nearly two decades. Total travel time was strictly considered a static predictor in this study and was used to link the 195 nitrate-sampling year to a dynamic predictor value. 196 Dynamic predictors were defined in this study as data that changed temporally over the study period. Therefore, each 197 annual median [NO3 -] was assigned a lagged dynamic value to represent the difference between the time of a particular surface 198 activity (e.g., timing of a particular irrigation practice) and when groundwater sampling occurred. Dynamic predictors were 199 available from 1946 to 2013 and included annual precipitation, Interstate Canal discharge, area under center pivot sprinklers, and 200 area of planted corn (Fig. 2). Dynamic predictors were included to assess their ability to optimize Random Forest groundwater 201 modelling and determine an appropriate lag time. Lag times were based on the vertical travel distance through both the vadose and 202 saturated zones. Area of planted corn was included as a proxy for fertilizer data, which were unavailable prior to 1987. However, 203 analysis suggests there has been a reduction in fertilizer application rates per planted hectare, while area of planted corn has 204 increased in recent decades (Wells et al., 2018). There was a likely trade-off in using this proxy; we were able to extend the period 205 of record back to 1946, allowing for analysis of a wider range of lag times in the model, but might have sacrificed some accuracy 206 in recent decades when nitrogen management may have improved. Lastly, vadose and saturated-zone transport rates were assumed 207 to be constant over time (Wells et al., 2018). 208

Vadose and Saturated-zone Transport Rate Analysis 209
Ranges of vertical velocities (transport rates) through the vadose zone and saturated zone were estimated from 3 H/ 3 He where R is the upper and lower bound of recharge rates (m/yr) indicated by groundwater ages, and is mobile water content and 213 porosity in the vadose and saturated zone, respectively. The use of 3 H/ 3 He data was used in this study solely for constraining the 214 range of transport rates to evaluate in the vadose and saturated zones, and as a base comparison to model results. The age-data, 215 however, were not used by the model itself when seeking to identify an optimum transport rate combination. Throughout the text, 216 unsaturated (vadose)-zone vertical transport rates will be abbreviated as Vu, while saturated-zone vertical transport rates will be Vs. 217 In the vadose zone, was assigned a constant value of 0.13, which was calibrated previously using a vertical transport model for 218 the Dutch Flats area (Liao et al., 2012). In the saturated zone, was assigned a constant value of 0.35, equal to the value assumed 219 previously for recharge calculations (Böhlke et al., 2007). Vadose and saturated-zone travel times ( ) then were calculated using 220 This study addressed a relatively unexplored use of Random Forest, which was to identify optimal lag times based on 262 testing a range of transport rate combinations through the vadose and saturated zones, historical nitrate concentrations, and the use 263 of easily accessible environmental datasets. 264

Relative Importance of Transport Time and Dynamic Variables 265
In our initial modelling with dynamic predictors, we anticipated that we could use the Random Forest model with the 266 highest NSE to identify the optimal pair of vadose and saturated-zone transport rates. However, no clear pattern emerged among 267 the different models (Fig. 3). Given the small differences and lack of defined pattern in testing NSE values, we selected ten transport 268 rate combinations (the five top performing models, plus four transport rate combinations of high and low transport rates, and one 269 intermediate transport rate combination) for further evaluation of variable importance and sensitivity to a range of transport rate 270 combinations (Table 2). Median total travel time ranked third in variable importance, while the four dynamic variables consistently 271 had the four lowest rankings (Fig. 4) trend of the predictor, the greater the importance of the predictor (Fig. 2; Fig.4). For instance, center pivot irrigated area (highest 277 ranking dynamic variable) had the least noise and the most pronounced trend, while annual precipitation (lowest ranking variable) 278 was highly variable and lacked any trend over time (Fig. 2), and also may not be a substantial source of recharge (Böhlke et al., 279 2007). Further exploration could be done to test more refined variablesfor instance, annual median rainfall intensity for the 280 growing season might have a more direct connection to nitrate leaching than total annual precipitation. However, rainfall intensity 281 data are not readily available. Dynamic variables could be of more use in other study areas that undergo relatively rapid and 282 pronounced changes (e.g., land use). In future work, the model sensitivity to dynamic variables could be tested through formal 283 sensitivity analysis and/or automated variable selection algorithms (Eibe et al., 2016). 284 Ultimately, results from initial analyses suggest that (1) the dynamic data did little to improve model performance, and 285 (2) Random Forest was not able to relate the four considered dynamic predictors to [NO3 -] in a meaningful way that could be used 286 to estimate lag time. It has also been suggested by Katz

et al. (2001) that a monotonic trend in an independent variable is not 287
necessarily linearly related to the dependent variable. It is likely the influence of these dynamic predictors are dampened as nitrate 288 is transported from the surface to wells such that data-driven approaches are unable to sort through noise to identify relationships. 289

Use of Random Forest to determine transport rates 290
Due to their low relative importance as predictors, all four dynamic predictors were removed in the subsequent analysis. As 291 discussed above, a notable variation in total travel time %incMSE was observed in Fig. 4 The Random Forest models were useful in identifying the relative magnitudes of Vu and Vs that led to high %incMSE. 298 Based on the heat map of %incMSE, a band of transport rate combinations with consistently high %incMSE was visually apparent 299 (Fig. 5). The upper and lower bounds of the band translate to transport rate ratios (Vs/Vu) ranging from 0.9 to 1.5, and are values 300 that could be useful in constraining recharge and/or transport rate estimates in more complex mechanistic models, as part of a 301 hybrid modelling approach. This is especially important since recharge is one of the most sensitive parameters in a groundwater 302 model (Mittelstet et al., 2011), yet one with high uncertainty. 303 The %incMSE of total travel time in the second analysis ranged from 20.6 to 31.5%, with the largest %incMSE associated 304 with vadose and saturated-zone transport rates of 3.50 m/yr and 3.75 m/yr, respectively (Fig. 5), and the top four predictors for this 305 transport rate combination were total travel time, vadose-zone thickness, dissolved oxygen, and saturated thickness (Fig. 6). 306 Converting those vadose and saturated-zone transport rates to recharge rates yielded values of 0.46 m/yr and 1.31 m/yr, 307 respectively. Such a large difference between the two recharge values would be unexpected in most unconsolidated surficial (water-308  2 and 1.1, respectively) were consistent. That is, the Random Forest modelling framework produced transport rates 317 consistent with the major hydrological processes in Dutch Flats both in direct (i.e., transport rate estimates) and relative (i.e., 318 transport rate ratio) terms. 319 Assuming the Random Forest approach has accurately captured the two major recharge processes (diffuse recharge over 320 crop fields and focused recharge from canals), a comparison of recharge rates from all sampled groundwater wells representative 321 of recharge to the groundwater system as a whole (0.84 m/yr, n = 43) to the recharge rates from Random Forest modelling (0.46 322 and 1.31 m/yr) would provide an estimate of the relative importance of diffuse versus focused recharge on overall recharge in 323 Dutch Flats. Under these assumptions, diffuse recharge would account for approximately 55%, while focused recharge would 324 account for about 45% of total recharge in the Dutch Flats area. Similarly, Böhlke et al. (2007) concluded that these two recharge 325 sources contributed roughly equally to the aquifer on the basis of groundwater age profiles, as well as from dissolved atmospheric 326 gas data indicating mean recharge temperatures between those expected of diffuse infiltration and focused canal leakage. 327 Partial dependence plots, which illustrate the impact a single predictor has on [NO3 -] in the model with respect to other 328 predictors (Fig. 7), largely reflect the conceptual understanding of the system from previous studies including Böhlke et al. The partial dependence plot for total travel time exhibits a pronounced threshold, where [NO3 -] is markedly higher for 338 groundwater with travel time less than seven years. It is possible this reflects long-term stratification of distinct groundwater [NO3 -339 ], stemming from the suggested patterns stated above as it relates to aquifer depth and the influences of diffuse and focused recharge 340 in the region. This seven-year threshold is lower than a previous estimate of mean groundwater age alone (8.8 years; where 341 groundwater age neglects vadose-zone travel time) and suggests that rapid aquifer response to changes in nitrogen management in 342 Dutch Flats is possible. 343

Opportunities and limitations of Random Forest approach in estimating lag times 344
Overall, results suggest that in a complex system such as Dutch Flats, Random Forest was able to identify reasonable 345 transport rates for both the vadose and saturated zones, and with additional validation, this method may offer an inexpensive (i.e., 346 compared to groundwater age-dating across a large monitoring well network and/or complex modelling) and reasonable technique 347 for estimating lag time from historical monitoring data. Further, this approach allows for additional insight on groundwater 348 dynamics to be extracted from existing monitoring data. However, this study was conducted in the context of a larger project 349 (2) The Dutch Flats overlies a predominantly oxic aquifer, where nitrate transport is mostly conservative. In aquifers with 359 both oxic and anoxic conditions and distinct nitrate extinction depths (Liao et al., 2012;Welch et al., 2011), this approach 360 may be biased toward oxic portions of the aquifer where the nitrate signal is preserved. 361 (3) While estimates of vadose and saturated-zone transport rates determined from %incMSE are consistent with previous 362 studies, the predictive performance of the selected model (based on NSE and visual inspection of predicted versus 363 observed nitrate plots) was not substantially different than other models tested. In other words, the "optimal model" was 364 non-unique in terms of predicting [NO3 -]. Testing the approach of using %incMSE in other vadose and saturated zones, 365 with substantial comparison to previous transport rate estimates, is warranted. 366 (4) Despite potential non-uniqueness in prediction metrics, the heat map of %incMSE did reveal an orderly pattern suggesting 367 consistent transport rate ratios. For modelling efforts where recharge rates are a key calibration parameter, identification 368 of a range of reasonable recharge rates, and/or the ratio of recharge rates from diffuse and focused recharge sources for a 369 complex system will reduce model uncertainty and improve results. This statistical machine learning approach, which 370 essentially leverages nitrate as a tracer, may provide valuable insight to complement relatively expensive groundwater 371 age-dating or vadose-zone monitoring data, or as a standalone approach for first-order approximations.