Simulation of long-term spatiotemporal variations in regional-scale groundwater recharge: Contributions of a water budget approach in cold and humid climates

.

of their relative ease of use, water budget models are a useful approach for scientists, modellers, and stakeholders alike to understand regional-scale groundwater renewal rates in cold and humid climates, especially if they can be easily adapted to specific study needs and environments.

Introduction
Groundwater recharge (GWR) generally refers to the portion of precipitation that infiltrates into the ground and eventually reaches the water table (Doble and Crosbie, 2017;Scanlon et al., 2002). It links meteorological events to subsequent aquifer responses and usually defines renewal rates of the unconfined aquifers in cold and humid climates (Gleeson et al., 2011;Kløve et al., 2017;Meyzonnat et al., 2018;Rivera, 2014). GWR is influenced by climate, land cover, vegetation, topography, as well 40 as soil type (Douville et al., 2013;Fu et al., 2019;Jasechko et al., 2014). It is recognized as a strategic hydrologic variable for sustainable groundwater management (Foster and Ait-Kadi, 2012;Wada et al., 2010). Consequently, spatiotemporal GWR estimates at the regional scale and analysis of past changes and trends linked to climate variations or climate changes are particularly important for water resource managers (Ashaolu et al., 2020;Brunner et al., 2004;Larocque et al., 2018). This is especially true in cold and humid climates where warming temperature specifically impact key winter conditions (Aygün et 45 al., 2020;Grinevskiy et al., 2021;Nygren et al., 2020).
Models are among the most common ways to estimate GWR and are usually developed for specific climate and hydrogeological conditions (Healy and Scanlon, 2010;Scanlon et al., 2002). Their complexity varies widely with the use of few (<10) to numerous (>100) parameters and with the need to provide simple input data (e.g., precipitation and temperature time series) to detailed input data (e.g., spatially distributed time series of precipitation, temperature, solar radiation, wind, 50 relative humidity, normalized difference vegetation index, soil water content, groundwater levels, river flow rates) (Brunner et al., 2004;Crosbie et al., 2015). Spatiotemporal resolution is also highly variable between GWR models (Abdollahi et al., 2017;Döll and Fiedler, 2008;Hu et al., 2019;Portoghese et al., 2005). GWR assessment based on water budget approaches are widely used, providing useful results and insights on GWR dynamics for an array of local (field scale) to regional studies (several 1 000 km 2 ), and for a wide range of climatic conditions (Abdollahi et al., 2017;Dripps and Bradbury, 2007;Dyer, 55 2019; Portoghese et al., 2005;Zomlot et al., 2015).
Nevertheless, water budget models have strong limitations. In such models, GWR is dependent on other terms of the water budget, namely evapotranspiration and runoff (Crosbie et al., 2015;Jasechko et al., 2014;Scanlon et al., 2002), which can be highly uncertain. Thus error propagation can be significant, especially where GWR rates are particularly low, such as in arid and semi-arid areas. However, water budget approaches can be appropriate in cold and humid climates, where more water is 60 available for infiltration and groundwater and surface water are usually closely connected (Gleeson et al., 2011;Rivera, 2014). In these conditions, the substantial volumes of water stored in the snowpack through winter become rapidly available for infiltration during spring snowmelt and the relatively simple water budget models have been shown to provide results representative of those obtained with more complex numerical models. For example, similar monthly and annual GWR estimates were obtained in cold and humid climates with the Soil Water Balance model (SWB) and the GFLOW 65 analytical model (Hunt et al., 1998) by Dripps and Bradburry (2007) in Wisconsin (U.S.A) or with the HELP water budget model (Schroeder et al., 1994) and the fully coupled CATHY model (Comporese et al., 2010) by Guay et al. (2013) in Quebec (Canada).
The association of regional-scale changes in GWR to changing climatic conditions in cold and humid climates in the last decades helped developing a better picture of the potential impacts of climate change on regional hydrology. In the northern 70 part of western Russia, Grinevskiy et al. (2021) found that the warming temperature monitored between the 1965-1988 and the 1989-2018 periods led to an increase of simulated GWR up to +60 mm/yr (1D unsaturated zone Hydrus model, Simunek et al., 2009) due to more snowmelt and rainfall during winter. Inversely, Nygren et al. (2020) used time series of groundwater levels in Sweden and Finland to show that GWR decreased between the 1980-1989 and the 2001-2010 periods due to a switch from snowmelt to rain-dominated recharge events. Meanwhile, GWR has been estimated in various studies in the cold and 75 humid climate of southern Quebec (Canada), but they did not identify obvious trends (Benoit et al., 2014;Chemingui et al., 2015;Croteau et al., 2010;Gagné et al., 2018;Guay et al., 2013;Larocque and Pharand, 2010;Lavigne et al., 2010;Levison et al., 2014Levison et al., , 2016Nastev et al., 2008;Rivard et al., 2009;Rivera, 2014;Saby et al., 2016). This may be due to the relative short duration of the time series of the variables used as a proxy for GWR (baseflows, groundwater levels; Rivard et al., 2009) and to the fact that very different models were used in areas of contrasted sizes (Larocque et al., 2019). 80 The objective of this study was to demonstrate the relevance of using a water budget model to understand long-term transient and regional-scale GWR in cold and humid climates where groundwater observations are scarce. The HydroBudget (HB) model (Dubois et al., 2021) was used to simulate the first long-term (57 years) and regional-scale (36 000 km 2 ) GWR estimates for watersheds of the St. Lawrence River in southern Quebec (Canada) in a typical cold and humid climate case study.
HydroBudget is a parsimonious water budget model, capable of handling large amounts of data to simulate spatially-distributed 85 recharge. It was used here with a robust automatic calibration approach using river flow rates and baseflow estimates, along with a complete estimate of model sensitivity and uncertainty of the simulated variables.

Study area
The case study area is located in the province of Quebec (humid and cold climate), between the St. Lawrence River and the 90 Canada-USA border, and between the Quebec-Ontario border and Quebec City (35 800 km²) (Fig. 1). It is comprised of the watersheds of eight main tributaries of the St. Lawrence River (numbered 1 to 8 from west to east) (Table 1). Watersheds W1 (Châteaugay River), W2 (Richelieu River), and W4 (Saint-François River) are partially located in the USA (42%, 83%, and 15% of their total areas respectively). Topography is flat with low elevation areas close to the St. Lawrence River and higher elevations in the Appalachian Mountain range, associated with steeper slopes. Land cover includes agriculture, forest, 95 wetlands, urban uses, and surface water (Fig. 2a). Agriculture dominates in the watersheds located in the St. Lawrence Platform, while forest occupies most of the Appalachian watersheds. Bergeron (2016) used 251 climate stations located in the study area to produce spatially-interpolated temperature and precipitation data with a daily time step for the 1961-2017 period on a 10 km-resolution grid. The high density of measurements during this period generated minimal error on the interpolated data, with a root mean square error (RMSE) of 3 mm/d for 100 precipitation (mean bias of +0.1 mm/d), 2.5°C for minimal temperature (mean bias of -0.5°C), and 1.5°C for maximal temperature (mean bias of +0.1°C; Bergeron, 2016). Based on these data, average annual precipitation for the study area is 1,090 mm (Table 1), including 275 mm falling as snow during winter (November to March). Precipitation is distributed relatively evenly throughout the year, and although total precipitation is similar for all of the watersheds, it is higher in the mountainous zones than in the lowlands (Fig. 2c). The average annual temperature is 5°C (Table 1), with average monthly 105 temperature varying between -15°C (in January and February) and 20°C (in July and August). Regionally, temperatures decrease from the southwest to the northeast (Fig. 2d).
The study area includes two main geological units: the sedimentary basin of the St. Lawrence Platform and the Appalachian mountain range (metasedimentary) (Malo, 2018;Fig. 2c). The bedrock of the St. Lawrence Platform is composed of slightly deformed Combro-Ordovician fractured bedrock consisting of limestone, shale, and mixed shale and fine sandstone. The 110 bedrock of the Appalachians is composed of highly deformed schist, quartzite, and phyllades (Thériault and Malo, 2018).
The bedrock is unevenly covered with unconsolidated Quaternary sediments from the last glaciation-deglaciation cycle, mainly composed of glacial, marine, fluvial, and lacustrine deposits. In this area, sediment depositional modes determine the geological nature of the superficial materials and the pedology (Fig. 2b). Thin (<5 m) and relatively coarse superficial deposits (aeolian sands, till, and various glacial deposits) are found over the bedrock in the uphill areas. These deposits get thicker (5 m  115 to 20 m) in the valleys and in topographic depressions, with a wider mix of grain sizes resulting from the different depositional modes (fluvial, lacustrine, and marine). Closer to the St. Lawrence River, the thickness of the Quaternary deposits can reach several tens of meters, where silty clays from the Champlain Sea are covered with sandy materials (IRDA, 2008).
Regional fractured bedrock aquifers flow from the Appalachians to the St. Lawrence River (south/southeast to north/northwest). The aquifers are moderately productive and shallow, with the water table 3 m to 5 m from the surface 120 . They are in unconfined conditions upstream, where bedrock outcrops or under coarse superficial deposits, and reach semi-confined to confined conditions once the bedrock is covered by finer sediments in the valleys and close to the St. Lawrence lowlands (Rivera, 2014). Limited-extent shallow aquifers have been identified in the local superficial coarse deposits. Previous local or watershed-scale recharge studies have identified recharge areas upstream, mostly on the outcropping bedrock and in coarse glacial sediments, while the St. Lawrence tributaries drain groundwater over most of the 125 study area. Gagné et al. (2018) showed that in the main GWR zones, the hydrogeological system has a rapid response and most of the GWR rapidly reaches the nearby drainage system. The overall configuration corresponds to topographic-driven water tables (Gleeson et al., 2011), with hydrogeological watersheds matching superficial watersheds, and can reasonably be considered to have a closed watershed water budget. Fig. 1), with watershed areas ranging from 30 km 2 to 3 000 km 2 . Data missing for up to five consecutive days were linearly interpolated to optimize the length of the available time series. These 51 stations all have full-year measurements for more than three consecutive years, and the presence of ice most likely affects winter flow measurements.

HydroBudget
HydroBudget (HB) is a spatially-distributed GWR model that computes a superficial water budget on grid cells of regionalscale watersheds with outputs aggregated into monthly time steps. The model uses commonly available meteorological data (daily precipitation and temperature, spatialized if possible) and spatially-distributed data (pedology, land cover, and slopes).
It is based on simplified process representations and is driven by eight parameters that need to be calibrated. These parameters 140 are uniform over the grid and held constant for all the grid cells and through time ( Table 2). Coded in R, HB uses a conceptual lumped reservoir to compute the soil water budget on a daily time step (Appendix A. 1). For each grid cell and each time step, precipitation is divided between runoff (R), evapotranspiration (ET), and infiltration that could reach the saturated zone (potential GWR), with a monthly time step ( Fig. 3; Dubois et al., 2021).
Calculations begin with the vertical processes, by first determining the available liquid water (vertical inflow, VI), which is 145 the sum of rain and snowmelt, with a 0°C air temperature threshold for rain or snow and a two parameter (TM and CM) degreedays approach (Massmann, 2019). If the soil is frozen (two parameters, TTF and FT), the entire VI will produce R directly.
Otherwise, VI can flow as runoff, infiltration, evapotranspiration, and eventually percolation as potential GWR.
Runoff is calculated using the runoff curve number (RCN) method (USDA-NRCS, 2004; 2007) on a cell-by-cell basis (two parameters, tAPI and frunoff), similar to what is done in the SWAT model (Arnold et al., 2012;Neitsch et al., 2002). The RCN is 150 attributed per cell based on its pedology, land cover, and slope following the USDA-NRCS method adapted for the Quebec environment (Dubois et al., 2021;Monfet, 1979). The remaining water (VIrunoff) is considered to be infiltration (I) reaching a soil reservoir for which storage capacity is driven by one parameter (swm). If I exceeds available storage, saturation excess is added to R. Otherwise, actual evapotranspiration (AET) is calculated as the minimum of PET (calculated using the formula of Oudin et al. (2005)) and the available water in the soil reservoir. The residual soil water is mobilized as potential GWR (one 155 parameter, finf) or stored for the next time step.
HydroBudget thus calculates potential GWR, the recharge that could reach the aquifer) if 1) the geological material below the soil horizon allows deep percolation to occur, 2) no additional storage or losses occur in the unsaturated zone below the soil, and 3) no significant groundwater evapotranspiration occurs (Doble and Crosbie, 2017). Actual GWR corresponds to the part of potential GWR that would reach the water table, and potential GWR is therefore a maximum. 160

Calibration strategy
The HydroBudget model is calibrated on superficial watersheds, based on the hypotheses that: 1) surface watersheds match hydrogeological watersheds, 2) the rivers drain unconfined aquifers. Under these conditions, for any given watershed, potential GWR should be similar to river baseflow at the outlet, and the sum of runoff and potential GWR should be equal to the total 165 flow at the outlet. Dripps and Bradbury (2007), and Guay et al. (2013) have shown that results of GWR water budget models (SWB and HELP, respectively), compared better to groundwater flow models (GFLOW and CATHY, respectively) at the monthly time step than at a daily time step. Thus, runoff, AET, and potential GWR were computed with HB with a daily time step, and the results were aggregated on monthly time steps. Calibration and validation of the model and analysis of the results were all performed on the monthly aggregated results. As well, the calibration and validation of the model were performed 170 under the hypothesis that no major land use change occurred during the simulation period (i.e. land use was held constant).
The daily river flow rates used for calibration were those from 51 gauging stations (Fig. 1). Baseflows were estimated from the flow rate time series following the proposition of Ladson et al. (2013) with the Lyne and Hollick filter (Lyne and Hollick, 1979) and using a stochastic calibration and 30 filter passes. To compare the effects of the baseflow filters on GWR simulation, baseflows were also estimated with the Eckhardt (2005) and the Chapman (1991) filters for the gauging stations of W6. As 175 aquifers generally discharge into superficial water bodies in southern Quebec, baseflows estimated with recursive filters are considered an acceptable proxy of GWR dynamics. Total flows and baseflows were divided by the area of the given watershed to provide flow values in mm/yr, and thus facilitate the comparison of calibration results between watersheds of very different sizes.
The model was simultaneously calibrated for all the gauging stations using the automatic calibration procedure of the R 180 package caRamel (Monteil et al., 2020) to obtain a regionalized set of parameters. The eight HB parameters were optimized for each gauging station, grouped by river watershed to save computational time (from 51 individual optimizations to eight grouped optimizations), and averaged over the study area, regionalized, using the normalized density of stations per group (number of stations per km 2 ) as weights. Up to 5 000 model calls were used per station group, with several successive optimizations to confirm the reproducibility of the results, as recommended by Monteil et al. (2020)the algorithm's authors. 185 Model performance was assessed using the Kling-Gupta Efficiency (KGE) calculated for with total monthly measured river flows and simulated total flow (KGEqtot), as well as monthly baseflow and monthly potential GWR (KGEqbase). The KGE is preferred here over the Nash-Sutcliff Efficiency because it better represents the low flow periods than the Nash-Sutcliff Efficiency (Gupta et al., 2009). Each river flow time series was divided into a calibration period (first two thirds) and a validation period (last third), therefore allowing the objective functions to be computed per period. 190 The caRamel algorithm (Monteil et al., 2020), a combination of the multi-objective evolutionary annealing simplex algorithm (MEAS; Efstratiadis and Koutsoyiannis, 2008) and the non-dominated sorting genetic algorithm II (ε-NSGA-II; Reed and Devireddy, 2004), automatically calibrated the eight HB parameters to maximize KGEqtot and KGEqbase values. The algorithm produces an ensemble of parameter sets (called generation) to run the model, downscales the generation to the parameter sets that optimize the objective functions, and creates a new set of parameters that produces better results. To produce new 195 generations and ensure that the optimization tends toward a global maximum, the algorithm samples the parameter sets that individually maximizes the two objective functions, KGEqtot and KGEqbase, samples the parameter sets that maximizes the minimum values of the two objective functions, and increases the variance of each parameter. Finally, the best compromise was chosen by identifying the highest mean KGE value (KGEmean): The weights attributed to each objective function in KGEmean were arbitrarily chosen to select the calibrated parameter set that maximizes the reproduction quality of the baseflows, considered to be the proxy for GWR (KGEqbase), without losing the benefits of the multi-objective optimization (Supplementary Table A. 1).
The eight HB parameters were optimized for each gauging station, grouped by river watershed to save computational time (from 51 individual optimizations to eight grouped optimizations producing eight set of calibrated parameters) such as: 205 (1) With: (KGEqtot)ws the KGEqtot obtained over a river watershed (group of gauging stations)

(KGEqtot)station the KGEqtot obtained for a gauging station 210
Nws the number of gauging stations per watershed (KGEqbase)ws the KGEqbase obtained over a river watershed (group of gauging stations) (KGEqbase)station the KGEqbase obtained for a gauging station Finally, the set of parameters that allowed reaching the best compromise was chosen by identifying the highest mean KGE value (KGEmean): 215 The weights attributed to each objective function in KGEmean were arbitrarily chosen to select the calibrated parameter set that maximizes the reproduction quality of the baseflows, considered to be the proxy for GWR (KGEqbase), without losing the benefits of the multi-objective optimization (Supplementary Table A. 1).
The eight sets of eight calibration parameters were regionalized (uniform calibration parameters over the grid),, and averaged 220 over the study area, regionalized, using the normalized density of stations per group (number of stations per km 2 ) as weights.
The grid-wise uniform calibration parameters used for the simulation were obtained: With: xregionalized one of the eight calibration parameters regionalized 225 xws one of the eight calibration parameters optimized for the gauging stations of a river watershed areaws the area of the watershed (km 2 ) The Finally, the 100 best compromises of each group of gauging stations were used to produce the 100 best regionalized parameter sets and the HB model was run with these parameters, estimating uncertainty from the standard deviation.

Model sensitivity
Using the R package Sensitivity (https://CRAN.R-project.org/package=sensitivity), the Morris global sensitivity approach (Morris, 1991), enhanced by Campolongo et al. (2007), was performed for the eight groups of gauging stations and for the two

Model sensitivity and model calibration
The model sensitivity was obtained with 60 repetitions of the design for the eight groups of gauging stations, representing 540 model calls for each group. The relative sensitivity of the model varied markedly between the groups of gauging stations for 245 the simulated river flow rates ( Fig. 4a) but appears to be more constant for the simulated potential GWR (Fig. 4b). Flow rates were mostly sensitive to the snow-related parameters (TM and CM), except for the western watersheds where the frunoff was more important. The simulated potential GWR was most sensitive to frunoff and least sensitive to snowmelt parameters (TM and CM) for all the watersheds. The ranking from the second to the fifth-highest sensitivity of potential GWR only slightly varied between groups of gauging stations. River flow rates and potential GWR clearly showed limited sensitivity to the soil freezing 250 time (FT) with only a slightly higher sensitivity for the eastern watersheds W7 and W8. Overall, the potential GWR was more sensitive to parameter variations than river flow since all the μ* for the river flow were lower by a factor 2 to 10 than for the potential GWR (values not presented here).
In a preliminary step, the calibration process was performed with the W6 (Bécancour) group of stations. Up to 5 000 model runs were performed and the 25 best compromises were extracted to compare the equifinality in the solutions. The KGEmean 255 for this watershed varies within a range of ±0.005 in calibration and ±0.02 in validation (supplementary Table A. 1).
Furthermore, the variability of the optimized parameters of the 25 best compromises was very limited, therefore allowing to assume that the optimization per gauging station group converged. Calibration for this watershed was relatively short (e.g., 5 min for a 3 380 km 2 watershed, 500 m resolution grid (13 500 cells), for 57 years with 15 cores and 50 Go of RAM) and convergence was repeatedly attained before 1 000 model runs. The objective functions, the calibrated parameters, and the 260 interannual potential GWR values showed very limited sensitivity to the weights chosen for the computation of the KGEmean (supplementary Table A. 1). As the absolute value of baseflow changed with the filter method, the interannual GWR varied in consequence. The calibrated parameters with the different baseflow estimates slightly varied but remained in the parameter ranges described in Table 2. The objective functions were similar, between 0.7 and 0.8, except for KGEqbase in validation, which was slightly lower with the Eckhardt (2005) and Chapman (1991) filters than for Lyne and Hollick (1979) filter. 265 In light of these results, calibration of the rest of the gauging station groups was thus undertaken with 1 500 model runs in order to save computational time. As well, the stochastic calibration and 30 passes of the Lyne and Hollick (1979) filter developped by Ladson et al. (2013) was retained for the baseflow estimate and weights of 0.4 and 0.6 were associated to KGEqtot and KGEqbase respectively for the computation of KGEmean. Following regionalization, a satisfactory fit was observed between the calibrated model and measured flow rates and baseflows for all gauging station groups ( Table 3). The KGEmean 270 varied between 0.64 and 0.82 (average = 0.72) for the calibration period and between 0.62 and 0.77 (average = 0.69) for the validation period. The objective functions were very similar between the two periods, confirming the satisfactory calibration of the model and its capacity to reproduce the water budget over a long period, considering that calibration and validation span 1961 to 2017, and for the entire study area. The mean bias on simulated river flow rates and potential GWR varied between -9 mm/month and 5 mm/month (Table 3). The uncertainty computed with the 100 best regionalized parameter sets was 275 ≤ 10 mm/yr for the three simulated variables (Table 4).

Simulated potential groundwater recharge for 1961-2017
The calibrated HB model was used to simulate potential GWR for the entire study area on a 500 m x 500 m grid for the 1961-2017 period. Examples of the simulated monthly baseflows (Lyne and Hollick filter) and potential GWR are illustrated in 280 Fig. 5 for the downstream stations in W3, W7, and W8. The simulated potential GWR compared favorably with baseflows estimated using the Lyne and Hollick (1979) digital filter. Maximum values were reached simultaneously in April, during the spring month(s) of maximum VI. A second baseflow and GWR peak was observed and simulated in November-December of most years. Lowest values were reached in July to September (high AET rates) and February (minimum VI). Similar matching results in timing and amplitude were obtained for the river flow (not presented here). Simulated AET was null in the winter 285 until the spring thaw, after which it quickly reached its highest value (> 100 mm/month) in July and decreased at the end of August, reaching null values again in November. Comparable results were obtained for the other gauging stations (not shown).
The distribution of GWR rates showed a zone of low GWR rates (< 140 mm/yr) in the western part of the study area (flat and clayey St. Lawrence Platform), mainly in W1, W2 (except for its southeast part), the central and northwestern part of W3, and the northern part of W4, W5, W6, and W7 (Fig. 6a). A zone with higher GWR rates (140 mm/yr to 280 mm/yr, locally 290 > 280 mm/yr) covered the southern part of W3, W4, W5, W6, and W8 (Appalachians). Areas of high GWR rates in the central part of the study area (W2, W3, W4, W5, and W6) also corresponded to the zones of higher precipitation (Fig. 2c). The regional GWR/precipitation ratio (Fig. 6b) identified the areas of preferential recharge with ratios higher than 0.15, corresponding to areas of potential GWR rates higher than 140 mm/yr mainly located in the Appalachians (southern part of W2 and W3, and W4 to W8). In the St. Lawrence Lowlands (W7 and lower portions of W4, W5, W6 and W8) and on the Appalachians Piedmont 295 (central portions of W3), the ratios above 0.2 were associated with superficial coarse materials disconnected from the regional water table, meaning that the simulated potential GWR probably did not reach the fractured bedrock aquifer.

Partitioning and seasonality of the water budget
Across the study area, the average simulated runoff was 444 mm/yr (41% of total precipitation), the average simulated AET was 501 mm/yr (47% of total precipitation), and the average simulated potential GWR was 139 mm/yr (12% of total 300 precipitation) (Table 4). Mean annual GWR rates accounted for less than 140 mm/yr in watersheds mainly located in the St.
Lawrence Platform (W1, W2, and W3 to a certain extent) and for more than 140 mm/yr for the watersheds mainly located in the Appalachians (W4 to W8), according to the two zones of high and low GWR rates previously described.
The seasonality of the water budget varied with more important winter runoff and potential GWR (and higher spring AET rates) in the western watersheds than in the eastern watersheds (Table 4). Summer and fall AET and potential GWR were less 305 important in the western watersheds than in the eastern. Runoff and potential GWR clearly peaked during spring (51% and 44% of the annual value respectively) due to the snowmelt event in all watersheds. These variations in seasonal proportions of each variable were driven by the regional temperature gradient (decreasing temperature from west to east).
Partitioning of runoff, AET, and potential GWR according to soil type showed maximum runoff rates for organic deposits, often associated with wetlands, and minimum rates over clay (Fig. 7a). The highest AET rates occurred over coarse sediment 310 and the lowest rates over clay, while the highest potential GWR rates were found in coarse deposits and the lowest for clayey areas. Steepening slopes produced more runoff and a slight increase of potential GWR (not for slopes > 8%) (Fig. 7b). The increase in GWR rates associated with steeper slopes can be linked to the presence of coarser material usually corresponding to glacial deposits on hillsides (high infiltration rates). Low runoff rates were associated with the flat and clayey areas of the St. Lawrence Platform, where annual precipitation is the lowest of the study area (Fig. 2c). A clear contrast was visible in the 315 effect of land cover on the water budget (Fig. 7c), with the highest runoff rates over wetlands and lowest in forested areas. The highest AET rates were found for wetlands, while the lowest were found for urban areas. The highest potential GWR rates were associated with forested areas, and the lowest with wetlands.

Temporal evolution of the simulated water budget since the 1960s
HydroBudget simulated the temporal evolution of the water budget in the study area from 1961 to 2017, thus producing an 320 exceptionally long simulated time series of runoff, AET, and potential GWR for the area (Fig. 8). The effect of interannual variability in precipitation appeared clearly in the simulated runoff with low runoff rates (< 350 mm/yr) produced during the driest year and high runoff rates (> 550 mm/yr) produced during the wettest year (Fig. 8a, c). The simulated AET varied less than runoff, mainly between 450 mm/yr and 560 mm/yr (Fig. 8d). Interannual variations of potential GWR were relatively high, comprised between 90 mm/yr and 200 mm/yr, and seemed more influenced by precipitation than by temperature 325 variations (Fig. 8e).
The simulation period was split into three 19-year periods (1961-1979, 1980-1998 and 1999-2017) to look for significant changes in input variables (precipitation and temperature) and simulated variables (Tukey test, p < 0.05) (Fig. 8). Significant changes were found in the input variables between the 1961-1979 and the 1999-2017 periods, with an increase of temperature for all watersheds and increase of precipitation in five watersheds. These changes produced a significant increase of AET for 330 all watersheds except W1, and an increase in runoff for three watersheds (W3, W6 and W8). However, these changes in precipitation and temperature did not impact the annual potential GWR for which significant increases were simulated between the 1961-1979 and the 1980-1998 periods only for W4, W6 and W8.
When considering the entire period , VI significantly increased (Mann-Kendall test, p < 0.05) annually for all watersheds except W5 and W7 and during winter (December to February) and summer (June to August) for all watersheds 335 except W1 and W7 during summer (supplementary Table A. 2). Significant annual and seasonal temperature increases were also observed for all watersheds and for all seasons except W5 and W6 during spring. Similar to VI, runoff significantly increased annually for all watersheds except W5 and W7 and during winter and summer for all watersheds except W1 during summer. Significant increases in AET were also simulated for annual values and during winter and spring, and unlike temperature, summer and fall AET showed no significant trend except for W6 and W8 during summer. Significant increases 340 in potential GWR were simulated only for winter values for all watersheds except W1. Interestingly, although increasing runoff and AET were observed throughout the simulation period , no decreasing trends were simulated in potential GWR.

Regional groundwater recharge dynamic in cold and humid climates 345
Although river flows are well monitored by an extensive gauging station network over long periods, groundwater level monitoring has been initiated in the Province of Quebec only at the turn of the century (MELCC, 2020). Recharge estimates from these time series are yet to be produced. Moreover, a regional-scale lysimeter network still needs to be implemented.
Local ground truthing of HB simulation results is thus limited by the lack of spatially-distributed GWR estimates from alternative field-based measurements, restricting the verification of model results to comparisons with local field and modeling 350 studies.
The average simulated potential GWR for the study area (139 mm/yr; 12% of annual precipitation) is comparable to values found in previous studies in the region. For example, Chemingui et al. (2015) used the CATHY integrated hydrological model in W1 to estimate GWR to be 200 mm/yr. Although this value is much higher than the one obtained using HB in the same area (i.e., 70 mm/yr to 250 mm/yr in Chemingui et al. (2015) and 70 mm/yr to 280 mm/yr with HB). The average recharge rates simulated using the HELP model, also calibrated using flow rates and baseflow estimates, are similar to the HB-simulated potential GWR rates in two watersheds of the study area: 86 mm/yr for W1 (Croteau et al., 2010) and 185 mm/yr for W8 (Benoit et al., 2014) compared to 109 mm/yr and 145 mm/yr with HB respectively. Using the WaterGAP Global Hydrology Model (WGHM; Döll et al., 2003) and a 0.5° spatial resolution, Döll and Fiedler (2008) found GWR rates ranging from 360 100 mm/yr to 300 mm/yr in southern Quebec, thus generally matching the range of HB-simulated GWR. Wada et al. (2010) simulated much higher GWR rates over the study area (300 mm/yr to 1 000 mm/yr) with the global hydrological model PCR-GLOBWB (Bierkens and van Beek, 2009) at a 0.5° spatial resolution. These values do not match HB estimates, nor do they match any other results for the study area.
The spatially-distributed potential GWR values show a clear difference between areas in the St. Lawrence Platform 365 (GWR < 140 mm/yr) and areas in the Appalachian geological units (GWR > 140 mm/yr). Higher recharge occurs when soils are coarser or in outcropping bedrock areas, which are mostly located in the Appalachians and upstream in the watersheds, in accordance with what has been found in previous studies (Benoit et al., 2014;Croteau et al., 2010;Levison et al., 2014). Spatial variations in GWR also correlate well with land use, with preferential infiltration zones in forested areas and lower infiltration rates over wetlands and urban areas. Overall, local surface conditions, such as soil type and land use, mostly influence GWR 370 rates. Similar results of factors influencing GWR are reported by Batelaan and De Smedt (2007) and Zomlot et al. (2015) in Belgium (oceanic climate), and by Nielsen and Westenbroek (2019) in Maine (USAcold and humid climate).
Within the study area, potential GWR occurs mainly in the spring (44%) and winter (27-38%), while almost no GWR occurs in the summer (3-10%), and a moderate contribution occurs in the fall (14-21%). Differences between watersheds in this seasonal dynamic seem to be driven by the southwest-northeast decrease in temperatures, with higher winter GWR rates 375 obtained in the warmer western watersheds and higher summer and fall GWR obtained in the colder eastern watersheds. and winter temperature were identified as important variables for the variability of winter and summer low flows. The seasonality of GWR is well documented in the humid conditions of northern Europe as well, with an increase of the importance of winter events (rain-dominated GWR) at the expense of the spring snowmelt (snow-dominated GWR) with the increase of average temperature from north to south (Kløve et al., 2017;Nygren et al., 2020).
Simulated GWR from the current study is inversely correlated with AET, occurring preferentially when AET is low, 385 immediately following snowmelt and before the onset of soil frost. Similar results are reported from other cold and humid climates (Grinevskiy et al., 2021;Kløve et al., 2017;Okkonen andKløve, 2011, Nyren et al., 2020). The importance of snowmelt-related recharge evidenced by the simulations has also been observed using field data. Using isotopic analyses, of winter precipitation in central Canada, and worldwide with preferential infiltration periods during the cold months (Jasechko 390 et al., 2014). Wright and Novakowski (2020) have linked increases of groundwater levels from observation wells to GWR events in the neighboring Province of Ontario (Canada) during rain-on-snow and partial thaw events, despite the presence of a frozen soil. They underline the importance of winter recharge events in seasonally frozen environments and their potentially increased importance in a warming climate.

Trends in groundwater recharge in cold and humid climates in the past decades
Warming temperature were recorded in the cold and humid climates of the northern hemisphere since the middle of the 20 th century. Mean annual temperature has increased by +2°C to +3°C between 1954 and 2003, and this increase has been more marked between December and February (up to +4°C for the same period) (Arctic Climate Impact Assessment, 2004).
In eastern Canada, temperature warming up to +1°C to +3°C has been identified for the 1948-2016 period (Vincent et al., 400 2018), leading to important changes in the seasonal distribution, and particularly to an increase in winter precipitation and a decrease of accumulated snow during the cold season (Kong and Wang, 2017). This work has shown that although the simulated potential GWR was not impacted by the long-term increases in precipitation or temperature on an annual time-step for the 1961-2017 period, these changes produced significant increases in annual runoff and AET and may not have been sufficient yet to propagate to potential GWR. As observed with past data for winter trends in potential GWR, under changing 405 climate conditions, seasonal and monthly GWR could increase during periods with more available liquid water and low AET rates (late fall, winter, early spring). Inversely, GWR could decrease for periods of enhanced AET rates (warmer temperature) such as late spring, summer, and early fall.
In Canada, other authors have looked for trends in observed time series of groundwater levels for the snow-dominated region of British-Columbia (Allen et al., 2014) or across the country (Rivard et al., 2009), but each time, no coherent pattern could 410 be identified. This could be due to short time series or insufficient data coverage. In western Russia, Grinevskiy et al. (2021) simulated an increase of GWR between the 1965-1988 and 1989-2018 periods with a surface energy balance coupled to the Hydrus unsaturated zone model. The GWR increase was due to warmer winter leading to a decrease of soil frost depth and an increase of the winter soil-water storage. Interestingly, the warming temperatures did not produce AET increases which was interpreted to be due to a sharp decrease in wind speed. Inversely, Nygren et al. (2020) identified a significant decrease in 415 groundwater depth time series across Finland and Sweden between the 1980-1989 and 2001-2010 periods. These authors showed that the deeper groundwater levels were linked to decreasing GWR pattern associated with shorter snowmelt periods and longer periods of intensive evapotranspiration. In comparison, the temperature increase observed in southern Quebec over the 1961-2017 period was probably not large enough to produce such a change.
An outcome of this work was to show that the regional long-term potential GWR simulations with a water budget allowed 420 identifying the impacts of a changing climate on the past GWR conditions by positioning the trends (or absence of significant trends) in the globally changing hydrologic dynamic. As well, it reminded the high responsivity of the hydrologic dynamic of the cold months to the climate changes in the cold and humid climates.

Implications for water management 425
This study has shown that the use of a GWR water budget model relatively easily produced long-term spatially-distributed GWR values at the regional scale, with an acceptable spatial resolution and monthly time steps. The average GWR rate for the region, based on the new regional estimate, is 139 mm/yr, with lower aquifer renewal rates in the St. Lawrence Platform (< 140 mm/yr) than in the Appalachians (> 140 mm/yr). Preferential infiltration areas correspond to forested areas (170 mm/yr) and areas covered by coarse sediments (180 mm/yr) or outcropping bedrock (150 mm/yr). The decreasing 430 temperature from west to east impacts the intra-annual GWR dynamic, imposing higher winter GWR in the warmer watersheds, from 38% of annual recharge in W1 to 27% in W8. Inversely, summer and fall GWR rates are lower in the western watersheds (3% and 14% respectively in W1) than in the eastern watersheds (10% and 21% respectively in W8). Furthermore, the warming temperatures since 1961 have not negatively affected GWR until 2017, due to a simultaneous increase in precipitation and higher winter GWR, compensating for the increase in AET. However, with more intense and faster climate change, this 435 dynamic could change in the near future if temperature warming was to exceed precipitation increase.
Based on the study results, GWR maps should be included in land management planning to avoid impacting areas of preferential recharge, such as forests, areas covered by coarse material, and zones of outcropping bedrock. Although the GWR/precipitation ratio in eastern North America seems to be 0.25 and higher (Jasechko et al, 2014, based on the global simulation of Wada et al. (2010)), areas of preferential recharge can be identified in the study area based on a 440 GWR/precipitation ratio higher than 0.15 (the worldwide mean in Jasechko et al., 2014). These represent essential areas for the quantitative replenishment of groundwater resources, and therefore the most vulnerable areas in terms of water quality of southern Quebec. Forested areas should be prioritized for future preservation, and forest conservation policies implemented for groundwater resource protection. Although it has been shown that wetlands are often connected to the water table in southern Quebec (Bourgault et al., 2014) they are not considered to be preferential infiltration areas in HB. Separate studies 445 will be necessary to investigate their role in regional groundwater flow. Considering the substantial influence of winter on the regional hydrology, one of the novel outcome of this study, and that changes in the winter dynamic could highly influence the entire hydrologic dynamic of cold regions (Jasechko et al., 2014(Jasechko et al., , 2017, additional winter-related research should target gauging of winter river flows, soil frost modelling, enhancing the vertical inflow and snowpack modeling, and would probably require the development of the winter monitoring network. 450 It is important to underline that actual GWR is most likely lower than potential GWR, especially where superficial deposits are thick and have low hydraulic conductivity, or where impermeable sediments are present at depth and may induce confined conditions for underlying aquifers. To overcome this, Rivard et al. (2013) and Gagné et al. (2018) applied a 40% reduction coefficient in the semi-confined areas of W5 to estimate actual GWR from potential GWR and assumed that no recharge occurred over confined aquifers. 455

Contributions to and limitations of water budget models in groundwater recharge simulation
HydroBudget was specifically developed to simulate spatial and temporal variations in potential GWR at the regional scale, for long periods, and for regions where winter largely affects the intra-annual hydrologic dynamic. The model can reasonably be used where hydrogeological and superficial watersheds are similar, where regional groundwater flow converges to rivers, 460 and when a monthly resolution is sufficient. Although the model does not compute water routing, groundwater-surface water feedback, or evapotranspiration from groundwater, and produces potential GWR, the satisfying simulation results found in this work justify the water budget calculation scheme used in HB, resulting in an easy-to-use and computationally efficient model. Furthermore, the uncertainty analysis showed that the calibration method provides an interestingly small uncertainty for the simulated variables, and a mean bias similar to that of the input data. 465 The impact of long and cold winters was included in HB through the widely used degree-days method that represents snowpack evolution (Massmann, 2019), and through the representation of freezing soil conditions with a threshold temperature and a duration of the threshold temperature to freeze the soil (TTF and FT respectively). The results showed that the simulated potential GWR was sensitive to TTF, while both flow rates and potential GWR had limited sensitivity to FT (Fig. 4). The colder watersheds were more sensitive to these parameters while the simulations of river flows in the warmer watersheds were less 470 sensitive to the snow-related parameters. These results underline the importance of including soil freezing in GWR modeling for cold regions, as did other studies in cold and humid climates (Grinevskiy et al., 2021;Nemri and Kinnard, 2020;Okkonen and Kløve, 2011). This work pinpoints the need for more research on winter recharge to better understand the processes involved.
Several existing water budget models (HELP; HydroBudget; SWB; water balance GIS tool; Huet et al., 2016) use the RCN 475 curve number method (USDA-NRCS, 2004) as a simplification for runoff calculation. This method, developed specifically for the US context, has been criticized for its empirical basis (Ogden et al., 2017). However, it is continuously adapted and used for new environments and different size areas and is well-documented and easy to use (Bartlett et al., 2016;Lal et al., 2019;Miliani et al., 2011;Ross et al., 2018). Notably, it is used to estimate runoff in the SWAT hydrological model (Neitsch et al., 2002), and is calibrated specifically in the Quebec climate and geological context (Gagné et al., 2013;Monfet, 1979). In water 480 budget models, the RCN method drives the partitioning of the simulated water budget into runoff and water available for AET and GWR, therefore making the related parameters (tAPI and frunoff) particularly sensitive in their simulation (e.g., frunoff is the most sensitive parameter for potential GWR in HB; Fig. 4). Because it is based on land cover, topography, soil types, and seasonal humidity, the RCN method reflects the spatial variability of the large study area, which probably contributes to the ability of HB to simulate the surficial water budget and discriminate its partitioning depending on the superficial conditions 485 (superficial geology, land cover).
The AET corresponded to 47% of the annual precipitation (501 mm/yr), a result similar to those reported in previous work in the study area (Benoit et al., 2014;Croteau et al., 2010;Guay et al., 2013). Considering the vegetation cover of southern Quebec (45% of the area of forest and 6% of wetlands), its relatively large water availability, with an average of 1,090 mm/yr of precipitation evenly distributed throughout the year, the topography-driven water table, and the shallow depth of the 490 aquifers, groundwater-vegetation-atmosphere exchanges are probably intensive (Koirala et al., 2017;Xu and Liu, 2017). The configuration of the HB model, with a soil lumped reservoir for AET-potential GWR partitioning, is therefore conceptually suitable (Cuthbert et al., 2019). AET computation is driven by two parameters, i.e. the runoff factor (frunoff), influencing the partitioning between runoff and infiltration into the soil reservoir, and the soil reservoir parameter (swm), determining the reservoir capacity. Evapotranspiration calibration data would be useful to better constrain these two parameters, but measured 495 values are rarely available for this variable.
The simulated transient GWR was calibrated with baseflows computed using regressive filters from river flow rate time series.
Although Partington et al. (2012) showed that the association between baseflows and GWR is not always satisfactory and different baseflow separation methods can lead to differences in volumes and timing (Gonzales et al., 2009;Zhang et al., 2017), baseflows are generally considered an acceptable proxy for GWR in cold and humid climates (Chemingui et al., 2015;Dierauer 500 et al., 2018;Rivard et al., 2009). They are widely used for the calibration of GWR simulations (Batelaan and De Smedt, 2007;Croteau et al., 2010;Dripps and Bradbury, 2007;Gagné et al., 2018;Rivard et al., 2013). In this work, using baseflow estimated with the Lyne and Hollick (1979), Eckhardt (2005), and Chapman (1991) filters influenced the proportion of baseflow in total river flow, and consequently led to variations in simulated potential GWR (Supplementary Table A. 1). However, the quality of the simulations was comparable due to parameter variations that remained in the possible range of the parameters (Table 2). 505 Having a model that is fully coded in an open-source language leaves the option of reshaping its structure and conceptual model to specific study needs and environments. Changes to the model could include, for example, changing the PET calculation to a formula considered to be more suitable for a specific local context, defining a different runoff computation method that would better represent local conditions, modifying the representation of the soil reservoir by spatializing it, or using other calibration data, such as observation of AET or other baseflow filters. However, it should be kept in mind that 510 increasing the complexity of the process representation would increase the number of parameters to be calibrated and most likely increase computation time as well. An original contribution of this work was to show that the tested calibration method combined with a regionalized parameter set offers a very acceptable solution that is highly reproducible and could be applied in less monitored regions. Apart from being relatively simple to use, the HB model has high potential for non-hydrogeologist users, especially if the computational time is limited and the update of observation data and the automatic calibration procedure 515 are easy tasks. Hydrogeologist modellers might be interested in including HB as a complementary tool for comparative purposes when using integrated models or when in need of GWR estimates for groundwater flow models.
Groundwater recharge is a strategic hydrologic variable that needs to be estimated for sustainable groundwater management, 520 especially within the global warming context that highly impacts winter conditions the cold and humid climates regions. It is a challenge to simulate groundwater recharge at the regional scale and for long-term conditions because of long computational times and the large number of calibration data required to produce reliable results, especially that groundwater observations often remains too scarce for such spatio-temporal scales. The objective of this study was to demonstrate the relevance of using a water budget model to understand long-term transient and regional-scale GWR in cold and humid climates where 525 groundwater observations are scarce. As a typical case study, the HydroBudget model was automatically calibrated with river flow rates and baseflow estimates, to simulate long-term regional-scale GWR in the cold and humid climatic conditions of southern Quebec. With the model simultaneously calibrated on 51 gauging stations, GWR was simulated between 1961 and 2017 at the regional scale (36,000 km 2 ) with very little uncertainty (<10 mm/yr). GWR estimates at this scale, including eight tributary watersheds of the St. Lawrence River with a monthly time step and 500 x 500 m resolution was not available until 530 now. The study demonstrated that the ability of the HB model to simulate snowpack evolution and soil frost appears to be a key feature for GWR simulation. Another outcome of this work was to show that the model sensitivity to its parameters is correlated to the average air temperature of the watershed, making the simulated water budget in watersheds with lower temperature more sensitive to snow-related parameters than the warmer watersheds. Nevertheless, additional research focusing on snow melting and freezing soil processes would help to better constrain the model for the winter and spring periods, 535 consequently helping better anticipate future changes.
Water budget partitioning has proven to be highly useful to link changing climatic conditions to trends and evolution in regional GWR in the last six decades. The water budget is partitioned into runoff (41%, 444 mm/yr), AET (47%, 501 mm/yr), and potential GWR (12%, 139 mm/yr). This study shows that groundwater recharge peaks in the spring (44% of annual recharge) and is high during winter (32% of annual recharge). Interestingly, the long simulation period made it possible to identify a 540 significant increasing trend of GWR during winter and no decreasing trends during summer, despite warmer temperatures. This is most likely because increases in AET are compensated by increases in precipitation. In contrast to previous studies of past GWR trends in cold and humid climates, this work has shown that changes in past climatic conditions have not (yet) produced significant changes in annual potential GWR but have impacted runoff and AET. However, considering the intense and fast climate change expected in future decades, the water budget partitioning and the regional GWR could decrease if 545 temperature warming were to exceed the increase in precipitation.
Besides providing essential long-term and regional-scale data on regional GWR, a water budget model with a relative ease of use, such as HB, and that can be calibrated regionally with available data can be extremely useful for non-hydrogeologist users from water management and environmental agencies. It can be used to implement water resource conservation policies, anticipate the impacts of climate change, and identify additional research required. 550  Rm = simulated monthly total runoff (mm) n = number of days in the considered month AETm = simulated monthly AET (mm) GWRm = simulated monthly potential GWR (mm) gauging stations of W6 and ranges obtained with the 25 best compromises (before regionalization) using different calibration weights and different baseflow separation methods.   The code of the HydroBudget model is open source and can be obtained by direct request to the authors.

Author contribution
ED, ML, and SG contributed to developing the approach and all authors contributed to writing the paper. ED and SG developed the HB script in R, based on the model created by GM, which was implemented in java script. Data management, simulations, 605 and figure preparation were done by ED. ML obtained the research grant and supervised the research.
The authors declare that they have no conflict of interest.

Acknowledgements
This project was funded by the Quebec Ministry of Environment and Climate Change (Ministère de l'Environnement et de la