Stress-testing groundwater and baseflow drought responses to synthetic climate change-informed recharge scenarios

Groundwater is the main source of freshwater and maintains streamflow during drought. Potential future groundwater and baseflow drought hazards depend on the systems' sensitivity to altered recharge conditions. We performed groundwater model experiments using three different generic scenarios to estimate the sensitivity of groundwater and baseflow drought to changes in recharge. The scenarios stem from a stakeholder co-design process that specifically followed the idea of altering known drought events from the past, i.e. asking whether altered recharge could have made a particular event worse. Across Germany, groundwater responses to the scenarios are highly heterogeneous, with groundwater heads in the North more sensitive to long-term recharge variations and those in the Central German Uplands more sensitive to short-term recharge variations. Baseflow droughts are generally more sensitive to intra-annual dynamics, and baseflow responses to the scenarios are smaller than those of groundwater heads. The groundwater drought recovery time is mainly driven by the hydrogeological conditions, with slow (fast) recovery in the porous (fractured rock) aquifers. In general, a seasonal shift of recharge (i.e. less summer recharge and more winter recharge) will therefore have little effect on groundwater and baseflow drought severity. A lengthening of dry spells might cause much stronger responses, especially in regions with slow groundwater response to precipitation. As climate models suggest such directional changes for Germany in the future, the results of the stress tests suggest that groundwater resources in Germany may not decrease in general, but water management may need to consider the potential for more severe groundwater droughts in the large porous aquifers following prolonged meteorological droughts.

meteorological drought that can propagate through all parts of the hydrological cycle (Van Loon, 2015). It can lead to social and economic impacts, especially during seasons with low water availability compared to water demand. As a natural hazard, drought affects people worldwide and causes high economic losses (EC, 2007). Hence, the groundwater's potential to attenuate meteorological droughts influences society's current and future vulnerability to drought events.
The groundwater response to meteorology can be highly diverse on both small and large scales (Stoelzle et al., 2014; Bloomfield et al., 2015; Kumar et al., 2016; Haas and Birk, 2018). Weider and Boutt (2010) showed that groundwater responses to precipitation anomalies are more heterogeneous than the responses of streamflow. Accordingly, Bloomfield et al. (2015), Kumar et al. (2016) and Stoelzle et al. (2014) consistently found that typical time scales of drought propagation into groundwater are site-specific, pointing to the importance of hydrogeological characteristics and subsurface storage processes.
The sensitivity to changes in the meteorology will hence be site-specific and is often not generalizable, in particular when considering borehole data from specific locations within an aquifer and relative to rivers or recharge areas (Heudorfer and Stahl, 2017). Hellwig and Stahl (2018) found that the differences in the groundwater response to precipitation anomalies also correspond to varying sensitivities of baseflow to precipitation shifts.
To assess the groundwater and baseflow sensitivity to climate change on larger scales, extensive observational data capturing the large diversity of their responses to meteorology would be required. As borehole observations are often hardly scalable (Kumar et al., 2016), these datasets are rarely available on a larger scale and groundwater models are often indispensable for detailed investigations. Recently, the use of large-scale groundwater models including gradient-driven lateral flows has gained increasing attention (e.g. Maxwell et al., 2015; de Graaf et al., 2015; Reinecke et al., 2019), as large-scale datasets on aquifer parameters become more and more available. Hellwig et al. (2020) demonstrated that these models can depict the differences in propagation time from meteorological water deficits to groundwater on a large scale reasonably well, concluding that they are also suitable to assess the sensitivity of groundwater and baseflow to climate change on larger scales.
The most common approach to estimating the impact of climatic changes on hydrological systems is the use of model chains starting from emission pathways and global climate models and leading to regional impact models (Keller et al., 2019). In general, climate change scenarios allow the assessment of system changes, but quantitative predictions of future changes are subject to large uncertainties (e.g. Lehner et al., 2020). Climate models often lack alterations in the sequencing of future wet and dry spells. Both time sequencing and small magnitudes of change, however, matter strongly to low flow and drought responses (Vormoor et al., 2017).
Alternative approaches such as scenario-neutral ensembles testing systems' sensitivities have therefore been proposed, for example to inform planning processes for floods (Prudhomme et al., 2010). Designing similar approaches for drought, a slowly developing phenomenon with a time-lagged signal in streamflow and groundwater, requires the consideration of longer lead times and the resulting depletion of catchment storage (e.g. climate change-informed seasonal wetting or drying). For example, Staudinger et al. (2015) used scenarios of progressive drying to assess the streamflow sensitivity to drought for catchments across Switzerland. Stoelzle et al. (2014) developed a model-based scenario approach to study the sensitivity of streamflow to changes in climate based on modifications of the recharge. More applied synthetic stress testing approaches often use worst-case scenarios to estimate the consequences of specific events. Stress testing sensitivity to drought will help to better understand the degree of resilience of various hydrological systems (Hall and Leng, 2019).
As part of the Climate and Water Initiative of southern Germany's federal states (KLIWA), different types of "stress test scenarios" or "what-if scenarios" were explored as a means to better understand and more easily communicate potential future changes to low flow (Stoelzle et al., 2018). Scenario design included, for example, a progressive recharge reduction before the 2003 summer drought, as this event is often used as a planning benchmark or to assess follow-up costs: the scenarios ask whether the effect might have been even worse, e.g. with different antecedent conditions. The co-design process of KLIWA revealed different preferences, including rather arbitrary repetitions of sequences of past (known) dry years, very straightforward 'wetter-drier' modifications of past periods or specific drought events, and more systematic approaches with larger model ensembles of modified conditions. In this study we employed three of the approaches from this co-design process that also allow for a systematic analysis of stress responses (e.g. drought recovery).
Specifically, the scenarios focus on the effects of pre-drought recharge reduction on the hydrological drought sensitivity simulated in the groundwater-baseflow domain. Directly modifying groundwater recharge allows the research question to focus on the storage-outflow processes relevant to the hydrology in dry periods. This is justified by the aim of this study, which is to test and attribute specific sensitivities rather than the general system response to climatic change. As groundwater has a recharge memory, antecedent recharge conditions are a key factor for groundwater drought severity, and the effect of perturbed recharge on drought severity can provide information on the site-specific groundwater and baseflow drought sensitivity. The approach by Stoelzle et al. (2014) illustrated an assessment of the sensitivity to altered recharge in a reservoir or box-type hydrological model and was limited to the investigation of baseflow sensitivity.
In this study, we used similar recharge scenarios, as well as the stress-test ideas of KLIWA, for the whole of Germany in a large-scale high-resolution MODFLOW groundwater model to assess a range of potential changes to groundwater and baseflow drought hazard. Specifically, this study aims to (1) assess potential changes in groundwater and baseflow availability during drought due to a climate change-informed seasonal wetting and drying shift, (2) identify large-scale sensitivity patterns of groundwater and baseflow drought events to extreme recharge drought conditions with particular return periods, and (3) quantify characteristic groundwater drought recovery times.

Study area and groundwater model setup
The study area of this work is Germany. Germany consists of four main natural regions with different groundwater characteristics (Figure 1): the lowlands in the North with slowly responding groundwater in porous aquifers, the uplands in Central Germany with faster responses and mixed aquifer types including fractured rocks and karst aquifers, the Alpine foothills in southern Germany with porous aquifers, and the high-elevation Alps in the far South with mostly fractured rock aquifers. Germany's temperate humid climate is characterized by evenly distributed precipitation throughout the year and an annual temperature cycle that results in climatic water deficits due to higher evapotranspiration rates. As a result, groundwater recharge largely takes place during the winter months (Jacob et al., 2012; Kopp et al., 2018).
https://doi.org/10.5194/hess-2020-211 Preprint. Discussion started: 2 July 2020. © Author(s) 2020. CC BY 4.0 License.
To assess the groundwater response to recharge scenarios we apply a large-scale groundwater model covering Germany. The model consists of one MODFLOW layer (Harbaugh et al., 2000), simulating groundwater heads, baseflow (i.e. groundwater discharge to surface water) and lateral flows in weekly time steps. It covers all basins intersecting Germany (i.e. river Rhine in the West, river Danube in the South, river Elbe and river Oder in the East) with a spatial resolution of approximately 1 km (latitudinal: 1/22°, longitudinal: 1/14°). Hellwig et al. (2020) developed and evaluated the model, demonstrating its ability to depict the heterogeneous groundwater response to precipitation anomalies, even though model performance markedly declined in the mountainous regions due to the larger topographic variability. In the following, the model structure and input data are briefly described; for detailed information refer to Hellwig et al. (2020).
Specific yield values were taken from the porosity values in the GLobal HYdrogeology MaPS (GLHYMPS: Gleeson et al., 2014). Initial hydraulic conductivity values k0 for Germany were derived from the "Hydrogeologische Übersichtskarte" (hydrogeological map HÜK200: BGR and SGD, 2016); for the rest of the model domain k0 was based on the GLHYMPS permeability values. Consistent with other groundwater models based on a single layer (e.g. Fan et al., 2007; Miguez-Macho et al., 2008), hydraulic conductivity was assumed to decrease exponentially with depth. The characteristic decrease is described by the e-folding depth f. Then, transmissivity T depends on k0, f and the current groundwater table depth dgw:
$$T = \int_{d_{gw}}^{\infty} k_0 \, e^{-z'/f} \, dz' = k_0 \, f \, e^{-d_{gw}/f},$$
where z' is the depth below surface. T is updated every time step.
Interactions between surface water and groundwater were implemented using the RIV package, simulating flow dependent on the difference between groundwater and surface water heads. Each cell contains either a large river (width > 10 m) with strong interactions with the aquifer or a small stream (width < 10 m) with weaker interactions. Channel depth, riverbed conductivities and river head over riverbed were derived from long-term average routed baseflow of previous model runs (Hellwig et al., 2020). Baseflow and infiltration were assumed to be proportional to the difference between groundwater heads and surface water heads as well as to riverbed conductivity. Hence, with decreasing water tables baseflow reduces, and it stops when groundwater heads fall below surface water heads.
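The two relations described above can be sketched in a few lines of Python. This is a minimal illustration, not the model code: it assumes that the exponential decrease of conductivity with depth takes the form k0 * exp(-z'/f) and is integrated from the water table depth dgw downward, and that baseflow is a linear function of the head difference. The function names, the lumped `conductance` parameter and the numerical values are hypothetical.

```python
import math


def transmissivity(k0: float, f: float, d_gw: float) -> float:
    """Integrate k0 * exp(-z'/f) from the water table depth d_gw downward.

    The closed form of that integral is k0 * f * exp(-d_gw / f).
    """
    if f <= 0:
        raise ValueError("e-folding depth f must be positive")
    return k0 * f * math.exp(-d_gw / f)


def baseflow(h_gw: float, h_river: float, conductance: float) -> float:
    """Groundwater discharge to the river, proportional to the head difference.

    Returns zero once the groundwater head is at or below the river head.
    """
    return max(0.0, conductance * (h_gw - h_river))


# Hypothetical values for one grid cell (not taken from the model).
t_shallow = transmissivity(k0=1e-4, f=50.0, d_gw=0.0)   # water table at surface
t_deep = transmissivity(k0=1e-4, f=50.0, d_gw=50.0)     # water table at depth f
```

With the water table one e-folding depth below the surface, transmissivity drops to 1/e of its value at the surface, which is why T has to be updated as heads change.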
Groundwater recharge was calculated using a conceptual recharge model consisting of a soil storage and a snow storage.
The groundwater model was evaluated by Hellwig et al. (2020) using 202 groundwater borehole time series and 338 streamflow observations. Their results suggested that the model can reproduce the standardized time series as well as the precipitation accumulation times that have the maximum correlation (Tmax) with groundwater and baseflow, even though the model is still too coarse for the small-scale variability in the mountainous regions of Germany. Tmax, which measures the time needed to propagate anomalies from precipitation to groundwater, was found to be a good measure for the patterns of groundwater drought following different meteorological drought events. Moreover, Tmax was found to be related to the model parameters conductivity, specific yield and elevation.
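The idea behind Tmax can be illustrated with a short sketch: accumulate precipitation over windows of increasing length and keep the window whose accumulated series correlates best with the groundwater series. This is a simplification under stated assumptions. It works on raw monthly values rather than the standardized series used by Hellwig et al. (2020), and the function names and synthetic data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def t_max(precip, heads, max_window):
    """Return (window, r): the accumulation window with maximum correlation."""
    best_window, best_r = None, -math.inf
    for window in range(1, max_window + 1):
        # Precipitation summed over the current and the preceding window-1 months.
        acc = [sum(precip[i - window + 1:i + 1])
               for i in range(window - 1, len(precip))]
        r = pearson(acc, heads[window - 1:])
        if r > best_r:
            best_window, best_r = window, r
    return best_window, best_r


# Synthetic example: heads that simply follow 3-month accumulated precipitation.
precip = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4]
heads = [sum(precip[max(0, i - 2):i + 1]) for i in range(len(precip))]
window, r = t_max(precip, heads, max_window=12)
```

In this synthetic case the search recovers the 3-month window by construction; for real series the maximum is usually much less sharp.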

Scenario design and modelling approach
Three types of generic recharge scenarios addressing different questions for drought management were applied to the groundwater model (Table 1). To do this, the scenarios have different boundary conditions and different recharge modifications.
All scenarios apply relative changes over the whole of Germany, thus allowing the results to be analysed as composite maps of the same relative change but with respect to the specific local conditions. This sensitivity analysis approach should not be confused with the more common climate change model chain experiments that would apply locally varying changes stemming from the combination of climate model output and hydrology or soil water balance models with particular assumptions and parametrizations of vegetation and soils. The composite maps therefore represent response differences to the designed scenario inputs due to hydrogeology.
The first scenario SSHIFT assumes a potential future change in drought hazard due to an increased seasonality of precipitation and temperature. Climate projections for Germany include large uncertainties emerging from different models and emission pathways. In general, projected climatic changes and resulting estimates such as groundwater recharge are small and show low model agreement compared to other regions in the world (e.g. Reinecke et al., 2019). As a general pattern, precipitation, which so far has shown little seasonality in Germany, is projected to increase during winter and decrease during summer (e.g. Jacob et al., 2012; Paparrizos et al., 2018; Herrmann et al., 2016). Combined with increasing temperatures over the whole year, recharge can also be assumed to increase (decrease) in winter (summer) (Eckhardt and Ulbrich, 2003; Stoll et al., 2011; Dams et al., 2012; Hunkeler et al., 2014; Chen et al., 2018). The magnitude of change is highly uncertain (e.g. Moeck et al., 2016) and depends on the choice of reference and future period. Due to this uncertainty in the magnitude, water management is currently most interested in the more general question of whether and where the expected contrasting seasonal change has a general potential to influence low flow and groundwater drought (Table 1). In SSHIFT, we therefore run the model from [...] scenario lies in the range of potential precipitation changes for winter and projected evapotranspiration changes in summer predicted by regional climate and water balance models for Germany until the end of the 21st century (Jacob et al., 2012; Herrmann et al., 2016; Paparrizos et al., 2018). Therefore, SSHIFT should be seen as a generic scenario to gain a composite insight into groundwater and baseflow responses under the assumed seasonal recharge shifts in Central Europe (Table 1).
For the assessment of the response to SSHIFT we compare inter-annual variability and percentile thresholds for water table/baseflow during drought from the scenario run with those from the reference run forced by the original recharge. As spatially and temporally varying thresholds τ we use the 0.1, 0.25 and 0.5 quantiles, representing exceedance probabilities of 90, 75 and 50 % within the specific season (Van Loon and Van Lanen, 2012; Heudorfer and Stahl, 2017). An increase (decrease) of the water table/baseflow under SSHIFT indicates a higher (lower) water availability for the selected drought severity.
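The threshold comparison described above can be sketched as follows for a single grid cell and season. This is an illustrative sketch only: the quantile estimator (linear interpolation between order statistics) is an assumption, and the function names and sample values are hypothetical.

```python
import math


def quantile(values, tau):
    """Empirical tau-quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = tau * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def threshold_change(reference, scenario, taus=(0.1, 0.25, 0.5)):
    """Scenario minus reference threshold for each tau.

    A positive value means a higher water table (or baseflow) at that
    drought severity, i.e. higher water availability under the scenario.
    """
    return {tau: quantile(scenario, tau) - quantile(reference, tau)
            for tau in taus}


# Hypothetical seasonal heads (m) for one cell: the scenario is 1 m higher throughout.
reference = [float(v) for v in range(1, 12)]
scenario = [v + 1.0 for v in reference]
changes = threshold_change(reference, scenario)
```

The value for τ = 0.1 is the level exceeded 90 % of the time, so a change at that threshold describes the drought range, whereas τ = 0.5 describes median conditions.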
The second scenario type SEVENT focuses directly on the scale of selected drought events and is designed to assess the groundwater's drought sensitivity to systematic changes in the antecedent recharge conditions (Table 1). Changes related to single events might also become relevant in the future (Taylor et al., 2013), but such changes, in particular regarding dry spells, are generally not well represented in downscaled and bias-corrected climate model-derived input and are difficult to analyse regarding changes in hydrological drought (Vormoor et al., 2017; Kohn et al., 2019). One potential future hazard is the occurrence of more severe and prolonged meteorological drought events. Practitioners often use past events for the design of drought management plans and ask whether there might be scenarios that had the potential to make these events even worse (Table 1). For this study the events of 1973, 2003 and 2015 are selected for the analysis of a range of different but well-known severe benchmark drought years. These drought years have received attention in previous publications, and although they all had large precipitation deficits, differences were also noted (e.g. Tallaksen and Stahl, 2014; Laaha et al., 2017; Hellwig, 2019; Hellwig et al., 2020). Due to differences in the recharge conditions before the droughts, the groundwater situation was very different in each case (Hellwig, 2019). While the 1973 event can be characterized as a long-term water deficit leading to depleted water tables across Germany (Figure 3a), the events in 2003 and 2015 were rather severe short-term summer drought events. As the winter of 2002/03 was exceptionally wet, water tables were in general not depleted (Figure 3b). The 2015 event followed a winter of average recharge and led to a severe groundwater drought in the fast-responding aquifers in the South, whereas the slower-responding aquifers in the North did not show anomalies corresponding to a groundwater drought (Figure 3c).
For SEVENT, antecedent recharge conditions for every modelled grid cell were altered for three different durations (3, 9 and 24 months) to investigate groundwater responses on different time scales. The start of the groundwater drought is set to May. For the 3-month (9-month, 24-month) scenarios we modified recharge backwards from the drought's start for 3 (9, 24) months, starting in February (August of the year before, May two years before), and compared the resulting groundwater situation from May to November in the drought year to the reference simulation (Figure 2).
Antecedent recharge is modified to represent a "recharge deficit event" with a return period TRP of 50 and 100 years based on the 57-year modelled reference recharge series for each grid cell. The use of return periods allows a consistent spatial comparison of the same scenario intensity. First, for all three durations the corresponding 57 recharge sums are used to fit a generalized extreme value distribution with Weibull plotting positions. Then, the fitted distributions are used to estimate the recharge sums of drought events with TRP = 50 and TRP = 100 years, representing different drought severities. Finally, the reference recharge time series is rescaled to match these recharge sums while conserving the original variability of the recharge time series (Stoelzle et al., 2014). The reduced recharge is then used as an input for the groundwater model. Altogether, this scenario type consists of 18 model runs: for 3 drought years (1973, 2003, 2015) antecedent recharge was modified on three time scales (3, 9, 24 months) to match that of a drought event of two return periods (50 y, 100 y).
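The bookkeeping of this scenario type can be sketched as follows. The sketch covers only the run matrix, the start of the antecedent window and the final rescaling step; the extreme value fit that yields the target recharge sum is not reproduced. Proportional scaling is one simple way to match a target sum while keeping the relative temporal pattern and is an assumption here, not necessarily the exact procedure of Stoelzle et al. (2014). All names and numbers are hypothetical.

```python
from itertools import product

DROUGHT_YEARS = (1973, 2003, 2015)
DURATIONS_MONTHS = (3, 9, 24)
RETURN_PERIODS_YEARS = (50, 100)


def window_start(drought_year, months_before, start_month=5):
    """(year, month) of the first modified month, counting back from May."""
    index = drought_year * 12 + (start_month - 1) - months_before
    return index // 12, index % 12 + 1


def rescale_recharge(series, target_sum):
    """Scale a non-negative recharge series proportionally to a target sum."""
    total = sum(series)
    if total <= 0:
        raise ValueError("recharge sum must be positive to rescale")
    factor = target_sum / total
    return [value * factor for value in series]


# One model run per combination of drought year, duration and return period.
runs = list(product(DROUGHT_YEARS, DURATIONS_MONTHS, RETURN_PERIODS_YEARS))

# Hypothetical antecedent recharge (mm per month) reduced to a target sum of 30 mm.
reduced = rescale_recharge([10.0, 20.0, 30.0], target_sum=30.0)
```

For the 2003 event, `window_start(2003, 3)` gives February 2003, `window_start(2003, 9)` gives August 2002 and `window_start(2003, 24)` gives May 2001, matching the windows described above.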
For the assessment of the response to SEVENT we analyse changes in water table/baseflow for all different benchmark droughts, time scales and return periods. Effects of SEVENT are related to potential explanatory variables from the groundwater model: hydraulic conductivity, specific yield, elevation, slope, aquifer type and Tmax.
The third scenario type (SRECOV) focuses on the recovery from the worst drought events (Table 1). As groundwater dynamics are often more damped than climate anomalies, groundwater droughts usually last longer than meteorological droughts. To assess the maximum duration the groundwater system needs to recover from severe drought conditions, the lowest groundwater heads simulated between 1970 and 2016 are taken as the initial condition for each grid cell in this scenario. Then, starting in October (in general, the beginning of the main recharge period in Germany), groundwater heads are simulated using the long-term average monthly recharge as input (Figure 2). Drought termination is set to the time when the simulation exceeds the recovery threshold for the first time. As a recovery threshold we use the monthly variable 25th percentile groundwater head (i.e. the groundwater head that is exceeded 75 % of the time in that calendar month considering all simulated years). The time between the scenario start and the drought termination is the groundwater recovery time Trec, i.e. the time needed to recover from initial drought conditions. As for the interpretation of the results from SEVENT, we relate Trec to potential explanatory variables.
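The definition of Trec can be sketched for a single grid cell as follows. This is a simplified illustration that assumes monthly head values (the model itself runs in weekly time steps), with `heads[i]` taken as the simulated head i months after the scenario start. The function name, the threshold values and the head series are hypothetical.

```python
def recovery_time(heads, thresholds, start_month=10):
    """Months until the head first exceeds the calendar-month threshold.

    heads:      simulated heads, heads[i] is the value i months after the start
    thresholds: dict mapping calendar month (1-12) to the 25th percentile head
    Returns None if the threshold is never exceeded within the simulation.
    """
    for step, head in enumerate(heads):
        month = (start_month - 1 + step) % 12 + 1
        if head > thresholds[month]:
            return step
    return None


# Hypothetical cell: constant threshold of 10 m, heads rising from a drought low.
thresholds = {month: 10.0 for month in range(1, 13)}
t_rec = recovery_time([5.0, 7.0, 9.0, 10.5, 11.0], thresholds)
no_recovery = recovery_time([5.0, 5.5, 6.0], thresholds)
```

Returning `None` rather than the simulation length keeps cells that never recover under average recharge distinguishable from cells that recover in the final time step.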

Groundwater drought under a seasonal recharge shift
The assumed SSHIFT affects groundwater heads and baseflow throughout the year. As recharge increases (decreases) during winter (summer), recharge variability increases (decreases) correspondingly (Figure 4). Most recharge in Germany (outside the Alps) occurs during winter; therefore, the seasonal differences are amplified by SSHIFT and inter-annual variability of recharge as well as of groundwater tables and baseflow is increased. While in general the changes in seasonal baseflow variability correspond to the changes in recharge variability, alterations of groundwater head variability are much more heterogeneous. Not only in winter but also during spring and autumn there is an increase in variability across large parts of Germany, and even in summer variability increases in the Northeast.
Under SSHIFT groundwater heads increase due to the higher winter recharge, except in the alpine South, where groundwater recharge mostly occurs during summer (Figure 5). Changes of groundwater heads are smaller during drought than for median conditions, with negligible differences between the seasons. Absolute head changes are stronger in aquifers with large head variability (i.e. the fractured rock aquifers). In contrast, relative head changes standardized by the mean and standard deviation of natural variability are most pronounced in the large porous aquifers in the North (Figure S1), where changes of variability are strongest as well (Figure 4).
Baseflow also increases under SSHIFT in most parts of Germany (Figure 6). However, there are relevant differences between the seasons: during winter there is a large increase of baseflow, particularly under average conditions. In spring and autumn there are only small increases in the north of Germany (not shown). Baseflow changes during summer are bidirectional with increases in the North and decreases in the South, again more pronounced for average conditions than for drought. On an annual scale changes in baseflow are rather small, following the same pattern of increases in the North and decreases in the South. Changes of baseflow relative to its variability are in general much smaller compared to changes of groundwater heads (Figure S2).

The groundwater drought sensitivity to antecedent recharge
All SEVENT scenarios exacerbate the benchmark groundwater droughts chosen for this stress test (Figure 7). However, the magnitude of declines in groundwater head and baseflow varies for different drought events and durations. In comparison, the effect of the chosen return period of the recharge scenario is low. The differences between SEVENT with TRP = 50 y and TRP = 100 y are about one order of magnitude smaller than the differences between the different TRP = 50 y scenarios and the reference simulation (median ranging between 4 % and 21 % for the different generic scenarios).
The scenario effects are related to different parameters (examples in Figures S5-S6), most significantly to the anomaly propagation time Tmax. In general, longer Tmax values are related to stronger head decreases in the scenarios, whereas baseflow reductions are larger for shorter Tmax (Figure 8). However, the exact relationship between Tmax and scenario depends on the event year and scenario length.

Recovery times of groundwater drought
Consistent with the results from SSHIFT and SEVENT, there is a large heterogeneity of Trec across Germany (Figure 9a). Trec is shorter than 10 months in large parts of Germany, particularly in the Central German Uplands with its fractured rock aquifers. In these regions, a single average recharge season can be enough to terminate a severe groundwater drought. In the northeastern part of Germany, which is characterized by large porous aquifers, groundwater heads will still not recover to the 25th percentile recovery threshold after up to 60 months of average recharge. In these regions, average recharge is not enough to terminate a severe groundwater drought. Trec increases with the hydraulic conductivity and specific yield used in the model grid cell and is significantly higher in porous aquifers compared to aquifers in fractured rocks (Figure 9b). However, the strongest relationship is found between Trec and propagation time Tmax.

Discussion
All scenarios revealed a spatially highly heterogeneous groundwater response to changes in recharge. In the northeast of Germany, where large porous aquifers are prevalent, groundwater heads respond to long-term recharge characteristics. Accordingly, in this region changes on the 24-month duration (SEVENT) or changes in the annual average recharge sum (SSHIFT) cause the strongest responses. In contrast, in the fractured aquifers of the Central German Uplands intra-annual recharge dynamics are much more relevant, as demonstrated by the stronger responses to the 3-month scenarios (SEVENT). The recovery time Trec from a severe drought also varied accordingly (SRECOV). These results highlight the importance of the hydrogeological conditions for assessing the groundwater's sensitivity to drought and for drought propagation, supporting the findings of Stoelzle et al. (2014). The hydrogeological conditions are also linked to Tmax, the locally specific precipitation accumulation time that has the maximum correlation with water table variation. Hellwig et al. (2020) analysed Tmax, which ranges from a few months to several years across Germany. Their results suggested that Tmax can be a good proxy for heterogeneous reactions of the groundwater to droughts. The patterns of Tmax were similar to those found here for the groundwater's response to the more specific scenarios; hence the propagation time from meteorological to groundwater anomalies also has the potential to be a predictor of the general groundwater drought sensitivity to recharge scenarios.
The drought-specific stress test scenarios, however, do provide a more nuanced insight into the hazard. The results for both SSHIFT and SEVENT revealed systematic differences between groundwater heads and baseflow. The main reason is the non-linear relationship between the two variables: the baseflow dynamics are mainly driven by groundwater fluctuations in the wet range, when groundwater heads are closer to the surface and more groundwater discharge is possible through the dynamic drainage network (Godsey and Kirchner, 2014).
For low groundwater heads, the drainage system shrinks and the reduced baseflow is less sensitive to changes in groundwater heads. In the model this is represented by the variable number of grid cells in a catchment that contribute to baseflow, with fewer cells contributing when groundwater heads are low. Changes in groundwater heads due to the event scenarios are most pronounced in regions with long propagation times Tmax (taken from Hellwig et al., 2020), where the antecedent recharge has more influence. However, aquifers with long propagation times are usually characterized by large dynamic storages leading to a smaller baseflow variability (i.e. more stable flow regimes). Correspondingly, large changes of baseflow occur predominantly in regions with short Tmax, opposite to the regions of large groundwater head change. The different responses of baseflow and groundwater are important to consider for effective water management in a changing climate. For example, in a climate with higher annual recharge sums but more frequent summer droughts, groundwater droughts might become less severe while the baseflow drought hazard becomes more severe, with potential impacts on economy and ecology.
The model used in this study is limited in that it simulates groundwater head and baseflow dynamics under natural conditions only. The usual anthropogenic response to drought is increased groundwater pumping, which causes a positive feedback that accelerates drying (Famiglietti, 2014). Therefore, anthropogenic influences also need to be considered as significant contributors to real changes in groundwater heads (Kløve et al., 2014). Moreover, there is uncertainty arising from the aquifer parametrization, and exact model-derived Tmax values must be treated with care and not be interpreted as exact for a specific location. In particular, Hellwig et al. (2020) found a decreasing model performance for higher-elevation regions with small-scale variability of the hydrogeology. Even though these uncertainties limit considerations for effective local water management, they do not affect the general conclusions on groundwater sensitivity reported above.
There are also considerable uncertainties about future precipitation, and predictions for recharge are even more uncertain as recharge might change even more strongly (Ng et al., 2010; Taylor et al., 2012; Jing et al., 2020). Previous studies on recharge changes in Central Europe consistently predicted increases during winter and decreases during summer (Eckhardt and Ulbrich, 2003; Stoll et al., 2011; Dams et al., 2012; Hunkeler et al., 2014; Chen et al., 2018); however, recharge is variable with potentially large year-to-year variations (Kopp et al., 2018). The intra-annual shift used for SSHIFT is based on a relatively simple assumption that only represents a climate change-informed consensus estimate of recharge changes, but it is supported by recent findings of Jing et al. (2020), who reported increases in recharge and groundwater heads under different climate change scenarios for a small catchment in Central Germany. Additionally, there is evidence that other meteorological characteristics that might change in the future are relevant for groundwater and baseflow drought. Bloomfield et al. (2019) demonstrated an influence of changes in evapotranspiration due to increasing temperatures on changes in groundwater drought. Longobardi and Van Loon (2018) showed that changes in dry spell length can alter groundwater contributions to streamflow. Applying recharge frequency analysis to derive a 50-year or 100-year recharge drought event, extrapolating beyond the range of the observational time period, is a pragmatic hydrological design concept. As always, it comes with uncertainty and may be questioned due to climate change-induced non-stationarity. But as a sensitivity testing framework, it was found to be useful and suitable for communication to practitioners used to dealing, for example, with flood frequency terminology.
The SEVENT scenario provides, for the first time, country-scale composite estimates of groundwater and baseflow sensitivity to such assumed more severe recharge droughts and should also be considered for future water management plans.
The different scenarios are complementary as they target the groundwater's sensitivity to different characteristics that are important to consider under climate change. SSHIFT focusses on systematic intra-annual changes in the recharge regime and their consequences for droughts. SEVENT assesses the specific response to prolonged dry spells, whereas SRECOV investigates the groundwater's ability to recover after a severe drought. With the combination of these different scenarios, the following main points regarding the groundwater drought sensitivity emerge:
1. Changes in the annual average recharge sum alter the groundwater heads in regions with slow groundwater response over the entire year, mitigating (or, if annual recharge decreases, exacerbating) the groundwater drought hazard there for all seasons. In regions with fast groundwater responses, intra-annual recharge trends are more relevant than changes of the annual recharge sum.
2. An intra-annual shift of the recharge, as assumed in SSHIFT, has larger effects on groundwater heads under average conditions than on groundwater heads during drought. The general increase in groundwater head variability following higher recharge variability is a result of wetter average conditions rather than drier drought conditions.
3. Groundwater heads respond to recharge on characteristic time scales. Hence, reduced antecedent recharge over a longer duration, which could be a result of a changed climate with prolonged dry spells, can lead to much more severe groundwater droughts in aquifers reacting on the corresponding time scales.
4. Groundwater recovery times for a severe drought are mainly related to the hydrogeology. This finding supports recent approaches for predictions of groundwater drought development several months ahead based on the site-specific characteristics of groundwater dynamics (e.g. Prudhomme et al., 2017; Parry et al., 2018).
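Points 3 and 4 can be illustrated with a deliberately simple linear-reservoir sketch. This is not the groundwater model used in this study: the storage formulation, the response times of 3 and 36 months, the halved recharge during the drought, and the 5 % recovery tolerance are all hypothetical choices made only to show how a characteristic response time can control recovery.

```python
def simulate_storage(recharge, response_time_months, initial_storage):
    """Linear reservoir dS/dt = R - S/k, explicit monthly time step.

    Stable for response times k >= 1 month (assumed here).
    """
    storage = initial_storage
    series = []
    for r in recharge:
        storage = storage + r - storage / response_time_months
        series.append(storage)
    return series


def recovery_time(response_time_months, drought_months=12,
                  tolerance=0.05, max_months=600):
    """Months of average recharge needed after a drought until storage
    is back within `tolerance` of its equilibrium value."""
    mean_recharge = 10.0  # mm/month, hypothetical
    # Equilibrium storage where recharge equals outflow: R = S/k.
    equilibrium = mean_recharge * response_time_months
    # Drought phase: recharge halved for `drought_months` (assumption).
    drought = simulate_storage([0.5 * mean_recharge] * drought_months,
                               response_time_months, equilibrium)
    # Recovery phase: recharge back at its long-term mean.
    recovery = simulate_storage([mean_recharge] * max_months,
                                response_time_months, drought[-1])
    for month, storage in enumerate(recovery, start=1):
        if storage >= (1.0 - tolerance) * equilibrium:
            return month
    return None  # not recovered within max_months


# Hypothetical response times: fast (e.g. fractured rock) vs slow (e.g. porous).
fast_recovery = recovery_time(3.0)
slow_recovery = recovery_time(36.0)
print("fast aquifer recovers after", fast_recovery, "months")
print("slow aquifer recovers after", slow_recovery, "months")
```

In this toy setting the recovery time still depends on the imposed drought, but for a given drought the slow-responding reservoir takes markedly longer to recover than the fast-responding one.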

Conclusions
Future changes of recharge are relevant for the groundwater drought hazard and groundwater's potential to mitigate drought impacts. In this study a stress-test approach was employed as an alternative to climate change model chains: three generic recharge scenarios were used in a country-scale German groundwater model simulating groundwater heads and baseflow.
Despite uncertainties of future recharge, the scenarios allow general conclusions on the diversity of groundwater's sensitivity to projected directions of climate change. While the assumed intra-annual recharge shifts can be expected to weaken the groundwater drought hazard, prolonged dry spells may aggravate droughts, particularly in regions with slow-responding aquifers. Baseflow is not linearly related to changes in groundwater heads and is more prone to intensified drought event conditions on a shorter time scale, especially in regions with fast-responding aquifers. The groundwater's drought recovery time is strongly related to the aquifers' characteristic response time scale. Hence, it does not depend on the meteorological drought characteristics but is rather an inherent property of the aquifer, with large regional differences.
The scenario approach applied in this study allows for a detailed composite assessment of a controlled environmental change. Simple recharge scenarios (e.g. below average, average, above average recharge) in a country-scale groundwater model could also be used for probabilistic real-time groundwater drought forecasting as an informative tool for water supply and other stakeholders. The application of recently developed country-to-global scale transient and gradient-based groundwater models could allow for forecasts of groundwater heads with long lead times on these scales. For local management decisions, it will be important to consider local hydrogeological conditions and also to include anthropogenic feedbacks such as increased pumping during drought (e.g. due to higher irrigation demand). Such feedbacks could also be implemented as generic stress tests. Therefore, future work evaluating the groundwater response to scenarios of human water use during drought will be needed to complement the findings of this study.

Data availability
All model outputs from the reference run and scenario runs can be downloaded from FreiDok (https://freidok.uni-freiburg.de/ complete link in the final paper).

Author contribution
JH developed the main ideas; the design of the scenarios was jointly developed by all authors. JH performed the analyses and prepared the manuscript, which was reviewed by the co-authors.