Spatially distributed sensitivity of simulated global groundwater heads and flows to hydraulic conductivity , groundwater recharge and surface water body parameterization

In global hydrological models, groundwater storages and flows are generally simulated by linear reservoir models. Recently, the first global gradient-based groundwater models were developed in order to improve the representation of groundwater-surface water interactions, capillary rise, lateral flows and human water use impacts. However, the reliability of model outputs is limited by a lack of data as well as model assumptions required due to the necessarily coarse spatial resolution. The impact of data quality is presented by showing the sensitivity of a groundwater model to changes in the only available 5 global hydraulic conductivity data-set. To better understand the sensitivity of model output to uncertain spatially distributed parameter inputs, we present the first application of a global sensitivity method for a global-scale groundwater model using nearly 2000 steady-state model runs of the global gradient-based groundwater model GM. By applying the Morris method in a novel domain decomposition approach that identifies global hydrological response units, spatially distributed parameter sensitivities are determined for a computationally expensive model. Results indicate that globally simulated hydraulic heads 10 are equally sensitive to hydraulic conductivity, groundwater recharge and surface water body elevation, though parameter sensitivities vary regionally. For large areas of the globe, rivers are simulated to be either losing or gaining, depending on the parameter combination, indicating a high uncertainty of simulating the direction of flow between the two compartments. Mountainous and dry regions show a high variance in simulated head due to numerical difficulties of the model, limiting the reliability of computed sensitivities in these regions. This instability is likely caused by the uncertainty in surface water body 15 elevation. We conclude that maps of spatially distributed sensitivities can help to understand complex behaviour of models that incorporate data with varying spatial uncertainties. The findings support the selection of possible calibration parameters and help to anticipate challenges for a transient coupling of the model. 1 Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2019-10 Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 18 February 2019 c © Author(s) 2019. CC BY 4.0 License.


Introduction
Global groundwater dynamics have been significantly altered by human withdrawals and are projected to be further modified under climate change (Taylor et al., 2013).Groundwater withdrawals have led to lowered water tables, decreased base flows, and groundwater depletion around the globe (Konikow, 2011;Scanlon et al., 2012;Wada et al., 2012;Wada, 2016;Döll et al., 2014).To represent groundwatersurface-water body interactions, lateral and vertical flows, and human water use impacts on head dynamics, it is necessary to simulate the depth and temporal variation of the groundwater table.Global-scale hydrological models have Published by Copernicus Publications on behalf of the European Geosciences Union.
recently moved to include these processes by implementing a gradient-based groundwater model approach ( de Graaf et al., 2015;Reinecke et al., 2019).This study is based on G 3 M (Döll et al., 2009), one of the two global groundwater models capable of calculating hydraulic head and surface water body interaction on a global scale.However, the lack of available input data and the necessary conceptual assumptions due to the coarse spatial resolution limit the reliability of model output.These substantial uncertainties suggest an opportunity for diagnostic methods to prioritize efforts in data collection and parameter estimation.
Sensitivity analysis is a powerful tool to assess how uncertainty in model parameters affects model outcome, and it can provide insights into how the interactions between parameters influence the model results (Saltelli et al., 2008).Sensitivity methods can be separated into two classes: local and global methods.Local methods compute partial derivatives of the output with respect to an input factor at a fixed point in the input space.By contrast, global methods explore the full input space, though at higher computational costs (Pianosi et al., 2016).The large number of model evaluations required can render global methods unfeasible for computationally demanding models, though increased computational resources have facilitated their application, e.g., Herman et al. (2013a), Herman et al. (2013b), and Ghasemizade et al. (2017).Still, existing studies of global models either focus on exploring uncertainties by running their model with a limited set of different inputs for a quasi-local sensitivity analysis (Wada et al., 2014;Müller Schmied et al., 2014, 2016;Koirala et al., 2018) or applying computationally inexpensive methods based on a limited set of model evaluations (Schumacher et al., 2015).For example, de Graaf et al. (2015de Graaf et al. ( , 2017) ) determined the coefficient of variation for head results in a global groundwater model with 1000 model runs evaluating the impact of varying aquifer thickness, saturated conductivity, and groundwater recharge.To the knowledge of the authors, the only other study that applied a global sensitivity analysis to a comparably complex global model is Chaney et al. (2015).An overview of the application of different sensitivity analysis methods for hydrological models can be found in Song et al. (2015) and Pianosi et al. (2016).
G 3 M uses input from, and it is intended to be coupled with and integrated into, the global hydrological model Wa-terGAP Global Hydrology Model (WGHM) (Döll et al., 2014).This study investigates the sensitivity of steady-state hydraulic heads and exchange flows between groundwater and surface water to variations in main model parameters (e.g., groundwater recharge, hydraulic conductivity, and riverbed conductance).To this end the method of Morris (Morris, 1991) is applied.
Morris is a global sensitivity method as it provides an aggregated measure of local sensitivity coefficients for each parameter at multiple points across the input space and analyses the distribution properties (Razavi and Gupta, 2015).It requires significantly fewer model runs, compared to other global methods, to provide a meaningful ranking of sensitive parameters enabling the exploration of computationally demanding models (Herman et al., 2013a).The application of a global sensitivity method for a complex worldwide model of groundwater flows is unique, and Morris is currently the best available method to handle the computational constraints.
To reduce the number of necessary model runs when conducting global sensitivity analysis for computationally demanding models we introduce the concept of global hydrological response units (GHRUs) (Sect.2.2.3) (similar to Hartmann et al., 2015, for example).Using the GHRUs we present an application of the Morris method (Morris, 1991) to the Global Gradient-based Groundwater Model, G 3 M (Reinecke et al., 2019).
Sensitivities of the model are explored in three steps: (1) to understand the impact of improved input data, in particular hydraulic conductivity, we investigate the changes in simulated hydraulic head that result from changing the hydraulic conductivity data from the GLHYMPS 1.0 dataset (Gleeson et al., 2014) to 2.0 (Huscroft et al., 2018).(2) Based on prior experiments (de Graaf et al., 2015;Reinecke et al., 2019) eight parameters are selected for a Monte Carlo experiment to quantify uncertainty in simulated hydraulic head and groundwater-surface-water interactions.The parameters are sampled with a newly developed global region-based sampling strategy and build the framework for the (3) Morris analysis.Elementary effects (EEs), a metric of sensitivity, are calculated and their means and variances ranked to determine global spatial distributions of parameter sensitivities and interactions.The derived global maps show, for the first time, the sensitivity and parameter interactions of simulated hydraulic head and groundwater-surface-water flows in the simulated steady-state global groundwater system to variations in uncertain parameters.Foremost, these maps help future calibration efforts by identifying the most influential parameters and answer the question if the calibration should focus on different parameters for different regions helping to understand regional deviations from observations.Additionally, they guide the further development of the model, especially in respect to the coupling efforts highlighting which parameters will influence the coupled processes the most.Lastly, they show in which regions global groundwater models might benefit the most from efforts in improving global datasets like global hydraulic conductivity maps.

Methodology and data
2.1 The model G 3 M G 3 M (Reinecke et al., 2019) is a global groundwater model intended to be coupled with WaterGAP (Döll et al., 2003(Döll et al., , 2012(Döll et al., , 2014;;Müller Schmied et al., 2014) and is based on Hydrol.Earth Syst.Sci., 23, 4561-4582, 2019 www.hydrol-earth-syst-sci.net/23/4561/2019/ the Open Source groundwater modeling framework G 3 M-f1 (Reinecke, 2018).It computes lateral and vertical groundwater flows as well as surface water exchanges for all land areas of the globe except Antarctica and Greenland on a resolution of 5 arcmin with two vertical layers with a thickness of 100 m each, representing the aquifer.The groundwater flow between cells is computed as where K x,y,z [L T −1 ] is the hydraulic conductivity along the x, y, and z axis between the cells with size x y z; S s [L −1 ] is the specific storage; h [L] is the hydraulic head; and Q [L 3 T −1 ] denotes the in-and outflows of the cells to or from external sources of groundwater recharge from soil (R) and surface water body flows (Q swb ) (see also Reinecke et al., 2019[Eqs.(1, 2)]).The evaluation presented in this study is based on a steady-state variant of the model representing a quasi-natural equilibrium state, not taking into account human interference (a full description of the steadystate model and indented coupling can be found in Reinecke et al., 2019).The stand-alone steady-state simulations were performed as an initial step to identify the dominant parameters that are also likely to be important for controlling transient groundwater flow.

Groundwater recharge
Groundwater recharge (R) is based on mean annual R computed by WaterGAP 2.2c for the period 1901-2013.Human groundwater abstraction was not taken into account; not because it is not computed by WaterGAP but rather because there is no meaningful way to include it into a steady-state model which represents an equilibrium (abstractions do not equilibrate).

Hydraulic conductivity
Hydraulic conductivity (K) is derived from GLHYMPS 2.0 (Huscroft et al., 2018) (shown in Fig. 2a).The original data were gridded to 5 arcmin by using an area-weighted average and used as K of the upper model layer.For the second layer, K of the first layer is reduced by an e-folding factor f used by Fan et al. (2013) (a calibrated parameter based on terrain slope), assuming that hydraulic conductivity decreases exponentially with depth.Hydraulic conductivity of the lower layer is calculated by multiplying the upper layer value by exp(af −1 ) −1 where a = −50 (m) (Fan et al., 2013, Eq. 7).Currently only two datasets, GLHMYPS 1.0 and 2.0 (Gleeson et al., 2014;Huscroft et al., 2018), are available and are used by a number of continental and global models (de Graaf et al., 2015;Maxwell et al., 2015;Keune et al., 2016;Reinecke et al., 2019).GLHMYPS 1.0 (Gleeson et al., 2014) is compiled based on the global lithology map GLiM (Hartmann and Moosdorf, 2012) and data from 92 regional groundwater models and derives permeabilities (for the first 100 m vertically) based on Gleeson et al. (2011), differentiating the sediments into the categories fine-grained, coarse-grained, mixed, consolidated, and unconsolidated.Permafrost regions are assigned a K value of 10 −13 m s −1 based on Gruber (2012).Areas of deeply weathered laterite soil (mainly in tropical regions) are mapped as unconsolidated sediments as they dominate K (Gleeson et al., 2014).
The global permeability map was further improved with the development of GLHYMPS 2.0 by Huscroft et al. (2018).A two-layer setup was established in GLHYMPS 2.0 with the lower layer matching the original GLHYMPS 1.0.For the upper layer in GLHYMPS 2.0, a global database of unconsolidated sediments (Börker et al., 2018) was integrated into GLHYMPS 2.0, resulting in overall slightly increased K (Fig. 2a).The thickness of the upper layer was deduced from the depth-to-bedrock information available from Soil-Grid (Hengl et al., 2017).No thickness was assigned to the lower layer.

Surface water body conductance
The in-and outflows Q are described similar to MODFLOW as flows from the cell: a flow from the cell to a surface water body is negative, and the reverse flow is positive.Thus gains and losses from surface water bodies (lakes, wetlands and rivers) are described as where h is the simulated hydraulic head, E swb is the head of the surface water body, and B swb is the bottom elevation.The conductance C swb of the surface water body bed is calculated as where K is the hydraulic conductivity, L the length, and W the width of the surface water body.For lakes (including reservoirs) and wetlands, the conductances C lak and C wet are estimated based on K of the aquifer and surface water body area divided by a static thickness of 5 m (E swb − B swb = 5 m).For a steady-state simulation the surface water body data show the maximum spatial extent of wetlands, an extent that is seldom reached, in particular in the case of wetlands in dry areas.To account for that we assume for global wetlands (C gl.wet ) that only 80 % of their maximum extent is reached in the steady state (Reinecke et al., 2019).
www.hydrol-earth-syst-sci.net/23/4561/2019/ Hydrol.Earth Syst.Sci., 23, 4561-4582, 2019 Figure 1.Parameterization and outputs of the G 3 M model.Q swb is the flow between the aquifer and surface water bodies, h is the simulated hydraulic head, K is the hydraulic conductivity, K e-fold is K scaled by an e-folding factor (see Sect. 2.1.2),E swb is the surface water body head, B swb is the bottom elevation of the surface water body, C swb is the conductance of the surface water bodies, and R is the groundwater recharge.In red are the outputs and parameters that are of foremost importance for coupling.
Global wetlands are defined as wetlands that are recharged by streamflow coming from an upstream 5 arcmin grid cell in WaterGAP (Döll et al., 2009).For gaining rivers, the conductance is quantified individually for each grid cell following an approach proposed by Miguez-Macho et al. (2007).
According to Miguez-Macho et al. (2007), the river conductance C riv in a steady-state groundwater model needs to be set in a way that the river is the sink for all the inflow to the grid cell that is not transported laterally to neighboring cells.This inflow consists of R and inflow from neighboring cells.
where Q eq lateral is the lateral flow based on the equilibrium head h eq of Fan et al. (2013) and E riv is the head of the river (E swb = E swb,riv in Table 1).These conductance equations are inherently empirical as they use a one-dimensional flow equation to represent the three-dimensional flow process that occurs between groundwater and surface water.Future efforts will investigate using approaches appropriate for large-scale models, such as those described by Morel-Seytoux et al. (2017).An extensive description on the chosen equations and implications can be found in Reinecke et al. (2019).

Surface water body elevation
The vertical location of surface water bodies has a great impact on model outcome (Reinecke et al., 2019) 1).For rivers, B swb is equal to h riv −0.349×Q 0.341 bankfull (Allen et al., 1994), where Q bankfull is the bankfull river discharge in the 5 arcmin grid cell (Verzano et al., 2012).

Ocean boundary
The outer boundary condition in the model is described by the ocean and uses an equation similar to MODFLOW's general head boundary condition as flow where h ocean is the elevation of the ocean water table set to 0 m worldwide and C oc is the conductance of the boundary condition set to 10 m 2 d −1 based on average K and aquifer thickness.

Sensitivity of simulated head to choice of hydraulic conductivity dataset
Parameterization of aquifer properties based on hydrogeological data is an important decision in groundwater modeling.We first investigate the effect of switching to a newly available global permeability dataset to explore the sensitivity of h to the variability in geologic data.The results are then compared to the effects of parameter variability, as quantified by the Monte Carlo experiments.GLHYMPS 2.0 (Huscroft et al., 2018) provides an update of the only available global permeability map (Gleeson et al., 2014).To quantify how the new hydraulic conductivity estimates change the simulation outcome of the groundwater model we calculate a basic sensitivity index:  6) (white indicates that no index could be calculated).

S = h
where the sensitivity S of h to a change in K is calculated based on the change in h (h 1 is the hydraulic head calculated with GLHYMPS 1.0 and h 2 with GLHYMPS 2.0), and the change in K 1 and K 2 (the hydraulic conductivity) is based on GLHYMPS 1.0 and 2.0, respectively.

Sensitivity of head and surface water body flow to choice in parameters
Along with K, additional parameters influence the model outcome.In this study we apply the method of Morris (Morris, 1991) as a screening method to identify which parameters are most important for the two main model outcomes, namely h and groundwater-surface-water interactions (Q swb ).The Morris method provides a compromise between accuracy and computational cost in comparison to other Monte Carlo-like methods (Campolongo et al., 2007).Compared to other global methods, like the more robust variance-based methods, e.g., Sobol (1993), Morris has drawbacks as it may provide false conclusions (Razavi and Gupta, 2015).The attribution of what is a direct effect (model response only due to one parameter change) and what is an effect of interaction (response to nonlinear interaction of parameters on model output) is not trivial.Morris is prone to scale issues; that is, that the step size of the analysis can have a significant impact on the conclusions, especially for significantly nonlinear responses (Razavi and Gupta, 2015).In this study we address this by limiting the parameter ranges of the multipliers where we suspect nonlinearity in the model response.In general the choice of the chosen global sensitivity method may yield different results (Dell'Oca et al., 2017).On the other hand, Janetti et al. ( 2019) showed for a regional-scale groundwater study that different global methods showed similar results for hydraulic conductivity parameterization.Nevertheless, Morris is a well established and recognized method (Razavi and Gupta, 2015) that has the advantage of computational efficiency compared to variancebased methods to screen the most sensitive parameters (Herman et al., 2013a).Each model execution represents an individually randomized "one-factor-at-a-time" (OAT) experiment (Pianosi et al., 2016), where one parameter is changed per simulation.Parameter samples are based on trajectories.Each trajectory starts at a point in the parameter space and perturbs one parameter at a time.After all parameters are changed, a new trajectory begins from a different point in the parameter space.
Based on the model executions using these parameter perturbations, the Morris method calculates an elementary effect (EE) d for every trajectory of an ith parameter (in this study parameter multipliers).
where is the trajectory step size for the parameter multiplier X i , X is the vector of model parameters multipliers of size k and y(X) the model output (e.g., in the presented model h or Q swb ).Each EE is a local sensitivity measure that is finally aggregated to a global measure.This total effect of the ith parameter is computed as the absolute mean of the EEs for all trajectories and is denoted as µ * (Campolongo et al., 2007).If µ * is large, it means the parameter is sensitive, on average, throughout the parameter space.
The standard deviation of EEs (σ i ) is an aggregated measure of the intensity of the interactions of the ith parameter with the other parameters, representing the degree of nonlinearity in model response to changes in the ith parameter (Morris, 1991).If σ i is large, it means that the sensitivity of the parameter varies a lot between different points in the parameter space.For a completely linear model, EEs are the same everywhere (because the local gradients are the same everywhere), and σ i is zero.Therefore, a higher σ i entails a more nonlinear model with more interactive components.
The derived metrics µ * and σ i are both measures of intensity (higher values are more sensitive/interactive) and do not represent absolute values of sensitivity or interaction.Both can only be interpreted meaningfully in comparison with values derived for other parameters.To achieve that, µ * and σ i are used to rank the most sensitive parameters.Values for all parameters are sorted from highest to lowest, and the parameter with the highest value is selected as the most influential parameter with the highest rank (hereafter called rank 1).The parameter with the second highest value (rank 2) is the second most influential parameter and so on.The robustness of the parameter ranking is assessed by calculating confidence intervals as described in detail in Appendix A.
Previous experiments (de Graaf et al., 2015;Reinecke et al., 2019) showed the importance of hydraulic conductivity, groundwater recharge, and surface water body elevation to the simulated hydraulic head.Together with the highly uncertain surface water body and ocean conductance we thus selected eight model parameters for the sensitivity analysis.The analysis was conducted by using randomly sampled multipliers in the ranges presented in Table 1.
Throughout the analysis the following parameters, including the convergence criterion and spatial resolution, stay fixed: global mean sea-level, bottom elevation of surface water bodies, and their width and length.The baseline parameters are assumed to be equal to Reinecke et al. (2019).Hydraulic conductivity is based on a global dataset (Sect.2.1.2),the conductance is calculated as previously shown (Sect.2.1.3),and the groundwater recharge baseline is equal to the mean annual values calculated by WaterGAP (Sect.2.1.1).Parameter ranges were chosen to ensure that a high percentage of model realizations converge numerically.For example, the uncertainty of E swb in the model is higher than the ranges used in this study, but the sampling range was restricted because a larger range led to nonconvergence.Furthermore, the chosen river conductance approach uses R as parameter and includes a nonlinear threshold between losing and gaining surface water bodies, which strongly affects numeric stability.As in any sensitivity analysis, the choice of parameter ranges involves some subjectivity that may influence the ranking of sensitive parameters in the results.

Global hydrological response units
Even though the number of model evaluations are less for OAT experiments than for "all-at-a-time" experiments (Pianosi et al., 2016), varying every parameter independently in every spatial grid cell leads to an unfeasible amount of model runs.On the other hand, the use of global multipliers that vary a parameter uniformly for all computational cells may lead to inconclusive results, as the sensitivity for every cell to this change is spread to the whole computational domain.A possible solution would be to separate the globe into zones with similar geological characteristics based on the GLHYMPS dataset, but this may still result in an infeasible number of required simulations.Each simulation takes about 30 min to 1 h on a commodity computer (more if the parameters hinder a fast convergence).
To overcome these limitations, we introduce the use of a global hydrological response unit (GHRU).Every GHRU represents a region of similar characteristics, regarding three characteristics: E swb (Sect.2.1.3and 2.1.4),K (Sect.2.1.2),and R (Sect.2.1.1).This does not constitute a zoning approach often used for calibration in traditional regional groundwater modeling, only a separation into parameter multipliers.A uniform random distribution within the ranges given in Table 1 is used to sample the parameter multipliers for all GHRUs.Characteristics for each model cell are normalized to [0, 1] and used to create a 3-D point space (based on the three characteristics for each model cell).We apply a k-means (Lloyd, 1982) clustering algorithm to identify these regions.
K-means clustering partitions n points into k clusters, where each point belongs to the cluster with a minimized pairwise squared distance to the mean in that cluster.Figure 3a shows a map of k-means clustering (six clusters) categories based on a normalized three-dimensional space of E swb , K, and R per grid cell.
The number of clusters was determined based on the feasible number of model evaluations.K-means constitute an unsupervised machine learning approach that builds the required number of clusters automatically; thus it is necessary afterwards to examine what main characteristics these clusters represent (shown in   as relative values -high (↑), medium (∼), low (↓) -of the three parameter values based on their mean value per cluster.These characteristics are used to connect calculated parameter sensitivities to GHRUs when analyzing the results of the experiment.

Experiment configuration
The total number of necessary simulations N is determined with N = r(k + 1) (Campolongo et al., 2007), where r is the number of elementary effects and k is the number of parameters.For 7 parameters (without the ocean boundary) and 6 GHRUs we get a total number of parameters k = 42 + 1, where +1 stands for the ocean boundary, which is not varied by GHRU, resulting in 1848 simulations.Elementary effects are based on an initial random sampling of 10 000 trajectories using Campolongo et al. (2007) and then reduced by assuming 42 (number of parameters times GHRUs without ocean boundary) so-called optimized trajectories following Ruano et al. (2012).Only random sampling might result in nonoptimal coverage of the input space; thus the initial random trajectories are used to select only those that maximize the dispersion in the input space.This optimal set of trajectories is approximated with a reasonable computational demand using the methodology developed by Ruano et al. (2012).
The experiment resulted in 1848 simulations with an overall runtime of 2 months on a machine with 20 computational cores (enabled hyper-threading) and 188 GB RAM.Each simulation required about 8 GB of RAM and was assigned four computational threads while running the simulations in cohorts of 10 simulations at once.Changes in parameters were stacked over all experiments.Thus, an experiment may have changed R (also affecting C riv for gaining conditions) while containing a C riv multiplier from a previous experiment.Sampling and analysis was implemented with the Python library SALib (Herman and Usher, 2017).For each experiment, the model was run until it reached an equilibrium state (steady-state model).All other parameters and convergence criteria can be found in Reinecke et al. (2019).If a simulation failed (6 of 1848 did not converge) the missing results were substituted randomly from another simulation within the cohort to preserve the required ordering of parameter samples for the used Python implementation of the Morris method.This number is low enough that it does not bias the results in any significant way (Branger et al., 2015).
A converged simulation does not necessarily constitute a valid result for all computed cells.Numeric difficulties based on the model configuration (due to the selected parameter multipliers) may lead to cells with calculated h that are unreasonable -more specifically, a hydraulic head that is far above or below the land surface and/or leads to a large mass budget error.In the presented study these simulations are retained, as a removal would require either a rerun of simulations with a different convergence criterion and inclusion of this in the analysis or a modification of the Morris method to allow the removal of simulations.Confidence intervals (95 %) are derived via bootstrapping using 1000 bootstrap resamples (see Appendix A).
Table 2. Mean values of GHRU characteristics and their summarized description, where ↑ is read as a relatively high value, ∼ as medium, and ↓ as low; e.g., ↑↑ E indicates a cluster with very high and relatively high (↑) average E swb .Additionally, the last two columns show the percentage of cells per GHRU where µ * of h and Q swb could be reliably determined (described in Sect.3.2.6).

Sensitivity to updated GLHMYPS dataset
Global-scale hydrogeological data are limited.Figure 2b shows the change in K between GLHYMPS 1.0 (Gleeson et al., 2014) and the upper layer of GLHYMPS 2.0 (Huscroft et al., 2018), where an overall increase can be observed due to the change in unconsolidated sediments.Although unconsolidated sediments cover roughly 50 % of the world's terrestrial surface, their extent was underestimated in previous lithologic maps by half (Börker et al., 2018).The largest increase in K can be found between 50 and 70 • N because of glacial sediments that were assigned high K values.Different lithologies, e.g., alluvial terrace sediments and glacial tills, have all been grouped into the hydrolithological category of sand.Areas of decreased hydraulic conductivity are, for example, the Great Lakes, south of Hudson Bay, and parts of Somalia.The area around Hudson Bay was assumed to consist of unconsolidated sediments in GLHYMPS 1.0 (Gleeson et al., 2014) and was changed to consolidated.In Somalia, evaporites, which are known for low K, were incorporated from the Global Unconsolidated Sediments Map Database (GUM) (Börker et al., 2018).Furthermore, GUM provides a detailed mapping of loess and loess-like depositions, which were assigned lower K values.These regions can be observed to be the only regions with reduced K (Fig. 2b).Overall, the increase in unconsolidated sediments is probably the main cause for the increased K.
Due to the change in K, the simulated h changes accordingly (Fig. 2c).In areas where the K decreased h increased, e.g., eastern North America.Overall heads decreased, especially in central Russia by up to 10 to 100 m.A slight increase in head can be observed in areas with no change in K.This can be either due to changes in groundwater flow patterns due to the overall increase in K or due to numerical noise.
Based on these results, a local sensitivity index was calculated using Eq. ( 6), shown in Fig. 2d.White constitutes areas where either the relative change of K was zero or the head of the GLHYMPS 1.0 simulation was zero.Overall, h and K change in the opposite directions (positive values indicate a change in the same direction).An overall increase in K has led to a overall decrease in h as the higher K values are able to transport more water for a given hydraulic gradient, especially along coastlines and mountainous areas.Increased sensitivity indexes can be observed at boundaries of areas of large spatial extent where the initial K was equal, whereas the h changes inside that area are relatively small (e.g., the Arabian Peninsula).In regions where an increase in K leads to a decrease in head, an increase in h at the boundary to other hydrolithological structures can be observed.Areas with changing indexes next to each other, e.g., in the Sahara, possibly point to a numerically unstable model region with a general sensitivity to parameter changes.GLHYMPS 2.0 represents the best available global data for hydraulic conductivity, and the results of this initial experiment indicate a significant sensitivity to updating the model with this new dataset.

Monte Carlo experiments
To assess the variability of model outputs we used the Monte Carlo-like OAT experiments to quantify the output uncertainty as given in the 1848 model realizations.

Variability of hydraulic head
The spatial distribution of variability in the main model output h provides insights into model stability and highlights regions which are most sensitive to parameter changes.Observable differences between simulations can be caused by (1) the parameter change of the OAT experiment, (2) the interactive effects due to combinations of parameter changes, (3) numerical noise (slight variations in outcome due to the nature of the numerical algorithm or floating point errors that cannot be attributed to a specific parameter change), and (4) a nonoptimal solution of the groundwater equation (Eq. 1) even if the convergence criterion is met.The latter error (4) can be observed in the model where a strong nonlinear rela-Hydrol.Earth Syst.Sci., 23,2019 www.hydrol-earth-syst-sci.net/23/4561/2019/ tion may produce solutions that fit the convergence criterion but should be considered nonvalid, e.g., because of a mass balance that is unacceptably imprecise.
Figure 4 shows the absolute coefficient of variation (ACV) of h per cell over all Monte Carlo experiments.The ACV is used to make a sound comparison of variance taking into account the mean of the h value per cell (because the mean might be negative the absolute value is used).Yellow indicates that h changed little (mostly for regions with shallow groundwater), white to gray values indicate a growing difference in model results, and red values indicate a high variation of h over all model realizations.The latter areas represent either very low R (Sahara, Australia, South Africa) or a high variance in elevations, e.g., Himalaya, Andes, and the Rocky Mountains.These are expected to have a high sensitivity to parameter changes as the multiplier of E swb produces the highest shifts in regions with high elevation.Any changes in E swb might cause a switch from gaining to losing conditions and vice versa (discussed in Sect.3.2.2).Additionally, a change in R directly influences the conductance term C riv that might also be changed by a multiplier.These combinations may yield conditions that are exceptionally challenging for the numerical solver.Switches between the two conditions constitute a nonlinearity in the equation which might require a smaller temporal step-size to be solved.In a nutshell, if an iteration leads to a gaining condition and the next to a losing condition, the switch renders the approximated heads of the preceding iterations invalid as the equation changed.In the worst case this can lead to an infinite switch between the two conditions without finding the correct solution.Areas with a high variance in hydraulic heads will also produce wide confidence intervals for parameters which are highlighted in Fig. A2.
Figure 5 relates the uncertainty in h, due to a change from GLHYMPS 1.0 to 2.0 to the interquartile range of h of all Monte Carlo realizations, and thus uncertainty in h due to parameter variation.Parameter variation is the dominant cause for h variability in mountainous regions, whereas the change in geologic data has a dominant impact in northern latitudes and the upper Amazon.In Australia, central Africa, and northern India the impact of increasing K is almost as high as the variability caused by the variation of parameters in the Monte Carlo experiments.This suggests that a reduced uncertainty in K in these regions will improve the model results.

Variability of losing/gaining surface water bodies
Surface water bodies that provide focused, indirect groundwater recharge to the aquifer system are an important recharge mechanism to support ecosystems alongside streams (Stonestrom, 2007).They are important for agriculture and industrial development, especially in arid regions.
Losing or gaining surface water bodies are determined by h in relation to E swb .When h drops below E swb water is lost to the aquifer (Eq.4). Figure 6 shows for each grid cell the percentage of the model runs in which the surface water bodies in the cell lose water to the groundwater.Regions with a higher percentage are in losing conditions for most of the applied parameter values.Areas with the highest deviation in h (Fig. 4), and thus the lowest agreement over all model realizations, are similar to the regions where some parameter combinations lead to losing surface water bodies, while others lead to gaining surface water bodies (Fig. 6).Overall arid and mountainous regions show high percentages of Monte Carlo realizations with losing conditions, with dominantly 20 %-50 % of the realizations resulting in losing surface water bodies.h in these regions falls below E swb either due to low recharge or high gradients.Surface-water-groundwater interaction in these regions should be more closely investigated to improve model performance.The Sahara region stands out with large areas that contain losing surface water bodies in almost all model realizations.Values close to 100 % are furthermore reached in the Great Lakes, the Colorado Delta, the Andes, the Namib Desert, along the coast of Somalia, the Aral lake, lakes and wetlands in northern Siberia, and partially in Australian wetlands.Wetlands in Australia and the Sahara are likely to be overestimated in size in the context of a steady-state model.

Parameter sensitivities as determined by the method of Morris
The global-scale sensitivity of h and Q swb is summarized in Table 3, which lists the percentage fractions of all cells for which a certain parameter has a certain rank regarding sensitivity and parameter interaction.
Overall, E swb and R are the most important parameters for both model outputs over all ranks, followed by K. Q swb is more sensitive to R than h, whereas h is more sensitive to E swb .C riv appears to be dominant in only the second and third rank for both model outputs.This means that for the majority of cells a change in E swb and R, rather than C riv , dominates changes in Q swb and h.K and R directly influence the calculation of C riv and thus show a higher sensitivity.
The standard deviation of EEs (σ i ) is an aggregated measure of the intensity of the interactions of the ith parameter with the other parameters, representing the degree of nonlinearity in the model response to changes in the ith parameter (Morris, 1991).A high parameter interaction indicates that the total output variance rises due to the interaction of the parameter with other parameters.
E swb shows higher interactions for h than for Q swb .C riv shows a high interaction on the first rank even if it is not the dominant effect.This interaction is likely due to changes and K and R that directly influence the computation of C riv .Both model outputs are sensitive to changes in R but show a relatively low degree of interaction for the first rank.A higher percentage of cells with an increased interaction of R is only visible in the second and third rank.Lakes and wetlands show low sensitivity and interaction in relation to the total number of cells in Table 3 because they only exist in a certain percentage of cells.Table 4 shows the percentage fractions relative for cells with more than 25 % coverage of a lakes, global wetlands, and/or wetlands.The dominant parameter (by percentage) for all cells with respective surface water body is always E swb for h (in 79.2 % of the lakes and in (79.9 %) 66.3 % of the (global) wetlands) and R (∼ 54 %-77 % of all cells) for Q swb .For the second rank the conductance of the surface water body C lak,wet,gl.wet dominated h, C riv for Q swb .Thus for lakes and wetlands E swb and R are more relevant to h and Q swb than the conductance of these surface water bodies.

Maps of global sensitivity
To show the spatial distribution of the parameters that affect h and Q swb the most, ranked parameters were plotted for every cell in Fig. 7.The top of Fig. 7 represents the most sensitive parameters in terms of h (left) and Q swb (right).Areas that should be judged with caution due to overlapping CIs are shown in Fig. A2.
For h E swb stands out in mountainous regions with spots of C riv and in regions with low recharge.These regions align with highly variable outputs shown in Fig. 4. K is most im-Hydrol.Earth Syst.Sci., 23,2019 www.hydrol-earth-syst-sci.net/23/4561/2019/ portant for h in Australia, the northern Sahara, the Emirates, and across Europe.The second rank (second row in Fig. 7) shows values that are not as important as the top row but dominant over all other parameters.In the regions with large output variations (compare Fig. 4) K and for parts of the Himalaya R are dominant in the second rank (for h).C lak is clearly visible in parts of Nepal and along the Brahmaputra.
For Q swb E swb is dominant in the first rank in, for example, the Rocky Mountains, Andes, Hijaz Mountains in Saudi Arabia, and the Himalaya.R stands out in regions in the Tropical Convergence Zone as well as in northern latitudes.C wet appears as a dominant parameter in areas with large wetlands with a bigger impact on Q swb results than on h.K seems to be equally spatially distributed for h as well as for Q swb .There seems to be no correlation between the initial K spatial distribution and a highly ranked K sensitivity for both model outputs.Areas with a dominant K are possibly influenced by a high interaction with other model components (K shows a high interaction Table 3 that is also reflected spatially in Sect.3.2.5).For the second rank in the Tropical Convergence Zone C riv and K dominate for Q swb .In general Q swb seems to be more robust to show the effects in the highly variable regions.That is, Q swb is not responding as extremely as h to parameter changes.This further indicates the assumption that E swb is also mainly responsible for the h variations observed in Sect.3.2.1.
Zooming in on Europe (Fig. 8) for h, as an example, shows, similar to the global picture, that R and K have the highest impact on h along with E swb .E swb is dominant in mountainous regions like the Alps and the Apennines as well as in regions with lots of surface water bodies, e.g., the southern part of Sweden in the area of lakes Vättern and Vänern and in the Finnish Lakeland.R appears dominant in east Italy in the Po Valley, the Netherlands, and the wetlands in southwestern France.Almost invisible in the global picture is C oc , a dominant parameter for most cells that have the ocean as boundary condition (only observable for h).Predominantly C riv follows E swb as the second most important parameter.Only visible in the second rank are the wetlands, e.g., in west Scotland.

Maps of global parameter interaction
Similar to the spatial parameter sensitivities, Fig. 9 shows the parameter interactions for h and Q swb .Parallel to Fig. 7, the first row of Fig. 9 represents the most interactive parameters in terms of h change (left) and Q swb (right).The highest interaction with other parameters can be observed for E swb for regions with high h variability, similar to Fig. 7.This means that for E swb the model is not only sensitive, but also that the sensitivity of the parameter varies a lot between different points throughout the parameter space, suggesting a nonlinear model response.C riv showed no sensitivity on rank 1 in Fig. 7, although it shows a high interaction in regions sensitive to R (compare Figs. 7 and 9) and is more visible for Q swb .This means changes in C riv lead to nonlinear model responses.K regions in the second rank are similar to where K already showed a high sensitivity for h (compare Fig. 7).In the Himalaya R and C riv show a large spatial pattern.For Q swb , C gl.wet is clearly visible where C riv was most interactive before.

Sensitivity per GHRU
Average sensitivities and parameter interactions for each of the six GHRUs are shown in Fig. 10a.A dominant average per GHRU does not imply a rank 1 in each cell but rather provides an indication of its average importance per GHRU.Each GHRU is described by the notation in Table 2.The average sensitivities and interactions shown are normalized to [0,1] because the calculated µ * and σ present no absolute measure of sensitivity.Mean values of µ * and σ that are very close to zero are not shown in Fig. 10.
The values shown in Fig. 10a should be judged with caution as they also include the regions that show possibly unreliable results, i.e., those where any overlap in CIs indicates that the ranking of the parameters cannot be clearly determined (see additional explanation in Fig. A1).
To judge the reliability of the outcomes per GHRU Table 2 shows the percentage of reliable results for h and Q swb for each GHRU, where reliable results exclude over 80 % of all sensitivity values.Figure 10b shows only cells with reliable results, based on their confidence intervals, resulting in 11.8 % of all grid cells for h and 13.3 % for Q swb .GHRUs in high and very high elevations show low reliability concerning h results as expected (compare Fig. 4).Q swb appears as more robust in these regions.
Figure 10a shows a similar picture to the two global maps (Figs. 7 and 9).All GHRUs show a linear correlation of sensitivity and degree of interaction.The GHRU with average elevation, average recharge, and high K (GHRU 1) shows a higher average response in Q swb than h.h is most sensitive to C riv , and less sensitive to the other parameters.Q swb is clearly most sensitive to K and C gl.wet and shows a high interaction in this GHRU.Lower-lying regions with average K and R (GHRU 2) show high sensitivity of h only to E swb with a high interaction, while Q swb is affected in decreasing order by C gl.wet , and K. Results for h sensitivity in GHRU 3, with very high elevations, average K, and low R, should be judged with caution because only a very low fraction is based on results with nonoverlapping CIs (Table 2).Compared to other GHRUs, GHRU 3 shows rather clustered sensitivities and parameter interactions.h is most sensitive to E swb and R and Q swb to C lak , K, and C wet .GHRU 4, which differs from GHRU 3 by its high but not very high land surface elevation, clearly shows E swb , K, and R as the most dominant and interactive parameter for Q swb , followed by C wet .Similarly Q swb is most sensitive to E swb and K.In low-lying and rather flat regions with high groundwater recharge (GHRU 5), sensitivities of h are close to zero, except for K, possibly because changes in h are too small in flat regions (compare Fig. 4) due to small h gradients.Q swb is most sensitive to E swb and C gl.wet .GHRU 6 is relatively small and like GHRU 5 only occurs in the tropical zone (Fig. 3a).In this GHRU, which differs from GHRU 5 only by K being high instead of average, the dominant parameters of Q swb are similar to other GHRUs where E swb is clearly the most dominant followed by R and K. h shows a response to wetlands but again like in GHRU 5 a very low response to E swb .
Taking into account only the reliable regions changes the perception in Fig. 10b.GHRU 1 shows rather similar sensitivities and parameter interactions as compared to other GHRUs.h is most sensitive to E swb , and only somewhat less sensitive to C riv and C wet .Q swb is clearly most sensitive to C riv and shows a high interaction in this GHRU.GHRU 2 shows high sensitivity of h only to E swb with a high interaction while Q swb is equally affected by K, E swb , and R. Results for h sensitivity in GHRU 3 are not very representative for the whole GHRU as only a very small fraction of cells shows reliable results (Table 2).Like in GHRU 2, Q swb is equally affected by K, E swb and R. GHRU 4 shows E swb as clearly most dominant and interactive parameter for h, followed by K and C wet .For GHRU 5, sensitivities of h could not be determined reliably, possibly because changes in h are too small in flat regions (compare Fig. 4) due to small h gradients.Q swb is most sensitive to R (as rivers are gaining rivers that need to drain groundwater recharge) followed by K.In GHRU 6 the dominant parameters of Q swb are the same as for GHRU 5 (except for E swb ) while h is most sensitive to C lak .

Discussion
This study presents a novel spatially distributed sensitivity analysis for a high-resolution global gradient-based groundwater model encompassing 4.3 million grid cells.While these maps are challenging to interpret, they yield new ways of understanding model behavior based on spatial differences and help to prepare calibration efforts by identifying parameters that are most influential in specific regions.Furthermore, they guide the future development of the model and the intended coupling efforts of the groundwater model to the hydrological model.In particular, the sensitivity of Q swb and the importance of E swb , which are the two major coupling components, are of interest.
However, the large number of grid cells with either statistically zero sensitivity values (overlapping CI with zero) or unreliable results limit the relevance and applicability of the study results.For most of the statistically zero sensitivity values the CI is very large, and it is therefore very unlikely that the parameter is not influential.The study suggests that the highly nonlinear and conceptual approach to the surface water body conductance (in particular the sudden change of conductance between gaining and losing rivers) needs to be revised as it may affect the stability of transient model results.Additionally the results suggest that elevation of the water table of surface water bodies is a promising calibration parameter alongside with hydraulic conductivity.
The presented results need to be considered against the backdrop of the high h variability of the Monte Carlo experiments (Sect.3.2.1).Some of these simulations cannot be considered as a valid result for a h distribution, an issue not faced with other simpler traditional bucket-like hydrological models.This is due to multiple model challenges: (1) the evaluated model approximates a differential equation and can show nonlinear behavior for different parameterizations, (2) the equations used for rivers present a nonlinear model component (switch between equations for gain- ing and losing conditions as well as relation to K and R), (3) the convergence criterion for the steady-state solution is solely based on a vector norm of residuals (metric of changes of the solution inside the conjugate gradient approach) and maximum h change between iterations and does not contain an automated check for a reasonable mass balance.On the other hand, it is challenging to include a validation mechanism in the presented analysis to alleviate these problems while maintaining a reasonable model runtime (as a stricter convergence criterion will most likely increase the number of necessary iterations) and/or number of necessary model runs.It is questionable whether results based on different convergence criteria can be compared.This would necessitate including the numeric stability in the sensitivity analysis as well.
However, the results help to answer the research questions at hand.While overlapping CIs blur the ranking of the parameters in some regions, they still provide evidence of what pa-rameters the calibration should focus on and how the importance of parameters varies per region.The sensitivity of Q swb to parameters, especially E swb , will help to guide the future model development and coupling to the hydrological model.In general, the analysis helped to identify the elevation of surface water bodies as a focus for future research.
Around 30 % of all µ * values had a confidence interval that was larger than 10 % of the µ * value.This suggests that even more model runs are required and that large extents of the model experienced numerically unstable results as the spatial distribution of head variance and large confidence intervals overlap.
The selection of parameter ranges can influence the results of a sensitivity analysis significantly (Pianosi et al., 2016).Even parameters that are suspected of not being sensitive can show highly nonlinear behavior in certain parts of the parameter space that are only activated when one expands the ranges of the parameters.The presented ranges in this study do not explore the full assumed uncertainty range.Specifically, the small range of E swb is likely influencing the outcome of the parameter rankings.The range was chosen to allow a reasonable number of simulations to converge as the range of E swb directly influences the convergence of the model.The presented results, however, do show that the model output is highly sensitive to changes in E swb in most areas of the globe.The response in mountainous regions can be attributed to applying E swb as a multiplier, which has a higher impact in regions where the initial water body elevation is high.On the other hand, this is accounting for the fact that the uncertainty of E swb is largest in regions with highly variable topography per 5 arcmin grid cell.
The only previous sensitivity analysis of a global gradientbased groundwater model to our knowledge was done by de Graaf et al. (2015).Based on varying K, aquifer thickness, and R, the coefficient of variation of the steady-state hydraulic head was computed (de Graaf et al., 2015, Fig. 5).From that analysis it was determined that K has the highest impact and aquifer thickness the lowest.It is not clear how the coefficient of variation determined these outcomes.The relatively low impact of aquifer thickness was also observed by Reinecke et al. (2019).Therefore, this parameter was not included in this study.Both de Graaf et al. (2015) and this study show a high h variance in parts of Australia and the Sahara ( de Graaf et al., 2015, Fig. 5), possibly due to the low initial R. Variations in the mountainous regions, on the other hand, are not reflected in de Graaf et al. (2015) as their analysis did not vary E swb .
Besides the large h variance, which is likely the main cause for the low percentage of reliable cells, the confidence intervals of the sensitivity indices in this experiment suggest that additional simulations are necessary to determine more reliable results.Additionally, the small parameter ranges, required for stable model runs, influenced the overall outcome and might be a reason for cells with inconclusive results.
For cells with lakes and wetlands, E swb dominates over the variations in conductance for h (Table 4), confirming the importance in determining the surface water body elevation.For Q swb , on the other hand, R is most influential in these cells even though it does not affect the conductance equation for these surface water bodies.Apparently, available recharge is driving the interaction more than it influences changes in head.In regions with high recharge (GHRU 5) Q swb was more robust to parameter changes than h.This is possibly due to the generally lower response in Q swb to changes in E swb , which can be explained by the constant flow for losing surface water bodies (including rivers) as soon as h drops below E swb .Thus changes is E swb do not affect Q swb afterwards (as long as the surface water body remains in losing conditions).Both model outcomes show a high sensitivity to R while the interaction of R is only visible at the third rank, suggesting that if R changes other parameter changes do not influence the model response further.
Separating the complex global domain into a selected number of GHRUs enables a sensitivity analysis in accordance with computational constraints (e.g., maximum number of core hours).It alleviates the drawbacks of global-scale multipliers while keeping a reasonable number of total simulations.The presented decomposition based on three parameters E swb , K, and R was guided by the high sensitivity of model output to these parameters.Other factors like lithology and surface water body characteristics should be investigated as additional characteristics for GHRUs.

Conclusions
For the first time, spatially distributed sensitivities of the global steady-state distribution of hydraulic head and flows between the groundwater and the surface water bodies were calculated and presented.We found the Morris sensitivity analysis method can yield insights for computationally challenging (concerning computation time and numerical difficulties) models with reasonable computational demand.This study applied a novel approach for domain decomposition into GHRUs.Applying parameter multipliers simultaneously to all grid cells within each of the six GHRUs allowed a more meaningful sensitivity calculation than would be possible if the parameters would have varied simultaneously in all grid cells, while maintaining a feasible number of simulations.
Based on only a small fraction of grid cells for which parameters could be ranked reliably according to their importance for simulated model output, steady-state hydraulic heads (h) were found to be comparably affected by hydraulic conductivity (K), groundwater recharge (R), and the elevation of the water table of surface water bodies (E swb ).Rankings for individual grid cells vary, but globally none of the three dominates with respect to h.The simulated flows between groundwater and surface water bodies (Q swb ) are clearly most sensitive to R. This is due to the model parameterization of river conductance that is computed as a function of R, assuming that under steady-state conditions, groundwater discharge to rivers should tend to increase with increasing R (Eq. 4).The results indicate that changes in R between time steps for a fully coupled transient model could pose a challenge to the model convergence and that the equations might need to be reconsidered for a fully coupled model.In general the uncertainty due to the parameterization of groundwater-surface-water exchange flows (E swb and C riv,gl.wet,wet,lak ) needs to be further investigated as they have a high impact on h distribution and Q swb .
In high mountainous regions (Rocky Mountains, Andes, Ethiopian Highlands, Arabian Peninsula, Himalaya) and regions with low recharge (Sahara, southern Africa) the computed h showed an unreasonably high variance due to the numerical instability of the simulations in these areas.In the case of high elevations and thus large variations in E swb or in the case of low groundwater recharge, it is not possible to Hydrol.Earth Syst.Sci., 23,2019 www.hydrol-earth-syst-sci.net/23/4561/2019/ solve steady-state groundwater flow equations with arbitrary parameter combinations and a constant convergence parameter.Q swb was found to somewhat be more robust than h in these regions.These results suggest that the parameterization of E swb needs to be reconsidered and is a likely parameter for future calibration.In general more robust global sensitivity methods are required that allow the exclusion of certain simulations from the analysis.The lack of reliable data at the global scale, in particular hydraulic conductivity data with high horizontal and vertical resolution, hinders the development of global groundwater models.A simple sensitivity analysis on the impact of small changes to an existing global hydraulic conductivity dataset (GLHYMPS 1.0 Gleeson et al., 2014to 2.0 Huscroft et al., 2018) showed that knowledge about the distribution of K is pivotal for the simulation of h, as even slight changes in K may change model results by up to 100 m.
The presented study results refer to the uncoupled steadystate groundwater model G 3 M.As G 3 M is currently being integrated into the global hydrological model WaterGAP, future work will extend this sensitivity analysis to fully coupled transient simulations.

Appendix A
Confidence intervals are determined based on 1000 bootstrap resamples following Archer et al. (1997) for all simulation outputs.Bootstrapping is an established statistical method that relies on random sampling with replacement using the original data.This sampling from a set of independent, identically distributed data is equivalent to sampling from the empirical distribution function of the data, allowing confidence intervals to be determined (Archer et al., 1997).
The derived metrics µ * and σ i are both measures of intensity (higher values are more sensitive/interactive) and do not represent absolute values of sensitivity.Both can only be interpreted meaningfully in comparison with values derived for other parameters.To achieve that, µ * and σ i should be presented in so-called ranks.Values for all parameters are sorted from highest to lowest, and the parameter with the highest value is selected as the most influential parameter with the highest rank.The parameter with the second highest value is the second most influential parameter and so on.
Figure A1 shows the conceptual issues that are entailed with this ranking approach.The absolute mean (µ * ) of all EEs of parameter 1 (P1) might be bigger than µ * of P2, but as their CIs are overlapping a clear ranking is not possible.On the other hand it is evident that P1 and P2 are clearly more sensitive than P3.An overlapping suggests that even if the µ * values are different a ranking should be considered with care.Two parameters could be equally important or in some regions inside one GRHU their importance could be inverse to what the µ * values suggest.But even if they overlap, the µ * provides a valuable measure of the overall importance of the parameters, also in comparison with much less important parameters.
Additionally, not only the overlapping should be considered but also the size of the CI in comparison to the µ * .It is a useful indicator of whether the sampling of the parameter space was too small and more simulations are required to gain a clearer picture.15 % is an arbitrary value that we considered an appropriate boundary.Other studies used 10 % (Herman et al., 2013a) or 3.5 % (Vanrolleghem et al., 2015).
Figure A2 shows regions where CIs were smaller than 15 % of the calculated µ * of the first rank and regions where more simulations, or a more sophisticated approach to ensure numerical stability, are likely required.Hydrol.Earth Syst.Sci., 23,2019 www.hydrol-earth-syst-sci.net/23/4561/2019/

Figure 3 .
Figure 3. Map of k-means clustering categories, each representing a GHRU.Each color identifies a region where the combination of all three parameters is similar.

Figure 4 .
Figure 4. Absolute coefficient of variation (σ (h)µ(|h|) −1 ) (%) of simulated h per cell over all Monte Carlo realizations.Yellow indicates that h results changed very little, white to gray values indicate a growing difference in model results, and red values indicate a very high variation of h over all model realizations.

Figure 5 .
Figure 5. Uncertainty in h caused by variability in hydraulic conductivity data between GLHYMPS 1.0 and 2.0 (dominant in brown to green) in relation to uncertainty in h caused by variability in parameters based on Monte Carlo simulations (dominant in blue to light blue) calculated as |h1−h2|IQR(h mc ) , where h1/2 is the simulated head based on GLHMYPS 1.0 and 2.0 and h mc the simulated head of all Monte Carlo experiments.

Figure 6 .
Figure 6.Percentage of all Monte Carlo realizations that resulted in a losing surface water body in a specific cell.

Figure 7 .
Figure 7. Ranking of µ * of h (a, c, e) and Q swb (b, d, f).Panels (a, b) show the first rank, (c, d) the second, and (e, f) the third rank.

Figure 9 .
Figure 9. Ranking of σ * of h (a, c, e) and Q swb (b, d, f).Panels (a, b) show the first rank, (c, d) the second, and (e, f) the third rank.

Figure 10 .
Figure 10.Normalized average sensitivity and parameter interaction per GHRU for h and Q swb (a).If a parameter is not present the mean sensitivity for that GHRU was close to zero (overlapping CI with zero).Does not include ocean parameter sensitivity.Mean characteristics, their symbols for each GHRU, and the reliability of the sensitivity measure (only µ * not σ ) are shown in Table 2. (b) Only reliable results (after removing overlapping CI).

Figure A1 .
Figure A1.Illustration of derivation of presented metrics.Blue circles show the two criteria used to judge the quality of the results.µ * is calculated based on the EEs (circles); however, the CI is calculated based on bootstrap resamples of the simulation outputs.

Table 3 .
Percentage of cells for which parameters are ranked 1 to 3 based on µ * and σ .Percentages are shown for each model output, h and Q swb .For example, h is the most sensitive to parameter E swb (rank 1) in 57.2 % of all grid cells, while R is the most important parameter for Q swb in 59.8 % of those cells.Significant values are highlighted in bold.

Table 4 .
Percentage fractions of the most frequent parameter for rank 1 (R1) and 2 (R2) of all cells with more than 25 % coverage of a lakes, global wetland, or wetland.C riv = 31.7 %, C gl.wet = 40.6 %.Percentage of second most frequent parameter not shown.Percentage in relation to cells with lakes, global wetland, or wetland > 25 %.Percentage-wise R1(µ * (h)) was always followed by R, except for global wetlands, where the second most frequent R1 was C gl.wet R1(µ * (Q swb )) was followed percentage-wise by E swb except for local wetlands with K, R2(µ * (Q swb )) by C lak,wet,gl.wet except for global wetlands with C riv . *