Sensitivity of simulated global-scale freshwater fluxes and storages to input data , hydrological model structure , human water use and calibration

Introduction Conclusions References Tables Figures

Abstract.Global-scale assessments of freshwater fluxes and storages by hydrological models under historic climate conditions are subject to a variety of uncertainties.Using the global hydrological model WaterGAP (Water -Global Assessment and Prognosis) 2.2, we investigated the sensitivity of simulated freshwater fluxes and water storage variations to five major sources of uncertainty: climate forcing, land cover input, model structure/refinements, consideration of human water use and calibration (or no calibration) against observed mean river discharge.In a modeling experiment, five variants of the standard version of WaterGAP 2.2 were generated that differed from the standard version only regarding the investigated source of uncertainty.The basin-specific calibration approach for WaterGAP was found to have the largest effect on grid cell fluxes as well as on global AET (actual evapotranspiration) and discharge into oceans for the period 1971-2000.Regarding grid cell fluxes, climate forcing ranks second before land cover input.Global water storage trends are most sensitive to model refinements (mainly modeling of groundwater depletion) and consideration of human water use.The best fit to observed time series of monthly river discharge or discharge seasonality is obtained with the standard WaterGAP 2.2 model version which is calibrated and driven by daily reanalysis-based WFD/WFDEI (combination of Watch Forcing Data based on ERA40 and Watch Forcing Data based on ERA-Interim) climate data.
Discharge computed by a calibrated model version using monthly CRU TS (Climate Research Unit time-series) 3.2 and GPCC (Global Precipitation Climatology Center) v6 climate input reduced the fit to observed discharge for most stations.Taking into account uncertainties of climate and land cover data, global 1971-2000 discharge into oceans and inland sinks ranges between 40 000 and 42 000 km 3 yr −1 .Global actual evapotranspiration, with 70 000 km 3 yr −1 , is rather unaffected by climate and land cover uncertainties.Human water use reduced river discharge by 1000 km 3 yr −1 , such that global renewable water resources are estimated to range between 41 000 and 43 000 km 3 yr −1 .The climate data sets WFD (available until 2001) and WFDEI (starting in 1979) were found to be inconsistent with respect to shortwave radiation data, resulting in strongly different actual evapotranspiration.Global assessments of freshwater fluxes and storages would therefore benefit from the development of a global data set of consistent daily climate forcing from 1900 to present.

Introduction
The quantification of global-scale freshwater fluxes, in particular river discharge, is essential to assess availability and scarcity of water resources for humans and the environment for both present (Hoekstra et al., 2012;Oki and Kanae, 2006;Prudhomme et al., 2014) and scenario conditions (Döll and Müller Schmied, 2012;Masaki et al., 2014;Schewe et al., 2014).Further examples are the estimation of amounts and spatial distribution of precipitation (Harris et al., 2014;Schneider et al., 2014) and evapotranspiration (Jasechko et al., 2013;Jung et al., 2010;Sterling et al., 2012).As groundwater plays an important role for humans, e.g., for irrigation purposes, related fluxes such as groundwater recharge (Döll and Fiedler, 2008;Koirala et al., 2014;Portmann et al., 2013) or, as consequence of an overexploitation of groundwater resources, groundwater depletion (Döll et al., 2014b;Wada et al., 2010) are the focus of modeling activities.
These examples show attempts to quantify global-scale freshwater fluxes as well as water storages; however, the methodologies used differ largely.Interpolation of in situ measurements works well with a dense monitoring network.Hence, precipitation products (e.g., GPCC -Global Precipitation Climatology Center; Schneider et al., 2014; CRU -Climate Research Unit; Harris et al., 2014) are often based solely on interpolation of station data.In combination with other data sources like remote sensing, even less dense point measurements are used, e.g., to quantify global evapotranspiration (Jung et al., 2010).In particular, remote sensing is used to derive spatiotemporal input data for evapotranspiration schemes (Miralles et al., 2011;Vinukollu et al., 2011;Wang and Liang, 2008) or to assess total continental water storage variations (Schmidt et al., 2006).Spatiotemporal patterns of consistent multiple fluxes and storages can be obtained using land surface models (LSMs) and global hydrological models (GHMs).LSMs, which have evolved as "land components" of global circulation models (GCMs), usually have a high temporal resolution and solve the energy balance (Haddeland et al., 2011).However, they do have limitations, especially in runoff routing and with regard to human alterations of the water cycle (even though there are exceptions, e.g., Pokhrel et al., 2012).GHMs are explicitly designed to assess the state of freshwater resources and to address water-related problems like floods and droughts (Corzo Perez et al., 2011;Prudhomme et al., 2011) and human impacts on freshwater resources.In the last 20 years, a number of GHMs have been developed using different conceptual approaches; e.g., VIC (Variable Infiltration Capacity; Nijssen et al., 2001), WBM (Water Balance Model;Vörösmarty et al., 1998), Mac-PDM (Macro Probability Distribution Model; Gosling and Arnell, 2011), WASMOD-M (Water And Snow balance Modeling system -Macro scale; Widén-Nilsson et al., 2007), H08 (Hanasaki et al., 2008), WaterGAP (Water -Global Assessment and Prognosis; Alcamo et al., 2003;Döll et al., 2003) and PCR-GLOBWB (PCRaster GLOBal Water Balance;Sperna Weiland et al., 2010).
Freshwater fluxes and storages simulated by GHMs and LSMs are subject to several sources of uncertainty: spatially distributed input data (e.g., climate forcing, water use, and land cover), model structure (or modeling approach) and model parameters.In addition, epistemic uncertainty due to a lack of knowledge and understanding of processes is of particular importance at the global scale (see discussion in Beven and Cloke, 2012and Wood et al., 2011, 2012).
Uncertainties due to the choice of climate forcing were the focus of few studies.For example, Guo et al. (2006) showed the large sensitivity of soil moisture simulated by 11 LSMs to different climate forcing data sets (especially to precipitation and radiation).They concluded that this uncertainty associated with land surface hydrology is as large as the variation among the LSMs.Biemans et al. (2009) evaluated seven global precipitation products for 294 river basins worldwide and quantified an average uncertainty of 30 % per basin.They studied the dynamic global vegetation and hydrology model LPJmL (Lund-Potsdam-Jena managed Land; Bondeau et al., 2007) with these precipitation forcings.As consequence of the large spread of precipitation input, discharge uncertainty was quantified with about 90 %.Even though climate forcing is of such importance, only few studies are available which study the uncertainty in a global hydrological model setup.
Uncertainties in terms of model structure are related to the design of the model, i.e., the processes considered and their representation by conceptual approaches.To consider this kind of uncertainty, Butts et al. (2004), Clark et al. (2008), Refsgaard et al. (2006) and Song et al. (2011) developed approaches to diagnose different structures of hydrological models and related uncertainties.The model intercomparison efforts WATCH WaterMIP (Water and Global Change Water Model Intercomparison Project;Haddeland et al., 2011) and ISI-MIP (Inter-Sectoral Impact-MIP; Warszawski et al., 2014) have shown the effects of different model structures (Gudmundsson et al., 2012a, b;Hagemann et al., 2013;Van Loon et al., 2012;Prudhomme et al., 2014;Schewe et al., 2014) even though this was not explored systematically.For example, values for global annual evapotranspiration between 60 000 and 85 000 km 3 yr −1 were reported in the WATCH WaterMIP study (Haddeland et al., 2011).In such multimodel studies, many completely different models are participating, which makes it very difficult to identify the reasons for different model behavior.A sensitivity study using basically the same model but with a refined model structure can therefore be of benefit (e.g., Thompson et al., 2013).
Model parameters are used to represent system dynamics in solvable equations, in particular when the hydrological process cannot be described physically.These parameters are generally not measurable and, hence, are a source of uncertainty that can influence model results to varying degrees.Within the GCM community, the perturbed physics ensemble Hydrol.Earth Syst.Sci., 18, 3511-3538, 2014 www.hydrol-earth-syst-sci.net/18/3511/2014/ approach has been used to assess this kind of uncertainty in a structured way (Collins et al., 2006;Rowlands et al., 2012).
Global-scale hydrology applications also assess parameter uncertainty.For example, Gosling and Arnell (2011) used seven sets of parameter perturbations for two model parameters of the GHM Mac-PDM.09(Macro-scale -Probability-Distributed Mositure model.09).For the GHM WaterGAP, Kaspar (2003) investigated the impact of uncertainty of 38 model parameters on simulated river discharge by conducting various model runs with a sampling of parameter values within specific ranges.He found that major uncertainties are related to evapotranspiration parameters and land cover specific attributes.Schumacher et al. (2014) confirmed the sensitivity of model output (here: monthly total water storage) to radiation calculation and related parameters in Water-GAP which, together with a river roughness coefficient and precipitation, dominate uncertainty in many of the 33 investigated river basins.Groundwater-related parameters and soil parameters were found to be important for the timing and variation of total water storages in WaterGAP (Werth and Güntner, 2010).
Model parameters can be adjusted by calibration, such that model output matches an observed set of data.Whereas basin-scale hydrological models are routinely calibrated against observed river discharge (e.g., Beven, 2001), this is only seldom the case for GHMs.Widén-Nilsson et al. (2007) used different model parameter sets within the GHM WASMOD-M to define optimal parameter values on river-basin scale.WaterGAP is calibrated against observed river discharge in a basin-specific manner by varying one soil parameter (and up to two correction factors) (Döll et al., 2003;Hunger and Döll, 2008).
In this paper, we analyze and quantify the uncertainty in simulated global-scale freshwater water fluxes and storages due to (1) spatially distributed input data, and (2) model structure and modeling approach, using the most recent version of the GHM WaterGAP 2.2.Previous studies (Kaspar, 2003;Schumacher et al., 2014;Werth and Güntner, 2010) have already investigated both parameter sensitivity and uncertainty for WaterGAP.To assess recently available climate forcing and land cover input data as well as significant modifications of WaterGAP model structure during the last decade were major motivations of this experiment.
In particular, we will answer the following research questions: 1. How sensitive are freshwater fluxes and water storages to spatially distributed input data (climate forcing, land cover)?
2. What are the benefits of WaterGAP model structure refinements implemented during the last decade?
3. How does the modeling approach (calibration procedure, consideration of human water use) affect freshwater fluxes and water storages?4. Which type of uncertainty is dominant for specific fluxes and variations of total water storage?
After an initial description of WaterGAP 2.2 (for details see the Appendix), the experimental setup is explained (Sect.2).In Sect.3, the results are described; focusing on the effect of the different model variants on global freshwater fluxes and water storages as well as spatial patterns.In Sect.4, we discuss the results with regard to the research questions.The paper ends with a summary and conclusions (Sect.5).

Description of WaterGAP 2.2
The global hydrology and water use model WaterGAP (Fig. 1) consists of two major parts: the water use models for five different sectors (Appendix C) and the WaterGAP Global Hydrology Model (WGHM, Fig. A1).The submodel GroundWater Surface Water USE (GWSWUSE) (Appendix D) is used to distinguish water use from groundwater and surface water sources and computes net abstractions from both sources which are an input to WGHM (Fig. 1).Using a number of water storage equations (change of storage over time equals to inflow minus outflows; Appendix A), WGHM calculates daily water flows and storages at a spatial resolution of 0.5 • by 0.5 • (55 km by 55 km at the Equator) for the whole land area of Earth except Antarctica (66 896 cells).Water-GAP 2.2 is calibrated against mean annual river discharge at 1319 gauging stations, and the adjusted calibration factor is regionalized to grid cells outside the calibration basins (Appendix B).
Since the initial publication of WaterGAP 2.1d (Döll et al., 2003), major changes were made to keep the model up to date.For example, algorithms of reservoir operation were included (Döll et al., 2009), groundwater recharge was optimized by distinguishing semiarid/arid regions from humid regions (Döll and Fiedler, 2008), a variable flow velocity algorithm was included (Verzano et al., 2012) and the source of abstracted water was considered (Döll et al., 2012).

Experiment setup
Six WaterGAP model variants (Table 1) were designed as follows.The standard version of WaterGAP 2.2 (STAN-DARD) was modified regarding only one aspect, including either alternative climate forcing (CLIMATE), land cover input (LANDCOVER) or model structure (STRUCTURE).Each model variant was independently calibrated.Variant NoCal is an uncalibrated simulation with the standard version of WaterGAP 2.2 to study the impact of the calibration approach.Variant NoUse reflects naturalized water flows and storages without the impact of human water use, and thus also renewable water resources.
In addition, for assessing the effect of uncertainties on renewable water resources, variants CLIMATE, LAND-COVER, STRUCTURE and NoCal are also run without considering any water abstractions.The simulation period is 1901-2009 but model results are evaluated only for 1971-2000.

Climate input
Climate forcing data for global-scale hydrological models are a major source of uncertainty for two main reasons: (1) they are subject to measurement errors which were not corrected in the original input data and (2) they are subject to interpolation errors due to low spatial and temporal monitoring network density and/or because (temporal) data gaps have to be filled.To analyze the sensitivity of calibration and simulated freshwater fluxes to different climate forcing data sets, two climate forcings were used to force both WGHM and the Global Irrigation Model GIM (Döll and Siebert, 2002) In variant STANDARD, the daily WATCH Forcing Data methodology applied to ERA-40 (40-year European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis) data (WFD) (Weedon et al., 2011) for the years 1901-1978 (the years 1901-1957 are based on reordered reanalysis data) and the WATCH Forcing Data methodology applied to ERA-Interim reanalysis data (WFDEI) (Weedon et al., 2014) for the years 1979-2009 was chosen.Switching the climate input data set in 1979 leads to inconsistencies in terms of AET (actual evapotranspiration; higher in WFDEI) and therefore affects the storages until a new equilibrium is reached (see Sect. 3.1).WFD and WFDEI monthly sums/means are bias-corrected with other data sources (temperature bias correction, shortwave radiation adjustment using cloud cover and adjustment of number of wet days to CRU TS 2.1 for WFD and to CRU TS 3.1 for WFDEI as well as adjustment of monthly precipitation sum to GPCC v4 for WFD and GPCC v5 for WFDEI and snowfall undercatch corrected after Adam and Lettenmaier, 2003).To calculate net shortwave radiation, the incoming shortwave radiation is reflected by literature-based land cover specific albedo values (see Table A2).Literature-based-emissivity values for all land cover classes (Wilber et al., 1999) and the Stefan-Boltzmann equation are used to calculate outgoing longwave radiation.The difference to incoming longwave radiation represents net longwave radiation.Net radiation is the sum of both components.
In variant CLIMATE, the monthly data set CRU TS (timeseries) 3.2 (Harris et al., 2014) was used but monthly precipitation totals were replaced by the latest GPCC v6 precipitation monitoring product (Schneider et al., 2014) because it includes more observation stations.Monthly means are disaggregated to daily values within WaterGAP (Döll et al., 2003).Neither CRU nor GPCC precipitation is corrected for observational errors, e.g., wind-induced-precipitation undercatch.Thus, Döll and Fiedler (2008) included the catch ratios of Adam and Lettenmaier (2003) and used the empirical function of Legates (1987) to correct especially snow undercatch by dividing snow and liquid precipitation using a temperature-based approach.The correction of precipitation measurement bias leads to an average increase of 8.7 % compared to the original product.On 37.5 % of the land area (except Greenland and Antarctica), the increase of precipitation is larger than 10 %.Differences of mean values from both data sets (CRU/GPCC and WFD/WFDEI) occur due to the slightly different precipitation correction approach and the GPCC version used for scaling monthly sums.Monthly precipitation is equally distributed to the number of wet days provided by the CRU 3.2 data set; the distribution of wet days within a month is modeled as a two-state, first-order Markov chain (Döll et al., 2003).Cloudiness fraction was used to calculate incoming shortwave radiation as well as outgoing longwave radiation after Shuttleworth (1993), see also Döll et al. (2003).

Land cover input data
The distribution of land cover classes and associated attributes are affecting simulated fluxes in terms of radiation energy balance (albedo and emissivity), snow dynamics (degree-day factor D F ), available soil water capacity (rooting depth) and interception capacity (L) (for details see Appendix A).To estimate the effect of different, homogeneoussource land cover data, two input maps were used (Fig. 2).Attributes and model parameters associated to land cover classes were derived from literature or previous model versions (Table A2) and left equal in both variants.In variant STANDARD we used the gridded MODIS (Moderate Resolution Imaging Spectroradiometer) land cover product (MOD12Q1) for the year 2004.The product MOD12Q1 (1 km resolution, global coverage up to 80 • N) was used with land cover type 1 according to the International Geosphere-Biosphere Programme (IGBP) classification.After resampling to 0.5°spatial resolution, the data set was reclassified to fit to the WaterGAP land cover classification system (Table A2).As water bodies (from the global lakes and wetlands database, GLWD; Lehner and Döll, 2004) and percentage of urban area (from previous model versions) are obtained by additional input files, the second land cover class was appointed in case of "water" or "urban and builtup" as primary land cover.For coastal grid cells, which are not fully covered by MODIS and north of 80 • N, a combination of Global Land Cover Characteristics database GLCC (USGS, 2008) + CORINE (Coordination of Information on the Environment) land cover information was used.

H. Müller Schmied et al.: Sensitivity of simulated global-scale freshwater fluxes and storages
In variant LANDCOVER, a combination of the GLCC based on the years 1992 and 1993 and, for Europe, CORINE land cover based on the year 2000 (European Environment Agency, 2004) was used as land cover information, as also in a previous WaterGAP version (Haddeland et al., 2011).The idea was to use an IGBP-based classification scheme and a remote-sensing-based land cover distribution instead of IM-AGE (Integrated Model to Assess the Global Environment; Alcamo et al., 1998) model outputs (as in previous model versions).Both input data sets have a resolution of 1 by 1 km and were aggregated to the 0.5 • model resolution by assigning the majority land cover type.

Structural model changes
During the last 10 years, the WaterGAP model was subject to several revisions and improvements in terms of hydrologic process representation, resulting in an overall more complex model structure.To assess the sensitivity of simulated freshwater fluxes to model complexity, one model variant with a simplified structure comparable to Döll et al. (2003) (variant STRUCTURE) was set up which was run with the same input data as all other model variants.Differences of variant STRUCTURE as compared to the other variants are as follows.
-Flow velocity is globally set to 1 m s −1 and the meandering ratio is set to 1.0, instead of the variable flow velocity algorithm of Verzano et al. (2012) in the other variants.
-Reservoirs are treated as global lakes, i.e., the reservoir operation algorithm of Döll et al. (2009) is not used, which should result in a more dynamic discharge downstream of reservoirs.
-Water for human water use is abstracted only from surface water bodies; i.e., there are no groundwater abstractions as introduced by Döll et al. (2012).
-Evaporation from lakes/wetlands is not adjusted by reduction factors (Hunger and Döll, 2008) resulting in evaporation at potential rate even at low storage.
-Snow accumulation and melt are modeled on 0.5 • (instead of the 3 arcmin subgrid (Schulze and Döll, 2004)) which should lead to less snow dynamics.
-Finally, there is no distinction in groundwater recharge for semiarid/arid and humid regions (in contrast to Döll and Fiedler, 2008, all regions are treated like humid regions) resulting in higher groundwater recharge in semiarid/arid regions.

Human water use
In many areas of the globe, human water use significantly affects water flows and storage.In this experiment, all model variants except NoUse and STRUCTURE are taking into account water use from surface water and groundwater resources.In variant NoUse it is assumed that there are no water abstractions at all, while in STRUCTURE, water is only abstracted from surface water (as formerly no information on the source of water abstractions was available).

Calibration
WGHM is calibrated against mean annual discharge in a basin-specific manner by adjusting a runoff coefficient that affects the outflow from the soil compartment in each grid cell of the 1319 calibration basins.If necessary to simulate mean annual discharge within 1 % of the observed value, two additional correction factors are adjusted (for details see Appendix B).All other parameters are globally uniform (or land cover class dependent), based on literature or experiences from past studies; i.e., there is no basin or region specific modification.All model variants except NoCal are independently calibrated to the same observational data.In variant NoCal, the runoff coefficient and both correction factors are set to 1.0 in all grid cells.The comparison of NoCal to, for example, STANDARD allows for a direct quantification of the effect of calibration on simulated water fluxes and storages.water abstractions and groundwater abstractions (which differs in CLIMATE due to the forcing of GIM (Appendix C) and in STRUCTURE where water demand is entirely extracted from surface water resources) and due to the different water availability for abstractions.In all cases, a large share of the total water demand could be satisfied (between 90 % in STRUCTURE and 96 % in CLIMATE).

Global water balance
When human water use is not taken into account (NoUse), AET increases by 131 km 3 yr −1 because evaporation from open water bodies increases as they are not depleted by water uses and additional evapotranspiration of irrigated crops is not included in AET (but quantified within WC a ; row 4 in Table 2).As expected, river discharge is higher (by 758 km 3 yr −1 ) in NoUse.Changes in total water storages (142 km 3 yr −1 less storage decrease) are also visible, especially due to no groundwater withdrawals in this variant (Table 3).The sum of these differences between STANDARD and NoUse is 1031 km 3 yr −1 which equals to WC a (row 4 for STANDARD in Table 2).
The calibration has a strong effect on freshwater fluxes.Global discharge to oceans and inland sinks (Q) in No-Cal is about 6400 km 3 yr −1 (or 15.7 %) higher than in STANDARD, meaning that the main effect of calibration is lowering discharge.In many river basins, the calibration parameter γ is higher than the value 1.0 globally used in No-Cal which reduces the share of effective precipitation actually contributing to runoff.Consequently, AET is lower by nearly the same amount.
When comparing CLIMATE to STANDARD, both P and Q increased by approximately 1900 km 3 yr −1 whereas global AET sums are nearly equal.Most additional Q (1546 of overall 1906 km 3 yr −1 ) is generated in noncalibrated grid cells mostly because of an increased P (which explains 1200 of the additional 1546 km 3 yr −1 ) and a reduced AET (which explains 282 of the additional 1546 km 3 yr −1 ) in these grid cells.
RWR equal the long-term-averaged discharge to oceans and inland sinks (Q in Table 2) but without considering human water withdrawals.For the STANDARD model variant, RWR are 1.9 % higher than with WC a (row 3 in Table 2, column NoUse and STANDARD).Q of the other model variants and hence RWR increase by about a similar value (No-Cal 2.0 %, LANDCOVER and STRUCTURE 1.9 %, CLI-MATE 1.6 %; values not shown in Table 2).
The decreasing trends of total water storage are mainly caused by groundwater depletion, except in variants NoUse and STRUCTURE where no groundwater abstraction is modeled.Interestingly, NoCal shows a smaller decrease in groundwater storage than STRUCTURE.This is also due to the calibration parameter γ which is on average lower in case of NoCal.The lower γ , the more water leaves the soil and can subsequently contribute to groundwater recharge.Note that water abstractions from groundwater are taken directly from the groundwater storage and also return flows are added directly to groundwater storage (without passing the soil compartment).Hence, there is no difference in soil water storage between STANDARD and NoUse (Table 3).
Except for groundwater and snow, CLIMATE shows less storage depletion than all other variants that are forced by WFD/WFDEI (Table 3).The strong decrease in case of WFD/WFDEI is an artifact caused by combining WFD before 1979 with WFDEI after 1979.With WFDEI based on the ERA-Interim, AET is approximately 70 000 km 3 yr −1 , compared to 65 000 km 3 yr −1 in the case of WFD.This is caused by differences in the shortwave downward radiation (much higher in WFDEI) which impacts the net radiation as the main input for calculating potential evapotranspiration after Priestley and Taylor (1972).As all model runs are started in 1901, the storages are more or less in equilibrium until 1978.AET is increased in the following 22 years by ca. 10 %, which leads to a higher water loss and therefore to a reduction of all storage compartments.For all storages except snow, reservoirs and groundwater, a new equilibrium is achieved a few (approximately 5) years after 1979 on a lower level (STANDARD variant).Whereas snow storage is not influenced at all, groundwater storage is affected by groundwater depletion and reservoirs by water use and obvious limitations of the reservoir algorithm.Thus, an equilibrium is not reached in the global average of the latter two storages but decreasing after 1901.

Actual evapotranspiration
Mean AET shows the highest values around the Equator, consistent with available energy, except for the Pacific Rim of South America (Fig. 3a).Among the variants, the largest differences to STANDARD occur in the case of the uncalibrated version NoCal (Fig. 3f).As the calibration approach also affects grid cells outside of the 1319 calibration basins due to the regionalization (Appendix B3), all grid cells are affected.In most regions, calibration leads to higher AET but in the upstream Amazon, Congo, Arctic river basins and some other basins, the opposite is true.The global sum of AET of NoCal is 9.2 % lower than estimated with STAN-DARD (Table 2).Notable differences in AET also occur when using an alternative climate input (Fig. 3b).AET increases in CLIMATE on 42.6 % of the land surface by more than 10 mm yr −1 and decreases by more than 10 mm yr −1 on 30.5 % of the land surface.It increases (decreases) by more than 100 mm yr −1 on 5.4 % (5.6 %) of the land surface.When summed globally, only minor changes in AET occur in the CLIMATE variant (increase of 0.06 % or 39 km 3 yr −1 ; Table 2).In contrast, AET differences of the STRUCTURE variant are higher for the global sum (increase of 0.6 % or 414 km 3 yr −1 ) but occur on an overall smaller area (increase by more than 10 mm yr −1 on 11.9 % of the land surface, decrease on 14.2 %).The effect of STRUCTURE is visible in areas with surface water bodies and in snow-dominated areas.On the one hand, an increase in net radiation in snowy regions leads to a slight increase of AET but in small absolute numbers as total AET is comparatively low.On the other hand, effects due to the evaporation reduction factor for surface water bodies are visible.In all variants, except STRUC-TURE, evaporation is limited when the surface water body storage is reduced to mimic the shrinking of surface area.Hence, in regions with a high percentage/volume of surface water bodies, AET is increased.In addition, more complex effects occur.The Great Lakes, for example, evaporate with potential evapotranspiration E P (Appendix A1) in STRUC-TURE, even when the lake storage is relatively low.This results in a relatively low modeled discharge which fits well to the observed ones.Hence, no correction factor (neither CFA nor CFS) is required in the Great Lakes Basin.However, in STANDARD, the reduction factor reduces evaporation by up to three-fourths of E P .The resulting higher modeled discharge has to be reduced by an increased AET in STANDARD (and in the other model variants) on the land around the lakes as compared to STRUCTURE (red areas around the Great Lakes in Fig. 3).
Differences between NoCal and STANDARD are caused by the calibration parameter γ which differs from 1.0 (No-Cal) in most river basins of STANDARD (and of the other model variants).For example, there are blue patterns in China and South America.In both regions, γ is lower than 1.0 in STANDARD which results in higher runoff and less modeled AET.In many other regions (red areas), γ is greater than 1.0 in STANDARD.
AET differences between LANDCOVER and STAN-DARD (Fig. 3c) are caused by changes in net radiation in energy-limited areas (not shown) as well as changes in rooting depth.In general, minor differences occur (except in some basins; see explanation below).In some regions, an increasing net radiation results in an increasing AET, e.g., in parts of Angola.In water-limited areas (e.g., northeastern Brazil), insignificant changes of AET occur even if net radiation strongly increases.In northern Australia, AET increases even when net radiation is reduced.Here, large parts are defined in STANDARD as open shrubland (rooting depth of 0.5 m) and in LANDCOVER as savanna (rooting depth of 1.5 m).As soil storage capacity is a function of rooting depth, even with more energy available for evapotranspiration, only half of the soil water can be evapotranspirated due to the limited rooting depth.Neglecting human water abstraction in variant NoUse would lead to an overestimation of AET in regions where water abstraction for irrigation leads to reduction of wetland areas (Fig. 3e), and a global AET overestimation by less than 0.2 % (Table 2).
In WaterGAP 2.2, AET can become negative in some (mostly snow dominated) regions, where precipitation input is too low to reproduce observed discharge (grey colors in Fig. 3a).The total water balance of each large water body is calculated in its outflow cell; hence, AET can become very large as the value in millimeters is calculated by dividing AET over the whole lake by grid cell area.

Renewable water resources
RWR (mean annual runoff of the grid cell to the river without consideration of human water use) are dominantly influenced by the calibration (NoCal) and subsequently by input data and model structure (Fig. 4).
As RWR are approximately the difference between precipitation and AET, the difference maps (Fig. 4b-e) represent more or less the inverted difference maps in Fig. 3. Compared to STANDARD, the largest differences occur in model variant NoCal.In contrast to AET, calibration leads in many cases to lower RWR.The global sum of RWR of NoCal is 15.8 % higher than with STANDARD.The global sum of RWR from CLIMATE is 4.7 % higher but with a large spatial spread.RWR decrease in CLIMATE on 21.4 % of land surface by more than 10 mm yr −1 and increase by more than 10 mm yr −1 on 29.9 % of the land surface.RWR decrease (increase) by more than 100 mm yr −1 on 4.7 % (9.0 %) of the land surface.The differences in LANDCOVER mainly follow differences in net radiation (not shown).In snowdominated regions, RWR are lower in STRUCTURE because snow-cover dynamics are less intense than in STANDARD.In grid cells with (large) surface water bodies, RWR are lower in STRUCTURE (as AET is unlimited here even if storages are nearly empty).

River discharge seasonality
River discharge is the integral result of runoff generation, water losses by evaporation from surface water bodies, positive or negative net abstractions from surface water bodies and groundwater, and routing processes.It is one of the most important diagnostic variables in water resources.In many regions, river discharges have been observed for decades, providing an important data source for model evaluation.A good representation of modeled seasonality in comparison to the observed one is therefore a criterion for model evaluation.We compared observed and modeled discharge seasonality at the outflow of 12 large river basins, covering different climatic zones and levels of anthropogenic influence (Fig. 5).Climate input and model structure influence modeled discharge seasonality more than land cover changes for the selected river basins.Where seasonality of climate is high, like in the monsoon-dominated Mekong Basin, only marginal differences occur due to land cover and model structure.Structural model refinements have also important effects on discharge seasonality.For example, the constant flow velocity of STRUCTURE (in contrast to variable flow velocity in the other variants) leads to a higher peak in the Lena.Here, the variable flow velocity algorithm underestimates flow velocity in the lower reaches where bed slopes are very small.This leads to a strong underestimation of peak flow (which explains the improved seasonality of STRUC-TURE compared to observed discharge in the Lena).The reservoir algorithm which is not enabled in STRUCTURE has impacts at the Yangtze, Rio Parana, Mississippi and the Volga rivers in terms of smoothing the discharge.For the Rio Parana, this is the main influence in the STRUCTURE variant.The representation of snow in STANDARD leads to a more heterogeneous snow coverage as compared to the STRUCTURE variant.The strongest impact occurs for the Rhine, where the snow algorithm is the dominant reason for the differences to STRUCTURE.In STRUCTURE, the snow water storage of the Rhine headwater (Alps) is generally lower.In particular between May and October (the Alps are modeled as snow-free between June and September), this leads to a decrease of discharge as snowmelt cannot contribute any longer as it does, for example, in STAN-DARD.The importance of the climate forcing can be seen in the Mississippi and the Rhine where CLIMATE results in overestimated peak seasonal discharge.In the Danube, the WFD/WFDEI climate input (in STANDARD) is particularly beneficial, as the fit to observed seasonality is much better than with CRU TS 3.2/GPCC v6 climate (in CLIMATE).
For the Mackenzie, all model variants are close to each other but far away from observations.Here, freezing and thawing of the river are not reproduced as none of the model variants represents these processes.Interestingly, the Lena basin is also frozen during wintertime but here low flows are simulated quite well.In the Amazon, the model variants underestimate the delay of peak discharge which might be explained by the lack of modeling dynamic floodplain inundation.
The impact of alternative land cover is only slightly influencing discharge seasonality.Most effects occur at the Rhine, where CORINE-based land cover (variant LANDCOVER) consists dominantly of cropland.Many grid cells in the other model variants consist of mixed forest or cropland/natural vegetation mosaics which have a lower albedo, resulting in more evaporation and less discharge especially in the summer months.Additional effects occur due to deeper roots in the mixed forest class.Only for the Mackenzie, Lena and Yangtze, are mean monthly river discharges of NoCal within the range of all other variants in some months.The NoCal values for the Orange are higher than any observed value (and the values of the other variants) throughout the year (Fig. 5).This highlights the need for a calibrated model for discharge analyses.

Monthly time series
Nash-Sutcliffe efficiencies E NS (Eq.1; Nash and Sutcliffe, 1970) were calculated for time series of monthly river discharges at the gauging stations used for calibration.
where O i is observed discharge, S i is simulated discharge and O is mean observed discharge (all units in km 3 month −1 ).By adjusting the mean annual river discharge as done in our calibration approach, the E NS of monthly discharge increases in all calibrated model variants as compared to the NoCal variant because E NS is sensitive to both the mean and variances (Fig. 6).Among all calibrated variants, STAN-DARD and NoUse achieve the highest mean E NS values, while variant STRUCTURE shows a distinctly lower model performance (Fig. 6).This is further confirmed by the E NS distribution per Köppen-Geiger region (Table 4, column "sum"), where for the STANDARD and NoUse variants E NS is larger than 0.5 in 53.5 % of the basins.Comparing STANDARD and STRUCTURE, model development clearly improved simulation results in A, C and D climates.The CLIMATE variant performs better in cold areas but overall performs worse than STANDARD, particularly in temperate climate.No significant differences occur when using an alternative land cover input (LANDCOVER).Performance of all variants is very poor in arid (B) climate.

Variations of total water storages
Simulated temporal variations of TWS, i.e., the total amount of water in all continental water storage compartments (Fig. A1), are used widely in the context of analyzing information derived from the Gravity Recovery and Climate Experiment (GRACE).The dominant seasonal changes of TWS can be characterized by the difference between the minimum and the maximum value of mean monthly TWS .The spatial distribution of seasonal TWS variations (Fig. 7a) is similar to that derived with an earlier version of WaterGAP (see Fig. 4 in Güntner et al., 2007).Seasonal TWS variations are affected most strongly by the climate forcing (Fig. 7b).For example, in Europe and eastern US, they are more than 25 mm higher in case of CRU/GPCC climate forcing.This finding is consistent with the impact of climate forcing on river discharge, e.g., of the Danube (Fig. 5).The calibration approach leads to a decrease of TWS variation in areas where runoff is overestimated (Fig. 7f).Where land cover attributes vary significantly due to different land cover classes in LANDCOVER, the effects on TWS variations are strong (e.g., in the southern Congo or southern Amazon).Neglecting groundwater abstractions (as done in NoUse, which neglects any human water use, and in STRUC-TURE, where water is only abstracted from surface waters) leads to lower seasonal TWS variations in areas of groundwater abstractions (if in case of STRUCTURE surface water is not able to satisfy water uses) and groundwater depletion  (e.g., High Plains Aquifer in central US, Iran and northwestern India) (Fig. 7d, e).In these two variants, seasonal groundwater storage variations are solely driven by seasonal variations of groundwater recharge.Without simulating water use, some areas with extensive surface water irrigation have higher seasonal variations than with water use because large return flows during the dry (irrigation) season smooth natural groundwater storage variations.
In addition, seasonal TWS variations in STRUCTURE differ from STANDARD particularly along large rivers (Fig. 7d), mostly with a smaller range in STRUCTURE.There, the flow velocity (variable in STANDARD) is lower than the constant 1 m s −1 in STRUCTURE, resulting in increased river storage.In many cold areas, the simpler snow algorithm in STRUCTURE leads to increased TWS seasonality.

Comparison of simulated freshwater fluxes to other estimates
The modeled AET and discharge to the oceans and inland sinks for all model variants are within the range of published values except for the NoCal variant, which has very low AET and high discharge values (Tables 2, 5).Discharge estimates differ due to the applied estimation method and precipitation data set.Mueller et al. (2013) do not consider a precipitation undercatch correction and assume a global precipitation of ∼ 99 000 km 3 yr −1 which is low compared to recent estimates of Schneider et al. (2014) (117 000 km 3 yr −1 ) or the values used in this study (Table 2).Compared to previous WaterGAP results, model refinements have led to an increase of discharge.The value of STANDARD is approximately 450 km 3 yr −1 higher than for STRUCTURE (Table 2), and previous estimates (Döll et al., 2003) are even lower as precipitation undercatch was not taken into account.

Advantages and limitations of the calibration approach
The applied calibration approach is clearly beneficial as it leads to a better fit of simulated to observed monthly river discharge time series (Fig. 6; Table 4).Consequently, the basin-specific adjustment of 1-3 parameters (γ , CFA and CFS; see Appendix B1) based on observed mean annual discharge has been a part of the WaterGAP modeling approach since the beginning.Calibration allows to a certain degree compensating errors in input data and effective model parameters.Also, structural problems of the model, e.g., due to the simplified representation of hydrological processes at a half-degree grid cell, may be balanced out.(Fig. 4e) dominates all other modifications within this experiment setup.However, the correction of total cell runoff using CFA and CFS that is required to force simulated mean annual river discharge values to be equal to observed values is not ideal and has undesirable effects on estimated AET and RWR.In the Yenisey Basin, upstream of Igarka (western Siberian Plain), AET is largely reduced in one-half of the basin (and vice versa) when using alternative climate forcing.Transferring the correction factor CFS (which is, if necessary, calculated at the outflow grid cell of the basin) to the upstream grid cells can lead to unrealistic high positive and negative values for AET if precipitation is too low in these parts of the basin to simulate observed discharge or if the AET of surface water bodies has to be reduced by CFA.This is the reason for some artificial patterns in Fig. 3 and consequently in Fig. 4.These kinds of consistency errors can be found in some more basins where cumulative AET is low and parts of the basins are covered with surface water bodies.Nevertheless, the approach ensures a closed water balance for the whole basin.
Obviously, one parameter is not sufficient to calibrate the model.In many basins the γ parameter is not sensitive to input data and model structure in the current calibration approach as the range of γ through all four variants (NoCal is not considered, NoUse has the same value as STANDARD) is rather small.Of the basins in Fig. 8, 59 % are colored dark blue which means that the calibration parameter γ has the same value in all model variants.Here, γ is at its artificial minimum (0.1) or maximum (5.0) value, and the influence of input data and model structure, which were modified in this experiment, is insignificant.However, in 21 % of the basins, γ is differing by > 1 (green, yellow and red colors).In these basins, the calibration parameter is sensitive to input data and model structure.Therefore, within future model development, one task is to restructure the calibration approach with the aim to avoid correction factors or rather to introduce and test alternative calibration objectives.This could be achieved by either including more parameters (multiparameter calibration) and/or by integrating additional reference data, e.g., GRACE-based data as was shown by Werth and Güntner (2010) (multiobjective calibration).In addition, remote-sensing-based input data with global coverage have been available for a decade.Especially for land cover characteristics (e.g., land cover type, L, and albedo; see Appendix A), a more realistic representation of dynamics (integration of time series as input data instead of static input maps) can reduce the input data and model parameter uncertainty.

How sensitive are freshwater fluxes and water
storages to spatially distributed input data (climate forcing, land cover)?
In general, more differences occur due to the alternative climate input than due to the alternative land cover data.The major freshwater fluxes (AET, Fig. 3, and RWR, Fig. 4) as well as river discharge (Fig. 5) show in many cases that land cover input has much less impact (except for some areas where the attributes of a changed land cover type differ significantly).The effect of different land cover input would probably increase if the associated attributes were also modified.Forced with CRU 3.2 and GPCC v6 instead of WFD/WFDEI input, AET increased by at least 10 mm yr −1 in large parts in the world (light blue colors in Fig. 3b).
In those regions with similar precipitation amounts but different radiation, RWR decreased by the same amount that AET increased (e.g., Southeast Asia, Australia, Saudi Arabia).In other regions, no clear effect on RWR is detectable (e.g., North America).In some parts of Europe, RWR increased by at least 10 mm yr −1 even if AET increased.Here, besides radiation (affecting AET), the amount of precipitation is of great importance (affecting RWR).
In regions where the climate forcing data sets differ significantly (e.g., Danube River basin), the impact on discharge is large (Fig. 5, bottom center panel).Here, differences in temperature and precipitation amounts lead to a poor fit compared to observed discharge when using the CLIMATE variant which is also reflected in the E NS criterion (Fig. 9b).Also, the two land cover input data sets used here result in the same E NS classes, with only a few exceptions (Fig. 9c).

What are the benefits of WaterGAP model structure refinements implemented during the last decade?
In general, WaterGAP 2.2 STANDARD leads to improved results compared to the reduced model version STRUC-TURE that is comparable to the Döll et al. (2003) model version.In many basins in the Alpine region in central Europe, the E NS of STRUCTURE ranks behind that of STANDARD (Fig. 9d, red colors) reflecting a refined simulation of snowcover dynamics on the 3 arcmin subgrid.In some basins, the reservoir algorithm improves E NS (and discharge seasonality).For example, the Volga at station Volgograd Power Plant (see also Fig. 5) and basins in Brazil show a much better E NS (Fig. 9d) in STANDARD compared to STRUC-TURE.However, the E NS of some basins with E NS < 0.5 in STANDARD is improved in STRUCTURE.In summary, integrating more complex and refined process descriptions (see Sect. 2.2.3) in the past decade has led to the improved simulation of monthly time series of river discharge with WaterGAP.However, discharge before calibration tends to be higher with the implemented structural changes, e.g., due to the storage-dependent reduction of surface water evaporation.This together with the use of more calibration stations (Hunger and Döll, 2008) and the introduction of a bias correction for observed precipitation (Döll and Fiedler, 2008) has had the problematic consequence that correction factors to lower simulated river discharge have increasingly been required to ensure that simulated mean annual river discharges are equal to observed values.

How does the modeling approach (calibration procedure, consideration of human water use) affect freshwater fluxes and water storages?
The calibration procedure reduces simulated river discharge and water resources on most of the land area and increases the AET (Figs. 3, 4; Table 2).Without calibration, global AET and discharge would rank at the lower and higher end of the published values, respectively (Table 5).In addition, the fit to observed monthly river discharge time series as quantified using the  everywhere (Fig. 9f).The impact of calibration on freshwater fluxes and water storages is higher than those of alternative climate forcings and land cover data, and of a more sophisticated model structure.This confirms the strong benefit of calibration.However, as E NS is affected by mean discharge as well as discharge variations, the calibration approach improves this criterion.
Compared to the other variants, the consideration of human water use does not have large effects on freshwater fluxes and storages at the global scale.In regions with intense water use, in particular from surface water bodies (e.g., in Pakistan), AET without considering additional evaporation from WC a (Table 2) is reduced due to human water use (Fig. 3e).This effect occurs because human water uses decrease surface water storages and thus the reduction factor decreases evaporation from surface water bodies (Appendix Sect.A5).If the impact of human water use on river discharge were not considered, van Beek et al. (2011) showed there would be a lower performance in general.Within our experiment, higher correction factors would be necessary in basins with large abstractions from surface water bodies or significant decreases of baseflow due to groundwater abstractions.Still, the E NS of basins with high amounts of human water use is generally lower than those without human water use (not shown).In some basins, mainly in northeastern Europe, E NS improves when neglecting human water use (Fig. 9e).This obviously reflects uncertainties in water use models.

Which type of uncertainty is dominant for specific fluxes and variations of total water storage?
The answer to this question depends on the type of fluxes and the spatial aggregation.Regarding selected global sums of freshwater fluxes (Q and AET) and mean annual total water storage trends dTWS, dominant uncertainties can be determined by computing differences between the values computed with a certain model variant and STANDARD.As already shown above, global values of AET and Q as well as the fit of simulated to observed river discharge time series (E NS ) are most sensitive to whether the model is calibrated or not (Table 6).STRUCTURE and NoUse have the strongest impact on the global TWS trend (Table 6) as these model variants cannot reflect groundwater depletion.More refined Regarding grid-cell-specific differences that are more relevant than global values for most applications, the ranking of dominant uncertainties is quite different.Patterns of seasonal TWS variations are affected most strongly by the climate forcing (Fig. 7b), while climate forcings show the second largest impact on the spatial distribution of AET and RWR,after calibration (Figs. 3,4).The fraction of the global land area that is affected by significant differences of AET and river discharge between a certain model variant and the STANDARD variant is largest in the case of NoCal, followed by CLIMATE, LANDCOVER, STRUCTURE and NoUse.Thus, both global and grid cell values are most sensitive to calibration.The larger sensitivities to climate forcings and land cover input at the grid cell level (Table 7) cancel when globally averaged.The larger sensitivities of globally aggregated values (Table 6) to structural changes and the consideration of water use is due to unidirectional changes for all affected grid cells, but different to alternative climate and land cover data, structural changes and water use only affect a limited number of grid cells.This discussion on the dominant type of uncertainty does not take into account parameter uncertainty which is a major additional source of uncertainty (Kaspar, 2003).

Conclusions
We studied the sensitivity of freshwater fluxes and storages as computed by the GHM WaterGAP 2.2 to spatially distributed input data (climate forcing and land cover input) as well as model structure (model refinements during the last decade), consideration of human water use and calibration (or no calibration).We designed five model variants in addition to the standard variant.In each model variant, one component or feature was modified with respect to the standard variant.Sensitivity of different freshwater fluxes and water storage variations to the five types of uncertainty were analyzed and ranked considering both global sums and grid cell values, taking into account also the capability of the model variants to simulate time series of observed river discharge.Basin-specific calibration to mean annual river discharge was found to have the strongest impact on fluxes and storage variation and is the dominant reason for an improved simulation of observed monthly river discharge time series (as characterized by the Nash-Sutcliffe criterion).Uncertainty due to alternative climate forcing, and to a lesser extent, land cover input, leads to significant variations of grid cell fluxes (actual evapotranspiration, renewable water resources and river discharge) and storages (seasonal range of total water storage) even if the model variants are individually calibrated.However, these uncertainties largely cancel each other out at the global scale while the more refined model structure, and to a lesser extent water use, are more important for global sums of river discharge and actual evapotranspiration but also for an improved fit to observed monthly time series of river discharge.
The STANDARD variant of WaterGAP 2.2 leads to the best fit to observed river discharge (monthly time series, Fig. 6 and Table 4, and seasonality, Fig. 5).We conclude that the daily WFD/WFDEI data set as climate forcing is preferable to using a combination of the monthly CRU 3.2 and GPCC v6 data sets as done for model variant CLIMATE.However, we found that it is problematic to combine the WFD climate data set (covering 1901-2001) with the only seemingly consistent WFDEI data set (covering 1979-2009) due to a radiation bias (shortwave downward radiation component) between the two data sets.This results in a steep increase of actual evapotranspiration in 1979, and a water storage decrease between 1971 and 2000 that is an artifact of the combination of the two climate data sets (comp.Sect.3.1).It would be very beneficial for an improved estimation of global freshwater fluxes and storages to have a consistent daily climate forcing that covers the whole 20th and the early 21st century.
The calibration approach of WaterGAP is necessary to compensate uncertainties of spatially distributed input data, parameters and model structure.However, a calibration of only one parameter related to soil water balance is not sufficient and correction factors have to be applied in a number of basins.Therefore, a redesign of the calibration approach, with additional observations (e.g., including TWS variations as derived from GRACE gravity fields), other calibration objectives and adjustment of more model parameters (without correction factors) is planned.
The improved representation of hydrological processes of WaterGAP within the last decade led to a more complex model structure.In most cases, those modifications resulted in a better fit to observed river discharge.However, in some parts of the world, model performance is still not satisfactory due to an inappropriate modeling of certain processes such that further changes of the model structure are required.For example, the modeled discharge seasonality in the Amazon basin is shifted compared to that observed, which is suspected to be caused by inappropriate modeling of the temporal variations of inundations and the neglect of backwater effects.The reservoir operation algorithm does not yet take into account the construction year of the dam.Moreover, model results in semiarid and arid regions are poor, and improved modeling of evaporation from ephemeral ponds is planned.

A2 Snow
The change of snow water storage S sn (mm) over time t (d) is calculated as where P sn is precipitation, falling as snow at temperatures below 0 ) and E sn is sublimation (mm d −1 ).Snow accumulation and melt are modeled on a 3 arcmin subgrid (100 subgrid cells per 0.5 • ) using a degree-day algorithm (Schulze and Döll, 2004).Mean subgrid elevation was derived from GTOPO30 (US Geological Survey, 2003).The daily temperature for each subgrid cell is calculated from the temperature of the 0.5 • cell, applying an adiabatic lapse rate of 0.6 • C per 100 m.To avoid excessive snow accumulation, temperature does not decrease if a snow water equivalent of 1000 mm is reached in one subgrid.
At temperatures below 0 • C, all precipitation is assumed to fall and accumulate as snow.At subgrid temperatures T ( • C) above melting temperature T m (0 • C) and if snow storage is present, snow melts with land cover specific degree-day factor D F (mm d −1 • C −1 ) (Table A2) as Instead of using one specific albedo for snow as in previous versions (α = 0.4), land cover specific snow albedo values are used to account for differences in reflective properties between the land use classes under snow-covered conditions (Table A2).The albedo value switches to snow albedo if the snow water equivalent of the grid cell exceeds 3 mm, i.e., a closed snow cover is assumed.Sublimation E sn is modeled like potential evaporation rate but applying a latent heat of 2.835 (MJ kg −1 ) for temperatures below 0 ) is a function of potential evapotranspiration from the soil E p (mm d −1 ) minus the already evaporated water from the canopy E c (mm d −1 ), actual soil water content in the effective root zone S s (mm) and total available soil water capacity S s,max (mm) as where E p,max is 20 mm d −1 in semiarid and arid regions whereas 10 mm d −1 in grid cells classified as humid, S s,max is the product of total available water capacity in the upper meter of the soil (Batjes, 1996) and the land cover specific rooting depth (Table A2).
Runoff from land R l (mm ḋ−1 ) is calculated after Bergström (1995) as (A11) Dependent on the soil water storage S s , a part of effective precipitation P eff becomes runoff.If the soil water storage is empty, R l = 0.If the soil is completely saturated (at S s,max ), runoff equals effective precipitation.Between these points, the runoff coefficient γ determines the amount of precipitation that converts to runoff.This parameter is used for calibration (see Sect.B1).In urban areas (defined as a separate input map from IMAGE 2.2), 50 % of P eff is directly passed to the river.

A4 Groundwater
Inflow to groundwater storage S g (mm) is groundwater recharge R g (mm d −1 ), whereas outflows are baseflow Q g (mm d −1 ) and net abstractions from groundwater NA g (mm d −1 ) (Appendix C), which can also act as inflow (e.g., as additional groundwater recharge due to irrigation with surface water).Groundwater recharge R g (mm d −1 ) is calculated as a fraction of runoff from land: where R g,max is soil texture specific maximum groundwater recharge (mm d −1 ) (with values of 7/4.5/2.5 for sandy/loamy/clayey soils) and f g is the groundwater recharge factor (ranging between 0 and 1) related to relief, soil texture, aquifer type and the existence of permafrost or glaciers.For a detailed description see Döll and Fiedler (2008).If a grid cell is defined as arid and has coarse (sandy) soil, groundwater recharge will only occur if precipitation exceeds a critical value of 12.5 mm d −1 .Both values, R g,max and the precipitation threshold, are adapted to the climate forcing used (WFD) aiming to reach comparable groundwater recharge patterns to Döll and Fiedler (2008) as that groundwater recharge estimation is confirmed by experts within the WHYMAP (World-wide Hydrogeological Mapping and Assessment Programme; http://www.whymap.org)efforts.Within CLIMATE, the original values 5/3/1.5 for R g,max and 10 mm d −1 as precipitation threshold were used.
The outflow is modeled with k g = 0.01 d −1 as The runoff from land R l , which is not groundwater recharge R g , represents the fast surface runoff R s and is routed, together with Q g , through a series of different storages representing wetlands, lakes and reservoirs until reaching the river segment (Fig. A1).Outflow is in principle modeled like groundwater outflow (Appendix A4) for "local" lakes and wetlands, whereas "global" lakes and wetlands are linear storages whose equations are solved analytically.WaterGAP 2.2 does not consider variable land/water fractions as would be expected when a lake is shrinking due to evaporation and land surface increases; thus Hunger and Döll (2008) introduced a reduction parameter which reduces the evaporation when lake/wetland storage is low.In Water-GAP 2.2, for all surface water bodies the reduction factor r (−) is calculated as

Hydrol
where S is actual water body storage (m 3 ), S max is maximum water body storage (m 3 ) and p is the reduction exponent (−).As no truly global data set on lake volumes is available, the maximum storage capacity is determined by multiplying the surface area with an "active" depth (set to 5 and 2 m for lakes and wetlands, respectively).Values for p are 3.32 for lakes and wetlands which means a reduction of evaporation by 10 % if storage is halved and 2.81 for reservoirs, which means a reduction 15 % if storage is half of the maximum storage capacity (and a reduction of 50 % if storage is reduced to 20 % of storage capacity).
The distribution of wetlands is derived from GLWD (Lehner and Döll, 2004) as percentage of cell coverage.Locations and attributes of lakes and reservoirs are based on a combination of GLWD and a preliminary version of the GRanD (Global Reservoir and Dam) database (Döll et al., 2009;Lehner et al., 2011).In total, 6553 reservoirs, 52 regulated lakes (lakes whose outflow is regulated by a dam) (from GRanD) and 242 798 unregulated lakes (from GLWD) were considered.Out of these, 1386 large lakes (area ≥ 100 km 2 ), 1110 large reservoirs (storage capacity ≥ 0.5 km 2 ) and 52 regulated lakes (area ≥ 100 km 2 or storage capacity ≥ 0.5 km 2 ) were classified as "global"; i.e., they receive inflow not only from the grid cell itself but also from upstream ("global" wetlands are defined in the same way; see Fig. A1).All other surface water bodies were classified as "local".If "global" lakes or reservoirs cover more than one grid cell, the water balance of the whole surface water body is calculated at the outflow cell.

A6 Lateral routing
The global drainage direction map DDM30 (Döll and Lehner, 2002) is used to route the discharge through the stream network until it reaches the ocean or an inland sink.Fast runoff R s = R l − R g is routed to the surface storages without any delay, whereas baseflow Q g is a function of groundwater storage (Fig. A1, Appendix A4).Due to limited information on groundwater flow between grid cells, the groundwater recharge can only contribute to groundwater runoff of the same grid cell.Verzano et al. (2012) improved the routing by introducing a variable flow velocity approach based on the Manning-Strickler equation.The roughness coefficient is calculated after Cowan (1956) by using different physiographic parameters and information about rural and urban areas.The hydraulic radius is calculated using actual discharge of the cell and empirical relationships of river width and depth at bankfull flow conditions.Bankfull conditions are assumed to correspond to the 1.5-year maximum series annual flow (Schneider et al., 2011) and were accordingly calculated from daily discharge time series for the global land surface.Riverbed slopes were calculated based on the Hy-droSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) drainage direction map (Lehner et al., 2008) and a meandering ratio (method is described in Verzano et al., 2012).
The reservoir algorithm of Hanasaki et al. (2006), distinguishing irrigation and nonirrigation reservoirs and considering 1109 reservoirs was implemented and improved by Döll et al. (2009) and slightly adapted in WaterGAP 2.2: if reservoir storage falls below 10 % of storage capacity, the release coefficient is set to 0.1 instead of 0.0 as in Döll et al. (2009), assuring that at least some water is released, e.g., for downstream ecosystem demands.(Scurlock et al., 2001) as in WaterGAP this class is mainly in the tropics; c mean value from TeDBL and TrDBL (Scurlock et al., 2001); d mean value of all forest classes.Fraction of deciduous plants and L reduction factor for evergreen plants based on IMAGE (Alcamo et al., 1998), initial days to start/end with growing season are estimated.Maniak (1997), WMO (1994).

Figure 2 .
Figure 2. Land cover maps with a spatial resolution of 0.5 • used as WaterGAP input based on MODIS observations for the year 2004 (variant STANDARD) (a), land cover derived from USGS GLCC, but CORINE for Europe, reflecting land cover distribution around the year 2000 (variant LANDCOVER) (b), and identification of grid cells where land cover class has changed due to different input data (c).

Figure 4 .
Figure 4. Renewable water resources (mean annual runoff from each cell if water use is neglected) calculated by the WaterGAP 2.2 NoUse variant (a) and differences to other variants (variants here run without considering water use) (b-e).

Figure 5 .
Figure 5. Discharge seasonality for selected basins and the calibrated model variants.Values for NoCal are only visible if they are in the range of calibrated model variants.

Figure 6 .
Figure 6.Nash-Sutcliffe efficiency (E NS ) (excluding outliers) of monthly observed and simulated discharge at 1319 stations used for calibration.

Figure 7 .
Figure 7. Seasonal variation of total water storages (TWS) for STANDARD (a) and as difference maps (mm) to all other model variants (b-f).

Figure 8 .
Figure 8. Range of calibration parameter γ through all four calibrated model variants (calculated as γ max − γ min ) showing the general sensitivity to input data and model structure.White colors indicate uncalibrated regions.

Figure 9 .
Figure 9. Spatial distribution of Nash-Sutcliffe Efficiency (E NS ) classes (from Table 4, 1: E NS > 0.7, 2: 0.5 < E NS < 0.7, 3: E NS < 0.5) for STANDARD (a), and differences of model variants (calculated as STANDARD E NS class minus that of the model variant) (b-f).Red colors indicate a decrease, green an increasing E NS when using the model variant compared to STANDARD.
Surface water bodies (inland freshwater such as wetlands, lakes and reservoirs) play an important role in the hydrologic cycle, e.g., for evaporation and the lateral transport.In general, surface water body storages S (m 3 ) increase with inflow I (m 3 d −1 ) from other storages or from upstream (see Fig.A1), and are reduced by the outflow Q (m 3 d −1 ).Additionally, the water balance of the water body itself B (m 3 d −1 ) is calculated as B = P − E w , where P is precipitation (m 3 d −1 ) and E w is potential evaporation of open water surfaces (m 3 d −1 ) applying an albedo of 0.08.Finally, net abstractions of surface water NA s (m 3 d −1 ) are considered, resulting in the storage equation:dS dt = I − Q + B − NA s .(A14)

Table 2
lists global values for various components of the global water balance and changes in total water storages (TWS) (calculated excluding Antarctica, Greenland and inland sinks) as estimated by the different model variants.Global values vary mainly due to calibration and selected climate forcing.For interpreting Table2and Fig.3it is important to mention that AET does not include additional evapotranspiration caused by irrigation and other human water use.This part of evapotranspiration is called actual water consumption (WC a ).For computing global values of AET and renewable water resources (RWR), the values were adjusted in calibration basins using the station correction factor (CFS) such that a closed global water balance is achieved (for calibration details see Appendix B).Grid cell values ofAET and  RWR (Figs. 3, 4), however, do not reflect CFS to avoid physically implausible values that likely result from inconsistencies between precipitation data and observed river discharge.Global precipitation P is about 1900 km 3 yr −1 (or 1.7 %) higher when using the CLIMATE model variant which results in an equal increase of discharge compared to STAN-DARD.Except for NoCal, global AET (calculated as sum of E c , E sn , E s and E w , see Appendix A) does not vary considerably among the variants.In general, discharge to oceans and inland sinks is lower by the amount of change in AET.WC a (row 4 in Table 2) varies due to the demand of surface Hydrol.Earth Syst.Sci., 18, 3511-3538, 2014 www.hydrol-earth-syst-sci.net/18/3511/2014/

Table 2 .
Long-term-average (1971Long-term-average ( -2000) )freshwater fluxes from global land area (except Antarctica and Greenland) of WaterGAP 2.2 (in km 3 yr −1 ).Cells representing inland sinks were excluded but discharge into inland sinks was included.is110 309 km 3 yr −1 in WFD and 110 812 km 3 yr −1 in WFDEI; b AET does not include evapotranspiration caused by human water use, i.e., actual water consumption WC a ; c including anthropogenic water use (except NoUse); d if not enough water is available, demand is not completely satisfied; e demand that needs to be satisfied (water use model output); f negative values indicate that return flows from irrigation with surface water exceed groundwater abstractions; a Mean annual P g TWS of 31 December 2000 minus TWS of 31 December 1970 divided by 30 years; h STANDARD but no subtraction of water use; i.e., discharge into oceans and inland sinks equals renewable water resources.

Table 3 .
Mean change in water storage in different compartments between 31 December 1970 and 31 December 2000 (in km 3 yr −1 ; global sum except Antarctica and Greenland).Cells representing inland sinks were excluded.In WaterGAP, increase of soil water storage by irrigation is not taken into account such that storage values for STANDARD and NoUse variants are the same.* * Not applicable as reservoirs are treated as global lakes.

Table 4 .
Number of calibration basins per E NS category and Köppen-Geiger climate zone * .temperate climate, D: snow climate and E: polar climates.Note that the number of basins per climate zone differs for CLIMATE as here the bases for Köppen-Geiger climate calculation are CRU TS 3.2 and GPCC v6 instead of WFD/WFDEI climate input for all other variants.

Table 5 .
Comparison of diverse estimates of global actual evapotranspiration and discharge (in km 3 yr −1 ).1.35 mm d −1 based on a land area of 130 922 × 10 6 km 2 .b 1.59 mm d −1 based on a land area of 130 922 × 10 6 km 2 (value taken from Mueller et al., 2013 as no area is given in Mueller et al., 2011).c Sum of AET and WC a . a

Table 6 .
The three model variants with the largest differences to the STANDARD variant (dSTA) regarding global freshwater fluxes (Q and AET) and total water storage trends (dTWS/dt) (from Table2; values in km 3 yr −1 ) as well as median E NS for monthly time series of river discharge at the 1319 calibration basins.

Table 7 .
Rank of model variants where global land area (except Greenland and Antarctica) is affected most based on a threshold which represents the 10th percentile of averagedglobal grid cell values for AET and discharge.

Müller Schmied et al.: Sensitivity of simulated global-scale freshwater fluxes and storages A3 Soil
Like snow and canopy, the change of soil water storage S s (mm) over time t (d) is calculated as one layer asdS s dt = P eff − R l − E s , • C and 2.501-0.002361×T (MJ kg −1 ) above 0 • C.H.
Schematic structure of the water fluxes and storages as computed by WGHM within each 0.5 • grid cell.Boxes represent water storage compartments, arrows water fluxes (inflows, outflows).Numbers at net abstraction from surface waters (NA s ) are the order in which storage water is abstracted until demand is satisfied.

Table A1 .
Parameters of the leaf area index model.No. Land cover type L max Fraction of L reduction factor Initial days to L max is assumed to be the mean value of land cover classes of Scurlock et al. (2001), TeENL and BoENL; b only value for TrEBL and not TeEBL a

Table A2 .
Attributes for IGBP land cover classes used in WaterGAP 2.2 for all model variants, compiled from various literature sources.Water has an albedo of 0.08, snow 0.6.Adapted from the IMAGE model(Alcamo et al., 1998); b Wilber et al. (1999); c a