Assessment of precipitation error propagation in multi-model global water resource reanalysis

. This study focuses on the Iberian Peninsula and investigates the propagation of precipitation uncertainty, and its interaction with hydrologic modeling, in global water resource reanalysis. Analysis is based on ensemble hydrologic simulations for a period spanning 11 years (2000–2010). To simulate the hydrological variables of surface runoff, subsurface runoff, and evapotranspiration, we used four land surface models (LSMs) – JULES (Joint UK Land Environment Simulator), ORCHIDEE (Organising Carbon and Hydrology In Dynamic Ecosystems), SURFEX (Surface Exter-nalisée), and HTESSEL (Hydrology – Tiled European Centre for Medium-Range Weather Forecasts – ECMWF – Scheme for Surface Exchanges over Land) – and one global hydrological model, WaterGAP3 (Water – a Global Assessment and Prognosis). Simulations were carried out for ﬁve precipitation products – CMORPH (the Climate Prediction Center Morphing technique of the National Oceanic and Atmospheric Administration, or NOAA), PERSIANN (Precipita-tion Estimation from Remotely Sensed Information using Artiﬁcial Neural Networks), 3B42V(7), ECMWF reanalysis, and a machine-learning-based blended product. As a reference, we used a ground-based observation-driven precipitation dataset, named SAFRAN, available at 5 km, 1 h resolution. We present relative performances of hydrologic variables for the different multi-model and multi-forcing scenarios. Overall, results reveal the complexity of the interaction between precipitation characteristics and different modeling schemes and show that uncertainties in the model simulations are attributed to both uncertainty in precipitation forcing and the model structure. Surface runoff is strongly sensitive to precipitation uncertainty, and the degree of sensitivity depends signiﬁcantly on the runoff generation scheme of each model examined. Evapotranspiration ﬂuxes are comparatively less sensitive for this study region. Finally, our results suggest that there is no single model–forcing combination that can outperform all others consistently for all variables examined and thus reinforce the fact that there are signiﬁcant beneﬁts to exploring different model structures as part of the overall modeling approaches used for water resource applications.


Introduction
Improved estimation of global precipitation is important to the analysis of continental water resources and dynamics. Over the past few decades, several studies have described the use of different precipitation algorithms to develop precipitation products (http://ipwg.isac.cnr.it/algorithms.html, last access: 31 March 2019 and http://reanalyses.org, last access: 31 March 2019) at high spatial and temporal resolution on a quasi-global scale and for different hydrological applications, such as flood early warning and control and drought monitoring (Hong et al., 2010;Wu et al., 2012;Vernimmen Published by Copernicus Publications on behalf of the European Geosciences Union. et al., 2012, amongst others). Precipitation estimates suffer, however, from various sources of error that consequently impact hydrologic investigations (Mei et al., 2015(Mei et al., , 2016Seyyedi et al., 2014Seyyedi et al., , 2015Bhuiyan et al., 2017;Nikolopoulos et al., 2013).
Over the last decade, an increasing number of studies have contributed to the development of global precipitation estimation (Pan et al., 2010;Beck et al., 2017a;Kirstetter et al., 2014;Carr et al., 2015;Dee et al., 2011) aiming at the overall improvement of the hydrological applications and global water resource reanalysis. Numerous models of varying complexity can be used to generate an array of hydrological products from precipitation forcing datasets (Vivoni et al., 2007;Ogden and Julien, 1994;Carpenter et al., 2001;Borga, 2002;Schellekens et al., 2017). Different hydrological models have different applications depending on the spatial and temporal scales of interest as well as the simulated variables of interest, such as subsurface runoff, surface runoff, and evapotranspiration. Past studies (Fekete et al., 2004;Biemans et al., 2009) have revealed that the uncertainty in simulated hydrological variables mainly depends on the uncertainty in precipitation and model parametrization and suggested subsequent exploration of different model structures as part of the overall modeling approach.
So far there are several studies that have analyzed uncertainty in precipitation forcing and its impact on hydrologic simulations by usually evaluating hydrologic simulations based on multiple forcing applied to a single model (Falck et al., 2015;Bitew et al., 2012;Behrangi et al., 2011;Mei et al., 2016;Bhuiyan et al., 2018;Gelati et al., 2018 among others). On the other hand, there are also past studies that have evaluated the model structural uncertainty and its impact on hydrologic simulations, usually by analyzing the simulation outputs from multiple models and a single forcing dataset (Breuer et al., 2009;Haddeland et al., 2011;Gudmundsson et al., 2012;Smith et al., 2013;Huang et al., 2017;Beck et al., 2017b). However, fewer studies have been dedicated to the analysis of the integrated impact of both forcing and model uncertainty on hydrologic simulations, and from the existing ones, most of them were focused on a single hydrologic variable such as streamflow (see, for example, Qi et al. 2016), evapotranspiration (Vinukollu et al., 2011), or a given hydrologic index such as the drought index (Prudhomme et al., 2014;Samaniego et al., 2017). Findings from these past investigations have demonstrated that both forcing and model structure uncertainty have a great impact on hydrologic predictions and therefore highlight that using a multi-model and multi-forcing ensemble is a more appropriate path forward for advancing the use of hydrologic model outputs. This conclusion raises at the same time the need for better understanding, characterizing and quantifying the uncertainty associated with multi-model and multi-forcing hydrologic ensembles. Thus, a better understanding of the ensemble spread of precipitation and simulated hydrological variables is necessary to improve water resource manage-ment and planning. This additionally means that there is also a need to assess hydrologic uncertainty in more than a single variable to be able to have a better and more integrative view on the interaction between forcing uncertainty, model uncertainty, and the hydrologic variable of interest. It will allow us to make hydrologic predictions more effective for water resource applications at a large scale.
This study builds upon a unique numerical experiment that was carried out, as part of the activities of the Earth2Observe project (Schellekens et al., 2017), to investigate the impact of precipitation uncertainty propagation and its dependence on model structure and hydrologic variables. In this investigation, we used different precipitation forcing datasets based on (i) reanalysis, (ii) satellite estimates, and (iii) a "combined" stochastic precipitation dataset (Bhuiyan et al., 2018). To consider model structure and parameters, we examined simulations from five state-of-the-art global-scale hydrological and land surface models (LSMs). With regard to water cycle variables, we evaluated precipitation uncertainty propagation to surface runoff, subsurface runoff, and evapotranspiration fluxes. The study area for this investigation is the Iberian Peninsula, which has precipitation and climate variability due to complex orography influenced by both Atlantic and Mediterranean climates (Rodríguez-Puebla et al., 2001;de Luis et al., 2010;Herrera et al., 2012). The analysis comprised two main parts: (1) performance and sensitivity evaluation of the different model-forcing scenarios and (2) precipitation uncertainty propagation to the hydrological variables. We analyzed hydrological simulation with a comparative assessment of the hydrological products and provided a detailed analysis of uncertainty in hydrological simulations for the different global hydrological and land surface models used in the multi-model global water resource reanalysis. Finally, we examined the performance of precipitation products in hydrological applications and potential uncertainty attributed to precipitation error propagations.
The paper is structured as follows. Section 2 presents the different types of forcing datasets used for the study, and Sect. 3 details the methodology we used for our model development and hydrological model analysis. Section 4 summarizes the hydrological results, Sect. 5 discusses the results, and Sect. 6 draws conclusions from the research conducted.

Study area and forcing data
This study is focused on the Iberian Peninsula (Fig. 1). The climate of the peninsula is primarily Mediterranean, being mostly oceanic at northern and semi-arid at southern parts. The topography varies from almost zero elevation to altitudes of 3500 m in the Pyrenees. Table 1 summarizes information and references of meteorological forcing datasets, and a short description is provided below.

Reference precipitation (SAFRAN)
The reference precipitation dataset, hereafter referred to as SAFRAN (Système d'analyse fournissant des renseignements atmosphériques à la neige), was recently created by Quintana-Seguí et al. (2016 using the SAFRAN meteorological analysis system (Durand et al., 1993). Spatially, SAFRAN precipitation data are presented at an hourly timescale on a regular grid with 5 km resolution, spanning 35 years and covering mainland Spain, Portugal, and the Balearic Islands (Quintana-Seguí et al., 2016). SAFRAN used an optimal interpolation algorithm (Gandin, 1966) to produce a quality-controlled gridded dataset of precipitation which combines ground observations and outputs of a meteorological model (Quintana-Seguí et al., 2017). Quintana-Seguí et al. (2017) also compared the different precipitation analyses with rain gauge data and successfully evaluated their temporal and spatial similarities to the observations by obtaining higher correlation ( > 0.75) than other precipitation products. The validation of SAFRAN with independent ground observations proved that SAFRAN is a robust product. On the other hand, several factors -including rainfall intermittency, discrete temporal sampling, and censoring of reference values for required quality -reduce the number of comparison samples for reference and satellite estimates. Therefore, the quality-controlled SAFRAN dataset which is designed to force the land surface model is chosen as a reference dataset for the study area (Quintana-Seguí et al., 2017).

Satellite-based precipitation
Satellite-based simulations were based on three quasi-global satellite precipitation products. Among them, CMORPH (the Climate Prediction Center Morphing technique of the National Oceanic and Atmospheric Administration, or NOAA) is developed from passive microwave (PMW) satellite precipitation fields which are generated from motion vectors derived from infrared (IR) data (Joyce et al., 2004). A neural network technique is used in PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks), where IR observations are connected to PMW rainfall estimates (Sorooshian et al., 2000). Merged IR and PMW precipitation products from NASA are gaugeadjusted for TMPA (Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis), or 3B42V(7), which is available in near-real time and post-real time . The satellite precipitation products have a spatial resolution that is 0.25 • × 0.25 • and a time resolution of 3 h.

Atmospheric reanalysis
The reanalysis product (EI_GPCC) is based on original ERA-Interim 3-hourly data, after rescaling based on GPCC (Global Precipitation Climatology Center) data. Note that total precipitation has been rescaled at the monthly scale with a multiplicative factor to match GPCC version 7 for the period 1979-2013 and GPCC monitoring for 2013-2015.
Data are further downscaled to 0.25 • × 0.25 • grid resolution by distributing the coarse grid precipitation according to CHPclim (Climate Hazards Group Precipitation Climatology) high-resolution information for each calendar month. A similar approach was performed in the generation of ERA-Interim/Land (Balsamo et al., 2015), but using GPCP (Global Precipitation Climatology Project) data. In this study we used GPCC data due to its higher spatial resolution when compared with GPCP.

Combined product
The combined product is based on the application of a nonparametric statistical technique for blending multiple satellite-reanalysis precipitation datasets. Specifically, a machine-learning technique, quantile regression forests (QRF; Meinshausen, 2006), was used to generate stochastically an improved precipitation ensemble at the spatiotemporal resolution of 0.25 • , 3 h. The technique optimally merged global precipitation datasets and characterized the uncertainty of the combined product. Details on the methodology and data used to develop the combined product are presented in Bhuiyan et al. (2018).

Other atmospheric variables
Apart from precipitation forcing, the rest of atmospheric forcing variables required for the hydrologic simulations were derived from the original ERA-Interim 3-hourly data as used in ERA-Interim/Land (Balsamo et al., 2015), bilinearly interpolated to 0.25 • . It includes a topographic adjustment to temperature, humidity, and pressure using a spatially temporally varying environmental lapse rate (ELR) computed similarly to Gao et al. (2012). The correction is the following: (i) relative humidity is computed from the uncorrected forcing, (ii) air temperature is corrected using the ELR and altitude differences (ERA-Interim topography versus 0.25 topography), (iii) surface pressure is corrected assuming the altitude difference and updated temperature, and (iv) specific humidity is computed using the new surface pressure and temperature assuming no changes in relative humidity.

Hydrological simulations
The hydrological simulations for this study were carried out by different collaborators within the framework of Earth2Observe, a project funded by the European Union (EU) using a number of global-scale land surface and hydrological models. In this study, simulations from four land surface models -JULES (Joint UK Land Environment Simulator), ORCHIDEE (Organising Carbon and Hydrology In Dynamic Ecosystems), SURFEX (Surface Externalisée), and HTESSEL (Hydrology -Tiled European Centre for Medium-Range Weather Forecasts -ECMWF -Scheme for Surface Exchanges over Land) -and one global hydrological model, the distributed global hydrological model of the WaterGAP3 (Water -a Global Assessment and Prognosis) modeling framework were considered. The models were already evaluated at all timescales, from daily to multi-annual. The timescale of the evaluation was mostly driven by the data availability. All the land surface models in the study were global models, built originally to work in coupled mode with atmospheric models. The "regionalization", or calibration of hydrological parameters at particular catchments or regions of these models, is an exercise that the different modeling groups or communities are certainly performing but was out of the scope of this study. All models were forced with the various precipitation datasets described in the previous section for an 11-year period (March 2000-December 2010). A summary of some basic characteristics of the models structure is presented in Table 1 and a short description is provided below. For more details on the modeling systems, the interested reader is referred to Schellekens et al. (2017) and references therein. Clark et al., 2011) is a physically based land surface model. JULES uses an exponential rainfall intensity distribution to calculate throughfall through the canopy first (altered by interception), then the water reaching the surface is divided into infiltration into the soil and surface runoff. Surface runoff is generated either through infiltration excess or saturation excess. Infiltration excess runoff is generated by JULES if the water flux reaching the surface exceeds the maximum infiltration rate of the soil (based on the saturated hydraulic conductivity). If the water flux reaching the surface over a time step (either rainfall, throughfall, or snowmelt) reaches a maximal infiltration rate, then infiltration excess runoff will be generated. This maximal infiltration rate in JULES is the saturated hydraulic conductivity multiplied by a vegetation-dependent parameter (four for trees and two for grasses). Saturation excess runoff is based on sub-grid soil moisture variability, as a fraction of the grid is saturated and water flux over this fraction is converted to surface runoff (probability distribution model; Blyth, 2002). Once infiltrated into the soil, water flows through the column, resolved using Darcy's law and Richards' equation. Subsurface runoff is calculated using the free drainage approach, with water flowing at the bottom of the resolved soil column at a rate determined by the soil hydraulic conductivity. There was no groundwater table in this version of JULES. The condition at the bottom of the resolved soil layers (3 m) was assumed to be free drainage. The soil hydraulic characteristics were calculated applying pedotransfer functions to the soil texture data from the

ORCHIDEE
ORCHIDEE (Krinner et al., 2005) is a complex land surface scheme that consists of a hydrological module, a routing module (Ngo-Duc et al., 2007), and a floodplain module (d'Orgeval et al., 2008). It also describes the vegetation dynamics and biological cycles, but these features were not activated for the present study. The most relevant parametrization of ORCHIDEE for the sensitivity of the model to rainfall is the one for partitioning between infiltration and surface runoff. In order to represent correctly the fast progression of the moisture front during a rainfall event when the time step is above 15 min, a time-splitting procedure is used (d'Orgeval, 2006). The parametrization also takes into account reinfiltration in the case of slopes below 0.5 % or dense vegetation. We have chosen to spread the entire 3-hourly rainfall over 1.5 h in these simulations. In terms of ancillary data, a vegetation map (IGBP; Olson classification) and the soil types (FAO, 2003) were used for these simulations. Furthermore, as ORCHIDEE represents sub-grid soil moisture by simulating separately the soil moisture column below bare soil, low and high vegetation, the infiltration process will display different sensitivities in each column.

SURFEX
The SURFEX modeling system of Météo-France (SURFace Externalisée; Masson et al., 2013) includes the ISBA (interactions between soil-biosphere-atmosphere; Noilhan and Mahfouf, 1996) LSM that can be fully coupled to the CNRM (Centre National de Recherches Météorologiques) version of the Total Runoff Integrating Pathways (TRIPs; Oki and Sud, 1998) continental hydrological system (Decharme et al., 2010). This study uses a ISBA multi-layer soil diffusion scheme (ISBA-Dif) as well as its 12-layer explicit snow scheme (Boone and Etchevers, 2001;Decharme et al., 2016). ISBA total runoff is contributed by both the surface runoff and free drainage as a bottom boundary condition soil layer. The soil evaporation is proportional to its relative humidity. Parameters of the ISBA LSM were defined for 12 generic land surface patches: nine plant functional types (namely needleleaf trees, evergreen broadleaf trees, deciduous broadleaf trees, C3 crops, C4 crops, C4 irrigated crops, herbaceous plants, tropical herbaceous plants, and wetlands) as well as bare soil, rocks, and permanent snow and ice surfaces. They were derived from ECOCLIMAP-II, the land cover map used in SURFEX (Faroux et al., 2013). Furthermore, the Dunne runoff (i.e., when no further soil moisture storage is available) and lateral subsurface flow were computed using a topographic sub-grid distribution.

WATERGAP3
The modeling framework WaterGAP3 is a tool for assessing the global freshwater resources on 30 min spatial resolution. It combines a spatially distributed rainfall-runoff model with a large-scale water quality model as well as models for five sectorial water uses (Flörke et al., 2013;Döll et al., 2009). Effective precipitation -calculated as superposed effects of snow accumulation, snowmelt, and interception -is split into (i) a fraction that fills up a single-layer soil moisture storage and (ii) a fraction that comprises surface runoff and groundwater recharge. Groundwater recharge is the input of a single linear groundwater reservoir that is drained by base flow. Water for evapotranspiration, estimated with the Priestley-Taylor approach, is abstracted from the soil storage. The Wa-terGAP3 setting used in this study was calibrated and validated against measured river discharge from 2446 stations of the Global Runoff Data Centre data repository (Weedon et al., 2014). Therefore, calibration only concerns the separation of effective precipitation into runoff and soil moisture filling. See Eisner (2015) for a detailed model description with additional data required for the model, such as soil types and the groundwater table.

HTESSEL
The LSM HTESSEL is part of the ECMWF numerical weather prediction model. The model represents the temporal evolution of the snowpack, soil moisture and temperature, and vegetation water content as well as the turbulent exchanges of water and energy with the atmosphere. HTES-SEL considered soil texture, vegetation type and cover, and mean annual climatology of the leaf area index and albedo (12 maps for each calendar year) for the simulations (FAO, 2003). The soil column is discretized in four layers (7, 21, 72, and 189 cm thickness), and the unsaturated vertical movement of water follows Richards' equation and Darcy's law. The van Genuchten formulation is used to derive the diffusivity and hydraulic conductivity using six predefined soil textures. In the case of partially or fully frozen soil, the water movement in the soil column is limited by reducing the diffusivity and hydraulic conductivity. The model assumes free drainage as a bottom boundary condition (subsurface runoff), while the top boundary condition considers precipitation minus surface runoff and bare ground evaporation. Evapotranspiration is removed from the different soil layers following a prescribed root distribution (dependent on the vegetation type). Surface runoff generation is estimated as a function of the local orography variability, soil moisture state, and rainfall intensity. Soil saturation state and rainfall intensity define the maximum infiltration rate which is modulated by a variable infiltration rate related to orography variability (Balsamo et al., 2009).

Evaluation metrics
To examine the magnitude and variability of differences among hydrological variables, we used the relative difference (RD), defined as where y i denotes reference variables (SAFRAN-driven simulations) andŷ i denotes simulated variables (based on the other forcing data considered) for each time step i. RD indicates the magnitude and direction of error with positive (negative) value indicating overestimation (underestimation). The RD of annual average estimates of the precipitation forcing and different hydrological variables was calculated using daily datasets at the spatial resolution of 0.25 • . Moreover, cumulative probability of estimated annual average relative differences among precipitation forcings and the simulated hydrological variables were calculated using same spatial resolutions of 0.25 • .
To collectively assess the performance of various precipitation forcing datasets, models, and simulated hydrological variables, we used a normalized version of the Taylor diagram (Taylor, 2001). Specifically, we normalized the values of the centered root-mean-square error (CRMSE) and the standard deviation with the standard deviation of the reference. Therefore, the reference (that is, the target point to which the model outputs should be closest) corresponds to the point on the graph with the normalized CRMSE equal to zero, while both the correlation coefficient and normalized standard deviation equal 1. The normalized Taylor diagrams summarized model results for two different temporal scales (3-hourly and daily) at the spatial resolution of 0.25 • .
To evaluate the degree of variation of various precipitation datasets and simulated hydrological variables, we used the coefficient of variation (CV) and coefficient of variation ratio (CV r ). The CV and CV r are determined using all precipitation forcing and variables examined at the 0.25 • daily resolution. The CV is a measure of variability defined as the ratio of the standard deviation to the mean. To compare the degree of variation from one data series to another, we used the CV where we considered distributions with CV < 1 to be low variance, while we considered those with CV > 1 to be high variance. We defined CV r as the ratio of the CV of model to the CV of reference. The defined parameters are expressed as follows: The CV m and CV o indicate the coefficient of variation of the model and the coefficient of variation of reference, with the means m and o and standard deviations σ m and σ o , respectively. The CV r includes two components: the ratio of the means and ratio of the standard deviation. Details on the statistical metrics, including name conventions and mathematical formulas, are provided in the Appendix.

Metrics of uncertainty propagation
The random error component was based on the normalized centered root-mean-square error (NCRMSE). To demonstrate how error in precipitation forcing translates to error in the simulated hydrological variables -surface runoff (Qs), subsurface runoff (Q sb ), and evapotranspiration (ET) -we used the NCRMSE error metric ratio as follows: where NCRMSE is the normalized centered root-meansquare error and α NCRMSE is NCRMSE error metric ratio at multiple temporal (3-hourly and daily) and spatial (0.25 • ) resolutions. The α NCRMSE metric quantifies the changes in the random error from precipitation to simulated hydrological variables (Q s , Q sb , and ET) and can thus be used to assess magnification (α NCRMSE > 1) or damping (α NCRMSE < 1).

Analysis of ensemble spread
To assess how variability in the precipitation ensemble translates to variability of the various hydrological simulations (Q s , Q sb , and ET) for the different modeling systems, we performed an analysis of ensemble spread ( ), formulated as in which X max and X min represent, respectively, the maximum and minimum of ensemble values at each time step, while Y is the corresponding value of the reference. Here, the members of ensemble constitute a sequence for each time step (X 1 , X 2 . . ..X 20 ). The ensemble spread ( ) is calculated at the monthly scale for the combined product and simulated hydrologic variables. Note that the combined product is an ensemble-based precipitation product; for the evaluations presented in this study, we use the ensemble mean as forcing. For the analysis and propagation of the precipitation ensemble spread to hydrologic simulations, we used 20 ensemble members, which are generated stochastically by the QRF tree-based regression model (Meinshausen, 2006). provides a measurement of the expected prediction intervals relative to the reference value. The value of 1 indicates the maximum possible uncertainty of the prediction interval. To achieve accurate and successful prediction, comparatively small prediction intervals are expected.

Variability of multiple hydrological model simulations
To examine the magnitude and variability of the differences among both models and forcing datasets, we analyzed the multi-model simulation results for three hydrological variables, including surface runoff (Q s ), subsurface runoff (Q sb ), and evapotranspiration (ET). Throughout this analysis, we used the SAFRAN-based simulation as the reference for comparison. Figures 2 to 5 present spatial maps of annual average values for each model, along with the relative differences of annual average estimates of precipitation forcing and the different hydrological variables for all the precipitation forcing datasets and models. The relative differences in precipitation forcing (Fig. 2) exhibited considerable spatial variability for satellite precipitation forcing (relative difference > 20 %) and relatively lower variability for EI_GPCC and the combined product. Examination of SAFRAN-based annual average values of surface runoff shows that Water-GAP3 estimates considerably higher surface runoff than the rest of the models, particularly in the northern and northwestern part of the study area (Fig. 3). Consequently, subsurface runoff (Fig. 4) and evapotranspiration (Fig. 5) from Water-GAP3 were lower in that part of the study area. All these results display substantial differences in the spatial pattern of relative differences, which suggests that simulations are sensitive to both precipitation forcing and model uncertainty.
Certain models seem to be more sensitive for given variables. For example, HTESSEL and ORCHIDEE are the models with the largest relative difference of Q s , and both models exhibited different behaviour relative to the other models when forced by the satellite precipitation. This suggests a distinct structural difference in the way precipitation is partitioned into surface-subsurface runoff between the two groups. Looking at the variability of results for combined and reanalysis (EI_GPCC) forcing datasets, no substantial differences occurred between reference and simulated surface runoff (Q s ). However, for the satellite-based simulations, there were significant deviations. Specifically, the CMORPH-based simulation showed significant overestimation for ORCHIDEE and HTESSEL, but this pattern was reversed for JULES, SURFEX, and WaterGAP3, an outcome that highlights the impact of model structure on precipitation error propagation.
For subsurface runoff, similar spatial patterns (with respect to Q s ) were exhibited for the reference and the rest of simulations (Fig. 4), which were also affected substantially by precipitation uncertainty. For example, looking at the dif- Figure 2. Map of the annual average relative difference (with respect to SAFRAN) for the different precipitation forcing datasets. ferent model simulations, we can see that WaterGAP3 results reveal the lowest relative differences in Q s for almost all the precipitation forcings. In addition, CMORPH-based simulation underestimated substantially for all the models. Figure 5 presents the spatial pattern of the results for evapotranspiration. For the combined product and EI_GPCC, results were consistent with low relative difference (< 25 %). On the other hand, CMORPH-based simulation showed an overall underestimation and deviated considerably from the results of the other precipitation products. By examining the spatial pattern of relative differences , one can recognize that there is no consistent spatial pattern among the different model-forcing combinations. There are cases where the pattern of the differences is dominated by the pattern of precipitation differences, as, for example, in the case of PER-SIANN, where the maximum number of differences are concentrated in the central and eastern part of the peninsula. While there are other cases where the pattern is dominated by the sensitivity of the model (see, for example, results for ORCHIDEE and 3B42 for surface runoff).
We also present a comparison of cumulative probability of the relative differences among precipitation forcings (Fig. 6) and the simulated hydrological variables (Fig. 7). The distribution of relative differences, both in terms of type (denoted by the shape of the cumulative density function -CDF) and magnitude and differed as a function of precipitation forcing, the model, and the variable considered. The CDF of precipitation relative differences shows that CMORPH deviated significantly from the other precipitation products (Fig. 6). The surface runoff based on ORCHIDEE and HT-ESSEL displayed a clear separation of the CDF for the combined product and EI_GPCC and satellite-based precipitation forcing (Fig. 7). Specifically, it is interesting to note that 3B42V(7) responds very differently to other precipitation forcing datasets for ORCHIDEE, highlighting again the sensitivity of runoff response to precipitation structure (space-time variability) and its dependence on the rainfallrunoff generation mechanism.
Box plots of the relative difference of different hydrological variables for the various forcing datasets and models at the daily scale are shown in Fig. 8. Note the inclusion of the relative difference of precipitation forcing to allow the comparison between relative differences in precipitation with those in the other hydrological variables. For each model, the box plot shows a lower interquartile range (IQR), marking lower variability for Q sb and ET compared to Q s . Results for simulations based on the combined product and EI_GPCC showed less variability than the satellite-based simulations. SURFEX and WaterGAP3 exhibited the lowest variability compared to the other models. Overall, with the exception of few cases (e.g., 3B42V(7) for ORCHIDEE and HTES-SEL and CMORPH for ORCHIDEE), uncertainty reduces progressively from precipitation to surface runoff, subsurface runoff, and finally ET.

Performance of multi-model simulations
The normalized Taylor diagrams summarize the results for two different temporal scales. Figure 9 shows the results for the 3-hourly scale only for the two models with output available at that resolution (JULES and SURFEX), while Fig. 10 presents results at the daily scale for all five models. We aggregated the 3-hourly results from JULES and SUR-FEX to daily to compare them with the nominal daily output of ORCHIDEE, WaterGAP3, and HTESSEL. Results improved with the temporal aggregation in reducing random error for JULES and SURFEX. As shown in Fig. 10, the points for the 3B42V(7) were always the furthest from the refer- ence (NCRMSE > 0.75) with the low correlation coefficient (0.4-0.55), except SURFEX, which means that 3B42V (7) was always associated with the worst performance for all other models. Simulations based on the combined product and EI_GPCC were always consistent, with significantly reduced NCRMSE values in the range of 0.25-0.8 for all the hydrological models. Results for simulated ET are more consistent among the various precipitation forcing datasets, exhibiting normalized standard deviations in the range of 0.8-1.2. NCRMSE reduced significantly (< 0.35) for each forcing dataset; accordingly, the correlation coefficient (CC) also raised considerably (> 0.9), showing a very high degree of agreement with reference-based simulations. For surfacesubsurface runoff, the SURFEX and WaterGAP3 models performed comparatively better than other models by reducing NCRMSE values, especially for the combined product and EI_GPCC.
To illustrate the relative variability between precipitation and individual hydrological variables, we calculated the co-efficient of variation (CV) and the coefficient of variation ratio (CV r ) for all the hydrological models. To provide an understanding of the impact of precipitation uncertainty in hydrological simulations, we produced box plots of the CV and CV r for precipitation forcing datasets and individual hydrological variables for all the models, as shown in Fig. 11. A precipitation-forcing-wise comparison indicates that the combined product and reanalysis underestimated precipitation variability more than other precipitation forcings, which affected the corresponding variability in Q s for all the models except ORCHIDEE. Although there were no significant differences in terms of variability for combined product and reanalysis-based simulations for the four models (JULES, SURFEX, WaterGAP3, and HTESSEL), substantial differences in variability between precipitation and Q s were observed for ORCHIDEE model. Satellite products overestimated precipitation variability, leading to overestimation of the variability of surface and subsurface runoff. The variability of ET was much lower than that of the other vari-  ables examined and well captured in all the simulation scenarios. From the box plots of CV from reference-based simulations, the distributions of ET showed low variability (CV < 1), while the variability for all the other hydrological variables was high (CV > 1). In terms of CV r , the SURFEX model performed very well by producing medians close to 1 (CV r = 1 means ideal consistency) for all the precipitation forcing datasets but CMORPH.

Assessment of precipitation error propagation
To investigate the possible amplification, or dampening, of the precipitation error to the hydrologic variables examined, we quantified the NCRMSE error metric ratio (α NCRMSE ), and results are illustrated in Figs. 12 and 13. For all the scenarios (at 3-hourly and daily scales) and almost all models, α NCRMSE values were less than 1, which highlighted the damping effect on the random error of precipitation in simulated variables. In general, the damping effect increases (i.e., α NCRMSE reduces), moving from surface to subsurface runoff and ET and highlighting once again the interaction between the different runoff-generating mechanisms as well as coupled water-energy balance processes and precipitation uncertainty. Interestingly, the relationship between error propagation among the different hydrologic variables varied greatly between models and precipitation forcing. Values of α NCRMSE for surface and subsurface runoff are generally close for the SURFEX model but distinctly different for satellite-based results of ORCHIDEE and WaterGAP3.

Stochastic precipitation ensemble and corresponding simulated hydrological variable analysis results
The following summarizes the results of our analysis of ensemble precipitation (20 members), generated stochastically according to the algorithm used for the combined product, and their corresponding hydrological simulations. To show the relationship between the precipitation ensemble and simulated hydrological variables (generated ensemble), we presented an analysis of ensemble spread. Figure 14 depicts density plots between ensemble spread of precipitation and the simulated hydrological variables (Q s , Q sb , and ET) at the monthly scale. A strong correlation between ensemble spread of Q s and precipitation is found for almost all models. For the other variables (ET and Q sb ), ensemble spread was significantly narrower and rather independent of the ensemble spread of precipitation, manifested as the horizontal structure of contours in Fig. 14. The ensemble spread of Q s was higher (ORCHIDEE and HTESSEL) or lower (SURFEX and WaterGAP3) depending on the model, elucidating again the impact of the modeling structure on the propagation of precipitation uncertainty. Figure 8. Relative difference presented for the various products and models at daily scale. In each box, the central mark is the median, and the edges are the first and third quartiles.

Discussion
Precipitation from different satellite-reanalysis datasets exhibits considerable differences in pattern and magnitude, which results in significant differences in hydrologic simulations. Results presented in this paper demonstrated clearly that magnitude and dynamics of uncertainty in hydrologic simulations depend not only on the uncertainty of the forcing variable but also on the model and examined hydrologic variable.
For example, surface runoff (Q s ) appears to be highly sensitive to precipitation differences, while ET was not for this semi-arid study region (Figs. 3 to 5). Particularly, ET exhibited reduced sensitivity to precipitation forcing, which potentially suggests that the water volume available for conversion to ET did not deviate significantly among the precipitation scenarios. This is expected for ET because it is primarily controlled by atmospheric demand, plant and soil hydraulic constraints, and solar radiation (Wallace and McJannet, 2010). When sufficient energy is available for rainfall to evaporate directly without contributing to surface-subsurface runoff, simulation of ET is not only affected by precipitation uncertainty but also by other atmospheric constrains.
Consequently, results (Figs. 6 to 7) for ET were more consistent among the various model and precipitation forcing scenarios, indicating a smaller degree of uncertainty in ET (relative to Q s and Q sb ). These results suggest that precipitation has a stronger influence on surface runoff, in particular precipitation intensity, i.e., the same amount of precipitation distributed over 3 h or over 1 d will impact mostly surface runoff, and this is associated with the model representation of this fast process. Similarly, if we look at the distribution of precipitation relative difference, CMORPH tends to decrease in magnitude compared to other precipitation products. Therefore, for subsurface runoff, CMORPH-based simulations displayed a gross underestimation compared to other precipitation forcing.
Precipitation-to-surface-runoff sensitivity is strongly controlled by the corresponding runoff generation scheme in each model. For example, in the case of HTESSEL and OR-CHIDEE, precipitation intensity has a great effect on the generation of surface runoff. The satellite precipitation datasets have higher precipitation intensities (Fig. 6) when compared to the remaining datasets, which explains the different behaviour of these two models. However, in the case of JULES, the infiltration excess mechanism is rarely invoked when the drivers are provided at a 3-hourly time step, as the maximum infiltration rate is not reached. Therefore, the significance of differences that HTESSEL and ORCHIDEE show with more intense rainfall are not shown by JULES due to distinct differences of their corresponding surface runoff generation modules.
Evaluation of the performance of the various simulations, relative to SAFRAN, emphasized the issues due to low correlation and increased random error from satellite products. On the other hand, the reanalysis (EI_GPCC) and combined product resulted in reduction of random error, suggesting that relying on gauge-adjusted reanalysis or blended (satellitereanalysis) products offers improvement relative to satellite products alone.
Certain dynamics resolved from this analysis were generally consistent among different models, such as the fact that uncertainty reduced systematically from precipitation to surface runoff to subsurface runoff and eventually to ET simulations. This is also in accordance with our expectations, given that soil moisture (storage) integrates the precipitation variability in time. Surface runoff exhibits high correlation to precipitation, while uncertainty in subsurface runoff is modulated by storage capacity of the soils. In addition ET is affected only if water availability deviates significantly from the water demand in terms of potential evapotranspiration. Our findings related to the surface runoff uncertainty (due to model structure and precipitation) suggest that the use of Figure 11. Relationship between coefficient of variation and coefficient of variation ratio of simulated hydrological variables and precipitation.
surface runoff (e.g., flash floods diagnostics) should be carefully considered in each application in view of each model formulation.

Conclusions
This study investigated the propagation of precipitation uncertainty in hydrological simulations and its interaction with hydrologic modeling, which was based on satellitereanalysis precipitation forcing of a number of global hydrological and land surface models for the Iberian Peninsula. The following are the major conclusions from this study.
Simulation of surface runoff was shown to be highly sensitive to precipitation forcing, but the direction (that is, overestimation or underestimation) and the magnitude of relative differences indicated strong dependence on the modeling system. Hydrological simulations based on reanalysis and combined product forcing datasets performed better overall than satellite precipitation-driven simulations. Moreover, simulation results using CMORPH as forcing exhibit overall overestimation for ORCHIDEE and HTESSEL, which is totally the opposite to the results from the other models (JULES, SURFEX, and WaterGAP3). These types of differences highlight the complexity of the interaction between precipitation characteristics and different modeling schemes and should be used as a "reference for caution" when generalizing findings produced from single model simulations.
Modeling uncertainty appeared to be much less important for evapotranspiration than for surface and subsurface runoff. The sensitivity of hydrological simulations to different precipitation forcing datasets was shown to depend on the hydrological variable use and model parameterization scheme.  Finally, based on our evaluation of the performance of the different hydrological models and five precipitation products -CMORPH, PERSIANN, 3B42V(7), reanalysis, and the combined product -we could not identify a single model that consistently outperformed others, i.e., certain models appeared to be more successful in the simulation of certain variables.
This study suggests that important benefits may accrue from exploring different model structures as part of the modeling approach. This study assessed the multi-model performances regarding three different hydrologic variables (surface-subsurface runoff and evapotranspiration). Apart from precipitation forcing, other atmospheric forcing variables required for the hydrologic simulations are also essential in investigating the significance of hydrological model uncertainty. In addition, the only calibrated model in this study, WaterGAP3, performs better in specific locations (e.g., hilly) for all the hydrologic variables than other models. Therefore, investigation should be performed in calibrating and regionalizing models for different parameters. Nevertheless, a clear outcome of the current work is that uncertainty in hydrologic predictions is significant and should be assessed and quantified in order to foster the effective use of the outputs of global land surface models and hydrologic models. Considering ensemble representation (e.g., multi-model and multi-forcing) of hydrologic variables provides an appropriate path to address this issue.
Advancing our understanding of precipitation uncertainty, model uncertainty, and their interaction will potentially also aid in the investigation of the impacts of climate change (and associated uncertainty) on hydrological cycle components and water resource systems. Finally, this research provides a fine platform for discussing advances in the applications of different precipitation algorithms, hydrology, and water resource reanalysis.

Appendix A
The statistical metric, the coefficient of variation ratio (CV r ) used in the model evaluation analysis, was computed using the following parameters: Here, o i and m i (i = 1, . . ., N ) are the observed and modeled time series, respectively, of the product for times i, with the means o and m and standard deviations σ o and σ m , respectively; N is the total number of data points used in the calculations. Author contributions. The numerical experiments of this study were conceived during discussions of the Earth2Observe EU project. MAEB developed the blended precipitation product, designed and carried out the analysis of results, and wrote the paper. EIN and ENA co-designed the analysis of results and contributed to the development of the paper. JP contributed to the analysis of results and, together with CA, ED, GF, AMLT, and SM, carried out the hydrological simulations and contributed to the interpretation of results and the writing of the paper.