Evaluation of drought propagation in an ensemble mean of large-scale hydrological models

Hydrological drought is increasingly studied using large-scale models. It is, however, uncertain whether large-scale models reproduce the development of hydrological drought correctly. The pressing question is: how well do large-scale models simulate the propagation from meteorological to hydrological drought? To answer this question, we evaluated the simulation of drought propagation in an ensemble mean of ten large-scale models, both land-surface models and global hydrological models, that participated in the model intercomparison project of WATCH (WaterMIP). For a selection of case study areas, we studied drought characteristics (number of droughts, duration, severity), drought propagation features (pooling, attenuation, lag, lengthening), and hydrological drought typology (classical rainfall deficit drought, rain-to-snow-season drought, wet-to-dry-season drought, cold snow season drought, warm snow season drought, composite drought). Drought characteristics simulated by large-scale models clearly reflected drought propagation; i.e. drought events became fewer and longer when moving through the hydrological cycle. However, more differentiation was expected between fast and slowly responding systems, with slowly responding systems having fewer and longer droughts in runoff than fast responding systems. This was not found using large-scale models. Drought propagation features were poorly reproduced by the large-scale models, because runoff reacted immediately to precipitation in all case study areas. This fast reaction to precipitation, even in cold climates in winter and in semi-arid climates in summer, also greatly influenced the hydrological drought typology as identified by the large-scale models. In general, the large-scale models had the correct representation of drought types, but the percentages of occurrence had some important mismatches, e.g. an overestimation of classical rainfall deficit droughts, and an underestimation of wet-to-dry-season droughts and snow-related droughts. Furthermore, almost no composite droughts were simulated for slowly responding areas, while many multi-year drought events were expected in these systems. We conclude that most drought propagation processes are reasonably well reproduced by the ensemble mean of large-scale models in contrasting catchments in Europe. Challenges, however, remain in catchments with cold and semi-arid climates and catchments with large storage in aquifers or lakes. This leads to a high uncertainty in hydrological drought simulation at large scales. Improvement of drought simulation in large-scale models should focus on a better representation of hydrological processes that are important for drought development, such as evapotranspiration, snow accumulation and melt, and especially storage. Besides the more explicit inclusion of storage in large-scale models, the parametrisation of storage processes also requires attention, for example through a global-scale dataset on aquifer characteristics, improved large-scale datasets on other land characteristics (e.g. soils, land cover), and calibration/evaluation of the models against observations of storage (e.g. in snow, groundwater).

Stahl (2001); Peters (2003); Van Loon and Van Lanen (2012a). Mishra and Singh, 2011; Wang et al., 2011; Stahl et al., 2012). There is, however, little knowledge on the performance of large-scale models in simulating drought development in the large variety of climate zones and catchments around the world (Gudmundsson et al., 2012). Simulating low flow and drought is a challenge, even for catchment-scale models (Smakhtin, 2001; Staudinger et al., 2011). So the question is: how well do large-scale models perform for low flows and drought? An evaluation of large-scale models is needed to estimate the uncertainty related to drought simulation using large-scale models and to guide further improvement of these models. Some first steps in the evaluation of drought simulation by large-scale models were taken by Prudhomme et al. (2011), Stahl et al. (2011a, 2012), and Gudmundsson et al. (2012). They looked at trends and general patterns/statistics of low flows, but most of them did not take into account actual timing and duration of drought events. Only Prudhomme et al. (2011) investigated timing and duration of drought events. However, like Stahl et al. (2011a, 2012) and Gudmundsson et al. (2012), they focused solely on runoff. Drought propagation from meteorological to hydrological drought was not taken into account. Hence, the simulation of processes underlying hydrological drought development (i.e. drought propagation, Fig. 1) by large-scale models has not yet been evaluated. With this study we take a first step towards filling that gap. A correct simulation of these processes is needed, so that we know that large-scale simulations are robust when extrapolating to data-scarce regions (e.g. Stahl et al., 2012) or to the future (e.g. Gosling et al., 2011; Corzo Perez et al., 2011).
In this study, drought is defined as a sustained and regionally extensive period of below-average natural water availability (Tallaksen and Van Lanen, 2004). We focus on the development of hydrological drought, which is a drought in groundwater and/or discharge (Fig. 1). Hydrological drought is a recurring and worldwide phenomenon, with spatial and temporal characteristics that vary significantly from one region to another (Tallaksen and Van Lanen, 2004). Some of the most studied drought characteristics are number of droughts, drought duration, and drought deficit (Hisdal et al., 2004; Fleig et al., 2006; Sheffield and Wood, 2011). Not only do drought characteristics vary per region, but the way a drought propagates from a precipitation and/or temperature anomaly to a hydrological drought also differs around the world (Tallaksen and Van Lanen, 2004; Mishra and Singh, 2010; Van Loon et al., 2010). The flow chart in Fig. 1 demonstrates the propagation of drought and how it depends on meteorological factors like precipitation and temperature (similar illustrations can be found in e.g. Changnon Jr., 1987; Tallaksen and Van Lanen, 2004; Sheffield and Wood, 2011, however without making a distinction between rain and snow seasons in cold climates). Despite these different ways that a hydrological drought can develop from the meteorological situation, some drought propagation features are common to all hydrological droughts (Eltahir and Yeh, 1999; Peters et al., 2003; Van Lanen et al., 2004; Van Loon et al., 2011b; Van Loon and Van Lanen, 2012a):
- meteorological droughts are combined into a prolonged hydrological drought (pooling);
- meteorological droughts are attenuated in the stores (attenuation);
- a lag occurs between meteorological, soil moisture and hydrological drought (lag);
- droughts become longer moving from meteorological to soil moisture to hydrological drought (lengthening).
These drought propagation features manifest themselves in different ways dependent on catchment characteristics and climate (Van Lanen et al., 2004, 2012). This results in different hydrological drought types, dependent on the interplay between precipitation, temperature, and catchment characteristics. Van Loon and Van Lanen (2012a) distinguish six different hydrological drought types in their hydrological drought typology: (i) classical rainfall deficit drought, (ii) rain-to-snow-season drought, (iii) wet-to-dry-season drought, (iv) cold snow season drought, (v) warm snow season drought, and (vi) composite drought.
The above-mentioned elements of drought propagation, i.e. drought characteristics, drought propagation features, and drought typology, can be used as tools to evaluate the simulation of drought propagation by large-scale models. In hydrology, often only one single large-scale model is used with its specific advantages and disadvantages (e.g. Lehner et al., 2006; Sheffield and Wood, 2007; Döll and Zhang, 2009; Hurkmans et al., 2009; Mishra and Singh, 2010; Sutanudjaja et al., 2011). In several studies, however, the multi-model ensemble of a number of large-scale models was closer to observations than most participating models individually, both in general hydrological studies (e.g. Gao and Dirmeyer, 2006; Guo et al., 2007) and in low flow and drought research (e.g. Gudmundsson et al., 2012; Stahl et al., 2011b). Therefore, in this study, we investigated a multi-model ensemble, as was previously done in some other drought studies (Wang et al., 2009, 2011; Gudmundsson et al., 2012; Stahl et al., 2012; Van Huijgevoort et al., 2012a). The aim of this paper is explicitly not to compare individual models or model approaches, but to see whether large-scale models in general can reproduce drought propagation. Therefore, output from individual models is not shown; only the multi-model ensemble with ranges of daily minimum and maximum is presented.
The objective of this study is to evaluate the simulation of drought propagation in large-scale hydrological models. To reach this objective, we used a global meteorological dataset (Sect. 2.1.1) and hydrological data from an ensemble of ten large-scale models (Sect. 2.1.2), selected a number of case study areas with contrasting climate and catchment characteristics (Sects. 2.2.1 and 2.2.2), and studied drought development in those areas in detail (Sects. 2.2.3 and 2.2.4). The focus is not on individual drought events, but on general phenomena, i.e. (i) drought characteristics (Sect. 3.1), (ii) drought propagation features (Sect. 3.2), and (iii) drought typology (Sect. 3.3). Individual drought events of specific case study areas are only included as examples to illustrate these general phenomena. In Sect. 4, we discuss our methodology and results and in Sect. 5 we summarize and conclude this study.

Data and methods
In this study, we used data from a large-scale meteorological dataset and from a suite of large-scale hydrological models. These large-scale data were extracted and post-processed in a number of steps. Subsequently, we performed drought analysis on the hydrometeorological data and applied the hydrological drought typology.

Meteorological data
The large-scale meteorological data used in this study were obtained from the WATCH Forcing Data (WFD, Weedon et al., 2011). This dataset consists of gridded time series of meteorological variables (e.g. rainfall, snowfall, temperature, wind speed) on a daily basis for 1958-2001. The data have a spatial resolution of 0.5° based on the CRU land mask.
The WFD originate from modification (e.g. bias correction and downscaling) of the ECMWF ERA-40 re-analysis data (Uppala et al., 2005). The data have been interpolated and corrected for the elevation differences between the grids. For precipitation, the ERA-40 data were first adjusted to have the same number of wet days as CRU (Brohan et al., 2006). Next, the data were bias-corrected using monthly GPCC precipitation totals (Schneider et al., 2008) and, finally, gauge-catch corrections were applied.
For temperature, the ERA-40 data were bias-corrected using CRU monthly average temperatures and temperature ranges. For more information the reader is referred to Weedon et al. (2011). In this study, we used time series of temperature and precipitation to investigate drought propagation. The WFD have also been used to force the large-scale hydrological models (Haddeland et al., 2011), from which output data were used in this study.
Based on the type of model (land-surface model, LSM, or global hydrological model, GHM) and its development history, the large-scale models use different variables from the WFD as input (Table 1) and have different schemes for calculating evapotranspiration, snow accumulation and melt, and runoff (Haddeland et al., 2011; Gudmundsson et al., 2012). LSMs and GHMs were run at different time steps, and after simulation sub-daily data were aggregated to daily data. The model time step is not expected to influence drought simulation, in contrast with model structure, which is of paramount importance (see Sect. 4.2).
Human impacts such as reservoir operation and water withdrawals for agriculture or drinking water were not included in the model output we used for this study. The large-scale models have not been calibrated for WaterMIP, except WaterGAP, for which correction factors were applied in some major river basins (e.g. Alcamo et al., 2003; Hunger and Döll, 2008). More details of the models can be found in Haddeland et al. (2011) and Gudmundsson et al. (2012), or in the references listed in Table 1.
Output variables used in this study include the main water balance states and fluxes on a daily time scale: soil moisture storage (SM), groundwater storage (GW), subsurface runoff (Q_sub), and total runoff (Q_total = surface runoff + subsurface runoff). Soil moisture data were only available for nine models, groundwater storage only for three models (see Table 1). In the models that explicitly simulate groundwater storage, subsurface runoff reflects baseflow. In the other models, subsurface runoff is drained from the soil storage and reflects a slow runoff component.

Extraction of data for case study areas
To investigate whether drought propagation from an anomaly in precipitation/temperature (meteorological situation in Fig. 1) to groundwater/runoff (hydrological drought in Fig. 1) is well reproduced by large-scale models, time series of model results need to be studied. Only a limited number of case study areas can be studied in detail, and prior knowledge of drought propagation in the selected case study areas is essential for a proper evaluation of the models. For example, Gudmundsson et al. (2012) concluded that the limitation of their study was the loss of information due to spatial aggregation in data processing. Therefore, in this study, a limited selection of case study areas was used that corresponds to catchments that have been studied in previous papers (Van Huijgevoort et al., 2010; Van Loon et al., 2010; Van Loon and Van Lanen, 2012a). These catchments are restricted to Europe, but the conclusions drawn with regard to the studied catchments have a wider validity because of their contrasting climate and catchment characteristics and the general phenomena that were studied.
From the gridded large-scale meteorological and hydrological datasets mentioned in the previous section, we selected five case study areas for detailed drought propagation research. These case study areas correspond to natural headwater catchments in Europe with contrasting climate and catchment characteristics (Table 2). A short description of the case study areas is given in this subsection; more detailed descriptions can be found in Van Lanen et al. (2008) and Van Loon and Van Lanen (2012a).
The Narsjø catchment is located in a mountainous region in southeastern Norway. It has a subarctic climate with mild summers and very cold winters, with a permanent snow cover for, on average, seven months per year. Mean measured discharge is 820 mm yr⁻¹, with lowest flows in winter and highest in spring (May). Narsjø is quickly responding to precipitation due to its impermeable subsoil, but storage in lakes and bogs causes some delay. Of the two grid cells covering the catchment, the one with the highest coverage (72 %) was used (Table 2).
The Upper-Metuje and Upper-Sázava catchments are located in a hilly region in northeastern and central Czech Republic, respectively. Both catchments have an oceanic climate with mild summers and winters, with some snow accumulation in winter. Mean measured discharge is around 300 mm yr⁻¹, with lowest flows in summer/autumn and highest flows in spring (March). Both catchments are slowly responding to precipitation, Upper-Metuje due to an extensive multiple aquifer system and Upper-Sázava due to a number of lakes. One grid cell completely covers the Upper-Metuje catchment, whereas for Upper-Sázava, of the two grid cells covering the catchment, the one with the highest coverage (91 %) was used (Table 2).
The Nedožery catchment is located in central Slovakia in a mountainous region. It has a humid continental climate with warm summers and cool winters, with some snow accumulation in winter. Mean measured discharge is around 350 mm yr⁻¹, with lowest flows in summer and highest flows in spring (March). Nedožery is quickly responding to precipitation due to limited storage (no groundwater, lakes, or bogs). One grid cell completely covers this catchment (Table 2).
The Upper-Guadiana catchment is located on the central Spanish plateau. It has a Mediterranean and semi-arid climate with very warm summers and mild winters. Potential evaporation exceeds precipitation, resulting in a relatively low mean measured discharge of 16 mm yr⁻¹. Lowest flows occur in summer and highest flows in winter. Upper-Guadiana is slowly responding to precipitation due to large storage in extensive multi-layer aquifer systems and wetlands. Of the grid cells covering the catchment, the one closest to the outlet of the catchment, representing 14 % of the catchment, was used (Table 2). A number of other grid cells from this catchment were also studied (including one with a Bsk-climate instead of a Csa-climate), but the results were not significantly different. The time series of hydrological variables, the drought characteristics, and the conclusions drawn with regard to the performance of the large-scale models in simulating drought propagation processes were similar.
We are aware that caution should be taken when comparing large-scale models against observations on the scale of one single grid cell. In this study, we therefore did not compare model output with observations. Instead, we studied the most important processes underlying drought propagation in the example catchments and compared the results with general knowledge on drought propagation and with results of catchment-scale models, described by Van Loon and Van Lanen (2012a). Comparisons of large-scale model(s) with observations have been performed previously by Van Loon et al. (2011b) and Stahl et al. (2011a). Van Loon et al. (2011b) did a qualitative assessment of the regime of the ensemble mean of a comparable set of large-scale models for four of the case study areas that were also used in this study. They concluded that the most important characteristics of those regimes, i.e. low flows and snow melt peaks, were reproduced by the large-scale models. This gives confidence that large-scale models can be used for drought analysis in these case study areas. Stahl et al. (2011a) compared anomaly indices in a large number of small catchments in Europe, some being represented by a single grid cell and some by more than one grid cell (up to nine cells). They found no significant correlations of anomaly indices with area, and thus ruled out a scaling effect. Hence, small catchments can be represented by a single grid cell, as long as the elevation difference between model and observations is not too high (in Stahl et al., 2011a, less than 300 m).

Post-processing
We processed the data of the selected case study areas through a number of steps:
1. interpolation of NA-values of leap days,
2. standardisation of the state variables SM and GW by dividing the data by the long-term average (needed because of huge inter-model differences in reference level, as reported by Wang et al., 2009),
3. calculation of the ensemble mean of all models for SM, GW, Q_sub, and Q_total (nine models for SM, three for GW, and ten for Q_sub and Q_total; see Table 1),
4. calculation of the daily maximum and minimum value of all models for SM, GW, Q_sub, and Q_total to determine model range,
5. smoothing the daily ensemble mean, maximum, and minimum of SM, GW, Q_sub, and Q_total by applying a 30-day centred moving average (the need for smoothing when using large-scale models was demonstrated by Van Loon et al., 2011b).
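Steps 2 to 5 of this post-processing can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names are hypothetical, each model series is assumed to be a plain list of daily values of equal length, and the moving-average window is assumed to be truncated at the start and end of the series (the treatment of the edges is not specified in the text). Step 1 (leap-day interpolation) is left out.

```python
from statistics import mean


def standardise(series):
    """Step 2: divide a state variable (SM or GW) by its long-term average."""
    long_term_mean = mean(series)
    return [value / long_term_mean for value in series]


def ensemble_stats(model_series):
    """Steps 3-4: daily ensemble mean, minimum and maximum across models.

    `model_series` holds one list of daily values per model.
    """
    per_day = list(zip(*model_series))  # regroup values day by day
    return (
        [mean(day) for day in per_day],
        [min(day) for day in per_day],
        [max(day) for day in per_day],
    )


def centred_moving_average(series, window=30):
    """Step 5: centred moving average (assumption: truncated at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + window % 2]
        smoothed.append(mean(chunk))
    return smoothed
```

In such a set-up, state variables would be standardised per model before the ensemble statistics are taken, and the smoothing would then be applied to the ensemble mean, minimum, and maximum separately.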

Drought analysis
Droughts were identified using the variable threshold method (Yevjevich, 1967; Hisdal et al., 2004; Van Loon and Van Lanen, 2012a). A monthly threshold derived from the 80th percentile of the monthly duration curves was used. The discrete monthly threshold values were smoothed by applying a centred moving average of 30 days (Van Loon and Van Lanen, 2012a). To eliminate minor droughts, a minimum duration of 3 days was used (Van Loon et al., 2010). This method was applied to all hydrometeorological variables, i.e. precipitation (from WFD), and the smoothed ensemble mean of SM, GW, Q_sub, and Q_total (from the large-scale hydrological models). The smoothing (Sect. 2.2.2, step 5) served as the pooling method (Hisdal et al., 2004; Fleig et al., 2006).
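The variable threshold method described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this study: all function names are hypothetical, the 80th percentile of the duration curve is taken as the value equalled or exceeded on 80 % of the days (nearest-rank definition), the moving average is truncated at the series edges, and events lasting exactly the minimum duration are kept.

```python
def exceedance_value(values, exceed_pct=80):
    """Value equalled or exceeded on `exceed_pct` % of the days (nearest rank)."""
    ranked = sorted(values, reverse=True)
    rank = max(1, (exceed_pct * len(ranked) + 99) // 100)  # integer ceiling
    return ranked[rank - 1]


def monthly_thresholds(months, values, exceed_pct=80):
    """One threshold per calendar month, from all days falling in that month."""
    by_month = {}
    for month, value in zip(months, values):
        by_month.setdefault(month, []).append(value)
    return {m: exceedance_value(v, exceed_pct) for m, v in by_month.items()}


def smooth(series, window=30):
    """Centred moving average (assumption: truncated at the edges)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + window % 2]
        out.append(sum(chunk) / len(chunk))
    return out


def daily_threshold(months, values, exceed_pct=80, window=30):
    """Expand the monthly thresholds to daily values and smooth the steps."""
    per_month = monthly_thresholds(months, values, exceed_pct)
    return smooth([per_month[m] for m in months], window)


def find_droughts(values, threshold, min_duration=3):
    """Inclusive (start, end) index pairs of runs below the threshold."""
    events, start = [], None
    for i, (value, level) in enumerate(zip(values, threshold)):
        if value < level and start is None:
            start = i
        elif value >= level and start is not None:
            if i - start >= min_duration:
                events.append((start, i - 1))
            start = None
    if start is not None and len(values) - start >= min_duration:
        events.append((start, len(values) - 1))
    return events
```

Here `months` is a list giving the calendar month (1 to 12) of each day in `values`; `find_droughts(values, daily_threshold(months, values))` then returns the drought events of that variable.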
Each drought event can be characterised by its duration and by some measure of the severity of the event. For fluxes (e.g. precipitation and runoff) the most commonly used severity measure is deficit volume (Fig. 2), calculated by summing up the differences between actual flux and the threshold level over the drought period (Hisdal et al., 2004; Fleig et al., 2006). For state variables (e.g. soil moisture and groundwater storage), we used the maximum deviation from the threshold (max. deviation) as severity measure (Fig. 2). These drought characteristics are used to illustrate drought propagation (Di Domenico et al., 2010; Van Loon et al., 2011b; Van Loon and Van Lanen, 2012a).
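As a small worked illustration of these severity measures, the sketch below computes duration, deficit volume, and maximum deviation for one event. The function names and the representation of an event as an inclusive (start, end) pair of day indices are assumptions made for the example; with daily fluxes in mm per day, the deficit volume is in mm.

```python
def duration(event):
    """Drought duration in days for an inclusive (start, end) index pair."""
    start, end = event
    return end - start + 1


def deficit_volume(flux, threshold, event):
    """Sum of (threshold - flux) over the drought period; used for fluxes."""
    start, end = event
    return sum(threshold[i] - flux[i] for i in range(start, end + 1))


def max_deviation(state, threshold, event):
    """Largest distance below the threshold; used for state variables."""
    start, end = event
    return max(threshold[i] - state[i] for i in range(start, end + 1))


# Example: a 3-day event below a constant threshold of 3
series = [5.0, 2.0, 1.0, 2.0, 5.0]
level = [3.0] * 5
event = (1, 3)
print(duration(event), deficit_volume(series, level, event),
      max_deviation(series, level, event))
```

For this example the daily shortfalls are 1, 2, and 1, so the deficit volume is 4 and the maximum deviation is 2.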

Typology of hydrological droughts
The hydrological drought typology developed by Van Loon and Van Lanen (2012a) was used to study drought propagation processes. This typology (Table 3) was developed using a catchment-scale model that was calibrated against observations. Here, a short summary is given of the hydrological drought types distinguished in the drought typology; for more details refer to Van Loon and Van Lanen (2012a).
- Classical rainfall deficit droughts are caused by a rainfall deficit (in any season) and occur in all climate types.
- Rain-to-snow-season droughts are caused by a rainfall deficit in the rain season and extend into the snow season in which precipitation peaks do not end the hydrological drought, because temperatures have decreased below zero, and occur in catchments with a pronounced snow season.
- Wet-to-dry-season droughts are caused by a rainfall deficit in the wet season and extend into the dry season in which precipitation peaks do not end the hydrological drought, because they are completely lost to evapotranspiration, and occur in catchments with pronounced wet and dry seasons (e.g. Mediterranean and monsoon climates).
- Cold snow season droughts are caused by a low temperature in the snow season. In catchments with a very cold winter, subtypes A and B occur, which are caused by an early beginning of the snow season and a delayed snow melt, respectively. In catchments with temperatures around zero in winter, subtype C occurs, which is caused by below-normal recharge due to snow accumulation.
- Warm snow season droughts are caused by a high temperature in the snow season. In catchments with a very cold winter, subtype A occurs, which is caused by an early snow melt. In catchments with temperatures around zero in winter, subtype B occurs, which is caused by a complete melt of the snow cover in combination with a subsequent rainfall deficit.
The application of the drought typology is based on expert knowledge (like in Van Loon and Van Lanen, 2012a).
In the part of this study dealing with typology, subsurface runoff (Q_sub) was used as a proxy for groundwater, because groundwater storage data were only supplied by three out of ten large-scale models (see Table 1).

Results
In this section, we present the results of the analysis of the large-scale models on drought characteristics, drought propagation features, and drought typology, and link these results to earlier work on drought propagation. This exercise can be regarded as evaluation of the large-scale models.

Drought characteristics
General drought characteristics were determined from the large-scale model ensemble mean for all five case study areas (Table 4). These drought characteristics reflect aspects of drought propagation and differences in climate:
- Drought events became fewer and longer when moving from precipitation via soil moisture to groundwater storage; i.e. the number of droughts decreased from 3-5 per year to 0.5-1 per year, and the duration increased from around 15 days to 70-160 days. The decrease in the number of droughts can be seen in Fig. 3e, in which there were more drought events in precipitation (2nd row) than in groundwater (4th row) due to attenuation, and the increase in duration is visualised in Figs. 3c and 4b and c, in which drought events in precipitation (2nd row) were (more and) shorter than those in groundwater (4th row).
- Drought events in total runoff had drought characteristics in between those of precipitation and groundwater, because total runoff reflects both fast and slow pathways in a catchment. This is visualised in Figs. 3 and 4, in which the signal of total runoff (lower row) is a combination of the signals of subsurface runoff (5th row) representing slow pathways and precipitation (2nd row) representing fast pathways.
- Deficit volumes were higher for droughts in precipitation than for droughts in total runoff, because precipitation is higher and more variable, resulting in higher threshold values and a larger deviation from the threshold (compare 2nd and lower row in Figs. 3 and 4). The exception was Narsjø, which had a slightly lower variability in precipitation and a slightly higher variability in total runoff than the other case study areas, resulting in a similar mean deficit (i.e. 4.3 mm; Table 4).
- Drought characteristics of subsurface runoff were comparable to those of groundwater storage (although a different number of large-scale models was used to calculate the average of both variables; see   4). The similarity of both variables also justifies the use of Q_sub as a proxy of groundwater storage in the remainder of this research.
- Due to its semi-arid climate, Upper-Guadiana had slightly fewer and longer meteorological droughts than the other case study areas (Table 4).
These results correspond to earlier work on drought propagation (Peters et al., 2003; Tallaksen and Van Lanen, 2004; Di Domenico et al., 2010; Van Loon et al., 2011b; Van Loon and Van Lanen, 2012a). The drought characteristics in Table 4 also showed unexpected behaviour:
- Mean max. deviation was lower for soil moisture droughts than for droughts in groundwater. This was expected to be the other way around (like in Hohenrainer, 2008 and Van Loon and Van Lanen, 2012a) and is probably due to the standardisation of the values of soil moisture and groundwater (Sect. 2.2.2, step 2).
- The drought characteristics of total runoff were in between those of precipitation and soil moisture in all case study areas, while a differentiation between fast and slowly responding systems was anticipated. The drought characteristics of total runoff in the slowly responding systems Upper-Metuje, Upper-Sázava, and Upper-Guadiana were expected to be more comparable to those of groundwater storage/subsurface runoff (fewer and longer droughts, like in Van Loon and Van Lanen, 2012a). In the Upper-Sázava and Upper-Guadiana case study areas, mean duration of droughts in groundwater storage and subsurface runoff was relatively long, as expected (106 and 117 days and 159 and 107 days, for Upper-Sázava and Upper-Guadiana, respectively), but total runoff did not reflect a substantial groundwater influence, as mean duration of droughts in total runoff was short (30 and 36 days, respectively). This is visualised in Figs. 3 and 4, in which drought events in total runoff (lower row) were more numerous and shorter than those in groundwater (4th row).
-  storage in the catchment and therefore its fast reaction to precipitation (Sect. 2.1.2 and Oosterwijk et al., 2009), and Upper-Metuje was anticipated to have longer groundwater droughts, due to storage in the extensive aquifer system and therefore its slow reaction to precipitation (Sect. 2.1.2 and Rakovec et al., 2009). Upper-Guadiana was expected to have even longer groundwater droughts than the average of around 160 days, because multi-year droughts are common in that catchment due to its semi-arid climate and large storage in extensive aquifer systems and wetlands (Sect. 2.1.2 and Peters and Van Lanen, 2003). In Van Loon and Van Lanen (2012a), average duration of groundwater droughts in Upper-Guadiana was more than 750 days.
In conclusion, the ensemble mean of the large-scale models showed a reasonable reproduction of general drought characteristics in the case study areas. Propagation processes were clearly reflected. In general, the ensemble mean of the large-scale models is better in simulating quickly responding systems than slowly responding systems. In slowly responding systems, too many short hydrological droughts were simulated.

Drought propagation features
For a more thorough insight into drought generating mechanisms, we also investigated time series of meteorological data of the WFD and hydrological data of the large-scale models for the propagation features mentioned in Sect. 1. From a visual inspection of the total time series of precipitation (examples in 2nd row in Figs. 3 and 4) and total runoff (examples in lower row in Figs. 3 and 4), we learned that the shape of the signal of the ensemble mean total runoff was quite similar to the precipitation signal. Recessions, which are an indication of catchment processes, were not visible in the time series of total runoff and only slightly in groundwater storage. With regard to the drought propagation features, the ensemble mean of the large-scale models showed
- very little lag: the start of a hydrological drought almost coincided with the start of the associated meteorological drought. The lag between a drought in precipitation and total runoff was estimated to be on average between 4 and 15 days (dependent on catchment), while using a catchment-scale model it has been estimated to be between 24 and 51 days for the same catchments (Van Loon and Van Lanen, 2012a). A European-wide study on the hydrological drought response time to weather-type occurrence showed even larger values, varying between 45 and 210 days, dependent on basin storage properties (Fleig et al., 2010). The absence of a lag in the ensemble mean of large-scale models can partly be explained by the fact that we studied single grid cell runoff, for which no routing was applied. Had we studied routed discharge of a large number of grid cells (i.e. a larger catchment), a larger lag would have been expected. We checked this hypothesis by studying the routed discharge of the Upper-Guadiana case study area, because it is the largest catchment and the highest routing effects are expected there. When switching from single grid cell runoff to routed discharge, the lag between precipitation and discharge increased from 4 days to 11 days, which is still considerably lower than the lag of 24 days produced by a catchment-scale model (Van Loon and Van Lanen, 2012a).
-very little lengthening: also the end of a hydrological drought almost coincided with the end of the associated meteorological drought, because a precipitation peak immediately caused higher runoff in the large-scale model simulations. Exceptions are some cases in winter with temperatures below zero in which snow accumulation took place (e.g. in Upper-Metuje and Upper-Sázava, Fig. 3b, c). Furthermore, sometimes during a dry series of years, recovery from drought was slightly slower than during a wet series of years.
-almost no pooling: most meteorological droughts resulted in a separate hydrological drought (compare precipitation, 2nd row, and total runoff, lower row, in Figs. 3 and 4). Only in some cases, meteorological droughts grew together into one long hydrological drought (e.g. the drought events in Upper-Sázava; see Figs. 3c and 4c, lower row).
-some attenuation: during a multi-year period of on average high precipitation, short meteorological drought events were filtered out (e.g. in Upper-Guadiana in 1970; see Fig. 3e, lower row). Prudhomme et al. (2011) also found that the non-occurrence of extremes is generally simulated in the correct period by a number of large-scale models.
In conclusion, the ensemble mean of the large-scale models reproduced the drought propagation features poorly: lag was very small, and a precipitation peak immediately ended a hydrological drought, so that little lengthening or pooling occurred.
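The features listed above could in principle be quantified rather than read from the plots. The following Python sketch pairs events in two boolean below-threshold series and reports lag, delay of the drought end, and the number of linked meteorological events. It is not the procedure used in this study, which relied on visual inspection; the linking rule, the `max_gap` parameter and all function names are assumptions made for the example.

```python
def find_events(below):
    """Group consecutive truthy values into (start, end) index pairs."""
    events, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(below) - 1))
    return events


def propagation_features(met_below, hyd_below, max_gap=30):
    """Link meteorological to hydrological droughts (hypothetical rule).

    A meteorological event is linked to a hydrological event when it
    starts before the hydrological event ends and ends no more than
    `max_gap` time steps before the hydrological event starts.
    """
    met_events = find_events(met_below)
    results = []
    for h_start, h_end in find_events(hyd_below):
        linked = [(s, e) for s, e in met_events
                  if s <= h_end and e >= h_start - max_gap]
        if not linked:
            continue  # no meteorological cause found under this rule
        results.append({
            "lag": h_start - linked[0][0],       # delay of drought onset
            "end_delay": h_end - linked[-1][1],  # delay of drought end
            "n_linked": len(linked),             # > 1 indicates pooling
        })
    return results
```

Under this rule, a lag and end delay close to zero with `n_linked` equal to one for nearly every event would correspond to the behaviour described for the ensemble mean.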

Typology
Additionally, we applied the drought typology of Van Loon and Van Lanen (2012a) to the large-scale model results.
Many hydrological drought events were unidentifiable (5 % of all events for Upper-Metuje, up to 28 % for Narsjø, Table 5, last column), meaning that no anomaly in precipitation or temperature could be found that caused the hydrological drought event. Many of these unidentifiable drought events occurred in the snow season. The snow-related drought types (i.e. rain-to-snow-season drought, cold snow season drought, and warm snow season drought, Sect. 2.2.4) were clearly more difficult to distinguish using the ensemble mean of the large-scale models than using catchment-scale models (with which the typology was developed). In Narsjø, for example, a precipitation deficit during winter (with temperatures well below zero and precipitation falling as snow, Table 2) sometimes initiated a hydrological drought during that same winter. This should not occur, because if temperatures are below zero, a lack of snowfall should not influence winter runoff, but only snow accumulation.

Classification of all hydrological drought events in the case study areas
Table 5 gives the percentages of all drought events in total runoff and subsurface runoff (proxy for groundwater storage; Sects. 2.2.4 and 3.1) in all five case study areas that were attributed to a certain hydrological drought type. The following can be noted:
-Drought events in subsurface runoff and total runoff had very similar hydrological drought types. The exception is composite drought, which did not occur in total runoff in some case study areas (e.g. Upper-Sázava).
-Many drought events were classified as classical rainfall deficit drought (in total for all case study areas together, 48 % in subsurface runoff and 62 % in total runoff). Especially Upper-Sázava and Upper-Guadiana had many classical rainfall deficit droughts.
-As expected, wet-to-dry-season droughts only occurred in the case study area with a semi-arid climate (Upper-Guadiana) and snow-related droughts (rain-to-snow-season drought, cold snow season drought, and warm snow season drought) only in regions with a continuous snow cover in winter (all except Upper-Guadiana).
-Composite droughts were found in all case study areas, but with low percentages. They did not only occur in regions with a slow response to precipitation (Upper-Metuje, Upper-Sázava, and Upper-Guadiana), but also in Narsjø and Nedožery (regions which typically have only limited storage and a quick response to precipitation). In Nedožery, these composite droughts were two events in subsurface runoff for which different hydrological drought types in different seasons were not interrupted by a recharge peak. One example in Nedožery, in which warm snow season droughts and classical rainfall deficit droughts were combined, is shown in Fig. 5a. This is a phenomenon that can occur in reality, but that was not expected in this specific case study area because of its quick response to precipitation. In Narsjø, composite drought events were related to a missing snow melt peak due to a severe meteorological drought in winter (e.g. the winter of 1996; see Fig. 5b, 2nd row). This phenomenon was not previously found in observations or catchment-scale models for the respective catchment (Van Loon et al., 2010, 2011b; Van Loon and Van Lanen, 2012a), nor in other European catchments (Hannaford et al., 2011; Prudhomme et al., 2011). In these studies, winter drought events in cold climates were always ended by snow melt, even after winters with limited snow cover. It is therefore unknown whether these simulations with the large-scale models reflect a phenomenon that occurs in reality.
-Only few composite droughts occurred in Upper-Guadiana and Upper-Metuje, while those case study areas reflect catchments with extensive aquifer systems and were therefore expected to have more composite droughts (in Van Loon and Van Lanen, 2012a, composite droughts were 17 % of all groundwater drought events in Upper-Metuje and 67 % in Upper-Guadiana).
In Narsjø and Upper-Guadiana, the interplay between precipitation and temperature was not always according to expectations, leading to an unforeseen distribution over the hydrological drought types in Table 5.In Narsjø, runoff peaks and hydrological droughts developed during winter, although winter temperatures were well below zero.This has two consequences.Firstly, drought events starting in summer/autumn were ended by a runoff peak in winter and could therefore not develop into a rain-to-snow-season drought, but were classified as classical rainfall deficit droughts (see the drought in groundwater, 4th row, and the minor event in subsurface runoff and total runoff, 5th and lower row, in November 1974 in Fig. 5c).Secondly, warm snow season droughts -subtype B, or classical rainfall deficit droughts developed in Narsjø during winter (see the drought in subsurface runoff and total runoff (5th and lower row) in March 1975 in Fig. 5c), while those were expected to occur only in catchments with winter temperatures around or above zero (Sect.2.2.4).The reason is that in winter, despite the well below zero temperatures, runoff still reacted immediately to precipitation, so that a lack of precipitation in winter could start a hydrological drought.A similar process was observed in Upper-Guadiana.In summer, when potential evapotranspiration is much higher than precipitation, recharge and runoff should be zero because all precipitation is normally lost to evapotranspiration.In the ensemble mean of the large-scale models, however, runoff peaks still occur in Upper-Guadiana in summer.Consequently, drought events did not extend into the dry season and were classified as classical rainfall deficit droughts instead of wet-to-dry-season droughts (see the runoff peak in July 1987 in Fig. 5d, lower row).

Classification of the five most severe hydrological drought events in selected case study areas
For each case study area, the five most severe drought events were selected based on deficit volume (as in Van Loon and Van Lanen, 2012a). This changed the distribution over the hydrological drought types (compare Tables 5 and 6).
The classical rainfall deficit drought occurred less in most case study areas (in total, for all case study areas together, from 48 % to 16 % in subsurface runoff, and from 62 % to 36 % in total runoff). The exception is total runoff in Nedožery, where four of the five most severe drought events were of the classical rainfall deficit type. The cold snow season drought disappeared almost completely from the list, because this hydrological drought type usually has low deficit volumes. These shifts are in line with Van Loon and Van Lanen (2012a).
If we compare Table 6 with Table 5 in Van Loon and Van Lanen (2012a), we note some differences between the typology of severe drought events using a catchment-scale model and using the ensemble mean of the large-scale models:
-In general, more of the most severe drought events were classical rainfall deficit droughts and warm snow season droughts (on average in total runoff, 36 % classical rainfall deficit droughts using large-scale models vs. 32 % using a catchment-scale model, and 20 % warm snow season droughts using large-scale models vs. 16 % using a catchment-scale model). Differences between catchments were large. For example, Upper-Metuje had fewer classical rainfall deficit droughts using the large-scale models instead of a catchment-scale model (20 % instead of 60 % in total runoff), whereas Nedožery had more (80 % instead of 40 % in total runoff).

-Fewer of the most severe drought events were rain-to-snow-season droughts (for example, in Narsjø 20 % and 40 %, instead of 80 % using a catchment-scale model).
-The distribution of composite droughts was different. Severe drought events of this type did not only occur in slowly responding catchments, but in all catchments (in subsurface runoff).
If drought events had been ranked according to their duration (instead of deficit volume) and the five longest drought events selected, the distribution over the hydrological drought types would have been only slightly different from Table 6 (not shown). Intense, but short-lived drought types like warm snow season droughts would have occurred slightly less, and long, but non-intense drought types like rain-to-snow-season droughts and wet-to-dry-season droughts would have occurred slightly more.
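The effect of the ranking criterion can be sketched as follows. The snippet selects the most severe events by a chosen attribute and tabulates the share of each drought type among them. The field names, the helper `type_shares` and the example events are hypothetical placeholders invented for illustration; they are not results from this study.

```python
from collections import Counter


def type_shares(events, key, n=5):
    """Percentage of each drought type among the n largest events by `key`."""
    top = sorted(events, key=lambda e: e[key], reverse=True)[:n]
    counts = Counter(e["type"] for e in top)
    return {t: 100.0 * c / len(top) for t, c in counts.items()}


# Invented placeholder events (deficit in mm, duration in days).
example_events = [
    {"type": "classical rainfall deficit", "deficit": 10.0, "duration": 20},
    {"type": "warm snow season", "deficit": 30.0, "duration": 15},
    {"type": "rain-to-snow-season", "deficit": 8.0, "duration": 90},
    {"type": "classical rainfall deficit", "deficit": 25.0, "duration": 40},
    {"type": "cold snow season", "deficit": 1.0, "duration": 10},
    {"type": "composite", "deficit": 50.0, "duration": 300},
]

by_deficit = type_shares(example_events, "deficit", n=3)
by_duration = type_shares(example_events, "duration", n=3)
```

In this toy example the intense but short warm snow season event is selected only when ranking by deficit, and the long rain-to-snow-season event only when ranking by duration, which is the direction of the shift described above.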
In conclusion, the ensemble mean of the large-scale models showed a reasonable reproduction of drought typology in the case study areas. All hydrological drought types of Van Loon and Van Lanen (2012a) were represented in the ensemble mean of the large-scale models, and in the climate type in which they were expected. The distribution of the hydrological drought types had some mismatches, e.g. a high percentage of classical rainfall deficit droughts in all case study areas, a low percentage of composite droughts in slowly responding case study areas, unexpected occurrence of composite droughts in quickly responding case study areas, and a low percentage of rain-to-snow-season droughts in cold climates and of wet-to-dry-season droughts in semi-arid climates.

Discussion and recommendations for improvement of large-scale models
In this research, the central question was how well large-scale models reproduce drought propagation. Before we answer that question (Sect. 4.2) and give some recommendations for improvement of the models based on our analysis (Sect. 4.2.3), we first discuss the limitations of our methodology (Sect. 4.1).

Methodology
We used a specific set of large-scale models for our analysis, but we could have chosen other or more models (GHMs and LSMs). The time series of the individual models and therefore the ranges of the hydrological variables shown in Figs. 3, 4, and 5 would have been different. However, we expect that the ensemble mean of the models would not change significantly, because the models in our selection are representative of the range of large-scale models that exist (e.g. Haddeland et al., 2011; Harding et al., 2011). They have very different model structures and parametrisations, and therefore very different responses. Unfortunately, no overall "best" large-scale model exists. Some models are, for example, very good in temperate regions, but bad in cold climates; others are good in cold climates, but very bad in tropical regions. The same is true for fast and slowly responding physio-geographic regions. For drought propagation studies in small uniform regions, i.e. with similar climate and catchment characteristics, it would be possible to select the large-scale model that performs best in that region. But for drought studies on continental or global scale, where conditions and therefore model results are extremely variable, such a choice cannot be made and the best solution is using a multi-model ensemble (as was earlier suggested by various authors; see Sect. 1). As this study aims to test these large-scale applications, we follow that approach. The model spread is an indication of model structure uncertainty in the multi-model ensemble. Parametric uncertainty in the individual models has not been investigated in this study. A single simulation was used for all models. We do, however, expect that parametric uncertainty is substantial. The large-scale models were not (or only minimally) calibrated (Sect. 2.1.2), because (i) observed and simulated variables and scales do not match (for example simulated grid cell runoff vs. observed catchment discharge, or scarce point measurements of groundwater vs. simulated total subsurface storage); (ii) the models are assumed to include all important physical processes; and (iii) parameters of the models were derived from large-scale maps of e.g. vegetation and soil properties. As a result of both model structure and parametric uncertainty, the simulation of soil moisture and hydrological droughts is far more uncertain than simulation of meteorological droughts. Especially the simulation of state variables has a high uncertainty, as reported recently by Samaniego et al. (2012). In this study, however, the standardisation of the state variables SM and GW (Sect. 2.2.2) and the use of a relative threshold (percentile of flow duration curve; Sect. 2.2.3) account for biases in the absolute value of the states. Further issues regarding the effect of model structure and parametric uncertainty on drought propagation will be discussed in the next section (Sect. 4.2).
We tested the ensemble mean of the large-scale models in five case study areas. An extension to more and different case study areas would be interesting, especially outside Europe (e.g. tropical and arid regions in Africa and Asia). The analysis of drought characteristics can be done on a large number of grid cells with different climates using the method of Van Huijgevoort et al. (2012b). The analysis of drought propagation features and the classification of hydrological droughts into types require visual inspection and expert knowledge. Therefore, it would be more difficult to study these drought-related aspects in a much larger sample of case study areas.
In classifying hydrological droughts into types, we found a large number of unidentifiable droughts (Table 5).For the remaining events, the meteorological anomaly/anomalies causing the drought event was/were found by visual inspection of time series of all hydrometeorological variables.Quantification of this relationship between meteorological and hydrological drought is barely investigated and has proven to be very difficult.To our knowledge, the best effort is elaborated in the recent paper of Wong et al. (2012).They found that copulas have more potential to link a hydrological drought to preceding meteorological drought(s) than classical linear correlation techniques.
Our aim was to use only natural headwater catchments. The Upper-Guadiana, however, is far from natural, as groundwater extraction for irrigation has increased dramatically since the 1980s (e.g. Bromley et al., 2001). The resulting hydrological situation is a combination of drought (natural causes) and water scarcity (anthropogenic causes). Therefore, the observed hydrological time series of this case study area were naturalised using the method described in Van Loon and Van Lanen (2012b). We compared drought propagation in the large-scale models (which did not simulate anthropogenic influences for this exercise; see Sect. 2.1.2) with drought propagation in these naturalised time series. The use of an undisturbed catchment would have been better, but finding an undisturbed groundwater-dominated catchment in a semi-arid climate with sufficient good quality data is not trivial.
In this study, we used the variable threshold to identify droughts. There are many other ways to calculate droughts using a kind of threshold approach, e.g. standardized precipitation index (SPI) and standardized runoff index (SRI; Lloyd-Hughes and Saunders, 2002; Shukla and Wood, 2008), regional deficiency index (RDI; Stahl, 2001; Hannaford et al., 2011), fixed threshold level method (Hisdal et al., 2004), cumulative precipitation anomaly (CPA), and soil moisture deficit index (SMDI) (e.g. Wanders et al., 2010). These approaches give different numbers for the drought characteristics for a specific hydrometeorological variable (i.e. the numbers in Table 4), but the conclusions regarding propagation are not expected to change when using one of these other methods. For example, Peters et al. (2006) and Tallaksen et al. (2009) used a fixed threshold in the Pang catchment (UK) instead of a variable threshold. They found drought propagation processes (e.g. lag, lengthening) that are comparable to the ones found in studies that used a variable threshold (e.g. Van Loon and Van Lanen, 2012a). An important reason to choose the variable threshold level method is that it enables comparison with the catchment model studies described in Van Loon and Van Lanen (2012a).
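The difference between a variable and a fixed threshold can be illustrated with a short Python sketch. It is a simplified version under stated assumptions: the threshold is taken per calendar month as the value exceeded 80 % of the time, without any smoothing, and events are not pooled. The exceedance level, the function names and the absence of smoothing are assumptions for the example and do not reproduce the exact procedure of Sect. 2.2.3.

```python
import numpy as np


def variable_threshold(values, months, exceedance=80):
    """Per-month threshold: value exceeded `exceedance` % of the time."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    threshold = np.empty_like(values)
    for m in np.unique(months):
        sel = months == m
        threshold[sel] = np.percentile(values[sel], 100 - exceedance)
    return threshold


def fixed_threshold(values, exceedance=80):
    """Single threshold derived from the whole series."""
    values = np.asarray(values, dtype=float)
    return np.full_like(values, np.percentile(values, 100 - exceedance))


def drought_events(values, threshold):
    """Return (start, end, deficit volume) for each run below the threshold."""
    values = np.asarray(values, dtype=float)
    threshold = np.asarray(threshold, dtype=float)
    below = values < threshold
    events, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        if start is not None and (not flag or i == len(below) - 1):
            end = i if flag else i - 1
            deficit = float(np.sum(threshold[start:end + 1] - values[start:end + 1]))
            events.append((start, end, deficit))
            start = None
    return events
```

With a seasonal series, the fixed threshold tends to flag only the low-flow season, whereas the variable threshold flags anomalies relative to each month. Event numbers and deficit volumes therefore differ between the two, which is the sense in which the numbers in Table 4 depend on the method.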
For our analyses, we used grid cell precipitation and runoff. The use of average catchment precipitation instead of grid cell precipitation would not have led to different results in the drought analysis. There are two reasons for that. First, the differences between observed catchment precipitation and grid cell precipitation for the studied case study areas were small, as was demonstrated by Van Huijgevoort et al. (2010, 2011). Second, meteorological droughts have a large spatial extent and frequently cover a large region, as was demonstrated by Peters et al. (2006) and Tallaksen et al. (2009), so there is little chance of missing a meteorological drought event by using a slightly different spatial coverage. As river routing has a considerable influence on discharge characteristics in large catchments, we tested the use of simulated streamflow at the outlet instead of grid cell runoff for the Upper-Guadiana case study area. Upper-Guadiana is the only studied area that is large enough to encompass more than one grid cell. We found that the lag between meteorological drought and hydrological drought increased slightly, but that the shape of the time series did not change at all. Our conclusions regarding the lack of attenuation and multi-year droughts are also valid when using streamflow at the outlet. We expect this to be consistent also in other regions.

Evaluation of simulation of drought propagation by large-scale models
We investigated three different aspects of drought propagation: drought characteristics, drought propagation features, and drought typology. In general, these drought propagation aspects indicated a reasonable simulation of hydrological drought development in contrasting catchments in Europe, but we also found important deficiencies. Some drought propagation processes were clearly not well simulated by the ensemble mean of the large-scale models. These difficulties are all related to a too strong coupling between precipitation and discharge, which results in an immediate reaction of runoff to precipitation. This should not occur in certain climate types, i.e. semi-arid climates in summer and cold climates during the frost season, and in catchments with considerable storage. Hence, the difficulties arise from deficiencies in the simulation of processes related to temperature and storage.

Temperature
The drought events simulated by the ensemble mean of the large-scale models are mainly governed by P control, and less by T control (Table 3). This resulted in an overestimation of the occurrence of the hydrological drought type that is predominantly caused by P control, i.e. classical rainfall deficit drought, and an underestimation of the occurrence of hydrological drought types that are (partly) caused by T control, i.e. rain-to-snow-season drought, wet-to-dry-season drought, cold snow season drought, and warm snow season drought, especially subtype A (see Table 3 and Sect. 2.2.4). This is mainly due to the simulation of droughts and discharge peaks in periods in which no drought or peaks were expected. Discharge peaks in winter in cold climates and in summer in semi-arid climates end drought events prematurely and therefore largely influence drought characteristics (shorter than anticipated) and drought typology (fewer rain-to-snow-season droughts and wet-to-dry-season droughts than anticipated). Hence, the deficiencies of large-scale models in the reproduction of drought propagation processes are related to simulation of snow (low temperature) and evapotranspiration (high temperature). Large-scale models are known to have difficulties with the correct simulation of snow accumulation (Feyen and Dankers, 2009; Haddeland et al., 2011; Stahl et al., 2011b, 2012). Prudhomme et al. (2011) and Stahl et al. (2012) found problems in drought simulation in regions with winter temperatures close to zero. Their conclusion is confirmed in this study. Additionally, we also encountered problems in regions with winter temperatures well below zero, which is inconsistent with Prudhomme et al. (2011), who concluded that droughts in Scandinavia were well reproduced. One reason for incorrect snow simulation is related to elevation. Prudhomme et al. (2011) and Stahl et al. (2012) found a larger error of drought simulation in mountainous areas. In these areas, the grid cell elevation often deviates from the actual elevation of a catchment (Gudmundsson et al., 2012). This difference influences both snowfall (simulated by WFD or by some of the large-scale models themselves; see input data in Table 1) and snow accumulation and melt (simulated by the large-scale models). According to Van Loon and Van Lanen (2012a), elevation plays an important role in drought propagation, because the development of snow-related hydrological drought types is very sensitive to a narrow temperature range around zero. This is comparable to floods, for which a critical zone for snowmelt was found by Biggs and Whitaker (2012). Subgrid variability, which is not captured by the large-scale models, results in a deviation in elevation between large-scale models and observations/catchment-scale models, and therefore in a deviation in drought typology. A higher resolution for the large-scale models might solve this issue, as argued by Wood et al. (2011). They explicitly mention snow(melt) simulation as one of the challenges that can be overcome using hyperresolution models. In climate modelling, the benefits of higher resolution models have been demonstrated, e.g. by Hagemann et al. (2009).
Another temperature-related problem in large-scale models is the simulation of evapotranspiration.The methodology used for calculation of evapotranspiration varies considerably between models (Haddeland et al., 2011) and can cause significant differences in model results (Gosling and Arnell, 2011;Stahl et al., 2012).The importance of evapotranspiration for drought development has been demonstrated by Melsen et al. (2011) and Teuling et al. (2012).One reason for deficiencies in the simulation of evapotranspiration can be the lack of evapotranspiration from wetlands and surface water (Gosling and Arnell, 2011).Gosling and Arnell (2011) also mention that their model does not include transmission loss along the river network or evaporation of infiltrated surface runoff.This is a common issue in GHMs, which generally leads to an overestimation of runoff in dry catchments.Another reason can be related to groundwater storage.Van den Hurk et al. (2005) state that larger storage in model reservoirs results in sustained summertime evaporation.As many large-scale models have little storage, summertime evaporation is probably underestimated and discharge peaks can occur during summer in semi-arid climates.Also Bierkens and van den Hurk (2007) and Lam et al. (2011) point towards the role of groundwater storage in the simulation of evaporation, especially related to the convergence of groundwater in wet discharge zones.

Storage
The effect of storage on hydrological drought development has been demonstrated by many authors (e.g.Peters et al., 2003;Van Lanen et al., 2004, 2012;Tallaksen et al., 2009;Hannaford et al., 2011;Van Loon et al., 2011a;Van Loon and Van Lanen, 2012a).Therefore, the correct simulation of storage is important if large-scale models are to be used in hydrological drought analysis.Additionally, storage is important in climate change impact assessment.A more realistic storage capacity leads to smaller changes in both wintertime and summertime monthly mean runoff, so to less extreme impacts of climate change (Van den Hurk et al., 2005).Storage acts as a buffer to climate change.
Currently, storage is not well simulated in the ensemble mean of the large-scale models, resulting in insufficient variability between fast and slowly responding areas. In slowly responding areas, the reaction of runoff to precipitation is too fast, resulting in deficiencies in the reproduction of drought characteristics (shorter than anticipated), drought propagation features (little lag, lengthening, pooling, and attenuation), and drought typology (few composite droughts). The fast reaction of runoff to precipitation corresponds to the findings of, for example, Gosling and Arnell (2011), Haddeland et al. (2011), Stahl et al. (2012), and Gudmundsson et al. (2012). Based on their analysis of spatial cross-correlation patterns and runoff percentiles, Gudmundsson et al. (2011, 2012) conclude that discharge during dry conditions is largely influenced by terrestrial hydrological processes (catchment storage and release), in contrast to floods, which are mostly determined by forcing data. Stahl et al. (2012) and Gudmundsson et al. (2012) found that these terrestrial hydrological processes are poorly replicated in the simplified storage schemes of large-scale models. Most models release too much of the incoming precipitation too quickly (Gudmundsson et al., 2012), and simulated droughts are interrupted more frequently than in observations (Stahl et al., 2011a). Therefore, models perform best in regions where the runoff response to rainfall is more direct (Stahl et al., 2011a) or in very wet climates, where storage does not play an important role.
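The role of storage in the speed of the runoff response can be illustrated with a single linear reservoir, a textbook conceptual model in which outflow is proportional to storage. This is only a sketch under that assumption; it is not the storage scheme of any of the large-scale models used here, and the recession constants and function names are hypothetical.

```python
def linear_reservoir(precip, k, s0=0.0):
    """Daily runoff from one linear reservoir, Q = S / k (k in days, k >= 1)."""
    storage, runoff = s0, []
    for p in precip:
        q = storage / k       # outflow proportional to storage
        storage += p - q      # explicit daily water balance
        runoff.append(q)
    return runoff


def days_to_half(runoff, dry_start):
    """Days after `dry_start` until runoff falls below half its value then."""
    ref = runoff[dry_start]
    for i in range(dry_start, len(runoff)):
        if runoff[i] < 0.5 * ref:
            return i - dry_start
    return None


# 400 wet days followed by 200 dry days (arbitrary units).
precip = [1.0] * 400 + [0.0] * 200
fast = linear_reservoir(precip, k=2)    # little storage, flashy response
slow = linear_reservoir(precip, k=50)   # large storage, slow recession
```

With the small recession constant, runoff collapses within a few days of the last precipitation; with the large one, the recession takes several weeks. A model whose effective recession constant is too small everywhere would therefore show the short lag and limited differentiation between fast and slowly responding areas described above.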
So both climate control (temperature) and catchment control (storage) on drought propagation are not simulated correctly by the ensemble mean of the large-scale models. This indicates a limited suitability of large-scale models when extrapolating to the future (e.g. Gosling et al., 2011; Corzo Perez et al., 2011), in which drought propagation is governed by climate control, and to data-scarce regions (e.g. Stahl et al., 2012), in which drought propagation is governed by climate control and catchment control.

Recommendations
Although representation of hydrological processes is better in large-scale hydrological models than in global climate models (GCMs; Hagemann and Dümenil, 1998; Van den Hurk et al., 2005; Sperna Weiland et al., 2010), there is still room for improvement of large-scale hydrological models for a correct reproduction and prediction of drought propagation across the globe. Simulation of evapotranspiration, snow accumulation, and storage in large-scale models should be improved to decrease uncertainty in hydrological drought simulation.
For improvement of the simulation of evapotranspiration, better understanding and representation of local-scale hydrological processes in dry regions of the world is essential (Gosling and Arnell, 2011; Lam et al., 2011). Furthermore, re-infiltration and evaporation of surface runoff should be implemented in large-scale models.
First steps towards the improvement of snow simulation have been taken by Cherkauer et al. (2003), who improved the VIC model for cold areas, and Dutra et al. (2010) and Balsamo et al. (2011), who improved snow simulation in TESSEL. However, despite major advances, Lettenmaier and Su (2012) note that "there remain important problems in parameterization of cold land hydrological processes within climate and hydrology models." First steps towards the improvement of storage simulation have been taken by Sutanudjaja et al. (2011) and Tian et al. (2012), who coupled a groundwater model (MODFLOW and AquiferFlow, respectively) to a land surface model (PCR-GLOBWB and SiB2, respectively). An important limitation is that these couplings are still offline, not allowing for dynamic feedbacks between groundwater storage, soil moisture, and evapotranspiration (Sutanudjaja et al., 2011). Another difficulty is that in large-scale models parameters are representative of typical rather than locally realistic hydrogeological conditions (Gosling and Arnell, 2011; Gudmundsson et al., 2012). For more locally (or at least, regionally) realistic subsurface runoff simulation using large-scale models, two steps are needed. Firstly, storage should be better represented in the models, e.g. by including more groundwater reservoirs in the models or by online coupling with a groundwater model; secondly, higher-resolution large-scale datasets on storage properties should be derived to arrive at more realistic model parameters for this groundwater part of large-scale models. This is needed even in hyperresolution models, because there will always be sub-grid variability that needs parametrisation of processes (Beven and Cloke, 2012). It is important to evaluate model results not only against observed discharge, but also against observations of state variables like snow accumulation, soil moisture, and groundwater storage.
An encouraging note is that not all models have the same difficulties in simulating temperature and storage effects on drought propagation (see the model range in Figs. 3 and 4). For example, at least one model in the suite of large-scale models used in this study had extremely slow recessions, so a very slow reaction to precipitation (as previously also demonstrated by Gudmundsson et al., 2012). The drawback is that a single large-scale hydrological model is often used globally, independent of the representativeness of the model for that specific region. Models with a fast reaction to precipitation are also used in slowly responding systems and vice versa (e.g. Prudhomme et al., 2011). Comparably, models that have difficulties simulating snow accumulation processes are applied in cold regions and models that have difficulties simulating evapotranspiration processes are applied in semi-arid regions (e.g. Feyen and Dankers, 2009). Therefore, like Stahl et al. (2012) and Gudmundsson et al. (2012), we still advise the use of a multi-model ensemble of a number of large-scale models for drought studies, because then flashy and smooth hydrographs of very different large-scale models are averaged out. According to Beven and Cloke (2012), ensemble simulation is one methodology for taking into account the lack of knowledge on parametrisation of sub-grid processes.
Large-scale modellers can learn from each other, as has been shown by WaterMIP of the WATCH project. More model inter-comparison projects (MIPs) are needed that focus on hydrology, instead of climate (e.g. Gates et al., 1999; Meehl et al., 2000, 2007; Covey et al., 2003; Friedlingstein et al., 2006). Therefore, expectations for the recently started Inter-Sectoral Impact Model Intercomparison Project, ISI-MIP, are high (Schiermeier, 2012).

Conclusions
This study showed that drought propagation processes in contrasting catchments in Europe are reasonably well reproduced by an ensemble mean of ten large-scale models. However, results also indicated a limited suitability of large-scale models when extrapolating to the future and to data-scarce regions, because both climate control (temperature) and catchment control (storage) on drought propagation are not simulated correctly by the ensemble mean of the large-scale models.
The ensemble mean of the large-scale models simulated the general effect of drought propagation on drought characteristics well; i.e. drought events became fewer and longer when moving from precipitation via soil moisture to groundwater storage, and drought characteristics of discharge were in between. Furthermore, the correct hydrological drought types were generally simulated in the correct climate type, i.e. classical rainfall deficit droughts in all climates, wet-to-dry-season droughts only in semi-arid climate, and snow-related droughts in areas with a continuous snow cover in winter.
However, challenges remain in catchments with cold or semi-arid climates and catchments with large storage in aquifers or lakes. The immediate reaction of runoff to precipitation in the large-scale models, even in winters with below-zero temperatures and summers with high evapotranspiration, resulted in many short droughts in total runoff, and consequently in an overestimation of classical rainfall deficit droughts and an underestimation of wet-to-dry-season droughts and snow-related droughts. The still limited representation of storage in the large-scale models is reflected in the absence of a differentiation in drought characteristics of total runoff between quickly and slowly responding systems. Furthermore, almost no composite droughts were simulated for the slowly responding case study areas, while many multi-year drought events were expected in these systems. The flashiness of the hydrograph of the ensemble mean of the large-scale models also showed up clearly in the drought propagation features. Drought events in the ensemble mean had very little lag and lengthening, almost no pooling, and only some attenuation.
In general, we anticipate that the simulation of hydrological drought has a significantly higher uncertainty than the simulation of meteorological drought. Potential improvement of hydrological drought simulation in large-scale models lies in the better representation of hydrological processes that are important for drought development. These processes are evapotranspiration, snow accumulation, and especially storage. Besides the more explicit inclusion of storage in large-scale models, the parametrisation of storage processes also requires attention, for example through a global-scale dataset on aquifer characteristics, improved large-scale datasets on other land characteristics (e.g. soils, land cover), and calibration/evaluation of the models against observations of storage (e.g. in snow, groundwater).

Fig. 2. Threshold level method with variable threshold (80th percentile of monthly duration curve, smoothed by 30-day moving average) for groundwater storage (GW; state variable; upper row) and total runoff (Q; flux; lower row), including an illustration of drought characteristics duration, deficit volume, and maximum deviation.
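The threshold level method described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the nearest-rank percentile rule, the non-leap-year calendar, and the circular smoothing across the year boundary are all choices made here for illustration. The default minimum duration of 3 days follows the caption of Table 4.

```python
import math

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def exceedance_value(values, percent=80):
    """Value that is equalled or exceeded by `percent` % of the sample
    (nearest-rank rule on the values sorted from high to low)."""
    ranked = sorted(values, reverse=True)
    rank = math.ceil(percent * len(ranked) / 100)
    return ranked[max(rank, 1) - 1]


def smoothed_daily_threshold(values_by_month, percent=80, window=30):
    """Build a 365-day variable threshold from 12 lists of daily values,
    one list per calendar month, and smooth it with a moving average."""
    daily = []
    for month_values, n_days in zip(values_by_month, MONTH_LENGTHS):
        daily.extend([exceedance_value(month_values, percent)] * n_days)
    n, half = len(daily), window // 2
    # Circular window, so that December blends into January (assumption).
    return [
        sum(daily[(day + k) % n] for k in range(-half, window - half)) / window
        for day in range(n)
    ]


def drought_events(series, threshold, min_duration=3):
    """Find periods with series < threshold and summarise each event."""
    if len(series) != len(threshold):
        raise ValueError("series and threshold must have the same length")
    events, start = [], None
    for day in range(len(series) + 1):
        below = day < len(series) and series[day] < threshold[day]
        if below and start is None:
            start = day                      # drought begins
        elif not below and start is not None:
            deficits = [threshold[i] - series[i] for i in range(start, day)]
            events.append({
                "start": start,
                "duration": day - start,             # in time steps (days)
                "deficit_volume": sum(deficits),     # for fluxes, daily step
                "max_deviation": max(deficits),      # for state variables
            })
            start = None                     # drought ends
    return [e for e in events if e["duration"] >= min_duration]


# Hypothetical example: constant threshold of 3.0, one 3-day and one 1-day dip.
example_series = [5.0, 5.0, 1.0, 1.0, 1.0, 5.0, 2.0, 5.0]
example_events = drought_events(example_series, [3.0] * 8)
```

With daily values in mm/day, the summed deficit is a volume in mm. The one-day dip in the example is discarded by the minimum-duration filter, so only the three-day event remains.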

Fig. 3. Example of drought events in all case study areas: Narsjø, Upper-Metuje, Upper-Sázava, Nedožery, and Upper-Guadiana in 1970 (all rows: black, solid line = time series of meteorological variable (30-day moving average of WFD temperature and precipitation) or ensemble mean of hydrological variable (see y-axis), grey area = range of individual models, dashed line = smoothed monthly 80 % threshold of displayed variable, red area = drought event; upper row: grey line = long-term average of WFD temperature, red line = 0 °C).

Table 1. The participating models (derived from Haddeland et al., 2011). Model names written in bold are classified as LSMs in this paper; the other models are classified as GHMs.

Table 3 also includes a column on the influence of precipitation (P) and temperature (T) control on the development of each hydrological drought type. Classical rainfall deficit droughts are the only hydrological drought type that is completely governed by P control. Cold snow season droughts (all subtypes) and warm snow season droughts (subtype A) are hydrological drought types that are completely governed by T control. Rain-to-snow-season droughts and wet-to-dry-season droughts are initiated by P control and continued by T control. Warm snow season droughts (subtype B) are initiated by T control and continued by P control. In the case of composite droughts, it depends on the hydrological drought types that are combined whether only P control, only T control, or a combination of P and T control plays a role.
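The P and T controls listed above can be written down as a small lookup, sketched here for illustration. The dictionary layout and function names are assumptions; only the assignment of controls to drought types is taken from the text.

```python
# P = precipitation control, T = temperature control.
# Each tuple reads (initiated by, continued by); layout is illustrative.
DROUGHT_TYPE_CONTROL = {
    "classical rainfall deficit": ("P", "P"),
    "rain-to-snow-season": ("P", "T"),
    "wet-to-dry-season": ("P", "T"),
    "cold snow season": ("T", "T"),             # all subtypes
    "warm snow season, subtype A": ("T", "T"),
    "warm snow season, subtype B": ("T", "P"),
}


def controls(drought_type):
    """Return the set of controls acting on a single drought type."""
    return set(DROUGHT_TYPE_CONTROL[drought_type])


def composite_controls(component_types):
    """Controls of a composite drought: the union over its components."""
    combined = set()
    for drought_type in component_types:
        combined |= controls(drought_type)
    return combined


# A composite of a classical rainfall deficit and a cold snow season drought
# involves both controls; a composite of two classical droughts only P.
mixed = composite_controls(["classical rainfall deficit", "cold snow season"])
p_only = composite_controls(["classical rainfall deficit"] * 2)
```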

Table 4. General drought characteristics using an 80 % monthly threshold (moving average 30 days) and a minimum drought duration of 3 days for the hydrometeorological variables derived from WFD and simulated with the large-scale models for all selected case study areas.

Table 5. Hydrological drought types of all hydrological drought events per catchment (subsurface runoff and total runoff).

Table 6. Hydrological drought types of the five most severe hydrological drought events per catchment (subsurface runoff and total runoff), selection based on deficit volume.