The effect of climate type on timescales of drought propagation in an ensemble of global hydrological models

Drought is a natural hazard that occurs at many temporal and spatial scales and has severe environmental and socioeconomic impacts across the globe. The impacts of drought change as drought evolves from precipitation deficits to deficits in soil moisture or streamflow. Here, we quantified the time taken for drought to propagate from meteorological drought to soil moisture drought and from meteorological drought to hydrological drought. We did this by cross-correlating the Standardized Precipitation Index (SPI) against standardized indices (SIs) of soil moisture, runoff, and streamflow from an ensemble of global hydrological models (GHMs) forced by a consistent meteorological dataset. Drought propagation is strongly related to climate types, occurring at sub-seasonal timescales in tropical climates and at up to multi-annual timescales in continental and arid climates. Winter droughts are usually related to longer SPI accumulation periods than summer droughts, especially in continental and tropical savanna climates. The difference between the seasons is likely due to winter snow cover in the former and distinct wet and dry seasons in the latter. Model structure appears to play an important role in model variability, as drought propagation to soil moisture drought is slower in land surface models (LSMs) than in global hydrological models, but propagation to hydrological drought is faster in land surface models than in global hydrological models. The propagation time from SPI to hydrological drought in the models was evaluated against observed data at 127 in situ streamflow stations. On average, errors between observed and modeled drought propagation timescales are small and the model ensemble mean is preferred over the use of a single model. Nevertheless, there is ample opportunity for improvement as substantial differences in drought propagation are found at 10 % of the study sites. 
A better understanding and representation of drought propagation in models may help improve seasonal drought forecasting as well as constrain drought variability under future climate scenarios.

calculate the SPI may not be representative of surface conditions. In addition, the calculation of the index requires long time series (>30 years) of data (Guttman, 1999; McKee et al., 1993; Zargar et al., 2011).
The standardization approach of the SPI is not only easily applied to different timescales, but can also be applied to other (hydrological) variables such as soil moisture or streamflow. In this way, we use the same methodology to identify soil moisture and hydrological drought by defining the Standardized Soil Moisture Index (SSMI) (Hao and AghaKouchak, 2013), Standardized Runoff Index (SRI) (Shukla and Wood, 2008), and Standardized Streamflow Index (SSFI).
Soil moisture and runoff are controlled by pixel-scale precipitation and hydrological processes. Streamflow, on the other hand, is affected by upstream pixels. Therefore, we computed a catchment-aggregated SPI to quantify drought propagation from meteorological to streamflow droughts. The input for the aggregated SPI is the total precipitation falling within each pixel and its upstream area, based on a 0.5° routing network (see Sect. 3). We did not include a travel time factor in this calculation, thus assuming that precipitation falling in the upper parts of each catchment will impact streamflow at the outlet within the same month. In the rest of the paper, meteorological drought is based on the pixel-based SPI when referring to soil moisture or runoff drought, and on the catchment-based SPI when referring to streamflow drought.
In this study, we quantified drought propagation from meteorological drought to soil moisture and hydrological droughts. To this end, we calculated the SPI using accumulation periods of 1, 2, 3, 6, 9, 12, 24 and 36 months. These accumulation periods span sub-seasonal, (multi-)seasonal, and (multi-)annual timescales. For the other standardized indices (SIs), we used the 1-month accumulation period to identify short-term drought conditions.
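As a minimal sketch of the standardization step, the snippet below accumulates a monthly series over n months and converts it to a standardized index separately for each calendar month. It assumes an empirical (rank-based) transformation with the Gringorten plotting position instead of a fitted distribution. This is a simplification chosen for illustration and is not necessarily the procedure used in this study. The function names `accumulate` and `standardized_index` are hypothetical.

```python
from statistics import NormalDist


def accumulate(values, n):
    """Trailing n-month sums; the first n-1 entries are None (not enough data)."""
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - n:i + 1]))
    return out


def standardized_index(monthly, n=1):
    """Empirical standardized index of a monthly series that starts in January.

    Assumption: ranks within each calendar month are mapped to probabilities
    with the Gringorten plotting position and then to standard normal scores.
    Ties are broken by order of occurrence.
    """
    acc = accumulate(monthly, n)
    si = [None] * len(acc)
    for month in range(12):
        idx = [i for i in range(month, len(acc), 12) if acc[i] is not None]
        ranked = sorted(idx, key=lambda i: acc[i])
        for rank, i in enumerate(ranked, start=1):
            p = (rank - 0.44) / (len(ranked) + 0.12)
            si[i] = NormalDist().inv_cdf(p)
    return si
```

Calling `standardized_index(precip, n)` for n in 1, 2, 3, 6, 9, 12, 24 and 36 would give one SPI series per accumulation period, while `n=1` applied to soil moisture, runoff or streamflow would give the corresponding SSMI, SRI or SSFI under the same assumption.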

Timescales of drought propagation
Drought propagation from meteorological to soil moisture or hydrological drought was based on correlations between the SPI and target SI (SSMI, SRI or SSFI). The SI time series were prepared by applying two criteria to the target SI. First, we distinguished between drought conditions in summer and winter seasons. Summer was defined as June, July, and August north of the equator and as December, January, and February south of the equator; winter was defined as the opposite set of months in each hemisphere. Second, we focused on months corresponding to dry conditions, here defined as SI ≤ 0. This threshold includes near-normal and mildly dry conditions, but leaves a larger sample size than when we limit the analysis to moderate or severe drought events.
Theoretically, moderate drought events (SI ≤ −1) have an occurrence probability of about 16 % in any given month, which would correspond to about 5 events per calendar month in a 30-year study period, for a total of 15 events considering a three-month season. Based on the same reasoning, severe drought events (SI ≤ −1.5) would result in six events for a three-month season during a 30-year period, compared to 45 events using SI ≤ 0 as a threshold.
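The sample sizes quoted above can be checked with a short calculation. The sketch below assumes that the SI follows a standard normal distribution, which is the premise of the standardization. The function name `expected_dry_months` is hypothetical.

```python
from statistics import NormalDist


def expected_dry_months(threshold, years=30, months_per_season=3):
    """Expected number of season months with SI <= threshold, assuming SI ~ N(0, 1)."""
    return NormalDist().cdf(threshold) * years * months_per_season


for threshold in (0.0, -1.0, -1.5):
    print(threshold, round(expected_dry_months(threshold), 1))
```

This gives roughly 45, 14 and 6 months for the three thresholds, in line with the rounded values in the text.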
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2017-745 Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 2 January 2018 © Author(s) 2018. CC BY 4.0 License.
After preparing the SI time series, we cross-correlated each of the SPI time series with the SSMI, SRI, and SSFI, without considering lag between the time series. The drought propagation timescale was defined as the SPI accumulation period that is most closely related to the target drought index, which we call SPI-n. Pixels where the final correlations are not statistically significant at the p = 0.05 level were masked from the results. Autocorrelation is a potential issue when correlating time series, as it reduces the degrees of freedom compared to a standard significance test. In this study, the effective degrees of freedom are based on the modified Chelton method (Pyper and Peterman, 1998) as in Barker et al. (2016).
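The selection of SPI-n can be sketched as follows, under stated assumptions. The snippet keeps only dry months (target SI ≤ 0) in the chosen season, correlates each SPI accumulation period with the target SI at zero lag, and returns the period with the highest Pearson correlation. The use of Pearson correlation, the function names and the data layout (a dictionary of SPI series keyed by accumulation period) are assumptions made for illustration. The significance test with effective degrees of freedom is deliberately left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_spi_n(spi_by_period, target_si, season_months, month_of):
    """Return (n, r) for the accumulation period that correlates best with the target SI.

    spi_by_period: dict {accumulation period in months: SPI series}
    target_si:     SSMI, SRI or SSFI series (None where missing)
    season_months: set of calendar months (1-12) that define the season
    month_of:      calendar month of every time step
    """
    # Keep only dry months (SI <= 0) that fall within the season.
    keep = [i for i, v in enumerate(target_si)
            if v is not None and v <= 0 and month_of[i] in season_months]
    best = None
    for n, spi in spi_by_period.items():
        pairs = [(spi[i], target_si[i]) for i in keep if spi[i] is not None]
        if len(pairs) < 3:
            continue  # too few samples for a meaningful correlation
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if best is None or r > best[1]:
            best = (n, r)
    return best
```

In a full analysis, the returned correlation would still have to pass the significance test before the pixel is retained.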
The effects of climate type, based on the Köppen-Geiger classification (Kottek et al., 2006) shown in Fig. 1, and of certain model characteristics on SPI-n were quantified using a variety of tests. We use the rank of SPI-n rather than the duration of the accumulation period in months in the calculations. This means we assume that the difference between accumulation periods of 12 and 24 months (both on the order of annual timescales, differing by one rank SPI-n) is equivalent to the difference between 1 and 2 months (both on the order of sub-seasonal timescales, differing by one rank SPI-n), but very unlike the difference between 1 and 12 months (sub-seasonal versus annual timescales, differing by five rank SPI-n). Statistical significance was based on (paired) t-tests and ANOVA tests. Since statistical significance does not reflect the relevance of differences between groups, we use Cohen's d to quantify the effect size between two groups (Cohen, 1988). This metric is defined as:
$$ d = \frac{\mu_1 - \mu_2}{\sigma_{\mathrm{pooled}}} \tag{1} $$
where
$$ \sigma_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)\,\sigma_1^2 + (n_2 - 1)\,\sigma_2^2}{n_1 + n_2 - 2}} \tag{2} $$
and where μ represents the group mean, σ the standard deviation, n the number of observations, and subscripts indicate the group. The outcome of the metric is thus the difference in group means relative to the standard deviation of the groups. A result of 1 therefore means that the group means differ by one standard deviation and those groups overlap by about 62 %.
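The effect size and the quoted overlap can be illustrated with a small sketch. It assumes the common pooled-standard-deviation form of Cohen's d and interprets the overlap as the overlapping coefficient of two normal distributions with equal variance. Both are assumptions about the intended definitions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(group1, group2):
    """Difference in group means divided by the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled


def overlap(d):
    """Overlapping coefficient of two equal-variance normal distributions d standard deviations apart."""
    return 2 * NormalDist().cdf(-abs(d) / 2)
```

For d = 1, `overlap(1.0)` evaluates to about 0.62, matching the 62 % mentioned above.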

Validation of drought propagation
Timescales of drought propagation from meteorological to streamflow drought in global hydrological models are validated against observational data. Sites with observational streamflow data were matched to model pixels by selecting the model pixel containing the in-situ site. Since there may be discrepancies between the model and actual river routing schemes, we extracted the model streamflow data from the selected pixel as well as from the eight surrounding pixels. The in-situ site was then assigned to the model pixel with the lowest root mean square error (RMSE) between observed and modeled streamflow.
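The matching of in-situ sites to model pixels can be sketched as a search over the 3 × 3 window around the pixel that contains the site. The data layout (a dictionary that maps a hypothetical `(row, col)` grid index to a streamflow series) and the function names are assumptions made for illustration.

```python
from math import sqrt


def rmse(obs, sim):
    """Root mean square error over time steps where both series have data."""
    pairs = [(o, s) for o, s in zip(obs, sim) if o is not None and s is not None]
    return sqrt(sum((o - s) ** 2 for o, s in pairs) / len(pairs))


def match_site(obs, model_flow, row, col):
    """Return ((row, col), rmse) of the best pixel in the 3x3 window around (row, col)."""
    best = None
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            series = model_flow.get((r, c))
            if series is None:
                continue  # pixel outside the grid or without streamflow data
            err = rmse(obs, series)
            if best is None or err < best[1]:
                best = ((r, c), err)
    return best
```

Pixels outside the window are never considered, even if their streamflow happens to fit the observations better.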
Once the in-situ sites were matched to model pixels, the SPI-n were calculated as described in Sect. 2.2. For consistency, model results were also recalculated at the in-situ sites using a paired data approach.
The evaluation of SPI-n was based on the rank SPI-n rather than the length of the accumulation period in months, for the same reason for which rank SPI-n were used in the statistical tests (Sect. 2.2). The performance metrics used in the evaluation were mean absolute error (MAE) and Spearman correlation coefficient.
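The two performance metrics can be sketched on rank SPI-n as follows. The rank mapping follows the eight accumulation periods used in this study; the function names and the handling of ties by average ranks are assumptions made for illustration.

```python
from math import sqrt

PERIODS = [1, 2, 3, 6, 9, 12, 24, 36]            # accumulation periods in months
RANK = {n: i + 1 for i, n in enumerate(PERIODS)}  # e.g. 1 month -> 1, 12 months -> 6


def mean_absolute_error(obs_n, mod_n):
    """MAE between observed and modeled SPI-n, expressed in rank SPI-n."""
    return sum(abs(RANK[o] - RANK[m]) for o, m in zip(obs_n, mod_n)) / len(obs_n)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

For example, an observed SPI-n of 36 months against a modeled SPI-n of 6 months counts as an error of four rank SPI-n rather than 30 months.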

Data
We assess drought propagation in seven global models from the eartH2Observe project (www.earth2observe.eu), as well as in the model ensemble mean. Three of the models are land surface models (LSMs): HTESSEL-CaMa (Balsamo et al., 2009; Yamazaki et al., 2011), ORCHIDEE (D'Orgeval et al., 2008; Krinner et al., 2005; Ngo-Duc et al., 2007), and the SURFEX- Döll et al., 2009; Flörke et al., 2013). Other models in the eartH2Observe project were excluded because they did not provide all of the variables required for this study. The models are run with a consistent 0.5° meteorological forcing dataset, the WATCH Forcing Data methodology applied to ERA-Interim reanalysis (WFDEI) data (Weedon et al., 2014). However, static fields such as land cover and soil physical properties were not prescribed, as these tend to be closely linked to the modeling system. Important characteristics of the models such as runoff mechanisms and representation of reservoirs or water use are presented in Table 1. For more information about the model datasets and project design, see Schellekens et al. (2016).
In this study, we used the precipitation, root-zone soil moisture, runoff, and streamflow datasets to calculate the SIs. We do not study groundwater droughts because HTESSEL-CaMa does not simulate this store, and two of the other models have not made the data available. In addition, groundwater is defined differently between models, complicating comparisons of this store. The common forcing dataset means that the SPI time series are identical for all models, while the SSMI, SRI, and SSFI are model-specific. The model ensemble mean was calculated as the average of the SIs. A consistent model dataset for the eartH2Observe project is available from 1980-2012, though we use the years 1983-2012 to avoid data gaps in the first years of the SI time series caused by the 36-month accumulation period for the SPI. In addition to the datasets used to calculate the SIs, we used other model variables to relate these to the drought propagation results. These are the runoff coefficient, or the ratio of runoff to precipitation, and the ratio of surface runoff to total runoff.
The validation of modeled drought propagation was based on observed precipitation and streamflow time series. We used gauge-based precipitation data from the Global Precipitation Climatology Centre's (GPCC) Full Data Reanalysis product version 7 (Schneider et al., 2015). The data are available as monthly precipitation totals at 0.5° resolution for the entire modeled period. Note that reanalysis precipitation data, such as used for the model forcing in this study, and the GPCC dataset are not truly independent. However, the dependence of reanalysis precipitation on both gauge and satellite observations inhibits the selection of a completely independent dataset with global coverage that also has a record long enough for drought studies. Monthly streamflow data were obtained from the Global Runoff Data Centre (GRDC; Koblenz, Germany; http://grdc.bafg.de) database. We used only sites with a catchment area larger than 9000 km² and at least 15 years of data. In addition, we ensured that the sites were independent by searching for the most upstream stations that fit the previous requirements. All stations located downstream of these stations were excluded from the analysis. Finally, we only report on sites where the correlation between SPI-n and the SSFI was statistically significant (p < 0.05). These criteria resulted in 297 sites, with an average data availability of 27 years.

Results and discussion
Here, we characterize and discuss the timescales of drought propagation from meteorological to soil moisture and hydrological drought at global scale. First, we describe drought propagation in the model ensemble mean and its relationship to climate. Second, we assess the variability between models and identify factors that may explain these differences. Finally, we compare the results of the model ensemble mean as well as the individual models to observational data.

Model ensemble mean
Drought propagation based on SPI-n for the model ensemble mean varies considerably, and all timescales from 1 to 36 months are represented in the results (Fig. 2). Summer soil moisture droughts (based on SSMI) are best represented by SPI-n of one or two months in wet regions such as the Amazon, but by much longer SPI-n in dry climates and some boreal regions.
Results are more mixed when focusing on runoff droughts (based on SRI). Runoff droughts are most closely linked to precipitation deficits in the same month in dry climates such as the Sahel, southern Africa, and central Australia. Runoff droughts in other dry regions such as the Middle East, northern Africa, and the western USA, however, are related to much longer precipitation deficits of up to several years. The patterns of drought propagation timescales from meteorological to streamflow drought are similar to the patterns from meteorological to runoff droughts, though SPI-n tends to be slightly longer. Longer SPI-n for streamflow compared to runoff can be expected as streamflow is simply routed runoff. In general, SPI-n are also longer, and drought propagation slower, for winter droughts than for summer droughts (Fig. 2). In this study, we focus on SPI-n rather than the strength of the relationship between SPI and other SI. However, the correlations behind SPI-n are generally high, with median values ranging from 0.67 to 0.74, depending on climate and season (Fig. S1). The strength of the relationship is highest in tropical climates (medians around 0.8) and lowest in polar climates (medians around 0.6).
The relationship between drought propagation timescales and climate is further examined using the Köppen-Geiger classification (Kottek et al., 2006). We use six climate classes that reflect the five major climate types, with an additional distinction between tropical wet (i.e. tropical rainforest or monsoonal climates) and tropical savanna climates (Fig. 1). The results in Fig. 3 confirm that the climate type plays an important role in the timescale of drought propagation. As in Fig. 2, droughts in tropical climates tend to respond to short periods of precipitation deficits, while continental and polar climates respond to longer periods of precipitation deficits. Overall, the variability within the tropical climate groups (both wet and savanna) is relatively low compared to dry climates, which are represented by the entire range of SPI accumulation periods studied. Despite the large variability in drought propagation timescales in dry climates, further distinctions between desert/steppe or hot/cold climates within the dry climate class did not have added value.
the snow. In this way, drought conditions in winter may be more related to precipitation deficits in the previous summer.
Indeed, soil moisture droughts may be more related to the SPI accumulated over the three-month period before snowfall than to the longer period starting with those three months and extending until the defined winter season. Note that we used a rather simple definition of summer and winter seasons by using the equator. However, since the climate along the equator is almost exclusively tropical wet, which does not show significant seasonality, we expect that this does not significantly impact the results.
In terms of statistical significance, ANOVA tests show that the mean SPI-n for each climate type are significantly different (p < 0.05) for both soil moisture and hydrological droughts, with just one exception. The means of SPI-n for winter hydrological droughts in continental and polar climates are not significantly different. Similarly, propagation times for summer and winter droughts differ significantly for all drought types based on paired t-tests, except for soil moisture droughts in tropical wet climates. Despite the fact that most group means differ significantly, they are not always substantially different in magnitude. For example, the difference between mean summer and winter SPI-n for runoff droughts in temperate climates is very small.

The difference between the timescales of drought propagation to soil moisture and hydrological droughts can provide additional insights into the mechanisms of drought propagation in the models. The differences in the rank of SPI-n are shown in Fig. 4. As explained in Sect. 2.2, we use rank SPI-n rather than the duration in months because these are more useful in interpreting differences between (sub-)seasonal and annual timescales than the SPI-n in months. A difference of 1-2 rank SPI-n indicates that drought propagation occurs at similar timescales (i.e. sub-seasonal, seasonal or yearly timescales). Differences of more than four rank SPI-n represent large differences in drought propagation timescales, such as between sub-seasonal and yearly timescales. In summer, SPI-n for runoff are higher than for soil moisture in the Amazon, eastern North America, central Africa, and parts of Europe. This means that drought propagation to soil moisture is quicker than drought propagation to runoff, which suggests that subsurface runoff or baseflow is more important than surface runoff in these locations. The opposite is true for most parts of Australia, large parts of central and eastern Asia, and parts of western North America. In these locations, drought propagation to soil moisture drought is slower than drought propagation to runoff, which implies that surface runoff is an important component of total runoff. The differences can be substantial, with rank differences of five and more, which roughly represent the difference between sub-seasonal and annual timescales. In winter, differences tend to be smaller, and longer drought propagation timescales for runoff than soil moisture (i.e. positive values in Fig. 4) are more common.
Drought propagation timescales for streamflow droughts tend to be longer than for runoff drought, which is consistent with streamflow being routed runoff.
Spearman correlation coefficients show that the difference in rank SPI-n of summer soil moisture and runoff droughts is related to the amount of surface runoff relative to total runoff (ρ = −0.53). The relationships with average annual precipitation (ρ = 0.44) and the runoff coefficient (ρ = 0.36) are slightly weaker. The amount of surface runoff relative to total runoff, annual average precipitation, and the runoff coefficient are also related to the difference between soil moisture and streamflow drought, though the relationships are slightly weaker (correlations are up to 0.1 lower). The difference in SPI-n between drought types in winter, as well as between runoff and streamflow droughts in summer, cannot be explained by these variables. Each of the correlation coefficients noted here is highly significant (p < 0.01).
In an additional analysis, we calculated SPI-n for the model ensemble mean focusing on mild droughts (SI ≤ −0.5) and moderate droughts (SI ≤ −1). When using mild droughts as a threshold value, the results are largely similar to those including near-normal conditions (SI ≤ 0), as shown in Fig. S2. Both the global patterns and the differences between the climate types are similar to the results shown in this section. However, the number of pixels masked due to insignificant correlations between SPI-n and the SSMI, SRI or SSFI increases. When moderate droughts are used as a threshold (Fig. S3), the proportion of pixels that are masked increases further, resulting in 40-50 % less data than when including near-normal conditions. In addition, the maps of SPI-n become noisier and the distributions of SPI-n by climate type (as shown in Fig. 3) flatten, especially in continental and polar climates. However, the relationships between the climate and seasonal group means remain similar.
The patterns of runoff drought propagation found in this study are similar to those of a previous study that calculated correlations between ensemble median runoff and precipitation percentiles (van Huijgevoort et al., 2013). Specifically, runoff is more closely related to shorter SPI (or precipitation percentile) accumulation periods in tropical regions, and to longer accumulation periods in continental and polar climates. However, more specific regional comparisons between these studies are hindered by differences in the approach of the two studies. In van Huijgevoort et al. (2013), correlations are based on the full precipitation and runoff time series, while in our study they are limited to below-average runoff conditions in either summer or winter months only. The results shown here are also in line with results from an observational study comparing SPI and SSFI in the United Kingdom. That study reported that SPI-n of 1-4 months were most closely related to SSFI, except in the southeast where some major aquifers are located and where longer SPI-n were found (Barker et al., 2016). That is just slightly shorter than the 2-6 months found in this study. Where drought propagation occurs at very short timescales, drought is mainly driven by precipitation deficits, possibly in combination with temperature anomalies (though these are not reflected in the SPI). Where drought propagation occurs at long timescales, attenuation by hydrological stores likely plays a more important role.

Model variability
The variability of SPI-n in the models underlying the model ensemble mean is shown in Fig. 5. Again, the standard deviation is not shown in months, but in the rank of SPI-n timescales studied. In summer, the standard deviation ranges from 1-4 rank SPI-n. Note that differences of 1-2 rank SPI-n indicate that models tend to agree on whether drought propagation occurs at sub-seasonal, seasonal, or annual timescales. Model variability is low in temperate regions such as Europe and eastern North America, as well as in tropical regions such as the Amazon and Southeast Asia. The model variability is high in (semi-)arid regions such as the Sahel and central Australia. The patterns of model variability in hydrological drought propagation timescales are largely similar to those of soil moisture drought. However, the variability is patchier, and there are some regional differences. For example, model variability in SPI-n in tropical and temperate climates is slightly higher for runoff than for soil moisture. In addition, model variability is lower in central Asia for runoff droughts than for soil moisture droughts.

Attributing observed differences between models to model characteristics is not straightforward, despite the common meteorological forcing, because there are considerable differences in model structures and parameterizations (Beck et al., 2016, 2017; Döll et al., 2016). However, we examine the relationship between drought propagation and specific model characteristics to identify areas for further study. First, we examine the relationship between SPI-n and average soil moisture storage, as previous work has suggested that water storage plays an important role in drought propagation (Barker et al., 2016; Van Loon and Laaha, 2015). In addition, we examine the effect of model structural choices on drought propagation in an exploratory analysis.
Soil moisture storage varies considerably between models, mainly due to differences in the definitions of root-zone soil moisture used to calculate the SSMI. The reported depths of the root zone range from 0.2 to 8 m, which may be a fixed value for all pixels or vary by pixel and/or land use type. Although we use standardized indices (SSMI) and not absolute values of soil moisture, the response time to changes in precipitation and/or evaporation will differ between soils with large and small storage volumes. We examine the relationship between average root-zone soil moisture storage and average SPI-n for soil moisture droughts in Fig. 6. Soil moisture storage is averaged over space and time, and SPI-n over space, resulting in a single point for each model. The figure shows that drought propagation from meteorological to soil moisture drought is strongly related to average soil moisture storage, with correlation coefficients between 0.56 and 0.91, depending on the season and climate type. The impact of changes in storage on drought propagation is especially high in dry climates, where relatively small differences in average soil moisture correspond to large differences in SPI-n. In comparison, the impact of storage on SPI-n is low in tropical wet climates. In general, changes in SPI-n with storage are smaller in winter than in summer. For tropical wet, continental, and polar climates the impact of storage is less than one rank SPI-n over the full range of storage volumes in winter, suggesting that storage is not an important driver of model variability in this season.
The relationship between soil moisture storage and drought propagation to runoff and streamflow is not included here because the link is not as clear-cut. Runoff consists of surface and subsurface components. The subsurface component of runoff is related to water stored in groundwater and/or in the lowest layer of the soil, and not to the upper soil layers included in the root zone.
Furthermore, groundwater data are not available for three out of seven models. The surface component, on the other hand, is only affected by soil moisture insofar as saturated conditions near the surface will result in a larger surface flow component. Therefore, it is impacted by the relative saturation of the soil rather than the storage itself.
In an exploratory analysis, we also examine the relationship between four qualitative factors related to model structure and the timescales of drought propagation. First, we compare SPI-n in LSMs and GHMs. Second, we study the effect of different evaporation schemes, specifically comparing Penman-Monteith evaporation schemes to more empirical temperature-based approaches. Third, we compare models that represent reservoirs with those that do not. Fourth, we group models by runoff generation mechanisms, comparing models that include infiltration excess runoff generation to those that only represent saturation excess. In this last analysis, we exclude ORCHIDEE and WaterGAP3 because they use alternative methods of runoff generation: Green-Ampt infiltration and a beta function, respectively.
The relative importance of these model characteristics based on Cohen's d (see Sect. 2.2) varies considerably over the different climates and drought types (Fig. 7). Overall, the tested model characteristics have a larger effect on soil moisture droughts than on runoff and streamflow droughts. Grouping models by model type has the largest effect on mean soil moisture drought SPI-n for most climates, where drought propagation is slower in LSMs than in GHMs. For runoff droughts, on the other hand, GHMs tend to have higher SPI-n than LSMs. However, it is unclear whether the representation of the energy balance is the primary source of the difference between the groups, or whether underlying or related model characteristics are key.

Grouping models by runoff generation mechanisms has a smaller effect on average SPI-n for soil moisture droughts than the other studied factors. However, it can be more relevant for runoff and streamflow droughts, especially in tropical and continental climates. In these climates, including infiltration excess runoff leads to lower SPI-n and faster drought propagation. In tropical climates, this can be explained by the fact that high-intensity rainfall events exceeding the infiltration capacity of the soil are more common.
The simulation of reservoirs has a substantial impact on drought propagation, especially for soil moisture and runoff. This is consistent with previous work, and also with increasing recognition that human adaptations should be taken into account in drought studies (Van Loon et al., 2016; Veldkamp et al., 2015). However, it is unclear what mechanism could be responsible for the effect of reservoirs on soil moisture droughts. The way in which reservoirs are implemented in global models affects runoff and streamflow, but has no direct link to soil moisture. Therefore, it is likely that another mechanism is truly responsible for the observed differences.
Note that the model characteristics tested here are not fully independent. For example, LSMs are less likely to model reservoirs than GHMs. In addition, the number of models is small compared to the number of differences in model structures and parameterizations. Therefore, grouping models by different characteristics can result in identical groups, making it impossible to untangle the two in the current study setup. One example is the snow scheme. Previous studies have suggested that using energy-based or temperature-based snow schemes results in different model behavior, both based on the same models used in this study (Beck et al., 2017) as well as in other models (Haddeland et al., 2011). In this study, however, it is impossible to distinguish between model type and snow scheme since all LSMs use an energy-based approach, while GHMs use temperature-based approaches.
Insight into which factors are truly responsible for model differences can only be gained through exhaustive experiments testing different model parameterizations and structures (e.g. Medlyn et al., 2015). Nevertheless, while the effect sizes cannot be used to confirm that a certain model characteristic is the true factor underlying observed differences, they can be used to identify directions for further study in more comprehensive analyses.

Evaluation against observations
The timescales of drought propagation from meteorological to streamflow drought in the models and model ensemble mean have been evaluated against data from 297 in-situ streamflow stations (Fig. 8). Of the individual models, W3RA performs best for summer droughts and WaterGAP3 for winter droughts based on MAE and Spearman correlations. The performance of the model ensemble mean is similar to that of the best-performing model in both seasons, with the lowest mean absolute errors and the second-highest correlations with observed SPI-n. WaterGAP3 is the only model that was calibrated against streamflow observations, which could be a reason for the performance in winter, even though it is outperformed by other models in summer. While the mean absolute errors of the ensemble mean and individual models are within one or two rank SPI-n, correlations are low for all models.
In addition to the overall performance metrics, we compare the number of models for which the difference with observed SPI-n is small (absolute error ≤ 1) and the number of models for which this difference is large (absolute error ≥ 4) at each site (Fig. 8). The thresholds are based on the models' (in)ability to capture the overall timescales of drought in terms of sub-seasonal, seasonal or yearly timescales. At least six out of seven models are within one step of the observed SPI-n at 18 % of the study sites, and at least four models are within one step at nearly half of the study sites (47 and 49 % for summer and winter, respectively). This suggests that models are well able to capture observed drought propagation timescales at these sites. However, the majority of models differ by at least four steps of SPI-n at approximately 10 % of sites. At these sites, the models show substantially different drought propagation timescales. These differences in SPI-n correspond to differences between sub-seasonal and annual timescales of drought propagation. Models tend to do well in eastern North America and Europe, as well as western North America in winter, but poorly in central North America and Australia. However, the errors between models and observations are not related to climate, catchment size, or SPI-n.
It is important to note that some of the streamflow time series span as little as 15 years, which is shorter than the 30 years of data recommended for the calculation of SI. This means that the observational time series at some sites are too short to capture the climatology of their locations. However, the average time series length (27 years) is close to the recommended 30 years. Furthermore, even where time series are relatively short, we can evaluate whether the models capture the relationship between observed SPI and SSFI during the available time period.
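The per-site agreement counts described above can be sketched as follows. The error matrix is hypothetical, and the function names are illustrative assumptions. Only the thresholds (absolute error ≤ 1 and ≥ 4 steps of SPI-n) and the seven-model ensemble size come from the text.

```python
# Hypothetical absolute errors |modelled - observed| in SPI-n steps:
# one row per site, one column per model (seven models in the ensemble).
abs_errors = [
    [0, 1, 0, 1, 1, 0, 2],
    [1, 0, 3, 1, 2, 1, 5],
    [4, 6, 5, 1, 4, 2, 7],
    [2, 3, 1, 4, 2, 0, 3],
]

SMALL = 1  # "small" difference: absolute error <= 1 step
LARGE = 4  # "large" difference: absolute error >= 4 steps


def count_models(errors, small=SMALL, large=LARGE):
    """Count models with a small and with a large error at one site."""
    n_small = sum(e <= small for e in errors)
    n_large = sum(e >= large for e in errors)
    return n_small, n_large


def fraction_of_sites(counts, minimum):
    """Fraction of sites where at least `minimum` models meet the criterion."""
    return sum(c >= minimum for c in counts) / len(counts)


per_site = [count_models(row) for row in abs_errors]
n_small = [small for small, _ in per_site]
n_large = [large for _, large in per_site]

n_models = len(abs_errors[0])
majority = n_models // 2 + 1  # four of seven models

frac_six_close = fraction_of_sites(n_small, 6)
frac_four_close = fraction_of_sites(n_small, 4)
frac_majority_far = fraction_of_sites(n_large, majority)
print(frac_six_close, frac_four_close, frac_majority_far)
```

Counting models per site, rather than averaging errors across models, keeps sites where a single model is far off distinct from sites where most of the ensemble misses the observed timescale.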
Another limitation of the evaluation is that the sites with observed data are not spread evenly across the globe, as most sites are located in the United States or Europe, with scarce data in Africa and Asia.
Differences between the models and observations can be attributed to several types of errors (Van Loon, 2015). The first type concerns errors in the model meteorological forcing data, but also in the GPCC precipitation data used to create the validation dataset. Then there are errors in model structure and in the parameterizations of hydrologic processes, including the representation of anthropogenic influence on streamflow. Finally, there are errors in (the discretization of) the routing schemes employed by the models.
Unfortunately, evaluation of soil moisture drought propagation timescales is inhibited by the lack of root-zone soil moisture data at global scale. While satellite soil moisture products are available, these are limited to the upper few centimeters of the soil (Owe et al., 2008), which is not representative of root-zone soil moisture. Field-measured soil moisture is also available, for example through the International Soil Moisture Network (Dorigo et al., 2011). However, only a handful of sites remain after applying the same site selection procedure as for streamflow drought validation (i.e. sufficiently long time series and a statistically significant relationship between SPI and SSMI).

Conclusion
This study evaluates timescales of drought propagation from meteorological to soil moisture and hydrological drought in an ensemble of seven land surface and global hydrological models. Drought propagation was quantified by cross-correlating standardized indices of hydrological variables. Here, we focus on soil moisture, runoff and streamflow droughts in summer and winter. However, the simple and flexible approach used here can be applied to other drought types, such as groundwater droughts, and to other months or seasons.
Drought propagation is closely related to climate, with slower drought propagation in dry and continental climates and quicker drought propagation in tropical climates. Winter season drought propagation tends to be slower than in the summer, especially in tropical savanna and continental climates. This may be a result of the distinct wet and dry seasons in the former, and snow cover in the latter. Faster propagation of meteorological drought to runoff drought than to soil moisture drought has been linked to a higher proportion of surface runoff, thereby causing a larger portion of total runoff to bypass the soil moisture store.
Model variability can be quite high, especially for summer droughts and in dry climates, where the socio-economic impacts of drought can be severe. Since the models were run with consistent forcing datasets, differences can be attributed to model parameterization and structure. For example, drought propagation from meteorological to soil moisture drought was generally slower in models with higher average soil moisture storage, and vice versa. Although the differences cannot be definitively attributed to specific model characteristics in the current experiment, we identified several directions for further study. In addition, grouping models by model type, runoff generation mechanisms, and representation of reservoirs all resulted in significantly different average drought propagation timescales.
The relationship between meteorological and streamflow drought in the global models was evaluated against observational data. Overall, the models were able to capture the timescales of drought propagation, as errors were relatively low on average. However, considerable model advancements can be made since there were large discrepancies between modeled and observed drought propagation at 10 % of the study sites.
A better understanding and representation of drought propagation in global models may improve drought forecasting (Cancelliere et al., 2007), especially when combined with the availability of accurate seasonal forecasts. Drought forecasting potential is expected to be higher in regions with relatively slow drought propagation, such as the dry and continental climates in this study, as drought forecasting for longer SPI-n tends to be more accurate than for shorter SPI-n (Mishra and Desai, 2005). Additional research using lagged SPI-n could assess the potential for forecasting different types of drought based on meteorological data. Improved representation of drought propagation in models is also crucial to constrain the impact of climate change on drought frequency and severity, and thereby improve the reliability of projected changes.

Data availability
The meteorological forcing, model outputs, and standardized indices from the eartH2Observe project are openly available from the project portal (www.earth2observe.eu). GPCC daily precipitation data can be obtained via https://www.esrl.noaa.gov/psd/data/gridded/data.gpcc.html and GRDC monthly streamflow data are available at http://grdc.bafg.de.