Interactive comment on “Effects of different reference periods on drought index estimations for 1901–2014”

The authors use two global precipitation and temperature datasets to calculate the SPEI drought metric using various choices of the calibration period. By quantifying drought severity, duration and extent in maps and by aggregating SPEI values over four regions, conclusions are reached on ’best practice’ in setting the calibration period. A very nice touch is that the authors have looked at specific record-dry years for the regions under consideration, and visualised the effects of the choice of calibration period on the drought estimate.


Introduction
Drought is a complex, slow-onset natural phenomenon that affects more people than any other hazard and seriously influences water resources, agriculture, society and ecosystems (Hagman, 1984; Wilhite, 2002; Ionita et al., 2015). As drought impacts are largely nonstructural and spread over relatively large regions, the onset and end of a drought, as well as its severity, are often difficult to determine (Wilhite, 2002). Furthermore, based on recent changes in the 21st century and projected climate warming, such drought phenomena will likely worsen (Sheffield and Wood, 2008; Dai, 2010). Sheffield et al. (2012) stated that severe and prolonged drought events have been observed since the 1970s, and these changes are related to higher temperatures and lower precipitation.
Drought can be defined and explained using absolute or relative terminology, allowing these terms or measures to be compared to each other (Dai, 2011; Trenberth et al., 2014). Absolute terms such as the amount of precipitation, the amount of soil moisture and other metrics can be used. The relative measures include the Palmer drought severity index (PDSI), the standardized precipitation index (SPI), the standardized precipitation and evapotranspiration index (SPEI) and others. Relative drought indices, however, are limited in their utility because they are based on standardized or normalized shortages relative to average conditions at a given station or in a specific period (Vicente-Serrano and Beguería-Portugués, 2003; Vicente-Serrano et al., 2010). Nevertheless, various drought indices have been widely used in many drought studies. Dracup et al. (1980) suggested three components of drought: duration, magnitude (average water deficiency) and severity (cumulative water deficiency). Such concepts have been applied to various drought indices to analyze historical characteristics. Wang et al. (2011) defined the intensity-duration-frequency of droughts with the SPI, standardized runoff index (SRI) and standardized soil water index (SSWI) derived from observations and future regional climate change projections in central Illinois. To evaluate how well global climate models simulate observed drying or wetting trends, Nasrollahi et al. (2015) applied the Mann-Kendall trend test to the SPIs derived from global observational climate data, in this case, the dataset from the Climate Research Unit (CRU), and 41 predictions of global climate models (GCMs) from the Coupled Model Intercomparison Project Phase 5 (CMIP5).
Similarly, Tan et al. (2015) utilized climate data from 22 meteorological stations in Ningxia, a well-known food production area in Northwest China, and performed Mann-Kendall trend tests with the SPI and SPEI. The degrees of increasing drought frequency and intensity varied with the stations in the study region. Furthermore, Touma et al. (2015) used data from 15 GCMs in CMIP5 and assessed the likelihood of changes in the spatial extent, duration and number of occurrences of four drought indices, including the SPI, SPEI, and others.
Estimating a drought index requires a calibration step. Specifically, historical data such as precipitation data should be fitted to a specific probability distribution function (PDF) and used to estimate drought indices. A few previous studies addressed the issue of the data period in the calibration step (e.g., Karl et al., 1996; Dubrovsky et al., 2009).
While it is common to use self-calibrated indices (i.e., using the same dataset for calibration and index estimation), some studies suggest calibration using reference climate data to allow for an intercomparison of the index among stations or different periods (Dubrovsky et al., 2009). Such reference periods (i.e., calibration periods) of climate data are particularly important in climate change studies. It was previously noted for the self-calibrated PDSI that trends toward more extreme conditions are amplified when the calibration period does not include recent data, including the recent effects of climate change (van der Schrier et al., 2013; Trenberth et al., 2014). Still, few studies have clarified their approaches to calibration. Therefore, we aim to understand how a different reference period (i.e., calibration period) of climate data influences regional drought assessment. Specifically, we investigate the influences of different reference periods on historical drought characteristics such as trends, frequency, intensity and spatial extents with the SPEI estimated using two historical global climate datasets from the CRU and the University of Delaware (UDEL). This study shows that the reference periods influence the assessment of drought characteristics, particularly the severity and spatial extent, while their influence on the frequency is relatively small. These influences are especially significant in regions with dominant drying trends such as East Asia and West Africa. These findings suggest that the reference period should be clarified in drought assessments for a better understanding of regional drought characteristics and their temporal changes.

Study area and climate data
We investigate the drought characteristics over the Northern Hemisphere with a focus on four different regions: East Asia (EA), Europe (EU), the United States (US) and West Africa (WA) (Fig. 1). We performed the analyses based on the spatially distributed patterns over those regions, as well as their averages, but without distinguishing the sub-regions based on the climate characteristics. Two widely used global observational datasets from the CRU and UDEL are utilized in this study. From these two datasets, monthly precipitation and temperature data with a spatial resolution of 0.5° are used from 1901 to 2014.
This study uses the latest CRU dataset (CRU TS3.10), as described in Harris et al. (2014). The principal sources of the CRU data are the World Meteorological Organization (WMO) in collaboration with the US National Oceanic and Atmospheric Administration (NOAA). Covering all land areas between 60°S and 80°N at a spatial resolution of 0.5°, the dataset includes global monthly climate data for ten variables: precipitation, mean temperature, diurnal temperature range, minimum and maximum temperature, vapor pressure, cloud cover, rain days, frost days and potential evapotranspiration. The dataset is derived from archives of climate station records with extensive manual and semi-automated quality control measures.
The UDEL dataset (V 4.01, Willmott and Matsuura, 2001) is also used in this study. The dataset includes gridded monthly precipitation and temperature data at a spatial resolution of 0.5° across the land over the globe. The dataset was compiled from sources including the Global Historical Climatology Network (GHCN) and the Global Surface Summary of Day (GSOD). To interpolate the station values to the grid, climatologically aided interpolation (CAI) and traditional interpolation were used for precipitation, and digital elevation model (DEM)-assisted interpolation, traditional interpolation and CAI were used for temperature. In this work, traditional interpolation is a spherical version of Shepard's algorithm, which employs an enhanced distance weighting method (Shepard, 1968; Willmott et al., 1985).

Meteorological drought index
Various drought indices have been used to understand different types of droughts, including meteorological drought, agricultural drought and hydrological drought (Heim, 2002). For meteorological droughts, the indices include the PDSI (Palmer, 1965), the SPI (McKee et al., 1993) and the SPEI (Vicente-Serrano et al., 2010). As different studies have used different meteorological drought indices (Seneviratne, 2012; Sheffield et al., 2012; Trenberth et al., 2014; Nasrollahi et al., 2015; Touma et al., 2015), this study focuses on the SPEI. Devised by Vicente-Serrano et al. (2010), the SPEI has the advantage of considering the effects of temperature variability on drought relative to the SPI (Naumann et al., 2014). The SPEI uses the amount of precipitation minus PET and fits the data to the log-logistic PDF. Here, we summarize the steps in estimating the SPEI based on monthly precipitation and temperature data. The detailed procedure for estimating the SPEI was presented by Vicente-Serrano et al. (2010).
Step 1: Estimate the water surplus or deficit in month $j$ ($D_j$) using the difference between precipitation ($P_j$) and potential evapotranspiration ($PET_j$):
$$D_j = P_j - PET_j \qquad (1)$$
Here, the potential evapotranspiration is estimated based on the Thornthwaite (1948) method, which requires the monthly temperature, latitude, day and month.
Step 2: Estimate the accumulated difference ($X_{i,j}^{k}$) over timescale $k$ in a given month $j$ and year $i$. For example, the accumulated difference for a month in a particular year with a 12-month timescale ($k = 12$) is calculated as follows:
$$X_{i,j}^{k} = \sum_{l=13-k+j}^{12} D_{i-1,l} + \sum_{l=1}^{j} D_{i,l} \quad \text{if } j < k, \qquad X_{i,j}^{k} = \sum_{l=j-k+1}^{j} D_{i,l} \quad \text{if } j \ge k$$
Step 3: Fit the accumulated difference to a log-logistic distribution as follows:
$$F(x) = \left[ 1 + \left( \frac{\alpha}{x - \gamma} \right)^{\beta} \right]^{-1}$$
where $F(x)$ is the cumulative probability function of a three-parameter log-logistic distribution and $\alpha$, $\beta$ and $\gamma$ represent the scale, shape and origin parameters, respectively. For model fitting, the L-moment procedure (Hosking, 1990) is employed, as it is one of the most robust and easy-to-use approaches.
Step 4: Estimate the SPEI based on the estimated $F(x)$. The SPEI can be derived from the standardized values of $F(x)$ and the classical approximation of Abramowitz and Stegun (1965) following Vicente-Serrano et al. (2010).
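For reference, the classical approximation mentioned in Step 4 is usually written as below. The symbols $W$ and $P$ and the numerical constants are taken from the formulation in Vicente-Serrano et al. (2010), not from the present text, and $F(x)$ denotes the fitted cumulative probability from Step 3.

```latex
\mathrm{SPEI} = W - \frac{C_0 + C_1 W + C_2 W^2}{1 + d_1 W + d_2 W^2 + d_3 W^3},
\qquad W = \sqrt{-2 \ln P} \quad \text{for } P \le 0.5, \qquad P = 1 - F(x),

C_0 = 2.515517,\; C_1 = 0.802853,\; C_2 = 0.010328,\;
d_1 = 1.432788,\; d_2 = 0.189269,\; d_3 = 0.001308.
```

If $P > 0.5$, $P$ is replaced by $1 - P$ and the sign of the resulting SPEI is reversed. The expression is a rational approximation to the inverse of the standard normal cumulative distribution, so an exact inverse normal function gives nearly the same values.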
The estimated drought index is classified as shown in Table 1 for moderate, extreme and very extreme cases. In this study, we focused on the SPEI with a 12-month timescale (SPEI-12). The SPEI can be estimated for different timescales, such as 1, 3, 6, 9, 12 and 24 months.
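The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, a single distribution is fitted to all months (operational SPEI implementations usually fit each calendar month separately), the probability-weighted-moment formulas follow Vicente-Serrano et al. (2010), the fit assumes a positively skewed sample (shape parameter above 1), and an exact inverse normal function stands in for the Abramowitz and Stegun approximation. The synthetic sample and its parameter values are made up for the demonstration.

```python
import math
from statistics import NormalDist


def accumulate(d, k=12):
    """Rolling k-month sums of a monthly water balance series D = P - PET."""
    return [sum(d[i - k + 1:i + 1]) for i in range(k - 1, len(d))]


def fit_log_logistic(sample):
    """Estimate (alpha, beta, gamma) by probability-weighted moments."""
    x = sorted(sample)
    n = len(x)
    # w_s = (1/n) * sum((1 - F_i)^s * x_i), plotting position F_i = (i - 0.35) / n
    w = [sum((1 - (i - 0.35) / n) ** s * xi for i, xi in enumerate(x, start=1)) / n
         for s in range(3)]
    beta = (2 * w[1] - w[0]) / (6 * w[1] - w[0] - 6 * w[2])
    g = math.gamma(1 + 1 / beta) * math.gamma(1 - 1 / beta)
    alpha = (w[0] - 2 * w[1]) * beta / g
    gamma = w[0] - alpha * g
    return alpha, beta, gamma


def log_logistic_cdf(x, alpha, beta, gamma):
    """F(x) = [1 + (alpha / (x - gamma))^beta]^-1, zero at or below the origin."""
    z = (x - gamma) / alpha
    if z <= 0:
        return 0.0
    return 1.0 / (1.0 + z ** (-beta))


def spei(values, calibration, eps=1e-6):
    """Standardize `values` with a distribution fitted to `calibration`."""
    alpha, beta, gamma = fit_log_logistic(calibration)
    normal = NormalDist()
    # Clamp probabilities so the inverse normal stays finite.
    return [normal.inv_cdf(min(max(log_logistic_cdf(v, alpha, beta, gamma), eps), 1 - eps))
            for v in values]


# Synthetic 12-month sums: exact quantiles of a log-logistic law (illustrative values).
true_alpha, true_beta, true_gamma = 400.0, 8.0, -300.0
n = 600
sample = [true_gamma + true_alpha * (u / (1 - u)) ** (1 / true_beta)
          for u in ((i - 0.5) / n for i in range(1, n + 1))]

alpha, beta, gamma = fit_log_logistic(sample)
index_at_median = spei([true_gamma + true_alpha], sample)[0]
```

Passing a different `calibration` sample than the one being standardized is how a reference period enters the calculation: the fitted parameters come from the calibration data only, and the index is then evaluated for any period of interest.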
The temporal trend is investigated with a nonparametric and monotonic trend test based on the S-statistic of the Mann-Kendall trend test (Mann, 1945; Kendall, 1976). In this test, an increasing (positive) trend or decreasing (negative) trend is tested for at a significance level of 5%. For the frequency, severity and spatial extent of drought, different measures have been defined and used in past studies (e.g., Wang et al., 2011; Touma et al., 2015) because it is not straightforward to define these quantities in practice. For example, Touma et al. (2015) defined the duration, occurrence and spatial extent of drought to investigate the drought changes with 15 CMIP5 models throughout the world in the 21st century. The duration of drought was defined as the consecutive period below a certain drought threshold. The occurrence of droughts was defined as the total number of droughts in the period of interest. Additionally, the spatial extent of drought was defined as the percentage of grid points below the given drought level, in which the corresponding drought index was less than the given drought category in each month, relative to the total number of terrestrial grid points in the domain.
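A minimal sketch of the Mann-Kendall test at the 5% level is given below. The function name and the returned labels (IN, N and DE, borrowed from the Fig. 4 caption) are choices made for this illustration, and the normal approximation with a tie correction is the standard textbook form rather than a detail confirmed by the text.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(x, alpha=0.05):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p, label)."""
    n = len(x)
    # S is the number of concordant minus discordant pairs over all i < j.
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Variance of S under the null hypothesis, corrected for tied groups.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    if p < alpha:
        label = "IN" if z > 0 else "DE"
    else:
        label = "N"
    return s, z, p, label


rising = mann_kendall(list(range(20)))
falling = mann_kendall(list(range(20, 0, -1)))
flat = mann_kendall([5] * 10)
short = mann_kendall([1, 3, 2, 4])
```

Because the statistic depends only on the ordering of values, any strictly increasing transformation of a series leaves S unchanged, which is relevant when the same series is standardized against different reference periods.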
In this study, we defined three measures of drought based on the SPEI-12: (1) Drought frequency was calculated as the ratio of the total number of drought events (i.e., SPEI-12 ≤ -1) to the total number of terrestrial grid points; here, we counted the number of drought events without considering whether a given drought event (i.e., SPEI-12 ≤ -1) was identified consecutively. (2) Severity was defined as the lowest estimate of the regional monthly average SPEI-12 with moving windows with periods of 1 to 12 months; here, regional averages were estimated in the four study regions depicted in Fig. 1. (3) Spatial extent was calculated as the number of grid points with an annual SPEI-12 ≤ -1.0 relative to the total number of terrestrial grid points.
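The three measures can be written down compactly. The sketch below is one reading of the definitions above, with hypothetical function names and a toy grid of two cells and four months; the layout (one list of monthly values per grid cell) is an assumption made for the example.

```python
def drought_frequency(grid, threshold=-1.0):
    """Number of cell-months at or below the threshold, per grid cell."""
    events = sum(1 for series in grid for v in series if v <= threshold)
    return events / len(grid)


def severity(regional_mean, max_window=12):
    """Lowest moving-window average for each window length 1..max_window."""
    lowest = {}
    for w in range(1, max_window + 1):
        means = [sum(regional_mean[i:i + w]) / w
                 for i in range(len(regional_mean) - w + 1)]
        lowest[w] = min(means)
    return lowest


def spatial_extent(annual_values, threshold=-1.0):
    """Percentage of grid cells whose annual index is at or below the threshold."""
    hits = sum(1 for v in annual_values if v <= threshold)
    return 100.0 * hits / len(annual_values)


# Toy data: two grid cells, four months of SPEI-12 each (made-up values).
grid = [[-1.5, -0.5, -1.0, 0.2],
        [0.1, -1.2, 0.3, 0.4]]
regional_mean = [sum(col) / len(col) for col in zip(*grid)]

freq = drought_frequency(grid)                  # 3 events over 2 cells
sev = severity(regional_mean, max_window=2)     # windows of 1 and 2 months
extent = spatial_extent([-1.2, -0.3, -1.0, 0.5])
```

The window length must not exceed the length of the regional series, otherwise no moving average exists for that window.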

Design of data analysis
To understand the influence of the reference period (i.e., calibration period) on the drought index, three different types of reference periods are used to estimate the SPEI-12 with the CRU and UDEL data. To separately analyze the drought characteristics in the estimation periods of 1901-1957 (P1) and 1958-2014 (P2), different sets of reference periods are used (Table 2). Here, we assume that the mean climates of P1 and P2 are different to some extent because of global climate and environmental changes, which will be discussed further in Section 3. For the first type of reference period (Ref1), we calibrated the distribution of a specific PDF (Step 3 in Section 2.2) using data from 1901 to 2014, which is used to estimate the SPEI-12 for the P1 and P2 estimation periods. For the second type of reference period (Ref2), calibrations are performed separately for P1 and P2; thus, so-called self-calibrated indices are derived. For the third type (Ref3), we calibrated the distribution using the data from P1 (i.e., 1901-1957) and then used this distribution for both estimation periods.
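The three designs differ only in which years supply the calibration sample for each estimation period. The sketch below makes that explicit under loud simplifications: a plain z-score replaces the log-logistic fit, the annual series is synthetic with an imposed drying trend, and all names are hypothetical. It is meant to show the bookkeeping, not to reproduce any result of the study.

```python
import math
from statistics import mean, stdev

ESTIMATION = {"P1": (1901, 1957), "P2": (1958, 2014)}

# Calibration years used for each estimation period under each design.
CALIBRATION = {
    "Ref1": {"P1": (1901, 2014), "P2": (1901, 2014)},
    "Ref2": {"P1": (1901, 1957), "P2": (1958, 2014)},  # self-calibrated
    "Ref3": {"P1": (1901, 1957), "P2": (1901, 1957)},
}


def years(span):
    return range(span[0], span[1] + 1)


def standardized_index(series, design):
    """Z-score stand-in for the SPEI: standardize each estimation period
    against the calibration years prescribed by the design."""
    index = {}
    for period, span in ESTIMATION.items():
        calib = [series[y] for y in years(CALIBRATION[design][period])]
        mu, sd = mean(calib), stdev(calib)
        for y in years(span):
            index[y] = (series[y] - mu) / sd
    return index


# Synthetic annual water balance with a drying trend (illustrative only).
series = {y: 100.0 - (y - 1901) + 20.0 * math.sin(y) for y in range(1901, 2015)}

p2_mean = {design: mean(standardized_index(series, design)[y]
                        for y in years(ESTIMATION["P2"]))
           for design in CALIBRATION}
```

With this synthetic drying series, the P2 average of the index is zero by construction under Ref2, negative under Ref1 and most negative under Ref3, because Ref3 judges the drier second period against the wetter first period alone.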

Spatial and temporal patterns of climate variables
In this section, we examine the spatial and temporal variations of precipitation, air temperature and PET (Figs. 2, 3 and 4 and Table 3), which are used to estimate D (= P - PET) (in Eq. 1) and thus the SPEI values. We particularly focused on the differences in meteorological conditions between P1 and P2 to enhance our understanding of similar or different drought index values according to the different reference periods in the following sections.
To investigate the temporal changes in precipitation, air temperature and PET, we compared the means and standard deviations between the two periods (i.e., P1 and P2) (Figs. 2 and 3 and Table 3). Most cases showed largely consistent results between CRU and UDEL; therefore, we did not focus extensively on the differences between the two datasets. In general, the temporal pattern of precipitation varied among regions, and increased air temperature was observed in all regions. On average (Fig. 3 and Table 3), annual precipitation decreased in P2 relative to P1 in EA and WA, although decreased precipitation was clearly evident only in limited areas within these regions (Fig. 2), for example, the west Sahel within WA. In contrast, annual precipitation increased in EU and the US. Increases in air temperature were clearly shown in all regions; consequently, increases in PET, which is controlled mainly by air temperature, were generally evident. Decreases in D were observed only in EA and WA (Fig. 3c). In these regions, an annual water deficit (i.e., negative D) was evident, whereas in other regions, i.e., EU and the US, an annual water surplus (i.e., positive D) was present.
The Mann-Kendall trend tests for annual precipitation, annual average temperature and annual PET were also performed, as shown in Fig. 4. The results indicate whether these variables showed statistically significant increasing trends, decreasing trends or no trends. For annual precipitation in EA, the areal extent with an increasing trend was almost twice that with a decreasing trend based on CRU, but the areal extent with a decreasing trend based on UDEL was broader than that with an increasing trend. In EU and the US, the areal extent with an increasing trend was clearly larger than that with a decreasing trend based on both CRU and UDEL. However, in WA, the areal extent with a decreasing trend was larger than that with an increasing trend based on both CRU and UDEL. These patterns were generally more pronounced for CRU than for UDEL. For annual average air temperature and PET, CRU produced increasing trends over most regions. Similar patterns were observed for UDEL, but the areal extent of the decreasing trend was slightly larger than that of CRU.

Temporal patterns of drought index
The drought index (i.e., SPEI-12) was estimated by fitting the three-parameter log-logistic model for three different reference periods (Table 2), as described in Section 2.4. As shown in the L-moment ratio diagram with the CRU for Ref1 as an example (Fig. 5), the model is well fitted with the L-moment approach, following Vicente-Serrano et al. (2010). Fig. 6 shows the temporal variations in SPEI-12 based on the reference periods (Ref1, Ref2 and Ref3) and datasets (CRU and UDEL) used in the two periods. In the US and EU, the SPEI-12 averages are very similar in the two periods, with values of 0.005 (P1) and 0.118 (P2) in the US and -0.011 (P1) and -0.001 (P2) in EU. In EA, the SPEI-12 averages for the three different reference periods slightly decrease from P1 to P2, whereas the deviations in SPEI-12 increase markedly. In WA, the averages and deviations in SPEI-12 significantly decrease and increase, respectively, from P1 to P2. Furthermore, the variances in SPEI (box lengths in Fig. 6) are relatively small in P1 compared with those in P2 in EA and WA, whereas no noticeable differences in the variances are observed in EU and the US. This result may be attributed to the lack of ground-based observations before 1950 (i.e., most of P1). As suggested in previous studies (e.g., Becker et al., 2013; Vittal et al., 2013; Nasrollahi et al., 2015), the limited availability of data for the early 20th century can result in underestimates of the spatial variabilities of climate variables in global datasets; in the present study, such limited data availability might have contributed to the reduced SPEI variance in P1 in EA and WA. Based on regional averages, the role of the reference period is not clear; thus, we investigated the spatial patterns of SPEI-12 hereafter.
Based on the Mann-Kendall trend test of annual SPEI-12 from 1901 to 2014, we determined the increasing (i.e., wetting), decreasing (i.e., drying) and no trend areas of the regions (Fig. 7). First, the spatial distribution of SPEI-12 trends is identical between Ref1 and Ref3, and that in Ref2 is different. Ref1 and Ref3 use different calibration datasets but are similar in using one dataset for the two estimation periods; however, Ref2 uses different calibration datasets for different estimation periods (Table 4). Therefore, SPEI-12 of Ref2 exhibits relatively smaller areas of wetting and drying trends in the first and second periods relative to those of Ref1 and Ref3.
Regarding the temporal characteristics in different regions, the following are our findings for Ref1 and Ref3. In WA, drying trends are clearly dominant. In EU, drying trends are scattered over the domain. In the US, wetting trends are scattered in the eastern region, and drying trends can be observed in the southwestern region. In EA, the drying trends are clearly evident in the western region.
Based on the grid-level trend analyses, we categorized each grid cell as showing an increasing, decreasing or neutral trend for each variable (i.e., precipitation, air temperature, PET and SPEI-12) (Fig. 8). For SPEI-12, increasing and decreasing trends represent wetting and drying trends, respectively.
We present the ratio of each case relative to the total number of cases (i.e., total number of terrestrial grid cells in all four regions). First, the SPEI-12 trends are the same between Ref1 and Ref3, as the estimation periods share one reference period in both Ref1 and Ref3, while each estimation period uses its own reference period in Ref2.
Thus, the values of SPEI-12 differ between Ref1 and Ref3, but the trends (i.e., relative values) are the same. Second, where precipitation and air temperature exhibit neutral (or no) trends (i.e., the center panel among the 3 x 3 panels in Fig. 8, indicating a presumably stationary climate), the grid percentages of different trends in SPEI-12 vary between Ref1/Ref3 and Ref2. However, the ratio is relatively small, as most grid cells display increasing temperature and PET trends. Finally, in the case of neutral precipitation and increasing air temperature (or PET) trends (i.e., the top middle panel among the 3 x 3 panels in Fig. 8), the numbers of cells with neutral and drying SPEI-12 trends are notably different between Ref1/Ref3 and Ref2. We observed increasing temperature and thus increasing PET trends in most regions (refer to Fig. 4). This discrepancy between the reference periods might play a critical role in assessing drought status.

Frequency, severity and spatial extent of drought
In this section, we examine how the reference periods play a role in assessing the frequency, severity and spatial extent of drought using SPEI-12. The definitions of frequency, severity and spatial extent of drought used in this study are clarified in Section 2.3, and they may differ in different studies.
As explained above, a drought event occurs when the monthly SPEI-12 is estimated to be at or below -1.0 based on the drought duration-frequency relationship. For each drought event in a grid cell, the duration is how long the SPEI-12 stays at or below -1. The frequency is the ratio between the total number of drought events and the number of terrestrial grid points in each region (Fig. 9). We found that the drought events with longer durations (prolonged right tails in the plot) occur more frequently in P2 than in P1 in all regions. However, we did not find any particular differences between the three different reference periods except in WA. In WA, the drought frequencies differ among the three reference periods: the frequencies of Ref2 and Ref3 are higher than those of Ref1 in P1, and slight differences in the frequency among the three reference periods are observed across the 12-month range of durations in P2.
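The duration-frequency relationship described here can be sketched as a run-length count. The code below is one possible reading, with hypothetical function names and made-up values: an event is a maximal run of consecutive months at or below the threshold, and the frequency of each duration is the number of such events divided by the number of grid cells.

```python
from collections import Counter


def event_durations(series, threshold=-1.0):
    """Lengths of consecutive runs with values at or below the threshold."""
    durations, run = [], 0
    for v in series:
        if v <= threshold:
            run += 1
        else:
            if run:
                durations.append(run)
            run = 0
    if run:  # close a run that reaches the end of the series
        durations.append(run)
    return durations


def duration_frequency(grid, threshold=-1.0):
    """Events of each duration per grid cell, keyed by duration in months."""
    counts = Counter(d for series in grid
                     for d in event_durations(series, threshold))
    return {d: c / len(grid) for d, c in sorted(counts.items())}


runs = event_durations([-1.2, -1.1, 0.0, -1.0, 0.5, -2.0, -1.5, -1.3])
table = duration_frequency([[-1.2, -1.1, 0.0, -1.0],
                            [0.3, -1.5, 0.2, 0.1]])
```

In this toy example the first cell contributes one 2-month and one 1-month event and the second cell one 1-month event, so the per-cell frequencies are 1.0 for a duration of one month and 0.5 for two months.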
We examine how the severity of drought varies with the moving window size for the average monthly SPEI-12.
Fig. 10 shows the most severe SPEI-12 estimates, which are defined as the lowest values among the regional monthly averages of SPEI-12 in the moving windows from 1 month to 12 months. In EU and the US, we found no large differences between the SPEI-12 values for Ref1, Ref2 and Ref3 in the same period. In these regions, the most severe SPEI-12 values in P1 are more severe (i.e., lower) than those in P2. Such findings are seemingly inconsistent with the recently observed severe drought events in the US and EU, but they are reasonable because we examined the regionally averaged indices and not the local extremes of the SPEI. Additionally, the results are consistent with Fig. 3c. In the US (the third row of Fig. 3c), the increase in precipitation is higher than that in PET, which increases D (Eq. 1). In EU (the second row of Fig. 3c), the increase in PET is higher than that in precipitation; thus, D decreases on average. However, at the lower extreme of D in this case (i.e., the lower extent of the vertical line in the box plot of D in Fig. 3c), a slight increase is apparent, indicating that the most severe drought events are less severe in P2 than in P1. In examining the spatial maps of the severest cases (not shown), we found that the severest drought event in P1 is more widespread than that in P2 in this case. Such widespread drought might be due to the sparse network of meteorological stations during the early 20th century, a possibility that awaits further study.
In EA and WA, different patterns can be observed for the most severe SPEI-12 values. The annual precipitation and air temperature (and thus PET) exhibit regionally scattered decreases and widespread increases, respectively (Fig. 4). Consequently, the droughts in P2 (1958-2014) are more severe than those in P1. Furthermore, the severities vary significantly with the calibration period in EA and WA, where the changes in precipitation and air temperature between the two periods are considerable.
The spatial extents of droughts for annual SPEI-12 ≤ -1.0 are examined by sorting the results in ascending order (Fig. 11). We counted the numbers of grid points with SPEI-12 values at or below -1.0 in each period (i.e., P1 and P2) and divided them by the number of terrestrial grid cells in the region to derive the spatial extent, i.e., the grid percentage of droughts. Then, the annual time series of the spatial extent were sorted in ascending order. No specific patterns are evident in EU and the US. In EA and WA, the spatial extents are generally broader in P2 than in P1.
In particular, the spatial extents in P2 (1958-2014) clearly diverge based on the different calibration periods, suggesting the importance of the calibration method (i.e., the reference period) in assessing the droughts in a region.
To understand how the drought characteristics would change if the reference period is dry or wet, we compared the spatial extent of drought (%) for dry and wet cases in EA, EU, the US and WA. We defined dry and wet cases based on the water surplus or deficit D (Eq. 1). Then, we compared D values between the reference period and estimation period. A value of D in the estimation period less than that in the reference period represents a dry case, i.e., the estimation period is drier than the reference period. We performed such analyses only in Ref1 for the estimation periods of 1901-1957 (P1) and 1958-2014 (P2) and a reference/calibration period from 1901-2014 (P1+P2). For dry and wet cases, we quantified the spatial extent (%) according to the three different drought levels (D1, D2 and D3, which denote the cases of SPEI < -1.0, SPEI < -2.0 and SPEI < -3.0, respectively) in the four regions.
As presented in Table 5, when the average D in P1 or P2 (estimation period) is smaller than that in P1+P2 (reference period), the case is considered a dry case. For example, in EA, the D values in P2 and P1+P2 are -4.89 mm/month and -5.07 mm/month, respectively; thus, it is a dry case. Then, for each case, the spatial extent of drought, i.e., the number of drought grid cells relative to the total number of terrestrial grid cells, is analyzed, as shown in Fig. 12.
The spatial extent of drought tends to be larger in dry cases than in wet cases in most regions, particularly in WA.
However, we also noted that there are a few exceptions, which may be attributed to the fact that we used regionally averaged values of D. Thus, we cannot consider the grid-level variability in D values.
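The dry/wet classification and the level-wise spatial extent can be sketched as follows. The function names and the toy values are hypothetical, the thresholds follow the D1, D2 and D3 levels defined above, and treating equal averages as a wet case is an arbitrary choice of this sketch, since the text does not address ties.

```python
from statistics import mean

# Drought levels as defined in the text: SPEI below -1, -2 and -3.
LEVELS = {"D1": -1.0, "D2": -2.0, "D3": -3.0}


def classify_case(d_estimation, d_reference):
    """'dry' if the estimation period has a lower average D than the
    reference period, otherwise 'wet' (ties counted as wet here)."""
    return "dry" if mean(d_estimation) < mean(d_reference) else "wet"


def extent_by_level(spei_values):
    """Percentage of grid cells strictly below each drought threshold."""
    n = len(spei_values)
    return {name: 100.0 * sum(1 for v in spei_values if v < threshold) / n
            for name, threshold in LEVELS.items()}


case_a = classify_case([-6.0, -5.0], [-5.0, -4.0])   # estimation period drier
case_b = classify_case([1.0], [0.0])                 # estimation period wetter
extent = extent_by_level([-0.5, -1.5, -2.5, -3.5])
```

Because the classification uses regionally averaged D, individual grid cells inside a "dry" region may still be wetter than their own reference climate, which is the limitation the paragraph above points to.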

Case studies using historical drought events
SPEI-12 values with different reference periods are evaluated for historical drought events selected in each region to investigate how different reference periods influence the drought assessments of historical events. One drought event is chosen for each region as follows: 1) in EA, droughts that occurred in northern China in 2001 are chosen, and these events caused economic losses of USD 1.52 billion (Zhang and Zhou, 2015); 2) in EU, we chose a 2003 drought that was caused by the European heat wave and spread over the majority of Europe (Stagge et al., 2013; Spinoni et al., 2015); 3) in the US, we chose 2012 because the drought in that year was the most extensive since the 1930s, covering over half of the US, and it caused economic losses of USD 31.2 billion (Smith and Katz, 2013; National Climatic Data Center, 2015); and 4) in WA, the drought in 1984 was chosen because it was one of the most severe droughts that has occurred in Sahel countries (Gommes and Petrassi, 1994; Rojas et al., 2011; Masih et al., 2014).
By estimating SPEI-12 for a chosen year in each region, we can compare the magnitudes of SPEI values (Figs. 13, 14, 15 and 16). Here, the annual SPEI-12 values based on monthly climate data from January to December in each year are first calculated. Then, the SPEI-12 values for a chosen year are examined in detail. All SPEI-12 values in different reference periods reflect the drought status because we chose specific years with drought events. In general, all cases reveal that the SPEI-12 estimates in Ref2 are relatively high (i.e., wet), and those in Ref3 are relatively low (i.e., dry) in EA and WA, where drying temporal trends are clear. In particular, several extreme values (i.e., out of the scale range in Figs. 13-16) of SPEI-12 in Ref3 cases highlight the importance of the reference period. If a reference period is based on a certain time (P1 in this study, i.e., Ref3), the drought events in the estimation period may be beyond the range in which the distribution is calibrated for the index. Essentially, for Ref3, it is assumed not only that the climate is stationary but also that the entire probability distribution of droughts is sampled in this period.
Furthermore, the percentage of the spatial extent of drought, i.e., the number of drought grid points relative to the total number of grid points, is assessed for different drought thresholds (Table 6). In most cases, the spatial extents of drought with the SPEI less than a certain threshold, such as -1, -2 or -3 (i.e., D1, D2 and D3 as in Table 1), are the greatest in Ref3 among the three cases with different reference periods. These results and the spatial extents are consistent with the SPEI-12 results estimated above. In addition, higher percentages of severe drought events, which are defined based on low thresholds, such as SPEI-12 values less than -2 or -3, were observed in Ref3 than in Ref1 and Ref2 in all regions of EA, EU, the US and WA.

Conclusions
This study seeks to understand how a different reference period (i.e., calibration period) of climate data for estimating the drought index can influence regional drought assessment. Specifically, we investigated the influences of different reference periods on historical drought characteristics such as trends, frequency, intensity and spatial extents using SPEI-12 and the CRU and UDEL datasets. For the 1901-1957 (P1) and 1958-2014 (P2) estimation periods, three different types of reference periods are used. In the first case, data from 1901 to 2014 (P1+P2) are used for both estimation periods. In the second case, data from P1 and P2 are used separately for the estimation periods of P1 and P2, respectively (self-calibrated). In the final case, data from P1 are used for both estimation periods.
Focusing on the EA, EU, US and WA regions, we found that the influence of the reference period is significant in regions with dominant drying trends from P1 to P2, such as EA and WA. Additionally, the results suggest that it is necessary to quantify the trends of climate variables such as precipitation and air temperature as the first step in selecting a reference period. Our results also show that the reference period influences the assessment of drought characteristics, particularly the severity and spatial extent, based on the two datasets; however, its influence on the frequency is relatively small. Finally, we found that the use of the calibrated distribution with the past observations (i.e., Ref3) tends to overestimate the drought severity and spatial extent relative to the other approaches used in this study. However, we noted that these results are drawn from only three sets of reference periods, two different datasets (i.e., CRU and UDEL) and four regional examples. Therefore, future work should evaluate different combinations of reference periods with increased sample sizes and different datasets.
This study highlights the need for clarifying the reference period in drought assessments to better understand regional drought characteristics and their temporal changes, particularly under climate change scenarios. This study, which was based on historical data, may yield different results at the local scale, and similar studies based on historical data or climate change scenarios in different regions would undoubtedly strengthen our findings. In the present study, we focused on the temporal aspects of calibration data (i.e., calibration period). As briefly mentioned in Section 1, using data from a particular station or grid to obtain averaged data for calibration could permit a meaningful comparison of drought indices at different locations. In conjunction with temporal considerations, spatial issues should be addressed in future studies. Furthermore, we noted that the Thornthwaite approach, in which air temperature is a main controlling factor of PET, is used to estimate SPEI in this study; however, other approaches such as the Penman method could be used to consider changes in other meteorological variables, such as wind, atmospheric humidity and radiation. McVicar et al. (2012) have suggested that there may be limited effects of temperature increase on drought through increased PET because other meteorological conditions affecting PET may compensate for the temperature increase.

Table 2. Columns: Type, Estimation Period, Calibration Period.
Table 3. Mean and standard deviation (STD) of precipitation and air temperature for each period (1901-1957 and 1958-2014) and each region.

Figure 1. Study area, including East Asia (EA), Europe (EU), the United States (US) and West Africa (WA), and elevation (m above sea level (a.s.l.)). The dashed blue box represents the boundary for each study region.

Figure 3. Temporal variations in annual precipitation (mm), PET (mm) and surplus or deficit (D) (mm) based on two datasets (CRU and UDEL) and periods (1901-1957 and 1958-2014). In the box plots, the center line represents the median value; the top and bottom of each box represent the 75th and 25th percentiles of the data, respectively; and the dots represent outliers.

Figure 4. Trends in annual precipitation, annual averaged temperature and annual PET based on the CRU and UDEL datasets. The colored regions correspond to regions of IN, N and DE, which indicate statistically positive (increasing) trend, no trend and negative (decreasing) trend, respectively, with a significance level of 5%.

Figure 5. L-moment ratio diagrams for D in Eq. (1) with a 12-month timescale based on CRU for each region for 1901-2014 and 1901-1957.

Figure 6. Temporal variations in SPEI-12 for three different reference periods (Ref1, Ref2 and Ref3 in Table 2) based on the CRU and UDEL datasets from 1901-1957 and 1958-2014. In the box plots, the center line represents the median value; the top and bottom of each box represent the 75th and 25th percentiles of the data, respectively; and the dots represent outliers.

Figure 12. Monthly average D (mm) in Eq. (1) and average drought area (%) based on the CRU and UDEL datasets for each region of EA, EU, US and WA for the Ref1 condition. In (a), ALL denotes the period of 1901-2014. In (b), Dry denotes that the monthly average D in the assessment period is less than that in the reference period, and Wet denotes that the monthly average D in the assessment period is greater than that in the reference period based on the Ref1 condition.

Figure 13. Spatial distribution of SPEI-12 for three different reference periods (Ref1, Ref2 and Ref3 in Table 2) based on the (a) CRU and (b) UDEL datasets in East Asia in 2000 as an example.

Figure 14. Spatial distribution of SPEI-12 for three different reference periods (Ref1, Ref2 and Ref3 in Table 2) based on the (a) CRU and (b) UDEL datasets in Europe in 2003 as an example.

Figure 15. Spatial distribution of SPEI-12 for three different reference periods (Ref1, Ref2 and Ref3 in Table 2) based on the (a) CRU and (b) UDEL datasets in the United States in 2012 as an example.

Figure 16. Spatial distribution of SPEI-12 for three different reference periods (Ref1, Ref2 and Ref3 in Table 2) based on the (a) CRU and (b) UDEL datasets in West Africa in 1984 as an example.