Hydrologic benchmarking of meteorological drought indices at interannual to climate change timescales: a case study over the Amazon and Mississippi river basins

Widely used metrics of drought are still derived solely from analyses of meteorological variables such as precipitation and temperature. While drought is generally a consequence of atmospheric anomalies, the impacts on society are more directly related to hydrologic conditions. The present study uses a standardized runoff index (SRI) as a proxy for river discharge and as a benchmark for various meteorological drought indices (scPDSI, SPI, SPEI_th and SPEI_hg). Only 12-month duration droughts are considered in order to allow a direct (no river routing) comparison between meteorological anomalies and their hydrological counterpart. The analysis is conducted over the Mississippi and Amazon river basins, which provide two contrasting test beds for evaluating drought indices at both interannual (using detrended data) and climate change (using raw data) timescales. Looking first at observations over the second half of the 20th century, the simple SPI based solely on precipitation is no less suitable than more sophisticated meteorological drought indices at detecting interannual SRI variations. Using the detrended runoff and meteorological outputs of a five-member single-model ensemble of historical and 21st century climate simulations leads to the same conclusion. Looking at the 21st century projections, the response of the areal fraction in drought to global warming is shown to be strongly metric dependent and potentially overestimated by the drought indices which account for temperature variations. These results suggest that empirical meteorological drought indices should be considered with great caution in a warming climate and that more physical water balance models are needed to account for the impact of the anthropogenic radiative forcings on hydrological droughts.


Introduction
Droughts are recurrent natural manifestations of climate variability that have plagued civilizations throughout history. They are commonly classified into three types (meteorological, agricultural and hydrological) depending on which variable (precipitation, soil moisture and river flow, respectively) is below normal conditions (Dai, 2011a). Meteorological drought often precedes and causes other types of droughts. Meteorological indices are therefore used not only for monitoring drought at regional to global scales, but also for anticipating their potential impacts on agriculture and water resources.
Several empirical meteorological drought indices have been proposed and applied at regional to global scales over the second half of the 20th century (e.g. Heim Jr., 2002). Nevertheless, evidence is building that human-induced climate change is perturbing the global hydrological cycle (e.g. Trenberth, 2011), making it necessary to analyse the validity of such indices in a warmer climate. While most 21st century climate scenarios project a global increase in the frequency, intensity and duration of droughts (Sheffield and Wood, 2008; Orlowsky and Seneviratne, 2012), the response is still very uncertain at the regional scale and is not necessarily consistent from one metric to the other (e.g. Burke and Brown, 2008).
In the IPCC Fourth Assessment Report, the 20th century multi-decadal variations of drought were mainly discussed on the basis of the Palmer Drought Severity Index (PDSI, Palmer, 1965). This standardized index measures the departure of soil moisture using a simplified surface water balance model. It requires globally available precipitation (P) and temperature data as input for the calculation of potential evapotranspiration (PET) with Thornthwaite's (1948) equation, as well as the soil water field capacity. Analysis of global PDSI maps indicates that drought has generally increased throughout the 20th century (Dai et al., 2004). The PDSI was however criticized in several respects (e.g. Guttman, 1998; Vicente-Serrano et al., 2011). The underlying water balance model is quite empirical and was tuned using a limited number of instrumented sites in the US. This limitation was addressed by the development of the "self-calibrated" scPDSI (Wells et al., 2004). The empirically derived climate parameters and duration factors of this index are automatically calculated using the historical climatic data of the selected location. Thornthwaite's approximation for the computation of PET was also criticized and recently replaced by a more physical but still empirical Penman-Monteith approach (Van der Schrier et al., 2011; Sheffield et al., 2012). Finally, it was argued that the PDSI cannot reflect the different timescales which characterize the impact of drought on different systems, including the surface hydrology (Vicente-Serrano et al., 2010).
In contrast, the Standardized Precipitation Index (SPI) of McKee et al. (1995) is a multi-scale index, computed as a standardized transform of cumulative precipitation over a given period (ranging typically between 1 and 48 months), but does not account for possible variations in the atmospheric demand. More recently, Vicente-Serrano et al. (2010) developed the Standardized Precipitation Evapotranspiration Index (SPEI) by applying a similar transform on cumulated P minus PET. The aim was to combine the simplified water balance approach of the PDSI and the multi-scale nature of the SPI.
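The "standardized transform of cumulative precipitation" can be illustrated with a minimal Python sketch. It is not the algorithm of McKee et al. or Vicente-Serrano et al.: as a loudly stated assumption, the fitted gamma (SPI) or log-logistic (SPEI) distribution is replaced here by an empirical plotting position, and all months are pooled instead of being treated calendar month by calendar month. The function names `rolling_sum` and `standardized_index` are hypothetical.

```python
from statistics import NormalDist


def rolling_sum(values, window=12):
    """Trailing sums over `window` consecutive values (one per complete window)."""
    return [sum(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def standardized_index(accumulated):
    """Map accumulated values to standard-normal quantiles via their ranks.

    Assumption: a Gringorten plotting position stands in for the fitted
    parametric distribution used by the published SPI/SPEI algorithms.
    Tied values receive consecutive ranks in their original order.
    """
    n = len(accumulated)
    order = sorted(range(n), key=lambda i: accumulated[i])
    normal = NormalDist()
    index = [0.0] * n
    for rank, i in enumerate(order, start=1):
        probability = (rank - 0.44) / (n + 0.12)  # always strictly in (0, 1)
        index[i] = normal.inv_cdf(probability)
    return index


# Synthetic monthly series (illustrative values only): P for an SPI-like
# index, P minus PET for an SPEI-like index.
monthly_p = [float(m) for m in range(1, 25)]
monthly_pet = [0.5 * p for p in monthly_p]
spi12_like = standardized_index(rolling_sum(monthly_p, 12))
spei12_like = standardized_index(
    rolling_sum([p - e for p, e in zip(monthly_p, monthly_pet)], 12))
```

The same transform applied to 12-month cumulated runoff would give an SRI-like series, which is what makes the indices directly comparable.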
The superiority of the SPEI is however a matter of debate (Dai, 2011b). In spite of the criticisms of Guttman (1998) or Vicente-Serrano et al. (2010), the PDSI has been evaluated successfully at the regional or basin scale against both soil moisture and river discharge (Dai et al., 2004). Moreover, it compares relatively well with the 12-month SPEI (Vicente-Serrano et al., 2011). In the recent IPCC SREX report on managing the risks of extreme events and disasters to advance climate change adaptation, the PDSI was still used as a reference drought index, but the metric sensitivity of drought projections was highlighted as well as the need for more comprehensive comparisons of the various globally available drought indices.
The aim of the present study is to use a hydrologic drought index as a benchmark for assessing the variability of several meteorological drought indices at both interannual and climate change timescales. Given the limited instrumental record, the comparison will be conducted with both observations and an ensemble of global climate simulations spanning the 1850-2100 period. The simulations will allow us to test the robustness not only of the comparison made at the interannual timescale, but also of drought projections based on different meteorological indices.
We chose two of the world's largest river basins, the Amazon and the Mississippi, as a test bed for our analysis. While it would be interesting to extend the study to a larger number of basins, we believe that this subset is sufficient to illustrate our main findings (which are moreover confirmed by a global comparison of meteorological vs. hydrological time series simulated at each continental grid cell). Both basins are well documented in terms of climate and river discharge observations and are only moderately influenced by human activities (dams and irrigation). Both show a substantial year-to-year variability of river discharge (including during the dry season) and a potential vulnerability to climate change. Nevertheless, they show contrasting climatological features. Precipitation in the Amazon Basin has a stronger annual cycle and a larger interannual variability than in the Mississippi Basin. The opposite is true for temperature and therefore for the atmospheric water demand (PET). These features are representative of the contrast between tropical and mid-latitude areas and might have consequences for the behaviour of the analysed meteorological drought indices.
For such large river basins, meteorological droughts generally precede their hydrological counterpart by a few weeks or months. In order to guarantee the relevance of our hydrological benchmark and to avoid the use of a river routing model, the focus will be only on 12-month droughts. Short-term droughts are therefore beyond the scope of the present study, although they can be detected on a 12-month timescale if they show a sufficient magnitude and if the rest of the year is close to normal conditions. In other words, the 12-month deficit can be concentrated in a particular season, but we do not distinguish between wet-season and dry-season droughts, which might have contrasting impacts on natural ecosystems and human societies.
Section 2 describes the input data (derived from either observations or climate simulations) and the methodology used for the calculation of both meteorological drought indices and the hydrologic benchmark. Section 3 first compares the ability of the different meteorological indices to capture the interannual variability of hydrological drought, as well as their skill to detect major hydrological droughts. Indices derived from the CNRM-CM5 climate scenarios are also analysed to highlight the contrasted sensitivity of the different drought indices to climate change. Section 4 discusses the results and draws the main conclusions of the study.

Data sets and methodology

Observed and simulated drought indices
All meteorological drought indices (SPI, SPEI_th, SPEI_hg, cf. summary in Table 2) are derived solely from monthly precipitation and surface air temperature. As far as observations are concerned, the selected global 20th century data sets are summarized in Table 1. Model outputs (monthly precipitation and temperature, but also monthly runoff for the hydrologic benchmark) have been derived from a five-member ensemble of 1850-2100 simulations obtained with the CNRM-CM5 global climate model (Voldoire et al., 2013). Each realization is the concatenation of a historical (i.e. 1850-2005) simulation driven by observed concentrations of greenhouse gases and sulfate aerosols (as well as realistic volcanic and solar forcings) and of a 21st century (i.e. 2006-2100) projection based on the RCP8.5 concentration scenario (corresponding to an 8.5 W m−2 radiative forcing at the end of the 21st century) proposed by phase 5 of the Coupled Model Intercomparison Project (CMIP5, http://cmip-pcmdi.llnl.gov/cmip5/).
Although the aim of the study is not to compare simulated versus observed drought indices, but meteorological indices versus the hydrologic benchmark in both model and observations, precipitation and temperature observations (see Table 1) were first interpolated onto the model horizontal grid (about 1.4°) to ensure the same spatial resolution for all indices. On each grid cell, the scPDSI and the 12-month SPI and SPEI (hereafter SPI12 and SPEI12 respectively) were computed following the original algorithms proposed by Wells et al. (2004), McKee et al. (1995) and Vicente-Serrano et al. (2009) respectively. Cumulated P was fitted with a gamma function, while a log-logistic function was preferred for P minus PET (Vicente-Serrano et al., 2009) for the SPEI. While the simple Thornthwaite equation was used to compute PET from temperature and latitude for SPEI (hereafter SPEI_th) and scPDSI, another empirical formulation (Hargreaves and Samani, 1982) accounting more accurately for the role of solar radiation was tested for SPEI (hereafter SPEI_hg). Unlike in Van der Schrier et al. (2011) or Sheffield et al. (2012), more sophisticated formulations such as Penman-Monteith have not been tested given the lack of reliable (satellite) global observations of solar radiation before the 1980s.

Table 4. Schematic of the 2 × 2 contingency table used to assess the ability of the meteorological indices to detect a hydrological drought event: A denotes the number of "hits", B the number of "false alarms", C the number of "misses", and D the number of "correct non-events".
Hydrological drought index: SRI12

For all indices and in order to focus on interannual and longer timescales, annual mean values have been obtained by averaging monthly indices from January to December. Finally, basin average indices have been calculated, as well as the area of the basin in drought based on a common threshold (only for the simulated indices).
It must be here emphasized that the SPI and SPEI normalization was made in each grid cell before spatial averaging. While such a choice is somewhat arbitrary, it allows us to compute the areal fraction in drought (cf. Sect. 3.2) and to have a fair comparison with the PDSI, which is by definition a distributed index given the spatial variability of the soil water capacity (which is a key input parameter used in the simplified water balance model). Therefore, we have considered all drought indices as global gridded and monthly data sets that can be averaged in both space and time.
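As a rough illustration of why per-cell normalization makes the areal fraction in drought straightforward to compute, the following Python sketch flags each grid cell as "in drought" when its annual index falls below a percentile of that cell's own distribution, then sums the flagged area. The per-cell threshold, the interpolation rule for the percentile and the function names are assumptions made for this example, not details taken from the study.

```python
def percentile_threshold(series, q):
    """Empirical q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(series)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def areal_fraction_in_drought(index_by_cell, cell_areas, q=20.0):
    """Fraction of total area whose index is below its own q-th percentile.

    index_by_cell: one annual time series per grid cell (equal lengths).
    cell_areas:    one area weight per grid cell (hypothetical input).
    Returns one fraction per year.
    """
    thresholds = [percentile_threshold(series, q) for series in index_by_cell]
    total_area = sum(cell_areas)
    n_years = len(index_by_cell[0])
    return [
        sum(area
            for series, threshold, area in zip(index_by_cell, thresholds, cell_areas)
            if series[year] < threshold) / total_area
        for year in range(n_years)
    ]


# Two synthetic cells with opposite trends and equal areas.
cell_a = [float(v) for v in range(11)]
cell_b = list(reversed(cell_a))
fractions = areal_fraction_in_drought([cell_a, cell_b], [1.0, 1.0], q=20.0)
```

In this toy case half of the area is in drought during the first two and last two years, and none in between.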
Hydrological drought has been defined using the SRI proposed by Shukla and Wood (2008), i.e. applying the same algorithm as for SPI12 but on the 12-month cumulated runoff. Runoff has been chosen rather than river discharge given the selected timescale (no need for a river routing model) and the possibility of computing the basin-averaged index and the areal fraction in drought in exactly the same way as for the meteorological indices. While runoff is a standard output of the CNRM-CM5 climate model, there is no observational counterpart, so we have used an off-line simulation of the ISBA land surface model (included in the CNRM-CM5 model) to produce a "pseudo-observed" gridded runoff. This was done by driving the ISBA land surface model with bias-corrected atmospheric reanalyses available over the 1951-2006 period (Alkama et al., 2011). In line with the comprehensive evaluation of Alkama et al. (2011), this "pseudo-observed" SRI12 (Fig. 1) is highly correlated with in situ river discharge observations over both the Amazon and the Mississippi. This result makes us relatively confident about the relevance of our hydrologic SRI benchmark, which can be used to assess the behaviour of both observed and simulated meteorological drought indices. Moreover, it also means that the off-line ISBA simulation of land surface evapotranspiration is reasonable, at least at the annual timescale. This is the reason why we will also introduce a "Standardized Precipitation Actual Evapotranspiration Index" (SPAEI) by replacing PET with actual evapotranspiration in the SPEI algorithm. Note that the aim here is not to propose an alternative meteorological drought index, given the difficulty of computing actual evapotranspiration from monthly observations, but just to highlight the consequences of the PET approximation in the SPEI algorithm.

Methodology
Before using the raw time series of the projected drought indices to assess the behaviour of the meteorological indices at the climate change timescale, the first step is to evaluate their interannual variability using both observations and simulations (see Table 3 for a summary of the selected periods and methods). For this purpose, and in order to remove the recent warming trend in each region, all basin-averaged indices have been detrended using cubic spline functions (Wahba, 1990; Ribes et al., 2010) with 2 and 4 degrees of freedom for detrending over a 49 and 251 yr time span respectively, before computing their correlation with the SRI12 benchmark. The Clayton skill score (CSS; Wilks, 2004), based on the probability for each index to be either above or below a given percentile of the distribution, has also been used to assess the ability of the different indices to detect major hydrological droughts. Using the contingency table shown in Table 4, this skill score is simply computed as the difference between two conditional probabilities:

$$\mathrm{CSS} = \frac{A}{A + B} - \frac{C}{C + D}$$
where A is the number of meteorological droughts detected by the index that correspond to hydrological droughts (number of "hits"), B the number of meteorological droughts that do not correspond to hydrological droughts (number of "false alarms"), C the number of no-drought forecasts corresponding to hydrological droughts (number of "misses"), and D the number of "correct non-events". For a perfect detection, B = C = 0, so that CSS = 1. The CSS allows us to focus on particular drought events. Unfortunately, the relatively short river discharge time series strongly limits our study, which is therefore based on the 20th percentile of the distribution rather than on extreme events. For the observed annual mean time series, correlation and CSS have been calculated on data detrended with a 2-degree-of-freedom spline function over a 49 yr period with available river discharge data. For the sake of comparison, similar scores have been computed over 49 yr sliding windows for each 1850-2100 CNRM-CM5 climate simulation (the 20th percentile being estimated over the same period as in the observations). In addition, scores of simulated indices detrended with a 4-degree-of-freedom spline function have also been estimated over the whole 251 yr integrations, using the 20th but also lower (10th and 5th) percentiles.
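A minimal Python sketch of this score is given below. It uses the standard Clayton skill score definition, CSS = A/(A + B) − C/(C + D), which is consistent with CSS = 1 when B = C = 0. The helper names and the choice of a strict "below threshold" test are assumptions made for illustration; in the study the thresholds are percentiles of each index distribution.

```python
def contingency_counts(met_index, hydro_index, met_threshold, hydro_threshold):
    """Count hits (A), false alarms (B), misses (C) and correct non-events (D).

    A year is a drought for a given index when its value is strictly below
    the corresponding threshold (assumption made for this sketch).
    """
    a = b = c = d = 0
    for met, hydro in zip(met_index, hydro_index):
        met_drought = met < met_threshold
        hydro_drought = hydro < hydro_threshold
        if met_drought and hydro_drought:
            a += 1
        elif met_drought:
            b += 1
        elif hydro_drought:
            c += 1
        else:
            d += 1
    return a, b, c, d


def clayton_skill_score(a, b, c, d):
    """CSS = A / (A + B) - C / (C + D)."""
    if a + b == 0 or c + d == 0:
        raise ValueError("CSS is undefined when a row of the table is empty")
    return a / (a + b) - c / (c + d)


# Illustrative annual series (not data from the study).
meteorological = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.5]
hydrological = [-2.0, 0.5, 0.0, 1.0, -1.8, -1.2]
counts = contingency_counts(meteorological, hydrological, -1.0, -1.0)
score = clayton_skill_score(*counts)
```

With these illustrative values there are two hits, no false alarm, one miss and three correct non-events, so the score is 1 − 1/4 = 0.75.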

Evaluation of meteorological drought indices against hydrological benchmark index at interannual timescale
Besides observed and ISBA-simulated variations of annual mean discharge at Obidos (Amazon) and Vicksburg (Mississippi), Fig. 1 shows the detrended time series for the various meteorological indices, as well as the ISBA-derived SRI12 for further comparison over years without discharge observations (over the Amazon Basin). Both correlations and CSS are slightly higher over the Amazon than over the Mississippi. Such a difference could be partly related to the different seasonality of precipitation and the possible contribution of early winter snowfall to the following year's annual mean runoff in the Mississippi Basin. Over the Amazon, the SPEI12_hg shows the best correlation with the SRI12 benchmark, closely followed by the SPEI12_th and SPI12. However, such differences are not significant and CSSs are the same for all three indices. Over the Mississippi, scores are also very close and longer time series would be useful to reach more robust conclusions about the relative skill of the different meteorological indices. For this purpose, correlations and CSS have also been estimated over 49 yr sliding windows from our five-member ensemble of 1850 to 2100 climate simulations, with model-derived SRI12 taken as a reference. As explained in Sect. 2.2, all time series have been detrended here with 2-degree spline functions before computing correlation and CSS. Results are summarized in box-and-whisker plots (Fig. 2). In line with observations, all model-derived meteorological indices are relatively skillful over both river basins. Ranking them is particularly difficult over the Mississippi, where differences in mean scores are very small. Results are more contrasted over the Amazon, where SPI and SPEI_hg outperform other indices. This suggests that the details of the index computation (SPEI_hg versus SPEI_th) are as important as the choice of the index (SPEI vs. SPI or PDSI). The apparent superiority of SPEI_hg vs. SPEI_th (obvious over the Amazon, less clear over the Mississippi) did not show up in the observations. This might be due to the intrinsic uncertainty of scores based on 49 yr time series only, as well as to possible biases of the CNRM-CM5 model (for instance a dry bias over the Amazon; Joetzjer et al., 2013), which might increase the relative contribution of PET (vs. precipitation) in the SPEI calculation.
How sensitive are our CSSs to the quantile chosen as a threshold for drought definition? Considering now moderate (q20), severe (q10) and extreme (q5) droughts over the whole 1850-2100 period (Table 5), the simple SPI is the best proxy of 12-month hydrological droughts, closely followed by the SPEI_hg. Indeed, SPEI scores improve when PET is calculated with the Hargreaves equation instead of the Thornthwaite equation. Note that the scPDSI and the SPEI_th, which both estimate PET with the Thornthwaite equation, show very similar skill.
In summary, precipitation remains the main driver of runoff at the interannual timescale, and accounting for PET (for SPEI) or even a simplified water balance (for scPDSI) does not improve the detection of 12-month hydrological droughts. Taking into account PET allows the SPEI to reach the same skill as the SPI when using the Hargreaves formula. As shown in Table 5, such a conclusion is not specific to the Amazon and Mississippi river basins, but also holds when averaging scores over all land grid points in the CNRM-CM5 model.

Hydrol. Earth Syst. Sci., 17, 4885-4895, 2013, www.hydrol-earth-syst-sci.net/17/4885/2013/

Table 5. Correlation and CSS between various meteorological drought indices and the reference standardized runoff index (SRI12) in the CNRM-CM5 model. Scores were calculated for average and detrended indices over the Amazon and Mississippi watersheds, as well as averaged over the globe on the basis of grid-cell rather than basin-scale indices (longitude: 180° W to 180° E; latitude: 60° S to 60° N). The CSS was calculated using the 5th, 10th and 20th percentiles of the annual drought index distribution. Mean and standard deviation (sd) based on our five-member ensemble of 1850-2100 simulations are shown. Highest (bold) and lowest (italics) mean values are also shown.

Climate change timescale
Moving to the raw model outputs, Fig. 3 shows the projection of the areal fraction of the Amazon and Mississippi basins in moderate, severe and extreme drought conditions (respectively defined as below the 20th, 10th and 5th percentile estimated over the whole 1850-2100 period). Results obtained with the SRI12 benchmark are compared to the fractions derived from each meteorological index, as well as with the SPAEI to highlight the influence of the PET approximation on the simulated trends. Bold lines represent the ensemble mean value for each percentile. The envelope is defined by the minimum and maximum values among the five members for severe drought only (10th percentile), as an indication of the internal variability of the CNRM-CM5 climate model. For SRI12, CNRM-CM5 under the RCP8.5 concentration scenario does not show any trend in the areal fraction of the Amazon Basin affected by hydrological drought, while a clear increase is projected over the Mississippi Basin. This response does not agree with the contrasted long-term variations derived from the various meteorological drought indices. The SPI12 behaves as a better proxy of SRI12 than scPDSI and SPEI12 over the Amazon Basin, where precipitation change seems to dominate the long-term evolution of hydrological droughts and surface warming has only a marginal influence. Conversely, the SPI12 evolution contradicts the SRI12 evolution over the Mississippi Basin, where increased evapotranspiration seems to exceed increased precipitation and leads to more frequent and/or extended hydrological droughts at the end of the 21st century. This result highlights the limitations of the SPI where and when temperature trends become strong enough to alter evapotranspiration without or despite changes in precipitation. Nevertheless, accounting for changes in PET does not necessarily solve the problem, as emphasized by Fig. 4. Indeed, the SPEI response to global warming is strongly dependent on the PET calculation.
The strong sensitivity shown by SPEI12_th over both basins suggests that Thornthwaite's formula is not adequate for climate change studies and should be at least superseded by more robust approaches (e.g. Hargreaves or Penman-Monteith). The sensitivity of the PDSI to the PET calculation is controversial. For the 20th century Van der Schrier et al. (2011) showed weak sensitivity while Sheffield et al. (2012, Supplement) attribute this apparent weak sensitivity to inconsistencies in the forcing data sets and simulation configuration. Over the 21st century, and in line with Sheffield's results, it is likely that the large increase of the areal fraction in drought obtained with this index is also due to the simplistic PET calculation in the original algorithm. Not surprisingly, the SPAEI12, accounting for actual rather than potential ET, shows more consistency with the "target" SRI12 than the other indices over both river basins. This confirms the limitation of the empirical meteorological indices for hydrological applications.
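To give a feel for why a temperature-only PET formula can respond so strongly to warming, the following Python sketch applies Thornthwaite's (1948) relation to a synthetic mid-latitude temperature climatology and to the same climatology warmed uniformly by 4 °C. It is an illustration only: the day-length (latitude) correction used in the study is omitted (12 h days and 30-day months are assumed), the heat index is recomputed from the warmed climatology, the temperatures and the 4 °C increment are invented, and the function name is hypothetical.

```python
def thornthwaite_pet(monthly_temp_c):
    """Unadjusted Thornthwaite PET (mm per month) from 12 monthly mean temperatures.

    Assumption: no day-length correction (12 h days, 30-day months).
    Months with a mean temperature at or below 0 degC contribute no PET.
    """
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temp_c if t > 0)
    if heat_index == 0:
        return [0.0] * len(monthly_temp_c)
    exponent = (6.75e-7 * heat_index ** 3
                - 7.71e-5 * heat_index ** 2
                + 1.792e-2 * heat_index
                + 0.49239)
    return [16.0 * (10.0 * t / heat_index) ** exponent if t > 0 else 0.0
            for t in monthly_temp_c]


# Synthetic climatology in degC, January to December (illustrative values).
baseline_temp = [0, 2, 7, 13, 18, 23, 26, 25, 21, 14, 7, 2]
warmed_temp = [t + 4 for t in baseline_temp]

pet_baseline = thornthwaite_pet(baseline_temp)
pet_warmed = thornthwaite_pet(warmed_temp)
relative_change = sum(pet_warmed) / sum(pet_baseline) - 1.0
```

Because temperature is the only input, any warming translates directly into a higher PET regardless of humidity, wind or radiation, which is one plausible reading of the strong drying trends obtained with SPEI12_th and scPDSI.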

Discussion and conclusion
The present study aimed at comparing globally available empirical meteorological drought indices over one tropical (Amazon) and one mid-latitude (Mississippi) river basin, first in their skill to detect interannual variations, then in their response to anthropogenic climate change. The focus is only on 12-month droughts, and a standardized runoff index (SRI), closely related to the river discharge, is used as a hydrologic benchmark. At interannual timescales and over both basins, the simple SPI, based solely on precipitation, is no less suitable than more sophisticated empirical indices also using temperature inputs. This is true not only for observations, but also in the CNRM-CM5 climate simulations. When using the Hargreaves formula, the SPEI scores are however very close to the SPI scores. In contrast, the Thornthwaite formula systematically leads to lower scores. Such conclusions should however be tempered. First, there might be some regional heterogeneities in the ranking of the four indices given the weak spread between all indices, not only over the selected basins, but also when averaging the scores obtained over all land grid cells between 60° S and 60° N (cf. Table 5). Moreover, similar scores calculated for indices on shorter timescales (3 and 6 months, not shown) suggest a slight superiority of the SPEI_hg compared to the simple SPI.
Beyond the ability of the various meteorological indices to account for the interannual variability of annual streamflow, and in line with the conclusions of Burke and Brown (2008), Kingston et al. (2009) and Burke (2011), our study also emphasizes that drought projections are strongly index-dependent given the differing impact of temperature in their calculation. While the SPEI was recently proposed as a drought index sensitive to global warming (Vicente-Serrano et al., 2010), it shows a stronger drying of the Amazon and Mississippi basins than indicated by our hydrologic benchmark. This discrepancy is less pronounced when estimating PET with Hargreaves, especially for the Mississippi, showing that precipitation is not the only driver of the long-term drought variations. Such inconsistencies can lead to differences at the end of the 21st century, and are also discernible from the end of the 20th century as demonstrated by Sheffield et al. (2012) for the PDSI.
A caveat of the present study is that we have neglected potential vegetation feedbacks in our climate projections. Under a higher atmospheric CO2 concentration, stomatal closure, for instance, might alter the relationship between meteorological and hydrological droughts, as it partly regulates water exchange within the soil-plant-atmosphere continuum. The ISBA land surface model implemented in the CNRM-CM5 model still uses a common Jarvis-type formulation (Jarvis, 1976) for the computation of the stomatal conductance and does not account for a direct CO2 effect on plant transpiration. This effect is also neglected by the meteorological drought indices. This remark highlights again the fundamental limitations of such empirical indices, which can be relevant for present-day climate but less suitable for long-term projections. This caveat however does not change our main conclusion: besides the choice of a concentration scenario (here RCP8.5, i.e. the most severe scenario considered in CMIP5) and of a global climate model (here CNRM-CM5), the index definition and the associated PET calculation also represent a major source of uncertainty for drought projections (Taylor et al., 2013). Note that only one concentration scenario and one global climate model have been considered in this study, but that preliminary analyses of the different scenarios obtained with CNRM-CM5 as well as of the RCP8.5 scenario obtained with a subset of CMIP5 models suggest that CNRM-CM5 is not an outlier among the CMIP5 models and that the index definition is as important as the choice of the scenario/model as a source of uncertainty for drought projections (details are given in the Supplement).
Finally, another limitation of the present study is the arbitrary choice of the SRI benchmark. Besides runoff and river discharge, other impact-oriented benchmarks could have been proposed, such as soil moisture (e.g. the soil moisture anomaly, SMA; Orlowsky and Seneviratne, 2013) or photosynthesis activity, which can be derived from satellite observations. Nevertheless, such observations only cover a few decades (only since the early 1980s) and are sometimes still difficult to interpret given the limitations of remote sensing techniques (e.g. Anderson et al., 2011). Therefore, the main alternative for drought monitoring and projections is probably the use of process-oriented land surface models, which can be either driven by observed atmospheric forcings (e.g. Sheffield and Wood, 2007) and bias-corrected climate scenarios or directly coupled to global climate models (e.g. Sheffield and Wood, 2008). Given the intrinsic uncertainties related to the various physical and biological processes represented in such land surface models (e.g. Betts et al., 2007), a multi-model approach is however strongly encouraged.