Monitoring and quantifying future climate projections of dryness and wetness extremes : SPI bias

Section 3 (general comment). The authors' impression was that calculating the SPI transformation separately for each month could be regarded as a quasi-standard, given the large amount of published material using the same procedure. This does not seem to be generally accepted. There are, however, several reasons for preferring a separate calculation in the context of the presented manuscript. First of all, it ensures comparability with other analyses (Edwards and McKee, 1997; Guttman, 1999; Lloyd-Hughes and Saunders, 2002; Bordi and Sutera, 2001), some of which also include a comparison of distribution functions. From a statistical point of view, estimating the yearly distribution violates the independence assumption because of the pronounced seasonal cycle present in most regions worldwide. In addition, it is questionable whether monthly precipitation can be regarded as identically distributed, or even unimodal, on a yearly basis. These problems are reduced and partly avoided by estimating the distribution separately for each month. Applying the SPI calculation separately for each month leads to consistent interpretations of the SPI values. That is, the resulting SPI classification is consistent, in terms of probability and SPI value, in different climate regimes and different seasons. The consequence is that the same precipitation amount will be classified differently in differing seasons. This, however, is a plausible interpretation, because a drought condition in a rainy season shows other characteristics (precipitation amounts) than in a dry season. In our opinion, these are the main reasons for calculating the SPI for each calendar month.


Introduction
The Standardised Precipitation Index (SPI) is widely applied to characterise extreme dryness or wetness. An increasing number of publications use the SPI to diagnose observed precipitation deficits or excesses and analyse their variability (for example: Vicente-Serrano, 2006; Lopez-Moreno and Vicente-Serrano, 2008; Mo and Schemm, 2008; Bordi et al., 2009; Bothe et al., 2010; Santos et al., 2010; Zhu et al., 2011). The SPI is further applied as a monitoring tool, to provide the actual state of meteorological, agricultural and hydrological conditions of drought and wetness (US Drought Monitor). The World Meteorological Organisation (WMO press release No. 872, December 2009) as well as the "Lincoln declaration on drought indices" (Hayes et al., 2011) recommend the SPI to all meteorological and hydrological services for characterising meteorological droughts. Recent applications use the SPI for diagnosing future drought occurrences in climate change scenarios (Sienz et al., 2007; Burke and Brown, 2008; Heinrich and Gobiet, 2011).
One reason for the wide application of the SPI is its simplicity compared to other drought indicators, such as the Palmer drought severity index (PDSI; Palmer, 1965). Only precipitation is needed as input quantity, contrary to the PDSI, where temperature and the locally available water content of the soil are required in addition. Contrary to other precipitation-based indices, such as precipitation deciles (Gibbs and Maher, 1967) or the rainfall anomaly index (Rooy, 1965), the SPI benefits from its uniform description in different seasons or climate regions. Although regarded as an easy-to-use measure, the SPI has restrictions concerning the sample size and its use in arid environments. Wu et al. (2005) present a critical assessment of sample size effects. Furthermore, months without precipitation create a lower bound in the SPI (Wu et al., 2007). This leads to problems for drought indication, because for sufficiently high lower bounds, extreme dryness is not observable.
F. Sienz et al.: Dryness and wetness extremes: SPI bias
For the SPI calculation the probability distribution of precipitation is of relevance. This has been analysed by Guttman (1999), who concluded that "the SPI should not be used widely until a single probability distribution is accepted as a standard". Guttman (1999) compared different distributions with a regional drought model and proposed the gamma distribution (GD) as standard. The GD is now widely applied in hydrological and climatological science. However, several authors pointed out that the GD can lead to problems and does not fulfil goodness of fit criteria (Lloyd-Hughes and Saunders, 2002; Sienz et al., 2007).
Besides the use of the precipitation distribution for SPI calculation, the distribution itself is of interest, and there is a long history of applying and comparing different kinds of distribution functions (Mielke and Johnson, 1974; Groisman et al., 1999). Knowledge of the underlying distribution is important, as probabilistic properties of precipitation can be derived from it. Therefore, the analysis of dryness and wetness could be done comparably in terms of the estimated distributions. However, some additional effort is needed because of the missing standardisation. It is the standardisation which makes the SPI the preferred method of analysis where relative deviations from a climatological mean state are of interest or where normality is required for further analysis. On the other hand, the SPI is meaningless in applications where direct precipitation properties should be described. Here, the distribution itself gains in relevance, for example, for precipitation climatology or for climate model validation.
In this article we reconsider the GD as the standard distribution for precipitation. Monthly precipitation is of main interest here. This is motivated by the expert recommendation for applying the SPI as standard index worldwide on short time scales (Hayes et al., 2011). However, precipitation sums related to longer SPI time scales are discussed as well. We find that the GD describes precipitation erroneously in many parts of the world. The implications are a biased description of precipitation and of the derived SPI. Mainly the extremes are affected, leading to overestimation or underestimation of extreme dryness or wetness, respectively. This error propagates into SPI applications. For example, in the case of SPI-based drought warning systems, the consequences can be too many false alerts, and in climate projection studies extreme dryness is detected too frequently.
A comparative method is used to demonstrate that SPI biases are caused by incorrect distributional assumptions. Four other distributions are applied: the Weibull (WD), Burr Type III (BD), exponentiated Weibull (EWD) and the generalised gamma distribution (GGD). Distributions are compared employing Akaike's information criterion (AIC), which quantifies the information gain or loss by the chosen statistical models. The appropriateness of the AIC is supported by a simulation study. In addition, individual SPIs for each distribution are calculated and their deviations from the expected SPI classes are compared.
For our analysis we use data sets ranging from an observed individual time series up to precipitation fields simulated

Standardised Precipitation Index (SPI)
The SPI was introduced by McKee et al. (1993) to classify and monitor dryness and wetness. The calculation of the SPI is based on an equal probability transformation: monthly precipitation is transformed to a standard normal distribution to yield SPI values by preserving probabilities. Standardisation ensures that the SPI gives a uniform measure in different climate regimes or under seasonal dependence. The SPI definition is given in Table 1. The SPI can be constructed for time scales ranging from months to years and enables the description of meteorological, agricultural and hydrological droughts.
The SPI calculation is applied separately for each month. This procedure ensures seasonal independence, contrary to a yearly distribution estimation, and leads to a consistent SPI classification not only in different climate regimes but also in differing seasons. The following steps are required (Fig. 1; according to McKee et al., 1993, and Edwards and McKee, 1997; see also Bordi and Sutera, 2001):
1. distribution estimation: $F(x; \lambda)$, with the vector of estimated parameters, $\lambda$;
2. probability calculation for each precipitation event: $p = F(x; \lambda)$;
3. calculation of associated standard normal quantiles, quantile function: $Q(p) = \Phi^{-1}(p; 0, 1)$, where $\Phi(x; 0, 1)$ is the standard normal distribution, with mean $\mu = 0$ and standard deviation $\sigma = 1$.
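The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes SciPy's `gamma.fit` with the location fixed at zero as the estimator, uses synthetic data, and the function name `spi_for_month` is hypothetical.

```python
import numpy as np
from scipy import stats


def spi_for_month(precip, dist=stats.gamma):
    """Transform positive precipitation sums of one calendar month to SPI values."""
    precip = np.asarray(precip, dtype=float)
    # Step 1: maximum likelihood estimate, location fixed at zero (assumption).
    params = dist.fit(precip, floc=0)
    # Step 2: cumulative probability of every precipitation sum.
    p = dist.cdf(precip, *params)
    # Guard against probabilities of exactly 0 or 1, which map to infinite quantiles.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    # Step 3: the standard normal quantile function yields the SPI.
    return stats.norm.ppf(p)


rng = np.random.default_rng(0)
january = rng.gamma(shape=2.0, scale=40.0, size=240)  # synthetic sums in mm
spi = spi_for_month(january)
```

In an application the function would be called once per calendar month (and per grid point), so that each month is transformed with its own estimated distribution.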
The first point ensures that the resulting SPI achieves the desired properties. Deviations from standard normal properties occur due to problems in the estimation procedure or, even more importantly, due to a wrong distribution assumption. This is demonstrated in Fig. 1: Contrary to the estimate for the artificial data (blue line), the distribution given by the red line leads to overall too low probabilities. Consequently the single SPI values are too small and the resulting SPI distribution is shifted to lower values. Thus, extreme dryness (wetness) is overestimated (underestimated). These deviations from standard normality can indicate that the selected distributional type is misleading and are one criterion used in the following for distribution comparison.
In the following, a threshold of 0.035 mm per month is used to separate months with and without precipitation in the climate model. This prevents numerical noise from influencing the analysis. Distributions are estimated if at least 50 values remain. Linear regression is applied to check the observed precipitation time series for existing trends. Trends are removed to ensure stationarity for the distribution estimation. The subsequent SPI transformation, however, is performed with the original data. In this way the resulting SPI series preserves existing trends.
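The preprocessing described above might look as follows. The sketch assumes that the trend is removed by subtracting the fitted linear slope about the mean year, which keeps the sample mean unchanged; the text does not specify how the detrending is done. The function name `prepare_for_estimation` is hypothetical.

```python
import numpy as np
from scipy import stats

WET_THRESHOLD = 0.035  # mm per month, threshold stated in the text
MIN_SAMPLE = 50        # minimum number of wet months required for estimation


def prepare_for_estimation(precip, years):
    """Return the wet-month mask and the detrended wet values, or None if too few."""
    precip = np.asarray(precip, dtype=float)
    years = np.asarray(years, dtype=float)
    wet = precip > WET_THRESHOLD
    if wet.sum() < MIN_SAMPLE:
        return None
    # Centre the time axis so that removing the slope preserves the mean.
    x = years[wet] - years[wet].mean()
    slope = stats.linregress(x, precip[wet]).slope
    detrended = precip[wet] - slope * x
    return wet, detrended


rng = np.random.default_rng(2)
years = np.arange(1901, 2003)
precip = 60.0 + 0.3 * (years - 1950) + rng.normal(0.0, 5.0, years.size)
precip[:3] = 0.0  # three dry months, excluded by the threshold
wet, detrended = prepare_for_estimation(precip, years)
```

The detrended values would be used for the distribution estimation only; the SPI transformation itself is then applied to the original `precip[wet]`, so that trends remain visible in the SPI series.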

Distributions
The monthly precipitation sums are described by skewed distribution functions, defined on the positive real axis. All distribution functions consist of scale ($\sigma$) and shape parameters ($\gamma$). The three-parameter distributions include an additional shape parameter ($\alpha$). The lower dimensional distributions are partly subsets of the higher dimensional ones. The probability density functions are:
i. The gamma distribution (GD):
$$f(x) = \frac{1}{\sigma\,\Gamma(\gamma)} \left(\frac{x}{\sigma}\right)^{\gamma-1} \exp\left(-\frac{x}{\sigma}\right), \quad (1)$$
where $\Gamma(\gamma)$ is the gamma function. The GD and its location parameter extension, the Pearson Type III distribution, are recommended for SPI calculation (Guttman, 1999). Note that applying a location parameter does not alter the presented results.
ii. The Weibull distribution (WD), with the same number of parameters as the GD, is given by:
$$f(x) = \frac{\gamma}{\sigma} \left(\frac{x}{\sigma}\right)^{\gamma-1} \exp\left(-\left(\frac{x}{\sigma}\right)^{\gamma}\right). \quad (2)$$
The WD is widely used for the analysis of wind speed, but rarely for precipitation. An exception is Reeve (1996), applying the WD to Indian rainfall.
iii. The Burr Type III distribution (BD) extends the parameter space by an additional shape parameter ($\alpha$):
$$f(x) = \frac{\alpha\gamma}{\sigma} \left(\frac{x}{\sigma}\right)^{-\gamma-1} \left[1 + \left(\frac{x}{\sigma}\right)^{-\gamma}\right]^{-\alpha-1}. \quad (3)$$
The BD extends the flexibility of the GD in terms of kurtosis and skewness (Rodriguez, 1977; Tadikamalla, 1980). An early precipitation application is the study of Mielke and Johnson (1974), using a Beta distribution, which is associated with the BD by a parameter transformation. Note that the BD is a special case of the Kappa IV distribution (Hosking, 1994). The Kappa distribution was applied for SPI comparison (Guttman, 1999) and heavy precipitation events (Kysely and Picek, 2007).
iv. The exponentiated Weibull distribution (EWD) is also a three-parameter distribution:
$$f(x) = \frac{\alpha\gamma}{\sigma} \left(\frac{x}{\sigma}\right)^{\gamma-1} \exp\left(-\left(\frac{x}{\sigma}\right)^{\gamma}\right) \left[1 - \exp\left(-\left(\frac{x}{\sigma}\right)^{\gamma}\right)\right]^{\alpha-1}. \quad (4)$$
The EWD extends the WD by a factor including a stretched exponential term and a shape parameter $\alpha$. For $\alpha = 1$ the WD is obtained.
v. The generalised gamma distribution (GGD):
$$f(x) = \frac{\alpha}{\sigma\,\Gamma(\gamma)} \left(\frac{x}{\sigma}\right)^{\alpha\gamma-1} \exp\left(-\left(\frac{x}{\sigma}\right)^{\alpha}\right). \quad (5)$$
This version includes as special cases the gamma distribution (for $\alpha = 1$) and the Weibull distribution (for $\gamma = 1$).
The parameters of the distributions are estimated by the Maximum Likelihood method. This is a versatile approach and applicable for all analysed distributions. The maximised likelihood is further the basis for Akaike's information criterion (AIC). The optimization is performed by a quasi-Newton method and checked for convergence. In the subsequent analysis cases are omitted when convergence is not achieved. The number of omitted cases is below 1 % (4 %) of all grid points and months in the CRU (ECHAM5) data set.
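As an illustration of the estimation step, the five candidates can be fitted with SciPy, where they are available as `gamma`, `weibull_min`, `burr` (SciPy's Burr Type III), `exponweib` and `gengamma`. This is a sketch under assumptions: SciPy's default optimiser is used rather than the quasi-Newton method named in the text, the location is fixed at zero, any failed fit is simply recorded as `None`, and the helper `fit_candidates` is hypothetical.

```python
import numpy as np
from scipy import stats

CANDIDATES = {
    "GD": stats.gamma,         # gamma
    "WD": stats.weibull_min,   # Weibull
    "BD": stats.burr,          # Burr Type III
    "EWD": stats.exponweib,    # exponentiated Weibull
    "GGD": stats.gengamma,     # generalised gamma
}


def fit_candidates(precip):
    """Maximum likelihood fit of each candidate; failed fits are stored as None."""
    precip = np.asarray(precip, dtype=float)
    results = {}
    for name, dist in CANDIDATES.items():
        try:
            params = dist.fit(precip, floc=0)  # location fixed at zero
            loglik = float(np.sum(dist.logpdf(precip, *params)))
        except Exception:  # a failed optimisation is treated as non-convergence
            results[name] = None
            continue
        if not np.isfinite(loglik):
            results[name] = None
            continue
        # Free parameters: all returned parameters except the fixed location.
        results[name] = {"params": params, "loglik": loglik, "k": len(params) - 1}
    return results


rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=40.0, size=200)  # synthetic monthly sums
fits = fit_candidates(sample)
```

The stored log-likelihood and parameter count `k` are the quantities needed for a subsequent AIC comparison.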

Akaike's information criterion (AIC)
The Akaike information criterion (AIC; Akaike, 1974) is given by
$$\mathrm{AIC} = -2 \ln L(\hat{\theta} \mid y) + 2k, \quad (6)$$
with maximised likelihood ($L(\hat{\theta} \mid y)$), estimated parameters ($\hat{\theta}$) dependent on the data ($y$) and the number of parameters ($k$). The term $2k$ corrects the bias of the maximised likelihood as estimator for the Kullback-Leibler information and is interpreted as a penalty term for higher model dimension. A modified penalty term, changed from $2k$ to $(2kn)/(n - k - 1)$, improves the AIC calculation for small sample sizes ($n$; Burnham and Anderson, 2002) and is applied in the following. Note that the modified version approaches the standard one for large $n$. Within a set of models (indexed $i$) and corresponding AIC values ($\mathrm{AIC}_i$), the best model attains a minimum AIC value ($\mathrm{AIC}_{\min}$). Calculating AIC differences (AICD),
$$\mathrm{AICD}_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}, \quad (7)$$
eases comparison and ranking of the models, because absolute AIC values are of minor importance in contrast to the relative differences between them. With this definition the best model achieves AICD = 0. If several models achieve sufficiently small AICD, a decision for the AIC best model is hampered. The reason is that the sample size is too small and it is likely that the $\mathrm{AIC}_{\min}$ model will change from sample to sample. This behaviour is analogous to classical tests. Burnham and Anderson (2002) give guidelines for the interpretation of AICD, which are reproduced in Table 2.
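A short numerical sketch of the small-sample AIC and the AIC differences follows. The log-likelihood values are invented for illustration only, and the function names `aicc` and `aic_differences` are hypothetical.

```python
def aicc(loglik, k, n):
    """AIC with the small-sample penalty 2kn / (n - k - 1)."""
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)


def aic_differences(aic_values):
    """AICD_i = AIC_i - AIC_min for a dict of model name -> AIC."""
    aic_min = min(aic_values.values())
    return {name: value - aic_min for name, value in aic_values.items()}


n = 100  # sample size
# Hypothetical maximised log-likelihoods and parameter counts.
models = {"GD": (-500.0, 2), "WD": (-498.0, 2), "EWD": (-497.5, 3)}
aic = {name: aicc(loglik, k, n) for name, (loglik, k) in models.items()}
aicd = aic_differences(aic)
best = min(aicd, key=aicd.get)
```

In this made-up case the three-parameter model has the highest likelihood, but the penalty term leaves the two-parameter WD as the AIC best model, with the EWD close behind.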

Data
The data sets analysed in the following range from a single observed time series to global precipitation data produced by a climate model.
- England and Wales precipitation time series: one of the longest observed precipitation time series, starting in the year 1766 and reaching up to the present day (Alexander and Jones, 2001). Here, the years up to 2007 are used.
- Observed high-resolution precipitation: the Climatic Research Unit (CRU) data set covers the global land areas in 0.5 degree resolution and the time period from 1901 to 2002. The data set is tested for inhomogeneities, and details on the homogenization procedure are given in Mitchell and Jones (2005). The analysis is restricted to grid points where at least one station is present over the whole time period. This avoids problems arising from the interpolation scheme, filling observational gaps in time and space. Under this restriction two larger regions, Europe (EU; 11
- Precipitation from a coupled atmosphere-ocean climate model: simulated precipitation in T63 spectral resolution (about 2.8°) from the coupled climate model ECHAM5/MPI-OM (Roeckner et al., 2003; Marsland et al., 2003). A pre-industrial control experiment is used with constant greenhouse gas concentrations as observed in 1860 with an integration time of 500 yr. The simulation of the 20th century (1860 to 2000; 20C), the scenario run A1B (2001 to 2100; A1B) and the adjacent stabilisation run (100 yr; A1BS) are analysed for climate change assessment. Again, Europe and the contiguous United States are investigated, but in addition all land and ocean grid points from 60° S-85° N are considered.

Precipitation distributions and SPI
Observed and simulated precipitation data sets are investigated. The observed data sets (Sects. 3.1 and 3.2) provide case studies related to drought monitoring, whereas climate model precipitation (Sect. 3.3) provides an example for drought projection studies. Monthly precipitation is considered throughout, and results for longer SPI time scales are additionally presented in Sect. 3.4. In the following the AIC best distributions are determined for each month separately. This is analogous to the SPI calculation. Resulting SPI time series are compared with respect to probability deviations from defined SPI classes.

England and Wales precipitation
The England and Wales precipitation data set consists of a single time series. This eases analysis and enables the visualisation of the results on a monthly basis, contrary to the later sections where gridded precipitation fields are of interest.

Distributions:
The GD reaches the AIC minimum only in November (Fig. 2). In all other months the AICD are greater than 2 and even exceed 7 in the majority of cases. The WD reaches the AIC minimum most frequently (9 months) and attains AICD smaller than 2 in the remaining months, with the exception of November. Although the higher dimensional BD partly reaches smaller AICD than the GD, the information gain by the additional parameter is low compared to the WD. Small AICD are expected for the EWD and GGD, because they include the WD as special case (Sect. 2.2). But the AIC penalises the additional shape parameter, so that AICD around 2 are obtained in the majority of months. Hence the GD fails to adequately represent England and Wales monthly precipitation. With the exception of November, the WD outperforms the GD. A comparable information gain is given by EWD and GGD, while the BD demonstrates that higher dimensional distributions do not necessarily improve the results. If only a single distribution is to be used, the WD is preferred due to the lower number of distributional parameters.
SPI: The impact of the selected distribution on the SPI time series is analysed. The SPI calculation is based on either the GD or the WD, and their adequacy for the description of precipitation and its extremes is investigated. The resulting SPI series are expected to be standard normally distributed, with SPI wet (dry) extremes exceeding 2 (falling below −2).
Even by visual inspection of the SPI time series, a shift to lower values is obvious for GD-transformed precipitation (Fig. 3a). Contrary to the expected number of extremes with probability of 2.3 % (Table 1), extremely dry conditions occur more frequently (3.41 %) than extremely wet conditions (0.96 %). The WD-transformed SPI is evenly scattered, approaching adequate equal probabilities of 2.31 % (2.1 %) for extreme wet (dry) conditions (Fig. 3b). A summarised presentation highlights the tail deviations. For this purpose the differences from the expected probabilities of the SPI classes (Table 1) are calculated. The SPI based on the GD yields the largest deviations compared to all alternative distributions (Fig. 4a). Extreme dryness (wetness) is clearly overestimated (underestimated) and detected around 40 % (60 %) more (less) frequently. All other distributions reduce this bias. The smallest deviations are achieved with the Weibull type distributions (WD and EWD) and the GGD.
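The class-wise comparison can be illustrated with a small sketch. Since Table 1 is not reproduced here, the class boundaries at ±1, ±1.5 and ±2 are an assumption taken from the commonly used SPI classification; the function name `class_deviations` and the test series are hypothetical.

```python
import numpy as np
from scipy import stats

# Interior class boundaries (assumed; Table 1 is not reproduced in this text).
INNER_EDGES = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
LABELS = ["extremely dry", "severely dry", "moderately dry", "normal",
          "moderately wet", "severely wet", "extremely wet"]


def class_deviations(spi):
    """Relative deviation (percent) of observed from expected class frequencies."""
    spi = np.asarray(spi, dtype=float)
    edges = np.concatenate(([-np.inf], INNER_EDGES, [np.inf]))
    expected = np.diff(stats.norm.cdf(edges))  # standard normal class probabilities
    idx = np.digitize(spi, INNER_EDGES)        # class index 0..6 for each value
    observed = np.bincount(idx, minlength=len(LABELS)) / spi.size
    return dict(zip(LABELS, 100.0 * (observed - expected) / expected))


# A perfectly standard normal "SPI" series and a copy shifted to lower values.
reference = stats.norm.ppf((np.arange(1, 10001) - 0.5) / 10000)
dev_reference = class_deviations(reference)
dev_shifted = class_deviations(reference - 0.3)
```

For the shifted series the extremely dry class is over-represented and the extremely wet class under-represented, which is the qualitative pattern described above for the GD-based SPI.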
The SPI quantile-quantile plots are calculated to associate SPI deviations to goodness of fit. Empirical SPI values are obtained by utilising empirical probabilities for the probability transformation. The standardisation enables the presentation of all months in a single plot. The GD underestimates the SPI in the tails (Fig. 5a). Most values drop below the straight line and are partly located outside the confidence bounds (confidence intervals are calculated from 1000 samples of standard normally distributed data). Further, the GD shows a slight tendency to overestimate the SPI at the centre. In contrast, WD-transformed SPI values are equally scattered around the straight line, with almost all values inside the confidence bounds (Fig. 5b). The quantile-quantile plots demonstrate that the differing SPI time series (Fig. 3) are not related to random variability. In fact, choosing the GD for SPI calculation leads to systematic deviations, most pronounced in the tails, leading to overestimation or underestimation of extreme dryness or wetness, respectively.
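Simulation-based confidence bounds of this kind could be obtained as sketched below. The text states only that 1000 standard normal samples are used; the 95 % level, the pointwise construction per order statistic and the plotting positions are assumptions, and both function names are hypothetical.

```python
import numpy as np
from scipy import stats


def qq_envelope(n, n_sim=1000, level=0.95, seed=0):
    """Pointwise bounds for the sorted values of a standard normal sample of size n."""
    rng = np.random.default_rng(seed)
    sims = np.sort(rng.standard_normal((n_sim, n)), axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(sims, [alpha, 1.0 - alpha], axis=0)
    # Theoretical quantiles at the plotting positions (i - 0.5) / n.
    theoretical = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    return theoretical, lower, upper


def fraction_outside(spi, lower, upper):
    """Share of sorted SPI values lying outside the pointwise bounds."""
    ordered = np.sort(np.asarray(spi, dtype=float))
    return float(np.mean((ordered < lower) | (ordered > upper)))


theoretical, lower, upper = qq_envelope(200)
shifted = theoretical + 1.0  # a series with a systematic shift
share_shifted = fraction_outside(shifted, lower, upper)
```

A systematically shifted series leaves the bounds for most order statistics, whereas random scatter around the straight line stays largely inside.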

Observed high resolution precipitation data set
Observed precipitation is investigated grid point-wise in the high resolution CRU data set. The main interest is in the European region; however, a comparison to the contiguous United States is included. This analysis clarifies whether previously presented results for England and Wales precipitation are specific to this data set.
Distributions: The selected distributions are estimated and compared for Europe. For this purpose a summarised presentation is used to combine the calculated AICD into a single figure, which displays the number of times a distribution reaches values equal to or below a given AICD, in percent of all grid points and months. In this way the percentages for AICD = 0 achieved by the single distributions add up to 100 %. Further, the cumulative way of construction leads to increasing curves for increasing AICD. Distribution functions with good data support should show a rapid increase and approach 100 % quickly, preferably before AICD reaches 4. Higher AICD indicate considerably lower model support. The GD is at first compared to every other distribution separately, followed by an overall comparison. This stepwise approach enables the identification of potential alternative distributions, given that the distribution functions have similar properties and are partly nested. A reference data set is created additionally, consisting of values simulated with the previously estimated parameters of the GD and a sample size equivalent to the observed one. These data show the outcome for one realisation of true gamma-distributed precipitation and serve as a guideline.
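The construction of the cumulative AICD frequency curves can be written compactly. The AIC values below are invented, each row standing for one grid point and month and each column for one distribution; the function name `aicd_frequency` is hypothetical.

```python
import numpy as np


def aicd_frequency(aic, thresholds):
    """Percent of cases in which each distribution attains AICD <= threshold.

    aic has shape (cases, distributions); the result has shape
    (thresholds, distributions).
    """
    aic = np.asarray(aic, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    aicd = aic - aic.min(axis=1, keepdims=True)  # differences to the best model
    below = aicd[None, :, :] <= thresholds[:, None, None]
    return 100.0 * below.mean(axis=1)


# Four hypothetical cases, three hypothetical distributions.
aic = [[10.0, 12.0, 15.0],
       [20.0, 19.0, 26.0],
       [5.0, 5.5, 4.0],
       [8.0, 11.0, 9.0]]
freq = aicd_frequency(aic, thresholds=[0.0, 2.0, 4.0])
```

Without ties, the row for AICD = 0 sums to 100 %, and every column is non-decreasing with the threshold, as stated above.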
Beginning with the reference data set, the GD (Fig. 6, black dashed-dotted lines) gives most frequently the AIC best model in comparison with all alternatives (WD, BD, EWD and GGD; red dashed-dotted lines). The AICD rates start around 80 % and higher, approaching 100 % quickly. The frequencies for the alternatives show different behaviour reflecting their ability to reproduce the GD properties. The BD shows the lowest rates, exceeded by the WD, while EWD and GGD rates approach 100 % for small AICD. The GGD result is primarily attributed to the property that the GD is a special type of the GGD. Therefore the differences in AIC are mainly a result of the additional parameter. The GD, however, is not a subset of the EWD, but the similar behaviour of EWD and GGD demonstrates the EWD potential to reproduce GD characteristics. The AICD frequencies of CRU precipitation differ considerably from the reference data set. The GD rates are lowered, whereas rates of all alternatives are increased (Fig. 6, continuous lines). The BD achieves the smallest but still remarkable increase. The WD, EWD and GGD exceed the related GD frequencies for Europe (GGD at least for AICD > 1). Therefore, each of the three distributions is superior to the GD in its ability to describe European monthly precipitation. However, neither the GD nor the WD achieves frequencies of 100 % for sufficiently low AICD. This demonstrates that neither of the two-parameter distributions is able to represent European precipitation completely. Note that the AICD show coherent spatial patterns in the individual months, but with regional and seasonal differences (Fig. S5). Below, an overall comparison guides the decision if one or both distributions can be excluded.

The AICD frequency comparison using all distributions negates this (Fig. 7). When BD and EWD are incorporated in the AICD calculation, the resulting GD and WD AICD frequencies are only slightly lowered compared to Fig. 6a. The discussion is restricted to the EWD for simplicity, as the outcome is similar if the GGD is used instead of the EWD. If both are included, EWD and GGD compete against each other, as demonstrated by similar AICD frequencies and reduced frequencies at AICD = 0 (Fig. S1). The BD frequencies are strongly decreased in comparison to Fig. 6b, pointing to a minor importance of this distribution for European precipitation. The EWD frequencies yield a similar result for small AICD, but the frequencies increase fast and exceed the GD and WD rates towards higher AICD. This is a result of the additional parameter penalised by AIC for months where GD and WD achieve small AICD (note that the AIC comparison for the simulated data set using all distributions is given in Fig. S3a for completeness).
These results are not restricted to the European region. The outcome is similar for the contiguous United States (US; Figs. 7b and S1b). The agreement is largest for the BD and EWD frequencies, with just a small offset compared to Fig. 7a. Notable differences are the increased (reduced) percentages for the GD (WD), leading to almost equal frequencies.
As the GD and WD are not able to describe EU and US precipitation sufficiently for all grid points and months, a single two-parameter distribution is not recommended for SPI calculation. On the other hand, both cannot be excluded as they still yield high percentages. It follows that a mixture of the GD and WD is a possible solution, given that a complete coverage is achieved for small AICD. The higher dimensional distributions (EWD and GGD) are another suitable choice. Here one has the advantage of using only a single distribution, although at the expense of increasing variance due to the bias-variance trade-off.
SPI: Analogous to the previous Sect. 3.1 (Fig. 4), deviations from the expected SPI occurrences are shown, including all grid points and months. Note that the SPI biases may cancel each other out due to the large number of distributions involved and that the inclusion of months with small AICD reduces the overall SPI differences. Nonetheless large deviations occur for the GD-transformed SPI, mainly in the tails (Fig. 8a). On average more than 30 % too many (few) extreme drought (wet) events are detected in the considered time period and region. The BD underestimates both extremes (Fig. 8c), whereas the WD and, even more, the EWD and GGD reduce the SPI differences (Fig. 8b, d and e). Selecting only SPI values for which the transforming distributions yield AICD ≤ 2, the differences for the GD, WD and BD are reduced (Fig. 8, red lines). This criterion applied to the EWD and GGD achieves no further improvement, consistent with the fast approach of 100 % coverage in Fig. 7a.

Precipitation from a coupled atmosphere-ocean climate model
For the SPI analysis of future drought occurrences in climate projections, it is essential that the SPI of a reference climate state is determined. This might be either the present or a climate undisturbed by anthropogenic greenhouse gas emissions. Therefore, precipitation distributions and the derived SPI are evaluated in a pre-industrial control simulation (ECHAM5/MPI-OM), with an integration time of 500 yr and constant greenhouse gas levels fixed at values for the year 1860.
Distributions: The distributions are compared first for the reference data set of true gamma-distributed values representing European precipitation (Fig. 9, dashed-dotted lines and in the Supplement Fig. S3b). In comparison to the CRU data set (Fig. 6), the GD yields higher frequencies, whereas frequencies of the alternatives (WD and BD) are reduced. This difference is due to the larger sample size, helping to distinguish between the distributions. The sample size is of minor importance for EWD and GGD. Here, the additional parameter dominates the achieved frequencies, which are similar to the ones in the CRU data set (Figs. 6 and 9c and d).
The GD frequencies are strongly reduced for ECHAM5 precipitation (Fig. 9, continuous lines), and each of the alternatives outperforms the GD in terms of AIC. Depending on the alternative distribution selected, the GD is not supported according to AIC for 60 % (BD) of all grid points and months, or for an even higher share (WD, EWD and GGD). The marginal increase of the frequencies in Fig. 9a and b, remaining below 100 % for high AICD, demonstrates that neither the WD nor the BD alone is able to cover European precipitation completely. This is in contrast to the EWD and GGD, yielding AICD < 3 in all cases.
A comparison for all distributions follows: The EWD and GGD are found to be exchangeable again, as for the CRU data set (Fig. S2). Therefore, the GGD is omitted for the AICD calculation below. The GD yields the lowest frequencies, which are even exceeded by the BD (Fig. 10a and b). The WD achieves high frequencies in the European region, but not for the contiguous United States. The EWD outperforms all other distributions, even for AICD below 2, and yields the AIC best model in around 40 % of cases. This, together with the minor importance of GD and WD (Fig. 10a and b), demonstrates a reduced impact of nested distribution types. That is, for a complete coverage the higher dimensional distributions (EWD or GGD) are essential. However, the EWD frequencies do not achieve 100 % sufficiently fast (Fig. 10b and c). Given that the EWD is an extension of the WD and is able to reproduce GD characteristics, this limit is related to the BD.
[Figure caption fragment: panels end with (e) GGD. The percentage of cases in which each distribution reaches AICD ≤ 2 is given in parentheses. Additionally, the differences for SPI time series selected according to this criterion are given (red).]
SPI: Based on ECHAM5 precipitation the deviations from the expected SPI probabilities are given in Fig. 11, restricted to Europe. The outcome is similar to the CRU data set (Fig. 8). That is, the largest differences occur with the GD, and they are reduced with Weibull type distributions (WD, EWD and GGD). If distributions are selected with AICD ≤ 2, the deviations are again reduced.

Distribution selection for longer SPI time scales
So far the investigation focussed on the monthly SPI. A short outlook extends the results to time scales of agricultural and hydrological droughts. AICD frequencies are shown for the CRU (Fig. 12) and ECHAM5 data set (Fig. 13) for time scales in the range from 1 to 24 months. The analysis is restricted to the case AICD = 0, to simplify matters.
The GD yields a rapid increase in the number of times the AIC minimum is reached, if longer SPI time scales are considered. The alternative distributions show the direct opposite. The same qualitative behaviour is present for each of the alternative distributions and for both data sets. However, differences occur in the percentages reached. For example, the GD frequencies remain below 80 % in the comparison with the alternatives EWD and GGD in the ECHAM5 data set, even for the time scale of 24 months (Fig. 13). Because the GD outperforms each alternative for longer time scales, the same holds also if all distributions are compared together (Fig. S4).
The preference of the GD for longer time scales is explained by the central limit theorem. The longer time scales are constructed by averaging precipitation, so that the distribution approaches the normal distribution for increasing time scales. This is consistent with the property of the GD that for higher shape parameters normality is reached, but is in contrast to the WD because of deviations in the kurtosis (Fig. A1). The BD, EWD and GGD occupy areas in the skewness-kurtosis diagram, so that their frequency development is mainly a result of the additional parameter.
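The skewness and kurtosis argument can be checked numerically with SciPy's moment functions. The sketch relies only on standard properties of the two distributions (GD skewness $2/\sqrt{\gamma}$ and excess kurtosis $6/\gamma$); the chosen shape values are arbitrary.

```python
import numpy as np
from scipy import optimize, stats

# Gamma distribution: skewness and excess kurtosis vanish for large shape values.
shapes = np.array([1.0, 4.0, 16.0, 64.0])
gd_skew, gd_kurt = stats.gamma.stats(shapes, moments="sk")

# Weibull distribution: find the shape with zero skewness ...
zero_skew_shape = optimize.brentq(
    lambda c: float(stats.weibull_min.stats(c, moments="s")), 2.0, 5.0
)
# ... and evaluate the excess kurtosis there, which stays different from zero.
wd_kurt_at_zero_skew = float(stats.weibull_min.stats(zero_skew_shape, moments="k"))
```

The GD can therefore approach both normal moments simultaneously, while the WD retains a non-zero excess kurtosis at the one shape value where its skewness vanishes.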

Climate change projection of SPI extremes
The role of the SPI bias is further analysed in climate change projections. Therefore, an optimised SPI is defined, minimising systematic deviations. This newly created SPI serves as reference and eases comparison to the SPI resulting from the GD (GD-SPI).

Multi-Distribution SPI (MD-SPI):
Several ways to reduce SPI errors can be deduced from the previous results. One possibility is the use of a single, most general distribution, that is, the distribution with the highest AICD frequencies for sufficiently low AICD. Reasonable choices are the EWD or the GGD, independent of the analysed data set. Calculating the SPI with several distributions is another possibility, which is applied in the following. The AIC best distribution is selected for each grid point and month separately and is applied for SPI calculation. Compared to a single distribution approach, the additional effort needed to obtain a Multi-Distribution SPI (MD-SPI) is justified by improved error reduction and bias-variance adjustment.
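The selection step for one grid point and month can be sketched as follows. For brevity the candidate set is restricted to the GD and WD, which is an assumption made for this illustration; the full procedure would include all candidate distributions. The function names `aicc` and `md_spi` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

CANDIDATES = {"GD": stats.gamma, "WD": stats.weibull_min}  # reduced set for brevity


def aicc(loglik, k, n):
    """AIC with the small-sample penalty 2kn / (n - k - 1)."""
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)


def md_spi(precip, candidates=CANDIDATES):
    """Pick the AIC best candidate for one month and transform with it."""
    precip = np.asarray(precip, dtype=float)
    n = precip.size
    scores, fitted = {}, {}
    for name, dist in candidates.items():
        params = dist.fit(precip, floc=0)  # location fixed at zero
        loglik = np.sum(dist.logpdf(precip, *params))
        scores[name] = aicc(loglik, len(params) - 1, n)
        fitted[name] = params
    best = min(scores, key=scores.get)
    p = candidates[best].cdf(precip, *fitted[best])
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # keep the normal quantiles finite
    return best, stats.norm.ppf(p)


rng = np.random.default_rng(4)
weibull_month = 80.0 * rng.weibull(4.0, size=1500)  # synthetic Weibull-type sums
best, spi = md_spi(weibull_month)
```

Repeating the call for every grid point and calendar month gives a field of SPI values in which each series is transformed with its own AIC best distribution.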
Figure 14a to d shows the differences from the expected probability, in percent, of extreme dryness and wetness according to the SPI definition in the control simulation (CTL). The MD-SPI yields strongly reduced errors (Fig. 14a and b) compared to the GD-SPI (Fig. 14c and d). Extreme dryness, however, is underestimated in large regions worldwide, mainly in Northern Africa, Australia and tropical areas (Fig. 14a and c). This underestimation is due to the lower bound restriction of the SPI, as can be inferred from the green lines. The green lines enclose regions where, for at least one month, the lower bound is higher than −2, that is, extreme dryness is not observable. Note that the lower SPI bound is independent of the distribution and results solely from the probability of zero precipitation. Thus, deviations for extreme dryness are similar in these regions for both SPIs. Outside the areas affected by the lower bound, the GD-SPI attributes too high (low) probabilities to the occurrence of extreme dryness (wetness) (Fig. 14c and d) in most regions worldwide.
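The lower bound can be made explicit with a short sketch. It assumes the commonly used mixed form H(x) = q + (1 − q) F(x), where q is the probability of zero precipitation and F the fitted distribution; this form and the function names are assumptions for illustration and are not taken from the text above.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def spi_lower_bound(p_zero):
    """Lowest attainable SPI when a fraction `p_zero` of months has zero precipitation.

    Assumes H(x) = p_zero + (1 - p_zero) * F(x), so that H(x) >= p_zero for all x.
    """
    if not 0.0 <= p_zero < 1.0:
        raise ValueError("p_zero must lie in [0, 1)")
    if p_zero == 0.0:
        return float("-inf")  # no zero-precipitation months: no lower bound
    return STANDARD_NORMAL.inv_cdf(p_zero)


def extreme_dryness_observable(p_zero, threshold=-2.0):
    """True if SPI values at or below `threshold` can occur for this zero probability."""
    return spi_lower_bound(p_zero) <= threshold


if __name__ == "__main__":
    for q in (0.0, 0.01, 0.05, 0.2):
        print(q, spi_lower_bound(q), extreme_dryness_observable(q))
```

Because the bound depends only on q, it is the same for every fitted distribution F, which is in line with the similar dryness deviations of both SPIs in the affected regions.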
Climate change projections:
Percentage differences between projected MD-SPI extremes for the stabilisation run (A1BS) and the control run (CTL) are given in Fig. 14e and f. Large areas worldwide show an increase in dryness and wetness extremes. This increase can reach 200 % and more, depending on the area considered. Inverse relationships are notable, that is, regions with increased dryness (wetness) also show decreased wetness (dryness). Regions with an increase in both extremes, for example in South America or China, are a result of seasonally different responses (not shown). The MD-SPI and GD-SPI projections deviate from each other (Fig. 14g and h). Shown are differences in percent relative to the expected probability. The resulting patterns are similar to the ones demonstrating the GD bias in the control run (Fig. 14c and d), with most parts of the world covered by positive and negative differences. Thus, the GD not only overestimates (underestimates) extreme dryness (wetness) in the unforced climate model run, but also their potential future changes. Further, comparing the projection differences with those obtained before (Fig. 14c and d) points to increased differences in the future climate.
The temporal evolution of the SPI extremes under greenhouse gas forcing is analysed next. Time series of extreme SPI occurrences, that is, the number of events with SPI ≤ −2 (SPI ≥ 2) in each year and for different regions, are created. These time series again have a probability of 2.3 % on average, according to the SPI definition. Differences to this probability in percent are shown in Fig. 15 for the model runs CTL, 20C, A1B and A1BS together. Idealised lines highlight the general behaviour (thick blue and red lines, Fig. 15). They are built from the CTL (A1BS) mean at the beginning (end) and the smoothed time series between them. Grid points influenced by the SPI lower bound are excluded (compare Fig. 14, green lines). The variability of the time series is expected to be higher in smaller regions due to the lower number of grid points analysed, since spatial and temporal dependencies may result in a homogeneous coverage with one type of extreme in a certain year. The time series are expected to oscillate around a zero mean in the unforced climate, which is well approximated by the MD-SPI (Fig. 15). In contrast, the GD-SPI yields higher (lower) frequencies of extreme dryness (wetness), which is consistent with previous findings. For the future climate, all time series show increasing extreme dry and wet frequencies. The absolute value of the differences between the smoothed lines highlights the discrepancy between MD-SPI and GD-SPI. The deviations range between 20 % and 50 % in CTL, depending on the area considered (Fig. 15, black lines). The MD-SPI/GD-SPI differences increase with the strength of the external greenhouse gas forcing (20C, A1B and A1BS). Therefore, the SPI bias resulting from the GD assumption, that is, overestimation (underestimation) of extreme dryness (wetness), grows if projected future climate states are investigated.
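The yearly deviation measure can be sketched as follows. This is a minimal illustration under the assumption that the deviation is expressed relative to the expected probability; the function name and the example values are hypothetical.

```python
from statistics import NormalDist


def extreme_frequency_deviation(spi_values, kind="dry", threshold=2.0):
    """Percentage deviation of the observed extreme frequency from its expected value.

    `spi_values` holds all SPI values of one year and region (grid points x months).
    The expected probability follows from the standard normal distribution,
    about 2.3 % for a threshold of 2.
    """
    if kind == "dry":
        hits = sum(1 for v in spi_values if v <= -threshold)
    elif kind == "wet":
        hits = sum(1 for v in spi_values if v >= threshold)
    else:
        raise ValueError("kind must be 'dry' or 'wet'")
    expected = NormalDist().cdf(-threshold)
    observed = hits / len(spi_values)
    return 100.0 * (observed - expected) / expected


if __name__ == "__main__":
    # Hypothetical year: 5 of 100 values are extremely dry, none extremely wet.
    values = [-2.5] * 5 + [0.0] * 95
    print(extreme_frequency_deviation(values, "dry"))
    print(extreme_frequency_deviation(values, "wet"))
```

A value of zero corresponds to an unbiased SPI, positive values to more extremes than the SPI definition implies.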

Conclusions
Single location, regional and global precipitation data sets are analysed to substantiate the gamma distribution (GD) for monthly precipitation sums. The aim is to validate the adequacy of the GD as a basis for the calculation of the Standardised Precipitation Index (SPI), drought monitoring, climate model evaluation and climate change assessment. Distribution functions are compared using the Akaike information criterion (AIC), which accounts for the information gain or loss of different statistical models. The impact of different distributional types on the calculated SPI series is investigated. A comparison solely on a statistical basis is justified given that the SPI is a purely statistical measure of droughts. The main results are:
- The GD fails to represent precipitation in considerable areas of global observed and simulated data, with largest deviations on short SPI time scales (SPI 1-3).
- Improvements are attained by Weibull type distributions, mainly for the European region, but also for the contiguous United States and global land areas in ECHAM5.
- The selected distribution strongly impacts the outcome of the SPI calculation. The SPI based on the GD can lead to severe overestimation (underestimation) of extreme dryness (wetness).
The SPI bias has direct implications for potential applications. The misleading detection of drought onset and intensity reduces its usefulness for drought monitoring. Furthermore, the comparison between different regions and seasons is hampered if differences can occur solely due to the adequacy of the transforming distribution. Climate change projections of drought conditions can include increasing SPI errors in future climates.
Notable differences are detected between the observed (CRU) and climate model (ECHAM5) data sets, with pronounced lower GD frequencies in the model simulation. This might indicate a model bias in precipitation. Additional work is required to clarify the source of this discrepancy due to different spatial resolutions and temporal coverage. It is shown, however, that applying the GD for the SPI calculation is too restrictive under a wider range of applications, for example in multi-model comparison studies. Because precipitation calculation is a critical component in climate models, model-dependent deviations from a distributional type might occur. This can hamper the drought comparison in terms of the SPI.
Two ways to overcome this problem are suggested. Firstly, in terms of bias-variance adjustment, the preferred SPI calculation should be performed stepwise with multiple distributions, that is, using lower-dimensional distributions as long as they are appropriate (AICD ≤ 2) and changing to the higher-dimensional ones for the remaining grid points. Secondly, if comparability and reproducibility are important, a distribution should be chosen that yields accurate estimates in almost all areas and months. For this purpose, the generalised gamma distribution (GGD) and the exponentiated Weibull distribution (EWD) are plausible candidates. This approach, however, implies the risk of overfitting.
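The first, stepwise suggestion could be sketched as below. This is one possible reading of the rule, not a definitive implementation: the tie-breaking rule, the function name and the example log-likelihoods are assumptions.

```python
def stepwise_selection(fits, max_aicd=2.0):
    """Pick the lowest-dimensional distribution whose AICD does not exceed `max_aicd`.

    `fits` maps a name to (maximised log-likelihood, number of parameters).
    """
    aics = {name: 2.0 * k - 2.0 * loglik for name, (loglik, k) in fits.items()}
    best_aic = min(aics.values())
    supported = [name for name, value in aics.items() if value - best_aic <= max_aicd]
    # Prefer fewer parameters; break ties by the smaller AIC (an assumption).
    return min(supported, key=lambda name: (fits[name][1], aics[name]))


if __name__ == "__main__":
    # Hypothetical fits: the GD is close enough to the AIC-best EWD and is kept.
    print(stepwise_selection({"GD": (-412.8, 2), "EWD": (-411.5, 3)}))
    # Hypothetical fits: the GD is clearly worse, so the EWD is used instead.
    print(stepwise_selection({"GD": (-416.0, 2), "EWD": (-411.5, 3)}))
```

The difference to a pure AIC minimum is that a lower-dimensional distribution is retained whenever it is still supported, which limits the number of estimated parameters.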
The Empirical Cumulative Distribution Function (ECDF) is potentially an alternative way to calculate the SPI. This completely avoids distributional hypotheses and the problems associated with them. On the other hand, the ECDF is likely to be too coarse a measure due to its discrete nature. A smoothed ECDF avoids this, and a comparison between the different approaches is of interest for future research.
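A minimal sketch of an ECDF-based SPI is given below. The plotting position rank/(n + 1) and the treatment of ties are assumptions chosen for illustration; other choices are possible, and no smoothing is applied.

```python
from statistics import NormalDist


def ecdf_spi(precip):
    """SPI values from the empirical distribution of one calendar month's sample.

    Uses the plotting position rank / (n + 1) so that probabilities stay inside
    (0, 1); tied values share their average rank.
    """
    n = len(precip)
    ordered = sorted(precip)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # 1-based rank of first occurrence
        count = ordered.count(value)
        ranks[value] = first + (count - 1) / 2.0  # average rank of the tied group
    normal = NormalDist()
    return [normal.inv_cdf(ranks[value] / (n + 1)) for value in precip]


if __name__ == "__main__":
    # Hypothetical monthly precipitation sums (mm) for one calendar month.
    print(ecdf_spi([12.0, 48.5, 30.1, 0.0, 75.3, 30.1]))
```

With n values, at most n distinct SPI levels can occur, which illustrates the coarseness mentioned above.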
Evidence is presented that the monthly precipitation distribution cannot be described on the basis of the GD alone. The source of the deviations, however, remains an open question. Besides the possibility that the true distribution is of another type or even of an unknown type, it is conceivable (i) that the GD is just modified by non-stationarities of the climate system and is not flexible enough to capture these deviations under the stationarity assumption, or (ii) that different distributions are preferred depending on specific climate states. Examples of non-stationarities with known impact on precipitation are the North Atlantic Oscillation or the El Niño/Southern Oscillation.
The presented analysis framework can be useful for upcoming studies which extend the set of alternative distributions or investigate other data sets. It is of additional interest to see whether distributions can be physically founded. However, despite the problems discussed here, the SPI remains a valuable method for drought analysis. Its general usefulness is not affected by the details of the calculation. In the near future the index will gain in importance because of its intended implementation as a worldwide standard.

F. Sienz et al.: Dryness and wetness extremes: SPI bias
the decision for the GD (not shown). Further, the rates are expected to drop if the set of competing distributions is enlarged with higher-dimensional ones. But even then the AIC is superior to the other measures due to the penalty for the larger number of parameters. In summary, the AIC is a reliable choice for the problem of distribution selection.

Fig. 1. Conceptual diagram illustrating the SPI calculation of artificial data (black dots) with the estimated distribution (F(x; λ), blue line, left) and another distribution given by the red line. The outcome is two different SPI time series, represented through their empirical distribution functions (blue and red lines, right).

Fig. 4. Differences to the expected probabilities of the SPI classes in percent for England and Wales precipitation. SPI time series are calculated with (a) GD, (b) WD, (c) BD, (d) EWD and (e) GGD.

Fig. 6. AICD frequencies (CRU data set): percentage of all grid points and months for which a distribution yields an AICD smaller than or equal to a given value for the European region. The GD (black lines) is compared to (a) WD, (b) BD, (c) EWD and (d) GGD (red lines). Dashed-dotted lines with corresponding colours show the respective outcome for a simulated data set, representing gamma-distributed precipitation in Europe.

Fig. 8. Differences to the expected probabilities of the SPI classes in percent (CRU data set), using all grid points of the European region (black). SPI time series are calculated with (a) GD, (b) WD, (c) BD, (d) EWD and (e) GGD. The percentage of cases in which each distribution reaches AICD ≤ 2 is given in parentheses. Additionally, the differences for SPI time series selected according to this criterion are given (red).

Fig. 9. AICD frequencies (ECHAM5 data set): percentage of all grid points and months for which a distribution yields an AICD smaller than or equal to a given value for the European region. The GD (black lines) is compared to (a) WD, (b) BD, (c) EWD and (d) GGD (red lines). Dashed-dotted lines with corresponding colours show the respective outcome for a simulated data set, representing gamma-distributed precipitation in Europe.

Fig. 10. AICD frequencies (ECHAM5 data set): percentage of all grid points and months for which a distribution yields an AICD smaller than or equal to a given value for (a) the European region, (b) the contiguous United States and (c) global land areas. AIC comparison with GD, WD, BD and EWD.

Fig. 12. AICD frequencies (CRU data set): percentage of all grid points and months for which a distribution yields an AICD equal to zero for the European region. The GD (black dots) is compared to the WD, BD, EWD and GGD (red triangles) as a function of the SPI time scale.

Fig. 14. Comparison of SPI dry (left column) and wet (right column) extreme occurrences in the control (CTL) and stabilisation run (A1BS). (a-d): differences to the expected probability (P in %) of the extreme SPI classes (P ≈ 2.3 %) for CTL based on MD-SPI (a, b) and CTL based on GD-SPI (c, d). MD-SPI probability changes in percent of extreme dry (e) and wet (f) conditions with respect to the CTL. (g, h): SPI projection differences between GD-SPI and MD-SPI. The green lines surround dry regions where the SPI lower bound is higher than −2 for at least 1 month.

Fig. A1. Skewness-kurtosis diagram comparing the possible shapes of the GD and WD, together with the normal and exponential distribution. The black line is the limit of all distributions.

Table 1. Definition of the Standardised Precipitation Index (SPI) classes and corresponding event probabilities (P).
The precipitation data sets (observed and climate model output) are analysed with respect to distribution properties and SPI biases in Sect. 3. Discussion and outlook close the manuscript. The appendix presents technical supplements regarding the evaluation of distribution functions.

Table 2. AICD and their interpretation with respect to the achieved strength of model support.