Introduction
Droughts can impact many human activities and environmental processes
including agriculture, water resource management, inland water transport, energy
production and freshwater ecology . They often
spread over vast geographical regions and last for many months or even years
. The spatial extent and manifold impacts make them
one of the costliest natural disasters . Given
this situation, continuous monitoring as well as forecasting of the onset or
likely evolution of an ongoing drought over the next few weeks are important
to trigger actions for mitigating negative impacts in the mentioned fields.
To do so, decision makers and end users require simple and robust forecast
indicators which are capable of informing about the onset, possible duration
and end of drought conditions.
Droughts can be classified into several categories
: (i) meteorological drought, which is
defined as a rainfall deficit over a certain space and period of time; (ii)
agricultural or soil moisture drought, which describes the propagation of
precipitation deficits to soil moisture deficits resulting in plant water
stress; and finally (iii) hydrological drought, which is associated with the
effects of precipitation deficits on surface and subsurface water supplies.
In this study we focus on meteorological droughts using monthly precipitation
forecasts from the European Centre for Medium-range Weather Forecasts (ECMWF)
ensemble systems. This timescale is considered challenging because it lies
between medium-range forecasting, which depends strongly on the
initial conditions, and the seasonal timescale, which is mainly driven by oceanic
variability. The goal is to test whether decision makers can be
provided with a forecast of the onset or
likely evolution of a drought during the next month.
It has been demonstrated that droughts can be forecast using stochastic models or
neural networks . While
demonstrated that these forecasts can provide
“reasonably good agreement for forecasting with 1 to 2 months lead times”,
they do not quantify the improvement of these methods with respect to using
probabilistic forecasts of the precipitation fields. Forecasts of droughts
can also be produced using deterministic numerical weather prediction models.
Such forecasts are highly uncertain due to the chaotic nature of the
atmosphere, which is particularly pronounced on the sub-seasonal timescale
. Therefore, ensemble
prediction systems have been developed that forecast multiple scenarios of
future weather. Probabilistic forecasts become particularly important for
assessing the risks associated with high-impact and rare weather events such
as tropical cyclones or droughts as well as for identifying uncertainties in the forecasts
.
Forecasts on the sub-seasonal timescale and seasonal forecasts from dynamical
models have improved considerably in recent years and show potential
for predicting large-scale features and teleconnections
. The latter can be used in
statistical downscaling methods using weather types.
, for example, used the North Atlantic sea level
pressure precursors to forecast drought over the eastern Mediterranean.
However, while their forecasts are statistically significant for several
months' lead time, this region represents a relatively small part of Europe
known to be one of the most sensitive to weather types. In general, the
published literature indicates that the skill of the precipitation fields
produced by numerical weather prediction models over Europe is low,
even though there are considerable spatial variations. However, these
analyses tend to be performed from the point of view of weather forecasting
and do not incorporate specific properties that are relevant for drought
forecasting such as persistence.
Drought forecasts can be issued for different lead times, ranging from a few
weeks to several months, and the accuracy of any forecast decreases with
increasing lead time. However, there is so far no reference study
providing a general assessment of meteorological drought forecasting over
Europe. Such a study is necessary to provide a baseline for researchers who
develop new forecast methods. It is also necessary for decision makers and
end users to assess the uncertainties of the warnings provided by forecast
services.
The ECMWF provides two different types of forecasts for this time range: an
extended-range forecast with lead times of up to 32 days, issued twice
a week, and a seasonal forecast with lead times of up to 12 months, issued
once a month. The extended-range forecast incorporates more recent model
developments and is usually of higher resolution. The
seasonal forecasting system is based on an older model cycle
, among other significant differences. Analysing the
potential of both products requires understanding the differences in properties
and skill between the two systems for this particular application. For the
case of droughts such an analysis needs to include both the numerical
forecasting skill and the possibilities for binary decisions to issue drought
warnings. In particular, the latter is challenging if such decisions are
based on probabilistic forecasts.
The objectives of this paper are to analyse the possibilities for issuing
30-day forecasts of drought conditions based on ensemble prediction systems
and the Standardized Precipitation Index (SPI,
). The latter is a normalized quantification
of the precipitation anomalies and is considered a good indicator for analysing
meteorological droughts over different timescales .
Considering the difficulties in predicting drought, in this study, we focus
on the evolution of the precipitation for the next month, calculating the
rainfall anomaly for the same time period (SPI-1). This product, which
provides the trend of precipitation for the next month in relation to the
climatology, could be combined with routine drought monitoring to create more
robust and useful information for stakeholders. To do so, the extended range
and seasonal forecasting systems are compared directly but also within the
setting of a decision-making framework. Multiple scores as well as multiple
methodologies which allow the transformation of probabilistic forecasts into
binary decisions are developed and tested.
The underlying questions are the following: what is the predictability of a drought
based on the SPI for a 1-month rainfall accumulation period (SPI-1); which of the
two systems, the Seasonal (SEAS) or the monthly ENSemble system (ENS), is more
useful for forecasting 30-day cumulative precipitation; and how does the models'
skill vary in space and time? Adapted skill
scores provide information about the ability of the probabilistic models to
accurately forecast such kinds of extreme events. The paper is organized as
follows: the tools and methods used will be detailed in Sect. 2 and the
results will be discussed in Sect. 3. Final conclusions are drawn in Sect. 4.
Data and methods
Precipitation
Observations
In this study, the combined gridded precipitation data set from the ENSEMBLES
project and ECA & D (, E-OBS
Version 5) was used; it is available from 1950 onwards and is continuously
updated. The native spatial resolution of the data set is 0.25° × 0.25°;
it was upscaled to a 1° × 1° grid by averaging the cumulative precipitation,
as this analysis focuses on large-scale droughts.
Validation of the original data sets has been performed by
and , who found
that data sets from ECA & D show higher values for extreme precipitation,
and E-OBS tends to over-smooth the data. This can cause problems when
analysing intense precipitation events but appears to be of secondary importance in
drought analysis. Daily precipitation values were aggregated to monthly
values to allow comparison with the monthly forecasts. For consistency with
the ECMWF ensemble data, the common hindcast period from 1992 to 2013 is used
to calculate the precipitation anomalies.
Forecasts
ECMWF provides two coupled ensemble forecasting systems that can be used to
forecast 1 month ahead: an extended-range monthly forecast and a seasonal
forecast.
The ECMWF monthly (32-day) extended-range ensemble forecasting system
(ENS hereafter) has been routinely issued twice a
week since October 2011. This model is the latest version of the ECMWF
Integrated Forecasting System. For lead times up to day 10 the model is not
coupled to the ocean and has a resolution of ∼32 km (T639). It
is forced by persistent sea-surface temperature anomalies. Beyond a lead time
of 10 days the resolution of the model is coarser (T319, 64 km);
however, it is coupled to an ocean model. The vertical resolution remains
unchanged during the entire simulation at 62 vertical levels. ECMWF provides
reforecasts (hindcasts) for ENS, consisting of a five-member ensemble
started on the same day and month as each Thursday's real-time forecast for
each of the past 20 years. For a more detailed description see
.
The second ECMWF ensemble system used in this study is the seasonal forecast
called System 4 (SEAS hereafter), which is launched
once a month (on the first day of the month). It has lead times up to 13
months and a resolution of T255 (80 km). This model is the 2011
version of the Integrated Forecasting System, with 91 vertical levels. SEAS
also provides hindcasts: a 15- or 51-member ensemble (depending on the
month), with the same configuration as the real-time SEAS forecasts, for every
month from 1980 onwards. In this study, only the first forecast month is used.
In real-time mode, SEAS and ENS are each composed of 51 members: 50 members generated
by perturbing the initial conditions and the physical tendencies, and one unperturbed
member. Both data sets were re-gridded to a 1° × 1° grid using a mass-conservative
interpolation. The two systems will be compared over their hindcast periods
as well as over a forecast period, as summarized in Table . This
allows for a larger sample size and enables a more significant comparison.
However, although this technique is robust and frequently used, it also has
a drawback: the ENS reforecasts have only five
members instead of the 51 members of the real-time forecasts. Since ensemble size
affects probabilistic skill scores, this difference needs to be corrected for.
faced the same issue when they scored the
ECMWF reforecasts produced in 2006 and used a correction of the probabilistic
skill score which takes into account the ensemble size.
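To illustrate why ensemble size matters, the sketch below compares the standard Brier score with a "fair" Brier score, which subtracts a sampling-noise term so that its expected value does not depend on the number of members. This is one possible correction, assumed here for illustration only; it is not necessarily the correction applied in the study cited above, and the function names are hypothetical.

```python
from math import comb


def brier(i, m, o):
    """Standard Brier score of one forecast: i of m members predict the event, o is 1 or 0."""
    return (i / m - o) ** 2


def fair_brier(i, m, o):
    """'Fair' Brier score: subtracts the sampling-noise term i(m - i) / (m^2 (m - 1))."""
    if m < 2:
        raise ValueError("the fair Brier score needs at least two members")
    return (i / m - o) ** 2 - i * (m - i) / (m ** 2 * (m - 1))


def expected_score(score, m, p, o):
    """Expected score if each of m members independently predicts the event with probability p."""
    return sum(
        comb(m, i) * p ** i * (1 - p) ** (m - i) * score(i, m, o)
        for i in range(m + 1)
    )


# Compare a 5-member (reforecast-like) and a 51-member (real-time-like) ensemble
# whose members share the same underlying drought probability p = 0.3, drought observed.
for m in (5, 51):
    print(m, expected_score(brier, m, 0.3, 1), expected_score(fair_brier, m, 0.3, 1))
```

The expected standard Brier score is (p − o)² + p(1 − p)/m, i.e. 0.49 + 0.21/m (about 0.532 for 5 members and 0.494 for 51), so the smaller ensemble is penalized even though its underlying probability is identical; the fair score has the same expectation, 0.49, for both sizes.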
Drought detection
In this study the Standardized Precipitation Index (SPI) is used to detect
droughts. It was developed by and is currently
used in many scientific studies and operational systems
. SPI has the advantage of providing easily
understandable information about the precipitation anomaly. In addition, it is
very flexible, allowing calculations aggregated over different spatial
scales (from station data to large-scale area) as well as temporal domains
(from 10-day to several months' cumulative precipitation –
).
ENS and SEAS configurations for the hindcast and the
forecast periods.
Periods     Evaluation period             ENS          SEAS
Hindcasts   Nov 1992 to Oct 2012          5 members    15/51 members
Forecasts   1 Nov 2012 to 31 Oct 2013     51 members   51 members
This study focuses on the monthly timescale and therefore the SPI was
calculated using monthly accumulated precipitation (SPI-1). The SPI is
usually computed by fitting a probability density function (often a Gamma
distribution) to the data. The cumulative probabilities of the fitted distribution
are then transformed with the inverse of the standard normal (Gaussian) cumulative
distribution function, so that the resulting values have a
mean equal to 0 and a standard deviation (SD) equal to 1. It is important
to test the hypothesis that the data can be approximated by a Gamma distribution
to ensure that all conclusions are valid. The Gamma distribution cannot
be fitted when only a small number of data points (events) or very low data
values (precipitation) are available, because the numerical optimization
process does not converge. Therefore, the SPI methodology
cannot be applied in very arid regions.
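The SPI-1 computation described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the monthly totals are strictly positive (the treatment of zero or insignificant precipitation is omitted); the Gamma parameters are estimated with Thom's approximation to the maximum-likelihood solution, which is common in SPI implementations but is not necessarily the fitting procedure used in this study; and all function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_gamma_thom(values):
    """Approximate maximum-likelihood Gamma fit (Thom's estimator); returns (shape, scale)."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive monthly totals")
    mean = fmean(values)
    a = math.log(mean) - fmean(math.log(v) for v in values)
    if a <= 0:
        raise ValueError("sample has no spread; a Gamma fit is not possible")
    shape = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return shape, mean / shape


def gamma_cdf(x, shape, scale=1.0):
    """Regularized lower incomplete gamma function P(shape, x/scale) via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape  # n = 0 term of sum z^n / (shape (shape+1) ... (shape+n))
    total = term
    for n in range(1, 10_000):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    prefactor = math.exp(shape * math.log(z) - z - math.lgamma(shape))
    return min(1.0, prefactor * total)


def spi(x, shape, scale):
    """Map a precipitation total to the standard normal quantile of its fitted Gamma CDF."""
    p = gamma_cdf(x, shape, scale)
    p = min(max(p, 1e-12), 1 - 1e-12)  # inv_cdf is undefined at exactly 0 or 1
    return NormalDist().inv_cdf(p)


# Hypothetical monthly totals (mm) for one grid point and one calendar month.
monthly_mm = [55.0, 80.2, 31.5, 102.3, 64.1, 45.8, 90.7, 70.0, 38.2, 58.9]
shape, scale = fit_gamma_thom(monthly_mm)
print([round(spi(v, shape, scale), 2) for v in monthly_mm])
```

In practice the fit is done separately for each grid point and calendar month, so that each SPI-1 value is an anomaly relative to that month's climatology.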
The SPI values can be divided into different classes:
normal conditions from -1 to 1; moderate drought for SPI < -1;
severe drought for SPI < -1.5; and extreme drought for
SPI < -2. The time series of forecasts analysed in this paper are
too short to justify any focus on SPI values lower than -2 (the lowest 2.3 % of
the distribution). Therefore, this study focuses on moderate and severe
droughts only. One strong advantage of this method is that it produces an
unbiased product, with a flat rank histogram (Talagrand diagram) of the
observed precipitation relative to the forecast ensemble (not shown).
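The class thresholds, and the share of the distribution each one selects, follow directly from the standard normal distribution of the SPI. The sketch below is illustrative; the function name is hypothetical, and values above 1 are simply labelled "above normal" because the text does not classify them further.

```python
from statistics import NormalDist


def spi_class(value):
    """Drought class of an SPI value, using the thresholds quoted in the text."""
    if value < -2.0:
        return "extreme drought"
    if value < -1.5:
        return "severe drought"
    if value < -1.0:
        return "moderate drought"
    if value <= 1.0:
        return "normal"
    return "above normal"  # wet side; not classified further in the text


# Theoretical share (in %) of months below each threshold for a standard normal SPI.
std_normal = NormalDist()  # mean 0, SD 1
for threshold in (-1.0, -1.5, -2.0):
    print(threshold, round(100 * std_normal.cdf(threshold), 1))
```

The loop prints 15.9, 6.7 and 2.3 (in %) for the thresholds -1, -1.5 and -2, the last matching the 2.3 % quoted above.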
Deriving a decision from probabilistic forecasts
One of the main objectives of this work is to provide decision makers and end
users with a simple and robust Boolean index to forecast a drought based on a
probabilistic forecasting system. Several methods to select the Boolean
solution are tested and are compared with a deterministic model (defined here
as the unperturbed member of the ensemble) and with a
climatological forecast. The methods used to derive this index are
given in Table and can be divided into three types:
individual, where the index is based on an individual member or percentile;
partially integrative, where the sum of particular individual members or
percentiles is used; and integrative, represented by the ensemble
mean. The individual types should be seen as complementary: they provide
information not only about the intensity of the forecast SPI-1 but also about
the distribution of the members.
List of the 10 methods used to provide a Boolean index
for drought forecasting using an ensemble system.
Name                      Definition
13th percentile (Q13)     Member located at 13 % of the CDF
23rd percentile (Q23)     Member located at 23 % of the CDF
Median (MED)              Member located at 50 % of the CDF
77th percentile (Q77)     Member located at 77 % of the CDF
88th percentile (Q88)     Member located at 88 % of the CDF
Large spread (SpL)        Sum of the extreme members (Q13 + Q88)
Low spread (Spl)          Sum of the members (Q23 + Q77)
Dry spread (SpD)          Sum of the dry members (Q13 + Q23)
Flood spread (SpF)        Sum of the wet members (Q77 + Q88)
Mean                      Ensemble mean
The individual types have been subdivided into five classes representing dry
members (Q13, Q23), wet ones (Q77, Q88) or the median. The extreme members of
the distribution are not used to avoid outliers generally associated with
ensemble systems. For each method, a threshold was
defined. An SPI lower than -1 or -1.5 selects 16 % and 6.7 %,
respectively, of the normalized series. Therefore, for consistency, the
threshold of each method was chosen so that it selects the same number of events.
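The ten index definitions and the calibration of their thresholds can be sketched as follows. The sketch assumes that the member "located at q of the CDF" is the ceil(q × n)-th ranked member, which gives the 7th of 51 members for Q13, and that each method's threshold is the index value that flags the same share of forecasts as the observed drought frequency. Both conventions, the synthetic ensembles and the function names are assumptions for illustration, not the authors' code.

```python
import math
import random
from statistics import fmean


def member_at(ensemble, q):
    """Ranked member at fraction q of the ensemble CDF (assumed rounding: q=0.13 -> 7th of 51)."""
    ranked = sorted(ensemble)
    return ranked[max(1, math.ceil(q * len(ranked))) - 1]


# The ten index definitions listed in the table of methods.
METHODS = {
    "Q13": lambda e: member_at(e, 0.13),
    "Q23": lambda e: member_at(e, 0.23),
    "MED": lambda e: member_at(e, 0.50),
    "Q77": lambda e: member_at(e, 0.77),
    "Q88": lambda e: member_at(e, 0.88),
    "SpL": lambda e: member_at(e, 0.13) + member_at(e, 0.88),
    "Spl": lambda e: member_at(e, 0.23) + member_at(e, 0.77),
    "SpD": lambda e: member_at(e, 0.13) + member_at(e, 0.23),
    "SpF": lambda e: member_at(e, 0.77) + member_at(e, 0.88),
    "Mean": fmean,
}


def boolean_drought_index(index_values, event_fraction):
    """Flag the lowest index values so that the flagged share matches event_fraction,
    e.g. 0.159 for SPI < -1 or 0.067 for SPI < -1.5 (ties may flag slightly more)."""
    n_events = round(event_fraction * len(index_values))
    if n_events == 0:
        return [False] * len(index_values)
    threshold = sorted(index_values)[n_events - 1]
    return [v <= threshold for v in index_values]


# Hypothetical SPI-1 ensembles: 200 forecasts of 51 members each.
rng = random.Random(1)
forecasts = [[rng.gauss(0.0, 1.0) for _ in range(51)] for _ in range(200)]
q13 = [METHODS["Q13"](e) for e in forecasts]
flags = boolean_drought_index(q13, 0.159)
print(sum(flags))  # 32 flagged forecasts, i.e. round(0.159 * 200)
```

Calibrating every method to the same event frequency makes the methods directly comparable, since none can improve its POD simply by forecasting more droughts.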
Evaluation scores
A plethora of scores to evaluate probabilistic forecasts exists,
and in this study we have chosen those
which are suitable for drought forecasting.
The relative operating characteristic (ROC) score was proposed by
and plots the hit rate against the false alarm rate.
The objective of this score is to measure the ability of the forecast to
discriminate between events and non-events. This score is insensitive to
forecast bias and can be considered as a measure of potential usefulness
because it is conditioned on the observations (i.e. given that a drought
occurred, what was the corresponding forecast?). The area under the ROC curve
can be calculated and ranges between 0 and 1. Higher values indicate a
better forecast, and an area of 0.5 corresponds to no skill.
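The ROC curve can be computed by turning the probability forecast (the share of members predicting a drought) into a yes/no forecast at a series of probability thresholds and recording the false alarm rate and hit rate for each. The sketch below, with hypothetical function names and data, uses a trapezoidal rule for the area; it is an illustration, not the verification code used in this study.

```python
def roc_points(probs, events, thresholds):
    """(false alarm rate, hit rate) when a drought is forecast for probability >= threshold."""
    n_events = sum(events)
    n_non_events = len(events) - n_events
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(probs, events) if p >= t and o)
        false_alarms = sum(1 for p, o in zip(probs, events) if p >= t and not o)
        points.append((false_alarms / n_non_events, hits / n_events))
    return points


def roc_area(probs, events, thresholds):
    """Trapezoidal area under the ROC curve, closed with the (0, 0) and (1, 1) corners."""
    pts = sorted(roc_points(probs, events, thresholds) + [(0.0, 0.0), (1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Forecast probability = share of members with SPI-1 < -1 (hypothetical data).
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.0]
events = [1, 1, 1, 0, 0, 0, 0, 0]
thresholds = [k / 10 for k in range(1, 10)]
print(roc_area(probs, events, thresholds))
```

Because the points come from increasing probability thresholds, both rates decrease monotonically, so sorting them yields the correct order of the curve.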
The reliability diagram, which is conditioned on the forecasts, is a good
complement to the ROC because it assesses the average agreement
between the forecast probabilities and the observed frequencies. In a reliability
diagram the observed relative frequency is plotted against the forecast probability.
Perfect reliability corresponds to the 1:1 line, whereas a forecast without
resolution lies on the horizontal line at the mean observed frequency
(i.e. an observed relative frequency of y = 0.159 for
SPI < -1).
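A reliability diagram groups forecasts by probability and compares each group's mean forecast probability with the observed frequency of the event. The following sketch assumes ten equal-width probability bins; the bin choice, the data and the function name are illustrative assumptions.

```python
def reliability_table(probs, events, n_bins=10):
    """Per non-empty probability bin: (mean forecast probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, events):
        k = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls into the last bin
        bins[k].append((p, o))
    table = []
    for cases in bins:
        if cases:
            mean_p = sum(p for p, _ in cases) / len(cases)
            observed = sum(o for _, o in cases) / len(cases)
            table.append((mean_p, observed, len(cases)))
    return table


# Hypothetical probability forecasts of SPI-1 < -1 and the matching observations.
probs = [0.05, 0.08, 0.15, 0.35, 0.42, 0.88, 0.95, 0.12]
events = [0, 0, 0, 1, 0, 1, 1, 0]
climatology = sum(events) / len(events)  # horizontal "no resolution" line
for row in reliability_table(probs, events):
    print(row)
```

Points close to the 1:1 line indicate reliable probabilities; keeping the count per bin shows how many cases support each point, which matters for the rarely issued high probabilities.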
The accuracy of the probability forecasts is assessed using the Brier score:
$$\mathrm{BS}_f = \sum_{k=1}^{r}\sum_{j=1}^{m}\left(p_f(j,k) - I_o(j,k)\right)^2,$$
where $p_f$ is the forecast probability, $I_o$ the
observation of the event (1 if it occurs and 0 otherwise), $r$ the number of
classes (here 2) and $m$ the number of forecast instances. A skill
score can be derived by comparing the Brier score to that of climatology, $\mathrm{BS}_c$:
$$\mathrm{BSS} = 1 - \mathrm{BS}_f/\mathrm{BS}_c.$$
The Brier skill score ranges from $-\infty$ to 1. The higher the score, the more
skilful the forecast; negative values indicate that the
climatological forecast outperforms the probabilistic forecast. The scores
above are complemented by the correlation and the root mean square error of
the ensemble mean, as these are frequently used in the
evaluation of seasonal forecasts.
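The Brier score as written above sums over both classes (drought and no drought), and the BSS compares it with a constant climatological probability, for instance 0.159 for SPI < -1. The sketch below follows that definition literally and adds the ensemble-mean correlation and RMSE mentioned in the same paragraph; the function names and data are hypothetical.

```python
import math


def brier_two_class(probs, events):
    """BS_f summed over r = 2 classes (drought / no drought) and m forecast instances."""
    total = 0.0
    for p, o in zip(probs, events):
        total += (p - o) ** 2              # class k = 1: drought
        total += ((1 - p) - (1 - o)) ** 2  # class k = 2: no drought
    return total


def brier_skill_score(probs, events, p_clim):
    """BSS = 1 - BS_f / BS_c, with BS_c from a constant climatological probability."""
    bs_f = brier_two_class(probs, events)
    bs_c = brier_two_class([p_clim] * len(events), events)
    return 1 - bs_f / bs_c


def ensemble_mean_scores(ensembles, observed):
    """Pearson correlation and RMSE between ensemble-mean and observed SPI-1."""
    means = [sum(e) / len(e) for e in ensembles]
    n = len(means)
    mf, mo = sum(means) / n, sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(means, observed))
    var_f = sum((f - mf) ** 2 for f in means)
    var_o = sum((o - mo) ** 2 for o in observed)
    corr = cov / math.sqrt(var_f * var_o)
    rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(means, observed)) / n)
    return corr, rmse


probs = [0.8, 0.1, 0.3, 0.0, 0.6]  # hypothetical drought probabilities
events = [1, 0, 0, 0, 1]
print(brier_skill_score(probs, events, 0.159))  # climatological frequency for SPI < -1
```

Because BS_f and BS_c are summed over the same instances, the normalization of the Brier score cancels in the BSS.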
Several scores exist which are based on the contingency table, where the
forecast and observed outcomes are Boolean. In this paper, we have used
five of them. The probability of detection (POD, perfect = 1) is the
fraction of the observed events that were forecast:
$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}.$$
The false alarm ratio (FAR, perfect = 0) is the fraction of the forecast
events that did not occur:
$$\mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}.$$
The extreme dependency score (EDS, see Eq. ) is an informative
assessment of skill in deterministic forecasts of rare events; it can
converge to different values for different forecasting systems and,
furthermore, does not explicitly depend upon the bias of the forecasting
system:
$$\mathrm{EDS} = \frac{2\log\left[(\text{hits}+\text{misses})/\text{total}\right]}{\log\left(\text{hits}/\text{total}\right)} - 1.$$
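The percent correct (PC, perfect = 1) is the fraction of all forecasts that
were correct (hits and correct negatives):
$$\mathrm{PC} = \frac{\text{hits} + \text{correct negatives}}{\text{total}}.$$
Finally, the Gilbert skill score (GSS), which penalizes both misses and false alarms,
measures the fraction of observed and/or forecast events that were correctly
predicted, adjusted for hits associated with random chance:
$$\mathrm{GSS} = \frac{\text{hits} - \text{hits}_{\text{random}}}{\text{hits} + \text{misses} + \text{false alarms} - \text{hits}_{\text{random}}},$$
where $\text{hits}_{\text{random}} = (\text{hits} + \text{misses})(\text{hits} + \text{false alarms})/\text{total}$.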
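The five categorical scores can be computed from the four cells of the contingency table. The sketch below transcribes the equations of this section, with hypothetical function names and counts; natural logarithms are used for the EDS, whose ratio does not depend on the base.

```python
import math


def contingency(forecast, observed):
    """Hits, misses, false alarms and correct negatives for two Boolean series."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if not f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    correct_negatives = sum(1 for f, o in pairs if not f and not o)
    return hits, misses, false_alarms, correct_negatives


def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """POD, FAR, EDS, PC and GSS (the EDS needs 0 < hits < total)."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "EDS": 2 * math.log((hits + misses) / total) / math.log(hits / total) - 1,
        "PC": (hits + correct_negatives) / total,
        "GSS": (hits - hits_random) / (hits + misses + false_alarms - hits_random),
    }


# Hypothetical counts: 90 observed droughts among 600 forecasts.
print(categorical_scores(hits=30, misses=60, false_alarms=40, correct_negatives=470))
```

With these counts the POD is 1/3, while the GSS (about 0.16) is much lower because 10.5 of the 30 hits would be expected by chance alone.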
Results
Evaluation of the SPI calculation
The sensitive part of the SPI calculation is the fitting of a theoretical
distribution to the empirical distribution. In this study, the Gamma
distribution is fitted to the probability density function of monthly
precipitation. It is therefore necessary to set a threshold above which
monthly cumulative precipitation is considered significant.
Different thresholds were tested (0, 1, 5, 10 and 20 mm, not shown),
and it was decided that only monthly totals larger than 10 mm
are considered significant. This threshold retains a large
number of events while discarding events or regions with non-significant
monthly accumulated rainfall. As outlined in the methodology, fitting a Gamma
distribution to precipitation data relies on an adequate sample size
(adequate with respect to the variability of the data). The Gamma
distribution was therefore fitted only at grid points where at
least 66 % of the values are significantly larger than 0 (i.e. larger than
10 mm). This ensures a minimum number of events to fit the
distribution. These thresholds allow for the removal of arid areas, where the
fitting of the Gamma distribution resulted in biased values due to the low
spread and low sampling of the time series.
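The screening rule used to exclude arid grid points can be expressed as a simple test. This sketch assumes that the 66 % criterion is inclusive and applies to all months of a grid point's series; the text does not specify these details, and the function names and data are hypothetical.

```python
SIGNIFICANT_MM = 10.0  # monthly totals above this are treated as significant
MIN_SHARE = 0.66       # minimum share of significant months needed to fit the Gamma


def can_fit_gamma(monthly_totals, threshold=SIGNIFICANT_MM, min_share=MIN_SHARE):
    """True if enough months exceed the threshold for a Gamma fit at this grid point."""
    if not monthly_totals:
        return False
    n_significant = sum(1 for v in monthly_totals if v > threshold)
    return n_significant / len(monthly_totals) >= min_share


def fit_mask(grid_series):
    """Map each grid point to True (fit the Gamma) or False (too dry, left blank)."""
    return {point: can_fit_gamma(series) for point, series in grid_series.items()}


# Hypothetical monthly series (mm) for two grid points (latitude, longitude).
grid = {
    (48.0, 2.0): [35.0, 52.1, 12.4, 60.3, 8.2, 44.0, 71.5, 25.9, 30.0, 9.1],
    (37.0, -4.0): [2.0, 0.0, 15.3, 4.1, 0.5, 22.0, 1.2, 0.0, 11.8, 3.3],
}
print(fit_mask(grid))
```

Here the first point has 8 of 10 significant months and is kept, whereas the second has only 3 of 10 and would appear in white on the maps.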
The performance of the fitting procedure and of the underlying assumptions
can be analysed by investigating the resulting SPI-1 distribution. This was
done by calculating the integral of the differences between the fitted Gamma
distribution and the empirical distribution. Zero values are considered as
perfect values (no bias of the SPI-1 calculated), whereas positive or
negative values indicate bias and therefore question the validity of the
fitting procedure. In Fig. the bias of the Gamma distribution over
the entire globe is shown. It can be seen that the Gamma distribution is well
adapted for most of Europe (see also ).
Nevertheless, the low precipitation amounts over the southern part of Spain
can create some bias in the fitting. This is especially true during the
summer season, and therefore the assumptions for fitting the Gamma
distribution do not hold throughout the year. This analysis shows that it
will be necessary to adapt the method, in particular over dry areas, for
example by restricting the analysis to the rainy seasons.
Bias of the SPI-1 calculated between the fitted Gamma
distribution and the observed monthly cumulative precipitation (see text for
more details). Regions in white are considered as too dry to fit this
distribution. Regions where the bias becomes significantly different from 0
(non-hatched areas) could generate bias in the SPI calculation.
(a) Correlation of the forecasted (using the
mean of the ensemble) and observed SPI-1 during the hindcast period (from
November 1992 to November 2012). (b) Same but for the RMSE.
Ratio of events following the forecasted (x axis) and
observed SPI-1 (colour bars) over Europe using the hindcast period in
relation to the theoretical distribution. Results are standardized by the
theoretical normal distribution of events.
(a) Mean SPI-1 forecasted of ranked members
using ENS during observed drought or floods (SPI-1 <-1.5 and
SPI-1 > 1.5 respectively). (b) Ensemble mean and SD of the
SPI-1 forecasted using ENS following the associated observed SPI-1.
(c) and (d) are the same panels as (a) and
(b) but using SEAS.
ROC curve using ENS and SEAS (red and black lines
respectively) for the period from November 2012 to November 2013 over Europe
to detect a drought defined as an SPI lower than -1 (a) or lower
than -1.5 (b). The ROC area values for the different spatial
resolutions are indicated.
Validation during the hindcast period
This evaluation is based on the hindcast period (see Table ) of ENS
and SEAS. It allows a long-term evaluation using the same version of the
model. The correlation and root mean square error of the ensemble means are
displayed in Fig. . The mean correlation (0.32) and the mean RMSE
(1.02) for ENS are better than those for SEAS (0.05 and 1.45 respectively, not
shown). However, neither score indicates significant skill (the correlation is
not significantly different from zero), suggesting that the ensemble-mean monthly
forecast has no skill. In addition, the spatial variability is low, meaning that,
on average, there is no significant spatial difference in the ability of the model
to predict the SPI-1.
Note that the correlation of the SPI-1 is comparable to the anomaly
correlation coefficient (ACC), which removes the seasonal cycle. Indeed, the
SPI-1 is the anomaly of monthly precipitation relative to the
climatology of that specific month. This correlation coefficient is therefore more
robust, because it is not inflated by the seasonal cycle, but also less likely to
be significant.
Reliability diagrams for drought detection defined as a
SPI-1 lower than -1 using ENS (top panels) and SEAS (bottom panels) in the
period from November 2012 to November 2013. The spatial resolution is 1
square degree (left panels) and 5 square degrees (right panels).
ROC anomaly (in %) in relation to the mean value of the ROC over
the domain (equal to 0.67) for the period from November 2012 to November 2013
with drought defined as an SPI-1 <-1.
Seasonal decomposition of the ROC curves for drought
forecasting (with the 5 square degree smoothing) using ENS (a) and
SEAS (b) over Europe for the period from November 2012 to
November 2013 with drought defined as an SPI-1 < -1.
The SPI-1 values of individual ensemble members and observations were
analysed in bins to assess whether these results are also valid for extreme
events. Here, the individual ability (for each member independently) was
assessed by decomposing the SPI-1 forecasted and observed over Europe during
the hindcast period in 10 classes (from SPI-1 lower than -2 to SPI-1 larger
than +2, at intervals of 0.5). The frequency in each bin naturally follows
the standard normal distribution of the SPI, which produces a large number of
cases centred around 0. This distribution was normalized by computing the ratio
between the empirical distribution and the theoretical distribution. The result is shown
in Fig. . The figure shows that the drier the forecast SPI-1, the more
often a drought is observed (red bars). In addition, it should be noted that the
distribution is strongly asymmetric. This indicates that the forecasts of
extreme dry events are more accurate than the forecasts of extreme wet
events. This result could be explained by the usually large spatial and
temporal scales of drought events, which make them more predictable by a global
model, even 1 month ahead.
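The normalization used in this analysis divides the observed frequency of each SPI-1 class by its probability under the standard normal distribution, so that a ratio of 1 means "as often as in climatology". The sketch below assumes that the ratios are computed separately for each class of the forecast SPI-1, which is one reading of the figure description; names and data are hypothetical.

```python
from statistics import NormalDist

# Edges of the ten SPI-1 classes: below -2, eight 0.5-wide bins, and above +2.
EDGES = [-2.0 + 0.5 * k for k in range(9)]  # -2.0, -1.5, ..., 2.0


def class_index(value):
    """Index (0-9) of the SPI-1 class that contains value."""
    k = 0
    while k < len(EDGES) and value >= EDGES[k]:
        k += 1
    return k


def theoretical_probs():
    """Probability of each class under the standard normal distribution of the SPI."""
    nd = NormalDist()
    cdfs = [0.0] + [nd.cdf(e) for e in EDGES] + [1.0]
    return [b - a for a, b in zip(cdfs, cdfs[1:])]


def normalized_ratios(forecast_spi, observed_spi, forecast_class):
    """For cases whose forecast falls in forecast_class: observed frequency per class
    divided by its theoretical probability (1 = same frequency as climatology)."""
    selected = [o for f, o in zip(forecast_spi, observed_spi)
                if class_index(f) == forecast_class]
    if not selected:
        return None
    probs = theoretical_probs()
    counts = [0] * len(probs)
    for o in selected:
        counts[class_index(o)] += 1
    return [c / len(selected) / p for c, p in zip(counts, probs)]


forecast = [-2.3, -1.8, -1.7, 0.2, 1.1, -1.6]  # hypothetical member SPI-1 values
observed = [-1.9, -2.4, -0.3, 0.5, 0.9, -1.55]
print(normalized_ratios(forecast, observed, class_index(-1.7)))
```

Because the extreme classes are rare, even a few matching cases give ratios well above 1, which is why the normalization makes the asymmetry between dry and wet extremes visible.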
Validation during the forecast period
The analysis of the forecast period from November 2012 to November 2013
largely confirms the findings obtained above for the significantly longer
hindcast period, but allows for a more detailed investigation of the
distributions owing to the larger ensemble size (see
Table ).
Figure a compares the behaviour of the ENS members during observed
extreme wet and dry events. In both cases, the normal distributions of the
ranked ensemble members are quite similar. The only difference is the shift
towards negative values of forecasted SPI when a drought is observed (red
line) in comparison with when wet events are observed (blue line).
Nevertheless, the SD (indicated by the error bars) shows that there is no
significant difference (at the 90 % confidence level) between the two cases. It
is interesting to observe that the ensemble mean increases with
the observed SPI-1 (black line in Fig. b), whereas
the spread of the ensemble (defined as the SD) shows little sensitivity
(yellow line in Fig. b). It can be concluded that only the ensemble
mean displays a significant difference between wet and dry anomalies, whilst
there is no such relation in the SD. In SEAS, the same trends are observed
but the difference between the two conditional distributions is reduced
(Fig. c and d). This indicates that ENS has a higher resolution (in the
verification sense) than SEAS, and therefore a greater ability to discriminate events with
different frequency distributions.
These results are confirmed by analysing the ROC curve. Over the European
continent, the ROC curves show an improvement in relation to the no-skill
curve (1:1 in Fig. ). The ROC area is slightly better for ENS
than for SEAS (+0.4 and +0.2 for SPI-1 < -1 and SPI-1 < -1.5,
respectively).
Both ENS and SEAS present a positive but low reliability for detection of
SPI-1 < -1 (Fig. ). Indeed, the observed relative frequencies
increase with the forecast probabilities. The distribution of
cases per forecast probability (not shown) indicates more events with a large
percentage of members forecasting a drought in ENS than in SEAS. This
indicates greater agreement among the ENS members than among the SEAS members
when forecasting an extreme rainfall deficit. Using ENS,
several events are forecast with more than 93 % of members
predicting a drought, whereas using SEAS the maximum is 81 %. Both
ENS and SEAS are better than climatology, with Brier skill scores of 0.14
and 0.12, respectively. However, the difference between ENS and SEAS is not
significant here.
Sensitivity to drought scales
All analyses so far have been performed at a scale of 1° × 1°;
however, the sensitivity to the spatial scale needs to be
analysed, because the impacts of large-scale droughts will be stronger.
Figure shows SPI-1 values smoothed to 3 and 5 square degrees,
using a simple upscaling method based on averaging the values. The
native resolution of about 1 square degree is also retained as a reference
to assess the impact of the resolution. The results show a slight improvement of
the ROC area with a coarser resolution (broken and dotted lines in
Fig. ). The smoothed signal favours the large-scale signatures that
are better represented in models than small-scale structures of droughts. The
effect of spatial upscaling can also be seen in the reliability diagrams as a small
positive impact on SEAS for the largest forecast probabilities
(Fig. d). However, as mentioned previously, the number of events in
these cases is low. The effect has been quantified using the BSS (see
Eq. ), which increases to 0.17 and 0.14 for ENS and SEAS, respectively, for the
5° smoothed signal.
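The upscaling described above, which averages SPI-1 values over blocks of grid cells, can be sketched as follows. The sketch assumes non-overlapping blocks and skips masked (too dry) cells; whether the original analysis used non-overlapping blocks or a moving window is not stated, and the function name is hypothetical.

```python
def upscale(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid of SPI-1 values.

    Cells set to None (e.g. regions too dry for the Gamma fit) are ignored; a block
    with no valid cells stays None. Incomplete blocks at the edges are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            block = [
                grid[bi * factor + di][bj * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            valid = [v for v in block if v is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        coarse.append(row)
    return coarse


# Hypothetical 1-degree SPI-1 field (6 x 6) upscaled to 3-degree blocks.
fine = [[-1.2 + 0.1 * (i + j) for j in range(6)] for i in range(6)]
print(upscale(fine, 3))
```

Averaging damps isolated small-scale anomalies, so the coarse field retains mainly the large-scale signal that the models represent best.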
Spatial and seasonal variabilities
Spatial variability
The analysis so far has ignored spatial and seasonal variability.
Figure shows the ROC anomaly for the forecast period, which is the
ROC area for each grid cell in relation to the average (0.67 for ENS). The
anomaly is preferred to the raw value to highlight regions where the ROC is
improved or reduced. Anomalies of up to 20 % can be observed. For
the hindcast period (not shown), this variability is much lower, at ∼6 %. The
spatial patterns differ between the two periods, suggesting that they are not
significant and are mainly driven by the extreme cases encountered during each period.
Seasonal variability
A seasonal decomposition is used to highlight the temporal variability. ROC
scores and curves were independently calculated for the autumn (September to
November), winter (December to February), spring (March to May) and summer
(June to August) seasons and are displayed in Fig. (for
SPI-1 < -1).
The four ROC areas are very similar, and the four curves are identical
for ENS, meaning that the skill in forecasting droughts is the same throughout
the year. In contrast, SEAS shows some differences between the seasons, with
a small improvement in the forecast during autumn. The same
conclusions hold for SPI-1 < -1.5, which is therefore
not shown.
Index performance
Figure shows the POD (see Eq. ) and the FAR (see
Eq. ) for ENS and SEAS. POD indicates that, on average, one in
three drought events over Europe is correctly forecasted 1 month in advance.
This is significantly better than the climatology (16 %) and better than
the deterministic forecast (around 25 %, green line in Fig. ).
The importance of the drought duration has also been tested. The scores were
calculated independently for a drought onset (first SPI-1 lower than
thresholds), persistence (consecutive SPI-1 lower than the threshold), or end
of the drought (first SPI-1 above the threshold). First, the large majority
(more than 80 %) of episodes with SPI-1 lower than -1 last only 1 month
(isolated values, dry spells). The scores increase slightly
for persistent droughts (condition unchanged); for the median, the
POD increases from 0.33 to 0.36. However, the difference is not significant
according to a t test.
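The onset, persistence and end categories, as well as the episode durations, can be derived from a monthly SPI-1 series with a simple state machine. The sketch treats the month before the start of the series as non-drought, which is an assumption; the names and data are hypothetical.

```python
def drought_phases(spi_series, threshold=-1.0):
    """Label each month as 'onset', 'persistence', 'end' or None (no drought)."""
    labels = []
    previous_dry = False  # assumption: the month before the series is not in drought
    for value in spi_series:
        dry = value < threshold
        if dry and not previous_dry:
            labels.append("onset")
        elif dry:
            labels.append("persistence")
        elif previous_dry:
            labels.append("end")
        else:
            labels.append(None)
        previous_dry = dry
    return labels


def episode_lengths(spi_series, threshold=-1.0):
    """Lengths (in months) of consecutive runs with SPI-1 below the threshold."""
    lengths, run = [], 0
    for value in spi_series:
        if value < threshold:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


spi_1 = [0.2, -1.3, -1.1, 0.4, -1.6, 0.0]  # hypothetical observed SPI-1
print(drought_phases(spi_1))
print(episode_lengths(spi_1))
```

Scores can then be computed separately for the months labelled "onset", "persistence" and "end", and the share of episodes of length 1 gives the fraction of isolated dry months.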
The highest POD is achieved using the 13th percentile (7th member of
the ranked ensemble distribution) and the sum of Q13 and
Q23 (SpD). The ensemble mean (last point on the right of
each panel), although widely used, is not the best method for detecting
droughts.
The POD values of the wettest members of the ranked distribution (noted Q77
and Q88 in Fig. ) give the worst results of all methods, meaning
that there is low consistency between the extreme dry and wet members. The
FAR varies little between the methods, but every method is
better than the deterministic solution (red lines). It is also worth noting
that, using ENS, the driest members are associated with a lower FAR
than the less dry members. This can be explained by the previous scores,
which show greater consistency between the members. However, it could also
be a technical artefact: since the number of selected events is
fixed, these scores may not be independent of each other.
Probability of detection (POD, in green,
perfect = 1) and false alarm ratio (FAR, in red, perfect = 0) for
different methods used to detect drought (x axis), using ENS (a)
and SEAS (b). Lines indicate the scores of the deterministic model
(unperturbed member of the ensemble).
Extreme dependency score (EDS) for the 10 methods used
to forecast a drought (x axis, see Table 1 for more details) using the ENS
(a) and SEAS (b) ensemble systems. Black lines indicate the
score of the unperturbed member.
The highest EDS (see Eq. ) is achieved for the driest members (Q13 and Q23,
Fig. ), whereas the wettest members (Q77 and Q88) have the lowest
scores. The ensemble mean scores better than the median.
Although the POD and FAR differences are only partially statistically
significant, the improvement of the EDS for the driest members is significant
for all differences larger than 0.04.
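The EDS equation referenced above is not reproduced in this extract. A commonly used form of the extreme dependency score, written with hits a, misses c and total number of cases n, is EDS = 2 ln((a + c)/n) / ln(a/n) - 1; it equals 1 for a perfect forecast and 0 for an unbiased random forecast. The sketch below assumes this form, and the function name is hypothetical.

```python
import math


def extreme_dependency_score(hits: int, misses: int, n: int) -> float:
    """EDS = 2 ln((a + c) / n) / ln(a / n) - 1 (assumed standard form).

    hits = a, misses = c, n = total number of cases.
    """
    if hits <= 0 or hits >= n:
        raise ValueError("EDS is only defined for 0 < hits < n")
    base_rate = (hits + misses) / n  # observed frequency of the event
    return 2.0 * math.log(base_rate) / math.log(hits / n) - 1.0
```

Unlike POD and FAR, this score does not tend to a trivial value as the event becomes rarer, which is why it is suited to comparing methods on infrequent droughts.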
ENS and SEAS are reliable (see Fig. ), and hence a potential method
for drought forecasting could simply be based on the percentage of ensemble
members predicting a drought. In total, 10 different percentage thresholds
were tested. Figure shows that the percentage correct increases
with the percentage threshold for both models (black points in
Fig. a and c), which is in agreement with the positive reliability:
the more members forecast a drought, the higher the chance of
observing one. However, as the threshold increases, the number
of misses also increases (as shown by the POD, red points in
Fig. a and c). For example, if a drought is forecasted when 10 % of
the members predict one, around 80 % of the droughts that occurred are
correctly detected (red points), but more than 50 % of the forecasted
droughts are false alarms. Conversely, if the detection threshold is
set above 70 %, the percentage correct is about 85 %, but the POD is
close to 3 %. Based on this result, users can tune the percentage
according to an acceptable false alarm ratio and number of misses.
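The member-percentage method can be sketched as follows: for each case, compute the fraction of members with SPI-1 below -1, issue a drought forecast when that fraction reaches a chosen threshold, and score each threshold. This is a minimal sketch, assuming members are stored as an array of shape (n_members, n_cases) and that a fraction equal to the threshold triggers a forecast; it reports POD and FAR rather than the paper's percentage correct, whose exact definition is not restated here. Function names are hypothetical.

```python
import numpy as np


def member_fraction_forecast(members, spi_threshold=-1.0, fraction=0.3):
    """Boolean drought forecast for each case.

    members: array of shape (n_members, n_cases) with forecast SPI-1.
    A drought is forecast when at least `fraction` of the members have
    SPI-1 below `spi_threshold` (">=" is an assumption of this sketch).
    """
    dry_fraction = np.mean(np.asarray(members) < spi_threshold, axis=0)
    return dry_fraction >= fraction


def pod_far_by_fraction(members, observed_drought, fractions, spi_threshold=-1.0):
    """Return (fraction, POD, FAR) for each member-fraction threshold."""
    obs = np.asarray(observed_drought, dtype=bool)
    results = []
    for frac in fractions:
        fc = member_fraction_forecast(members, spi_threshold, frac)
        hits = int(np.sum(fc & obs))
        misses = int(np.sum(~fc & obs))
        false_alarms = int(np.sum(fc & ~obs))
        pod = hits / (hits + misses) if hits + misses else float("nan")
        far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
        results.append((frac, pod, far))
    return results
```

Sweeping `fractions` from low to high reproduces the trade-off described above: low thresholds detect most droughts at the cost of many false alarms, high thresholds the reverse.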
(a) POD (red) and percentage correct (black)
using different percentages of members to forecast a drought event using ENS.
(b) Gilbert score (see text for more details) as a function of the
percentage used to forecast a drought using ENS. Lines indicate the score of
the deterministic model (unperturbed member). (c) and (d)
are the same as (a) and (b) but for SEAS.
Mean SPI-1 and SD of the ranked members for the
four cases of the contingency table (see Table 2 and text for more
details): hits (green), false alarms (red), misses (blue) and correct negatives
(black line), using ENS. Vertical lines indicate the spread of the members
used for the Boolean drought detection methods.
The maximum Gilbert score (Fig. b and d, see Eq. ) is
achieved for a threshold of 30 % for ENS and 40 % for SEAS. In that
case, 40 % of the observed droughts are forecasted and 75 % of the forecasts
are hits. With a larger percentage threshold the number of missed events
becomes too high, whereas for lower thresholds the errors are dominated by
false alarms.
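The Gilbert score equation is not reproduced in this extract. The sketch below assumes the Gilbert skill score (also known as the equitable threat score), GSS = (a - a_r) / (a + b + c - a_r) with a_r = (a + b)(a + c) / n the number of hits expected by chance, and selects the member-percentage threshold that maximizes it. If the paper uses the uncorrected threat score instead, a_r should be set to 0. Function names are hypothetical.

```python
import numpy as np


def gilbert_skill_score(hits, false_alarms, misses, correct_negatives):
    """GSS = (a - a_r) / (a + b + c - a_r), a_r = hits expected by chance."""
    n = hits + false_alarms + misses + correct_negatives
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom else float("nan")


def best_fraction(members, observed_drought, fractions, spi_threshold=-1.0):
    """Return (fraction, GSS) for the member-fraction threshold with the highest GSS.

    members: array of shape (n_members, n_cases) with forecast SPI-1.
    """
    obs = np.asarray(observed_drought, dtype=bool)
    dry_fraction = np.mean(np.asarray(members) < spi_threshold, axis=0)
    best = (None, -np.inf)
    for frac in fractions:
        fc = dry_fraction >= frac  # Boolean forecast at this threshold
        a = int(np.sum(fc & obs))
        b = int(np.sum(fc & ~obs))
        c = int(np.sum(~fc & obs))
        d = int(np.sum(~fc & ~obs))
        gss = gilbert_skill_score(a, b, c, d)
        if gss > best[1]:  # NaN scores never replace the current best
            best = (frac, gss)
    return best
```

Because the score penalizes both misses and false alarms, its maximum falls at an intermediate threshold, consistent with the 30 % (ENS) and 40 % (SEAS) optima reported above.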
Assessing the uncertainties of the forecasts
Several previous studies have shown that probabilistic
simulations can provide additional information to assess the uncertainties of
the simulation.
The idea here is to estimate the quality of the forecast from a specific
behaviour of the ensemble. To this end, the characteristics of the ensemble
in the four cases of the contingency table were analysed. This table
was built using the threshold SPI-1 < -1 to detect a drought,
and the forecast method is based on the median of the members. The mean SPI-1
of the 51 ranked members for the four cases is illustrated in
Fig. . During correct negatives (i.e. no drought forecasted or observed),
which account for more than 70 % of the events, a normal distribution is
observed, with a mean slightly larger than 0. During the missed cases, the
median is very close to 0 and the distribution of the ranked members is very
close to the ensemble mean. In addition, the spread of the members is
displayed (vertical lines) and increases for the extreme members. The fact
that the two distributions (correct negatives and misses) become
indistinguishable means that, during misses, the model response does not
differ from a normal distribution, so no specific ensemble behaviour can be
identified that would flag the missed events.
Finally, the distributions of the members during hits and false alarms are
compared. In this case, too, there is no significant difference: the average
and the distribution of the mean SPI-1 of the ensemble are quite similar.
These results agree with Table , which quantifies the ensemble
spread for each case of the contingency table. Based on these results, it
does not appear possible to evaluate the uncertainties of the ensemble
simulation associated with a Boolean decision.
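The composite analysis above can be sketched as: sort the members at each case, classify each case into a contingency cell using the ensemble median (as in the text), and average each ranked member over the cases in that cell. This is a minimal sketch, assuming arrays of shape (n_members, n_cases) for the forecasts and (n_cases,) for the observed SPI-1; the function name is hypothetical, and the spread here is the across-case SD of each ranked member, which may differ from the spread measure used in the figure.

```python
import numpy as np


def ranked_profiles(members, observed_spi, spi_threshold=-1.0):
    """Mean and SD of each ranked member, grouped by contingency-table cell.

    members: (n_members, n_cases) forecast SPI-1; observed_spi: (n_cases,).
    A drought is forecast when the ensemble median is below spi_threshold.
    """
    ranked = np.sort(np.asarray(members, dtype=float), axis=0)  # driest member first
    forecast = np.median(ranked, axis=0) < spi_threshold
    obs = np.asarray(observed_spi, dtype=float) < spi_threshold
    cells = {
        "hits": forecast & obs,
        "false_alarms": forecast & ~obs,
        "misses": ~forecast & obs,
        "correct_negatives": ~forecast & ~obs,
    }
    profiles = {}
    for name, mask in cells.items():
        if mask.any():  # skip empty cells
            subset = ranked[:, mask]  # (n_members, n_cases_in_cell)
            profiles[name] = (subset.mean(axis=1), subset.std(axis=1))
    return profiles
```

Plotting the four mean profiles against member rank gives a figure of the kind described above; overlapping profiles for hits and false alarms (or misses and correct negatives) mean the ensemble shape carries no extra information about forecast quality.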
Discussion
Most drought studies use SPI with accumulation periods of 3 to 6 months or
even longer for drought monitoring and characterization. Forecasting droughts
over such long periods requires a very accurate and reliable atmospheric
model. However, since the reliability of current precipitation forecasts is
well known to decrease drastically after the first month, the benefit of
using a lead time of 2 months or more is not obvious.
Contingency table (in percentage) obtained using the
median of ENS to forecast a drought. A drought is observed when SPI-1 is
lower than -1, and a drought is forecasted when the ensemble median is lower
than the 16th percentile. The second value in each cell is the ensemble
spread, with its SD given in brackets.
                          Drought observed
                          Yes                  No
Drought forecasted  Yes   4.4 %/2.31 (0.4)     10.7 %/2.37 (0.4)
                    No    10.4 %/1.99 (0.4)    74.5 %/1.88 (0.3)
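As a worked check on the table, the usual scores follow directly from the four cell percentages, with a = hits, b = false alarms, c = misses and d = correct negatives (the percentages sum to 100 %):

```latex
\mathrm{POD} = \frac{a}{a + c} = \frac{4.4}{4.4 + 10.4} = \frac{4.4}{14.8} \approx 0.30,
\qquad
\mathrm{FAR} = \frac{b}{a + b} = \frac{10.7}{4.4 + 10.7} = \frac{10.7}{15.1} \approx 0.71,
\qquad
a + d = 4.4 + 74.5 = 78.9\,\%
```

The observed drought frequency a + c = 14.8 % and the forecast frequency a + b = 15.1 % are both close to the roughly 16 % of cases expected below SPI = -1 for a standard normal variable, so the median-based forecast is nearly unbiased in frequency.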
As a first step, this paper therefore examines the possibility of
providing a reliable 1-month forecast over the European continent. This
information, combined with monitoring data such as satellite or in situ
measurements that accurately characterize ongoing drought conditions
(e.g. during the last 2 months), can provide the best estimate of
near-future conditions. However, such a combination of monitoring and
forecasting data would still not allow one to look more than 1 month ahead,
and merging both types of information would bias the evaluation of forecast
skill, which is the purpose of this paper. Several meteorological services
and agencies, such as the Bureau of Meteorology in Australia and the United
States National Drought Mitigation Center, provide relevant monitoring data
as well as a 1-month outlook. For Europe, the European Drought
Observatory (EDO) at the Joint Research Centre of the European Commission
provides relevant monitoring data but has so far lacked a forecast beyond 7 days.
A 1-month forecast with good reliability is considered a very
valuable product for decision makers, as it provides information on the
probability of occurrence of a dry spell (in the case of ongoing normal
conditions) and on the probable persistence or end of a drought (in the case
of an ongoing precipitation deficit). Before such information can be
provided, however, it is necessary to assess the quality of the forecasts,
which was the first aim of this study. The second objective was to define the
most robust (Boolean) method for activating alert levels for the end users of
the forecast information. Both steps are essential in an operational early
warning environment.
Conclusions
This study provides the first assessment of the predictability of
meteorological droughts over Europe and of the ability to issue an early
warning of such droughts with a 1-month lead time. The analysis is based on
the 1-month forecast of the SPI-1 from the precipitation outputs provided by
two ECMWF ensemble systems. In a first step the ability to forecast SPI-1
from the ensemble outputs was tested, showing that
the reliability of the ensemble is better than the climatology,
the spatial variability of the scores can reach up to 20 %
over Europe and the seasonal variability is not significant, and
ensemble models are better at forecasting large-scale droughts.
In a second step, the ability to provide a robust Boolean index for drought
forecasting was analysed. The best method uses a threshold of
30 % of ensemble members forecasting a drought. In that case,
slightly more than 40 % of the observed droughts are forecasted correctly
1 month ahead, with only 25 % false alarms. This is significantly
better than using the climatology (16 %) or the deterministic models
(around 25 %). Finally, this study has shown that the ensemble does not
allow uncertainties to be associated with the Boolean index.
By providing the first assessment of meteorological drought forecasting in
Europe, this work should be particularly useful as a benchmark
for future studies using, for example, statistical weather prediction methods
based on atmospheric predictors, which are better represented in the seasonal
models. As a follow-up to the analysis presented in this paper, we will
assess the advantages of predicting droughts by analysing specific weather
types that are related to the occurrence and persistence of droughts in
Europe. It could also be useful to investigate moving windows of 10-day
cumulative precipitation to describe the temporal behaviour of the forecasted
SPI-1 in more detail. As forecast skill is higher at short lead times, an
SPI-1 lower than -1 caused by a strong decrease in precipitation at the
beginning of the period should be more reliable.