Seasonal forecasts of droughts in African basins using the Standardized Precipitation Index

Vast parts of Africa rely on the rainy season for livestock and agriculture. Droughts can have a severe impact in these areas, which often have very low resilience and limited capabilities to mitigate drought impacts. This paper assesses the predictive capabilities of an integrated drought monitoring and seasonal forecasting system (up to 5 months lead time) based on the Standardized Precipitation Index (SPI). The system is constructed by extending near-real-time monthly precipitation fields (ECMWF ERA-Interim reanalysis and the Climate Anomaly Monitoring System–Outgoing Longwave Radiation Precipitation Index, CAMS-OPI) with monthly forecast fields provided by the ECMWF seasonal forecasting system. The forecasts were then evaluated over four basins in Africa: the Blue Nile, Limpopo, Upper Niger, and Upper Zambezi. There are significant differences in precipitation quality between the datasets depending on the catchment, and a general statement regarding the best product is difficult to make. The generally low number of rain gauges, and their decrease in recent years, limits the verification and monitoring of droughts in the different basins, reinforcing the need for strong investment in climate monitoring. All the datasets show similar spatial and temporal patterns in southern and north-western Africa, while there is a low correlation in the equatorial area, which makes it difficult to define ground truth and choose an adequate product for monitoring. The seasonal forecasts have higher reliability and skill in the Blue Nile, Limpopo and Upper Niger than in the Zambezi. This skill and reliability depend strongly on the SPI timescale, with longer timescales having more skill. The ECMWF seasonal forecasts have higher predictive skill than climatology for most regions. In regions where no reliable near-real-time data are available, the seasonal forecast (first month of forecast) can be used for monitoring. Furthermore, poor-quality precipitation monitoring products can reduce the potential skill of SPI seasonal forecasts at lead times of 2 to 4 months.


Introduction
Most of Africa relies on the rainy season for water supply for livestock and agriculture (IWMI, 2010). Therefore, rain shortage can have a severe impact over this continent, which in many areas has very low resilience and limited capabilities to mitigate drought effects. For example, the long sequence of droughts in the sub-Sahel region during the 1970s and 1980s (e.g. Nicholson et al., 1998), and the recent 2010/11 drought that afflicted the Horn of Africa (Dutra et al., 2013) had severe humanitarian consequences. Monitoring and forecasting both the duration and the geographical extent of droughts is a key component of increasing resilience.
Droughts are typically classified into four types: meteorological, hydrological, agricultural and socioeconomic, and there are many drought indicators associated with each drought type (e.g. Keyantash and Dracup, 2002). In this work we focus on meteorological drought using the Standardized Precipitation Index (SPI), initially developed by McKee et al. (1993) and recommended by the World Meteorological Organization as a standard to characterize meteorological droughts (Hayes et al., 2011; WMO press release in 2009). The SPI is based on the probability of an observed precipitation deficit occurring over a given prior accumulation period. This period (also referred to as the timescale) is defined according to the particular application, with typical values of 3, 6 and 12 months. The flexibility of accumulating over different periods allows a range of meteorological, agricultural and hydrological applications. However, the response times of different systems to drought (e.g. soil moisture, stream flow, reservoirs) are not known a priori (Vicente-Serrano et al., 2012). In each particular region/application, a detailed study should ideally be carried out to relate the different SPI timescales to the target variable, such as the soil moisture available to crops or natural reservoirs.

E. Dutra et al.: Seasonal forecasts of droughts in African basins using the Standardized Precipitation Index
Published by Copernicus Publications on behalf of the European Geosciences Union.
The SPI calculation relies only on monthly means of precipitation, which are usually available in near real time from observations (in situ and/or satellite) and also from seasonal forecasts (in both cases generally associated with large uncertainties). Data availability and simplicity of calculation make the SPI a multiscalar drought index with potential for combining monitoring and forecasting. The monitoring component relies on near-real-time data that can be derived either by merging ground observations and remote sensing information or by using numerical weather prediction (NWP) models as reanalysis tools. The forecasting component traditionally relied on climatological or statistical forecasts. In recent years, as the quality of forecast models has steadily improved over the monthly to seasonal range (Simmons and Hollingsworth, 2002), there has been increasing interest in testing these products in sectoral applications such as drought monitoring. For example, Yoon et al. (2012) recently proposed a methodology to forecast 3- and 6-month SPI for the prediction of meteorological drought over the contiguous United States based on the NCEP Climate Forecast System (CFS) seasonal forecasts of precipitation (Saha et al., 2006).
The latest version of the European Centre for Medium-Range Weather Forecasts (ECMWF) seasonal forecasting system, System 4 (S4), was released in November 2011 (Molteni et al., 2011). Despite recent model improvements, predicted fields such as temperature, and to a greater extent precipitation, can be biased and in some areas have little or no skill. This is particularly the case in some regions of Africa, where in situ observations are scarce and models often show persistent systematic errors. One such example is the prediction of the West Africa monsoon system: S4 is able to reproduce the progression of the West Africa monsoon but shows persistent biases caused by a southward shift of the precipitation in the main monsoon months of July and August (Agustí-Panareda et al., 2010; Tompkins and Feudale, 2009).
In this paper an integrated drought monitoring and forecasting system for four African river basins has been designed to explore the current capability of ECMWF products to provide drought information over the African continent. This has been done by combining globally available monthly precipitation monitoring products with the forecast from S4. The four basins were chosen to represent different synoptic conditions typical of the African continent. The drought monitoring and forecasting system is described in Sect. 2, followed by the evaluation of the system in Sect. 3 and the main conclusions in Sect. 4.

Precipitation monitoring
Several datasets could be used for drought monitoring. However, there are a few technical requirements that a dataset should fulfil to be suitable for an operational monitoring and forecasting chain employing the SPI, and these are described below.
Firstly, it should be long enough (at least 30 yr, as recommended by McKee et al., 1993) and statistically homogeneous (Guttman, 1999). This means that observations should, as much as possible, (i) avoid changes in rain gauge location and measuring equipment, and (ii) use similar techniques to derive precipitation from remote sensing data, even when using different platforms. When employing dynamical model outputs, the model should keep the same spatial and temporal configuration throughout the record (as in reanalyses or global/regional climate model runs) to avoid discontinuities due to model changes, such as a change in resolution or parameterization schemes. Changes in the observing systems and/or models can produce artificial signals, such as trends, that will be reflected in the drought indicators. Secondly, the dataset needs to be available in near real time, meaning with no more than a 1-month delay.
Long-term homogeneity and near-real-time updates are two criteria that are difficult to achieve, especially on a global scale. Two freely available products that partially fulfil the requirements are the ERA-Interim reanalysis produced with the ECMWF dynamical model (Dee et al., 2011) and the observationally based Climate Anomaly Monitoring System-Outgoing Longwave Radiation Precipitation Index (CAMS-OPI; Janowiak and Xie, 1999). These datasets have global coverage, span 30 yr and are available in near real time.
ERA-Interim (ERAI) starts on 1 January 1979 and is extended forward in near real time (Dee et al., 2011). It has a spectral T255 horizontal resolution, which corresponds to approximately 79 km in grid-point space, and employs a sequential 4D-Var data assimilation scheme that ensures optimal consistency between the available observations and the model background. Full 3-D fields are stored 6-hourly, while a large number of surface parameters, including precipitation, are provided every 3 h. ERAI precipitation is a forecast product generated by the NWP model. Therefore, different forecast lead times can be used to calculate the monthly means. In this study, the monthly means of precipitation are calculated from the daily forecasts starting at 00:00 and 12:00 UTC with lead times +24 to +48 h (2nd day of forecast). The forecast lead time was chosen based on several tests, and the main results are not greatly affected by this choice. A full evaluation of the optimal forecast lead time (spin-up/spin-down effects) for the monthly mean calculation is beyond the scope of the current study, but it is expected to differ spatially.
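The monthly-mean construction from the second forecast day can be sketched in a few lines. This is a minimal illustration, assuming that precipitation is archived as an accumulation since the forecast start (the usual convention for ERA-Interim forecast precipitation), so that the +24 to +48 h total is a difference of two accumulations. The function names are hypothetical, and which 00:00/12:00 UTC runs are assigned to a given month is left to the caller.

```python
from statistics import fmean


def second_day_total(acc_24h, acc_48h):
    """Precipitation falling between forecast steps +24 h and +48 h.

    Assumes accumulation since the forecast start, so the 2nd-day total is
    the difference of the two accumulated values. Tiny negative differences
    (e.g. from packing precision) are clipped to zero.
    """
    return max(0.0, acc_48h - acc_24h)


def monthly_mean_daily_precip(forecast_pairs):
    """Mean daily precipitation for one month.

    forecast_pairs: (acc_24h, acc_48h) tuples, one per 00:00 or 12:00 UTC
    forecast assigned to the month (assignment rule is an assumption left
    to the caller). Units follow the input.
    """
    return fmean(second_day_total(a24, a48) for a24, a48 in forecast_pairs)
```

Averaging the 00:00 and 12:00 UTC runs gives two overlapping 24 h windows per day, which smooths the dependence on a single initialisation time.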
The CAMS-OPI is a merged dataset produced by the NOAA Climate Prediction Center (CPC) combining satellite rainfall estimates from the Outgoing Longwave Radiation (OLR) Precipitation Index (OPI) with ground-based rain gauge observations from the Climate Anomaly Monitoring System (CAMS). The OPI estimates are computed from NOAA polar-orbiting IR window channel data, using the technique developed by Xie and Arkin (1998). While it is recognized that the OPI has significant limitations for many climate applications, the merged CAMS-OPI dataset is potentially useful for near-real-time applications. For example, Sohn et al. (2011) found that CAMS-OPI was reliable for monitoring large-scale precipitation variations over the East Asia region. Janowiak and Xie (1999) provide a complete description of the CAMS-OPI merged dataset, which is available from January 1979 to present on a 2.5° × 2.5° regular lat/lon grid.
For research purposes, NOAA CPC encourages users to use either the GPCP or the CMAP (Xie and Arkin, 1997) merged climate rainfall datasets instead, both of which have better quality control measures and include satellite passive microwave rain estimates. Therefore, the Global Precipitation Climatology Project (GPCP) version 2.2 (Huffman et al., 2011) monthly precipitation is used in the following as a benchmark. It is available for the period January 1979 to December 2010 on a 2.5° × 2.5° regular lat/lon grid.

Precipitation forecasting
Forecast precipitation is the second input required to construct the drought forecasting system. In the implementation presented here it is provided by the most recent seasonal forecasting system at ECMWF (S4), which became operational in November 2011, issuing 51 ensemble members with 6 months lead time. S4 has the same horizontal resolution as ERAI (about 79 km) and is fully coupled to an ocean model with a horizontal resolution of 1°. The initial perturbations are defined with a combination of atmospheric singular vectors and an ensemble of ocean analyses. Atmospheric model uncertainties are simulated using the 3-time-level stochastically perturbed parameterized tendency (SPPT) scheme and the stochastic backscatter scheme (SPBS), which are also operational in the current ECMWF medium-range ensemble prediction system. The hindcast set provided for calibration covers a period of 30 yr (1981 to 2010) with the same configuration as the operational forecasts but only 15 ensemble members. Molteni et al. (2011) presented an overview of S4 model biases and forecast performance.

Drought metric
Drought is predicted by means of the SPI (McKee et al., 1993). In the calculation of the SPI for a specific k timescale, the total precipitation for a certain month m (m = 1, … […] -Serrano, 2006). In this study the gamma distribution was chosen since it is commonly used. The distribution was fitted using the approximation proposed by Greenwood and Durand (1960).
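The SPI transformation with a gamma fit can be sketched as follows: fit a two-parameter gamma distribution to the k-month totals of one calendar month, evaluate its cumulative probability for the value of interest, and map that probability to a standard normal quantile. This is a minimal, stdlib-only sketch. The Greenwood and Durand (1960) coefficients are the widely quoted ones for the maximum-likelihood shape parameter; the treatment of zero totals through the mixed distribution H(x) = q + (1 - q) G(x) is a common convention assumed here, since the text above does not state it. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_gamma_greenwood_durand(values):
    """(shape, scale) of a gamma fitted to strictly positive values using the
    Greenwood and Durand (1960) approximation of the ML shape (0 < A <= 17)."""
    mean = fmean(values)
    a_stat = math.log(mean) - fmean(math.log(v) for v in values)
    if a_stat <= 0:
        raise ValueError("need at least two distinct positive values")
    if a_stat <= 0.5772:
        shape = (0.5000876 + 0.1648852 * a_stat - 0.0544274 * a_stat**2) / a_stat
    else:
        shape = (8.898919 + 9.059950 * a_stat + 0.9775373 * a_stat**2) / (
            a_stat * (17.79728 + 11.968477 * a_stat + a_stat**2)
        )
    return shape, mean / shape


def gamma_cdf(x, shape, scale, max_terms=2000, tol=1e-14):
    """Gamma CDF from the series of the regularized lower incomplete gamma
    function (adequate for moderate x / scale, as for monthly totals)."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def spi(sample, value):
    """SPI of `value` (a k-month total) given a sample of k-month totals for
    the same calendar month; q is the fraction of zero totals (assumed
    mixed-distribution convention)."""
    positives = [v for v in sample if v > 0]
    q = 1.0 - len(positives) / len(sample)
    shape, scale = fit_gamma_greenwood_durand(positives)
    prob = q + (1.0 - q) * gamma_cdf(value, shape, scale)
    prob = min(max(prob, 1e-9), 1.0 - 1e-9)  # keep the normal quantile finite
    return NormalDist().inv_cdf(prob)
```

In a monitoring setting `sample` would hold the 30 yearly k-month totals for one calendar month; for the forecasts described later it would hold the merged monitoring-plus-forecast totals.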

Drought forecasts
The extension of the SPI from the monitoring period (i.e. the past) to the seasonal forecast range was performed by merging the seasonal forecasts of precipitation with the monitoring product; this is a crucial step of the whole system. Firstly, both the forecasts and the monitoring products have to be interpolated to a common resolution. This step depends on the available data (resolution of the monitoring and of the seasonal forecasts of precipitation) and on the final application of the drought forecasts. Two options are available: (i) downscale the forecasts to a higher resolution using simple (e.g. bilinear interpolation) or more advanced methods (e.g. statistical downscaling), as proposed by Yoon et al. (2012); or (ii) upscale the forecasts and monitoring to a coarser resolution or to a region. The second option reduces the amount of data to process and filters some of the intrinsic noise of grid-point precipitation from the dynamical models (Lander and Hoskins, 1997), and was therefore preferred in this exercise. The precipitation was thus spatially aggregated (mass conserving) to the basin scale (the basin definitions are described at the end of this section).
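A mass-conserving aggregation to a basin can be illustrated as an area-weighted mean over grid cells. This sketch assumes a regular lat/lon grid, where cell area is proportional to cos(latitude), and a precomputed fraction of each cell lying inside the basin; both the weighting scheme and the `basin_mean_precip` helper are illustrative assumptions, not the exact procedure used in the study.

```python
import math


def basin_mean_precip(cells):
    """Area-weighted basin mean precipitation on a regular lat/lon grid.

    cells: iterable of (lat_deg, fraction_in_basin, precip) per grid cell.
    Weighting by cos(lat) * fraction preserves the water volume:
    mean * total_weight == sum(precip_i * weight_i).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, fraction, precip in cells:
        weight = math.cos(math.radians(lat)) * fraction
        weighted_sum += weight * precip
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no grid cell overlaps the basin")
    return weighted_sum / weight_total
```

Applying the same function to the S4 and monitoring grids yields basin-scale series that can be concatenated directly.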
Secondly, care must be taken with the biases of the seasonal forecasts. Uncertainties in S4 precipitation forecasts are mainly controlled by model biases (Di Giuseppe et al., 2012). These biases can drift over time, i.e. change with lead time, and can jeopardize the prediction skill. Moreover, since the merging procedure involves two different precipitation datasets, care has to be taken to ensure that the two datasets have the same mean climate. This is achieved by performing a simple bias correction of monthly precipitation in the form

$$\hat{P}_{m,l} = \alpha_{m,l}\, P_{m,l}, \qquad (1)$$

where $P_{m,l}$ and $\hat{P}_{m,l}$ are the original and corrected seasonal forecasts of precipitation, respectively; $\alpha_{m,l}$ is a multiplicative correction factor; and the subscripts m and l are the calendar month (1 to 12) and the forecast lead time (0 to 5 months), respectively. The correction factor is given by the ratio

$$\alpha_{m,l} = \frac{\overline{P}^{\,\mathrm{mon}}_{m}}{\overline{P}_{m,l}}, \qquad (2)$$

where $\overline{P}^{\,\mathrm{mon}}_{m}$ is the multi-annual mean precipitation of the monitoring dataset for a particular calendar month m, and $\overline{P}_{m,l}$ is the multi-annual and ensemble mean of the forecasts for calendar month m and lead time l. This simple bias correction does not address inter-annual variability or ensemble spread. More sophisticated bias correction methods (e.g. Yoon et al., 2012) are possible but, since this study focuses mostly on spatially integrated quantities, they were not deemed necessary.
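The multiplicative correction amounts to one ratio of climatological means per calendar month and lead time, applied to every ensemble member. The sketch below assumes basin-mean values are already available, with one monitoring value per year and one list of ensemble-member values per year for the matching hindcasts; the helper names are hypothetical.

```python
from statistics import fmean


def correction_factor(monitoring_month, hindcast_month_lead):
    """alpha_{m,l}: multi-annual monitoring mean for calendar month m divided
    by the multi-annual, ensemble-mean hindcast for month m and lead l.

    monitoring_month: one basin-mean value per year.
    hindcast_month_lead: one list of ensemble-member values per year.
    """
    mon_mean = fmean(monitoring_month)
    fc_mean = fmean(v for members in hindcast_month_lead for v in members)
    return mon_mean / fc_mean


def bias_correct(members, alpha):
    """Apply the multiplicative correction to every ensemble member."""
    return [alpha * p for p in members]
```

Because the correction is a single scale factor, the corrected hindcast mean matches the monitoring mean, while the inter-annual variability and ensemble spread are only rescaled, not calibrated.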
To create a seamless transition from the monitoring to the forecast fields, the interpolated and bias-corrected S4 precipitation data were merged with the monitoring fields. This merge is a simple concatenation of the precipitation time series, performed for the entire S4 hindcast period. For a particular initial forecast date in month m (m = 1 [January 1981], …, 360 [December 2010]), the accumulated precipitation for SPI-k with lead time l is given by

$$PA^{k}_{m,l} = \sum_{i=m+l-k+1}^{m-1} P^{\mathrm{mon}}_{i} + \sum_{j=\max(0,\,l-k+1)}^{l} \hat{P}_{m,j}, \qquad (3)$$

where $P^{\mathrm{mon}}_{i}$ is the monitoring precipitation in month i, $\hat{P}_{m,j}$ is the bias-corrected forecast initialized in month m at lead time j, and the first sum vanishes when k ≤ l + 1. The application of Eq. (3) to all years and ensemble members for a specific calendar month (for example, for the forecasts starting in February, all months m = 2, 14, 26, …, 350) creates a sample of 450 (30 yr × 15 ensemble members) values of accumulated precipitation that are transformed to the normal space following the SPI calculation procedure described before.
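The concatenation of monitoring and forecast months into one k-month total can be sketched for a single ensemble member. Indices here are 0-based absolute months (an assumption of the sketch), `monitoring` holds observed basin totals and `forecast` holds the bias-corrected values of one member initialised in month m, ordered by lead time; `accumulated_precip` is a hypothetical helper.

```python
def accumulated_precip(monitoring, forecast, m, k, lead):
    """k-month precipitation total ending at month m + lead.

    monitoring[i]: monitoring precipitation for absolute month i (0-based);
                   only months before the initial month m are used.
    forecast[j]:   bias-corrected precipitation of one member initialised in
                   month m, at lead time j (j = 0 is month m itself).
    """
    if not 0 <= lead < len(forecast):
        raise ValueError("lead time outside the forecast range")
    start = m + lead - k + 1  # first month of the accumulation window
    if start < 0 or (start < m and len(monitoring) < m):
        raise ValueError("insufficient monitoring history")
    observed = sum(monitoring[start:m])  # empty when the window is all forecast
    forecasted = sum(forecast[max(0, start - m):lead + 1])
    return observed + forecasted
```

For SPI-12 at lead 0 this combines 11 monitoring months with 1 forecast month, and at lead 5 it combines 6 and 6; for SPI-3 at lead 2 only forecast months are used. Repeating the call for every year and member of one start month gives the 450-value sample to be transformed with the SPI.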

Verification metrics
The forecasts were verified using different scores: the anomaly correlation coefficient (ACC), the continuous ranked probability skill score (CRPSS), relative operating characteristic (ROC) diagrams, and reliability (REL) diagrams. We selected one deterministic score, the ACC (which considers only the ensemble mean), and three probabilistic scores (which consider the forecast ensemble spread). The ACC and CRPS provide an integrated measure of the forecast quality (over the entire range of forecasts), while the ROC and reliability diagrams evaluate categorical forecasts (in our case drought detection, with the SPI below a certain threshold) and are recommended by WMO in the Standardized Verification System for Long Range Forecasts (WMO, 2012). The ACC was calculated as the ordinary correlation coefficient of the anomalies, i.e. after removing the mean annual cycle. The CRPS (see Hersbach, 2000) can be interpreted as the integral of the Brier score over all possible threshold values of the parameter under consideration. Since the CRPS is not a normalized measure, the related skill score (CRPSS) was used. In the skill score calculation, the reference forecast was taken from the observational dataset to produce a climatological forecast (CLM) with the same ensemble size as the forecast, by randomly sampling different years. The ROC diagram (Mason and Graham, 1999) displays the hit rate (HR) as a function of the false alarm rate (FAR) for different thresholds (i.e. the fraction of ensemble members detecting an event), identifying whether the forecast is able to discriminate between events and non-events. The area under the ROC curve is a summary statistic representing the skill of the forecast system. The area is standardized against the total area of the figure such that a perfect forecast system has an area of 1 and a curve lying along the diagonal (no information) has an area of 0.5. The reliability diagrams (Hamill, 1997) measure the consistency between the predicted probabilities of an event and the observed relative frequencies, and were used to assess the reliability and confidence of the forecasts. Details of the calculations are given by Hersbach (2000) for the CRPS and by WMO (2012) for the ROC and reliability diagrams.
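The probabilistic scores can be sketched compactly. For an ensemble, the CRPS of its empirical (step-function) CDF can be computed with the kernel form mean|x_i - y| - 0.5 mean|x_i - x_j|, which gives the same value as integrating the squared CDF difference; the ROC area is obtained here by using every distinct forecast probability as a decision threshold and integrating HR over FAR with the trapezoidal rule. The ROC skill score as 2A - 1 follows the usual convention. These are illustrative, hypothetical helpers, not the verification code used in the study.

```python
from statistics import fmean


def crps_ensemble(members, obs):
    """CRPS of the empirical ensemble CDF for one observation."""
    n = len(members)
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return fmean(abs(x - obs) for x in members) - spread_term


def crpss(crps_forecast, crps_reference):
    """1 - mean CRPS / mean reference CRPS (e.g. reference = CLM)."""
    return 1.0 - fmean(crps_forecast) / fmean(crps_reference)


def roc_area(event_probs, events):
    """Area under the ROC curve (HR versus FAR)."""
    n_yes = sum(events)
    n_no = len(events) - n_yes
    points = [(0.0, 0.0)]
    for t in sorted(set(event_probs), reverse=True):
        hits = sum(1 for p, e in zip(event_probs, events) if p >= t and e)
        false_alarms = sum(1 for p, e in zip(event_probs, events) if p >= t and not e)
        points.append((false_alarms / n_no, hits / n_yes))
    points.append((1.0, 1.0))
    # trapezoidal integration over consecutive (FAR, HR) points
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_skill_score(area):
    """0 for no discrimination (area 0.5), 1 for a perfect forecast."""
    return 2.0 * area - 1.0
```

A forecast that issues the same probability every time collapses to the diagonal and scores an area of 0.5, which is why the random CLM sampling sits near that value when no monitoring is involved.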

Selection of the basins
The drought forecasting system was tested in four river basins of the African continent: Blue Nile (BN), Limpopo (LP), Upper Niger (NG) and Zambezi (ZB) (Table 1 and Fig. 1). The catchment definitions were taken from the river network and basins created by Yamazaki et al. (2009). The basins were selected to sample different climatic regions of Africa, with different seasonal precipitation distributions. The regions are defined as hydrological basins instead of lat/lon boxes, since a basin is a geographically meaningful unit draining to a single river. All the basins have a similar size (see Table 1) of about 3.5 × 10⁵ km², corresponding to approximately 60 grid points of ERAI or S4 and 4.5 grid points of GPCP and CAMS-OPI. The possible range of basin sizes is not addressed in this study. The selection of a basin will mainly depend on prior knowledge of the region (precipitation patterns and variability) and on the underlying skill of the seasonal forecasts (avoiding merging regions with different skill). Very small basins will not allow for the spatial filtering that reduces precipitation noise, while very large basins (e.g. the entire Nile or Niger) might span different climatic regions with different forecast skill.
Fig. 1. Basin definitions (dark grey) and the full basins (dark and light grey). See also Table 1.
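The grid-point counts quoted for the basins can be checked roughly. This back-of-the-envelope estimate assumes square cells of about 79 km for ERAI/S4 and about 111 km per degree for the 2.5° GPCP/CAMS-OPI grid, and ignores the shrinking of cell area with latitude:

$$N_{\mathrm{ERAI}} \approx \frac{3.5\times10^{5}\,\mathrm{km^2}}{(79\,\mathrm{km})^2} \approx \frac{3.5\times10^{5}}{6.2\times10^{3}} \approx 56, \qquad N_{\mathrm{GPCP}} \approx \frac{3.5\times10^{5}\,\mathrm{km^2}}{(2.5\times111\,\mathrm{km})^2} \approx \frac{3.5\times10^{5}}{7.7\times10^{4}} \approx 4.5$$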

Quality of observations
Throughout the paper, GPCP version 2.2 is assumed to be the ground truth and is used as a benchmark for drought monitoring. However, the quality of this large-scale dataset is significantly influenced by the station count and its changes in time. Since all basins have a similar area (see Table 1), differences in station count correspond directly to differences in station density, and these differences, together with their changes in time, can potentially compromise the reliability of GPCP and its temporal homogeneity (essential for drought monitoring).
The analysis of the temporal evolution of the number of stations in the Global Precipitation Climatology Centre (GPCC; Schneider et al., 2011) dataset, which underlies GPCP over land, along with the error estimates provided by GPCP (Fig. 2), gives a qualitative overview of possible challenges for GPCP over each basin. In the NG there is a drop in the station count of about 50 % from the early 1980s to the late 1990s, with a further reduction in the last decade. This is reflected in an increase of the error estimates during the last decade. In both the BN and ZB basins, the number of stations is lower than in the NG, and much lower (around 10) in the ZB. LP is the basin with the highest and most stable number of stations, except for a drop in the last 5 yr of the dataset. The number of stations in CAMS-OPI (Fig. 2) is much lower than in GPCP over the selected basins, especially in the NG, BN and ZB. This will affect the potential use of CAMS-OPI for real-time monitoring, which is addressed in the next section.

Drought monitoring
Precipitation over the NG and the BN is controlled by the south-to-north (and back) progression of the West Africa monsoon. Peak rainfall occurs in boreal summer (June-September, Fig. 3a and b), when the ITCZ moves to its northernmost position, producing disturbances that are dynamically linked to the African Easterly Jet. These disturbances are the primary cause of the large-scale precipitation observed in the region at the monsoon onset. Westward-propagating mesoscale disturbances generate the dominant convective systems. They feed into the large-scale disturbances only during late boreal summer (when enough moisture is available), changing the rainfall regime from frontal precipitation (June-July) to convective precipitation (August-September). The LP and ZB river basins are instead located in southern Africa and have their peak precipitation during austral summer (Fig. 3c and d).
The rainy season is therefore generally out of phase with that in western Africa (i.e. dry (wet) western Africa corresponds to wet (dry) southern Africa). Although wave activity has not been identified, rainfall tends to be organized into mesoscale convective systems analogous to those in Sahelian West Africa.
In the BN and NG basins, the rainy season (June-September) is captured by all datasets, with an overestimation by ERAI in the BN (Fig. 3b). S4 overestimates precipitation in both the BN and NG in the first forecast month, and the peak rainfall decreases with lead time. This is an example of model drift with lead time, which justifies applying the bias correction separately for each calendar month (initial forecast date) and lead time. In the LP and ZB basins all datasets show reasonable agreement with GPCP, and S4 has a smaller drift in the mean annual cycle with lead time.
The temporal correlation of the 3-, 6- and 12-month SPI calculated with ERAI, CAMS-OPI and the first month of S4 forecasts (S4L0) against the SPI calculated with GPCP (Table 2 and Fig. 4 for the SPI-12 time series; Supplement Figs. S1 and S2 for the SPI-3 and SPI-6, respectively) gives an overview of the potential quality of each dataset for drought monitoring in the regions. Both ERAI and CAMS-OPI agree well with GPCP in the LP for the different timescales, while in the remaining three basins the correlations are lower. In the NG and BN, the SPI derived from the first month of S4 forecasts agrees better with GPCP than that from ERAI or CAMS-OPI, while in the ZB all datasets display low correlations (Table 2). It should be noted that S4L0 represents the inter-annual variability of precipitation anomalies better than ERAI or CAMS-OPI in the NG and BN regions (higher correlations of SPI-12 with GPCP in Table 2). […] African region, which is likely to be associated with a substantial warm bias in the model due to an underestimation of aerosol optical depth in the region, as well as changes in the data entering the data assimilation system (Dee et al., 2011). This resulted in an unrealistic model drying that penalizes the SPI scores. The poor performance of CAMS-OPI, compared with GPCP, in the NG, BN and ZB basins is most likely linked to the low number of stations used (Fig. 2), mainly due to the near-real-time restriction, since not all stations report in near real time.
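Table 2 reports each correlation as the bounds of a 95 % confidence interval. The text does not state how these bounds were obtained; one common approach is the Fisher z-transform, sketched here purely as an assumption. Note that overlapping accumulations (especially SPI-12) make consecutive months strongly autocorrelated, so using the raw series length as the sample size will understate the interval width.

```python
import math
from statistics import NormalDist, fmean


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_ci(x, y, level=0.95):
    """(r, lower, upper) via the Fisher z-transform; assumes independent
    samples and |r| < 1 (an effective sample size could replace len(x))."""
    n = len(x)
    r = pearson(x, y)
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)
```

Two datasets can then be judged clearly different when their intervals do not overlap, which mirrors the bold/underlined convention of Table 2.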
The decrease in the number of rain gauges is a major limitation for the verification and monitoring of precipitation in the different basins. This reduction is present both in GPCP (used for verification) and in CAMS-OPI (used for monitoring, see Fig. 2), while ERAI and the S4 seasonal forecasts are not affected. The impact of these changes is not straightforward to address because (1) the decrease in the number of stations affects the dataset that we use for verification (GPCP) and (2) splitting the precipitation time series into two periods would result in short series (16 yr) being transformed to SPI, therefore increasing the uncertainty of the transformation. Table 3 presents the temporal correlations between the GPCP SPI and the remaining datasets for the different SPI timescales and basins, considering the first and second halves of the period but keeping the full period of precipitation in the SPI calculations. The correlations between GPCP and S4L0 do not change significantly. For ERAI there are significant changes only in the BN, with an increase in the correlations from the first to the second half of the period. This is likely associated with changes in the ERAI precipitation due to changes in the amount/quality of observations entering the data assimilation system. The main changes occur in CAMS-OPI, with a decrease in the correlations from the first to the second half of the period in the NG, BN and ZB basins, which are the basins with the largest reduction of rain gauges. In particular, the significant reduction in the ZB basin is associated with a drop to zero stations used by CAMS-OPI in the region from 1999 onwards (see Fig. 2). The spatial patterns of the temporal correlations of the SPI-3 and SPI-12 calculated from the different products agree with GPCP in southern and north-west Africa, while in central Africa (between the 20° N/S parallels) ERAI and CAMS-OPI have low or non-significant correlations, especially for the SPI-12 (Fig. 5). S4L0 has in general a lower variability than ERAI or CAMS-OPI, except over a latitudinal band south of the Sahel (including the NG and BN basins), making it a good candidate for drought monitoring in those regions, given the poor performance of ERAI and CAMS-OPI there. The lower variability of S4L0 (compared with ERAI) can be primarily attributed to the long-range integrations of the coupled atmosphere-ocean model and to the tendency of the forecasts to predict climatological conditions.
Table 2. Temporal correlation of the GPCP 3-, 6- and 12-month SPI with ERAI, CAMS-OPI and S4L0 (each column) for the different basins. The correlations are given by the lower and upper bounds of a 95 % confidence interval. Bold (underlined) intervals indicate the dataset with the highest (lowest) correlation and with a confidence interval outside the range of the remaining datasets for each SPI timescale and basin.
www.hydrol-earth-syst-sci.net/17/2359/2013/

Drought forecasting
The skill of the seasonal forecasts of SPI depends on the skill of the underlying seasonal forecasts of precipitation and on the quality of the monitoring (for long SPI timescales and short lead times, where the monitoring dominates over the forecast). These two components of the skill can be analysed separately by (i) assessing the skill of the seasonal forecasts of precipitation and (ii) evaluating the potential skill of the SPI seasonal forecast, i.e. using a perfect monitoring product.

Precipitation forecast skill
Over LP, NG and BN, S4 has skill in the first month of forecasts, explaining the good performance of the SPI calculated using S4L0 when compared with ERAI and CAMS-OPI, especially in the NG and BN basins (Fig. 6). This can be primarily attributed to the predictability coming from the land-atmosphere initial conditions, which dominate the first days of the forecast. With increasing lead time, skill generally drops and is retained only in regions/seasons associated with large-scale climate forcings that can be captured by the coupled atmosphere-ocean modelling system. In both the NG and BN, S4 has skill up to 2 to 3 months lead time for the forecasts valid between June and September, which is also the main rainy season, while in the LP similar skill with lead time is found for November to February, also the rainy season. In the ZB, S4 has reduced skill (only 3 calendar months with skill at 0 lead time), which is also visible in ERAI and CAMS-OPI. However, ZB is also the basin with the lowest number of rain gauges included in GPCP, and therefore the most uncertain in terms of verification.

Forecast skill of the benchmark
The potential skill of the SPI forecasts was evaluated by merging the S4 precipitation with GPCP to create a benchmark of the different SPI timescales for the seasonal forecasts described above. This method isolates the contribution of the seasonal forecasts of precipitation to the SPI skill, avoiding the problems of the different monitoring products. On a regional scale, this approach can be adapted by using local information, such as long-term rain gauge records and/or gridded precipitation datasets. The SPI seasonal forecasts using GPCP + S4 were benchmarked against forecasts using the same monitoring merged with climatological forecasts (CLM), created by randomly sampling 15 different years of GPCP (the same ensemble size as the S4 hindcasts).
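The climatological (CLM) ensemble can be built by drawing distinct years without replacement and using each year's observed precipitation for the forecast months as one member; each member is then merged with the monitoring exactly like an S4 member. The helper below is hypothetical, and whether the verified year itself is excluded from the draw is not stated in the text (an operational version might exclude it).

```python
import random


def climatological_ensemble(years, values_by_year, size=15, seed=None):
    """Hypothetical helper: draw `size` distinct years at random and return
    the observed (e.g. GPCP) precipitation of each sampled year for the
    forecast months, giving a CLM ensemble the same size as the S4 hindcasts."""
    rng = random.Random(seed)
    sampled = rng.sample(years, size)  # distinct years, no replacement
    return [values_by_year[y] for y in sampled]
```

Seeding the generator makes the random benchmark reproducible across verification runs.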
The ACC of the SPI-12 is very close to 1 in all basins for 0 and 1 months lead time (Fig. 7e-h). In this case, the SPI-12 is built from 11 or 10 months of monitoring and 1 or 2 months of seasonal forecasts for the 0 and 1 months lead time, respectively. For short lead times, the monitoring dominates the ACC of the SPI-12, which yields scores close to 1 since the verification is done against the same dataset used for monitoring. For the SPI-3, the ACC at 0 lead time is already lower than for the SPI-12, and it rapidly drops to low or non-significant values in regions/periods with low or no precipitation predictability. For long lead times, there is a drop in the SPI-12 skill, in particular for verification in the calendar months after the rainy season. This is associated with the different weights of the monitoring and the forecast in regions with a pronounced annual cycle: SPI-12 forecasts valid before the rainy season tend to have higher skill, since the core precipitation information comes from the monitoring (the previous rainy season), while forecasts valid right after the rainy season rely on the S4 seasonal prediction. The CRPSS identifies the verification months and lead times where the SPI forecasts using S4 outperform a simple climatological forecast (Fig. 7i-p). Those periods are consistent with a higher ACC of S4 compared with CLM (marked with symbols in Fig. 7a-h) and reflect the underlying skill of S4 precipitation (Fig. 6).
The previous skill analysis, based on ACC and CRPSS, considered the full range of SPI forecasts. For drought detection/early warning, ROC and REL diagrams (among other tools, such as the Brier score) are useful for testing categorical forecasts, i.e. event or no event. A drought event is defined as SPI < −1. The ROC diagrams in Fig. 8 for the SPI-3 and SPI-6 represent the skill of using only precipitation forecasts (no monitoring), while for the SPI-12, 6 months of monitoring are merged with 6 months of seasonal forecasts. The ROC scores of CLM are close to 0.5 (no information) in all basins for the SPI-3 and SPI-6 at 2 and 5 months lead time, since these forecasts are just random climatological samples (Fig. 8). On the other hand, S4 has skill in drought detection in the NG, BN and LP, and no skill in the ZB. For the SPI-12 at 5 months lead time, the ROC score of CLM is always around 0.8 and is outperformed by S4 in the NG, BN and LP.
The reliability diagrams (Fig. 9) further support the previous results. They show that SPI-3 and SPI-6 forecasts at 2 and 5 months lead time, respectively, are reliable in the NG, BN and LP and tend to be over-confident (reliability curves with a slope < 1; Fig. 9). For the SPI-12 at 5 months lead time, the ACC and ROC evaluation indicated a clear difference between S4 and CLM forecasts. The reliability diagrams, however, show similar results for both, with slopes of the reliability curves close to 1 and S4 being under-confident (slopes > 1) in the BN, NG and LP. The variation of the ROC and ROC skill score (ROCSS) with lead time is summarized in the following results (Fig. 10):

(i) in the ZB, S4 is similar to a climatological forecast, i.e. it has no skill, while it outperforms CLM in the NG, BN and LP;
(ii) the highest skill scores occur for the SPI-3 at 2 months lead time (using the first 3 months of the seasonal forecast) and for the SPI-6 at 5 months lead time (using the first 6 months of the seasonal forecast);
(iii) the skill score of the SPI-12 is reduced, i.e. it is difficult to beat a climatology-based forecast for long SPI timescales, where the monitoring dominates; and
(iv) SPI S4 forecasts are never worse than CLM, and CLM skill is driven only by the accumulated effect of the monitoring.
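The reliability-curve slopes quoted above come from a weighted least-squares fit to the reliability diagram. This sketch assumes ten equal-width probability bins and weights equal to the number of forecasts in each bin; the paper does not state its binning or weighting in this section. The function names are hypothetical. As in the text, a slope below 1 indicates over-confidence and a slope above 1 indicates under-confidence.

```python
import numpy as np


def reliability_curve(prob, event, n_bins=10):
    """Bin forecast probabilities; return (mean forecast prob, observed freq, count)."""
    p = np.asarray(prob, float)
    o = np.asarray(event, float)
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)  # equal-width bins on [0, 1]
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=p, minlength=n_bins)
    sum_o = np.bincount(idx, weights=o, minlength=n_bins)
    used = counts > 0  # drop empty bins
    return sum_p[used] / counts[used], sum_o[used] / counts[used], counts[used]


def weighted_slope(x, y, w):
    """Slope b of the weighted least-squares line y = a + b x."""
    x, y, w = (np.asarray(v, float) for v in (x, y, w))
    xm = np.sum(w * x) / np.sum(w)
    ym = np.sum(w * y) / np.sum(w)
    return float(np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2))
```

Weighting by bin counts prevents sparsely populated high-probability bins, which are common for a rare event like SPI < −1, from dominating the fitted slope.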

Seasonal forecast skill
The potential skill analysis clarifies the importance and impact of the skill of S4 precipitation. However, GPCP is not available in near real time for an operational implementation. Therefore, an analysis similar to that of the previous section was performed with other precipitation products that have long-term records and are available in near real time. This assesses the actual predictive skill of the merged forecast.
The ROC scores for the near-real-time forecasts are identical from 2 months lead time onwards for the SPI-3, and at 5 months lead time for the SPI-6, because these forecasts do not include precipitation from the monitoring (Figs. 10 and 11). In the NG and BN, ERAI and S4L0 performed similarly and outperformed CAMS-OPI. However, their skill at 0 months lead time was only similar to that obtained at 2 months lead time with GPCP as monitoring. This means that the problems identified in those datasets (Sect. 3.2) cost the equivalent of 2 months of lead time in the NG and BN, and 1 month in the LP, for the SPI-3. For the SPI-6 the skill reduction is equivalent to 3 to 4 months. For the SPI-12, only CAMS-OPI reaches a skill similar to GPCP at 5 months lead time, and only in the LP basin. These results highlight the role of precipitation monitoring quality in SPI seasonal forecasts: significant gains in skill can be obtained by using good-quality observed or modelled precipitation.
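Whether an SPI forecast contains any monitoring at a given lead time follows from how the accumulation window is split. The SPI-3 from 2 months onwards and the SPI-6 at 5 months contain none. This sketch assumes that lead time L means the verification month is forecast month L + 1. That assumption is consistent with the SPI-12 using 11 monitoring months and 1 forecast month at 0 months lead time. The function names and list layout are hypothetical.

```python
def window_split(spi_scale: int, lead: int) -> tuple[int, int]:
    """Return (monitoring months, forecast months) inside an SPI-n window.

    Assumes the window ends at forecast month lead + 1 and reaches back
    spi_scale months, so it holds at most spi_scale forecast months.
    """
    n_fcst = min(lead + 1, spi_scale)
    return spi_scale - n_fcst, n_fcst


def merged_accumulation(monitoring, forecast, spi_scale, lead):
    """Sum monitored and forecast monthly precipitation over the SPI window.

    monitoring: monthly totals up to the forecast start, most recent last.
    forecast: forecast monthly totals (e.g. one ensemble member), month 1 first.
    """
    n_mon, n_fcst = window_split(spi_scale, lead)
    if len(forecast) < lead + 1 or len(monitoring) < n_mon:
        raise ValueError("not enough months to fill the SPI window")
    fcst_part = forecast[lead + 1 - n_fcst : lead + 1]
    mon_part = monitoring[len(monitoring) - n_mon :]  # empty when n_mon == 0
    return sum(mon_part) + sum(fcst_part)
```

For example, `window_split(12, 5)` returns `(6, 6)`, the 6 + 6 month mix used for the SPI-12 at 5 months lead time. `window_split(3, 4)` returns `(0, 3)`, i.e. a forecast-only window whose score no longer depends on the monitoring product.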
An example of a drought event showing the evolution of the forecasts is given in Fig. S3 in the Supplement. We selected the 1991/1992 drought in the Limpopo region (see Fig. 4), which caused humanitarian crises in several countries in southern Africa associated with crop failures and livestock mortality, among other factors (FAO, 2004). This …
- the System 4 seasonal forecast has predictive skill in comparison with climatology in the Niger, Blue Nile and Limpopo basins and no skill in the Zambezi basin; and
- poor-quality monitoring products can reduce the potential skill of SPI seasonal forecasts by the equivalent of 2 to 4 months of lead time.
The generally low number of rain gauges, used in both GPCP and CAMS-OPI, and its decrease in recent years are the main limitation for the verification and monitoring of droughts in the different basins. This will potentially affect the skill of the combined forecasts and their verification, reinforcing the need for a strong investment in climate monitoring. A proper evaluation of changes in forecast skill between periods (for example the first and second halves of the record) would require a deeper study. In particular, it would need to assess the impact of using a shorter record (for example 16 years) for the SPI transformation. Such a study would have further implications. Other precipitation datasets are available in near real time but have shorter records (for example the Tropical Rainfall Measuring Mission, TRMM), and these could be used for drought monitoring.
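The SPI transformation maps accumulated precipitation to a standard normal variate through a distribution fitted to the reference record. Its parameters therefore depend directly on the record length. This sketch assumes the widely used two-parameter gamma fit, with a separate probability of zero precipitation. The paper's exact fitting choices are not given in this section, and `spi_transform` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def spi_transform(climatology, values):
    """Map precipitation accumulations to SPI via a fitted gamma distribution.

    climatology: accumulations for one calendar month and timescale over the
    reference period (e.g. 16 or 32 years); values: accumulations to transform.
    Assumption: gamma fit to non-zero values plus a point mass at zero.
    """
    clim = np.asarray(climatology, float)
    vals = np.asarray(values, float)
    q_zero = np.mean(clim <= 0.0)  # probability of a zero accumulation
    wet = clim[clim > 0.0]
    # Maximum-likelihood gamma fit with the location fixed at zero.
    shape, _, scale = stats.gamma.fit(wet, floc=0)
    prob = q_zero + (1.0 - q_zero) * stats.gamma.cdf(vals, shape, loc=0, scale=scale)
    # Keep probabilities away from 0 and 1 so the SPI stays finite.
    prob = np.clip(prob, 1e-6, 1.0 - 1e-6)
    return stats.norm.ppf(prob)
```

One simple way to see how a shorter fitting record shifts the SPI values is to compare `spi_transform(clim[:16], clim)` with `spi_transform(clim, clim)`. This is the kind of sensitivity the text identifies as needing further study.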
The methodology presented in this paper for merging precipitation monitoring and seasonal forecasts on a regional scale can be adapted to other sources of precipitation. For the monitoring these include in situ rain gauges, gridded precipitation datasets and remote sensing estimates. For the seasonal forecasts they include other systems, multi-model approaches and statistical methods. The methodology can also be applied on a grid-point basis, following downscaling methods such as those proposed by Yoon et al. (2012). However, care has to be taken when interpreting seasonal-scale predictions of precipitation on local scales. Furthermore, the role, quality and skill of other drought indicators (e.g. based on soil moisture, river discharge or evaporation) have to be established, but such work will depend strongly on the availability of reliable monitoring networks.
Fig. 1. Basin definitions (dark grey) and the full basins (dark and light grey). See also Table 1.

Fig. 2. Temporal evolution of the mean annual number of stations present in GPCC (dashed lines) and CAMS-OPI (circle symbols) in each basin, and GPCP error estimates normalized by the mean precipitation (solid line). The CAMS-OPI number of stations was multiplied by 5.

Fig. 4. Evolution of the 12-month SPI in the different basins given by S4L0 (first forecast month), CAMS-OPI, GPCP and ERAI precipitation. The horizontal ticks represent January of each year.

Fig. 7. Anomaly correlation coefficient (ACC) of the seasonal forecasts of SPI-3 (a-d) and SPI-12 (e-h), and continuous ranked probability skill score (CRPSS) of SPI-3 (i-l) and SPI-12 (m-p). In the colour matrix, the horizontal axis represents the verification month and the vertical axis the lead time (months). In the ACC panels, S4 forecasts are compared with GPCP, and the white circles indicate that the S4 ACC exceeds the CLM ACC by at least 0.05. In the CRPSS panels, the S4 CRPS is benchmarked against the CRPS of CLM.

Fig. 9. Reliability diagrams (CLM, S4) and frequency histograms (fCLM, fS4) for SPI < −1 forecasts produced by S4 (black lines and white bars) and CLM (grey lines and bars). For perfect reliability the curves should fall on top of the dotted diagonal line. The thin solid lines (CLM* and S4*) are the weighted least-squares regression lines of the reliability curves, and the slope of each curve is displayed in each panel. Each panel represents a particular basin (column) and SPI timescale (row), with the same organization as in Fig. 8.

Fig. 11. Relative operating characteristic (ROC) of the S4 SPI forecasts as a function of lead time (months) for different timescales (columns) and for the four basins (rows). Each panel compares the ROC of S4 forecasts using GPCP (squares), CAMS-OPI (upward triangles), ERAI (circles) and S4L0 (left-pointing triangles) as monitoring. For the SPI-3, all ROC scores are the same from 2 months lead time onwards, and for the SPI-6 at 5 months lead time, because only S4 precipitation is used at those lead times.

Table 3. Temporal correlation of the … for the different basins. The correlations [c1 c2] are those for the first half of the record (c1: 1979 to 1994) and the second half of the record (c2: 1995 to 2010). Bold values indicate that the difference between c1 and c2 is statistically significant at the 99 % level (using a Fisher transformation).
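The significance test in Table 3 compares two correlation coefficients from non-overlapping periods. Below is a sketch of the standard Fisher z test for two independent correlations. The sample sizes are assumptions: 16 per half if one value per year is correlated, more if monthly values are used. Serial correlation in SPI series, especially SPI-12, would reduce the effective sample size. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p-value for H0: rho1 == rho2 (independent samples)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def significant_change(r1, n1, r2, n2, level=0.99):
    """True if the two correlations differ at the given confidence level."""
    _, p = fisher_z_difference(r1, n1, r2, n2)
    return p < 1.0 - level
```

With one value per year, `n1 = n2 = 16` for the 1979 to 1994 and 1995 to 2010 halves. Modest differences such as c1 = 0.6 and c2 = 0.5 are then far from significant at 99 %.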