Primer on prediction models and cluster analysis

HESS

Hydrology and Earth System Sciences

HESS

Hydrol. Earth Syst. Sci.

1607-7938

Copernicus Publications

Göttingen, Germany

10.5194/hess-22-143-2018

Does objective cluster analysis serve as a useful precursor to seasonal precipitation prediction at local scale? Application to western Ethiopia

Zhang

Ying

Moges

Semu

Block

Paul

paul.block@wisc.edu

https://orcid.org/0000-0003-1993-7496

1Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI, USA 2School of Civil and Environmental Engineering, Addis Ababa University, Addis Ababa, Ethiopia

Paul Block (paul.block@wisc.edu)

9January2018

22 1 143157 7February2017 21February2017 1November2017 12November2017

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/3.0/

This article is available from https://hess.copernicus.org/articles/22/143/2018/hess-22-143-2018.html

The full text article is available as a PDF file from https://hess.copernicus.org/articles/22/143/2018/hess-22-143-2018.pdf

Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies. The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.

Primer on prediction models and cluster analysis

Seasonal precipitation prediction can provide potentially actionable information to guide management of various sectoral activities. For instance, precipitation prediction is often translated into a hydrological forecast, which can be used to optimize reservoir operations, provide early flood or drought warning, inform waterway navigation, etc. As a primary input to soil moisture, precipitation prediction is also essential to agricultural management – farmers can take advantage of anticipated preferable climatic conditions or avoid unnecessary costs under expected undesirable conditions. Two types of models are commonly used for seasonal precipitation prediction: statistical and dynamical. Dynamical models, such as general circulation models (GCMs), include complex physical climate processes, while statistical models are purely data driven, relating observations and hydroclimate variables directly.

While both modeling approaches have produced skillful seasonal predictions for a variety of applications (e.g., Barrett, 1993; Hammer et al., 2000; Shukla et al., 2016), each has noteworthy drawbacks. Dynamical models often require a great amount of time to build and parameterize, whereas statistical models require considerably fewer resources (e.g., Mutai et al., 1998; Gissila et al., 2004; Block and Rajagopalan, 2007; Diro et al., 2008, 2011b; Block and Goddard, 2012). Dynamical models also suffer from their high sensitivity to initial uncertain conditions, particularly given a long lead time. Consequently, a number of simulations are typically produced, each with unique initial conditions, to provide a range of possible outcomes (e.g., Roeckner et al., 1996; Anderson et al., 2007). Furthermore, the outputs from dynamical models often require additional bias correction, typically using statistical methods, to better match observations (e.g., Ines and Hansen, 2006; Block et al., 2009; Teutschbein and Seibert, 2012). Statistical models, on the other hand, are highly dependent on substantial high-quality historical data to capture hydroclimatic patterns and signals, particularly extreme conditions, which is often not available. Additionally, statistical models are often linear by construction and may not well capture non-linear complex interactions and feedbacks. The physical nature of dynamical models, however, allows for prediction under non-stationary conditions and also when insufficient historical data are available, whereas statistical models, by construction, typically rely on stationary relationships (Schepen et al., 2012).

The spatial extent selected for statistical seasonal prediction is critical. It is not uncommon to simply assume homogeneity in precipitation across an entire study region; however, this limits addressing potential spatial variability. While this may be suitable for very broad regional planning, it is often ineffectual for operational and local-level decisions, particularly for locations with high spatial variability. This prompts the need for delineation of subregional-scale homogeneous regions, often defined through cluster analysis. Defining these homogeneous regions, however, is a non-trivial process. There are a variety of methods to delineate homogeneous regions, including comparing annual cycles (e.g., unimodal and bimodal distributions in precipitation) between stations (or grid cells), comparing station correlations with regional averages, applying empirical orthogonal functions (EOFs), various clustering techniques, and other methods of increasing complexity (e.g., Parthasarathy et al., 1993; Mason, 1998; Landman and Mason, 1999; Gissila et al., 2004; Diro et al., 2008, 2011b; Singh et al., 2012). In addition, delineation of the subregion size is also important to consider. Smaller-sized homogeneous subregions do not necessarily lead to improved predictions, as the noise at overly small scales can dominate any real signals representing spatial coherency of precipitation. For additional discussion regarding defining homogeneous subregions and cluster analysis, the reader is referred to Zhang et al. (2016) and Badr et al. (2015).

Spatial and temporal variability of June–September seasonal total precipitation in western Ethiopia: (a) spatial pattern of temporal-average and (b) spatial-average time series.

Application to western Ethiopia and objectives of the study

Precipitation in western Ethiopia peaks in the summer with approximately 70 % of annual total precipitation falling during the main raining season – also known as the Kiremt season, spanning from June to September (JJAS). On average, the seasonal total precipitation in the study region is approximately 760 mm; however, in the northwest, precipitation can exceed 1200 mm (Fig. 1a). Along with the high spatial variability in this mountainous region, the temporal variability is also remarkable with spatial-average seasonal total precipitation ranging from 650 mm in dry years up to 900 mm in wet years (Fig. 1b). These highly variable spatial and temporal precipitation patterns have made skillful seasonal predictions challenging, particularly at local scales (e.g., Gissila et al., 2004; Block and Rajagopalan, 2007).

The climate mechanisms affecting JJAS precipitation patterns in western Ethiopia are quite complex. Sea surface temperatures (SSTs) in the equatorial Pacific Ocean representing the well-known El Niño–Southern Oscillation (ENSO) phenomenon are considered a primary indicator of JJAS precipitation variability, with El Niño/La Niña often associated with deficit/excess of JJAS precipitation across the study region (e.g., NMSA, 1996; Camberlin, 1997; Bekele, 1997; Segele and Lamb, 2005; Diro et al., 2011a; Elagib and Elhag, 2011). Additionally, there is evidence of direct moisture transport from the Gulf of Guinea (equatorial Atlantic Ocean), the Indian Ocean, and the Mediterranean Sea, affecting Ethiopia's summertime precipitation (Viste and Sorteberg, 2013a, b). These moisture fluxes are often related to pressure patterns across the continent. For instance, the St. Helena High over the southern Atlantic Ocean or a high pressure over the Gulf of Guinea, coupled with a simultaneous low pressure over the Indian Ocean or a monsoon trough over the Arabian Peninsula, all bring intensified westerlies and southwesterlies that transport moist air across the Congo Basin to the western Ethiopian highlands (Segele et al., 2009; Williams et al., 2011). Similarly, the southwest Asian monsoon in the Indian Ocean, which has a strong positive relationship with concurrent JJAS precipitation in western Ethiopia, is associated with the Mascarene High over the southern Indian Ocean and a low pressure system near Bombay. During this monsoon season, the southeast trade winds in the Southern Hemisphere are channeled by the east African highlands while crossing the Equator and become a southwest monsoon flow. They are further diverted by the Turkana Channel, enhancing convergence with the westerlies/southwesterlies above the western Ethiopian highlands and bringing moisture to the region (Kinuthia, 1992; Nicholson, 1996, 2014; Camberlin, 1997; Slingo et al., 2005; Segele et al., 2009). In addition, the effect of other hydroclimate variables, such as Indian Ocean SST, local and regional atmospheric pressure systems (e.g., Azores High) also have notable influence on Ethiopia's precipitation variability (e.g., Kassahun, 1987; Tadesse, 1994; NMSA, 1996; Shanko and Camberlin, 1998; Goddard and Graham, 1999; Latif et al., 1999; Black et al., 2003; Segele and Lamb, 2005). Consequently, these large-scale climate variables may serve as potential predictors in statistical seasonal precipitation prediction models.

Ethiopia is vulnerable to fluctuations in precipitation given its reliance on rain-fed agriculture and limited water resources infrastructure. The majority of agriculture and infrastructure are in western Ethiopia, where water resources are relatively rich compared to other parts of the country (Awulachew et al., 2007). Operational precipitation predictions in Ethiopia have been issued by its National Meteorological Agency (NMA) since 1987 using an analog methodology (i.e., locating a similar climate condition in the past – an analog – to predict future conditions); however, this approach has produced only marginally skillful outcomes (Korecha and Sorteberg, 2013). For NMA's prediction, the country is divided into eight homogeneous regions for which NMA produces independent predictions. Similarly, others have also addressed seasonal prediction in Ethiopia contingent on both temporal and spatial precipitation patterns. Gissila et al. (2004) divide Ethiopia into four regions conditioned on the seasonal cycle and interannual variability coherence prior to prediction, while Diro et al. (2009) apply a similar approach but with dynamic cluster boundaries, allowing for different delineations for each rainy season. Segele et al. (2015) consider statistical precipitation predictions across Ethiopia as a whole, as well as for northeastern Ethiopia and at two Ethiopian cities. Block and Rajagopalan (2007) predict the average summertime (JJAS) precipitation over the upper Blue Nile Basin – a region they claim is homogenous at interannual timescales. Korecha and Barnston (2007) select an all-Ethiopia average precipitation index to characterize predictability broadly, with minimal attention to operational-level predictions. All of these studies focus on predicting regional average precipitation based on subjective clustering methods applying a limited number of stations or coarsely gridded data; no local predictions at a finer spatial scale are explored.

This study moves forward by exploring local-level seasonal precipitation prediction through the use of regional-level predictions, based on previous cluster analyses over western Ethiopia (Zhang et al., 2016). The advantages of defining homogeneous regions for seasonal prediction at operational (small) scales will be demonstrated by comparing approaches with and without undertaking a cluster analysis a priori. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances compared to previous studies.

Modeling high-resolution seasonal prediction

To evaluate high-resolution seasonal precipitation with versus without cluster analysis a priori, statistical models are developed and further compared with bias-corrected dynamical model predictions. Four scenarios are evaluated based on two criteria: (1) clustered vs. non-clustered and (2) direct vs. indirect. In the clustered case, predictions are produced for each homogeneous region (cluster) given a unique set of predictors. In the non-clustered case, the entire study region is considered as one cluster, and thus only one set of predictors is utilized for predictions. For the direct case, precipitation is predicted directly at the local level (grid scale); for the indirect case, the average precipitation within each homogeneous region is predicted first (as an intermediary) and then regressed to local-level (grid-scale) predictions. Combinations of the two criteria form four scenarios – clustered direct (C-D), non-clustered direct (NC-D), clustered indirect (C-I), and non-clustered indirect (NC-I) predictions.

Cluster analysis

Using a k-means clustering technique, western Ethiopia – the major agricultural region of the country – is divided into eight homogeneous regions (Fig. 2), conditioned on the interannual variability of total precipitation in JJAS, the same variable that is to be predicted. Precipitation is based on a 0.1∘ × 0.1∘ gridded precipitation dataset from NMA (Dinku et al., 2014), consisting of 7320 grid cells across 1983–2011 (29 years). This product has been verified against station data and has been deemed representative of observed precipitation in western Ethiopia (Dinku et al., 2014). Given the high-resolution gridded dataset, k-means clustering is performed for a range of predefined numbers of clusters; the optimal number of clusters is identified by various evaluation metrics based on the within-cluster sum of square errors (WSS), including an elbow method with difference in WSS, gap statistics with difference in difference, and qualitative analysis on post-visualization of clusters. During the clustering process, each grid cell is assigned and reassigned to clusters until the WSS is minimized. This does not require any subjective delineation or manual delineation of boundaries between clustered stations or grid cells; instead, an automated and objective delineation is performed. The mean time series of each cluster illustrates high intracorrelation within the cluster and low intercorrelation between any two clusters, indicating strong coherency of the clustering results. For a detailed analysis including a complete correlation table and unique patterns for each cluster-level time series associated with large climate variables, readers are referred to Zhang et al. (2016).

Regionalization map of eight homogeneous regions marked by different colors, with country boundaries and river profiles. The figure is based on Zhang et al. (2016).

Statistical modeling approach

Many studies have investigated statistical models for seasonal climate prediction. These studies vary by preclassification of predictor or predictand regime, predictor selection process, and statistical methods. For example, Hertig and Jacobeit (2011) investigate SST regimes as potential predictors for subsequent precipitation and temperature in the Mediterranean region. Through techniques including multiple applications of principal component analysis (PCA), 17 stationary SST regimes were identified. Gerlitz et al. (2016) apply a k-means cluster analysis to grid cells identified with significant correlations in the predictor field in order to facilitate predictor selection. Suárez-Moreno and Rodríguez-Fonseca (2015) investigate stationarity based on a long time series using a 21-year moving correlation window. The statistical prediction models are then applied to each stationary period, respectively, and the entire period for comparison. Despite diverse methods in seasonal prediction, multiple linear regression (MLR) is favored by many as a statistical modeling approach given its well-developed theory, simple model structure, efficient processing, and often skillful outcomes (e.g., Omondi et al., 2013; Camberlin and Philippon, 2002; Diro et al., 2008). As mentioned, only a few studies have focused on seasonal precipitation prediction in Ethiopia (Gissila et al., 2004; Block and Rajagopalan, 2007; Korecha and Barnston, 2007; Diro et al., 2008, 2011b; Segele et al., 2015), and almost all of them include the applications of MLR. This study also applies MLR to predict seasonal precipitation, yet differentiates from other studies by applying predictions to predefined homogeneous regions and further translating to local-level predictions.

Season-ahead (March–May) or month-ahead (May) large-scale climate variables that are physically relevant in potentially modulating moisture transport to the basin (or cluster) are selected as potential predictors. Four climate variables are selected here for further evaluation based on outcomes of the aforementioned prediction studies: SST, sea-level pressure (SLP), geopotential height (GH) at 500 mb, and surface air temperature (SAT). All climate variables are from the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis dataset (Kalnay et al., 1996) at a 2.5∘ × 2.5∘ grid scale.

Those potential predictors are first transformed through PCA (Jolliffe, 2002). PCA is a common approach in climate modeling to reduce the dimensionality of predictors and remove multi-collinearity, while simultaneously extracting the most dominant signals from the potential predictors, typically reflected in the first few principal components (PCs). Since PCA is independent of the predictand, retaining the first few PCs as predictors, in lieu of the original variables, also helps to reduce artificial prediction skill.

Flow chart of data processing for predictors into the statistical model. Numbers framed by dashed lines correspond to the procedures listed in the context. Note: pre. – precipitation, t-s – time series, avg. – average.

Subsequently, a certain number of PCs are used as the direct inputs into a MLR model, otherwise known as the principal component regression (PCR). PCR is performed in a “drop-one-year” cross-validation mode to reduce overfitting effects and therefore avoid overestimation of prediction skill. This requires reconstructing the principal components for the dropped year and then multiplying the coefficient estimates with each reconstructed PC, respectively, in order to obtain the final predicted value for the dropped year (e.g., Block and Rajagopalan, 2009; Wilks, 2011). A detailed methodology is provided below.

Justifiable climate regions globally for selecting predictors. (a) SLP and GH at 500 mb with regions including EP, ES, LO, AH, SH, MH, and AM. For SAT, only LO is included. (b) SST with regions including EP, NI, SI, and AT. Note: EP – equatorial Pacific region, ES – Tahiti island for ENSO measurement, LO – local region, AH – Azores High, SH – St. Helena High, MH – Mascarene High, AM – southwest Asian monsoon, NI – North Indian Ocean, SI – South Indian Ocean, AT – equatorial/South Atlantic Ocean.

Correlation map between mean JJAS seasonal precipitation time series in Cluster 5 and global SST under cross-validation, with correlations lower than the 99 % significance level masked out (one-tail test).

(a) Bias correction of North American Multi-Model Ensemble (NMME) predictions using probability mapping; (b) precipitation time series from NMME (colored lines) before and after correction, compared to observations (black line). Examples are shown for randomly selected six grid cells.

Equations of linear regression panel models under four scenarios.

Non-clustered Clustered Direct

Yi,t=α̃i+β̃iXt+εi,t

(1a)

Yiϵj,t=α̃i+β̃iXj,t+εi,t

(2a)

Ỹi,t=α̃i+β̃iXt

(1b)

Ỹiϵj,t=α̃i+β̃iXj,t

(2b) Indirect

Ym,t=α̃+β̃Xt+εt

(3a)

Yj,t=α̃j+β̃jXj,t+εj,t

(4a)

Ỹm,t=α̃+β̃Xt

(3b)

Ỹj,t=α̃j+β̃jXj,t

(4b)

Yi,t=η̃i+γ̃iỸm,t+νi,t

(3c)

Yi∈j,t=η̃i+γ̃iỸj,t+νi,t

(4c)

Ỹi,t=η̃i+γ̃iỸm,t

(3d)

Ỹi∈j,t=η̃i+γ̃iỸj,t

(4d)

Y – predictand of JJAS seasonal total precipitation; X – predictors of top PCs; ε, ν – error terms; Ỹ – predicted values of JJAS seasonal total precipitation; α̃, β̃, η̃, γ̃ – estimated coefficients; i – grid-cell index; t – time (year) index; j – cluster index; i∈j – grid cell i that belongs to cluster j; m – mean over entire study region that is equivalently the only cluster.

To avoid overfitting, the entire process including predictor selection and statistical modeling is processed using cross-validation. To start, drop-one-year precipitation observations for JJAS averaged across the region and each cluster are spatially correlated independently with each global climate variable. As a result, there are total of 1044 global correlation maps given the 29-year time series, eight clusters plus one non-cluster, and four climate variables. Hence, a program to automatically select highly correlated and justifiable regions as predictors is developed. The following steps describe the subsequent statistical modeling process (Fig. 3): 1.

Grid cells within each justifiable region (e.g., equatorial Pacific; Fig. 4) with correlation above the 99 % significance level are identified (Fig. 5). For regions containing grid cells with both positive and negative correlations, the number of the identified grid cells in each sign is counted. If a greater number of grid cells is associated with significant positive correlation, for example, only grid cells with positive correlations are kept for the following steps, and vice versa.

The top 10 % of the identified grid cells with the highest correlation in each region are then selected in order to boost the potential model skill.

For each region, data of the selected grid cells within the region are spatially averaged (defined as “prepredictors”).

Prepredictors are standardized, combined, and transformed through PCA for each cluster or non-cluster and each dropped-year analysis separately.

5a.

The top PCs from the PCA with a total of 95% variance explained are used as predictors in PCR. For the direct case, PCR is used to directly predict the grid-level precipitation; for the indirect case, PCR is used to predict the intermediate cluster-level precipitation.

5b.

For the indirect case only, cluster-level predictions are regressed to the grid level. Note that the downscaling of cluster-level predictions to grid-level predictions is also cross-validated to avoid overfitting.

For the four scenarios, the model structures are quite similar but have subtle differences which could lead to evidently different outcomes (Table 1). Under the NC-D (Eq. 1a, b) and C-D scenarios (Eq. 2a, b), the time series of JJAS seasonal total precipitation in each grid cell (i.e., at local level) is used as the direct predictand (Yi,t); however, the NC-D and C-D scenarios differ, as the former uses the same predictors (Xt) across all the grid cells, while the latter uses different predictors according to the cluster to which the grid cell is assigned (Xj,t). In the indirect case, the cluster-level time series of JJAS seasonal total precipitation (the time series averaged over all grid cells that belong to a given cluster, Ym,t or Yj,t) is first predicted (Eqs. 3a, b and 4a, b). The predicted intermediate product (Ỹm,t or Ỹj,t) is then used as the only regressor in the second step to estimate the grid-level precipitation (Ỹi,t or Ỹiϵj,t for every j; Eqs. 3c, d and 4c, d). Again, for the C-I scenario, predictors in the first step are unique for each of the eight clusters and grid cells within that cluster (Xj,t), while predictors are identical for all grid cells (Xt) under the NC-I scenario. A 95 % confidence interval of the cross-validated predictions is also constructed conditioned on model errors. Q-Q plots are evaluated to verify normally distributed residuals (results not included).

Cluster-level predictions and observations under the C-I and NC-I scenarios, with drop-one-year cross-validation. The 95 % envelope shows the 95 % confidence interval constructed using model errors.

Pearson correlations between grid-level observations and predictions under the four scenarios, with the clustering boundary delineated roughly in black.

Dynamical modeling approach

The North American Multi-Model Ensemble (NMME; Kirtman et al., 2014) is an experimental multi-model system consisting of coupled dynamical models from various modeling centers in North America. To our knowledge, it is also the most extensive multi-model seasonal prediction archive. The NMME provides gridded climate predictions that cover regions globally and with different lead times. The hindcasts of monthly mean precipitation are easily accessible through the International Research Institute for Climate and Society (IRI) website (http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/) and can be easily aggregated to seasonal totals for comparison with the statistical model results in this study. Therefore, NMME JJAS seasonal precipitation predictions (1∘ × 1∘ grid cells) are extracted from model ensembles that cover the same time period (1983–2011), geographic region (western Ethiopia), and with the same lead time (predictions made on 1 June). A subset of 10 NMME models meet these criteria and are retained for further evaluation: (1) COLA-RSMAS-CCSM3, (2) COLA-RSMAS-CCSM4, (3) GFDL-CM2p1, (4) GFDL-CM2p1-are04, (5) GFDL-CM2p5-FLOR-A06, (6) GFDL-CM2p5-FLOR-B01, (7) IRI-ECHAM-AnomalyCoupled, (8) IRI-ECHAM-DirectCoupled, (9) NASA-GMAO, and (10) NCEP-CFSv2. The names are kept the same as on the International Research Institute for Climate and Society (IRI) data repository website.

Grid-level RPSS (%) under the four scenarios using climate variables as predictors, with the clustering boundary delineated roughly in black.

The NMME predictions for each of the 10 models are bias corrected by applying probability mapping (e.g., Block et al., 2009; Teutschbein and Seibert, 2012; Chen et al., 2013) under cross-validation, subject to the observational dataset from NMA (Fig. 6). This is performed on a grid-cell-by-grid-cell basis on standardized data (the NMME dataset is reshaped to 0.1∘ × 0.1∘ grid cells to match the observational NMA dataset grid-cell size). The basic steps include the following:

Fit gamma distributions to drop-one-year time series from each observed and NMME grid cell; for NMME, this is performed on an individual model basis using all ensemble members available. (Goodness-of-fit tests indicate gamma distributions are appropriate; results not shown.)

Translate gamma distributions into cumulative distribution functions (CDFs).

For any given dynamical model prediction at the grid-cell level, a corrected prediction value is attained by mapping from the modeled CDF to the observed CDF and applying the inverse gamma distribution. This is repeated for all grid cells, all NMME models, and all dropped years.

After correction, the gamma CDFs of predictions and observations approximately match (Fig. 6a). Additionally, each ensemble still retains its variability over time, though the overall ensemble mean is shifted to closely match observation (Fig. 6b).

Performance metrics

Pearson correlations are used to measure the standardized covariance between observations and predictions. Ranked probability skill scores (RPSSs; Wilks, 2011) are also evaluated to determine categorical skill based on probabilistic predictions. Here, the data are split into three equal terciles representing below-normal, near-normal, and above-normal conditions. A perfect prediction yields an RPSS of 100 %, and a prediction with less skill than climatology (long-term averages) yields an RPSS of less than zero. Median RPSS values from all 29 years are reported.

Box plots of grid-level RPSS (%) for 10 dynamical models from NMME (a) before and (b) after bias correction, labeled with the same number as listed in the context. Note that for each box plot, the line inside the box is the median, the box edges represent the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers (outliers not shown).

Pearson correlations between grid-level observations and ensemble mean of bias-corrected predictions for 10 dynamical models from NMME, labeled with the same number as listed in the context.

Correlation coefficients (“Corr.”) and RPSS for (drop-one-year cross-validated) predictions at cluster level compared to observations under the C-I and NC-I scenarios.

Cluster C1 C2 C3 C4 C5 C6 C7 C8 Non-cluster Corr. 0.137 -0.027 0.171 0.184 0.514 -0.157 0.353 -0.108 0.297 RPSS (%) 22.88 -26.14 33.32 12.74 10.02 -43.61 20.92 -26.40 13.25

Results Statistical model predictions

Correlations between cluster-level model predictions and observations range from -0.16 to 0.51, with Cluster 5 having the highest correlation and Cluster 6 the lowest (Table 2). In approximately one-fifth of the 29 years, the observation falls outside the prediction envelope (Fig. 7), indicating model overfitting and an inability of the predictors to capture precipitation variability. For RPSS, three out of eight clusters indicate superior prediction skill over climatology (Table 2). Improvement in terms of RPSS over the non-cluster scenario is evident for Clusters 1, 3, and 7. Although Cluster 5, in agriculturally rich central-northwestern Ethiopia (Fig. 2), shows a slightly deteriorated RPSS relative to the non-cluster scenario, it still performs outstandingly with the highest correlation and a positive RPSS value of 0.51 and 10 %, respectively. Clusters 2, 4, 6, and 8 show deteriorated RPSS compared to the non-cluster scenario, although those clusters are mainly regions outside Ethiopia and southern Ethiopia (Fig. 2) where water resources and agricultural activities are considerably less (Fig. 1).

Grid-level Pearson correlation and RPSS statistics.

Statistical model Grid-level correlations Grid-level RPSS Mean SD Significant corr. % Mean (%) SD (%) Positive RPSS % NC-D 0.128 0.258 19.3 % -5.21 27.0 42.8 % NC-I 0.063 0.186 3.13 % -2.26 14.6 43.9 % C-D 0.055 0.230 10.6 % -14.0 31.0 33.9 % C-I 0.080 0.205 12.3 % -9.93 29.3 43.7 % Dynamical model (1) -0.105 0.209 0.51 % -31.4 25.4 5.70 % (2) 0.133 0.171 6.26 % -14.2 24.6 27.0 % (3) 0.086 0.130 2.08 % -14.9 25.2 26.2 % (4) 0.027 0.156 0.38 % -14.4 19.3 22.6 % (5) 0.067 0.170 1.64 % -9.66 17.0 28.4 % (6) 0.139 0.165 6.53 % -5.66 16.7 38.1 % (7) 0.102 0.130 1.67 % -8.64 17.6 31.7 % (8) 0.009 0.185 0.90 % -10.3 14.8 26.7 % (9) 0.244 0.149 23.1 % -2.33 21.8 46.0 % (10) 0.244 0.149 21.2 % -1.09 16.8 48.9 %

At the grid scale, depending on the case (direct or indirect), and for different clusters, correlations between predictions and observations can favor the clustered case or the non-clustered case (Fig. 8). In general, the indirect model provides a smoother pattern of correlations, with grid cells showing a negative correlation in the direct case now improved to near or above zero (Fig. 8). For example, Cluster 5 under the indirect case illustrates a more consistent positive correlation within the cluster. Some parts of the study region reach a correlation over 0.6, such as central-northwestern Ethiopia (Cluster 5), which is consistent with the region of high cluster-level prediction skill. The percentage of grid cells with correlations passing the 95 % significance test is the highest for the NC-D case (Table 3); however, some locations demonstrate the lowest skills among all four scenarios.

Similar findings are evident by evaluating the RPSS, except for Cluster 8; instead of improving with increased RPSS in the indirect case, the grid-scale predictions deteriorate given poor cluster-level prediction (for the C-I case). However, the percentage of grid cells with positive RPSS values overall for the C-I case is still the second highest after the NC-I case (Table 3), indicating the indirect cases are superior in terms of the number of grid cells with improved skill compared to using climatology, particularly for grid cells associated with skillful intermediate cluster-level predictions. The predictions are most skillful for the same region of central-northwestern Ethiopia (Cluster 5; Fig. 9) with 87 % of its grid cells showing positive RPSS and a spatial-average RPSS value of 15 % under the C-I scenario (Table 4).

Dynamical model predictions

The RPSS values based on the prediction ensembles of each dynamical model improve remarkably after bias correction. The median RPSS values over all the grid cells are now close to zero (Fig. 10) with two models, NASA-GMAO and NCEP-CFSv2, showing the highest RPSS value (-2.3 and -1.1 %, respectively; Table 3). These two dynamical models also exhibit generally higher grid-level correlations over the study region (averaging 0.24 for both models; Table 3 and Fig. 11), as compared with other NMME models. The two best-performing dynamical models after bias correction show advantage over statistical models, as assessed by correlation and RPSS metrics; however, all other dynamical models are inferior to the statistical models under the NC-D and C-I scenarios, particularly given the percent of grid cells with significant correlation and positive RPSS metrics (Table 3).

Grid-level Pearson correlation and RPSS statistics for grid cells within Cluster 5.

Statistical model Grid-level correlations Grid-level RPSS Mean SD Significant corr. % Mean (%) SD (%) Positive RPSS % NC-D 0.378 0.211 60.7 % 19.1 22.9 80.3 % NC-I 0.265 0.111 12.8 % 8.33 14.8 70.3 % C-D 0.229 0.244 30.5 % 6.91 24.1 62.3 % C-I 0.346 0.167 55.4 % 14.5 13.1 87.0 % Dynamical model (9) 0.353 0.110 46.8 % 8.21 18.2 65.7 % (10) 0.248 0.130 18.4 % 3.92 16.2 59.5 %

Within a certain cluster, statistical models may perform better than all dynamical models. For example, for Cluster 5, all statistical models show higher average RPSS values than those of all dynamical models (Table 4). The percentage of grid cells with significant correlation reaches 61 % for the statistical model under the NC-D scenario, compared to the highest value of 47 % among all the dynamical models. Similarly, the percentage with positive RPSS achieves 87 % under the C-I scenario as opposed to 66 % for dynamical models. Note that the dynamical models also produce raw predictions in a lower spatial resolution (1∘ × 1∘) than the statistical models (0.1∘ × 0.1∘) and require bias correction to guarantee comparable skills.

Conclusions and discussion

This study demonstrates the potential for applying season-ahead large-scale climate information to predict high-resolution precipitation using a statistical modeling approach. Skillful and credible predictions are produced for some regions in western Ethiopia, particularly under a clustered indirect statistical approach. At the regional scale, the approach shows promise for northwestern Ethiopia (Clusters 1, 3, 5, and 7), particularly compared to current NMA operational forecasts, which are only moderately more skillful than climatology (Korecha and Sorteberg, 2013). The regional average RPSS in this study under the clustered case ranges from 10 to 33 % for northwestern Ethiopia, as opposed to values under 6 % for NMA operational forecast (Korecha and Sorteberg, 2013). The approach adopted here also advances on previous studies (Gissila et al., 2004; Block and Rajagopalan, 2007; Korecha and Barnston, 2007; Diro et al., 2011b; Segele et al., 2015) by first applying an objective cluster analysis and then conditionally constructing high-resolution predictions. A unique set of predictors is applied to each cluster, which contributes to superior prediction performance at cluster levels in northwestern Ethiopia, as compared with predictions from the non-clustered approach. Grid-level prediction under the clustered indirect case also reduces the effect of overfitting relative to the direct case and improves negative RPSS values to near or above zero; that said, the non-clustered direct case also illustrates higher correlation and RPSS values on average.

A total of 2 out of 10 NMME dynamical models, NASA-GMAO and NCEP-CFSv2, demonstrate overall superior performance to the statistical models; however, for certain regions such as Cluster 5, the performance of statistical models under the clustered indirect and non-clustered direct cases is still superior. It is also worth noting that the statistical model predictions are at a 100 times finer spatial resolution than the dynamical models, providing additional advantages at the local scale, when skillful. Nevertheless, improvements in dynamical models continue and their application to seasonal precipitation prediction is likely to grow (e.g., Palmer et al., 2004; Saha et al., 2006; Lim et al., 2009).

Relatively poor prediction performance is evident in some locations such as southwestern Ethiopia and regions outside Ethiopia, where the hydroclimatic processes that produce precipitation might be driven by local factors or other regional climate patterns rather than large-scale climate variables identified in this study. A previous study (Zhang et al., 2016) has shown that the influence of ENSO on JJAS precipitation in western Ethiopia decreases generally from north to south, and is likely one of the reasons why skills are relatively low in southwestern Ethiopia. Cluster 5 was also identified with the strongest connection to equatorial Pacific SST (Zhang et al., 2016), which is consistent with the highest skill found in this study. Other regions with low prediction skill show relatively strong connections to SST in neighboring oceanic regions. However, connections with those climate patterns appear to be less robust than with ENSO, making the predictions in their associated regions less skillful. This is also consistent with the findings from other studies that even though all three oceans (Indian, Atlantic, and Pacific) affect the JJAS precipitation in western Ethiopia, the Pacific Ocean still plays the greatest role (Segele et al., 2009; Omondi et al., 2013).

The southwest Asian monsoon over the Indian Ocean may also be critical in determining the precipitation, given that the clusters with better prediction skills lie along the pathway of the monsoon. Based on the global concurrent correlation maps between JJAS precipitation and SLP for each cluster, Clusters 5 and 7 – the two clusters with the best skills – are the only ones that are strongly and negatively correlated with SLP near Bombay, and in the meantime strongly and positively correlated with the SLP at the eastern equatorial Pacific Ocean. The former indicates that a strong southwest Asian monsoon is associated with higher JJAS precipitation amount, and vice versa. The latter indicates that a high surface pressure over the eastern equatorial Pacific Ocean often accompanied by cold SST and a raining pattern – a La Niña phenomenon – also brings higher JJAS precipitation to western Ethiopia, and vice versa. Cluster 2 – one of the worst predicted clusters – shows moderately strong negative correlation with SLP near Bombay; however, it is also correlated strongly and negatively with SLP in the southern Indian Ocean (a high pressure system that drives the monsoon toward the low pressure system near Bombay), indicating that high JJAS precipitation in Cluster 2 is not necessarily associated with a strong southwest Asian monsoon. Moreover, its correlation with SLP over the equatorial Pacific Ocean is nonsignificant. Considering, in general, El Niño suppresses the monsoon and La Niña increases it (Kumar et al., 2006), strong correlations with both ENSO and the monsoon in the correct direction, such as for Clusters 5 and 7, indicate a double insurance over their association with the southwest Asian monsoon. Therefore, clusters which are more affected by the southwest Asian monsoon over the Indian Ocean, particularly coupled with the influence of ENSO, are likely to be more promising in their prediction skills.

Additional prediction features also warrant future attention, including longer prediction lead times and evaluation of other relevant characteristics (e.g., intraseasonal dry spells, seasonal onset or cessation). As observational datasets continue to grow, data-driven cluster analyses and statistical modeling approaches may be expected to improve. Careful analysis of possible significant trends in the data is also warranted; a region with a relatively high correlation may be selected solely based on trends in predictors and observations. For shorter time series, such as the data used in this study, trend analysis may not be reliable; detrending can also reduce evidence of large-scale decadal climate signals. Improving predictive capabilities may not be a complete panacea, but it can continue to be an important part of a decision maker's portfolio as they cope with hydroclimatic variability and its inherent risks.

The National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis dataset can be accessed through the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory (ESRL) website (https://www.esrl.noaa.gov/psd/data/reanalysis/).

The NMME hindcasts are available through the International Research Institute for Climate and Society (IRI) website (http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/).

The gridded precipitation dataset in western Ethiopia is available upon request from NMA (http://www.ethiomet.gov.et/).

The authors declare that they have no conflict of interest.

This article is part of the special issue “Sub-seasonal to seasonal hydrological forecasting”. It does not belong to a conference.

Acknowledgements

This study was supported by NASA project NNX14AD30G and NSF PIRE project 1545874. We acknowledge the National Meteorological Agency of Ethiopia for sharing data. We also want to thank the reviewers for their suggestions in improving this work. Edited by: Quan J. Wang Reviewed by: Yared Bayissa and Lars Gerlitz

References 1

Anderson, D., Stockdale, T., Balmaseda, M., Ferranti, L., Vitart, F., Molteni, F., Doblas-Reyes, F., Mogensen, K., and Vidard, A.: Development of the ECMWF seasonal forecast System 3, ECMWF Technical Memoranda, 1–56, 2007.

Awulachew, S. B., Yilma, A. D., Loulseged, M., Loiskandl, W., Ayana, M., and Alamirew, T.: Water Resources and Irrigation Development in Ethiopia. Colombo, Sri Lanka: International Water Management Institute, 78 pp., Working Paper 123, 2007.

Badr, H. S., Zaitchik, B. F., and Dezfuli, A. K.: A tool for hierarchical climate regionalization, Earth Sci. Inform., 8, 949–958, 2015.

Barrett, C. B.: The Development Of The Nile Hydrometeorological Forecast System1, Wiley Online Library, 1993.

Bekele, F.: Ethiopian Use of ENSO Information in Its Seasonal Forecasts, Internet Journal of African Studies, available at: https://www.bradford.ac.uk/research-old/ijas/ijasno2/bekele.html (last access: 15 June 2016), 1997.

Black, E., Slingo, J., and Sperber, K. R.: An observational study of the relationship between excessively strong short rains in coastal East Africa and Indian Ocean SST, Mon. Weather Rev., 131, 74–94, 2003.

Block, P. and Goddard, L.: Statistical and Dynamical Climate Predictions to Guide Water Resources in Ethiopia, J. Water Res. Pl., 138, 287–298, 2012.

Block, P. J. and Rajagopalan, B.: Interannual Variability and Ensemble Forecast of Upper Blue Nile Basin Kiremt Season Precipitation, J. Hydrometeorol., 8, 327–343, 2007.

Block, P. and Rajagopalan, B.: Statistical–Dynamical Approach for Streamflow Modeling at Malakal, Sudan, on the White Nile River, J. Hydrol. Eng., 14, 185–196, 2009.

Block, P. J., Filho, F. A. S., Sun, L., and Kwon, H. H.: A Streamflow Forecasting Framework Using Multiple Climate and Hydrological Models, J. Am. Water Resour. As., 45, 828–843, 2009.

Camberlin, P.: Rainfall Anomalies in the Source Region of the Nile and Their Connection with the Indian Summer Monsoon, J. Climate, 10, 1380–1392, 1997.

Camberlin, P. and Philippon, N.: The East African March–May Rainy Season: Associated Atmospheric Dynamics and Predictability over the 1968–97 Period, J. Climate, 15, 1002–1019, 2002.

Chen, J., Brissette, F. P., Chaumont, D., and Braun, M.: Finding appropriate bias correction methods in downscaling precipitation for hydrologic impact studies over North America, Water Resour. Res., 49, 4187–4205, 2013.

Dinku, T., Hailemariam, K., Maidment, R., Tarnavsky, E., and Connor, S.: Combined use of satellite estimates and rain gauge observations to generate high-quality historical rainfall time series over Ethiopia, Int. J. Climatol., 34, 2489–2504, 2014.

Diro, G. T., Black, E., and Grimes, D. I. F.: Seasonal forecasting of Ethiopian spring rains, Meteorol. Appl., 15, 73–83, 2008.

Diro, G. T., Grimes, D. I. F., Black, E., O'Neill, A., and Pardo-Iguzquiza, E.: Evaluation of reanalysis rainfall estimates over Ethiopia, Int. J. Climatol., 29, 67–78, 2009.

Diro, G. T., Grimes, D. I. F., and Black, E.: Teleconnections between Ethiopian summer rainfall and sea surface temperature: part I – observation and modelling, Clim. Dynam., 37, 103–119, 2011a.

Diro, G. T., Grimes, D. I. F., and Black, E.: Teleconnections between Ethiopian summer rainfall and sea surface temperature: part II. Seasonal forecasting, Clim. Dynam., 37, 121–131, 2011b.

Elagib, N. A. and Elhag, M. M.: Major climate indicators of ongoing drought in Sudan, J. Hydrol., 409, 612–625, 2011.

Gerlitz, L., Vorogushyn, S., Apel, H., Gafurov, A., Unger-Shayesteh, K., and Merz, B.: A statistically based seasonal precipitation forecast model with automatic predictor selection and its application to central and south Asia, Hydrol. Earth Syst. Sci., 20, 4605–4623, 10.5194/hess-20-4605-2016, 2016

Gissila, T., Black, E., Grimes, D. I. F., and Slingo, J. M.: Seasonal forecasting of the Ethiopian summer rains, Int. J. Climatol., 24, 1345–1358, 2004.

Goddard, L. and Graham, N. E.: Importance of the Indian Ocean for simulating rainfall anomalies over eastern and southern Africa, J. Geophys. Res.-Atmos., 104, 19099–19116, 1999.

Hammer, G. L., Nicholls, N., and Mitchell, C.: Applications of Seasonal Climate Forecasting In Agricultural and Natural Ecosystems, Dordrecht, Kluwer Academic Publishers, 2000.

Hertig, E. and Jacobeit, J.: Predictability of Mediterranean climate variables from oceanic variability. Part II: Statistical models for monthly precipitation and temperature in the Mediterranean area, Clim. Dynam., 36, 825–843, 2011.

Ines, A. V. M. and Hansen, J. W.: Bias correction of daily GCM rainfall for crop simulation studies, Agr. Forest Meteorol., 138, 44–53, 2006.

Jolliffe, I. T.: Principal Component Analysis, 2nd ed., New York, Springer-Verlag, 2002.

Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., and Woollen, J.: The NCEP/NCAR 40-year reanalysis project, B. Am. Meteorol. Soc., 77, 437–471, 1996.

Kassahun, B.: Weather systems over Ethiopia, Proc. First Tech. Conf. on Meteorological Research in Eastern and Southern Africa, Nairobi, Kenya, UCAR, 1987.

Kinuthia, J. H.: Horizontal and Vertical Structure of the Lake Turkana Jet, J. Appl. Meteorol., 31, 1248–1274, 1992.

Kirtman, B. P., Min, D., Infanti, J. M., Kinter, J. L., Paolino, D. A., Zhang, Q., Van Den Dool, H., Saha, S., Mendez, M. P., Becker, E., Peng, P., Tripp, P., Huang, J., Dewitt, D. G., Tippett, M. K., Barnston, A. G., Li, S., Rosati, A., Schubert, S. D., Rienecker, M., Suarez, M., Li, Z. E., Marshak, J., Lim, Y.-K., Tribbia, J., Pegion, K., Merryfield, W. J., Denis, B., and Wood, E. F.: The North American Multimodel Ensemble: Phase-1 Seasonal-to-Interannual Prediction; Phase-2 toward Developing Intraseasonal Prediction, B. Am. Meteorol. Soc., 95, 585–601, 2014.

Korecha, D. and Barnston, A. G.: Predictability of June–September Rainfall in Ethiopia, Mon. Weather Rev., 135, 628–650, 2007.

Korecha, D. and Sorteberg, A.: Validation of operational seasonal rainfall forecast in Ethiopia, Water Resour. Res., 49, 7681–7697, 2013.

Kumar, K. K., Rajagopalan, B., Hoerling, M., Bates, G., And Cane, M.: Unraveling the Mystery of Indian Monsoon Failure During El Niño, Science, 314, 115–119, 2006.

Landman, W. A. and Mason, S. J.: Operational long-lead prediction of South African rainfall using canonical correlation analysis, Int. J. Climatol., 19, 1073–1090, 1999.

Latif, M., Dommenget, D., Dima, M., and Grötzner, A.: The role of Indian Ocean sea surface temperature in forcing east African rainfall anomalies during December-January 1997/98, J. Climate, 12, 3497–3504, 1999.

Lim, E.-P., Hendon, H. H., Hudson, D., Wang, G., and Alves, O.: Dynamical Forecast of Inter–El Niño Variations of Tropical SST and Australian Spring Rainfall, Mon. Weather Rev., 137, 3796–3810, 2009.

Mason, S.: Seasonal forecasting of South African rainfall using a non?linear discriminant analysis model, Int. J. Climatol., 18, 147–164, 1998.

Mutai, C. C., Ward, M. N., and Colman, A. W.: Towards the prediction of the East Africa short rains based on sea-surface temperature–atmosphere coupling, Int. J. Climatol., 18, 975–997, 1998.

Nicholson, S. E.: A review of climate dynamics and climate variability in Eastern Africa, in: The Limnology, Climatology and Paleoclimatology of the East African Lakes, edited by: Johnson, T. C. and Odada, E. O., Australia, Gordon and Breach, 25–56, 1996.

Nicholson, S. E.: The Predictability of Rainfall over the Greater Horn of Africa. Part I: Prediction of Seasonal Rainfall, J. Hydrometeorol., 15, 1011–1027, 2014.

NMSA: Climate and agroclimatic resources of Ethiopia, NMSA Meteorological Research Report Series, Addis Ababa, National Meteorological Services Agency of Ethiopia, Ethiopia, 1996.

Omondi, P., Ogallo, L. A., Anyah, R., Muthama, J. M., and Ininda, J.: Linkages between global sea surface temperatures and decadal rainfall variability over Eastern Africa region, Int. J. Climatol., 33, 2082–2104, 2013.

Palmer, T. N., Alessandri, A., Andersen, U., Cantelaube, P., Davey, M., Délécluse, P., Déqué, M., Díez, E., Doblas-Reyes, F. J., Feddersen, H., Graham, R., Gualdi, S., Guérémy, J.-F., Hagedorn, R., Hoshen, M., Keenlyside, N., Latif, M., Lazar, A., Maisonnave, E., Marletto, V., Morse, A. P., Orfila, B., Rogel, P., Terres, J. M., and Thomson, M. C.: Development of a European Multi-Model Ensemble System for Seasonal to Inter-Annual Prediction (DEMETER), B. Am. Meteorol. Soc., 85, 853–872, 2004.

Parthasarathy, B., Kumar, K. R., and Munot, A. A.: Homogeneous Indian Monsoon rainfall: Variability and prediction, P. Indian As.-Earth, 102, 121–155, 1993.

Roeckner, E., Oberhuber, J. M., Bacher, A., Christoph, M., and Kirchner, I.: ENSO variability and atmospheric response in a global coupled atmosphere-ocean GCM, Clim. Dynam., 12, 737–754, 1996.

Saha, S., Nadiga, S., Thiaw, C., Wang, J., Wang, W., Zhang, Q., Van Den Dool, H. M., Pan, H. L., Moorthi, S., Behringer, D., Stokes, D., PeÑA, M., Lord, S., White, G., Ebisuzaki, W., Peng, P., and Xie, P.: The NCEP Climate Forecast System, J. Climate, 19, 3483–3517, 2006.

Schepen, A., Wang, Q. J., and Robertson, D. E.: Combining the strengths of statistical and dynamical modeling approaches for forecasting Australian seasonal rainfall, J. Geophys. Res.-Atmos., 117, D20107, 10.1029/2012JD018011, 2012.

Segele, Z. T. and Lamb, P. J.: Characterization and variability of Kiremt rainy season over Ethiopia, Meteorol. Atmos. Phys., 89, 153–180, 2005.

Segele, Z. T., Lamb, P. J., and Leslie, L. M.: Seasonal-to-Interannual Variability of Ethiopia/Horn of Africa Monsoon. Part I: Associations of Wavelet-Filtered Large-Scale Atmospheric Circulation and Global Sea Surface Temperature, J. Climate, 22, 3396–3421, 2009.

Segele, Z. T., Richman, M. B., Leslie, L. M., and Lamb, P. J.: Seasonal-to-Interannual Variability of Ethiopia/Horn of Africa Monsoon. Part II: Statistical Multi-Model Ensemble Rainfall Predictions, J. Climate, 28, 3511–3536, 10.1175/JCLI-D-14-00476.1, 2015.

Shanko, D. and Camberlin, P.: The effects of the Southwest Indian Ocean tropical cyclones on Ethiopian drought, Int. J. Climatol., 18, 1373–1388, 1998.

Shukla, S., Roberts, J., Hoell, A., Funk, C. C., Robertson, F., and Kirtman, B.: Assessing North American multimodel ensemble (NMME) seasonal forecast skill to assist in the early warning of anomalous hydrometeorological events over East Africa, Clim. Dynam., 1–17, 10.1007/s00382-016-3296-z, 2016.

Singh, A., Kulkarni, M. A., Mohanty, U. C., Kar, S. C., Robertson, A. W., and Mishra, G.: Prediction of Indian summer monsoon rainfall (ISMR) using canonical correlation analysis of global circulation model products, Meteorol. Appl., 19, 179–188, 2012.

Slingo, J., Spencer, H., Hoskins, B., Berrisford, P., and Black, E.: The meteorology of the Western Indian Ocean, and the influence of the East African Highlands, Philos. T. R. Soc. A, 363, 25-42, 2005.

Suárez-Moreno, R. and Rodríguez-Fonseca, B.: S4CAST v2.0: sea surface temperature based statistical seasonal forecast model, Geosci. Model Dev., 8, 3639–3658, 10.5194/gmd-8-3639-2015, 2015.

Tadesse, T.: The influence of the Arabian Sea storms/depressions over the Ethiopian weather, Proc. Int. Conf. on Monsoon Variability and Prediction, World Meteorological Organization, Geneva, Switzerland, 1994.

Teutschbein, C. and Seibert, J.: Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods, J. Hydrol., 456–457, 12–29, 2012.

Viste, E. and Sorteberg, A.: The effect of moisture transport variability on Ethiopian summer precipitation, Int. J. Climatol., 33, 3106–3123, 2013a.

Viste, E. and Sorteberg, A.: Moisture transport into the Ethiopian highlands, Int. J. Climatol., 33, 249–263, 2013b.

Wilks, D. S.: Statistical methods in the atmospheric sciences, 3rd ed., Burlington, Elsevier Science, Academic press, 2011.

Williams, A. P., Funk, C., Michaelsen, J., Rauscher, S. A., Robertson, I., Wils, T. H. G., Koprowski, M., Eshetu, Z., and Loader, N. J.: Recent summer precipitation trends in the Greater Horn of Africa and the emerging role of Indian Ocean sea surface temperature, Clim. Dynam., 39, 2307–2328, 2011.

Zhang, Y., Moges, S., and Block, P.: Optimal Cluster Analysis for Objective Regionalization of Seasonal Precipitation in Regions of High Spatial-Temporal Variability: Application to Western Ethiopia, J. Climate, 29, 3697–3717, 2016.

</app></app-group></back> </article>