Identifying ENSO Influences on Rainfall with Classification Models: Implications for Water Resource Management of Sri Lanka

Seasonal to annual forecasts of precipitation patterns are very important for water infrastructure management. In particular, such forecasts can be used to inform decisions about the operation of multipurpose reservoir systems in the face of changing climate conditions. Success in making useful forecasts is often achieved by considering climate teleconnections such as the El Niño-Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD) as related to sea surface temperature variations. We present a statistical analysis to explore the utility of using rainfall relationships in Sri Lanka with ENSO and IOD to predict rainfall in the Mahaweli and Kelani river basins of the country. Forecasts of rainfall as classes (flood, drought and normal) are helpful for water resource management decision making, and models that predict classes give better accuracy than predictions of absolute values. Quadratic discriminant analysis (QDA) and classification tree models are used to identify the patterns of rainfall classes with respect to ENSO and IOD indices. The ensemble modeling tool random forest is also used to predict the rainfall classes as drought and not drought with higher skill. These models can be used to forecast the areal rainfall using predicted climate indices. Results from these models are not very accurate; however, the patterns recognized are useful input to water resources management and to the adaptation of the agriculture and energy sectors to climate variability.


Introduction
The spatial and temporal uncertainty of water availability is one of the major challenges in water resource management. Understanding patterns and identifying trends in seasonal to annual precipitation are very important for water infrastructure management. In particular, forecasts that incorporate such information can be used to inform decisions about the operation of multipurpose reservoir systems in the face of changing climate conditions. Success in making useful forecasts is often achieved by considering climate teleconnections such as the El Niño-Southern Oscillation (ENSO), as related to sea surface temperature variations and air pressure over the globe, using empirical data (Amarasekera et al., 1997; Denise et al., 2017; Korecha & Sorteberg, 2013; Seibert et al., 2017). Modes of variability of other tropical oceans can also be related to regional precipitation (Dettinger & Diaz, 2000; Eden et al., 2015; Maity & Kumar, 2006; Malmgren et al., 2005; Ranatunge et al., 2003; Suppiah, 1996; Ropelewski & Halpert, 1996). For example, the effect of the Indian Ocean Dipole (IOD) is identified as independent of the ENSO effect (Eden et al., 2015). Teleconnections of the Pacific Decadal Oscillation (PDO), Atlantic Multidecadal Oscillation (AMO), ENSO, and IOD to precipitation have been found by many studies around the globe. Variations of precipitation in the United States are explained by ENSO, PDO and AMO (Eden et al., 2015; National Oceanic and Atmospheric Administration, 2017; Ward, Eisner, Flörke, Dettinger, & Kummu, 2014), in African countries by ENSO, AMO and IOD (Reason et al., 2006), and in Southeast Asian countries by ENSO: Indonesia (Lee, 2015; Nur'utami & Hidayat, 2016), Thailand (Singhrattna et al., 2005), and China (Cao et al., 2017; Ouyang et al., 2014; Qiu et al., 2014), as well as Australia (Bureau of Meteorology, 2012; Verdon & Franks, 2005) and central and south Asia (Gerlitz et al., 2016).

Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-249. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 15 June 2018. © Author(s) 2018. CC BY 4.0 License.
The impact of ENSO and IOD on the position of the intertropical convergence zone (ITCZ) has been identified as a primary factor driving south Asian tropical climate variations. South Asian countries receive precipitation from two monsoons that follow the movements of the ITCZ in boreal summer (2°N) and boreal winter (8°S). The Southwest monsoon (summer monsoon) occurs during June-August and the Northeast monsoon (winter monsoon) during December-February (Schneider et al., 2014). Climate teleconnections have been studied for summer monsoons (Singhrattna et al., 2005; Surendran et al., 2015) and winter monsoons (Zubair & Ropelewski, 2006). A negative correlation of ENSO with the Indian summer monsoon has been identified (Jha et al., 2016; Surendran et al., 2015).
The objective of this study is to explore climate teleconnections to the two monsoons and the two inter-monsoons. Water resource management decisions typically are based on precipitation throughout the year, and it is extremely important to explore the possibility that rainfall might be related to teleconnection indices for which seasonal forecasts are available. Sri Lanka is a South Asian country that gets rainfall from two monsoons and two inter-monsoons. We explore ENSO and IOD climate teleconnections to Sri Lanka precipitation throughout the year. Past studies have identified climate teleconnections linking precipitation to climate indices for several months and monsoon seasons, and have shown the importance of these for forecasting rainfall in river basins (Chandimala & Zubair, 2007; Chandrasekara et al., 2003). We extend these analyses across monsoon and inter-monsoon seasons.
Although rainfall anomalies may be correlated strongly with teleconnection indices, the scatter in the data can be large, so predictions from regression models have high uncertainty. However, water managers may act on information about whether rainfall is expected to be abnormally low or high. We investigate river basin rainfall teleconnections to climate indices with classification models. If reasonably accurate relationships can be developed, they will be useful for water resources management. For example, in Sri Lanka decisions about allocations of water for irrigation and hydropower could be improved with estimates of when low rainfall seasons are likely.

and 8660 × 10⁶ m³ respectively (Manchanayake & Madduma Bandara, 1999). The Kelani river basin lies entirely inside the wet zone, whereas the Mahaweli river basin passes through all three climate zones (Figure 1).

Sri
The temporal pattern of rainfall in Sri Lanka can be divided into four seasons as follows.
(1) The Northeast monsoon (NEM) brings generally low precipitation across the country, most of it during January and February. The dry zone gets significant precipitation from the NEM, while the wet zone gets very little rainfall during this period.
(2) The whole country gets precipitation from the first inter-monsoon (FIM) during March and April. However, rainfall during this period is not very high across the country.
(3) The highest precipitation for the country is from the Southwest monsoon (SWM) during May to September. However, only the wet zone gets high precipitation during this season.
(4) The whole country gets precipitation from the second inter-monsoon (SIM) during October to December. Generally, precipitation from the SIM is higher than from the FIM.
The periods of the NEM and SIM are generally considered to be December to February and October to November, respectively (Department of Meteorology Sri Lanka, 2017; Malmgren et al., 2003; Ranatunge et al., 2003). However, considering the bulk amount of water received from the monsoon, we consider January and February as the period of the NEM and October to December as the period of the SIM.
Reflecting the rainfall seasons, the country has two agriculture seasons, "Yala" (April-September) and "Maha" (October-March). Because the dry zone gets minimal precipitation during the SWM, the agricultural systems (165,000 ha) developed under the Mahaweli multipurpose project depend on irrigation water during the Yala season.
The country depends on stored water to drive hydropower year round. The Mahaweli and Kelani hydropower plants, of 810 MW and 335 MW capacity, serve as peaking and contingency reserve power for the power system (Ceylon Electricity Board, 2015). Reservoir systems are managed to meet both irrigation and hydropower requirements.

Sub Basin Rainfall (Areal Rainfall)
Monthly rainfall data for the years 1950-2013 are used for the study (Ceylon Electricity Board, 2017). River basin rainfall was calculated using the Thiessen polygon method (Viessman, 2002). The Mahaweli river basin is divided into 16 Thiessen polygons and the Kelani river basin into 11 Thiessen polygons (Figure 1). We calculate the rainfall for the four seasons, NEM, FIM, SWM and SIM, for 64 years of historical data. Rainfall anomalies are calculated by subtracting the seasonal mean rainfall (Eq. (1)) and standardized anomalies are calculated by dividing the rainfall anomalies by the standard deviation (SD) (Eq. (2)).

$$r_a = r - \bar{r} \qquad \text{Eq. (1)}$$

$$r_{sa} = \frac{r_a}{SD} \qquad \text{Eq. (2)}$$

Where $r$ is the seasonal rainfall, $\bar{r}$ is the average of seasonal rainfall, $r_a$ is the rainfall anomaly and $r_{sa}$ is the standardized rainfall anomaly.
Standardized rainfall anomalies are divided into three classes: dry, average and wet (Table 1).
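The anomaly calculation of Eq. (1) and Eq. (2) and the three-class split can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the seasonal totals are hypothetical, the cut-offs of -0.5 and 0.5 are those of Table 1, and the use of the sample standard deviation is an assumption because the text does not say which form of SD was used.

```python
from statistics import mean, stdev


def standardized_anomalies(seasonal_rainfall):
    """Return (r - mean) / SD for each seasonal rainfall total."""
    avg = mean(seasonal_rainfall)
    # Sample standard deviation: an assumption, the text only says "SD".
    sd = stdev(seasonal_rainfall)
    return [(r - avg) / sd for r in seasonal_rainfall]


def rainfall_class(r_sa):
    """Map a standardized anomaly to dry / average / wet (cut-offs -0.5, 0.5)."""
    if r_sa < -0.5:
        return "dry"
    if r_sa < 0.5:
        return "average"
    return "wet"


# Hypothetical seasonal totals in mm, for illustration only (not study data).
rain_mm = [620.0, 810.0, 1040.0, 760.0, 1290.0, 540.0, 900.0, 980.0]
anoms = standardized_anomalies(rain_mm)
classes = [rainfall_class(a) for a in anoms]
```

In this sketch the boundary values follow Table 1: an anomaly of exactly -0.5 is treated as "average" and an anomaly of exactly 0.5 as "wet".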

ENSO & IOD Indices
The Multivariate ENSO Index (MEI) is based on sea-level pressure, zonal and meridional components of the surface wind, sea surface temperature, surface air temperature, and total cloudiness fraction of the sky (National Oceanic and Atmospheric Administration, 2017). The Indian Ocean Dipole (IOD) is an oscillation of sea surface temperature in the equatorial Indian Ocean between the Arabian Sea and the ocean south of Indonesia (Bureau of Meteorology Australia, 2017). The IOD is identified as relevant to the climate of Australia (Power et al., 1999) and of countries bordering the Indian Ocean in southern Asia (Chaudhari et al., 2013; Maity & Nagesh Kumar, 2006; Qiu et al., 2014; Surendran et al., 2015).

Statistical Analyses
QDA assumes that observations from each class are drawn from a Gaussian distribution. Substituting the Gaussian density function of the k-th class into Bayes' theorem and taking logarithms gives the quadratic discriminant function (James et al., 2013; Löwe et al., 2016) (Eq. (3)).

$$\delta_k(x) = -\frac{1}{2}(x-\mu_k)^{T}\,\Sigma_k^{-1}\,(x-\mu_k) - \frac{1}{2}\log|\Sigma_k| + \log\pi_k \qquad \text{Eq. (3)}$$

The covariance matrix ($\Sigma_k$), mean ($\mu_k$) and prior probability ($\pi_k$) for each class are estimated from the training data set. These values are inserted into the discriminant function together with the state variables $x$, and the class with the largest value of the function is selected. The number of parameters to be estimated for the QDA model with K classes and p predictors is $K\,p\,(p+1)/2$. The QDA model output is the probability that an observation of a climate category will fall into each of the rainfall classes.
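As an illustration of how the quadratic discriminant function is evaluated, the following Python sketch fits the class mean, covariance matrix and prior for two predictors and assigns an observation to the class with the largest discriminant value. It is a sketch under stated assumptions rather than the implementation used in the study: the function names are hypothetical, the (MEI, DMI) pairs are invented for the example, and the closed-form inverse restricts it to exactly two predictors.

```python
import math


def fit_class(points, n_total):
    """Estimate mean vector, 2x2 covariance matrix and prior for one class."""
    n = len(points)
    mu = [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]
    sxx = sum((p[0] - mu[0]) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mu[1]) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / (n - 1)
    return {"mu": mu, "cov": (sxx, sxy, syy), "prior": n / n_total}


def discriminant(x, params):
    """Quadratic discriminant value for a two-predictor observation x."""
    sxx, sxy, syy = params["cov"]
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - params["mu"][0], x[1] - params["mu"][1]
    # (x - mu)^T Sigma^-1 (x - mu), using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) + math.log(params["prior"])


def predict(x, model):
    """Return the class label with the largest discriminant value."""
    return max(model, key=lambda label: discriminant(x, model[label]))


# Hypothetical (MEI, DMI) seasonal means per class; illustrative values only.
training = {
    "dry": [(1.2, 0.6), (0.9, 0.4), (1.5, 0.3), (0.7, 0.8)],
    "wet": [(-1.1, -0.5), (-0.8, -0.2), (-1.4, -0.6), (-0.6, -0.7)],
}
n_total = sum(len(pts) for pts in training.values())
model = {label: fit_class(pts, n_total) for label, pts in training.items()}
```

With these invented values, `predict((1.0, 0.5), model)` selects the class whose mean is nearest in the Mahalanobis sense, adjusted by the covariance determinant and the prior, which is the quadratic decision rule described above.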

Classification Tree model
For the classification tree model the predictor space is divided into non-overlapping regions ($R_1, \ldots, R_J$). A classification tree predicts each observation as belonging to the most commonly occurring class of the training observations in its region (James et al., 2013).
The Gini index ($G$) is considered as the criterion for splitting into regions (James et al., 2013).

$$G = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk}) \qquad \text{Eq. (4)}$$

In Eq. (4), $\hat{p}_{mk}$ represents the fraction of training observations in the m-th region that belong to the k-th class. The Gini index is considered as a measure of node purity of the tree model, since small values of the index indicate that a node contains predominantly observations from a single class. The complexity of trees is adjusted using a pruning process to produce more interpretable results.
Tree models give the probability that an observation falls into each of the three rainfall classes. The predicted class is assigned based on the highest probability. Tree models handle ties of probability values by randomly assigning the class.
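The Gini index of Eq. (4) is simple to compute directly, as the short Python sketch below shows. The function names and the label lists are hypothetical and serve only to illustrate the purity measure; the size-weighted average used to score a candidate split is a common convention and is an assumption here, since the text does not state how splits were compared.

```python
from collections import Counter


def gini_index(labels):
    """Gini index: sum over classes of p_k * (1 - p_k) within one region."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1.0 - c / n) for c in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini index of a candidate split into two regions."""
    n = len(left) + len(right)
    return (len(left) * gini_index(left) + len(right) * gini_index(right)) / n


# Illustrative label sets: a pure node and an evenly mixed three-class node.
pure = ["dry"] * 6
mixed = ["dry", "dry", "average", "average", "wet", "wet"]
```

A pure node gives an index of zero, while three equally represented classes give 3 × (1/3)(2/3) = 2/3, the largest value possible for three classes.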

Random Forest
A random forest is an ensemble learning method used for classification and regression problems. The method builds a multitude of decision trees from the training data, and the final prediction is the majority vote of the ensemble for classification or its mean for regression (Breiman, 2001). Each tree is built from a bootstrap sample of the training data, using a random subset of the total number of predictors.

Similar to other investigators, we observe several strong correlations between rainfall anomalies and the climate indices (Table A.1, Appendix). For example, rainfall in the SWM is very important for stations in the wet zone of the country, which is the source of a large amount of water stored in reservoirs. Correlation coefficients between SWM rainfall at Norton Bridge and the climate indices are negative and strong, -0.31 for MEI (p=0.01) and -0.37 for DMI (p<0.01). The strength of the correlation notwithstanding, the residuals from a regression model indicate that high uncertainty would attach to any forecast (Fig. 3). Thus, we are led to explore the efficacy of classification methods (Appendix). We present classification results for two sub-basins, one that has the highest rainfall during the NEM, Manampitiya, and one that has the highest rainfall during the SWM, Norton Bridge (Figure 4). Norton Bridge represents the areal rainfall of reservoir catchments in the wet zone and Manampitiya represents the rainfall that contributes to irrigation tanks in the dry zone. Results for other sub-basins are presented in the supplementary materials (Appendix).
The SWM is a season when the wet zone receives the bulk of rainfall. At Norton Bridge, the occurrences of the dry rainfall anomaly class in the SWM are seen to "clump" in the region of relatively high MEI and DMI. Both the classification tree and the QDA successfully identify the pattern (Fig. 4(a) and 4(c)) with an overall accuracy of 73 %.

Classification trees are known to be unstable. That is, small changes in the observations can lead to large changes in the decision tree. The random forest approach overcomes the issue by building a "bag" of trees from bootstrap samples.
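The bootstrap sampling behind the "bag" of trees can be sketched as follows. Each tree is trained on a sample drawn with replacement, so some observations are never seen by that tree; these left-out observations are the "out-of-bag" set on which the tree can be tested. The Python sketch below only illustrates the sampling step, not a full random forest: the function name is hypothetical, the ensemble size of 500 is an assumption not stated in the text, and 64 is the number of yearly values in the 1950-2013 record.

```python
import random


def bootstrap_split(n_obs, rng):
    """Draw one bootstrap sample of indices; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    out_of_bag = sorted(set(range(n_obs)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_seasons = 64           # one seasonal value per year, 1950-2013
n_trees = 500            # assumed ensemble size, not given in the text

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_split(n_seasons, rng)
    oob_fractions.append(len(oob) / n_seasons)

# Average share of observations left out of a single tree's training sample.
mean_oob_fraction = sum(oob_fractions) / n_trees
```

For a sample of 64 observations the expected left-out share is (1 - 1/64)^64, roughly 0.37, so each tree is evaluated on about a third of the record that it was not trained on.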
The robustness of the model can then be checked by considering the "out-of-bag" error. The results of the random forest indicate that prediction of three rainfall anomaly classes using MEI and DMI is not feasible (Table 3). The out-of-bag error rate is close to two thirds, which for three categories is equivalent to a random selection. However, the results of the random forest for a classification as either "Dry" or "Not Dry" suggest that there may be skill in such a prediction. The out-of-bag error rates for this case range from 22 % to 38 % for Norton Bridge and Manampitiya (Table 3) and from 20 % to 39 % across all stations (Table A.6).

(Zubair, 2003; Chandimala & Zubair, 2007; Chandrasekara et al., 2017). The El Niño impact during the SWM is not as significant as it is during the NEM season (International Research Institute, 2017a). We find, however, that there is an interaction between the two teleconnection indices, MEI and IOD, for SWM rainfall. During the Yala season there is a high probability of having a drought when both the IOD and MEI are positive (Figure 5).
Also, the absence of drought is probable when both the IOD and MEI are negative (Figure 5).

Classification of wet, average, and dry rainfall anomalies using the MEI and DMI indices is successful. For example, a dry SWM season for Norton Bridge (Table 2) and other wet-zone stations (Table A.3) is classified correctly with greater than 70 % accuracy with QDA and tree models. However, a random forest approach demonstrates that there is little skill in identifying a full wet-average-dry classification. In contrast, a random forest model using only two rainfall categories shows more than 60 % accuracy in identifying "dry" and "not dry" classes of key rainfall seasons of the wet zone (Table 4, Table A.6). Australian Government (Bureau of Meteorology, 2017). ENSO and IOD predictions also carry uncertainty. Therefore, final forecast accuracy is a combination of the MEI and DMI forecast uncertainties and the model's accuracy rate in each class. Although overall prediction accuracy is not extremely high, a forecast of an anomalously low rainfall season can have value for risk-averse farmers (Cabrera et al., 2007) and can guide plans for hydropower management (Block & Goddard, 2012).
The electricity and agriculture sectors of Sri Lanka rely heavily on Mahaweli and Kelani river water resources, so season-ahead forecasts of abnormally low rainfall should be useful for decisions on adaptation measures. For example, water availability in the first three months of a growing season is important for crop selection and for the extent of land to be cultivated. Hydropower planning and scheduling of maintenance of the power plants can also benefit from season-ahead forecasts. The damage that can occur due to incorrect rainfall forecasts in the agriculture and energy sectors can be minimized with emergency planning during the season, which is the usual practice.
Although the accuracy of predicting low or not low seasonal rainfall is not very high, decisions based on forecasts that are improvements over climate averages should be an improvement over current practices. The accuracy of statistical models can be improved with longer records, which are important to train the classification models. Also, models can be fine-tuned for important shorter periods, such as crop planting and harvesting months, for irrigation water planning.

Conclusion
Teleconnections of the ENSO and IOD phenomena with river basin rainfall provide potentially useful information for water resource management. Relationships identified between teleconnection indices and river basin rainfall agree with other research findings. Prediction of seasonal rainfall classes from ENSO and IOD indices can inform water resources managers in reservoir operation planning for both hydropower and irrigation releases.
Sri Lanka is an island in the Indian Ocean (latitude 5°55′ N - 9°50′ N, longitude 79°40′ E - 81°53′ E). Mean annual rainfall varies from 880 mm to 5500 mm across the island. The rainfall distribution is determined by the monsoon system of the Indian Ocean interacting with the elevated land mass in the interior of the country. The country is divided into three climatic zones according to the rainfall distribution: wet zone (annual rainfall > 2500 mm), intermediate zone (1750 mm < rainfall < 2500 mm) and dry zone (rainfall < 1750 mm) (Department of Agriculture Sri Lanka, 2017). Sri Lanka, a water-rich country, has 103 river basins varying in area from 9 km² to 10448 km². A large fraction of the water resources management infrastructure of the country is associated with the Mahaweli and Kelani river basins. The catchment areas of the Mahaweli and Kelani are 10448 km² and 2292 km² respectively. The two rivers start from the central highlands. The Mahaweli, the longest river, travels 331 km to the ocean in the eastern direction and the Kelani 145 km in the western direction. Average annual discharge volume for the Mahaweli and Kelani basins are 26368 × 10⁶ m³

The Dipole Mode Index (DMI) is used to represent the IOD, capturing the sea surface temperature gradient between the western and eastern equatorial Indian Ocean. Data used for the analyses are MEI monthly data for the years 1950-2013 (Climate indices, NOAA, 2017) and DMI monthly data for the years 1950-2013 (HadISST dataset, Japan Agency for Marine-Earth Science and Technology, 2017). Averages of MEI and DMI values for the four rainfall seasons are used for the statistical analysis.
Seasonal values of MEI and DMI were used as the predictors to classify seasons into the three rainfall classes. The total data set is divided into 75 % for training the model and 25 % for testing model performance. Quadratic discriminant analysis (QDA) and classification trees were selected for the analyses. A random forest model was also applied to investigate the reliability of a cross-validated statistical forecast tool based on an advance estimate of MEI and DMI.
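The 75 % / 25 % division of the record can be illustrated with a short Python sketch. The text does not say how the split was drawn, so the random shuffle, the fixed seed and the function name below are all assumptions made for the example.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# One record per year of the 1950-2013 series (64 seasonal values per season).
years = list(range(1950, 2014))
train_years, test_years = train_test_split(years)
```

For the 64-year record this gives 48 seasons for training and 16 for testing in each rainfall season.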
Figure 2: Sub basin rainfall for (a) Morape, (b) Peradeniya, (c) Randenigala, (d) Bowatenna, (e) Laxapana, (f)

Results
Monthly rainfall boxplots of eight sub basins over the year for 1950-2013 illustrate the seasonal and spatial variation of rainfall patterns (Figure 2). The largest fraction of total rainfall in the dry zone occurs at the end of the SIM (December) and during the NEM (January-February), with correspondingly high variability, whereas there is little rainfall in the dry zone during the SWM (May-September), with correspondingly little variability (Figure 2 (h)). The intermediate zone receives approximately 60 % of total rainfall from the SIM and NEM. Although the variability of the rainfall is low in the intermediate zone, high rainfall can occur in all seasons (Figure 2 (c) and (d)). In the wet zone, a large portion of rainfall occurs in the SWM and the early months of the SIM (October-November). High variability of wet zone rainfall is observed at the end of the FIM (April), in the SWM (May-September), and at the start of the SIM (October) (Figure 2 (a), (b), (e), (f) and (g)).

Figure 3: Linear regression of rainfall anomaly on MEI and DMI. High values of MEI and DMI are associated with low values of rainfall.
19 and 16 correct out of 22 occurrences (Table 2). In the dry zone the NEM season is one of the most important for rainfall. At Manampitiya, the MEI provides the primary variable in the classification, with the dry anomaly class being correctly selected in 52 % of cases by the tree model and 95 % by the QDA model. The results suggest that it may be possible to identify seasons when it is expected to be anomalously dry. The correct classification of "average" conditions likely has less importance for water managers. We explored classification using two classes, "Dry" and "Not Dry." In this case, the classification model again correctly classifies 86 % of the anomalously dry cases and gets more than 69 % of the "Not Dry" cases correct (Figure 5).

Figure 4: Norton Bridge and Manampitiya rainfall classes (dry, average, wet) identified by ENSO and IOD
Zubair, L., and Ropelewski, C. F.: The strengthening relationship between ENSO and northeast monsoon rainfall over Sri Lanka and southern India, Journal of Climate, 19(8), 1567-1575, doi.org/10.1175/JCLI3670.1, 2006.

Appendix: Identifying ENSO Influences on Rainfall with Classification Models: Implications for Water Resource Management of Sri Lanka

Correlation coefficients between rainfall anomalies and MEI and DMI are negative for the NEM, FIM and SWM seasons and positive for the SIM season. Correlations of rainfall anomalies with the DMI are not as strong as the correlations with the MEI. However, there are strong correlations between DMI values and the rainfall anomalies of the major monsoon of each sub basin. For example, wet zone sub basins (Morape, Peradeniya, Laxapana, Norwood, Norton Bridge) have high correlation coefficients between SWM rainfall anomalies and DMI, while dry zone (Manampitiya) and intermediate zone (Randenigala, Bowatenna) sub basins have high correlation coefficients for NEM and SIM rainfall anomalies.

Figure A.2, Figure A.3, Figure A.4). Positive values of MEI and DMI result in a dry or average rainfall class for the NEM, FIM and SWM seasons. However, SIM rainfall falls in the wet or average class for positive values of MEI and DMI. The accuracy of the model results is high for the dominant monsoon rainfall seasons of each sub basin (Table A.2, Table A.3, Table A.4).

Figure A.6: Identifying relationships between two rainfall classes (dry, not dry) and MEI and DMI values using

A normality test for the rainfall data classes is done using the Shapiro-Wilk test. If the rainfall data are not normally distributed, natural logarithm, square root or square functions are used to transform the data into normally distributed data sets.

Table 1: Rainfall anomaly classification
Class      Range
dry        Minimum <= $r_{sa}$ < -0.5
average    -0.5 <= $r_{sa}$ < 0.5
wet        0.5 <= $r_{sa}$ <= Maximum

Table 2: Classification model results. Highlighted cells indicate where there may be information content with respect to forecasting either dry or wet anomaly classes, as judged by a classification success rate of at least 2/3.

Table 3: Random forest ensemble classification results

Table 4: Random forest ensemble classification results for two rainfall anomaly classes

Table 4, Table A.6). Similarly, for dry zone locations such as Manampitiya, the accuracy of dry rainfall class identification for the NEM and SIM seasons is about 60 % (Table 4, Table A.6). Our statistical classification models can be combined with MEI and DMI forecasts to indicate the season-ahead expectation for rainfall. ENSO forecasts are available from the International Research Institute for Climate and Society (International Research Institute, 2017b) and IOD forecasts are available from the Bureau of Meteorology (BOM),

Table A.1: Correlation between rainfall anomalies and MEI, DMI indices. High correlation coefficients are highlighted.
Table A.2, Table A.3, Table A.4). The ensemble model approach with random forest gives a comparatively lower out-of-bag error rate for the rainfall anomaly classification of the dominant monsoons (Table A.4). For example, for wet zone sub basins such as Norton Bridge, Norwood, Laxapana, Peradeniya and Morape, the random forest error rate is lower for the SWM and SIM seasons. Similarly, for the dry and intermediate zone sub basins Manampitiya, Randenigala and Bowatenna, the accuracy rate for NEM and SIM rainfall classes is higher than for the other rainfall seasons. Also, all three models have a higher accuracy rate in

Table A.3: QDA classification model results. Highlighted cells indicate where there may be information content with respect to forecasting either dry or wet anomaly classes.
Table A.4: Random forest model results. Highlighted cells indicate where there may be information content with respect to forecasting either dry or wet anomaly classes.