Analyzing the future climate change of Upper Blue Nile River Basin (UBNRB) using statistical downscaling techniques

1 Chair of Hydrology and River Basin Management, Faculty of Civil, Geo and Environmental Engineering, Techn, Arcisstrasse 21, 80333, Munich, Germany.
2 Amhara Regional State Water, Irrigation and Energy Development Bureau, Bahirdar, Ethiopia
Correspondence to: Dagnenet Fenta (dagnfenta@yahoo.com)

Abstract. Climate change is becoming one of the most debated and threatening global issues, given its effects on the environment and on socio-economic drivers. Its direct impact is critical for water resource development and, indirectly, for agricultural production, environmental quality, economic development and social well-being. However, the large uncertainty among Global Circulation Models (GCMs) and downscaling methods makes reliable conclusions for sustainable water management difficult. To understand the future climate change of the Upper Blue Nile River Basin, two widely used statistical downscaling techniques, LARS-WG and SDSM, were applied. Six CMIP3 GCMs (CSIRO-MK3, ECHAM5-OM, MRI-CGCM2.3.2, HadCM3, GFDL-CM2.1, CCSM3) were used with LARS-WG, while the HadCM3 GCM and canESM2 from the CMIP5 GCMs were used with SDSM for the climate change analysis.


Introduction
The impacts of climate change on the hydrological cycle in general, and on water resources in particular, are highly significant because all natural and socio-economic systems depend critically on water. The direct impacts of climate change include variation and changing patterns in water resource availability and in hydrological extremes such as floods and droughts, with many indirect effects on agriculture, food and energy production and overall water infrastructure (Ebrahim et al., 2013). The impact may be worse on trans-boundary rivers such as the Upper Blue Nile River, where competition for water among the economic, political and social interests of the riparian countries is intensifying and where runoff variability in upstream countries can greatly affect downstream countries (Kim, 2008; Semenov and Barrow, 1997).
According to IPCC (2007), between 75 and 250 million people in Africa are projected to be exposed to increased water stress due to climate change by 2020. The increasing water demand of upstream countries in the Nile Basin, coupled with climate change impacts, can affect the availability of water resources for downstream countries and within the basin, which could result in resource conflicts and regional insecurity. Moreover, climate variability (the way climate fluctuates yearly and seasonally above or below a long-term average value), caused by changes in forcing factors such as variation in the seasonal extent of the Inter Tropical Convergence Zone (ITCZ) and El Niño and La Niña events, is already imposing a significant challenge on Ethiopia by affecting food security, water and energy supply, poverty reduction and sustainable socio-economic development efforts. To mitigate these challenges, the Ethiopian government has carried out a series of studies identifying the irrigation and hydropower potential and the use of the extensive water resources of the basin (BCEOM, 1998; USBR, 1964; WAPCOS, 1990). As a result, large-scale irrigation and hydropower projects, including the Grand Ethiopian Renaissance Dam (GERD), the largest hydroelectric power plant in Africa, have been identified and are being constructed as mitigation measures for the impacts of climate change. However, most studies gave little emphasis to climate change and its impact on the hydrology of the basin. Identifying the local impacts of climate change at basin level is therefore important, especially in the UBNRB, for the sustainability of large-scale water resource development projects, for proper water resource management leading to regional security, and for finding possible mitigation measures; otherwise the consequences could become catastrophic.
To this end, several studies have examined the impacts of climate change on the water resources of the Upper Blue Nile River Basin. Taye et al. (2011) reviewed some of this research and concluded that clear discrepancies exist, particularly in the projection of precipitation. For instance, Bewket and Conway (2007), Conway (2000) and Gebremicael et al. (2013) found no significant change in the amount of rainfall and no consistent patterns or trends. Kim (2008) used the outputs of six GCMs to project future precipitation and temperature; the results suggested that changes in mean annual precipitation from the six GCMs range from -11 % to 44 %, with a change of 11 % from the weighted average scenario in the 2050s. The changes in mean annual temperature range from 1.4 °C to 2.6 °C, with a change of 2.3 °C from the weighted average scenario.
Likewise, Yates and Strzepek (1998a) used three GCMs and found changes in precipitation ranging from -5 % to 30 % and changes in temperature ranging from 2.2 °C to 3.5 °C. Yates and Strzepek (1998b) used six GCMs and found changes ranging from -9 % to 55 % for precipitation, with temperature increases from 2.2 °C to 3.7 °C. Elshamy et al. (2009) used 17 GCMs and found that changes in total annual precipitation range between −15 % and +14 %, but the ensemble mean of all models showed almost no change in annual total rainfall, while all models predicted a temperature increase of between 2 °C and 5 °C. Gebre and Ludwig (2014) used five bias-corrected GCMs at 50 km x 50 km spatial resolution for the RCP4.5 and RCP8.5 scenarios to downscale the future climate of four watersheds (Gilgel Abay, Gumara, Ribb and Megech) in the Tana sub-basin of the UBNRB for the 2030s and 2050s. The results suggested that the five selected GCMs disagree on the direction of future precipitation change, but that multi-model average monthly and seasonal precipitation may generally increase over the watersheds.
For the historical context, the discrepancies could be due to the period and length of the data analyzed, the failure to consider stations that represent the spatial variability of the basin, and errors in the observed data. For the future context, in addition to the reasons above, discrepancies could be due to differences in the GCMs and scenarios used for downscaling, the downscaling techniques applied (dynamical or statistical), the selection of representative predictors, the period of analysis, and the spatial and temporal resolution of the observed and predictor datasets.
To address uncertainty in projected climate changes, the IPCC (2014) recommends using a large ensemble of climate change scenarios produced from various combinations of Atmosphere-Ocean General Circulation Models (AOGCMs) and forcing scenarios. However, assessing climate change with many climate change scenarios and many statistical downscaling models simultaneously can become prohibitively time consuming. As a result, researchers typically assess climate change and its impacts under only one or a few climate change scenarios selected arbitrarily and without justification, for instance using only the A1B and A2 scenarios. Yet there is no hard rule for selecting an appropriate subset of climate change scenarios from the wide range of possibilities (Casajus et al., 2016).
GCMs perform reasonably well at larger spatial scales but poorly at finer spatial and temporal scales, especially for precipitation, which is of interest to hydrological impact analysis (Goly et al., 2014). Climate models are therefore usually responsible for high uncertainty in climate change impact analysis. The downscaling process, which narrows the scale discrepancy between the coarse-scale GCMs and the local-scale climate variables required by hydrological models, should also be investigated for its contribution to this uncertainty, which previous studies of climate change in the UBNRB have not done. Many downscaling models have been developed in the past decade to bridge the resolution gap between coarse-resolution GCMs and the local scale required by hydrological models for impact studies. Many researchers have compared the skill of downscaling methods in different study areas (Dibike and Coulibaly, 2005; Ebrahim et al., 2013; Fiseha et al., 2012; Goodarzi et al., 2015; Hashmi et al., 2011; Khan et al., 2006; Qian et al., 2004; Wilby et al., 2004; Wilby and Wigley, 1997; Xu, 1999). However, no single model has been found to perform well over all regions and time scales. Thus, evaluation of different models is critical to understand the applicability of the existing models, although it remains difficult to directly compare the skill of different downscaling models (Goly et al., 2014).
Apart from the GCMs and downscaling techniques, most previous studies (e.g. Beyene et al., 2010; Elshamy et al., 2009; Kim, 2008) used CRU, NFS and other gridded datasets constructed by interpolating a few stations in Ethiopia, which are less accurate than station-based data (Worqlul et al., 2014). Therefore, the objectives of this study are i) to evaluate the comparative performance of two widely used statistical downscaling techniques, the Long Ashton Research Station Weather Generator (LARS-WG) and the Statistical Down Scaling Model (SDSM), over the UBNRB, and ii) to downscale future climate scenarios of precipitation, maximum temperature (Tmax) and minimum temperature (Tmin) at an acceptable spatial and temporal resolution, so that they can be used directly for further hydrological impact studies. This is achieved by including systematically selected multiple GCMs and two downscaling methods, and by incorporating an acceptable number of weather stations with long time series and reliable observed climate data. The approach accounts for the uncertainties coming from the GCMs and the downscaling methods, and overcomes a shortcoming of previous studies in the study area by minimizing the errors coming from the less accurate gridded datasets.
Generally, downscaling methods are classified into dynamic and statistical downscaling (Fowler et al., 2007; Wilby et al., 2002). Dynamic downscaling nests higher-resolution Regional Climate Models (RCMs) into coarse-resolution GCMs to produce a complete set of meteorological variables that are consistent with each other. The outputs of this method are still not at the scale that hydrological models require. Statistical downscaling overcomes this challenge; moreover, it is computationally undemanding, simple to apply and allows uncertainty analysis (Dibike et al., 2005; Semenov et al., 1997; Wilby et al., 2002). Extensive details on the strengths and weaknesses of the two methods can be found in (Wilby et al., 2007; Wilby et al., 1997). Among the different possibilities, two well-recognized statistical downscaling tools were chosen for this study: the regression-based Statistical Down-Scaling Model (SDSM) (Wilby et al., 2002) and a stochastic weather generator, the Long Ashton Research Station Weather Generator (LARS-WG) (Semenov et al., 1997; Semenov et al., 2002). They have been tested in various regions and under different climatic conditions (e.g. Chen et al., 2013; Dibike et al., 2005; Dile et al., 2013; Elshamy et al., 2009; Fiseha et al., 2012; Hashmi et al., 2011; Hassan et al., 2014; Maurer and Hidalgo, 2008; Yimer et al., 2009).

Description of Study Area
The (Elshamy et al., 2009). The whole UBNRB covers an area of 199,812 km² (BCEOM, 1998). For this study, the Rahad, Gelegu and Dinder sub-catchments, which do not flow through the main river stem to Sudan, are excluded. The resulting basin area is 176,000 km², which is about 15 % of the total area of 1.12 million km² (Awulachew et al., 2007).

Local data sets
The historical precipitation and maximum and minimum temperature data for the study area were obtained from the Ethiopian Meteorological Agency (EMA) and were analyzed and subjected to quality control. Almost all available stations had considerable gaps in their time series; hence 15 rainfall and 25 temperature stations with long time series and relatively few missing records were selected. Filling the missing records was the first task in the meteorological data analysis, and was done using the well-known inverse distance weighting (IDW) method. To check the quality of the data, Double Mass Curve (DMC) analysis was used. A DMC plots the accumulated totals of the gauge in question against the corresponding accumulated totals for a representative group of nearby gauges.
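The gap-filling step can be illustrated with a minimal Python sketch of inverse distance weighting. The function name `idw_fill`, the planar station coordinates and the exponent of 2 are assumptions made for illustration only; the study does not state the exponent or the distance measure it used.

```python
import math

def idw_fill(target_xy, neighbours, power=2.0):
    """Estimate a missing value at target_xy from neighbouring stations.

    neighbours is a list of (x, y, value) tuples; a value of None marks a
    neighbour that is also missing on that day and is skipped.
    The exponent `power` = 2 is an assumed, commonly used default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in neighbours:
        if value is None:
            continue
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # co-located station: use its value directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no neighbour has data, so the gap stays open
    return weighted_sum / weight_total
```

Nearer stations receive larger weights, so a neighbour at half the distance contributes four times as much when the exponent is 2.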

Large scale datasets
High uncertainty is expected in climate change impact studies if the simulation results rely on a single GCM, because each GCM has a different spatial and temporal resolution and different assumptions about atmospheric processes (Kim and Kaluarachchi, 2009). Hence, a new version of LARS-WG (5.5) was applied in this study. It incorporates predictions from 15 GCMs used in the IPCC's Fourth Assessment Report (AR4), based on the Special Report on Emissions Scenarios (SRES) B1, A1B and A2 scenarios, for three time windows. Because it is difficult to process all 15 incorporated CMIP3 GCMs, and because large differences in the predicted climate variables are expected among the GCMs, the performance of the GCMs in simulating the current climate of the study area (UBNRB) was evaluated and the best-performing GCMs were selected. The MAGICC/SCENGEN software tool was used to evaluate the performance of the 15 GCMs in the LARS-WG 5.5 database, as it is a standard method for selecting models on the basis of their ability to accurately represent the current climate, for a particular region and/or for the globe.
In this study, we used a semi-quantitative skill score that rewards relatively good models and penalizes relatively bad models, as suggested in the user manual (Wigley, 2008).
The reasons for selecting these two GCMs were that they make daily predictor variables freely available in a form that can be fed directly into SDSM, and that they cover the study area at a better resolution. Additionally, HadCM3 is the most widely used GCM in previous studies (Dibike et al., 2005; Dile et al., 2013; Hassan et al., 2014; Yimer et al., 2009); it ranked first in the performance evaluation done with the MAGICC/SCENGEN tool, and its downscaled results match the ensemble mean of the six GCMs used in the LARS-WG model. Furthermore, the two GCMs represent two different scenario generations describing the future amount of greenhouse gases (GHGs) in the atmosphere. HadCM3 used the emission scenarios A2 (separated world scenario), in which the CO2 concentration is projected to be 414 ppm, 545 ppm and 754 ppm, and B2 (the world of technological inequalities), in which the CO2 concentration is expected to be 406 ppm, 486 ppm and 581 ppm, for the 2020s, 2050s and 2080s respectively (Semenov and Stratonovitch, 2010); these scenarios were used in CMIP3 for the IPCC's AR4 (IPCC, 2007). canESM2, in contrast, represents the latest and a wide range of plausible radiative forcing scenarios, which were used for the IPCC's AR5 (IPCC, 2014): a very low forcing level (RCP2.6), in which radiative forcing peaks at approximately 3 W m⁻² (approximately 490 ppm CO2 equivalent) before 2100 and then declines to 2.6 W m⁻²; two medium stabilization scenarios (RCP6 and RCP4.5), in which radiative forcing is stabilized at 6 W m⁻² (approximately 850 ppm CO2 equivalent) and 4.5 W m⁻² (approximately 650 ppm CO2 equivalent) after 2100, respectively; and one very high baseline emission scenario (RCP8.5), in which radiative forcing reaches >8.5 W m⁻² (1370 ppm CO2 equivalent) by 2100 and continues to rise for some time.
The NCEP dataset was normalized over the complete 1961-1990 period.

Description of LARS-WG Model
LARS-WG is a stochastic weather generator that can be used to simulate weather data at a single station under both current and future climate conditions. These data are daily time series of a group of climate variables, namely precipitation, maximum and minimum temperature and solar radiation (Chen et al., 2013; Semenov et al., 1997).
LARS-WG uses a semi-empirical distribution (SED), defined as a cumulative probability distribution function (CDF), to approximate the probability distributions of dry and wet series, daily precipitation, and minimum and maximum temperatures. The SED is a histogram with 23 intervals (a_{i-1}, a_i), where a_{i-1} < a_i (Semenov et al., 2002), which represents the observed distribution more accurately than the 10 intervals used in the previous version. By perturbing the distribution parameters for a site with the predicted changes of climate derived from global or regional climate models, a daily climate scenario for the site can be generated and used with a process-based impact model to assess impacts.
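To make the idea of a histogram-type distribution concrete, the following Python sketch builds interval counts from observations and draws random values from them. It is an illustration under stated assumptions, not the LARS-WG implementation: the function names are hypothetical, the interval edges are supplied by the user, and values are drawn uniformly within the selected interval.

```python
import bisect
import random

def build_sed(values, edges):
    """Count observations falling in each interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
        elif v == edges[-1]:
            counts[-1] += 1  # include the upper bound in the last interval
    return counts

def sample_sed(edges, counts, rng):
    """Pick an interval with probability proportional to its count,
    then draw a value uniformly inside that interval (an assumption)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty distribution")
    u = rng.random() * total
    cumulative = 0
    for i, h in enumerate(counts):
        cumulative += h
        if u < cumulative:
            return rng.uniform(edges[i], edges[i + 1])
    return edges[-1]
```

A generator seeded with `random.Random(seed)` gives reproducible series, which mirrors the role of the seed number mentioned for LARS-WG.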
In general, generating synthetic weather data involves three distinct steps: model calibration, model validation and scenario generation, as represented in Figure 2(a). They are briefly described by Semenov et al. (2002) as follows.
The inputs to LARS-WG are the series of daily observed data (precipitation, minimum and maximum temperature) for the base period and the site information (latitude, longitude and altitude). After input data preparation and quality control, the observed daily weather data at a given site were used to determine a set of parameters for the probability distributions of the weather variables. These parameters are used to generate a synthetic weather time series of arbitrary length by randomly selecting values from the appropriate distributions; the series has the same statistical characteristics as the original observed data but differs on a day-to-day basis. LARS-WG distinguishes wet days from dry days by whether precipitation is greater than zero. The occurrence of precipitation is modelled by alternating wet and dry series approximated by semi-empirical probability distributions. The statistical characteristics of the observed and synthetic weather data are analyzed for statistically significant differences, using the Kolmogorov-Smirnov (KS) goodness-of-fit test for the distributions and the t and F tests for the means and standard deviations respectively, while varying the LARS-WG parameters (number of years and seed number).
To generate climate scenarios at a site for a given future period and emission scenario, the LARS-WG baseline parameters, which are calculated from observed weather for a baseline period (1984-2011), are adjusted by the Δ-changes for the future period and emission scenario predicted by a GCM for each climatic variable for the grid covering the site. In this study, local-scale climate scenarios based on the SRES A2, A1B and B1 scenarios simulated by the six selected GCMs were generated for the periods 2011-2030, 2046-2065 and 2080-2099 to predict the future change of precipitation and temperature in the UBNRB.
Δ-changes were calculated as relative changes for precipitation and absolute changes for minimum and maximum temperatures (Eq. 3 and 4, respectively). No adjustments were made to the distributions of dry and wet series or to temperature variability, because this would require daily output from the GCMs, which is not readily available from the LARS-WG dataset (Semenov et al., 2010). (Dibike et al., 2005).
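The distinction between relative and absolute changes can be sketched in Python as follows. This is a minimal illustration of the usual delta-change idea (a ratio for precipitation, a difference for temperature); the function names and the use of monthly lists are assumptions, and the exact form of Eq. 3 and 4 is not reproduced here.

```python
def delta_changes(gcm_baseline, gcm_future, relative):
    """Monthly change factors between GCM baseline and future climatologies.

    relative=True  -> ratio future / baseline (used here for precipitation)
    relative=False -> difference future - baseline (used here for temperature)
    """
    if relative:
        return [f / b for b, f in zip(gcm_baseline, gcm_future)]
    return [f - b for b, f in zip(gcm_baseline, gcm_future)]

def apply_changes(observed_baseline, changes, relative):
    """Perturb observed baseline monthly values with the change factors."""
    if relative:
        return [o * c for o, c in zip(observed_baseline, changes)]
    return [o + c for o, c in zip(observed_baseline, changes)]
```

For example, a GCM that moves from 100 mm to 110 mm in a month gives a ratio of 1.1, so an observed 200 mm becomes roughly 220 mm; a GCM warming of 1.5 °C is simply added to the observed temperature.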

Description of SDSM
SDSM is best described as a hybrid of a stochastic weather generator and a regression-based method in the family of transfer function methods: a multiple linear regression model is developed between a few selected large-scale predictor variables (Table 3) and local-scale predictands such as temperature and precipitation, to condition local-scale weather parameters on large-scale circulation patterns. The stochastic component of SDSM enables the generation of multiple simulations with slightly different time series attributes but the same overall statistical properties (Wilby et al., 2002). SDSM requires two types of daily data: the local predictands of interest (e.g. temperature, precipitation) and the large-scale predictors (NCEP and GCM) for the grid box closest to the station.
SDSM divides the task of downscaling into a series of discrete processes: quality control and data transformation, screening of predictor variables, model calibration, and weather and scenario generation, as shown in Figure 2(b). Detailed procedures and steps can be found in Wilby et al. (2002). Screening potentially useful predictor-predictand relationships for model calibration is one of the most challenging but crucial stages in the development of any statistical downscaling model, because the selection of appropriate predictor variables largely determines the success of SDSM and the character of the downscaled climate scenario (Wilby et al., 2007). After routine screening procedures, the predictor variables that are physically sensible and have a high explained variance, a high correlation coefficient (r) and a small p value were selected.
The model calibration process in SDSM constructs downscaled data from multiple regression equations, given daily weather data (predictand) and the selected predictor variables at each station. The model was structured as a monthly model for both daily precipitation and temperature, using the same set of selected NCEP predictors for the calibration period. Hence, twelve regression equations were developed, one for each month. The bias correction and variance inflation factors were adjusted until the model replicated the observed data. Model validation was carried out by testing the model on an independent dataset. The weather generator helps to validate the calibrated model, ideally using independent data. This operation generates ensembles of synthetic daily weather data for the specified period, using the regression model weights and the parameter file prepared during model calibration. To compare the observed and simulated data, SDSM provides a summary statistics function that summarizes both. The time series of station data and large-scale predictor variables (NCEP reanalysis data) were divided into two groups: 1984-1995 for calibration and 1996-2001 for validation with HadCM3, and 1984-2000 for calibration and 2001-2005 for validation with canESM2.
The Scenario Generator operation produces ensembles of synthetic daily weather series given observed daily atmospheric predictor variables supplied by a GCM either for current or future climate (Wilby et al., 2002).The scenario generation

Model performance evaluation criteria
A number of statistical tests, grouped into two main classes, were carried out to compare the skills of the two downscaling models. First, quantitative statistical tests were applied: simulations of mean daily and monthly rainfall, Tmax and Tmin during the calibration and validation of SDSM and LARS-WG were checked using graphical representations and statistical performance indices. Performance indicators such as the mean absolute error (MAE), root mean square error (RMSE), bias (B), coefficient of determination (R²) and Nash-Sutcliffe model efficiency (NSE) were used to evaluate the comparative performance of the models in simulating the current precipitation on a long-term monthly average basis, using Eq. 7 to Eq. 11. Evaluation was done in two steps as suggested by Goly et al. (2014). In these equations, Xi and Yi are the i-th observed and simulated values, respectively, µx and µy are the averages of all Xi and Yi in the study population, and n is the number of samples to be tested.
Additionally, a varying-weights technique was applied to the performance metrics, as given in Eq. 12, to rank the models according to their skill. To avoid discrepancies in weighting the performance measures due to differences in their orders of magnitude, each performance measure is normalized (divided by its maximum value) and then the cumulative weighted performance measure for each downscaling model is calculated (Goly et al., 2014). The weights are arranged so that most emphasis is given to MAE and RMSE (0.3 each), followed by bias (0.2), with least emphasis on R² and NSE (0.1 each).
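The metrics and the weighted ranking can be sketched in Python as below. The metric formulas are the textbook definitions, written with X as observed and Y as simulated values, and are assumed rather than copied from Eq. 7 to Eq. 12. In particular, converting R² and NSE to penalties (1 − metric) and using the absolute bias, so that a lower score is always better, is an assumption of this sketch and may differ from the procedure of Goly et al. (2014).

```python
import math

# Weights as stated in the text: MAE, RMSE, bias, R2, NSE.
WEIGHTS = {"MAE": 0.3, "RMSE": 0.3, "Bias": 0.2, "R2": 0.1, "NSE": 0.1}

def metrics(obs, sim):
    """Textbook performance metrics for observed (X) and simulated (Y) data."""
    n = len(obs)
    mean_x = sum(obs) / n
    mean_y = sum(sim) / n
    err = [y - x for x, y in zip(obs, sim)]
    sxx = sum((x - mean_x) ** 2 for x in obs)
    syy = sum((y - mean_y) ** 2 for y in sim)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(obs, sim))
    sse = sum(e * e for e in err)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(sse / n),
        "Bias": sum(err) / n,
        "R2": sxy * sxy / (sxx * syy),
        "NSE": 1.0 - sse / sxx,
    }

def weighted_scores(metrics_by_model):
    """Normalize each penalty by its maximum over models and sum with weights.
    Lower score = better model (assumed convention)."""
    penalties = {
        name: {"MAE": m["MAE"], "RMSE": m["RMSE"], "Bias": abs(m["Bias"]),
               "R2": 1.0 - m["R2"], "NSE": 1.0 - m["NSE"]}
        for name, m in metrics_by_model.items()
    }
    scores = dict.fromkeys(penalties, 0.0)
    for key, weight in WEIGHTS.items():
        top = max(p[key] for p in penalties.values())
        for name, p in penalties.items():
            scores[name] += weight * (p[key] / top if top > 0 else 0.0)
    return scores
```

With two models, the one with the larger error in every metric receives a score equal to the sum of the weights (1.0), and a perfect model receives 0.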
Second, qualitative tests were used to compare the skill of the models in capturing the distribution of the observed data over its whole range and in capturing extreme events. For this purpose, statistical metrics, Box-Whisker plots and the Kolmogorov-Smirnov cumulative distribution test were applied as goodness-of-fit tests for the distributions of observed and simulated monthly precipitation. Box-Whisker plots were selected because, in addition to the median, they depict the extreme values, i.e. the minimum and maximum (the caps at the end of each box), and the outliers that fall beyond the interquartile range above the third or below the first quartile (the points in the graph). For the Kolmogorov-Smirnov test, the observed and simulated precipitation data from each model were compared using the p value at a significance level of 5 % for each station. If the computed p value is lower than the significance level alpha = 0.05, the null hypothesis H0 (observed and simulated data follow the same distribution) is rejected and the alternative hypothesis Ha is accepted.
The following statistical metrics were used, as suggested by Campozano et al. (2016): the interquartile relative fraction and the absolute cumulative bias (ACB). A value of ACB = 0 indicates a perfect representation of the three percentiles (the 25th, 50th and 75th) of the modelled and observed distributions, while under- or overestimation moves ACB from zero to positive values. The two metrics were combined using the equally weighted method only, on the assumption that they have equal weights as discussed above. Furthermore, the F-test and t-test were applied to test the equality of the monthly variances and the monthly means of precipitation, respectively.
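As a hedged illustration of these two quartile-based metrics, the Python sketch below assumes that the interquartile relative fraction is the ratio of the simulated to the observed interquartile range, and that ACB is the sum of the absolute differences between the simulated and observed 25th, 50th and 75th percentiles. Both definitions are assumptions consistent with the description above; the equations themselves are not reproduced in the text. The function names and the "inclusive" quantile method are choices made for this sketch.

```python
import statistics

def quartiles(values):
    """Return the 25th, 50th and 75th percentiles of the data."""
    return statistics.quantiles(values, n=4, method="inclusive")

def interquartile_relative_fraction(obs, sim):
    """Assumed definition: simulated IQR divided by observed IQR."""
    q25_o, _, q75_o = quartiles(obs)
    q25_s, _, q75_s = quartiles(sim)
    return (q75_s - q25_s) / (q75_o - q25_o)

def absolute_cumulative_bias(obs, sim):
    """Assumed definition: sum of absolute quartile differences."""
    return sum(abs(s - o) for o, s in zip(quartiles(obs), quartiles(sim)))
```

Under these assumed definitions, a ratio of 1 and an ACB of 0 would indicate that the simulated quartiles reproduce the observed ones exactly.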

Calibration and validation of LARS-WG
To verify the performance of LARS-WG, statistical tests were performed in addition to the graphical comparison. The Kolmogorov-Smirnov (KS) test was used to test the equality of the seasonal distributions of wet and dry series (WDSeries), the distributions of daily rainfall (RainD), and the distributions of daily maximum (TmaxD) and minimum (TminD) temperature.
The F-test was used to test the equality of monthly variances of precipitation (RMV), while the t-test was used to verify the equality of monthly mean rainfall (RMM), monthly mean daily maximum temperature (TmaxM) and monthly mean daily minimum temperature (TminM). All of the tests calculate a p value, which is used to accept or reject the hypothesis that the two sets of data (observed and generated) could have come from the same distribution at the 5 % significance level. Therefore, the number of tests giving a p value below 5 %, out of the total of 8 dry/wet seasons or 12 months, was recorded for each station. The average number of p values below 5 % across the 26 stations, and the percentage of failed tests out of the 8 seasons or 12 months, are presented in Table 2. The results show that LARS-WG performs very well for all parameters except RMM and RMV. LARS-WG performs less well for the monthly mean and variance of precipitation: on average, 2.2 % and 17.3 % of the months of a year, respectively, had a p value < 5 %. From these numbers, it can be noted that the model is less capable of simulating the monthly variances than the other parameters.
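The logic of the distribution test can be sketched in Python with a two-sample Kolmogorov-Smirnov statistic and its asymptotic p value. This is an illustration only: LARS-WG performs these tests internally, the function names here are hypothetical, and the asymptotic series is a rough approximation for small samples.

```python
import math

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p value from the Kolmogorov distribution,
    Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        return 1.0  # Q is numerically 1 here and the series converges slowly
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))
```

A p value below 0.05 would count as one failed test in the tally per station described above.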
For illustration, the monthly mean and standard deviation of the simulated and observed precipitation, Tmax and Tmin were plotted (see Figure 3) for the randomly chosen Gondar station, as it is impractical to present the results for all stations. The results show that the observed and simulated monthly mean precipitation, Tmax and Tmin match very well. However, standard deviations are known to be difficult to simulate in most statistical downscaling studies, and the simulated standard deviation is less accurate than the mean (Figure 3(b)). Overall, according to the statistical performance measures and the graphical comparison, the performance of the model in simulating the three climatic variables at all stations across the UBNRB is acceptable.

Down scaling with LARS-WG
The precipitation predictions from the LARS-WG model for the six GCMs under three scenarios (A1B, B1 and A2) and three time periods are plotted in Figure 4 for illustration. In Figure 4, each box-whisker plot represents the predicted precipitation across all stations of the UBNRB under a single scenario for each GCM. The results reveal no coherent trend among the GCMs in predicting precipitation.
The NCCCSM GCM was found to be the most unstable in predicting precipitation across the UBNRB, while MPEH5 was relatively stable across all stations compared with the others.
After downscaling the future climate predictions at all stations from the six selected GCMs, the projected areal precipitation over the UBNRB was calculated from the point rainfall stations using the Thiessen polygon method. The results show that the GCMs disagree on the direction of precipitation change: two GCMs (MIHR and GFCM21) give a decreasing trend, whereas the majority, four GCMs (NCCCSM, HadCM3, MPEH5 and MIHR), give an increasing trend relative to the reference period in all three time periods. The results in Figure 5 show that NCCCSM gave the maximum increase while GFCM21 gave the largest reduction. For the 2030s, the relative change in mean annual precipitation is projected to be between -2.3 % and +6.5 % for A1B, -2.3 % and +7.8 % for B1, and -3.7 % and +6.4 % for the A2 emission scenario. For the 2050s, the relative change in precipitation ranges between -8 % and +22.7 % for A1B, -2.7 % and +22 % for B1, and -7.4 % and +8.7 % for A2. For the 2080s, the projected relative change in precipitation may vary between -7.5 % and +29.9 % for A1B, -5.3 % and +13.7 % for B1, and -5.9 % and +43.8 % for A2. The multi-model average shows that future precipitation may generally increase over the basin, in the range of 1 % to 14.4 %, which matches the result from the HadCM3 GCM (0.8 % to 16.6 %), as shown in Figure 5.
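The areal averaging step can be sketched in Python as follows. Thiessen weights are approximated here by assigning each cell of a regular grid to its nearest station; the rectangular domain, the grid size and the function names are assumptions made for illustration, whereas a real application would clip the grid to the basin boundary.

```python
def thiessen_weights(stations, xmin, xmax, ymin, ymax, nx=100, ny=100):
    """Approximate Thiessen polygon area fractions on a rectangular domain.

    stations is a list of (x, y) coordinates. Each grid cell centre is
    assigned to the nearest station; the share of cells is the weight.
    """
    counts = [0] * len(stations)
    for ix in range(nx):
        x = xmin + (ix + 0.5) * (xmax - xmin) / nx
        for iy in range(ny):
            y = ymin + (iy + 0.5) * (ymax - ymin) / ny
            nearest = min(
                range(len(stations)),
                key=lambda k: (stations[k][0] - x) ** 2 + (stations[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    total = nx * ny
    return [c / total for c in counts]

def areal_mean(station_values, weights):
    """Area-weighted mean of station values."""
    return sum(v * w for v, w in zip(station_values, weights))
```

Stations that represent a larger share of the domain contribute proportionally more to the areal precipitation than stations clustered close together.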
In a different way from precipitation, the projection of mean annual Tmax and Tmin have coherent increasing trends were observed from the six GCMs under all scenarios in all three future time periods.At 2080s, the change in mean annual Tmax and Tmin is more pronounced than 2030s in all GCMs from three scenarios The result calculated from the ensemble mean showed that mean annual Tmax my increase up to +0.5 o c, +1.8 o c and +3.6 o c by 2030s, 2050s and 2080s respectively under A2 scenario which is in line with the results from both GFCM21 and HadCM3 GCMs (Figure 5).Likewise, UBNRB may experience an increase mean annual Tmin up to +0.6 o c, +1.8 o c and +3.6 o c by 2030s, 2050s and 2080s respectively from the multi model average.

Screening of variables, model calibration and validation of SDSM
Initially, an offline correlation analysis between the predictands and the NCEP reanalysis predictors was performed in SPSS to identify an optimal lag and physically sensible predictors for precipitation, Tmax and Tmin. The analysis revealed that a lag or time shift does not improve the correlation between predictands and predictors in this study. The average partial correlation of observed precipitation at all stations with the predictors is shown in Figure 7. All stations follow the same correlation pattern, in both magnitude and direction, which indicates that, with a few exceptions, the same physically sensible predictors can be used for all stations. Furthermore, a number of predictors have correlation coefficients in the range of 20 %-45 % for precipitation across all stations, as shown in Figure 7. This range is considered acceptable for precipitation downscaling (Wilby et al., 2002) because of the complexity and the high spatial and temporal variability of precipitation.
The predictor variables identified for each downscaled GCM and for the corresponding local climate variables are summarized in Figure 8. The selected predictors show that different large-scale atmospheric variables control different local variables. For instance, temp, mslp, s500, s850, p8_v, p500 and shum are the most meaningful predictors for temperature, and s500, s850, p8_u, p_z, pzh and p500 for precipitation in the study area, which is consistent with the result of the offline correlation analysis.
After careful screening of the predictor variables, model calibration and validation were carried out. Observed and generated rainfall, Tmax and Tmin were compared graphically to build confidence in the model performance, as shown in Figure 6 and Figure 7 for Gondar station only. Figure 6 shows that the calibrated model reproduces the monthly mean precipitation, the mean and standard deviation of daily Tmax and Tmin, and the mean dry-spell length quite well. However, the wet-spell length was consistently underestimated, and the model is less accurate in reproducing the variance of observed precipitation. As Wilby et al. (2004) point out, downscaling models are often regarded as less able to model the variance of observed precipitation with great accuracy.
Furthermore, the performance of the model was evaluated with statistical performance metrics (MAE, RMSE, R², NSE and BIAS), as summarized in Table 4. The statistical analysis revealed that the model simulates Tmax and Tmin much better than precipitation, because the highly dynamic nature of precipitation makes it difficult to simulate. After a satisfactory calibration (Figure 9), the multiple regression model was validated using an independent set of data outside the calibration period, as discussed under section 4, and the results obtained are shown in Figure 9 and Table 4. Examination of Figure 9, Figure 10 and Table 4 reveals that the model was successfully validated for both GCMs, although with lower accuracy than in calibration. In general, the performance measures and the graphical comparison of observed and simulated values, for both calibration and validation, indicate that the model simulates the climate variables with a high degree of accuracy.
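The performance metrics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: BIAS is taken here as the mean error (simulated minus observed), R² as the squared Pearson correlation, and the sample values are hypothetical.

```python
import math


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def bias(obs, sim):
    """Mean error, simulated minus observed (assumed definition of BIAS)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values at or below 0
    mean the model is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, sim):
    """Square of the Pearson correlation between observed and simulated values."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical values for illustration only, not data from this study.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = {
    "MAE": mae(observed, simulated),
    "RMSE": rmse(observed, simulated),
    "BIAS": bias(observed, simulated),
    "NSE": nse(observed, simulated),
    "R2": r_squared(observed, simulated),
}
```

In this toy case the simulated series is perfectly correlated with the observed one (R² = 1) but offset by a constant, so NSE is well below 1. This is one reason why several metrics are usually reported together.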

Down scaling with LARS-WG
Since LARS-WG performed very well during calibration and validation, the climate scenarios were downscaled from the six selected CMIP3 GCMs under three scenarios (A1B, B1 and A2) for three time periods. The precipitation projections are plotted in Figure 4 for illustration. After downscaling the future climate scenarios at all stations from the six GCMs, the projected areal precipitation for the UBNRB was calculated from the point rainfall stations using the Thiessen polygon method. The results reveal that the GCMs disagree on the direction of precipitation change: two GCMs (CSMK3 and GFCM21) show a decreasing trend, whereas four GCMs (NCCCSM, HadCM3, MPEH5 and MIHR) show an increasing trend relative to the reference period in all three time periods. By the 2030s, the projected relative change of mean annual precipitation ranges from -2.3 % to +6.5 % for A1B, from -2.3 % to +7.8 % for B1 and from -3.7 % to +6.4 % for A2. By the 2050s, it ranges from -8 % to +22.7 % for A1B, from -2.7 % to +22 % for B1 and from -7.4 % to +8.7 % for A2. By the 2080s, it may vary from -7.5 % to +29.9 % for A1B, from -5.3 % to +13.7 % for B1 and from -5.9 % to +43.8 % for A2. The multi-model average shows that future precipitation may generally increase over the basin by 1 % to 14.4 %, which is in line with the result from HadCM3 (0.8 % to 16.6 %).
In contrast to precipitation, the projections of mean annual Tmax and Tmin show coherent increasing trends across the six GCMs under all scenarios in all three future time periods. The ensemble mean shows that mean annual Tmax may increase by up to +0.5 °C, +1.8 °C and +3.6 °C by the 2030s, 2050s and 2080s respectively under the A2 scenario, which is in line with the results from both GFCM21 and HadCM3. Likewise, the UBNRB may experience an increase in mean annual Tmin of up to +0.6 °C, +1.8 °C and +3.6 °C by the 2030s, 2050s and 2080s respectively, according to the multi-model average.
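The areal averaging and relative-change calculation described above can be illustrated with a short Python sketch. The station names, polygon areas, precipitation values and GCM labels below are hypothetical placeholders, not data from this study; the sketch only shows the arithmetic of Thiessen weighting, relative change and a multi-model average.

```python
def thiessen_areal_mean(station_values, polygon_areas):
    """Area-weighted mean of station values using Thiessen polygon areas."""
    total_area = sum(polygon_areas.values())
    weighted = sum(station_values[name] * area for name, area in polygon_areas.items())
    return weighted / total_area


def relative_change_percent(future, baseline):
    """Relative change of a future value with respect to the baseline, in percent."""
    return 100.0 * (future - baseline) / baseline


def ensemble_mean(changes_by_gcm):
    """Equal-weight multi-model average of the changes from each GCM."""
    return sum(changes_by_gcm.values()) / len(changes_by_gcm)


# Hypothetical Thiessen polygon areas (km^2) and mean annual precipitation (mm).
areas = {"station_a": 100.0, "station_b": 300.0, "station_c": 600.0}
baseline_mm = {"station_a": 1000.0, "station_b": 1500.0, "station_c": 1400.0}
future_mm = {"station_a": 1100.0, "station_b": 1650.0, "station_c": 1540.0}

areal_baseline = thiessen_areal_mean(baseline_mm, areas)
areal_future = thiessen_areal_mean(future_mm, areas)
areal_change = relative_change_percent(areal_future, areal_baseline)

# Hypothetical areal changes (%) from three GCMs that disagree in sign.
multi_model_change = ensemble_mean({"gcm_1": -2.0, "gcm_2": 6.0, "gcm_3": 5.0})
```

The equal-weight average is an assumption of this sketch. It shows how a modest positive ensemble mean can hide individual GCMs that project a decrease.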

Down scaling with SDSM
Because it is difficult to process all six selected CMIP3 GCMs using SDSM, HadCM3 was chosen, since its LARS-WG downscaling result agrees with that of the ensemble mean. In addition, canESM2 from the CMIP5 GCMs was selected to test the improvements of CMIP5 over CMIP3. The results of downscaling future climate scenarios for the areal UBNRB using SDSM, calculated from all stations with the Thiessen polygon method, are summarized in Figure 6. The pattern and magnitude of future climate change differ between stations and scenarios, as can be seen in Figure 11. Overall, Figure 6 indicates a general increase in mean annual precipitation for the three time windows (2030s, 2050s and 2080s) under all five scenarios (A2a and B2a for HadCM3, and RCP2.6, RCP4.5 and RCP8.5 for canESM2), in the range of 2.1 % to 43.8 %. The maximum/minimum relative change of mean annual precipitation is projected to be 43.8 %/2.1 %, 29.5 %/3.5 % and 19 %/2.1 % for the 2080s, 2050s and 2030s, under the RCP8.5 scenario of canESM2 and the H3B2a scenario of HadCM3 respectively. In general, the RCP8.5 scenario of canESM2 resulted in a pronounced increase in all three time periods, whereas the B2a scenario of HadCM3 gave the smallest change over the study area.
Regarding temperature, the downscaled Tmax and Tmin show a consistently increasing trend in all months and seasons in the three time periods under all scenarios, with mean annual changes ranging from 0.5 °C to 2.6 °C under RCP8.5 and from 0.3 °C to 1.6 °C under H3B2a. The RCP8.5 scenario gave the largest change and the H3B2a scenario the smallest change for both Tmax and Tmin in all three time periods. The downscaling results suggest that maximum temperature may increase more than minimum temperature in all scenarios and time periods across the UBNRB.

Chen et al. (2013) argued that, although the major sources of uncertainty are linked to GCMs and emission scenarios, the uncertainty related to the choice of downscaling method receives less attention in climate change analysis. Therefore, in this study, the comparative performance evaluation of the downscaling methods was given due emphasis and carried out using a number of statistical and graphical tests, both quantitative and qualitative (Goly et al., 2014). The model skill was evaluated and ranked at each site, as illustrated for Abaysheleko station. The overall rank, obtained by summing the ranks of each model over all stations, is presented for quantitative and qualitative measures respectively (the latter in Table 6). The results reveal that SDSM/canESM2 narrowly performed best in simulating the long-term average values, with both equal and varying weights of the quantitative metrics. However, LARS-WG performed best in the qualitative measures, that is, in reproducing the distribution and extreme events of precipitation. According to the Kolmogorov-Smirnov test at the 5 % significance level, LARS-WG captures the observed precipitation distribution at 93.3 % of the stations (Table 5), while SDSM does so at only 20 % of the 15 stations for both canESM2 and HadCM3. The t-test revealed that the precipitation simulated by the LARS-WG and SDSM/HadCM3 models captures the respective mean values at 86.7 % of the stations, while the SDSM/hadCM3 model captures only 66.7 %. The F-test showed that, for all three models, the variance of the simulated precipitation is consistent with the observed variance at 93.3 % of the stations. In general, the comparative performance test revealed that LARS-WG performed best in qualitative measures while SDSM/canESM2 performed best in quantitative measures in the UBNRB. Furthermore, Figure 9 [...] in Figure 12. For precipitation, the R value of the daily time series was 0.21 using LARS-WG and 0.43/0.42 for HadCM3/canESM2 using SDSM. For daily Tmax, the R value was 0.61 using LARS-WG and 0.75/0.76 for HadCM3/canESM2 using SDSM. On a monthly basis, the R value for precipitation improved significantly, to 0.79 using LARS-WG and 0.84 for both HadCM3 and canESM2 using SDSM. For monthly Tmax, the R value was 0.89 using LARS-WG and 0.91/0.92 for HadCM3/canESM2 using SDSM. The results suggest that both SDSM and LARS-WG approximate the observed climate data for the current state reasonably well.
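The distribution check by the Kolmogorov-Smirnov test mentioned above can be sketched as follows. This is an illustrative two-sample version using the large-sample critical value, and it is an assumption of this sketch rather than the exact test configuration used in the study; the function names and the sample data are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest distance between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample critical value; only approximate for small samples."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def same_distribution(sample_a, sample_b, alpha=0.05):
    """True when the test does not reject equality of the two distributions."""
    d = ks_statistic(sample_a, sample_b)
    return d <= ks_critical_value(len(sample_a), len(sample_b), alpha)


# Hypothetical observed and simulated values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [3.0, 4.0, 5.0, 6.0]
d_shifted = ks_statistic(observed, shifted)
d_identical = ks_statistic(observed, observed)
```

Counting the stations for which `same_distribution` returns true, and dividing by the number of stations, gives a pass rate of the kind reported above.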

Comparative performance evaluation of LARS-WG and SDSM models
For future simulation, HadCM3 under the A2 scenario was used in common for both downscaling methods (LARS-WG and SDSM) to test whether the choice of downscaling method affects the GCM results under the same forcing scenario. The results obtained from the two downscaling models were reasonably comparable, and both approaches showed an increasing trend for precipitation, Tmax and Tmin. However, the magnitudes of the downscaled changes from the two methods differ, as presented in Figure 11. LARS-WG projects larger changes in precipitation and temperature than SDSM. Using LARS-WG, the relative change of mean annual precipitation is about 16.1 % and the average increase in mean annual Tmax and Tmin is about 3.7 °C and 3.6 °C respectively by the 2080s, while SDSM projects a relative change in mean annual precipitation of only about 9.7 % and an average increase in Tmax and Tmin of about 2 °C and 1.3 °C respectively in the same period. The difference could be due to the fact that SDSM uses large-scale predictor variables from GCM outputs to predict local-scale climate variables using multiple linear regression, while LARS-WG applies change factors from the GCM output of only those variables which directly correspond to the predictands. Moreover, because GCMs are known to be not very reliable in simulating precipitation, errors in the GCM precipitation output propagate into the downscaled results, so the use of LARS-WG to downscale precipitation needs more caution (Dibike et al., 2005).
Therefore, based on the above, SDSM would be more robust and can be applied with higher confidence for downscaling large-scale GCM outputs to the finer scales required by hydrological models for impact assessment in the UBNRB.
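The contrast drawn above between change factors and regression can be made concrete with a simplified Python sketch. It is not the internal algorithm of LARS-WG or SDSM: the change-factor part is a plain delta-change calculation (ratio for precipitation, difference for temperature), the regression part uses a single predictor instead of multiple linear regression, and all names and values are hypothetical.

```python
def precipitation_change_factor(gcm_future_mean, gcm_baseline_mean):
    """Ratio of future to baseline GCM precipitation for the same variable."""
    return gcm_future_mean / gcm_baseline_mean


def temperature_change_factor(gcm_future_mean, gcm_baseline_mean):
    """Absolute difference between future and baseline GCM temperature."""
    return gcm_future_mean - gcm_baseline_mean


def apply_change_factors(obs_precip, obs_temp, precip_ratio, temp_delta):
    """Perturb observed series directly with the GCM-derived change factors."""
    return [p * precip_ratio for p in obs_precip], [t + temp_delta for t in obs_temp]


def fit_linear(predictor, predictand):
    """Least-squares slope and intercept linking one large-scale predictor
    to a local predictand (single-predictor stand-in for a regression model)."""
    n = len(predictor)
    mean_x, mean_y = sum(predictor) / n, sum(predictand) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, predictand))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only.
ratio = precipitation_change_factor(110.0, 100.0)
delta = temperature_change_factor(26.5, 24.5)
future_precip, future_temp = apply_change_factors([10.0, 0.0, 20.0], [24.0, 25.0], ratio, delta)
slope, intercept = fit_linear([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

In the change-factor route, any error in the GCM's own precipitation passes straight into the local series. In the regression route, the local value depends on the fitted relationship with circulation predictors.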

Discussions and conclusions
The uncertainty in climate change analysis can arise from climate models and downscaling methods, among many other sources. In this study, we employed a multi-model approach to examine the uncertainties arising from different GCMs. In total, 21 systematically selected future climate scenarios were produced for each time period, which we consider representative enough to understand and plausibly project the future climate change in the study area and to retain information about the full variability of the GCMs. Moreover, we applied two widely used statistical downscaling methods, namely the regression downscaling technique (SDSM) and the stochastic weather generation method (LARS-WG).
The performance of the three models (HadCM3/SDSM, canESM2/SDSM and LARS-WG) in representing the current situation was tested for the baseline period 1984-2011, particularly for precipitation, as it is the most difficult climate variable to model. The results suggest that SDSM using canESM2 captures the long-term monthly average very well at most of the stations and ranks first among the three. This could be attributed to the improving performance of GCMs over time (i.e., CMIP5 GCMs perform better than CMIP3 GCMs), since the modelling was based on the new set of radiative forcing scenarios (RCPs) that replaced the SRES emission scenarios, constructed for IPCC AR5, in which the impacts of land use and land cover change on the environment and climate are explicitly included. In addition, canESM2 is an earth system model with additional features that incorporate various important biogeochemical processes, which is a limitation of CMIP3 GCMs such as HadCM3. However, LARS-WG performed better than SDSM in the qualitative measures, that is, in capturing the distribution and extreme events of precipitation. This better performance in the study area may be associated with the use of 23 interval histograms for the construction of the semi-empirical distribution, which offers a more accurate representation of the observed distribution than the 10 used in the previous version (Semenov et al., 2010). Therefore, LARS-WG would be preferred in areas of the UBNRB with high climatic variability, in order to correctly simulate the distribution and extreme events of precipitation, which is crucial for a realistic assessment of flood events and agricultural production.
The downscaling results from the six GCMs used in LARS-WG showed large inter-model differences: two GCMs projected that precipitation may decrease, while four GCMs projected that it may increase in the future. These large inter-model differences reflect the uncertainties of GCMs associated with their differences in resolution and in their assumptions about physical atmospheric processes when representing local-scale climate variables, which is typical for Africa, and the low convergence of climate model projections in the area of the UBNRB (Gebre et al., 2014). The multi-model average shows that future precipitation may generally increase over the basin by 1 % to 14.4 %, which is in line with the result from HadCM3 (0.8 % to 16.6 %). This indicates that, among the CMIP3 GCMs, HadCM3 has a better representation of local-scale climate variables in the study area, consistent with previous results by Kim and Kaluarachchi (2009) and Dile et al. (2013) for the same study area.
A further uncertainty analysis of HadCM3 from CMIP3 and canESM2 from CMIP5, both downscaled with SDSM, was carried out for precipitation. The downscaled results from the two GCMs suggest that mean annual precipitation may generally increase by 2.1 % to 43.8 %. However, canESM2 consistently performs better than HadCM3 in reproducing the current climate variables of the UBNRB in both calibration and validation (Table 4). The better performance of canESM2 could be due to the fact that the modelling was based on the new set of radiative forcing scenarios (RCPs) that replaced the SRES emission scenarios, constructed for IPCC AR5, in which the impacts of land use and land cover change on the environment and climate are explicitly included. In addition, it is an earth system model with additional features that incorporate various important biogeochemical processes, which is a limitation of CMIP3 GCMs such as HadCM3. Even though the simulation of large-scale precipitation has improved since IPCC AR4, GCMs still perform less well for precipitation than for temperature. Downscaling of precipitation is therefore more complex, and the baseline is more difficult to reproduce, than for temperature (Fowler et al., 2007), which is also confirmed in this study (Table 4). However, a direct comparison between the projections from the two datasets (HadCM3 and canESM2) is not possible, as seen from Figure 6, because they use different scenarios that describe the amount of greenhouse gases (GHGs) in the atmosphere differently.
LARS-WG is a stochastic simulation tool that is commonly used to produce synthetic climate data of any length with the same characteristics as the input record. It simulates weather separately for single sites; therefore, the resulting weather series for different sites are independent of each other, and the very strong spatial correlation that exists in real weather data can be lost during simulation. A few stochastic models have been developed to produce weather series simultaneously at multiple sites while preserving the spatial correlation, mainly for daily precipitation, such as space-time models, non-homogeneous hidden Markov models and nonparametric models that typically use a K-Nearest Neighbour (K-NN) procedure (King et al., 2015). However, they are complicated in both calibration and implementation and are unable to adequately reproduce the observed correlations (Khalili et al., 2007).
To test the capability of LARS-WG to preserve the spatial correlation between stations during simulation, the simple Pearson's correlation coefficient (R²) was calculated for two stations, Abaysheleko and Bahirdar.
The result revealed that the spatial correlation between the stations was distorted (decreased) relative to the original, as expected, although only to a small extent. Even though LARS-WG has limitations in preserving the spatial correlation of climate variables, it can be applied satisfactorily for downscaling climate change scenarios in the UBNRB. Its output should be used with caution in hydrological impact models, as the spatial distribution of precipitation may have essential effects on river discharge and flood formation.
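The spatial-correlation check described above can be sketched as follows: compute the Pearson correlation between two stations for the observed series and again for the synthetic series, then compare. The series below are hypothetical, and `correlation_change` is a name introduced only for this illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_change(obs_a, obs_b, sim_a, sim_b):
    """Inter-station correlation of the simulated series minus that of the
    observed series; negative values mean spatial correlation was lost."""
    return pearson_r(sim_a, sim_b) - pearson_r(obs_a, obs_b)


# Hypothetical series for two stations, for illustration only.
obs_station_1 = [1.0, 2.0, 3.0, 4.0]
obs_station_2 = [2.0, 4.0, 6.0, 8.0]
sim_station_1 = [1.0, 2.0, 3.0, 4.0]
sim_station_2 = [4.0, 3.0, 2.0, 1.0]

r_observed = pearson_r(obs_station_1, obs_station_2)
r_simulated = pearson_r(sim_station_1, sim_station_2)
change = correlation_change(obs_station_1, obs_station_2, sim_station_1, sim_station_2)
```

Squaring `pearson_r` gives the coefficient of determination if that is the quantity to be reported.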
In conclusion, the multi-model average from LARS-WG and the individual model results from SDSM showed a general increasing trend for all three climatic variables (precipitation, Tmax and Tmin) in all three time periods. The positive change of precipitation in the future can be a good opportunity for farmers engaged in rain-fed agriculture to maximize their agricultural production and improve their livelihoods. However, this information cannot be a guarantee for irrigation farming, because precipitation is not the only factor affecting the flow of the river, which is the main source for irrigation. Evapotranspiration, dynamics of land use [...]

In general, this study has shown that climate change will plausibly occur and may affect the water resources and hydrology of the UBNRB. The study proposes that the outputs of the canESM2 earth system model, with the new sets of emission scenarios, downscaled by SDSM can be applied for further impact analysis with a high degree of certainty. Moreover, the paper shows that the choice of downscaling method contributes to the uncertainty of future climate change prediction, and therefore a comparative performance test should be carried out, in the same way as multiple GCMs are considered to acknowledge the uncertainty range. The authors would also like to suggest a further assessment of a large ensemble of CMIP5 GCMs, which might address the limitations of this paper.

Parameters
of Ethiopia. The elevation ranges from 489 m a.s.l. downstream on the western side to 4261 m a.s.l. upstream at Mount Ras Dashen in the north-eastern part. The Upper Blue Nile River itself has an average annual run-off of about 49 BCM. In addition, the Dinder, Galegu and Rahad rivers have a combined annual run-off of about 5 BCM. The rivers of the Upper Blue Nile River Basin contribute on average about 62 per cent of the Nile total at Aswan. Together with the contributions of the Baro-Akobo and Tekeze rivers, Ethiopia accounts for 86 per cent of the run-off at Aswan (BCEOM, 1998). The climate of Ethiopia is mainly controlled by the seasonal migration of the Inter-tropical Convergence Zone (ITCZ), which follows the position of the sun relative to the earth, and by the associated atmospheric circulation. It is also highly influenced by the complex topography. The whole UBNRB has a long-term mean annual rainfall, minimum temperature and maximum temperature of 1452 mm yr⁻¹, 11.4 °C and 24.7 °C respectively, as calculated in this study from 15 rainfall and 25 temperature gauging stations for the period 1984-2011. Based on these data, the mean seasonal rainfall is about 238 mm, 1065 mm and 148 mm in Belg (October-January), Kiremit (June-September) and Bega (February-May) respectively, so that about 74 % of the rainfall is concentrated between June and September (Kiremit season).

Interquartile relative fraction (IRF): to evaluate the modelled representation of variability relative to the observed (Eq. 13):

$$\mathrm{IRF} = \frac{Q_{m3} - Q_{m1}}{Q_{o3} - Q_{o1}} \tag{13}$$

where IRF is the interquartile relative fraction. A value of IRF > 1 represents overestimation of the variability, IRF = 1 is a perfect representation of the variability, and IRF < 1 is an underestimation of the variability; $Q_{m3}$ and $Q_{o3}$ are the 75th modelled and observed percentiles; $Q_{m1}$ and $Q_{o1}$ are the 25th modelled and observed percentiles.

The absolute cumulative bias (ACB): to evaluate the bias of the 25th, 50th and 75th percentiles (Eq. 14):

$$\mathrm{ACB} = |Q_{m1} - Q_{o1}| + |Q_{m2} - Q_{o2}| + |Q_{m3} - Q_{o3}| \tag{14}$$

where $Q_{m2}$ and $Q_{o2}$ are the 50th modelled and observed percentiles.

and Figure 10 confirmed graphically the ability of the LARS-WG model to capture the distribution and extreme events of precipitation at representative (randomly chosen) stations, by box-and-whisker plot and Kolmogorov-Smirnov test respectively.

correlation coefficient (R) values of the observed and simulated data for all stations presented

land cover, proper water resource management and other climatic factors, which are not yet assessed by this study, can influence the flow of the river directly and indirectly. Furthermore, the results of this study revealed that the maximum positive precipitation change may occur in autumn (Sep.-Nov.), when most agricultural crops mature and harvesting starts, while the minimum precipitation change may occur during summer (June-August), when about 80 % of the annual rainfall occurs. This climate variability can be a potential threat for farmers, who have limited ability to cope with the negative impacts of climate variability, and for the overall ongoing economic development efforts in the basin.
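The interquartile relative fraction (IRF) and absolute cumulative bias (ACB) defined in this section can be computed from the quartiles of the observed and modelled series, as in the Python sketch below. The quartile estimator (`statistics.quantiles` with its default method) and the sample data are assumptions of this sketch; the study may have used a different percentile definition.

```python
from statistics import quantiles


def quartiles(series):
    """25th, 50th and 75th percentiles of a series (default 'exclusive' method)."""
    q1, q2, q3 = quantiles(series, n=4)
    return q1, q2, q3


def irf(observed, modelled):
    """Modelled interquartile range divided by the observed one.
    Values above 1 overestimate variability, values below 1 underestimate it."""
    qo1, _, qo3 = quartiles(observed)
    qm1, _, qm3 = quartiles(modelled)
    return (qm3 - qm1) / (qo3 - qo1)


def acb(observed, modelled):
    """Sum of absolute differences at the 25th, 50th and 75th percentiles."""
    return sum(abs(qm - qo) for qo, qm in zip(quartiles(observed), quartiles(modelled)))


# Hypothetical series for illustration only; the modelled values are twice
# the observed ones, so the variability is overestimated by a factor of two.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
modelled = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]

irf_value = irf(observed, modelled)
acb_value = acb(observed, modelled)
```

For these inputs the observed quartiles are 2, 4 and 6 and the modelled quartiles are 4, 8 and 12, which gives IRF = 2 and ACB = 12.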

Figure 6: (a) Relative change of mean annual precipitation, and (b) change of mean annual Tmax and Tmin for three time periods as compared to the baseline period of UBNRB using SDSM for HadCM3 and canESM2 GCMs under different scenarios

The Upper Blue Nile River Basin (UBNRB) extends from 7°45' to 13° N and from 34°30' to 37°45' E (see Figure 1). It is one of the most important major basins of Ethiopia because it contributes 45 % of the country's surface water resources, 20 % of

Table 1 .
However, the fifth phase of Coupled Model

Table 4: Performance measure and ranking of models in terms of precipitation distribution at Abaysheleko

Table 5: Model ranking of statistical downscaling models during the baseline period for quantitative measures