Improving SWAT model performance in the upper Blue Nile Basin

The Upper Blue Nile River Basin is confronted by land degradation problems, insufficient agricultural production, and a limited number of developed energy sources. Process-based hydrological models provide useful tools to better understand such complex systems and improve water resources and land management practices. In this study, SWAT was used to model the hydrological processes in the Upper Blue Nile River Basin. The calibration was done in such a way that the parameterization gave a realistic representation of the interaction of land cover and soil properties. A Climate Forecast System Reanalysis (CFSR) dataset and a ground weather dataset were compared under two subbasin discretizations (30 and 87 subbasins) in order to create an integrated dataset that overcomes the spatial and temporal limitations of both datasets. A SWAT Error Index (SEI) was also proposed to compare the reliability of the models under different discretization levels and weather datasets. This index offers an assessment of the model quality based on precipitation and evapotranspiration. The SEI proved to be a reliable and useful method to measure the level of error of SWAT and to develop better models. The results showed the discrepancies that arise from using different weather datasets with different levels of subbasin discretization. Models under 30 subbasins achieved Nash-Sutcliffe (NS) values of 0.15, 0.68, and 0.82, while models under 87 subbasins achieved values of 0.05, 0.61, and 0.84, for the CFSR, ground, and integrated datasets, respectively. Based on the parameterization, the integrated dataset provided more reliable results and a more realistic representation of the land use and soil conditions of this region.


Introduction
Water resources in the upper Blue Nile Basin are not being managed adequately; land use changes, fast population growth, land erosion, and deforestation are some of the causes currently affecting the watershed. Therefore, in order to improve land use management practices and mitigate the alarming erosion problems, researchers need to understand the hydrological conditions of the basin. Physically based, distributed models have provided a very efficient alternative for watershed researchers for analyzing the impact of land management practices on soil degradation, agriculture, water allocation, and chemical yields (Setegn et al., 2008). Due to its versatility and applicability to complex watersheds, researchers have identified the Soil and Water Assessment Tool (SWAT) as one of the most intricate, consistent, and computationally efficient models (Neitsch et al., 2009; Gassman et al., 2007). Recent studies show that SWAT has received international and interdisciplinary acceptance for modeling large and small watersheds (Malunjkar et al., 2015; Me et al., 2015; Rafiei Emam et al., 2016; Wang et al., 2017). SWAT provides a wide range of parameters to work with, allowing users to analyze several hydrological processes. It also has the advantage of having been developed to analyze the interaction of several hydrological parameters and the impact of land management practices specifically for large and complex basins; thus, it is a good model to be applied in the upper Blue Nile Basin. However, due to the lack of a unifying theory to accurately model the interaction of the hydrological processes, complex hydrological models suffer from overparameterization and high predictive uncertainty (Sivapalan, 2006). Therefore, it is difficult to simulate the complex interactions of hydrological processes and weather conditions of watersheds without uncertainties.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Among all the input parameters, the meteorological data have the most significant impact on the water balance of a watershed. However, a common problem when setting up hydrological models of the upper Blue Nile Basin is related to data limitations. In developing countries, the distribution of meteorological stations is irregular and dispersed (Worqlul et al., 2014). Other weather data problems are related to the measuring gauges; many weather data series contain periods of missing data, and in several cases erroneous measurements are also possible. Thus, many models are set up based on limited and incomplete data, which may lead to less reliable models. This lack of hydrological and climatic data has impeded in-depth studies of the hydrology of the upper Blue Nile Basin (Tekleab et al., 2011). Several previous studies have modeled the entire Nile Basin as well as small catchments within it, providing good and meaningful results (Tibebe and Bewket, 2011; Setegn et al., 2008, 2010; Swallow et al., 2009; Mulungu and Munishi, 2007). However, most of the hydrological models are built for the Lake Tana basin and its sub-basins: Gumara, Ribb, Gilgel Abay, and Koga (Chebud and Melesse, 2009; Setegn et al., 2008, 2010a, b; Wale, 2008). Dessie et al. (2015) and Kebede et al. (2006) performed a very detailed daily water balance analysis and annual water budget for the Lake Tana basin, where the runoff and outflows of ungauged catchments were estimated. Uhlenbrook et al. (2010) performed an analysis of the hydrological processes and responses of the Gilgel Abay and Koga catchments by applying the HBV model. Other studies have modeled the entire upper Blue Nile Basin; for instance, Abera et al. (2017) performed a water budget analysis in the upper Blue Nile Basin in which precipitation, outflow, and evapotranspiration were analyzed. Betrie et al. (2011) and Easton et al. (2010) also modeled and calibrated the upper Blue Nile Basin using discharge data to estimate sediment yield and erodible areas of the basin; values of the calibrated parameters for flow and sediment were also shown. Dessie et al. (2014) also performed a runoff and sediment yield analysis in the upper Blue Nile Basin, although the main analysis was done for the Lake Tana region. Tekleab et al. (2011) also modeled the upper Blue Nile Basin, performing a water balance analysis and modeling monthly streamflow for several subcatchments. However, most of the large-scale studies in the upper Blue Nile Basin do not provide detailed values for each of the water balance components of the basin.
Another important issue when setting up SWAT models concerns the right number of sub-basins, because the number of meteorological stations used by SWAT depends on the number of sub-basins. For instance, if two stations are located within one sub-basin, SWAT will choose the station nearest to the center of the sub-basin; the other station will be disregarded. However, if more sub-basins are created in a model, and these two stations lie in different sub-basins, then both stations will be considered by SWAT, which leads to different water balance results.
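The station selection described above can be illustrated with a minimal sketch. It assumes that each sub-basin is represented by a centroid and that "nearest" is measured by great-circle distance; the coordinates, identifiers, and function names are hypothetical and do not reproduce SWAT's internal routine.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def assign_stations(centroids, stations):
    """Map each sub-basin id to the id of the station nearest its centroid."""
    return {
        sub_id: min(stations, key=lambda s: haversine_km(lat, lon, *stations[s]))
        for sub_id, (lat, lon) in centroids.items()
    }

# Hypothetical coordinates (lat, lon), for illustration only.
stations = {"A": (11.0, 37.0), "B": (11.4, 37.6)}
coarse = {1: (11.1, 37.1)}                     # one large sub-basin
fine = {1: (11.05, 37.05), 2: (11.35, 37.55)}  # the same area split in two

print(assign_stations(coarse, stations))  # only station A is used
print(assign_stations(fine, stations))    # both stations are used
```

With the coarse discretization station B contributes nothing to the model, whereas the finer discretization brings it into play, which is why the two setups can yield different water balances from the same weather data.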
Therefore, the first objective of this study has been the comparison of different weather datasets at large scale and under different sub-basin discretization levels. Two models were created using different subcatchment discretizations (30 and 87 sub-basins), hereafter named SWAT30 and SWAT87, respectively (Fig. 3). The time frame of the models was from 1990 to 2004, using a 4-year warm-up period (1990-1993), a 6-year calibration period (1994-1999), and a 5-year validation period (2000-2004). This comparison provided a better understanding of the effects of different sub-basin discretization levels on the total water balance of a watershed. It also helped to identify the temporal and spatial constraints of both datasets. Roth and Lemann (2016) performed a comparison between CFSR and conventional data in small catchments in the Ethiopian highlands, where they showed that the CFSR data provided unreliable results. However, Roth and Lemann (2016) made it clear that the CFSR data were tested only in very small catchments ranging from 112 to 477 ha and not at large scale, and they suggested that CFSR data should be carefully checked and compared with conventionally measured data from similar climatic stations. Furthermore, this study proposes an integration of CFSR and conventional weather data to be used at large scale in the upper Blue Nile Basin, which has an area of approximately 199 812 km². Additionally, the CFSR stations used in the study were compared with conventionally measured data. Based on the obtained statistical results, the integration of these two datasets provides better models and a better representation of the magnitudes and distribution of the different weather variables in the upper Blue Nile Basin.
After a hydrological model has been set up, a critical point in determining its quality is the water balance. Therefore, in addition to graphical assessments, statistical indicators such as the Nash-Sutcliffe coefficient (NS), percent bias (PBIAS), and the ratio of the root mean square error (RMSE) to the standard deviation of measured data were proposed by Moriasi et al. (2007). Based on these commonly used statistical indicators, most SWAT models provide very good results for discharge values at the outlet of a basin (van Griensven et al., 2012). However, the evaluation of the models based on both evapotranspiration and water balance is not discussed in detail, and the evapotranspiration behavior of a catchment is usually not presented. Several published documents could even report unrealistic parameter values (van Griensven et al., 2012). Therefore, the second objective of this study has been to propose an index, the SWAT error index (SEI), to quantify the level of error of a hydrological model. The SEI uses flexible weighting values for the relative root mean square error (rRMSE) obtained from measured flow discharge data and satellite evapotranspiration data. The SEI proved to be a useful additional method to develop models that can provide a better representation of the water balance of a watershed.

Study site
The upper Blue Nile Basin, also known as the Abay Basin, is located in the northwestern highlands of Ethiopia, approximately between latitude 7°40′ and 12°51′ N, and longitude 34°25′ and 39°49′ E, with elevations ranging between 483 and 4248 m a.s.l. The total area of the upper Blue Nile Basin is approximately 199 812 km², including two sub-basins shared with the Republic of the Sudan in the northern region. The climate in the upper Blue Nile Basin ranges from humid to semi-arid and is mainly controlled by latitude and altitude, with average temperatures ranging from 13 °C in the southeastern to 26 °C in the southwestern regions. The lowest rainfall values recorded during the current research period (1990-2004) correspond to the eastern region for the sub-basins of Beshilo, North Gojjam, South Gojjam, Welaka, Jemma, Muger, Guder, and Fincha, where the precipitation drops below 1000 mm yr⁻¹ (Figs. 1 and 4); meanwhile, the highest precipitation ranges belong to the western region (Didessa, Wenbera, Anger, Dabus, and Beles), with precipitation above 1900 mm yr⁻¹ (Figs. 1 and 4). The topographic disparity and variations in altitude of the upper Blue Nile Basin have a great impact on the weather, soil, and vegetation conditions. Consequently, rainy seasons are highly variable in this watershed; for instance, peak discharge at the El Diem gauging station can reach 7000 m³ s⁻¹, while dry-season discharge can drop as low as 100 m³ s⁻¹ (Figs. 7 and 8). Soils in the upper Blue Nile Basin are mainly dominated by 10 types (Fig. 2): Eutric Nitosols, Eutric Cambisols, Humic Fluvisols, Cambic Arenosols, Chromic Vertisols, Dystric Cambisols, Eutric Fluvisols, Eutric Regosols, Orthic Acrisols, and Pellic Vertisols (FAO, 2015).

Datasets
A Shuttle Radar Topographic Mission digital elevation model (SRTM DEM) from the Consultative Group on International Agricultural Research - Consortium for Spatial Information (CGIAR-CSI) was used to set up the model. This DEM has a resolution of 90 m and was used to perform an automatic watershed delineation of the upper Blue Nile Basin, where the flow direction, flow accumulation, and stream network were automatically determined by SWAT.
The second input dataset was a land use map, which was obtained from the GIS portal of the International Livestock Research Institute (ILRI) and corresponds to the year 2004 (http://data.ilri.org/geoportal/catalog/main/home.page).
The soil map used for these models was developed by the Food and Agriculture Organization of the United Nations (FAO-UNESCO). This world soil map was prepared by FAO and UNESCO at 1 : 5 000 000 scale (http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/faounesco-soil-map-of-the-world/en/). The information provided by this map was used in combination with the Harmonized World Soil Database v1.2, a database that combines existing regional and national soil information (http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/harmonized-world-soil-database-v12/en/).
The last input dataset was the meteorological information. Two weather datasets from different sources were used to set up the models. The first weather dataset was collected from the National Meteorology Agency of Ethiopia (NMA). The data used for these models correspond to 42 stations distributed in the upper Blue Nile Basin (Fig. 3). However, only 15 of these stations are capable of measuring all five parameters needed to set up SWAT: rainfall, temperature, relative humidity, solar radiation, and wind speed. Moreover, few of these 15 stations have complete and continuous data available for the entire period under study (1990-2004). For instance, the collected data for solar radiation were limited to 2 stations, wind speed was available for 4 stations, temperature was limited to maximum temperature at 4 stations, relative humidity was available for 3 stations, and precipitation was available for all 42 stations. Additionally, the quality of these observed data is somewhat questionable. Many meteorological stations are more than 10 years old, and their frequent technical failures, due to the lack of continuous expert maintenance, also cast doubt on the quality of the data. A large part of the available ground data has been collected from old stations that in many cases could have malfunctioning, defective, or outdated devices. The second weather dataset was the Climate Forecast System Reanalysis (Fig. 3), a dataset that has been produced by the National Centers for Environmental Prediction (NCEP) (http://globalweather.tamu.edu/). CFSR data carry several uncertainties due to their multiple spatial and temporal interpolations (Dile and Srinivasan, 2014). These data were generated using different assimilation techniques that include satellite radiances and advanced coupled atmospheric, oceanic, and land surface modeling components. The global atmospheric resolution of CFSR data is approximately 38 km. These atmospheric, oceanic, and land surface output products are available at a 0.5° × 0.5° latitude and longitude resolution. Both weather datasets used for these models correspond to the period 1990-2004.
For the analysis of the quality of the SWAT models, monthly flow discharge data and evapotranspiration data were used. The flow discharge data were obtained from the Ministry of Water, Irrigation and Electricity of Ethiopia and correspond to the gauging stations at Kessie and El Diem on the main stream of the upper Blue Nile Basin (Fig. 3). For the evapotranspiration analysis, data from the MOD16 Global Terrestrial Evapotranspiration Project (http://www.ntsg.umt.edu/project/mod16) were obtained. The global evapotranspiration data from MOD16 are regular 1 km² land surface datasets for the 109.03 million km² of vegetated area in the whole world at different time intervals (8 days, monthly, and annual), from which monthly data generated specifically for the Nile Basin were used.

Water balance and evapotranspiration processes in SWAT
Water balance in watersheds is one of the most important factors used to determine if a model is good enough for any particular application. Hence, analyses of the processes involved in the estimation of the water balance of a watershed (evapotranspiration, runoff, and groundwater) can provide more details about the hydrological behavior of a watershed and can be used to understand the interaction of the main hydrological processes (Zhang et al., 1999). For the input data processing and hydrological estimation, SWAT uses two levels of discretization: sub-basins and hydrologic response units (HRUs). HRUs are contained in the sub-basins and are defined based on the land use map, soil map, and slope classes. HRUs allow the model to reflect differences in evapotranspiration and other hydrologic conditions for each crop and soil type. The water balance in SWAT is calculated for each HRU using the following formula (Neitsch et al., 2009):
$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - W_{seep} - Q_{gw} \right)$$
where $SW_t$ is the final soil water content (mm), $SW_0$ is the initial soil water content on day $i$ (mm), $R_{day}$ is the amount of rainfall on day $i$ (mm), $Q_{surf}$ is the amount of surface runoff on day $i$ (mm), $E_a$ is the amount of evapotranspiration on day $i$ (mm), $W_{seep}$ is the amount of water entering the vadose zone from the soil profile on day $i$ (mm), and $Q_{gw}$ is the amount of return flow on day $i$ (mm). SWAT can estimate the evapotranspiration using several methods, from which the Hargreaves and Penman-Monteith methods were compared in this study (Figs. 11 and 12). The Hargreaves method calculates the potential evapotranspiration using minimum and maximum daily temperatures as input data (Hargreaves and Samani, 1982). This method was chosen as the better option for the upper Blue Nile Basin due to the data scarcity of the meteorological stations in the basin. The Hargreaves equation can be used with the sole input of temperature data, while the Penman-Monteith equation requires more data, for instance, wind speed, solar radiation, and relative humidity. The Hargreaves method has been recommended for computing potential evaporation in cases when only the maximum and minimum temperatures are available (Allen et al., 1998). A study by Tekleab et al. (2011) also successfully used the Hargreaves equation to calculate the potential evaporation in the upper Blue Nile Basin. Several improvements were made to the original equation since 1975 (Hargreaves and Samani, 1982). The final form of the Hargreaves equation used in SWAT and published in 1985 (Hargreaves and Samani, 1985) is as follows (Neitsch et al., 2009):
$$\lambda E_0 = 0.0023 \cdot H_0 \cdot \left( T_{max} - T_{min} \right)^{0.5} \cdot \left( T_{av} + 17.8 \right)$$
where $\lambda$ is the latent heat of vaporization (MJ kg⁻¹), $E_0$ is the potential evapotranspiration (mm day⁻¹), $H_0$ is the extraterrestrial radiation (MJ m⁻² day⁻¹), $T_{max}$ and $T_{min}$ are the maximum and minimum air temperature for a given day (°C), respectively, and $T_{av}$ is the mean air temperature for a given day (°C).
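As a worked illustration of the Hargreaves equation, the following minimal sketch computes the potential evapotranspiration for a single day. It assumes that the latent heat of vaporization is estimated from the mean air temperature as λ = 2.501 − 2.361 × 10⁻³ · T_av, a relation taken from the SWAT theoretical documentation and not stated in this section; the function names and the input values are hypothetical.

```python
def latent_heat_of_vaporization(t_av):
    """Latent heat of vaporization (MJ/kg) from mean air temperature (deg C).

    Assumption: linear relation used in the SWAT theoretical documentation.
    """
    return 2.501 - 2.361e-3 * t_av

def hargreaves_pet(h0, t_max, t_min, t_av=None):
    """Potential evapotranspiration (mm/day) from the Hargreaves equation.

    h0    : extraterrestrial radiation (MJ m^-2 day^-1)
    t_max : maximum daily air temperature (deg C)
    t_min : minimum daily air temperature (deg C)
    t_av  : mean daily air temperature (deg C); defaults to the mid-range
    """
    if t_max < t_min:
        raise ValueError("t_max must not be smaller than t_min")
    if t_av is None:
        t_av = 0.5 * (t_max + t_min)
    lam = latent_heat_of_vaporization(t_av)
    # lambda * E0 = 0.0023 * H0 * (Tmax - Tmin)^0.5 * (Tav + 17.8)
    return 0.0023 * h0 * (t_max - t_min) ** 0.5 * (t_av + 17.8) / lam

# Hypothetical highland day: H0 = 36 MJ m^-2 day^-1, 12 to 26 deg C.
pet = hargreaves_pet(h0=36.0, t_max=26.0, t_min=12.0)
print(round(pet, 2))
```

Only the daily temperature range and the extraterrestrial radiation (which depends on latitude and day of year alone) are required, which is what makes the method usable at stations that lack wind, humidity, and radiation records.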
Following the potential evapotranspiration, the actual evapotranspiration must be calculated. Initially, SWAT calculates the evaporation of water intercepted by the canopy; then, maximum transpiration and soil evaporation are calculated. Evaporation from the canopy is very significant in forested areas and in several cases can be higher than transpiration. Transpiration for the Hargreaves equation is calculated as (Neitsch et al., 2009)
$$E_t = \begin{cases} \dfrac{E'_0 \cdot LAI}{3.0}, & 0 \le LAI \le 3.0 \\ E'_0, & LAI > 3.0 \end{cases}$$
where $E_t$ is the maximum transpiration on a given day (mm H₂O), $E'_0$ is the potential evapotranspiration adjusted for evaporation of free water in the canopy (mm H₂O), and $LAI$ is the leaf area index. Evaporation from the soil on a given day is calculated with the following equation (Neitsch et al., 2009):
$$E_s = E'_0 \cdot cov_{sol}$$
where $E_s$ is the maximum soil evaporation on a given day (mm H₂O), $E'_0$ is the potential evapotranspiration adjusted for evaporation of free water in the canopy (mm H₂O), and $cov_{sol}$ is the soil cover index.
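The two demand terms can be sketched in a few lines. The transpiration rule follows the description above; the expression for the soil cover index, cov_sol = exp(−5.0 × 10⁻⁵ · CV) with CV the aboveground biomass and residue (kg ha⁻¹), is an assumption taken from the SWAT theoretical documentation and is not given in this section. Function names and input values are hypothetical.

```python
import math

def max_transpiration(e0_adj, lai):
    """Maximum transpiration (mm) for a given day.

    e0_adj : potential ET adjusted for canopy evaporation (mm)
    lai    : leaf area index (dimensionless, >= 0)
    """
    if lai < 0:
        raise ValueError("lai must be non-negative")
    if lai <= 3.0:
        return e0_adj * lai / 3.0  # transpiration scales linearly with LAI
    return e0_adj                  # full canopy: demand is not limited by LAI

def soil_cover_index(cv):
    """Soil cover index from aboveground biomass and residue cv (kg/ha).

    Assumption: exponential form from the SWAT theoretical documentation.
    """
    return math.exp(-5.0e-5 * cv)

def max_soil_evaporation(e0_adj, cv):
    """Maximum soil evaporation (mm) for a given day."""
    return e0_adj * soil_cover_index(cv)

# Hypothetical day with 4.5 mm of adjusted potential ET.
print(max_transpiration(4.5, lai=1.5))       # sparse canopy
print(max_transpiration(4.5, lai=4.0))       # dense canopy
print(max_soil_evaporation(4.5, cv=2000.0))  # partly covered soil
```

The sketch makes the trade-off visible: as vegetation develops, the transpiration share rises with LAI while the soil evaporation share falls with increasing cover.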

Weather data processing and integration
If input data are used without the respective analyses, models will provide less reliable results. Even small errors in temperature or precipitation can result in considerable inaccuracies in the model results (Maraun et al., 2010). Tekleab et al. (2011) and Uhlenbrook et al. (2010) checked the quality of streamflow data in the upper Blue Nile Basin based on comparison graphs and, additionally, a double mass analysis. In this study, the quality and consistency of the monthly time series of the five input variables required by SWAT were also analyzed, in terms of magnitude and spatial distribution, through comparison graphs to determine the deficiencies of the two datasets (CFSR and ground datasets) and to form an integrated dataset.
In the first case, the ground dataset was used without alterations to create the SWAT models. This ground dataset obtained from the NMA corresponds to 42 stations in the upper Blue Nile Basin, where most of the meteorological stations were located in the eastern part of the watershed (Fig. 3). Additionally, the data obtained from these stations had several months of missing data, leading to temporal uncertainties.
For the second case, the SWAT models were set up using the CFSR dataset, also without alterations. This dataset is evenly distributed at 38 km resolution, with over 100 stations available for the upper Blue Nile Basin, and is temporally continuous.
However, after performing a quality check through a comparison of maps and graphs between the ground and CFSR datasets, it was noticed that not all the weather variables from CFSR are reliable. The precipitation appeared to be underestimated in the eastern region of the upper Blue Nile Basin and overestimated in the western region (Fig. 4). The map created from the ground stations (Fig. 4b) showed a precipitation distribution in the western region that was the result of SWAT using the precipitation values from the nearest stations. Two stations in the eastern part, Alem Ketema and Adet (Figs. 5a, b and 6a, b), showed the underestimation of the CFSR rainfall in the eastern region, and Ayehu (Figs. 5c and 6c) showed the overestimation of the CFSR rainfall in the western region. For this reason, additional CFSR rainfall stations were not used in the integrated dataset. However, the graphical and statistical comparisons of the few available stations for relative humidity, temperature, and solar radiation showed an acceptable level of agreement between the ground and CFSR datasets. The seasonal behavior and magnitudes of the values for these variables are similar; additionally, the 1 : 1 graphs showed an acceptable degree of matching (Fig. 6). For instance, the relative humidity values for Debre Tabor and Aykel are very similar in both datasets (Figs. 5d, e and 6d, e). The comparisons of maximum temperature for Aykel also showed a good degree of matching (Figs. 5g and 6g), although for Bahir Dar the results were not very good (Figs. 5h and 6h). The solar radiation comparison at Bahir Dar (Figs. 5i and 6i) also showed a good agreement between both datasets, although results at Debre Tabor (Figs. 5j and 6j) differed slightly. The exception was the wind speed data, which in both cases, at Adet and Ayehu (Figs. 5k, l and 6k, l), were overestimated by the CFSR dataset. Therefore, these two datasets were integrated to form a third input dataset for SWAT with the objective of overcoming their spatial and temporal limitations. Tekleab et al. (2011) and Uhlenbrook et al. (2010) filled in missing streamflow data of the upper Blue Nile Basin using regression analysis, which is also a good approach to fill in missing meteorological values. However, in this study, the missing values of the ground dataset refer to complete time series of a specific station and variable. Thus, to create the integrated dataset, the 42 rainfall stations of the ground dataset were taken as the basis; this means that the locations of the weather stations of the final integrated dataset correspond to the locations of the 42 rainfall stations of the ground dataset. From there, the missing variables (relative humidity, temperature, and solar radiation) of those 42 rainfall stations were completed by using the variables of their nearest CFSR stations. The integrated dataset has 42 stations where the data for each variable were combined as follows: the precipitation is formed by 42 rainfall stations taken entirely from the ground dataset; the relative humidity is formed by 3 stations from the ground dataset and 39 stations from the CFSR dataset; the maximum temperature is formed by 4 stations from the ground dataset and 38 stations from the CFSR dataset; the values for the minimum temperature were taken entirely from the CFSR dataset; the solar radiation is formed by 2 stations from the ground dataset and 40 stations from the CFSR dataset; no wind speed data were used in the models. Missing daily values within a variable were completed by the built-in SWAT weather generator. This integrated dataset contained more data than the ground dataset and also provided more reliable precipitation values and distribution than those provided by the CFSR dataset.
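The integration rule described above (rainfall always from the ground station, any variable missing at that station borrowed from the nearest CFSR point) can be sketched as follows. The data layout, the variable keys, the coordinates, and the use of planar distance in degrees are simplifying assumptions made for illustration; they do not reproduce the processing chain used in the study.

```python
import math

def nearest(target_xy, candidates):
    """Return the id of the candidate closest to target_xy.

    Assumption: planar distance in degrees, adequate only for a sketch.
    """
    return min(candidates, key=lambda c: math.dist(target_xy, candidates[c]["xy"]))

def integrate(ground, cfsr, variables=("hmd", "tmax", "tmin", "slr")):
    """Build the integrated dataset on the ground rainfall station locations."""
    integrated = {}
    for name, station in ground.items():
        donor = cfsr[nearest(station["xy"], cfsr)]
        record = {"xy": station["xy"], "pcp": station["pcp"]}  # rainfall: ground only
        for var in variables:
            own = station.get(var)
            # Keep the ground series when it exists, else borrow the CFSR series.
            record[var] = own if own is not None else donor[var]
        integrated[name] = record
    return integrated

# Hypothetical stations and short monthly series, for illustration only.
ground = {
    "G1": {"xy": (37.5, 11.3), "pcp": [5.0, 210.0], "tmax": [27.1, 24.0]},
}
cfsr = {
    "C1": {"xy": (37.5, 11.4), "hmd": [0.45, 0.80], "tmax": [26.0, 23.5],
           "tmin": [10.2, 12.8], "slr": [22.0, 17.5]},
    "C2": {"xy": (39.0, 9.0), "hmd": [0.40, 0.70], "tmax": [29.0, 26.0],
           "tmin": [13.0, 14.5], "slr": [23.0, 18.0]},
}
merged = integrate(ground, cfsr)
print(merged["G1"])
```

In this example the ground station keeps its own rainfall and maximum temperature, while relative humidity, minimum temperature, and solar radiation come from the closer CFSR point C1.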

Parameterization for the calibration and validation of the models
One of the major constraints of hydrological modeling is the difficulty of the parameterization of different variables (Hauhs and Lange, 2008). The correct combination of the values of the parameters influencing the groundwater, runoff, and evapotranspiration processes is a key point in a model calibration. The characterization of watersheds considering their most influential variables is a good approach to determine the predictive capabilities of a model (McDonnell et al., 2007). Initially, it is recommended to perform calibrations for annual discharge values; once acceptable results are acquired, a calibration based on monthly values can be performed to achieve more detailed results (Neitsch et al., 2009).
During a model calibration, a potential value can be assigned for each parameter and for each HRU, which would generate a large number of parameters. However, these values can also be applied as a global modification to estimate parameters by multiplying or adding values. Table 2 shows the parameterization applied to the respective regions in the watershed to calibrate streamflow at Kessie and El Diem, where r stands for relative values and v for values to be replaced. The same parameterization was applied to all the models with different subcatchment delineations and data sources. Land cover, soil types, and slope have a great impact on the total water balance, and a calibration with wrong parameter values will only produce models with good statistical results but with a less realistic representation of the actual properties of the watershed. Therefore, the values of the parameters were modified within the ranges specified by the SWAT input/output documentation 2012 (Arnold et al., 2012). For instance, the available water content of the soils was calibrated in such a way that it did not change the physical properties of the soils. The curve number 2 (CN2) values were defined within different ranges based on the type of land cover.
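The difference between a relative (r) and a replacement (v) change can be made concrete with a short sketch. It assumes the SWAT-CUP convention in which a relative change multiplies the existing value by (1 + given value); the function name, the example values, and the bounds passed in are hypothetical and are not the calibrated values of Table 2.

```python
def apply_change(current, method, value, lower=None, upper=None):
    """Apply a calibration change to one parameter value.

    method "r": relative change, new = current * (1 + value)  (assumed convention)
    method "v": replacement,     new = value
    lower/upper: optional physical bounds; an out-of-range result is rejected.
    """
    if method == "r":
        new = current * (1.0 + value)
    elif method == "v":
        new = value
    else:
        raise ValueError(f"unknown method: {method!r}")
    if lower is not None and new < lower:
        raise ValueError(f"{new} is below the allowed minimum {lower}")
    if upper is not None and new > upper:
        raise ValueError(f"{new} is above the allowed maximum {upper}")
    return new

# Hypothetical HRUs with different initial curve numbers.
cn2_by_hru = {"forest": 60.0, "cropland": 78.0}

# A relative change preserves the contrast between land covers ...
relative = {k: apply_change(v, "r", -0.10, lower=35.0, upper=98.0)
            for k, v in cn2_by_hru.items()}
# ... whereas a replacement gives every HRU the same value.
replaced = {k: apply_change(v, "v", 70.0, lower=35.0, upper=98.0)
            for k, v in cn2_by_hru.items()}
print(relative)
print(replaced)
```

Relative changes are the natural choice for parameters such as CN2 or soil available water content, where the spatial pattern derived from the land use and soil maps should survive calibration; replacement suits parameters that have no meaningful spatial prior.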

Calibration and validation with flow discharge
E. I. Polanco et al.: Improving SWAT model performance in the upper Blue Nile Basin
In hydrological modeling, limitations in data quality and in the capability of the model to represent the complexity of the hydrological processes often constitute obstacles. Therefore, models must be calibrated, and a statistical analysis is also required to determine how reliable the results of the model are prior to their application (Bastidas et al., 2002). Since sediment data for the upper Blue Nile Basin are very limited, the calibration and validation of the models were done using flow discharge data only. The calibrated stations were Kessie and El Diem on the main stream of the Blue Nile River (Fig. 3). For the automatic calibration, the Sequential Uncertainty Fitting version 2 (SUFI-2) was used with the coefficient of determination (R²) and NS as likelihood measures, trying to capture the seasonal dynamics and magnitudes of the measured discharge data. SUFI-2 is a sequential parameter estimation method that operates within parameter uncertainty domains (Tanveer et al., 2016). SUFI-2 performs several iterations, where each iteration provides better results than the previous iteration and reduces the parameter ranges. In SUFI-2, the objective is to capture most of the observed values within the 95PPU (95 % prediction uncertainty) band while keeping this band as narrow as possible. The 95PPU represents the uncertainty in the model outputs. Therefore, the calibration starts with large but physically meaningful parameter ranges, so that the measured data fall within the 95PPU; subsequent iterations progressively narrow the 95PPU and produce better results. The final 95PPU band, within which the observed data are to be captured, is defined by the final parameter intervals. Therefore, the best simulation is the best iteration within the 95PPU, and considering that it is difficult to claim a specific parameter range for a certain watershed, any solution within the 95PPU should be an acceptable solution. The fit of simulated results within the 95PPU is quantified through the p factor and r factor. The p factor is the fraction of observed data that fall within the 95PPU and ranges from 0 to 1, while the r factor is the thickness of the 95PPU band and ranges from 0 to infinity. The quality of a calibration and the prediction uncertainty are judged based on how close the p factor is to 1 and how close the r factor is to 0 (Yang et al., 2007). A p factor of 1 and an r factor of 0 correspond to a simulation that exactly matches the measured data. As the number of iterations increases, SUFI-2 continues to reduce the 95PPU thickness and produces smaller values for the p factor and r factor, trying to find a better combination of the parameter values. The uncertainty in SUFI-2 is expressed as a uniform distribution of parameter ranges, and parameter uncertainties account for all possible sources of uncertainty, for instance, model inputs, model structure, model parameters, and also measured data (Abbaspour et al., 2015). The uncertainties in the outputs are expressed as the 95PPU. The uncertainty analysis in SUFI-2 is based on the concept that a single parameter value generates a single model response, while a parameter range, or propagation of the parameter uncertainty, leads to the 95PPU.
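The two fit measures can be computed directly from the observed series and the lower and upper limits of the 95PPU band. The sketch below assumes that the band thickness is averaged over time and normalized by the standard deviation of the observed data, as is common in SUFI-2 applications; the choice of the population standard deviation, the function names, and the sample values are assumptions made for illustration.

```python
import statistics

def p_factor(obs, lower, upper):
    """Fraction of observations bracketed by the 95PPU band (0 to 1)."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)

def r_factor(obs, lower, upper):
    """Mean thickness of the 95PPU band relative to the spread of the data.

    Assumption: normalization by the population standard deviation of obs.
    """
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(obs)
    return mean_width / statistics.pstdev(obs)

# Hypothetical monthly discharges (m3/s) and 95PPU limits.
obs = [100.0, 400.0, 900.0, 300.0]
lower = [80.0, 350.0, 950.0, 200.0]
upper = [150.0, 500.0, 1200.0, 350.0]

print(p_factor(obs, lower, upper))            # 3 of 4 observations bracketed
print(round(r_factor(obs, lower, upper), 3))
```

The two measures pull in opposite directions: widening the band raises the p factor but also the r factor, so a calibration is judged on the balance between them.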
The coefficient of determination (R²) is a measure of how well the regression line represents the data and gives the proportion of the variance of one variable that is predictable from another variable. The values of this coefficient denote the strength of the linear relation between $Q_m$ and $Q_s$. The R² objective function provided in SWAT-CUP is as follows:
$$R^2 = \frac{\left[ \sum_i \left( Q_{m,i} - \bar{Q}_m \right) \left( Q_{s,i} - \bar{Q}_s \right) \right]^2}{\sum_i \left( Q_{m,i} - \bar{Q}_m \right)^2 \sum_i \left( Q_{s,i} - \bar{Q}_s \right)^2}$$
where $Q$ indicates discharge values, "m" and "s" stand for observed and simulated values, respectively, and $i$ is the $i$th measured or simulated value. NS is widely used as a goodness-of-fit indicator that expresses the potential predictive ability of a hydrological model (Nash and Sutcliffe, 1970). The Nash-Sutcliffe objective function provided in SWAT-CUP is as follows:
$$NS = 1 - \frac{\sum_i \left( Q_m - Q_s \right)_i^2}{\sum_i \left( Q_{m,i} - \bar{Q}_m \right)^2}$$
where $Q$ indicates discharge values, "m" and "s" stand for observed and simulated data, respectively, and the bar stands for the average values.
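Both objective functions are straightforward to evaluate outside SWAT-CUP, which is useful for checking reported values. The following minimal sketch implements the standard definitions of the two statistics in plain Python; the function names and the discharge values are hypothetical.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed discharge."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated discharge."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

# Hypothetical monthly discharges (m3/s).
observed = [120.0, 450.0, 3200.0, 5100.0, 1800.0, 400.0]
biased = [2.0 * q for q in observed]  # right timing, doubled magnitude

print(r_squared(observed, biased))       # close to 1: perfectly correlated
print(nash_sutcliffe(observed, biased))  # much lower: magnitudes are wrong
```

The example shows why the two measures are used together: R² rewards a simulation that reproduces the seasonal dynamics even when it is systematically biased, whereas NS also penalizes errors in magnitude.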

Actual evapotranspiration analysis
In addition to the calibration and validation of the SWAT models with flow discharge, comparisons with evapotranspiration data can provide more details to quantify the reliability of hydrological models. Therefore, actual evapotranspiration data for the upper Blue Nile Basin were obtained from the MODIS Global Terrestrial Evapotranspiration Project (MOD16). These are global estimates of land surface evapotranspiration derived from satellite remote sensing data. These data are intended to be used to calculate regional water balances; hence, they are a very important source of data for watershed management and for the analysis of hydrological models.
The original MOD16 evapotranspiration (ET) algorithm (Mu et al., 2007) was based on the Penman-Monteith equation (Monteith, 1965), while the current MOD16 ET uses the improved evapotranspiration algorithm (Mu et al., 2011). In this improved algorithm, the sum of the evaporation from the wet canopy surface, the transpiration from the dry canopy surface, and the evaporation from the soil surface constitutes the total daily ET (Mu et al., 2011). The formulae for the total daily ET ($\lambda E$) and potential ET ($\lambda E_{POT}$) are

$$\lambda E = \lambda E_{wet\_C} + \lambda E_{trans} + \lambda E_{SOIL},$$

$$\lambda E_{POT} = \lambda E_{wet\_C} + \lambda E_{POT\_trans} + \lambda E_{wet\_SOIL} + \lambda E_{SOIL\_POT},$$

where $\lambda E_{wet\_C}$ is the evaporation from the wet canopy surface, $\lambda E_{trans}$ is the transpiration from the dry canopy surface (plant transpiration), $\lambda E_{SOIL}$ is the evaporation from the soil surface, $\lambda E_{POT\_trans}$ is the potential plant transpiration, $\lambda E_{wet\_SOIL}$ is the evaporation from the wet soil surface (a term of the Mu et al., 2011, formulation), and $\lambda E_{SOIL\_POT}$ is the potential soil evaporation.
Previous studies have already shown that the annual ET values derived from the MOD16 algorithm are lower than those provided by hydrological models, principally when using the Hargreaves method. For instance, Ruhoff et al. (2013) detected an underestimation of 21 % in the evapotranspiration provided by MOD16 in the Rio Grande Basin, Brazil, where the underestimation was mainly caused by the misclassification of the land use. Sun et al. (2007) also identified certain disadvantages in the MOD16 evapotranspiration. Nevertheless, in this study, the evapotranspiration estimations from SWAT were compared with satellite evapotranspiration data. This was done only as a comparison, not with the objective of calibrating the models, and also as a test to understand the performance of the proposed SEI.
Evapotranspiration estimations expressed as a percentage of the average annual precipitation are frequently given for the upper Blue Nile Basin. However, these percentages would yield totally different amounts depending on the average annual precipitation provided by different weather data sources and under different sub-basin discretizations. Therefore, the actual evapotranspiration data provided by MOD16 were compared with the values calculated by SWAT under the Hargreaves and Penman-Monteith equations to show the level of discrepancy between datasets (Figs. 11, 12, and 14). MOD16 ET data are available only for the period 2000-2010; hence, the comparison was done only for 5 years (2000-2004).
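A short arithmetic sketch shows why a percentage alone is ambiguous. The two precipitation totals are the average annual values reported in this study for the CFSR dataset under SWAT30 and SWAT87 (1253 and 1481 mm); the evapotranspiration fraction of 60 % is a purely hypothetical value chosen for illustration, and the function name is likewise hypothetical.

```python
def et_from_fraction(precip_mm, et_fraction):
    """Annual evapotranspiration (mm) implied by a fraction of precipitation."""
    return precip_mm * et_fraction


if __name__ == "__main__":
    et_fraction = 0.60  # hypothetical share of precipitation, illustration only
    precip_swat30 = 1253.0  # mm, CFSR dataset under SWAT30 (this study)
    precip_swat87 = 1481.0  # mm, CFSR dataset under SWAT87 (this study)

    et_swat30 = et_from_fraction(precip_swat30, et_fraction)
    et_swat87 = et_from_fraction(precip_swat87, et_fraction)

    # The same percentage corresponds to noticeably different depths.
    print(f"SWAT30: {et_swat30:.1f} mm, SWAT87: {et_swat87:.1f} mm")
    print(f"Difference: {et_swat87 - et_swat30:.1f} mm")
```

Under this assumed fraction, the same percentage translates into a difference of more than 130 mm per year between the two discretizations, which is why absolute evapotranspiration depths are compared here.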

SWAT error index
A common problem of hydrological models is a wrong combination of the values of the calibrated parameters, which can still lead to good graphical results, and consequently good statistical values, but wrong water balance values. Consequently, good R² and NS values do not always denote the reliability of a model. R² and NS are common statistical parameters used to evaluate and compare time series in hydrological models (Abbaspour, 2015; De Almeida Bressiani et al., 2015; Dile and Srinivasan, 2014; Gebremicael et al., 2013). Additionally, rainfall distribution, parameterization, and evapotranspiration are also crucial points to be considered in any hydrological model. Therefore, in this study, after good calibration and validation values for R² and NS were achieved, and after the SWAT ET and MOD16 ET values were compared, an index (the SEI) was introduced to quantify the model quality. This index is intended to be used only as an additional indicator to assess the reliability of the SWAT model, with the relative root mean square error (rRMSE) chosen as the fitting function. Several reliable measured flow discharge datasets are available for rivers, but that is not the case for evapotranspiration data. However, satellite evapotranspiration data are available for most watersheds in the world. Furthermore, the measured discharge dataset and the satellite-estimated evapotranspiration dataset do not have the same level of reliability. Therefore, SEI uses different weighting values ($W_1$ and $W_2$) to reflect the difference in the level of reliability of the datasets: 0.7 for flow discharge and 0.3 for evapotranspiration. The proposed equation for SEI is as follows:

$$\mathrm{SEI} = W_1 \, \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{oi} - Q_{si}\right)^2}}{Q_{o\,\max} - Q_{o\,\min}} + W_2 \, \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(ET_{oi} - ET_{si}\right)^2}}{ET_{o\,\max} - ET_{o\,\min}}.$$

The first part of the equation corresponds to the rRMSE of the values obtained from the discharge data, where $Q_{oi}$ is the observed discharge data (m³ s⁻¹), $Q_{si}$ is the simulated discharge data (m³ s⁻¹), $Q_{o\,\max}$ is the maximum value of the observed discharge data, and $Q_{o\,\min}$ is the minimum value of the observed discharge dataset. The second part of the formula corresponds to the rRMSE achieved from the evapotranspiration data that were obtained from MOD16, where $ET_{oi}$ is the MOD16 evapotranspiration values, $ET_{si}$ is the SWAT-simulated evapotranspiration data, and $ET_{o\,\max}$ and $ET_{o\,\min}$ are the maximum and minimum values of the MOD16 evapotranspiration data, respectively. In each part, n is the number of paired observed and simulated values. $W_1$ and $W_2$ are the assigned weighting values for discharge and evapotranspiration, respectively. SEI ranges from 0 to +∞, with 0 corresponding to the ideal value. The closer the SEI value of a model is to 0, the better the model matches the flow discharge and the evapotranspiration data. Since SEI includes the rRMSE values for discharge and evapotranspiration data, a model with good SEI results shows good agreement for these two hydrological processes, both of which strongly influence the water balance of a watershed. By analyzing the SEI results, the quality of the combination of the parameters used for the calibration can also be evaluated, making a wrong parameterization less likely. SEI was tested for two cases: the first in the whole upper Blue Nile Basin and the second in the Ribb subcatchment in the Lake Tana region.
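The SEI calculation can be sketched as follows. The sketch assumes that the rRMSE of each series is the root mean square error divided by the range (maximum minus minimum) of the observed values, which is how the terms are described above, and it uses the weights of 0.7 for discharge and 0.3 for evapotranspiration. The function names and the sample values are hypothetical.

```python
import numpy as np


def rrmse(observed, simulated):
    """RMSE normalized by the range of the observed values (assumed form)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    rmse = np.sqrt(np.mean((observed - simulated) ** 2))
    return rmse / (observed.max() - observed.min())


def swat_error_index(q_obs, q_sim, et_obs, et_sim, w_q=0.7, w_et=0.3):
    """Weighted sum of the discharge and evapotranspiration rRMSE values."""
    return w_q * rrmse(q_obs, q_sim) + w_et * rrmse(et_obs, et_sim)


if __name__ == "__main__":
    # Hypothetical monthly discharge (m3/s) and evapotranspiration (mm).
    q_obs = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
    q_sim = q_obs + 40.0     # constant bias: RMSE = 40, range = 400, rRMSE = 0.1
    et_obs = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
    et_sim = et_obs + 8.0    # constant bias: RMSE = 8, range = 40, rRMSE = 0.2

    sei = swat_error_index(q_obs, q_sim, et_obs, et_sim)
    print(f"SEI = {sei:.2f}")  # 0.7 * 0.1 + 0.3 * 0.2 = 0.13
```

Because the discharge term carries the larger weight, an error in the simulated discharge raises the index more than an error of the same relative size in the simulated evapotranspiration.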

Impact of different subcatchment discretization levels and rain gauge combinations
After analyzing the different datasets under different discretization levels, it was detected that the input data and the parameterization have a critical impact not only on the water balance but also on the sub-basins' distribution. The water balance analysis was done for two calibrated stations, three datasets, and two different sub-basin distributions. Water balance results for the upper Blue Nile Basin and the values for the different hydrological processes and models are given in Table 3; values for these hydrological processes from the literature are given in Table 1 (Cherie, 2013; Mengistu and Sorteberg, 2012). The average annual precipitation in the upper Blue Nile Basin differs between literature sources (Table 1) and also between dataset sources (Table 3). The uncertainty of the rainfall in the upper Blue Nile Basin is also noticeable when models with different sub-basin delineations are compared and show different values (Table 3; Figs. 7 and 8 for El Diem; Figs. 9 and 10 for Kessie; with SWAT30 and SWAT87, respectively). With the values provided in Table 2, it was possible to obtain good statistical values for the calibrated models (Table 4).
Hydrol. Earth Syst. Sci., 21, 4907-4926, 2017, www.hydrol-earth-syst-sci.net/21/4907/2017/
Figures 7 and 8 show the magnitude and dynamics of the measured and estimated monthly discharge flow at El Diem. The integrated dataset provided good statistical values for R² and NS (Table 4) under both discretization levels. The other models using the ground and CFSR datasets also showed good R² results but very low NS values, with the exception of SWAT87 with ground data (Table 4, Figs. 7 and 8). R² is high in all the models, but it is a coefficient that measures only the dynamics of a model; a high value means that the models accurately match the seasonality of the rainy and dry periods in the upper Blue Nile Basin. NS, however, is probably a more important factor to be considered, as it can be used to quantitatively describe the accuracy of model outputs. Calibrations and validations at Kessie showed good statistical values for the models using the ground and integrated datasets, achieving good R² and NS values (Table 4, Figs. 9 and 10).
SWAT30 under the CFSR dataset provides an average annual precipitation of 1253 mm (Table 3), while the SWAT87 average annual precipitation increases to 1481 mm. This rainfall increase provided by the CFSR dataset is caused by the number of sub-basins: SWAT87 considered more stations than SWAT30. However, both average annual precipitation values are still within acceptable ranges for the upper Blue Nile Basin compared to the other two datasets and to the literature (Table 1), and the main factor affecting the water balance is not the precipitation amount but its distribution in the watershed (Fig. 4). Figures 9 and 10 show how CFSR data underestimate the precipitation in the eastern part of the basin (at Kessie) compared to those provided by the ground and integrated datasets. Figures 9 and 10 also show the effect of the number of sub-basins on the simulated discharge flow. The flow discharge provided by the CFSR data is slightly higher in SWAT87 compared to SWAT30, although in both cases this dataset continues to underestimate the flow discharge at Kessie. As the precipitation in the watershed changes in magnitude and distribution, the parameterization for the calibration of the models will be different. Therefore, in order to achieve good R² and NS values for the model with an incorrect precipitation distribution (in this case, the CFSR data), the values of the parameters needed to be modified to unrealistic values.
Figure caption fragment: [...] 0.78; ground data: 0.80, 0.76; and CFSR data: 0.74, 0.37, respectively.

Average annual evapotranspiration and the impact of different data sources and potential evapotranspiration methods
Evapotranspiration was another critical factor analyzed in this study. Depending on the weather dataset, the evapotranspiration values in the upper Blue Nile Basin varied from 729 mm yr⁻¹ in SWAT30 with the CFSR dataset up to 932 mm yr⁻¹ in SWAT30 with the integrated dataset. SWAT models using the ground and integrated datasets and the Hargreaves equation showed acceptable discharge values and trends compared to those of measured discharge data (Figs. 7 and 8). However, the models overestimated the evapotranspiration values compared to those provided by MOD16 (Fig. 11). When using the Penman-Monteith method, in contrast, the SWAT models using the ground and integrated datasets provided evapotranspiration values closer to the MOD16 evapotranspiration data, with better R² and NS values (Fig. 12). The best match with the evapotranspiration values provided by MOD16 is obtained using the CFSR dataset; this model provided low evapotranspiration values (Fig. 12) and consequently overestimated the flow discharges (Figs. 7 and 8). For the second test, done in the Ribb subcatchment, the evapotranspiration rates provided by the ground and CFSR datasets were much better, having relatively good statistical values compared to those obtained at large scale in the upper Blue Nile Basin (Figs. 13 and 14).

SEI evaluation
In the first case, SEI results for the El Diem station (Table 5) showed that SEI is able to quantify the level of error of a model through an evaluation of both flow discharge and evapotranspiration estimations.
For instance, values in Table 5 showed that the lower the rRMSE value for the discharge data is, the more the value for evapotranspiration tends to increase. This is because the flow discharge data are being matched; however, the evapotranspiration increases and tends to overestimate the values provided by MOD16 ET. If MOD16 ET had a good representation of the evapotranspiration data of a watershed, then the rRMSE values for both discharge and evapotranspiration should be closer to 0, which could provide better SEI values (as in the second test done at the Ribb subcatchment). Nevertheless, SEI showed that the models using the integrated datasets are more reliable than those using the other two datasets, achieving SEI values of 0.29 and 0.27 for SWAT30 and SWAT87, respectively. It also demonstrated that the CFSR dataset is less accurate, with an SEI value of 0.4 for both SWAT30 and SWAT87.
In the second test, done at the Ribb subcatchment, the calibration with flow discharge data provided good statistical results, where the CFSR dataset achieved R² and NS values of 0.81 and 0.75, respectively, and the ground dataset achieved R² and NS values of 0.85 and 0.83, respectively (Fig. 13 and Table 6). Compared with the SEI test performed for the entire upper Blue Nile Basin, the statistical results obtained from the comparison of the evapotranspiration data in the Ribb subcatchment are significantly better. The CFSR dataset achieved R² and NS values of 0.78 and 0.47, respectively, while the ground dataset achieved R² and NS values of 0.59 and 0.24, respectively (Fig. 14 and Table 6). SEI showed better values than those obtained from the first test done in the whole Blue Nile Basin. The CFSR dataset provided better R² and NS values than the ground dataset for the evapotranspiration analysis; however, the ground dataset performed better during the calibration with outflow data (Table 6). The SEI value for both datasets was 0.16, a much better value than those obtained in the first test (Table 5).
This second test provides a better understanding of how SEI works; it also showed how using reliable evapotranspiration data can improve the SEI values.

Conclusions
The CFSR dataset and a conventional observed ground dataset were analyzed in terms of statistical results, water balance, and precipitation distribution in the upper Blue Nile Basin. After detecting their limitations and disadvantages, an integration of both datasets was proposed with the purpose of overcoming their uncertainties and limitations. This data integration method was effectively used in the upper Blue Nile Basin to create a better SWAT model and can also be applied in other watersheds where observed data are limited and incomplete. However, data analyses and tests should always be performed before performing an integration for other watersheds. Despite its limitations, the CFSR dataset continues to be an important source that can be very useful in regions where conventional measured data are not available. A comparison of the three datasets under different discretization levels was also performed. This comparison was important for obtaining a better understanding of how crucial the sub-basin discretization process is during a SWAT model setup. The comparisons showed that the three input datasets, under models with a different number of sub-basins, yield different results. The model using the CFSR dataset was not able to achieve good water balance results under similar parameterization. The quality of the CFSR rainfall data is not reliable for the upper Blue Nile Basin, although this finding cannot be generalized to other watersheds in the world; the dataset needs to be verified in the same way in other watersheds before it is used. For the second case, the three datasets were analyzed in more detail using SWAT87, and although the exact precipitation amounts in the upper Blue Nile Basin cannot be given, CFSR data showed an overestimation of the rainfall and also a wrong precipitation distribution compared to the other datasets. Additionally, the model under 87 sub-basins provided more details in terms of the number of HRUs and also achieved better statistical values. Therefore, this study proposes that 87 is a suitable number of sub-basins for the upper Blue Nile Basin. SWAT87 is more suitable for performing several types of hydrological analyses and for proposing watershed management practices in the Blue Nile Basin.
Furthermore, the SEI has proved to be a useful additional tool to express the level of error of SWAT models. This index used the weighted rRMSE of the discharge and evapotranspiration data. SEI was tested in two locations, with the second case, done at the Ribb subcatchment, being more accurate. Nevertheless, further tests and improvements should be done to this index. SEI also showed that the integrated dataset successfully achieved better and more reliable results than the ground and CFSR datasets. The integrated dataset improved the results of the model, obtaining better R², NS, and SEI values.
Although further improvements must be made to the methods proposed in this study, the integration of datasets, the sub-basin delineation, and the application of the SEI are important approaches that can be applied in other watersheds and can significantly help to develop better hydrological models.

Figure 1. Official sub-basin distribution of the upper Blue Nile Basin.

Figure 2. FAO/UNESCO soil map of the upper Blue Nile Basin.

Figure 3. Weather and hydrometric gauging stations in the upper Blue Nile Basin under two discretization levels of 30 and 87 sub-basins (SWAT30 and SWAT87).

Figure 4. Spatial annual rainfall variation in the upper Blue Nile Basin using two different data sources: the CFSR dataset (a) and the ground dataset (b).

Figure 14. Average monthly evapotranspiration in the Ribb subcatchment. Compared to the MOD16 data, the R² and NS values achieved were 0.78 and 0.47 for the CFSR dataset and 0.59 and 0.24 for the ground dataset, respectively.

Table 1. Average annual water balance components in the upper Blue Nile Basin based on different literature.

Table 2. Parameterization of the SWAT models using the SUFI-2 algorithm for the period 1990-2004. BSN means applied to the entire basin.

Table 3. Water balance analysis in the upper Blue Nile Basin (1990-2004). Water balance in the Blue Nile Basin (all values in mm yr⁻¹).