Evaluation of multiple climate data sources for managing environmental resources in East Africa

Managing environmental resources under conditions of climate change and extreme climate events remains among the most challenging research tasks in the field of sustainable development. A particular challenge in many regions such as East Africa is the lack of sufficiently long-term and spatially representative observed climate data. To overcome this data challenge we used a combination of accessible data sources based on station data, earth observations by remote sensing, and regional climate models. The accuracy of the African Rainfall Climatology version 2.0 (ARC2), Climate Hazards Group InfraRed Precipitation (CHIRP), CHIRP with Station data (CHIRPS), Observational-Reanalysis Hybrid (ORH), and regional climate models (RCMs) is evaluated against station data obtained from the respective national weather services and international databases. We did so by performing a comparison in three ways: point to pixel, point to area grid cell average, and stations’ average to area grid cell average over 21 regions of East Africa: 17 in Ethiopia, 2 in Kenya, and 2 in Tanzania. We found that the last of these methods provides better correlation and significantly reduces biases and errors. The correlations were analysed at daily, dekadal (10 days), and monthly resolution for rainfall and maximum and minimum temperature (Tmax and Tmin) covering the period of 1983–2005. At a daily timescale, CHIRPS, followed by ARC2 and CHIRP, is the best performing rainfall product compared to ORH, individual RCMs (I-RCM), and RCMs’ mean (RCMs). CHIRPS captures the daily rainfall characteristics well, such as average daily rainfall, amount of wet periods, and total rainfall. Compared to CHIRPS, ARC2 showed higher underestimation of the total (−30 %) and daily (−14 %) rainfall. CHIRP, on the other hand, showed higher underestimation of the average daily rainfall (−53 %) and duration of dry periods (−29 %).
Overall, the evaluation revealed that in terms of multiple statistical measures used on daily, dekadal, and monthly timescales, CHIRPS, CHIRP, and ARC2 are the best performing rainfall products, while ORH, I-RCM, and RCMs are the worst performing products. For Tmax and Tmin, ORH was identified as the most suitable product compared to I-RCM and RCMs. Our results indicate that CHIRPS (rainfall) and ORH (Tmax and Tmin), with higher spatial resolution, should be the preferential data sources to be used for climate change and hydrological studies in areas of East Africa where station data are not accessible.


Introduction
In Sub-Saharan Africa (SSA) about 80 % of people living in poverty will continue to depend on the agriculture sector as their major income source under continuing global change (Dixon et al., 2001; IFPRI, 2009). Unlike in other regions of the world, agricultural activities in SSA are marked by low production, mainly due to poor natural resource management, rainfall amount and variability, economy, and technologies. According to IFPRI (2009), reducing poverty in SSA is becoming more challenging due to rapid population growth and the associated decline in the quality and availability of environmental resources (e.g. water and soil). Additionally, food security and livelihoods of people are threatened by the direct impacts of climate change, such as the increasing frequency of extreme events and the effects of weather variability on the production and productivity of agricultural lands (Malo et al., 2012). In general, the impact of climate change in Africa ranges from social and economic to health, water, and food security, which is a threat to the lives of Africans (Urama and Ozor, 2010; Gan et al., 2016).
Published by Copernicus Publications on behalf of the European Geosciences Union.
The challenges outlined above hold in particular for the eastern parts of SSA, including Ethiopia, Kenya, and Tanzania. More than 80 % of the population in this region depends mainly on agriculture for their livelihood, and agriculture-based income contributes 40 % to the gross domestic product (GDP) (FAO, 2014). Observed changes in extreme climate events such as recurring droughts and floods have a tremendous impact on the socio-economy of the region (Gebrechorkos et al., 2018). Devastating droughts in SSA linked to the high variability (seasonal and inter-annual) of rainfall (Sheffield et al., 2013) are projected to increase in frequency (IPCC, 2007, 2014; Niang et al., 2014). In addition to the projected impact, the region is already facing significant food security issues and natural-resource-based clashes (UNEP, 2011; World Bank, 2012).
The impacts of future climate change in East Africa vary from region to region. In order to understand the impacts of future climate at the regional and local scale, ground station data with high spatial and temporal resolution are crucial. Regions with poor ground observations are highly vulnerable to climate threats (Wilby and Yu, 2013), which holds particularly for developing countries. In Africa, high-quality climate data from meteorological field stations are scarce (Dinku et al., 2013) and inconsistencies exist between other data products, largely due to a limited number of ground stations, merging and interpolation methods (Huffman et al., 2009; Nikulin et al., 2012; Sylla et al., 2013), limited time resolution, and limited documentation quality. In addition, climate data with high temporal and spatial resolution, even if collected by the national meteorological agencies, are often not available due to data sharing policies. With advancements of technologies and research activities, a number of climate data products from different sources (remote sensing, climate model, and reanalysis) have been produced over the last decades that can fill the data gap, particularly for drought-prone regions (Gan et al., 2016), and can be used for hydrological and climate change studies.
Several satellite-based rainfall estimates have been developed over the last decades (Sapiano and Arkin, 2009; Zambrano-Bigiarini et al., 2017). In Africa, a number of rainfall and temperature products are available that can be used for climate change studies, such as the African Rainfall Climatology version 2.0 (ARC2) from the Climate Prediction Center (CPC) of the National Oceanic and Atmospheric Administration (NOAA) with a spatial resolution of 0.1° (Novella et al., 2013), and the Climate Hazards Group InfraRed Precipitation (CHIRP) and CHIRP with Station data (CHIRPS) from the Climate Hazards Group (CHG) with a spatial resolution of 0.05° (Funk et al., 2015). In addition, the Multi-Source Weighted-Ensemble Precipitation (MSWEP) (Beck et al., 2017), Tropical Applications of Meteorology using Satellite and ground-based observations (TAMSAT) (Tarnavsky et al., 2014), TAMSAT African Rainfall Climatology And Time series (TARCAT) (Maidment et al., 2014), and data from the Enhancing National Climate Services (ENACTS) initiative (Dinku et al., 2014) are available at varying spatial and temporal resolutions and for longer periods.
As another source of climate information, climate model-derived data are suitable tools for assessing climate variability and change. Globally, reanalysis-based climate products, such as the Observational-Reanalysis Hybrid (Sheffield et al., 2006), Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) (Gelaro et al., 2017), and Climate Forecast System Reanalysis (CFSR) (Saha et al., 2010), are widely used for climate and hydrological studies. Moreover, dynamically downscaled data from global climate models (GCMs) are widely used in regional- and local-scale climate studies. Regional climate models (RCMs) produced from dynamically downscaled GCMs provide spatial resolutions that suit end users (Sun et al., 2006). However, downscaling of climate information from GCMs to assess the impact of climate change on environmental resources at regional or smaller scale has only recently been performed, e.g. as dynamical downscaling within the CORDEX community (CORDEX-Africa; see e.g. Abiodun et al., 2016). In Africa (CORDEX-Africa domain) RCM output is available at a spatial resolution of about 0.44° (∼ 50 km) and at varying temporal resolutions. In East Africa, a number of studies have applied RCMs for climate studies (Anyah and Semazzi, 2006, 2007; Diro et al., 2011; Endris et al., 2013; Segele et al., 2009).
Before being used as input to different climate or hydrological models, climate data products need to be evaluated against field-based meteorological stations. For studying climate change and climate extremes, data with high accuracy and from long periods (> 30 years) are required. In addition, current hydrological (e.g. Soil and Water Assessment Tool, SWAT; Neitsch et al., 2002) and climate models (e.g. Statistical DownScaling Model, SDSM; Wilby and Dawson, 2004) require daily time series of rainfall and temperature covering long periods. Considering these requirements concerning the length of time series and temporal resolution on the one hand, and the limited availability of station data on the other hand, it is not surprising that comprehensive evaluations of climate data products, particularly on a daily timescale, are not available for East Africa to the best of our knowledge. However, a few studies are available based on monthly gridded data (e.g. Cattani et al., 2016; Kimani et al., 2017), for limited time periods. Moreover, Kimani et al. (2017) only considered CHIRPS, while a more comprehensive evaluation and comparison of different data sources would be highly desirable. Based on the data requirements of impact models, the climate data products to be included in such an evaluation should be selected or excluded based on high spatial and temporal (i.e. daily) resolution, quality (missing values), and temporal coverage (length of time series), while also taking the results from previous studies (e.g. Cattani et al., 2016; Bayissa et al., 2017; Kimani et al., 2017) into account.
Therefore, this study aims at comparing and evaluating the available climate data products for Ethiopia, Kenya, and Tanzania at the highest possible spatial and temporal (i.e. daily, for reasons of comparability extended also to dekadal and monthly) resolution against station data using the most widely applied and accepted statistical and graphical evaluation methods. Results of our study will help overcome the data scarcity in the study area, in terms of spatial coverage and temporal resolution gaps of daily, dekadal, and monthly climate data products that can be used for hydrological and climate change and impact studies at a watershed or regional scale. In addition, the data sets can be used for local and regional climate projections using climate models, such as the Statistical DownScaling Model (SDSM) (Wilby and Dawson, 2004).
2 Study area and data

Study region
The study focuses on the evaluation of daily, dekadal, and monthly climate data sources for regions of East Africa, particularly Ethiopia, Kenya, and Tanzania (Fig. 1). The region is divided by the Great Rift Valley and is topographically one of the most diverse and complex parts of Africa, characterized by multiple rainfall regimes. Generally, the rainfall cycle (climatological annual cycle) in East Africa is linked to the position changes of the intertropical convergence zone (ITCZ) (Endris et al., 2013). Variability in the rainfall patterns in this region is partly induced by local factors such as the heterogeneity of the land surface and complex topography and their interaction with global climate forcing systems. Countries of the region face similar weather and climate variabilities (spatial and temporal variabilities) and increasing temperature and decreasing precipitation trends (Pricope et al., 2013). In addition, all East African countries face similar issues, such as frequent droughts, floods, poverty, and a lack of clean and adequate water supply. The conditions could worsen in the near future due to climate change; therefore, sustainable adaptation and mitigation strategies are required, which rely on advanced climate and hydrological models and the respective data inputs.

Data sets
The reference data sets used for the evaluation of multiple data products in this study are based on daily rainfall, maximum temperature (Tmax), and minimum temperature (Tmin) derived from 332 rain gauges and synoptic stations. Station data for Ethiopia were provided by the National Meteorological Agency (NMA) of Ethiopia for the period 1954–2016. The daily data provided by NMA were carefully and extensively checked for their quality and some missing data were filled in from hard copies. For Kenya and Tanzania, the global summary of the day available at the National Climatic Data Center (NCDC) (https://www.ncdc.noaa.gov/, last access: 14 March 2017) is used. For evaluation, based on the criteria outlined above, we considered satellite-based rainfall estimates, Observational-Reanalysis Hybrid (ORH), and a historical period of RCMs driven by GCMs to be compared against field-based meteorological stations. The three satellite-based rainfall estimates fulfilling all criteria are the African Rainfall Climatology Version 2.0 (ARC2) (Novella et al., 2013), the Climate Hazards Group InfraRed Precipitation (CHIRP), and CHIRP with Station data version 2 (CHIRPS) (Funk et al., 2015). Not included was, for example, TAMSAT, which is available at higher spatial and temporal resolution and for a longer time period but contains considerable data gaps (Maidment et al., 2017, https://icdc.cen.uni-hamburg.de/1/daten/atmosphere/tamsat-rainfall-africa/, last access: 7 January 2017) during the evaluation period. ENACTS and TARCAT are available only on dekadal (10 days) timescales. In addition, MERRA-2 and CFSR are not included in this study due to their coarse spatial resolution compared to the other reanalysis products (i.e. ORH).
ARC2 is the second version of the ARC and is compatible with the algorithm of the Rainfall Estimation Version 2 (RFE 2.0) (Novella et al., 2013). The product is a composite of 3-hourly geostationary infrared data centred over Africa, provided by the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT), which distinguishes it from RFE, and quality-controlled daily rainfall records acquired from the Global Telecommunication System (GTS) gauges. ARC2 is consistent with the historical data sets of the Climate Prediction Center Merged Analysis of Precipitation (CMAP) (Xie and Arkin, 1997) and Global Precipitation Climatology Project (GPCP) (Novella et al., 2013). The data set is updated regularly and is available at a spatial resolution of 0.1° covering the period from 1983 to present. ARC2 is available at the International Research Institute climate data library (IRI/LDEO, 2016).
CHIRPS is a semi-global rainfall product designed for drought monitoring and global environmental changes (Funk et al., 2015). The product provides daily, pentadal, dekadal, and monthly data at a 0.05° spatial resolution available at the Climate Hazards Group (CHG; ftp://ftp.chg.ucsb.edu/pub/org/chg/products) and the International Research Institute climate data library (IRI/LDEO, 2016). CHIRPS combines 0.05° resolution satellite images and data from ground stations to form a gridded rainfall time series. Station data (see also below) are used to produce a preliminary 2-day rainfall product by blending data from sparsely located GTS gauges with rainfall estimates retrieved from the cold cloud duration (CCD) at every pentad. In addition, the final product is developed by blending the best available monthly and pentadal station data with the monthly and pentadal rainfall estimates retrieved from the CCD, respectively, to produce a gridded rainfall product (Funk et al., 2015). The development process of CHIRPS includes the 0.05° monthly precipitation climatology (CHPclim), satellite-only Climate Hazards Group InfraRed Precipitation (CHIRP), and station blending techniques. CHIRP is available at the Climate Hazards Group (CHG, 2017). The second version of CHIRPS, which is updated regularly, provides an improved daily rainfall time series (1981–present) with a spatial resolution of 0.05° ranging from 50° S to 50° N (and all longitudes) (Funk et al., 2015). The development process of CHIRPS and its application in drought monitoring in Africa (e.g. Ethiopia) is explained in detail by Funk et al. (2015). CHIRPS is not only used for drought monitoring, but also for other global environmental applications (Zambrano-Bigiarini et al., 2017), water resource management, and climate dynamics (Ceccherini et al., 2015; Deblauwe et al., 2016).
ORH is a global (Sheffield et al., 2006) and regional (Northern/West/East Africa) (Chaney et al., 2014) 3-hourly, daily, and monthly meteorological data set covering the period between 1901 and 2012. ORH is developed by a spatial downscaling of the NCEP-NCAR reanalysis (Kalnay et al., 1996) up to a spatial resolution of 0.1° using a bilinear interpolation. ORH merges multiple data products such as the NASA Langley Surface Radiation Budget (SRB), the monthly temperature data from the University of East Anglia Climate Research Unit (CRU), Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA) (Huffman et al., 2007), and other observation-based rainfall products (Chaney et al., 2014). The spatial downscaling of ORH is done with the inclusion of changes in elevation and it is evaluated against ground stations (global summary of the day) available at the US National Climatic Data Center (NCDC). ORH is corrected for temporal inhomogeneity and biases, and random errors are omitted through assimilation with quality-controlled and gap-filled ground station data available at NCDC (https://www.ncdc.noaa.gov/, last access: 28 March 2017) as a global summary of the day (Chaney et al., 2014). These data are freely available from the Terrestrial Hydrology Research Group, Princeton University (http://hydrology.princeton.edu, last access: 12 May 2016; Terrestrial Hydrology Research Group Princeton University, 2016). Even though ORH is not updated regularly, it has been widely used in climate and hydrological studies (e.g. Troy et al., 2011; Wang et al., 2011; Demaria et al., 2012; Sheffield et al., 2014).
Unlike the other rainfall products, CHIRPS includes monthly ground station data from Ethiopia, Kenya, and Tanzania. Evaluating CHIRPS based on ground station data might thus raise concerns about the independence of data. However, not all stations used in this study are included in CHIRPS and in general the stations are not consistently used in the development process of CHIRPS. In addition, the station data used in CHIRPS are mainly monthly totals from a limited number of stations. For example, in Ethiopia, where all station data originate from NMA, during January 1983–February 1983 and August 2005–December 2005, the monthly stations used in CHIRPS declined from 139 to 132 and from 175 to 169, respectively. In 2015, the number of stations included in CHIRPS even declined to below 10. Moreover, in Kenya and Tanzania, during the period of January 1983–December 2005 the number of stations used in CHIRPS declined from 142 to 62 and from 171 to 55, respectively (ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/diagnostics/). Besides the difference in temporal resolution (monthly vs. daily) and the number of stations between station data included in CHIRPS and the validation data set, the latter deviates from the former since we used original data provided by NMA (Ethiopia), which were quality-controlled and extended by adding data from hard copies. Overall, while not fully independent, the relation between CHIRPS and the validation data set should be weak, and no other (fully independent) validation data set is available.
Historical data (control model runs) of the CORDEX RCMs are also used as a potential source for rainfall, Tmax, and Tmin data. RCMs are climate models with a higher spatial resolution compared to GCMs. The driving data of RCMs are derived from GCMs or reanalysis data and can include greenhouse gases (GHGs) and aerosol forcing. Compared to GCMs, RCMs consider local factors such as complex topography and land cover inhomogeneity in a physically based manner (IPCC, 2007). In Africa, dynamical downscaling was performed in a large effort within the CORDEX community (CORDEX-Africa). Within CORDEX-Africa the continent's climate was dynamically modelled by an international consortium, providing a spatial resolution of about 50 km. According to the IPCC report (2007), RCMs can be used for a wide range of applications such as climate change studies. Following the recommendation of Endris et al. (2015), the historical data derived from two CORDEX RCMs, RCA (Samuelsson et al., 2011) and COSMO-CLM or CCLM (Baldauf et al., 2011), driven by HadGEM2-ES (MOHC, United Kingdom), MPI-ESM-LR (MPI, Germany), and GFDL-ESM2M (NOAA/GFDL, United States) are used. Rainfall, Tmax, and Tmin products of both RCMs are retrieved from the Earth System Grid Federation (ESGF, 2016) data portal.

Selection of validation areas and ground stations
The evaluation of multiple daily, dekadal (10 days), and monthly rainfall, Tmax, and Tmin products was conducted for selected basins of Ethiopia (EthioShed1–EthioShed17), Kenya (KenShed1 and KenShed2), and Tanzania (TanzShed1 and TanzShed2) (Fig. 1). The polygons in Fig. 1 are river basins retrieved from the global river basins available in the WaterBase data portal hosted by the United Nations University (UNU-INWEH: http://www.waterbase.org/, last access: 18 January 2017). In most regions of Africa not only are the density and availability of field-based meteorological stations limited, but their accessibility is very restricted for many reasons. For this study, it was only possible to get daily station data with a reasonable spatial and temporal coverage from the National Meteorological Agency (NMA) of Ethiopia. Therefore, the selection of validation areas is based on the availability, quality, and density of field-based meteorological stations during the period of 1983–2005. It was almost impossible to find multiple stations in one satellite grid cell. For Kenya and Tanzania, therefore, stations with more than 10 years of data (> 50 % of the study period) were included for evaluation (Table 1).
The quality of selected stations was checked and extremely high rainfall records during dry seasons, such as daily rainfall of > 480 mm preceded and followed by dry days, were excluded. Finally, a total of 132 stations were found suitable for comparison, with 2 to 12 stations per validation area. In addition to these stations in the validation areas, 78 stations, randomly distributed over the region, are used for comparison on an individual basis with the rainfall and temperature products. Compared to Kenya and Tanzania, the quality, continuity, and spatial and temporal coverage of stations were better in Ethiopia and only stations with less than 20 % missing values were considered. The availability of multiple stations in a validation area helps to check the quality of individual stations by using methods such as the double mass curve (Vernimmen et al., 2012) and allows for replacement of missing values of one station from a nearby station.
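The screening rules described in this paragraph can be illustrated with a minimal sketch. It is not the procedure actually used in the study: the function names, the encoding of missing values as `None`, and the reading of "dry day" as exactly 0 mm are assumptions made for illustration; only the 480 mm and 20 % thresholds come from the text.

```python
def missing_fraction(series):
    """Share of records that are missing; missing values are encoded as None (assumption)."""
    return sum(1 for v in series if v is None) / len(series)


def isolated_extremes(rain_mm, threshold_mm=480.0):
    """Indices of daily totals above threshold_mm whose two neighbouring days are dry (0 mm)."""
    flagged = []
    last = len(rain_mm) - 1
    for i, value in enumerate(rain_mm):
        if value is None or value <= threshold_mm:
            continue
        # Records at the series edges or next to missing values are not flagged.
        before_dry = i > 0 and rain_mm[i - 1] == 0
        after_dry = i < last and rain_mm[i + 1] == 0
        if before_dry and after_dry:
            flagged.append(i)
    return flagged


def screen_station(rain_mm, max_missing=0.20):
    """Return (accepted, cleaned_series) for one station (hypothetical helper)."""
    cleaned = list(rain_mm)
    for i in isolated_extremes(cleaned):
        cleaned[i] = None  # treat the suspicious record as missing
    return missing_fraction(cleaned) < max_missing, cleaned
```

Applied to a short series such as `[0, 500, 0, 3, 4, 0, 0, 2, 1, 0]`, the 500 mm record is flagged and set to missing, and the station is still accepted because only 10 % of its records are then missing.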

Comparing ground data with satellite, observational reanalysis, and climate model-based data
The most commonly used method to compare ground observations with other data products, such as satellite-based rainfall estimates and climate model outputs, is point (station) to pixel comparison. When comparing daily rainfall on a point to pixel basis, particularly in very complex topography, it can be challenging to achieve reasonable agreement. Therefore, in this study we used point to pixel, point to area grid cell average, and stations' average to area grid cell average comparisons to evaluate the accuracy of each product. The area grid cell average is the average over the pixels covering the basin or validation area. Similarly, the station average is the average value of the stations inside the validation area. During the comparison, either the individual station or the stations' average is therefore compared to the area grid cell average of the product. The most commonly used statistical measures are applied, namely the Pearson correlation coefficient (CC, Eq. 1), bias, relative bias (Rbias), mean absolute error (MAE), root mean square error (RMSE), and index of agreement (IA) (Cohen Liechti et al., 2012; Daren Harmel and Smith, 2007; Moazami et al., 2013). They are written here in their standard forms, where $O_i$ is the observed (station) value, $P_i$ is the corresponding product value, $\bar{O}$ and $\bar{P}$ are their means, and $N$ is the number of data pairs.

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^{2}}\,\sqrt{\sum_{i=1}^{N}(P_i-\bar{P})^{2}}} \tag{1}$$

The average differences and systematic bias of each product are given as bias (Eq. 2) and Rbias (Eq. 3). Bias can be positive (overestimation) or negative (underestimation), depending on the product.

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}(P_i-O_i) \tag{2}$$

$$R_{\mathrm{bias}} = \frac{\sum_{i=1}^{N}(P_i-O_i)}{\sum_{i=1}^{N}O_i}\times 100 \tag{3}$$

The MAE and RMSE (Eqs. 4 and 5) are well-known and accepted indicators of goodness of fit, which show the differences between ground observations and model or other product outputs (Legates and McCabe, 1999).

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_i-O_i\right| \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i-O_i)^{2}} \tag{5}$$

The IA (Willmott, 1981) is another widely used indicator of goodness of fit between observed and model output. IA (Eq. 6) describes how much of the model or product output (rainfall, Tmax, and Tmin products) is error-free compared to the ground observations.

$$\mathrm{IA} = 1-\frac{\sum_{i=1}^{N}(P_i-O_i)^{2}}{\sum_{i=1}^{N}\left(\left|P_i-\bar{O}\right|+\left|O_i-\bar{O}\right|\right)^{2}} \tag{6}$$

In addition to the above statistical methods, the Taylor diagram (Taylor, 2001) is used to summarize the statistical relationship between ground station data and the products for rainfall, Tmax, and Tmin. In this diagram, the relationships between the two fields are explained by the correlation coefficient ($R$), centred root mean square (rms) difference ($E'$), and standard deviation ($\sigma$). The diagram is useful for evaluating the accuracy of multiple data sources or model output against a reference or observational data (IPCC, 2001). A single point on the diagram displays three statistical values ($R$, $E'$, and $\sigma$) and their relationship is given by Eq. (7).

$$E'^{2} = \sigma_f^{2}+\sigma_r^{2}-2\,\sigma_f\,\sigma_r\,R \tag{7}$$

where $\sigma_f^{2}$ and $\sigma_r^{2}$ are the variances of the model and observation fields and $R$ is the correlation coefficient between the two fields (Eq. 8).

$$R = \frac{\frac{1}{N}\sum_{n=1}^{N}(f_n-\bar{f})(r_n-\bar{r})}{\sigma_f\,\sigma_r} \tag{8}$$

In the diagram, the distance from the reference point (observed data) is given as the centred rms difference of the two fields (Eq. 9). A model with no error would show a perfect correlation to the observations.

$$E'^{2} = \frac{1}{N}\sum_{n=1}^{N}\left[(f_n-\bar{f})-(r_n-\bar{r})\right]^{2} \tag{9}$$

where $f$ is the test (e.g. model or satellite) field and $r$ is the reference (observed) field, whereas $\sigma_f$ and $\sigma_r$ are the standard deviations of the model and reference fields (Eqs. 10a and b).

$$\sigma_f = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(f_n-\bar{f})^{2}} \tag{10a}$$

$$\sigma_r = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(r_n-\bar{r})^{2}} \tag{10b}$$
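As an illustration of the statistics described above, the following sketch computes CC, bias, Rbias, MAE, RMSE, and IA for a pair of series, together with the three quantities shown on a Taylor diagram. It assumes the standard textbook definitions, a bias sign convention of product minus observation (so that positive values mean overestimation), and population (1/N) standard deviations as in Taylor (2001); the function names are hypothetical and the code is not taken from the study.

```python
import math


def evaluation_metrics(obs, prod):
    """Pairwise statistics between an observed (obs) and a product (prod) series."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(prod) / n
    diff = [p - o for o, p in zip(obs, prod)]  # product minus observation
    sq_err = sum(d * d for d in diff)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, prod))
    ss_o = sum((o - o_bar) ** 2 for o in obs)
    ss_p = sum((p - p_bar) ** 2 for p in prod)
    ia_den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, prod))
    return {
        "CC": cov / math.sqrt(ss_o * ss_p),
        "bias": sum(diff) / n,
        "rbias_pct": 100.0 * sum(diff) / sum(obs),
        "MAE": sum(abs(d) for d in diff) / n,
        "RMSE": math.sqrt(sq_err / n),
        "IA": 1.0 - sq_err / ia_den,
    }


def taylor_statistics(ref, test):
    """Standard deviations, correlation, and centred rms difference of two fields."""
    n = len(ref)
    r_bar = sum(ref) / n
    f_bar = sum(test) / n
    sigma_r = math.sqrt(sum((r - r_bar) ** 2 for r in ref) / n)
    sigma_f = math.sqrt(sum((f - f_bar) ** 2 for f in test) / n)
    corr = sum((f - f_bar) * (r - r_bar) for f, r in zip(test, ref)) / (n * sigma_f * sigma_r)
    e_prime = math.sqrt(sum(((f - f_bar) - (r - r_bar)) ** 2 for f, r in zip(test, ref)) / n)
    return {"sigma_f": sigma_f, "sigma_r": sigma_r, "R": corr, "E_prime": e_prime}
```

A product that is offset from the observations by a constant, for example `obs = [0, 2, 4, 6]` and `prod = [1, 3, 5, 7]`, yields CC = 1 but a bias of 1 and an IA below 1, which shows why correlation alone is not sufficient to rank products. Because the centred rms difference removes the means, the same pair gives E′ = 0 on the Taylor diagram.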
Additionally, rainfall characteristics such as the number of wet days, duration and amount of wet periods, duration of dry periods, and daily and total rainfall are used to evaluate the accuracy of individual rainfall products by comparing to the observed data. Rainfall characteristics are widely used indicators in rainfall modelling (Wilby and Dawson, 2007; Jebari et al., 2012) and include the number of wet days (days yr⁻¹), which is the count of days with rainfall per year; duration (days) of wet and dry periods, indicating the average number of consecutive wet and dry days during the study period; and the amount of wet periods (mm), indicating the amount of rainfall observed during the identified wet period.
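The rainfall characteristics defined above can be derived from a daily series as in the following sketch. The wet-day threshold is not stated in the text, so it is exposed as a parameter with an assumed default of 0 mm (any rainfall counts as wet); the function names and the choice to report the amount of wet periods as a mean per spell are likewise assumptions for illustration.

```python
def spell_lengths(flags):
    """Lengths of consecutive runs of True values in a sequence of booleans."""
    runs, current = [], 0
    for flag in flags:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def rainfall_characteristics(daily_mm, n_years, wet_threshold_mm=0.0):
    """Summary characteristics of a daily rainfall series without missing values."""
    wet = [value > wet_threshold_mm for value in daily_mm]
    wet_runs = spell_lengths(wet)
    dry_runs = spell_lengths([not w for w in wet])
    wet_total = sum(v for v, w in zip(daily_mm, wet) if w)
    return {
        "wet_days_per_year": sum(wet) / n_years,
        "mean_wet_spell_days": _mean(wet_runs),
        "mean_dry_spell_days": _mean(dry_runs),
        # Every wet day belongs to exactly one wet spell, so this is the mean amount per spell.
        "mean_wet_spell_amount_mm": wet_total / len(wet_runs) if wet_runs else 0.0,
        "mean_daily_mm": sum(daily_mm) / len(daily_mm),
        "total_mm": sum(daily_mm),
    }
```

Applying the same function to the station series and to each product, and expressing the difference as a percentage of the observed value, gives deviations of the kind reported for the individual products.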

Validation of satellite, observational reanalysis, and climate model-based products
The correlation of each rainfall product with station data is summarized in Fig. 2 for the 21 validation areas, and details (scatter plots) are provided in the Supplement (Fig. S1). The results show that in most of the validation areas, CHIRPS, followed by CHIRP and ARC2, is more strongly correlated with station data compared to ORH and the individual RCMs (I-RCM). In addition to the lower correlation, ORH and I-RCM showed large biases in all the validation areas (Fig. S1).
Based on the results in Fig. 2 and the scatter plots provided in the Supplement (Fig. S1), CHIRPS and CHIRP are the most accurate rainfall products, with higher correlation and lower biases, and ARC2 and ORH are the second best products. I-RCM and RCMs' mean (RCMs) correlate weakly in most of the validation areas. In addition, I-RCM (not shown in Fig. 2 but provided in the Supplement Table S1) and RCMs show a strong over- and underestimation of monthly rainfall compared to the other products. In EthioShed1, for example, CHIRPS and CHIRP are shown to be the most accurate products, while ARC2 and ORH showed higher dispersion above and below the regression line (see Fig. S1). Similarly, in EthioShed4 both CHIRP and CHIRPS have an equal R², but in terms of biases (points below and above the regression line) CHIRPS performed better. The observed biases in CHIRP and higher correlations in CHIRPS in all the validation areas highlight the role of the station-satellite data blending techniques. Compared to other validation areas, the agreement of products in EthioShed16 is weak, and CHIRPS and CHIRP showed a higher correlation (0.7) than ARC2, ORH, and RCMs.
As for the daily, dekadal, and monthly resolution, the comparison is performed in three ways: point to pixel, point to area grid cell average, and stations' average to area grid cell average using the methods described in Sect. 3.2. An explanatory example is given in Table 2, using stations of EthioShed1, displaying the difference in comparing products through point (station) to pixel, point to area grid cell average, and stations' average to area grid cell average. The agreement of each product with station data on a daily timescale and in the point to pixel comparison is weak, with significantly higher biases and errors. For rainfall, in general, the last method, stations' average to area grid cell average, provides better correlation, higher index of agreement, and lower biases and errors. Compared to the point to pixel method, the stations' average to area grid cell average improves the correlations of ARC2, CHIRP, and CHIRPS by 81.3 %, 65.7 %, and 8 %, respectively. In addition to the correlation, the method reduces the RMSE by more than 66 %. Compared to ARC2 and CHIRP (Table 2), CHIRPS gives a significantly higher correlation and IA and lower biases and RMSE. During area averaging, extremely high rainfall events obtained for a location from the various data products are levelled off, and this makes the product more representative of the area. Most of the rainfall products occasionally record daily rainfall values that are much higher than the observed station data in the area, and the averaging removes these extremes. Compared to the point to pixel method, the second method, the point to area grid cell average, provides a reasonable correlation.
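The three comparison modes can be made concrete with a small sketch that builds the (reference, product) series pairs for one validation area. It assumes unweighted arithmetic means over stations and over pixels and uses hypothetical dictionaries keyed by station and pixel identifiers; how pixels were actually assigned to basins and stations in the study is not specified here.

```python
def mean_series(series_list):
    """Element-wise mean across equally long time series."""
    return [sum(values) / len(values) for values in zip(*series_list)]


def comparison_pairs(stations, pixels, station_pixel):
    """Build (reference, product) pairs for the three comparison modes.

    stations:      station id -> daily series observed at that station
    pixels:        pixel id -> daily product series, for all pixels in the area
    station_pixel: station id -> id of the pixel that contains the station
    """
    area_average = mean_series(list(pixels.values()))
    stations_average = mean_series(list(stations.values()))
    return {
        "point_to_pixel": {
            sid: (series, pixels[station_pixel[sid]]) for sid, series in stations.items()
        },
        "point_to_area": {sid: (series, area_average) for sid, series in stations.items()},
        "stations_to_area": (stations_average, area_average),
    }
```

Each pair can then be passed to whatever evaluation statistics are in use. In the toy case `stations = {"A": [0, 10], "B": [4, 2]}` and `pixels = {"p1": [1, 8], "p2": [3, 4]}`, both averages equal `[2, 6]`, although no single station matches its own pixel, which illustrates how averaging levels off local extremes.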
For Tmax and Tmin, only ORH, I-RCM, and RCMs are compared with station data. For the 21 validation areas, ORH proved to be the most accurate product for both Tmax (Fig. 4) and Tmin (Fig. 5). In comparison to I-RCM and RCMs, ORH showed a significantly higher correlation and lower biases and errors in most of the validation areas. In 7 of the 21 validation areas, RCMs showed a higher correlation in Tmax than ORH and I-RCM. However, for Tmin, ORH showed a higher correlation in 20 of the 21 validation areas. In general, I-RCM and RCMs showed higher RMSE and biases in most of the validation areas compared to ORH. Next to ORH and compared to I-RCM, RCMs appeared to be the best data source, particularly for Tmax. RCMs showed a relatively higher correlation and lower biases and errors compared to I-RCM in most of the validation areas.
The agreement of each product increases with decreasing temporal resolution, from daily to dekadal and monthly resolutions. Including the historical data of each individual RCM, all RCMs, and ORH, the overall comparison using some of the statistical methods is summarized in Figs. 3, 4, and 5 for rainfall, Tmax, and Tmin, respectively. The evaluation of each rainfall product (ARC2, CHIRP, CHIRPS, ORH, and RCMs) showed a different degree of agreement with station data (Fig. 3). The performance of the individual RCMs (I-RCM) for all the validation areas is provided in the Supplement (Table S1). At a daily timescale, CHIRPS, followed by ARC2 and CHIRP, proved to be the most accurate rainfall product compared to ORH, I-RCM, and RCMs in all the validation areas. In general, out of the 21 validation areas, CHIRPS, ARC2, and CHIRP showed the highest correlation in 17, 3, and 1 validation areas, respectively. In addition to the higher correlation, CHIRPS, CHIRP, and ARC2 showed lower RMSE than ORH, I-RCM, and RCMs. Similarly, CHIRPS and CHIRP showed lower biases than observed in ARC2, ORH, I-RCM, and RCMs in most of the validation areas.
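Moving from daily to dekadal and monthly resolution requires aggregating the daily series. The sketch below shows one way to do this; it assumes the common convention that the three dekads of a month cover days 1–10, 11–20, and 21 to the end of the month, and that rainfall is summed while temperature would be averaged. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def dekad_key(day):
    """(year, month, dekad) with dekads covering days 1-10, 11-20, and 21 to month end."""
    return (day.year, day.month, min((day.day - 1) // 10, 2) + 1)


def month_key(day):
    """(year, month) grouping key."""
    return (day.year, day.month)


def aggregate(days, values, key, how="sum"):
    """Group daily values by key(day) and reduce each group by sum or mean."""
    groups = defaultdict(list)
    for day, value in zip(days, values):
        groups[key(day)].append(value)
    if how == "sum":
        return {k: sum(v) for k, v in sorted(groups.items())}
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Usage: 31 days of 1 mm rainfall in January 1983.
days = [date(1983, 1, 1) + timedelta(days=i) for i in range(31)]
rain = [1.0] * 31
dekadal_rain = aggregate(days, rain, dekad_key, how="sum")
monthly_rain = aggregate(days, rain, month_key, how="sum")
```

Under this convention the third dekad of January contains 11 days, so `dekadal_rain` holds totals of 10, 10, and 11 mm, and `monthly_rain` holds 31 mm. For Tmax and Tmin, `how="mean"` would be the analogous choice.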
(−15.1 % deviation) and average daily rainfall (−23.8 % deviation) compared to CHIRPS and ORH. CHIRP, on the other hand, showed higher overestimation of the number of wet days and duration and amount of wet periods (> 59.7 % deviation) and underestimation of the duration of dry periods and average daily rainfall (−62 % deviation) compared to the other products. Moreover, RCMs, next to CHIRP, showed higher overestimation of the number of wet days and duration and amount of wet periods (> 44.9 % deviation) and total rainfall amount (11.4 % deviation) and underestimation of the average duration of dry periods and daily rainfall by about 41 %. In general, the observed rainfall characteristics are captured well by CHIRPS, with a percentage difference from the observations of −0.17 % to −17.6 % for the number of wet days and duration of dry periods, respectively, compared to CHIRP, ARC2, ORH, I-RCM, and RCMs (Table 3).
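Rainfall characteristics of this kind, and their percentage deviation from the observed value, can be derived from a daily series as in the following sketch. The wet-day threshold of 1 mm and the definition of the mean daily rainfall as an average over all days are assumptions made for illustration; the thresholds and definitions actually used for Table 3 are not stated in this part of the text.

```python
from itertools import groupby

WET_DAY_THRESHOLD_MM = 1.0  # assumed threshold, not taken from the paper


def rainfall_characteristics(daily_mm, threshold=WET_DAY_THRESHOLD_MM):
    """Wet-day count, totals, and mean wet/dry spell lengths of a daily series."""
    wet_flags = [r >= threshold for r in daily_mm]
    wet_spells, dry_spells = [], []
    # groupby yields consecutive runs of wet (True) or dry (False) days
    for is_wet, run in groupby(wet_flags):
        length = sum(1 for _ in run)
        (wet_spells if is_wet else dry_spells).append(length)
    return {
        "wet_days": sum(wet_flags),
        "total_mm": sum(daily_mm),
        "mean_daily_mm": sum(daily_mm) / len(daily_mm),
        "mean_wet_spell_days": sum(wet_spells) / len(wet_spells) if wet_spells else 0.0,
        "mean_dry_spell_days": sum(dry_spells) / len(dry_spells) if dry_spells else 0.0,
    }


def percent_deviation(product_value, observed_value):
    """Deviation of a product value from the observed value in percent."""
    return 100.0 * (product_value - observed_value) / observed_value
```

A negative deviation indicates underestimation relative to the station data and a positive one indicates overestimation.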
The average daily rainfall of the study region retrieved from CHIRPS is displayed in Fig. 6 for the study period 1983–2005. Using CHIRPS as reference rainfall data, the absolute difference is presented in Fig. 6 for ARC2, CHIRP, ORH, and individual RCMs (HadGEM2, GFDL, and MPI). The average daily rainfall from the individual RCMs (I-RCM) shows large discrepancies from CHIRPS compared to CHIRP, ARC2, and ORH. The comparison at a daily timescale, particularly of rainfall, is challenging and more emphasis is given to this evaluation compared to dekadal and monthly resolutions. RCMs (RCA4 and CCLM) driven by HadGEM2-ES (HadGEM2), MPI-ESM-LR (MPI), and GFDL-ESM2M (GFDL) are used in this study. For RCMs driven by all GCMs, the average is used. The daily rainfall, Tmax, and Tmin maps of GFDL display the result of a single RCM (RCA4) driven by GFDL-ESM2M for the period of 1983–2005. Higher and lower average daily rainfall values are displayed by GFDL and ORH, respectively (Fig. 6). However, all the products showed a similar tendency in capturing the daily rainfall distribution: higher in the western and lower in the eastern part of the region. In addition, the average daily Tmax and Tmin (Fig. 7) of the region show relatively higher disagreement between ORH and I-RCM, which is given as an absolute difference from ORH. Even though I-RCM shows high deviation from ORH, it showed higher agreement in Ethiopia, Kenya, and Tanzania for Tmax and Tmin.

Table 3. Summary of daily rainfall characteristics retrieved from multiple rainfall products and averaged over the validation areas of Ethiopia, Kenya, and Tanzania. Values in brackets give the deviation from the observed value (%). The value which comes closest to the observed value is highlighted in bold.

Validation of satellite, observational reanalysis, and climate model-based products at dekadal and monthly resolutions
To understand the role of higher spatial resolution in improving the agreement with station data, a similar statistical evaluation was performed using the coarse resolution of CHIRPS (0.25°). Compared to the coarse resolution of CHIRPS, the daily improved version (0.05°) used in this study showed an increased correlation of up to 3.2 % in all the validation areas. In line with the daily evaluation, the comparison was extended to dekadal and monthly resolutions for rainfall, Tmax, and Tmin using the same statistical methods. For this analysis the daily ground observations and data from ARC2, CHIRP, CHIRPS, ORH, I-RCM, and RCMs were aggregated to dekadal and monthly resolutions. With decreasing temporal resolution (daily to monthly), the agreement of each product showed a marked improvement in all the validation areas. In addition to the increase in correlation, biases (bias and Rbias) and errors (MAE and RMSE) in rainfall decreased at dekadal and monthly resolutions. At dekadal and monthly resolution, the agreement of all rainfall products with station data increased compared to the daily resolution, and the results for eight validation areas of Ethiopia, Kenya, and Tanzania are given in Fig. 8. The same plots, with similar results, for another 13 areas are provided in the Supplement (Fig. S2). Similar to the daily evaluation, CHIRPS appeared to be the most accurate rainfall product at both dekadal and monthly resolutions in most of the validation areas compared to the other products. In addition to the higher correlation of CHIRPS with station data at monthly and dekadal timescales, the centred root mean square (rms) difference and the standard deviation are close to the observations in most of the validation areas. Following CHIRPS, CHIRP appeared to be the second best data source for dekadal and monthly rainfall and in three validation areas (EthioShed3, 15, and 16) showed a slightly higher correlation than CHIRPS. In two validation areas (KenShed1 and 2), ARC2 showed a slightly higher correlation than CHIRP and CHIRPS. However, in KenShed2, ARC2 showed a higher deviation from the observed value compared to CHIRP and CHIRPS. CHIRPS has, for example, an almost identical standard deviation to the station data in all the validation areas except in areas with a lower number of ground stations (EthioShed12-15 and TanzShed1). Overall, CHIRPS, CHIRP, and ARC2 were found to be the best performing rainfall products, while ORH, I-RCM, and RCMs are the worst performing products.
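The aggregation of daily series to dekadal and monthly resolution can be sketched as follows. The dekad convention used here (days 1-10, 11-20, and 21 to the end of the month) and the choice of summing rainfall while averaging temperature are assumptions; the text only states that the daily data were aggregated.

```python
from collections import defaultdict
from datetime import date, timedelta


def dekad_key(day):
    """(year, month, dekad): dekads cover days 1-10, 11-20, and 21 to month end."""
    return (day.year, day.month, min((day.day - 1) // 10, 2) + 1)


def month_key(day):
    """(year, month) key for monthly aggregation."""
    return (day.year, day.month)


def aggregate(dates, values, key_func, how="sum"):
    """Group daily values by period; use how='sum' for rainfall, 'mean' for temperature."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[key_func(d)].append(v)
    reduce = sum if how == "sum" else (lambda xs: sum(xs) / len(xs))
    return {k: reduce(vs) for k, vs in sorted(groups.items())}


# Usage with synthetic data: 31 days of 1 mm rainfall in January 1983
days = [date(1983, 1, 1) + timedelta(days=i) for i in range(31)]
dekadal_rain = aggregate(days, [1.0] * 31, dekad_key)
monthly_rain = aggregate(days, [1.0] * 31, month_key)
```

The same statistics used at the daily timescale can then be applied to the aggregated station and product series.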
Moreover, for Tmax and Tmin, the correlation of ORH, I-RCM, and RCMs increased from daily to dekadal and monthly resolutions. The agreement of each product with station data, for eight validation areas of Ethiopia, Kenya, and Tanzania, is given in Figs. 9 and 10 for Tmax and Tmin, respectively. The same plots, with similar results, for another 13 areas are provided in the Supplement (Figs. S3 and S4 for Tmax and Tmin, respectively). Compared to I-RCM and RCMs, the correlation between ORH and station data is higher in most of the validation areas. In addition, ORH showed lower centred root mean square (rms) difference and biases (bias and Rbias). Furthermore, compared to I-RCM and RCMs, the standard deviation of ORH is close to the respective observations in most of the validation areas. Compared to I-RCM, the standard deviation and centred root mean square (rms) difference of RCMs are lower in most of the validation areas.
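The three quantities summarized in a Taylor diagram (standard deviation, correlation, and centred rms difference) can be computed as in the sketch below. The function name and the use of population standard deviations (division by N) are assumptions; the sketch is not the plotting code behind Figs. 8 to 10.

```python
from math import sqrt
from statistics import mean


def taylor_statistics(product, observed):
    """Standard deviations, correlation, and centred rms difference of two series."""
    p_mean, o_mean = mean(product), mean(observed)
    p_anom = [p - p_mean for p in product]   # anomalies about the product mean
    o_anom = [o - o_mean for o in observed]  # anomalies about the observed mean
    n = len(observed)
    std_p = sqrt(sum(a * a for a in p_anom) / n)
    std_o = sqrt(sum(a * a for a in o_anom) / n)
    corr = sum(a * b for a, b in zip(p_anom, o_anom)) / (n * std_p * std_o)
    # Centred rms difference: the rms difference after removing both means
    crmsd = sqrt(sum((a - b) ** 2 for a, b in zip(p_anom, o_anom)) / n)
    return {
        "std_product": std_p,
        "std_observed": std_o,
        "correlation": corr,
        "centred_rms_difference": crmsd,
    }
```

These quantities satisfy E'^2 = s_p^2 + s_o^2 - 2 s_p s_o R, where E' is the centred rms difference, s_p and s_o are the standard deviations, and R is the correlation, which is why all three can be shown in a single diagram. Because the means are removed, a constant bias does not affect the position of a product in the diagram and has to be reported separately.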

Discussion
Detection of rainfall characteristics by satellite observations or climate model simulations (GCMs and RCMs) is very challenging compared to temperature. This is especially evident in East Africa, where the topography is complex and characterized by multiple rainfall regimes. In particular, it is difficult to estimate rainfall with satellite imagery in the mountainous region of East Africa (Cattani et al., 2016) because these products inevitably do not represent the regional rainfall patterns and the complexity of the region's topography (Romilly and Gebremichael, 2011). Here, for an improved understanding of the climatic conditions of this complex region and their impact on environmental resources, daily rainfall, Tmax, and Tmin products from high-resolution satellite imagery, observational reanalysis, and climate model outputs are compared against ground observations. Such an evaluation was not available until now for the considered region. Therefore, an in-depth evaluation was performed, more specifically on a daily timescale, of the satellite-based rainfall products (ARC2, CHIRPS, and CHIRP), ORH, and RCMs (CCLM and RCA) driven by three GCMs. ARC2, CHIRP, and CHIRPS are rainfall products, whereas ORH and RCMs provide rainfall, Tmax, and Tmin.
From the comparison (using point to pixel, point to area grid cell average, and stations' average to area grid cell average methods), the stations' average to area grid cell average showed the best correlation and least biases and errors in all the validation areas. A study by Duan et al. (2016) in the Adige basin (Italy) found that comparing rainfall products such as CHIRPS on a watershed scale showed a marked improvement in overall agreement compared to the point to pixel method on daily and monthly timescales. Comparing the coarse resolution of satellite products and of RCMs using the point to pixel method cannot be expected to result in a high agreement with station data. Ground stations provide point data measured over continuous time periods, whereas satellite products provide area averages based on discontinuous (rain) estimates. Field-based stations (as point measurements) cannot be considered as reference data for the evaluation of area-based rainfall estimates (Cohen Liechti et al., 2012; Wang and Wolff, 2010), if not compared at a monthly or annual timescale. This is similar to our finding that the point to pixel comparison for all products inside and outside the validation areas shows weak statistical relations with ground stations (see e.g. Table 2). The correspondence of all products at a daily timescale and in all the validation areas was found to be comparably weak, and the findings are in agreement with earlier studies (Cohen Liechti et al., 2012; Dembélé and Zwart, 2016).
At a daily timescale, CHIRPS, followed by ARC2 and CHIRP, showed higher correlation and lower errors and biases in all the validation areas compared to ORH, I-RCM, and RCMs. In addition, CHIRPS captured the daily rainfall characteristics well, while ARC2 showed higher underestimation of the average daily and total rainfall. The agreement of all the rainfall products increases from daily to dekadal and monthly timescales (Fig. 8), and this is consistent with other studies (Cohen Liechti et al., 2012; Dembélé and Zwart, 2016; Kimani et al., 2017). Generally, CHIRPS, with a high spatial resolution, followed by CHIRP and ARC2, was the best performing rainfall product in terms of correlation, biases, and errors and in characterizing regional rainfall characteristics. By contrast, ORH, I-RCM, and RCMs appeared to be less precise rainfall products at all timescales and in all validation areas. When looking at the performance of different data products in the selected validation areas (Fig. S1), dispersion is comparably higher in areas with a lower number of ground stations. An additional confounding factor could be the very complex topography of the region. This might explain why products with coarser spatial resolution (ORH, I-RCM, and RCMs) showed higher dispersion compared to products with higher spatial resolution (CHIRPS, CHIRP, and ARC2).
The daily rainfall data (global summary of the day) available at the National Climate Data Center (NCDC) needed to be quality controlled before application. In East Africa, particularly Ethiopia, the data available at NCDC are very poor and only a few stations are available. Therefore, products developed based on the global summary of the day, such as ORH, cannot be expected to provide results as accurate as those of CHIRPS and ARC2, particularly for the most complex climate variable, rainfall. CHIRPS incorporates monthly station data obtained from different regional meteorological organizations, e.g. from Ethiopia, Kenya, and Tanzania. In all the validation areas one to seven stations were included in the development of CHIRPS in different months during 1981–2005. In EthioShed1 (Table 2), for example, six of the nine stations we considered in this study are included in CHIRPS. The inclusion of monthly station data can be assumed to improve the performance of CHIRPS compared to other rainfall products. This particular feature of CHIRPS (compared to CHIRP and other data products) is somewhat problematic for our analysis, since the correlated data are not fully independent. However, since only monthly data from a limited number of stations were included in CHIRPS, the dependency is rather weak and indirect. In fact, the improved performance of CHIRPS was even shown in areas where station data are not included (e.g. Arijo, Bedele, and Hurma stations in EthioShed1) and on a daily timescale.
Even though ORH was one of the worst performing rainfall products, it appeared to be the most accurate data source for Tmax and Tmin at daily, dekadal, and monthly resolutions compared to I-RCM and RCMs. Nikulin et al. (2012) presented a detailed comparison of daily gridded observations with multiple RCMs, including RCA and CCLM, and they found large discrepancies over the whole region of Africa. However, in this region, RCMs appeared to be the second best data source for both Tmax and Tmin, and I-RCM is less precise, with slightly higher biases and errors. In this region, other studies (Endris et al., 2013; Kim et al., 2014) showed that the multi-model or ensemble mean of CORDEX RCMs provides reasonable results compared to individual RCMs (I-RCM). The systematic bias of I-RCM and RCMs is higher in most of the validation areas compared to the other products, particularly for rainfall. This bias can be reduced by applying bias correction techniques such as empirical quantile mapping (Lafon et al., 2013; Maraun, 2013; Teng et al., 2015) before application to different hydrological and climate models (e.g. SDSM). In general, in topographically complex regions such as East Africa, RCMs require further improvements in terms of spatial resolution and accuracy by adding more local information to the modelling process, particularly for precipitation.
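The idea behind empirical quantile mapping can be shown with a deliberately simplified sketch: each model value is replaced by the observed value that has the same empirical rank in a common historical period. The function name and the nearest-rank lookup without interpolation are assumptions made for illustration; this is not the procedure of the cited studies, which apply more refined variants.

```python
from bisect import bisect_right


def empirical_quantile_map(model_hist, obs_hist, model_values):
    """Map model values to observed values at the same empirical quantile."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    n_mod, n_obs = len(model_sorted), len(obs_sorted)
    corrected = []
    for v in model_values:
        # Number of historical model values not exceeding v (empirical rank)
        rank = bisect_right(model_sorted, v)
        # Matching position in the observed record: ceil(rank * n_obs / n_mod) - 1
        idx = (rank * n_obs + n_mod - 1) // n_mod - 1
        # Values outside the historical range map to the observed extremes
        idx = min(max(idx, 0), n_obs - 1)
        corrected.append(obs_sorted[idx])
    return corrected
```

Because the mapping is built from the historical distributions only, values beyond the calibration range are simply clamped here, which is one of the limitations that more elaborate implementations address.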

Summary and conclusion
The evaluation of rainfall, Tmax, and Tmin from different sources against station data was performed for large parts of East Africa (Ethiopia, Kenya, and Tanzania) using three methods: point to pixel, point to area grid cell average, and stations' average to area grid cell average. Compared to the other two methods, the last-mentioned method (stations' average to area grid cell average) provides a better correlation and index of agreement (IA) and lower errors (MAE and RMSE) and biases (bias and Rbias). Using this method, individual rainfall, Tmax, and Tmin products were compared at daily, dekadal (10 days), and monthly resolutions. At a daily timescale, CHIRPS, ARC2, and CHIRP provide a better agreement with station data compared to ORH, I-RCM, and RCMs. Compared to CHIRPS and CHIRP, ARC2, ORH, I-RCM, and RCMs showed higher biases and errors in most of the validation areas. Overall, the performance of CHIRPS is higher than that of the other rainfall products in capturing the daily rainfall characteristics, such as the number of wet days, duration of wet and dry periods, total and daily rainfall, and amount of wet periods. ARC2 better captures the duration of wet and dry periods, but showed higher underestimation of the total and daily rainfall and number of wet days compared to CHIRPS and CHIRP. I-RCM and RCMs, on the other hand, showed higher overestimation of the number of wet days, duration and amount of wet periods, and total rainfall and underestimation of the average duration of dry periods and daily rainfall. ORH, conversely, appeared to be one of the worst performing rainfall products for the study region, but the most accurate product, compared to I-RCM and RCMs, for Tmax and Tmin at a daily timescale in most of the validation areas. The evaluation of the above products at dekadal and monthly timescales showed that CHIRPS with high spatial resolution (0.05°) has a higher correlation and lower errors and biases than the other rainfall products. As the temporal resolution gets coarser (e.g. monthly), the correlation between ground observations and the above products significantly increases. In addition, biases (bias and Rbias) and errors (MAE and RMSE) significantly decreased. Similar to that of rainfall, the comparison at dekadal and monthly resolution showed an improved correlation and lower errors and biases for both Tmax and Tmin. Compared to I-RCM and RCMs, ORH with higher spatial resolution was found to be more accurate at dekadal and monthly resolutions. Next to ORH, RCMs showed a better performance than I-RCM, with lower biases and errors.
Hydrol. Earth Syst. Sci., 22, 4547-4564, 2018 www.hydrol-earth-syst-sci.net/22/4547/2018/
In general, CHIRPS for rainfall and ORH for Tmax and Tmin performed best in the considered regions of Ethiopia, Kenya, and Tanzania. Further studies need to confirm whether this finding holds for other regions as well, and our approach may represent a blueprint of how to address this question. Since CHIRPS and ORH are available with higher spatial and temporal resolution and for longer periods, these data sources can be used for long-term climate studies (trend, variability, and extreme indices) and as input for climate or hydrological models. Considering the typical need for daily data for model input, it remains to be investigated whether poor daily data with a limited bias and similar variance are an acceptable replacement of missing station data when used for impact model studies. In addition, the products can be used to check the plausibility of available ground stations or substitute ground observations in regions of Ethiopia, Kenya, and Tanzania where ground station data are not available or accessible.

Figure 3. Statistical evaluation of daily rainfall retrieved from ARC2, CHIRP, CHIRPS, ORH, and RCMs against ground observations: CC (a), RMSE (b), and Rbias (c) during the period of 1983–2005 for 21 validation areas of East Africa (see Table 1).

Figure 4. Statistical evaluation of daily Tmax retrieved from GFDL, HadGEM2, MPI, ORH, and RCMs against ground observations: CC (a), RMSE (b), and Rbias (c) over the period of 1983–2005 for 21 validation areas of East Africa.

Figure 5. Statistical evaluation of daily Tmin retrieved from GFDL, HadGEM2, MPI, ORH, and RCMs against ground observations: CC (a), RMSE (b), and Rbias (c) over the period of 1983–2005 for 21 validation areas of East Africa (see Table 1).

Figure 6. Average daily rainfall (mm day⁻¹) map of East Africa retrieved from CHIRPS (a) for the study period 1983–2005. For rainfall data from ARC2, CHIRP, ORH, and RCMs (b), the absolute difference (mm day⁻¹) from CHIRPS is displayed. All the maps are given in a 0.05° spatial resolution.

Figure 7. Map of average daily Tmax and Tmin (°C) for East Africa generated from ORH (a) for the study period 1983–2005. For temperature data from HadGEM2, GFDL, and MPI (b), the absolute difference (°C) from ORH is displayed. All the maps are given in a 0.1° spatial resolution.

Figure 8. Taylor diagram displaying the agreement between ground observations and synthesized dekadal and monthly rainfall over eight validation areas of Ethiopia, Kenya, and Tanzania covering the period of 1983–2005.

Figure 9. Taylor diagram displaying the agreement between ground observations and synthesized dekadal and monthly Tmax over eight validation areas of Ethiopia, Kenya, and Tanzania covering the period of 1983–2005.

Table 1. General characteristics of selected validation areas and meteorological stations covering the time period 1983–2005.
are used. CC (Eq. 1) is applied to evaluate the agreement of individual products (P) to station data (O). A value of CC close to 1 shows a perfect positive fit between the products and station data. CC = \sum_{i=1}^{N} \ldots
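Since the expression for CC is cut off at this point, the standard Pearson form of the correlation coefficient is given here for orientation. It is an assumption that Eq. (1) of the paper has exactly this form; P_i and O_i denote the product and station values at time step i, \bar{P} and \bar{O} their means, and N the number of time steps.

```latex
\mathrm{CC} =
  \frac{\sum_{i=1}^{N} \left(P_i - \bar{P}\right)\left(O_i - \bar{O}\right)}
       {\sqrt{\sum_{i=1}^{N} \left(P_i - \bar{P}\right)^{2}}\;
        \sqrt{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)^{2}}}
```

In this form CC ranges from −1 to 1 and is insensitive to a constant offset between product and station data, so it has to be read together with the bias and error measures.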

Table 2. An example of the statistics used to compare ground rainfall data with satellite products (e.g. ARC2, CHIRP, and CHIRPS) in EthioShed1