Comparison of rainfall estimations by TRMM 3B42, MPEG and CFSR with ground-observed data for the Lake Tana basin in Ethiopia

Planning for drought relief and floods in developing countries is greatly hampered by the lack of a sufficiently dense network of weather stations measuring precipitation. In this paper, we test the utility of three satellite products to augment ground-based precipitation measurements and provide improved spatial estimates of rainfall. The three products are the Tropical Rainfall Measuring Mission (TRMM) product (3B42), the Multi-Sensor Precipitation Estimate–Geostationary (MPEG) and the Climate Forecast System Reanalysis (CFSR). The accuracy of the three products is tested in the Lake Tana basin in Ethiopia, where 38 weather stations with a full record of daily precipitation amounts were available in 2010. Daily gridded satellite-based rainfall estimates were compared to (1) point-observed ground rainfall and (2) areal rainfall in the major river sub-basins of Lake Tana. The results show that MPEG and CFSR provided the most accurate rainfall estimates. On average over the 38 stations, 78 and 86 % of the observed rainfall variation is explained by MPEG and CFSR data, respectively, while TRMM explained only 17 % of the variation. Similarly, the areal comparison indicated a better performance for both MPEG and CFSR data in capturing the pattern and amount of rainfall. MPEG and CFSR also have a lower root mean square error (RMSE) than the TRMM 3B42 satellite rainfall. The bias indicated that TRMM 3B42 was, on average, unbiased, whereas MPEG consistently underestimated the observed rainfall. CFSR often produced large overestimates.


Introduction
Precipitation is a major component of the water cycle and one of the main terms of the global water budget; it deposits approximately 505 000 km³ (on average 990 mm) of fresh water on the planet each year (Ramakrishna and Nasreen, 2013). Although the spatial and temporal variability of precipitation is important, capturing it is difficult unless a large number of rain gauge stations is available (Chaubey et al., 1999; Pardo-Igúzquiza, 1998). However, ground-based rainfall observation networks in developing countries are often sparse and unevenly distributed (Kaba et al., 2014). For example, the Rahad, Dindir and Welaka sub-basins in the Blue Nile basin, Ethiopia, each had only one rainfall station, despite catchment areas greater than 5000 km². This is far below the World Meteorological Organization (WMO) standard of one station per 100 to 250 km² for mountainous regions (WMO, 1994), and the situation is not likely to improve in the near future. The poor coverage introduces large uncertainties into estimates of the rainfall distribution and undermines the dependability of hydrologic models used to simulate flow (both low flows and floods), sediment load and nutrient fluxes (Kaba et al., 2014). The unavailability of good-quality rainfall data makes hydrologists reluctant to address with confidence pressing and unprecedented societal questions related to food deficits, global warming, climate change, water scarcity and water shortage (Baveye, 2013).

Published by Copernicus Publications on behalf of the European Geosciences Union.
The growing availability of high-resolution (and near-real-time) satellite rainfall products can help hydrologists obtain more accurate precipitation data, particularly in developing countries and remote locations where weather radars are absent and conventional rain gauges are sparse (Creutin and Borga, 2003; Kidd, 2001). Satellite-derived rainfall estimates have become a powerful tool for supplementing ground-based rainfall measurements. Earth observation (EO) data for environmental and societal purposes have recently become readily available through EO satellites and data distribution systems. Freely available, spatially distributed rainfall estimates include the Tropical Rainfall Measuring Mission (TRMM; Simpson et al., 1988) products, the Multi-Sensor Precipitation Estimate–Geostationary (MPEG) produced by EUMETSAT's Meteorological Product Extraction Facility (MPEF), the Climate Forecast System Reanalysis (CFSR), the NOAA Climate Prediction Center morphing technique (CMORPH), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) and the Naval Research Laboratory's blended product (NRLB), among others.
Passive microwave (PM) and thermal infrared (TIR) measurements are the most widely used parts of the electromagnetic spectrum for satellite rainfall estimation (Huffman et al., 2007; Negri et al., 1984; Joyce et al., 2004; Kidd et al., 2003). A TIR sensor provides information on storm clouds based on the cloud-top temperature. The underlying assumption is that relatively cold cloud tops belong to thick, high clouds, which tend to produce high rainfall rates (Haile et al., 2010). One limitation of TIR sensors is that they observe only the cloud-top temperature, from which the depth of the cloud must be inferred (Todd et al., 2001); they also underestimate warm rain and misidentify cirrus clouds as rain (Dinku et al., 2011). Microwave sensors retrieve precipitation more directly: they gather information about the rain itself rather than about the cloud (Todd et al., 2001). The absorption of microwave radiation by liquid water and its scattering by ice particles can be related to rainfall over both ocean and land (Ferraro, 1997). The disadvantage of PM sensors is that they are not carried on geostationary satellites, so their observations have a longer latency (Heinemann et al., 2002). Combining PM data from polar-orbiting satellites with TIR data from geostationary systems is therefore an obvious approach to overcoming some of these shortcomings. In this study, rainfall estimated by TRMM 3B42 (hereafter simply "TRMM"), MPEG and CFSR is validated against ground-observed rainfall data in the Lake Tana basin, Ethiopia.
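To make the TIR assumption concrete, the sketch below applies a cold-cloud threshold in the style of the GOES Precipitation Index (GPI), which assigns a fixed rain rate of 3 mm h⁻¹ to pixels whose cloud-top brightness temperature is below 235 K. This is a swapped-in illustration of the general TIR principle, not the algorithm behind MPEG, TRMM 3B42 or CFSR, and the function name and inputs are hypothetical.

```python
# Hypothetical sketch of a GPI-style cold-cloud rain estimate.
# Assumption: the GOES Precipitation Index constants (235 K threshold,
# 3 mm/h rain rate) illustrate the TIR principle; they are not the
# MPEG, TRMM 3B42 or CFSR algorithms.

GPI_THRESHOLD_K = 235.0   # cloud-top brightness temperature threshold (K)
GPI_RAIN_RATE = 3.0       # rain rate (mm/h) assigned to "cold" pixels


def gpi_rain_depth(brightness_temps_k, hours):
    """Return the mean rain depth (mm) over an area of TIR pixels.

    brightness_temps_k: cloud-top brightness temperatures (K) of the pixels
    hours: length of the accumulation period represented (h)
    """
    if not brightness_temps_k:
        raise ValueError("no pixels supplied")
    cold = sum(1 for t in brightness_temps_k if t < GPI_THRESHOLD_K)
    fractional_cold_cover = cold / len(brightness_temps_k)
    return GPI_RAIN_RATE * fractional_cold_cover * hours


# Two of four pixels are colder than 235 K, so half the area "rains" at 3 mm/h.
print(gpi_rain_depth([210.0, 230.0, 250.0, 280.0], hours=3.0))  # 4.5
```

Under such a scheme, warm orographic rain whose cloud tops stay above the threshold receives no rainfall at all, which illustrates why TIR-based estimates tend to underestimate warm rain.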
Validating satellite rainfall products in the Ethiopian highlands gives insight into how the different products perform in this region. In general, Ethiopia has three seasons. The main rainy season, from June to September, called "Kremt", accounts for a large proportion of the annual rainfall (approximately 86 %); the dry season from October to January, called "Bega", is followed by a small rainy season called "Belg". The most important weather systems that cause rain over the country include the intertropical convergence zone (ITCZ), the Red Sea convergence zone (RSCZ), the tropical easterly jet (TEJ) and the Somali jet (NMSA, 1996; Seleshi and Zanke, 2004). The main rainy seasons are significantly correlated with the El Niño–Southern Oscillation (ENSO; Camberlin, 1997), and droughts in Ethiopia are more likely to occur during warm ENSO events (Seleshi and Demaree, 1995).
A number of studies have validated TRMM in the Ethiopian highlands (Dinku et al., 2010; Tsidu, 2012). These studies focused on comparing gridded satellite rainfall estimates with ground rainfall observations. This study validates satellite rainfall products in two ways: first, by comparing gridded satellite rainfall data with point observations and, second, by comparing satellite areal rainfall estimates with areal ground-observed rainfall interpolated by the Thiessen polygon method for the major sub-basins of Lake Tana. The Lake Tana basin was selected to take advantage of its relatively high rain gauge density and the availability of daily rainfall data. The three rainfall products were selected because they are generated with state-of-the-art algorithms and are freely available for use in Africa. For example, Bahir Dar University, in collaboration with the Tana sub-basin office and the University of Twente, the Netherlands, has established a GEONETCast ground-receiving station (Wale et al., 2011) that makes the MPEG satellite rainfall product locally available. In addition, all three rainfall estimates (TRMM, CFSR and MPEG) have relatively high spatial resolution, global coverage and high temporal resolution.
The general objective of the study is to examine which of the three freely available satellite products gives the best estimates of the spatial distribution of rainfall in the mountainous terrain of Ethiopia. The satellite estimates are compared with a relatively dense network of ground rainfall observation stations distributed across the Lake Tana basin for 2010, the year for which the densest network of daily precipitation data could be obtained.

Description of the study area
The study was carried out in the Lake Tana basin, the source of the Blue Nile River, in the northwestern highlands of Ethiopia; the basin has a total catchment area of 15 000 km². The lake covers around 3060 km² at an altitude of 1786 m and is located at 12°00′ N, 37°15′ E, around 564 km from the capital, Addis Ababa (Wale, 2008). The basin has complex topography, with elevations ranging from 1786 to 4107 m. The long-term average annual rainfall from 1994 to 2008 ranges from 2500 mm south of Lake Tana to 830 mm west of Lake Tana. Figure 1 shows the spatial distribution of the rain gauge network in and around the Lake Tana basin, together with the TRMM and CFSR grids.

Data availability
The gauge-observed rainfall data required for this study were obtained from the Ethiopian National Meteorological Agency (ENMA): long-term average annual rainfall from 1994 to 2008, daily rainfall for the year 2010, and station location and elevation for 51 stations in and around the Lake Tana basin. Some stations did not record rainfall consistently on a daily basis, and for others the location and elevation were unknown. Thirty-eight stations with continuous daily rainfall data for the study period (2010) remained. Of these 38, seven are Class 1 (synoptic) stations, where all meteorological parameters are measured every hour; the majority, 17 stations, are Class 3 (ordinary) stations, where only rainfall and maximum and minimum temperature are recorded daily; and the remaining 14 are Class 4 stations, where only daily rainfall amounts are recorded. Some of the MPEG data, at 15 min temporal intervals, were acquired in near real time from the low-cost satellite image reception station established at the Bahir Dar University Institute of Technology (Wale et al., 2011). Daily aggregated MPEG data (00:00 to 23:45 UTC, in mm day⁻¹) are available online at ftp://ftp.itc.nl/pub/mpe/msg/. TRMM gridded rainfall estimates were obtained from the ftp site ftp://disc2.nascom.nasa.gov/data/s4pa/TRMM_L3/TRMM_3B42_daily/. The daily gridded CFSR rainfall data are available from http://rda.ucar.edu/datasets/ds094.1/.
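The daily MPEG totals span the 96 quarter-hourly slots from 00:00 to 23:45 UTC. A minimal sketch of such an aggregation follows. It assumes that each slot holds an instantaneous rain rate in mm h⁻¹ held constant for 15 min and that missing slots are marked `None`; both are assumptions, and the 90 % completeness threshold and the function name are hypothetical choices, not the procedure used to produce the files on the ITC server.

```python
# Hypothetical sketch: accumulate quarter-hourly MPEG rain rates to a daily total.
# Assumptions: rates are in mm/h, each is representative for 15 minutes,
# and missing slots are given as None.

SLOTS_PER_DAY = 96          # 00:00, 00:15, ..., 23:45 UTC
SLOT_HOURS = 15 / 60        # 0.25 h per slot


def daily_total_mm(rates_mm_per_h, min_valid_fraction=0.9):
    """Return the daily rainfall (mm/day), or None if too many slots are missing."""
    if len(rates_mm_per_h) != SLOTS_PER_DAY:
        raise ValueError(f"expected {SLOTS_PER_DAY} slots, got {len(rates_mm_per_h)}")
    valid = [r for r in rates_mm_per_h if r is not None]
    if len(valid) < min_valid_fraction * SLOTS_PER_DAY:
        return None
    # Rain depth of the valid slots, scaled up so that gaps do not count as zero rain.
    depth = sum(r * SLOT_HOURS for r in valid)
    return depth * SLOTS_PER_DAY / len(valid)


# A 2 mm/h shower lasting one hour (four slots), dry otherwise.
rates = [0.0] * SLOTS_PER_DAY
rates[60:64] = [2.0] * 4
print(daily_total_mm(rates))  # 2.0
```

Scaling up for missing slots assumes that gaps are no wetter or drier than the rest of the day; simply skipping the day, or treating gaps as zero, are the obvious alternatives.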

Methods
The satellite rainfall estimates and the gauge-observed rainfall data have different spatial and temporal scales. The ground observations consist of daily point rainfall amounts at 38 stations irregularly distributed across the Lake Tana basin (Fig. 1). The MPEG, TRMM and CFSR rainfall data are regularly gridded time series with spatial resolutions of 3 km, 0.25° (≈ 28 km at the Equator) and 38 km, respectively. A detailed description of the TRMM, MPEG and CFSR data is provided in Appendix A. The average annual rainfall from 1994 to 2008 is plotted against station elevation to distinguish stations likely affected by convective precipitation only from those strongly affected by a combination of orographic and convective precipitation. The backward elimination technique was used to obtain the linear trends of long-term average rainfall with elevation. This technique successively eliminates the weakest independent variable (here, a station), after which the regression is recalculated (Xu and Zhang, 2001). If removing the variable significantly weakens the linear model, the variable is re-entered; otherwise, it is deleted. The procedure is repeated until only useful variables remain in the linear elevation–rainfall model.
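The description of backward elimination above leaves room for interpretation. One possible reading, sketched below, treats each station as a candidate for removal: the station whose exclusion raises R² of the elevation–rainfall line the most is dropped, and the procedure stops when no single removal improves R² by more than a tolerance. The tolerance, the minimum number of stations, the function names and the example stations are all hypothetical, and this is not necessarily the exact procedure of Xu and Zhang (2001).

```python
# Hypothetical sketch of a backward-elimination-style screening of stations for
# a linear elevation-rainfall model. Assumptions (not from the source): the
# "weakest" station is the one whose removal raises R^2 the most, and
# elimination stops when that gain is smaller than `tol`.


def fit_line(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("elevation or rainfall has zero variance")
    b = sxy / sxx
    return my - b * mx, b, sxy ** 2 / (sxx * syy)


def backward_eliminate(stations, tol=0.02, min_stations=5):
    """stations: dict name -> (elevation_m, annual_rain_mm). Returns kept names."""
    kept = dict(stations)
    while len(kept) > min_stations:
        r2_all = fit_line(*zip(*kept.values()))[2]
        best_name, best_r2 = None, r2_all
        for name in kept:
            rest = [v for k, v in kept.items() if k != name]
            r2 = fit_line(*zip(*rest))[2]
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        if best_name is None or best_r2 - r2_all < tol:
            break  # no single removal improves the fit enough: stop
        del kept[best_name]  # drop the station that weakens the model most
    return sorted(kept)


# Six stations on a 50 mm per 100 m line plus one outlier ("X").
stations = {"A": (1800, 1400), "B": (2000, 1500), "C": (2200, 1600),
            "D": (2400, 1700), "E": (2600, 1800), "F": (2800, 1900),
            "X": (2000, 2400)}
kept = backward_eliminate(stations)
slope = fit_line(*zip(*(stations[k] for k in kept)))[1]
print(kept, slope * 100)  # ['A', 'B', 'C', 'D', 'E', 'F'] 50.0
```

Multiplying the fitted slope (mm per m) by 100 expresses the trend as a rainfall increase per 100 m of elevation, the unit used for the rainfall–elevation relations in this paper.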
The gridded satellite rainfall estimates are linked to the ground rainfall observations in two ways:

Point-to-grid comparison
Each satellite grid cell (MPEG, TRMM and CFSR) is compared with the ground rainfall observations of the station(s) located within it. This means that point ground observations are compared with satellite grid cells of 3 × 3 km, 0.25° × 0.25° and 38 × 38 km for MPEG, TRMM and CFSR, respectively. The comparison is made on monthly and annual bases using standard statistics.
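For a regular latitude–longitude grid such as the 0.25° TRMM 3B42 grid, finding the cell that contains a station reduces to dividing the coordinate offsets by the grid spacing. The sketch below assumes a grid whose south-west corner and spacing are known, with rows running south to north; the grid extent, the gauge coordinates and the function name are illustrative. MPEG and CFSR may be delivered on other grids (for example a satellite projection or a Gaussian grid) and would then need their own cell lookup.

```python
import math

# Hypothetical sketch: locate the cell of a regular lat/lon grid that contains
# a rain gauge. Assumptions: cell (0, 0) has its south-west corner at
# (lat0, lon0); rows increase northwards and columns eastwards.


def grid_cell(lat, lon, lat0, lon0, res_deg, n_rows, n_cols):
    """Return (row, col) of the cell containing the point, or None if outside."""
    row = math.floor((lat - lat0) / res_deg)
    col = math.floor((lon - lon0) / res_deg)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


# A 0.25-degree grid over 10-14 N, 35-39 E (16 x 16 cells) and a
# hypothetical gauge near Bahir Dar.
print(grid_cell(11.60, 37.38, lat0=10.0, lon0=35.0, res_deg=0.25,
                n_rows=16, n_cols=16))  # (6, 9)
```

Stations that fall in the same cell are all compared against that one cell value, which is why the coarser TRMM and CFSR grids can pair several gauges with the same estimate.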

Areal comparison
Satellite areal rainfall estimates are compared with areal rainfall interpolated from the ground observation stations. The ground rainfall observations are interpolated with the Thiessen polygon method and compared with the corresponding satellite rainfall estimates for the major gauged river basins of Lake Tana; the accuracy is measured using standard statistics. The major river basins used in this study are Gilgel Abay, Gumara, Ribb and Megech, which, according to Kebede et al. (2006), contribute approximately 93 % of the surface water inflow to the lake.
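The Thiessen method weights each station by the fraction of the basin area that lies closer to it than to any other station. A minimal sketch follows, assuming the basin is represented by the centres of equal-area cells (a raster approximation of the Thiessen polygons rather than an exact polygon construction) and that coordinates are projected so planar distances apply. The function names, the toy basin and the gauges are hypothetical.

```python
import math

# Hypothetical sketch: Thiessen (nearest-station) weights on a raster of
# equal-area basin cells, then the weighted areal rainfall.
# Assumptions: coordinates are projected (e.g. km), cells have equal area;
# ties go to the first station in dict order.


def thiessen_weights(cell_centres, stations):
    """cell_centres: list of (x, y); stations: dict name -> (x, y).
    Returns dict name -> fraction of basin cells nearest to that station."""
    counts = {name: 0 for name in stations}
    for cx, cy in cell_centres:
        nearest = min(stations, key=lambda s: math.hypot(cx - stations[s][0],
                                                         cy - stations[s][1]))
        counts[nearest] += 1
    total = len(cell_centres)
    return {name: c / total for name, c in counts.items()}


def areal_rainfall(weights, rain_mm):
    """Weighted areal rainfall for one time step; rain_mm: dict name -> value."""
    return sum(w * rain_mm[name] for name, w in weights.items())


# A 4 km x 2 km basin of 1 km2 cells with one gauge at each end.
cells = [(x + 0.5, y + 0.5) for x in range(4) for y in range(2)]
gauges = {"west": (0.0, 1.0), "east": (4.0, 1.0)}
w = thiessen_weights(cells, gauges)
print(w)                                                # {'west': 0.5, 'east': 0.5}
print(areal_rainfall(w, {"west": 10.0, "east": 30.0}))  # 20.0
```

The weights depend only on station geometry, so they are computed once per basin and then applied to every daily or monthly time step.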

Ground rainfall observation stations (GROS)
There are 51 meteorological stations operated by ENMA in the study area. Some have no location information, and/or their reported elevation is not considered reliable. For the 38 selected stations, daily rainfall is available for the 2010 study period. Monthly rainfall amounts for selected stations are given in Fig. 2. Long-term average annual rainfall over 1994–2008 varies between 830 and 2500 mm yr⁻¹.
Approximately 86 % of the annual rainfall falls between June and September.
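The share of annual rainfall falling from June to September can be computed directly from monthly totals. In the sketch below, the function name is hypothetical and the monthly values are invented for illustration; they are not taken from any station in this study.

```python
# Fraction of annual rainfall that falls in June-September (Kremt).
# The monthly totals used in the example are invented for illustration.

KREMT_MONTHS = (6, 7, 8, 9)  # June, July, August, September


def kremt_fraction(monthly_mm):
    """monthly_mm: dict month (1-12) -> rainfall total (mm)."""
    annual = sum(monthly_mm.values())
    if annual == 0:
        raise ValueError("annual rainfall is zero")
    return sum(monthly_mm.get(m, 0.0) for m in KREMT_MONTHS) / annual


example = {1: 0, 2: 5, 3: 15, 4: 30, 5: 60, 6: 200,
           7: 400, 8: 380, 9: 220, 10: 60, 11: 20, 12: 10}
print(round(kremt_fraction(example), 2))  # 0.86
```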

Statistical measures
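Three statistical measures were used to compare the satellite rainfall estimates with the ground rainfall observations: the coefficient of determination (R²), the root mean square error (RMSE) and the multiplicative bias (bias). The coefficient of determination is used to evaluate the goodness of fit of the relation. R² addresses the question of how well the satellite rainfall estimates correspond to the ground rainfall observations: it measures the degree of linear association between the two; see Eq. (1).

$$R^2 = \left(\frac{n\sum_{i=1}^{n} G_i S_i - \sum_{i=1}^{n} G_i \sum_{i=1}^{n} S_i}{\sqrt{\left[n\sum_{i=1}^{n} G_i^2 - \left(\sum_{i=1}^{n} G_i\right)^2\right]\left[n\sum_{i=1}^{n} S_i^2 - \left(\sum_{i=1}^{n} S_i\right)^2\right]}}\right)^2 \tag{1}$$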
where $R^2$ is the coefficient of determination, $G_i$ the ground rainfall measurements, $S_i$ the satellite rainfall estimates, and $n$ the number of data pairs. RMSE measures the average magnitude of the differences between the satellite rainfall estimates and the ground-observed rainfall; because each difference is squared before averaging, large errors carry more weight. RMSE is therefore useful when large errors are particularly undesirable. The lower the RMSE, the more closely the satellite rainfall estimates represent the observed ground rainfall; see Eq. (2).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - G_i\right)^2} \tag{2}$$
where RMSE is the root mean square error, $G_i$ the ground rainfall measurements, $S_i$ the satellite rainfall estimates, and $n$ the number of data pairs. Bias is a measure of how the average satellite rainfall magnitude compares with the average ground rainfall observation. It is simply the ratio of the mean satellite rainfall estimate to the mean ground-observed rainfall; a bias of 1.1 means that the satellite rainfall is, on average, 10 % higher than the ground rainfall observations; see Eq. (3).

$$\mathrm{bias} = \frac{\sum_{i=1}^{n} S_i}{\sum_{i=1}^{n} G_i} \tag{3}$$

where $G_i$ are the ground rainfall measurements and $S_i$ the satellite rainfall estimates.
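The three measures defined above (R² as the squared linear correlation, RMSE and the multiplicative bias) translate directly into code. The sketch below is a plain-Python transcription for paired ground (G) and satellite (S) series; the function name is hypothetical.

```python
import math

# Sketch of the three comparison statistics for paired series.
# G: ground (gauge) rainfall, S: satellite rainfall, same length and order.


def compare(G, S):
    """Return (r2, rmse, bias) for paired ground and satellite values."""
    if len(G) != len(S) or len(G) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    n = len(G)
    sum_g, sum_s = sum(G), sum(S)
    sum_gs = sum(g * s for g, s in zip(G, S))
    sum_gg = sum(g * g for g in G)
    sum_ss = sum(s * s for s in S)
    # Coefficient of determination: squared Pearson correlation.
    num = n * sum_gs - sum_g * sum_s
    den = math.sqrt((n * sum_gg - sum_g ** 2) * (n * sum_ss - sum_s ** 2))
    if den == 0:
        raise ValueError("R^2 is undefined for a constant series")
    r2 = (num / den) ** 2
    # Root mean square error, in the units of the input (e.g. mm/day).
    rmse = math.sqrt(sum((s - g) ** 2 for g, s in zip(G, S)) / n)
    # Multiplicative bias: satellite total over ground total (1 = unbiased).
    bias = sum_s / sum_g
    return r2, rmse, bias


# A product that reproduces the pattern but is uniformly 10 % too high.
G = [2.0, 4.0, 6.0, 8.0]
S = [1.1 * g for g in G]
r2, rmse, bias = compare(G, S)
print(round(r2, 3), round(rmse, 3), round(bias, 3))  # 1.0 0.548 1.1
```

The example shows why the three measures are reported together: a perfect R² does not rule out a systematic 10 % overestimate, which only the bias (and, partly, the RMSE) reveals.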

Results and discussion
The long-term average annual rainfall from 1994 to 2008 is plotted against station elevation to examine the rainfall–elevation relation (Fig. 3). Two clear relationships can be observed: the first shows a rainfall increase of 50 mm for every 100 m increase in elevation, and the second an increase of 125 mm for every 100 m. These two relations can be explained by stations likely affected by convective rainfall only (rectangles in Fig. 3) and stations strongly affected by a combination of orographic and convective precipitation (circles in Fig. 3).

Point-to-grid comparison
The satellite rainfall estimates were aggregated to monthly totals, and the monthly satellite rainfall was extracted at the 38 station locations. The three statistics comparing the observed ground rainfall with the extracted satellite rainfall at all 38 stations are shown in Fig. 4a–c. As shown in Fig. 4a, monthly MPEG and CFSR rainfall correlate strongly with the ground rainfall observation stations (GROS). For MPEG, the coefficient of determination ranges from a maximum of 0.99 (Enfranz station) to a minimum of 0.63 (Yismala station); on average, 78 % of the total observed rainfall variation is explained by the MPEG estimates. For CFSR, the coefficient of determination ranges from 0.63 (Shembekit) to 0.99 (Gassay); on average, 86 % of the total observed rainfall variation at the 38 stations is explained by CFSR rainfall data. The monthly correlation between TRMM and GROS is weak, with a maximum coefficient of determination of 0.29 (Addis Zemen station) and a minimum of 0.00; several stations show no correlation with TRMM data at all. On average, only 7 % of the total observed rainfall variation is explained by the TRMM estimates. The RMSE (Fig. 4b) shows much the same pattern as R² (Fig. 4a): MPEG and CFSR have a much lower RMSE (0.63 to 9.5 mm day⁻¹) than TRMM (3.8 to 11.8 mm day⁻¹).
Thus, MPEG and CFSR rainfall estimates are clearly better related to gauged rainfall than TRMM. This agrees with the findings of Dinku et al. (2008), who reported that, on average, TRMM 3B42 captures only 15 % of the rainfall variability over the whole of Ethiopia.
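The monthly point-to-grid comparison starts by summing daily values into calendar-month totals for each station and each product, before computing the statistics per station and averaging them over the 38 stations. A minimal sketch of the aggregation step follows, assuming daily values keyed by `datetime.date` with no missing days (an assumption; gaps would need explicit handling). The function name and the synthetic series are hypothetical.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical sketch: aggregate a daily rainfall series to monthly totals.
# Assumption: `daily` maps datetime.date -> rainfall (mm) with no gaps.


def monthly_totals(daily):
    """Return dict (year, month) -> monthly rainfall total (mm), in date order."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# 1 mm on every day of 2010 gives 31 mm in January and 28 mm in February.
start = dt.date(2010, 1, 1)
daily = {start + dt.timedelta(days=k): 1.0 for k in range(365)}
monthly = monthly_totals(daily)
print(monthly[(2010, 1)], monthly[(2010, 2)], len(monthly))  # 31.0 28.0 12
```

Applying the same aggregation to the gauge series and to each product's extracted cell series gives 12 monthly pairs per station for the R², RMSE and bias calculations.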
Finally, regarding the distribution of rainfall through the year, MPEG and CFSR agree with the ground observations that 84 to 86 % of the annual rainfall falls in the rainy monsoon season from June to September, as exemplified in Fig. 5 for the Addis Zemen and Agre Genet stations. In contrast, TRMM places only 30 % of the annual rainfall in the rainy season. Figure 6 shows the spatial distribution of total rainfall for the year 2010 from MPEG, CFSR and TRMM.
The bias (Fig. 4c, plotted as the logarithm of bias) ranges from 0.2 to 0.9 for MPEG, 0.5 to 1.9 for TRMM and 0.24 to 2.69 for CFSR, with average values of 0.43, 1.0 and 1.3, respectively. MPEG consistently underestimates the observed rainfall, by 57 % on average. TRMM overestimates at 15 stations and underestimates at the remainder. CFSR overestimates at 24 stations and has the largest standard deviation of bias, indicating the largest spread of bias between stations.
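A multiplicative bias $B$ corresponds to a relative error of $(B - 1) \times 100\,\%$, so the average values above translate as follows; on a logarithmic axis, halving and doubling the observed rainfall lie at the same distance from zero, whatever the base of the logarithm:

$$B_{\mathrm{MPEG}} = 0.43:\quad (0.43 - 1) \times 100\,\% = -57\,\%, \qquad B_{\mathrm{CFSR}} = 1.3:\quad (1.3 - 1) \times 100\,\% = +30\,\%$$

$$\log(0.5) = -\log(2), \qquad \log(1) = 0$$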
Stations likely affected by convective rainfall only (22 stations, rectangles in Fig. 3) have a higher correlation and a smaller RMSE than stations likely affected by a combination of orographic and convective precipitation (16 stations, circles in Fig. 3). The bias is also higher at stations likely affected by both convective and orographic rainfall than at stations affected by convective rain only. This is plausible, because orographic lifting of moist air can produce precipitation while the cloud-top temperature is still relatively warm. Satellite rainfall products may not detect rainfall from such warm clouds, as the cloud-top temperature is too warm for TIR thresholds (Dinku et al., 2008) and there is little ice aloft to be detected by PM sensors, whereas both sensors can detect rainfall from deep convection (Tsidu, 2012).

Areal comparison
Rainfall at the stations likely affected by convective rainfall only is interpolated using the Thiessen polygon method, and the station weights on areal rainfall for the major watersheds are determined (Fig. 7). The Gilgel Abay watershed has two such stations, Megech three, Gumara six and Ribb seven. The areal observed rainfall is compared with the areal satellite rainfall estimates for the major gauged river basins of Lake Tana. Figure 8 shows R² and RMSE of areal ground rainfall observations (GROS) vs. MPEG, TRMM and CFSR for these basins, and Fig. 9 shows the bias of the satellite rainfall estimates relative to the ground observations.
The areal MPEG and CFSR rainfall estimates have very high coefficients of determination, above 0.8; on average, both MPEG and CFSR captured 93 % of the areal observed rainfall variability in the major river sub-basins of Lake Tana (Fig. 8). Overall, the areal satellite rainfall estimates for the major river basins have a smaller RMSE and a higher R² than in the point-to-grid comparison. This is because the stations used for the areal observed rainfall estimates are the stations likely affected by convective rainfall only.

[…] around the Lake Tana basin for 2010. Two approaches were used in the evaluation: the point-gauged precipitation data were compared with the satellite-estimated rainfall for the grid cell in which each station is located, and the gridded satellite estimates were compared with areal rainfall interpolated from the stations influenced only by convective rainfall. MPEG and CFSR performed better than TRMM in both the point-to-grid and the areal comparisons. Although MPEG consistently underestimated the ground-observed rainfall, by an average of 60 %, it captured the rainfall pattern well. CFSR also captured the observed rainfall pattern, but it overestimated at some stations and underestimated at others. TRMM estimated the ground rainfall observations poorly in both the point-to-grid and the areal comparisons.
The ground observations indicated that 86 % of the annual rainfall occurred from June to September, and MPEG and CFSR indicated approximately the same percentage, whereas TRMM indicated that only 30 % of the annual rainfall occurred during this rainy season. Although TRMM 3B42 is bias-adjusted with monthly gauge data and has performed well in many parts of the world (Ouma et al., 2012; Javanmard et al., 2010), such an adjustment was not made for the Ethiopian highlands, because observed rainfall data were not made available to the TRMM Multi-satellite Precipitation Analysis (TMPA) research team (Haile et al., 2013). For the study area and period, MPEG performed better in capturing the spatial and temporal patterns of observed rainfall. The results suggest that the TRMM 3B42 product needs further calibration to capture the temporal variation of rainfall, whereas MPEG could easily be calibrated with a correction factor to match the observed rainfall.
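A multiplicative correction of the kind suggested for MPEG can be sketched as follows: a factor equal to the ratio of gauge to satellite totals over a calibration period (the inverse of the multiplicative bias) is applied to later satellite estimates. The single factor, the calibration period and the function names are assumptions for illustration; in practice the factor might vary by station, season or rainfall regime, and its transferability to other years would need testing.

```python
# Hypothetical sketch: derive and apply a multiplicative correction factor.
# Assumption: one factor, estimated from paired gauge/satellite totals over a
# calibration period, is applied unchanged to later satellite estimates.


def correction_factor(gauge_mm, satellite_mm):
    """Ratio of gauge to satellite totals (the inverse of the multiplicative bias)."""
    total_sat = sum(satellite_mm)
    if total_sat <= 0:
        raise ValueError("satellite total must be positive")
    return sum(gauge_mm) / total_sat


def apply_correction(satellite_mm, factor):
    """Scale each satellite value by the correction factor."""
    return [factor * s for s in satellite_mm]


# With a bias of 0.5 the satellite values are doubled.
factor = correction_factor([10.0, 20.0, 30.0], [5.0, 10.0, 15.0])
print(factor, apply_correction([4.0, 8.0], factor))  # 2.0 [8.0, 16.0]
```

Because such a factor only rescales amounts, it cannot repair errors in timing, which is why it suits a product like MPEG (good pattern, consistent underestimation) better than TRMM 3B42 in this comparison.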

Figure 1.The Lake Tana watershed, showing the TRMM and CFSR grids and the location of the available and selected rainfall stations (90 m digital elevation model as background).

Figure 3. (a) Elevation versus long-term average annual rainfall in the Lake Tana basin.

Figure 6.Spatial distribution of the annual rainfall estimates for the year 2010 from MPEG, CFSR and TRMM data.

Figure 8. R² and RMSE of areal ground-observed rainfall vs. the satellite rainfall estimates for the major river basins of Lake Tana.

Figure 9. Bias of areal ground-observed rainfall vs. the satellite rainfall estimates for the major river basins of Lake Tana.