Can we trust remote sensing ET products over Africa?

Evapotranspiration (ET) is one of the most important components of the water cycle. However, there are relatively few direct measurements of ET (using flux towers), whereas various disciplines, ranging from hydrology to agricultural and climate sciences, require information on the spatial and temporal distribution of ET at regional and global scales. Due to limited data availability, attention has turned toward satellite-based products to fill observational gaps. Various remote sensing data products have been developed, providing a large range of ET estimations. Across Africa only a limited number of flux towers are available, which is insufficient for systematic evaluation of remotely sensed (RS) derived ET products. Thus we propose a methodology for evaluating RS derived ET data at the basin scale using a general water balance (WB) approach, where ET is equal to precipitation minus discharge for long-term annual averages. Firstly, RS ET products are compared with WB inferred ET for basins without long-term trends present. The RS products are then assessed according to spatial characteristics through analysing two land cover elements across Africa, irrigated areas and water bodies. A cluster analysis is also conducted to identify similarities between individual ET products. Finally, the RS products are evaluated against the Budyko equation. The results show that CMRSET, SSEBop and WaPOR rank highest in terms of estimation of long-term annual average mean ET across basins with low biases. Along with ETMonitor, the same three products rank highest in spatial distribution of ET patterns across Africa. GLEAM and MOD16 consistently rank the lowest in most evaluation criteria. Many of the products analysed in this study can be trusted depending on the study in question, keeping in mind that some of these products have large biases in magnitude estimation. However, our recommendation would be the three highest ranked products: CMRSET, SSEBop and WaPOR.
Copyright statement. Author(s) 2019. This work is distributed under the Creative Commons Attribution 4.0 License.

understanding the impacts of potential changes on the hydrological cycle under a changing climate, to name a few (Teuling et al., 2009; Vinukollu et al., 2011a; Mu et al., 2011). However, the estimation of ET at large scales has always been a difficult task because direct measurement of ET is possible only at point locations, for example using flux towers (Trambauer et al., 2014). Flux tower data can be openly accessed through FLUXNET, a global network of micrometeorological flux measurement sites that measure the exchange of CO2, water vapor and energy between the biosphere and the atmosphere (Baldocchi et al., 2001). In the latest FLUXNET 2015 dataset, there are only six eddy covariance sites in Africa from which latent heat (LE) measurements, which can be converted to ET, can be obtained. Figure 1 shows the distribution and data availability of the sites. LE data gap-filled using the Marginal Distribution Sampling (MDS) technique are available at these locations; however, a general lack of energy balance closure is found at many sites (Wilson et al., 2002). For this reason LE can also be obtained with a correction factor applied for energy balance closure, but this reduces the number of data points and sites available for use. Due to the limited availability of observed point data for the entirety of the African continent, a method of evaluating ET estimations using data other than point measurements is required.
Recent advances in satellite-based products provide promising data to fill these observational gaps (Alkema et al., 2011; Miralles et al., 2016). ET cannot be directly measured by satellites, but can be derived from physical variables that can be observed from space, such as latent heat and surface heat, using the surface energy balance. In addition, due to passing frequencies and cloud interference, interpolations in time are required. In this respect remote sensing derived ET cannot be interpreted as direct satellite observations but as model outputs based on satellite forcing data. Satellite observations often give useful information on the spatial variability; however, the products tend to suffer from a large bias. Therefore, large-scale estimations of ET are most commonly products of remote sensing based models, hydrological models and land-surface models (Trambauer et al., 2014). More recently, remote sensing ET products have been developed using Machine Learning (ML) approaches such as Model Tree Ensemble (MTE) or Artificial Neural Network (ANN), combined with observed flux tower data or model outputs used as training sets (Tramontana et al., 2016; Jiménez et al., 2011; Jung et al., 2017; Alemohammad et al., 2017).
With this large range of approaches to estimate ET, large differences are observed among the products and therefore validation is required. Since it is difficult to validate ET estimates using observed data, an alternative method of inferring ET for a river basin is used. Assuming the change in water storage (soil moisture, lakes, deltas) is negligible at the river basin scale, ET becomes equal to precipitation (P) minus discharge (Q) (Miralles et al., 2011; Vinukollu et al., 2011b). Using this general Water Balance (WB), it is possible to gain understanding of the magnitude of ET within a given basin and hence to estimate biases in ET estimation by the different ET products. Unfortunately, the period of observation for measured discharge for certain basins is limited or does not overlap with RS derived estimations of ET. For this reason long-term annual averages for time series without trends are used.
This study focuses on a methodology for evaluating RS derived ET products at the continental scale over Africa. ET is derived from discharge observations and observation-based precipitation using a WB approach, with long-term averages for non-overlapping time periods. A trend analysis is conducted in order to justify the use of different time periods. Spatial variability is analysed using specific land cover elements that tend to have a higher or lower ET, such as water bodies and irrigated areas.

Data
The remote sensing derived ET products being evaluated in this study include WaPOR, GLEAM, MOD16, SSEBop, WECANN, FLUXNET-MTE, ETMonitor and CMRSET. All data are projected and gridded on a 0.0022° × 0.0022° geographic grid and averaged at yearly temporal resolution. Table 1 summarizes the characteristics of the remote sensing products being used.

GLEAM
The Global Land Evaporation Amsterdam Model (GLEAM) is a physically based model that estimates terrestrial evapotranspiration using satellite observations (Miralles et al., 2011). It consists of three different calculation schemes, namely, (1) rainfall interception driven by rainfall and vegetation observations; (2) potential evaporation calculated using the Priestley and Taylor (P-T) equation (Priestley and Taylor, 1972) and driven by satellite observations; and (3) a stress factor attenuating potential evaporation based on a semi-empirical relationship between microwave vegetation optical depth (VOD) observations and root zone soil moisture estimates (Alemohammad et al., 2017). GLEAM ET estimates are provided at daily temporal resolution from 1980-2013 and 0.25° × 0.25° spatial resolution.

WaPOR
The Food and Agriculture Organisation's data portal to monitor Water Productivity through Open access of Remotely sensed derived data (WaPOR) offers products related to water productivity (WP) derived mainly from freely available remote sensing satellite data (FAO, 2018). Actual evapotranspiration estimates are the sum of the soil evaporation (E) and canopy transpiration data. Terrestrial ET includes evaporation from wet and moist soil, rain water intercepted by the canopy and transpiration through stomata from plant leaves and stems. ET datasets are calculated using the improved algorithm of Mu et al. (2011), which builds on the initial algorithm of Mu et al. (2007) and is based on the Penman-Monteith (P-M) equation. Improvements include: evaporation from wet soil; nighttime ET; simplified calculation of vegetative fraction cover; adding soil heat flux; improving estimates of stomatal conductance, aerodynamic resistance and boundary layer resistance; and separating dry and wet canopy surfaces (Mu et al.,

SSEBop
The operational Simplified Surface Energy Balance (SSEBop) model estimates ET as a function of the land surface temperature (Ts) from remotely sensed data and reference ET (ETo) from global weather datasets, using the Simplified Surface Energy Balance (SSEB) method developed by Senay et al. (2007, 2011b). The Surface Energy Balance (SEB) is first solved for each pixel for a reference crop condition using the standard P-M equation and is adjusted according to Ts through an ET fraction approach, which accounts for the spatial variability of water availability and vegetation health in the landscape (Savoca et al., 2013). SSEBop uses pre-defined, seasonally dynamic boundary conditions that are unique to each pixel for "hot/dry" and "cold/wet" reference points defined in Bastiaanssen et al. (2014) and Allen et al. (2007). SSEBop ET estimates are provided at a spatial resolution of 0.0096° × 0.0096° and at either monthly or annual temporal resolution for the period 2001-2017.

WECANN
The Water, Energy and Carbon Cycle with Artificial Neural Networks (WECANN) retrieves monthly estimates of Latent Heat Flux (LE) using the Artificial Neural Network (ANN) approach. The LE estimates, converted to ET in this study using a coefficient, are derived from remotely sensed solar-induced fluorescence (SIF) estimates along with remotely sensed estimates of precipitation, temperature, soil moisture, snow cover and net radiation as inputs. Different observations and/or model-based estimates of LE are used to produce the training dataset using a Bayesian perspective (Alemohammad et al., 2017). WECANN LE estimates are provided at a spatial resolution of 1° × 1° and at monthly temporal resolution for the period 2007-2015.

FLUXNET-MTE
The FLUXNET Model Tree Ensemble (FLUXNET-MTE) provides global fluxes of LE, converted to ET in this study, derived from empirical upscaling of eddy covariance measurements from the FLUXNET global network (Baldocchi et al., 2001). The MTE method uses an ensemble learning algorithm by training the MTEs for LE using site-level explanatory variables and fluxes and then applying these established MTEs using gridded datasets of the same explanatory variables. MTE LE estimates cover a period from 1982-2012 at a spatial resolution of 0.5° × 0.5° and at a monthly temporal resolution.

ETMonitor
ETMonitor is a process-based model using mainly satellite observations to estimate ET at the global scale. In order to calculate ET, different modules for different land cover classes are used, including soil evaporation and plant transpiration for soil-plant systems based on the Shuttleworth-Wallace model (Shuttleworth and Wallace, 1985), an analytical module for rainfall interception loss by vegetation canopies, a water evaporation module for water bodies based on the P-M equation and a sublimation module for snow/ice surfaces. The ET estimates are available globally, covering a period from 2008-2012 at a spatial resolution of 0.0096° × 0.0096° and at daily temporal resolution.

CMRSET
CMRSET provides estimates of ET based on surface reflectances from MODIS-Terra and interpolated climate data. The algorithm uses the Enhanced Vegetation Index (EVI), through its relationship with Leaf Area Index (LAI), and the Global Vegetation Moisture Index (GVMI), which provides information on vegetation water content and allows the separation of surface water and bare soil, to scale derived P-T potential evapotranspiration (Guerschman et al., 2009).

[...] sources of EWEMBI are ERA-Interim reanalysis data (ERAI), WATCH forcing data methodology applied to ERA-Interim reanalysis data (WFDEI), eartH2Observe forcing data (E2OBS) and NASA/GEWEX Surface Radiation Budget data (SRB) (Dee et al., 2011; Weedon et al., 2014; Stackhouse et al., 2011). The dataset covers the entire globe at a spatial resolution of 0.5° × 0.5° and daily temporal resolution from 1979 to 2013.

The Climate Hazards group Infrared Precipitation with Stations (CHIRPS) dataset uses the Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis version 7 (TMPA 3B42 v7) to calibrate global Cold Cloud Duration (CCD) rainfall estimates, as well as a 'smart' interpolation of gauge data from the World Meteorological Organization's Global Telecommunication System (GTS) (Funk et al., 2015). The product is available for 50° S to 50° N and all longitudes at a spatial resolution of 0.05° × 0.05° at daily, pentadal and monthly temporal resolution, covering the period 1981-2017.

Discharge data
Discharge data were obtained from the Global Runoff Data Centre (GRDC) for the majority of basins and from the Vrije Universiteit Brussel (VUB) Department of Hydrology and Hydraulic Engineering (HYDR) for the Nile and Blue Nile basins. All data were initially obtained at either daily or monthly temporal resolution and aggregated to monthly and yearly averages.

Reference potential evapotranspiration data
The datasets used for reference potential evapotranspiration (PET) were developed by Deltares (Sperna Weiland et al., 2015). The datasets are derived from the WFDEI dataset with a resolution of 0.5° × 0.5° and downscaled based on a high resolution Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission at 90 m resolution (Sperna Weiland et al., 2015). Three datasets were used for PET, based on the Hargreaves, P-M and P-T approaches respectively. The global PET datasets have a spatial resolution of 0.083° × 0.083° and daily temporal resolution covering the period 1979-2012.

Methodology
A methodology to evaluate RS derived ET estimations is presented next:
1. Preprocessing and data analyses
2. Comparison using WB inferred ET estimates

Preprocessing and data analyses
Due to the limited availability of direct observations of ET across Africa, we infer ET estimates at the river basin level using the WB approach. The long-term WB assumes a negligible change in storage (discussed further in Section 5); the total inflow (P) is therefore equal to the total outflow (ET and Q), so that ET is equal to P minus Q, according to the following equation:

\[ ET = P - Q \tag{1} \]

For all the major river basins in Africa, discharge data from GRDC and other sources were analysed at their outlets based on data availability and quality. As seen in Fig. 2, from fifty four major basins in Africa we found twenty seven basins with sufficient and quality discharge data at the outlet. These twenty seven basins alone cover the majority of the African continent. Since direct observations of precipitation from gauges were not used, three different precipitation products, as described above, are used for comparison. Using these available discharge data and precipitation data from EWEMBI, CHIRPS and MSWEP, the annual average ET was estimated across each of the twenty seven basins using equation 1 and the long-term average was calculated.
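The water balance step can be illustrated with a short sketch. It assumes that basin-average precipitation and discharge have already been expressed as annual depths in mm/yr (discharge volume divided by basin area); the function name and the numbers are hypothetical and only show the arithmetic of ET = P - Q followed by a long-term mean.

```python
from statistics import mean


def water_balance_et(precip_mm, discharge_mm):
    """Annual WB-inferred ET (mm/yr) for the years present in both records.

    Both inputs are dicts mapping year -> basin-average depth in mm/yr.
    Storage change is assumed negligible, so ET = P - Q.
    """
    common_years = sorted(set(precip_mm) & set(discharge_mm))
    return {year: precip_mm[year] - discharge_mm[year] for year in common_years}


# Hypothetical basin-average values in mm/yr (illustration only).
precip = {2001: 900.0, 2002: 1000.0, 2003: 1100.0}
discharge = {2001: 150.0, 2002: 200.0, 2004: 180.0}

annual_et = water_balance_et(precip, discharge)
long_term_et = mean(annual_et.values())
print(annual_et, long_term_et)
```

Only years with both precipitation and discharge contribute here; a basin with no common years would need to be handled separately.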
As mentioned previously, using the general WB to infer average ET across different basins poses the problem of limited to no overlapping time periods between the data sources. Thus, we investigated whether or not annual trends can be detected from the inferred ET. If the data show no major trends across the different basins then it can be justified to evaluate the ET estimations using long-term averages from different time periods (discussed in Section 5).
The Mann-Kendall (MK) test (Mann, 1945; Kendall, 1948) was used to test for monotonic trends in the ET estimates inferred from the WB approach. After conducting the test, if trends were present, these basins were discounted from the analyses.
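As an illustration of the trend test, the following is a minimal sketch of the standard Mann-Kendall statistic with a normal approximation. It ignores tie and autocorrelation corrections, and the 0.05 significance level used in the example is an assumption rather than a value stated in the text.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Sketch only: no correction for ties or serial correlation.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p_value


# Hypothetical annual ET series (mm/yr) with a steady increase.
annual_et = [700.0, 705.0, 712.0, 718.0, 725.0, 731.0, 740.0, 744.0, 751.0, 760.0]
s_stat, z_stat, p_val = mann_kendall(annual_et)
has_trend = p_val < 0.05  # assumed significance level
print(s_stat, round(z_stat, 2), has_trend)
```

A basin flagged with `has_trend` would, following the text, be excluded from the comparison.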
Lastly, a cluster analysis was performed, using the method followed by Wartenburger et al. (2018), on the RS products and the MPM to investigate the overall level of similarity between the individual products in terms of spatial variability. The long-term average map for each product and the MPM were used, whereby the pairwise Euclidean distance between each dataset for each pixel was calculated and evaluated. Each of the maps used was resampled to 0.0096° × 0.0096° for computational efficiency.
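The distance step of the cluster analysis might look like the sketch below. It covers only the pairwise Euclidean distance between flattened long-term mean maps; the grouping procedure of Wartenburger et al. (2018) is not reproduced, and the product labels, pixel values and the handling of missing pixels (`None`) are assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean_distance(map_a, map_b):
    """Euclidean distance over pixels that are valid (not None) in both maps."""
    squared = [(a - b) ** 2 for a, b in zip(map_a, map_b)
               if a is not None and b is not None]
    return math.sqrt(sum(squared))


def pairwise_distances(maps):
    """Distance for every unordered pair of products, keyed by (name, name)."""
    return {(p, q): euclidean_distance(maps[p], maps[q])
            for p, q in combinations(sorted(maps), 2)}


# Hypothetical flattened long-term mean ET maps (mm/yr) on a common grid.
maps = {
    "A": [800.0, 600.0, 400.0, None],
    "B": [790.0, 610.0, 420.0, 300.0],
    "C": [500.0, 300.0, 100.0, 250.0],
}
dist = pairwise_distances(maps)
closest_pair = min(dist, key=dist.get)
print(closest_pair)
```

The resulting distances could then be passed to any hierarchical clustering routine to obtain groupings of similar products.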

Comparison using WB inferred ET estimates
In order to conduct comparisons of ET estimations, all RS derived products were projected to WGS 84, EPSG:4326 on a 0.0022° × 0.0022° grid, the highest spatial resolution of the products being analysed. The nearest neighbor interpolation method was used for any resampling required from coarse to high resolution. The estimations were then combined to give a single map of the long-term annual average ET across Africa. The time periods averaged for each product can be found in Table 1. These maps were then clipped for each of the basins being analysed and the basin mean long-term annual average ET was recorded. From these results the correlation, average difference and weighted average difference with the estimated WB ET using all three precipitation products were calculated. The ranking of the RS ET estimations for the correlation, average and weighted average difference was based on the mean performance against the three WB ET estimates derived using EWEMBI, CHIRPS and MSWEP precipitation data.
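The three comparison statistics can be sketched as follows. The correlation is taken to be Pearson's r, and the weights of the weighted average difference are assumed to be basin areas; neither choice is spelled out in the text, and all numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_difference(rs_et, wb_et):
    """Average of (RS ET - WB ET) across basins; positive means overestimation."""
    return sum(r - w for r, w in zip(rs_et, wb_et)) / len(rs_et)


def weighted_mean_difference(rs_et, wb_et, weights):
    """Difference averaged with per-basin weights (assumed here: basin area)."""
    total = sum(weights)
    return sum(k * (r - w) for r, w, k in zip(rs_et, wb_et, weights)) / total


# Hypothetical basin-mean long-term ET (mm/yr) and basin areas (km2).
rs_et = [700.0, 500.0, 300.0]
wb_et = [650.0, 520.0, 280.0]
area_km2 = [1000.0, 3000.0, 1000.0]

r = pearson_r(rs_et, wb_et)
bias = mean_difference(rs_et, wb_et)
weighted_bias = weighted_mean_difference(rs_et, wb_et, area_km2)
print(round(r, 3), round(bias, 2), weighted_bias)
```

Repeating this for each of the three WB ET estimates and averaging the scores would give the mean performance on which the ranking is based.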

Performance with characteristic land cover elements
Two types of land cover elements were evaluated in this study. A map of areas equipped for irrigation that are actually irrigated, produced by FAO and Rheinische Friedrich-Wilhelms-University (Siebert et al., 2013), and a map of water bodies obtained from the Global Reservoir and Dam (GRanD) database (Lehner et al., 2011) were used to evaluate how well the ET products identified spatial characteristics. Two steps were used. Firstly, the maps were evaluated visually using the same colour scale. Secondly, since for water bodies the ET should be more or less equal to the PET, the long-term annual average mean ET estimates across water bodies by the products were compared with the long-term annual average mean PET estimates across water bodies by calculating the difference between them. For irrigated areas, average crop coefficients (kc = ET/PET) for maize, wheat and sugarcane estimated by FAO were used as a reference. Thus, the long-term annual average mean ET estimates across irrigated areas were divided by the long-term annual average mean PET estimates across irrigated areas to find the average crop coefficient (kc) across irrigated areas. The differences between the reference kc from FAO and the kc estimated using RS ET estimates and PET derived using Hargreaves, P-M and P-T were then found. Ranking was based firstly on visual inspection of the distribution of irrigated areas and water bodies between the products and secondly on the smallest average difference of mean ET of irrigated areas and water bodies with the specified reference conditions.
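The two quantitative checks described above reduce to simple ratios and differences, sketched below. The reference crop coefficient and the ET and PET values are hypothetical placeholders, not the FAO values used in the study.

```python
def crop_coefficient(mean_et, mean_pet):
    """Average crop coefficient kc = ET / PET over the irrigated-area mask."""
    return mean_et / mean_pet


def kc_difference(mean_et, mean_pet, reference_kc):
    """Reference kc minus estimated kc; positive means the product is too low."""
    return reference_kc - crop_coefficient(mean_et, mean_pet)


def water_body_difference(mean_et, mean_pet):
    """ET minus PET over water bodies; ideally close to zero."""
    return mean_et - mean_pet


# Hypothetical long-term annual means in mm/yr.
irrigated_et, irrigated_pet = 900.0, 1500.0
reference_kc = 1.0  # placeholder, not an FAO value from the study

estimated_kc = crop_coefficient(irrigated_et, irrigated_pet)
kc_gap = kc_difference(irrigated_et, irrigated_pet, reference_kc)
lake_gap = water_body_difference(1400.0, 1550.0)
print(estimated_kc, kc_gap, lake_gap)
```

In the study the PET term would come from each of the Hargreaves, P-M and P-T datasets in turn, giving three differences per product.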

Evaluation using the Budyko curve
The Budyko equation partitions precipitation into streamflow and ET by describing the relationship between mean annual ET and long-term average water and energy balance at catchment scales (Sposito, 2017), as seen in Fig. 3. Budyko (1974) developed this approach for the physics of catchment ET by postulating on the phase transformation of green water to vapor and thus that ET reflects not only the partitioning of water but also radiant energy at the vadose zone and atmosphere interface (Sposito, 2017; Gerrits et al., 2009) following equation 2.
Since the Budyko curve provides a reference condition for the water balance, assuming it correctly describes the partitioning of P into Q, we can use this information to see how well our products perform in estimating ET. For each of the basins under study, we calculated ET/P and PET/P and plotted these against the Budyko curve. We derived long-term annual average basin mean PET estimates for Hargreaves, P-M and P-T approaches. We also used P from EWEMBI, CHIRPS and MSWEP separately to compare the results. The ranking of the RS products using the Budyko evaluation is based on the smallest difference with the Budyko ET estimations for the average ET across the basins for the three PET approaches, Hargreaves, P-M and P-T.
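Equation 2 is not reproduced in this text, so the sketch below assumes the commonly cited form of the Budyko (1974) curve, ET/P = sqrt(φ · tanh(1/φ) · (1 − exp(−φ))) with φ = PET/P. The function names and the basin values are hypothetical; the code only illustrates how a basin's ET/P can be compared with the curve and checked against the water limit (ET ≤ P) and the energy limit (ET ≤ PET).

```python
import math


def budyko_evaporative_index(aridity_index):
    """ET/P on the Budyko (1974) curve for a given aridity index PET/P.

    Assumed form: sqrt(phi * tanh(1/phi) * (1 - exp(-phi))).
    """
    phi = aridity_index
    return math.sqrt(phi * math.tanh(1.0 / phi) * (1.0 - math.exp(-phi)))


def exceeds_limits(et, p, pet):
    """Flag whether an ET estimate exceeds the water or energy limit."""
    return {"water_limit": et > p, "energy_limit": et > pet}


def budyko_difference(et, p, pet):
    """Product ET minus Budyko ET for one basin (same units as the inputs)."""
    return et - p * budyko_evaporative_index(pet / p)


# Hypothetical basin-mean long-term values in mm/yr.
p, pet, rs_et = 1000.0, 1000.0, 750.0
curve_value = budyko_evaporative_index(pet / p)
diff = budyko_difference(rs_et, p, pet)
flags = exceeds_limits(rs_et, p, pet)
print(round(curve_value, 4), round(diff, 1), flags)
```

Averaging the absolute differences over basins and over the three PET datasets would give one score per product for ranking.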

In this section we present the obtained results for the different methodology stages.

Preprocessing and data analyses
Figure 4 (left) shows the annual average ET estimates for the twenty seven basins with available discharge and precipitation data. The spread of the ET across the basins seems to be consistent with the climate, where basins in the semi-arid to arid northern and southern parts of Africa show lower ET than the more centrally located basins known to be more tropical.
For the MK test to be accurate a minimum of ten data points should be used. However, of the twenty seven basins being tested, eleven basins did not have sufficient data points for an accurate analysis, invalidating these results. Table 2 shows results from conducting a MK test for monotonic trends in the ET estimates inferred from the WB approach for the remaining twenty seven basins across Africa with available discharge data. ET estimates for twenty two of the basins show no trends, while three basins show trends: Cunene and Okavango show increasing trends and the Nile a decreasing trend. Two basins, Rufiji and Tana, did not have any overlapping precipitation and discharge data to calculate ET for analysis. For the basins with fewer than ten data points, the MK test was conducted on the collected precipitation and discharge data used to calculate ET. Of the eleven basins analysed, five basins, the Blue Nile, Lake Chad, Save, Tana and Void, show an increasing or decreasing trend in either the precipitation or discharge, as seen in Table 2. Thus, from the MK trend analyses conducted on the ET, P and Q estimates, seven basins showed a trend in at least one of the three variables and were eliminated from the study.

Two groupings or clusters are observed when looking at the similarity between individual products and the MPM (Fig. 5). We see one cluster formed with three products, CMRSET, SSEBop and WaPOR, with SSEBop and WaPOR being slightly more similar to each other than to CMRSET. A second cluster contains the remaining products and the MPM, in which the most similar products are WECANN with MPM and GLEAM with MTE.

Figure 6 shows the correlation of the long-term annual average basin mean ET estimates from the different RS ET products and the MPM with the WB inferred ET using the different precipitation products across the twenty basins. Table 3 shows the ranking of the RS products based on the mean WB ET derived using the three precipitation products for each of the calculated statistics. Although the correlation is relatively high for all the products, the higher ranked products are WaPOR, SSEBop and CMRSET, while GLEAM and ETMonitor are ranked the lowest.

Figure. Percentage difference between long-term mean WB inferred ET and RS derived ET across basins using three different precipitation products (EWEMBI (left), CHIRPS (middle) and MSWEP (right)).

Figure 9 shows a section of the Nile basin where large-scale irrigation occurs, from the Nile Delta in Egypt all the way down to the Gezira scheme in Sudan. This area was selected because differences between products in the spatial distribution of ET are easiest to see there, since ET is higher in these areas than in the surrounding areas. Most of the products, with the exception of GLEAM, are able to capture the spatial distribution of irrigation patterns in this area, with some products performing better than others. Even the coarser products, WECANN and MTE, slightly capture higher ET in these larger irrigation areas. As expected, the higher resolution products, WaPOR, CMRSET, SSEBop and ETMonitor, capture the spatial patterns of ET across these areas very well.
From visual inspection we ranked the performance of each of the products in capturing the spatial distribution of ET in irrigated areas as seen in Table 3. We see that WaPOR and CMRSET rank the highest while GLEAM and WECANN rank the lowest.

Performance with characteristic land cover elements
Figure 10 also compares the products at capturing spatial characteristics, this time using water bodies. We zoomed into Lake Victoria to clearly identify and visualise the spatial patterns of ET across a large lake. The majority of products do not estimate ET across water bodies. Only CMRSET, ETMonitor, SSEBop and WaPOR estimate ET across Lake Victoria. CMRSET and ETMonitor show higher ET across the lake than SSEBop and WaPOR, indicating better characterisation of ET across water bodies; these two products were therefore ranked higher than the other two. The ranking based on visual inspection and magnitude can be found in Table 3. For all products that did not estimate ET across water bodies the ranking was set to 9.

Figure 11 shows the difference between the reference crop coefficients of maize, wheat and sugarcane and the estimated long-term annual average mean crop coefficient across irrigated areas in Africa. It is clear that all products show underestimations in irrigated areas when compared with the reference crop coefficients. The shapefile used for defining the irrigated areas includes very small areas that are smaller than the highest resolution pixels of our products. Thus ET is calculated for some of these irrigated areas, but they are not accounted for as irrigated areas within our products, which may explain the underestimation. We see that the three products that have the smallest difference with the reference crop coefficients are consistently CMRSET, WaPOR and SSEBop.

Figure 12 shows the difference between the long-term annual average mean ET across water bodies from the RS ET estimates and PET estimates using the Hargreaves, P-M and P-T approaches. Only four products are presented, ETMonitor, CMRSET, WaPOR and SSEBop; the other products do not calculate ET across water bodies, although some water bodies are included in those products due to the resolution of the data. Therefore, only CMRSET, ETMonitor, SSEBop and WaPOR have estimations of ET across water bodies. All four products estimate ET across water bodies relatively well, with small differences from the PET estimates across water bodies. All products underestimate ET across water bodies except for CMRSET, which overestimates ET.

Table 3 shows the ranking of the different RS products based on the mean ET across irrigated areas and water bodies compared with PET estimates. We see that the highest ranked products for irrigated areas are CMRSET and SSEBop and the lowest are GLEAM and ETMonitor, while for water bodies the highest ranked products are ETMonitor and CMRSET.

Evaluation using the Budyko curve
The results of the Budyko analysis are shown in Figs. 13, 14 and 15, which show estimations of long-term annual average basin mean ET using the WB and RS products plotted against PET/P estimates calculated using EWEMBI, CHIRPS and MSWEP precipitation respectively. In each figure long-term annual average basin mean PET was calculated using Hargreaves, P-M and P-T approaches. We see that for all three precipitation products the different ET estimations across the basins follow the same trends with small differences in values. However, we see that the water limit and energy limit are exceeded by some ET models in Figs. 13, 14 and 15, that is, when using EWEMBI, CHIRPS or MSWEP precipitation respectively. Exceeding the energy limit implies water is being lost, for example through the groundwater system, and exceeding the water limit suggests there is an additional input of water beyond precipitation. SSEBop, WECANN and CMRSET exceed the water limit in more basins relative to other products; however, their ET estimations are not necessarily further from ET estimations using the Budyko equation.

Table 3 shows the ranking of the different RS products based on the Budyko evaluation. We see that the highest ranked products are WECANN and MTE and the lowest ranked products are GLEAM and MOD16.

Discussion
We make two assumptions in this paper regarding the methodology applied for evaluating RS derived ET estimates. The first assumption is that if no trends are present in long-term annual average mean WB inferred ET estimates across a basin, then long-term annual average mean ET estimates across basins can be compared across different time periods. This is true if long-term [...] investigating trends in long-term ET and do not come to a consensus as to the cause or direction of the trend (Miralles et al., 2014; Douville et al., 2013; Jung et al., 2010; Zhang et al., 2016).
For this first assumption to hold, we must also address the possibility that regardless of whether there are no trends present, the mean ET from one period may be different from another period due to precipitation variability. In this case we analysed four basins for which the calculated WB ET estimations covered the different periods of all RS ET products being evaluated.
For each of the different RS product periods and for each of the four basins the corresponding mean WB ET was found. This was then subtracted from the calculated WB ET long-term mean. From Table 4 we see that the percentage differences in mean for the different periods range from 0 to a maximum of 7.4 percent of basin ET. Thus, in this study our assumption holds that if no significant trend can be found in annual long-term ET estimates then different time periods can be used when overlapping data are lacking.
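The period check described above can be sketched as follows. The annual series and the product periods are hypothetical, and taking the absolute difference relative to the long-term mean is an assumption about how the percentage was expressed.

```python
def period_mean(annual_et, start_year, end_year):
    """Mean WB ET over the years of a product's coverage period (inclusive)."""
    values = [v for year, v in annual_et.items() if start_year <= year <= end_year]
    return sum(values) / len(values)


def percent_difference_from_long_term(annual_et, start_year, end_year):
    """Absolute difference between long-term and period mean, in percent."""
    long_term = sum(annual_et.values()) / len(annual_et)
    sub_period = period_mean(annual_et, start_year, end_year)
    return 100.0 * abs(long_term - sub_period) / long_term


# Hypothetical annual WB ET for one basin (mm/yr).
annual_et = {2000: 800.0, 2001: 820.0, 2002: 780.0, 2003: 840.0, 2004: 760.0}

# Hypothetical coverage periods of two products.
diff_a = percent_difference_from_long_term(annual_et, 2000, 2001)
diff_b = percent_difference_from_long_term(annual_et, 2003, 2004)
print(diff_a, diff_b)
```

Small percentages for every product period support comparing long-term means drawn from different years.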
The second assumption is that the water balance can be simplified to equation 1, where for annual long-term average estimates the change in storage is negligible. Many studies make this assumption for long-term averages and basin scale averages (Du et al., 2016; Taniguchi et al., 2003; Wang and Alimohammadi, 2012; Carter, 2001; Budyko, 1974). However, a recent study by Rodell et al. (2018) [...]

The comparison between the RS products was carried out at the highest spatial resolution of the different products, which is 0.0022° × 0.0022°. As we are resampling from coarse resolution to higher resolution, the nearest neighbor method employed for the resampling is sufficient, as the magnitude and spatial characteristics will not be altered or lost (Porwal and Katiyar, 2014; Gurjar and Padmanabhan, 2005). It must also be kept in mind that the initial spatial resolution and the temporal period under comparison are not the same for each product, and this may affect the ranking that we are considering. Also, many of the coarser resolution products do not estimate ET across water bodies, which may explain the large biases in certain products when comparing ET estimations with the WB ET estimations. WaPOR, ETMonitor and WECANN have less than 10 years in total coverage from which to calculate their long-term annual average.
We used the assumption that where there is ample water ET equals PET (McMahon et al., 2013) and applied this assumption for evaluating our ET products for irrigated areas and water bodies. The assumption seems to hold quite well for some of the products when evaluating water bodies. We also used the assumption that for irrigated areas ET/PET should equal the crop coefficient. However, all products seem to underestimate the long-term annual average ET over irrigated areas when compared with crop coefficients. One of the largest factors in this is that we take the entire irrigated area mean to calculate one crop coefficient, whereas we know there are many different crops being grown; accounting for many different crop coefficients is not feasible in this study. Also, areas much smaller than the highest resolution product are being taken into consideration in the irrigated areas, and thus ET in these areas is not being calculated as irrigated area pixels. This can affect the calculation of ET, which in most cases would be underestimated. Therefore the resolution of the product is also a factor in this underestimation.
Evaluation of the spatial characteristics is completed in two steps: the comparison of land cover elements with PET estimates, and visual interpretation. There are two issues involved in this spatial comparison. Firstly, the evaluation is based on products originating at different resolutions. One might therefore expect higher resolution products to outperform coarser resolution products, which is generally the case. However, coarser resolution products, namely WECANN and MTE, outperform the higher resolution product GLEAM in these spatial characteristics, so this is not always the case. The spatial resolution of the ET estimates may also be a critical element in determining which product is of use to the user. Secondly, the visual interpretation can be viewed as quite arbitrary and subjective to the evaluator's eye. This is indeed the case; however, by using land cover elements that are large and easy to visualize, such as irrigated areas and water bodies, the subjectivity can be reduced.
Looking at the overall level of similarity between the products in Fig. 5, we can see that all products in the cluster formed by CMRSET, SSEBop and WaPOR use MODIS as an input. SSEBop and WaPOR both use the P-M method for the calculation of ET, while CMRSET uses the P-T method. ETMonitor and MOD16 also use MODIS as an input, with MOD16 using the P-M method for ET calculation and ETMonitor using both the Shuttleworth-Wallace and the P-M method; however, both are found in the second cluster. The remaining products within the second cluster use different inputs and different ET estimation methods.
Thus, no patterns can be inferred through the cluster analysis by looking at the input or ET calculation method. What is clear is that the first cluster contains the products which overall rank the best in terms of ET estimation based on the proposed methodology.
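To make the idea of grouping products by similarity concrete, the following Python sketch clusters products using their basin-mean ET values. The clustering algorithm and distance measure are assumptions made here for illustration: it uses average-linkage agglomerative clustering with a distance of one minus the Pearson correlation, which is one common choice and not necessarily the one behind Fig. 5. The product labels A to D and all values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_products(series, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r."""
    names = list(series)
    dist = {(a, b): 1.0 - pearson(series[a], series[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical basin-mean ET (mm/yr) for four products over four basins.
basin_et = {
    "A": [500.0, 700.0, 900.0, 1100.0],
    "B": [520.0, 690.0, 910.0, 1080.0],
    "C": [900.0, 600.0, 800.0, 500.0],
    "D": [880.0, 620.0, 790.0, 520.0],
}
groups = cluster_products(basin_et, 2)
```

In this toy example products A and B end up in one group and C and D in the other, because each pair varies across basins in the same way. Whether such groups align with input data or ET method is then a separate question, as discussed above.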

The overall ranking for each product was based on the average ranking across the different comparative elements. An overall ranking was performed including the visual inspection of the land cover elements; it was also performed without the visual inspection, because this element is rather subjective and depends on the analyst. This does not affect the ranking of the top three or the lowest two ranked products, but changes the order of the products ranked in the middle. CMRSET, WaPOR and SSEBop are consistently ranked 1, 2 and 3, respectively. The lowest ranked products in both cases are GLEAM and MOD16. MPM is consistently ranked 5 in both cases. MTE and WECANN rank higher without visual inspection, moving from positions 6 to 4 and 6 to 5, respectively. ETMonitor's ranking position changes the most, dropping from position 4 to 7 without visual inspection.
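The averaging of ranks, with and without the visual inspection, can be sketched in Python as follows. The function name, the criteria labels, the product labels P1 to P4 and the rank values are hypothetical and do not reproduce the results of this study. Breaking ties alphabetically is an assumption made here.

```python
def overall_ranking(ranks, exclude=()):
    """Order products by their mean rank over the criteria not excluded.

    ranks maps product -> {criterion: rank}; ties are broken by name.
    """
    mean_rank = {}
    for product, per_criterion in ranks.items():
        used = [r for c, r in per_criterion.items() if c not in exclude]
        mean_rank[product] = sum(used) / len(used)
    return sorted(mean_rank, key=lambda p: (mean_rank[p], p))


# Hypothetical ranks per criterion (1 is best).
ranks = {
    "P1": {"bias": 1, "correlation": 1, "visual": 2},
    "P2": {"bias": 3, "correlation": 3, "visual": 1},
    "P3": {"bias": 2, "correlation": 2, "visual": 4},
    "P4": {"bias": 4, "correlation": 4, "visual": 3},
}

with_visual = overall_ranking(ranks)
without_visual = overall_ranking(ranks, exclude=("visual",))
```

In this toy example the first and last positions are the same in both orderings, while the two middle products swap places once the visual criterion is dropped, which mirrors the kind of sensitivity described above.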

Conclusions
This study focuses on the question of whether or not we can trust remote sensing ET products over Africa. By addressing the lack of data for validation and evaluation purposes, the proposed methodology can identify which products perform well in terms of biases, magnitudes and spatial characteristics. RS derived ET estimations were evaluated using observations of discharge and observation based precipitation products to infer long-term annual average mean ET estimates at the basin scale, and the lack of overlapping data for comparison was overcome by using different time periods for the calculation of our long-term annual averages. Based on the different elements analysed, CMRSET, WaPOR and SSEBop capture the magnitude of ET, showing small biases in the long-term annual average mean ET across basins. The same products, along with ETMonitor, also capture the spatial distribution of the ET patterns well. WECANN performs well in both the correlation and Budyko analysis. The high correlation statistics indicate a good spatial distribution of WECANN ET magnitudes, but the product seems to show bias in ET estimations. This contrasts with WECANN ranking high in the Budyko analysis, which indicates small differences from ET estimates using the Budyko curve. GLEAM and MOD16 are consistently ranked low in both spatial pattern analysis and in terms of ET magnitude estimation. Therefore, if we answer our question of whether to trust remote sensing estimates of ET across Africa, the answer is not black and white. Yes, in general we can trust some products, at least based on the products under evaluation in this study. CMRSET, WaPOR and SSEBop show low biases in estimations and a good spatial distribution of ET patterns. Each of these products has a relatively high resolution, and both CMRSET and SSEBop are global products. Depending on the study in question, other products can also be used; however, the bias in magnitudes needs to be kept in mind.
From this analysis at the African scale, there are better products to use than GLEAM and MOD16 which do not perform well in many of the evaluated criteria.
Author contributions. IW and AVG conceived and designed the alternate methodology for evaluation of large scale RS ET products. IW performed the required data analysis using scripts written by IW. IW and AVG prepared the structure of the manuscript. IW wrote the initial draft of the paper. AVG and WB supervised the research and contributed to improving the manuscript prior to submission. MM contributed to improving the manuscript prior to submission. LJ made available ETMonitor data that is not openly accessible.
Data availability. Data used in this analysis that is openly accessible can be obtained on request by emailing the first author.