Spatial evapotranspiration, rainfall and land use data in water accounting – Part 1: Review of the accuracy of the remote sensing data



Introduction
The demand for fresh water is increasing worldwide due to economic and population growth (Molden et al., 2007; Vörösmarty et al., 2010). Proper planning of such scarce water resources in terms of storage, allocation, return flow and environmental services is vital for optimizing the resource (Chartres and Varma, 2010). There is, however, a lack of fundamental data on vertical and lateral water flows, water stocks, water demand, and water depletion. At the same time, there is a decline in the network density of operational hydrometeorological field stations. The absence of adequate field data sets is an important obstacle for sound, evidence-based water resource management decisions. The consequence of data scarcity is more severe in transboundary river basins where, apart from collection, the accessibility of data is hindered by political issues (Awulachew et al., 2013).
Remotely sensed hydrological data are an attractive alternative to conventional ground data collection methods (Bastiaanssen et al., 2000; Engman and Gurney, 1991; Wagner et al., 2009; Neale and Cosh, 2012). Satellites measure the spatial distribution of hydrological variables indirectly with a high temporal frequency across vast river basins. There are many public data archives where every user can download preprocessed satellite data. Quality flags are often provided, as well as manuals with explanations on how the satellite data have been preprocessed and can be reproduced. These recurrent data sets are highly transparent, politically neutral and consistent across entire river basins, even for large basins such as the Nile and the Ganges. While certain satellite data sets have been processed to a first level of reflectance, emittance, and backscatter coefficients, others even provide second level products that can be directly explored for water resource planning purposes (e.g., land cover, soil moisture, and rainfall). Evapotranspiration (ET) is one of the parameters that often requires additional processing of the spectral data; only a very few public domain data archives provide preprocessed ET data and, in fact, spatial ET modeling is still underdeveloped. Several remotely sensed ET algorithms that could be applied to convert raw satellite data into spatial layers of ET are well summarized in a recent book edited by Irmak (2012).
Published by Copernicus Publications on behalf of the European Geosciences Union. P. Karimi and W. G. M. Bastiaanssen: Spatial evapotranspiration, rainfall and land use data – Part 1.
Time series of various hydrological variables such as precipitation, evapotranspiration, snow cover, soil moisture, water levels, and aquifer storage can be downloaded from public domain satellite-based data archives. With the right analytical tools and skills, these abundant data sets of hydrological processes can be used to produce information on water resource conditions in river basins. Tools such as Water Accounting Plus (WA+) (Karimi et al., 2013a, b) are expressly designed to exploit remote sensing estimates of hydrological variables. Water accounting is the process of communicating water related information about a geographical domain, such as a river basin or a country, to users such as policy makers, water authorities, basin managers, and public users. Water accounting information can be key to river basin management policy, especially when administrations are reluctant to share their (sometimes imperfect) in situ data with neighboring states and countries. WA+ can facilitate conflict management in internationally shared river basins. In addition, hydrological variables derived from remote sensing can also be used for spatially distributed hydrological modeling. Studies by Houser et al. (1998), Schuurmans et al. (2003), and Immerzeel and Droogers (2008) have, for instance, demonstrated that such inputs have improved hydrological model performance for river basins in Australia, the Netherlands and India, respectively.
A major point of criticism commonly leveled at remote sensing data has been the lack of accuracy. With the improvement of technology, the accuracy has improved significantly over the last 30 years; yet it is necessary to remain critical. It is important to note that the conventional methods of measuring hydrological processes (e.g., rainfall and discharge) are not flawless either and, thus, the accuracy of both types of measurements needs to be verified. There are also limitations to what conventional measurement methods can offer, especially where spatially distributed data are concerned. For instance, the actual ET of river basins can hardly be measured operationally through ground measurements; therefore, the depletion of water remains difficult to estimate and quantify. Thus, ET is often ignored in water accounting frameworks such as the SEEA-Water system proposed by the United Nations Statistics Division (UN, 2007) and the Australian water accounting system (ABS, 2004).
Remote sensing techniques, however, can provide spatially distributed daily estimates of actual ET and this opens new pathways in the accounting of water depletion (Karimi et al., 2013a).
This paper investigates the errors and reliability of remotely sensed ET, rainfall, and land use based on a comprehensive literature review. The choice of the variables investigated in this paper (ET, rainfall, land use/land cover) is based on their common use in hydrological and water resource management studies. Only recent publications on accumulated ET and rainfall for a minimum time period of one growing cycle have been consulted, which implies that some of the well-known reference papers are excluded because they relate to shorter flux observation periods. Older remote sensing algorithms were also excluded. The companion paper (Karimi et al., 2015) investigates impacts of the errors associated with the satellite measurement for ET, rainfall and land use on the accuracy of WA+ outputs, using a case study from the Awash Basin in Ethiopia. See Appendix D for a glossary of the abbreviations used throughout the paper.
Remote sensing data for water accounting (WA+)

Evapotranspiration
Over the past decades several methods and algorithms to estimate actual ET through satellite measurements have been developed. Most of these estimates are based on the surface energy balance equation. The surface energy balance describes the partitioning of natural radiation absorbed at Earth's surface into physical land surface processes. Evapotranspiration is one of these key processes of the energy balance, because latent heat (energy) is required for evaporation to take place. The energy balance at Earth's surface reads
$$R_n = G + H + LE$$
where $R_n$ is the net radiation, $G$ is the soil heat flux, $H$ is the sensible heat flux, and $LE$ is the latent heat flux. The sensible heat flux $H$ is a function of the temperature difference between the canopy surface and the lower part of the atmosphere, and the soil heat flux $G$ is a similar function related to the temperature difference between the land surface and the top soil. A rise of surface temperature will thus usually increase the $H$ and $G$ fluxes. Evaporative cooling will reduce $H$ and $G$, and result in a lower surface temperature.
The $LE$ is the equivalent energy amount (W m$^{-2}$) of the ET flux (kg m$^{-2}$ s$^{-1}$ or mm d$^{-1}$). The net radiation absorbed at the land surface is computed from shortwave and long-wave radiation exchanges. Solar radiation is shortwave and is the most important supplier of energy. More information on the energy balance is provided in general background material such as Brutsaert (1982), Campbell and Norman (1998) or Allen et al. (1998). Algorithms based on the surface energy balance include that of Rosema (1990), SEBAL (Bastiaanssen et al., 1998), TSEB (Norman et al., 1995), SEBS (Su, 2002; Jia et al., 2003), METRIC (Allen et al., 2007), ALEXI (Anderson et al., 1997), and ETWatch (Wu et al., 2012). The differences among these algorithms are often related to the parameterization of $H$, general model assumptions, and the amount of input data required to operate these models.
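The link between the energy balance terms and the ET flux can be illustrated with a short sketch. It solves the balance for LE as a residual and converts the result to a water depth. The constant latent heat of vaporization (about 2.45 MJ kg−1), the function names and the flux values are assumptions made for illustration; they are not taken from any of the algorithms cited above.

```python
# Latent heat of vaporization of water; about 2.45 MJ/kg near 20 degC.
# Treated as a constant here, which is an assumption of this sketch.
LATENT_HEAT_J_PER_KG = 2.45e6
SECONDS_PER_DAY = 86_400


def latent_heat_flux(rn, g, h):
    """Solve the surface energy balance Rn = G + H + LE for LE (W m-2)."""
    return rn - g - h


def le_to_et_mm_per_day(le):
    """Convert a daily-mean latent heat flux (W m-2) into ET (mm d-1).

    LE divided by the latent heat of vaporization gives a mass flux in
    kg m-2 s-1; 1 kg of water spread over 1 m2 is a layer 1 mm deep.
    """
    return le / LATENT_HEAT_J_PER_KG * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical 24 h mean fluxes in W m-2, for illustration only.
    le = latent_heat_flux(rn=150.0, g=10.0, h=40.0)
    print(f"LE = {le:.1f} W m-2, ET = {le_to_et_mm_per_day(le):.2f} mm d-1")
```

With these hypothetical inputs, a daily-mean LE of 100 W m−2 corresponds to roughly 3.5 mm d−1 of ET. Operational algorithms differ mainly in how they obtain H, which this sketch simply takes as given.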
Other groups of ET algorithms are based on the vegetation index and its derivatives such as published by Nemani and Running (1989), Guerschman et al. (2009), K. Zhang et al. (2010), Mu et al. (2011), and Miralles et al. (2011). ETLook (Bastiaanssen et al., 2012) is a new ET model that directly computes the surface energy balance using surface soil moisture estimations for the top soil (to feed soil evaporation) and subsoil moisture for the root zone (to feed vegetation transpiration). Soil moisture data can be inferred from thermal measurements (e.g., Scott et al., 2003) or from microwave measurements (e.g., Dunne et al., 2007). Microwave measurements provide a solution for all weather conditions and can be applied at any spatial scale for which moisture data are available.
A different school of remote-sensing-based ET algorithms is built around the derivation of a relative value of ET using trapezoid/triangle methods. Trapezoid/triangle diagrams are constructed from a population of pixel values of surface temperature and vegetation index and used to infer the relative value of ET (e.g., Choudhury, 1995; Moran et al., 1994; Roerink et al., 2000; Wang et al., 2007). In these diagrams, the range of surface temperature values at a given class of vegetation index is the basis for determining relative ET, assuming that the lowest temperature in a certain range of vegetation index represents potential ET. The highest temperature coincides with zero evaporation. The main assumption in triangle/trapezoid methods is that the variation in the relationship between vegetation index and surface temperature is driven primarily by the variation in soil water content rather than by differences in atmospheric conditions.
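The idea behind the trapezoid/triangle approach can be sketched as follows. Pixels are grouped into vegetation index classes, and within each class the coldest pixel is assigned a relative ET of 1 and the hottest a relative ET of 0. The linear scaling between those two end points, the equal-width binning and all names are assumptions of this sketch; the cited methods define their dry and wet edges in more elaborate ways.

```python
def relative_et(ts, t_cold, t_hot):
    """Relative ET (0 to 1) of one pixel, scaled linearly between the
    coldest (potential ET) and hottest (zero ET) temperature of its class."""
    if t_hot <= t_cold:
        raise ValueError("t_hot must exceed t_cold")
    fraction = (t_hot - ts) / (t_hot - t_cold)
    return min(1.0, max(0.0, fraction))


def relative_et_by_class(pixels, n_classes=5):
    """pixels: list of (vegetation_index, surface_temperature), VI in [0, 1].

    Returns one relative ET value per pixel, in input order. Pixels in a
    class without any temperature spread get None.
    """
    def class_of(vi):
        return min(int(vi * n_classes), n_classes - 1)

    # Find the coldest and hottest temperature within each VI class.
    bounds = {}
    for vi, ts in pixels:
        c = class_of(vi)
        lo, hi = bounds.get(c, (ts, ts))
        bounds[c] = (min(lo, ts), max(hi, ts))

    result = []
    for vi, ts in pixels:
        t_cold, t_hot = bounds[class_of(vi)]
        result.append(relative_et(ts, t_cold, t_hot) if t_hot > t_cold else None)
    return result


if __name__ == "__main__":
    # Hypothetical (VI, surface temperature in K) pairs.
    sample = [(0.10, 300.0), (0.15, 310.0), (0.12, 305.0), (0.80, 295.0)]
    print(relative_et_by_class(sample))
```

Multiplying the relative value by an independent estimate of potential ET would then give actual ET; that step is left out here.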
Merging different global ET products such as MOD16 (Mu et al., 2011) and ERA-Interim (Dee et al., 2011) at global and regional scales into one ET product is another approach that has been used by a group of scientists. This approach mainly uses statistical methods to combine ET products that are based on different methods, algorithms, and origins (e.g., global: Mueller et al., 2013; Africa: Trambauer et al., 2014; US: Velpuri et al., 2013). New ensemble ET products on the basis of several open access and global-scale operational ET products from Earth observations are under development, but are not published yet.
Review papers on advanced algorithms for estimating spatial layers of ET have been published by Moran and Jackson (1991), Kustas and Norman (1996), Bastiaanssen (1998), Courault et al. (2005), Glenn et al. (2007), Gowda et al. (2007), Kalma et al. (2008), Verstraeten et al. (2008), and Allen et al. (2011). While these review papers provide a good understanding of the evolution of ET algorithm development, they rarely report the accuracies attainable, especially at a seasonal or longer time frame.

Rainfall
There are different algorithms to infer rainfall from satellite data. The four essentially different technologies are (i) indexing the number and duration of clouds (Barrett, 1988), (ii) accumulated cold cloud temperatures (Dugdale and Milford, 1986), (iii) microwave emissivity (Kummerow et al., 1996), and (iv) radar reflectivity (Austin, 1987). Techniques using microwave wavelength information are promising alternatives for measuring rainfall because of the potential for sensing the raindrops themselves and not a surrogate of rain, such as the cloud type. Microwave radiation with wavelengths in the order of 1 mm to 5 cm has a strong interaction with raindrops, since the drop size of rain is comparable to this wavelength. This feature makes these wavelengths suitable for detecting rainfall intensity. Active microwave (radar) measurements of rainfall are based on the Rayleigh scattering caused by the interaction of rain and the radar signal (Cracknell and Hayes, 1991). Spaceborne radar measurements of rain intensity are possible with the precipitation radar (PR) aboard the NASA Tropical Rainfall Measuring Mission (TRMM) and Global Precipitation Measurement (GPM) satellites, which assess the attenuation of the radar signal caused by the rain. The PR has a pixel size of 5 km and covers a swath of 220 km. Unfortunately, it is usually necessary to evaluate the rainfall radar reflectivity factor empirically on a region-by-region basis over lengthy periods of time. In other words, rain radar systems, both ground-based and satellite-based, need calibration for proper rainfall estimates. We will conclude later that most papers investigated in our review process do apply a certain level of field calibration. Several operational rainfall products based on satellite measurements have been created or improved more recently. Among the new ensemble rainfall products is the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), which provides promising results (Funk et al., 2013).

Land use
Whereas land cover describes the physical properties of vegetation (e.g., grass, savannah, forest), land use denotes the usage of that land cover (e.g., pasture, crop farming, soccer field). Maps of land use are fundamental to WA+ because they determine the services and processes in a spatial context. Different types of land use provide benefits and services such as food production (agricultural land), economic production (industrial areas), power generation (reservoirs), environmental ecosystems (wetlands), livelihoods etc., and they have an associated water consumptive use. Land use classification based on the use of water differs from classical land use land cover maps that focus mainly on the description of woody vegetation such as forests and shrubs for ecological and woodland management purposes. WA+ needs land use maps focused on crop types (e.g., rainfed potatoes, irrigated maize) and the source of water consumed (e.g., surface water and groundwater). Some of the first maps dedicated to agricultural water management were prepared by Thenkabail et al. (2005), Cheema and Bastiaanssen (2010), Yalew et al. (2012) and Kiptala et al. (2013). Furthermore, land use classifications for WA+ at river basin scale require a pixel size of 30-100 m that can be delivered by Landsat-8 and Proba-V satellite data, respectively. It is expected that the arrival of Sentinel-2 data during the course of 2014, with pixel sizes ranging from 10 to 30 m and a short revisit time of 5 days, will greatly enhance the development of new land use classifications that are tailored for water use and water accounting.
Land use changes affect the water balance of river basins and thus also the amount of water flowing to downstream areas. Bosch and Hewlett (1982) and Van der Walt et al. (2004) discuss, for instance, how replacing natural vegetation by exotic forest plantations reduced the stream flow in South Africa. Maes et al. (2009) evaluated the effect of land use changes on ecosystem services and water quantity on basins in Belgium and Australia. The role of land use is thus a crucial component of sound water accounting and water resource management (Molden, 2007).
Land use is usually identified on the basis of spectral reflectance and its change with vegetation phenology. The reflectance in the near and middle infrared part of the electromagnetic spectrum, especially, is often related to certain land use classes. The relationship between reflectance and land use is, however, not unique, and field inspections are usually needed for better interpretation. Soil type, soil moisture and surface roughness all have an influence on reflectance. The health of the vegetation and factors such as the angle and size of leaves also affect the photosynthetic activity of the plants. There is another land use mapping technology that is entirely based on the difference in time profiles of spectral vegetation indices. Fourier analysis of vegetation index can be used to quantify land use classes and crop types (e.g., Roerink et al., 2003), especially when time profiles are linked to existing cropping calendars.
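As a rough illustration of the time-profile approach, the sketch below decomposes an evenly spaced vegetation index series into its mean and its first harmonics using a plain discrete Fourier transform. It does not reproduce the method of Roerink et al. (2003); the function name, the monthly sampling and the synthetic series are assumptions made for the example.

```python
import cmath
import math


def harmonics(series, n_harmonics=2):
    """Mean plus (amplitude, phase) of the first harmonics of a series.

    The series must be evenly spaced and cover one full cycle (e.g. 12
    monthly values); n_harmonics must stay below len(series) / 2.
    """
    n = len(series)
    if n_harmonics >= n / 2:
        raise ValueError("too many harmonics for this series length")
    mean = sum(series) / n
    components = []
    for k in range(1, n_harmonics + 1):
        # k-th coefficient of the discrete Fourier transform.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(series))
        amplitude = 2 * abs(coeff) / n
        phase = cmath.phase(coeff)  # radians; relates to timing of the peak
        components.append((amplitude, phase))
    return mean, components


if __name__ == "__main__":
    # Synthetic monthly vegetation index with a single annual cycle.
    vi = [0.4 + 0.2 * math.cos(2 * math.pi * t / 12) for t in range(12)]
    mean, comps = harmonics(vi)
    print(f"mean = {mean:.2f}")
    for k, (amp, ph) in enumerate(comps, start=1):
        print(f"harmonic {k}: amplitude = {amp:.3f}, phase = {ph:.2f} rad")
```

The amplitudes and phases can then serve as features for a classifier. A comparatively strong second harmonic may, for instance, hint at two growing cycles per year, which is where a cropping calendar helps with interpretation.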
All the land use classification papers we reviewed report on a confusion matrix that describes the overall classification accuracy by showing how often certain land use classes are confused in the remote sensing analysis with other land use classes. Congalton (1991) and Foody (2002) give a full explanation of errors in land use data.
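A confusion matrix and the accuracy measures derived from it can be sketched in a few lines. The matrix values below are invented, and the orientation (rows are reference classes, columns are mapped classes) is an assumption of this sketch; published matrices use either orientation.

```python
def overall_accuracy(matrix):
    """Share of all validation samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix):
    """Per reference class: correctly mapped samples / reference samples."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def users_accuracy(matrix):
    """Per mapped class: correctly mapped samples / samples mapped to it."""
    n = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [matrix[c][c] / column_totals[c] for c in range(n)]


if __name__ == "__main__":
    # Hypothetical 3-class example (e.g. cropland, forest, water).
    # Rows: reference (ground truth); columns: class on the map.
    cm = [
        [50, 5, 5],
        [4, 30, 6],
        [1, 5, 44],
    ]
    print(f"overall accuracy: {overall_accuracy(cm):.3f}")
    print("producer's accuracy:", [round(a, 3) for a in producers_accuracy(cm)])
    print("user's accuracy:", [round(a, 3) for a in users_accuracy(cm)])
```

The per-class values show why a single overall figure can hide large differences between individual land use classes.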

Accuracy of spatial evapotranspiration data
The lack of validation of spatial layers of ET is one of the drawbacks in defining the reliability of remotely sensed ET products. There are no reliable and low-cost ground-based ET flux measurement techniques, although new developments are under way (Euser et al., 2014). It is simply too costly to install instruments that have the capacity to measure ET operationally at various locations dispersed across a river basin. The main methods to measure ET at the field scale include lysimeters, Bowen ratio, eddy covariance systems, surface renewal systems, scintillometers, and classical soil water balancing. Lysimeters can be very accurate for in situ measurements of ET at small scale if they are properly maintained. Bowen ratio and eddy covariance flux towers and surface renewal systems are fairly accurate methods for estimating ET at scales of up to 1 km (Rana and Katerji, 2000), although not free of errors (e.g., Teixeira and Bastiaanssen, 2010; Twine et al., 2000). Scintillometers have the capability to measure fluxes across path lengths of 5-10 km (Hartogensis et al., 2010; Meijninger and de Bruin, 2000).
To deal with the problem of measuring ET fluxes in a composite terrain, large-scale field experiments in the African continent (e.g., Sahel: Goutorbe et al., 1997; southern Africa: Otter et al., 2002), the European continent (e.g., France: Andre et al., 1986; Spain: Bolle et al., 2006), the American continent (e.g., Kansas: Smith et al., 1992; Arizona and Oklahoma: Jackson et al., 1993) and the Asian continent (e.g., China: Wang et al., 1992; Korea: Moon et al., 2003) were set up to measure fluxes simultaneously within a certain geographic region at a number of sites with different land use classes. Several remotely sensed ET algorithms were developed and validated using these data sets. The limitation is, however, that the duration of these field campaigns was, for budgetary reasons, restricted to several weeks only.
Validation studies with different ET algorithms using the same spatial ground truth data sets are very interesting. The International Water Management Institute (IWMI), for instance, undertook a validation study to determine the accuracy of various ET methods for irrigated cotton and grapes in Turkey (Kite and Droogers, 2000), although the period was not sufficiently long to encompass one growing season. The Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia studied the predictions of eight different ET products, at a minimum monthly frequency and at a spatial resolution of at least 5 km, using flux tower observations and watershed data across the entire continent as part of the Water Information Research and Development Alliance (WIRADA) project (Glenn et al., 2011). The studied ET products were based on different methods including large-scale water balance modeling, thermal imagery (McVicar and Jupp, 1999, 2002), spectral imagery (Guerschman et al., 2009), inferred LAI (leaf area index; Y. Zhang et al., 2010), passive microwave (Bastiaanssen et al., 2012), and the global MODIS reflectance-based algorithm (Mu et al., 2007). The results showed that, at the annual scale, remote-sensing-based ET estimates (barring the global MODIS product, which was at the time an unrefined method that needed improvements; Mu et al., 2011) had an acceptable mean absolute percentage error (MAPE) ranging from 0.6 to 18 % with an average MAPE of 6 % (King et al., 2011). Along similar lines, the Council for Scientific and Industrial Research (CSIR) in South Africa conducted a remote sensing study on a smaller scale to investigate the performance of three ET algorithms (Jarmain et al., 2009).
To assess the overall error in accumulated ET products, a comprehensive literature review was conducted and the errors reported by various authors were synthesized. All the papers included in the review were published within the past 13 years (hence from the year 2000 onwards) and they cover a range of in situ measurements and remote sensing ET algorithms. The reviewed papers cover a range of remote sensing methods for ET measurements including SEBAL, METRIC, SEBS, TSEB, ALEXI, ET Watch, and SatDAET. In essence, the spatial ET layers reported in these papers were not a priori calibrated and the authors reported on the validation aspect. Since the primary purpose of this study was to quantify errors in accumulated ET, only papers that report errors on ET estimates over a minimum period of one growing cycle, which on average is about 5-6 months and is hereafter called seasonal ET, were consulted. Papers dealing with ET over shorter periods were thus excluded from our review (e.g., Anderson et al., 2011; Chávez et al., 2008; Gonzalez-Dugo et al., 2009; Mu et al., 2011). This also implies that GEWEX (Global Energy and Water Exchanges Project)-related field experiments could not be used because the intensive flux campaigns covered periods of weeks only. The manifold flux campaigns organized by the US Department of Agriculture (Kustas et al., 2006; JORNEX: Rango et al., 1998; SALSA: Chehbouni et al., 1999) also did not meet our criterion. To be able to compare error levels from different studies, only papers that report errors in terms of mean error were included in the review. Thus, some of the valuable papers on this topic that use RMSE (root mean square error) to describe errors without including mean error could not be included in the review (e.g., Batra et al., 2006; Cleugh et al., 2007; Guerschman et al., 2009; Venturini et al., 2008). The data sources consulted are summarized in Appendix A. They reflect the accumulated ET conditions encountered in 11 countries. Thirty-one publications met the criteria specified and were analyzed. One publication often contains several data points due to multiple models, multiple years, and multiple areas. Hence, the total number of points was n = 46. Considering this number, the probability density function is unlikely to change if other papers (or more papers) were to be considered in the review.
The probability distribution of mean absolute percentage error in remote sensing ET estimates is presented in Fig. 1. The results show that the absolute error of annual or seasonal ET varies between 1 and 20 %. The average MAPE is 5.4 %, with a standard deviation of 5.0 %. It is evident from Fig. 1 that the distribution is positively skewed. These results are closely in line with findings by King et al. (2011) in Australia, both in terms of average and the range of error in ET estimates.
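The summary statistics quoted here (mean, standard deviation and sign of the skewness) can be computed from a list of reported errors as sketched below. The error values are made up and are not the data points of Appendix A. The use of the sample standard deviation and of the moment-based skewness coefficient is an assumption, since the text does not state which estimators were applied.

```python
import statistics


def absolute_percentage_error(estimate, reference):
    """Absolute difference between estimate and reference, in percent."""
    return 100.0 * abs(estimate - reference) / abs(reference)


def summarize(errors):
    """Mean, sample standard deviation and moment-based skewness."""
    n = len(errors)
    mean = statistics.fmean(errors)
    std = statistics.stdev(errors)
    m2 = sum((e - mean) ** 2 for e in errors) / n  # second central moment
    m3 = sum((e - mean) ** 3 for e in errors) / n  # third central moment
    skewness = m3 / m2 ** 1.5
    return {"mean": mean, "std": std, "skewness": skewness}


if __name__ == "__main__":
    # Hypothetical (remote sensing ET, measured ET) pairs in mm per season.
    pairs = [(495.0, 500.0), (612.0, 600.0), (388.0, 400.0), (540.0, 450.0)]
    errors = [absolute_percentage_error(est, ref) for est, ref in pairs]
    stats = summarize(errors)
    print({name: round(value, 2) for name, value in stats.items()})
```

A positive skewness, as found for the ET errors, means that most data points sit at the low-error end with a tail of a few large errors.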
Many of the publications reported an error of less than 5 %, a remarkably good and unexpected result. In many cases, the authors of the papers were both the developers and the testers of the algorithms, and parameter tuning was possible. The left-hand bar in Fig. 1 is, we believe, a biased view of reality. For this reason, the data points were fitted by means of a skewed normal distribution so that less weight is given to the class with exceptionally low errors.
There are seven papers that report a mean absolute percentage error of 1 % for the ET of cropland. Without exception, all these papers are based on the Surface Energy Balance Algorithm for Land (SEBAL) and its related algorithm Mapping ET at High Resolution with Internalized Calibration (METRIC). Apparently, these algorithms work well for crops, which was recognized earlier by Bastiaanssen et al. (2009) and Allen et al. (2011). Another interesting observation is that at river basin scale (i.e., the scale where water accounting is done) all papers report a MAPE of less than 5 %. These case studies include the 3 % difference between the measured ET and remotely sensed ET of selected river basins in Sri Lanka (Bastiaanssen and Chandrapala, 2003), 1.7 % difference observed by Singh et al. (2011) for the Midwest in the USA using the METRIC algorithm, 1.8 and 3 % differences observed by Wu et al. (2012) using ET Watch in the Hai Basin of the North China Plain, 5 % difference observed by Bastiaanssen et al. (2002) for the Indus Basin, 1 % difference observed by Evans et al. (2009) for the Murray-Darling Basin, and 0.6, 2.1, 3.9, and 18 % differences for different algorithms observed by King et al. (2011) for the Australian continent.
At the other end of the spectrum, the largest ET deviations were found by Jiang et al. (2009) for alkali scrubs in south Florida. They used the SatDAET algorithm, which is an ET estimation method that uses the contextual relationship between remotely sensed surface temperature and vegetation index to calculate evaporative fraction (EF). They compared the estimated ET using SatDAET for both clear and cloudy days with ET from lysimeters and observed a 19 % difference for 1999.
There is no single preferred ET model. The selection of the algorithm depends on the application, the required spatial resolution, the period for which the ET fluxes should be estimated, the size of the study area, the land use classes present, etc. A useful distinction is between global-scale models (few) and local-scale models (many). Also, the level of validation and application of these models differs widely. Whereas certain models are tested with a single experimental flux site, other models have been applied in more than 30 countries.
Considering this positive evaluation, the use of spatial layers of ET should be encouraged in water accounting and hydrological modeling. Except for Jhorar et al. (2011), Winsemius et al. (2008) and Rientjes et al. (2013), this is rarely done because water managers and hydrologists do not accept ET layers as being sufficiently accurate. This new analysis proves that the science of remote sensing has advanced in the last 13 years and that mapping of ET has become more reliable.

Accuracy of spatial rainfall data
A comprehensive literature review, similar to that for ET, was conducted for remote sensing rainfall products. Twenty-four peer-reviewed papers that describe the accuracy of annual and seasonal rainfall from satellites, published over the last 5 years, were reviewed (see Appendix B). Sixty-eight data points were reconstructed from these publications. The selected papers used various remote sensing rainfall products including TRMM, PERSIANN, RFE, ERA40, CMORPH, and CMAP. A common problem is the scale mismatch between rain gauges and the area-integrated rainfall of one single microwave-based pixel of the satellite image.
Several of these papers compared different rainfall algorithms. Some also used the same field data to verify several rainfall algorithms. For example, Asadullah et al. (2008) compared five satellite-based rainfall estimates (SRFEs) with historical average rainfall data from gauges over the period 1960-1990 in Uganda. The difference between gauged data and SRFEs was found to vary between 2 and 19 %. Products such as CMORPH, TRMM 3B42, TAMSAT, and RFE underestimated rainfall by 2, 8, 12, and 19 %, respectively, while PERSIANN overestimated by 8 %. Stisen and Sandholt (2010) compared three global SRFE products, i.e., CMORPH, TRMM 3B42 and PERSIANN, and two SRFEs made for Africa, i.e., CPC-FEWS v2 and a locally calibrated product based on TAMSAT data, with the average gauge rainfall in the Senegal River basin. They concluded that rainfall estimation methods that are designed for Africa significantly outperform global products. This superior performance is attributed both to the inclusion of local rain gauge data and to the fact that they are made specifically for the atmospheric conditions encountered on the African continent. Of the global products, SRFEs from TRMM were found to be more accurate, presumably because monthly calibration of the 3B43 product is a default process of the algorithm. The global SRFEs showed an improved performance after bias correction and recalibration. The positive effect of the inclusion of rain gauge data in SRFEs is also reported in the study by Dinku et al. (2011), which compared five SRFEs with rain gauge data in the Blue Nile Basin. Several studies show that local calibration significantly improves the accuracy of satellite-based rainfall estimates: Almazroui et al. (2012) in Saudi Arabia, Cheema and Bastiaanssen (2012) in the Indus Basin, Duan and Bastiaanssen (2013) in the Lake Tana and Caspian Sea regions, and Hunink et al. (2014) in the high-elevation Tungurahua Province in the Andes mountain range of Ecuador.
The error probability distribution function curve reconstructed from the a priori calibrated rainfall data set is shown in Fig. 2. The mean absolute percentage error varies between 0 and 65 %, and the average MAPE for calibrated satellite rainfall estimates is 18.5 %. The standard deviation is 15.4 %, with a positive skewness of 0.9. As with the density function for ET, the curve fitting of the distribution was forced with a skewed normal distribution to ensure that less weight is assigned to the class of 0-10 % deviation. This indicates that for the majority of case studies, the error in calibrated rainfall maps is less than 18.5 %. Large error bands were found for all rainfall algorithms, and no particular algorithm performs better in terms of variance. The unresolved pixel-gauge scale mismatch is one major source of these errors. The average MAPE is 14, 17, 21, 23, 28, and 29 % for TRMM, ERA40, GPCP 1DD, CMORPH, RFE, and PERSIANN, respectively. These average values represent the average MAPE of each SRFE regardless of the product version.
The interim conclusions are therefore that (i) the processes to derive rainfall from satellite data are more complex than the derivation of ET and (ii) the performance of existing rainfall products is not satisfactory and requires caution when applied for water accounting and hydrological modeling, despite the fact that most SRFEs have an a priori calibration procedure. More research and development of operational rainfall algorithms using various types of sensors is deemed necessary.

Accuracy of land use land cover maps
The publications listed in Appendix C were reviewed for land use estimations. Sixty-five papers were reviewed. Seventy-eight data points were reconstructed from these papers. Rather diverging land use classes and data from 35 different countries were included in this comparative data set. The results are presented in Fig. 3. The shape of the probability density function of error differs from the ones obtained for ET and rainfall: it tends towards a standardized normal distribution, which implies that the numbers of very good results and very poor results are similar. Table 2 provides a summary of the statistical results. The mean absolute percentage error, defined as 1 minus overall accuracy, for land use classification is 14.6 %, with a standard deviation of 7.4 % and a skewness of 0.35.
The overall performance is rather good, and this can be partially explained by the fact that high-resolution satellites were often used for the land use and land cover classification. The spectral measurements of Landsat and Aster satellites were especially often applied because they have bands suitable for the detection of a range of land use classes in the near- and middle-infrared part of the spectrum. To investigate the impact of the spatial resolution of the imagery on the accuracy of the land use product, we divided the data points into two groups based on the reported resolution. The MAPE for land use classifications that are based on high-resolution images, 30 m and less, is 12.9 %, whereas for those that use moderate- and low-resolution images, more than 200 m, the MAPE is 19.8 %. The number of land use classes shows no significant impact on the overall accuracy of the map. The results reveal that the global-scale land cover maps have lower overall accuracy due to their large pixel size. The overall accuracy of global maps varies between 69 and 87 % with an average of 76.4 %, which is equivalent to a MAPE of 13-31 % and average of 23.4 %. This observation shows that global land cover maps should be used with caution in water accounting applications. The overall accuracy in the reviewed papers varies between 68 and 98 %. This is in good agreement with the range of 70-90 % suggested by Bach et al. (2006) in their review paper. The review also revealed that Landsat products, with 42 case studies out of the total 78, are the most commonly used imagery for land use land cover classification purposes. The free access Landsat-8 data may thus set the direction for the near-future development of land use classifications, especially when complemented with Sentinel data. The Finer Resolution Observation and Monitoring - Global Land Cover (Gong et al., 2013) is an example of that.
Many land use studies are based on ground truth data sets that are used for controlling or supervising the classification process. The data in Appendix C thus have an element of a priori calibration, which increases the overall accuracy. Without ground truthing, the overall accuracy can be expected to be lower. Also, it must be noted that only the overall accuracy of the confusion matrix is used. While the overall accuracy might be acceptable, it is likely that the error in certain individual land use classes is significantly different.
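The difference between overall and per-class accuracy can be illustrated with a small confusion matrix. The sketch below is only an illustration under stated assumptions: the class names and counts are hypothetical, rows are assumed to hold the reference (ground truth) classes and columns the mapped classes, and producer's and user's accuracy follow their usual definitions.

```python
def accuracy_from_confusion(matrix):
    """Return overall accuracy and per-class producer's and user's accuracy.

    matrix[i][j] is the number of samples of reference class i mapped as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # Producer's accuracy: correct samples divided by the reference (row) total.
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct samples divided by the mapped (column) total.
    users = [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return overall, producers, users

# Hypothetical classes and counts, for illustration only.
classes = ["irrigated crops", "rainfed crops", "forest"]
confusion = [
    [80, 15, 5],
    [20, 70, 10],
    [2, 3, 195],
]

overall, producers, users = accuracy_from_confusion(confusion)
print(f"overall accuracy = {100 * overall:.1f} %, MAPE = {100 * (1 - overall):.1f} %")
for name, p, u in zip(classes, producers, users):
    print(f"{name}: producer's = {100 * p:.1f} %, user's = {100 * u:.1f} %")
```

In this made-up case the overall error is below 14 %, yet 30 % of the reference samples of the rainfed class are misclassified, which is the kind of class-level deviation that the overall accuracy hides.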

Conclusions and way forward
An increasing number of satellite-based land and water use data sets are provided by generally accessible data archives, although evapotranspiration data sets are still under development. Satellites provide spatial information with a high temporal frequency over wide areas, which makes remotely sensed maps of land use and hydrological variables an attractive alternative to conventionally collected data sets. However, uncertainty about the possible errors in remote sensing estimates has been an ongoing concern among the users of these products. The goal of this study was to investigate, through an international literature review, the errors and reliability of some of these remotely sensed hydrological variables created by advanced algorithms. Only recent data sets, not older than 13 years, were reviewed.
The main interest of this review was to understand the measure of error in remote sensing data for water accounting. The review focused on ET, precipitation, and land use classifications. A comprehensive literature review was conducted, and for each variable a number of post-2000 peer-reviewed publications were consulted for reported differences between satellite-based estimates and conventional ground measurements. It is important to note that conventional ground measurements come with their own errors and uncertainty, which should ideally be taken into consideration when they are used for verifying the accuracy of satellite-based estimates. This holds true for ET, where the number of operational flux towers is limited, but also for rainfall, which has distinct microscale variability and cannot be measured by a single gauge. However, in most documented studies these ground measurements are treated as "the best available estimates" in the absence of reliable information on their accuracy. As such, they are widely used to validate satellite-based data. The probability distribution functions of the mean absolute percentage errors for all three variables were created, and these functions carry more information than any single research paper in which one algorithm is applied to one particular location.
The results show that the average MAPEs for satellite-based estimates of annual or seasonal ET, rainfall, and land use classification are 5.4, 18.5, and 14.6 %, respectively. The largest error is thus associated with rainfall. Bias correction and local calibration of global and regional rainfall products seem to improve the quality of the data layers. However, more research is needed to improve remotely sensed rainfall estimation algorithms (e.g., CHIRPS), with a focus on downscaling procedures, as the standard pixel size is often too large. Radar-based regional precipitation estimates that offer higher spatiotemporal resolutions are promising and need to be utilized further. Also, the attenuation of microwave signals between cellular communication networks can be used for assessing areal averaged rainfall. In addition, given the differences among the precipitation values reported by different global and regional products for the same pixels, there is a need for a database that offers an ensemble based on a rigorous and statistically sound method.
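As an illustration of how bias correction against gauges can reduce the error of a rainfall product, the sketch below applies a single multiplicative bias factor. This is an assumed, deliberately simple method chosen for illustration, not the procedure used in any of the reviewed papers, and all rainfall totals are hypothetical. The factor is derived from one set of gauges and evaluated on a separate set, so that the improvement is not an artifact of fitting and testing on the same data.

```python
def mape(estimates, observations):
    """Mean absolute percentage error (%) of estimates against observations."""
    ratios = [abs(e - o) / o for e, o in zip(estimates, observations)]
    return 100.0 * sum(ratios) / len(ratios)

def bias_factor(estimates, observations):
    """Ratio of total observed to total estimated rainfall."""
    return sum(observations) / sum(estimates)

# Hypothetical annual rainfall totals (mm) at calibration gauges.
gauge_cal = [800.0, 1200.0, 600.0]
sat_cal = [640.0, 960.0, 480.0]

# Hypothetical annual rainfall totals (mm) at independent validation gauges.
gauge_val = [1000.0, 500.0]
sat_val = [820.0, 390.0]

# Derive the factor on the calibration gauges, apply it to the validation pixels.
factor = bias_factor(sat_cal, gauge_cal)
corrected_val = [factor * s for s in sat_val]

print(f"bias factor = {factor:.2f}")
print(f"MAPE before correction = {mape(sat_val, gauge_val):.1f} %")
print(f"MAPE after correction  = {mape(corrected_val, gauge_val):.1f} %")
```

A single basin-wide factor removes only the systematic part of the error; spatially varying or seasonal biases would require a more elaborate calibration.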
In contrast to rainfall, the error in satellite-based ET is relatively small, especially at the aggregation level of a river basin. ET is a vital component of the hydrological cycle, and reliable estimates of local ET are essential for modeling river basin hydrology accurately. Remotely sensed ET can be used both as input to distributed hydrological models and as a means to calibrate the simulations, although large errors can occur locally. Nonetheless, despite its potential and accuracy, satellite-based ET is underutilized in hydrological studies. Contributing factors are presumably the difficulty of accessing and acquiring reliable ET data through the public domain, and the difficulty of comparing it with reliable field data. Thus, future focus should be on the development of open access ET databases. Such efforts are now underway by various organizations such as the US Geological Survey, the US Department of Agriculture, the Commonwealth Scientific and Industrial Research Organisation of Australia, and the Chinese Academy of Sciences. However, these products have not yet been made fully available to the public, albeit first estimates of an ensemble ET product are under development. There is also a need for ET data with higher spatial and temporal resolution. This is a key factor if satellite-based ET data are to be used extensively in water management and hydrological studies.
Land use mapping was one of the earliest ways in which satellite imagery was used to produce environmental information, and it is the most widely studied subject employing remote sensing. The quality of the classifications has improved over time through the availability of high-resolution images and local research projects. The low-resolution, operational land classification mapping product is, however, still the standard. Global high-resolution land use and land cover databases are conceived as the next generation of information systems for WA+ and other applications, and the product created by Tsinghua University is a first example. The land use classifications come with an overall MAPE of 14.6 %, i.e., an overall accuracy of about 85 %. This level of accuracy, although acceptable, calls for improvements given the wide use of these maps. Another important issue is the need for a new type of land use mapping dedicated to agricultural and river basin water management issues. This is of essential value when land use maps are used in hydrological and water-management-related studies such as water accounting.
As revealed by the results of this review study, there is a great deal of heterogeneity regarding the accuracy and reliability of remote sensing data and methods. Oftentimes the reliability of remote-sensing-based products is rather case and location specific. Future research could, therefore, aim at cross-comparing remote sensing data and methods on ET, rainfall and land use for different regions. Ensemble mean ET products are currently under development.

Figure 1. Probability density function of the reported mean absolute percentage error in remotely sensed ET estimates. A season or longer period was considered.

Figure 2. Probability density function of the reported mean absolute percentage error in rainfall estimates from remote sensing. A season or longer period is considered.

Figure 3. Probability density function of the reported mean absolute percentage error in land use classification using remote sensing.

Surface temperature is measured routinely by spaceborne radiometers such as the Advanced Very High Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), Visible Infrared Imaging Radiometer Suite (VIIRS), Landsat, Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), China-Brazil Earth Resources Satellite (CBERS), and the Chinese HJ and Feng Yun satellites. Remotely sensed surface temperature is the major input variable in ET algorithms. Examples of thermal infrared ET algorithms are provided by EARS
Hydrol. Earth Syst. Sci., 19, 507-532, 2015, www.hydrol-earth-syst-sci.net/19/507/2015/

Table 1. Overview of the main existing regional and global-scale satellite-based data sources of rainfall. The column "gauge" indicates whether a calibration against ground data is included.

Table 2. Mean deviation of the input variables and the distribution of the error.

Table A1. Selected ET validation papers that describe experimental data sets covering a season or longer.

Appendix B: Literature review on rainfall
Table B1. Selected validation papers that describe experimental data sets covering a season or longer.

Appendix C: Literature review on land use and land cover
Table C1. Selected validation papers that report on confusion matrices.