Benchmark products for land evapotranspiration: LandFlux-EVAL multi-dataset synthesis

Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland; LERMA, Observatoire de Paris, Paris, France; LSCE, UMR CEA-CNRS, Gif-sur-Yvette, France; George Mason University, Fairfax, Virginia, USA; VU University Amsterdam, The Netherlands; Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA; Max Planck Institute for Biogeochemistry, Jena, Germany; Wageningen University, Wageningen, The Netherlands; School of Geographical Sciences, University of Bristol, UK; Water Desalination and Reuse Center, King Abdullah University of Science and Technology, Saudi Arabia


Introduction
In recent years, several global multi-year evapotranspiration (ET) data sets based on in situ observations or satellite retrievals of different indirect variables have been derived. In Mueller et al. (2011b), an evaluation of their characteristics and agreement within the LandFlux-EVAL initiative (see www.iac.ethz.ch/url/LandFlux-EVAL) over the time period 1989-1995 was presented, while the study of Jimenez et al. (2011) assessed a subset of these data sets over a shorter (3 yr) period but also assessed the radiative and sensible fluxes. These studies considered dedicated data sets that derive ET from combinations of observations or observations-based estimates together with targeted algorithms (referred to as diagnostic data sets), ET from land surface model (LSM) simulations driven with observations-based forcing, as well as ET from atmospheric reanalyses (i.e. computed with LSMs within a global model assimilating mostly atmospheric observations). The main geographical structures related to the principal climatic regimes are present in all products, but relatively large differences in the absolute values among some of the products were observed (Mueller et al., 2011b; Jimenez et al., 2011).
Even though a large number of ET data sets are currently available and have been analyzed in these studies, a global benchmark for ET is missing. Such a benchmark data set would be useful for several purposes. Land-surface modellers and hydrologists often use ET to validate their model output, because it is one of the main components of the land water and energy budgets as well as a key driver of droughts (e.g. Sheffield et al., 2012; Seneviratne, 2012). Furthermore, agricultural and water-management communities use information on ET to estimate the water needed for irrigation. Besides hydrological applications, changes in ET are also relevant for temperature variability and hot extremes (e.g. Seneviratne et al., 2006, 2010; Hirschi et al., 2011; Mueller and Seneviratne, 2012). Apart from mean ET values, corresponding uncertainty estimates are necessary for all kinds of applications. For these reasons, benchmark synthesis products of ET derived from existing data sets have been developed in the present study, together with different estimates of uncertainty.
The previous studies by Mueller et al. (2011b) and Jimenez et al. (2011) focused on spatial patterns of multi-year means and seasonal variations, respectively. However, the behavior of the LandFlux-EVAL data sets with respect to ET trends or multi-annual variations has not yet been investigated. Knowledge of the temporal changes of ET is important since it is a major component of the global water cycle. As the variability of water vapour is negligible at the global scale, the rates of precipitation and evaporation are similar. Within a changing climate, changes in the hydrological cycle are expected, but very difficult to determine. Observations indicate that precipitation over land increased by about 2.4 mm per decade from 1900 to 1988 (Dai et al., 1997, excluding North Africa in their analysis). Extending the analysis to the entire 20th century indicates a similarly large trend (reduced by about 25 %; New et al., 2001). Increases of land evaporation over these periods are therefore expected. While some publications relate this behavior to a possible intensification of the hydrological cycle, this term is not well defined. Wild et al. (2008) explained the acceleration of the hydrological cycle since the mid-1980s with an increase in solar heating (global brightening) and in thermal heating (enhanced greenhouse gas forcing). While evaporation from ocean surfaces is likely to increase with increasing temperature (as warmer air can hold more water vapour), it is unclear whether ET from land surfaces could similarly increase, due to possible limitations imposed by soil moisture content and vegetation physiology.
Due to a lack of relevant observations, trends of land ET could not be assessed until recently. The studies by Wang et al. (2010b) and Jung et al. (2010) are the first to investigate this issue, over the relatively short time spans from 1982 to 2002 and from 1982 to 2008, respectively. Wang et al. (2010b) found an increase in global land ET of 0.7 mm yr⁻², using 1120 globally distributed stations (Wang et al., 2010a). Jung et al. (2010) performed a trend analysis based on a global data set empirically derived from in situ measurements of ET from the FLUXNET project together with satellite remote sensing and surface meteorological data (Jung et al., 2009, hereafter referred to as the MPIBGC data set), but also including eight other data sets. A tendency of increasing ET was found for the years 1982 to 1997, which indicates a possible intensification of the hydrological cycle. However, this trend was found to vanish after 1998. The decline in the global land ET trend after 1998 was attributed to a decrease in moisture availability in Southern Hemisphere supply-limited (i.e. water-limited) evaporative regimes, which might indicate that a limit to the temperature-driven acceleration of the hydrological cycle was reached during the 1998-2008 time period. Nonetheless, the article also mentioned that whether this tendency was related to a long-term trend or only to decadal variability could not be assessed given the short time period considered (see also Douville et al., 2012). Another study based on satellite retrievals also found that the increasing trend in global land ET disappeared after 2000 (Yao et al., 2012). However, it is important to note that uncertainties in the forcing data sets used to derive such ET trends are large and may entail spurious features linked to the use of reanalysis products assimilating non-homogeneous satellite products or to variations in the density of stations considered in gridded precipitation products (e.g. Bengtsson et al., 2004; Seneviratne et al., 2004; Lorenz and Kunstmann, 2012; Sheffield et al., 2012).
Besides precipitation and temperature, radiation is an important forcing for ET. Studies on trends in radiative forcing do not agree well. Wild and Liepert (2010), for example, noted a brightening since the 1990s, while Romanou et al. (2007) found a slight dimming. An evaluation of radiation data sets can be found in, e.g., Troy and Wood (2009) or Jimenez et al. (2011).
The benchmark synthesis products presented in this study are used to assess the interannual variations of ET on the global scale and encompass the largest number of ET products to date. Besides evaluating the temporal variability of the benchmark products and the underlying single data sets, the present study also compares these to precipitation, which is one of the most important drivers of ET, especially in soil-moisture-limited regions (see, e.g., Teuling et al., 2009; Seneviratne et al., 2010).

Merged benchmark synthesis products of evapotranspiration
We present here new multi-year merged synthesis products based on the analyses of existing land ET data sets that were available to us at the time of writing and fully cover one or both of the two studied time periods. A first product spans the time period 1989-1995 and includes 40 data sets, while the second is available for the longer time period 1989-2005 and includes 14 data sets (Table 1, all data set categories). Consistent with a previous analysis (Mueller et al., 2011b), the data sets included can be classified as diagnostic data sets, LSMs and reanalyses (see Sect. 1). Besides the two merged synthesis products based on all types of data sets, merged synthesis products from each of the individual data set types are also produced (see Table 1). The output statistics for each of the merged synthesis products are the mean, median, 25th-percentile, 75th-percentile, interquartile range, standard deviation, and minimum and maximum values of the ensemble of underlying data sets. All products are available in monthly and yearly temporal resolution, and as multi-year statistics. All merged synthesis products are made available through the internet (www.iac.ethz.ch/url/LandFlux-EVAL).

Overview of included data sets
An overview of the diagnostic data sets, LSMs and reanalyses considered for the preparation of the merged synthesis products is provided in Table 2. All data sets are available for the time period 1989-1995, while a subset is available over the period 1989-2005 (cross in 5th column in Table 2) and forms the basis of the merged synthesis products over this longer time period (see also Sect. 2.1). The table also lists information on the single data sets, such as the ET schemes, the number of soil layers used in the case of LSMs, the precipitation forcing data sets and other forcing variables used for the derivation of the respective data sets or, in the case of reanalyses, the land-surface schemes.
We considered here several additional data sets compared to the earlier analysis of Mueller et al. (2011b). The additional data sets included in this study are the diagnostic data set GLEAM (Global Land-surface Evaporation: The Amsterdam Methodology, Miralles et al., 2011a), as well as LSM estimates from the Water Model Intercomparison Project WaterMIP (Haddeland et al., 2011). Simulations from the Global Land Data Assimilation System I (GLDAS-I, Rodell et al., 2004) were included in Mueller et al. (2011b) but are excluded in the present study because of spurious trends (see Supplement Fig. A1 and Rui, 2011), which arose because the source of forcing data changed several times over the GLDAS-I time period (Matt Rodell, personal communication). However, we included GLDAS-II simulations (see Rui, 2011) from one of these models (NOAH Version 3.3), which were produced recently with a consistent forcing data set (Princeton forcing, see Sheffield et al., 2006).
In GLEAM, the calculation of ET is based on the Priestley-Taylor equation and the Gash analytical model of forest rainfall interception (Miralles et al., 2011b). The model discriminates the different evapotranspiration components, i.e. interception, bare soil evaporation, transpiration and sublimation, and ET is coupled to soil moisture (Miralles et al., 2011a). Note that not all diagnostic estimates separately calculate these components or account for all of them, which leads to large differences, especially in the Amazon region. The forcing data for GLEAM were all obtained from remote sensing products and a synthesis of rain gauges (CPC, see Appendix A).
All WaterMIP simulations are driven with the same forcing data set (WATCH forcing, see Weedon et al., 2011), but the employed forcing variables and time steps differ. For a list of these variables as well as references for each model, see Haddeland et al. (2011). The differences between the WaterMIP models are large. Some models, for example, solve both the water and the energy balances at the land surface and are classified as (classical) LSMs, while others solve the water balance only and are classified as global hydrological models, GHMs (following the classification proposed by Haddeland et al., 2011; note that for simplicity, we refer to both as LSMs in most of the present article). Further, the WaterMIP models vary substantially in their complexity in the representation of ET (e.g. including or excluding interception and transpiration), runoff, groundwater, snow or frozen soil (for more details, see Haddeland et al., 2011). For more information on all other data sets, the reader is referred to Mueller et al. (2011b) and Jimenez et al. (2011).
Table 2. Overview of ET data sets, including their ET scheme or land-surface schemes (LSS), along with the number of soil layers, precipitation forcing data set and atmospheric forcing variables. Model names with a star are classified as global hydrological models (GHMs, see text). Forcing variables are P: precipitation; T: air temperature; W: wind speed; Q: specific humidity; R: radiation; SP: surface pressure. "na" denotes either not applicable or information currently not available. Note that GS-VISA and GS-CLMTOP cannot strictly be classified as aerodynamic approaches, since they include a carbon cycle and photosynthetic control on transpiration. Models with an x are included in the 1989-2005 merged synthesis product. Note that PT-JPL was referred to as UCB in Mueller et al. (2011b).

Processing of ET data sets and merged synthesis products
In order to prepare the merged synthesis products, we first interpolated all data sets onto a common global grid of 1 degree longitude and latitude and aggregated daily values to monthly values where necessary. A spatial matching of the data sets was done, and if a grid point was covered by less than 70 % of the data sets, it was excluded from the final synthesis product (for the number of data sets originally available, see Supplement Fig. A2). Antarctica is missing in the synthesis product. Some of the data sets exhibit unrealistically large values (especially in the northern latitudes due to the viewing angle of satellites). For the merged synthesis products, we applied a physical constraint to exclude such values. An upper limit to the latent heat flux is given by the energy balance, i.e. ET should not exceed net surface radiation at a scale as large as our grid cells and for monthly values. For each grid point of the merged synthesis products, we calculated long-term monthly maxima of net radiation (from the Surface Radiation Budget (SRB) version 3) based on all available years. Monthly ET values exceeding the long-term monthly maximum net radiation of that month by more than 25 % were excluded, unless ET was smaller than 0.35 mm d⁻¹ (128 mm yr⁻¹), since for such small values, ground heat flux cannot be neglected (Bennett et al., 2008). Note, however, that if interception plays an important role, such as during winter time, ET can be several times larger than radiation due to additional energy input through advection. A further possible constraint might be derived from the assumption that ET should not exceed precipitation over a longer time period. However, we did not apply such a constraint because soil moisture depletion might play a role in some regions and because, at small scales (such as single pixels), atmospheric water fluxes or run-on could provide additional water input for ET. In order to exclude single data set values that were very different from those of the other data sets, we performed a statistical outlier detection after the application of the physical constraint, similar to that described in Weedon (2011), but applied on monthly values.
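The radiation-based constraint described above can be sketched as follows. This is only an illustration under stated assumptions, not the processing code used for the synthesis products: the function name, the array layout, the assumption that the time series starts in January, and the constant latent heat of vaporization used to convert net radiation from W m⁻² to mm d⁻¹ are all hypothetical choices.

```python
import numpy as np

LATENT_HEAT = 2.45e6       # J kg-1, assumed constant latent heat of vaporization
SECONDS_PER_DAY = 86400.0


def apply_radiation_constraint(et, rn_max, tolerance=0.25, small_et=0.35):
    """Mask monthly ET values that exceed the radiation limit.

    et      : monthly ET in mm d-1, shape (n_months, n_lat, n_lon),
              assumed to start in January
    rn_max  : long-term monthly maximum net radiation in W m-2,
              shape (12, n_lat, n_lon)
    Returns a copy of et with excluded values set to NaN.
    """
    et = np.asarray(et, dtype=float)
    # 1 W m-2 of net radiation evaporates roughly 0.035 mm of water per day
    rn_mm = np.asarray(rn_max, dtype=float) * SECONDS_PER_DAY / LATENT_HEAT
    # Pick the calendar-month maximum that belongs to every time step
    month_index = np.arange(et.shape[0]) % 12
    limit = (1.0 + tolerance) * rn_mm[month_index]
    # Exclude values above the limit, but keep very small ET values,
    # for which ground heat flux cannot be neglected
    exceeds = (et > limit) & (et >= small_et)
    return np.where(exceeds, np.nan, et)
```

The 25 % tolerance and the 0.35 mm d⁻¹ exemption are taken from the text; everything else is a design choice made for the sketch.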
A movie in the Supplement shows the number of data sets at each grid point and time step after all these steps. Finally, the mean, median, 25th-percentile, 75th-percentile, interquartile range, standard deviation, and minimum and maximum statistics of the ensemble of underlying data sets are derived and provided as monthly, yearly and multi-year statistics.
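A minimal sketch of how such ensemble statistics could be computed with NumPy is given below. It assumes (hypothetically) that all data sets are already stacked on the common grid along the first axis, with NaN marking missing or excluded values. For compactness, the 70 % coverage criterion is applied here at the same time as the statistics, whereas in the processing chain described above it is applied during the spatial matching.

```python
import warnings

import numpy as np


def ensemble_statistics(stack, min_coverage=0.7):
    """Ensemble statistics across data sets.

    stack : array of shape (n_datasets, ...), NaN where a data set
            has no (valid) value
    Returns a dict of arrays with the shape of one ensemble member.
    """
    stack = np.asarray(stack, dtype=float)
    n_valid = np.sum(~np.isnan(stack), axis=0)
    # Grid points covered by less than 70 % of the data sets are excluded
    covered = n_valid >= min_coverage * stack.shape[0]
    masked = np.where(covered, stack, np.nan)
    with warnings.catch_warnings():
        # All-NaN grid points are expected and simply yield NaN
        warnings.simplefilter("ignore", RuntimeWarning)
        q25, q50, q75 = np.nanpercentile(masked, [25, 50, 75], axis=0)
        return {
            "mean": np.nanmean(masked, axis=0),
            "median": q50,
            "q25": q25,
            "q75": q75,
            "iqr": q75 - q25,
            "std": np.nanstd(masked, axis=0),
            "min": np.nanmin(masked, axis=0),
            "max": np.nanmax(masked, axis=0),
        }
```

Note that `np.nanstd` uses the population standard deviation by default; whether the published products use this or the sample estimate is not stated in the text.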

Merged synthesis products
The different merged synthesis products created from single categories only (diagnostic data sets, LSMs and reanalyses) and from all categories (see Table 1) agree to a large extent in their global land mean ET (Fig. 1), with highest values in the merged product based on reanalyses only (563 mm yr⁻¹) and lowest in that based on LSMs only (423 mm yr⁻¹). The interquartile ranges (IQRs, 75th-percentile minus 25th-percentile) are largest in the merged products based on diagnostic data sets and reanalyses. The variation of global mean ET for the 1989-2005 (long) as well as for the 1989-1995 (short) merged product created from all data set categories is shown in Fig. 2 (median of yearly values). The long merged product shows slightly higher values. The largest difference in the list of data sets included in the short and long merged synthesis products is the inclusion of 28 LSMs (short) versus only 5 LSMs (long). WaterMIP and GSWP simulations are not available for the long version; because of their consistently low ET values (see Mueller et al., 2011b), they are the main reason for the lower ET in the short product. The small difference in the temporal variations between the short and the long merged synthesis products is an indication that including a large number of dependent data sets (i.e. model simulations driven with the same forcing data, such as GSWP and WaterMIP runs) does not have a strong influence. Global mean ET shows a slight increase between 1989 and 1997 followed by a decrease until 2005 (Fig. 2). The merged synthesis product (long) shows an interannual variation nearly identical to that found in the MPIBGC data set in Jung et al. (2010). However, if we consider this variation in relation to the IQRs or the standard deviations, both shown in Fig. 2, the absolute ET trend change is very small and the interannual variations nearly vanish.
The reason for the large IQRs and standard deviations is the large differences in the absolute ET values of the single data sets. The IQRs and standard deviations based on the yearly anomalies of the underlying data sets (i.e. setting the mean of all data sets to zero before calculating the statistics), which is the quantity shown in Jung et al. (2010), are much smaller (this can partly be seen also in Fig. 3). Note also that we consider more estimates than in the previous analyses from Jung et al. (2010).
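The difference between the spread of absolute values and the spread of anomalies can be illustrated with a small sketch. The function name and the synthetic numbers in the usage note are hypothetical; the sketch only assumes that an anomaly is obtained by subtracting each data set's own temporal mean.

```python
import numpy as np


def spread_absolute_vs_anomaly(series):
    """Compare the ensemble IQR of absolute values and of anomalies.

    series : array of shape (n_datasets, n_years) with yearly ET values
    Returns (iqr_absolute, iqr_anomaly), each of shape (n_years,).
    """
    series = np.asarray(series, dtype=float)
    # Remove each data set's own temporal mean (sets all means to zero)
    anomalies = series - series.mean(axis=1, keepdims=True)

    def iqr(values):
        q25, q75 = np.percentile(values, [25, 75], axis=0)
        return q75 - q25

    return iqr(series), iqr(anomalies)
```

For three synthetic data sets that share the same year-to-year variation but have mean levels of 400, 500 and 600 mm yr⁻¹, the IQR of the absolute values is 100 mm yr⁻¹ in every year, while the IQR of the anomalies is zero.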
The ET anomalies from all long merged synthesis products are shown in Fig. 3 (top left). The comparison reveals a very similar temporal evolution of ET in all four merged synthesis products. Therefore, in the remainder of this study, only the merged products based on all data set categories (long and short) will be analyzed.

Single data sets
The temporal variations of the 14 single data sets contributing to the long merged synthesis products are shown in Fig. 3 (top right and bottom panels). In these analyses of single data sets, we excluded unrealistically high ET values, setting a threshold of 4560 mm yr⁻¹ (12.5 mm d⁻¹). The LSMs (bottom left) and reanalyses (bottom right) are more consistent amongst one another in their yearly variations than the diagnostic data sets (top right). The ET time series of all LSMs and reanalyses peak between 1997 and 1999. Some of the diagnostic data sets peak in other years, such as 2001 in the case of PRUNI and 2000 in GLEAM and AWB. The trends for the two time periods 1989-1997 and 1998-2005 are listed in Table 3. The merged product as well as 5 single data sets display a significant negative trend (italic font) for [...] The reason for this could be that we calculate the trends over a shorter time period compared to Jung et al. (2010), who calculated them over 1982-1997 and 1998-2008.
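The caption of Table 4 names the Theil-Sen estimator for the trend slopes and the Mann-Kendall test for their significance. The sketch below shows one plain implementation of both, to make the procedure concrete. It is not the code used for Tables 3 and 4: the function names are hypothetical, the Mann-Kendall variance omits the correction for tied values, and the normal approximation is only rough for series as short as eight years.

```python
import math
from itertools import combinations

import numpy as np


def theil_sen_slope(years, values):
    """Median of the slopes between all pairs of points."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i, j in combinations(range(len(years)), 2)
        if years[j] != years[i]
    ]
    return float(np.median(slopes))


def mann_kendall_p_value(values):
    """Two-sided p-value of the Mann-Kendall trend test.

    Uses the normal approximation without tie correction (assumption).
    """
    n = len(values)
    # S counts increasing minus decreasing pairs
    s = sum(
        np.sign(values[j] - values[i])
        for i, j in combinations(range(n), 2)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Applied to a yearly series of global mean ET, `theil_sen_slope` returns the trend in mm yr⁻², and a trend would be flagged as significant when the p-value falls below the chosen level (for instance 0.05, which is an assumption here, as the level is not stated in this section).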

Analyses of climate regions
We analyze here the two merged products (i.e. short and long, based on all data set categories) as well as precipitation data (average of CRU, GPCC, GPCP and CPC; for references and information on these data sets, see Appendix A and Biemans et al., 2009) in climate regions using the classification of Koeppen-Geiger (data available from http://koeppen-geiger.vu-wien.ac.at). In order to facilitate the interpretation of the results, subregions have been merged into larger regions. The regions considered are displayed in Fig. 4.
Mean ET and precipitation are listed for the various climate regions in Table 4. Also included are the ET and precipitation trends (Theil-Sen estimator) from 1998-2005, i.e. for the period for which a decline in ET trend was found in Jung et al. (2010 [...], 2009) can be explained by their reliance on reanalysis products, which were found here to display a tendency for high ET values. Note, however, that values from different studies cannot be compared directly due to differences in land areas. The mean value of precipitation (average of CRU, GPCC, GPCP and CPC) amounts to 756 mm yr⁻¹. The difference between global precipitation and land ET corresponds to the water that leaves the continents as runoff and amounts to 263 mm yr⁻¹ (34 406 km³ yr⁻¹). This value is in good agreement with values from other studies summarized in Syed et al. (2009).
The largest contribution to the global ET trend over 1998-2005, which amounts to −18.9 km³ yr⁻² (−0.14 mm yr⁻²), stems from the equatorial winter dry (Aw), arid desert (BW) and arid steppe (BS) climate regions, even though the latter two are characterized by very low per-area values of ET. The study of Jung et al. (2010) showed that the decline in the ET trend is mainly due to Southern Hemisphere dry regions. We therefore treated the Northern and Southern Hemisphere parts of these regions (BW and BS) separately. Indeed, we find that even though they belong to the same climate regions, the Southern Hemisphere parts of the arid steppe (BS) and arid desert (BW) regions exhibit a large negative trend, while the Northern Hemisphere parts show very small (and positive) trends.
The signs of the trends in precipitation agree with the signs of the ET trends, except for the polar climate region (E). The opposite trends in the northern and southern hemispheric parts of the BS and BW regions can also be found in the precipitation data sets. Furthermore, the table shows that global ET has decreased much more strongly than global precipitation over the period 1998-2005.
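The paired values in mm and km³ quoted in this section can be cross-checked with a short calculation. The land area used below is not stated in the text; it is inferred here from the runoff figures (263 mm yr⁻¹ and 34 406 km³ yr⁻¹) and should be read as an assumption of this sketch, as should the function names.

```python
def depth_mm_to_volume_km3(depth_mm, area_km2):
    """Convert a water depth in mm over an area in km2 to a volume in km3."""
    return depth_mm * 1e-6 * area_km2


def volume_km3_to_depth_mm(volume_km3, area_km2):
    """Convert a water volume in km3 over an area in km2 to a depth in mm."""
    return volume_km3 / area_km2 * 1e6


# Runoff as quoted in the text, in both units
runoff_mm = 263.0
runoff_km3 = 34406.0

# Land area implied by the two runoff figures (inferred, not from the text)
land_area_km2 = runoff_km3 / (runoff_mm * 1e-6)

# Global ET trend over 1998-2005, converted from km3 yr-2 to mm yr-2
trend_mm = volume_km3_to_depth_mm(-18.9, land_area_km2)

print(f"implied land area: {land_area_km2 / 1e6:.1f} million km2")
print(f"ET trend: {trend_mm:.2f} mm yr-2")
```

With these inputs the implied land area is about 130.8 million km², and the trend of −18.9 km³ yr⁻² converts to about −0.14 mm yr⁻², consistent with the value given in the text.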

Precipitation forcing
The 1989-1995 global mean land ET of each data set contributing to the synthesis product (short) is plotted against precipitation in Fig. 5. The precipitation value was taken from the forcing data of the respective ET data set as listed in Table 2. If precipitation was not available (for some diagnostic data sets), the average of four currently available observational data sets (CRU, GPCP, GPCC and CPC) was taken. Global mean values of these four precipitation data sets range from 730 to 803 mm yr⁻¹. The data set median of the merged synthesis ET product is indicated with a solid line, and the IQRs with dash-dotted lines. The single data sets are indicated with different symbols (groups) and colors (ET schemes).
Table 4. Mean ET of merged synthesis products 1989-2005 (long) and 1989-1995 (short), mean precipitation 1989-2005 (average of CRU, GPCC, GPCP and CPC) and ET and precipitation trends 1998-2005 in climate regions. Slope of trends (Theil-Sen) and significance (italic font, Mann-Kendall) estimated as for Table 3.
We first compare simulations from the GSWP and the WaterMIP projects, which are each based on common forcing data sets (filled circles and stars/rhombi, respectively). The spread within the GSWP and WaterMIP simulations is similar, both globally and in most climate regions (see Supplement Fig. A3). However, the spread in the WaterMIP ensemble is smaller in some dry regions (Cs, Dw and Df), and larger in wetter regions (all equatorial regions). Looking at the WaterMIP GHMs and LSMs separately, we find that the GHMs (stars) are not clearly distinct from the LSMs (rhombi), which supports the finding of Haddeland et al. (2011) that this classification does not fully account for differences among the WaterMIP models.
In order to compare the influence of uncertainties in precipitation forcing to that of model structure, sensitivity simulations using the same model (here, the COLA model) and different precipitation forcing have been performed in the framework of GSWP (Schlosser and Gao, 2010) and are included in Supplement Fig. A3. Evapotranspiration from simulations with differing precipitation (GSWP sens, noted with empty circles) shows a smaller range than that from GSWP simulations with different models using the same forcing (filled circles), which has also been shown in Schlosser and Gao (2010). However, note that global mean ET from these sensitivity simulations is relatively low, indicating dry conditions in the COLA model, even if a forcing with high precipitation was employed. This possibly points to a dry bias of the model independent of the applied precipitation forcing, which could be the reason for the separation of this GSWP model in the cluster analyses reported in Mueller et al. (2011b).
The merged synthesis product based on all data sets exhibits an ET value of 550 mm yr⁻¹. Note that the global mean values in the analyses for Fig. 5 are higher than the ones given in Table 4. The reason is that for the analyses of single data sets, we only included those pixels of the merged product that were also available in all other data sets. Table 4, on the other hand, includes all land pixels (see Supplement Movie for data coverage).
The largest exceedance of precipitation over ET, on average, is found in the wettest climate regimes (Af, Am, Aw, Cw, Cf and Df), as expected. In several dry regions, especially the arid desert (BW) and arid steppe (BS) regions, some data sets reveal an ET exceedance over precipitation (see bisecting line through origin in Supplement Fig. A3). The reasons could be that (1) ET is too high, (2) precipitation is too low, or (3) both ET and precipitation are correct, but the net depletion of soil water storage is larger than the volume of runoff generated over the period 1989-1995.
A comparison of the range between the lowest and highest values in precipitation and ET shows that the uncertainties in precipitation are larger than those of the ET data sets. This is not only the case for the global mean values, but also for single climate regions (Supplement Fig. A3). Large uncertainties in precipitation data sets have also recently been highlighted in Lorenz and Kunstmann (2012). A likely reason for the smaller spread in ET could be that ET estimates are constrained not only by the water, but also by the energy balance. This indicates that the uncertainty range in ET estimates will be difficult to reduce as long as the uncertainties in precipitation and radiation are not reduced. Jimenez et al. (2011), e.g., showed that the spread in net radiation data sets is nearly as large as the one in ET.
Fig. 5. Scatter plot of ET (in mm yr⁻¹) from each data set that is included in the short merged product (1989-1995) versus precipitation from the corresponding forcing data set. If no precipitation data is used for the derivation of ET, the average of CRU, GPCP, CPC and GPCC has been used instead (see Table 2). The merged synthesis product's median is indicated with a full line, the IQR with dash-dotted lines. The precipitation value indicated is the average of all data sets. Colors indicate the ET schemes: Penman-Monteith, Priestley-Taylor, aerodynamic, other.

Conclusions
The intensity of the hydrological cycle determines the water availability and influences the climate system in various ways. Despite these important implications of possible changes in the hydrological cycle and, with that, in ET, a global benchmark ET data set has long been missing. In the framework of the LandFlux-EVAL initiative (www.iac.ethz.ch/url/LandFlux-EVAL), several ET data sets based on observations (diagnostic data sets, LSMs and reanalyses) have been evaluated in previous studies (Mueller et al., 2011b; Jimenez et al., 2011), focusing on multi-annual means and seasonal cycles. The present study further investigates ET data sets. Global merged benchmark synthesis products of ET are derived and trends are analyzed in single LandFlux-EVAL data sets as well as in the merged ET products.
The benchmark synthesis products provide monthly, yearly and multi-year ensemble statistics for the time periods 1989-1995 (short) and 1989-2005 (long), respectively. For the creation of the short benchmark products, 7 diagnostic data sets, 29 LSMs and 4 reanalyses are considered; for the long products, 5 diagnostic data sets, 5 LSMs and 4 reanalyses. In order to address several demands on benchmark data sets, we created short and long merged synthesis products based on all data sets as well as based on each category. Monthly radiation is used as a physical constraint on maximum ET, and a statistical outlier detection is applied on the monthly ET estimates. The synthesis products include different statistics of the multi-data set ensemble, such as median, mean, 25th-percentile, 75th-percentile, interquartile range, standard deviation, and minimum and maximum values.
Evapotranspiration from the merged benchmark synthesis products shows realistic interannual variations that correspond to those found in a previous study based on a smaller number of ET data sets (Jung et al., 2010). The negative trend in global land ET over 1998-2005 amounts to −18.9 km³ yr⁻² (−0.14 mm yr⁻²). Most of this trend is attributed to the equatorial winter dry, arid desert and arid steppe regions. The latter two regions are characterized by low per-area ET and precipitation, but cover very large areas of the globe. Dividing these arid desert and steppe regions into Northern and Southern Hemisphere fractions, we find that the negative trend arises from the southern parts only, which is consistent with the results of Jung et al. (2010). However, it is important to note that the signal is very small compared to the overall global land ET as well as compared to the uncertainty of absolute ET values (interquartile range or standard deviations of the merged synthesis products). In addition, it is still unclear whether this signal corresponds to a long-term trend or decadal variability. Finally, because of the reliance of all ET data sets on atmospheric input data sets, the influence of spurious trends in these data sets cannot be excluded.
Large uncertainties in absolute values of ET are found, which can partly be related to uncertainties in precipitation. Precipitation is one of the main drivers of ET in water-limited evaporation regimes and, more generally, in forests where interception can be large. As a consequence, it is one of the main forcing variables for ET in most diagnostic data sets and LSMs. The spread in ET data sets is nevertheless smaller than the spread in the corresponding precipitation data sets in our global analyses as well as in most climatic regions, which indicates that ET, as expected, is constrained not only by precipitation but also by other variables such as radiation. In general, the absolute values of precipitation are higher than those of ET, as expected, globally and in wet climate regions. Global mean ET in the merged synthesis product amounts to 493 mm yr⁻¹ (1.35 mm d⁻¹), while precipitation amounts to 756 mm yr⁻¹ (2.07 mm d⁻¹; average of four observations-based data sets). The difference of 263 mm yr⁻¹ (34 406 km³ yr⁻¹), which corresponds to runoff, is in agreement with estimates from previous studies (an overview can be found in Syed et al., 2009). In dry regions, ET exceeds precipitation in several data sets. The merged synthesis product's (median) ET, however, is always lower than average precipitation. Another important factor for the estimation of ET is the land cover type, especially in vegetated areas where transpiration is high. Even though the differences in land cover among the individual data sets may account for a large part of the uncertainties in the final merged synthesis products, including this information lies beyond the scope of our study.
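The figures quoted above can be cross-checked with simple unit arithmetic. The sketch below is illustrative only: the year length of 365.25 days is an assumption, and the land area is not stated in this section but is inferred here from the paper's own pair of numbers (263 mm yr⁻¹ and 34 406 km³ yr⁻¹).

```python
DAYS_PER_YEAR = 365.25  # assumption: mean length of a year in days
MM_TO_KM = 1e-6         # 1 mm = 1e-6 km


def mm_per_year_to_mm_per_day(depth_mm_per_year):
    """Convert an annual water depth to a mean daily depth."""
    return depth_mm_per_year / DAYS_PER_YEAR


def depth_to_volume(depth_mm, area_km2):
    """Water depth (mm) over an area (km^2) expressed as a volume (km^3)."""
    return depth_mm * MM_TO_KM * area_km2


def volume_to_depth(volume_km3, area_km2):
    """Water volume (km^3) spread over an area (km^2) expressed as a depth (mm)."""
    return volume_km3 / (area_km2 * MM_TO_KM)


et_mm, precip_mm = 493.0, 756.0      # global means quoted in the text (mm yr^-1)
residual_mm = precip_mm - et_mm      # precipitation minus ET, interpreted as runoff

# Land area implied by the two runoff figures in the text (an inference, not a stated value).
implied_land_area_km2 = 34406.0 / (residual_mm * MM_TO_KM)

print(round(mm_per_year_to_mm_per_day(et_mm), 2))      # ET in mm d^-1
print(round(mm_per_year_to_mm_per_day(precip_mm), 2))  # precipitation in mm d^-1
print(residual_mm, round(implied_land_area_km2 / 1e6, 1))
# The volumetric trend of -18.9 km^3 yr^-2 expressed as a depth over the same area:
print(round(volume_to_depth(-18.9, implied_land_area_km2), 2))
```

Under these assumptions the implied land area is about 131 million km², and the trend of −18.9 km³ yr⁻² converts to roughly −0.14 mm yr⁻², so the volumetric and depth figures in the text are mutually consistent to within rounding.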
In summary, we have presented here the first benchmark synthesis products for monthly, global land ET estimates. The reproduction of a negative trend in global ET during 1998-2005 with these benchmark synthesis products supports previous findings of a declining global ET trend over that period. However, caution is necessary when analyzing trends, because the considered time period is very short for trend analyses, the analyzed ET data sets are not fully independent of each other (e.g. same forcing data, similar methodologies), and agreement between them is not necessarily an indicator of their validity. Furthermore, spurious trends can be introduced through changes in the observing systems for the forcing variables of ET (e.g. precipitation, radiation). In order to gain more confidence in ET estimates, improvements in model parameterizations are necessary, as is a reduction of the uncertainties in precipitation and radiation data in order to better constrain ET.

Appendix A Precipitation data sets
The observation-based precipitation data sets are from the Climatic Research Unit (CRU) at the University of East Anglia, the Global Precipitation Climatology Centre (GPCC), the Global Precipitation Climatology Project (GPCP) and the unified gauge-based analysis of global daily precipitation from the Climate Prediction Center (CPC) of the National Oceanic and Atmospheric Administration (NOAA; Chen et al., 2008). These data sets were chosen for this investigation because (a) they are mainly based on observations, (b) they cover the period 1989-2005, and (c) they are forcing data sets employed for the diagnostic ET data sets used in this study.
The CRU precipitation data are based on rain gauge data, whose number varies over time between around 5000 and nearly 15 000 stations. The CRU TS3.1 data set covers the period 1901-2009. It has not been corrected for gauge biases, which vary with gauge type and can result in inhomogeneities in the records (New et al., 2000).
The NOAA CPC unified precipitation data set is created from quality-controlled daily precipitation gauge data using the optimal interpolation objective analysis technique (Chen et al., 2008). The retrospective version, covering 1979 to 2005, includes data from more than 30 000 gauge stations.
The GPCC monitoring product for the period 1986 to present is based on quality-controlled data from 7000 stations, which are interpolated into monthly area averages.This product delivers the in situ component for the satellite (microwave and infrared)-gauge combination GPCP (Huffman et al., 1995;Adler et al., 2003).The GPCP product includes gauge-bias corrections, but due to the limited length of satellite records, inhomogeneities arise (Adler et al., 2003).

Fig. 1. Global mean ET of merged synthesis products based on all data sets, only the diagnostic data sets, only LSMs and only reanalyses. The medians and interquartile ranges of the multi-year values for the short (1989-1995) and long (1989-2005) merged products are shown.

Fig. 3. Anomaly time series (1989-2005) of the four merged synthesis benchmark products (top left) and the individual diagnostic data sets (top right), LSMs (bottom left) and reanalyses (bottom right) that contribute to the long merged synthesis product.

Fig. A1. Time series 1989-2005 of LSMs. In addition to the LSMs that contribute to the long merged synthesis product, GLDAS-I simulations from the models CLM, MOSAIC and NOAH are shown.
Fig. A3. Scatter plot for all different climate regions of ET (in mm yr⁻¹) from each data set that is included in the short merged products (1989-1995) as well as the GSWP sensitivity runs versus precipitation from the corresponding forcing data set. If no precipitation data is used for the derivation of ET, the average of CRU, GPCP, CPC and GPCC has been used instead (see Table 2). The merged synthesis product's median is indicated with a full line, the IQR with dash-dotted lines. For abbreviations of climate regions, see Table 4.

Table 1. Number and type of data sets included in the eight different merged synthesis products.
[Table fragment; column headers for climate regions: Af, Am, Aw, As, BW, BS, Cs, Cw, Cf, Cs, Dw, Df, E]

Table 3. [...] two time periods 1989-1997 and 1998-2005 of the merged (all) product and the single data sets. The slopes are estimated with the Theil-Sen estimator, which is robust against outliers. Significant values (non-parametric Mann-Kendall two-sided test at 90 % level) are printed in italic font.
[...] Wang and Dickinson (2012) 438 to 548 and Dirmeyer et al. (2006) a mean of 497 mm yr⁻¹ for different time periods. The larger values from Trenberth et al. ([...]