Accurately estimating large-scale evapotranspiration (ET) rates is essential to understanding and predicting global change. Evapotranspiration models
that are applied at a continental scale typically operate on relatively large spatial grids, with the result that the heterogeneity in land surface
properties and processes at smaller spatial scales cannot be explicitly represented. Averaging over this spatial heterogeneity may lead to biased
estimates of energy and water fluxes. Here we estimate how averaging over spatial heterogeneity in precipitation (

Estimates of evapotranspiration (ET) fluxes have significant implications for future temperature predictions. Smaller ET fluxes imply greater sensible
heat fluxes and, therefore, drier and warmer conditions in the context of climate change (Seneviratne et al., 2010). Surface evaporative fluxes (and
thus energy partitioning over land surfaces) are nonlinear functions of available water and energy and thus are coupled to spatially heterogeneous
surface characteristics (e.g., soil type, vegetation, and topography) and meteorological inputs (e.g., radiative flux, wind, and precipitation; Kalma
et al., 2008; Shahraeeni and Or, 2010; Holland et al., 2013). These characteristics are spatially variable on length scales of

Several studies have quantified the effects of land surface heterogeneity on potential evapotranspiration (PET) and latent heat (LH) fluxes and have
found that averaging over land surface heterogeneity can potentially bias ET estimates either positively or negatively. For example, Boone and Wetzel
(1998) studied the effects of soil texture variability within each pixel in the Parameterization for Land–Atmosphere–Cloud Exchange (PLACE) model, which has a spatial
resolution of approximately 100

Heterogeneity biases have also been identified in ET calculation algorithms that use remote sensing data as inputs. McCabe and Wood (2006) found that
remote sensing retrievals of ET are larger than the corresponding in situ flux estimates and characterized the roles of land surface heterogeneity and
remote sensing resolution in the retrieval of evaporative flux. McCabe and Wood (2006) used Landsat (60

In contrast to this overestimation bias, many remotely sensed ET estimates that include parameters related to aerodynamic resistance are significantly
affected by heterogeneity and underestimate ET as the scale increases (Ershadi et al., 2013). Because aerodynamic resistance is significantly
affected by land surface properties (e.g., vegetation height, roughness length, and displacement height), decreases in aerodynamic resistance at
coarser resolutions could lead to smaller estimates of evapotranspiration. Ershadi et al. (2013) showed that input aggregation from 120 to
960

Rouholahnejad Freund and Kirchner (2017) quantified the impact of subgrid heterogeneity on grid-average ET using a simple Budyko curve (Turc, 1954;
Mezentsev, 1955) in which long-term average ET is a nonlinear function of long-term averages of precipitation (

Heterogeneity bias in a hypothetical two-column model in the Budyko framework. The true average ET of the columns (gray circle) lies below the curve and is less than the average ET estimated from the average

The recognition that spatial averaging can potentially lead to biased flux estimates has prompted methods for representing subgrid-scale heterogeneities and processes within large-scale land surface models and ESMs. Accounting for land surface heterogeneity in large-scale ESMs is constrained not only by limitations in computational power (Baker et al., 2017) and the availability of high-resolution forcing data but also by the fact that the atmospheric and land surface components of some ESMs operate at different resolutions. There have been several attempts to integrate subgrid heterogeneity in ESMs while keeping the computational costs affordable. In “mosaic” approaches, the model is run separately for each surface type in a grid cell and then the surface-specific fluxes are area-weighted to calculate the grid cell average fluxes (e.g., Avissar and Pielke, 1989; Koster and Suarez, 1992). The “effective parameter” approach (e.g., Wood and Mason, 1991; Mahrt et al., 1992), by contrast, seeks to estimate effective parameter values at the grid cell scale that subsume the effects of subgrid heterogeneity. Estimating these effective parameters can be challenging because the relevant land surface processes typically depend nonlinearly on multiple interacting parameters, and land surface signals at different scales are propagated and diffused differently in the atmosphere. Alternatively, the “correction factor” approach (e.g., Maayar and Chen, 2006) uses subgrid information on spatially heterogeneous land surface processes and properties to estimate multiplicative correction factors for fluxes that are originally calculated from spatially averaged inputs at the grid cell scale. All three approaches try to reduce the heterogeneous problem to a homogeneous one that has equivalent effects on the atmosphere at the grid cell scale.
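The difference between evaluating a flux from grid-averaged inputs and area-weighting tile-level fluxes (the "mosaic" idea) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the flux in each tile follows a Turc–Mezentsev-type Budyko curve with n = 2; the function name, tile values, and area fractions are hypothetical and are not taken from any of the cited models.

```python
import numpy as np

def budyko_et(p, pet, n=2.0):
    """Turc-Mezentsev-type curve, used here only as a stand-in flux model."""
    p = np.asarray(p, dtype=float)
    pet = np.asarray(pet, dtype=float)
    return p / (1.0 + (p / pet) ** n) ** (1.0 / n)

# Hypothetical grid cell with three surface tiles (illustrative values only).
area_fraction = np.array([0.5, 0.3, 0.2])      # fractions of the grid cell
p_tiles = np.array([400.0, 900.0, 1800.0])     # precipitation, mm/yr
pet_tiles = np.array([1100.0, 800.0, 500.0])   # potential ET, mm/yr

# Mosaic-style estimate: compute the flux per tile, then area-weight the fluxes.
et_mosaic = float(np.sum(area_fraction * budyko_et(p_tiles, pet_tiles)))

# Lumped estimate: area-weight the inputs first, then compute the flux once.
et_lumped = float(budyko_et(np.sum(area_fraction * p_tiles),
                            np.sum(area_fraction * pet_tiles)))

# Ratio that would rescale the lumped flux to match the tile-based flux,
# in the spirit of a multiplicative correction factor.
correction_factor = et_mosaic / et_lumped

print(f"mosaic: {et_mosaic:.1f} mm/yr, lumped: {et_lumped:.1f} mm/yr, "
      f"factor: {correction_factor:.3f}")
```

Because the assumed curve is concave, the lumped estimate exceeds the tile-based one whenever the tiles differ in their ratio of PET to precipitation, so the ratio computed here is below 1.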

There is a growing need to understand how subgrid heterogeneity (and the atmosphere's integration of it) affects grid-scale water and energy fluxes
and to develop effective methods to incorporate these effects in ESMs (Clark et al., 2015; Fan et al., 2019). In a previous study, we proposed
a general framework for quantifying systematic biases in ET estimates due to averaging over heterogeneities (Rouholahnejad Freund and Kirchner,
2017). We used the Budyko framework as a simple estimator of ET and demonstrated theoretically how averaging over heterogeneous precipitation and
potential evapotranspiration can lead to systematic overestimation of long-term average ET fluxes from heterogeneous landscapes. In the present study,
we apply this analysis across the globe and highlight the locations where the resulting heterogeneity bias is largest. Our hypotheses, derived from
the Budyko framework as summarized in Eq. (4) below, are that (1) strongly heterogeneous landscapes, such as mountainous terrain, will exhibit greater
heterogeneity bias; (2) this bias will be larger in climates where

Budyko (1974) showed that long-term annual average evapotranspiration is a function of both the supply of water (precipitation) and the evaporative
demand (potential evapotranspiration) under steady-state conditions and in catchments with negligible changes in storage (Eq. 1; Turc, 1954;
Mezentsev, 1955):

Evapotranspiration rates are inherently bounded by energy and water limits. Under arid conditions ET is limited by the available supply of water (the water limit line in Fig. 1b), while under humid conditions ET is limited by atmospheric demand (PET) and converges toward PET (the energy limit line in Fig. 1b). Budyko showed that over a long period and under steady-state conditions, hydrological systems function close to their energy or water limits. These intrinsic water and energy constraints make the Budyko curve downward-curving.
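As a minimal illustration of these bounds, the following Python sketch evaluates a Turc–Mezentsev-type curve, ET = P / (1 + (P/PET)^n)^(1/n), with n = 2. The exact form of Eq. (1) is assumed here rather than quoted, and the function name and input values are hypothetical.

```python
import numpy as np

def budyko_et(p, pet, n=2.0):
    """Assumed Turc-Mezentsev form: ET = P / (1 + (P/PET)**n)**(1/n)."""
    p = np.asarray(p, dtype=float)
    pet = np.asarray(pet, dtype=float)
    return p / (1.0 + (p / pet) ** n) ** (1.0 / n)

pet = 1000.0                                            # mm/yr, held fixed
p = np.array([50.0, 500.0, 1000.0, 5000.0, 50000.0])    # mm/yr, arid to humid
et = budyko_et(p, pet)

# ET never exceeds the water limit (P) or the energy limit (PET).
within_limits = bool(np.all(et <= np.minimum(p, pet)))

# Arid end: ET approaches P. Humid end: ET approaches PET.
arid_ratio = float(et[0] / p[0])      # ET / P at the driest point
humid_ratio = float(et[-1] / pet)     # ET / PET at the wettest point

# Downward curvature: second differences of ET along an evenly spaced
# precipitation axis are negative.
p_grid = np.arange(100.0, 3100.0, 100.0)
curvature = np.diff(budyko_et(p_grid, pet), n=2)
is_concave = bool(np.all(curvature < 0.0))

print(within_limits, round(arid_ratio, 4), round(humid_ratio, 4), is_concave)
```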

In a heterogeneous landscape, like the simple example of two model columns in Fig. 1a,

In a previous study (Rouholahnejad Freund and Kirchner, 2017) we showed that when nonlinear underlying relationships are used to predict average
behavior from averaged properties, the magnitude of the resulting heterogeneity bias can be estimated from the degree of curvature in the
underlying function and the range spanned by the individual data being averaged. Here we summarize these findings as building blocks of the current
study. The second-order, second-moment Taylor expansion of the ET function
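To make the idea concrete, the sketch below compares a second-order, second-moment Taylor estimate of the heterogeneity bias with the bias computed directly from synthetic subgrid values. It is an illustration under stated assumptions, not the authors' implementation: a Turc–Mezentsev-type curve with n = 2 stands in for the ET function, the second derivatives are approximated by central finite differences rather than taken from Eq. (4), and the synthetic precipitation and PET values are hypothetical.

```python
import numpy as np

def budyko_et(p, pet, n=2.0):
    """Assumed Turc-Mezentsev-type ET function (stand-in for Eq. 1)."""
    return p / (1.0 + (p / pet) ** n) ** (1.0 / n)

def taylor_bias(p, pet, h=1.0):
    """Second-order estimate of f(mean inputs) - mean(f(inputs))."""
    pm, em = float(np.mean(p)), float(np.mean(pet))
    f = budyko_et
    # Central finite-difference second derivatives at the mean point.
    f_pp = (f(pm + h, em) - 2.0 * f(pm, em) + f(pm - h, em)) / h**2
    f_ee = (f(pm, em + h) - 2.0 * f(pm, em) + f(pm, em - h)) / h**2
    f_pe = (f(pm + h, em + h) - f(pm + h, em - h)
            - f(pm - h, em + h) + f(pm - h, em - h)) / (4.0 * h**2)
    cov = np.cov(p, pet, bias=True)   # population variances and covariance
    return -0.5 * (f_pp * cov[0, 0] + 2.0 * f_pe * cov[0, 1] + f_ee * cov[1, 1])

# Hypothetical subgrid values: wetter and less evaporative with "elevation".
rng = np.random.default_rng(0)
elev = rng.uniform(0.0, 1.0, 10_000)
p = 800.0 + 600.0 * elev + rng.normal(0.0, 50.0, elev.size)     # mm/yr
pet = 900.0 - 300.0 * elev + rng.normal(0.0, 30.0, elev.size)   # mm/yr

# Direct bias: ET of the averaged inputs minus the average of point-scale ET.
bias_direct = float(budyko_et(p.mean(), pet.mean()) - budyko_et(p, pet).mean())
bias_taylor = float(taylor_bias(p, pet))

print(f"direct: {bias_direct:.2f} mm/yr, second-order estimate: {bias_taylor:.2f} mm/yr")
```

The two numbers differ only through higher-order terms, which grow as the subgrid variability becomes large relative to the mean.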

Across a landscape of similar size to a typical ESM grid cell (1

Global distribution of 1 km resolution annual mean precipitation (

We also calculated the heterogeneity bias using Eq. (4), which describes how the nonlinearity in the governing equation and the heterogeneity in

Figure 3a–d illustrates the variability (quantified by SD) of 1

Global spatial distribution of variability (SD) of 1 km values of

Our results show that the topographic gradient and hence the variability in the aridity index across a given grid scale drives consistent,
predictable patterns of heterogeneity bias in evapotranspiration estimates at that scale. Equation (4) shows that this bias is equally sensitive to
fractional variability in

With an increased availability of spatial data, it is becoming standard practice to assess input data uncertainties and their propagated impacts on water
and energy flux estimates in land surface models. To quantify how choices among alternative input data products could affect the heterogeneity bias in
ET estimates, we calculated the heterogeneity bias at a 1

If we separate the heterogeneity biases shown in Fig. 4 according to Köppen–Geiger climate zones (Peel et al., 2007; Fig. 5a), we see that they
are distinctly higher in particular climate–terrain combinations. Estimated heterogeneity biases are higher in regions with temperate climates and
dry summers (climate zone Cs) and in regions with cold climates and dry summers (climate zone Ds), most likely due to the sharp spatial gradients in their water
and energy sources for evapotranspiration (Fig. 5b). These areas typically have high topographic relief, combined with a seasonal climate. The
heterogeneity effects on ET estimates in these regions are expected to be even larger when a mechanistic model of ET is used. We expect that averaging
over temporal variations of drivers of ET, especially in places with strong seasonality, could substantially bias the ET estimates, but this cannot be
quantified in the Budyko framework due to its underlying steady-state assumptions. Figure 5b also illustrates the relative magnitudes of the
heterogeneity biases obtained with the four pairs of

The distribution of

Equation (4) shows that heterogeneity biases in Budyko estimates of ET are equally sensitive to the same percentage variability in

Because future increases in computing power will lead to ESMs with smaller grid cells, it is useful to ask how changes in grid resolution affect the
heterogeneity biases that we have estimated in this paper. To quantify the heterogeneity bias in ET estimates as a function of grid scale, we repeated
our analysis at various grid resolutions using Switzerland as a test case. We started with high-resolution (500

Heterogeneity bias in ET estimates at various scales across Switzerland, estimated from 500

Because evapotranspiration (ET) processes are inherently bounded by water and energy constraints, over the long term, ET is always a nonlinear
function of available water and PET, whether this function is expressed as a Budyko curve or another ET model. These nonlinearities imply that spatial
heterogeneity will not simply average out in predictions of land surface water and energy fluxes. Overlooking subgrid spatial heterogeneity in
large-scale land surface models could lead to biases in estimating these fluxes. Here we have shown that, across several scales, averaging over
spatially heterogeneous land surface properties and processes leads to biases in evapotranspiration estimates. We examined the global distribution of
this bias, its scale dependence, and its sensitivity to variations in

In this study, we used Budyko curves as simple models of ET, in which long-term average ET rates are functionally related to long-term averages of

According to our analysis, regions within the US that have temperate climates and dry summers exhibit greater heterogeneity bias in ET estimates
(Fig. 5). We show that the estimated heterogeneity bias at each grid scale depends on the variance in the drivers of ET at that scale (Fig. 4) and on
the choice of data sources used to estimate ET. Heterogeneity bias estimates were significantly larger across the contiguous United States when

We also explored how heterogeneity biases and their spatial distribution vary with the scale at which the climatic drivers of ET are averaged. We found that as heterogeneous climatic variables are aggregated to larger scales, the heterogeneity biases in ET estimates become greater on average and extend over larger areas (Fig. 6). At smaller grid scales, estimated heterogeneity biases do not completely disappear but instead become more localized around areas with sharp topographic gradients. Finding an effective scale at which one can average over the heterogeneity of land surface properties and processes has been a long-standing problem in Earth science. Our analysis shows that at finer resolutions the average heterogeneity bias as seen from the atmosphere becomes smaller, but there is no characteristic scale at which it vanishes entirely (Fig. 6). The magnitude and spatial distribution of this bias depend strongly on the scale of averaging and the degree of nonlinearity in the underlying processes. The heterogeneity bias concept is general and extendable to any convex or concave function (Rouholahnejad Freund and Kirchner, 2017), meaning that in any nonlinear process, averaging over spatial and temporal heterogeneity can potentially lead to bias.
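The scale dependence described above can be reproduced qualitatively with a short Python sketch. It block-averages synthetic fine-resolution fields to progressively coarser grids and computes the mean heterogeneity bias at each grid size. The synthetic fields, the grid sizes, and the Turc–Mezentsev-type curve with n = 2 are assumptions made for illustration; they are not the data or the code used for Fig. 6.

```python
import numpy as np

def budyko_et(p, pet, n=2.0):
    """Assumed Turc-Mezentsev-type ET function."""
    return p / (1.0 + (p / pet) ** n) ** (1.0 / n)

def block_mean(field, k):
    """Average a 2-D array over non-overlapping k x k blocks."""
    rows, cols = field.shape
    return field.reshape(rows // k, k, cols // k, k).mean(axis=(1, 3))

# Hypothetical fine-resolution fields on a 64 x 64 grid (illustrative only).
rng = np.random.default_rng(42)
y, x = np.mgrid[0:64, 0:64]
relief = np.sin(x / 10.0) * np.cos(y / 7.0) + 0.1 * rng.standard_normal((64, 64))
p_fine = 1200.0 + 500.0 * relief      # precipitation, mm/yr
pet_fine = 700.0 - 200.0 * relief     # potential ET, mm/yr
et_fine = budyko_et(p_fine, pet_fine)

mean_bias = {}
for k in (2, 4, 8, 16, 32, 64):
    # ET computed from block-averaged inputs at this grid size ...
    et_from_means = budyko_et(block_mean(p_fine, k), block_mean(pet_fine, k))
    # ... minus the block average of the fine-resolution ET.
    true_mean_et = block_mean(et_fine, k)
    mean_bias[k] = float(np.mean(et_from_means - true_mean_et))

for k, bias in mean_bias.items():
    print(f"block size {k:2d}: mean bias {bias:.2f} mm/yr")
```

With nested block sizes and a concave ET function, the domain-average bias cannot decrease as the blocks grow, which mirrors the tendency toward larger average bias at coarser grid scales.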

In the analysis presented here, we have assumed a value of 2 for the Budyko parameter

One should keep in mind that the true mechanistic equations that determine point-scale ET as a function of point-scale water availability and PET (if
such data were available) may be much more nonlinear than Budyko's empirical curves because these curves already average over significant spatial and
temporal heterogeneity. Thus, we expect that the real-world effects of subgrid heterogeneity are probably larger than those we have estimated in
Sects. 3 and 4 of this study. In addition, the 1

Budyko curves are empirical relationships that functionally relate evaporation processes to the supply of water and energy under steady-state
conditions in closed catchments with no changes in storage. Our analysis likewise assumes no changes in storage and no lateral transfer between the
model grid cells, although both lateral transfers and changes in storage may be important, both in the real world and in models. Unlike the Budyko
framework, ET fluxes in most ESMs are physically based (not merely functions of

The SRTM digital elevation database (Jarvis et al., 2008) can be downloaded from

The supplement related to this article is available online at:

ERF and JWK designed the study. ERF ran the analysis. YF contributed to the design of the study and discussions. ERF and JWK wrote the paper.

The authors declare that they have no conflict of interest.

Elham Rouholahnejad Freund acknowledges support from the Swiss National Science Foundation (SNSF; grant no. P2EZP2_162279). The authors thank Massimiliano Zappa of the Swiss Federal Research Institute WSL for providing the 500

This research has been supported by the Swiss National Science Foundation (grant no. P2EZP2_162279).

This paper was edited by Miriam Coenders-Gerrits and reviewed by Ryan Teuling and two anonymous referees.