Evapotranspiration (ET) influences land–climate interactions, regulates the hydrological cycle, and contributes to the Earth's energy balance. Due to its feedback to large-scale hydrological processes and its impact on atmospheric dynamics, ET is one of the drivers of droughts and heatwaves. Existing land surface models differ substantially, both in their estimates of current ET fluxes and in their projections of how ET will evolve in the future. Any bias in estimated ET fluxes will affect the partitioning between sensible and latent heat and thus alter model predictions of temperature and precipitation. One potential source of bias is the so-called “aggregation bias” that arises whenever nonlinear processes, such as those that regulate ET fluxes, are modeled using averages of heterogeneous inputs. Here we demonstrate a general mathematical approach to quantifying and correcting for this aggregation bias, using the GLEAM land evaporation model as a relatively simple example. We demonstrate that this aggregation bias can lead to substantial overestimates in ET fluxes in a typical large-scale land surface model when sub-grid heterogeneities in land surface properties are averaged out. Using Switzerland as a test case, we examine the scale dependence of this aggregation bias and show that it can lead to an average overestimation of daily ET fluxes by as much as 10 % across the whole country (calculated as the median of the daily bias over the growing season). We show how our approach can be used to identify the dominant drivers of aggregation bias and to estimate sub-grid closure relationships that can correct for aggregation biases in ET estimates, without explicitly representing sub-grid heterogeneities in large-scale land surface models.

Earth's surface and subsurface are characterized by spatial heterogeneity spanning wide ranges of scales, including scales that cannot be explicitly resolved by large-scale Earth system models (ESMs), which are typically run at resolutions of 10–100 km. Averaging over this finer-scale heterogeneity can bias model estimates of water and energy fluxes and hence alter future temperature predictions. Earth system model estimates of global terrestrial evaporation differ substantially from atmospheric reanalyses based on in situ and satellite remote sensing observations (Mueller et al., 2013), but it is unclear how much of these differences could be attributed to errors in capturing sub-grid heterogeneity.

Several recent studies (e.g., Fan et al., 2019; Shrestha et al., 2018) have emphasized the need to account for land surface heterogeneity in large-scale ESMs. Despite recent community efforts in refining ESMs' spatial resolution (Huang et al., 2016; Rauscher et al., 2010; Ringler et al., 2008; Skamarock et al., 2012; Zarzycki et al., 2014), the grid resolution of present-day ESMs is still too coarse to explicitly capture important effects of surface heterogeneity. Whether the solution lies in hyper-resolution large-scale land surface modeling remains an open question, because heterogeneities that are important to land–atmosphere fluxes will not be fully resolved even at scales of 100 m (Beven and Cloke, 2012).

The effects of aggregating over spatial heterogeneity in land surface models have been assessed using several approaches. Most of these approaches compare grid-cell-averaged energy and water fluxes with flux estimates for finer-resolution grids or for grid cells that are subdivided into mosaics of several surface types which separately exchange momentum, energy, and water vapor with the overlying atmosphere (e.g., Giorgi, 1997). Several studies have reported increases in average evapotranspiration (ET) (e.g., Kuo et al., 1999; Boone and Wetzel, 1998; Hong et al., 2009; McCabe and Wood, 2006; El Maayar and Chen, 2006), and at least one has reported decreases in grid-cell-averaged ET (Ershadi et al., 2013), as model grids are coarsened and less spatial heterogeneity is accounted for. Shrestha et al. (2018) studied the effects of horizontal grid resolution on ET partitioning in the TerrSysMP Earth system model and found that the aggregation of topography decreases average slope gradients and obscures small-scale convergence and divergence zones, directly impacting surface and subsurface flow. They observed 5 % and 8 % decreases in the transpiration/evapotranspiration ratio for a dry and a wet year, respectively, when their model grid cells were coarsened from 120 to 960 m. All these studies calculate the effects of land surface heterogeneity on ET fluxes using numerical experiments that refine the model's spatial resolution, either directly or through the use of land surface mosaics.

Quantifying the effect of sub-grid-scale heterogeneity on grid-cell-averaged fluxes is especially important when highly nonlinear processes are involved. Regardless of scale, the main challenge is not to explicitly represent the heterogeneity in all its details, but instead to define an appropriate scale-dependent sub-grid closure relationship that recognizes the important heterogeneities within the grid elements and the nonlinearities in the processes (Beven, 2006). Such a sub-grid closure scheme would capture the effects of sub-grid heterogeneity in large-scale land surface models without forcing them to run at finer spatial resolutions.

We have recently proposed a general theoretical framework, based on Taylor series expansions, that quantifies the aggregation bias that results from averaging over sub-grid heterogeneity when grid-cell-averaged ET is estimated (Rouholahnejad Freund and Kirchner, 2017; Rouholahnejad Freund et al., 2020a). In contrast to the numerical experiments described above, this theoretical framework does not depend on a particular evapotranspiration model or grid scale. Our previous work demonstrated this framework using Budyko curves as a see-through “toy” model, leaving open the question of how strongly ET estimates would be affected by sub-grid heterogeneity in a more typical mechanistic evapotranspiration model. Here we use the mechanistic evapotranspiration model GLEAM (Global Land-surface Evaporation: the Amsterdam Methodology) to quantify how aggregation biases vary across a range of scales, using Switzerland as a case study. We show how our Taylor expansion framework can be used to quantify the sensitivity of ET fluxes to heterogeneity in their individual drivers. We further demonstrate how this framework can be used to estimate correction factors (i.e., sub-grid closure relationships) that account for the effects of sub-grid heterogeneity without explicitly modeling it, and we show how these correction factors can be used to improve grid-scale ET estimates. Because our framework is not model-specific, the analysis presented here could also be applied to many other evapotranspiration algorithms.

Most large-scale land surface models calculate ET as a function of available water and energy at daily time steps. They typically multiply an estimate of potential evapotranspiration (PET) by a conversion factor to calculate actual evapotranspiration. PET is generally understood as the maximum rate of evapotranspiration from a large area (to avoid the effect of local advection) covered completely and uniformly by actively growing vegetation with adequate moisture at all times (Brutsaert, 1984). Models typically estimate PET using the Penman equation (Penman, 1948; intended for open water surfaces), the Penman–Monteith equation (Monteith, 1965; Monteith and Unsworth, 1990; intended for reference crop evapotranspiration by adding atmospheric transport processes and stomatal resistance to Penman's open water evaporation), or the Priestley–Taylor equation (Priestley and Taylor, 1972; intended for open water and water-saturated crops and grasslands). The conversion factor that is used to estimate ET from PET typically depends on plant physiology and on the water that is available for evaporation.

Here, we employ an ET algorithm that is used by several land surface models
(i.e., GLEAM; Miralles et al., 2011; Martens et al., 2017), in which actual ET is
calculated as a fraction of PET. This fraction is expressed as a
multiplicative factor, often called a stress factor, which ranges between 0
and 1 and thus limits ET rates. Under wet conditions, ET can equal PET
(stress factor equals one), while under dry conditions, PET is multiplied by
a stress factor smaller than one depending on the degree of water stress.
This approach is employed by the GLEAM model, among others. GLEAM is a
diagnostic satellite-data-driven method that is used to estimate global land
evaporation fluxes. GLEAM uses the Priestley–Taylor formula and remotely
sensed datasets of radiation and temperature to calculate PET. In GLEAM,
actual ET is calculated by constraining PET estimates by a stress factor
that is based on estimates of root-zone soil moisture. The root-zone soil
moisture is derived from a multilayer water balance module that describes
the infiltration of precipitation through the vertical soil profile. ET
estimates from GLEAM have been applied in many studies (e.g., Miralles et
al., 2013, 2014; Greve et al., 2014; Jasechko et al., 2013). GLEAM operates on daily time steps at 0.25

GLEAM calculates ET as an explicit function of the stress factor and
potential evaporation:

GLEAM uses the Priestley–Taylor approach to calculate PET (Priestley and
Taylor, 1972):

ET is a nonlinear function of its drivers. An intrinsic property of any
nonlinear function is that the average of the function will not equal the
function evaluated at the average inputs (e.g., Rastetter et al., 1992;
Giorgi and Avissar, 1997; Rouholahnejad Freund and Kirchner, 2017). Thus
averaging over sub-grid heterogeneity in ET drivers, as large-scale land
surface models do, would be expected to lead to biased ET estimates, even if
the underlying equations were exactly correct. For an ET function of three
variables, namely

The approach outlined in Eq. (7) is general and could be extended to other land surface modeling schemes. The partial derivatives in Eqs. (8)–(14), of course, are specific to the GLEAM equations; for other models they would differ. More complex land surface model algorithms may not have such simple analytical derivatives; in that case, the derivatives can be evaluated numerically.

Drivers of ET (i.e., soil moisture, net radiation, and temperature) can be highly heterogeneous within the grid cells of typical ESMs. Soil moisture can show pronounced spatial variability, especially in areas where surface roughness, porosity, and permeability vary by orders of magnitude across a variety of length scales (Giorgi and Avissar, 1997). Temperature and incoming radiation vary significantly with season, elevation, altitude, and albedo. Switzerland, for example, shows strong local variations in average annual temperature, soil moisture content, net radiation, and albedo (Fig. 1; albedo values in Fig. S1 in the Supplement).

Spatial distribution of input data for the year 2004 at 500 m
resolution: annual mean

We quantified how averaging over spatial (and temporal) heterogeneities of ET drivers affects estimated ET at several grid scales across Switzerland, as an example case for which high-resolution data are available. Our analysis is based on 500 m input data of temperature (interpolation of MeteoSwiss data after Viviroli et al., 2009), net radiation (Viviroli et al., 2009), and soil moisture (simulations from the hydrological model PREVAH, Brunner et al., 2019; Speich et al., 2015; Orth et al., 2015; Zappa and Gurtz, 2003) at daily time steps for the 2004 growing season. Although our soil moisture data are derived from model simulations whose accuracy is difficult to assess due to the scarcity of real-world soil moisture measurements, for our purposes all that is necessary is that the simulated values exhibit realistically complex spatial variability.

We used the GLEAM equations, as outlined in Sect. 2, to calculate ET for each day at the 500 m resolution of these input data. We use these 500 m ET estimates as virtual “truth” for the purpose of our analysis, because our goal is not to determine whether GLEAM estimates of ET are accurate (compared to direct measurements, for example) but rather to quantify how spatial aggregation affects them.

To quantify how spatial aggregation affects model estimates of ET, we
calculated ET over larger spatial scales in two different ways. We first
calculated the arithmetic average of the 500 m ET estimates over

Figure 2a shows that the ET aggregation bias varies considerably across
Switzerland and also varies considerably with grid scale. The average
aggregation bias is higher at coarser grid scales, averaging 10 % at 2 and
1

Using our 500 m input data, we can test how well Eq. (7) estimates the
difference between the true average ET and the coarse-resolution modeled
ET at each grid scale. We used Eqs. (8)–(14) to calculate the partial
derivatives of the GLEAM equations for each grid cell and time step, using
the grid-cell-averaged values of the input data. We then multiplied these
derivatives by the corresponding variances and covariances among the 500 m
input data to obtain bias estimates via Eq. (15) for each grid cell and time
step:

Figure 2b shows that the aggregation bias estimated by Eq. (15) is generally similar, in both overall magnitude and spatial distribution, to the true aggregation biases calculated by the numerical experiment. This comparison is shown more explicitly in Fig. 3, in which the estimated aggregation bias is compared with the true aggregation bias for each grid cell at each grid scale. Figures 2 and 3 show that Eq. (15) is generally a good predictor of aggregation bias. Both the estimated aggregation biases (Fig. 2) and the true aggregation biases are markedly higher in regions of greater topographic complexity (Fig. S4).

Daily estimated aggregation bias in ET estimates (%, median of
daily biases in April–October 2004) versus daily true aggregation bias in ET estimates (%, median of daily biases in April–October 2004) at several
spatial scales. Estimated aggregation biases are calculated using Eq. (7).
True aggregation biases are calculated as differences between the finer-resolution ET estimates from finer-resolution input data, averaged over several spatial scales (average of functions) and ET values calculated from
average inputs at each spatial scale (function of averages). The coefficients of determination (

The Taylor expansion in Eq. (15) not only allows one to quantify the aggregation bias; it also allows one to quantify the relative importance of the three input variables (net radiation, soil moisture, and temperature) as drivers of that bias. Each of the terms in Eq. (15) combines a variance or covariance that expresses how variable the input data are and a second derivative that expresses how sensitive the average ET is to that variability. Each of these terms – a derivative multiplied by a variance or covariance – has the same units as ET, and thus they can be directly compared to one another.

Table 1 shows each of the aggregation bias terms, calculated over all of
Switzerland for the two arbitrarily chosen days mentioned in Sect. 2.3 (29 May and 18 July 2004). For these two example days, the aggregation bias is clearly dominated by a single term, associated with the variance of soil moisture. The variance in net radiation (

Daily estimated ET rates versus true average ET at each grid cell at several different grid scales (example day, 31 May 2004). The solid red symbols demonstrate the relationships between true average ET calculated using fine-resolution data at each grid cell and modeled grid-cell-averaged ET using grid-cell-averaged inputs in Eq. (8), for each grid cell at several different grid scales (overestimated). For comparison, the open symbols show true average versus average ET estimated by the Taylor expansion approach of Eq. (7), which corrects for sub-grid heterogeneity effects using only grid-cell-averaged estimates of the ET drivers and their small-scale variances and covariances (heterogeneity-corrected ET estimates, corrected).

Relative importance of different ET drivers in aggregation bias estimates (different terms in Eq. 15). Values are calculated for all of Switzerland for the two arbitrarily chosen days (29 May and 18 July 2004). The aggregation bias is dominated by the term associated with the variance of soil moisture for these two example days.

The Taylor expansion framework in Eq. (7) can be used not only to diagnose aggregation bias, but also to estimate sub-grid closure relationships that correct for the effects of small-scale heterogeneity. The variance and covariance terms in Eq. (7) express how sub-grid heterogeneity affects average ET at the grid scale, implying that these aggregation bias estimates could be used to improve grid-scale ET estimates, without explicitly modeling ET at high resolutions. This approach could be particularly useful in land surface algorithms that are part of coarser-resolution Earth system models; in such cases it may be much more efficient to evaluate Eqs. (7)–(14) at the coarse grid resolution than to directly evaluate the underlying ET model, Eq. (6), at high resolution. The Taylor expansion approach could also be attractive where we lack spatially explicit high-resolution maps of the ET drivers, but where their variances and covariances can nonetheless be estimated from other sources (i.e., from the variability of topography, mapped soil units, remote sensing data, etc.).

It is beyond our scope here to construct such variance and covariance estimates, but we can illustrate how they could potentially be used. The solid red symbols in Fig. 4 show the relationships between true average ET and modeled grid-cell-averaged ET, for each grid cell (and one example day, 31 May 2004) at several different grid scales. For comparison, the open grey symbols in Fig. 4 show average ET estimated by the Taylor expansion approach of Eq. (7), which corrects for sub-grid heterogeneity effects using only grid-cell-averaged estimates of the ET drivers and their small-scale variances and covariances.

The heterogeneity-corrected ET estimates shown by the open symbols in Fig. 4
cluster much closer to the

Averaging over spatially heterogeneous ET drivers leads to substantial aggregation biases in ET flux estimates from a typical mechanistic large-scale land surface model. This aggregation bias arises from the inherent nonlinearities in evapotranspiration processes, coupled with the inherent spatial heterogeneity in the driving factors. The joint effects of these nonlinearities and heterogeneities can be estimated using second-order Taylor expansions of the governing equations. Using Switzerland as a test case, we have shown that median aggregation biases of 10 %–35 % are common, even at grid scales substantially smaller than those typically used in land surface models (Fig. 2). These biases can be much larger for individual days (Figs. S2 and S3) and potentially have substantial consequences for water and energy flux estimates in land surface models and consequently for temperature predictions in coupled models. The overestimated evaporative fluxes would lead to overestimated latent heat fluxes and underestimated sensible heat fluxes, and thus potentially to underestimates of expected temperature increases in a changing climate. Unrealistically high evaporation estimates lead to cooler modeled temperatures and wetter modeled climates. Correcting for the aggregation bias in ET fluxes would lead to reduced evaporative cooling and increased atmospheric heating via sensible heat flux.

In coupled Earth system models, ET fluxes influence how surface temperature, net radiation, and soil moisture evolve through time and thus influence future values of ET. The analyses shown in Figs. 2–4 are based on static values for each day and thus do not account for the propagation of aggregation biases forward through time. Estimating the consequences of aggregation biases for dynamic modeling would require fully coupled Earth system model simulations rather than the single ET algorithm analyzed here. In a dynamic model, the Taylor expansion approach can potentially be used to correct for aggregation biases in each time step, using statistical models for the variances and covariances of the ET drivers. Thus, estimating aggregation biases in a dynamic model would not require explicitly simulating sub-grid heterogeneity at every time step. Correcting for aggregation biases at each modeling time step would prevent them from propagating further into future time steps or into the partitioning of future water and energy fluxes at the land surface. The present paper does not illustrate this dynamic correction for aggregation biases but establishes the theoretical framework for it.

The purpose of our analysis was to demonstrate how aggregation bias due to spatial heterogeneity can be quantified (Sect. 2.2 and 2.3), how its dominant drivers can be identified (Sect. 2.4.1), and how its effects can be efficiently corrected for, using sub-grid closure relationships (Sect. 2.4.2). For this demonstration, we chose GLEAM as an illustrative example and Switzerland as a topographically complex case study where high-resolution data on the ET drivers are available. Applications of this approach to more complex land surface models may require calculating the necessary derivatives (see Eq. 7) numerically rather than analytically, and applications where high-resolution data are unavailable may require statistically estimating the variances and covariances among the drivers of ET, based on their relationships with topography, soil types, land cover, etc. Using the approach outlined here, one can account for the effects of sub-grid heterogeneity without explicitly modeling ET at fine spatial resolution, which could be impractical due to computational costs or impossible due to a lack of fine-resolution input data.

In our analysis, spatial heterogeneity in soil moisture emerged as the dominant driver of aggregation bias in ET estimates. Particularly if this result can also be confirmed in other regions and climates, it points to the importance of improving our understanding of spatial patterns of soil moisture and what controls them. The lower topographic curvature of coarsely gridded landscapes can lead models to predict higher soil moisture at coarser grid scales (Kuo et al., 1999); higher soil moisture at larger grid scales would lead to even higher modeled values of ET, beyond the effects of the aggregation biases analyzed here. Soil moisture may also be substantially influenced by lateral subsurface transfers of water, which are ignored in our analysis and are also ignored by many land surface models. Overlooking lateral transfers could potentially bias ET estimates in large-scale land surface models (Fan et al., 2019), but this is beyond the scope of the present study.

Temperature data are an interpolation of MeteoSwiss data after Viviroli et al. (2009) and are originally from archive data of MeteoSwiss ground level monitoring networks. However, the acquired data may not be used for commercial purposes (e.g., by passing on the data to third parties or by
publishing them on the internet). As a consequence, we cannot offer direct
access to the data used in this study. Daily soil moisture saturation, net
radiation, and temperature data over Switzerland at a 500 m resolution for
the year 2004 can be retrieved from EnviDat at

The supplement related to this article is available online at:

ERF and JWK designed the study. ERF ran the analysis. MZ provided the soil moisture simulations and contributed to the discussions. ERF and JWK wrote the paper.

The authors declare that they have no conflict of interest.

We thank Ying Fan Reinfelder for numerous insightful discussions and for helpful comments on the manuscript. Elham Rouholahnejad Freund acknowledges support from the Swiss National Science Foundation (SNSF) under grant no. P2EZP2_162279.

This research has been supported by the Swiss National Science Foundation (grant no. P2EZP2_162279).

This paper was edited by Anke Hildebrandt and reviewed by two anonymous referees.