Sensitivity of potential evaporation estimates to 100 years of climate variability

Hydrological modeling frameworks require an accurate representation of evaporation fluxes for appropriate quantification of, e.g., the water balance, soil moisture budget, recharge and groundwater processes. Many frameworks have used the concept of potential evaporation, often estimated for different vegetation classes by multiplying the evaporation from a reference surface (“reference evaporation”) by crop-specific scaling factors (“crop factors”). Though this two-step potential evaporation approach undoubtedly has practical advantages, the empirical nature of both reference evaporation methods and crop factors limits its usability in extrapolations under non-stationary climatic conditions. In this paper, rather than simply warning about the dangers of extrapolation, we quantify the sensitivity of potential evaporation estimates for different vegetation classes using the two-step approach when calibrated using a non-stationary climate. We used the past century’s time series of observed climate, containing non-stationary signals of multi-decadal atmospheric oscillations, global warming, and global dimming/brightening, to evaluate the sensitivity of potential evaporation estimates to the choice and length of the calibration period. We show that using empirical coefficients outside their calibration range may lead to systematic differences between process-based and empirical reference evaporation methods, and systematic errors in estimated potential evaporation components. Quantification of errors provides a possibility to correct potential evaporation calculations and to rate them for their suitability to model climate conditions that differ significantly from the historical record, so-called no-analog climate conditions.


Introduction
Evaporation from the vegetated surface is the largest loss term in many, if not most, water balance studies on earth.As a consequence, an accurate representation of evaporation fluxes is required for appropriate quantification of surface runoff, the soil moisture budget, transpiration, recharge and groundwater processes (Savenije, 2004).However, despite being a key component of the water balance, evaporation figures are usually associated with large uncertainties, as this term is difficult to measure (Allen et al., 2011) or estimate by modeling (Wallace, 1995).
Research attempting to model the evaporation process has a long history (Shuttleworth, 2007).This research took two parallel tracks, with the meteorological community developing process-based models of surface energy exchange and the hydrological community considering evaporation as a loss term in the catchment water balance (Shuttleworth, 2007).To quantify the evaporation loss term, many hydrological modeling frameworks have used the concept of potential evaporation (Federer et al., 1996;Kay et al., 2013;Zhou et al., 2006), defined as the maximum rate of evaporation from a natural surface where water is not a limiting factor (Shuttleworth, 2007).With the progression from catchment-scale lumped models (such as HBV; Bergström and Forsman, 1973) to distributed models with increasing spatial resolution and spatially resolved data (such as SHE; Abbott et al., 1986), the explicit representation of land surface water budgets also increased (Ehret et al., 2014;Federer et al., 1996).To this end, estimation of evaporation from a variety of land sur-Published by Copernicus Publications on behalf of the European Geosciences Union.faces within the simulated domain is needed (Federer et al., 1996).More models were developed that included vegetation explicitly, commonly by describing the stomatal conductance of the vegetation as a function of environmental drivers (see Shuttleworth, 2007, and references therein).However, until now, these models have rarely been used in practice and merely have a scientific meaning.
With the development of the two-step potential evaporation approach, different equations to simulate reference evaporation have been suggested (Federer et al., 1996;Bormann, 2011;Shuttleworth, 2007) for use in both regional and global hydrological models (e.g., Sperna Weiland et al., 2012;Haddeland et al., 2011).However, due to their empirical nature, these equations are limited in their transferability in both time and space (Feddes and Lenselink, 1994;Wallace, 1995).Since the increasing need for predictions under global change (land use and climate) (Ehret et al., 2014;Coron et al., 2014;Montanari et al., 2013), the empirical nature of most commonly used potential evaporation approaches is a serious drawback (Hurkmans et al., 2009;Wallace, 1995;Shuttleworth, 2007;Witte et al., 2012).Thus, although the two-step approach may be warranted for practical reasons, both the reference evaporation and estimated crop factors include a series of empirical parameters that may affect the validity and general applicability of the estimated potential evaporation for a specific vegetation class.
Since the term "potential evaporation" has been used by the hydrologic community to refer to several different combinations of evaporation components in the past, it is important to re-introduce these definitions and to be very specific about nomenclature in future evaporation research.Total evaporation (E tot ) from a vegetated surface is the sum of three fluxes: transpiration (E t ), soil evaporation (E s ) and evaporation of intercepted water (E i ).E t and E s occur at a potential rate when the availability of water (soil moisture or interception) is not limiting.As we will only focus on potential rates in this paper, all values should be interpreted as potential, unless stated otherwise.Reference evaporation (E ref ) is defined as the rate of evaporation from an extensive surface of green grass, with a uniform height of 0.12 m, a surface resistance of 70 s m −1 , and an albedo of 0.23, actively growing, completely shading the ground and with adequate water (Allen et al., 1998).By definition, E i is not part of reference evaporation, as it is defined for a plant surface that is externally dry (Federer et al., 1996;Allen et al., 1998).Often, the term "reference evapotranspiration" is used instead, which is the sum of transpiration (E t ) and soil evaporation (E s ).By definition (Allen et al., 1998), the reference crop completely shades the ground and, hence, E s will be zero and E ref will equal E t of the reference crop (at least for daily estimates, when the soil heat flux can be assumed zero).This is in agreement with the definition of Penman (1956), who also stated that the oftenused expansion of the term reference "evaporation" to "evapotranspiration" was unnecessary.
E ref is used in the two-step method to estimate the potential evaporation, E p , of a crop or vegetation stand.E p will reduce to the actual evaporation, E a , in the case of water shortage or waterlogging.Here, we focus on the estimation of E p from E ref , by multiplying E ref by a crop factor K (Allen et al., 1998(Allen et al., , 2005;;Feddes, 1987;Penman, 1956).Different applications of crop factors exist.
-K t corrects for potential transpiration of a crop with a dry canopy only; i.e., This corresponds to the basal crop factors defined by Allen (2000), which are equivalent to the approach of Penman (1956).
-K ts corrects for both potential transpiration and potential soil evaporation for a crop with a dry canopy; i.e., This corresponds to the single crop factors defined by Allen et al. (1998).
-K tot corrects for potential total evaporation, i.e., transpiration, soil evaporation, and interception.Using K tot holds for crop factors that have been derived by soil water balance experiments, and especially from sprinkling experiments in the field, where water is applied in such quantities that soil water is not limiting for plant growth (Feddes, 1987).Sprinkling, however, leads to interception.So, crop factors like those of Feddes (1987) implicitly involve E i .Therefore, Feddes (1987) emphasizes that the presented crop factors "are averages taken over a population of "average", "dry", and "wet" years, that will certainly not be homogeneously distributed".The crop factor approach by Feddes (1987) is different from the single crop factor approach of  and Lablans, 1998).This condition is especially relevant for tall forests, which intercept a higher percentage of rainwater, under climatological conditions with significant rainfall (De Bruin and Lablans, 1998).Nevertheless, this crop factor approach is used in practice (Van Roosmalen et al., 2009).The objective of this paper is to assess the sensitivity of potential evaporation estimates for different vegetation classes using the commonly used two-step approach when calibrated based on a non-stationary climate.To this end, we use century-long meteorological observations representing the historic variability in climatic conditions at the De Bilt, the Netherlands, climate monitoring station.The past century's global warming, dimming and brightening periods (Suo et al., 2013;Stanhill, 2007;Wild, 2009;Wild et al., 2005) and their effects on evaporation provide an opportunity to evaluate the robustness of the two-step estimation of potential evaporation for non-stationary conditions.Given the twentieth century climate-induced variability in E ref and the projected increase for the near future, which has no historical analog (Fig. 1), it is of great importance to recognize the limitations of applying empirical coefficients outside their calibration range (i.e., extrapolation).This applies not only to transferring coefficients in space, as between climatic regions (Allen et al., 1998), but also in time.
The twentieth century global surface temperature can be characterized by two major warming periods; the first one from about 1925 to 1945, followed by a period of cooling, and a second starting in about 1975 and continuing to the present (Jones and Moberg, 2003;Yamanouchi, 2011).While the variations in temperature until the 1970s can be related to changes in global radiation, i.e., global dimming and brightening, this relationship no longer holds for the rapid warming since 1975 (Wang and Dickinson, 2013).Empirical equations for reference evaporation that use either radiation or temperature implicitly assume a relationship between the two variables.Given the non-linearity of evaporation components, it is not only questionable whether empirical equations for reference evaporation will be applicable under future climatic conditions (Shaw and Riha, 2011), but also whether they are applicable for the recent past.
Although the limitations of using empirical coefficients to calculate evaporation are generally well known, the potential errors that could be made by using such coefficients in evaporation calculations have, as far as we know, never been quantified.Thus, there is a need to raise the awareness of the uncertainty that may result from applying such an empirical estimation method outside its valid area (site and time specific).In this study, we systematically unravel the use of the two-step approach to simulate potential evaporation and identify and actually quantify systematic errors that may be introduced when empirical coefficients are applied outside their calibration period.Such extrapolations of time-variant model parameters are not only relevant for the calculation of potential evaporation, but also for hydrological modeling in general, thus limiting the temporal robustness of hydrological models (Ehret et al., 2014;Karlsson et al., 2014;Coron et al., 2014;Seibert, 2003).Quantification of errors, as demonstrated in this study, provides the possibility to (i) derive uncertainty ranges for the parameters, (ii) quantify the errors that are introduced by a specific method and set of parameters, and (iii) correct for the errors when they are predictable.

General approach
We use 108 years of meteorological observations to quantify the sensitivity of potential evaporation when calibrated using a non-stationary climate for various natural vegetation classes using the two-step approach.We investigate how empirical E ref methods and empirical K values affect the validity of the estimated potential evaporation for different vegetation classes, by applying empirical coefficients outside their calibration period.We vary the calibration period in both length (2-30 years) and reference period (in 1906-2013).
First (Sect.2.3), we simulate reference evaporation according to the process-based Penman-Monteith equation (E ref_PM ), which is considered the international standard method for estimating reference evaporation (Allen et al., 1998).In addition, we apply four empirical equations that contain constants derived for a calibration period (Fig. 2; Sect.2.3).From these simulations, we identify deviations between each empirical E ref method and the E ref_PM (Fig. 2; Sect.2.3).Secondly (Sect.2.4), we generate time series of the main components of potential evaporation, i.e., synthetic series of E t , E s and E i , for five different vegetation classes, using the SWAP soil-vegetation-atmosphere transfer (SVAT) scheme (Kroes et al., 2009;Van Dam et al., 2008) (Fig. 2; Sect.2.4).SWAP allows users to simulate potential evaporation for different vegetation classes directly (i.e., a one-step approach) by parameterizing the Penman-Monteith equation for each vegetation class implicitly rather than by using crop factors.These synthetic series are considered "observations" throughout the paper for all comparisons with estimates from the two-step approach.
Finally (Sect.2.5), we derive monthly crop factors for each vegetation type (5×) and for each E ref method (5×) based on the synthetic data of E t , E s and E i for a calibration period (e.g., 1906-1935) to simulate crop factor estimation using field measurements (Fig. 2; Sect.2.5).We use different (3×) definitions of crop factors: for transpiration (K t ), for transpiration plus soil evaporation (K ts ) and for total evaporation (K tot ).Next, we apply the two-step approach, using E ref and crop factors from the calibration period, to calculate daily "predicted" evaporation components (3×) for each veg- etation class (5×) and each E ref method (5×) for the entire period (1906-2013) (Fig. 2; Sect.2.6).Doing so, the empirical E ref methods and crop factors are applied outside their calibration range.From these simulations, we quantify the deviations introduced by the use of E ref and K, by comparing the evaporation components obtained with the two-step approach to the synthetic "observations" (Fig. 2; Sect.2.6).Each of these steps, which are executed for all calibration periods during the period 1906-2013 (2697×), are described in greater detail in subsequent sections.
Although SWAP may be expected to provide adequate evaporation values, its absolute accuracy is not discussed in this paper, because we focus on the sensitivity of the two-step approach using synthetic (hypothetical) data only.Therefore, the actual accuracy of SWAP is irrelevant for this paper.To ensure that our analysis is not biased by a specific choice of SWAP parameter settings, we considered different vegetation classes ranging from grasses to shrubs and forests.Doing so, we include different parameter sets that affect the variability and relative proportions of the calculated E components.For a detailed discussion of the SWAP model and its accuracy, please refer to Kroes et al. (2009) and Van Dam et al. (2008).By comparing potential evaporation components obtained from the two-step approach with the synthetic "observations" as simulated using the physical SWAP model, we are able to quantify the deviations introduced by using different E ref methods in combination with crop factors, as no other source of uncertainty is involved.1998) and corrected for systematic differences between measurement periods.Figure 3 shows the annual values and the 30-year moving averages of the variables used to calculate evaporation from De Bilt.

Meteorological data
Although the results are only valid for the site and period for which they were developed, the times series of radiation for De Bilt station resembles the global trends of global dimming/brightening.Values of global radiation (R s ) from De Bilt show a similar trend to the observations for Stockholm, as presented in Wild (2009).The data (Fig. 3) show an increase in temperature consistent with previous studies  Blaney and Criddle (1950) as in E = 1 λ c + d p 0.46T + 8.13 Allen and Pruitt (1986) Where Long time series of meteorological observations will, to some extent, not be homogeneous, for example due to changes in measurement devices over time.However, this does not affect the calculations herein, as the aim is to investigate the sensitivity of the two-step potential evaporation methodology to non-stationary climate, rather than to produce an exact reconstruction of the last century's climate conditions.In this way, changes in measurement accuracy with time simply represent another non-stationary trend in this data set.

Reference evaporation
Several methods are available for calculating reference evaporation, differing in complexity and empiricism (Sperna Weiland et al., 2012;Bormann, 2011;Federer et al., 1996).Here, we analyze five of these methods, given in Table 1: the physically based Penman-Monteith equation (PM), the radiationbased methods of Makkink (Mak) and Priestley-Taylor (PT), and the temperature-based methods of Hargreaves (Har) and Blaney-Criddle (BC).
The FAO-56 method (Allen et al., 1998), using PM parameterized for reference grass, is recommended as the international standard for calculation of E ref .Given the physical basis of PM, it can be used globally, without the need to estimate or calibrate its parameters (Droogers and Allen, 2002).In contrast, the methods of Mak, PT, Har, and BC contain empirical coefficients, derived for specific meteorologi-cal conditions and sites.Following Farmer et al. (2011) 1) by least squares regression against the simulated daily E ref_PM , for a specific calibration period.Subsequently, daily values of E ref are calculated for each method during the full period, i.e., 1906-2013, and deviations between the empirical E ref methods and E ref_PM are calculated.The sensitivity of E ref to the choice of calibration period is evaluated for each of the methods using E ref_PM as a basis.

Synthetic evaporation series
Synthetic time series of the three evaporation components are derived to systematically unravel the use of empirical crop factors.The synthetic time series are based on the SWAP physical model (Van Dam et al., 2008;Kroes et al., 2009) from which E t , E s and E i can be simulated separately.From these simulations, we derive monthly K values for each E ref method (5×) and vegetation class (5×) (Fig. 2; Sect.2.5), which are subsequently used to derive the corresponding potential evaporation components (5×5×3) using the two-step approach (Fig. 2  SWAP simulates the potential evaporation components of a crop or vegetation class based on the aerodynamic resistance, height, leaf area index (LAI), and albedo.SWAP uses the Penman-Monteith equation, parameterized for each vegetation class to simulate E t (potential transpiration) and E s (potential soil evaporation).In the case of intercepted precipitation, the values of E t and E s are reduced (Van Dam et al., 2008).Interception, which partly evaporates (E i ) and partly drips to the ground, is estimated following Von Hoyningen-Hüne (1983) and Braden (1985) for short vegetation and Gash et al. (1995) for forests.For an extended description of SWAP and the procedures for calculating E t , E s and E i , we refer the reader to Kroes et al. (2009) and Van Dam et al. (2008).Given the international recognition of the SWAP model and successful testing, we assume that the model is able to produce representative synthetic estimates of each evaporation component.
As K t and K ts are defined for a vegetated surface with a dry canopy (i.e., without interception) and K tot includes interception (see Sect. 1), two different SWAP runs are performed for each vegetation class, without and with interception.Throughout the paper, E t and E s are valid for conditions with a dry canopy, whereas E tot includes interception and its limiting effect on transpiration and soil evaporation.

Derivation of K t , K ts and K tot
We derive K t , K ts and K tot for each vegetation class (5×) and E ref method (5×) based on the synthetic E t , E s and E tot time series, and the equations given in Table 2. Similar to the calibration of E ref methods, K values are derived for a specific calibration period (e.g., 1906-1935).K values for each vegetation class and E ref method are derived as monthly averages over the calibration period., 1906-2013.This procedure corresponds to what is commonly done using the two-step approach, where the empirical parameters of an E ref method are fixed for the region in question, along with the corresponding K values.Here, we determine the deviation that is potentially introduced when this approach is applied outside its calibration range (period and region/site) in a changing environment, by comparing E t (E ref , K t ), E t & E s (E ref , K ts ), and E tot (E ref , K tot ) obtained by the two-step approach with the synthetic "observed" E t , E t & E s , and E tot series.

Calibration period and reference evaporation
Figure 4 shows the 30-year backwards-looking moving average E ref according to PM, Mak, PT, Har and BC, with the four latter models calibrated to fit the simulated E ref_PM for the first 30-year period, i.e., the calibration period 1906-1935.The minor differences seen between all 30-year mean E ref values during the calibration period (Fig. 4a, year 1935) indicate that each method was calibrated successfully.Using the calibrated equations, E ref 's are calculated for the period 1906-2013, i.e., also outside the calibration period.All empirical models are evaluated with respect to the physically based E ref_PM , which was also used when calibrating the empirical coefficients.The radiation-based methods, Mak and PT, deviate only slightly from PM on average, with no consistent bias (Fig. 4d), which can be explained by the relatively strong correlations between the trend in 30-year average E ref_PM (Fig. 4a) and R s and R n (Fig. 4f and g); Pearson's r equals 0.85 and 0.70 for E ref_PM vs. R s and E ref_PM vs. R n , respectively.The temperature-based methods, Har and BC, deviate systematically from PM, which can be explained by the relatively weak correlation between the trend in 30-year average E ref_PM (Fig. 4a) and T (Fig. 3b); Pearson's r equals 0.17 for E ref_PM vs. T .This shows that, as the energy used for evaporation mainly comes from direct solar radiation and, to a lesser extent, from air temperature, temperature-based models fail if radiation and temperature trends are weakly correlated (see Fig. 3; Pearson's r for 30year average R s vs. T equals 0.22).Additionally, Har and BC deviate in different directions (Fig. 4b and d): Har consistently underestimates E ref , whereas BC consistently overestimates E ref .This opposite trend is related to the decreasing trend in T max − T min (used in Har), while T increases (Fig. 3).All four empirical models are unable to reproduce the extreme high evaporation values predicted by PM, especially Har and BC (Fig. 4c).The deviations from E ref_PM are considerably larger for individual years (Fig. 4d) than for the 30-year moving average (Fig. 4b).
In practice, 30-year observed time series of evaporation are rarely available for calibration.Therefore, Fig. 5 shows the effect of calibration period length on estimates of E ref for the current climate .This effect is expressed as the maximum absolute deviation of the 30-year average with respect to E ref_PM .Figure 5 was compiled by first calibrat-  It should be noted that only deviations in 30-year averages are shown for varying calibration period lengths; deviations in the underlying yearly values are larger, as indicated by Fig. 4d.Additionally, the amplitude of the deviations shown in Fig. 4b and d would increase when calibrated using periods shorter than 30 years (Fig. 5).

Crop factors and potential evaporation components
Figure 6 gives monthly average synthetic evaporation components E t , E s and E tot , which were used to derive monthly crop factors (three methods: Table 2) for five vegetation classes and five E ref methods (Table 1), i.e., 3×5×5 crop factors for each calibration period.In contrast to the reference grass surface, the grassland of Fig. 6 does not fully cover the soil, which results in higher E s and lower E t .Fig-ure 7 shows simulated E t (the two-step approach) for the period 1906-2013, using empirical coefficients for each E ref method and matching K t values, all calibrated on the period 1906-1935.The general patterns in E t correspond to those of E ref (Fig. 4b), meaning that the deviations introduced by the two-step approach are mainly determined by the empirical coefficients in the E ref methods.
The deviation introduced for E t derived from E ref_PM and K t_PM is relatively minor compared to what is found for the empirical E ref methods, especially for short vegetation.Apparently, E ref_PM follows the trend in E t (also obtained using the Penman-Monteith equation, but parameterized for each vegetation class, Sect.2.4), and the ratio of E t and E ref_PM , used to estimate K t_PM , changes little with time for short vegetation.More significant effects of K t_PM are seen for taller vegetation, as climate-induced temporal changes in E ref_PM show a height-dependent non-linear relation to changes in E t (Allen et al., 1998).Therefore, the deviation introduced when using E ref_PM is larger for forests than for the short vegetation classes (Fig. 7).Similar to what is seen in Fig. 4, the deviations for individual years can be considerably larger than the climatic averages.
Figure 8 shows the sensitivity of crop factors, K, with respect to the calibration period length for heather and spruce forest.The variation in K decreases with increasing calibration length for all methods, but, except for E ref_Mak , the variability of K values for the empirical E ref methods is larger than for E ref_PM .These differences are especially notable for forests and illustrate that a poor relationship between the E ref method and the synthetic potential evaporation component (Sect.2.4) is compensated for by K values that thus show a larger variation over time.Remarkable is the low variability in K tot values for heather (Fig. 8c), which indicates that the variability seen for K t (Fig. 8a) is reduced by interception.However, for spruce forest, for which interception is much more dominant, interception increases the variability in K tot .
From Fig. 8a and d, it can be concluded that the deviations shown in Fig. 7 will increase when shorter calibration periods are used, irrespective of the applied E ref method.Figure 9 shows the effect of period (years and length) on the maximum absolute deviation made by the two-step approach for each E ref method and for K t , K ts and K tot .Figure 9 confirms that deviations in climatic average evaporation components obtained by applying the two-step approach will generally increase when shorter calibration periods are used.Additionally, Fig. 9 illustrates that deviations are (i) larger for tall vegetation than for short vegetation and (ii) larger for K tot than for K t and K ts for vegetation classes with high inter-ception, as is the case for spruce forest.The large deviations for E tot for spruce forest confirm the remark by De Bruin and Lablans (1998) that, for wet forest evaporation, the crop factor approach will not be sufficient.Nevertheless, when derived for a sufficiently long time series, the deviations level out and there is no detectable bias.

Propagation of dimming/brightening periods
In contrast to Fig. 9, which only shows the maximum absolute deviations for the 30-year average potential evaporation components for the years 1984-2013 as a function of the calibration period length, Fig. 10 includes the results of all underlying deviations for heather and spruce using E ref_PT .Figure 10 demonstrates that climate variability induces systematic overestimation or underestimation of the calculated potential evaporation components, depending on the calibration period used.The sign of the error strongly varies with the calibration period, and the inclusion of a single anomalous year can change the sign of the error.Figure 10 further shows that anomalous years or multiannual climate patterns tend to propagate considerable errors outside the calibration period to the current climate .The patterns of deviations from the synthetic "observations" show similarities to the global dimming and brightening periods (see Sect. 1): the first warming period (about 1925-1945) causes a systematic overestimation up to calibration lengths of 30 years, although specific calibration years may result in an underestimation for shorter calibration lengths.The succeeding period of cooling leads to a systematic underestimation, while the second warming period (starting around 1975) results in a more variable pattern.The latter may be linked to the finding of Wang and Dickinson (2013) that, in contrast to the years until the 1970s, there is no significant relationship between variations in temperature and global radiation in following years.
The patterns are comparable for E t , E t & E s , and E tot , based on K t , K ts , and K tot , respectively, for short vegetation classes.However, for tall vegetation classes with high interception capacity, e.g., spruce (Fig. 10f), using K tot results in a noisier pattern due to specific years of high precipitation.Additionally, including interception may shift the sign of the error.

Temporal robustness in hydrological modeling
In this paper, we systematically unraveled and quantified how empirical coefficients in the two-step approach affect estimates of potential evaporation.We used the past century's time series of observed climate containing nonstationary signals of multi-decadal atmospheric oscillations, global warming, and global dimming/brightening (Suo et al., 2013;Stanhill, 2007;Wild, 2009;Wild et al., 2005) to evaluate the sensitivity of the two-step approach to both the length of the reference calibration period and the reference years.To this end, we calibrated the empirical coefficients of the twostep approach based on different periods and showed that using the thus obtained empirical coefficients outside their calibration range may lead to systematic differences between E ref methods, and to systematic errors in estimated potential E components.The signs of the errors for calculated climatic average evaporation components differ, depending on the E ref method used, and on the specific period (length and years) of calibration.Hooghart and Lablans (1988) stated that, for the two-step approach, the correctness of empirical coefficients for the estimation of E ref is of minor importance, as these are compensated for by K.However, here we have shown that while this may be true within the calibration period, this statement does not hold when extrapolating.As potential evaporation is a key input in hydrological models, input errors will propagate to estimates of related processes, such as the soil moisture budget, droughts, recharge and groundwater processes.These results are important because the two-step approach, including extrapolating empirical coefficients, is frequently applied in hydrological modeling studies, as mentioned in the introduction.Ehret et al. (2014) state that "in hydrological modeling, it is often conveniently assumed that the variables presenting climate vary in time while the general model structure and model parameters representing catchment characteristics remain time-invariant."There is a clear parallel of this statement with the approach presented herein where meteorological conditions vary in time, while climatedependent (empirical) parameters are often fixed values.
For the Netherlands, published crop factors (K tot for E ref_Mak with coefficients C 1 = 0.65 and C 0 = 0) are 1.0, 0.8, 1.1, 1.2 and 1.3 for grass, heather, deciduous forest, pine forest and spruce forest, respectively (Spieksma et al., 1996).Values from our study for, e.g., the mean 30-year K tot values, were 0.8, 1.03, 1.02, 1.07 and 1.25, respectively.Problems arise in the applicability of the published and frequently re-used crop factors, as the climatic conditions used for fitting are rarely documented.The analysis herein provides insight into the uncertainty ranges that could be expected using published empirical coefficients (Fig. 8) and their potential impact on simulated potential evaporation components.
This study has shown that potential evaporation estimates are most accurate and stable with a long calibration period.However, even when using a long observed record, estimates may include errors due to the assumption of constant empirical coefficients in a non-stationary climate; i.e., the calibration period is not representative of current conditions.Evaporation estimates outside the calibration period are even more susceptible to non-stationarity when the calibration period is relatively short, as with areas where observed evaporation data are sparse.Finally, estimating evaporation based on published typical values without calibration is most susceptible to errors, as these parameters are typically global averages but also contain the non-stationary reference period issues identified in this paper.To remove bias by systematic input errors, as in, e.g., evaporation, it is common practice to tune models by calibration (Ehret et al., 2014, and references therein).Although model calibration may compensate for biased input data, resulting in more accurate results and comparable model efficiencies, such calibration limits the general applicability of models when the bias is not constant over time (Andréassian et al., 2004).Figures 4 and 7 show that such non-constant bias occurs for both E ref and potential E estimates, thus limiting their application outside the calibration range.
Although extrapolations to future periods will always include uncertainty, it is important to quantify and limit this uncertainty.This analysis provides such a quantification, identifying the sensitivity of evaporation estimates to extrapolation and representing information on ways to reduce potential errors; e.g., Fig. 7 quantifies the error that is made in extrapolations.A similar modeling approach as presented here could be used to identify climate-induced changes in potential evaporation components.Moreover, such information can be used to reduce potential errors, if the errors can be explained from, e.g., differences in climatic conditions between the periods of calibration and application.We did so for the errors identified for potential evaporation in Fig. 7, and found that for all E ref methods except for Har, and for all vegetation classes, the error correlates well with differences in relative humidity (R 2 > 0.78) (Fig. 3).This makes the errors predictable, and provides opportunities to correct for them.
Although we advocate using process-based evaporation simulations where possible, it should be emphasized that the two-step approach still can be a valuable concept, especially in regions with limited data availability.However, some considerations may strengthen the robustness of the two-step approach.First, our results show that applying radiation-based methods is preferred over temperature-based methods.Second, ideally, independent of the type of empirical method used, the coefficients should be recalibrated against measurements.Third, as such recalibration will practically often not be feasible, we advocate identifying changes in climatic conditions for the period of application and the calibration period, and quantifying, using a sensitivity analysis, how they may impact potential evaporation estimates.This provides uncertainty ranges that advance the interpretation of modeling exercises.

Implications for climate change impact studies
Poor transferability of parameter estimates made during calibration can have potentially large impacts for studies under non-stationary conditions (Coron et al., 2014), e.g., for climate change impact studies (Bormann, 2011;Karlsson et al., 2014).To improve the temporal robustness of hydrological modeling, Coron et al. (2014) propose, while adding it to the framework of the new "Panta Rhei" IAHS scientific decade (Montanari et al., 2013), to especially advance our abilities to estimate temporal variations in evaporation fluxes.This study contributes to this larger objective.
For climate change impact studies, applications of empirical models are particularly problematic, as empirical methods closely approximate observations of natural processes, but do not capture the underlying physics.When extrapolating to new climate regimes, these assumptions are not guaranteed to remain valid (Kay and Davies, 2008;Bormann, 2011;Arnell, 1999).Similar to our findings, simulating historic non-stationary climatic conditions, Kay and Davies (2008) demonstrate that E ref_PM and temperaturebased E ref methods give different projected evaporation estimates when applied to future climate model data.Additionally, Haddeland et al. (2011) show, using the WATCH climate forcing data (Weedon et al., 2011), that global hydrological models that differ in their choice of evaporation schemes show significantly different evaporation estimates.These large discrepancies in an important part of the water cycle may have a large effect on the modeled hydrological impacts of climate change and increase the uncertainty of impact estimates (Bormann, 2011;Kay and Davies, 2008;Haddeland et al., 2011).
To show the implications of using different empirical E ref methods in hydrological applications under recent climate change, without the need for numerous extensive model runs, we calculated the Standardized Precipitation and Evaporation Index (SPEI) for the period 1906-2013, with the empirical coefficients calibrated for the 30-year period 1906-1935.The SPEI (Beguería et al., 2014;Vicente-Serrano et al., 2010) is a commonly used meteorological drought index, which is a variant of the WMO-recommended Standardized Precipitation Index (SPI) (Guttman, 1999;Hayes et al., 2011;McKee et al., 1993).Unlike the SPI, which calculates precipitation accumulated over a period and then normalizes the accumulated value based on typical seasonal conditions, the SPEI instead normalizes the accumulated difference of the climatic water balance, defined as the difference between precipitation and E ref .This produces a time series of normalized values, such that an SPEI of 0 refers to typical conditions, an SPEI of −1 refers to a condition where (P − E ref ) is 1 standard deviation drier than typical, and vice versa for +1.For this example, the SPEI6 was calculated, normalizing the climatic water balance summed over the preceding 6 months, following the fitting procedures outlined in Stagge et al. (2015) and Gudmundsson and Stagge (2014).
Figure 11 shows the results of this analysis, with the assumed accurate SPEI6, based on E ref_PM , shown at the top, and the difference between this and SPEI6 for all other empirical reference evaporation models shown below.As with the results of E ref simulations, the Mak and PT models are closest to the observed signal (differences in the range from −0.2 to 0.2), while the Har and BC models produce greater variability ( SPEI6 = −0.5 to 0.5).Differences in this magnitude can make a large difference when interpreting drought risk.For example, the year 1947 produced a severe drought at the De Bilt site (SPEI6 = −2.2);however, all other methods underestimate E ref , producing SPEI6 values between −1.5 and −1.9.This in turn changes the interpretation of this drought from an event expected to occur once every 72 years to an event expected to occur once every 15-35 years.This is a significant difference in risk level that can be attributed to differences among the evaporation methods and a potentially non-representative calibration period.SPEI sensitivity to the E ref method is analyzed in greater detail in Stagge et al. (2014).

Conclusions
In this study, we thoroughly analyzed the robustness of the two-step approach to simulate potential evaporation.We quantified the magnitude of the systematic errors that may be introduced when empirical coefficients are applied outside their calibration period, depending on differences in climate, the period, and the length of the calibration period.Our hydrological models contain, to varying extents, empirical equations and coefficients, which limits their general appli-cability, and the estimation of potential evaporation is closely linked to climate variability.With our analysis, we want to raise awareness and provide a quantification of possible systematic errors that may be introduced in estimates of potential evaporation and in hydrological modeling studies due to straightforward application of (i) the common two-step approach for potential evaporation specifically, and (ii) fixed instead of time-variant model parameters in general.
The Supplement related to this article is available online at doi:10.5194/hess-19-997-2015-supplement.

Figure 1 .
Figure 1.Yearly and 30-year moving average E ref according to Penman-Monteith for De Bilt, the Netherlands, and projected E ref values for the period 2036-2065.Projections are based on national climate scenarios (Van den Hurk et al., 2006) developed by the Royal Netherlands Meteorological Institute (KNMI).Two of the scenarios have been found to be most likely (Klein Tank and Lenderink, 2009) and are presented here: scenario W (blue line) and W+ (red line).Both comprise a +2 K global temperature increase, but with, respectively, unchanged and changed (+) air circulation patterns in summer and winter.The scenarios were used to transfer the climatic conditions of 1976-2005 to the period 2036-2065 (Van den Hurk et al., 2006).

Figure 2 .
Figure 2. Flow chart of the methodology followed.E ref_x = E ref of empirical methods Mak, Har, PT and BC; R = number of reference evaporation methods, V = number of vegetation classes, and C = number of evaporation components.For the explanation of the other abbreviations, we refer to the introduction.
We use meteorological data from De Bilt, the Netherlands, covering the period 1906-2013, which were provided by the Royal Netherlands Meteorological Institute (KNMI).De Bilt (longitude = 5.177 • E, latitude = 52.101• N, altitude = 2 m) is the main meteorological site of the KNMI, located in the center of the Netherlands.Daily records are available for minimum and maximum temperature, sunshine hours, wind speed, and precipitation from 1906 onwards, and for global radiation from 1957.The observations are continuous, except for April 1945, where values from April 1944 are used instead.All required input variables are calculated for the period 1906-2013 following Allen et al. (1998).Observed global radiation was used to derive the Angstrom coefficients needed to calculate daily global radiation (Allen et al., 1998) from 1906 onwards.For consistency, we only use these simulated values for further analysis; simulations agree very well with observations (1957-2013, R 2 adj = 0.96).Wind speed, measured at different heights, was scaled to the reference height of 2 m following Allen et al. ( e s − e a ) = saturation vapor pressure deficit (kPa), T , T max and T min = mean, maximum and minimum temperature [ • C], p = mean daily percentage of annual daytime hours [%].C 1 , C 0 , α , β, a, b, c, d are the coefficients adjusted in the calibration.(Solomon et al., 2007) and a pattern of sunshine duration consistent with dimming and brightening for northwestern Europe identified bySanchez-Lorenzo et al. (2008).

Figure 4 .
Figure 4. E ref values for five methods for the period 1906-2013.Each empirical method calibrated on daily E ref_PM for the period 1906-1935.(a) 30-year moving average E ref , and (b) deviation of 30-year moving average E ref from E ref_PM .(c) Yearly variability in E ref for each method.(d) Yearly deviation of each E ref with E ref_PM .The boxplots show the minimum, first quartile, median, third quartile, maximum and outliers of the annual data.

Figure 5 .
Figure 5. Maximum absolute deviation in 30-year average E ref from 30-year average E ref_PM for the period 1984-2013, as a function of the length of the calibration period.

Figure 6 .
Figure 6.Illustration of synthetic "observed" potential evaporation components simulated with SWAP.The lines give monthly means over the period 1906-2013.E t and E s hold for a vegetation stand with a dry canopy only; E tot includes interception.

Figure 7 .
Figure 7. E t calculated for each vegetation class using each E ref method and matching K t calibrated on the 30-year period 1906-1935.Presented are 30-year moving averages in mm (top panels), and deviations with the synthetic E t for both the 30-year moving averages (center panels) and annual values (bottom panels).

Figure 8 .
Figure 8.Standard deviation (SD) for K t (a, d), K ts (b, e) and K tot (c, f) normalized to their mean values, for heather (a-c) and spruce (d-f), as a function of the length of the calibration period.K values are derived for each E ref method; K and E ref are calibrated on the same periods.

Figure 9 .Figure 10 .
Figure 9. Maximum absolute deviation with synthetic "observations" in mean E t (a, d), E t & E s (b, e), and E tot (c, f) for the period 1984-2013, for heather (a-c) and spruce (d-f), obtained by the two-step approach, as a function of the length of the calibration period.Presented as in Fig. 5, though using E ref and crop factors (K t , K ts and K tot ) to derive E t , E t & E s , and E tot .

Figure 11 .
Figure 11.SPEI6 time series with E ref based on PM (a).The subsequent figures show differences in SPEI with E ref based on Mak (b), PT (c), Har (d) and BC (e), calibrated on the period 1906-1935.
Allen et al. (1998)indicate that their crop factors should be multiplied by a factor of 1.1-1.3if interception, due to sprinkling irrigation for example, is involved (i.e., if interception is not simulated explicitly).This indicates that E i could significantly affect potential evaporation from a vegetated surface.As E i is largely driven by precipitation, a term that is generally not incorporated into E ref methods, it has already been stated that the crop factor approach only makes sense in times of drought, when interception does not contribute to the total evaporation (De Bruin Allen et al. (1998), as crop factors from the latter are by definition applied to correct for E t + E s , or for E t only (Allen, Hydrol.Earth Syst.Sci., 19, 997-1014, 2015 www.hydrol-earth-syst-sci.net/19/997/2015/ 2000).However,

Table 1 .
Equations used to calculate daily values of reference evaporation E ref (mm day −1 ) for the period 1906-2013 at De Bilt meteorological station.
, we consider E ref_PM as the best approximation of E ref .In order to reduce any systematic differences between E ref values, we estimate the empirical factors C 1 , C 0 , α , β, a, b, c, d of the other four E ref methods (Table ; Sect.2.6).Standard values for the vegetation classes and their schematization are taken from the National Hydrologic Instrument (NHI, http://www.nhi.nu/nhiuk.html;De Lange et al., 2014) of the Netherlands.The vegetation schematization is constant throughout the period 1906-2013; i.e., dynamic vegetation is not simulated.We consider five natural vegeta- Hydrol.Earth Syst.Sci., 19, 997-1014, 2015 www.hydrol-earth-syst-sci.net/19/997/2015/

Table 2 .
NHI (2008)used to calculate monthly average crop factors for each vegetation class andE ref method.=Et/ErefK ts (E ref )Crop factor for potential transpiration + soil evaporationK ts = (E t + E s ) /E ref K tot (E ref )Crop factor for total evaporation K tot = E tot /E ref tion classes: grassland (height = 0.5 m and no full soil cover, i.e., not to be confused with the reference grass), heather, deciduous forest, pine forest and spruce forest.Parameters are chosen followingNHI (2008)and are provided in the Supplement.It should be noted that we do not discuss the exact validity of the parameter values used, as we are only concerned with evaporation sensitivity to non-stationary climate within the range of typical vegetation.

components using the two-step approach
Potential evaporation components, E t , E t plus E s (hereafter E t & E s ) and E tot , for each vegetation class and method, are calculated from the daily E ref values by multiplying it by the corresponding K values, K t , K ts and K tot , respectively, for each vegetation class.Using these three definitions of crop factors separately allows quantification of the error that is made by correcting for each evaporation component.E ref estimates that are calibrated for a specific period, combined with K values determined for the same period, are used to calculate daily values of E t , E t & E s , and E tot for the full period, i.e.