Evaporation from a large lowland reservoir – (dis)agreement between evaporation methods at various timescales

Abstract. Accurate monitoring and prediction of surface evaporation becomes more crucial for adequate water management in a changing climate. Given the distinct differences between characteristics of a land surface and a water body, evaporation from water bodies require a different parameterization in hydrological models. Here we compare six commonly used evaporation methods that are sensitive to different drivers of evaporation, brought about by a different choice of parameterization. We characterize the (dis)agreement between the methods at various temporal scales ranging from hourly to 10-yearly periods, and we evaluate how this reflects in differences in simulated water losses through evaporation of lake IJsselmeer in The Netherlands. At smaller timescales the methods correlate less (r = 0.72) than at larger timescales (r = 0.97). The disagreement at the hourly timescale results in distinct diurnal cycles of simulated evaporation for each method. Although the methods agree more at larger timescales (i.e. yearly and 10-yearly), there are still large differences in the projected evaporation trends, showing a positive trend to a more (i.e. Penman, De Bruin–Keijman, Makkink and Hargreaves) or lesser extent (i.e. Granger–Hedstrom and FLake). The resulting discrepancy between the methods in simulated water losses of the IJsselmeer region due to evaporation is ranging from −4 mm (Granger–Hedstrom) to −94 mm (Penman) between the methods. This difference emphasizes the importance and consequence of the evaporation method selection for water managers in their decision making.


The resulting discrepancy between the methods in simulated water losses of the IJsselmeer region due to evaporation is ranging from -4 mm (Granger-Hedstrom) to -94 mm (Penman) between the methods. This difference emphasizes the importance and consequence of the evaporation method selection for water managers in their decision making.

Introduction
Surface evaporation is the second largest component (after precipitation) of the global hydrological cycle and it couples the 15 Earth's water and energy cycle (Beer et al., 2018). This hydrological cycle is projected to intensify in the future as a consequence of global warming (Huntington, 2006;Oki and Kanae, 2006;Jung et al., 2010). This intensification will result in higher and more extreme precipitation and enhanced surface evaporation, which can threaten our drinking water resources and enlarge the vulnerability of natural ecosystems and agricultural production (Arnell, 1999;Vörösmarty, 2000;Middelkoop et al., 2001;Verburg and Hecky, 2009;Trenberth et al., 2014). To avoid or alleviate negative consequences of droughts and floods, and to 20 guarantee ample access to high quality fresh water resources, an efficient water management system is required. In a changing climate, the ability to accurately monitor and predict surface evaporation therefore becomes even more crucial for adequate water management, which is particularly essential in densely populated and hydrologically sensitive areas such as the low-lying Netherlands.
water-limited. In case of open water bodies wind speed was found to be the most important driver of evaporation according to Granger and Hedstrom (2011). Despite the fact that water bodies comprise a large share of the total area in The Netherlands (∼17%, (Huisman, 1998)) and therefore form a crucial element in our water management system, in the past only few studies in The Netherlands have focussed on open water evaporation (Keijman and Koopmans, 1973;de Bruin and Keijman, 1979;Abdelrady et al., 2016;Solcerova et al., 2019). 15 During a short measurement campaign in Aug/Sept 1967 the vertical gradients of vapour pressure and temperature were measured at the water surface, and at 2 and 4 m above former Lake Flevomeer (Wieringa, 2019). It is assumed that the sign of these gradients can be used as a proxy for the sign of the turbulent exchange between the surface and the atmosphere. This proxy is used because there are no direct measurements available of E water during the 1967 measurement campaign. To analyse the difference between turbulent exchange above land surface and open water the observations of this measurement campaign 20 above Flevomeer were compared to observations conducted above a grassland in Cabauw. The results show distinct differences despite the similar behaviour of global radiation, which is generally regarded as the main driver of evaporation. It demonstrates that both the temperature and vapour pressure gradients above the lake are positive throughout most of the day and night. This refers to unstable situations which become strongest during the night when the air cools faster than the water surface. The continuous positive gradient of vapour pressure above the lake indicates that evaporation continues during the night, which is 25 in contrast to what is found above land. Another distinct difference between land and lake surfaces can be found in the timing of the peak of the temperature and vapour pressure gradients, which are a few hours earlier above land than over the lake. From this experiment it directly becomes clear that we can distinguish a difference in behaviour of the turbulent exchange above land compared to over a lake, which therefore should be acknowledged in hydrological models.
Water surfaces have different surface properties than land surfaces which leads to a difference in behaviour of the turbulent 30 exchange. One important difference is that solar radiation is able to penetrate a water surface, in contrast to a land surface which is not transparent. Therefore the energy balance is not preserved at the water surface, which entails heat storage in the water body. As a consequence of this thermal inertia, it was found that on shorter timescales the turbulent fluxes are not directly coupled to global radiation, and that the diurnal cycle of turbulent fluxes is smaller (Kleidon and Renner, 2017) and shifted in time compared to land surfaces (Venäläinen et al., 1999;Blanken et al., 2011). Blanken et al. (2000) found, similar to Granger and Hedstrom (2011), a relation between evaporation and the product of horizontal wind and vapour pressure gradient on daily timescales. In contrast, Nordbo et al. (2011) found the vapour pressure deficit alone to be more strongly correlated to evaporation rather than the product of wind and vapour pressure deficit. On intraseasonal timescales, Lenters et al. (2005) found evaporation to be a function of the thermal lag between temperatures of air and water. Resulting from the difference in properties and consequently its behaviour, the simulation of evaporation from lakes requires a different parameterization than 5 land surface evaporation in current traditional hydrological models, and with that accounting for the relevant driving processes at the timescale of interest.
In hydrological modelling, surface evaporation is a key source of uncertainty when it comes to its short-term prediction and long-term projection. Depending on the parameterization strategy of a model to capture the process of evaporation at the relevant timescale, different decisions are made to parameterize evaporation. In the past, most studies have focussed on the 10 parameterization of open water evaporation on weekly or even longer timescales (Finch, 2001;Zhan et al., 2019), where often potential evapotranspiration (PET) is improperly used as a proxy for lake evaporation neglecting heat storage (La, 2015;Duan and Bastiaanssen, 2017). However, parameterization at the hourly and daily timescales have remained underexplored. Melsen et al. (2018) recognized that through this parameterization, additional uncertainty is introduced to the model results. As a consequence, the effect of the parameterization strategy on the short-term prediction and long-term projection of evaporation can lead to profound differences in model results and therefore in (local) water management decisions. Our aim therefore is to study the effect of various parameterizations of evaporation on shorter-term local water management, and to investigate how these parameterization decisions affect long-term hydrologic projections. To this end, we compare to what extent six commonly 5 used evaporation parameterizations (dis)agree at various temporal scales, and look at the impact of the different methods under projected climate change.
2 Methods and data

Evaporation methods
There is a wide variety of methods that are used to estimate evaporation from water bodies. They can be subdivided into five 10 major types of approach, namely the pan method, water balance method, energy budget approaches, bulk transfer models and combination methods (Finch and Calver, 2008;Abtew and Melesse, 2013). In this study, we will systematically compare six methods that are commonly used to estimate surface evaporation and that are sensitive to different forcings (Table 1, see Table   A1(b) for the explanation of all variables).
The Penman method (Penman, 1948) has been chosen because it was originally developed for wet surfaces and because it 15 is the most commonly used method globally to estimate evaporation. The method developed by de Bruin and Keijman (1979) finds its origin in Penman's method, but has been based on observations done at lake Flevomeer, The Netherlands, where we will focus on in this study as well (see Sect. 2.2). Makkink's method (Makkink, 1957) is an even more simplified derivation of Penman's equation and is currently used in operational hydrological models in The Netherlands, which is why it is included in our comparison. To be able to compare it to methods that use other types of forcing, the methods of Granger and Hedstrom 20 (2011), using wind speed as main forcing, and the Hargreaves (1975) method, using solely air temperature, are incorporated. To also include a more physical-based method, FLake (Mironov, 2008) is used in this study. Below we will give a short description of the models that are used, and in Appendix A a more detailed description is given of the methods, including the assumptions that are made, and the input data that is needed for them (see Table A1).

25
The Penman method is a combination equation and it is based on the two fundamental factors that determine evaporation, namely: available energy and atmospheric demand (see Tab. 1). The effect of these factors combined is captured by the turbulent transfer and energy balance equations for a wet surface (Brutsaert, 1982;Tanny et al., 2008;Moene and van Dam, 2014).
In Penman's derivation, it was assumed that the available energy (Q n ) is equal to the net radiation (R n ), assuming other terms of the energy budget equation into the water body to be negligible, e.g. the water heat flux (G). This water heat flux is difficult to 5 measure, especially at smaller timescales, and is therefore often ignored (van Emmerik et al., 2013). However, for water bodies it is essential to account for heat storage changes as its storage capacity is significantly larger compared to land surfaces. Not accounting for heat storage changes can lead to (i) overestimation of E water in spring (Northern Hemisphere) when incoming radiation is used to warm up the water body instead of immediate release through E water , and (ii) underestimation of E water during autumn (Northern Hemisphere) when additional heat that was stored in the water body is released through E water . The Penman equation requires standard meteorological variables (net radiation, air temperature, wind speed and humidity) at one height.

De Bruin-Keijman
A similar expression to determine reference evaporation was proposed by de Bruin and Keijman (1979). They applied the Priestley-Taylor method to former lake Flevomeer in The Netherlands. The Priestley-Taylor method is a derivation of Penman's 15 method where the aerodynamic term in Penman's equation was found to be a constant proportion of the radiation term (Priestley and Taylor, 1972). de Bruin and Keijman (1979) adjusted this empirical relation to determine evaporation, namely that the aerodynamic term is linearly proportional to the radiation term with an additional constant added to that. These two parameters were found to vary during the year, but they are mostly taken as constants.

20
Another method that is based on Priestley-Taylor is the method of Makkink, which was developed for grassland areas in summertime in The Netherlands (Makkink, 1957). It only requires observations of global radiation and temperature, since it assumes that the water heat flux can be neglected with respect to net radiation, and that net radiation is about half of global radiation. The first assumption is only valid for land surfaces, and the second assumption considers average summers in The Netherlands. Makkink is currently used in operational hydrological models in The Netherlands and is applied to open water 25 using a correction factor.

Granger-Hedstrom
None of the methods described above include wind explicitly, although wind speed is recognized to be an important driving factor for evaporation. Granger and Hedstrom (2011) found the most significant correlation to exist between E water and wind speed at hourly timescales, and no direct relation was found with net radiation. The authors developed a simple model to quantify E water in which the key variables and parameters are: wind speed, land-water contrasts in temperature and humidity, and downwind distance from the shore.

Hargreaves
The Hargreaves method is an example of a simple and highly-empirical temperature-based model (Hargreaves, 1975;Hargreaves and Allen, 2003). Global surface radiation is frequently not readily available, therefore the Hargreaves method uses the 5 extra-terrestrial radiation, which depends on the angle between the direction of solar rays and the axis perpendicular to the atmosphere's surface, to simulate its seasonality. Furthermore it incorporates the range in maximum and minimum temperature as a proxy to estimate the level of cloudiness. The method has originally been designed for land surfaces at longer temporal scales and does not account for lake heat storage. However, previous studies have also shown that temperature-based model can perform reasonably well over lake surfaces at larger timescales (Rosenberry et al., 2007).

FLake
A more physical-oriented model is FLake which has been developed by Mironov (2008). This one-dimensional freshwater model is designed to simulate the vertical temperature structure and the energy budget of a lake. It consists of an upper mixed layer, of which the temperature is assumed to be uniform, and an underlying stratified layer of which the curve is parameterized using the concept of self-similarity (assumed-shape) (Kitaigorodskii and Miropolskii, 1970). The same concept 15 is used to represent the thermal structure of the ice and snow cover and of the thermally active layer of the bottom sediments.
Hargreaves and Allen (2003) FLake A two-layer parametric fresh water Mironov (2008) model. Capable to predict the vertical temperature structure.

Study area
This research study focusses on lake IJsselmeer, which is the largest fresh water lake in The Netherlands, bordering the provinces of Flevoland, Friesland, and Noord-Holland. In the north IJsselmeer is closed off from the Waddenzee by the Afsluitdijk embankment and in the southwest by the Houtribdijk embankment. The lake receives its fresh water supply (∼340 m 3 s −1 ) from the river IJssel and from the neighbouring polder systems, whereas the main outflow of the lake occurs at the 5 sluices of the Afsluitdijk under gravity. Covering an area of 1100 km 2 with an average depth of 5.5 m, the IJsselmeer can be considered a large shallow lake. The lake has an important hydrological function in the low-lying Netherlands both in flood prevention and fresh water supply for agricultural and drinking water purposes.

Data input sources
The models were forced with observed historical hourly data , as well as with simulated 3-hourly future climate 10 projections (2019 -2100) resulting from a regional climate model (RCM). In this study we systematically compare the models; therefore, we chose to give preference to a long-term dataset of observed meteorological variables above land, rather than a shorter-term dataset of the same variables in more close proximity to lake IJsselmeer. Following from this, we used the longterm hourly meteorological data observed in De Bilt, The Netherlands, situated at approximately 50 km distance. Depending on the required data for each model (see Table A1), this includes air temperature (T air ), global radiation (K in ), air pressure 15 (P), wind speed (u), wind direction, relative humidity (RH) and cloud cover. Furthermore, water surface temperature (WST) is required as a variable in the Granger-Hedstrom model, as well as to calculate outgoing longwave radiation. WST is not operationally measured at a regular base in The Netherlands, however the Dutch water authority (RWS) measures hourly water temperature at a depth of about 1 meter in the IJsselmeer since 2014. Another source of WST measurements is an experiment done in 1967 (Keijman and Koopmans, 1973;Wieringa, 2019) in former lake Flevomeer before that part of lake IJsselmeer was reclaimed. This experiment includes four weeks of data of amongst others WST, average water temperature, and T air and vapour pressure at 2 and 4 m height. Furthermore, FLake generates WST simulations as well. Remotely sensed satellite products inferring WST were not considered because of its low temporal resolution, but will be used in upcoming studies to infer the spatio-temporal distribution of E water . Comparing the different sources (not shown) and considering the availability of the data, it has been chosen to use the FLake simulations of WST for further analysis. To quantify the (dis)agreement between the models on the long-term projection of E water , we used climate projections generated with the RCM RACMO2 (van Meijgaard et al., 2008) driven by the global climate model (GCM) EC-EARTH 2.3 (Hazeleger et al., 2012). This long-term simulation covers the period 1950 -2100 and consists of 16 ensemble members at a spatial resolution of 0.11 • (∼ 12 km) available at 3-hourly timesteps (Aalbers et al., 2017). Each member of the ensemble has a slightly different atmospheric initial state perturbed in 1850, and the members can thus be considered as independent realiza-10 tions. EC-EARTH was forced with historical emissions until 2005 and future projections (2006 -2100) were generated using a substantial greenhouse gas emission scenario (RCP8.5). We used a grid cell representing location De Bilt. Direct evaporation observations from an eddy-covariance instrument at 10-min time resolution made in Cabauw, The Netherlands, from 1986 -2018 (Beljaars and Bosveld, 1997) are used to validate the climate change signal of E and its sign. 15 Methods that are sensitive to different types of forcings show a distinct diurnal signal of simulated evaporation. A wide variety of evaporation methods use K in as a driving force to simulate the diurnal cycle of evaporation, partly because K in affects temperature, vapour pressure deficit (VPD) and wind speed, which are all driving forces of evaporation. Therefore, an analysis of the diurnal signals and possible phase shift of these variables provide information on how these driving forces relate to each other and to the simulated evaporation. 20 We depict the diurnal cycle as function of time for one specific day where no clouds were present, as well as for the average diurnal cycle for the period 1960 -2018 using data from De Bilt, The Netherlands (see Fig. 2 and Section 2.3). Another way that we will use to analyse and illustrate the diurnal cycle and the relation to other evaporation methods, is to plot evaporation as a function of K in similar to Renner et al. (2018). When this cycle appears as a hysteresis loop, it indicates a phase difference between the addressed variable and K in . The size of the loop quantifies the magnitude of the phase lag, and the direction of the 25 loop denotes if the phase lag is negative or positive. The method of Hargreaves will be omitted in these diurnal cycle analyses.

Diurnal cycle
This method requires a temperature maximum and minimum (see Appendix A) over the considered time period. Therefore, it is not possible to calculate hourly simulated evaporation for this method with the given hourly observations of temperature.

Long-term trends
To explore how evaporation has been changing in the last decades (1960 -2018) and how it is projected to change in the future 30 (2019 -2100) in a changing climate, we will force the methods with historical observations and simulated future projections of meteorological variables resulting from the regional climate model RACMO, using a grid cell representing De Bilt (Section 2.3). The trend of the yearly averaged simulated evaporation rates will be detected based on weighted local regression, using the LOcally WEighted Scatter-plot Smoother (LOWESS) method (Cleveland, 1979). This method is applied to the observational data for the historical period, and to the average of the 16 RACMO members (Section 2.3) for the future period. Mean and standard deviation are calculated using the average of the RACMO members, where the standard deviation is calculated based on the de-trended time series.
2.6 Model (dis)agreement 5 A difference in sensitivity of the methods to drivers of evaporation can help to explain the (dis)agreement found in behaviour of simulated evaporation. To compare this sensitivity a perturbation of 1% will be imposed one by one on the daily observational data from De Bilt of four key variables, namely air temperature, solar radiation, wind speed and vapour pressure, without changing the other variables. For this, the percentage difference of simulated evaporation with perturbation of one of the variables, to the simulation of the baseline situation without any perturbation, will be calculated.

10
The correlations between the simulated evaporation using the different methods and various meteorological variables using data from De Bilt will be calculated based on Pearson's parametric correlation method, measuring the linear dependency between two variables. We will calculate the correlations for all the timescales ranging from hourly to 10-yearly period all based on hourly data. To ensure that the number of data points in the calculation is not influencing the results, we will apply bootstrapping to artificially create the same number of data points for each timescale. be seen that net radiation (R n ) peaks between 1pm and 2pm, which coincides with the peaks of evaporation simulated using the 20 radiation-based methods, i.e. Penman (PN), de Bruin-Keijman (BK) and Makkink (MK). Evaporation becomes negative during the night following net radiation in case of the first two methods. However, this is not realistic given the fact that the energy balance is not preserved at the water surface, which is assumed by these methods, and the energy released from heat stored in the water can exceed the net radiation at night. This will drive both a positive evaporation and sensible heat flux. Evaporation simulated with the wind-driven Granger-Hedstrom (GH) method is damped relative to the radiation-based methods, and it is 25 rather constant throughout the day, with a small peak following the signal of the wind speed. The signal of evaporation resulting from FLake (FL) is damped as well, and its peak is lagging 2 hours behind relative to R n , induced by a combination of K in , wind speed and T air . The total average diurnal evaporation is significantly lower than for the radiation-based methods. This can be explained by the remaining heat storage term of the energy balance in FL, which is used to warm up the water. This heat storage capacity is explicitly accounted for in FL. Hargreaves' temperature-based method (HA) is not depicted in the graph 30 because the method does not allow to calculate evaporation at hourly timescales given the available data (see Sect. 2.4). The diurnal range of the water temperature (T water ) is small, but shows a distinct peak four hours lagging behind the peak of R n , which indicates that during the day heat is stored in the water, which partly is released back during the night. The dotted line illustrating the average diurnal cycle of T water was artificially lowered by 7.5 degrees Celcius in this graph, merely to show its diurnal course and the timing of the peak.
5 Figure 4 illustrates the diurnal cycle of the (a) latent heat flux, (b) vapour pressure, and (c) temperature, which are plotted as a function of K in for a sunny day without any clouds in March 2007. It can be seen that evaporation simulated using the MK method is almost in phase with K in (phase lag is 7 min), which may be expected from a method that uses K in as the main driving force of evaporation. The largest phase lag (=114 min) results from the simulation with FL; this can be explained by the fact that this method explicitly accounts for heat storage changes, resulting in a less direct response of evaporation to K in .

10
The analysis of diurnal patterns thus shows that K in explains most of the variance in evaporation for the methods PN, BK and MK. However, the methods that incorporate heat storage capacity and/or WST as an explaining variable show less variation in evaporation during the day and therefore do not relate directly to K in .
Air temperature has a distinct counter-clockwise (CCW) loop which indicates that heat is stored in the lower part of the atmospheric boundary layer (ABL). However, the water surface temperature shows very little response to K in . This combination q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (b) Saturated vapour pressure Vapour pressure deficit Vapour pressure q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (c) Water surface temperature Air temperature T s − T a leads to another distinct hysteresis loop of the driving force of sensible heat flux, namely the gradient T s -T a , which has a clockwise (CW) direction. This implies that the latent heat flux is preceded by the sensible heat flux. The vapour pressure is rather constant throughout the day and does not display a clear hysteresis loop. However, the saturated vapour pressure, which is a function of T air displays pronounced CCW hysteresis. As a result, VPD also has a large CCW hysteresis loop, which is the key driver of E in the PN method.

5
Following the analyses of the diurnal cycle the occurence of storage and release of heat in the water body is demonstrated through the diurnal cycle of T water , and should therefore be accounted for by an evaporation method. FL is the method that mimics this thermal inertia effect most clearly in its diurnal cycle, which is also supported through its phase lag to K in . This makes the FL method the most realistic method to use at this timescale. The results from the GH and FL method differ from the other methods in the sense that the RACMO realizations do not result 10 in a significant trend in either the simulated historical or the projected future period, while in the historical period the simulation based on observations presents a positive trend when using the FL method. The latter can be explained by the fact that the FL method is most sensitive to K in , T air and wind speed, of which for both K in and wind speed the RACMO realizations do not show any trend (not shown here). The mismatch in the historical period for FL between the simulation based on observations and the simulations based on RACMO realizations can be attributed to the inability of RACMO to capture the trend in K in . 15 This inability will also influence the simulations made by the radiation-based models, which implies that for these methods simulated trends in the future period are likely to be stronger than demonstrated in figure 5. This will lead to even larger disagreement of simulated evaporation between the six methods.
Comparison between the methods of the overall positive trend that is projected to continue in the future period, reveals that not only the average simulated evaporation of all RACMO members (µ) will increase in the future for all the methods, ranging 20 from an increase of 0.02 mm d −1 (GH and FL) to 0.24 mm d −1 (PN), but also that the variability (σ) is projected to increase (see Fig. 5). This is also demonstrated in figure 6 for both the historical (orange) and future (blue) period. Here the spread is defined as the difference between the yearly average maximum and minimum value of the 16 RACMO members based on daily evaporation rates. For each method the average and the spread is projected to increase in the future, however, the rates at which this happens differ significantly between the methods. The largest difference in spread can be found between the GH method, 25 which has a spread of 0.32 mm d −1 , and PN method with a spread of 0.69 mm d −1 in the historical period, and increases to a range that varies from 0.36 mm d −1 (GH) to 0.94 mm d −1 (PN) in the future period. The methods that resemble each other most, and differ least in spread, are the PN and BK method during both the historical and future period.  The upper and lower set of panels depict the same information arranged per variable or per method, respectively. Most of the methods show the largest sensitivity to T air and K in . The purely temperature driven HA method is only sensitive T air . The wind-driven GH method is the method most sensitive to wind (1% difference), which is expected, but FL also is sensitive to wind showing a difference of 0.5% compared to the baseline situation without any perturbation. This implies that the behaviour    of these models would start to deviate more from the other methods if for instance the wind speed regime were to change in the future. In general FL is the most sensitive method, and therefore this method will start to differentiate more from the other methods when these four meteorological variables were to change in the future. and magnitude of the correlations. From our data the HA method can only be calculated at a daily timescale or coarser, hence the empty boxes at the hourly timescale. It becomes apparent that at hourly timescales there is a positive correlation between wind and simulated E ranging from 0.12 for the MK method, to 0.84 for the GH method. At larger timescales, this correlation becomes negative, except for the wind-driven GH method, and for FL it remains positive until a daily timescale. At the two largest timescales considered here ( Fig. 8e and f) the correlation with wind increases again, but these are statistically 10 insignificant except for the GH method. The correlation between net radiation and simulated E remains high at all timescales, except for the GH method. The latter method shows low correlations with the other methods and most meteorological variables, which mostly become statistically insignificant (indicated by the numbers in grey) from weekly timescales and larger. The correlation between net radiation and E simulated using FL increases steadily from 0.63 to 0.94 with increasing timescale.

Model (dis)agreement at various timescales
Regarding the correlation between E and T air , and E and VPD, the matrices show that the correlations increase steadily for 15 all evaporation methods, again with the exception of the GH method. However, it should be noticed that there is a dip in the correlation at yearly timescales, which is clearly visible in Figure 9. Furthermore, the correlation between the meteorological variables R n , T air and VPD was found to increase with increasing timescale.
The radiation-based methods agree highly with each other, as can be expected, over the entire range of timescales, with correlation values above 0.9. The GH method deviates most in terms of correlation from the other methods, which could 20 be attributed to the difference in sensitivity to wind speed. The HA method shows the largest increase in agreement to the radiation-based methods with increasing timescale. This can be attributed to the increasing correlation between temperature and net radiation, which are the main driving forces of the HA and the radiation-based methods, respectively. Comparing the average correlations between the models for each timescale reveals that the correlation increases with increasing timescale from 0.72 at hourly to 0.97 at 10-yearly timescale. This increase implies that the methods on average tend to agree more with each other at coarser timescales, which means that at smaller timescales the consequence of model choice becomes more apparent.

5
The average correlation at the daily timescale is lowered mainly as a consequence of the low correlations of the GH method to the radiation-based methods. It should be noted that insignificant values were not incorporated in the calculation of the average correlations.
In Fig. 9 the correlation (r) between simulated evaporation and VPD, T air and R n respectively are depicted to further explore the 10 (dis)agreement between the methods over the range of timescales. This correlation has been calculated based on observational data from De Bilt. In general there is a high correlation (r > 0.75) between simulated E and VPD, which is increasing with increasing timescales, up to correlations of 0.97 at the 10-yearly timescale. However, there is a small drop at the yearly timescale. We were not able to determine the exact cause of this, but it could be a result of cross-correlation in the data. The GH method deviates in behaviour compared to the other five methods, since it is the only method that does not explicitly include T air , and with that VPD through the saturated vapour pressure, which is a function of temperature. The GH method only includes a temperature gradient as additional information to wind speed. The correlation between simulated E and T air 5 increases from 0.57 at the hourly timescale up to 0.98 at the 10-yearly timescale, again with a small drop at the yearly timescale.
Furthermore, similar to the correlation with VPD, the GH method again demonstrates a different behaviour compared to the other methods. There is a strong increase of the correlation at the yearly and 10-yearly timescale for this method, which could be related to a change of the sign to positive values for the correlation between wind speed and T air at these timescales (see Fig.   8). The correlations to R n demonstrate clearly the resemblance between the radiation-based methods PN, BK and MK. The HA 10 method shows similar correlations as well, except for the largest two timescales where the correlation is lower. The FL method correlates better to R n with increasing timescale, confirming that at small timescales R n is not a direct driver of E as a result of heat storage. The correlation between R n and E simulated by the GH method is close to zero for daily timescales and larger.
However, at the hourly timescale the correlation is 0.32. This corresponds to the larger correlation between wind speed and R n at this timescale. 15 The general increase of the correlations with the meteorological variables that was found with increasing timescales, implies that ultimately these variables act as driving forces of potential evaporation. However, at the shorter timescales some of these methods may fail because the wrong controlling variables are used to simulate actual evaporation. This is especially the case for methods that use R n as a driver of evaporation and that omit the effect of thermal inertia as a result of heat storage.

Implications for water resources management
20 Table 2 presents for several years the water loss through evaporation expressed as water depths in mm for the IJsselmeer region.
Meteorological data from De Bilt was used to force the evaporation methods in case of the two average years and the two dry available in the IJsselmeer region, this stresses even more that the knowledge of the discrepancy between the methods is of great importance for water managers in their decision making.
This discrepancy is projected to increase in the future when comparing the results based on the average of the RACMO realizations for the years 2000 and 2100. The radiation-based and temperature methods show a growing water loss through evaporation, while for GH and FL the water loss remains similar. All methods show increasing evaporation rates leading to 5 lower water availability in the region, where the change in mean water depth from the year 2000 to 2100 is ranging from -4 mm (GH) to -94 mm (PN). This means that when water managers would rely on the PN method rather than on the GH method for instance, they will have to decide for instance to supply less water to the surrouding land surfaces.
The effect of the projected changes of evaporation in the future was found to be largest in summer (Fig. 10), which coincides with the period that has the highest evaporation rates. This summer period is therefore most critical regarding water resources 10 management, also considering the fact that more extreme periods of drought are expected to occur more frequently. During summer, evaporation is thus projected to increase, and at the same time it is likely that with increasing temperatures the river discharge from the IJssel will decrease (Görgen et al., 2010). So less water will be available during summer, while the water demand from the surrounding land surfaces will increase for agricultural purposes. The trade-offs water managers have to make therefore become very precarious, while knowing that the methods project significantly different results.  The aim of this study was to explore the effect of various conceptualizations of evaporation on shorter-term local water management and long-term hydrologic projections. We focussed on timescales ranging from hourly to 10-yearly periods, where we have (i) elaborated on the differences found in the diurnal cycle, but also (ii) quantified the (dis)agreement between the methods over the range of timescales in terms of correlations. And moreover, we studied how the evaporation rates according 5 to the various methods are projected to change in the future.
Our study characterized that there is a large effect of the type of forcing that is used by the evaporation method, especially on the simulated diurnal cycle. This difference is reflected in the total average daily evaporation, the timing of the evaporation peak and the day/night cycle. All radiation methods (PN, BK and MK) follow the diurnal cyle of the net radiation, where MK becomes zero at night, and PN and BK negative. The simulated evaporation resulting from the GH and FL methods are 10 more constant throughout the day and on average they show a continuation of evaporation during the night. Data of T water demonstrated heat storage during the day and release of heat during the night. FL is the method that mimics this effect of thermal inertia most clearly in its diurnal cycle. This makes the FL method the most realistic method to use at this timescale.
At larger timescales we found a disagreement between the methods in the trend of yearly averaged daily evaporation rates both for the historical (1960 -2018) and future period (2019 -2100). Although both the average and the variability of all methods 15 is projected to increase in the future, the rate at which this happens differs significantly between the methods. The average evaporation rate increase ranges from 0.02 mm d −1 (GH and FL) to 0.24 mm d −1 (PN).
A discrepancy at the whole range of evaluated timescales, from hourly to 10-yearly, is especially present between the radiation and temperature methods (PN, BK, MK and HA), and the wind driven (GH) and physical-based lake (FL) methods.
However, this disagreement generally decreases with increasing timescale, which is to be expected considering that ultimately 20 evaporation is constrained by the energy input in the system and the transport of water vapour. However, the difference at yearly timescales is still significant. This can be seen from the large differences found when translating the discrepancy of simulated evaporation into yearly water losses through evaporation for lake IJsselmeer. For an average year this can range between 417 mm (FL) and 817 mm (BK). During a sensitive dry year this discrepancy can result in water levels to be simulated to either rise or drop, solely depending on the evaporation method that is used. When considering future simulations, the projected change in 25 mean water losses through evaporation, expressed as water depths, ranges from -4 mm (GH) to -94 mm (PN) when comparing the year 2000 to 2100. This means that when water managers would rely on the PN method rather than on the GH method, they might have to decide to supply less water to the surrounding land in the future. Therefore, owing to the disagreement between the methods, it reveals that the choice of method is of great importance for water managers in their decision making. To gain confidence in which method is most reliable to use, now and in the future, we suggest to perform long-term direct observations 30 of E water at high temporal resolution. This will help to improve optimal parameterization of E water in hydrological models.

X)
2) Atmospheric stability can be described by land-lake temperature contrast 3) Non-linear over-all relation between Ewater )+ψq (z/L) T air , K in , u, RH, 1) Top layer of lake is assumed to be well mixed cloudiness, d lake , x fetch , 2) Thermocline in bottom layer is described with concept of self-similarity of temperature profile