Evaporation from a large lowland reservoir – (dis)agreement between evaporation models from hourly to decadal timescales

Abstract. Accurate monitoring and prediction of surface evaporation becomes more crucial for adequate water management in a changing climate. Given the distinct differences between characteristics of a land surface and a water body, evaporation from water bodies require a different parameterization in hydrological models. Here we compare six commonly used evaporation methods that are sensitive to different drivers of evaporation, brought about by a different choice of parameterization. We characterize the (dis)agreement between the methods at various temporal scales ranging from hourly to 10-yearly periods, and 5 we evaluate how this reflects in differences in simulated water losses through evaporation of lake IJsselmeer in The Netherlands. At smaller timescales the methods correlate less (r=0.72) than at larger timescales (r=0.97). The disagreement at the hourly timescale results in distinct diurnal cycles of simulated evaporation for each method. Although the methods agree more at larger timescales (i.e. yearly and 10-yearly), there are still large differences in the projected evaporation trends, showing a positive trend to a more (i.e. Penman, De Bruin-Keijman, Makkink and Hargreaves) or lesser extent (i.e. Granger-Hedstrom and FLake). 10 The resulting discrepancy between the methods in simulated water losses of the IJsselmeer region due to evaporation is ranging from -4 mm (Granger-Hedstrom) to -94 mm (Penman) between the methods. This difference emphasizes the importance and consequence of the evaporation method selection for water managers in their decision making.

, a relation between evaporation and the product of horizontal wind and vapour pressure gradient on daily timescales. In contrast, Nordbo et al. (2011) found the vapour pressure deficit alone to be more strongly correlated to evaporation rather than the product of wind and vapour pressure deficit. On intraseasonal timescales, Lenters et al. (2005) found evaporation to be a function of the thermal lag between temperatures of air and water. Resulting from the difference in 25 properties and consequently its behaviour, the simulation of evaporation from lakes requires a different parameterization than land surface evaporation in current traditional hydrological models, and with that accounting for the relevant driving processes at the timescale of interest.
To illustrate the difference between open water evaporation and evaporation above land surfaces, we used observations from a short measurement campaign. During this measurement campaign in Aug/Sept 1967 the vertical gradients of vapour pressure 30 and temperature were measured at the water surface, and at 2 and 4 m above former Lake Flevomeer (Wieringa, 2019). It is assumed that the sign of these gradients can be used as a proxy for the sign of the turbulent exchange between the surface and the atmosphere. This proxy is used because there are no direct measurements available of open water evaporation (E water ) during the 1967 measurement campaign. To analyse the difference between turbulent exchange above land surface and open water the observations of this measurement campaign above Flevomeer were compared to observations conducted above a grassland in Cabauw. The results show distinct differences despite the similar behaviour of global radiation, which is generally regarded as the main driver of evaporation. It demonstrates that both the temperature and vapour pressure gradients above the lake are positive throughout most of the day and night. This refers to unstable situations which become strongest during the night when the air cools faster than the water surface. The continuous positive gradient of vapour pressure above the lake indicates that evaporation continues during the night, which is in contrast to what is found above land. Another distinct 5 difference between land and lake surfaces can be found in the timing of the peak of the temperature and vapour pressure gradients, which are a few hours earlier above land than over the lake. From this experiment it directly becomes clear that we can distinguish a difference in behaviour of the turbulent exchange above land compared to over a lake, which therefore should be acknowledged in hydrological models.
Given the importance of open water evaporation in short-term prediction and long-term projection, the process should be 10 parameterized realistically in operational hydrological models. Depending on the parameterization strategy of a model to capture the process of evaporation at the relevant timescale, different decisions are made to parameterize evaporation. In the past, most studies have focussed on the parameterization of open water evaporation on weekly or even longer timescales (Finch, 2001;Zhan et al., 2019), where often potential evapotranspiration (PET) is improperly used as a proxy for lake evaporation neglecting heat storage (La, 2015;Duan and Bastiaanssen, 2017). However, parameterization at the hourly and daily timescales have remained underexplored. The effect of the parameterization strategy on the short-term prediction and long-term projection of evaporation can likely lead to profound differences in model results and therefore in (local) water management decisions.
Our aim therefore is to study the effect of various parameterizations of evaporation on shorter-term local water management, 5 and to investigate how these parameterization decisions affect long-term hydrologic projections. To this end, we compare to what extent six commonly used evaporation parameterizations (dis)agree at various temporal scales, and look at the impact of the different methods under projected climate change.
2 Methods and data 2.1 Evaporation methods

10
There is a wide variety of methods that are used to estimate evaporation from water bodies. They can be subdivided into five major types of approach, namely the pan method, water balance method, energy budget approaches, bulk transfer models and combination methods (Finch and Calver, 2008;Abtew and Melesse, 2013). In this study, we will systematically compare six methods that are commonly used to estimate surface evaporation and that are sensitive to different forcings (Table 1, see Table   A1(b) for the explanation of all variables).

15
The Penman method (Penman, 1948) has been chosen because it was originally developed for wet surfaces and because it is the most commonly used method globally to estimate evaporation. The method developed by de Bruin and Keijman (1979) finds its origin in Penman's method, but has been based on observations done at lake Flevomeer, The Netherlands, where we will focus on in this study as well (see Sect. 2.2). Makkink's method (Makkink, 1957) is an even more simplified derivation of Penman's equation and is currently used in operational hydrological models in The Netherlands, which is why it is included 20 in our comparison. To be able to compare it to methods that use other types of forcing, the methods of Granger and Hedstrom (2011), using wind speed as main forcing, and the Hargreaves (1975) method, using solely air temperature, are incorporated. To also include a more physical-based method, FLake (Mironov, 2008) is used in this study. Below we will give a short description of the models that are used, and in Appendix A a more detailed description is given of the methods, including the assumptions that are made, and the input data that is needed for them (see Table A1).

Penman
The Penman method is a combination equation and it is based on the two fundamental factors that determine evaporation, namely: available energy and atmospheric demand (see Tab. 1). The effect of these factors combined is captured by the turbulent transfer and energy balance equations for a wet surface (Brutsaert, 1982;Tanny et al., 2008;Moene and van Dam, 2014).
Starting from the energy balance principle combined with the flux-gradient approach, the following form of Penman's equation is derived: in which s is the slope of the saturated vapour pressure curve, γ the psychrometric constant, R n the net radiation, G the water heat flux, ρ air density, c p the specific heat of air, r a the aerodynamic resistance, and (e sat − e a ) the vapour pressure deficit 5 of the air. In Penman's derivation, it was assumed that the available energy (Q n ) is equal to the net radiation, assuming other terms of the energy budget equation into the water body to be negligible, e.g. the water heat flux. This flux is difficult to measure, especially at smaller timescales, and is therefore often ignored (van Emmerik et al., 2013). However, for water bodies it is essential to account for heat storage changes as its storage capacity is significantly larger compared to land surfaces. Not accounting for heat storage changes can lead to (i) overestimation of E water in spring (Northern Hemisphere) when incoming 10 radiation is used to warm up the water body instead of immediate release through E water , and (ii) underestimation of E water during autumn (Northern Hemisphere) when additional heat that was stored in the water body is released through E water . The Penman equation requires standard meteorological variables (net radiation, air temperature, wind speed and humidity) at one height.

15
A similar expression to determine reference evaporation was proposed by de Bruin and Keijman (1979). They applied the Priestley-Taylor method to former lake Flevomeer in The Netherlands. The Priestley-Taylor method is a derivation of Penman's method where the aerodynamic term in Penman's equation was found to be a constant proportion of the radiation term (Priestley and Taylor, 1972). de Bruin and Keijman (1979) adjusted this empirical relation to determine evaporation, namely that the aerodynamic term is linearly proportional to the radiation term with an additional constant added to that. These two parameters 20 were found to vary during the year, but they are mostly taken as constants.

Makkink
Another method that is based on Priestley-Taylor is the method of Makkink, which was developed for grassland areas in summertime in The Netherlands (Makkink, 1957). It only requires observations of global radiation and temperature, since it assumes that the water heat flux can be neglected with respect to net radiation, and that net radiation is about half of global 25 radiation. The first assumption is only valid for land surfaces, and the second assumption considers average summers in The Netherlands. Makkink is currently used in operational hydrological models in The Netherlands and is applied to open water using a correction factor.

Granger-Hedstrom
None of the methods described above include wind explicitly, although wind speed is recognized to be an important driving factor for evaporation. Granger and Hedstrom (2011) found the most significant correlation to exist between E water and wind speed at hourly timescales, and no direct relation was found with net radiation. The authors developed a simple model to quantify E water in which the key variables and parameters are: wind speed, land-water contrasts in temperature and humidity, 5 and downwind distance from the shore.

Hargreaves
The Hargreaves method is an example of a simple and highly-empirical temperature-based model (Hargreaves, 1975;Hargreaves and Allen, 2003). Global surface radiation is frequently not readily available, therefore the Hargreaves method uses the extra-terrestrial radiation, which depends on the angle between the direction of solar rays and the axis perpendicular to the 10 atmosphere's surface, to simulate its seasonality. Furthermore it incorporates the range in maximum and minimum temperature as a proxy to estimate the level of cloudiness. The method has originally been designed for land surfaces at longer temporal scales and does not account for lake heat storage. However, previous studies have also shown that temperature-based model can perform reasonably well over lake surfaces at larger timescales (Rosenberry et al., 2007).

15
A more physical-oriented model is FLake which has been developed by Mironov (2008). This one-dimensional freshwater model is designed to simulate the vertical temperature structure and the energy budget of a lake. It consists of an upper mixed layer, of which the temperature is assumed to be uniform, and an underlying stratified layer of which the curve is parameterized using the concept of self-similarity (assumed-shape) (Kitaigorodskii and Miropolskii, 1970). The same concept is used to represent the thermal structure of the ice and snow cover and of the thermally active layer of the bottom sediments.

Study area
This research study focusses on lake IJsselmeer, which is the largest fresh water lake in The Netherlands, bordering the provinces of Flevoland, Friesland, and Noord-Holland. In the north IJsselmeer is closed off from the Waddenzee by the Afsluitdijk embankment and in the southwest by the Houtribdijk embankment. The lake receives its fresh water supply (∼340 m 3 s −1 ) from the river IJssel and from the neighbouring polder systems, whereas the main outflow of the lake occurs at the 25 sluices of the Afsluitdijk under gravity. Covering an area of 1100 km 2 with an average depth of 5.5 m, the IJsselmeer can be considered a large shallow lake. The lake has an important hydrological function in the low-lying Netherlands both in flood prevention and fresh water supply for agricultural and drinking water purposes.

Data input sources
The models were forced with observed historical hourly data , as well as with simulated 3-hourly future climate projections (2019 -2100) resulting from a regional climate model (RCM). In this study we systematically compare the models; therefore, we chose to give preference to a long-term dataset of observed meteorological variables above land, rather than a shorter-term dataset of the same variables in more close proximity to lake IJsselmeer. Following from this, we used the longterm hourly meteorological data observed in De Bilt, The Netherlands, situated at approximately 50 km distance, rather than using the stations nearby in Stavoren or Lelystad. The latter stations only measure all needed variables since the end of 2002, and in Stavoren air pressure was never measured, while in De Bilt all variables required for the models are measured since 5 1960. We compared the stations of Stavoren and Lelystad to station De Bilt. This showed that temperature, relative humidity, air pressure and global radiation are comparable, while the wind speed is lower in De Bilt compared to Stavoren and Lelystad.
However, there is no substantial difference in the daily cycle and frequency distribution of wind speed between the locations (not shown).
Depending on each evaporation model (see Table A1), data that was used from station De Bilt includes air temperature (T air ), 10 global radiation (K in ), air pressure (P), wind speed (u), wind direction, relative humidity (RH) and cloud cover. Furthermore, water surface temperature (WST) is required as a variable in the Granger-Hedstrom model, as well as to calculate outgoing longwave radiation. WST is not operationally measured at a regular base in The Netherlands, however the Dutch water authority (RWS) measures hourly water temperature at a depth of about 1 meter in the IJsselmeer since 2014. Another source of WST measurements is an experiment done in 1967 (Keijman and Koopmans, 1973;Wieringa, 2019) in former lake Flevomeer before 15 that part of lake IJsselmeer was reclaimed. This experiment includes four weeks of data of amongst others WST, average water temperature, and T air and vapour pressure at 2 and 4 m height. Furthermore, FLake generates WST simulations as well.
Remotely sensed satellite products inferring WST were not considered because of its low temporal resolution, but will be used in upcoming studies to infer the spatio-temporal distribution of E water . Comparing the different sources (not shown) and considering the availability of the data, it has been chosen to use the FLake simulations of WST for further analysis. 20 To quantify the (dis)agreement between the models on the long-term projection of E water , we used climate projections generated with the RCM RACMO2 (van Meijgaard et al., 2008) driven by the global climate model (GCM) EC-EARTH 2.3 (Hazeleger et al., 2012). This long-term simulation covers the period 1950 -2100 and consists of 16 ensemble members at a spatial resolution of 0.11 • (∼ 12 km) available at 3-hourly timesteps (Aalbers et al., 2017). Each member of the ensemble has a slightly different atmospheric initial state perturbed in 1850, and the members can thus be considered as independent realiza-25 tions. EC-EARTH was forced with historical emissions until 2005 and future projections (2006 -2100) were generated using a substantial greenhouse gas emission scenario (RCP8.5). We used a grid cell representing location De Bilt. Direct evaporation observations from an eddy-covariance instrument at 10-min time resolution made in Cabauw, The Netherlands, from 1986 -2018 (Beljaars and Bosveld, 1997) are used to validate the climate change signal of E.

Diurnal cycle 30
Methods that are sensitive to different types of forcings show a distinct diurnal signal of simulated evaporation. A wide variety of evaporation methods use K in as a driving force to simulate the diurnal cycle of evaporation, partly because K in affects temperature, vapour pressure deficit (VPD) and wind speed, which are all driving forces of evaporation. Therefore, an analysis of the diurnal signals and possible phase shift of these variables provide information on how these driving forces relate to each other and to the simulated evaporation.
We depict the diurnal cycle as function of time for one specific day where no clouds were present, as well as for the average diurnal cycle for the period 1960 -2018 using data from De Bilt, The Netherlands (see Fig. 2 and Section 2.3). Another way that we will use to analyse and illustrate the diurnal cycle and the relation to other evaporation methods, is to plot evaporation 5 as a function of K in similar to Renner et al. (2018). When this cycle appears as a hysteresis loop, it indicates a phase difference between the addressed variable and K in . The size of the loop quantifies the magnitude of the phase lag, and the direction of the loop denotes if the phase lag is negative or positive. The method of Hargreaves will be omitted in these diurnal cycle analyses.
This method requires a temperature maximum and minimum (see Appendix A) over the considered time period. Therefore, it is not possible to calculate hourly simulated evaporation for this method with the given hourly observations of temperature.

Long-term trends
To explore how evaporation has been changing in the last decades (1960 -2018) and how it is projected to change in the future (2019 -2100) in a changing climate, we will force the methods with historical observations and simulated future projections of meteorological variables resulting from the regional climate model RACMO, for which the spatial grid cell of the RACMO model is used that represents the location of De Bilt (Section 2.3). The trend of the yearly averaged simulated evaporation rates 15 will be detected based on weighted local regression, using the LOcally WEighted Scatter-plot Smoother (LOWESS) method (Cleveland, 1979). This method is applied to the observational data for the historical period, and to the average of the 16 RACMO members (Section 2.3) for the future period. Mean and standard deviation are calculated using the average of the RACMO members, where the standard deviation is calculated based on the de-trended time series. 20 A difference in sensitivity of the methods to drivers of evaporation can help to explain the (dis)agreement found in behaviour of simulated evaporation. To compare this sensitivity a perturbation of 1% will be imposed one by one on the daily observational data from De Bilt of four key variables, namely air temperature, solar radiation, wind speed and vapour pressure, without changing the other variables. For this, the percentage difference of simulated evaporation with perturbation of one of the variables, to the simulation of the baseline situation without any perturbation, will be calculated.

25
The correlations between the simulated evaporation using the different methods and various meteorological variables using data from De Bilt will be calculated based on Pearson's parametric correlation method, measuring the linear dependency between two variables. We will calculate the correlations for all the timescales ranging from hourly to 10-yearly period all based on hourly data. To ensure that the number of data points in the calculation is not influencing the results, we will apply bootstrapping to artificially create the same number of data points for each timescale. The diurnal signal of several meteorological variables and simulated and observed evaporation are depicted in figure 3. The solid lines are the results from a sunny day without any clouds in March 2016, while the dotted lines reflect the average diurnal cycles. It can be seen that net radiation (R n ) peaks between 1pm and 2pm, which coincides with the peaks of evaporation sim-5 ulated using the radiation-based methods, i.e. Penman (PN), de Bruin-Keijman (BK) and Makkink (MK). The peaks of these radiation-based methods are simulated two hours later than the observed evaporation peak in Cabauw. Simulated evaporation becomes negative during the night following net radiation in case of the PN and BK method. However, this is not realistic given the fact that the energy balance is not preserved at the water surface, which is assumed by these methods, and the energy released from heat stored in the water can exceed the net radiation at night. This will drive both a positive evaporation 10 and sensible heat flux. Evaporation simulated with the wind-driven Granger-Hedstrom (GH) method is damped relative to the radiation-based methods, and it is rather constant throughout the day, with a small peak following the signal of the wind speed.
The signal of evaporation resulting from FLake (FL) is damped as well, and its peak is lagging 2 hours behind relative to R n , induced by a combination of K in , wind speed and T air . The total average diurnal evaporation is significantly lower than for the radiation-based methods. This can be explained by the remaining heat storage term of the energy balance in FL, which is 15 used to warm up the water. This heat storage capacity is explicitly accounted for in FL. Hargreaves' temperature-based method (HA) is not depicted in the graph because the method does not allow to calculate evaporation at hourly timescales given the available data (see Sect. 2.4). The diurnal range of the water temperature (T water ) is small, but shows a distinct peak four hours lagging behind the peak of R n , which indicates that during the day heat is stored in the water, which partly is released back during the night. The dotted line illustrating the average diurnal cycle of T water was artificially lowered by 7.5 degrees Celsius 20 in this graph, merely to show its diurnal course and the timing of the peak. Figure 4 illustrates the diurnal cycle of the (a) latent heat flux, (b) vapour pressure, and (c) temperature, which are plotted as a function of K in for a sunny day without any clouds in March 2007. It can be seen that evaporation simulated using the MK method is almost in phase with K in (phase lag is 7 min), which may be expected from a method that uses K in as the main 25 driving force of evaporation. The largest phase lag (=114 min) results from the simulation with FL; this can be explained by the fact that this method explicitly accounts for heat storage changes, resulting in a less direct response of evaporation to K in .
The analysis of diurnal patterns thus shows that K in explains most of the variance in evaporation for the methods PN, BK and MK. However, the methods that incorporate WST (method GH) and heat storage capacity (FL) as an explaining variable show less variation in evaporation during the day and therefore do not relate directly to K in .

30
Air temperature shows a pronounced counter-clockwise (CCW) loop when plotted as a function of global radiation, which indicates that heat is stored in the lower part of the atmospheric boundary layer. However, the water surface temperature shows very little response to K in . This combination leads to another distinct hysteresis loop of the driving force of sensible heat flux, namely the gradient T s -T a , which has a clockwise (CW) direction. This implies that the latent heat flux is preceded by the However, the saturated vapour pressure, which is a function of T air displays pronounced CCW hysteresis. As a result, VPD also has a large CCW hysteresis loop, which is the key driver of E in the PN method.
Following the analyses of the diurnal cycle the occurence of storage and release of heat in the water body is demonstrated through the diurnal cycle of T water , and should therefore be accounted for by an evaporation method. FL is the method that 5 mimics this thermal inertia effect most clearly in its diurnal cycle, which is also supported through its phase lag to K in . This makes the FL method the most realistic method to use at this timescale. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (b) Saturated vapour pressure Vapour pressure deficit Vapour pressure q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q (c) Water surface temperature Air temperature T s − T a The results from the GH and FL method differ from the other methods in the sense that the RACMO realizations do not result in a significant trend in either the simulated historical or the projected future period, while in the historical period the simulation 10 based on observations presents a positive trend when using the FL method. The latter can be explained by the fact that the FL method is most sensitive to K in , T air and wind speed, of which for both K in and wind speed the RACMO realizations do not show any trend (not shown here). The mismatch in the historical period for FL between the simulation based on observations and the simulations based on RACMO realizations can be attributed to the inability of RACMO to capture the trend in K in . This inability will also influence the simulations made by the radiation-based models, which implies that for these methods simulated trends in the future period are likely to be stronger than demonstrated in figure 5. This will lead to even larger disagreement of simulated evaporation between the six methods.

Long-term simulations and projections
Comparison between the methods of the overall positive trend that is projected to continue in the future period, reveals that not only the average simulated evaporation of all RACMO members (µ) will increase in the future for all the methods, ranging from an increase of 0.02 mm d −1 (GH and FL) to 0.24 mm d −1 (PN), but also that the variability (σ) is projected to increase 5 (see Fig. 5). This is also demonstrated in figure 6 for both the historical (orange) and future (blue) period. Here the spread is defined as the difference between the yearly average maximum and minimum value of the 16 RACMO members based on daily evaporation rates. For each method the average and the spread is projected to increase in the future, however, the rates at which this happens differ significantly between the methods. The largest difference in spread can be found between the GH method, which has a spread of 0.32 mm d −1 , and PN method with a spread of 0.69 mm d −1 in the historical period, and increases to 10 a range that varies from 0.36 mm d −1 (GH) to 0.94 mm d −1 (PN) in the future period. The methods that resemble each other most, and differ least in spread, are the PN and BK method during both the historical and future period. The upper and lower set of panels depict the same information arranged per variable or per method, respectively. Most of 15 the methods show the largest sensitivity to T air and K in . The purely temperature driven HA method is only sensitive T air . The wind-driven GH method is the method most sensitive to wind (1% difference), which is expected, but FL also is sensitive to wind showing a difference of 0.5% compared to the baseline situation without any perturbation. This implies that the behaviour of these models would start to deviate more from the other methods if for instance the wind speed regime were to change in the future. In general FL is the most sensitive method, and therefore this method will start to differentiate more from the other 20 methods when these four meteorological variables were to change in the future. hence the empty boxes at the hourly timescale. It becomes apparent that at hourly timescales there is a positive correlation 25 between wind and simulated E ranging from 0.12 for the MK method, to 0.84 for the GH method. At larger timescales, this correlation becomes negative, except for the wind-driven GH method, and for FL it remains positive until a daily timescale. At the two largest timescales considered here ( Fig. 8e and f) the correlation with wind increases again, but these are statistically insignificant except for the GH method. The correlation between net radiation and simulated E remains high at all timescales, except for the GH method. The latter method shows low correlations with the other methods and most meteorological variables, 30 which mostly become statistically insignificant (indicated by the numbers in grey) from weekly timescales and larger. The correlation between net radiation and E simulated using FL increases steadily from 0.63 to 0.94 with increasing timescale.

Model (dis)agreement at various timescales
Regarding the correlation between E and T air , and E and VPD, the matrices show that the correlations increase steadily for all evaporation methods, again with the exception of the GH method. However, it should be noticed that there is a dip in the  The differences of these statistics between the historical and future period are given in green. In the lowest panel the yearly averaged observed evaporation at meteorological site Cabauw is presented in red.  correlation at yearly timescales. Furthermore, the correlation between the meteorological variables R n , T air and VPD was found to increase with increasing timescale.
The radiation-based methods agree highly with each other, as can be expected, over the entire range of timescales, with correlation values above 0.9. The GH method deviates most in terms of correlation from the other methods, which could be attributed to the difference in sensitivity to wind speed. The HA method shows the largest increase in agreement to the 5 radiation-based methods with increasing timescale. This can be attributed to the increasing correlation between temperature and net radiation, which are the main driving forces of the HA and the radiation-based methods, respectively. Comparing the average correlations between the models for each timescale reveals that the correlation increases with increasing timescale from 0.72 at hourly to 0.97 at 10-yearly timescale. This increase implies that the methods on average tend to agree more with each other at coarser timescales, which means that at smaller timescales the consequence of model choice becomes more apparent.

10
The average correlation at the daily timescale is lowered mainly as a consequence of the low correlations of the GH method to the radiation-based methods. It should be noted that insignificant values were not incorporated in the calculation of the average correlations.
In Fig. 9 the correlation (r) between simulated evaporation and VPD, T air and R n respectively are depicted to further explore the 15 (dis)agreement between the methods over the range of timescales. This correlation has been calculated based on observational data from De Bilt. In general there is a high correlation (r > 0.75) between simulated E and VPD, which is increasing with increasing timescales, up to correlations of 0.97 at the 10-yearly timescale. However, there is a small drop at the yearly timescale, which is also visible in figure 8. We were not able to determine the exact cause of this, but it could be a result of cross-correlation in the data. The GH method deviates in behaviour compared to the other five methods, since it is the only 20 method that does not explicitly include T air , and with that VPD through the saturated vapour pressure, which is a function of temperature. The GH method only includes a temperature gradient, which is determined using water temperature resultant from FL model and air temperature from De Bilt, as additional information to wind speed. The correlation between simulated E and T air increases from 0.57 at the hourly timescale up to 0.98 at the 10-yearly timescale, again with a small drop at the yearly timescale. Furthermore, similar to the correlation with VPD, the GH method again demonstrates a different behaviour compared to the other methods. There is a strong increase of the correlation at the yearly and 10-yearly timescale for this 5 method, which could be related to a change of the sign to positive values for the correlation between wind speed and T air at these timescales (see Fig. 8). The correlations to R n demonstrate clearly the resemblance between the radiation-based methods PN, BK and MK. The HA method shows similar correlations as well, except for the largest two timescales where the correlation is lower. The FL method correlates better to R n with increasing timescale, confirming that at small timescales R n is not a direct driver of E as a result of heat storage. The correlation between R n and E simulated by the GH method is close to zero for daily timescales and larger. However, at the hourly timescale the correlation is 0.32. This corresponds to the larger correlation between wind speed and R n at this timescale. The general increase of the correlations with the meteorological variables that was found with increasing timescales, implies that ultimately these variables act as driving forces of potential evaporation. However, at the shorter timescales some of these methods may fail because the wrong controlling variables are used to simulate actual evaporation. This is especially the case for methods that use R n as a driver of evaporation and that omit the effect of thermal inertia as a result of heat storage.
3.4 Implications for water resources management 5 Table 3 presents for several years the water loss through evaporation expressed as water depths in mm for the IJsselmeer region.
Meteorological data from De Bilt was used to force the evaporation methods, and although there likely will be variation in the spatial distribution of the evaporation rate over the lake, until now it is unknown to which degree this variation exists. The following results thus will merely show the difference between the methods rather than the spatial distribution when applied to the IJsselmeer region. For this we considered two average years (1986 and 2009)   model simulations (De Lange et al., 2014) and are used as a reference only (see Table 2).
Large differences were found between the evaporation methods in terms of the amount of water losses through evaporation annually. The HA method and FL are at both ends of the spectrum, where HA simulates largest losses through evaporation (745 mm), while FL simulates less than half of that (310 mm) in 1986. During dry years the total water loss through evaporation 5 increases to on average 677 mm, but within a specific year, e.g. 2003, it ranges from 868 mm using the HA method to 376 mm using FL to simulate evaporation. Due to the large differences of simulated evaporation between the various methods, the result of the simulated water balance of the lake could be positive or negative. This means that based on the method used to simulate evaporation, the water managers would make different decisions on whether to stop the water inlet to the surrounding land for agricultural purposes or not for instance, or whether more or less water needs to be discharged to the Waddenzee to keep the 10 level of the IJsselmeer region within a safe range. And because there are no direct operational measurements of evaporation available in the IJsselmeer region, this stresses even more that the knowledge of the discrepancy between the methods is of great importance for water managers in their decision making.
Based on the results of the RACMO realizations, it is demonstrated that the discrepancy between the methods is projected to increase from the year 2000 to 2100. The radiation-based and temperature methods show a growing water loss through 15 evaporation, while for GH and FL the water loss remains similar. All methods show increasing evaporation rates leading to lower water availability in the region, where the change in mean water depth from the year 2000 to 2100 is ranging from -4 mm (GH) to -94 mm (PN). This means that when water managers would rely on the PN method rather than on the GH method for instance, they will have to decide for instance to supply less water to the surrounding agricultural land.
The effect of the projected changes of evaporation in the future was found to be largest in summer (Fig. 10), which coincides 20 with the period that has the highest evaporation rates. This summer period is therefore most critical regarding water resources management, also considering the fact that more extreme periods of drought are expected to occur more frequently. During summer, evaporation is thus projected to increase, and at the same time it is likely that with increasing temperatures the river discharge from the IJssel will decrease (Görgen et al., 2010). So less water will be available during summer, while the water demand from the surrounding land surfaces will increase for agricultural purposes. Therefore, the trade-offs that water 25 managers need to make become very precarious, especially knowing that their decision is based on a certain chosen method that can differ significantly from another method in total predicted evaporation.

Conclusions
The aim of this study was to explore the effect of various conceptualizations of evaporation on shorter-term local water management and long-term hydrologic projections. We focussed on timescales ranging from hourly to 10-yearly periods, where we have (i) elaborated on the differences found in the diurnal cycle, but also (ii) quantified the (dis)agreement between the methods over the range of timescales in terms of correlations. And moreover, we studied how the evaporation rates according 5 to the various methods are projected to change in the future.
Our study characterized that there is a large effect of the type of forcing that is used by the evaporation method, especially on the simulated diurnal cycle. This difference is reflected in the total average daily evaporation, the timing of the evaporation peak and the day/night cycle. The methods that use the radiation approach (PN, BK and MK) follow the diurnal cyle of the net radiation, where MK becomes zero at night, and PN and BK negative. The simulated evaporation resulting from the GH and 10 FL methods are more constant throughout the day and on average they show a continuation of evaporation during the night.
Data of T water demonstrated heat storage during the day and release of heat during the night. FL is the method that mimics this effect of thermal inertia most clearly in its diurnal cycle. This makes the FL method the most realistic method to use at this timescale. At larger timescales we found a disagreement between the methods in the trend of yearly averaged daily evaporation rates both for the historical (1960 -2018) and future period (2019 -2100). Although both the average and the variability of 15 all methods is projected to increase in the future, the rate at which this happens differs significantly between the methods. The average evaporation rate increase ranges from 0.02 mm d −1 (GH and FL) to 0.24 mm d −1 (PN).
A discrepancy at the whole range of evaluated timescales, from hourly to 10-yearly, is especially present between the radiation and temperature methods (PN, BK, MK and HA), and the wind driven (GH) and physical-based lake (FL) methods.
However, this disagreement generally decreases with increasing timescale, which is to be expected considering that ultimately 20 evaporation is constrained by the energy input in the system and the transport of water vapour. However, the difference at yearly timescales is still significant. The difference between the methods at yearly timescale is also demonstrated when the total yearly water losses through evaporation for lake IJsselmeer are calculated. For an average year this can range between 417 mm (FL) and 817 mm (BK). During a sensitive dry year this discrepancy can result in water levels to be simulated to either rise or drop, solely depending on the evaporation method that is used. When considering future simulations, the projected change in mean 25 water losses through evaporation, expressed as water depths, ranges from -4 mm (GH) to -94 mm (PN) when comparing the year 2000 to 2100. This means that when water managers would rely on the PN method rather than on the GH method, they might have to decide to supply less water to the surrounding land in the future. Therefore, owing to the disagreement between the methods, it reveals that the choice of method is of great importance for water managers in their decision making. To gain confidence in which method is most reliable to use, now and in the future, we suggest to perform long-term direct observations 30 of E water at high temporal resolution. This will help to improve optimal parameterization of E water in hydrological models.