Contribution of Potential Evaporation Forecasts to 10-day streamflow forecast skill for the Rhine river

Medium-term hydrologic forecast uncertainty depends strongly on the forecast quality of meteorological variables. Of these variables, the influence of precipitation has been studied most widely, while temperature, radiative forcing and their derived product, potential evapotranspiration (PET), have received little attention from the perspective of hydrological forecasting. This study aims to fill this gap by assessing the usability of potential evaporation forecasts for 10-day-ahead streamflow forecasting in the Rhine basin, Europe. In addition, the forecasts of the meteorological variables are compared with observations. Streamflow reforecasts were performed with the daily wflow_hbv model used in previous studies of the Rhine, using the ECMWF 20-year meteorological reforecast dataset. Meteorological forecasts were compared with observed rainfall, temperature, global radiation and potential evaporation for 148 subbasins. Secondly, the effect of using PET climatology versus observation-based estimates of PET was assessed for the hydrological state and for streamflow forecast skill. We find that: (1) there is considerable skill in the ECMWF reforecasts to predict PET for all seasons, (2) using dynamical PET forcing based on observed temperature and satellite global radiation estimates results in lower evaporation and wetter initial states, but (3) the effect on forecasted 10-day streamflow is limited. An implication of this finding is that it is reasonable to use meteorological forecasts to forecast potential evaporation and to use these in medium-range streamflow forecasts. However, an approach using PET climatology is also sufficient, most probably not only for the application shown here, but for most models similar to the HBV concept and for moderate climate zones. As a by-product, this research resulted in gridded datasets for temperature, radiation and potential evaporation based on the Makkink equation for the Rhine basin.
The datasets have a spatial resolution of 1.2 × 1.2 km and an hourly time step for the period from July 1996 through 2015. This dataset complements an earlier precipitation dataset for the same area, period and resolution.


Introduction
Hydrologic forecasting has the aim of predicting the future state of important hydrologic fluxes, most notably streamflow. Throughout the process of forecasting, from model setup via initial state estimation to the forecast run, meteorological forcing is a key component. Precipitation is known to be the main driver of hydrological processes, and most of the forecast uncertainty is attributed to inaccurate precipitation forcing (Cuo et al., 2011; Pappenberger et al., 2005). As a consequence, most attention has been given to the accuracy of precipitation forecasts; see, for example, the review by Cloke and Pappenberger (2009).
Evaporation is a result of the interaction between meteorological forcing and physical and physiological processes at the land surface. Meteorological forcing provides the potential energy (potential evaporation or PET) for evaporative processes to take place. There are many formulas to estimate the potential energy available for evaporation, which can be divided into three types.
Constraints on data availability have led to additional approximations for potential evaporation. A common approximation is the calculation of a monthly potential evaporation climatology or PET demand curves (Bowman et al., 2016). This climatology is then used as a driver for both historic and future potential evaporation.
Hydrological models have proven to be insensitive to the difference between variable potential evaporation forcing and climatological monthly potential evaporation forcing with respect to the model's ability to estimate streamflow after calibration (Andréassian et al., 2004; Oudin et al., 2005a, b). However, in forecasting, different choices in the handling of forcing data can be made between the historic update step and the forecast step, while the hydrological model, as a rule, remains the same. It therefore remains relevant to understand how a single model reacts to potential evaporation forcing. Insensitivity to the type of potential evaporation during the process of calibration does not mean that a model is insensitive to the form of potential evaporation input.
As mentioned above, little attention has been paid to the forecast skill of the secondary forcing variables temperature and radiation in the hydrological context of potential evaporation. Furthermore, a common practice is to avoid potential evaporation forecasts altogether by using a potential evaporation climatology. Therefore, the objective of this study is to assess to what extent potential evaporation forecasts can contribute to streamflow forecast skill.
This question is evaluated for the Rhine basin in Europe (Fig. 1). The Rhine is one of the basins currently employed as a case study for the IMproving PRedictions of EXtremes (IMPREX) project, which aims to improve predictions and management of hydrological extremes through climate services (van den Hurk et al., 2016).
Several studies already directly addressed some aspects of operational ensemble flow forecasts in the Rhine. Renner et al. (2009) showed that at the time meaningful hydrological ensemble forecasts could be produced up to a 9-day lead time for the Rhine River based on ECMWF ensemble meteorological forecasts. Reggiani et al. (2009) used a Bayesian ensemble uncertainty processor to improve the assessment of uncertainty in the ensemble forecast. Terink et al. (2010) applied downscaling techniques to ERA15 ECMWF reanalysis data to develop a downscaling strategy for regional climate models (RCMs). Verkade et al. (2013) developed postprocessing techniques to improve the precipitation and temperature ECMWF forecasts for the hydrological model. Photiadou et al. (2011) compared two historical precipitation datasets and assessed the influence of precipitation datasets on model results. Recently, van Osnabrugge et al. (2017) developed a high-resolution hourly precipitation dataset for use with gridded hydrologic models.
To answer the research question, model experiments are performed, but first the data and hydrological model are presented (Sect. 2). Second, the model experiments are described, which also partitions the main question into three subquestions (Sect. 3) that are subsequently answered (Sect. 4). The paper concludes with a discussion of the results in the wider context of evaporation modeling in hydrologic forecasting and the conclusions (Sect. 5).

Data and model
Observational data have been preprocessed for use with a grid-based hydrological model. The data were processed with hourly time resolution, on a 1.2 × 1.2 km grid, and for the period mid-1996 through 2015. All source data used to derive the gridded estimates come from sources that supply their data in near real time, making the datasets suitable for operational forecasting. For this study all data were aggregated to a daily time step. The hourly datasets are downloadable through the 4TU data center (van Osnabrugge, 2017, 2018).

Precipitation
For this study the precipitation dataset from van Osnabrugge (2017) is used. The precipitation dataset has been derived using the genRE interpolation method based on ground measurements and the HYRAS (Rauthe et al., 2013) climatological precipitation dataset (van Osnabrugge et al., 2017).

Temperature
Temperature observations are interpolated on the same 1.2 × 1.2 km grid as the precipitation data. Temperature is derived from interpolation of ground measurements with correction for elevation using the SRTM digital elevation model (Farr et al., 2007) and a standard lapse rate, as follows.
To calculate the temperature T_x at a given grid cell x from a number of n surrounding stations, first determine a set of weights based on inverse-distance-squared weighting between all stations (typically the n closest stations) and the grid cell. This step can have a threshold for maximum distance. With d_{i,x} the distance between station i and cell x, the weights are

w_{i,x} = \frac{1/d_{i,x}^{2}}{\sum_{j=1}^{n} 1/d_{j,x}^{2}}. \quad (1)

Second, interpolate the measured temperatures T_{m,i} with the weights, as with standard inverse-distance-squared interpolation:

T_{m,x} = \sum_{i=1}^{n} w_{i,x} T_{m,i}. \quad (2)

Third, calculate the temperature lapse correction term T_{\gamma,x} as the weighted difference between the height of the grid cell H_x and the heights of the considered stations H_i, multiplied by the lapse rate \gamma:

T_{\gamma,x} = \gamma \sum_{i=1}^{n} w_{i,x} \left( H_i - H_x \right). \quad (3)

Note that T_{\gamma,x} is static for a fixed configuration of the measurement network if \gamma is taken to be constant. In this study the configuration of the measurement network changed based on the number of reporting stations at each time step. A constant lapse rate of \gamma = 0.0066 °C m^{-1} was assumed. The final temperature estimate for grid cell x is obtained by adding T_{\gamma,x} and T_{m,x}:

T_x = T_{m,x} + T_{\gamma,x}. \quad (4)
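As an illustration, the interpolation with elevation correction can be sketched as follows. The function name, array layout and the default number of neighbors are illustrative choices for this sketch, not the operational implementation:

```python
import numpy as np

def idw_temperature(cell_xy, cell_h, stations_xy, stations_h, stations_t,
                    lapse=0.0066, n_nearest=4, max_dist=None):
    """Interpolate temperature to one grid cell with inverse-distance-squared
    weights and a constant-lapse-rate elevation correction (Eqs. 1-4).
    cell_xy: (x, y) of the cell; stations_xy: (n, 2) station coordinates;
    stations_h / stations_t: station heights (m) and temperatures (degC)."""
    d = np.hypot(stations_xy[:, 0] - cell_xy[0], stations_xy[:, 1] - cell_xy[1])
    idx = np.argsort(d)[:n_nearest]          # typically the n closest stations
    if max_dist is not None:                 # optional maximum-distance threshold
        idx = idx[d[idx] <= max_dist]
    w = 1.0 / d[idx] ** 2
    w /= w.sum()                             # Eq. (1): normalized 1/d^2 weights
    t_m = np.sum(w * stations_t[idx])        # Eq. (2): interpolated temperature
    t_gamma = np.sum(w * lapse * (stations_h[idx] - cell_h))  # Eq. (3): lapse term
    return t_m + t_gamma                     # Eq. (4): elevation-corrected estimate
```

A cell lying 100 m above its surrounding stations is thus cooled by roughly 0.66 °C relative to the plain inverse-distance estimate.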

Downward shortwave surface radiation flux
The availability of solar radiation measurements at the surface has proven to be spatially and temporally inadequate for many applications, with remotely sensed solar radiation products having the largest potential to remedy this (Journée and Bertrand, 2010). Remotely sensed solar radiation estimates from the Land Surface Analysis Satellite Application Facility (LSA-SAF) were found to be in closer agreement with ground observations than reanalysis datasets such as the Système d'Analyse Fournissant des Renseignements Atmosphériques à la Neige (SAFRAN) reanalysis (Carrer et al., 2012) and ERA-Interim (Jedrzej et al., 2014). For this study, downward shortwave radiation is resampled and merged from the EUMETSAT Surface Incoming Solar Radiation (SIS) (Mueller et al., 2009) and Downward Surface Shortwave Flux (DSSF) (Trigo et al., 2011) products from the Climate Monitoring Satellite Application Facility (CM-SAF) and LSA-SAF, respectively. Gaps in the satellite data are filled with the ERA5 surface solar radiation downwards (ssrd) parameter from the 4D-Var reanalysis (Copernicus Climate Change Service, 2018). ERA5 was found to have a mean bias comparable to satellite-derived products for inland stations (Urraca et al., 2018).
In earlier research it has been shown that LSA-SAF (2005-current) and CM-SAF (1983-2005) can consistently be merged into one time series (Jedrzej et al., 2014). The products of the different SAFs are comparable in terms of bias and standard deviation (Ineichen et al., 2009).

Makkink potential evaporation
There are different approaches to making use of remotely sensed data to calculate evapo(transpi)ration. One branch of research aims to calculate actual evapotranspiration directly from satellite imagery (Su, 2002). Applications range from estimating the global evaporation flux (Mu et al., 2011) to water resources management (Bastiaanssen et al., 2005) and constraining model parameters for a gridded model (Immerzeel and Droogers, 2008).
For operational use, PET estimates can be derived from satellite data only, or from a combination of satellite imagery and ground measurements. Bowman et al. (2017) explored the use of MODIS to provide a daily PET, both as dynamic PET (Spies et al., 2015) and as PET climatology (Bowman et al., 2016), for a gridded and a lumped version of the Sacramento Soil Moisture Accounting (SAC-SMA) model. The model was recalibrated for each PET input. No configuration with MODIS-derived PET showed consistent improvements across all basins in their case study. Still, it was concluded that the combination of dynamic PET with a gridded model had the best overall results (Bowman et al., 2017).
A disadvantage of using satellites such as MODIS is their temporal coverage, which is restricted to a single overpass at a set time each day, giving one instantaneous value. This can be resolved by assuming a sinusoidal development of PET over the day (Kim and Hogue, 2008), but the limitation is clear. This disadvantage is resolved by using geostationary satellites. For example, Jacobs et al. (2009) used solar radiation from the NOAA GOES geostationary satellite in combination with ground observations to calculate daily PET with the Penman-Monteith equation.
Here, potential evaporation is calculated from geostationary satellite radiation estimates and ground observations of temperature with the method proposed by Makkink (1957), which is applicable with remotely sensed radiation estimates (de Bruin et al., 2016). PET calculated with Makkink's equation is a reference crop evapotranspiration, which means that crop factors, determined by the hydrological model, apply. In the setup of our hydrological model the crop factor was determined by land use: a crop factor of 1.15 is applied to the forested areas, and 1.0 to all others.
The reasons for choosing the Makkink equation are that (1) it only needs radiation and temperature, for which gridded estimates are available, and that (2) the Makkink equation is used by the Royal Netherlands Meteorological Institute (KNMI), so that the work presented here is compatible with ongoing local research (Hiemstra and Sluiter, 2011).
The potential evaporation is calculated based on air temperature T (°C) and downward shortwave radiation R_g (W m^{-2}) for accumulation period t (s) (Hiemstra and Sluiter, 2011):

\mathrm{PET} = 0.65 \, \frac{\Delta}{\Delta + \psi} \, \frac{R_g \, t}{\rho_w \lambda}, \quad (5)

with \psi the psychrometric constant, \lambda the latent heat of vaporization of water, \Delta the slope of the saturation vapor pressure curve and \rho_w the density of water, each computed as a function of air temperature (Eqs. 6-9; Hiemstra and Sluiter, 2011). The Makkink potential evaporation calculated for each time step is called "near real time" (PET_NRT). The potential evaporation climatology (PET_Clim) was calculated by averaging over the full time period (20 years) for each day of the year (Fig. 2).
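A minimal sketch of the Makkink calculation is given below. The temperature-dependent expressions for the saturation slope, psychrometric constant and latent heat are common approximations and may differ in detail from Eqs. (6)-(9) of Hiemstra and Sluiter (2011):

```python
import math

def makkink_pet(t_air, r_g, dt=86400.0):
    """Makkink reference evaporation (mm) from air temperature t_air (degC)
    and mean downward shortwave radiation r_g (W m-2) over period dt (s).
    The parameterizations below are standard approximations, used here for
    illustration only."""
    # Saturation vapor pressure (hPa) and its slope (hPa K-1), Magnus form
    es = 6.107 * 10 ** (7.5 * t_air / (237.3 + t_air))
    delta = 7.5 * 237.3 * math.log(10) * es / (237.3 + t_air) ** 2
    psy = 0.646 + 0.0006 * t_air          # psychrometric constant (hPa K-1)
    lam = (2501.0 - 2.375 * t_air) * 1e3  # latent heat of vaporization (J kg-1)
    rho_w = 1000.0                        # density of water (kg m-3)
    # Eq. (5): PET = 0.65 * Delta/(Delta+psi) * Rg*t / (rho_w * lambda), m -> mm
    return 0.65 * delta / (delta + psy) * r_g * dt / (rho_w * lam) * 1000.0
```

For a typical summer day (20 °C, 200 W m^{-2} daily mean radiation) this yields roughly 3 mm d^{-1}, which is in the expected range for the Rhine basin.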

ECMWF reforecast
The ECMWF reforecast dataset consists of ensemble forecasts produced with a single, recent model cycle for selected days over the last 20 years. The reforecast obtained for this study was produced with model cycle 43r1 (Buizza et al., 2017). The first forecast is on 10 March 1996 and the last forecast on 29 December 2015, with reforecasts alternating every 3 or 4 days. Forecasted Makkink potential evaporation (PET_Fcast) is calculated from the t2m (T) and ssrd (R_g) variables using Eqs. (5)-(9). Temperature was first downscaled to the model resolution using the same standard lapse rate as in the interpolation of the temperature observations:

T_x = T + \gamma \left( \bar{h} - h_x \right), \quad (10)

with T the temperature given by the ECMWF forecast at the ECMWF resolution, \bar{h} the average height of the DEM corresponding to the footprint of the ECMWF grid cell, h_x the height of cell x in the model, and \gamma the lapse rate.
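The lapse-rate downscaling described above amounts to a one-line correction per model cell; the sketch below uses illustrative names:

```python
def downscale_temperature(t_fc, h_fc_mean, h_cell, lapse=0.0066):
    """Lapse-rate downscaling of a coarse forecast temperature t_fc (degC),
    valid at the coarse cell's mean DEM height h_fc_mean (m), to a model cell
    at height h_cell (m). A model cell above the coarse-cell mean height is
    made colder, one below it warmer. Names are illustrative."""
    return t_fc + lapse * (h_fc_mean - h_cell)
```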

Hydrological model
wflow is a modular hydrological modeling framework that allows for easy implementation and prototyping of regular-grid hydrological model concepts in python-pcraster (Schellekens et al., 2017). The hydrological model concept used is the HBV (Hydrologiska Byråns Vattenbalansavdelning) model concept (Lindström et al., 1997) applied on a grid basis. The generated runoff is routed through the river network with a kinematic wave approach (Schellekens et al., 2017). In the following this model is referred to as wflow_hbv. The setup of the hydrological model is the same as used in assessing the validity of the genRE precipitation dataset (van Osnabrugge et al., 2017). The model was parameterized through calibration with a generalized likelihood uncertainty estimation (GLUE) like procedure (Beven and Binley, 1992), using HYRAS precipitation as forcing data (Winsemius et al., 2013a, b). The model is taken "as is" and is not recalibrated for each PET forcing, the effect of which has been studied extensively elsewhere (e.g., Bowman et al., 2017; Oudin et al., 2005a).

Experimental setup
The analysis consists of a meteorological part and a hydrological part (Fig. 3).

Analysis of meteorological forecast skill
In this analysis we aim to answer the following question: what is the forecast skill of temperature, radiation and potential evaporation compared to precipitation? For this purpose the observations and forecasts are spatially averaged over 148 subbasins (Fig. 1). Time series of observations and forecasts are then used to calculate the mean continuous ranked probability skill score (CRPSS) for each basin and each season (MAM, JJA, SON, DJF).
The mean continuous ranked probability score (CRPS) is an overall measure of forecast quality and is calculated as

\mathrm{CRPS} = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} \left[ F_{y_i}(y) - H(y \geq x_i) \right]^2 \mathrm{d}y, \quad (11)

in which F_{y_i}(y) is the cumulative distribution function of the forecast variable and H(y \geq x_i) the Heaviside step function that assumes probability 1 for values greater than or equal to the observation x_i and 0 otherwise (Brown et al., 2010). Interpretation of the mean CRPS is similar to that of a root mean square error: both scores have no fixed upper bound, their magnitude is determined by the variable, and lower scores are better, with 0 the perfect score.
The limits of the mean CRPS vary depending on the basin and season, which makes it difficult to compare between basins and seasons. For this reason the CRPS is translated into the continuous ranked probability skill score, which measures the performance of a forecasting system relative to a reference forecast. The reference forecast here is seasonal climatology. As such, the CRPSS equals 1 for a perfect forecast and 0 when the forecast ensemble does not score a better CRPS than the CRPS calculated for the climatological distribution.
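As a concrete sketch, the sample CRPS of a single ensemble forecast and the corresponding skill score can be computed as follows. The kernel form used here is equivalent to the integral over the empirical forecast CDF; this is an illustration, not the EVS implementation:

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Sample CRPS of one ensemble forecast `ens` against a scalar observation
    `obs`, using the kernel form CRPS = E|X - x| - 0.5 E|X - X'|, which equals
    the squared-CDF-difference integral for an empirical ensemble CDF."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - obs))                          # E|X - x|
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))  # 0.5 E|X - X'|
    return term1 - term2

def crpss(crps_forecast, crps_reference):
    """Skill score relative to a reference forecast (here seasonal climatology):
    1 for a perfect forecast, 0 when no better than the reference."""
    return 1.0 - crps_forecast / crps_reference
```

For a single-member (deterministic) "ensemble" the CRPS reduces to the absolute error, which makes its interpretation as an RMSE-like score explicit.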

\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{\mathrm{climatology}}}. \quad (12)
Additionally, the relative mean error (RME) is calculated for the means of the forecasts to detect relative biases in the mean:

\mathrm{RME} = \frac{\sum_{i=1}^{n} \left( \overline{Y}_i - x_i \right)}{\sum_{i=1}^{n} x_i}, \quad (13)

in which \overline{Y}_i is the mean of the ensemble for forecast i and x_i the corresponding observation.
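The RME translates directly into code, assuming the ensemble means have been precomputed:

```python
import numpy as np

def relative_mean_error(ens_means, obs):
    """Relative mean error: sum of (ensemble mean - observation) over the sum
    of the observations, i.e. the relative bias of the forecast means."""
    ens_means = np.asarray(ens_means, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.sum(ens_means - obs) / np.sum(obs)
```

Note that the denominator makes the score sensitive to the magnitude of the observed variable, which matters for near-zero means such as winter temperatures (see the discussion of Fig. 6 in the results).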
The above scores are calculated with the Ensemble Verification System (EVS), a software package to calculate ensemble verification metrics (Brown et al., 2010).

Analysis of the effect of PET forecasts on streamflow predictions
In this second part of the analysis we aim to answer two questions: (1) what is the effect of using near-real-time PET forcing instead of a PET climatology on the model states and fluxes, and (2) what is the effect of using forecasted PET instead of a PET climatology on streamflow forecast skill?

To answer the first question, the wflow_hbv model is consecutively forced with PET_Clim and PET_NRT. Four states and two fluxes are exported for analysis: the states (1) upper soil reservoir, (2) lower soil reservoir, (3) interception storage and (4) soil moisture store, and the fluxes (5) discharge and (6) actual evaporation. For the different states and fluxes the mean difference (MD) between the two runs is calculated for each grid cell, and for each season to investigate the seasonality of differences:

\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{n} \left( X_{\mathrm{NRT},i} - X_{\mathrm{Clim},i} \right).

To answer the second question, two hindcast runs are performed with PET_Fcast and PET_Clim as PET forcing, respectively. To avoid effects caused by the initial state, all forecasts start from the initial states derived from the PET_NRT simulation. Forecast skill scores are calculated as for the meteorological variables, for 20 discharge gauges and for each season. Different from the meteorological verification exercise, the metrics are calculated for the forecasts with reference to the model output, and are not compared with observations. The reason for this is that differences between observation and forecast stem from many different sources, including errors in the initial state; for example, a forecast that is "too wet" might compensate in the 10-day forecast for initial states that were "too dry". For this reason the effect of the meteorological forecast was isolated by calculating the verification metrics against modeled streamflow. This also avoids issues of perceptive bias due to the model being calibrated on another PET forcing; one of the PET types might simply perform better because it is more like the original PET used in calibration.
Streamflow gauges for analysis were selected as follows:

1. Only gauges were chosen for which the model was deemed behavioral, as expressed by a KGE score threshold of 0.5.

2. Only one gauge was selected for each stream in the basin, except for the Rhine River itself, for which two additional gauges were chosen. If multiple gauges on the same stream were present, the gauge most downstream was chosen; the gauge "most downstream" was identified by sorting on mean yearly discharge and picking the highest.

3. From the then remaining list, the largest 20 streams were selected for analysis.
The streamflow locations are shown in Fig. 1 as black squares including the name of the river.

Analysis of meteorological forecast skill
The forecast skill is assessed for all catchments and for each season. Seasons are Northern Hemisphere seasons: spring (MAM), summer (JJA), autumn (SON), and winter (DJF). Figure 4 shows the mean CRPSS calculated for subsamples of all forecast-observation pairs for different levels of exceedance, P(X ≥ x), for each variable. Simply put, the CRPSS value at P(X ≥ x) = 0.1 is calculated for the top 10 % of observations and the CRPSS value at P(X ≥ x) = 0.7 is calculated for the highest 70 % of the observations. P(X ≥ x) is calculated over all observations from all seasons. This means that for some seasons, for example temperature in winter, there is a lower limit in P(X ≥ x), because the highest temperatures do not occur during winter.
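The subsampling by exceedance level can be sketched as below; the helper name and the quantile-based threshold are illustrative:

```python
import numpy as np

def conditional_subset(obs, fcst, p_exceed):
    """Select the observation-forecast pairs whose observation lies in the top
    `p_exceed` fraction of all observations, i.e. pairs with P(X >= x) <=
    p_exceed, conditioned on the observations. Skill scores such as the CRPSS
    are then computed over this subsample."""
    obs = np.asarray(obs, dtype=float)
    thresh = np.quantile(obs, 1.0 - p_exceed)   # exceedance threshold
    mask = obs >= thresh
    return obs[mask], [f for f, m in zip(fcst, mask) if m]
```

Because the threshold is computed over all seasons jointly, a seasonal subset (e.g., winter temperatures) may contain no pairs at small p_exceed, which is exactly the lower limit in P(X ≥ x) described above.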
On the other hand, the response of the CRPSS curve is flat for high P(X ≥ x) for temperature during summer, as all summer temperatures fall in the highest 60 % of temperatures of the whole year. The same is shown for the CRPS (Fig. 5) and the RME (Fig. 6). There is no skill in the ECMWF forecast beyond 10 days for daily precipitation. This is consistent with the 9-day lead time in streamflow forecasts found by Renner et al. (2009). The skill is best in winter and worst in summer, which is expected based on the dominating meteorological processes (frontal systems in winter and convective events in summer). The total amount of precipitation is underestimated after a 1-day lead time (Fig. 6).
There is more skill in the forecast for the variables temperature and incoming shortwave radiation. Likewise, there is considerable skill remaining in the potential evaporation forecast. For temperature the 1-day forecast is close to perfect for autumn and spring. The skill in the temperature forecast is similar for spring, summer and autumn but worse during winter. The spread, i.e., the difference in skill between basins, is also largest during winter and spring. The RME shows that there is a small negative bias in the temperature forecasts. The RME for winter is largest; however, it should be noted that the RME is the mean difference weighted by the mean of the observations (Eq. 13). As the mean temperature in winter is closer to zero, this results in a larger RME. Still, also when expressed in absolute values, the error for temperature during winter is larger than for other seasons (Fig. 5).
For radiation there is already quite a considerable loss in skill after 1 day, but then the CRPSS remains quite stable for longer forecasts, notably during spring and autumn. There is a larger decline in skill for summer and for extreme low radiation values in winter. In absolute terms, the CRPS is related to the magnitude of the average radiation for each season, with the smallest absolute errors in winter and the largest in summer (Fig. 5). In terms of bias, we see that the relative mean error increases with lower P(X ≥ x) (Fig. 6, row 3). This indicates that low values are slightly overestimated while high values are slightly underestimated, making the radiation forecasts slightly less extreme than the observations. This is further demonstrated in Fig. S1 in the Supplement, which plots the RME for different levels of non-exceedance (P(X ≤ x)), as opposed to exceedance in Fig. 6.
The skill of the potential evaporation forecast is closely tied to the skill of the radiation forecast, both because Makkink potential evaporation is directly proportional to radiation and because of the larger uncertainty in the radiation forecast. The forecast skill has the same properties as those found for the radiation forecast. A small difference is that part of the forecast skill in temperature is reflected in a slightly improved forecast skill after 10 days for PET compared to radiation in summer.
Overall, there is relatively little spread in skill between basins, with the 10th and 90th percentiles close to the mean and following the same trajectory. The difference in skill between the different seasons is larger than the spread between basins, especially for the variables temperature, radiation and potential evaporation. This difference in skill between seasons is partly misleading. For example, the forecast skill for radiation in winter (Fig. 4, purple line) appears to be an outlier. However, the whole range of occurrences of extreme high and low radiative forcing is compressed into a limited part of P(X ≥ x). Although the forecast skill over the whole range of winter radiative forcing is lower than that for the other seasons, the top 10 % of winter radiative forcings are actually among the best predicted.
Likewise, high temperatures receive higher skill scores than low season temperatures. This is even more distinct in the radiation forecasts. This does, however, not mean that the forecasts of such rare events are more accurate: both the RME (Fig. 6) and the CRPS (Fig. 5) are larger for high extremes, meaning larger errors for those forecasts. Still, taking into account the rarity of the event by calculating the CRPSS, which is the skill of the forecast relative to the skill of a random draw from the climatology, temperature, radiation and potential evaporation forecasts are found to add most information for extreme high values, even though the error of those forecasts is larger than for more "average" values (values with a higher probability of occurrence).

Influence of dynamic PET on initial states
Dynamic potential evaporation leads to lower actual evaporation (AET). The difference is largest for summer and spring (Fig. 7). Part of this lower evaporation stems from a reduction in interception evaporation, as the interception storage is on average fuller under dynamical forcing. This can be explained by the correlation between precipitation events and low potential evaporation. On rainy days the dynamic potential evaporation is generally lower, which decreases the amount of interception evaporation. Under climatological forcing the energy available is not reduced, and thus more water evaporates from the interception store. The latter is sometimes taken into account in hydrological models by adding a potential evaporation reduction function dependent on the intensity of precipitation to correct the PET climatology; the HBV model, for example, has this option (Schellekens et al., 2017).
The lower evaporation with dynamic PET forcing cascades through the different model storages, accumulating in a mostly wetter lower zone (LZ) storage under dynamic forcing. Finally, the lower evaporation results in higher discharge throughout the Rhine basin (see Figs. S2-S7). Exceptions are the high Rhine during spring and, to a lesser extent, during autumn, and several areas during winter, when there is very little effect overall. The wetter conditions also result in higher peak discharges. As these higher discharges are a result of the temporal dynamics of the potential evaporation input, we expect to find a similar effect on forecasted discharges. As will be shown later (Fig. 9), this is indeed the case, albeit very limited.
Figure 5. Continuous ranked probability score (CRPS) for the four forcing variables for the 148 HBV subbasins for the whole year. CRPS is aggregated into mean (solid), 10th and 90th percentiles (dashed). Note that the CRPS at P(X ≥ x) = 0.1 or 0.7 is calculated over, respectively, the 10 % and 70 % highest observation-forecast pairs, conditioned on the observations.

Influence of PET forecast on streamflow forecast
The CRPSS for the streamflow forecast is hardly influenced by the potential evaporation forcing type. At first sight, the skill scores obtained with dynamic or climatological PET are identical. Small differences only become visible when taking a close-up of the differences by subtracting one from the other (Fig. 8). However, the small difference in skill grows with lead time. The influence of PET forcing type becomes more intuitive when looking at the RME. An increasing drift with lead time between PET forcing types is visible (Fig. 9). Interestingly, this drift in RME is almost uniform over all subsets of predicted discharge. The drift is positive, which means that forecasted PET leads to slightly higher forecasted discharges, as expected based on the results of the influence of variable PET on the initial states.
Analyzed for each season separately, there is a little more to discover about the role of potential evaporation forecasts and the sensitivity of forecast skill to the meteorological forecast in general. The contribution of the meteorological forecast to streamflow forecast uncertainty is largest for summer, as shown by the largest decrease in CRPSS for the 10-day forecast in summer compared to the other seasons. The CRPSS especially "dips" for the most extreme discharges, which is not as strong for spring and autumn, especially compared to the flat response of the CRPSS for the highest 30 % of discharges in winter.
In terms of the effect of potential evaporation climatology versus forecasted potential evaporation, the influence is largest (but still quite small) for summer and spring. This is tied to the potential evaporation being of larger magnitude in these seasons; there is hardly a response in winter, when potential evaporation is lowest.
The influence of PET forecasts on low flow prediction is further examined by calculating the scores for different levels of non-exceedance, P(X ≤ x), instead of exceedance, so that the score value at P(X ≤ x) = 0.1 is calculated for the 10 % smallest observations and the score value at P(X ≤ x) = 0.7 is calculated for the lowest 70 % of the observations. Not only does the choice of PET forcing for the forecast have hardly any effect on the forecasted streamflow (Fig. 10, bottom row), but the forecast skill of low discharge is also only slightly affected by the skill of the meteorological forecast in general. The meteorological forecast skill declines with lead time (e.g., Fig. 4), but the forecast skill of low percentile discharge remains almost perfect (very close to 1) compared to the model under perfect forcing.

Figure 6. Relative mean error (RME) for the four forcing variables for the 148 HBV subbasins for the whole year. RME is aggregated into mean (solid), 10th and 90th percentiles (dashed).

Conclusions
This paper presented a simple and straightforward investigation from an operational forecasting practice perspective. First, observation data were preprocessed for use in the gridded wflow_hbv model. Second, the wflow_hbv model was subjected to dynamical and climatological PET forcing. Three aspects were analyzed: (1) the skill of the meteorological forecast, (2) the effect of PET forcing on initial states and (3) the effect of PET forcing on forecast skill.
Nine to 10 days is the upper limit on forecast lead time for daily precipitation for the ECMWF forecast in the Rhine basin, with only very little skill remaining compared to climatology. There is considerable skill in daily temperature, radiation and potential evaporation forecasts, even after 10 days. Variable PET forcing resulted in lower evaporation, wetter initial states and higher modeled discharges.
The main result of this study is that potential evaporation forecasts improved streamflow forecasts only slightly. This confirms earlier results that the influence of random errors in PET on estimated streamflow was generally not measurable when comparing model runs directly, with a 20 % systematic bias in PET needed to influence model outcomes significantly (Parmele, 1972). Likewise, Fowler (2002) concluded that climatological PET estimates produced a soil water regime very similar to that derived with actual daily PET values, including extreme periods, for a site in Auckland, New Zealand.
There is a wider discussion on evaporation modeling in hydrological models (Andréassian et al., 2004; Oudin et al., 2005a, b) to which the results here might add a new perspective: that of evaporation as a process relevant for medium-term forecasts. This also points to a limitation of this research: only the influence on forecasts up to 10 days was investigated. The influence on seasonal forecasting might be more substantial, considering that the modeling of evaporation strongly influences the partitioning between runoff and evaporation in the longer-term water balance (Bai et al., 2016).
Further limitations are that only one model (wflow_hbv) was tested, and only for one climate zone (moderate temperate). The model was originally calibrated on a different PET climatology than studied here and was not recalibrated. The latter is not seen as a limitation. Deliberately not recalibrating the model enabled us to focus on the changes in modeled processes instead of on comparatively vague assessments based on model performance expressed in efficiencies, with the effects brought forward by the PET forcing hidden somewhere in the parameter space.
In the analysis, forecasting metrics were calculated over subsets of observation-forecast pairs conditioned on the observations. Alternatively, the subsets could have been conditioned on the mean of the forecasts. This would present more intuitive information for a forecaster at the time of a forecast, when the observation is by definition not yet known (Lerch et al., 2015).
The idea to look at potential evaporation forecasts was instigated as part of a program to improve forecasts of low flows. Indeed, it is a recurring hypothesis that potential evaporation forecasts should aid especially in making low flow predictions. The uniform response of several skill scores for different subsets of observed discharge does not support this idea; there is no special gain for low flows.
Instead, from our model results it follows that the correct prediction of a drought is firstly dependent on a correct forecast of no rain. Low flow recession is subsequently determined, in the absence of further feedback mechanisms, solely by the storage-discharge relationship of, in this case, the lower zone representing the saturated zone, as well as by the routing of surface water.
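For a linear reservoir, a common storage-discharge choice for HBV-type lower zones, the rain-free recession described above reduces to an exponential decay. The sketch below is purely illustrative; the outflow coefficient `k` is a made-up value, not a calibrated wflow_hbv parameter.

```python
import math

def linear_reservoir_recession(q0, k, days):
    """Rain-free discharge recession of a linear reservoir with Q = k * S:
    dS/dt = -k * S  =>  Q(t) = Q0 * exp(-k * t).
    Returns the daily hydrograph from day 0 to day `days` inclusive."""
    return [q0 * math.exp(-k * t) for t in range(days + 1)]

# With no further rainfall, the forecasted low flow depends only on the
# initial storage (via Q0) and the recession constant k.
hydrograph = linear_reservoir_recession(q0=100.0, k=0.05, days=10)
```

This is why, in the absence of rain, a better PET forecast adds little: the recession is fixed once the initial state is set.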
The follow-up question then is the following: is this true in reality, or is this a model deficiency? Should we rethink hydrological modeling to incorporate more feedbacks on evaporation? Certainly there are models with a more complex representation of evaporative processes. These are valid and important questions, especially in the light of hydrologic response to changes in climate drivers. However, from the results presented here, it should not be expected that a better understanding of evaporative processes and feedbacks will directly result in a significant increase in 10-day predictive skill for streamflow.
Data availability. Gridded precipitation, temperature, radiation and potential evaporation used in this study are available through the 4TU data center; see van Osnabrugge (2017) and van Osnabrugge (2018).
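The potential evaporation in these gridded datasets is based on the Makkink equation (see abstract). As an illustration, a minimal Python sketch of a Makkink computation from daily temperature and global radiation is given below, using standard textbook constants and the Tetens form of the saturation vapour pressure slope; this is not necessarily the exact implementation behind the published grids.

```python
import math

LAMBDA = 2.45   # latent heat of vaporization [MJ kg-1]
GAMMA = 0.066   # psychrometric constant [kPa degC-1], near sea level

def makkink_pet(t_air, k_in):
    """Makkink reference PET [mm day-1] from daily mean air temperature
    t_air [degC] and incoming global radiation k_in [MJ m-2 day-1]."""
    # slope of the saturation vapour pressure curve (Tetens) [kPa degC-1]
    s = (4098.0 * 0.6108 * math.exp(17.27 * t_air / (t_air + 237.3))
         / (t_air + 237.3) ** 2)
    return 0.65 * s / (s + GAMMA) * k_in / LAMBDA
```

Note that the formulation needs only temperature and radiation, which is why the skill of those two forecast variables carries over directly to the PET forecast.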
Author contributions. BvO prepared the data, performed the analyses and wrote the article. RU contributed to improving the article and the experimental setup. AW supplied part of the scripts used in the analysis, contributed to the preprocessing steps of the forecasts and contributed to the design of the model experiments and the final article text.
Competing interests. The authors declare that they have no conflict of interest.

Figure 1. Map of the Rhine basin, Europe. Black lines delineate the 148 subbasins used in the analysis of the meteorological forecast skill. Square markers show the locations used for forecast skill analysis.

Figure 2. Difference between climatological and near-real-time potential evaporation, shown for the year 2004 for grid cell x: 200, y: 200.

Figure 3. Flow chart of the model experiment. Blue boxes represent data products. Green boxes depict modeling activities. Arrows represent the flow of data for historical runs (blue lines) and forecast runs (black). The red boxes indicate the areas for analysis of the results, each box targeting a research subquestion.
1. To what extent are initial states affected by the use of climatological versus near-real-time potential evaporation?
2. To what extent can potential evaporation forecasts contribute to streamflow forecast skill?

Figure 4. Continuous ranked probability skill score (CRPSS) for the four forcing variables benchmarked against sample climatology for the 148 HBV subbasins. CRPSSs are aggregated into mean (solid), 10th and 90th percentiles (dashed). Note that the CRPSS at P(X ≥ x) = 0.1 or 0.7 is calculated over, respectively, the 10 % and 70 % highest observation-forecast pairs, conditioned on the observations.

Figure 7. Mean difference in calculated actual evaporation (AET) for each season. Actual evaporation includes evaporation from interception.

Figure 8. CRPSS for the forecast runs (forecasted PET, climatological PET) and their difference, benchmarked against model output, for the 20 largest sub-catchments in the Rhine basin. CRPSSs are aggregated into mean (solid), 10th and 90th percentiles (dashed). Note that the CRPSS at P(X ≥ x) = 0.1 or 0.7 is calculated over, respectively, the 10 % and 70 % highest observation-forecast pairs, conditioned on the observations.

Figure 9. RME for the forecast runs (forecasted PET, climatological PET) and their difference for the 20 largest streams in the Rhine basin. RME scores are aggregated into mean (solid), 10th and 90th percentiles (dashed). Note that the score at P(X ≥ x) = 0.1 or 0.7 is calculated over, respectively, the 10 % and 70 % highest observation-forecast pairs, conditioned on the observations.