This paper presents an analysis of the effects of biased extended streamflow prediction (ESP) forecasts on three deterministic optimization techniques implemented in a simulated operational context with a rolling horizon test bed for managing a cascade of hydroelectric reservoirs and generating stations in Québec, Canada. The observed weather data were fed to the hydrological model, and the synthetic streamflow subsequently generated was considered to be a proxy for the observed inflow. A traditional, climatology-based ESP forecast approach was used to generate ensemble streamflow scenarios, which were used by three reservoir management optimization approaches. Both positive and negative biases were then forced into the ensembles by multiplying the streamflow values by constant factors. The optimization methods' response to those biases was measured through the evaluation of the average annual energy generation in a forward-rolling simulation test bed in which the entire system is precisely and accurately modelled. The ensemble climate data forecasts, the hydrological modelling and ESP forecast generation, optimization model, and decision-making process are all integrated, as is the simulation model that updates reservoir levels and computes generation at each time step. The study focussed on one hydropower system both with and without minimum baseload constraints. This study finds that the tested deterministic optimization algorithms lack the capacity to compensate for uncertainty in future inflows and therefore place the reservoir levels at greater risk to maximize short-term profit. It is shown that for this particular system, an increase in ESP forecast inflows of approximately 5 % allows managing the reservoirs at optimal levels and producing the most energy on average, effectively negating the deterministic model's tendency to underestimate the risk of spilling. 
Finally, it is shown that implementing minimum load constraints serves as a de facto control on deterministic bias by forcing the system to draw more water from the reservoirs than what the models consider to be optimal trajectories.

Hydropower is one of the most reliable renewable energy sources currently available. Managing a hydropower system can be relatively simple, such as for a single run-of-river generating station, or can be very complex, such as when multiple cascading reservoir-generating stations are to be operated simultaneously. The optimal management of available and incoming water volumes and the effects on downstream elements of the system must consider many sources of uncertainty. For complex systems, the operational decisions must be made based on inflow forecasts, which contain uncertainty derived from the hydrological modelling chain. Model initial states, incoming weather data, and model structure and parameterization all contribute to the overall uncertainty in the streamflow forecasts (Liu and Gupta, 2007). One way to include the uncertainty in the water resource management process is to work in a probabilistic framework. Ensemble streamflow prediction (ESP), a dynamic method that uses historic climate data as future weather scenarios, was designed to provide multiple scenarios of possible future inflows for a given initial model state, thus allowing the exploration of possible outcomes instead of a single outcome, as would be the case with a deterministic forecast (Day, 1985). Typically, the ESP methodology is implemented for long-term forecasts where numerical weather prediction systems are not reliable (i.e. greater than a few weeks). The skill of ESP is largely based on the persistence of the initial states, which themselves depend on the dominating processes. In the case of snowmelt-dominated catchments, ESP forecasts can be relatively skilful at the beginning of the snowmelt period, whereas at the beginning of the snow accumulation phase, the initial states play essentially no role in long-term inflow forecasts, meaning that the ESP skill is entirely based on the climatology (Harrigan et al., 2018). 
Recent developments in sub-seasonal to seasonal forecasting might eventually replace the ESP method at these time frames, leaving ESP useful only for longer-term forecasts (i.e. more than 3 months).

ESP forecast quality can be assessed using many metrics, ranging from the simple mean error (ME) to probability integral transform (PIT) histograms (Hamill, 2001) and reliability diagrams (RD). The current state of literature suggests that a good ensemble forecasting system must produce ensembles that are unbiased, sharp and reliable, meaning that the uncertainty in the ensemble faithfully represents the real-world uncertainty. A review of ensemble forecast quality metrics can be found in Hashino et al. (2007), and an analysis of benchmarking methods is presented in Pappenberger et al. (2015).
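As an illustration of the two simplest metrics named above, the following minimal sketch computes the mean error of the ensemble mean and empirical PIT values for a small ensemble. The function names and the data are hypothetical and are not taken from the study; the PIT is approximated here as the fraction of members at or below the observation, which is one common empirical choice among several.

```python
from statistics import mean


def mean_error(ensemble_forecasts, observations):
    """Mean error (ME): average of (ensemble mean - observation)."""
    errors = [mean(members) - obs
              for members, obs in zip(ensemble_forecasts, observations)]
    return mean(errors)


def pit_values(ensemble_forecasts, observations):
    """Empirical PIT: fraction of members less than or equal to the observation."""
    return [sum(m <= obs for m in members) / len(members)
            for members, obs in zip(ensemble_forecasts, observations)]


def pit_histogram(pits, n_bins=5):
    """Count PIT values per equal-width bin on [0, 1]; a flat histogram suggests reliability."""
    counts = [0] * n_bins
    for p in pits:
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        counts[idx] += 1
    return counts


# Hypothetical data: three forecasts of four members each, with their outcomes.
forecasts = [
    [90.0, 100.0, 110.0, 120.0],
    [40.0, 50.0, 60.0, 70.0],
    [200.0, 220.0, 240.0, 260.0],
]
observed = [105.0, 45.0, 270.0]

me = mean_error(forecasts, observed)
pits = pit_values(forecasts, observed)
hist = pit_histogram(pits)
```

A negative ME indicates that the ensemble mean underestimates the outcomes on average. With only three forecasts the histogram is not meaningful; in practice many forecast-observation pairs are needed.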

ESP forecasts can suffer from errors in ensemble mean (bias) and spread (dispersion and variability) when compared to the actual outcomes as measured over a long time period (Wood and Schaake, 2008). The hydrological community has found clever ways to correct bias in ESP forecasting, such as correcting precipitation amounts of input climate data (Crochemore et al., 2016; Chen et al., 2014; Voisin et al., 2010; Gneiting et al., 2005) or modifying initial conditions with data assimilation (Liu and Gupta, 2007; DeChant and Moradkhani, 2011). Ensemble forecast spread (including ESP and short-term, weather-forecast-driven forecasts) has also been tackled recently by different methods such as pre-processing of inputs (Arsenault et al., 2016) and post-processing of model outputs (Pagano et al., 2013; Zalachori et al., 2012; Boucher et al., 2012, 2015; Hashino et al., 2007). Zhao et al. (2011) analyzed the effect of streamflow forecast uncertainty for reservoir operation for both deterministic and ensemble forecasts, showing that improved uncertainty representation could lead to better decision-making. Bias correction of ESP forecasts was evaluated in Hashino et al. (2007), who reported improvements in seasonal volumetric forecast quality with the application of three bias-correction techniques based on transformations derived from historical simulations and observations. Boucher et al. (2012) assessed the economic aspect of hydropower generation using short-term (10 d) ensemble forecasts and found that post-processing the forecasts using the best member method (Roulston and Smith, 2003) and a similar method proposed by Fortin et al. (2006) improved reservoir management and energy generation. The post-processing was performed on the entire length of the forecasts. Boucher et al. (2015) then analyzed statistical post-processing methods in short-term ensemble forecasts, namely for bias correction, on synthetic data generated following normal and gamma distributions. 
They explored the effects of post-processing on ensemble spread but did not consider the impacts on reservoir management. Anghileri et al. (2016) showed that their 1-year climatology-based ESP forecasts were 35 % less informative than perfect forecasts and that improving the ESP forecasts had value on a wide range of reservoir characteristics using seasonal to inter-annual lead times. Côté and Leconte (2015) studied the impacts of ESP under-dispersion on electricity generation of a hydropower system located in Québec, Canada, and concluded that under-dispersion in the ESP ensemble negatively impacts the operating policy of the system, but the impacts differ depending on the optimization algorithms used to derive the policy. In this study, we explicitly analyze the effects of biases in ESP forecasts and verify the robustness of the same hydropower system to these biases. To our knowledge, this is the first attempt to quantify the impacts of long-term ESP forecast biases on a hydropower system's performance, although similar studies have been performed for short-term forecasts (Cassagnole et al., 2017).

This study aims to identify and quantify the effects of ESP forecast bias on the average hydropower output of the Saguenay–Lac-St-Jean (SLSJ) system when managed under different conditions, notably (1) using three optimization and decision-making algorithms and (2) with and without minimum load constraints (MLCs). In other words, how does a biased ESP affect the reservoir management policy optimization and hydropower generation considering unknown future inflows? Understanding the effects of ESP forecast biases will help quantify the true value of unbiased forecasts and optimization methods in a hydropower generation context and help understand how to maximize generation efficiency and increase expected overall profits. A test bed that emulates the real-world system and that can be run in the hindcast mode was developed to measure the hydropower generation over the past 25 years. This method was selected to limit differences between the various simulations by ensuring that the system is consistent between them.

The next section presents the study area and data, Sect. 3 introduces the methods and models used in this study, and Sect. 4 details the obtained results. Sections 5 and 6 respectively contain the discussion and concluding remarks.

This study was performed on a hydroelectric system in the province of Québec, Canada. The hydropower system in question, the SLSJ hydropower complex, is wholly owned and operated by Rio Tinto Aluminum's Power Operations (RTA) and is used mainly to supply the large energy requirements of the company's aluminum smelters. On average, the system does not produce enough hydropower to fulfil the energy needs of the smelters; therefore MLCs (or generation constraints) are imposed to minimize the amount of energy that must be purchased (Arsenault et al., 2013). More details on the operational constraints are presented in Sects. 2.3 and 3.4.

Study area location, hydropower generating stations and reservoirs.

Generating station and reservoir characteristics in the SLSJ hydroelectric complex.

Reservoir characteristics on the SLSJ hydroelectric complex.

Rio Tinto holds water rights on a 75 000 km

Lac-Manouane (LM) contains the reservoir of Lac-Manouane (RLM). RLM is a managed 2657 hm

Passes-Dangereuses (PD) contains a large reservoir (5227 hm

CD contains a smaller reservoir (345 hm

Chute-à-la-Savane (CS) is the last sub-basin on the Peribonka River. A run-of-river generating station, Chute-à-la-Savane (CCS), defines its outlet.

Lac-Saint-Jean (LSJ) is the largest catchment, at over 45 000 km

Water drawn from the RLSJ is finally routed to two parallel powerplants sharing a reservoir small enough to be considered run of river, although the hydraulic head is approximately 46 m at the Chute-à-Caron power plant (CCC) and approximately 63 m at the Shipshaw power plant (CSH) due to the difference in downstream elevations.

All data for this study were taken from operational databases. Observed streamflow (and mass-balance-derived inflows for managed sites) were taken from hydrometric gauges owned and operated by RTA. These data were used to calibrate the hydrological model but were otherwise not used in the study. Instead, a proxy for observed streamflow was generated as described in Sect. 3.2. Climate data fed to the hydrologic model, including precipitation and maximum and minimum temperatures, were collected by RTA's private network of 22 weather stations.

Hydrometric data are available from as early as 1916 for the LSJ sub-basin; however only in 1953 were all the other sites gauged and recorded. Climate data are also available starting in 1953, when there were fewer stations, until the present day. A major investment in weather stations was made in 1986, by which point the entire network was up and running as it is today. In all cases, weather data were interpolated over the catchment to drive a distributed hydrological model.

All other data used in this study, such as detailed generating station characteristics, operating rules, import–export energy contracts and power contracts reflect the current operational state of the system; however, they are proprietary and cannot be disclosed in this paper. A high-level overview is nonetheless given here to contextualize the problem, and the mathematical description of the problem and optimization models is given in Sect. 3.4. In essence, the hydropower complex is used to generate electricity for aluminum smelters. By the nature of these smelters, the power level must never drop below a certain threshold (minimum load) or the aluminum extraction by electrolysis process could be ruined for whole batches of aluminum product. Therefore, contracts are in place with other utilities as a backup to provide power, should the generating stations fail to meet demand. Contracts also exist in the other direction, should more power than required be generated. Moreover, the generation planning is largely influenced by seasonal climate variations, with the lowering of water levels in reservoirs during winter and filling during the spring. ESP forecasts are used to estimate the best water drawdown decisions for each day and for each site. This “decision” is a set of streamflow values that must be either drawn from the reservoirs or used to drive the turbines at each power plant for the day. Water levels in the largest reservoir (RLSJ) must also be kept within bounds, as the reservoir is used for tourism and agriculture on top of hydropower generation. The optimal reservoir management strategy is the one that will allow reducing the overall cost of energy. Also, water spillage has a negative impact on the hydropower production, as it increases the tailrace elevation and reduces the net head. 
Otherwise, spilling is not penalized because, when reservoir constraints are binding, water can take on a negative value and must be spilled to ensure safe management of the reservoir.

The current seasonal water management process of the system uses the
traditional ESP forecast approach (Day, 1985) based on a 64-year historical
climate record. A hydrological model produces the inflows for a specified
duration, normally between 3 and 6 months lead time. Then, an optimized
water release policy is computed for each scenario in the ESP ensemble. A
deterministic optimization algorithm that uses inflow scenarios as possible
future realizations of the inflow calculates the optimal releases at each
site (the set of water releases at all sites is referred to as a
decision). In the optimization algorithm, a piecewise linear
approximation of the hydropower function of powerhouses is used (Hamann and
Hug, 2014), which linearizes the optimization model. One problem with
managing reservoirs is that the optimization algorithms attempt to empty the
reservoirs at the end of the forecast window because the future value of
water in the reservoir is zero unless otherwise specified. Therefore, a
water value function was derived for each day of the year (365 values) by
using a sampling stochastic dynamic programming (SSDP) algorithm on
historical inflow data (Faber and Stedinger, 2001; Côté et al.,
2011; Côté and Leconte, 2015). This ensures that no matter the
forecast data and duration, there is always a value function that can
estimate the value of water remaining in the reservoir at the end of the
period. These value functions are also approximated by hyperplanes,
essentially tangential surfaces to an

In this study, the operational setting was precisely and accurately modelled
in a test bed which allows simulating the historical operation of the
hydropower system, as shown in Fig. 2. The cycle in Fig. 2 represents the
hydropower simulation test bed and its sequential and repeating steps. Each
of these steps is described below and in the following sections.

The process starts at day

The ESP forecasts for 120 d are prepared according to the procedure presented in Sect. 3.2.

The hydrological model is run for each historical climate scenario, always starting with the same initial conditions.

The generated ESP forecast is used to drive the optimization method, which also depends on the initial state of the system (reservoir levels at each site).

The optimal decision based on step 4 is selected and implemented, and the system is simulated using this decision. This results in a modification in the reservoir states for the next step as well as in estimations of energy generation for the period and flow rates at each site for the current time step.

Steps 2–5 are repeated for each 3 d time step on the 25-year simulation period.
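The repeating cycle described in the steps above can be sketched as a rolling-horizon loop. This is a toy illustration under strong assumptions and not the test bed used in the study: it uses a single reservoir, the historical inflows stand in for the hydrological model output, and `optimize_release` is a hypothetical placeholder rule rather than the linear programming models described later. All names and values are illustrative.

```python
def build_esp(day, historical_inflows, horizon):
    """One member per historical year: inflows for `horizon` periods from `day`."""
    return [year[day:day + horizon]
            for year in historical_inflows
            if len(year) >= day + horizon]


def optimize_release(storage, esp, max_release):
    """Placeholder decision rule (NOT the paper's optimization model):
    release the mean first-period forecast inflow, within simple bounds."""
    first_period_mean = sum(member[0] for member in esp) / len(esp)
    return max(0.0, min(max_release, storage, first_period_mean))


def simulate(storage, release, actual_inflow, capacity):
    """Mass balance for one period; water above capacity is spilled."""
    new_storage = storage - release + actual_inflow
    spill = max(0.0, new_storage - capacity)
    return new_storage - spill, spill


def run_test_bed(actual, historical, storage, capacity, max_release, horizon):
    """Forecast, optimize, implement the decision, update the state, move forward."""
    log = []
    for day in range(len(actual) - horizon + 1):
        esp = build_esp(day, historical, horizon)                # forecast
        release = optimize_release(storage, esp, max_release)    # decision
        storage, spill = simulate(storage, release, actual[day], capacity)
        log.append((release, storage, spill))
    return log


# Hypothetical inputs: two historical years, four periods of realized inflow.
historical = [[10.0, 12.0, 14.0, 16.0], [20.0, 18.0, 16.0, 14.0]]
actual = [15.0, 30.0, 5.0, 10.0]
log = run_test_bed(actual, historical, storage=50.0, capacity=60.0,
                   max_release=14.0, horizon=2)
```

Only the first-period decision is implemented at each iteration; the rest of the forecast horizon informs that decision and is then discarded, which is what makes the horizon "rolling".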

Over-the-loop system simulation test-bed diagram.

In this study, the test bed was run using ESP forecasts with
varying levels of bias. This allowed evaluating the energy generation,
spillage and reservoir levels at each time step for a given ensemble inflow
forecast dataset, therefore permitting the quantification of the effects of
bias on reservoir management and energy generation. Furthermore, criteria
for selecting a decision from the deterministic approach were analyzed by
selecting decisions of monotonically increasing percentile values (e.g.
10th, 20th

Finally, once a decision is made, it is applied to the system, thus defining the generation of flows and spills at each powerhouse and spillway for the period. A system simulation model evaluates the generated electricity at each site and updates the reservoir levels. The test bed then moves forward in time, repeating the process for all periods until the last day for which ESP forecasts are available. It is important to note that the test bed runs on a 3 d time step to maintain a reasonable computing time, and each ESP inflow forecast is 120 d long (forty 3 d periods). The average power output from the entire system on all periods is finally computed. The test bed was run with and without the imposed minimum load constraints to assess the system's sensitivity to these constraints.

Three main steps were required to perform the study: (1) preparation of ESP forecasts with varying levels of bias, (2) implementation and application of the reservoir management optimization algorithms, and (3) simulation in a forward-rolling test bed.

An initial set of ESP forecasts (one forecast per 3 d simulation period)
was produced by sampling the climatological record, as proposed by Day (1985). One important consideration is the need to derive adequate
hydrological model initial conditions for each forecast; otherwise the
initial model error would already contribute to the ESP forecast bias.
Therefore, this study uses a proxy for observed streamflow derived from the
hydrological model initialized with empty reservoirs and driven by the
observed climate data over the entire period. The first year is used for
the spin-up and is removed from the rest of this study. Because the model
generated this synthetic streamflow, forcing the initial conditions to be
perfect is trivial, as all one must do is rerun the model with the historic
climate data once more until the forecast date, using the same
empty-reservoir initial states as the proxy run. The historic simulated
streamflow is considered to be the forecast target, bypassing all issues
related to the hydrological model's errors and its representation of the
initial conditions as compared to the true hydrological conditions at the
start of the forecast. The method has been used previously (e.g. in Shukla
and Lettenmaier, 2011, and Greuell et al., 2018), where the
pseudo-observations are shown to be the best estimate of the true conditions
of the catchment. The ESP forecasts were generated following a
straightforward procedure:

The hydrological model CEQUEAU (Charbonneau et al., 1977) is calibrated on each of the five sub-basins using the dynamically dimensioned search algorithm (DDS; Tolson and Shoemaker, 2007) according to the procedure in Arsenault et al. (2014). CEQUEAU is a grid-based distributed model which is set up on a 10 km resolution grid in the SLSJ basin. It uses daily gridded temperature and precipitation data as inputs.

Once the model was calibrated, a single simulation was performed using the observed climate data for 1953–2016, thus generating the pseudo-observed streamflow time series. The entire matrix of state variables for each simulation period was also saved for future use. No data assimilation was performed at any of the periods because from this point forward, the model-simulated discharge is used instead of the observed measured flows. Therefore, the model simulation and pseudo-observed flow are always in perfect agreement at each time step.

The initial date of the test bed was set to 1 December 1990; 1 December corresponds to the first day of the hydrological year, where the model states are completely independent from the previous year's hydrology. The year 1990 was used to begin the test-bed simulation because it offered a good compromise between access to “historic” climate data (the test bed is blind to future climate and therefore cannot use it in the ESP forecast, so 1953–1989 gives a reasonable starting point) and room ahead of that period to run the test bed for evaluating the method's performance (1990–2016). Consequently, the hydrological model was set up with its state variables from 1 December 1990, taken from the initial simulation (point 2 above).

The ESP forecasts were generated for each day. After some tests (not
shown here), the forecast length was set to 120 d. It is anticipated that
after 120 d of ESP forecasts, the information gain is marginal at best
because the produced forecasts for a longer-term period would follow the observed
distribution. Tests showed that, with forecasts shorter than 120 d, the
climate ensemble does not always completely merge with the distribution
of actual climate outcomes on this system. The ESP construction begins by
identifying climate data series starting on the same day as the test bed's
current day (1 December in this example) for each of the years on
record to date. For example, for a 120 d ESP forecast, member 1 would
represent climate data from 1 December 1953, member 2 would represent data from 1 December 1954, and so on. The final ESP forecast is therefore a 120 d by

The last step in producing ESP forecasts for this study was to add bias
to the ensemble means. To do so, ESP forecast members were multiplied by a
factor to shift the distribution upwards (factor

Bias for each catchment discriminated per season and per
percentile of observed flow

The ESP forecasts were assessed to identify their biases and reliability on
each catchment. Figure 3 shows the relative bias (RB), defined as the
average difference between the forecast mean

Reliability diagrams for seasonal forecasts. Panels

Relative bias between the forecast mean and observed 120 d inflow volumes and classified by the season during which the forecast was made.

The forecasts are generally skilful and reliable in almost all cases; the exception is the LM basin in winter (DJF; Fig. 4e–h), where flows are very low to begin with.

In this study, three optimization methods are used to compute the water releases at the reservoirs (Fig. 5). Each one produces a linear programming model that is solved by the Xpress linear solver. It is worth mentioning that while more efficient optimization algorithms exist, such as stochastic dynamic programming (SDP) and its variants (stochastic dual dynamic programming, SSDP, etc.), implementation can be challenging, especially for more complex multi-reservoir systems (Côté and Leconte, 2015). This is why many hydropower utilities and companies still use simpler deterministic methods for day-to-day operations (e.g. as described in Fan et al., 2016). This study concentrates on deterministic methods only, and the results should only be interpreted in this context.

Overview of the three optimization algorithms and their decision points.

The first model is a deterministic approach where the water release
decisions are the ones that are optimized for the inflow sequence that has
the median volume. In this case, only one optimization is required to
compute the water release decisions. In this study, we analyzed the effects
of utilizing a scenario other than the median on the overall generation by
taking scenarios based on each of the 10 deciles when ranked according to
the average inflow volume for the complete length (40 periods of 3 d, i.e. 120 d) of
the scenario. The optimization model solved in this case consists in
minimizing the production cost function in Eq. (2):

The system is constrained by mass-balance equations that describe the
dynamics of the reservoirs in series, as shown in Eqs. (4) and (5):

The third and final approach, which we refer to as the “unique decision
method”, is a deterministic method based on a scenario tree approach
(Carpentier et al., 2013; Fan et al., 2016; Séguin et al., 2016) but
incorporates only one branching at the end of the first period. While the
algorithm is deterministic in that it returns the same response to the
identical inputs, it does make use of multiple scenarios, and in that sense
it can be seen as probabilistic. The algorithm computes a single decision at
time

For the SLSJ system (four reservoirs and five powerhouses), the biggest instance of the linear programming model is composed of 30 000 variables and 160 000 constraints for the largest ESP ensemble, which contains 40 periods of 3 d time steps. This continuous linear programming model is easily solved by Xpress in less than 3 s using the Newton barrier method (Wright, 2001). All ensemble members are considered equiprobable.

It is important to recall that even though the optimization methods are deterministic in nature, they are used in a test bed containing uncertainty and therefore operate in a stochastic setting, with unknown future inflows.

The entire test-bed simulation was performed with the differently biased ESP forecasts and for each optimization method. Furthermore, the effects of minimum load constraints (minimum generation that must be maintained to power the smelters) were investigated by running the entire set-up twice: once with the imposed constraints and again with the constraints from Eq. (8) removed.
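The experimental design described here amounts to a full factorial combination of bias level, optimization method and constraint setting. The sketch below illustrates this under stated assumptions: the bias factors are hypothetical values chosen for illustration (the factors used in the study are not reproduced here), and the method labels are shorthand names invented for this example.

```python
from itertools import product


def apply_bias(esp, factor):
    """Multiply every streamflow value of every member by a constant factor."""
    return [[q * factor for q in member] for member in esp]


# Hypothetical bias factors: values below 1 force a negative bias, above 1 a positive one.
bias_factors = [0.90, 0.95, 1.00, 1.05, 1.10]
# Shorthand labels for the three optimization approaches (names are illustrative).
methods = ["single_scenario", "median_decision", "unique_decision"]
# Test bed run with and without minimum load constraints (MLCs).
mlc_options = [True, False]

# One test-bed run per combination.
runs = list(product(bias_factors, methods, mlc_options))

# Example: a two-member, two-period ensemble shifted upwards by 5 %.
esp = [[100.0, 80.0], [120.0, 90.0]]
biased = apply_bias(esp, 1.05)
```

Because the factor is constant, the ensemble mean is scaled by the same factor. Note that a multiplicative factor also stretches the absolute spread of the ensemble, while the relative spread is unchanged.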

The sensitivity of the operational approach (Optimization method 1) to the inflow percentile selection was first investigated. Results are shown in Fig. 6 for the cases with and without MLCs. It is important to note that the actual generation figures are not made available due to their sensitive nature. However, the numbers are presented relative to an arbitrarily fixed baseline value.
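The percentile selection investigated here can be sketched as ranking the ensemble members by total inflow volume and picking the member at the requested rank. The following is a minimal illustration assuming a nearest-rank percentile rule; the exact ranking convention used in the study is not specified, and the function name and data are hypothetical.

```python
import math


def select_member_by_percentile(esp, percentile):
    """Return the member at the given percentile of total inflow volume.

    Uses the nearest-rank rule (an assumption): rank = ceil(p / 100 * n).
    """
    ranked = sorted(esp, key=sum)  # ascending total volume over the horizon
    n = len(ranked)
    rank = max(1, math.ceil(percentile / 100 * n))
    return ranked[rank - 1]


# Hypothetical five-member ensemble (two periods each), deliberately unsorted.
esp = [[5.0, 5.0], [1.0, 1.0], [3.0, 3.0], [2.0, 2.0], [4.0, 4.0]]

median_member = select_member_by_percentile(esp, 50)
wettest_member = select_member_by_percentile(esp, 100)
driest_member = select_member_by_percentile(esp, 10)
```

Selecting a higher percentile makes the optimization plan for a wetter future, which tends to favour earlier drawdown; a lower percentile has the opposite effect.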

Average energy generation of the entire system with
varying levels of ESP forecast bias when the decision linked to the
optimization of the median inflow scenario is used to derive the water
drawdown policy. Panel

From Fig. 6, it can be seen that the MLCs significantly change the
optimization problem's behaviour. For example, in the case with MLCs,
selecting a member representing a higher percentile in the ensemble
decreases the overall performance, whereas in the case without MLCs, the
opposite is true (Fig. 6a). In Fig. 6b and c, four percentiles are
selected and evaluated for each case. In all cases, the unbiased
pseudo-observed streamflow is used as the actual realization. Each year's
values are compared to the long-term average generation figure, resulting in
some positive values (better than the long-term average) and some negative
values (worse than the long-term average). Note that the units represent
average annual efficiency (AAE; MW m

Furthermore, the MW ratios with MLCs in Fig. 6a are lower than with the unconstrained system, which is expected due to the reduced degrees of freedom. The MLCs force the system to generate energy even in low-head states, reducing the overall efficiency due to lower head and water shortages. On the other hand, the unconstrained system can lower the energy generation in drier periods to maintain a more efficient generation profile and minimize water shortages.

Figure 7 shows the performance of the second and third optimization methods (median decision and unique decision, respectively), which are based on the information content of the entire ESP forecast rather than that of a single member. It includes the generation values for different levels of bias both without (Fig. 7a) and with (Fig. 7b) MLCs. Note that the y-axis ranges differ between the two panels of Fig. 7 for ease of viewing.

Average power generation using median decision and
unique decision optimization algorithms as a function of bias without

A few interesting points emerge from Fig. 7. First, the elimination of MLCs allowed producing approximately 0.5 % more energy. This means that it could be possible to perform a cost–benefit analysis to determine if the advantages of increasing total generation outweigh the costs of backup contracts for when the minimum loads cannot be sustained. However, this is out of the scope of this paper.

Second, the median decision optimization method is clearly inferior to the unique decision method. On average, the unique decision method outperforms the median decision method for all levels of forced bias. While the values seem small (between 0.5 % and 1 %), it is important to remember that when applied to the absolute energy values, these differences become important enough to justify further investigation.

Another noticeable feature in Fig. 7 is the larger spread in values in the case without MLCs than in the case with MLCs. It would seem that the constraints imposed upon the system make the decision-making process more robust to bias in the ESP forecasts. Also of note is the fact that the unbiased (no added bias) ESP forecast is the optimal set for the system with MLCs, whereas a slight positive bias seems to improve the results for the unconstrained system. To explain this finding, the average reservoir storage levels in the two head reservoirs (LM and PD), which have the most influence on the system, were plotted for the different ESP forecast bias levels while using all ensemble members. Only results for the unique decision method are presented in Fig. 8.

Average reservoir storage level as a function of time for
various bias levels without

It is apparent in Fig. 8 that the MLCs impose a higher rate of water drawdown during the winter period (January to June) to meet the power required for the smelting operations (Fig. 8b). In doing so, the head is reduced as is the overall efficiency as compared to the unconstrained system (Fig. 8a). In the unconstrained system, the optimization algorithm aims to keep water levels as high as possible to maximize hydraulic head and energy production, which it is unable to achieve under MLC conditions.

It is important to note that the reservoir levels in Fig. 8 are 25-year
averages. Both simulations start with identical reservoir levels and evolve
over the 25-year simulation period. Also, it is seen that the

In this paper, it must be acknowledged that the test bed is a simplified approximation of the real-world system. The results obtained herein can be considered estimates of what an automated decision-making system would return. However, the real system is managed by engineers whose experience can lead them to modify the actual decision to mitigate risk or take advantage of unusual situations. Nonetheless, as more and more entities manifest interest in an “over-the-loop” forecasting and decision-making framework (Mendoza et al., 2017; Pagano et al., 2016; Liu and Gupta, 2007), it is imperative that the role of each component be well understood, including that of the optimization algorithms and ESP forecasts. As was shown, in some instances, biases in ESP forecasts can be beneficial, and any work to correct this bias could in fact be negatively affecting overall generation performance. This highlights the importance of active collaboration between hydrologists and operations research specialists in hydropower system management. In the same vein, the simplification of the system is reflected in the hydrological model, which would normally introduce new biases. The methodology used herein eliminates this bias, but it would need to be taken into account in a real-world application.

Furthermore, the test bed is run in 3 d increments, with each period being attributed the average value for the 3 d (inflows, power output and reservoir levels). This aggregation creates situations where normally a spillway would be opened on day 2 of 3, but in the test bed the decision must be made for the entire period. Therefore, small inefficiencies are introduced. Nonetheless, the tested methods in this paper were all subjected to this constraint so the results are still comparable. However, comparing these results with a real-world case would highlight these differences. One way to overcome this problem would be to run the system on a daily time step; however, tests on a shorter period showed that the difference is negligible for our case study. The slight differences between the 3 d and 1 d time steps did not justify the large increase in computing time that would have been necessary.
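The effect of the 3 d aggregation can be illustrated with a minimal Python sketch. The helper name `aggregate_3d` and the daily inflow values are hypothetical and are not taken from the test bed; the sketch only shows how averaging over a period hides a single high-flow day that might otherwise call for a spillway decision on day 2.

```python
from statistics import mean

def aggregate_3d(daily_values, step=3):
    """Average consecutive blocks of `step` daily values (hypothetical helper).

    Trailing days that do not fill a complete block are dropped.
    """
    n_full = len(daily_values) - len(daily_values) % step
    return [mean(daily_values[i:i + step]) for i in range(0, n_full, step)]

# Illustrative daily inflows (m3/s); the peak on day 2 is an assumption.
daily_inflow = [100.0, 400.0, 100.0, 120.0, 120.0, 120.0]
print(aggregate_3d(daily_inflow))
```

In this example the day-2 peak of 400 is seen by the decision model as a flat 200 over the whole period, which is the kind of small inefficiency described above.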

Finally, in the real-world system, the ESP forecasts can be evaluated for
bias and/or dispersion issues with a long enough historical record. Luckily,
RTA's dataset covers more than 60 years, which allowed quantifying the ESP
error structure. Other systems with fewer data might not have this luxury
and would have to rely on shorter length series to establish ESP forecasts.
A caveat to this information is that this study supposes stationary
conditions, whereas it is possible that climate change has affected recent
years or could affect future years. In this study, average inflow volumes
between the calibration period and simulation period over the entire system
differ by less than 2 % (1405 m

The three optimization approaches were investigated to understand how they are affected by biased inputs. First, Figs. 5–8 show that the results depend strongly on the application of generation constraints (MLCs). The biases affect all methods less when MLCs are applied. It seems that the MLCs constrain the system enough that the impact of the ESP forecast is lessened; thus, the biases also have less effect.

Second, for the first method, i.e. optimization on the median inflow scenario, biasing the ensemble had a direct impact on the results, indicating that the method is less robust to the inputs than stochastic optimization methods. Similar conclusions were reached by Fan et al. (2016). The same is true for the second method, i.e. using the median decision from a set of deterministically optimized scenarios.

One possible explanation is that while the median decision method uses all scenarios to take the median decision, it must make a compromise to do so, and thus it discards the information content of the other scenarios. The unique decision method, on the other hand, optimizes the entire tree and is therefore more informed than the median decision method. In all cases, the unique decision approach (method 3) seems to be more robust to inflow biases and generally performs better at maximizing hydropower output. This is to be expected, as stochastic methods are known to be more efficient because they consider this uncertainty (Faber and Stedinger, 2001).
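The difference between the first two methods can be sketched in a few lines of Python. Everything here is hypothetical: `toy_release` is a stand-in for a deterministic optimizer, not the optimization model used in this study, and the median scenario is assumed to be built time step by time step. The unique decision method (method 3) is not sketched because it requires optimizing over the whole scenario tree.

```python
from statistics import median

def toy_release(storage, inflows, capacity=100.0, max_release=30.0):
    """Hypothetical stand-in for a deterministic optimizer.

    Releases, per time step, the surplus that would otherwise spill over the
    horizon, spread evenly and capped at the turbine capacity.
    """
    surplus = storage + sum(inflows) - capacity
    return min(max_release, max(0.0, surplus) / len(inflows))

def median_scenario_decision(storage, ensemble):
    """Method 1: optimize once on the time-step-wise median scenario."""
    median_scenario = [median(step) for step in zip(*ensemble)]
    return toy_release(storage, median_scenario)

def median_decision(storage, ensemble):
    """Method 2: optimize each member separately, keep the median decision."""
    return median(toy_release(storage, member) for member in ensemble)

# Three illustrative inflow members over a three-step horizon (assumed values).
ensemble = [[5.0, 5.0, 5.0], [10.0, 10.0, 10.0], [40.0, 5.0, 45.0]]
m1 = median_scenario_decision(80.0, ensemble)
m2 = median_decision(80.0, ensemble)
print(m1, m2)
```

With these assumed values, the two methods return different releases from the same ensemble, and in both cases the wet member influences the result only through its rank, not through its volume.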

Throughout this study, it is shown that forcing a positive bias on the
forecasted inflows helped generate more power even if the actual realized
inflows were kept intact. Of course, ideally the forecasts should be
unbiased and well dispersed, and the optimization method should be able to
use that information without adding biases of its own. In this study,
however, the optimization methods are not perfect and introduce biases in
the decisions. The increases in the relative MW shown in Fig. 7a for the
case without MLCs are due to the “deterministic bias”, i.e. the tendency of
deterministic methods to be overconfident, introduced by the optimization
method. The deterministic bias is also observed in Philbrick and Kitanidis (1999) and is a consequence of the optimization model's perfect foresight of
the future inflows. The model thus overestimates the capacity of the system
to manage the reservoir at a high head without spilling water. In a rolling
horizon test bed, high-flow scenarios will cause larger spillage than
expected with the deterministic optimization. It follows that a positive
bias introduced in the forecast (or selecting member percentiles

The effect of bias levels on the reservoir storages (Fig. 8) is informative because it shows that the optimization methods are directly influenced by the inflow volumes they are given in the ESP forecasts. Larger forecasted inflow volumes correspond to lower reservoir levels and vice versa. From the optimization algorithm's point of view, it is better to increase generation despite slightly lower efficiency than to maintain maximum efficiency and then spill the large perceived inflows. Since the actual realized inflows are lower than the ESP forecasts on average, this translates to lower average reservoir levels (and thus lower hydraulic head) but fewer unproductive spills. On the other end of the spectrum, negative biases in the ESP forecasts force the optimization methods into higher reservoir storages to save water and operate at maximum efficiency with the highest hydraulic head possible. In this case, when the actual inflow materializes, it is on average higher than the anticipated inflows, which makes unproductive spills more frequent. Therefore, low biases can also lead to lower generation figures. These two effects follow from the same deterministic bias as that demonstrated in Fig. 7.
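The mechanism described above can be reproduced qualitatively with a toy single-reservoir simulation. This is a minimal sketch under strong assumptions, not the test bed of this study: the release rule, the uniform inflow distribution and all parameter values are hypothetical, and head-dependent efficiency is deliberately left out, so the sketch shows only the storage and spill response to forecast bias.

```python
import random

def simulate(bias, n_steps=2000, capacity=100.0, max_release=12.0,
             mean_inflow=10.0, seed=1):
    """Toy rolling simulation of one reservoir under a biased forecast.

    Returns (mean end-of-step storage, total spill). All values are
    hypothetical and unitless.
    """
    rng = random.Random(seed)          # same realized inflows for every bias
    storage = 0.8 * capacity
    total_spill = 0.0
    storage_sum = 0.0
    for _ in range(n_steps):
        forecast = bias * mean_inflow  # climatological forecast times bias factor
        # Deterministic rule: trust the forecast and release only what it
        # says would otherwise spill, keeping the level as high as possible.
        release = min(max_release, max(0.0, storage + forecast - capacity))
        actual = rng.uniform(0.0, 2.0 * mean_inflow)  # realized inflow
        storage += actual - release
        spill = max(0.0, storage - capacity)          # water lost over the crest
        storage -= spill
        total_spill += spill
        storage_sum += storage
    return storage_sum / n_steps, total_spill

for bias in (0.95, 1.00, 1.05):
    mean_storage, spill = simulate(bias)
    print(f"bias={bias:.2f} mean storage={mean_storage:.1f} spill={spill:.0f}")
```

Because every run sees the same realized inflows, any difference between the runs comes from the forecast bias alone: in this toy setting a wetter forecast gives lower average storage and less spill, and a drier forecast gives the opposite.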

These seemingly trivial findings raise the following question: if both high and low biases penalize the generation figures, how can the results in Fig. 7 be explained? Recall that for the unconstrained system, positive biases actually increased overall generation, with a 5 % positive bias being the optimum (Fig. 7a). It is important to recall that the original forecasting approach produced a slightly dry bias (Table 3) of approximately 1 %–2 %. However, the 5 % wet bias still outperformed the 2 % wet bias trials, indicating that more is at play than the simple bias correction of the original forecasted ensemble.
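The arithmetic behind this argument can be made explicit with a short Python sketch. It assumes that each bias trial multiplies the raw ensemble by a constant factor and takes a native dry bias of 1.5 %, the midpoint of the 1 %–2 % range, purely for illustration; the function name `net_bias` is hypothetical.

```python
def net_bias(native_bias, factor):
    """Relative bias of a forecast after multiplying it by `factor`.

    `native_bias` is the relative bias before scaling (e.g. -0.015 for a
    1.5 % dry bias).
    """
    return (1.0 + native_bias) * factor - 1.0

native = -0.015  # assumed midpoint of the 1 %-2 % dry bias
for factor in (1.02, 1.05):
    print(f"factor {factor:.2f}: net bias {100 * net_bias(native, factor):+.2f} %")
```

Under these assumptions, a factor of 1.02 leaves a net bias of about 0.5 %, i.e. a nearly unbiased forecast, whereas a factor of 1.05 leaves a net wet bias of about 3.4 %. The better performance of the 5 % trial therefore cannot be attributed to bias correction alone.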

These results can again be explained by the deterministic nature of the
optimization algorithms. In all cases, the optimization algorithms do not
consider uncertainty in the ESP forecasts and find the optimal decision
according to the predicted inflows. By their very nature, for a given volume
of water, they will find the best management policy, which is one that
maintains as much head as possible and minimizes spillage. Unfortunately,
the actual inflow volumes are sometimes larger than expected, which makes
costly spills necessary. For the unconstrained system, by increasing the
bias levels slightly, the model will change its behaviour to draw more water
to reduce the spills it considers inevitable. However, the actual, lesser
inflows are then less likely to force unwanted spills. The same holds true
for the constrained system; however, the MLCs are a natural buffer that forces
the reservoirs into lower storage ranges, thus also minimizing spills.
Therefore, the imposed constraints guard against the deterministic
optimization method's blindness to uncertainty. While these results are
demonstrated on the SLSJ hydropower system, the theory should be applicable
to all hydropower systems which are noticeably affected by hydraulic head
and that use deterministic optimization algorithms. The optimal levels of
bias for unconstrained systems will most probably vary from site to site
depending on the hydrological forecast quality and the system's complexity
and capacity. Furthermore, an element that must be considered is the size of
the reservoirs as compared to the 3 d inflows. Inflow ratios range from
1.1 % to 12 % of reservoir capacity when using the average inflow
values. It is expected that the smaller reservoirs will be more strongly
affected by biases in the ESP forecast, as they could lose efficiency (from net
head elevation or by having less time to react to prevent spilling) much
more quickly than a relatively large reservoir. Fortunately, in this study the
most powerful generating stations are backed by large reservoirs or are run
of river and are thus not affected by this problem. The CCD generating
station is the most vulnerable in this regard and only contributes 235 MW out
of the total

This study aimed to identify the effects of ESP forecast biases on deterministic optimization methods used for managing water drawdown policies in a hydroelectric complex. A test bed simulating the real-world system was set up to identify how three decision-making methods were affected by ESP forecast biases and minimum load constraints. For RTA's SLSJ system, it was possible to identify and quantify the energy gains and losses due to each of these factors. A few key points stand out and should be kept in mind during the implementation of a forecast–optimization–decision framework in hydropower management.

First, the results tend to indicate that the unique decision algorithm performs better overall than taking the median decision amongst the decisions for all scenarios taken independently. The information content of the entire tree provides better results than the compromised solution of the median decision and makes better use of a large ESP ensemble.

Second, in systems with fewer constraints, a slight positive bias in the ESP forecasts compensated for the deterministic optimization algorithms' lack of consideration for uncertainty, making the entire system more efficient on average; in essence, the optimistic optimization bias was offset by an optimistic inflow forecast. The optimal amount of bias was quantified at 5 % for this particular system. Other systems operating under deterministic optimization should behave similarly, with bias levels varying from one site to the next, but this remains to be validated. This study seems to indicate that a more heavily constrained system would be more robust to bias because of its reduced degrees of freedom, which limits the frequency of full-reservoir states. Logically, a system so constrained that it has no flexibility at all (e.g. always producing maximum energy) would not be affected by the forecast biases, as the forecasts would play no role in the operating policy.

For the study site, the optimal set-up was found to be a unique decision optimization method with the full ESP forecasts and with no minimum load constraints. Operational needs mandate that the MLCs be respected; however, exchanging energy with partners through contracts could be a viable strategy for increasing overall power generation and system efficiency. This would inevitably allow generating more energy overall (due to removal of the constraints). The hypothetical question here is the following: could it be possible to negotiate a contract that allows purchasing more power when in deficit (to cover the minimum loads of the smelters), and would the increased long-term generation be sufficient to cover the cost of such a contract? This should be investigated in future work.

While this study looked at deterministic optimization methods, stochastic methods could also be implemented and tested to compare their behaviour when subjected to biased inputs. Future work could also analyze the combined effects of bias and under-dispersion used with deterministic and stochastic optimization methods. This would pave the way to better understanding of the effects of ESP forecast properties for optimal water drawdown policies for hydropower system management.

The climate and flow data can be accessed, under certain conditions, by writing to the authors and signing a non-disclosure agreement.

RA performed the hydrological model simulations and prepared the forecasts. PC implemented the hydropower optimization algorithms and simulation test bed. Both authors analyzed the results equally and each wrote a section of the paper.

The authors declare that they have no conflict of interest.

The authors would like to thank Kenjy Demeester for his help in extracting and preparing the data. We would also like to thank Rio Tinto for the datasets and models used in this study. Finally, we wish to acknowledge the contributions made by anonymous reviewers, who helped shape this paper into its current form.

This paper was edited by Micha Werner and reviewed by three anonymous referees.

^{®}Xpress Optimization Suite: Xpress-Optimizer Reference manual – Release 20.00, Fair Isaac Corporation, available at: