AI-based techniques for multi-step streamflow forecasts: Application for multi-objective reservoir operation optimization and performance assessment

Streamflow forecasts have traditionally been effective in mitigating water scarcity and supporting flood defense. This study developed an Artificial Intelligence (AI)-based management methodology that integrates multi-step streamflow forecasts and multi-objective reservoir operation optimization for water resource allocation. Following this methodology, we aimed to jointly assess forecast quality and the performance of forecast-informed reservoir operations, given the influence of inflow forecast uncertainty. Varying combinations of climate and hydrological variables were used as inputs to three AI-based models, namely Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Least Squares Support Vector Machine (LSSVM), to forecast short-term streamflow. Based on the three deterministic forecasts, stochastic inflow scenarios were further developed using Bayesian Model Averaging (BMA) to quantify uncertainty. The forecasting scheme was then coupled with a multi-reservoir optimization model, and the multi-objective programming problem was solved using the parameterized Multi-Objective Robust Decision Making (MORDM) approach. The AI-based management framework was applied to a multi-reservoir system (25 reservoirs) in the Zhoushan Islands, China. Three main conclusions were drawn from this study: 1) GRU and LSTM performed equally well on streamflow forecasts, and GRU might be preferred over LSTM, given its simpler structure and shorter modeling time; 2) Higher forecast performance could lead to improved reservoir operation, while uncertain forecasts were more valuable than deterministic forecasts with regard to two performance metrics, i.e., water supply reliability and operating costs; 3) The relationship between forecast horizon and reservoir operation was complex and depended on the operating configurations (forecast quality and uncertainty) and the performance measures.
This study reinforces the potential of an AI-based stochastic streamflow forecasting scheme to seek robust strategies under uncertainty.

resources allocation (Turner et al., 2017). Therefore, when forecasts are used to support reservoir operation, the conditions under which they help make better decisions should be assessed. Moreover, forecast uncertainty and error generally grow as the forecast horizon increases (Maurer and Lettenmaier, 2004;Denaro et al., 2017;Zhao et al., 2019). A decision maker may doubt whether longer forecast lead times provide sufficient information for a given decision purpose. It is therefore crucial to determine an efficient forecast lead time that provides appropriate inflow information for reliable reservoir release decisions, so that forecast information is used to best effect. However, few studies have demonstrated the applicability and effectiveness of the forecast horizon in a forecast-based reservoir operation system (Xu et al., 2014;Anghileri et al., 2016).
Further in-depth studies are needed to evaluate, a posteriori, forecasts with different lead times and to identify the efficient forecast horizon for water allocation.
A decision maker must allocate limited water to different water use sectors considering conflicting objectives (e.g., benefits and costs) and multiple uncertainties (e.g., forecast uncertainty) in a forecast-based reservoir operation system. Multi-objective programming (MOP) is a useful tool for supporting decision making with multiple conflicting objectives (Fang et al., 2018b;Guo et al., 2020c), offering feasible methods for generating compromise decision alternatives. Several MOP approaches have been developed to tackle the uncertainty associated with decision-making processes, such as multi-objective fuzzy programming (Zimmermann, 1978;Pishvaee and Razmi, 2012;Ren et al., 2017) and multi-objective stochastic programming (Xu et al., 2014;Xu et al., 2020;Zhang et al., 2020). These approaches generally convert the multiple objective functions into a single-objective deterministic problem through a fuzzy programming method or a constraint operator. They can effectively handle uncertainties in the objectives and/or constraints by incorporating the decision makers' aspiration levels. However, they may encounter difficulties because they require predetermined individual preferences or reasonable bounds for all objectives. In comparison, multi-objective robust decision making (MORDM) is an effective way to address such difficulties (Kasprzyk et al., 2013;Yan et al., 2017). It can generate many alternative (Pareto) solutions that require no assumptions about decision makers' preferences, and it enhances the robustness of the optimization process. In addition, by parameterizing the decision space, MORDM can avoid the curse of dimensionality encountered in some MOP approaches, reducing computational complexity and running time (Giuliani et al., 2016;Salazar et al., 2017).
In summary, several challenges remain in forecast-informed reservoir optimization. To address these challenges, the specific research questions of this study are: (1) Can GRU achieve the same streamflow forecast accuracy as LSTM with fewer parameters and a more straightforward structure?
(2) Under which conditions can an improvement in forecast skill be translated into an improvement in reservoir operation optimization?
(3) How can short-term inflow forecasts with different forecast horizons be used to optimize the multi-reservoir system, and how do they affect operation results?
https://doi.org/10.5194/hess-2020-617 Preprint. Discussion started: 16 December 2020 © Author(s) 2020. CC BY 4.0 License.
To answer these questions, we build an AI-based management framework that integrates multi-step streamflow forecasts and multi-reservoir operation optimization. We aim to: (1) simulate inflow using LSTM, GRU, and LSSVM and verify their effectiveness for short-term deterministic streamflow forecasts; (2) generate stochastic inflow scenarios using BMA to refine uncertainty characterization; and (3) develop the parameterized MORDM framework for a multi-reservoir operation system and inform decision making by assessing the value of forecasts with a particular lead time, that is, the operational benefit gained or the cost induced. As a case study, 25 reservoirs supplying water to four water plants in the Zhoushan Islands, China, including one recipient reservoir storing water from the continental diversion project and 24 supply reservoirs storing water from local rainfall, are chosen to assess the performance of the AI-based forecasts and of the forecast-informed operation.

Methodology
The experimental approach followed in the study is shown in Figure 1 and described in the following sections.

Machine learning (ML) methods
This part gives a brief introduction to long short-term memory (LSTM), the gated recurrent unit (GRU), and the least squares support vector machine (LSSVM).

Long short-term memory (LSTM)
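The LSTM network is a type of recurrent neural network (RNN) developed by Hochreiter and Schmidhuber (1997), and the basic structure of an LSTM cell is illustrated in Figure 2(a). It is an improved RNN designed to overcome the vanishing and exploding gradient problems that arise when errors are backpropagated through long sequences, so that long-term dependencies can be learned. The LSTM cell has three gates that maintain and adjust its cell state and hidden state: the forget gate, the input gate, and the output gate. The forget gate determines what information is discarded from the cell state. The input gate decides which information is used to update the cell state. The output gate controls which information stored in the current cell state flows into the new hidden state. In Figure 2(a), the cell state ($c_t$) and the hidden state ($h_t$) of the LSTM cell are updated as follows (Hochreiter and Schmidhuber, 1997):
Forget gate: $$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
Input gate: $$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
Potential cell state: $$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$
Cell state: $$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Output gate: $$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
Hidden state: $$h_t = o_t \odot \tanh\left(c_t\right)$$
where $f_t$, $i_t$, $o_t$, and $\tilde{c}_t$ represent the forget gate, input gate, output gate, and potential cell state, respectively; $\sigma(\cdot)$ is the logistic sigmoid function and $\odot$ denotes element-wise multiplication; $x_t$ is the input vector, and $h_{t-1}$ and $c_{t-1}$ are the previous hidden and cell states. $[W, U, b]$ represent the input weight matrices, recurrent weight matrices, and bias vectors for the forget, input, output, and potential cell gates, respectively.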
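To make the gate updates described above concrete, the following is a minimal NumPy sketch of a single LSTM forward step. The random initialization, the dimensions (three inputs, four hidden units), and the helper names `lstm_step` and `init_params` are illustrative assumptions rather than the configuration used in this study, which would normally be built with a deep-learning library.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U and b are dicts keyed by 'f' (forget), 'i' (input), 'o' (output)
    and 'c' (potential cell state); W[g] is (H, D), U[g] is (H, H), b[g] is (H,).
    """
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # potential cell state
    c_t = f_t * c_prev + i_t * c_hat                           # cell state (element-wise)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                   # hidden state
    return h_t, c_t


def init_params(gates, input_dim, hidden_dim, seed=0):
    """Hypothetical small random initialization, one (W, U, b) block per gate."""
    rng = np.random.default_rng(seed)
    W = {g: rng.normal(0.0, 0.1, (hidden_dim, input_dim)) for g in gates}
    U = {g: rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for g in gates}
    b = {g: np.zeros(hidden_dim) for g in gates}
    return W, U, b


# Feed five daily input vectors (e.g. precipitation, evaporation, streamflow).
D, H = 3, 4
W, U, b = init_params("fico", D, H)
h, c = np.zeros(H), np.zeros(H)  # h_0 = 0 and c_0 = 0
for x in np.random.default_rng(1).normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

Because $h_t$ is the product of a sigmoid output and a tanh, every hidden-state component stays within (-1, 1), whatever the weights.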

Gated recurrent unit (GRU)
GRU networks were proposed as a modification of LSTM networks with a more straightforward structure (Cho et al., 2014).
The specific structure of the GRU cell is shown in Figure 2(b). Compared with LSTM, GRU has only two gates: a reset gate and an update gate. The update gate controls how much information from the previous step is carried into the current step, while the reset gate controls the degree to which information from the previous state is ignored. As a result, GRU has fewer parameters to update than LSTM and generally requires less modelling time. In Figure 2(b), the potential cell state ($\tilde{c}_t$) and the hidden state ($h_t$) of the GRU cell are updated as follows (Cho et al., 2014):
Reset gate: $$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)$$
Update gate: $$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)$$
Potential cell state: $$\tilde{c}_t = \tanh\left(W_c x_t + U_c \left(r_t \odot h_{t-1}\right) + b_c\right)$$
Hidden state: $$h_t = z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{c}_t$$
where $r_t$, $z_t$, and $\tilde{c}_t$ represent the reset gate, update gate, and potential cell state, respectively. $\odot$ denotes the element-wise multiplication of vectors, and $\tanh(\cdot)$ is the hyperbolic tangent; $x_t$ represents the input vector, $h_{t-1}$ denotes the previous hidden state, and the initial state is $h_0 = 0$. $\sigma(\cdot)$ represents the logistic sigmoid function. $[W, U, b]$ represent the input weight matrices, recurrent weight matrices, and bias vectors for the reset, update, and potential cell gates, respectively.
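The parameter saving claimed for GRU follows directly from the gate count: GRU needs three weight blocks (reset, update, potential state) against four for LSTM. The sketch below runs one GRU sequence with hypothetical random weights and counts parameters under this textbook formulation; library implementations (for example, those with separate input and recurrent biases) give slightly different totals.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step; W, U, b are dicts keyed by 'r', 'z' and 'c'."""
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])            # reset gate
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])            # update gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ (r_t * h_prev) + b["c"])  # potential cell state
    return z_t * h_prev + (1.0 - z_t) * c_hat                          # new hidden state


def n_params(n_gate_blocks, input_dim, hidden_dim):
    """Trainable parameters: each gate block has W (H x D), U (H x H) and b (H)."""
    return n_gate_blocks * (hidden_dim * input_dim + hidden_dim**2 + hidden_dim)


D, H = 3, 4
rng = np.random.default_rng(0)
W = {g: rng.normal(0.0, 0.1, (H, D)) for g in "rzc"}
U = {g: rng.normal(0.0, 0.1, (H, H)) for g in "rzc"}
b = {g: np.zeros(H) for g in "rzc"}

h = np.zeros(H)  # h_0 = 0
for x in rng.normal(size=(5, D)):
    h = gru_step(x, h, W, U, b)

print(n_params(4, D, H), n_params(3, D, H))  # LSTM vs GRU: 128 96
```

At equal hidden size, the GRU therefore carries 25% fewer weights than the LSTM, which is one reason for its shorter training time.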

Least squares support vector machine with grey wolf optimizer (GWO-LSSVM)
LSSVM is a modified version of SVM, proposed by Suykens and Vandewalle (1999) to reduce the computational time of SVM. Whereas SVM formulates the training process as a quadratic programming problem, LSSVM employs a least-squares loss function. Given training samples $(x_i, y_i)$, $i = 1, \ldots, N$, the LSSVM non-linear function is expressed as (Suykens et al., 2002):
$$f(x) = w^{T}\varphi(x) + b$$
where $\varphi(\cdot)$ is the mapping function that maps the input $x$ into a d-dimensional feature vector, $w$ is a weight vector, and $b$ represents the bias. In LSSVM, the following objective function is minimized to estimate $w$ and $b$ (Suykens et al., 2002):
$$\min_{w,b,e} J(w,e) = \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2}$$
subject to the following constraints (Suykens et al., 2002):
$$y_i = w^{T}\varphi(x_i) + b + e_i, \quad i = 1, \ldots, N$$
where $e_i$ is the error variable and $\gamma$ is the regularization constant. This constrained optimization problem is solved by introducing the Lagrange multipliers $\alpha_i$, which transforms it into an unconstrained one (Suykens et al., 2002):
$$L(w,b,e,\alpha) = J(w,e) - \sum_{i=1}^{N}\alpha_i\left[w^{T}\varphi(x_i) + b + e_i - y_i\right]$$
Setting the partial derivatives of the Lagrangian with respect to $w$, $b$, $\alpha_i$, and $e_i$ to zero yields the LSSVM regression model:
$$f(x) = \sum_{i=1}^{N}\alpha_i K(x, x_i) + b, \qquad K(x, x_i) = \exp\left(-\frac{\lVert x - x_i\rVert^{2}}{2\sigma^{2}}\right)$$
where $\sigma^{2}$ is the kernel parameter. In this study, the parameters $\gamma$ and $\sigma$ were optimized using the grey wolf optimizer (GWO). Please see Guo et al. (2020d) for more details on GWO.
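Because the stationarity conditions of the Lagrangian are linear, LSSVM training reduces to a single linear system, $\begin{bmatrix}0 & \mathbf{1}^{T}\\ \mathbf{1} & \Omega + \gamma^{-1}I\end{bmatrix}\begin{bmatrix}b\\ \alpha\end{bmatrix} = \begin{bmatrix}0\\ y\end{bmatrix}$ with $\Omega_{ij} = K(x_i, x_j)$. The NumPy sketch below solves this system directly. The toy sine data and the fixed values of $\gamma$ and $\sigma$ are assumptions for illustration; they stand in for the GWO tuning used in the study, which is not reproduced here.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all row pairs."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM dual linear system for the bias b and multipliers alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint sum_i alpha_i = 0
    A[1:, 0] = 1.0                                   # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, alpha, b, X_new, sigma):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Toy example: learn y = sin(x) from 30 samples.
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
y_fit = lssvm_predict(X, alpha, b, X, sigma=0.5)
print(np.max(np.abs(y_fit - y)))
```

A useful check on the solution is that the training residuals equal $\alpha_i/\gamma$, which follows from $\partial L/\partial e_i = 0$.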
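
Bayesian model averaging (BMA)
Given the forecasts of K models ($f_k$, $k = 1, 2, \ldots, K$) and the observed data $D$, the posterior distribution of the streamflow $Q$ can be calculated as (Hoeting et al., 1999):
$$p(Q \mid D) = \sum_{k=1}^{K} p(Q \mid f_k, D)\, p(f_k \mid D) = \sum_{k=1}^{K} w_k\, p(Q \mid f_k, D)$$
where the BMA weight $w_k = p(f_k \mid D)$ is the posterior probability of model $f_k$, with $\sum_{k=1}^{K} w_k = 1$. Assuming that the conditional distribution of each model is Gaussian, the BMA predictive distribution becomes (Hoeting et al., 1999):
$$p(Q \mid f_1, \ldots, f_K) = \sum_{k=1}^{K} w_k\, g\left(Q \mid f_k, \sigma_k^{2}\right)$$
where $g(\cdot \mid f_k, \sigma_k^{2})$ is a normal distribution with mean $f_k$ and variance $\sigma_k^{2}$, the variance of model $f_k$. BMA weights can be estimated using optimization methods. In this study, the Expectation-Maximization (EM) algorithm is used to identify the BMA parameters (weights $w_k$ and variances $\sigma_k^{2}$) and then to estimate the prediction interval (Lee et al., 2020). Details of BMA can also be found in Hoeting et al. (1999).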
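Assuming a Gaussian conditional distribution for each member with variance $\sigma_k^{2}$, the EM iterations alternate between computing, for each time step, the probability that each member "produced" the observation (E-step) and re-estimating $w_k$ and $\sigma_k^{2}$ from those probabilities (M-step). The sketch below further assumes synthetic data and a Monte Carlo approximation of the 90% interval (quantiles of draws from the fitted mixture). It is an illustration, not the implementation of Lee et al. (2020).

```python
import numpy as np


def normal_pdf(y, mu, var):
    return np.exp(-((y - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def bma_em(F, y, n_iter=500, tol=1e-10):
    """Estimate BMA weights w_k and variances sigma_k^2 by EM.

    F: (T, K) member forecasts; y: (T,) observations.
    """
    T, K = F.shape
    w = np.full(K, 1.0 / K)
    var = np.full(K, np.var(y[:, None] - F))
    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = w * normal_pdf(y[:, None], F, var)       # (T, K) weighted densities
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        z = dens / total                                # E-step: responsibilities
        w = z.mean(axis=0)                              # M-step: weights
        var = np.maximum((z * (y[:, None] - F) ** 2).sum(axis=0) / z.sum(axis=0), 1e-12)
        ll = np.log(total).sum()                        # log-likelihood before this M-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, var


def bma_interval(F_new, w, var, level=0.90, n_draws=5000, seed=0):
    """BMA mean and a Monte Carlo prediction interval from the Gaussian mixture."""
    rng = np.random.default_rng(seed)
    T, K = F_new.shape
    comp = rng.choice(K, size=(T, n_draws), p=w / w.sum())  # member index per draw
    mu = np.take_along_axis(F_new, comp, axis=1)
    draws = rng.normal(mu, np.sqrt(var[comp]))
    lower, upper = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=1)
    return F_new @ w, lower, upper


# Synthetic example: three members with increasing error spread.
rng = np.random.default_rng(42)
y = 10 + 3 * np.sin(np.linspace(0, 12, 300))
F = y[:, None] + rng.normal(0.0, [0.1, 0.3, 0.6], size=(300, 3))
w, var = bma_em(F, y)
mean, lower, upper = bma_interval(F, w, var)
print(np.round(w, 2))
```

In this toy case the least noisy member receives the largest weight, which is the behavior expected of BMA when one model is consistently closer to the observations.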

Forecast performance measures
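Three performance indicators were applied to assess the deterministic forecast performance of the three data-driven models: the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), the root mean square error (RMSE) (Karunanithi et al., 1994), and the mean absolute error (MAE) (Legates and McCabe Jr., 1999). They are expressed as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T}\left(Q_{o,t} - Q_{s,t}\right)^{2}}{\sum_{t=1}^{T}\left(Q_{o,t} - \bar{Q}_{o}\right)^{2}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(Q_{o,t} - Q_{s,t}\right)^{2}}$$
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left\lvert Q_{o,t} - Q_{s,t}\right\rvert$$
where $T$ is the number of samples; $Q_{o,t}$ and $Q_{s,t}$ are the observed and forecast streamflow at time $t$ (m³/s), respectively; and $\bar{Q}_{o}$ is the mean observed streamflow. In addition, two indicators, the containing ratio (CR) and the average deviation amplitude (D), were adopted to evaluate the ensemble forecast models by assessing the goodness of the prediction bounds (Xiong et al., 2009):
$$\mathrm{CR} = \frac{1}{T}\sum_{t=1}^{T}\mathbb{1}\left(Q_{l,t} \le Q_{o,t} \le Q_{u,t}\right) \times 100\%$$
$$D = \frac{1}{T}\sum_{t=1}^{T}\left\lvert \frac{1}{2}\left(Q_{l,t} + Q_{u,t}\right) - Q_{o,t}\right\rvert$$
where $Q_{l,t}$ and $Q_{u,t}$ represent the lower and upper prediction bounds of streamflow (m³/s), respectively, and $\mathbb{1}(\cdot)$ is the indicator function. Models with higher CR values and lower D values perform better.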
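The five indicators translate directly into array operations. The sketch below assumes observed series, forecasts, and bounds are aligned NumPy arrays in the same units; the four-value example is hypothetical and chosen so the results can be checked by hand.

```python
import numpy as np


def nse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))


def mae(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.mean(np.abs(obs - sim))


def containing_ratio(obs, lower, upper):
    """CR (%): share of observations inside [lower, upper]."""
    obs = np.asarray(obs, float)
    return 100.0 * np.mean((obs >= lower) & (obs <= upper))


def deviation_amplitude(obs, lower, upper):
    """D: mean absolute distance between the band midpoint and the observations."""
    mid = 0.5 * (np.asarray(lower, float) + np.asarray(upper, float))
    return np.mean(np.abs(mid - np.asarray(obs, float)))


obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.5, 2.0, 2.5, 4.0])
lower = np.array([0.5, 1.5, 3.2, 3.5])
upper = np.array([1.5, 2.5, 3.8, 4.5])
print(nse(obs, sim), mae(obs, sim))  # 0.9 0.25
print(containing_ratio(obs, lower, upper), deviation_amplitude(obs, lower, upper))  # 75.0 0.125
```

Here the third observation (3.0) lies outside its band [3.2, 3.8], so CR drops to 75%, and the band midpoint misses it by 0.5, which alone drives D.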

Parameterized multi-objective robust decision making (MORDM)
Multi-objective robust decision making (MORDM) provides a mechanism for stress-testing operational alternatives under uncertainty. We identify and evaluate different operational transfer strategies for water allocation in the Zhoushan Islands using the MORDM method. The main steps of the MORDM framework are (Hadjimichael et al., 2020): (1) problem formulation, including the possible actions (i.e., decision variables) and performance measures; (2) generation of alternative management actions using multi-objective evolutionary algorithms (MOEAs); and (3) uncertainty analysis and identification of robust solutions. Problem formulation is a critical step in the MORDM framework (Zeff et al., 2014). To strengthen reservoir operation under uncertain forecasts, the objectives are evaluated over stochastic inflows rather than a single deterministic inflow. The uncertainties are then handled using a robust approach (Giuliani and Castelletti, 2016;Guo et al., 2020b), such as the principle of insufficient reason, minimax, or minimax regret.
In general, the decision variables in the multi-reservoir optimization problem are the volumes of water to be allocated each day. Open-loop strategies treat each decision in a time series as an independent decision. In contrast, closed-loop control strategies prescribe decisions conditioned on system state variables (Quinn et al., 2017a). We use direct policy search (DPS), in which the operating rules are approximated by non-linear functions of specific parameters and system states, to derive closed-loop control strategies (Giuliani et al., 2016;Quinn et al., 2017b;Salazar et al., 2017). DPS is based on the parameterization of the operating policy $p_{\theta}$ and the exploration of the parameter space $\Theta$ to find a parameterized policy that optimizes the expected long-term cost, i.e.,
$$p_{\theta}^{*} = \arg\min_{p_{\theta}} J\left(p_{\theta}\right) \quad \text{s.t.} \quad \theta \in \Theta$$
where $J$ is the objective function and $p_{\theta}^{*}$ is the corresponding optimal policy with parameters $\theta^{*}$.
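The three robust criteria named above can be illustrated on a matrix of objective values J, with one row per candidate policy and one column per stochastic inflow scenario. The sketch below assumes a single cost-type objective (lower is better) and hypothetical numbers; it shows how the same alternatives can be ranked differently by each criterion and is not the exact aggregation used in this study.

```python
import numpy as np


def insufficient_reason(J):
    """Scenarios treated as equally likely: expected cost of each alternative."""
    return J.mean(axis=1)


def minimax(J):
    """Worst-case cost of each alternative across scenarios."""
    return J.max(axis=1)


def minimax_regret(J):
    """Largest regret, i.e. the gap to the best alternative in each scenario."""
    regret = J - J.min(axis=0)
    return regret.max(axis=1)


# Rows: alternatives (e.g. Pareto policies); columns: stochastic inflow scenarios.
J = np.array([[3.0, 9.0, 4.0],
              [5.0, 6.0, 7.0],
              [1.0, 8.0, 8.0]])
for name, crit in [("insufficient reason", insufficient_reason),
                   ("minimax", minimax), ("minimax regret", minimax_regret)]:
    print(name, int(np.argmin(crit(J))))
# insufficient reason 0
# minimax 1
# minimax regret 0
```

Minimax favors the second alternative because it avoids the worst scenario, while the other two criteria prefer the first alternative's better average and smaller regret.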
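Different DPS approaches have been proposed (Deisenroth et al., 2013). In this study, we use Radial Basis Functions (RBFs) to parameterize the policy, and the k-th decision variable in the vector $u_t$ (with $k = 1, \ldots, K$) is defined as:
$$u_t^{k} = \sum_{i=1}^{N} w_{i,k}\,\varphi_i\left(I_t\right)$$
where $N$ is the number of RBFs, $w_{i,k}$ is the weight of the i-th RBF for the k-th decision, and $\varphi_i(\cdot)$ is a Gaussian RBF:
$$\varphi_i\left(I_t\right) = \exp\left[-\sum_{j=1}^{M}\frac{\left(I_{t,j} - c_{j,i}\right)^{2}}{b_{j,i}^{2}}\right]$$
where $M$ denotes the number of policy inputs $I_t$, and $c_i$ and $b_i$ are the M-dimensional center and radius vectors of the i-th RBF, respectively. The centers of the RBFs must lie within the bounded input space (Yang et al., 2017). The parameter vector $\theta$ is defined as
$$\theta = \left[c_{j,i},\, b_{j,i},\, w_{i,k}\right], \quad i = 1, \ldots, N;\; j = 1, \ldots, M;\; k = 1, \ldots, K$$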

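In DPS the optimizer never sees daily release volumes; it searches over the flat parameter vector θ, and each candidate θ is turned into decisions through the RBF policy. The sketch below shows that mapping for one time step, assuming policy inputs normalized to [0, 1] (e.g. reservoir level and inflow), hypothetical sizes N = 4, M = 2, K = 3, and randomly drawn θ. Rescaling the outputs to physical release bounds is omitted.

```python
import numpy as np


def unpack_theta(theta, N, M, K):
    """Split the flat decision vector theta into centers c, radii b and weights w."""
    c = theta[: N * M].reshape(N, M)
    b = theta[N * M : 2 * N * M].reshape(N, M)
    w = theta[2 * N * M :].reshape(N, K)
    return c, b, w


def rbf_policy(I_t, theta, N, M, K):
    """u_t^k = sum_i w_{i,k} * exp(-sum_j (I_{t,j} - c_{j,i})^2 / b_{j,i}^2)."""
    c, b, w = unpack_theta(theta, N, M, K)
    phi = np.exp(-np.sum(((I_t - c) / b) ** 2, axis=1))  # (N,) activation of each RBF
    return phi @ w                                        # (K,) decisions


N, M, K = 4, 2, 3  # RBFs, policy inputs, decision variables
rng = np.random.default_rng(0)
theta = np.concatenate([
    rng.uniform(0.0, 1.0, N * M),   # centers inside the normalized input space [0, 1]
    rng.uniform(0.1, 1.0, N * M),   # strictly positive radii
    rng.uniform(0.0, 1.0, N * K),   # weights
])
u_t = rbf_policy(np.array([0.5, 0.3]), theta, N, M, K)
print(theta.size, u_t.shape)  # 28 (3,)
```

The length of θ, N(2M + K), is fixed by the policy architecture and does not grow with the number of days in the operation period; this is how DPS sidesteps the dimensionality problem of open-loop formulations.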
The parameterized MORDM approach is then coupled with a rolling horizon scheme to solve the short-term reservoir operation problem. Taking a lead time of 7 days as an example, the scheme proceeds in two steps: (1) on each day, the optimization model is solved over a 7-day horizon using the parameterized MORDM; (2) after the current water allocation decisions are implemented, the reservoir states, inflows, and other information are updated as time evolves, and the horizon moves forward to the next day.
The two steps are repeated until the end of the operation period.
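The rolling horizon logic can be sketched for a single reservoir as below. The greedy planner, the constant inflow forecast, the demand of 2 units per day, and the spill rule are all hypothetical stand-ins (the study solves each horizon with parameterized MORDM over 25 reservoirs). The point is the control flow: plan over H days, implement only today's decision, update the state with the observed inflow, and roll forward.

```python
import numpy as np


def greedy_plan(storage, inflow_forecast, demand=2.0):
    """Hypothetical stand-in for the optimizer: meet demand each day if water allows."""
    plan, s = [], storage
    for q in inflow_forecast:
        r = min(demand, s + q)
        s = s + q - r
        plan.append(r)
    return plan


def naive_forecast(t, horizon):
    """Hypothetical forecast: constant 2.0 inflow on every future day."""
    return np.full(horizon, 2.0)


def rolling_horizon(inflow_obs, horizon, storage0, capacity, forecast_fn, optimize_fn):
    """Re-optimize daily over `horizon` days but implement only the first decision."""
    storage, released = storage0, []
    for t in range(len(inflow_obs)):
        plan = optimize_fn(storage, forecast_fn(t, horizon))  # plan for days t..t+H-1
        r = min(plan[0], storage + inflow_obs[t])             # limited by water actually available
        storage = min(storage + inflow_obs[t] - r, capacity)  # update state; spill above capacity
        released.append(r)
    return np.array(released), storage


obs = np.array([2, 0, 5, 1, 3, 0, 4, 2, 1, 3], dtype=float)
released, final_storage = rolling_horizon(obs, 7, 3.0, 10.0, naive_forecast, greedy_plan)
print(released.sum(), final_storage)  # 20.0 4.0
```

Because the state is updated with observed rather than forecast inflow, forecast errors on one day are corrected at the next re-optimization instead of accumulating over the horizon.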

Study area and data
The Zhoushan Islands are located in the northeast of Zhejiang Province, China, with a total area of 22,000 km² and 1,390 islands (Figure 3). The climate is governed by monsoon-influenced subtropical marine weather systems, and the annual mean temperature and precipitation are 17 °C and 1,300 mm, respectively. There are no large rivers on the islands, and the insufficient freshwater resources severely limit industrial development and population growth in Zhoushan. Recently, a continental diversion project transferring water from Ningbo City to Zhoushan has been adopted as an effective solution to partially overcome the water scarcity problem. The transferred water is stored in Huangjinwan Reservoir and then operated together with the limited freshwater resources in the remaining 24 reservoirs to supply water to four water plants, i.e., Daobei, Hongqiao, Lincheng, and Pingyangpu. Data for this study include the historical inflow and state of the reservoirs, the water demand of the water plants, and climate forcing data over 2002-2008. The climate data, including daily precipitation and evaporation, were observed at one meteorological station and three rainfall stations. The characteristics of the reservoirs are listed in Table 1.
Table 1 is here

Problem formulation
The main goal of the water allocation plan is to ensure that sufficient water flows into the four plants in the Zhoushan Islands. This is achieved by allocating water from Huangjinwan Reservoir and the remaining 24 reservoirs. Figure 4 shows a simplified schematic diagram of the operating system for water supply. According to the flow direction in the water pipes, the islands are divided into three districts, i.e., Daobei, Hongqiao, and Dongbu.

Figure 4 is here
Three objectives are identified to evaluate the performance of the strategies. The conflicting objectives are to minimize the water deficiency ratio of the Daobei Plant, minimize the water deficiency ratio of the remaining three plants (Hongqiao, Lincheng, and Pingyangpu), and maximize the net benefits. These three plants can feed each other and are thus considered together in this study. A decision maker would consider a different suite of costs depending on whether an existing system is being managed or a completely new system is being designed. As water supply occurs in an existing system, the costs considered in this study are the operating costs. The water deficiency ratios are expressed as percentages, and the net benefit is the revenue minus the following operating costs:
1) Operating costs for water supply from the islands ($M_{c,island}$), where $M_{c,island,1}$, $M_{c,island,2}$, and $M_{c,island,3}$ represent the water resource fees paid to the government, the water fees paid to reservoir managers, and the electricity fees in Zhoushan City, respectively (RMB); $c_{island,1}$, $c_{island,2}$, and $c_{island,3}$ denote constant vectors representing the unit price of water resources, the unit price of water, and the electricity unit price in Zhoushan, respectively (RMB/m³); $Q_{island}$ denotes the amount of water flowing through the pumping station (m³/s); $N$ denotes the number of pumps in the pumping set; $P_n$ denotes the supporting motor power of the n-th pump (kW); and $Q_{max\_n}$ denotes the upper flow boundary of the n-th pump (m³/s).
2) Operating costs for water supply from the mainland ($M_{c,mainland}$), where $M_{c,mainland,1}$, $M_{c,mainland,2}$, and $M_{c,mainland,3}$ represent the water resource fees paid to the government, the water fees paid to the river managers, and the electricity fees in Ningbo City, respectively (RMB); $c_{mainland,1}$, $c_{mainland,2}$, and $c_{mainland,3}$ denote constant vectors representing the unit price of water resources, the unit price of mainland water, and the electricity unit price in Ningbo, respectively (RMB/m³); $Q_{mainland}$ denotes the amount of water transferred from Ningbo (m³); the calculation also uses the total length of the continental diversion pipeline (m), its cross-sectional area $S$ (m²), and its upper flow boundary $Q_{max}$ (m³/s).
The revenue is obtained as $b\sum_{t}\sum_{i} W_{s,t,i}$, where $b$ denotes the unit price of water supply revenue (RMB/m³), and $W_{s,t,i}$ is the amount of water that reservoir $i$ supplies to a waterworks at time $t$ (m³).
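The deficiency-ratio and net-benefit objectives can be sketched as below. The forms are assumptions: the deficiency ratio is taken here as total unmet demand over total demand (in %), without clipping, which is consistent with the negative ratios (surplus) reported in the results. The net benefit is taken as the revenue $b\sum W_{s,t,i}$ minus the island and mainland operating costs, which are passed in as precomputed totals because their component formulas are not reproduced. All numbers are hypothetical.

```python
import numpy as np


def deficiency_ratio(demand, supply):
    """Water deficiency ratio (%) over the period; negative values indicate surplus."""
    demand, supply = np.asarray(demand, float), np.asarray(supply, float)
    return 100.0 * (demand.sum() - supply.sum()) / demand.sum()


def net_benefit(supplied, unit_revenue, cost_island, cost_mainland):
    """Revenue b * sum(W_s) minus the island and mainland operating costs (RMB)."""
    return unit_revenue * np.sum(supplied) - (cost_island + cost_mainland)


demand = [100.0, 100.0, 100.0]
print(deficiency_ratio(demand, [100.0, 80.0, 90.0]))        # 10.0
print(deficiency_ratio(demand, [120.0, 100.0, 100.0]))      # negative: surplus
print(net_benefit([100.0, 80.0, 90.0], 2.0, 100.0, 200.0))  # 240.0
```

In a multi-objective search, the net benefit would be negated (or replaced by net cost) so that all three objectives are minimized together.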
(3) Release capacity limits; (4) Pumping station limits; where t is the time step; i is the index of a reservoir; I and Q are the inflow and release, respectively (m³/s); V is the

Forecast inputs setting
In this study, five input combination scenarios were considered to investigate whether adding climate forcing to the data-driven methods improves inflow forecasts. These scenarios are described in Table 2. Pa represents antecedent precipitation, Ea represents antecedent evaporation, Qa represents antecedent streamflow, Pf represents forecast precipitation, and Ef represents forecast evaporation.

Table 2 is here
Several strategies have been proposed in the literature to tackle multi-step-ahead forecast tasks (Kline, 2004), such as the recursive strategy, the direct strategy, and combinations of the two. In this study, we chose one of the most widely used strategies, i.e., the direct strategy (Ben Taieb et al., 2012), to forecast multi-step streamflow over the short-term horizon (1-7 days). In this case, the streamflow is forecast using the following equations, taking S3 as an example:
$$\hat{Q}_{t+h} = F_h\left(Q_a, P_a, E_a\right), \quad h = 1, 2, \ldots, 7$$
where $F_h(\cdot)$ is the mapping function between inputs and outputs for lead time $h$.
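The defining feature of the direct strategy is that each lead time h gets its own model $F_h$, trained on pairs (antecedent window at t, streamflow at t + h), so no forecast is fed back as an input. The sketch below assumes a three-day antecedent window, synthetic data, and an ordinary least-squares model as a stand-in for LSTM, GRU, or LSSVM. For scenarios with forecast climate inputs (such as S5), the forecast precipitation and evaporation would be appended to each input vector.

```python
import numpy as np


def direct_pairs(Q, P, E, lags, h):
    """Samples for lead time h: last `lags` values of Q, P, E up to t -> target Q[t + h]."""
    X, y = [], []
    for t in range(lags - 1, len(Q) - h):
        window = slice(t - lags + 1, t + 1)
        X.append(np.concatenate([Q[window], P[window], E[window]]))
        y.append(Q[t + h])
    return np.array(X), np.array(y)


def fit_direct(Q, P, E, lags=3, horizons=range(1, 8)):
    """Direct strategy: one separate model F_h per lead time (here a linear stand-in)."""
    models = {}
    for h in horizons:
        X, y = direct_pairs(Q, P, E, lags, h)
        A = np.column_stack([X, np.ones(len(X))])  # add intercept column
        models[h] = np.linalg.lstsq(A, y, rcond=None)[0]
    return models


def predict_direct(models, Q, P, E, lags=3):
    """Forecast Q[t+1..t+7] from the latest window using each horizon's own model."""
    x = np.concatenate([Q[-lags:], P[-lags:], E[-lags:], [1.0]])
    return np.array([x @ models[h] for h in sorted(models)])


rng = np.random.default_rng(0)
P = rng.gamma(0.5, 10.0, 400)                          # synthetic daily precipitation
E = 3.0 + rng.normal(0.0, 0.5, 400)                    # synthetic daily evaporation
Q = np.convolve(P, [0.3, 0.2, 0.1], mode="full")[:400]  # synthetic streamflow response
models = fit_direct(Q, P, E)
print(predict_direct(models, Q, P, E).shape)  # (7,)
```

The number of usable training pairs shrinks slightly with h, because the last h days have no target.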

Multi-step deterministic forecasts based on ML methods
An issue with ML methods is that they can easily overfit the training data. To avoid this, the entire dataset is divided into three subsets for the RNNs: (i) a training set, used to compute the gradient and update the weights and biases of the network; (ii) a validation set, over which the errors are monitored during training to decide when to stop training; and (iii) a test set, used to assess the expected future performance. In addition, dropout is applied as a regularization method, in which input and recurrent connections to LSTM and GRU units are probabilistically excluded from activation and weight updates during training. These strategies reduce overfitting and improve model generalization. We considered the five different input scenarios described in Section 3.3. Table 3 summarizes the forecast analysis carried out with the different configurations (input combination and forecast model), tabulating the NSE ranges for lead times from 1 day ahead to 7 days ahead over all reservoirs during the calibration, validation, and test periods. S1, using only the flow variables, and S2, using only the antecedent climate variables, are inferior to the other scenarios. The performance generally improves when the flow variables are combined with antecedent precipitation and evaporation under S3. However, in this case, the antecedent variables yield skillful forecasts only at a 1-day lead time, and the forecast performance decreases significantly as the forecast horizon increases from 1 to 7 days ahead. Here, we assume that future precipitation and evaporation are available as forecasts. S4 and S5, which include the forecast climate variables, significantly improve streamflow forecasting, and the NSE remains relatively stable across horizons.
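The chronological split and the validation-based stopping rule can be sketched as follows. The 60/20/20 proportions, the patience of three epochs, and the loss values are hypothetical; the study's actual subsets are its calibration, validation, and test periods, and deep-learning libraries usually provide early stopping as a ready-made callback.

```python
import numpy as np


def chronological_split(n, frac_train=0.6, frac_val=0.2):
    """Split indices 0..n-1 in time order (no shuffling) into train/validation/test."""
    i1 = int(round(n * frac_train))
    i2 = i1 + int(round(n * frac_val))
    idx = np.arange(n)
    return idx[:i1], idx[i1:i2], idx[i2:]


def early_stopping_epoch(val_losses, patience=3):
    """Epoch with the lowest validation loss, stopping after `patience` non-improving epochs."""
    best, best_epoch, wait = np.inf, 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # later epochs are never examined
    return best_epoch


train, val, test = chronological_split(10)
print(len(train), len(val), len(test))  # 6 2 2
print(early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3, 2.0]))  # 2
```

Splitting in time order rather than at random keeps future information out of training, which matters for autocorrelated streamflow series.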
There are no apparent differences between the three forecast models during the calibration period. However, the two RNNs perform better than GWO-LSSVM during the validation period, while GWO-LSSVM performs better during the test period. Moreover, given that GRU has a simpler structure and fewer parameters and requires less training time, it may be preferred over LSTM for short-term streamflow forecasting. Similar results were obtained by Gao et al. (2020) when they used LSTM and GRU to model short-term rainfall-runoff relationships.

Table 3 is here
We aimed to compare how the forecast climate variables affect streamflow forecast and reservoir operation performance.
For brevity, S3 and S5 are compared in detail in the following section. Recall that S3 uses flow variables and antecedent precipitation and evaporation as inputs, while S5 uses flow variables as well as antecedent and forecast climate forcing. After assessing model validity, the next step was to compare the performance across the 24 reservoirs. The coefficient of variation (COV), defined as the ratio of the standard deviation to the mean of the inflow time series, was used to capture the variability of the flow into each reservoir. Figure 5 reveals a strong negative relationship between COV and forecast performance under S3 at all lead times: the forecast performance decreases as COV increases for all forecast models. This indicates that the more variable the flow, the harder it is for data-driven methods to learn the flow pattern when the input information is insufficient. The negative relationship under S5 (Figure 6), which includes forecast climate variables (precipitation and evaporation in this study), is weaker than under S3, which again indicates that the forecast climate variables help the AI-based models learn the mapping between inputs and outputs. The improvements are more significant for the two RNN models, i.e., LSTM and GRU, suggesting that the deep-learning RNN methods exploit this additional information more effectively than LSSVM.
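The COV-skill analysis amounts to computing one COV per reservoir and correlating it with that reservoir's skill score. The sketch below assumes the population standard deviation (NumPy's default, ddof=0), a Pearson correlation as the summary of the relationship, and hypothetical inflow series and NSE values for three reservoirs.

```python
import numpy as np


def coefficient_of_variation(q):
    """COV = standard deviation / mean of an inflow series (population std, ddof=0)."""
    q = np.asarray(q, dtype=float)
    return q.std() / q.mean()


def cov_skill_correlation(inflows, skill):
    """Pearson correlation between reservoir COVs and a skill score such as NSE."""
    covs = np.array([coefficient_of_variation(q) for q in inflows])
    return np.corrcoef(covs, skill)[0, 1]


inflows = [[10, 11, 9, 10], [5, 10, 15, 10], [1, 20, 2, 17]]  # hypothetical reservoirs
nse_scores = [0.9, 0.7, 0.4]                                   # hypothetical NSE values
print(round(coefficient_of_variation(inflows[0]), 4))  # 0.0707
print(cov_skill_correlation(inflows, nse_scores) < 0)  # True
```

A negative coefficient corresponds to the pattern in Figure 5: the more variable a reservoir's inflow, the lower its forecast skill.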

Figure 7 is here
To assess model validity, the modeled streamflow is evaluated over the calibration, validation, and test periods using the NSE, RMSE, and MAE metrics. Table 4 shows the ranges of the performance metrics of the BMA method for all 24 reservoirs under S3 and S5. Both the replicative validity (forecast performance on the calibration set) and the predictive validity (forecast performance on the validation and test sets) under S5 are significantly better than those under S3 across forecast horizons. Figure 8 shows the improvement rates in NSE, RMSE, and MAE of the BMA model relative to the three individual models. BMA produces the maximum NSE, minimum RMSE, and minimum MAE during the calibration period for both scenarios, indicating that BMA has the best goodness-of-fit. This is because the BMA weights are estimated from the individual model forecasts over this period. For the validation and test periods, the BMA method outperforms the three individual models, except for GRU on the validation set under S5. Overall, the BMA forecasts match the observed streamflow well.

Figure 8 is here
The model validity was then assessed using (i) hydrographs and (ii) scatter plots of observed and modeled streamflow, as shown in Figure 9 and Figure 10. For brevity, we show only three reservoirs: Hongqiao (the largest reservoir), Goushan (a medium-sized reservoir), and Nanao (the smallest reservoir). Figure 9 shows that, as expected, the modeled streamflow deviates gradually from the 1:1 line and the forecast skill decreases with increasing lead time under S3, which is consistent with the statistical results in Table 4. In contrast, Figure 10 shows that under S5, with forecast climate variables as inputs, the scatter of observed versus modeled streamflow fits closely to the 1:1 line at different lead times. The performance for Hongqiao Reservoir is particularly affected by an extreme peak event that hit the reservoir during the calibration period in Figure 9, which does not occur in the training data. This causes heavy underestimation in the streamflow forecast, and a longer calibration period would be required to improve performance for such extreme peak flow events. However, when the forecast climate forcing is applied as input, the BMA method captures this extreme peak flow in Hongqiao Reservoir well at all lead times. This is because the reservoirs in the Zhoushan Islands have relatively small drainage areas, so the flow concentrates within a very short time after an extreme rain event, and forecast precipitation therefore carries direct information about the coming peak. To demonstrate the optimization of multi-reservoir operations based on the data-driven forecast models under uncertainty, 90% confidence intervals of the BMA predictions were further calculated.
The confidence interval provides more alternatives that may be useful for trading off multiple objectives, such as flood control, hydropower generation, and improved navigation (Zhang et al., 2015). The interval performance metrics CR and D described in Section 2.3 were adopted to assess the performance of the uncertain forecasts. The forecast intervals between the 5th and 95th percentiles for representative reservoirs (Hongqiao, Goushan, and Nanao) are presented in Figure 11 and Figure 12, and the results are consistent with those in Figure 9 and Figure 10. Figure 11 shows that the streamflow interval fails to capture the extreme peak flow for Hongqiao Reservoir under S3, and the BMA performance deteriorates with increasing lead time for the three reservoirs. In contrast, in Figure 12, most of the observed streamflow values (red dots) are covered by the 90% interval at both 1-day and 7-day lead times. Therefore, forecast climate variables can help reduce the predictive uncertainty of real-time streamflow forecasting.

Multi-objective reservoir operation optimization results
For short-term forecasting and reservoir operation, forecast horizons of 1-7 days ahead were chosen. The model was operated following the MORDM approach under a rolling horizon scheme. Under parameterized MORDM, the decision variables in the optimization problem were not the daily volumes of water to be transferred from Ningbo City and the remaining 24 reservoirs; instead, they were the parameters of the RBF policies. The best operation was obtained by conditioning the operating policies on two input variables, i.e., the previous reservoir levels and the current inflow. The multi-reservoir operation optimization using inflow forecasts was performed over one year (April 1st, 2007 to March 31st, 2008) with 25 reservoirs; the period was selected so that it did not overlap the calibration datasets. The optimization was solved at each time step (for a particular forecast horizon, e.g., 1-7 days) by applying NSGA-II to search the space of decision variables and identify the islands' water allocation trajectories. Operations were optimized with both deterministic and uncertain inflow forecasts. To illustrate the relationship between the conflicting objectives, a set of Pareto solutions over a 7-day horizon at different periods under S5 is shown as an example in Figure 13. Optimization based on the Pareto concept allows the operator to choose an appropriate solution depending on the prevailing circumstances by analyzing the tradeoffs between the conflicting objectives.
In each plot, the water deficiency ratio of the Daobei Plant and that of the remaining plants combined are plotted on the x and y axes, respectively. The marker color indicates the net operating costs, ranging from red (low) to blue (high). Thus, an ideal solution would be located at the lower-left corner of the plot (low water deficiency ratios for both the Daobei Plant and the remaining three plants) and represented by a red (low net operating cost) marker. Black arrows indicate the directions of optimization. Generally, the water deficiency ratio of the Daobei Plant has an inverse relationship with that of the remaining plants (i.e., the former decreases as the latter increases). In contrast, the water deficiency ratio of the remaining three plants has a positive relationship with the net costs (i.e., the former increases as the latter increases).
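The Pareto sets in such plots consist of non-dominated solutions: no other solution is at least as good in every objective and strictly better in one. The sketch below filters a small hypothetical solution set with a brute-force check. It assumes all three objectives are minimized, with net benefit expressed as net cost, and it is far simpler than the fast non-dominated sorting inside NSGA-II.

```python
import numpy as np


def non_dominated_mask(F):
    """True for rows of F not dominated by any other row (all objectives minimized)."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # row j dominates row i if it is no worse in every objective and better in one
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask


# Columns: Daobei deficiency (%), other plants' deficiency (%), net cost (hypothetical units)
F = [[10, 20, 5],
     [20, 10, 5],
     [15, 25, 6],   # dominated by the first row
     [5, 30, 8]]
print(non_dominated_mask(F))  # [ True  True False  True]
```

The first two rows illustrate the inverse relationship between the two deficiency ratios: neither dominates the other, so both remain on the front.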
It is interesting to compare the performance associated with deterministic and uncertain forecasts. Uncertain conditions (Figure 13(b)) show a much broader range on the three objectives than deterministic conditions (Figure 13(a)). For instance, during 12-18 August 2007, uncertain forecasts produce a water deficiency ratio for the Daobei Plant ranging from -40% to 80%, while deterministic forecasts yield a narrower range, from 30% to 100%. The water supply deficits under deterministic forecasts result from the high demand in August, and they can be mitigated by informing reservoir operations with uncertain forecasts. We therefore expect that using ensemble streamflow forecasts in a stochastic optimization scheme could further enhance reservoir operation, because the optimization considers the possible uncertainty provided by the uncertain forecasts and can thus partly correct for its influence.

Reservoir operation performance evaluation
Forecasts are generally expected to benefit reservoir operations. Annual revenues, costs, and water supply reliability were chosen as metrics to compare the performance of the operating policies derived from different configurations. Reliability measures how well the water demand of users is met in a water transfer system and is expressed here as a percentage. System performance is averaged over a set of solutions. Table 6 provides the annual values for the period from 1 April 2007 to 31 March 2008 under the various configurations, with two decision horizons of 1 and 7 days. The multi-reservoir operation based on observed inflows serves as the benchmark. Table 6 shows that, with deterministic inflows (both observed and forecasted), the performance indicators for the 1-day forecast horizon are better than those for the 7-day horizon. With the 1-day forecast horizon, scenarios S3 and S5 show similar operating performance, which is consistent with the inflow forecast performance listed in Table 3. Recall that S3 uses flow variables, antecedent precipitation, and evaporation as forecast inputs, while S5 uses flow variables as well as antecedent and forecast climate forcing. With the 7-day forecast horizon, the operating results of S5, in contrast to those of S3, are closest to the results obtained from observed inflows, owing to the improved inflow forecast performance under S5. However, Table 6 also shows that water supply reliability and net costs under S5 are inferior to those under S3. For the stochastic forecasts, S5 outperforms S3, with lower net costs and similar water supply reliability. Thus, with deterministic forecasts, improved forecast performance does not necessarily lead to improved decisions.

https://doi.org/10.5194/hess-2020-617 Preprint. Discussion started: 16 December 2020 © Author(s) 2020. CC BY 4.0 License.
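The text defines reliability only as the percentage of demand that is met. Two common ways to compute such a percentage are sketched below in Python, as an assumption rather than the exact formula of this study: a volumetric version (delivered volume over demanded volume) and a time-based version (share of time steps in which demand is fully met). Function names and the example series are hypothetical.

```python
from typing import Sequence


def volumetric_reliability(supply: Sequence[float], demand: Sequence[float]) -> float:
    """Percentage of total demand delivered; supply above demand is not credited."""
    if len(supply) != len(demand):
        raise ValueError("supply and demand must have the same length")
    total_demand = sum(demand)
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    delivered = sum(min(s, d) for s, d in zip(supply, demand))
    return 100.0 * delivered / total_demand


def time_reliability(supply: Sequence[float], demand: Sequence[float]) -> float:
    """Percentage of time steps in which supply meets or exceeds demand."""
    if len(supply) != len(demand) or not demand:
        raise ValueError("supply and demand must be non-empty and of equal length")
    met = sum(1 for s, d in zip(supply, demand) if s >= d)
    return 100.0 * met / len(demand)


if __name__ == "__main__":
    daily_supply = [10.0, 8.0, 10.0, 6.0]   # illustrative daily volumes
    daily_demand = [10.0, 10.0, 10.0, 10.0]
    print(volumetric_reliability(daily_supply, daily_demand))  # 85.0
    print(time_reliability(daily_supply, daily_demand))        # 50.0
```

The two definitions can rank policies differently: a policy with many small shortfalls scores well volumetrically but poorly on the time-based measure.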

Table 6 is here
The results in Table 6 show that system performance derived from the observed inflows is inferior to that from the other configurations. However, this finding does not confirm the effectiveness of inflow forecasts, because the forecasted inflows may overestimate the actual inflows. For example, the mean observed inflow of Hongqiao Reservoir (0.14 m³/s) is lower than the mean forecasted inflow (0.17 m³/s). In this case, the good performance presented in Table 6 is misleading: although decision-makers can follow the strategies determined from the forecasted inflows, system performance should be assessed using the actual (i.e., observed) inflows. We therefore re-evaluated the operating strategies optimized under the configurations above using the observed inflows; the resulting performance metrics are listed in Table 7. These results are expected to reveal the maximum efficiency and reliability that could be achieved based on accurate information. In general, performance under deterministic forecasts in Table 7 is lower than in Table 6, because the operating decisions in Table 6 were optimized for a higher inflow series. For both deterministic and uncertain forecasts, the net operating costs of S5 are substantially lower than those of S3, while water supply reliability increases slightly. This result suggests that improved forecasts, obtained by using forecast climate variables as inputs, lead to more skillful decisions. We emphasize that this result is specific to the Zhoushan Islands. Indeed, many studies show that higher forecast performance does not necessarily lead to better operating decisions (Chiew et al., 2003; Goddard et al., 2010; Turner et al., 2017). However, some researchers have drawn conclusions similar to ours. For instance, Anghileri et al. (2016) reported that inflow forecasts with accurate weather components produced much smaller water supply deficits. Moreover, Anghileri et al. (2019) found that preprocessed forecasts (higher performance) were more valuable than raw forecasts (lower performance) with respect to two operating performance metrics, i.e., mean annual revenues and spilled water volume.
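The re-evaluation step can be illustrated with a deliberately simplified, single-reservoir Python sketch (an assumption for illustration, not the multi-reservoir model of this study): releases planned against forecasted inflows are replayed through a mass balance driven by observed inflows, and the release actually delivered is capped by the water available. The inflows mirror the Hongqiao example (forecast 0.17 versus observed 0.14), scaled by 10 and treated as volumes per time step in arbitrary units; storage values and function names are hypothetical.

```python
from typing import List, Sequence


def replay_policy(planned_release: Sequence[float], observed_inflow: Sequence[float],
                  s0: float, s_max: float) -> List[float]:
    """Replay planned releases with observed inflows.

    Mass balance per step: S[t+1] = S[t] + I[t] - R[t], where the delivered
    release R[t] cannot exceed the water available (S[t] + I[t]) and any
    storage above s_max is spilled.
    """
    storage = s0
    delivered = []
    for r_plan, inflow in zip(planned_release, observed_inflow):
        available = storage + inflow
        release = min(r_plan, available)           # cannot release water that is not there
        storage = min(available - release, s_max)  # excess above capacity is spilled
        delivered.append(release)
    return delivered


if __name__ == "__main__":
    # Plan built on an over-forecast inflow of 1.7 per step, releasing exactly that.
    plan = [1.7] * 5
    observed = [1.4] * 5
    actual = replay_policy(plan, observed, s0=1.0, s_max=5.0)
    print(sum(plan), sum(actual))  # planned vs. actually deliverable volume
```

Because the plan assumes more water than actually arrives, the replayed deliveries fall short of the planned ones once the initial storage is drawn down, which is the effect that makes performance evaluated on forecasted inflows look optimistic.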
Another interesting finding is that the operating performance based on deterministic forecasts deteriorates when re-evaluated with observed inflows, while the performance based on uncertain forecasts remains relatively stable. This implies that using uncertain forecasts in reservoir operation can be more efficient and reliable than using deterministic forecasts. The reason is that, in a stochastic optimization scheme, the value of forecasts can be further enhanced because the optimization accounts for the total uncertainty conveyed by the ensemble forecasts. Similar results were obtained by Roulston and Smith (2003), who reported that hydroelectric power production derived from ensemble forecasts was higher than that derived from deterministic forecasts. Boucher et al. (2012) also found that stochastic forecasts outperformed deterministic ones during a flood period, with lower turbined flow, higher power generation, and less spillage. Overall, in most cases, a noticeable improvement can be achieved by using the stochastic decision-making assistance tool.
We then assessed water supply reliability over different seasons. Figure 14 shows that, for both forecast horizons, the deterministic forecasts are less skillful than the uncertain forecasts in spring (JFM), summer (AMJ), autumn (JAS), and winter (OND). Although operating performance is lower with deterministic forecasts because of their deterministic nature, the main characteristics of the relationship between forecast quality and value remain unchanged: the benefits of using forecasts are greater when forecast quality is higher. This indicates that the optimization can exploit useful forecast information to improve reservoir operations. In our multi-objective optimization model, we aim to make the best use of water resources and maximize water supply. However, operating performance in autumn is lower than in the other seasons, because water demand in autumn is usually much higher. This shortage does not imply that the proposed forecast-based management framework is ineffective; rather, it results from the limited available water and pipe capacity.
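Seasonal aggregation of the kind shown in Figure 14 can be sketched in Python as follows, grouping daily values into the three-month blocks used in the text (JFM, AMJ, JAS, OND) and computing a volumetric reliability per block. The volumetric definition, the function name, and the synthetic series (which simply imposes a 20% shortfall in JAS) are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Sequence

SEASONS = ("JFM", "AMJ", "JAS", "OND")  # three-month blocks used in the text


def seasonal_reliability(days: Sequence[date], supply: Sequence[float],
                         demand: Sequence[float]) -> Dict[str, float]:
    """Volumetric reliability (%) per three-month block."""
    delivered: Dict[str, float] = defaultdict(float)
    required: Dict[str, float] = defaultdict(float)
    for day, s, d in zip(days, supply, demand):
        season = SEASONS[(day.month - 1) // 3]
        delivered[season] += min(s, d)
        required[season] += d
    return {k: 100.0 * delivered[k] / required[k] for k in required if required[k] > 0}


if __name__ == "__main__":
    start = date(2007, 4, 1)
    days = [start + timedelta(days=i) for i in range(366)]  # 1 Apr 2007 to 31 Mar 2008
    demand = [10.0] * len(days)
    supply = [8.0 if SEASONS[(d.month - 1) // 3] == "JAS" else 10.0 for d in days]
    print(seasonal_reliability(days, supply, demand))
```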

Reservoir operation performance with different forecast horizons
The impact of different forecast horizons on operating performance was further evaluated under different configurations, as shown in Figure 15. The operating policy optimized from uncertain forecast inflows under S5 outperforms that under S3. Under deterministic conditions, compared with S3, S5 improves water supply reliability of Daobei Plant, water supply reliability of the other plants, and net costs, with relative changes of 2.11% to 13.58%, 2.74% to 7.38%, and -19.94% to -10.30%, respectively. Under uncertain conditions, the corresponding changes are 0.24% to 1.90%, 0.06% to 1.32%, and -59.45% to -176.19%. Although the gains in water supply reliability are small, S5 can secure the water demand at much lower operating costs than S3, which is what decision-makers value most. Furthermore, relative to deterministic forecasts, uncertain forecasts change the three metrics by 31.52% to 65.01%, 19.98% to 46.60%, and -116.45% to -56.95%, respectively. These results again highlight that uncertain forecasts are more valuable than deterministic forecasts when designing forecast-informed reservoir operations.
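The text does not state how these percentage changes are computed. One common convention, assumed here, is the relative change with respect to the baseline (S3, or the deterministic forecasts), divided by the absolute baseline value so that the sign always indicates the direction of change. The Python sketch below, with hypothetical numbers, also shows one way that cost changes beyond -100% can arise: when net costs, assumed here to be operating costs minus revenues, change sign.

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline`, signed by direction of change."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (new - baseline) / abs(baseline)


if __name__ == "__main__":
    print(relative_change(82.0, 80.0))   # reliability rises from 80% to 82%: 2.5
    print(relative_change(8.0, 10.0))    # net cost falls from 10 to 8: -20.0
    print(relative_change(-7.6, 10.0))   # net cost turns negative: about -176
```

Note that a reliability change expressed this way is relative (2.5% of 80%), not a difference in percentage points (2 points).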

Figure 15 is here
As the forecast horizon increases from 1 to 7 days, water supply reliability and net operating costs under deterministic conditions generally deteriorate. This suggests that considering a longer forecast horizon (up to 7 days) does not necessarily improve reservoir operation when forecast climate variables are not used as inputs (low forecast quality). The reduced water supply reliability might be because the optimization seeks strategies that secure the total water demand over a longer horizon, which sacrifices reliability on some particular days. This result is similar to the finding of Xu et al. (2014), who, testing forecast horizons of one to five days, argued that using inflow forecasts beyond an effective horizon of one day did not improve hydropower performance. Under uncertain conditions, however, increasing the forecast horizon neither clearly improves nor degrades water supply reliability. Similar water supply volumes lead to similar revenues and similar fees paid to the government and managers (water fees and water resources fees). Accordingly, the increasing trend in net costs is caused by higher operating costs, mainly driven by electricity prices, when the multi-reservoir system is operated to meet demand over a longer horizon. Thus, operating performance varies across conditions. This demonstrates that the relationship between forecast horizon and reservoir operation is rather complex and depends not only on the configurations used to determine operating rules (i.e., inflow forecast quality and uncertainty) but also on the performance metrics used to assess operation.

Our work has some limitations, which could be addressed in future studies. One limitation is that we used the average observed price to calculate revenues and operating costs. In an operational, deregulated market, prices may fluctuate significantly (Anghileri et al., 2019), so forecasting electricity prices is likely to improve short-term operating efficiency significantly.
The combined effects of price and streamflow forecasts on water resource allocation are worth investigating in future studies. Another limitation is that, instead of short-term weather forecasts from the Global Forecast System (GFS) or the European Centre for Medium-Range Weather Forecasts (ECMWF) model (Choong and El-Shafie, 2015; Schwanenberg et al., 2015; Peng et al., 2018; Ahmad and Hossain, 2019; Liu et al., 2019), we used observed weather conditions as a substitute, which may lead to an overestimation of forecast quality. Forecast uncertainty and error generally grow with lead time, so the usefulness of forecast information, and thus operating performance, may decrease as the forecast horizon increases. This may affect our finding above that the relationship between forecast horizon and reservoir operation is not constant and is case specific. It would be interesting to analyze reservoir operation performance when accounting for ensemble numerical weather predictions.

Conclusions
In this study, we proposed an AI-based management methodology to assess forecast quality and forecast-informed reservoir operation performance together. The approach was tested on a water resources allocation system in the Zhoushan Islands, China. The main findings are summarized below.
A data-driven reservoir inflow forecasting approach using ML methods (LSTM, GRU, and GWO-LSSVM) was first developed within a comprehensive calibration-validation-testing framework. The validity of the deterministic forecasts was demonstrated by applying them to 25 reservoirs with varying climate and hydrological characteristics. Results showed that the more variable the streamflow (i.e., the higher its coefficient of variation, COV), the harder it was for the ML methods to learn the flow pattern when input information was insufficient. Under such scenarios, forecast skill deteriorated with increasing lead time. However, short-term forecast climate forcing was efficient and scalable for forecasting the multi-reservoir inflow over the forecast horizon (1 to 7 days). LSTM and GRU models performed comparably under different configurations. Given that GRU has a simpler structure and fewer parameters and requires less modeling time, it may be preferred over LSTM for streamflow forecasting.

BMA was then used to generate stochastic inflow scenarios for quantifying uncertainty based on the LSTM, GRU, and GWO-LSSVM deterministic forecasts. The results showed that it was difficult to determine which individual model provided the best prediction, but BMA displayed better forecast skill than the individual models. Two input combination scenarios, one with antecedent conditions only and one with both antecedent and forecast information, were compared in detail in terms of uncertain forecast performance. The comparison indicated that forecast climate variables help reduce the predictive uncertainty of short-term streamflow forecasting.
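The BMA step can be sketched in Python with NumPy under assumptions the text does not specify: Gaussian kernels centred on the raw member forecasts (no bias correction or Box-Cox transformation), a single variance shared by all members, and weights estimated by the expectation-maximization (EM) algorithm commonly used for BMA. Stochastic inflow scenarios are then drawn by picking a member with probability equal to its weight and sampling from its kernel. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def bma_em(forecasts: np.ndarray, obs: np.ndarray, n_iter: int = 500,
           tol: float = 1e-8) -> tuple:
    """Estimate BMA weights and a common Gaussian variance by EM.

    forecasts: array of shape (K, T), one row per member model.
    obs: array of shape (T,), observed streamflow.
    """
    k_members, t_steps = forecasts.shape
    weights = np.full(k_members, 1.0 / k_members)
    sigma2 = max(np.mean((obs - forecasts.mean(axis=0)) ** 2), 1e-12)
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each member for each observation.
        sq_err = (obs - forecasts) ** 2                      # shape (K, T)
        dens = np.exp(-sq_err / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
        weighted = weights[:, None] * dens
        total = weighted.sum(axis=0) + 1e-300                # guard against underflow
        resp = weighted / total
        # M-step: update the weights and the shared variance.
        weights = resp.mean(axis=1)
        sigma2 = max(np.sum(resp * sq_err) / t_steps, 1e-12)
        loglik = np.sum(np.log(total))
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return weights, sigma2


def bma_scenarios(member_forecasts: np.ndarray, weights: np.ndarray, sigma2: float,
                  n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n inflow scenarios for one time step from the BMA mixture."""
    members = rng.choice(len(weights), size=n, p=weights / weights.sum())
    draws = rng.normal(member_forecasts[members], np.sqrt(sigma2))
    return np.maximum(draws, 0.0)  # streamflow cannot be negative (simple truncation)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = 500
    obs = 5.0 + np.sin(np.linspace(0.0, 20.0, t))
    # Three synthetic members with increasing error (illustrative only).
    members = np.vstack([obs + rng.normal(0.0, s, t) for s in (0.1, 0.5, 1.0)])
    w, s2 = bma_em(members, obs)
    print(np.round(w, 3), round(float(s2), 4))
    print(bma_scenarios(members[:, -1], w, s2, n=5, rng=rng))
```

Truncating negative draws at zero is the simplest choice; a transformation such as Box-Cox before fitting is a common alternative for skewed streamflow.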
The forecasting scheme was further coupled with a multi-objective reservoir operation model to optimize water resources allocation. Using an MORDM approach, we identified strategies that trade off water supply reliability against operating costs in the Zhoushan Islands. A rolling horizon scheme was employed to obtain optimal operating policies over horizons of 1 to 7 days. The long-term assessment over one year based on deterministic and stochastic forecasts showed markedly different performance in terms of water supply reliability and net operating costs. Our averaged annual results showed that uncertain forecasts were more valuable than deterministic forecasts, and that the operating benefits of using forecasts were greater when forecast quality was higher. Similar results were obtained at the seasonal scale. While showing the clear benefit of forecast-based reservoir operations, our results also demonstrated that, for the Zhoushan Islands system, the relationship between forecast horizon and reservoir operation was complex and depended on the operating configurations (forecast quality and uncertainty) and performance measures.
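The rolling horizon scheme can be sketched in Python as a loop in which, each day, an H-day inflow forecast is issued, a plan is optimized over that window, only the first day's decision is implemented against the observed inflow, and the window then moves forward one day. The planner below is a deliberately simple placeholder (it meets as much demand as the forecast water allows) standing in for the multi-objective optimization of this study; all names are hypothetical.

```python
from typing import Callable, List, Sequence

# A planner maps (current storage, inflow forecast, demand forecast) to releases.
Planner = Callable[[float, Sequence[float], Sequence[float]], List[float]]
Forecaster = Callable[[int, int], Sequence[float]]


def greedy_planner(storage: float, inflow_fc: Sequence[float],
                   demand_fc: Sequence[float]) -> List[float]:
    """Hypothetical placeholder: meet each day's demand as far as forecast water allows."""
    releases = []
    for inflow, demand in zip(inflow_fc, demand_fc):
        release = min(demand, storage + inflow)
        storage = storage + inflow - release
        releases.append(release)
    return releases


def rolling_horizon(s0: float, s_max: float, observed: Sequence[float],
                    demand: Sequence[float], horizon: int,
                    forecaster: Forecaster, planner: Planner) -> List[float]:
    """Re-plan every day over `horizon` days but implement only the first decision."""
    storage = s0
    applied = []
    for t in range(len(observed)):
        inflow_fc = forecaster(t, horizon)            # forecast issued on day t
        plan = planner(storage, inflow_fc, demand[t:t + horizon])
        available = storage + observed[t]             # reality uses the observed inflow
        release = min(plan[0], available)
        storage = min(available - release, s_max)     # spill above capacity
        applied.append(release)
    return applied


if __name__ == "__main__":
    observed = [2.0, 2.0, 2.0, 2.0]
    demand = [3.0, 3.0, 3.0, 3.0]

    def perfect(t: int, h: int) -> Sequence[float]:
        return observed[t:t + h]                      # perfect-forecast stand-in

    print(rolling_horizon(2.0, 10.0, observed, demand, 7, perfect, greedy_planner))
```

Because this placeholder planner is myopic, the window length does not change its decisions; a planner that trades off supply across days would use the whole window, which is where horizon effects of the kind discussed above can appear.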
Overall, the developed AI-based management framework demonstrated a clear advantage in quantifying the uncertainties of inflow forecasts to improve the overall performance of water allocation systems. Such a framework can be applied to other study sites with similar problems. However, the results obtained in this study are specific to the Zhoushan Islands and should be transferred to other study sites with care.

Data availability. The data used to support the findings of this study are available from the corresponding author upon request.