the Creative Commons Attribution 4.0 License.
Quantifying and reducing flood forecast uncertainty by the CHUPBMA method
Zhen Cui
Hua Chen
Dedi Liu
Yanlai Zhou
ChongYu Xu
The Bayesian model averaging (BMA), hydrological uncertainty processor (HUP), and HUPBMA methods have been widely used to quantify flood forecast uncertainty. This study proposes the copula-based hydrological uncertainty processor BMA (CHUPBMA) method by introducing a copula-based HUP in the framework of BMA to bypass the need for a normal quantile transformation of the HUPBMA method. The proposed ensemble forecast scheme consists of eight members (two forecast precipitation inputs; two advanced long short-term memory, LSTM, models; and two objective functions used to calibrate parameters) and is applied to the interval basin between the Xiangjiaba and Three Gorges Reservoir (TGR) dam sites. The ensemble forecast performance of the HUPBMA and CHUPBMA methods is explored in the 6–168 h forecast horizons. The TGR inflow forecasting results show that the two methods can improve the forecast accuracy over the selected member with the best forecast accuracy and that the CHUPBMA performs much better than the HUPBMA. Compared with the HUPBMA method, the forecast interval width and continuous ranked probability score metrics of the CHUPBMA method are reduced by a maximum of 28.42 % and 17.86 % within all forecast horizons, respectively. The probability forecast of the CHUPBMA method has better reliability and sharpness and is more suitable for flood ensemble forecasts, providing reliable risk information for flood control decision-making.
Accurate and reliable flood forecasting is one of the necessary measures to reduce flood disasters and improve water resource utilization (Zhou et al., 2019; Vegad and Mishra, 2022). With the development of hydrological theory and flood forecasting techniques, flood forecasting accuracy and lead time have improved significantly in recent years (Xu et al., 2023; Cui et al., 2023). However, neither physically based and conceptual hydrological models nor data-driven models can guarantee perfect forecasts in real conditions. Because of the changing environment and the limits of human understanding of complex hydrological processes, the meteorological forcing and other inputs, the hydrological model structure, and the model parameters all contain significant uncertainties (Cloke and Pappenberger, 2009), so the simulation and forecast results of a model inevitably contain integrated uncertainties from multiple sources (Liu et al., 2022). Traditional flood forecasting schemes mostly issue deterministic forecasts without considering forecast uncertainty (Zhong et al., 2018a; Gelfan et al., 2018), which leaves decision-makers unable to grasp useful risk information beyond the forecast value. Excessive reliance on a single forecast value is likely to lead to poor decision-making (Krzysztofowicz, 1999). Therefore, it is essential to quantify and reduce flood forecast uncertainty in practical applications.
Probabilistic flood forecasting is one of the effective methods of quantifying integrated forecast uncertainty (Matthews et al., 2022). It provides not only a deterministic forecast value, but also forecast uncertainty (or risk) information by means of a quantile, confidence interval, or density function (Biondi and Todini, 2018; Ferretti et al., 2020; Zhou et al., 2022), which is more scientifically reasonable and practically useful compared with deterministic forecasts and helps decision-makers consider forecast risk quantitatively (Todini, 2008). Various probabilistic forecasting methods based on statistical postprocessing of numerical forecast data have been developed in recent years. Among these methods, probabilistic ensemble forecasting is considered to be able to overcome the limitations of a single model or a simple average with fixed model weights (Han and Coulibaly, 2017) and contains richer forecast information because it can consider the ensemble forecast results of multiple models to quantify and reduce integrated uncertainty that contains uncertainties in the inputs, model structure, and parameters (Li et al., 2017; Saleh et al., 2016). Bayesian model averaging (BMA), proposed by Raftery et al. (2005), uses Bayesian theory and a total probability formulation to transform ensemble forecasts into probabilistic forecasts and is one of the most representative and reliable methods that has been widely used to supplement uncertainty information beyond point estimates (Shu et al., 2022).
The BMA method has been applied to temperature, precipitation, and wind speed ensemble forecasts of meteorological forcing (Raftery et al., 2005; Sloughter et al., 2007, 2010). After it was confirmed that the BMA method can effectively quantify forecast uncertainty and obtain highly accurate deterministic forecasts, it came to be widely used in hydrological forecasting to quantify forecast uncertainty from different sources, such as model inputs, structure, and parameters. The standard BMA method assumes that each member's posterior probability distribution approximately obeys a normal distribution (Huang et al., 2019; Guo et al., 2021). However, some variables, such as wind speed, rainfall, and runoff, usually obey skewed distributions and require transformations such as Box–Cox to convert non-Gaussian variables to normal variables, which affects the accuracy of probability distribution estimation (Duan et al., 2007; Liu et al., 2018). Many authors have investigated the applicability of BMA in flood ensemble forecasting and tried to overcome its limitations (Madadgar and Moradkhani, 2014; Darbandsari and Coulibaly, 2020). Sloughter et al. (2010) proposed an improved BMA method by assuming that the posterior probability distribution of each member could obey a specific non-normal distribution (e.g., gamma distribution) and using the member forecast values to estimate the mean and variance of the distribution. Madadgar and Moradkhani (2014) introduced the copula function to solve the posterior probability distribution of members in the BMA method and proposed the copula-based BMA method, which avoids the assumption of the posterior probability distribution and further reduces the application limitation of the BMA method. In order to ensure that the quantiles of forecast distributions after the Box–Cox transformation are within the actual physical range, Baran et al. (2019) introduced upper and lower truncated normal distributions into the BMA and found that the double truncated BMA had reliable forecasting ability compared to ensemble model output statistics. The advantage was more obvious when rolling window training periods were used. Hemri et al. (2013) introduced the principle of geostatistical output perturbation to the BMA method and proposed a multivariate BMA, which extended the membership probability distribution into a multivariate normal distribution function. Relative to the univariate BMA method, the multivariate BMA can not only consider the temporal correlation between forecast flows, but also improve the forecast reliability when the forecast system is changing; i.e., fewer models are available due to dropping out at particular lead times. Meanwhile, the BMA method usually ensembles the forecast results of multiple models to be as close to the actual values as possible. However, too many ensemble members may generate redundant information. Darbandsari and Coulibaly (2020) introduced the Shannon entropy theory to select the forecast members before applying BMA so as to limit such redundant information. Their results showed that the BMA method incorporating entropy could improve the probabilistic forecasting performance for high flows over the standard BMA method. In addition, some studies have developed various methods based on the BMA principle, such as the multimodel ensemble forecasting method based on a vine copula (Zhang et al., 2022) and the combination of BMA and data assimilation techniques (Parrish et al., 2012).
However, most studies ignore an essential issue: the BMA does not consider the constraint of initial conditions (i.e., observed flow at the start of the forecast). It can be shown from Raftery et al. (2005) that the conditional distribution of the member (Q_{f,i}) in the BMA is assumed to follow the normal distribution with expectation ${\mathit{\mu}}_{i}={a}_{i}+{b}_{i}\cdot {Q}_{\mathrm{f},i}$ (a_{i} and b_{i} are the bias correction coefficients) and variance σ_{i}, which implies that the conditional distribution is only related to the member's forecasted flow and not affected by the observed flow at the forecast start time. It is unreasonable to produce the same posterior distribution when the forecast results are the same at different moments.
The hydrological uncertainty processor (HUP) can obtain the posterior distribution function of the actual value under the condition of the forecast value and the observed flow at the start time based on Bayesian principles and the assumption of perfect rainfall forecasting (Krzysztofowicz and Kelly, 2000). Darbandsari and Coulibaly (2021) first used the HUP method to derive the posterior distribution of each member considering the initial constraints and then used the BMA method to weight the conditional distributions of all members to obtain the final posterior distribution, which is called the HUPBMA method. Their results showed that the HUPBMA method outperforms the HUP method and improves the BMA method in short-term probabilistic forecasting. In addition, the derivability of the posterior distribution for the ensemble members is theoretically enhanced, the heteroskedasticity of the ensemble members is considered, and the interpretability and logical rationality of the BMA method are improved.
Although it has been demonstrated that considering initial conditions in the BMA method can improve ensemble forecast performance, there are still issues to be explored. The HUPBMA method requires a normal quantile conversion method to convert the flow data series to Gaussian space to solve the posterior distribution. The process is not only tedious and complicated, but also prone to bias in the inverse conversion. To this end, Liu et al. (2018) adopted a copula to derive the conditional distribution of the observed flow under the conditions of the forecasted flow, which avoids the assumption that the flow series obeys a normal distribution in the HUP and relaxes the application limitation. The study shows that the copula-based hydrological uncertainty processor (CHUP) can improve the probabilistic forecasting performance of the HUP method. It is anticipated that coupling CHUP with the BMA may improve the HUPBMA accuracy and applicability, which motivates the current study.
The main innovations and research steps are as follows. (1) A novel CHUPBMA method is proposed by coupling CHUP with BMA, which not only avoids the normal distribution assumption in HUPBMA, but also considers the constraints of the initial condition of the forecast. (2) An ensemble forecast containing eight members is constructed by combining two types of forecast precipitation; two long short-term memory (LSTM) models, i.e., the recursive encoder–decoder (RED) structure-based LSTMRED model and the feature–temporal dual-attention-based DALSTMRED model; and two objective functions of model calibration. (3) The ensemble forecast performance of the proposed method is analyzed and discussed in comparison to the HUPBMA benchmark method in terms of the deterministic and probabilistic forecasts. The interval basin between Xiangjiaba Dam and the Three Gorges Dam on the Yangtze River, China, is selected as the case study.
The rest of the paper is organized as follows. Section 2 introduces the case study and materials. The methods are presented in Sect. 3. Section 4 evaluates the deterministic and ensemble forecast results. Conclusions and prospects are given in Sect. 5.
2.1 Study basin
The Three Gorges Reservoir (TGR) is the largest hydraulic project in the world and plays a vital role in flood control, power generation, and other water resource management issues (Zhong et al., 2020). The TGR controls a watershed area of about 1×10^{6} km^{2}. The total reservoir capacity is about 39.3×10^{9} m^{3}, with a flood control capacity of about 22.15×10^{9} m^{3}.
The TGR inflow is directly influenced by the runoff yield of the cascade reservoir interval basin between Xiangjiaba and TGR (Fig. 1), with a basin area of about 127 400 km^{2} (Zhou et al., 2019). The inflow of the TGR consists of the outflow discharge from the Xiangjiaba Reservoir; the inflow of several tributaries such as the Min, Tuo, Jialing, and Wu rivers; and the rainfall of the interval basin. The flow sources are complex and have different effects on the TGR inflow. Moreover, the TGR is a river-type reservoir with a length of about 600 km at the normal storage level (175 m) and an average width of only 1.1 km, resulting in uncertainty in rainfall intensity and storm-center positioning (Zhong et al., 2020). Therefore, there is significant uncertainty in the flood forecast of the TGR. It has been a major challenge to quantify and reduce forecast uncertainty.
Table 1 shows the flow propagation time from the hydrological control stations of the mainstream and tributaries to the TGR dam. The outflow discharge of Xiangjiaba Reservoir, located on the Jinsha River, is observed at the Pingshan hydrological station and represents the mainstream flow. The discharge values from large tributaries (Min, Tuo, Jialing, and Wu rivers) are observed at the Gaochang, Fushun, Beibei, and Wulong hydrological stations, respectively.
Considering the uneven distribution of rainfall intensity because of the narrow and long basin, the interval basin between the Xiangjiaba and TGR dam sites is divided into three subbasins: Pingshan–Cuntan, Cuntan–Wanxian, and Wanxian–TGR dam site. Their watershed areas are 76 900, 22 900, and 27 600 km^{2}, respectively. Meanwhile, there are 45, 38, and 60 gauged rainfall stations in these three subregions, respectively.
2.2 Study materials
This study collects 6 h observed flow discharges at the TGR dam site and five hydrological stations (Table 1) and 6 h observed rainfall in the interval basin during the 2010–2021 flood seasons (May–September). The Thiessen polygon method is used to calculate areal average rainfall using rainfall station data for each subbasin area. Meanwhile, this study collects the forecasted precipitation data issued by the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Hydrology Bureau of the Yangtze River Water Resources Commission (HBYRWRC) for the 2017–2021 flood seasons in the three subbasins. Their forecast time starts at 08:00 LT, with the 6–168 h forecast horizons and the 6 h forecast interval. The spatial resolution of each grid for the ECMWF forecasted precipitation is 0.125°×0.125°. The HBYRWRC forecasted precipitation is the areal average forecasted precipitation data.
The training period is from 2010 to 2016, and the validation period is from 2017 to 2021. Since the precipitation forecast starts at 08:00 LT, the forecasted flow for the 6–168 h forecast horizons is also calculated from 08:00 LT each day in the validation period.
3.1 Proposed CHUPBMA method
3.1.1 Bayesian model averaging (BMA)
The principle of the Bayesian model averaging (BMA) method is as follows:
$$p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},1},\dots ,{Q}_{\mathrm{f},k})=\sum _{i=1}^{k}{w}_{i}\cdot p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i}) \qquad (1)$$
where p(⋅) denotes the probability density function. Q_{o} denotes the observed flow corresponding to the forecast moment (target value). k is the number of ensemble members. Q_{f,i} denotes the forecasted flow of the ith ensemble member. w_{i} denotes the weight of the ith model. p(Q_{o}|Q_{f,i}) denotes the conditional probability density of Q_{o} conditional on Q_{f,i}, which is assumed to approximately obey a normal distribution with the expectation of ${\mathit{\mu}}_{i}={a}_{i}+{b}_{i}\cdot {Q}_{\mathrm{f},i}$ and variance of σ_{i}. a_{i} and b_{i} are the bias correction coefficients obtained by the linear fitting of Q_{f,i} to Q_{o}.
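Therefore, with N(μ, σ) denoting a normal distribution with expectation μ and variance σ, Eq. (1) can be rewritten as follows:
$$p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},1},\dots ,{Q}_{\mathrm{f},k})=\sum _{i=1}^{k}{w}_{i}\cdot N({a}_{i}+{b}_{i}\cdot {Q}_{\mathrm{f},i},{\sigma }_{i}) \qquad (2)$$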
From Eq. (2), it can be seen that the BMA method does not consider the influence of the initial state (the actual observed flow at the start of the forecast) on the posterior distribution. When the member forecasts at different times are the same, the posterior probability distribution generated by the BMA is also the same, which lacks logical rationality.
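The mixture structure described above can be illustrated with a minimal Python sketch. It assumes Gaussian member densities and treats `sigma` as a standard deviation; the function name `bma_density` and every numerical value are hypothetical and serve only as an illustration, not as the calibrated setup of this study.

```python
from statistics import NormalDist

def bma_density(q_obs, forecasts, weights, a, b, sigma):
    """Weighted sum of Gaussian member densities evaluated at q_obs.

    Assumption: sigma[i] is the standard deviation of member i.
    """
    total = 0.0
    for q_f, w, a_i, b_i, s_i in zip(forecasts, weights, a, b, sigma):
        mu_i = a_i + b_i * q_f  # bias-corrected member forecast
        total += w * NormalDist(mu_i, s_i).pdf(q_obs)
    return total

# Hypothetical two-member ensemble (flows in m^3/s).
forecasts = [20000.0, 22000.0]
weights = [0.6, 0.4]
a = [500.0, -300.0]
b = [0.98, 1.01]
sigma = [1500.0, 2000.0]

density = bma_density(21000.0, forecasts, weights, a, b, sigma)
```

None of the arguments carries the observed flow at the forecast start, so two forecast dates with identical member forecasts return the same density, which is the limitation discussed in this subsection.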
3.1.2 Hydrological uncertainty processor (HUP)
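Based on the assumption that the precipitation uncertainty is zero, the posterior distribution of Q_{o} conditional on Q_{f,i} and Q_{b} (the observed flow at the start of the forecast) follows from Bayes' theorem:
$$p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i},{Q}_{\mathrm{b}})=\frac{p({Q}_{\mathrm{f},i}\mid {Q}_{\mathrm{o}},{Q}_{\mathrm{b}})\cdot p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{b}})}{\int p({Q}_{\mathrm{f},i}\mid {Q}_{\mathrm{o}},{Q}_{\mathrm{b}})\cdot p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{b}})\,\mathrm{d}{Q}_{\mathrm{o}}} \qquad (3)$$
where p(Q_{o}|Q_{b}) is the prior density function, $p({Q}_{\mathrm{f},i}\mid {Q}_{\mathrm{o}},{Q}_{\mathrm{b}})$ is the likelihood density function, and $p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i},{Q}_{\mathrm{b}})$ is the posterior density function.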
The HUP method assumes that flow series transformed to normal space obey the Gaussian distribution. The cumulative distribution function is different for forecasted and observed flows. The common normal quantile transformation is key to the application of the HUP method and is significant for making the HUP method applicable to variables with any marginal distributions, heteroskedasticity, and nonlinear dependence structures (Krzysztofowicz and Kelly, 2000; Darbandsari and Coulibaly, 2021).
$${\widehat{Q}}_{\mathrm{o}}={N}^{-1}(P({Q}_{\mathrm{o}})),\quad {\widehat{Q}}_{\mathrm{f},i}={N}^{-1}(P({Q}_{\mathrm{f},i})) \qquad (4)$$
where P(⋅) denotes the probability distribution function. ${N}^{-1}(\cdot )$ denotes the inverse function of the standard normal distribution. ${\widehat{Q}}_{\mathrm{o}}$ and ${\widehat{Q}}_{\mathrm{f},i}$ are the observed and forecasted flow transformed to the normal space, respectively.
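A minimal sketch of a normal quantile transformation is given below. It assumes an empirical distribution function with the plotting position rank/(n + 1) in place of a fitted marginal distribution, and it ignores ties; the function name `normal_quantile_transform` and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_transform(flows):
    """Map each flow to the standard normal quantile of its empirical probability.

    Assumption: the probability of the value with rank r is r / (n + 1).
    """
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i])  # indices from smallest to largest flow
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = rank / (n + 1)
        transformed[idx] = std_normal.inv_cdf(p)
    return transformed

# Hypothetical flow sample; the median value maps to zero in normal space.
z = normal_quantile_transform([30.0, 10.0, 20.0])
```

The inverse step, from normal space back to flow space, needs an interpolation or a fitted distribution for values outside the sample range, which is where back-transformation bias can enter.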
The HUP method also assumes that the observed flow obeys a strictly stationary first-order Markov process (Krzysztofowicz and Kelly, 2000); i.e., the flows between adjacent forecast horizons obey a linear constraint after the normal transformation.
where ${\widehat{Q}}_{\mathrm{o},t}$ is the observed flow corresponding to the tth forecast horizon. c_{t} is the regression coefficient. ε is the residual, obeying $N(\mathrm{0},\mathrm{1}-{c}_{t}^{\mathrm{2}})$.
The prior density function expressions are as follows:
where n(⋅) denotes standard normal density function and ${\widehat{Q}}_{\mathrm{b}}$ is the observed flow at the start of the forecast transformed to the normal space.
${\widehat{Q}}_{\mathrm{b}}$, ${\widehat{Q}}_{\mathrm{o}}$, and ${\widehat{Q}}_{\mathrm{f},i}$ are assumed to obey a linear relationship. The expression of the likelihood function in normal space is as follows:
where θ_{t} is an independent variable obeying $N(\mathrm{0},{\mathit{\sigma}}_{t}^{\mathrm{2}})$. a_{t}, d_{t}, and b_{t} are regression coefficients.
The posterior density function under normal space can be derived by substituting Eqs. (6) and (7) in Eq. (3):
The posterior distribution function under the normal space can be converted to the original space by Jacobian transformation (Liu et al., 2016). The posterior density function of Q_{o,t} under ${Q}_{\mathrm{f},i,t}$ and Q_{b} conditions is as follows:
where J(⋅) is the Jacobian transformation function.
3.1.3 HUPBMA method
Darbandsari and Coulibaly (2021) applied the hydrological uncertainty processor (HUP) to the ensemble forecast members, substituted the posterior density function obtained by the HUP method (Eq. 9) in the BMA framework (Eq. 2), and then obtained the posterior distribution function of the target flow based on the initial state and the forecasted flow of the ensemble member. Therefore, the expression of the HUPBMA method is as follows:
3.1.4 Copulabased HUPBMA (CHUPBMA) method
Copulabased HUP
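According to Sklar's theorem (Sklar, 1959), the joint distribution of m variables is as follows:
$$F({x}_{1},\dots ,{x}_{m})={C}_{m}({F}_{1}({x}_{1}),\dots ,{F}_{m}({x}_{m})) \qquad (11)$$
where C_{m}(⋅) denotes the m-dimensional copula distribution, F(⋅) is the joint distribution function, and F_{j}(⋅) is the marginal distribution function of the jth variable x_{j}.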
The copulabased HUP method (CHUP), which can avoid the normal quantile transformation process of the flow series in the standard HUP method, was proposed by Liu et al. (2018). With the help of the copula function, the prior density function in Eq. (3) can be derived as follows:
where c_{m}(⋅) denotes the m-dimensional copula density function. m denotes the dimension.
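How a copula density yields a conditional density without any normal quantile transformation can be sketched for the bivariate case, using the general identity p(x|y) = c(F(x), G(y)) · f(x). The sketch below assumes a Gaussian copula and normal marginals purely for convenience (the study selects both from several candidates); the names `gaussian_copula_density` and `conditional_density` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v) with correlation rho."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    expo = -(rho ** 2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho ** 2))
    return math.exp(expo) / math.sqrt(1.0 - rho ** 2)

def conditional_density(q_o, q_b, marginal_o, marginal_b, rho):
    """Density of q_o given q_b: copula density times the marginal density of q_o."""
    u = marginal_o.cdf(q_o)
    v = marginal_b.cdf(q_b)
    return gaussian_copula_density(u, v, rho) * marginal_o.pdf(q_o)

# Hypothetical marginals for the target flow and the flow at the forecast start.
marg_o = NormalDist(100.0, 10.0)
marg_b = NormalDist(100.0, 10.0)
prior_at_106 = conditional_density(106.0, 110.0, marg_o, marg_b, rho=0.6)
```

With normal marginals this reproduces the familiar conditional of a bivariate normal distribution; swapping in a skewed marginal or a different copula family changes only the two helper inputs.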
The likelihood density function in Eq. (3) can be derived as follows:
The posterior density function in Eq. (3) can be derived as follows:
Copulabased HUPBMA method
Applying CHUP to the ith ensemble member, the posterior probability distribution function $p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i},{Q}_{\mathrm{b}})$ of Q_{o} based on Q_{f,i} and Q_{b} can be obtained. Coupling $p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i},{Q}_{\mathrm{b}})$ with the BMA framework, the copula-based HUPBMA (CHUPBMA) method can be constructed and Eq. (2) becomes
$$p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},1},\dots ,{Q}_{\mathrm{f},k},{Q}_{\mathrm{b}})=\sum _{i=1}^{k}{w}_{i}\cdot p({Q}_{\mathrm{o}}\mid {Q}_{\mathrm{f},i},{Q}_{\mathrm{b}}) \qquad (15)$$
The forecast uncertainty is quantified by the forecast interval with a 90 % confidence level. Before constructing the copula, the marginal distribution and the copula type must be selected. This study selects the marginal distribution from five common distribution functions, namely Pearson type III (PIII), gamma, normal, lognormal, and Weibull, and the copula from five common copula functions, namely the Gumbel–Hougaard, Frank, Clayton, Student t (Student), and Gaussian copulas, in both cases according to the root mean square error (RMSE) minimization criterion. For the definition and mathematical expressions of copula functions, refer to Liu et al. (2018) and Chen and Guo (2019).
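A 90 % forecast interval can be read off a weighted mixture by inverting its cumulative distribution function at the 5 % and 95 % levels. The sketch below assumes Gaussian components as stand-ins for the member posteriors and uses plain bisection; the names `mixture_cdf`, `mixture_quantile`, and `forecast_interval` are hypothetical.

```python
from statistics import NormalDist

def mixture_cdf(q, components, weights):
    """Cumulative distribution function of the weighted mixture at q."""
    return sum(w * c.cdf(q) for c, w in zip(components, weights))

def mixture_quantile(p, components, weights, lo, hi, tol=1e-6):
    """Bisection search for the p-quantile inside the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, components, weights) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def forecast_interval(components, weights, lo, hi, level=0.90):
    """Central interval whose two tails each hold (1 - level) / 2 probability."""
    tail = (1.0 - level) / 2.0
    lower = mixture_quantile(tail, components, weights, lo, hi)
    upper = mixture_quantile(1.0 - tail, components, weights, lo, hi)
    return lower, upper

# Hypothetical two-member mixture (flows in m^3/s).
members = [NormalDist(20000.0, 1500.0), NormalDist(22000.0, 2000.0)]
q_l, q_u = forecast_interval(members, [0.6, 0.4], lo=0.0, hi=60000.0)
```

The bracket `[lo, hi]` must contain both quantiles; for flow data a lower bound of zero and a generous upper bound are usually sufficient.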
Darbandsari and Coulibaly (2021) demonstrated that the HUPBMA method could improve the probabilistic forecasting performance of the HUP and BMA methods in the short forecast horizons. Therefore, this paper focuses on analyzing and comparing the performance of the HUPBMA and CHUPBMA methods. The HUPBMA and CHUPBMA methods only calibrate the ensemble members' weights through the expectation–maximization (EM) algorithm (Darbandsari and Coulibaly, 2021). Meanwhile, since the forecast accuracy of ensemble members may change with time due to seasonality and other factors (Zhong et al., 2020), the sliding window approach is used to update the weighting parameters. Parrish et al. (2012) and Darbandsari and Coulibaly (2020) have shown that the BMA method with the sliding window can obtain better probabilistic forecast performance compared to the method without the sliding window.
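The weight-only calibration can be sketched as a standard EM iteration for mixture weights with fixed component densities, combined with a sliding window over the most recent forecasts. This is a minimal sketch under those assumptions rather than the exact procedure of the cited studies; the function name `em_weights`, the iteration count, and the density values are hypothetical.

```python
def em_weights(member_densities, n_iter=200, window=None):
    """Estimate mixture weights by EM while keeping member densities fixed.

    member_densities[t][i] is the density of observation t under member i.
    If window is given, only the last `window` rows are used.
    Assumption: every row has at least one strictly positive density.
    """
    rows = member_densities[-window:] if window else member_densities
    k = len(rows[0])
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp_sum = [0.0] * k
        for row in rows:
            denom = sum(w * p for w, p in zip(weights, row))
            for i in range(k):
                # E-step: responsibility of member i for this observation
                resp_sum[i] += weights[i] * row[i] / denom
        # M-step: new weight is the average responsibility
        weights = [s / len(rows) for s in resp_sum]
    return weights

# Hypothetical history: member 1 fits early observations, member 0 fits recent ones.
history = [[0.1, 0.9]] * 5 + [[0.9, 0.1]] * 5
recent_weights = em_weights(history, window=5)
```

Restricting the fit to the last five rows shifts almost all weight to member 0, which illustrates why a sliding window lets the weights follow changes in member skill.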
3.2 Ensemble forecasting scheme
An ensemble forecast scheme containing multisource uncertainties in the model input, the model structure, and the parameter is constructed using a multimember approach consisting of two forecasted precipitations, two models, and two objective functions used to calibrate parameters, as shown in Fig. 2.
3.2.1 Model input uncertainty
There are five flow discharge inputs in our case study: the mainstream (Jinsha River) and four large tributaries (Min, Tuo, Jialing, and Wu rivers). The flow discharges are observed at the Pingshan, Gaochang, Fushun, Beibei, and Wulong hydrological control stations, respectively. Since these observed (or forecasted) flows are regulated by their respective upstream cascade reservoirs, these flow data inputs are more accurate than the rainfall inputs. This study collected the forecasted precipitation data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and HBYRWRC in the three subbasins. Since the rainfall data are more diverse and have relatively large uncertainty, the forecast rainfall input is used to explore the impact of forecast rainfall uncertainty on the TGR inflow forecasts. The TGR is a river-type reservoir, so building a river confluence model for flood forecasting is necessary. The observed and forecasted precipitations are converted into effective precipitation in the three subbasin areas, which accounts for losses from plant interception, infiltration, evaporation, etc. The rainfall–runoff relationship (Fedora and Beschta, 1989) is commonly used in the Yangtze River basin to calculate the effective precipitation. The antecedent precipitation index, which is the key variable of the method and represents the soil moisture content, can be calculated by the following equation (Zhong et al., 2018b):
where P_{a,t} denotes the antecedent precipitation index on the tth day, P_{t} is the daily precipitation, I_{m} is the water storage capacity of the basin, and k denotes evaporation reduction index.
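A daily antecedent precipitation index can be sketched as below. The sketch assumes the commonly used recursion P_{a,t+1} = k · (P_{a,t} + P_{t}) capped at I_{m}, which may differ in detail from the formulation used in this study; the function name `antecedent_precipitation_index` and the numbers are hypothetical and are not the values of Table 2.

```python
def antecedent_precipitation_index(pa_start, daily_precip, k, i_m):
    """Return the index for each following day, starting from pa_start.

    Assumed recursion: pa_next = k * (pa + p_t), capped at the storage capacity i_m.
    """
    pa = pa_start
    series = []
    for p_t in daily_precip:
        pa = min(k * (pa + p_t), i_m)
        series.append(pa)
    return series

# Hypothetical values in mm: a wet day followed by a dry day.
pa_series = antecedent_precipitation_index(10.0, [5.0, 0.0], k=0.9, i_m=100.0)
```

The cap at `i_m` keeps the index from exceeding the basin storage capacity during long wet spells, and the factor `k` makes it decay on dry days.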
The values of k and I_{m} for these three subbasins are listed in Table 2, which are obtained from the HBYRWRC. Since the rainfall–runoff relationship graph method has been widely used for runoff generation calculation in the Yangtze River basin, the rainfall–runoff relationship between the Xiangjiaba and Three Gorges dam sites uncontrolled interval basin is established and plotted in Fig. 3 and is used to calculate the effective precipitation based on the antecedent precipitation index (P_{a}) and observed (or forecasted) precipitation for these three subbasins.
After obtaining the daily antecedent precipitation index at 08:00 LT, the antecedent precipitation index for the 6 h timescale is calculated as follows:
where ${P}_{\mathrm{a},t,m}$ denotes the antecedent precipitation index at m:00 LT on the tth day. $\sum {P}_{t,n}$ denotes the cumulative observed precipitation from 08:00 to m:00 LT on the tth day. h denotes the time gap from 08:00 to m:00 LT on the tth day.
3.2.2 Model structure uncertainty
TGR inflow forecasting is influenced by the upstream mainstream and tributary reservoir scheduling decisions, the rainfall intensity and distribution in the interval basin, and the changes in the subsurface characteristics, which makes it challenging to establish complex, physically based hydrological models (Yang et al., 2019; Cho and Kim, 2022; Hauswirth et al., 2023). The simulation or forecast accuracy in this interval basin needs to be improved to meet operational needs. Therefore, two advanced data-driven models for multistep-ahead flood process forecasting, namely the long short-term memory model based on a recursive encoder–decoder structure (LSTMRED) and the coupled dual attention LSTMRED (DALSTMRED) model, are used for confluence calculations as a way to consider the uncertainty in the model structure. Since the forecast data series at the outlets of tributaries are inconsistent, the observed flow at the outlets of five large tributaries is used to train and validate the proposed models.
Long short-term memory model based on encoder–decoder structure
The structure of the LSTM neural network includes the forget gate, the input gate, the update of the memory cell state, and the output gate (Hochreiter and Schmidhuber, 1997). The forget gate selects the relatively important information in the previous memory cell. The input gate selects useful information from the input variables at the current moment. The memory cell state stores relatively important information extracted from historical moments and is updated under the control of the forget gate and the input gate. The output gate selects and outputs useful information from the memory cell state. More detailed procedures of the LSTM neural network formulation have been described by Kratzert et al. (2018).
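The gate logic described above can be made concrete with a scalar sketch of one LSTM step, following the standard formulation with one input and one hidden unit. The function names `sigmoid` and `lstm_step`, the parameter layout, and the zero-valued parameters are hypothetical; a practical model would use vector-valued states and a deep learning library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps a gate name ('f', 'i', 'g', 'o') to (w_x, w_h, bias).
    """
    def pre(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much new information to admit
    g = math.tanh(pre("g"))  # candidate information from the current input
    o = sigmoid(pre("o"))    # output gate: how much of the cell state to expose
    c = f * c_prev + i * g   # updated memory cell state
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c

# With all parameters at zero, every gate equals 0.5 and the candidate is 0.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "g", "o")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

In this degenerate case the cell simply halves its previous state, which makes the role of the forget gate easy to verify by hand.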
This study nests an LSTM neural network into a recursive encoder–decoder (RED) structure to build an LSTMRED model for forecasting flood processes. The RED structure is similar to that of Kao et al. (2020). The description of the LSTM neural network can be found in Cui et al. (2022). The encoding process of the RED structure is used to extract the critical information (C_{t}) of the input (Xiang et al., 2020). In the decoding process, the network at each step receives three inputs: C_{t}, the hidden-layer output of the previous step, and forecast information of the same categories as the encoder inputs.
LSTMRED neural network coupled dual attention mechanism
The LSTMRED model based on the dual attention mechanism (DALSTMRED) is established by adding the feature–temporal dual attention mechanism to the LSTMRED model, which can enable the model to highlight effective information in different types and moments of the input. The DA mechanism (Fig. 4) consists of the feature attention module, the temporal attention module, and the forecast input processing module.
The feature attention module can adaptively highlight the critical input types by assigning feature weights to the input of the encoding process (Qin et al., 2017). The temporal attention module can highlight the information (hidden layer states) extracted at a critical time step by assigning temporal weights to the information extracted at all time steps in the encoding process (Ding et al., 2020). Meanwhile, the feature weights are averaged based on temporal weights and applied to the forecast information inputted in the decoding process, thus highlighting the key forecast input variables. The principle of the DALSTMRED model can be found in Cui et al. (2023).
Model input and hyperparameter selection
In this study, the input types for the encoding process include effective precipitation in the three subbasins, flow discharge in the mainstream and tributaries (i.e., five hydrological stations in Table 1), and previously observed inflow to the TGR for a total of nine data types. In order to make the model learn comprehensive information, input variables with the last 11 time steps (66 h) are inputted to the encoding process according to the flow propagation times from the hydrological stations to the TGR dam site in Table 1.
The forecasted effective precipitation, the forecasted flow of the mainstream and tributaries, and the forecasted flow for the previous forecast horizon are used as inputs to the decoding process. Among them, the forecasted effective precipitation is calculated by the observed precipitation during the training period and by the forecast precipitation during the validation period. The forecasted flow of the upstream mainstream and tributaries is replaced by the observed flow during the training and validation periods. The TGR's observed inflow for the 6–168 h forecast horizons is the target output, which is needed for practical forecasting.
The input and output data are normalized. Moreover, the trial-and-error method is used to tune the network hyperparameters. The model is trained by the Adam method (Kingma and Ba, 2014).
3.2.3 Model parameter uncertainty
Different parameter optimization objective functions may focus on different forecast results (Zhong et al., 2020). For example, the mean absolute error function focuses on the magnitude of the error mean. The mean square error function is usually sensitive to outliers with large errors, which may make the model parameters with different objective functions produce forecast results with different focus points (Duan et al., 2007). Therefore, it is necessary to consider the uncertainty of the model parameters. Neural network models usually train model parameters (such as model internal weights and bias values) based on loss functions, so this paper uses two common loss functions, namely the mean absolute error and the mean square error, to train the model as a way of considering the uncertainty of model parameters.
3.3 Evaluation metrics
3.3.1 Deterministic forecast evaluation metrics
The accuracy of deterministic forecast is evaluated by three metrics: the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe, 1970), the mean absolute error (MAE), and the relative error in total runoff (RE).
where N is the sample number and $\overline{{Q}_{\mathrm{o}}}$ and $\overline{{Q}_{\mathrm{f}}}$ are the average of the observed and forecasted flow, respectively.
The Nash–Sutcliffe efficiency (NSE) is one of the most important metrics in flood forecasting, reflecting the degree of fit between forecasted and observed flows (Nash and Sutcliffe, 1970). Since the accurate runoff volume predictions are more important than peak discharge for the operation of a large reservoir (Cui et al., 2023), the relative error for total runoff volume (RE) is also chosen. The mean absolute error (MAE) can reflect the forecast error for each moment and is compared with the continuous ranked probability score (CRPS) of the ensemble forecast (Raftery et al., 2005), which can reflect the effectiveness of the ensemble forecast correction.
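As a minimal sketch, the three deterministic metrics can be computed as follows. The NSE and MAE follow their standard definitions; the RE is assumed here to be the relative difference between total forecasted and total observed flow, expressed in percent (the sign convention is an assumption). The flow values are hypothetical.

```python
def nse(obs, fc):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - f) ** 2 for o, f in zip(obs, fc))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def mae(obs, fc):
    """Mean absolute error, in the units of the flow series."""
    return sum(abs(f - o) for o, f in zip(obs, fc)) / len(obs)


def re_percent(obs, fc):
    """Relative error in total runoff (assumed form), in percent."""
    return (sum(fc) - sum(obs)) / sum(obs) * 100.0


# Hypothetical observed and forecasted flows (m^3 s^-1).
obs = [10000.0, 20000.0, 30000.0, 20000.0]
fc = [11000.0, 19000.0, 29000.0, 21000.0]

print(nse(obs, fc), mae(obs, fc), re_percent(obs, fc))
```

In this example the errors cancel in the total volume, so RE is zero even though MAE is not, which shows why the two metrics are reported together.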
3.3.2 Probabilistic forecast evaluation metrics
Forecast interval evaluation metrics
The forecast interval is evaluated by three metrics: the average coverage rate (CR), the average interval width (IW), and the percentage of observations bracketed by the unit confidence interval (PUCI; Li et al., 2011).
where n_{c} denotes the number of values of Q_{o} located in the forecast interval. Q_{u} and Q_{l} are the upper and lower boundaries of the forecast interval, respectively, with a 90 % confidence level.
The average coverage rate (CR) is one of the most necessary metrics for evaluating the reliability of forecast intervals (Li et al., 2021). The average interval width (IW) is the metric that directly reflects the level of forecast uncertainty, which is an important metric for evaluating the effectiveness of the proposed methods. The percentage of observations bracketed by the unit confidence interval (PUCI) is a comprehensive metric for evaluating the performance of forecast intervals in quantifying uncertainty (Xiong et al., 2009). Therefore, the CR, IW, and PUCI metrics are selected to evaluate the performance of the forecast intervals.
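The interval metrics can be sketched as below. CR and IW follow directly from the definitions of n_{c}, Q_{u}, and Q_{l} given above. The PUCI form used here, (1 − |CR − 0.90|) divided by the mean relative interval width, is an assumption modeled on Li et al. (2011) and may differ in detail from the formula used in this study. All data are hypothetical.

```python
def coverage_rate(obs, lower, upper):
    """CR: share of observations that fall inside the forecast interval."""
    n_c = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return n_c / len(obs)


def interval_width(lower, upper):
    """IW: average distance between the upper and lower bounds."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)


def puci(obs, lower, upper, level=0.90):
    """PUCI in an assumed form: (1 - |CR - level|) / mean relative width."""
    cr = coverage_rate(obs, lower, upper)
    relative_width = sum(
        (up - lo) / o for o, lo, up in zip(obs, lower, upper)
    ) / len(obs)
    return (1.0 - abs(cr - level)) / relative_width


# Hypothetical observations and 90 % interval bounds (m^3 s^-1).
obs = [100.0, 200.0, 300.0, 400.0]
lower = [90.0, 180.0, 310.0, 360.0]
upper = [110.0, 220.0, 370.0, 440.0]

print(coverage_rate(obs, lower, upper))
print(interval_width(lower, upper))
print(puci(obs, lower, upper))
```

Under this form, PUCI rewards intervals that are narrow relative to the flow while penalizing coverage that departs from the nominal level in either direction.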
Probabilistic forecast evaluation metrics
The probabilistic forecast is evaluated by three metrics: the α_index (Renard et al., 2010), the ignorance score (IGS) (Gneiting et al., 2005), and the continuous ranked probability score (CRPS) (Raftery et al., 2005).
where q_{e,i} and q_{th,i} denote the observed and theoretical p values of Q_{o,i}, respectively. The p value denotes the posterior probability distribution value of Q_{o,i} (Renard et al., 2010). I(⋅) denotes the indicator function, and r denotes the flow variable.
The α_index metric can quantitatively assess the reliability of ensemble probabilistic forecasts from the perspective of distribution function values (Renard et al., 2010). The closer the α_index value is to 1, the more reliable the probabilistic forecast. The IGS and CRPS metrics can reflect the reliability and sharpness of the probabilistic forecast. The former quantifies the forecast probability density at the observation, while the latter indicates how well the posterior probabilistic distribution fits the actual probabilistic distribution of Q_{o} (Raftery et al., 2005). Both CRPS and IGS are negatively oriented scores; i.e., the smaller the value, the better. The IGS imposes severe penalties for particularly poor probabilistic predictions and may be extremely sensitive to outliers and extreme events, so it lacks robustness (Raftery et al., 2005).
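A minimal sketch of the three probabilistic scores is given below, under stated assumptions: the α_index uses the form 1 − 2·mean|q_{e,i} − q_{th,i}| with theoretical values i/(N+1) (the plotting position is an assumption), the IGS is the mean negative natural logarithm of the forecast density at the observation, and the CRPS is evaluated for an empirical distribution built from ensemble samples rather than from the analytical posterior of the study.

```python
import math


def alpha_index(pit_values):
    """Reliability index: 1 minus twice the mean gap to uniform quantiles."""
    n = len(pit_values)
    empirical = sorted(pit_values)
    theoretical = [(i + 1) / (n + 1) for i in range(n)]  # assumed positions
    gap = sum(abs(e - t) for e, t in zip(empirical, theoretical)) / n
    return 1.0 - 2.0 * gap


def ignorance_score(densities):
    """Mean negative log of the forecast density at each observation."""
    return -sum(math.log(d) for d in densities) / len(densities)


def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of the members for one observation."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    internal_spread = sum(abs(x - y) for x in members for y in members)
    return spread_to_obs - internal_spread / (2.0 * m * m)


print(alpha_index([0.75, 0.25, 0.5]))   # evenly spread p values
print(alpha_index([0.5, 0.5, 0.5]))     # p values clustered at the center
print(crps_ensemble([2.0, 4.0], 3.0))   # two-member ensemble
print(crps_ensemble([5.0], 3.0))        # single member: absolute error
```

For a single-member forecast the CRPS reduces to the absolute error, which is what makes the comparison between CRPS and MAE meaningful.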
4.1 Deterministic forecast results of ensemble members
Since the study focuses on the differences in ensemble forecast performance between the HUPBMA and CHUPBMA methods, the overall forecast accuracy of members is analyzed (Fig. 5) and the differences in forecast accuracy between members are not explicitly analyzed. As shown in Fig. 5, using the observed values as input during the training period, high forecast accuracy can be acquired in different forecast horizons, with the NSE values exceeding 0.95, the MAE values being below 1400 m^{3} s^{−1}, and the absolute value of RE staying within 4 %.
After combining the forecasted precipitation during the validation period, the NSE values show a decreasing trend and the MAE and RE values show an increasing trend with the increase in the forecast horizon. Taking the NSE metrics of the 1–7 d forecast horizons as an example (Table 3), the average value of the NSE metric decreases from 0.97 to 0.89, which indicates that the forecast accuracy gradually decreases. Meanwhile, the range of evaluation metrics gradually increases with the increase in the forecast horizon. It can be seen from Table 3 that the difference between the maximum and minimum values of NSE indicators for the 1 d forecast horizon is only 0.01. In contrast, the difference for the 7 d forecast horizon is as high as 0.05, which indicates that the difference in forecast accuracy of members is also more significant and that the forecast uncertainty gradually increases. Overall, the NSE values of the forecast members in the 6–168 h forecast horizons are higher than 0.88, and the absolute values of the RE metrics are within 7 %. Hence, the forecast accuracy of members is relatively high and the forecast error is low, which can be used for flood ensemble forecasting.
4.2 Ensemble forecast results
4.2.1 Marginal distribution and copula function selection
It is necessary to first fit the marginal distributions of the observed flow and the forecasted flow for the 6–168 h forecast horizons. Q_{o} and Q_{b} obey the same distribution. The RMSE criterion is used to select the marginal distribution type: in each forecast horizon, the RMSE values of the eight members are averaged to identify the marginal distribution best suited to the forecasted flow. Meanwhile, according to Eq. (14), the three-dimensional joint distribution of Q_{o}, Q_{b}, and Q_{f} needs to be constructed. The RMSE criterion is also used to select the copula function, and the RMSE values of the eight members are again averaged in each forecast horizon.
Figure 6a and b show the RMSE values generated by fitting the marginal distribution and copula function, respectively. It can be seen from Fig. 6a that the lognormal distribution has the lowest RMSE value among the five alternative marginal distributions and is chosen as the marginal distribution type of the series. As shown in Fig. 6b, the Student copula has the lowest RMSE value in the 6–168 h forecast horizons and is chosen to construct the three-dimensional joint distribution function of Q_{o}, Q_{b}, and Q_{f}.
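The RMSE-based selection of a marginal distribution can be sketched as follows. This is an illustration only: it compares just two candidates (normal and lognormal) rather than the five considered in the study, fits them by the method of moments, uses Weibull plotting positions for the empirical distribution, and runs on a synthetic right-skewed sample. All of these choices are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def empirical_cdf(n):
    """Weibull plotting positions for a sorted sample of size n."""
    return [(rank + 1) / (n + 1) for rank in range(n)]


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_rmse_normal(sample):
    """RMSE between empirical and fitted normal distribution function values."""
    xs = sorted(sample)
    dist = NormalDist(mean(xs), stdev(xs))
    return rmse(empirical_cdf(len(xs)), [dist.cdf(x) for x in xs])


def fit_rmse_lognormal(sample):
    """Same criterion for a lognormal fit (normal fit to the log flows)."""
    xs = sorted(sample)
    logs = [math.log(x) for x in xs]
    dist = NormalDist(mean(logs), stdev(logs))
    return rmse(empirical_cdf(len(xs)), [dist.cdf(v) for v in logs])


# Synthetic right-skewed "flows": exact quantiles of a lognormal distribution.
base = NormalDist(9.5, 0.6)
flows = [math.exp(base.inv_cdf((i + 1) / 41)) for i in range(40)]

scores = {
    "normal": fit_rmse_normal(flows),
    "lognormal": fit_rmse_lognormal(flows),
}
best = min(scores, key=scores.get)
print(scores, best)
```

In the study this criterion is additionally averaged over the eight ensemble members in each forecast horizon before the lowest value is selected.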
4.2.2 Sliding window length selection
Since there is no specific method or rule to calculate the sliding window length, this study adopts the CRPS metric as the objective function and selects the sliding window length by trial and error. The range of window lengths is [40, 200].
To facilitate the selection of the sliding window length, Fig. 7 shows the average CRPS values of the HUPBMA and CHUPBMA methods for all forecast horizons with different window lengths. It can be seen from Fig. 7 that both the HUPBMA and CHUPBMA methods have the lowest CRPS values at a sliding window length of 80. Therefore, 80 is the optimal window length for the ensemble forecasting study.
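The trial-and-error search can be sketched as a loop over candidate window lengths that keeps the one with the lowest mean CRPS. In the sketch below, a simple Gaussian forecast built from the mean and standard deviation of the preceding window stands in for the HUPBMA/CHUPBMA predictive distribution. The closed-form Gaussian CRPS, the step of 20 between candidates, and the synthetic series are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * STD_NORMAL.cdf(z) - 1.0)
        + 2.0 * STD_NORMAL.pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


def mean_crps_for_window(series, window, start):
    """Average CRPS when each value is forecast from the preceding window."""
    scores = []
    for t in range(start, len(series)):
        history = series[t - window:t]
        scores.append(crps_gaussian(mean(history), stdev(history), series[t]))
    return sum(scores) / len(scores)


def select_window_length(series, candidates):
    """Score every candidate on the same time steps and keep the best one."""
    start = max(candidates)
    scores = {w: mean_crps_for_window(series, w, start) for w in candidates}
    return min(scores, key=scores.get), scores


# Synthetic flow-like series (m^3 s^-1); not data from the study.
series = [
    20000.0 + 8000.0 * math.sin(2.0 * math.pi * t / 120.0) + 15.0 * t
    for t in range(420)
]
candidates = list(range(40, 201, 20))
best_window, window_scores = select_window_length(series, candidates)
print(best_window, round(window_scores[best_window], 1))
```

Scoring all candidates on the same evaluation period keeps the comparison fair, since shorter windows could otherwise be evaluated on more time steps than longer ones.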
4.2.3 Deterministic forecast results of ensemble forecast
The HUPBMA and CHUPBMA methods use the expected values of the ensemble forecasts as deterministic forecast results. In order to analyze the deterministic forecast performance of the ensemble forecasts, the member with the best forecast accuracy is selected for comparative analysis based on the criteria of relatively low RE and MAE values and relatively high NSE values. This member combines the forecast rainfall from ECMWF, the DALSTMRED model, and the mean square error objective function used to optimize the parameters.
Figure 8a–c show the NSE, MAE, and RE metrics of three deterministic forecast results, respectively. It can be seen that the NSE metrics show a decreasing trend and that the MAE metrics show an increasing trend as the forecast horizon increases, indicating a gradual decrease in forecast accuracy.
As shown in Fig. 8a, the NSE metrics of the three forecast results are at least 0.92 during the 6–168 h forecast horizons, and the differences among them are small, not more than 0.02. Among them, the CHUPBMA method has the best NSE metrics, although its advantage gradually decreases as the forecast horizon increases. The NSE metrics of the HUPBMA method are better than those of the selected forecast member in most forecast horizons. From Fig. 8b, the maximum and mean values of MAE are 1923 and 1513 m^{3} s^{−1} for the CHUPBMA method, 1999 and 1582 m^{3} s^{−1} for the HUPBMA method, and 2179 and 1719 m^{3} s^{−1} for the selected forecast member, respectively. The CHUPBMA method has the best MAE metric, with maximum and average reductions of 10.69 % and 4.36 % relative to the HUPBMA method, respectively. Meanwhile, the MAE values of the two ensemble forecasting methods are lower than those of the selected forecast member. As shown in Fig. 8c, the maximum and mean of the RE metric are 0.02 % and −0.27 % for the CHUPBMA method, 2.97 % and 1.36 % for the HUPBMA method, and 1.20 % and 0.34 % for the selected forecast member, respectively. The CHUPBMA method can reduce the RE metrics of the selected forecast member in most forecast horizons, while the HUPBMA method has no advantage in the RE metric. Overall, the ensemble forecast methods can somewhat improve on the forecast accuracy of the selected best member. The expectation forecast of the CHUPBMA method has the best accuracy, which indicates that the copula-based CHUPBMA method can improve the performance of the HUPBMA method in correcting errors.
To further analyze the accuracy of the ensemble forecast methods, seven floods with peaks exceeding 50 000 m^{3} s^{−1} during the 24 and 168 h forecast horizons in the validation period (2017–2021) are selected for analysis. The average relative error metric of peak (PRE) (Cui et al., 2022) is added to analyze the forecasting performance for flood peaks. Table 4 lists the forecast evaluation metrics for the seven flood events. With the increase in the forecast horizon, the NSE metric shows a decreasing trend and the RE and MAE metrics show an increasing trend, indicating a gradual decrease in forecasting performance. It can be seen from Table 4 that (1) in the 24 h forecast horizon, the forecast accuracy of the two methods is similar for most flood events and quality metrics, and (2) in the 168 h forecast horizon, the forecast accuracy of the CHUPBMA method is better than that of the HUPBMA method in most flood events and quality metrics. The average values of NSE, RE, MAE, and PRE are 0.88, −0.63 %, 2980 m^{3} s^{−1}, and −4.55 % for CHUPBMA and 0.84, −2.38 %, 3188 m^{3} s^{−1}, and −6.46 % for HUPBMA, respectively, indicating an overall improvement of CHUPBMA over HUPBMA in forecasting accuracy.
To further demonstrate the accuracy of flood process forecasting and the applicability of the two methods, four relatively large flood events are selected for comparative analysis for the 168 h forecast horizon (Fig. 9). In the 20180703 flood event (Fig. 9a), the two methods have similar forecast performance, underestimating the peak and the rising limb and overestimating the recession. The CHUPBMA method has relatively low PRE and total runoff error values, while the HUPBMA method accurately forecasts the time of peak occurrence. In the 20200815 flood event (Fig. 9b), both methods underestimate the flood peak and overestimate the recession. The HUPBMA method has a larger flood peak error, and the CHUPBMA method has a better fitting performance. In the 20200820 flood event (Fig. 9c), both methods overestimate the observed flood process, with the CHUPBMA method having lower peak and total runoff errors than the HUPBMA method. In the 20210907 flood event (Fig. 9d), the CHUPBMA and HUPBMA methods underestimate the flood peak and forecast the peak later than observed. The former has smaller peak and water volume errors.
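The relative reductions quoted above can be checked from the reported mean MAE values. The snippet below is only an arithmetic illustration; the function name is hypothetical. Note that the maximum reduction of 10.69 % refers to an individual forecast horizon and cannot be recovered from the maximum values alone.

```python
def percent_reduction(reference, improved):
    """Relative reduction of `improved` with respect to `reference`, in percent."""
    return (reference - improved) / reference * 100.0


# Mean MAE values (m^3 s^-1) as reported in the text for Fig. 8b.
mean_mae_chupbma = 1513.0
mean_mae_hupbma = 1582.0
mean_mae_member = 1719.0

vs_hupbma = percent_reduction(mean_mae_hupbma, mean_mae_chupbma)
vs_member = percent_reduction(mean_mae_member, mean_mae_chupbma)

print(f"CHUPBMA vs HUPBMA: {vs_hupbma:.2f} %")
print(f"CHUPBMA vs selected member: {vs_member:.2f} %")
```

The first value reproduces the 4.36 % average reduction stated in the text. In general, a reduction computed from mean values need not equal the mean of the per-horizon reductions.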
4.2.4 Probabilistic forecast results of ensemble forecast
Evaluation of forecast interval
Figure 10a–c show the CR, IW, and PUCI metrics for the forecast interval, respectively, with a 90 % confidence level. Figure 10a shows that during the 6–168 h forecasting period, the maximum, minimum, and mean of the CR metric for the forecast interval of the CHUPBMA method are 0.92, 0.88, and 0.89 and 0.93, 0.88, and 0.91 for the HUPBMA method, respectively. The CR values of the two methods' forecast intervals are close to or exceed the 90 % confidence level, indicating that the forecast intervals are reliable.
It is obvious from Fig. 10b that the forecast interval width tends to increase with the increase in the forecast horizon, indicating that the forecast uncertainty gradually increases. The maximum, minimum, and mean of the IW metrics for the forecast interval of the CHUPBMA method are 7820, 3337, and 6257 m^{3} s^{−1} and 8888, 4662, and 7345 m^{3} s^{−1} for the HUPBMA method, respectively. The forecast intervals of the CHUPBMA method are significantly narrower than those of the HUPBMA method, with the maximum and average reduction of 28.42 % and 15.32 %, respectively, which indicates that the CHUPBMA method can effectively reduce the interval width and forecast uncertainty.
From Fig. 10c, the maximum, minimum, and mean of the PUCI metric for the forecast interval of the CHUPBMA method are 6.24, 2.65, and 3.48 and 4.55, 2.35, and 2.95 for the HUPBMA method, respectively. The CHUPBMA method has the higher PUCI values, indicating that the forecast interval of the CHUPBMA method reflects the forecast uncertainty relatively well.
In summary, the CHUPBMA outperforms the HUPBMA method under the premise that the CR values are close to or exceed the 90 % confidence level. The CHUPBMA method has narrower forecast intervals and better performance in quantifying forecast uncertainty. Although the HUPBMA method has a higher CR value, its IW value is larger and the PUCI value is smaller for the long forecast horizon, indicating that the forecast interval is too conservative to reasonably estimate the uncertainty range.
In order to visually analyze the ability of the CHUPBMA method to quantify forecast uncertainty, the forecast intervals with a 90 % confidence level of the HUPBMA and CHUPBMA methods for 168 h forecast horizon in the 2020 flood season are compared. It can be seen from Fig. 11 that the forecast intervals of the two ensemble forecasts can cover most of the observed flows and always cover the annual maximum flood peak, indicating that the forecast intervals are reliable. Meanwhile, the forecast intervals of the CHUPBMA method are remarkably narrower than those of the HUPBMA method, indicating that the forecast uncertainty of the former is relatively low, which can provide more reasonable risk information for TGR flood control decisions.
Evaluation of overall probabilistic forecast
Figure 12 shows the probability integral transform (PIT) histograms of the HUPBMA and CHUPBMA methods for the 24, 96, and 168 h forecast horizons. It can be clearly observed that the PIT plots of the HUPBMA method show an upside-down U-shaped distribution, which indicates that the forecast distribution is overdispersed and overestimates the forecast uncertainty, explaining the wide forecast intervals. Meanwhile, the PIT plot of the CHUPBMA method is more uniformly distributed than that of the HUPBMA method, indicating better calibration.
Meanwhile, Fig. 13a–c show the α_index, IGS, and CRPS metrics for the two ensemble probabilistic forecasts, respectively. It can be seen from Fig. 13a that the α_index metrics of the CHUPBMA method-based probabilistic forecasts are significantly higher than those of the HUPBMA method in the 6–168 h forecast horizons. Among them, the maximum, minimum, and mean of the α_index metric for the CHUPBMA method-based probabilistic forecasts are 0.98, 0.93, and 0.97 and 0.95, 0.88, and 0.93 for the HUPBMA method, respectively. The α_index metric of the CHUPBMA method-based probabilistic forecast is closer to the perfect value of 1, indicating that its probability forecast is the more reliable one.
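How a PIT histogram is assembled can be sketched as follows. The sketch assumes that the predictive distribution is represented by ensemble samples (the study uses an analytical posterior), and the number of bins is an arbitrary choice.

```python
def pit_value(members, obs):
    """PIT value: share of forecast samples that do not exceed the observation."""
    return sum(1 for x in members if x <= obs) / len(members)


def pit_histogram(pit_values, bins=10):
    """Count PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pit_values:
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes into the last bin
        counts[index] += 1
    return counts


# One PIT value from a hypothetical four-member forecast.
print(pit_value([1.0, 2.0, 3.0, 4.0], 2.5))

# Histogram of a handful of hypothetical PIT values.
print(pit_histogram([0.05, 0.5, 0.55, 0.95, 1.0], bins=10))
```

A flat histogram indicates a calibrated forecast. Counts piled up in the central bins (the upside-down U shape) mean that observations fall near the middle of the predictive distribution too often, i.e., the forecast is overdispersed.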
It can be seen from Fig. 13b that the IGS values of the two methods gradually increase with the increase in the forecast horizon, indicating that the forecast uncertainty gradually increases. The maximum, minimum, and mean of the IGS metric for the CHUPBMA method are 9.10, 8.33, and 8.87 and 9.16, 8.59, and 8.98 for the HUPBMA method, respectively. It can be seen that the IGS metrics of the CHUPBMA method are consistently lower than those of the HUPBMA method, which indicates that the CHUPBMA method has better ensemble forecast performance relative to the HUPBMA method by assigning a higher probability density around the actual values.
As shown in Fig. 13c, the CRPS values of the two methods are lower than the MAE values of the selected member (Fig. 8b), indicating that the probabilistic forecasts are effective and can fit the probabilistic distribution of the target values well. Meanwhile, during the 6–168 h forecast horizons, the maximum, minimum, and mean of the CRPS metric for the CHUPBMA method are 1356, 625, and 1074 m^{3} s^{−1} and 1425, 662, and 1188 m^{3} s^{−1} for the HUPBMA method, respectively. The CRPS values of the CHUPBMA method are lower than those of the HUPBMA method, with maximum and average reductions of 17.86 % and 9.71 %, respectively. Hence, the CHUPBMA method can fit the posterior distribution of the actual values better and effectively improve the probabilistic forecast performance of the HUPBMA method.
From Table 5, it can be seen that the t statistic values at the 0.05 significance level for all three metrics are higher than the threshold value, indicating that there is a significant difference between the scores of the CHUPBMA and HUPBMA methods; i.e., the CHUPBMA method is significantly better than the HUPBMA method for ensemble forecasting metrics and performance.
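The kind of significance test summarized in Table 5 can be sketched as below. The text does not state which variant of the t test is used, so a paired test on per-horizon score differences is assumed here. The scores are invented for illustration, and the critical value 3.182 is the standard two-sided 0.05 threshold for 3 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the mean paired difference between two score series."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical scores of two methods over four forecast horizons.
method_a = [1.2, 1.5, 1.9, 2.4]
method_b = [1.0, 1.2, 1.5, 1.9]

t_value = paired_t_statistic(method_a, method_b)
critical_value = 3.182  # two-sided, alpha = 0.05, 3 degrees of freedom

print(f"t = {t_value:.3f}, significant: {abs(t_value) > critical_value}")
```

When the absolute t statistic exceeds the critical value, the difference between the two score series is considered significant at the chosen level.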
In summary, the CHUPBMA method considers the influence of the initial state on the ensemble forecast, bypasses the normal quantile transformation of the HUPBMA method, derives the posterior distribution of the target flow without restrictions, and improves the probabilistic forecast performance of the HUPBMA method. Therefore, the ensemble forecasting by CHUPBMA method can provide more reasonable and reliable risk information for the TGR.
In this study, we propose a novel CHUPBMA method, which can not only consider the influence of the initial state on the ensemble forecast but also avoid the normal distribution assumption of the HUPBMA method and derive the posterior distribution function more accurately. An ensemble forecast scheme that consists of two forecast precipitation inputs, two hydrological models, and two objective functions for parameter calibration was established. The ensemble forecasting performance of the HUPBMA and CHUPBMA methods was discussed from the perspective of deterministic and probabilistic forecasts. The flood ensemble forecasting experiment with 6–168 h forecast horizons was conducted in the Xiangjiaba–TGR dam site interval basin. The main conclusions are summarized as follows:

The two ensemble forecasting methods can improve the members' forecast accuracy. The proposed CHUPBMA method performs better than the HUPBMA method, and the MAE metric is reduced by a maximum of 10.69 % within 6–168 h forecast horizons.

The coverage rate of the forecast interval of the CHUPBMA method is close to or exceeds the specified 90 % confidence level, and the forecast interval is significantly narrower than that of the HUPBMA method, with a maximum reduction of 28.42 % during 6–168 h forecast horizons, which can effectively reduce the forecast uncertainty.

The probabilistic forecast of the CHUPBMA method has better reliability and sharpness, and its CRPS values are reduced by a maximum of 17.86 % relative to the HUPBMA method, which indicates that the CHUPBMA method can fit the posterior distribution of the actual values better.

The CHUPBMA method can derive the posterior distribution of the target flow without restriction under the condition that the initial constraint is considered, which brings the BMA method closer to perfection. Therefore, it is more suitable for flood forecasting in the 6–168 h forecast horizons and provides reliable risk information for reservoir scheduling decision-making.
The present study focuses on flood ensemble forecasting for the TGR 6–168 h forecast horizons. Future studies can explore the ensemble forecasting performance of the proposed CHUPBMA method for longer forecast horizons and further validate the effectiveness of the proposed method in global basins. Meanwhile, the vine copula, which facilitates multivariate joint distribution modeling, can be considered for constructing the CHUPBMA method and exploring its advantages and effectiveness in ensemble flood forecasting. The effective way or method of guiding reservoir scheduling based on ensemble forecasts can also be further explored so that ensemble forecasts can be widely used in decision-making.
We set the number of neural network layers and neurons to be the same for the encoding and decoding processes, with the number of hidden layers, the number of neurons, and the dropout rate chosen by trial and error. Meanwhile, the batch size, epoch, and learning rate are set to 100, 500, and 0.001, respectively. The different model parameters are shown in Table A1.
The code used to support the findings of this study is available from the corresponding author upon request.
The data generated and/or analyzed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.
ZC and SG conceived and designed the experiments; ZC performed the experiments and wrote the paper draft; ZC, SG, CYX, HC, DL, and YZ reviewed and edited the paper.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
This study was financially supported by the National Key Research and Development Program of China (grant nos. 2022YFC3202801 and 2021YFC3200305) and the China Three Gorges Corporation (grant no. 0799254).
This paper was edited by Lelys Bravo de Guenni and reviewed by two anonymous referees.
Baran, S., Hemri, S., and El Ayari, M.: Statistical postprocessing of water level forecasts using Bayesian model averaging with doubly-truncated normal components, Water Resour. Res., 55, 3997–4013, https://doi.org/10.1029/2018WR024028, 2019.
Biondi, D. and Todini, E.: Comparing hydrological postprocessors including ensemble predictions into full predictive probability distribution of streamflow, Water Resour. Res., 54, 9860–9882, https://doi.org/10.1029/2017WR022432, 2018.
Chen, L. and Guo, S.: Copulas and its application in hydrology and water resources, Springer Water, Springer Singapore, https://doi.org/10.1007/978-981-13-0574-0, 2019.
Cho, K. and Kim, Y.: Improving streamflow prediction in the WRF-Hydro model with LSTM networks, J. Hydrol., 605, 127297, https://doi.org/10.1016/j.jhydrol.2021.127297, 2022.
Cloke, H. L. and Pappenberger, F.: Ensemble flood forecasting: A review, J. Hydrol., 375, 613–626, https://doi.org/10.1016/j.jhydrol.2009.06.005, 2009.
Cui, Z., Zhou, Y., Guo, S., Wang, J., and Xu, C. Y.: Effective improvement of multi-step-ahead flood forecasting accuracy through encoder-decoder with an exogenous input structure, J. Hydrol., 609, 127764, https://doi.org/10.1016/j.jhydrol.2022.127764, 2022.
Cui, Z., Guo, S., Zhou, Y., and Wang, J.: Exploration of dual-attention mechanism-based deep learning for multi-step-ahead flood probabilistic forecasting, J. Hydrol., 622, 129688, https://doi.org/10.1016/j.jhydrol.2023.129688, 2023.
Darbandsari, P. and Coulibaly, P.: Introducing entropy-based Bayesian model averaging for streamflow forecast, J. Hydrol., 591, 125577, https://doi.org/10.1016/j.jhydrol.2020.125577, 2020.
Darbandsari, P. and Coulibaly, P.: HUPBMA: An Integration of Hydrologic Uncertainty Processor and Bayesian Model Averaging for Streamflow Forecasting, Water Resour. Res., 57, e2020WR029433, https://doi.org/10.1029/2020WR029433, 2021.
Ding, Y., Zhu, Y., Feng, J., Zhang, P., and Cheng, Z.: Interpretable spatial-temporal attention LSTM model for flood forecasting, Neurocomputing, 403, 348–359, https://doi.org/10.1016/j.neucom.2020.04.110, 2020.
Duan, Q., Ajami, N. K., Gao, X., and Sorooshian, S.: Multimodel ensemble hydrologic prediction using Bayesian model averaging, Adv. Water Resour., 30, 1371–1386, https://doi.org/10.1016/j.advwatres.2006.11.014, 2007.
Fedora, M. A. and Beschta, R. L.: Storm runoff simulation using an antecedent precipitation index (API) model, J. Hydrol., 112, 121–133, https://doi.org/10.1016/0022-1694(89)90184-4, 1989.
Ferretti, R., Lombardi, A., Tomassetti, B., Sangelantoni, L., Colaiuda, V., Mazzarella, V., Maiello, I., Verdecchia, M., and Redaelli, G.: A meteorological–hydrological regional ensemble forecast for an early-warning system over small Apennine catchments in Central Italy, Hydrol. Earth Syst. Sci., 24, 3135–3156, https://doi.org/10.5194/hess-24-3135-2020, 2020.
Gelfan, A., Moreydo, V., Motovilov, Y., and Solomatine, D. P.: Long-term ensemble forecast of snowmelt inflow into the Cheboksary Reservoir under two different weather scenarios, Hydrol. Earth Syst. Sci., 22, 2073–2089, https://doi.org/10.5194/hess-22-2073-2018, 2018.
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T.: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation, Mon. Weather Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1, 2005.
Guo, Y., Yu, X., Xu, Y.-P., Chen, H., Gu, H., and Xie, J.: AI-based techniques for multi-step streamflow forecasts: application for multi-objective reservoir operation optimization and performance assessment, Hydrol. Earth Syst. Sci., 25, 5951–5979, https://doi.org/10.5194/hess-25-5951-2021, 2021.
Han, S. and Coulibaly, P.: Bayesian flood forecasting methods: A review, J. Hydrol., 551, 340–351, https://doi.org/10.1016/j.jhydrol.2017.06.004, 2017.
Hauswirth, S. M., Bierkens, M. F. P., Beijk, V., and Wanders, N.: The suitability of a seasonal ensemble hybrid framework including data-driven approaches for hydrological forecasting, Hydrol. Earth Syst. Sci., 27, 501–517, https://doi.org/10.5194/hess-27-501-2023, 2023.
Hemri, S., Fundel, M., and Zappa, M.: Simultaneous calibration of ensemble river flow predictions over an entire range of lead times, Water Resour. Res., 49, 6744–6755, https://doi.org/10.1002/wrcr.20542, 2013.
Hochreiter, S. and Schmidhuber, J.: Long short-term memory, Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735, 1997.
Huang, H., Liang, Z., Li, B., Wang, D., Hu, Y., and Li, Y.: Combination of multiple data-driven models for long-term monthly runoff predictions based on Bayesian model averaging, Water Resour. Manag., 33, 3321–3338, https://doi.org/10.1007/s11269-019-02305-9, 2019.
Kao, I. F., Zhou, Y., Chang, L. C., and Chang, F. J.: Exploring a Long Short-Term Memory based Encoder-Decoder framework for multi-step-ahead flood forecasting, J. Hydrol., 583, 124631, https://doi.org/10.1016/j.jhydrol.2020.124631, 2020.
Kingma, D. P. and Ba, J.: Adam: A method for stochastic optimization, arXiv [preprint], https://doi.org/10.48550/arXiv.1412.6980, 2014.
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, https://doi.org/10.5194/hess-22-6005-2018, 2018.
Krzysztofowicz, R.: Bayesian theory of probabilistic forecasting via deterministic hydrologic model, Water Resour. Res., 35, 2739–2750, https://doi.org/10.1029/1999WR900099, 1999.
Krzysztofowicz, R. and Kelly, K. S.: Hydrologic uncertainty processor for probabilistic river stage forecasting, Water Resour. Res., 36, 3265–3277, https://doi.org/10.1029/2000WR900108, 2000.
Li, D., Marshall, L., Liang, Z., Sharma, A., and Zhou, Y.: Bayesian LSTM with stochastic variational inference for estimating model uncertainty in process-based hydrological models, Water Resour. Res., 57, e2021WR029772, https://doi.org/10.1029/2021WR029772, 2021.
Li, L., Xu, C. Y., Xia, J., Engeland, K., and Reggiani, P.: Uncertainty estimates by Bayesian method with likelihood of AR (1) plus Normal model and AR (1) plus Multi-Normal model in different timescales hydrological models, J. Hydrol., 406, 54–65, https://doi.org/10.1016/j.jhydrol.2011.05.052, 2011.
Li, W., Duan, Q., Miao, C., Ye, A., Gong, W., and Di, Z.: A review on statistical postprocessing methods for hydrometeorological ensemble forecasting, WIREs Water, 4, e1246, https://doi.org/10.1002/wat2.1246, 2017.
Liu, J., Yuan, X., Zeng, J., Jiao, Y., Li, Y., Zhong, L., and Yao, L.: Ensemble streamflow forecasting over a cascade reservoir catchment with integrated hydrometeorological modeling and machine learning, Hydrol. Earth Syst. Sci., 26, 265–278, https://doi.org/10.5194/hess-26-265-2022, 2022.
Liu, Z., Guo, S., Zhang, H., Liu, D., and Yang, G.: Comparative study of three updating procedures for real-time flood forecasting, Water Resour. Manag., 30, 2111–2126, https://doi.org/10.1007/s11269-016-1275-0, 2016.
Liu, Z., Guo, S., Xiong, L., and Xu, C. Y.: Hydrological uncertainty processor based on a copula function, Hydrolog. Sci. J., 63, 74–86, https://doi.org/10.1080/02626667.2017.1410278, 2018.
Madadgar, S. and Moradkhani, H.: Improved Bayesian multimodelling: Integration of copulas and Bayesian model averaging, Water Resour. Res., 50, 9586–9603, https://doi.org/10.1002/2014WR015965, 2014.
Matthews, G., Barnard, C., Cloke, H., Dance, S. L., Jurlina, T., Mazzetti, C., and Prudhomme, C.: Evaluating the impact of post-processing medium-range ensemble streamflow forecasts from the European Flood Awareness System, Hydrol. Earth Syst. Sci., 26, 2939–2968, https://doi.org/10.5194/hess-26-2939-2022, 2022.
Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models: part I – A discussion of principles, J. Hydrol., 10, 282–290, https://doi.org/10.1016/0022-1694(70)90255-6, 1970.
Parrish, M. A., Moradkhani, H., and DeChant, C. M.: Toward reduction of model uncertainty: Integration of Bayesian model averaging and data assimilation, Water Resour. Res., 48, W03519, https://doi.org/10.1029/2011WR011116, 2012.
Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., and Cottrell, G.: A dual-stage attention-based recurrent neural network for time series prediction, arXiv [preprint], https://doi.org/10.48550/arXiv.1704.02971, 2017.
Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M.: Using Bayesian model averaging to calibrate forecast ensembles, Mon. Weather Rev., 133, 1155–1174, https://doi.org/10.1175/MWR2906.1, 2005.
Renard, B., Kavetski, D., Kuczera, G., Thyer, M., and Franks, S. W.: Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, W05521, https://doi.org/10.1029/2009WR008328, 2010.
Saleh, F., Ramaswamy, V., Georgas, N., Blumberg, A. F., and Pullen, J.: A retrospective streamflow ensemble forecast for an extreme hydrologic event: a case study of Hurricane Irene and on the Hudson River basin, Hydrol. Earth Syst. Sci., 20, 2649–2667, https://doi.org/10.5194/hess-20-2649-2016, 2016.
Shu, Z., Zhang, J., Wang, L., Jin, J., Cui, N., Wang, G., Sun, Z., Liu, Y., Bao, Z., and Liu, C.: Evaluation of the impact of multisource uncertainties on meteorological and hydrological ensemble forecasting, Engineering, 24, 212–228, https://doi.org/10.1016/j.eng.2022.06.007, 2022.
Sklar, M.: Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris, 8, 229–231, 1959.
Sloughter, J. M., Raftery, A. E., Gneiting, T., and Fraley, C.: Probabilistic quantitative precipitation forecasting using Bayesian model averaging, Mon. Weather Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1, 2007.
Sloughter, J. M., Gneiting, T., and Raftery, A. E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging, J. Am. Stat. Assoc., 105, 25–35, https://doi.org/10.1198/jasa.2009.ap08615, 2010.
Todini, E.: A model conditional processor to assess predictive uncertainty in flood forecasting, Int. J. River Basin Ma., 6, 123–137, https://doi.org/10.1080/15715124.2008.9635342, 2008.
Vegad, U. and Mishra, V.: Ensemble streamflow prediction considering the influence of reservoirs in Narmada River Basin, India, Hydrol. Earth Syst. Sci., 26, 6361–6378, https://doi.org/10.5194/hess-26-6361-2022, 2022.
Xiang, Z., Yan, J., and Demir, I.: A rainfall-runoff model with LSTM-based sequence-to-sequence learning, Water Resour. Res., 56, e2019WR025326, https://doi.org/10.1029/2019WR025326, 2020.
Xiong, L., Wan, M., Wei, X., and O'Connor, K. M.: Indices for assessing the prediction bounds of hydrological models and application by generalised likelihood uncertainty estimation, Hydrolog. Sci. J., 54, 852–871, https://doi.org/10.1623/hysj.54.5.852, 2009.
Xu, C., Zhong, P. A., Zhu, F., Yang, L., Wang, S., and Wang, Y.: Real-time error correction for flood forecasting based on machine learning ensemble method and its uncertainty assessment, Stoch. Env. Res. Risk A., 37, 1557–1577, https://doi.org/10.1007/s00477-022-02336-6, 2023.
Yang, T., Sun, F., Gentine, P., Liu, W., Wang, H., Yin, J., Du, M., and Liu, C.: Evaluation and machine learning improvement of global hydrological model-based flood simulations, Environ. Res. Lett., 14, 114027, https://doi.org/10.1088/1748-9326/ab4d5e, 2019.
Zhang, B., Wang, S., Qing, Y., Zhu, J., Wang, D., and Liu, J.: A vine copula-based polynomial chaos framework for improving multi-model hydroclimatic projections at a multi-decadal convection-permitting scale, Water Resour. Res., 58, e2022WR031954, https://doi.org/10.1029/2022WR031954, 2022.
Zhong, Y., Guo, S., Ba, H., Xiong, F., Chang, F. J., and Lin, K.: Evaluation of the BMA probabilistic inflow forecasts using TIGGE numeric precipitation predictions based on artificial neural network, Hydrol. Res., 49, 1417–1433, https://doi.org/10.2166/nh.2018.177, 2018a.
Zhong, Y., Guo, S., Liu, Z., Wang, Y., and Yin, J.: Quantifying differences between reservoir inflows and dam site floods using frequency and risk analysis methods, Stoch. Env. Res. Risk A., 32, 419–433, https://doi.org/10.1007/s00477-017-1401-4, 2018b.
Zhong, Y., Guo, S., Xiong, F., Liu, D., Ba, H., and Wu, X.: Probabilistic forecasting based on ensemble forecasts and EMOS method for TGR inflow, Front. Earth Sci., 14, 188–200, https://doi.org/10.1007/s11707-019-0773-9, 2020.
Zhou, Y., Guo, S., and Chang, F. J.: Explore an evolutionary recurrent ANFIS for modelling multi-step-ahead flood forecasts, J. Hydrol., 570, 343–355, https://doi.org/10.1016/j.jhydrol.2018.12.040, 2019.
Zhou, Y., Cui, Z., Lin, K., Sheng, S., Chen, H., Guo, S., and Xu, C. Y.: Short-term flood probability density forecasting using a conceptual hydrological model with machine learning techniques, J. Hydrol., 604, 127255, https://doi.org/10.1016/j.jhydrol.2021.127255, 2022.