Multistep-ahead daily inflow forecasting using the ERA-Interim reanalysis data set based on gradient-boosting regression trees

Inflow forecasting plays an essential role in reservoir management and operation. The impacts of climate change and human activities have made accurate inflow prediction increasingly difficult, especially for longer lead times. In this study, a new hybrid inflow forecast framework – using the ERA-Interim reanalysis data set as input and adopting gradient-boosting regression trees (GBRT) and the maximal information coefficient (MIC) – is developed for multistep-ahead daily inflow forecasting. Firstly, the ERA-Interim reanalysis data set provides more information for the framework, allowing it to characterize inflow at longer lead times. Secondly, MIC can identify an effective feature subset from massive features that significantly affects inflow; therefore, the framework can reduce computational burden, distinguish key attributes from unimportant ones and provide a concise understanding of inflow. Lastly, GBRT is a prediction model in the form of an ensemble of decision trees, and it has a strong ability to more fully capture nonlinear relationships between input and output at longer lead times. The Xiaowan hydropower station, located in Yunnan Province, China, was selected as the study area. Six evaluation criteria, namely the mean absolute error (MAE), the root-mean-squared error (RMSE), the Pearson correlation coefficient (CORR), Kling–Gupta efficiency (KGE) scores, the percent bias in the flow duration curve high-segment volume (BHV) and the index of agreement (IA), are used to evaluate the established models utilizing historical daily inflow data (1 January 2017–31 December 2018). The performance of the presented framework is compared to that of artificial neural network (ANN), support vector regression (SVR) and multiple linear regression (MLR) models.
The results indicate that reanalysis data enhance the accuracy of inflow forecasting for all of the lead times studied (1–10 d), and the method developed generally performs better than other models, especially for extreme values and longer lead times (4–10 d).


Introduction
Reliable and accurate inflow forecasting 1-10 d in advance is significant with respect to the efficient utilization of water resources, reservoir operation and flood control, especially in areas with concentrated rainfall. Rainfall in southern China is usually concentrated over several days at a time due to strong convective weather, such as typhoons. Low-accuracy inflow predictions can consequently mean that power stations are unable to devise reasonable power generation plans 7-10 d ahead of disaster events, and this subsequently leads to unnecessary water abandonment and even substantial economic losses. Figure 1 shows the "losses of electric quantity due to discarded water" (LEQDW) in Yunnan and Sichuan provinces, China, from 2011 to 2016 (Sohu, 2017; Jiang, 2018). The total LEQDW in Yunnan and Sichuan provinces increased from 1.5 × 10⁹ to 45.6 × 10⁹ kWh during the period from 2011 to 2016, with an average annual growth rate of 98.0 %. In recent years, due to the increasing number of hydropower stations and improved hydropower capacity, the problem of discarding water due to inaccurate inflow forecasting is becoming increasingly serious and has had a negative impact on the development of hydropower in China.
S. Liao et al.: Daily inflow forecasting using ERA-Interim reanalysis

The main challenge with respect to inflow forecasting at present, which is caused by climate change and human activities, is low accuracy, especially for longer lead times (Badrzadeh et al., 2013; El-Shafie et al., 2007). Meanwhile, streamflow variation, which also stems from climate change and anthropogenic activities, means that inflow forecasting models often need to be rebuilt, and the model parameters need to be recalibrated according to the actual inflow and meteorological data within 1 or 2 years.
Statistical methods are usually based on historical inflow records and mainly include the autoregressive model, the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model (Lin et al., 2006). Statistical methods assume that the inflow series is stationary and the relationship between input and output is simple. However, real inflow series are complex, nonlinear and chaotic (Dhanya and Kumar, 2011), making it difficult to obtain high-accuracy predictions using statistical models.
Physical methods, which have clear mechanisms, are implemented using theories of inflow generation and confluence. These methods can reflect the characteristics of the catchment but place strict requirements on initial conditions and input data (Bennett et al., 2016). Meanwhile, these methods, which are typically used for flood forecasting, have shorter lead times and cannot be utilized to acquire long-term forecasting results due to input uncertainty.
Machine-learning methods, which have a strong ability to handle the nonlinear relationship between input and output and have recently shown excellent performance with respect to inflow prediction, are widely used for medium- and long-term inflow forecasts. In particular, several studies have shown that artificial neural networks (ANNs; Rasouli et al., 2012; Cheng et al., 2015; El-Shafie and Noureldin, 2011) and support vector regression (SVR; Tongal and Booij, 2018; Moazenzadeh et al., 2018) are two powerful models for inflow prediction. However, these models still have some inherent disadvantages. For example, ANNs are prone to being trapped by local minima, and both ANN and SVR suffer from over-fitting issues and reduced generalization performance.
In recent years, gradient-boosting regression trees (GBRT) (Fienen et al., 2018;Friedman, 2001), a nonparametric machine-learning method based on a boosting strategy and decision trees, has been developed and has been used to study traffic (Zhan et al., 2019) and environmental (Wei et al., 2019) issues, where it has proven to alleviate the abovementioned problems. Thus, GBRT is selected for daily inflow prediction with lead times of 1-10 d in this paper. Compared with ANN and SVR, GBRT also has two other advantages. Firstly, GBRT can rank features according to their contribution to model scores, which is of great significance for reducing the complexity of the model. Secondly, GBRT is a white box model and can be easily interpreted. To the best of our knowledge, GBRT has not previously been used for daily inflow prediction with lead times of 1-10 d. For comparison purposes, ANN, SVR and multiple linear regression (MLR) have been employed to forecast daily inflow and are considered to be benchmark models in this study.
In addition to forecasting models, a vital reason why many approaches cannot attain a higher inflow prediction accuracy is that inflow is influenced by various factors (Yang et al., 2019), such as rainfall, temperature and humidity. Thus, it is very difficult to select appropriate features for inflow forecasting. The current feature selection methods for inflow forecasting mainly include two methodologies. The first is the model-free method (Bowden, 2005; Snieder et al., 2020), which employs the correlation coefficient criterion (Badrzadeh et al., 2013; Siqueira et al., 2018; Pal et al., 2013) to characterize the correlation between a potential model input and the output variable. The second is the model-based method (Snieder et al., 2020), which usually utilizes the model and search strategies to determine the optimal input subset. Common search strategies include forward selection and backward elimination (May et al., 2011). However, the correlation coefficient has a limited ability to capture nonlinear relationships, and exhaustive searches tend to increase the computational burden. Thus, in order to accurately and quickly select effective inputs, the maximal information coefficient (MIC; Reshef et al., 2011) is used to select input factors for inflow forecasting.
MIC is a robust measure of the degree of correlation between two variables and has attracted a lot of attention from academia (Zhao et al., 2013; Ge et al., 2016; Lyu et al., 2017; Sun et al., 2018). In addition, sufficient potential input factors are a prerequisite for obtaining reliable and accurate prediction results, and it is not enough to use only antecedent inflow series as model input. To enhance the accuracy of inflow forecasting and acquire a longer lead time, increasing amounts of meteorological forecasting data have been used for inflow forecasting (Lima et al., 2017; Fan et al., 2015; Rasouli et al., 2012). However, with extended lead times, the errors of forecast data continuously increase, as the variables, which are obtained using a numerical weather prediction (NWP) system, are also affected by complex factors (Mehr et al., 2019). Moreover, due to the continuous improvement of forecasting systems, it is difficult to obtain consistent and long series of forecasting data (Verkade et al., 2013).
To mitigate these problems, reanalysis data generated by the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Interim (ERA-Interim; Dee et al., 2011), which employs one of the best methods for the reanalysis of data describing atmospheric circulation and elements (Kishore et al., 2011), have been used as model input. The reanalysis data have less error than observed data and forecast data, which is the result of assimilating observed data with forecast data. ERA-Interim provides the results of a global climate reanalysis from 1979 to date, which have been produced using a fixed version of a NWP system. The fixed version ensures that there are no spurious trends caused by an evolving NWP system. Therefore, these meteorological reanalysis data satisfy the need for long sequences of consistent data and have been used for the prediction of wind speeds (Stopa and Cheung, 2014) and solar radiation (Ghimire et al., 2019; Linares-Rodríguez et al., 2011) in the past.
This study aims to provide a reliable inflow forecasting framework with longer lead times for daily inflow forecasting. The framework adopts the ERA-Interim reanalysis data as model input, which ensures that ample information is supplied to depict inflow. The MIC is used to select appropriate features to avoid over-fitting and the waste of computing resources caused by feature redundancy. GBRT, which is robust to outliers and has a strong nonlinear fitting ability, is used as the prediction model to improve the inflow forecasting accuracy of longer lead times. This paper is organized as follows: Sect. 2 describes a case study and the data collected; Sect. 3 introduces the theory and processes of the methods used, including the MIC and GBRT; Sect. 4 presents the results and a discussion of the data; and the conclusions are given in Sect. 5.

The study area and the data utilized
The Xiaowan hydropower station in the lower reaches of the Lancang River was chosen as the study site (Fig. 2). The Xiaowan hydropower station is the main controlling hydropower station in the Lancang River; therefore, it is very meaningful to adopt it as the case study. The Lancang River, which is also known as the Mekong River, is approximately 2000 km long and has a drainage area of 113 300 km² above the Xiaowan hydropower station. The river originates on the Tibetan Plateau and runs through China, Myanmar, Laos, Thailand, Cambodia and Vietnam. The major source of water flowing into the Lancang River in China comes from melting snow on the Tibetan Plateau (Mekong River Commission, 2005).
We collected the ERA-Interim reanalysis data set and the observed daily inflow and rainfall data for Xiaowan for 8 years (January 2011 to December 2018). Figure 3 depicts the daily inflow series. The data from January 2011 to December 2014 (1461 d, accounting for approximately 50 % of the whole data set), from January 2015 to December 2016 (731 d, accounting for approximately 25 % of the whole data set) and from January 2017 to December 2018 (730 d, accounting for approximately 25 % of the whole data set) are used as the training, validation and testing data sets respectively. The reanalysis data set can be downloaded from https://apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/ (last access: 1 July 2019), and it is provided every 12 h on a 0.25° × 0.25° spatial grid. Based on expert knowledge and available literature, the 26 near-surface variables (Table A1), which include the total precipitation (tp), the 2 m temperature (t2m) and the total column water (tcw), from the reanalysis data are considered as potential predictors for inflow forecasting. More details regarding the ERA-Interim data set are presented in Appendix A.
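The chronological split described above can be reproduced with pandas date slicing. This is a minimal sketch: the inflow values below are synthetic placeholders, since the observed Xiaowan series is not reproduced in this paper.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the observed daily inflow series
# (the real record spans 1 January 2011 - 31 December 2018)
dates = pd.date_range("2011-01-01", "2018-12-31", freq="D")
inflow = pd.Series(np.random.default_rng(0).gamma(2.0, 400.0, len(dates)),
                   index=dates, name="inflow")

# chronological split: ~50 % training, ~25 % validation, ~25 % testing
train = inflow["2011-01-01":"2014-12-31"]   # 1461 d
valid = inflow["2015-01-01":"2016-12-31"]   # 731 d
test = inflow["2017-01-01":"2018-12-31"]    # 730 d
```

Slicing a `DatetimeIndex` with date strings is inclusive of both endpoints, which makes the three blocks partition the record exactly.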

Feature scaling and feature selection
Feature scaling is necessary for machine-learning methods, and all features are scaled to the range between zero and one, as shown in Eq. (1), before being included in the calculation:

x_scale = (x_original − x_min) / (x_max − x_min),  (1)
where x_scale and x_original indicate the scaled and original data respectively, and x_max and x_min represent the maximum and minimum of the series respectively. The reasonable selection of input variables can reduce the computational burden and improve the prediction accuracy of the model by removing redundant feature information and reducing the dimensions of the features. If too many features are selected, the model will become very complex, which will cause trouble when adjusting parameters and subsequently result in over-fitting and difficult convergence. Moreover, natural patterns in the data will be blurred by noise (Zhao et al., 2013). Conversely, if irrelevant features are chosen, noise will be added into the model and hinder the learning process. The MIC is employed to select input data from candidate predictors from the reanalysis data set. The lagged inflow and rainfall series are identified using the partial autocorrelation function (PACF) and the cross-correlation function (CCF). The corresponding 95 % confidence interval is used to identify significant correlations. Furthermore, when the correlation coefficient declines slowly and cannot fall into the confidence interval, a trial-and-error procedure is used to determine the optimum lag, i.e. starting from one lag and then modifying the external inputs by successively adding one more lagged time series into the input data (Amiri, 2015; Shoaib et al., 2015).
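Eq. (1) is plain min-max normalization; a minimal sketch with made-up inflow values:

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D series to [0, 1] as in Eq. (1): (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# illustrative inflow values (m3 s-1), not observed data
inflow = np.array([120.0, 450.0, 300.0, 890.0, 215.0])
scaled = min_max_scale(inflow)
```

The transform is applied to each feature separately, and the same `x_min`/`x_max` from the training period would normally be reused for the validation and testing periods.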

Feature selection via the maximal information coefficient
The calculation of the MIC is based on the concept of mutual information (MI; Kinney and Atwal, 2014). For a random variable X, such as observed inflow, the entropy of X is defined as

H(X) = −Σ_x p(x) log p(x),

where p(x) is the probability density function of X = x. Furthermore, for another random variable Y, such as observed rainfall, the conditional entropy of X given Y may be evaluated using the following expression:

H(X|Y) = −Σ_x Σ_y p(x, y) log p(x|y),

where H(X|Y) is the uncertainty of X given knowledge of Y, and p(x, y) and p(x|y) are the joint probability density and the conditional probability of X = x and Y = y respectively. The reduction of the original uncertainty of X, due to the knowledge of Y, is called the MI (Amorocho and Espildora, 1973; Chapman, 1986) and is defined by

MI(X; Y) = H(X) − H(X|Y).

Considering a given data set D, including variables X and Y with a sample size n, the calculation of the MIC is divided into three steps. Firstly, scatter plots of X and Y as well as grids for partitioning, which are called "x-by-y" grids, are drawn. Let D|G denote the distribution of D divided by one of the x-by-y grids G, and let MI*(D, x, y) = max_G MI(D|G), where MI(D|G) is the mutual information of D|G. Secondly, the characteristic matrix is defined as

M(D)_{x,y} = MI*(D, x, y) / log min{x, y}.

Lastly, the MIC is introduced as the maximum value of the characteristic matrix:

MIC(D) = max_{x·y < B(n)} M(D)_{x,y},

where B(n) is the upper bound of the grid size and is a function of the sample size, defined as B(n) = n^0.6. We perform feature selection from the ERA-Interim reanalysis data set in two steps via the MIC. First, we compute the MIC between each of the reanalysis variables and the observed inflow. Then, we sort the features in descending order of MIC and determine the optimum inputs using a trial-and-error procedure, i.e. starting from the top feature and then modifying the external inputs by successively adding one more feature to the model input. The selected k features from the reanalysis data are used as part of the model input.
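A rough numpy sketch of the MIC computation follows. Note that it searches only quantile-based equipartition grids, which is a simplification of the dynamic-programming grid optimization in Reshef et al. (2011), so it underestimates the true MIC; the function names are ours.

```python
import numpy as np

def mutual_info(counts):
    """Mutual information (in nats) of a 2-D contingency table."""
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))

def approx_mic(x, y):
    """Approximate MIC: search x-by-y equipartition grids with x*y < n**0.6
    and return the maximum normalized mutual information."""
    n = len(x)
    B = n ** 0.6                               # grid-size upper bound B(n)
    best = 0.0
    for nx in range(2, int(B) + 1):
        for ny in range(2, int(B) + 1):
            if nx * ny > B:
                continue
            # quantile-based edges stand in for the optimal partition
            x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
            y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
            counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
            best = max(best, mutual_info(counts) / np.log(min(nx, ny)))
    return best

# a nonlinear dependence (y = x^2) scores high; independent noise scores low
rng = np.random.default_rng(0)
x = rng.normal(size=500)
mic_dep = approx_mic(x, x ** 2 + 0.1 * rng.normal(size=500))
mic_indep = approx_mic(x, rng.normal(size=500))
```

The y = x² example is the kind of nonlinear, non-monotonic relation that a linear correlation coefficient would miss entirely but the MIC captures.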

Gradient-boosting regression trees
GBRT is an ensemble model that mainly includes two algorithms: the decision tree algorithm and the boosting algorithm. The decision tree is robust to outliers and is used as a primitive model, and the boosting algorithm, which is an integration rule, is used to improve inflow forecasting accuracy.

The decision tree
The decision tree in this paper refers to decision tree learning as used in computer science, which is one of the predictive modelling approaches used in machine learning. A decision tree consists of branch nodes (the tree structure) and leaf nodes (the tree output). Assume that a training data set is given in a feature space with N features and n samples, (X_1, y_1), (X_2, y_2), ..., (X_n, y_n), where X_i = (x_1, x_2, ..., x_N), i = 1, 2, ..., n. In the input space where the training set is located, each region is recursively divided into two subregions, and the output value of each subregion is used to construct a binary decision tree. The top-down cyclic branch learning of the decision tree adopts a greedy algorithm, where each branch node only cares about its own objective function. By traversing all features and all segmentation points of each feature, the best feature j and segmentation point s can be found by minimizing the squared loss:

min_{j,s} [ min_{c_1} Σ_{X_i ∈ R_1(j,s)} (y_i − c_1)² + min_{c_2} Σ_{X_i ∈ R_2(j,s)} (y_i − c_2)² ],

where y_i is the observed value; R_1(j, s) and R_2(j, s) are the two subregions produced by the partition; and c_1 and c_2 are the output values of R_1(j, s) and R_2(j, s) respectively. Figure 4 shows an example of a decision tree model with a max depth of 3 and 5 leaf nodes. If the threshold of the loss is set as the stopping condition of the decision tree, it will easily lead to over-fitting problems. Hence, we set the following parameters to alleviate the over-fitting problem of the decision tree model: the maximum depth of the tree, the minimum number of samples required to split an internal node, the minimum number of samples required at a leaf node and the number of leaf nodes. These parameters are also those used for optimization when using the decision tree.
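The exhaustive search over features j and segmentation points s can be sketched as follows; the toy matrix and the helper name `best_split` are illustrative, not from the paper. The optimal constants c_1 and c_2 for squared loss are simply the subregion means.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the feature j and threshold s minimizing
    sum((y - c1)^2 over R1) + sum((y - c2)^2 over R2),
    where c1, c2 are the mean responses of the two subregions."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:      # candidate segmentation points
            left = X[:, j] <= s
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# toy data: only the second feature separates low from high responses
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y = np.array([10.0, 50.0, 12.0, 52.0])
j, s, loss = best_split(X, y)
```

Splitting on feature 1 at s = 0 leaves the subregions {10, 12} and {50, 52}, each with a squared loss of 2, so the total loss is 4.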

The boosting algorithm
The idea of gradient boosting originated from an observation by Breiman (Breiman, 1997) and can be interpreted as an optimization algorithm based on a suitable cost function. Explicit regression gradient-boosting algorithms were subsequently developed (Friedman, 2001; Mason et al., 1999). The boosting algorithm used here is described in the following. Supposing a training data set with n samples, (X_1, y_1), (X_2, y_2), ..., (X_n, y_n), a squared loss function is used to train the decision trees:

L(y, f(X)) = ½ (y − f(X))².

The core of the GBRT algorithm is the iterative training of decision trees with a residual method. The iterative training process of GBRT with M decision trees is explained in the following.

1. Initialize the model with a constant value, f_0(X) = argmin_c Σ_{i=1}^{n} L(y_i, c).

2. For m = 1, 2, ..., M:

a. for each sample point i (i = 1, 2, ..., n), use the negative gradient of the loss function to replace the residual in the current model, r_mi = −[∂L(y_i, f(X_i)) / ∂f(X_i)]_{f = f_{m−1}} = y_i − f_{m−1}(X_i);

b. fit a regression tree to the pairs (X_i, r_mi) to obtain the m-th regression tree, with R_mt (t = 1, 2, ..., T) as its leaf node regions, where T is the number of leaf nodes;

c. for each leaf region t = 1, 2, ..., T, calculate the best fitting value, c_mt = argmin_c Σ_{X_i ∈ R_mt} L(y_i, f_{m−1}(X_i) + c);

d. update the fitting results by adding the fitted values to the previous values, f_m(X) = f_{m−1}(X) + Σ_{t=1}^{T} c_mt I(X ∈ R_mt).

3. Finally, a strong learner is obtained: f_M(X) = f_0(X) + Σ_{m=1}^{M} Σ_{t=1}^{T} c_mt I(X ∈ R_mt).

According to the above introduction to GBRT, the parameters of GBRT can be divided into two categories: boosting parameters and tree parameters. The boosting parameters are the learning rate and the number of weak learners (learning_rate and n_estimators). The learning rate is used to shrink the gradient step, and it influences the overall training time: the smaller the value, the more iterations are required for training. There are four tree parameters: max_leaf_nodes, min_samples_leaf, min_samples_split and max_depth. Hence, GBRT has six parameters that control model complexity (Fienen et al., 2018), which we tuned using a trial-and-error procedure.
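For squared loss, the loop above reduces to repeatedly fitting trees to the current residuals. A minimal sketch using scikit-learn decision trees as the weak learners (toy data, fixed hyperparameters; `gbrt_fit`/`gbrt_predict` are our names, not the paper's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbrt_fit(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Minimal GBRT for squared loss: each tree is fitted to the residuals
    (the negative gradient), and its shrunken prediction is added on."""
    f0 = y.mean()                           # step 1: constant initial model
    trees, pred = [], np.full(len(y), f0)
    for _ in range(n_estimators):
        residual = y - pred                 # step 2a: negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += learning_rate * tree.predict(X)   # steps 2b-2d
        trees.append(tree)
    return f0, trees

def gbrt_predict(model, X, learning_rate=0.1):
    f0, trees = model
    return f0 + learning_rate * sum(t.predict(X) for t in trees)

# toy nonlinear regression problem
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
model = gbrt_fit(X, y)
rmse = np.sqrt(np.mean((gbrt_predict(model, X) - y) ** 2))
```

Each iteration shrinks the residuals, so the training RMSE drops from the standard deviation of y towards the noise floor.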

Evaluation criteria of the models
It is critical to carefully define the meaning of performance and to evaluate the performance on the basis of the forecasting and fitted values of the model compared with historical data. The root-mean-squared error (RMSE) and mean absolute error (MAE) are the most commonly used criteria to assess model performance, and they are calculated using Eqs. (11) and (12) respectively:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (Q̂_i − Q_i)² ),  (11)

MAE = (1/n) Σ_{i=1}^{n} |Q̂_i − Q_i|,  (12)

where Q̂_i and Q_i are the inflow estimation and observed value at time i respectively, and n is the number of samples. The RMSE is more sensitive to extremes in the sample sets, and it is consequently used to evaluate the model's ability to simulate flood peaks. The Pearson correlation coefficient (CORR) is a measure of the strength of the association between the observed inflow series and the forecasted inflow series, and it is calculated according to Eq. (13):

CORR = Σ_{i=1}^{n} (Q_i − μ)(Q̂_i − μ̂) / sqrt( Σ_{i=1}^{n} (Q_i − μ)² Σ_{i=1}^{n} (Q̂_i − μ̂)² ),  (13)

where μ and μ̂ are the means of the observed and estimation series respectively. The range of the CORR is between zero and one, and values close to one demonstrate a perfect estimation result. The Kling-Gupta efficiency (KGE) score (Knoben et al., 2019) is also a widely used evaluation index. It can be provided following Eqs. (14) and (15):

KGE = 1 − ED,  (14)

ED = sqrt( (CORR − 1)² + (σ̂/σ − 1)² + (μ̂/μ − 1)² ),  (15)

where σ is the standard deviation of the observed values, σ̂ is the standard deviation of the inflow estimation, μ is the mean of the observed series and μ̂ is the mean of the inflow estimation series. The percent bias in the flow duration curve high-segment volume (BHV; Yilmaz et al., 2008; Vogel and Fennessey, 1994) is presented to estimate the prediction performance of the model for extreme values. It can be provided following Eq. (16):

BHV = 100 × Σ_{h=1}^{H} (Q̂_h − Q_h) / Σ_{h=1}^{H} Q_h,  (16)

where h = 1, 2, ..., H is the inflow index for inflows with exceedance probabilities lower than 0.02. In this paper, the inflow threshold corresponding to an exceedance probability of 0.02 is 1722 m³ s⁻¹. The index of agreement (IA; Willmott, 1981) plays a significant role in evaluating the degree of agreement between the observed series and the inflow estimation series.
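The six criteria can be sketched in numpy as follows. The high-flow segment for BHV is approximated here by the top 2 % of the ranked flows, and the helper name `metrics` is ours; this is a sketch, not the paper's evaluation code.

```python
import numpy as np

def metrics(obs, sim):
    """MAE, RMSE, CORR, KGE, BHV and IA for paired observed/simulated flows."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    mae = np.mean(np.abs(sim - obs))
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    corr = np.corrcoef(obs, sim)[0, 1]
    kge = 1 - np.sqrt((corr - 1) ** 2
                      + (sim.std() / obs.std() - 1) ** 2
                      + (sim.mean() / obs.mean() - 1) ** 2)
    h = int(np.ceil(0.02 * len(obs)))          # high-segment size (top 2 %)
    obs_h = np.sort(obs)[::-1][:h]
    sim_h = np.sort(sim)[::-1][:h]
    bhv = 100 * (sim_h.sum() - obs_h.sum()) / obs_h.sum()
    ia = 1 - np.sum((obs - sim) ** 2) / np.sum(
        (np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return mae, rmse, corr, kge, bhv, ia

# sanity check: a perfect forecast gives the ideal score for every criterion
obs = np.array([100., 220., 180., 950., 400., 310., 1200., 640., 150., 510.])
perfect = metrics(obs, obs)
```

For a perfect forecast, MAE, RMSE and BHV are 0, while CORR, KGE and IA are 1.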
Similar to CORR, it ranges between zero (no agreement at all) and one (perfect fit). It is given by

IA = 1 − Σ_{i=1}^{n} (Q_i − Q̂_i)² / Σ_{i=1}^{n} (|Q̂_i − μ| + |Q_i − μ|)²,

where μ is the mean of the observed series.

Overview of framework

Figure 5 illustrates the overall structure of the framework presented. This structure consists of two major models: GBRT and GBRT-MIC. In GBRT, we measure the relevance of the different lags of observed inflow and rainfall to the observed inflow at the time of forecast using the partial autocorrelation function (PACF) and the cross-correlation function (CCF; Badrzadeh et al., 2013), and we select appropriate lags as predictors for the model using hypothesis testing and trial-and-error procedures. Data preprocessing and feature scaling are then carried out for the selected predictors. Next, the data set is divided into a training set, a validation set and a testing set according to the length of each data set, which is specified in advance (Sect. 2.2). A grid search algorithm, which exhaustively evaluates all candidate parameter combinations, is used to optimize the model parameters based on the evaluation of the validation set for each lead time (Chicco, 2017). Lastly, the prediction results are evaluated based on the testing set. Compared with GBRT, GBRT-MIC adds reanalysis data, selected via the MIC (Sect. 3.1), as model input. Moreover, GBRT-MIC also calculates the importance of features according to the prediction results and ranks the features (Louppe, 2014). It is difficult to perform multistep forecasting due to the accumulation of errors, reduced accuracy and increased uncertainty. Therefore, the current state of multistep-ahead forecasting is reviewed. There are two main strategies for multistep forecasting with a single output, namely static (direct) multistep forecasting and recursive multistep forecasting (Taieb et al., 2012).
The recursive forecasting strategy is biased when the underlying model is nonlinear, and it is sensitive to estimation error, as estimated values, instead of actual values, are increasingly used as the forecasts move further into the future (Bontempi). The direct strategy is therefore adopted, and the two models are formulated as

Q̂I_{t+T} = f(θI_t; Q_t, Q_{t−1}, ..., Q_{t+1−p}, R_t, R_{t−1}, ..., R_{t+1−q}), (T = 1, 2, ..., 10),

Q̂II_{t+T} = f(θII_t; Q_t, Q_{t−1}, ..., Q_{t+1−p}, R_t, R_{t−1}, ..., R_{t+1−q}, E_t,1, E_t,2, ..., E_t,k), (T = 1, 2, ..., 10),

where Q̂I_{t+T} and Q̂II_{t+T} are the forecasted values of GBRT and GBRT-MIC at lead time T of current time t respectively; θI_t and θII_t are the parameters of GBRT and GBRT-MIC at lead time T of current time t respectively; p and q are the lags of the observed inflow and rainfall determined using the PACF and CCF respectively; E_t represents the features from the reanalysis data at the current time t; and k is the number of features from the reanalysis data determined via the MIC.
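Under the direct strategy, a separate model is trained for each lead time on lagged predictors. A sketch with synthetic data and illustrative lags follows (the real predictors and lags come from the PACF/CCF and MIC selection described above; only three lead times are shown for brevity, whereas the paper uses 1-10 d):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def make_lagged(series, lags, lead):
    """Build (X, y) pairs for one lead time: each row of X holds the lagged
    values Q_{t-l} for l in lags, and y is Q_{t+lead}."""
    X, y = [], []
    for t in range(max(lags), len(series) - lead):
        X.append([series[t - l] for l in lags])
        y.append(series[t + lead])
    return np.array(X), np.array(y)

# synthetic seasonal inflow stand-in
rng = np.random.default_rng(2)
t = np.arange(600)
inflow = 500 + 300 * np.sin(2 * np.pi * t / 60) + 20 * rng.normal(size=600)

lags = [0, 1, 4]          # illustrative (the paper's PACF picks 1 and 4 d)
models = {}
for lead in (1, 2, 3):    # direct strategy: one model per lead time
    X, y = make_lagged(inflow, lags, lead)
    models[lead] = GradientBoostingRegressor(random_state=0).fit(X, y)

r2_1d = models[1].score(*make_lagged(inflow, lags, 1))   # training R^2
```

Because each lead time gets its own parameter vector, errors do not feed back from one step into the next, which is the advantage over the recursive strategy noted above.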

Experimental results and discussion
In order to compare them with GBRT-MIC, the ANN-MIC, SVR-MIC and MLR-MIC models, which were obtained by replacing GBRT in the framework with ANN, SVR and MLR respectively, are also employed for inflow forecasting with lead times of 1-10 d. As mentioned previously, six indices, the MAE, RMSE, CORR, KGE, BHV and IA, are calculated to evaluate the performance of the models based on the testing set. We also explored the feature importance based on the GBRT-MIC model (Louppe, 2014). All computations carried out in this paper were performed on a ThinkPad P1 workstation containing an Intel Core i7-9850H CPU at 2.60 GHz with 16.0 GB of RAM, using Python version 3.7.10 (Python Software Foundation, 2020) and the scikit-learn package (Pedregosa et al., 2011). Figure 6 shows the PACF, the CCF and the corresponding 95 % confidence interval from lag 1 to lag 12. The PACF shows significant autocorrelation at lags 1 and 4 (Fig. 6a); thus, the inflow series at lags of 1 and 4 d are selected as model inputs. The CCF between inflow and rainfall gradually decreases as the time lag increases (Fig. 6b) and does not fall into the 95 % confidence interval. Therefore, a trial-and-error procedure is used to determine the optimal selection of the lagged rainfall series. A total of 13 input structures are tried (Table 1), and the trial results are shown in Fig. A1.
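The PACF-based lag screening can be sketched without external time-series packages by using the regression definition of the partial autocorrelation (the PACF at lag k is the last coefficient of a least-squares AR(k) fit). The AR(1) series below is a synthetic stand-in for the inflow series, so only lag 1 should be flagged as significant:

```python
import numpy as np

def pacf(series, max_lag):
    """Partial autocorrelation via the regression method: the PACF at lag k
    is the coefficient of the k-th lag in an AR(k) least-squares fit."""
    x = np.asarray(series, float) - np.mean(series)
    out = []
    for k in range(1, max_lag + 1):
        # columns are x_{t-1}, ..., x_{t-k}; the target is x_t
        X = np.column_stack([x[k - j - 1:len(x) - j - 1] for j in range(k)])
        beta, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(beta[-1])
    return np.array(out)

# synthetic AR(1) series: x_t = 0.8 x_{t-1} + e_t
rng = np.random.default_rng(3)
x = np.zeros(2000)
e = rng.normal(size=2000)
for t in range(1, 2000):
    x[t] = 0.8 * x[t - 1] + e[t]

p = pacf(x, 6)
band = 1.96 / np.sqrt(len(x))      # 95 % significance band
selected = [k + 1 for k, v in enumerate(p) if abs(v) > band]
```

Lags whose PACF falls outside the band are kept as candidate inputs; for the real inflow series this screening yields the 1 and 4 d lags reported above.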

Combined with temperature and pressure, it can be used to calculate the relative humidity. The total column water vapour (no. 11) is the total amount of water vapour, which is a fraction of the total column water. The total column water (no. 12) is the sum of the water vapour, liquid water, cloud ice, rain and snow in a column extending from the surface of the Earth to the top of the atmosphere. The volumetric soil water layer 1 (no. 19) is the volume of water in soil layer 1. In summary, all the selected predictors are interpretable and have a good physical connection to inflow.

Hyperparameter optimization
For machine-learning methods, hyperparameters are parameters that are set before training and cannot be directly learnt from the regular training process. In order to improve the performance of the models, it is imperative to tune their hyperparameters. The grid search function is employed to tune the hyperparameters of GBRT, GBRT-MIC, ANN-MIC and SVR-MIC. After a review of the available literature (Badrzadeh et al., 2013; Rasouli et al., 2012), an optimizer in the family of quasi-Newton methods, namely L-BFGS, is chosen as the training algorithm for the ANN, and the number of hidden layers is fixed to three. Another two parameters, namely the activation function and the number of nodes of the hidden layer, need to be adjusted. A range of 2-20 neurons and four commonly used activation functions (Table 4) are explored by grid search. To alleviate the influence of the random initialization of weights, 50 ANN-MIC models are trained for each parameter combination. The optimal activation function and number of hidden-layer nodes are determined by selecting the minimal MAE on the validation set for each lead time. The results of the trials show that tanh and the logistic function are the two more robust activation functions (Fig. 7), and ANNs with fewer nodes tend to obtain lower errors. The optimal parameter combination for each lead time is listed in Table 5. It can be seen that the optimal number of nodes is 2, 3 or 4, and the optimal activation function is either tanh or the logistic function. For SVR, according to Lin et al. (2006) and Dibike et al. (2001), the radial basis function (RBF) outperforms other kernel functions for runoff modelling; thus, RBF is used as the kernel function in this study. There are three parameters that need to be adjusted. Firstly, an appropriate tuning range of each parameter is determined by a trial-and-error procedure.
Then, to reach an optimal choice of these parameters, the MAE is used as the criterion to optimize the parameters via grid search. The optimal tuning parameters of SVR are shown in Table 5. As mentioned earlier, GBRT has six parameters that need to be adjusted. In order to obtain an optimal parameter combination as quickly as possible, we optimize the parameters in two steps. Firstly, n_estimators and learning_rate are fixed to 100 and 0.1 respectively, and tuning max_leaf_nodes, min_samples_leaf, max_depth and min_samples_split generates 40 000 models at each lead time. Secondly, after the tree parameters are determined, learning_rate is reduced to 0.01 and n_estimators is determined using grid search. To accommodate the computational burden, all models are distributed among about 12 central processing units (CPUs), and the total wall time for the runs is about 7 h for GBRT-MIC and GBRT. Table 6 lists the optimal tuning parameters for GBRT and GBRT-MIC. Figure 8 illustrates the performance indices of GBRT and GBRT-MIC for the testing set (1 January 2017-31 December 2018) at lead times of 1-10 d. It is obvious that the reanalysis data selected by the MIC greatly improve the GBRT forecasting at both short and long lead times. In particular, for the longer lead times, GBRT-MIC significantly outperforms GBRT.
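The two-step tuning described above can be sketched as follows. The grids, data and validation split here are small illustrative stand-ins, not the paper's actual 40 000-model search:

```python
import numpy as np
from itertools import product
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# synthetic regression problem standing in for one lead time
rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(400, 3))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=400)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

# step 1: fix n_estimators=100 and learning_rate=0.1, grid-search tree params
tree_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 6],
             "min_samples_split": [2, 10], "max_leaf_nodes": [None, 15]}
best_mae, best_params = np.inf, None
for combo in product(*tree_grid.values()):
    params = dict(zip(tree_grid, combo))
    m = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                  random_state=0, **params).fit(X_tr, y_tr)
    mae = mean_absolute_error(y_va, m.predict(X_va))
    if mae < best_mae:
        best_mae, best_params = mae, params

# step 2: lower the learning rate and re-tune the number of trees
for n in (200, 500, 1000):
    m = GradientBoostingRegressor(n_estimators=n, learning_rate=0.01,
                                  random_state=0, **best_params).fit(X_tr, y_tr)
    best_mae = min(best_mae, mean_absolute_error(y_va, m.predict(X_va)))
```

Decoupling the tree parameters from the boosting parameters shrinks the search space from the full six-dimensional grid to two much smaller sweeps, which is the point of the two-step procedure.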
From Fig. 8a, it can be noted that, compared with GBRT, the MAE of GBRT-MIC decreases from 175 to 172 (a decrease of 1.74 %) for 2 d ahead forecasting and from 273 to 237 (a decrease of 13.18 %) for 10 d ahead forecasting. From Fig. 8b, it can be seen that the RMSE of GBRT-MIC achieves a 1.4 % and 10.6 % reduction for 2 and 10 d ahead forecasting respectively compared with GBRT. Figure 8c, d and f show that the CORR, KGE and IA of GBRT-MIC increase by 0.2 %, 2.2 % and 1.0 % for 2 d ahead forecasting and by 3.4 %, 7.8 % and 2.2 % for 10 d ahead forecasting respectively. Figure 8e compares the BHV values for GBRT and GBRT-MIC and indicates that the reanalysis data can enhance the forecasting of extreme values. Figure 9a shows the 5 d ahead forecasted inflow of GBRT-MIC and GBRT versus the observed inflow in the testing set. The slopes of the fitting curves of GBRT-MIC and GBRT are 0.89 and 0.81 respectively; this also demonstrates that GBRT-MIC obtains more accurate inflow forecasts than GBRT. Figure 9b illustrates the distribution of the forecast errors of GBRT and GBRT-MIC.

Input comparison
The prediction errors of both models (Fig. 9b) follow an approximately normal distribution, which suggests that the errors still contain information not extracted by the models; moreover, the forecast errors concentrate around zero more tightly for GBRT-MIC than for GBRT. Figure 9c provides the forecasted inflow time series (from the testing set) for GBRT-MIC and GBRT at a lead time of 5 d. It can be seen that GBRT-MIC performs noticeably better than GBRT, especially for extreme values. This reveals that the inaccurate extreme-value predictions produced by the GBRT model in areas with concentrated rainfall could be mitigated by incorporating the reanalysis data identified by the MIC.

Model comparison
GBRT-MIC, SVR-MIC and ANN-MIC with the optimal model parameters are employed for inflow forecasting 1 to 10 d ahead. The summarized results for the training and testing sets are presented in Tables 7 and 8 respectively. To avoid the problem of local minima, 50 ANN-MIC models are trained for each lead time, and the median of the predictions of the 50 models is used as the final prediction. It is clear from Table 7 that the GBRT-MIC model is more efficient on the training set than the other models at lead times of 1–10 d, which demonstrates that GBRT-MIC has a powerful fitting ability. Meanwhile, all machine-learning models obtain better forecasted results than MLR-MIC, which cannot capture nonlinear relationships. It should be noted that ANN-MIC has the best performance for extreme values in terms of the BHV in the training set. As shown in Table 8, GBRT-MIC performs best on the testing set at lead times of 4–10 d in terms of the above-mentioned six indices; at a lead time of 10 d, the KGE of GBRT-MIC even reaches 0.8317. At lead times of 1–3 d, the three machine-learning models obtain good performance and outperform MLR-MIC, as they can acquire enough information to perform forecasting at short lead times. The performance indices of these four models on the testing set (2017–2018) at lead times of 1–10 d are presented in Fig. 10. The results indicate that the performance of all four models degrades (higher MAE, RMSE and BHV, and lower CORR, KGE and IA) as the lead time increases. As mentioned earlier, the four models perform equally well for 1 to 3 d ahead forecasting, whereas significant differences among their performance appear once lead times exceed 3 d: GBRT-MIC produces much higher CORR, KGE and IA values and lower MAE, RMSE and BHV values than the other three models for 4–10 d ahead forecasting, although ANN-MIC performs almost as well as GBRT-MIC for 10 d ahead forecasting.
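The ensemble strategy used for ANN-MIC, training many networks from different random initializations and taking the median of their predictions, can be sketched as follows; the network size, solver and data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                             # stand-in predictors
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=300)

preds = []
for seed in range(50):
    # Each network starts from different random weights, so individual
    # runs may settle in poor local minima; the median damps those outliers.
    net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                       max_iter=500, random_state=seed).fit(X, y)
    preds.append(net.predict(X))

final = np.median(np.vstack(preds), axis=0)  # ensemble median forecast
```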
It should be noted that SVR-MIC shows the worst performance according to the BHV and KGE values, which demonstrates that SVR cannot capture extreme values. In contrast, GBRT-MIC significantly outperforms the other models in terms of the BHV at lead times of 1–10 d, which indicates that GBRT-MIC is the most successful of all of the models developed in this paper with respect to capturing extreme values.
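For reference, the six evaluation criteria can be sketched as below, assuming the standard definitions of the KGE and IA and a top-2 % flow-duration-curve segment for the BHV; the 2 % exceedance threshold is an assumption, not stated in this paper.

```python
import numpy as np

def metrics(obs, sim):
    """Return the six criteria for observed and simulated inflow series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    mae = np.mean(np.abs(sim - obs))
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    corr = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    kge = 1 - np.sqrt((corr - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    # BHV: percent bias in the high-segment (here top 2 %) FDC volume.
    o, s = np.sort(obs)[::-1], np.sort(sim)[::-1]
    h = max(1, int(0.02 * len(obs)))
    bhv = 100 * (s[:h].sum() - o[:h].sum()) / o[:h].sum()
    # IA: index of agreement.
    ia = 1 - np.sum((obs - sim) ** 2) / np.sum(
        (np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return dict(MAE=mae, RMSE=rmse, CORR=corr, KGE=kge, BHV=bhv, IA=ia)
```

A perfect forecast yields MAE = RMSE = BHV = 0 and CORR = KGE = IA = 1, which matches the direction of the comparisons above (lower MAE, RMSE and BHV are better; higher CORR, KGE and IA are better).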

Feature importance
A benefit of using gradient boosting is that, after the boosted trees are constructed, relative importance scores can be acquired for each feature to estimate its contribution to inflow forecasting. Figure 11 shows the feature importance based on GBRT-MIC for lead times of 1 and 10 d. The 1 d lag observed time series (Q t−1) is more important at shorter lead times (Fig. 11a), which demonstrates that the historical observed values are essential to inflow forecasting at shorter lead times. The features from the reanalysis data (e.g. stl3 t−10 and d2m t−10) have a high relative importance at longer lead times (Fig. 11b).
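Extracting these relative importance scores from a fitted boosting model can be sketched as below; the feature names and data are hypothetical stand-ins for predictors such as Q t−1, stl3 t−10 and d2m t−10.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
names = ["Q_t-1", "stl3_t-10", "d2m_t-10", "noise"]  # illustrative predictors
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
# feature_importances_ sums to 1 across all features; higher means the
# feature was used more often (and more profitably) for tree splits.
for name, score in sorted(zip(names, model.feature_importances_),
                          key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```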
Based on the analysis of the concepts of stl3 t−10 and tcw t−10 (Sect. 4.1), we infer that the temperature near the ground impacts the inflow by affecting the melting of snow, which is consistent with the fact that the Lancang River is a snowmelt river. The 10 d lag observed time series (Q t−10 ) is also very important, which indicates the long memory of inflow series (Salas, 1993). Meanwhile, it is found that the reanalysis data provide important information for inflow forecasting at longer lead times.

Conclusions
In this study, GBRT-MIC is employed to make inflow forecasts at lead times of 1–10 d, and ANN-MIC, SVR-MIC and MLR-MIC are developed for comparison with GBRT-MIC. The reanalysis data selected by the MIC as well as the antecedent inflow and rainfall records selected by the PACF and CCF are used as predictors to drive the models. These models are compared using six evaluation criteria: the MAE, RMSE, CORR, KGE, BHV and IA. It is shown that GBRT-MIC, ANN-MIC and SVR-MIC outperform MLR-MIC at lead times of 1–10 d, and GBRT-MIC performs best at lead times of 4–10 d, especially for the forecasting of extreme values. According to a comparison of the forecasted results of GBRT and GBRT-MIC, we conclude that GBRT-MIC can be used for more accurate and reliable inflow forecasting at lead times of 1–10 d and that reanalysis data selected by the MIC greatly improve upon the GBRT forecasting, especially at lead times of 4–10 d. In addition, the feature importance obtained from GBRT-MIC demonstrates that soil temperature, the total amount of water vapour in a column and the dewpoint temperature near the ground contribute to increasing the prediction accuracy of inflow at longer lead times. In summary, the framework developed integrates GBRT and reanalysis data selected by the MIC and performs inflow forecasting well at lead times of 1–10 d. The results of this study are of significance as they can assist power stations in making power generation plans 7–10 d in advance in order to reduce LEQDW and flood disasters. Another possibility to improve the results may be the consideration of heuristic methods (e.g. the Grey Wolf algorithm) to optimize model parameters, which could search a wider range of hyperparameters more quickly.
Appendix A: ERA-Interim reanalysis data set and model input

ERA-Interim is a reanalysis product of global atmospheric forecasts at ECMWF that is produced using the Integrated Forecast System (IFS) data assimilation system. The system includes a four-dimensional variational analysis (4D-Var) with a 12 h analysis window. The spatial resolution of the data set is approximately 80 km (0.72°), with 60 vertical levels from the surface up to 0.1 hPa. The 0.125 to 2.5° reanalysis meteorological products are generated by interpolation. Reanalysis meteorological products from ERA-Interim, such as rainfall, maximum and minimum temperatures, and wind speed, at a 0.25° × 0.25° (latitude × longitude) spatial and 12 h temporal resolution for the study period from 2011 to 2018 are downloaded from the ECMWF web page.
A total of 13 input structures from the observed data are tried, and 50 trials are performed for each input structure. The results (Fig. A1) show that the 7th input structure is the optimal input subset for GBRT.
A total of 26 input structures from the reanalysis data are tried, and 50 trials are performed for each input structure. The results (Fig. A2) show that the 8th input structure is the optimal input subset for GBRT-MIC.