An experiment on the evolution of an ensemble of neural networks for streamflow forecasting
Abstract. We present an experiment on fifty multilayer perceptrons trained for streamflow forecasting on three watersheds using bootstrapped input series. This type of neural network is common in hydrology, and using multiple training repetitions (ensembling) is a popular practice: the information issued by the ensemble is then aggregated and considered to be the final output. Some authors have proposed that the ensemble could also be used to compute confidence intervals around the ensemble mean. Here, we are interested in the reliability of confidence intervals obtained in this way and in tracking the evolution of the ensemble of neural networks during the training process. At each iteration of this process, the mean of the ensemble is computed along with various confidence intervals. The performance of the ensemble mean is evaluated using the mean absolute error. Since the ensemble of neural networks resembles an ensemble streamflow forecast, we also use ensemble-specific quality assessment tools such as the Continuous Ranked Probability Score to quantify the forecasting performance of the ensemble formed by the neural network repetitions. We show that while the performance of the single predictor formed by the ensemble mean improves throughout the training process, the reliability of the associated confidence intervals starts to decrease shortly after training begins. Although the reliability of the confidence intervals is never perfect at any point during training, we show that it is best after approximately 5 to 10 iterations, depending on the basin. We also show that the Continuous Ranked Probability Score and the logarithmic score do not evolve in the same fashion during training, owing to a particularity of the logarithmic score.
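The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ensemble is stored as an array of shape (members, time steps), that confidence intervals are taken from ensemble percentiles (the abstract does not say how they are built), and that the Continuous Ranked Probability Score is estimated with the standard empirical formula E|X - y| - 0.5 E|X - X'|. The function names and the toy data are hypothetical.

```python
import numpy as np


def ensemble_mean_mae(ens, obs):
    """Mean absolute error of the ensemble mean.

    ens: array of shape (n_members, n_times); obs: array of shape (n_times,).
    """
    return float(np.mean(np.abs(ens.mean(axis=0) - obs)))


def empirical_crps(ens, obs):
    """Time-averaged CRPS from the empirical form E|X - y| - 0.5 * E|X - X'|."""
    # Mean distance between each member and the observation, per time step.
    term1 = np.mean(np.abs(ens - obs[None, :]), axis=0)
    # Mean distance between all pairs of members, per time step.
    term2 = np.mean(np.abs(ens[:, None, :] - ens[None, :, :]), axis=(0, 1))
    return float(np.mean(term1 - 0.5 * term2))


def interval_coverage(ens, obs, level=0.9):
    """Fraction of observations inside the central `level` interval.

    Assumption: the interval is built from ensemble percentiles.
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(ens, alpha, axis=0)
    upper = np.quantile(ens, 1.0 - alpha, axis=0)
    return float(np.mean((obs >= lower) & (obs <= upper)))


# Toy data (hypothetical): 50 members, 200 time steps.
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=5.0, size=200)
ens = obs[None, :] + rng.normal(0.0, 2.0, size=(50, 200))

print("MAE of ensemble mean:", ensemble_mean_mae(ens, obs))
print("Mean CRPS:", empirical_crps(ens, obs))
print("Coverage of 90% interval:", interval_coverage(ens, obs, level=0.9))
```

An interval is reliable when its observed coverage matches its nominal level, so a 90% interval should contain roughly 90% of the observations. Recomputing these three quantities on the ensemble saved at each training iteration would give the kind of evolution curves the abstract describes.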