The growing availability of hydro-meteorological modelling and forecasting systems raises the question of how to combine this wealth of information optimally. In particular, the use of deterministic and probabilistic forecasts with sometimes widely divergent predictions of future streamflow makes it difficult for decision makers to sift out the relevant information. In this study, multiple streamflow forecasts are aggregated on the basis of several different predictive distributions and quantile forecasts. For this combination, the Bayesian model averaging (BMA) approach, the non-homogeneous Gaussian regression (NGR), also known as the ensemble model output statistics (EMOS) technique, and a novel method called Beta-transformed linear pooling (BLP) are applied. With the help of the quantile score (QS) and the continuous ranked probability score (CRPS), the combination results for the Sihl River in Switzerland, with about 5 years of forecast data, are compared and the differences between the raw and optimally combined forecasts are highlighted. The results demonstrate the importance of applying proper forecast combination methods for decision makers in the field of flood and water resource management.

The combination, or aggregation, of differing probability distributions into a single one can be beneficial, since the differences between various forecast systems provide a better understanding of the uncertainty about the target quantities, and the aggregates may reflect the available information more accurately. However, the biggest advantage of aggregation is that the forecaster is not forced to decide a priori which forecast system is the most reliable at the time a forecast is issued, because the combination method is optimized at each forecast run by taking into consideration the quality of the forecasts from previous time steps. Thus, the data themselves automatically lead to the optimal decision, incorporating all available information about the different deficiencies and strengths of the individual forecast systems.

In econometrics and related disciplines, the combination of forecasts has a
long tradition starting with

In general the challenge of model combination is that, apart from the simple
model averaging methodologies, different weights need to be assigned
according to the quality of the forecast of the preceding days and periods. A
frequently used method for model averaging and forecast combination is the
method of Bayesian model averaging (BMA) introduced by

In

The Beta-transformed linear pooling (BLP) approach, which has been developed
recently by

Before the combination methods are applied, the errors of the hydrological
model are corrected in order to minimize the difference between the last
available observation and the predictions at the time of initialization of
the forecast. This process of error correction is hereafter referred to as
post-processing, since it starts after the hydrological simulations and
predictions, driven by meteorological observations or forecasts, are completed.
Depending on the post-processing method, quantiles or pdfs for future
streamflows will be derived for each single forecast time step. Whereas
quantile regression (QR) methods (

Three different combination methods have been applied to the flood
forecasting system for the Sihl River at station Zurich (Switzerland), where
two meteorological forecasts, the 16-member COSMO-LEPS (

In a first step the hydrological modelling errors of all these forecasts will
be minimized, using a QR method in combination with neural networks (QRNN,

This QRNN method will be applied to each ensemble member of the COSMO-LEPS
forecasts, resulting in 16 forecasts of quantiles, and to the
C7 forecasts.

Thus, in total there are five different forecasts available after
post-processing, two based on the application of the QRNN method for the
COSMO-LEPS with probability averaging (p.aver.) or quantile averaging
(q.aver.), two post-processed C7 forecasts based on QRNN with the EMP and the
LN approach, and one forecast based on the waveVARX method. Additionally the
raw COSMO-LEPS forecast will be included in the following combination
procedures as well (see Fig.

Set of six different forecast models available for combination, five post-processed plus one raw forecast. For the quantile averaging (M1) and the probability averaging (M2) method, an example of averaging two ensemble members is indicated.
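The difference between the two averaging schemes named in the caption can be illustrated with a minimal sketch. It assumes, purely for illustration, two ensemble members whose predictive distributions are Gaussian with equal weights; the function names and the numerical values are hypothetical and are not taken from the study. Quantile averaging averages the members' quantiles at a fixed probability level, whereas probability averaging averages the members' cdf values at a fixed streamflow value.

```python
from statistics import NormalDist

def quantile_average(dists, p):
    """Average the members' quantiles at probability level p."""
    return sum(d.inv_cdf(p) for d in dists) / len(dists)

def probability_average_cdf(dists, y):
    """Average the members' cdf values at value y (equal-weight linear pool)."""
    return sum(d.cdf(y) for d in dists) / len(dists)

def probability_average_quantile(dists, p, lo=-1e6, hi=1e6, tol=1e-9):
    """Invert the averaged cdf by bisection to obtain its p-quantile."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probability_average_cdf(dists, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two hypothetical ensemble members with clearly different predictions.
members = [NormalDist(mu=10.0, sigma=1.0), NormalDist(mu=20.0, sigma=1.0)]

q_aver_90 = quantile_average(members, 0.9)
p_aver_90 = probability_average_quantile(members, 0.9)
print(f"90% quantile, quantile averaging:    {q_aver_90:.3f}")
print(f"90% quantile, probability averaging: {p_aver_90:.3f}")
```

With widely separated members, probability averaging keeps both modes and therefore yields a wider distribution, while quantile averaging produces a single distribution located between the members.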

Three different methods will be tested for optimally combining these six
forecast models (M1,

If the combination is calculated within a Bayesian framework by using weights
corresponding to the posterior model probabilities, it is usually referred to
as BMA and follows from direct application of Bayes' theorem as explained in
e.g.

In

Another possibility to address underdispersion and forecast bias is the use
of the NGR method, also known as EMOS, which is based on multiple linear
regression for linear variables, such as temperature or streamflows, and
logistic regression for binary variables, such as precipitation occurrence or
freezing. More information about the MOS technique can be found for example
in

Thus the predictive mean is equal to the regression estimates with
coefficients

In

Thus

This BLP approach has now been applied to combine the different forecast
systems. The quantiles resulting from the QRNN forecasts (models M1, M4, and
M5) have been converted to pdfs by applying the LN method (by fitting a
log-normal distribution to the re-arranged

Although probability and quantile forecasts are both probabilistic products,
the former is expressed in terms of a probability (e.g. that a certain
threshold will be exceeded) and the latter is given by a quantile for a
particular probability level of interest (

The CRPS compares the forecast probability distribution with the observation
and both are represented as cdfs. If

Thus, in the standard form (Eq.

COSMO-LEPS and C7 forecasts are available once a day, with hourly time
resolution, from 24 February 2010 to 27 April 2016. They have been
post-processed in order to derive predictive distributions and quantile
forecasts. To calibrate and validate the post-processing parameters (QRNN and
waveVARX), the data sets of available hourly observations and corresponding
simulations have been split into two halves (calibration period: 2010–2012;
validation period: 2013–2016). The results of the validation, which are not
shown due to lack of space, highlight the improvements of the QRNN method
(similar to the results shown in

The weighting parameters of the combination methods are estimated by applying
a moving window with a size of 7 days (168 h) for optimization. Different
window sizes have been tested as well, but 7 days was finally chosen as a
trade-off between computing time and efficiency. In Fig.

Hourly weights of the BMA

Probability integral transform (PIT) of the raw and three combined forecasts at a lead time of 48 h.
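The idea behind the PIT can be sketched as follows. The example is a minimal illustration on synthetic data, not on the Sihl forecasts: the Gaussian predictive distributions, the sample size, and the function names are assumptions made for this sketch. The PIT value is the predictive cdf evaluated at the observation; for a calibrated forecast these values are uniformly distributed, whereas an underdispersive forecast produces a U-shaped histogram.

```python
import random
from statistics import NormalDist

def pit_values(forecasts, observations):
    """Evaluate each predictive cdf at its verifying observation."""
    return [f.cdf(y) for f, y in zip(forecasts, observations)]

def pit_histogram(pits, n_bins=10):
    """Relative frequency of PIT values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pits:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return [c / len(pits) for c in counts]

rng = random.Random(42)
n = 5000
# Synthetic "truth": observations drawn from N(mean_t, 1).
means = [rng.uniform(5.0, 50.0) for _ in range(n)]
obs = [rng.gauss(m, 1.0) for m in means]

calibrated = [NormalDist(m, 1.0) for m in means]      # correct spread
underdispersed = [NormalDist(m, 0.5) for m in means]  # spread too small

hist_cal = pit_histogram(pit_values(calibrated, obs))
hist_under = pit_histogram(pit_values(underdispersed, obs))
print("calibrated:    ", [round(h, 3) for h in hist_cal])
print("underdispersed:", [round(h, 3) for h in hist_under])
```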

Before the forecast skill of the three combination methods is compared, the
statistical consistency between the predictive cdf and the observations is
analysed with the help of the probability integral transform (PIT) as
proposed by

The question now is whether there are significant differences between the three combination methods. To examine this in more detail, the QS has been applied first.
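As a reminder of how such a score is evaluated, the following minimal sketch computes the QS in its common form as the mean check (pinball) loss at a probability level tau. The sign and scaling conventions used in the study may differ, and the observations and quantile forecasts below are made-up illustrative numbers.

```python
def pinball_loss(obs, q, tau):
    """Check-function loss of quantile forecast q at level tau for one observation."""
    u = obs - q
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_score(observations, quantile_forecasts, tau):
    """Mean pinball loss over all forecast-observation pairs (lower is better)."""
    losses = [pinball_loss(y, q, tau)
              for y, q in zip(observations, quantile_forecasts)]
    return sum(losses) / len(losses)

# Hypothetical streamflow observations and two competing 90% quantile forecasts.
obs = [12.0, 15.0, 30.0, 22.0]
q90_a = [14.0, 18.0, 28.0, 25.0]   # close to the observations
q90_b = [20.0, 25.0, 45.0, 35.0]   # systematically too high

qs_a = quantile_score(obs, q90_a, 0.9)
qs_b = quantile_score(obs, q90_b, 0.9)
print(f"QS forecast A: {qs_a:.3f}, QS forecast B: {qs_b:.3f}")
```

Because the loss is asymmetric, an observation above a 90% quantile forecast is penalised nine times more heavily than one the same distance below it.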

In Fig.

Quantile score (QS) for various lead times and the three combination methods in comparison to the raw COSMO-LEPS and a simple quantile mapping (QM) approach.

In Fig.

CRPS of the six forecast models: COSMO-LEPS with quantile averaging (QRNN-CL-q.) – M1, probability averaging (QRNN-CL-p.) – M2, the waveVARX(-CL) method – M3, the raw COSMO-LEPS (CL) forecast – M4, the two post-processed C7 forecasts based on QRNN with the EMP – M5, and the LN approach – M6. Additionally, the CRPS of the BLP combined forecast is shown.

The CRPS for the raw COSMO-LEPS (CL), the QM approach and the three combination methods
is shown in Fig.

CRPS of the raw and combined forecasts.
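For readers who want to reproduce this kind of comparison, the sketch below shows two standard ways of evaluating the CRPS for a single forecast: the well-known closed form for a Gaussian predictive distribution and the usual empirical estimator for an ensemble. It is an illustration under these assumptions, not the implementation used in the study; the function names and test values are hypothetical.

```python
import math

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def crps_ensemble(members, y):
    """Empirical CRPS: mean |x_i - y| minus half the mean pairwise |x_i - x_j|."""
    m = len(members)
    term_obs = sum(abs(x - y) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return term_obs - term_spread

# A single-member (deterministic) forecast reduces to the absolute error.
crps_det = crps_ensemble([18.0], 20.0)
# A small hypothetical ensemble and a Gaussian forecast for the same observation.
crps_ens = crps_ensemble([17.0, 19.0, 21.0, 24.0], 20.0)
crps_norm = crps_gaussian(20.0, 2.0, 20.0)
print(crps_det, crps_ens, crps_norm)
```

The reduction to the absolute error for a single member is what makes the CRPS convenient for comparing deterministic and probabilistic forecasts on the same scale.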

So far most of the studies comparing the results of the BMA and the NGR
approach have not found any preference (see for example

In general the weights show some periodicity, which indicates that some models are more appropriate to be used in certain seasons and for certain flow conditions during a year. However, the limited amount of data does not allow us to draw clear conclusions.

The results of the PIT clearly indicate that all three combinations result in
well-calibrated forecasts with close to uniform histograms. In Fig.

The analysis of the QS (Fig.

As stated previously, the comparison of the CRPS of the individual
post-processed forecasts and the aggregated ones (e.g. BLP) clearly identifies
the advantage of combination (Fig.

Combination is an essential tool for improving forecast quality. The different methods are all roughly equally well suited. Although the BLP showed slightly better results, the straightforward application and the low computational costs of the NGR make this method an equally good alternative, at least for this case study. The parameter estimation of the BMA and the BLP can become quite time-consuming and sometimes results in suboptimal solutions, which could reduce the gain from applying combination methods.

The COSMO-LEPS and C7 raw meteorological forecasts are the
property of MeteoSwiss and have been made available under a license agreement
between WSL and MeteoSwiss. The processed streamflow simulations and
forecasts as well as the measured discharge data can be made available upon
request. All calculations of the post-processing and the combination methods
have been implemented in the R statistical software (R Core Team, 2016) using
various packages like QRNN (

The authors declare that they have no conflict of interest.

The real-time operational system for the Sihl basin is financed by the Office of Waste, Water, Energy and Air of the Canton of Zurich. This study was conducted in the framework of the Swiss Competence Center for Energy Research – Supply of Electricity (SCCER-SoE) with funding from the Commission for Technology and Innovation – CTI (grant 2013.0288). MeteoSwiss is gratefully acknowledged for providing all meteorological data used. The Swiss Federal Office for Environment (FOEN) provided the observed discharge data. The authors would especially like to thank Vanessa Round for proofreading.

Edited by: Florian Pappenberger
Reviewed by: two anonymous referees