A Bayesian Joint Probability Post-Processor for Reducing Errors and Quantifying Uncertainty in Monthly Streamflow Predictions



Introduction
Streamflow predictions from a hydrological model can be used for a wide range of applications, from flood forecasting at short time scales to long-term assessments of water resources. Model predictions are subject to errors originating from various sources, including input data, calibration data, model structure and parameters. The model is usually calibrated prior to its application to compensate for these errors, thus reducing uncertainty in the predictions. However, a model, being a simplified representation of a system, will always contain uncertainty in its predictions (Gupta et al., 2005). Post-processors are statistical models that are applied to model predictions to further reduce errors and to quantify uncertainty in the streamflow predictions (Seo et al., 2006). Post-processors can reduce errors through elimination of systematic bias and/or by reduction of "short memory" or transient errors (Pagano et al., 2011). The former is generally achieved by using simple statistical approaches like quantile mapping or regression (Hashino et al., 2007; Shi et al., 2008), while the latter is achieved by prediction updating (Lekkas et al., 2001; Moraweitz et al., 2011). The prediction updating techniques exploit persistence of residuals to correct for errors using linear or non-linear auto-regressive models (WMO, 1992; Shamseldin and O'Connor, 1999; Xiong and O'Connor, 2002; Pagano et al., 2011). Streamflow predictions, even after bias correction and prediction updating, contain errors that cannot be eliminated, and information on prediction uncertainty is useful for decision makers who use the predictions. Post-processors are generally designed to provide an estimate of the total "lumped" uncertainty in the predictions by constructing statistical models of errors based on model predictions and historical observations (e.g. Krzysztofowicz, 1999; Engeland et al., 2005; Montanari and Grossi, 2008).
In hydrology, post-processors have been mostly used for short-term streamflow or river height forecasting. Examples include the Bayesian Forecasting System (BFS; Krzysztofowicz, 1999, 2002; Reggiani and Weerts, 2008), the US National Weather Service (NWS) post-processor (Seo et al., 2006), the General Linear Model Post-Processor (Zhao et al., 2011), the Meta-Gaussian post-processor (Montanari and Grossi, 2008) and others. They range in complexity from the NWS post-processor, which adopts a fairly simple auto-regressive error structure (Seo et al., 2006), to BFS, which uses a complex parameterization scheme based on meta-Gaussian distributions. Some are primarily intended for uncertainty quantification but also include components for error reduction (e.g. Krzysztofowicz, 1999, 2002). Methods for parameterization, parameter estimation and calculation of predictive distributions differ among post-processors, although some common features can be found. Most post-processors produce probabilistic predictive distributions of streamflow (or river height) conditioned on model predictions and recent streamflow observations. They generally assume linear dependence among the variates in a transformed normal space, and most use the Normal Quantile Transformation (NQT; Krzysztofowicz, 1997, 1999; Todini, 2008; Li et al., 2010) to normalise the variables. All assume the estimated values of the parameters (of the post-processors) to be "true" and ignore the uncertainty in estimating their values (Krzysztofowicz, 1999, 2002; and others). For a complex post-processor like BFS, this parametric uncertainty can be substantial (Seo et al., 2006). More importantly, they are all designed to post-process streamflow predictions at daily or sub-daily time scales. For many hydrological applications, such as seasonal streamflow forecasting, water resources and climate change assessments, monthly streamflow volumes are of primary interest. While daily predictions from daily models may be post-processed at the daily time scale and then aggregated to monthly, there is no guarantee that the monthly volumes so produced have reliable uncertainty distributions and the least achievable errors. It is likely much more effective to apply post-processing directly at the monthly time scale, where pre-processed monthly volumes may come either from aggregating daily model outputs or simply from monthly models.
In this study, we investigate the use of a Bayesian joint probability (BJP) modelling approach to post-process model predictions of monthly streamflow volumes. The BJP method was originally developed for forecasting seasonal streamflows in Australia (Wang et al., 2009). Here we apply it for bias correction, prediction updating and uncertainty quantification of monthly streamflow volumes generated from a monthly water balance model. The BJP method uses a parametric transformation to normalise data and stabilise variance. It allows for parameter uncertainty in the post-processor, and this can be important when dealing with monthly variables, which have far fewer data points than daily variables. In this study, we assess three formulations of the BJP post-processor in their ability to reduce error and quantify uncertainty.
The paper is structured as follows. Section 2 describes the catchments and data used in the study. Section 3 presents the hydrological model used and the formulations of the BJP post-processor. Evaluation of the post-processor is given in Sect. 4 and followed by discussions in Sect. 5. Conclusions are drawn in Sect. 6.

Catchments and data
We test the BJP post-processor in 18 catchments located in Queensland, Victoria (including one at the border with New South Wales) and Tasmania (see Fig. 1). We use observed monthly streamflow data obtained from various water resource management agencies and the Bureau of Meteorology, Australia. For most catchments, with the exception of some in Queensland and Victoria, the data are available from 1950 to 2008 (see Table 1). The monthly catchment average rainfall and potential evapotranspiration for each catchment are calculated from a 5 km gridded dataset available from the Australian Water Availability Project (AWAP; Jones et al., 2009).

Methods
In each catchment, we calibrate the parameters of a hydrologic water balance model and generate predictions. The "raw" predictions generated by the model contain errors that are not reconciled during the calibration process. The BJP post-processor aims to reduce such errors and quantify uncertainty. This section describes the process of generating streamflow predictions and their subsequent post-processing.


Generation of streamflow predictions using a hydrological model
We use a monthly model known as WAPABA (Water Partition and Balance; Wang et al., 2011) to generate streamflow predictions. WAPABA is a modified version of the Budyko framework model (Zhang et al., 2008) and consists of two storages and five parameters. The model uses consumption curves to partition water into different components based on the availability of water (supply) and demand. WAPABA has been tested in 331 catchments in Australia and has demonstrated good performance (Wang et al., 2011). We calibrate WAPABA using the shuffled complex evolution search method (SCE; Duan et al., 1994) for a period of five years. Prior to every model run we allow a five-year warm-up period to reduce model sensitivity to state initialization errors. We use a scalarized multi-objective measure consisting of a uniformly weighted average of the Nash-Sutcliffe efficiency coefficient (Nash and Sutcliffe, 1970), the Nash-Sutcliffe efficiency of log-transformed flows, the Pearson correlation coefficient and a symmetric measure of bias (Wang et al., 2011). Finally, we use the calibrated parameters to produce streamflow predictions using the observed rainfall.
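As an illustration, such a scalarized objective might be computed as follows. This is a minimal sketch only: the function names and the exact form of the symmetric bias measure are assumptions, not the formulation of Wang et al. (2011).

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency coefficient."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def scalarized_objective(obs, sim, eps=0.01):
    """Uniformly weighted average of four fit measures.
    `eps` offsets zero flows before the log transform (an assumption here)."""
    nse_raw = nse(obs, sim)
    nse_log = nse(np.log(obs + eps), np.log(sim + eps))
    r = np.corrcoef(obs, sim)[0, 1]
    # A symmetric bias measure: penalizes over- and under-prediction alike.
    # This particular form is a sketch, not the published definition.
    bias = 1.0 - abs(np.log(sim.sum() / obs.sum()))
    return 0.25 * (nse_raw + nse_log + r + bias)
```

A perfect simulation scores 1.0 on each component, so the scalarized measure is maximized at 1.0.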

Statistical post-processing
The BJP modelling approach assumes that a set of predictands y(2), and their predictors y(1), follow a joint multivariate normal distribution in a transformed space. Normalization of the variables is achieved by using the log-sinh transformation (Wang et al., 2012). The log-sinh transformation replaces the previously used Yeo-Johnson transformation (Yeo and Johnson, 2000; Wang et al., 2009; Wang and Robertson, 2011). Although both have data normalization and variance stabilization properties, the log-sinh has been shown to outperform the Box-Cox based Yeo-Johnson transformation when applied to catchments with highly skewed data (Wang et al., 2011).
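The log-sinh transformation itself is straightforward to implement. A minimal sketch, with parameters a and b as in Wang et al. (2012) and numerical safeguards omitted:

```python
import numpy as np

def log_sinh(y, a, b):
    """Log-sinh transform (Wang et al., 2012): z = (1/b) * ln(sinh(a + b*y))."""
    return np.log(np.sinh(a + b * y)) / b

def log_sinh_inverse(z, a, b):
    """Back-transform: y = (arcsinh(exp(b*z)) - a) / b."""
    return (np.arcsinh(np.exp(b * z)) - a) / b
```

The forward and inverse functions round-trip exactly, which is what allows post-processed predictions to be mapped back to flow space.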
The posterior distribution of the parameters, including mean, variance and transformation parameters for each variable and a correlation matrix for the multivariate normal distribution, is estimated using Bayesian inference. The sampling of the posterior parameter distribution is done by using a Markov chain Monte Carlo method. The posterior predictive density for a new event y is given by

f(y | Y_OBS) = ∫ f(y | θ) p(θ | Y_OBS) dθ    (1)

where Y_OBS contains the historical data of both predictor and predictand variables used for model inference, and θ is the parameter vector. Details of the method for the numerical evaluation of Eq. (1) can be found in Wang et al. (2009) and Wang and Robertson (2011).
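Numerically, the integral in Eq. (1) is approximated by drawing from the conditional distribution for each posterior parameter sample. The following is a minimal bivariate sketch (one predictor, one predictand, transformation omitted); the parameter layout is an assumption for illustration, not the BJP implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def predictive_samples(x_new, theta_samples):
    """Monte Carlo approximation of Eq. (1): for each posterior parameter
    draw, sample the predictand from the conditional normal given the
    predictor value x_new (assumed already in the transformed space).
    Each element of theta_samples is (mu1, mu2, s1, s2, rho) -- an assumed
    layout for this single-predictor sketch."""
    out = []
    for mu1, mu2, s1, s2, rho in theta_samples:
        cond_mean = mu2 + rho * s2 / s1 * (x_new - mu1)
        cond_sd = s2 * np.sqrt(1.0 - rho ** 2)
        out.append(rng.normal(cond_mean, cond_sd))
    return np.array(out)
```

Pooling one draw per parameter sample yields a predictive distribution that reflects parameter uncertainty as well as the conditional spread.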
To apply BJP as a post-processing tool we implement three methodologies with different combinations of the predictors.
Method A: method A represents the simplest case, where only the WAPABA prediction is used as the predictor (y(1) in Eq. 1) and the observed streamflow as the predictand (y(2) in Eq. 1). This combination is designed to achieve two post-processing objectives: correction of systematic bias and quantification of uncertainty. The bias correction is achieved through the regression property embedded within the BJP modelling approach (see Wang et al., 2009).
Method B: for method B, we add a second predictor over that used for method A: the streamflow observed one month previously. The inclusion of lagged streamflow observations adds an auto-regressive component to the post-processor and allows prediction updating. This method reduces errors through correction of systematic bias as well as prediction updating, and quantifies uncertainty.
Method C: for method C, we introduce a third predictor, the WAPABA model output simulated in the previous month. This inclusion is to further improve the prediction updating ability of the post-processor by utilizing the persistence in the simulated time series.
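The three predictor sets can be summarised with a small illustrative helper (hypothetical, not part of the BJP software):

```python
import numpy as np

def build_predictors(sim, obs, method):
    """Assemble the predictor matrix for months t >= 1.
    A: current WAPABA simulation only; B: adds last month's observed flow;
    C: additionally adds last month's WAPABA simulation."""
    cols = [sim[1:]]                 # sim_t, the current-month simulation
    if method in ("B", "C"):
        cols.append(obs[:-1])        # obs_{t-1}, lagged observation
    if method == "C":
        cols.append(sim[:-1])        # sim_{t-1}, lagged simulation
    return np.column_stack(cols)
```

Method A thus yields one predictor column, B two, and C three, with the observed flow for month t as the predictand in every case.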

For each method, we first train the post-processor using the historically observed data.
To account for seasonal effects, we establish 12 different models, one for each month of the year. For each month, the post-processed probabilistic predictions are generated using a "leave-one-out" cross validation procedure. This consists of sampling the parameters using all but the year of interest and then generating predictions for the "left out" year.
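The leave-one-out procedure can be sketched generically; here `fit` and `predict` are hypothetical placeholders for the BJP parameter inference and prediction steps.

```python
def leave_one_out(years, fit, predict):
    """Leave-one-out cross validation over years: for each target year,
    infer parameters on all other years, then predict the held-out year.
    `fit` maps a training list to parameters; `predict` maps
    (parameters, year) to a prediction. Both are caller-supplied."""
    preds = {}
    for held_out in years:
        train = [y for y in years if y != held_out]
        theta = fit(train)
        preds[held_out] = predict(theta, held_out)
    return preds
```

Because every prediction is made with the target year excluded from inference, the resulting verification statistics are not inflated by in-sample fitting.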
The cross validation period in most catchments is about 59 yr. Figure 2 is an example of the post-processed predictions generated by the BJP post-processor. This example is to give the reader an appreciation of what the post-processed predictions from the BJP post-processor may look like. A detailed evaluation of the post-processor, with respect to the post-processing qualities, will be presented in Sect. 4. The example is drawn from Lake Eildon in central Victoria and shows the [0.1, 0.25, 0.5, 0.75, 0.9] quantiles and observed streamflow values plotted chronologically. In this case the post-processed predictions do not show any obvious trend with time, and the quantile intervals appear to cover the expected proportion of the observed values.

Results
In this section we assess the quality of the probabilistic predictions generated by using the three methods and evaluate how effective the BJP post-processor is in reducing errors and quantifying uncertainty.

Reduction of error
We assess the ability of the BJP post-processor to reduce errors by using a measure of accuracy called Root Mean Squared Error in Probability (RMSEP; Wang and Robertson, 2011). RMSEP (Eq. 2) measures error in a probability space. An advantage of RMSEP over the more commonly used mean squared error or root mean squared error is that it places equal emphasis on errors at all events rather than on a few large events:

RMSEP = sqrt{ (1/n) Σ_{t=1}^{n} [F_CLI(y_t) − F_CLI(y_t^OBS)]^2 }    (2)

where y_t and y_t^OBS are the predictions and observations at t = 1, 2, . . ., n events, respectively. The predictions can be either WAPABA simulations or the medians of post-processed distributions. F_CLI is the cumulative historical distribution, and F_CLI(y) is the non-exceedance probability.
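A minimal sketch of Eq. (2), with the climatological distribution F_CLI approximated by an empirical CDF (an assumption here; a fitted distribution could equally be used):

```python
import numpy as np

def rmsep(obs, pred, climatology):
    """Root Mean Squared Error in Probability: errors are differences
    between non-exceedance probabilities under the historical
    (climatological) distribution, approximated by an empirical CDF."""
    clim = np.sort(np.asarray(climatology))

    def f_cli(y):
        # Empirical non-exceedance probability of y under climatology.
        return np.searchsorted(clim, y, side="right") / len(clim)

    diffs = f_cli(np.asarray(pred)) - f_cli(np.asarray(obs))
    return np.sqrt(np.mean(diffs ** 2))
```

Because errors are measured in probability space, a fixed-volume miss on a large flood counts no more than the same probability-space miss on a low flow.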

Performance of the WAPABA model
The RMSEP error values of the WAPABA predictions are shown in Fig. 3.

Method A: bias correction
In general, the results show that method A effectively reduces the systematic bias present in the WAPABA predictions. This is manifested as reductions in RMSEP error values over the 18 catchments. The reductions in RMSEP roughly follow the error patterns seen in Fig. 3. In most cases the differences in RMSEP values are either positive or zero, indicating that the post-processor either reduces errors or preserves (does not degrade) the performance of the WAPABA predictions. The highest reductions in RMSEP values occur in Lake Eildon and Goulburn Weir in central Victoria.

Method B: prediction updating
Figure 4b shows the benefit of prediction updating by assimilating the recent streamflow observations (method B). We use the difference between methods A and B (method A − method B) to indicate any further reductions in errors achieved by prediction updating. As in the previous case, blue indicates reductions in errors and red indicates increases.
The figure shows further reductions in RMSEP values after bias correction (method A). The reductions occur in most of the catchments. The reductions in errors are governed by whether errors remain after bias correction and by the persistence in the streamflow observation data. For example, the WAPABA predictions in River Cape of Queensland and Lake Nillahcootie of central Victoria (Fig. 3) show substantially large error values in the initial few months even after bias correction, but these cannot be corrected due to the lack of persistence in the errors. In the upper Murray region, central Victoria and southern Victoria, reductions occur in most of the catchments, and in some catchments (such as Cairn Curran Reservoir) the reduction is greater than that achieved through bias correction. In Tasmanian catchments, the reductions are negligible.


Quantification of uncertainty
The post-processor should be able to quantify the uncertainty in predictions. As a measure of the ability to quantify uncertainty, we assess whether the probabilistic predictions generated by the post-processor are reliable and robust. We assess the predictions generated using all three methods in the 18 catchments, but present results for Lake Eildon using method B as a general representation.

Assessment of reliability
Reliability refers to "statistical consistency" of the predictive probability distributions with the observed frequency of the events (Toth et al., 2003; Robertson et al., 2012).
In this study we use PIT (probability integral transform) uniform probability plots (Wang et al., 2009; Wang and Robertson, 2011) to assess the overall reliability of the post-processed predictive distributions. We choose PIT uniform probability plots over other methods because they are more suited to smaller sample sizes (Wang et al., 2009). The PIT of the observed value is given as π_t = F_t(y_t^OBS), where F_t(y_t^OBS) is the non-exceedance probability of the observed streamflow in the predictive distribution. The predictive distributions are said to be reliable if the PIT values are distributed uniformly. To check uniformity, we plot the PIT values corresponding to each event in a uniform probability plot (Wang et al., 2009; Wang and Robertson, 2011). A close alignment of the values to the 1:1 line indicates uniformity and therefore reliable distributions. Deviations from the 1:1 line indicate whether the predictive distributions are too low or too high, or whether the uncertainty spreads are too wide or narrow. Details on how to interpret the PIT plots can be found in Thyer et al. (2009), Wang et al. (2009) and Wang and Robertson (2011).
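A sketch of computing PIT values from predictive ensembles and checking them against an approximate Kolmogorov 5 % band is given below. The 1.358/sqrt(n) half-width is the standard asymptotic critical value; plotting is omitted.

```python
import numpy as np

def pit_values(obs, ensembles):
    """PIT of each observation in its predictive distribution, using the
    empirical CDF of each predictive ensemble: pi_t = F_t(y_t_obs)."""
    pits = []
    for y, ens in zip(obs, ensembles):
        ens = np.sort(np.asarray(ens))
        pits.append(np.searchsorted(ens, y, side="right") / len(ens))
    return np.array(pits)

def kolmogorov_band_check(pits, alpha_const=1.358):
    """True if sorted PIT values stay within an approximate Kolmogorov
    5 % band around the 1:1 line (half-width 1.358/sqrt(n))."""
    n = len(pits)
    expected = (np.arange(1, n + 1) - 0.5) / n
    half_width = alpha_const / np.sqrt(n)
    return np.all(np.abs(np.sort(pits) - expected) <= half_width)
```

PIT values that hug the 1:1 line and stay inside the band indicate reliable distributions; systematic departures indicate bias or miscalibrated spread.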
Figure 5 shows the PIT uniform probability plots of the post-processed predictions generated for the months of February and July in Lake Eildon. The dotted inclined lines depict the Kolmogorov 5 % significance band. The PIT values in the plots align quite uniformly along the diagonal 1:1 line (solid inclined line) and are well within the significance band. This suggests that the post-processed predictive distributions are overall reliable and that the uncertainty intervals are of appropriate width (not too wide or narrow). The result is similar for all the months in Lake Eildon (figures not included).

Assessment of robustness
Robustness refers to "conditional reliability" of the predictive distributions over time and event size. To measure the robustness of the predictive distributions over time, we plot the PIT values chronologically and analyse the plot for the presence of any trends or patterns. The distributions are robust (over time) if the PIT values are distributed uniformly. Any existing trends or patterns indicate the presence of systematic errors in the distributions (Wang et al., 2009; Wang and Robertson, 2011).
Figure 6 (top row) shows the PIT values plotted chronologically for February and July. The PIT values tend to be distributed randomly against time, devoid of any trends or patterns, indicating that the distributions are robust. In fact, this was the case for all the months in Lake Eildon (figure not included).
To measure robustness of the post-processed predictions against flow magnitudes, we plot the post-processed prediction quantiles and the observed streamflow values against the medians of the predictions. As in the previous case, we analyse the plot to detect the presence of any trends or patterns. Figure 6 (bottom row) shows the post-processed quantiles plotted against event magnitude.
The figure shows that the quantiles increase with event size and that the medians are consistent with the observed flows. The observed flows are scattered randomly around the medians, suggesting that the post-processed quantiles are robust with respect to event magnitude. The plots also show that the widths of the uncertainty intervals are of appropriate spread for all event sizes.
This verification approach is applied to all post-processing methods (A, B and C), for all the catchments, for each month. In general, the results are consistent with those obtained in Lake Eildon.

Discussion
The results show that large bias can occur in predictions despite calibrating WAPABA using a multi-criteria objective function that includes a symmetric measure of bias. This is not surprising, because maximization of the scalarized function is the result of a compromise between four objective functions and does not necessarily lead to removal of systematic bias in all catchments and in all months. The presence of bias is especially high in Lake Eildon. The BJP post-processor eliminates bias in the predictions effectively, resulting in bias close to zero throughout the year. This can be better appreciated in Fig. 7, which shows the monthly percentage bias of the WAPABA predictions and its elimination by method A.
Furthermore, it is interesting to note that the bias correction is not just due to linear changes in slope or intercept but also due to non-linear changes (illustrated in Fig. 8a and b). Our results show that further error reductions are possible through prediction updating. This contradicts the assumption made by Li et al. (2011), who assume that persistence in the error structure at the monthly time step is negligible. However, we note that the results tend to be catchment specific. In our case the improvements are mostly seen in catchments that have a substantial streamflow contribution from slow responding mechanisms (resulting in longer memory) in the catchment. This seems to be the case in the upper Murray, central Victoria and southern catchments, where significant reductions in errors can be observed. The two catchments in Tasmania and the ones in Queensland and central Victoria have shorter catchment "memory", with the streamflow being dominated by fast responding runoff processes, and therefore the benefits of prediction updating are negligible.
We acknowledge that rainfall forecast uncertainty represents a major source of uncertainty in streamflow forecasts (Krzysztofowicz, 1999; Kuczera et al., 2006). In this study, however, we run the water balance model in simulation mode. Therefore the total uncertainty quantified by the post-processor is the "lumped" combination of the hydrologic uncertainty, the rainfall measurement uncertainty, the streamflow measurement uncertainty and the uncertainty in inferring the values of the parameters of the BJP post-processor. However, the post-processor is equally applicable in real-world applications using rainfall forecasts. In such cases the uncertainty spread quantified by the post-processor will be wider, to reflect the uncertainty in forecasting rainfall.

Summary and conclusions
In this study, we present a statistical post-processor capable of reducing errors and quantifying uncertainty in monthly streamflow predictions. The statistical post-processor is based on the BJP modelling approach (Wang et al., 2009). The BJP post-processor is applied to 18 catchments in Australia, and its ability to reduce errors, through reduction of systematic bias and prediction updating, and to quantify uncertainty in the monthly streamflow predictions is assessed.

The study shows that the BJP post-processor is capable of improving the accuracy of the streamflow predictions by reducing systematic bias in most of the catchments. In many cases the reduction of bias is achieved by means of a non-linear relationship between model predictions and the observed streamflow values. The post-processor also demonstrates a useful property of preserving the accuracy of predictions (it does not increase error) when bias correction is not possible.
Prediction updating through the assimilation of recent streamflows by the post-processor results in further reductions in RMSEP error values over those achieved by bias correction alone, and is most effective for catchments showing stronger persistence in the prediction errors. The benefits of prediction updating using additional information from the water balance model simulation appear to be very marginal and do not justify the added complexity of introducing another predictor to the post-processor.
The BJP post-processor is capable of generating probabilistic predictions that are overall reliable. The uncertainty quantified by the post-processor is of appropriate spread. The post-processed predictive distributions are robust with respect to time and event magnitude.

Figure 4a shows the differences in RMSEP error values between the WAPABA predictions and those produced by method A (WAPABA prediction − method A). The values are colour coded, with blue indicating reductions in RMSEP error values and red indicating increases.
Figure 4c shows the additional benefit achieved by assimilating the "lagged" streamflow simulation. The difference is measured relative to method B (method B − method C), such that positive (blue) values indicate further reductions in RMSEP error values over those achieved by B. The result shows that the benefit of adding the lag-1 WAPABA streamflow tends to be negligible in most catchments and seasons. Although some reductions in RMSEP error values can be observed in Maroondah Reservoir (in southern Victoria) for the months of February, March and May, in other catchments the difference is close to zero.
Fig. 8a and b demonstrate the non-linear compensations to the WAPABA predictions by the BJP post-processor. The log-sinh transformation, in combination with the BJP model parameter inference, allows for non-linear corrections of errors, thus allowing for corrections of conditional as well as unconditional biases.

Figure 1: Location of the 18 catchments used for the study.

Figure 2: The time series of the post-processed prediction quantiles against the observed values; only a subset of the entire cross validation period is shown. Light blue lines represent the 0.1–0.9 quantiles, dark blue lines represent the 0.25–0.75 quantiles, red dots are the observed streamflow values, and the blue dots are the medians of the post-processed predictive distributions.