Improving statistical forecasts of seasonal streamflows using hydrological model output

Abstract. Statistical methods traditionally applied for seasonal streamflow forecasting use predictors that represent the initial catchment condition and future climate influences on future streamflows. Observations of antecedent streamflows or rainfall commonly used to represent the initial catchment conditions are surrogates for the true source of predictability and can potentially have limitations. This study investigates a hybrid seasonal forecasting system that uses the simulations from a dynamic hydrological model as a predictor to represent the initial catchment condition in a statistical seasonal forecasting method. We compare the skill and reliability of forecasts made using the hybrid forecasting approach to those made using the existing operational practice of the Australian Bureau of Meteorology for 21 catchments in eastern Australia. We investigate the reasons for differences. In general, the hybrid forecasting system produces forecasts that are more skilful than the existing operational practice and as reliable. The greatest increases in forecast skill tend to be (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall, (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow, and (3) when the initial catchment condition is near saturation intermittently throughout the historical record.


Introduction
Forecasts of streamflows for a range of forecast periods and lead times are valuable to many users, including emergency services, hydroelectricity generators, irrigators, rural and urban water supply authorities and environmental managers.
Forecasts of seasonal streamflows can inform tactical management of water resources, allowing water users and managers to plan operational water management decisions and assess the risks of alternative water use and management strategies. To be useful to water users and managers in assessing risks, seasonal streamflow forecasts need to be accurate and reliably quantify forecast uncertainty.
Statistical methods are commonly used for operational seasonal streamflow forecasting around the world, due to their robustness and ability to reliably quantify forecast uncertainty (Plummer et al., 2009; Robertson and Wang, 2012; Garen, 1992; Pagano et al., 2009). Statistical streamflow forecasting methods use predictors that describe the two sources of seasonal streamflow predictability, the initial catchment condition and future climate influences (Robertson and Wang, 2012; Rosenberg et al., 2011). Climate indices, such as the Southern Oscillation Index or Indian Ocean Dipole Mode Index, are commonly used to represent the influence of future climate on streamflows (Robertson and Wang, 2012). The initial catchment condition is represented by observations of antecedent streamflow, antecedent rainfall or, in cold climates, snow water equivalent, depth or extent (Robertson and Wang, 2012; Garen, 1992). In all cases, the predictors used are simple indices that act as surrogates for the true source of predictability in a statistical model.
Antecedent streamflow or rainfall totals can be crude surrogate indicators of the initial catchment condition. Robertson and Wang (2012) found that a single predictor, selected from a pool of candidates that included antecedent streamflow and rainfall totals for up to the preceding three months, was sufficient to characterise the initial catchment conditions in the majority of locations and seasons. However, under some circumstances a second predictor could add additional independent information on the initial catchment condition. They concluded that more refined indicators of the initial catchment conditions that better represent catchment dynamics could improve forecast skill.
D. E. Robertson et al.: Improving statistical forecasts of seasonal streamflows. Published by Copernicus Publications on behalf of the European Geosciences Union.
Antecedent streamflow or rainfall totals are limited in their ability to provide a refined index describing initial catchment conditions for several reasons. Conceptually, catchment soil moisture and groundwater storages have upper and lower bounds. When these storages are full, streamflows (and rainfall) can continue to increase to levels beyond those that reflect catchment moisture storage. Therefore, when observed antecedent streamflow is very high, subsequent streamflow forecasts may be considerably higher than the actual soil moisture or groundwater storage levels would warrant. The dynamics of rainfall-runoff processes can also lead to antecedent streamflow or rainfall being a poor indicator of the initial catchment condition. When a catchment is wetting up, antecedent streamflows do not immediately respond to antecedent rainfall, because soil moisture and groundwater storages are replenished first. In this circumstance, antecedent streamflows can underestimate the actual soil moisture conditions and lead to forecasts that are too low.
Another limitation of using antecedent streamflow and rainfall totals as indicators of the initial catchment condition arises because the performance of forecasts made using a particular indicator as a predictor varies considerably in space and time (Robertson and Wang, 2012). Therefore, it is necessary to choose which indicator to use for any location or season. Any method of choosing predictors that is based on the predictive performance of candidate predictors has the potential to introduce artificial skill, where the skill of retrospective forecasts is higher than the skill realised in real-time applications (Robertson and Wang, 2012; Michaelsen, 1987; DelSole and Shukla, 2009). Introduction of artificial skill can be prevented by choosing a single set of predictors a priori for all locations and seasons, and therefore it is desirable to eliminate predictor selection processes. However, the challenge is to identify a set of predictors that can perform as well as or better than selected predictors.
The use of dynamic hydrological models for seasonal streamflow forecasting has been investigated and adopted to overcome some of the limitations of statistical forecasting techniques (for example, Bierkens and van Beek, 2009; Koster et al., 2010; Wood and Schaake, 2008). Hydrological models describe the processes by which precipitation is converted into streamflow and in doing so explicitly represent catchment soil moisture and groundwater storages as state variables. Therefore, hydrological models can capture catchment dynamics that the simple indices used in statistical models cannot. When used in forecasting mode, the condition of model state variables is initialised by running the model using observed forcing data up to the forecast date. A streamflow forecast is then produced by forcing the model with forecasts of rainfall and other forcing variables. Future rainfall is highly uncertain and difficult to accurately forecast, and therefore several sources of future rainfall have been investigated, including conditional and unconditional historical climate sequences and output from seasonal climate forecasting models (Bierkens and van Beek, 2009; Wood et al., 2005). While these forecasts are derived from understanding of the hydrological processes occurring in the catchment, in many instances the direct forecasts from hydrological models are biased and do not reliably quantify forecast uncertainty (Shi et al., 2008; Wood and Schaake, 2008).
Both statistical and dynamical streamflow forecasting methods appear to have strengths and weaknesses. Recently, Rosenberg et al. (2011) investigated the benefits of a hybrid seasonal forecasting system that uses the output from a physically based hydrological model as predictors in a statistical forecasting method in a climate where snow melt is the dominant source of streamflow. They showed that the skill of seasonal streamflow forecasts could be enhanced by using simulations of snow water equivalent instead of observations as predictors. The skill improvements were attributed to the simulations capturing the spatial and temporal variation in snow water equivalent better than the few sites that provide ground-based observations. This paper also investigates a hybrid seasonal forecasting system, but in contrast to Rosenberg et al. (2011) we consider the problem in environments where snow melt is not an important source of streamflow. We investigate how the output of a dynamic hydrological model can be used to improve the representation of initial catchment conditions for statistical streamflow forecasting and reduce artificial skill. We produce forecasts of three month streamflow totals with the Bayesian joint probability (BJP) modelling approach (Wang and Robertson, 2011; Wang et al., 2009) using two alternative sets of predictors to represent initial catchment conditions. The first set of predictors represents the operational practice of the Bureau of Meteorology in Australia, where the predictor with the highest Pseudo Bayes factor is selected from a pool of candidates comprising antecedent streamflow and rainfall totals for up to the preceding three months (Robertson and Wang, 2012). The second set of predictors is defined a priori and uses simulations from a hydrological model that represent only the influence of the initial catchment condition on streamflows for the forecast period. We compare the skill and reliability of these forecasts for 21 catchments in eastern Australia and discuss the mechanisms by which the forecast performance is improved.

Hydrological modelling
For this study, we use a hydrological model to produce simulations that represent only the influence of initial catchment conditions on seasonal streamflows. Hydrological modelling is undertaken using WAPABA, a monthly water partition and balance model with two conceptual storages and five model parameters. WAPABA uses consumption curves to partition water according to supply and demand, which allow for spatial and temporal heterogeneity of catchment processes. WAPABA has been shown to out-perform other monthly models in Australia and simulate monthly streamflow volumes as well as daily models forced with daily data (Wang et al., 2011).
Hydrol. Earth Syst. Sci., 17, 579-593, 2013, www.hydrol-earth-syst-sci.net/17/579/2013/
The WAPABA model parameters are calibrated by maximising a multi-objective function modified from Zhang et al. (2008). The model fit is evaluated using a uniformly weighted average of the Nash-Sutcliffe efficiency coefficient (Nash and Sutcliffe, 1970), the Nash-Sutcliffe efficiency of the log transformed flows, the Pearson correlation coefficient and a symmetric measure of bias. Model calibration is performed using the Shuffled Complex Evolution algorithm (Duan et al., 1994).
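A minimal sketch of such a uniformly weighted objective is given below. The text does not define the symmetric bias measure or how zero flows are handled in the log transform, so the ratio `min(total)/max(total)` and the small offset `eps` are assumptions made purely for illustration, as is the function name `calibration_objective`.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def calibration_objective(obs, sim, eps=1e-6):
    """Uniformly weighted average of four fit measures (1.0 is a perfect fit)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    e_flow = nse(obs, sim)
    # eps guards against log(0); this offset is an assumption, not from the paper
    e_log = nse(np.log(obs + eps), np.log(sim + eps))
    r = np.corrcoef(obs, sim)[0, 1]
    # ASSUMED symmetric bias measure: equals 1 when totals match, and is the
    # same whether the model over- or under-estimates by a given factor
    bias = min(sim.sum(), obs.sum()) / max(sim.sum(), obs.sum())
    return (e_flow + e_log + r + bias) / 4.0
```

An optimiser such as Shuffled Complex Evolution would then search the five model parameters for the set that maximises this value.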
Using calibrated model parameters, simulations are produced that represent only the influence of the initial catchment conditions on streamflow totals of the next three months. For a given date of interest, these simulations are obtained by running the model from the start of the historical record to the date of interest using observed forcing data, to initialise the model state variables, and then simulating streamflows for the subsequent three months using monthly climatology mean forcing data. A time series of these simulations of three month streamflow totals was produced by repeating the process for all months in the historical record. Using this approach, variation in the simulated three month streamflow totals for a given month is solely due to differences in the initial conditions of the soil moisture and groundwater storages and not related to variation in the climate forcing. Alternatives to using the monthly climatology mean forcing data were investigated, such as the climatology median forcing data and the mean and median of streamflow ensembles produced using all historically observed forcing data, but these led to final results that are no different to using the climatology mean forcing data.
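The procedure can be sketched as follows. WAPABA's equations are not given here, so `toy_step` is a hypothetical two-store bucket model standing in for it, with made-up parameters; only the structure of `initial_condition_simulation` (warm up with observed forcing, then run three months with climatology mean forcing) reflects the text. For simplicity the climatology is computed from the full record, which needs at least one value for every calendar month.

```python
import numpy as np


def toy_step(soil, gw, rain, pet, soil_cap=150.0, recharge_frac=0.4, k_gw=0.3):
    """One monthly step of a hypothetical two-store bucket model (NOT WAPABA)."""
    soil = soil + rain
    et = min(pet, soil)                 # evapotranspiration limited by water held
    soil -= et
    excess = max(soil - soil_cap, 0.0)  # water the soil store cannot hold
    soil -= excess
    recharge = recharge_frac * excess   # part of the excess recharges groundwater
    gw += recharge
    baseflow = k_gw * gw                # linear groundwater store
    gw -= baseflow
    return soil, gw, (excess - recharge) + baseflow


def initial_condition_simulation(rain, pet, first_month, t0, horizon=3):
    """Streamflow total for steps t0..t0+horizon-1 under climatology forcing.

    rain, pet: contiguous monthly series; first_month: calendar month (1-12)
    of index 0; t0: index of the first forecast month.
    """
    rain = np.asarray(rain, dtype=float)
    pet = np.asarray(pet, dtype=float)
    months = (first_month - 1 + np.arange(len(rain))) % 12 + 1
    clim_rain = {m: rain[months == m].mean() for m in range(1, 13)}
    clim_pet = {m: pet[months == m].mean() for m in range(1, 13)}

    # Initialise the state variables with observed forcing up to the forecast date
    soil, gw = 0.0, 0.0
    for t in range(t0):
        soil, gw, _ = toy_step(soil, gw, rain[t], pet[t])

    # Simulate the forecast period with monthly climatology mean forcing
    total = 0.0
    for h in range(horizon):
        m = (first_month - 1 + t0 + h) % 12 + 1
        soil, gw, flow = toy_step(soil, gw, clim_rain[m], clim_pet[m])
        total += flow
    return total
```

Calling the function for every `t0` in the record yields the predictor time series; any difference between two values for the same calendar month comes only from the initialised stores.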

Statistical streamflow forecasting
We use the Bayesian joint probability (BJP) modelling approach (Wang and Robertson, 2011; Wang et al., 2009) to produce joint forecasts of three month streamflow and rainfall totals. The BJP modelling approach assumes the joint distribution of forecast variables and their predictors is described by a transformed multivariate normal distribution. A Yeo-Johnson transformation is used for variables defined over the entire real space, while a log-sinh transformation (Wang et al., 2012) is used for variables that are defined for real values greater than or equal to zero, for example streamflows or rainfall. Model parameters, including transformation parameters and reparameterisations of the means, variances and correlation coefficients of the multivariate normal distribution, are inferred using Bayesian methods.
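As an illustration of the transformation applied to non-negative variables, the sketch below assumes the commonly cited form of the log-sinh transformation, z = (1/b) ln(sinh(a + b y)) with a > 0 and b > 0. The function names and parameter values are illustrative only, and the direct use of `sinh` will overflow for very large a + b y.

```python
import numpy as np


def log_sinh(y, a, b):
    """Assumed log-sinh form: z = ln(sinh(a + b*y)) / b, for y >= 0, a > 0, b > 0."""
    y = np.asarray(y, dtype=float)
    return np.log(np.sinh(a + b * y)) / b


def inv_log_sinh(z, a, b):
    """Back-transform from the normalised space to the original variable."""
    z = np.asarray(z, dtype=float)
    return (np.arcsinh(np.exp(b * z)) - a) / b
```

In the BJP approach the parameters a and b would be inferred together with the other model parameters rather than fixed in advance.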
In this study, we primarily compare statistical streamflow forecasts made using two sets of predictors. The first set of predictors represents the existing operational practice of the Bureau of Meteorology in Australia. Predictors representing initial catchment conditions and future climate influences on streamflows are selected separately using the procedure described by Robertson and Wang (2012). The performance of a range of candidate predictors is assessed using the Pseudo Bayes factor (PsBF), a Bayes factor based on the cross-validation predictive density. The candidate predictor with the highest PsBF is selected, provided that the highest PsBF value is greater than a threshold value which can be produced using randomised predictor data. Imposing the threshold on the predictor selection reduces the likelihood of choosing a predictor due to chance features in the historical data. Predictors representing the initial catchment condition are chosen from a pool that includes monthly antecedent streamflow and rainfall totals for up to the preceding three months and these are selected on their ability to forecast three month streamflow totals. Predictors representing climate during the forecast period are selected from a pool of 13 monthly climate indices lagged by up to three months and these are selected on their ability to forecast three month rainfall totals. At most two predictors are selected, one to represent the initial catchment condition and one to represent the climate during the forecast period. Forecasts of three month totals of streamflow and catchment average rainfall are made jointly. Separate models are established for each season and location to allow for inter-annual variations in climate and hydrological processes.
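The selection rule can be sketched as below. The sketch assumes the PsBF is evaluated on a log scale as the difference between summed log cross-validation predictive densities of a candidate model and of a reference model without the predictor; the choice of reference, the function names and the candidate labels are all illustrative assumptions, and the full procedure is the one described by Robertson and Wang (2012).

```python
import numpy as np


def log_psbf(cv_density_model, cv_density_ref):
    """Log pseudo Bayes factor from cross-validation predictive densities.

    Each input holds, for every left-out event, the predictive density of the
    observation under the candidate model and under the (assumed) reference.
    """
    cv_density_model = np.asarray(cv_density_model, dtype=float)
    cv_density_ref = np.asarray(cv_density_ref, dtype=float)
    return float(np.sum(np.log(cv_density_model)) - np.sum(np.log(cv_density_ref)))


def select_predictor(log_psbf_by_candidate, threshold):
    """Return the best candidate, or None if it does not beat the threshold."""
    best = max(log_psbf_by_candidate, key=log_psbf_by_candidate.get)
    return best if log_psbf_by_candidate[best] > threshold else None
```

Returning `None` corresponds to the case where no candidate exceeds the threshold obtained from randomised predictor data, so no predictor is used.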
The second set of predictors used to make forecasts for this study replaces the selected predictors representing the initial catchment condition with a fixed set of the WAPABA simulations described in the previous section and total streamflow for the month preceding the forecast (lag-1 streamflow). The previous month's streamflow is included as a form of model updating to provide a real-time measure of the 'true' condition of the catchment leading up to the forecast. The selected predictors representing climate during the forecast period are the same as in the first set of predictors.

Cross validation for assessment of forecast performance
The hydrological and statistical modelling processes described in the preceding sections require observations to infer model parameters. For real-time forecasting applications, all available historical data of the appropriate quality may be used to infer model parameters. However, if these model parameters are used to assess the performance of the forecasting methods for historical events, the forecast skill and reliability will be inflated. Therefore, it is necessary to assess forecast performance on data that has not been used for parameter inference and predictor selection. Traditionally, the skill of statistical forecasting models is assessed using leave-one-out cross validation and this provides a realistic assessment of performance because the temporal sequence of data records is not preserved in model parameter inference. However, in this study we are also using a hydrological model which preserves and uses the temporal sequence of data records in model parameter inference, due to the presence of state variables in the model which carry information from one time step to the next. Therefore, forecast performance measures assessed using leave-one-out cross validation may be artificially inflated, because forecasts may not be independent of the data used for parameter inference. To limit this inflation of forecast performance measures, we adopt a leave-one-plus-x-years-out cross validation approach. Ideally, the value of x is as small as possible, so that the length of data used to infer model parameters reflects operational conditions, while being sufficiently large to minimise any artificial inflation of forecast performance measures. For this study, we adopt leave-one-plus-four-years-out cross validation to assess forecast performance.
To make a cross-validation forecast for a year of interest, model parameter inferences were based on all historical data with the exception of the forecast year of interest and the four subsequent years. Hydrological model parameters were obtained by running the model for the entire record using all available forcing data, but omitting the observed streamflows for the year of interest and four subsequent years in the evaluation of the objective function. Simulations representing the initial catchment condition were produced for all years in the historical record and used in the statistical model to produce a forecast for the year of interest.
The selected predictors used in the statistical models were also cross-validated. The predictors for the year of interest were selected using the PsBF computed using all historical forecasts, except the year of interest and the four subsequent years. The selected predictors representing initial catchment conditions used to produce forecasts for each location, season and year are summarised in the Supplement. Once model predictors and parameters were obtained, forecasts were made for the year of interest only and the process was repeated for all years in the historical record.
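The data split used for each cross-validation forecast can be sketched as follows; `training_years` is an illustrative helper name, and the 1950 to 2008 record is used only as an example.

```python
def training_years(all_years, target_year, x=4):
    """Years kept for inference under leave-one-plus-x-years-out cross validation.

    The target year and the x subsequent years are withheld; every other year
    in the record is used for parameter inference and predictor selection.
    """
    withheld = set(range(target_year, target_year + x + 1))
    return [year for year in all_years if year not in withheld]


record = list(range(1950, 2009))
kept = training_years(record, 1980)   # withholds 1980 to 1984
```

Near the end of the record fewer than five years are withheld, because some of the subsequent years fall outside the available data.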

Forecast performance measures
There are many ways to assess the performance of streamflow forecasts. We assess the skill and reliability of the cross validation forecasts. Forecast skill is a measure of the quality of a set of forecasts relative to a baseline or reference set of forecasts (Jolliffe and Stephenson, 2003). We use skill scores that assess the percentage reduction in forecast error scores relative to the error scores of a reference forecast. This means that forecasts with a positive skill score are better than the reference, while forecasts with a negative skill score have greater errors than the reference. For this study we assess forecast error using two scores: the root mean squared error in probability of the forecast median (RMSEP) (Wang and Robertson, 2011) and the continuous ranked probability score (CRPS). The reference forecasts used to compute the skill scores are the cross-validation distribution of historically observed (climatology) streamflows. The two skill scores adopted assess different aspects of the forecast distribution. The CRPS skill score assesses the reduction in error of the whole forecast probability distribution, and can be sensitive to a few forecasts with large errors. The RMSEP skill score is less sensitive to forecasts with large errors, provided the anomaly is in the correct direction, and only considers the median of the forecast distribution.
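The scores can be sketched as below for forecasts represented by ensembles. The CRPS uses the standard sample-based estimator. The RMSEP is written under the assumption that forecast medians and observations are both mapped to non-exceedance probabilities through an empirical climatology distribution before the error is computed; the exact definition is given by Wang and Robertson (2011), and all function names here are illustrative.

```python
import numpy as np


def crps_ensemble(ensemble, obs):
    """Sample-based CRPS of one ensemble forecast against one observation."""
    ens = np.asarray(ensemble, dtype=float)
    spread = np.mean(np.abs(ens[:, None] - ens[None, :]))
    return float(np.mean(np.abs(ens - obs)) - 0.5 * spread)


def rmsep(forecast_medians, observations, climatology):
    """Root mean squared error in probability (ASSUMED empirical-CDF form)."""
    clim = np.sort(np.asarray(climatology, dtype=float))

    def cdf(x):
        # fraction of climatology values less than or equal to x
        return np.searchsorted(clim, x, side="right") / clim.size

    diff = cdf(np.asarray(forecast_medians, dtype=float)) - cdf(
        np.asarray(observations, dtype=float)
    )
    return float(np.sqrt(np.mean(diff ** 2)))


def skill_score(error, error_reference):
    """Percentage reduction in error relative to the reference forecast."""
    return 100.0 * (error_reference - error) / error_reference
```

A skill score of 0 % means the forecasts are no better than climatology, and 100 % would correspond to zero forecast error.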
Forecast reliability measures assess the statistical consistency of the forecast probability distributions and the observed frequency of associated events (Toth et al., 2003). For this study we use histograms of probability integral transforms (PIT) to assess the average reliability of the forecast probability distributions for all locations and seasons.
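A minimal sketch of the PIT calculation for ensemble forecasts is given below. It approximates each forecast distribution by its ensemble members, which is an assumption of this example, and it does not include any special treatment of zero flows.

```python
import numpy as np


def pit_values(ensembles, observations):
    """PIT of each observation: the forecast CDF evaluated at the observed value.

    ensembles: array of shape (n_forecasts, n_members)
    observations: array of shape (n_forecasts,)
    """
    ens = np.asarray(ensembles, dtype=float)
    obs = np.asarray(observations, dtype=float)
    return np.mean(ens <= obs[:, None], axis=1)


def pit_histogram(pit, bins=10):
    """Counts of PIT values in equal-width bins on [0, 1]."""
    counts, _ = np.histogram(pit, bins=bins, range=(0.0, 1.0))
    return counts
```

For reliable forecasts the counts should be roughly equal across bins, apart from sampling variability.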

Catchments and data
In this study, we investigate the performance of forecasts made using the two different sets of predictors representing the initial catchment condition for 21 catchments in eastern Australia that experience a range of climatic and hydrological conditions (Figs. 1, 2 and Table 1). We use the observed monthly streamflow data obtained from various water resource management agencies and the Bureau of Meteorology. For most catchments, with the exception of some in Queensland and Victoria, the data are available from 1950 to 2008 (see Table 1). The monthly catchment average rainfall and potential evapotranspiration for each catchment are calculated from 5 km gridded data available from the Australian Water Availability Project (Jones et al., 2009). The monthly values of the 13 climate indices are obtained from the Bureau of Meteorology and described in Appendix A.

Forecast skill improvements
Forecasts made using WAPABA simulations and lag-1 streamflow as predictors to represent initial catchment conditions generally have greater skill than forecasts made using selected predictors (Fig. 3). The average forecast skill score for forecasts made using the WAPABA simulations and lag-1 streamflow as predictors is 2.7 % greater for both the RMSEP and CRPS skill scores than for forecasts made using the selected predictors. While this average improvement appears to be small, the range of changes in both skill scores extends from −10 % to +25 %. The greatest increases and decreases in forecast skill tend to occur for those locations and seasons where the skill of forecasts made using selected predictors is less than 10 %.
Figure 4 presents the increase in forecast skill that is achieved by replacing the selected predictors by the WAPABA simulations and lag-1 streamflow, arranged by catchment and season. Increases in both skill scores are most pronounced in the Queensland catchments (at the top of Fig. 4) and for the MJJ, JJA, NDJ and DJF seasons in the central Victorian and Upper Murray catchments. Decreases in forecast skill are most evident for the MAM and AMJ seasons in central Victorian and Upper Murray catchments. The seasons where there is the greatest increase in skill tend to be those that cover the steepest rise or fall of the annual hydrograph (see Fig. 2), which suggests that the selected predictors are unable to adequately capture the inter-annual variations in the dynamics of catchments wetting and drying.
Figure 5 presents the skill scores of cross validation forecasts by catchment and season. There is a distinct seasonal pattern in forecast skill, particularly for the catchments in the Upper Murray, central and southern Victoria. The highest skill forecasts tend to be for seasons that cover the period when the annual hydrograph is falling, while the lowest skill forecasts tend to be for seasons that cover the period when the annual hydrograph is rising. The Tasmanian catchments have low forecast skill year round because these catchments tend to remain near saturation all year, streamflows are strongly related to concurrent rainfall, and seasonal rainfall is difficult to forecast. The Queensland catchments are in tropical and sub-tropical environments with pronounced wet and dry seasons. Forecast skill tends to be highest for the dry seasons and lowest for seasons covering the wettest months (November to March). Skill also tends to be low for seasons and catchments where rivers frequently cease to flow. In these circumstances the forecast error, particularly the RMSEP error, of a climatology forecast is small and therefore it is difficult to further reduce forecast error.

Forecast reliability
Replacing the selected predictors representing initial catchment conditions with WAPABA simulations and lag-1 streamflow produced little change in the reliability of streamflow forecasts. Figure 6 presents histograms of the PIT values for forecasts made using both sets of predictors. The differences between the two histograms are small and the general pattern of the histograms is similar. Perfectly reliable forecasts will produce a PIT histogram that is a uniform distribution. Figure 6 suggests that when viewed collectively the forecasts are not necessarily reliable, with the most obvious deviations from uniformity occurring in the highest and lowest bins of the histogram. However, when the reliability is assessed for each season and catchment separately, deviations from uniformity are within the range expected from sampling variability.

Reasons for improvements in forecast skill
The improvements in forecast skill occur under three main sets of conditions: (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall; (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow; and (3) when the initial catchment condition is near saturation intermittently throughout the historical record. Here we examine some examples of how replacing the selected predictors with WAPABA simulations and lag-1 streamflows influences cross-validation forecasts and improves forecast skill.

When the catchment is wetting up
The selected predictors are either antecedent streamflow or rainfall totals for the previous month. When the catchment is wetting up, antecedent streamflows are primarily base flow and have not necessarily responded to antecedent rainfall. Therefore, antecedent streamflow does not necessarily represent the wetness of the catchment well. Where antecedent rainfall totals have been insufficient to saturate the catchment, they will primarily reflect the surface moisture conditions of the catchment. Figure 7 provides an example of this situation for March-April-May forecasts of Kiewa River inflows to the Murray River. For this example, replacing the selected predictor representing initial catchment conditions with WAPABA simulations and lag-1 streamflow increases the RMSEP skill score from 9 % to 18 % and the CRPS skill score from 1 % to 10 %. The selected predictor representing initial catchment conditions is predominantly total streamflow for January and February, which will provide an indication of base flow conditions. Overall, the forecast quantile ranges for a given forecast median are similar using both sets of predictors; however, forecast medians are rearranged. By replacing the selected predictors with WAPABA simulations and lag-1 streamflow the forecast error is reduced, particularly for forecasts associated with observations in the upper and lower quartiles of the historical distribution (light grey shade in Fig. 6). The primary reason for the difference between the forecasts produced using the two sets of predictors is that the WAPABA simulations are more strongly correlated to streamflows during the forecast period than any of the candidate predictors representing initial catchment conditions used for predictor selection. The correlation between the WAPABA simulations and streamflows during the forecast period is also better preserved for independent cross validation forecasts. This suggests that the WAPABA simulations provide a better representation of the process of catchment wetting up than any of the candidates.

When the catchment is drying out
When the catchment is drying out, antecedent streamflows may be dominated by direct surface runoff if there has been recent rain, or by base flows if there has not been. Total monthly streamflows of similar magnitude can be produced by both sources and therefore antecedent streamflow may not necessarily provide the best indicator of the wetness of a catchment. Figure 8 provides an example of this situation for November-December-January forecasts of inflows into Dartmouth Reservoir. For this example, replacing the selected predictor representing initial catchment conditions with WAPABA simulations and lag-1 streamflow increases the RMSEP skill score from 19 % to 31 % and the CRPS skill score from 17 % to 28 %. The selected predictor representing initial catchment conditions is predominantly total streamflow for September and October. As in the previous example, when the catchment is wetting up, the forecast quantile ranges for a given forecast median are similar using both sets of predictors and the forecast medians are rearranged. However, in contrast to the previous example, the skill gains are achieved by reducing the errors of the median of forecasts with corresponding observations in the central quartiles (mid-grey shade in Fig. 7) of the historical observations rather than in the outer quartiles. It is for these moderate seasonal flow totals that antecedent streamflows could be sourced from either surface runoff or base flow, and the WAPABA simulations can distinguish the dominant source, whereas the candidates used in predictor selection cannot. As with the previous example, the WAPABA simulations are more strongly correlated to streamflows during the forecast period than any of the candidate predictors representing initial catchment conditions used in the predictor selection.

When the catchment is intermittently saturated
The soil moisture and groundwater stores of a catchment are bounded, that is, soil can become saturated and groundwater tables can approach the surface. However, the antecedent streamflow and rainfall totals used as candidate indicators of the catchment condition in predictor selection are theoretically unbounded, that is, they continue to increase when the soil moisture and groundwater stores are full. When the soil in a catchment is saturated and groundwater stores are near capacity in the month preceding a forecast, antecedent streamflow and rainfall are poor indicators of the condition. For a given forecast period, a catchment may be saturated consistently or intermittently throughout the historical record. For much of the year, the Tasmanian catchments considered in this study provide examples of consistently saturated catchment conditions throughout the historical record. For these locations and seasons, the forecast skill is close to zero and replacing the selected predictors with WAPABA simulations and lag-1 streamflow results in little change in forecast skill.
When the catchment is intermittently saturated through the historical record, replacing the selected predictors with WAPABA simulations and lag-1 streamflows can improve the skill of streamflow forecasts. Figure 9 provides an example of when the catchment is intermittently saturated through the historical record using forecasts of July-August-September inflows into Upper Yarra Reservoir. For this example, replacing the selected predictor representing initial catchment conditions with WAPABA simulations and lag-1 streamflow increases the RMSEP skill score from 5 % to 17 % and the CRPS skill score from 3 % to 12 %. The selected predictor representing initial catchment conditions is predominantly total streamflow for April, May and June. As in the previous examples, the forecast quantile ranges for a given forecast median are similar using both sets of predictors, and the forecast medians are rearranged to more closely match the observations. The skill gains are achieved by reducing errors in the forecast median for all forecasts throughout the entire range of historical observations. This is due to streamflows during the forecast period being more strongly correlated with the WAPABA simulations than with any of the candidate indicators of initial catchment conditions considered in the predictor selection process.
The relationship between streamflow during the forecast period and the WAPABA simulations is approximately linear (Fig. 10). The relationship between streamflows during the forecast period and other variables used as candidates for predictor selection appears linear for low values of the candidate predictor and deviates from linearity above a threshold value. This two-part relationship suggests that for low values the candidate predictors are reasonable indicators of the initial catchment conditions, but at higher values they are not. Examining the initialised state variables of WAPABA used in producing the simulations that represent initial catchment conditions suggests that when the antecedent streamflows exceed this threshold the soil moisture store is at or near capacity and the groundwater store level is very high. Therefore, the improvements in forecast skill arising from replacing the selected predictors with WAPABA simulations and lag-1 streamflow can be attributed to a better representation of the catchment process when the catchment intermittently becomes very wet.

Discussion
In this study, selected predictors representing initial catchment conditions were replaced by a combination of simulations from a dynamic hydrological model and lag-1 streamflows. Lag-1 streamflows were included as a form of model updating to provide a real-time measure of the actual catchment condition leading up to the forecast. However, lag-1 streamflow is not always a good indicator of the catchment condition, and its inclusion may moderate some of the benefit of using the WAPABA simulations. Figure 11 presents the increases in forecast skill arising from using lag-1 streamflows as well as WAPABA simulations to represent the initial catchment condition. In general, including lag-1 streamflow to provide a real-time measure of the actual catchment condition has little impact on the forecast skill. For a small number of seasons and locations, including lag-1 streamflow increases forecast skill; most importantly, it does not degrade forecast skill. Therefore, it appears that including lag-1 streamflows as a form of model updating is appropriate.
For this study, we simulated the initial catchment influence on future streamflows by forcing initialised hydrological models with monthly climatology mean rainfall and potential evapotranspiration. Our motivation for using simulated streamflows, rather than the state variables from WAPABA, was that the simulated streamflows integrate the condition of both the model soil moisture and groundwater stores. The relative influence of soil moisture and groundwater levels on seasonal streamflows varies with forecast date. Examples of this seasonally varying relationship are shown in Figs. 13 and 14. For forecasts made at the start of May, the May-June-July streamflow totals are more highly correlated with the groundwater storage levels than with the soil moisture or lag-1 streamflow (Fig. 13). For forecasts made at the start of November, November-December-January streamflow totals are more highly correlated with soil moisture than with groundwater storage levels or lag-1 streamflow (Fig. 14). In both instances, the correlation between the WAPABA simulations and the seasonal streamflow totals is comparable to that of the better of the two state variables. Therefore, the WAPABA simulations appear to be robust representations of the integrated condition of both the soil moisture and groundwater stores at the forecast time.
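The comparison of candidate predictors described above can be sketched as a ranking by Pearson correlation with the seasonal streamflow totals. This is a minimal sketch only: the function names and the dictionary layout of the candidate predictors are hypothetical, and the series passed in would be the historical values of each candidate (for example soil moisture, groundwater storage, lag-1 streamflow and the WAPABA simulation) paired year by year with the observed seasonal totals.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(candidates, seasonal_totals):
    """Rank candidate predictors by correlation with seasonal totals.

    `candidates` maps a (hypothetical) predictor name to its historical series.
    Returns (name, correlation) pairs, strongest positive correlation first.
    """
    scores = {name: pearson(series, seasonal_totals)
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Applying the ranking separately for each forecast date would show how the best-correlated state variable changes through the year, which is the behaviour illustrated in Figs. 13 and 14.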
For the majority of catchments and seasons, the skill of forecasts is due to the knowledge of the initial catchment condition. Figure 12 illustrates the contribution of the climate indices to forecast skill. When the points in Fig. 12 are located on the 1 : 1 line, the climate indices make no contribution to the skill of streamflow forecasts, while points below the 1 : 1 line suggest that climate indices improve forecast skill. The contribution of climate indices to streamflow forecast skill tends to be largest for catchments in Queensland, where there is the strongest evidence for using climate indices to forecast seasonal rainfall (Schepen et al., 2012).
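The reading of Fig. 12 can be expressed as a simple comparison of two skill scores for each location and season. The sketch below assumes, following the caption of Fig. 12, that the skill of forecasts made with climate indices is on the horizontal axis and the skill of forecasts made without them is on the vertical axis; the function name and the optional tolerance are illustrative assumptions.

```python
def climate_index_contribution(skill_with, skill_without, tolerance=0.0):
    """Classify a point in a plot of skill without vs. skill with climate indices.

    Below the 1:1 line (skill_with > skill_without): indices improve skill.
    Above the 1:1 line (skill_with < skill_without): indices degrade skill.
    On the line (within `tolerance`): indices make no contribution.
    """
    difference = skill_with - skill_without
    if difference > tolerance:
        return "improves"
    if difference < -tolerance:
        return "degrades"
    return "no contribution"
```

A non-zero tolerance can be used so that differences smaller than sampling variability are not over-interpreted.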
The points above the 1 : 1 line in Fig. 12 indicate that forecasts made without selected climate indices are more skilful than those made with climate indices. This suggests that while there is evidence for using climate indices to forecast rainfall during a fitting period, the fitted relationship does not perform well for independent forecasts. The approach to assessing forecast skill used in this paper is designed to expose circumstances where this occurs and to assess the true skill of the predictor selection and forecasting approaches by using cross-validated predictors as well as cross-validated model parameters. Where the predictors are not cross-validated, it is likely that the reported forecast skill is artificially inflated and will not be maintained in operational applications (Michaelsen, 1987; DelSole and Shukla, 2009).
For this study, we adopted leave-one-plus-four-years-out cross validation to assess forecast performance. We adopted this approach to limit the potential for forecast performance to be artificially inflated because the state variables in WAPABA carry information from one time step to the next, which would result in forecasts that are not independent of the data used for parameter inference. We tested the assumption that leave-one-plus-four-years-out was sufficient to create independent forecasts by also assessing forecast skill using leave-one-plus-one-years-out and leave-one-plus-nine-years-out. In the assessment, we fixed the climate predictors so that variations in the forecast performance measures were solely due to the different periods omitted from the data used for parameter inference. The differences between the forecast skill scores produced using the different cross validation methods tended to be within the range of sample variability (not shown). Where there were differences, there was no clear pattern to the best or worst performing cross validation approach, and therefore the adopted approach appears appropriate.
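A minimal sketch of how leave-one-plus-k-years-out splits might be generated is given below. It assumes that the omitted block is the forecast year plus the k years that follow it; the text above does not state whether the extra years precede, follow or surround the forecast year, so this placement, like the function names, is an assumption made for illustration.

```python
def training_years(years, target_year, extra_years=4):
    """Years available for parameter inference when forecasting `target_year`.

    Assumption: the target year and the `extra_years` following it are omitted.
    """
    omitted = set(range(target_year, target_year + extra_years + 1))
    return [year for year in years if year not in omitted]


def cross_validation_splits(years, extra_years=4):
    """Yield (target_year, training_years) pairs for every year in the record."""
    for target_year in years:
        yield target_year, training_years(years, target_year, extra_years)
```

Setting `extra_years` to 1 or 9 gives the two alternative schemes used in the sensitivity test described above.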
The climate indices used to represent the future climate influences on streamflow in this study are surrogates for the true source of climate predictability. The true source of climate predictability arises from understanding of the initial conditions of the ocean, atmosphere and land surface and the processes by which these conditions evolve and interact. Many dynamic coupled ocean-atmosphere models have been developed to produce seasonal climate forecasts (e.g. Alves et al., 2002). These models simulate the dynamic evolution of chaotic ocean and atmospheric processes from estimates of the ocean, atmosphere and land surface initial conditions. Forecasts of rainfall, or other atmospheric variables, produced by these models may provide better indicators of future climate influences on seasonal streamflows than simple climate indices because they integrate a wide range of initial conditions. They also provide the opportunity for the use of concurrent relationships, which tend to be stronger than lagged relationships. However, comprehensive analysis of dynamic climate model output is necessary to better understand the quality of the forecasts and which variables are useful for streamflow forecasting. Future work will investigate using forecasts from dynamic climate models for seasonal forecasting of streamflows in Australia using statistical models and rainfall-runoff models.
WAPABA, forced with monthly data, simulates monthly streamflow totals in validation periods as well as daily rainfall-runoff models forced with daily data (Wang et al., 2011). However, the skill of raw WAPABA simulations representing the initial catchment condition was considerably poorer than that of the forecasts resulting from using the WAPABA simulations as a predictor in the BJP modelling approach. The poor skill of the raw WAPABA simulations representing initial catchment condition is primarily due to variation in seasonal biases that overall forecast performance measures do not diagnose (not shown). The BJP modelling approach was able to extract information from the biased WAPABA simulations and produce skilful forecasts with minimal biases. The water balance model used in this study is a relatively simple, lumped monthly model. Situations may exist where such a model does not provide sufficient spatial, temporal or process resolution to adequately describe the catchment condition at the forecast time. In these situations, more sophisticated models may be warranted to describe the catchment conditions. Simulations from more sophisticated models can also be included as predictors in the BJP modelling approach using the process described in this paper.

Conclusions
Forecasts of seasonal streamflows are valuable to a wide range of users. Traditionally, these forecasts are produced using statistical methods with observations of antecedent streamflows or rainfall as predictors to represent the condition of the catchment at the forecast date and with climate indices as predictors to represent the influence of future climate. These predictors are surrogates for the true source of predictability and can potentially have limitations. Dynamic hydrological models have also been used for streamflow forecasting, but often require statistical post-processing to remove biases and correct the reliability of forecast probability distributions. This study has investigated whether a hybrid seasonal forecasting system that uses the output of a dynamic hydrological model as a predictor in a statistical forecasting approach can lead to more skilful forecasts. Forecasts of three-month streamflow totals were made using two alternative sets of predictors to represent initial catchment conditions: predictors selected using the method employed in the operational practice of the Bureau of Meteorology in Australia; and the combination of lag-1 streamflow and simulations from a monthly water balance model that represent the influence of the initial catchment condition on streamflows. The skill and reliability of streamflow forecasts made using these sets of predictors were compared for 21 catchments in eastern Australia, and the reasons for any differences were investigated.
In general, replacing selected predictors representing the initial catchment condition with simulations from a monthly water balance model and lag-1 streamflow increases the forecast skill and has little impact on forecast reliability. The magnitude of the skill increases varies with location and season. The greatest increases in forecast skill tend to be for three sets of circumstances: (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall; (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow; and (3) when the initial catchment condition is near saturation intermittently throughout the historical record. There is little change in forecast skill for catchments and seasons that are very dry or consistently saturated throughout the historical record. Even with the skill improvements realised by replacing the selected predictors, the skill of streamflow forecasts tends to be the highest for seasons that include the falling limb of the annual hydrograph, when seasonal streamflows are strongly related to the initial catchment condition. The skill tends to be the lowest for seasons that include the rising limb, when seasonal streamflows are strongly related to concurrent rainfall. In general, the contribution to forecast skill of the climate indices used to represent the influence of future climate is small but comparable to that of forecasts of seasonal rainfall. Future work will investigate how using the output of dynamic climate models may improve this situation.
Lag-1 streamflow was included as a predictor in addition to the monthly water balance simulations as a form of model updating to provide a real-time measure of the catchment condition. In general, it contributes little to forecast skill, but for some seasons and locations skill increases of up to 20 % are realised by its inclusion. Most importantly, including lag-1 streamflow does not degrade forecast skill and therefore can be confidently included as a predictor for operational forecasts. The use of a more sophisticated hydrological model with increased spatial, temporal or process resolution may reduce the need for model updating. The output of such a higher resolution hydrological model could be used as a predictor in the BJP modelling approach using the methods described in this paper.

Fig. 3. Skill scores of forecasts made using WAPABA simulations and lag-1 streamflow as predictors plotted against skill scores of forecasts made using selected predictors for the RMSEP (left panel) and CRPS (right panel) skill scores. Each point represents the skill of forecasts for a single location and season. Points above the 1 : 1 line indicate improvements in forecast skill. (Green points are catchments in Queensland, red points are catchments in Tasmania, hollow blue circles are tributaries to the upper Murray River, light blue points are catchments in central Victoria and dark blue points are catchments in southern Victoria.)

Fig. 4. Increase in skill scores of forecasts achieved by replacing selected predictors representing initial catchment conditions with WAPABA simulations and lag-1 streamflow.

Fig. 5. Skill scores of the cross-validation forecasts made using WAPABA simulations and lag-1 streamflow as predictors.

Fig. 6. Probability Integral Transform histograms illustrating the reliability of forecasts made using selected predictors (solid grey bars) and WAPABA and lag-1 streamflow as predictors (hatched bars).
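For readers unfamiliar with Probability Integral Transform (PIT) histograms, the sketch below shows one common way to compute them when each forecast is represented by an ensemble of samples from the predictive distribution. The ensemble representation, the function names and the default of ten bins are assumptions made for illustration and may differ from the procedure used to produce Fig. 6.

```python
def pit_value(members, observation):
    """Empirical PIT: fraction of ensemble members not exceeding the observation."""
    if not members:
        raise ValueError("ensemble must contain at least one member")
    return sum(1 for m in members if m <= observation) / len(members)


def pit_histogram(pit_values, n_bins=10):
    """Count PIT values in equal-width bins on [0, 1].

    Roughly equal counts across bins suggest reliable forecast distributions.
    """
    counts = [0] * n_bins
    for p in pit_values:
        index = min(int(p * n_bins), n_bins - 1)  # place p == 1.0 in the last bin
        counts[index] += 1
    return counts
```

A U-shaped histogram would suggest forecast distributions that are too narrow, and a dome-shaped histogram distributions that are too wide.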

Fig. 10. Relationship between predictors representing initial catchment conditions and seasonal streamflow totals for July-August-September inflows into Upper Yarra Reservoir.

Fig. 11. Increase in skill scores of forecasts achieved by using lag-1 streamflow as well as WAPABA simulations to represent the initial catchment condition.

Fig. 12. The contribution of selected climate indices to forecast skill illustrated by plotting skill scores of forecasts made without using climate indices as predictors against skill scores of forecasts made with selected climate indices as predictors for the RMSEP (left panel) and CRPS (right panel) skill scores. All forecasts use WAPABA simulations and lag-1 streamflows as predictors to represent the influence of initial catchment conditions. (Legend as per Fig. 3.)

Fig. 13. The relationship between May-June-July streamflow totals and the WAPABA state variables at the end of April, April streamflow and WAPABA simulations for May-June-July, for Kiewa River inflows into the Murray River (Pearson correlation coefficients shown in top left corner of each plot).

Fig. 14. The relationship between November-December-January streamflow totals and the WAPABA state variables at the end of October, October streamflow and WAPABA simulations for November-December-January, for Kiewa River inflows into the Murray River (Pearson correlation coefficients shown in top left corner of each plot).

Table 1. Attributes of the 21 catchments used for the study. (Res. indicates reservoir inflow, HES indicates inflow to hydroelectric scheme.)
Fig. 2. Plot of streamflow seasonality for all catchments. (Mean annual flow is provided following the catchment name.)