the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
A Bayesian modelling method for postprocessing daily subseasonal to seasonal rainfall forecasts from global climate models and evaluation for 12 Australian catchments
Andrew Schepen
Tongtiegang Zhao
Quan J. Wang
David E. Robertson
Rainfall forecasts are an integral part of hydrological forecasting systems at subseasonal to seasonal timescales. In seasonal forecasting, global climate models (GCMs) are now the goto source for rainfall forecasts. For hydrological applications however, GCM forecasts are often biased and unreliable in uncertainty spread, and calibration is therefore required before use. There are sophisticated statistical techniques for calibrating monthly and seasonal aggregations of the forecasts. However, calibration of seasonal forecasts at the daily time step typically uses very simple statistical methods or climate analogue methods. These methods generally lack the sophistication to achieve unbiased, reliable and coherent forecasts of daily amounts and seasonal accumulated totals. In this study, we propose and evaluate a Rainfall PostProcessing method for Seasonal forecasts (RPPS), which is based on the Bayesian joint probability modelling approach for calibrating daily forecasts and the Schaake Shuffle for connecting the daily ensemble members of different lead times. We apply the method to postprocess ACCESSS forecasts for 12 perennial and ephemeral catchments across Australia and for 12 initialisation dates. RPPS significantly reduces bias in raw forecasts and improves both skill and reliability. RPPS forecasts are also more skilful and reliable than forecasts derived from ACCESSS forecasts that have been postprocessed using quantile mapping, especially for monthly and seasonal accumulations. Several opportunities to improve the robustness and skill of RPPS are identified. The new RPPS postprocessed forecasts will be used in ensemble subseasonal to seasonal streamflow applications.
Rainfall forecasts are an integral part of hydrological forecasting systems at subseasonal to seasonal timescales (Bennett et al., 2016; Crochemore et al., 2017; Wang et al., 2011). Inclusion of climate information in seasonal streamflow forecasts enhances streamflow predictability (Wood et al., 2016). One strategy for integrating climate information into hydrological models is to conditionally resample historical rainfall (e.g. Beckers et al., 2016; Wang et al., 2011). An alternative approach is to use rainfall forecasts from dynamical climate models.
On one hand, ensemble rainfall forecasts from GCMs (global climate models) are attractive for hydrological prediction in that they forecast multiple seasons ahead and have a wellestablished spatial and temporal forecast structure. On the other hand, a major issue with GCM forecasts at subseasonal to seasonal timescales is that the forecasts are often biased and lacking in predictability of local climate (e.g. Kim et al., 2012; Tian et al., 2017). It is therefore necessary to postprocess GCM rainfall forecasts using statistical or dynamical methods before they can be used in hydrological models (Yuan et al., 2015).
Several conceptually simple statistical correction methods are used for directly postprocessing daily GCM rainfall forecasts including: additive bias correction, multiplicative bias correction and quantile mapping (Ines and Hansen, 2006). Crochemore et al. (2016), for example, recently evaluated linear scaling and quantile mapping for postprocessing ECMWF System 4 rainfall forecasts in France. Quantile mapping adjusts forecast means and ensemble spread, but it is not a full calibration method because it does not account for the correlation between forecasts and observations (Zhao et al., 2017). It is useful for bias correction of climate change projections where a full statistical calibration is inappropriate (Teutschbein and Seibert, 2012). Since additive and multiplicative bias correction and quantile mapping methods do not account for intrinsic GCM skill they are ineffective to use as postprocessing tools when GCM forecasts are unskilful.
Postprocessing methods that take model skill into account typically fall under the model output statistics (MOS; Glahn and Lowry, 1972) banner. MOS type approaches are well established in the weather forecasting community for shortterm forecasting and modern variants are normally probabilistic. Wilks and Hamill (2007), for example, studied ensemble MOS approaches for postprocessing global forecast system (GFS) forecasts of rainfall and temperature up to 14 days ahead.
MOS methods can also be thought of as full calibration methods. In this regard, several Bayesian calibration approaches are known to be effective at postprocessing GCM rainfall totals aggregated to monthly and seasonal timescales (Hawthorne et al., 2013; Luo et al., 2007; Schepen et al., 2014). However, it is apparent that full calibration methods are not normally applied to the postprocessing of daily GCM forecasts of rainfall in the subseasonal to seasonal period.
That said, some studies have explored more sophisticated methods for postprocessing daily rainfall forecasts from GCMs. An example of such a study is that undertaken by Pineda and Willems (2016), which applied a nonhomogenous hidden Markov model (NHHMM) to forecasts in the northwestern region of South America. Their method extracted information from GCM forecasts of rainfall fields and sea surface temperatures. The study also built on the NHHMM method originally developed by Robertson et al. (2004) for prediction of rainfall occurrence in Brazil. In Australia, Shao and Li (2013) and Charles et al. (2013) applied an analogue downscaling method (Timbal and McAvaney, 2001) to produce downscaled daily rainfall forecasts from POAMA (the Predictive Ocean Atmosphere Model for Australia). The NHHMM and analogue methods are not straightforward to apply operationally, as they require the identification of optimal climate predictors in different climatic regions. The methods do not, by design, lead to forecasts that are always reliable in ensemble spread and at least not worse than climatology forecasts.
Statistical postprocessing of daily rainfall forecasts is a formidable challenge, perhaps explaining the lack of sophisticated methods. Barriers include: short GCM hindcast records, a high prevalence of zero rainfall amounts, seasonal variations in rainfall patterns, and intrinsically low GCM skill. Amplifying the challenge is that GCM skill decays rapidly as the lead time increases. Lavers et al. (2009), for example, examined temperature and rainfall forecasts using DEMETER and CFS GCMs, and found that, in an idealised scenario, skill in the first 30 days is primarily attributable to skill in the first 15 days and much less to skill over the next 15 days. Postprocessing methods should therefore be designed to capture as much skill as possible in the first fortnight and take GCM skill into account when postprocessing forecasts further ahead.
In this study, we seek to develop a new, more direct daily rainfall postprocessing method that operates solely on rainfall output from GCMs and provides a full forecast calibration taking GCM skill into account. We build a new subseasonal to seasonal rainfall postprocessor, which has some similarities to the rainfall forecast postprocessor (RPP) developed by Robertson et al. (2013b) and Shrestha et al. (2015) for postprocessing numerical weather prediction (NWP) forecasts for shortterm streamflow forecasting. The new system is hereafter referred to as RPPS, which stands for rainfall forecast postprocessor – seasonal.
The proposed RPPS method applies the Bayesian joint probability (BJP) modelling approach to postprocess daily GCM forecasts of rainfall. BJP has never been used in this situation before and it is therefore important to fully evaluate the merits of BJP as a component of a daily forecast postprocessing system. As GCMs can produce ensemble forecasts for over 100 days ahead, RPPS is developed to generate daily rainfall amounts that aggregate to intraseasonal and seasonal totals. To this end, the Schaake Shuffle (Clark et al., 2004) is also included as a component of RPPS.
We apply RPPS to postprocess raw catchment rainfall forecasts from ACCESSS, Australia's new seasonal forecasting GCM. We evaluate bias, reliability and skill for 12 ephemeral and perennial catchments across Australia. RPPS catchment rainfall forecasts are compared to catchment rainfall forecasts derived from the Bureau of Meteorology's biascorrected product for ACCESSS. Opportunities to develop the method further are discussed.
2.1 ACCESSS rainfall forecasts
Raw, gridded GCM rainfall forecasts are obtained from the Bureau of Meteorology (BOM's) new ACCESSS forecasting system. Raw catchment rainfall forecasts are derived through a process of areaweighted averaging of the ensemble mean.
ACCESSS is a customised version of the UK Met Office's seasonal forecasting system. It contains a fully coupled model representing the interactions among the Earth's atmosphere, oceans and land surface, including sea ice. The current horizontal spatial resolution of the ACCESSS atmospheric model is approximately 60 km in the midlatitudes. Full details of the current system are provided by Hudson et al. (2017).
Available ACCESSS hindcasts are initialised at midnight UTC on days 1 and 25 of each month (additional start dates will become available in the future). A burst ensemble comprising 11 ensemble members is generated. Hindcasts are available for the period 1990–2012 (23 years). A longer hindcast set is not possible because of a lack of suitable initial conditions. The next version of ACCESSS will be delivered with more sets of initial conditions and longer hindcasts.
2.2 Observed rainfall
Observed rainfall is derived from the Australian Bureau of Meteorology's 5km daily rainfall analysis (AWAP). We make use of AWAP data from 1950 onwards. Catchment rainfall is derived through a process of areaweighted averaging.
An important note on time zones: Australian rainfall is recorded as 24 h totals to 09:00 LT (local time). Consequently, AWAP data are not perfectly synchronised with the ACCESSS forecasts, which are initialised at midnight UTC. The asynchronism is compounded by time zone differences and daylight savings.
We align the GCM data and observed data as best we can. For east coast locations, an ACCESSS forecast overlaps with the following day's AWAP rainfall analysis with a discrepancy of 1–2 h. The time discrepancy for west coast locations is approximately 2 h more.
2.3 Catchments
Twelve perennial and ephemeral catchments spread over Australia are selected for application and evaluation. Catchment information including name, gauge ID, regional location and size are shown in Table 1. Catchment locations and climate zones are mapped in Fig. 1. The catchments reside in highly distinct climate zones and vary markedly in size from 100 to 119 034 km^{2}. Evaluation across climate zones and for varying catchment sizes helps to comprehensively evaluate the effectiveness of the postprocessing methods.
3.1 Bayesian joint probability models
Our postprocessing method embeds the BJP modelling approach, which was originally designed for forecasting seasonal streamflow totals (Robertson et al., 2013a; Wang and Robertson, 2011; Wang et al., 2009). BJP has since been applied to calibrate hourly rainfall forecasts (Robertson et al., 2013b; Shrestha et al., 2015) and seasonal rainfall forecasts (Hawthorne et al., 2013; Khan et al., 2015; Peng et al., 2014; Schepen and Wang, 2014; Schepen et al., 2014). BJP was most recently adapted for subseasonal to seasonal streamflow forecasting (Schepen et al., 2016; Zhao et al., 2016).
BJP formulates a joint probability distribution to characterise the relationship between forecast ensemble means (predictors) and corresponding observations (predictands). The joint distribution is modelled as a bivariate normal distribution after transformation of the marginal distributions. Data transformation is handled using the flexible log −sinh transformation (Wang et al., 2012). The log −sinh transformation transforms y by
where α and β are transformation parameters. A predictor x is transformed to g. Likewise, a predictand y is transformed to h. The relationship between g and h is formulated by a bivariate normal distribution:
where
is the vector of means and
is the covariance matrix.
In forecasting mode, a BJP model is conditioned on new predictor values. For a particular parameter set, θ, and new transformed predictor value g_{new},
3.2 Daily rainfall postprocessing
We note some specifics of the BJP implementation used in this study. To allow the use of the continuous bivariate normal distribution, values of zero rainfall are treated as censored, meaning the true value is assumed to be less than or equal to zero. To simplify the inference of model parameters, data rescaling is used within the modelling process. To prevent undesirable extrapolation effects in forecasting mode, predictor values are limited to twice the maximum predictor value used to fit the model (future research will address this limitation).
Model parameters may be poorly estimated when there are insufficient nonzero data values used in data inference. We find that in very dry catchments, certain days of the year may have forecasts or observations with no nonzero values at all. In such cases, inference is not possible. To overcome this problem, predictor and predictand data sets are created by pooling data for nearby days and for multiple GCM initialisation dates. We choose to pool data within an 11day sliding window to be consistent with the Bureau of Meteorology's bias correction scheme (see Sect. 3.4); for the first 5 days the window is fixed to the first 11 days. However, the RPPS method places no restriction on the configuration, and alternative configurations are discussed in Sect. 5. We choose to pool data for ACCESSS hindcasts initialised on day 1 of the month and day 25 of the month prior. Predictor and predictand data are subsequently paired according to the number of days since forecast initialisation. Consider, for example, postprocessing of forecasts initialised on 1 February. Days 5–15 from the 1 February run and days 5–15 from the 25 January run are used to fit a model that is used to postprocess forecasts for 10 February.
We note that pooled data are not completely independent, meaning there is potential to underestimate parameter uncertainty using BJP. Nevertheless, we expect the effect to be limited by the weak predictor–predictand relationships and weak persistence in daily rainfall.
We now describe the model fitting and forecasting steps. After pooling the data, model fitting proceeds as follows:

Rescale predictor and predictand data so that each series ranges within [0, 5].

Estimate the log −sinh transformation parameters for the predictors using the maximum a posteriori (MAP) estimation.

Estimate the log −sinh transformation parameters for the predictands in the same way.

Apply the transformations to normalise the predictor and predictand data.

Apply the transformations to transform the predictor and predictand censoring thresholds.

Sample parameter sets representing the posterior distribution of the bivariate normal distribution parameters using Markov chain Monte Carlo sampling, the transformed data and transformed censoring thresholds.
Forecasting proceeds as follows:

Transform the predictor value using the log −sinh transformation for predictors.

Sample one h_{new} for each parameter set.

Backtransform the ensemble members using the transformation for predictands.

Rescale the ensemble members to the original space (opposite of step 1 in model fitting).

Set negative values to zero.
3.3 Forming ensemble timeseries forecasts
BJP forecast ensemble members are initially random and are not linked across days. To deal with this problem, we apply the Schaake Shuffle (Clark et al., 2004), which uses historical data to link ensemble members and create sequences with realistic temporal patterns. The Schaake Shuffle works by imposing the rank correlation of observations on the forecast ensembles, using the trajectories of historical observations as a dependence template. Several variations on the Schaake Shuffle exist and choices can be made about how to construct a temporal dependence template. Schefzik (2016), for example, propose selecting historical observations using a similarity criterion. The steps of the Schaake Shuffle as implemented in this study are as follows:

Select a large number of years from the past with the same month and day as the forecast initialisation date (excluding the date of the current forecast).

Construct a twodimensional dependence template by incrementing the dates by one day for each day ahead to be forecast.

Extract the observed data for each date in the dependence template.

For each day, reorder ensemble members according to the rank of the historical data for that day (e.g. if the second year selected has the smallest observed value, place the smallest ensemble member in second position).

Repeat steps (2)–(4) for blocks of ensemble members until the full ensemble has been shuffled.
After applying the Schaake Shuffle to all ensemble members across all lead times, the forecasts can be considered ensemble timeseries forecasts.
An example RPPS forecast for the BRS catchment is shown in Fig. 2. The top panel is the forecast of daily amounts and bottom panel is the forecast of accumulated totals. Hereafter, a daily amount is taken to mean a 24 h rainfall total on any given day. An accumulated total is the sum of daily rainfall amounts over a number of days. In this example, the forecast correctly predicts a dry beginning to the forecast period. The quantile ranges of the daily forecasts are reasonably consistent with the observed values. The accumulated forecast is somewhat narrower than the climatology reference forecast at monthly and seasonal timescales and the forecast is predicting a seasonal rainfall total less than the climatological median.
3.4 Verification
We use RPPS to postprocess all available ACCESSS forecasts initialised on day 1 of each month for 100 days ahead (the subseasonal to seasonal forecast period). For water resources management, whilst it is important that daily amounts are realistic, it is vital that the accumulated totals are as reliable and as skilful as possible. Forecasts are verified against both daily amounts and accumulated totals.
RPPS forecasts are generated using leaveoneyearout crossvalidation. RPPS forecasts are compared with the BOM's QM forecasts (see Sect. 3.5), which are also generated using leaveoneyearout crossvalidation.
Bias is recognised as the correspondence between the mean of forecasts and the mean of observations. Bias is visually assessed by plotting the bias for a set of events against the average forecast for the same set of events. We calculate bias as the mean error
where ${\stackrel{\mathrm{\u203e}}{y}}_{\mathrm{fcst},t}$ is the forecast ensemble mean for event t and y_{obs,t} is the corresponding observation. Bias is calculated separately for each catchment, initialisation date and day. The bias is calculated across 23 events. For a given lead time, we also calculate the average absolute bias (AB) across all 12 catchments and 12 initialisation dates. Notwithstanding that the bias is scale dependent, the average absolute bias is used to compare the magnitude of biases for different model forecasts and for different lead times.
Reliability is the statistical consistency of forecasts and observations – a reliable forecasting system will accurately estimate the likelihood of an event. Reliability is checked by analysing the forecast probability integral transforms (PITs) of rainfall observations. The PIT for a forecastobservation pair is defined by π = F(y_{obs}), where F is the forecast cumulative distribution function (CDF). In the case that y_{obs} = 0, a pseudoPIT value is sampled from a uniform distribution with a range [0, π] (Wang and Robertson, 2011). If a forecasting system is reliable, π follows a standard uniform distribution. Reliability can be visually examined by plotting the set of π_{t} (t = 1, 2, …, T) with the corresponding theoretical quantile of the uniform distribution using the PIT uniform probability plot (or simply PIT plot). A perfectly reliable forecast follows the 1 : 1 line. In this study, we do not plot individual PIT diagrams. Instead, reliability is summarised using the αindex (Renard et al., 2010)
where ${\mathit{\pi}}_{t}^{*}$ is the sorted π_{t} in increasing order. The αindex represents the total deviation of ${\mathit{\pi}}_{t}^{*}$ from the corresponding uniform quantile (i.e. the tendency to deviate from the 1 : 1 line in PIT diagrams). The αindex ranges from 0 (worst reliability) to 1 (perfect reliability).
Forecast skill is evaluated using the continuous ranked probability score (CRPS; Matheson and Winkler, 1976). The CRPS for a given forecast and observation is defined as:
where y is the forecast variable; y_{obs} is the observed value; F is the forecast CDF; and H is the Heaviside step function, which equals 0 if y < y_{obs} and equals 1 otherwise. Model forecasts are compared to reference forecasts by calculating skill scores:
where the overbar indicates averaging across a set of events. Reference forecasts for each day are produced using a BJP model fitted to observed data within an 11day window. The Schaake Shuffle is also applied to instil correct temporal characteristics into the reference forecasts. The CRPS skill score is positively oriented (whereas CRPS is negatively oriented). As a percentage, a maximum score of 100 is indicative of perfectly sharp and accurate forecasts. A score of 0 indicates no overall improvement compared to the reference forecast. A negative score indicates poor quality forecasts in the sense that the naïve reference forecast is more skilful.
3.5 Comparison with forecasts postprocessed using quantile mapping
We compare RPPS forecasts with forecasts that have been postprocessed at the ACCESSS grid scale using quantile mapping (QM). The gridded QM forecasts are supplied by the Bureau of Meteorology and we derive the catchment forecasts through areaweighting. QM matches the statistical distribution of past forecasts to the distribution of observations to reduce errors in the forecast mean and improve forecast spread (Crochemore et al., 2016; Zhao et al., 2017). A postprocessed forecast value is obtained by first working out the quantile fraction (cumulative probability) of the new forecast using the CDF of past forecasts, then inverting the quantile fraction using the CDF of observations. The Bureau of Meteorology applied a separate quantile mapping model to each day. The CDF of the past forecasts is formed using 11 ensemble members in an 11day sliding window and 22 years of data in leaveoneyearout crossvalidation. The statistical distribution of the observations is formed using the observations in an 11day sliding window and 22 years of data. For the first 5 days the window is fixed to the first 11 days. If the raw forecast ensemble member is above the previously known maximum forecast value, then the forecast value is instead linearly rescaled by o_{max}∕f_{max} where o_{max} is the maximum observed value and f_{max} is the maximum past forecast value.
4.1 Bias in forecasts of daily amounts
Biases in forecasts of daily amounts are analysed for selected days using Fig. 3. Each circle represents the bias for a catchment and initialisation date. The bias is plotted against the average forecast (averaged over all events). As expected, raw forecasts are more biased than postprocessed forecasts. The AB for raw forecasts ranges, for the examples given, from approximately 1.3 to 1.5 mm. The bias for raw forecasts tends to be negative, indicating that ACCESSS has a propensity to underestimate daily rainfall amounts.
QM and RPPS are similarly effective at reducing biases in daily amounts and both substantially reduce bias. After postprocessing, some residual bias remains for any single day since the bias is corrected using pooled observations. The AB for QM forecasts ranges from roughly 0.8 to 1.1 mm. The AB for RPPS forecasts ranges from about 0.7 to 1.1 mm. The differences in bias between QM and RPPS for daily amounts are evidently insignificant. Visual examination of the QM and RPPS scatter plots shows no wholesale difference in bias between the QM and RPPS forecasts of daily amounts.
4.2 Bias in forecasts of accumulated totals
Biases in forecasts of accumulated totals are analysed using Fig. 4 after rescaling to mm day^{−1}. As with the results for daily amounts, the raw forecasts are more biased than postprocessed forecasts and the bias tends to be negative. Visual examination of the scatterplots for the raw forecasts reveals, particularly for days 30, 60 and 90, that for some catchments and initialisation times, the raw forecasts are unbiased for subseasonal to seasonal rainfall totals. This suggests that the GCM performs well in some regions. Nevertheless, it is evident that postprocessing is necessary before using the ACCESSS forecasts in hydrological forecasting.
In contrast to the results for daily amounts, QM and RPPS have differing efficacy for reducing biases in accumulated totals, although both still reduce bias substantially. Visual examination of the QM and RPPS scatter plots shows that the RPPS points fit more tightly around the zero line than the QM points. The better ability of the RPPS forecasts to reduce biases is reflected in the AB; for example, for day 90 the AB for QM forecasts is 0.28 mm day^{−1} and for RPPS forecasts the AB is 0.13 mm day^{−1} (cf. the AB for raw forecasts of 0.93 mm day^{−1}).
4.3 Reliability
Reliability is analysed using Fig. 5, which presents box plots of αindex for forecasts of daily rainfall amounts (left panel) and accumulated totals (right panel). The box plots describe the distribution of αindex values for the same cases as we evaluated bias in Sect. 4.2 except that we omit the day 1 result from the accumulated totals analysis. Results are presented for RPPS before and after the Schaake Shuffle has been applied.
Raw forecasts have the poorest reliability for both daily amounts and accumulated totals. RPPS has better reliability overall than QM for daily amounts. It is noted that the Schaake Shuffle has no effect on forecast reliability for daily amounts since it is plainly a reordering of already randomised ensemble members.
RPPS forecasts are more reliable than raw forecasts and QM forecasts for accumulated totals. The RPPS forecasts become significantly more reliable after applying the Schaake Shuffle and linking the ensemble members so that they have realistic temporal patterns (discussed further in Sect. 5).
4.4 Skill scores – overall performance
Skill scores for QM forecasts are plotted against skill scores for RPPS forecasts in a scatterplot (Fig. 6). Skill scores for daily amounts are plotted in the left panel; skill scores for accumulated totals are plotted in the right panel. Accumulated totals are for two days or more. RPPS forecasts and QM forecasts tend to be positively skilful for the same cases, although there is considerable variation in the magnitude of the skill scores. A striking feature of the scatterplots is that when QM forecasts are negatively skilful, the RPPS forecasts tend to be neutrally skilful, particularly for the accumulated totals. It is evident that skill for daily amounts can be sharply negative. The skill for daily amounts can be difficult to estimate because of the small sample size and the inability to accurately forecast daily amounts beyond about 10 days. In contrast, the skill for accumulated totals is easier to estimate as it benefits from temporal averaging and the accumulation of skill from earlier periods.
4.5 Skill scores – detailed evaluation
Skill scores are partitioned according to catchment, lead time and initialisation date in Figs. 7–9, respectively. Since forecasts of accumulated totals are more informative for water resources management and forecast skill is generally known to be limited beyond the first week or two, we focus the remainder of the results on accumulated totals.
CRPS skill scores are plotted for each catchment in Fig. 7. Skill scores for RPPS forecasts are vastly positive except for, most noticeably, some cases in ORO, NMN and HLG. Negative skill can be caused by insufficient information to fit a stable RPPS model and can also be an artefact of crossvalidation in the presence of extreme events. In several catchments, such as WLC and DRT, RPPS forecasts are seen to significantly outperform QM forecasts by virtue of QM forecasts frequently being significantly negatively skilful and RPPS forecasts rarely being negatively skilful. In several catchments, including BRP and CTG, both QM and RPPS skill scores are predominantly positive, demonstrating that simple bias correction techniques can appear sufficient in localised studies.
CRPS skill scores are plotted for groups of lead times in Fig. 8. For days 2–10, the RPPS and QM forecasts are similarly skilful and vastly positively skilful. There are some instances of negative skill, which are likely to be artefacts of crossvalidation. For days 11–19 and 20–28, the positive relationship between QM and RPPS forecasts gradually weakens. From day 29 onwards, the QM forecasts become negatively skilful in many instances whereas the RPPS skill scores tend to level out around zero and are rarely negative to the same degree. For very long lead times, QM skill scores can be higher than RPPS skill scores, although overall the skill scores for accumulated seasonal totals are quite low (< 30 %). Both these factors signal further improvement in the RPPS forecasts is possible.
CRPS skill scores are plotted for each initialisation date in Fig. 9. Forecasts initialised from 1 September to 1 November have the highest proportion of cases where both QM and RPPS forecasts are positively skilful, suggesting that ACCESSS produces its best subseasonal to seasonal forecasts during the Austral spring and summer.
We demonstrate that RPPS is able to improve daily GCM rainfall forecasts by: reducing bias (Figs. 3–4); improving reliability (Fig. 5); and ensuring that forecasts are typically at least as skilful as a climatological reference forecast (Figs. 6–9). RPPS forecasts are comprehensively compared with QM forecasts. RPPS forecasts outperform QM forecasts, primarily because QM does not take into account the correlation between forecasts and observations nor corrects for autocorrelation problems. Since RPPS is built upon the BJP modelling approach, it explicitly models the correlation between the forecasts and observations, and thus takes into account model skill in the calibration. Our results add to the findings of Zhao et al. (2017) who studied the postprocessing of monthly rainfall forecasts from POAMA (Australia's GCM preceding ACCESSS). While Zhao et al. (2017) demonstrated that QM is very effective for bias correction, they did not consider accumulated totals. In our study, we find that the RPPS forecasts are less biased and more reliable than QM forecasts for accumulated totals.
Figure 5 illustrates the importance of the Schaake Shuffle for producing reliable forecasts using RPPS. If the forecasts are not shuffled, then the forecasts of accumulated totals will tend to be too narrow in terms of ensemble spread, making the forecasts overconfident and less reliable. Related to this, RPPS forecasts are more reliable for daily amounts than QM forecasts, yet QM and RPPS forecasts do not exhibit any obvious differences in the magnitude of biases (Fig. 3) – evidence that the RPPS forecasts have a more appropriate ensemble spread, since the αindex integrates information about forecast bias and ensemble spread.
We applied a consistent methodology to perennial and ephemeral catchments and established BJP models separately for each day, catchment and initialisation date. Alternative configurations are also possible, for example, we trialled establishing only six models by pooling data within week 1, week 2, weeks 3–4, and subsequent 4 week periods (not shown). The data in week 1 were used to fit model BJP1, the data in week 2 were used to fit model BJP2, and so on. Fewer days were pooled close to the initialisation date in an attempt to extract skill from the initial conditions and more days were pooled for longer lead times to better approximate the climatological distribution. We found significant efficiency gains with little loss in performance, except for dry catchments, where a small increase in the frequency of negative skill scores occurred. Therefore, performance and efficiency need to be considered when establishing RPPS models in new catchments.
The RPPS method is sophisticated in that it is a full calibration approach, however, there are opportunities to improve the methodology. By pooling the data for many days in model parameter inference for example, it is assumed that rainfall from one day to the next is independent, which is an oversimplification. New inference methods that treat the rainfall data as conditionally independent therefore ought to be investigated. Future studies will seek to address the matters of independence and overlapping data in more sophisticated ways.
We make use of ACCESSS runs initialised on day 1 of the month and day 25 of the previous month. These initialisation dates are only 4–7 days apart and therefore the climatology of daily rainfall is unlikely to change significantly over this period of time. It is technically possible to establish an RPPS model using initialisation dates spanning several months. If temporally distant initialisation dates are included in model parameter inference, new strategies may be needed to ensure that the effects of seasonality are minimised. One possible approach is to standardise forecasts and observations prior to fitting the BJP model. In this way, the model transformation and climatological parameters will be allowed to vary by day of year. Such strategies for building more robust RPPS models and coincidentally minimising the effect of seasonality will be investigated in follow up work.
RPPS and QM CRPS scores are calculated using ensembles of different sizes and we consider the effect of ensemble size on our results. When a forecasting system is perfectly reliable, a larger ensemble should yield a better CRPS score (Ferro et al., 2008). However, CRPS has a weakness in that it discourages forecasting extremes (Fricker et al., 2013); indeed it is our experience that ensembles that are unbiased but too narrow (like the QM forecasts) can score overly well in terms of CRPS. Our position is that because the RPPS and QM forecasts are not similarly reliable, we are unable to make meaningful adjustments to CRPS scores to allow for the effect of ensemble size. In any case, we understand that QM forecasts can be significantly negatively skilful and unreliable, and small adjustments to the CRPS of QM forecasts will not affect those conclusions.
RPPS is designed to postprocess forecasts of daily amounts. An alternative strategy is to postprocess accumulated totals and subsequently disaggregate to daily amounts. BJP models may be applied to postprocess monthly and seasonal totals, which are then disaggregated to daily timescales. Future work will investigate the relative merits of direct daily postprocessing versus a seasonal calibration and disaggregation approach. This followup work is of particular interest as our study has shown daily postprocessing skill is limited beyond 1015 days. It is not clear how much of the seasonal forecasting skill is attributable to skill in the initial period and how much of the skill is attributable to seasonal climate signals in the GCM.
While our study has focused on rainfall, other variables are important in hydrology. Temperature forecasts, for example, are required in areas of snowmelt. Therefore, future research will investigate the extension of the RPPS ideas to include other meteorological variables.
An alternative to statistical postprocessing of GCM outputs is to run a regional climate model (RCM) to provide much more localised information than a global GCM. A review study by Xue et al. (2014) found that RCMs have limited downscaling ability for subseasonal to seasonal forecasts. In that regard, RCM outputs may also be statistically postprocessed, which may lead to better forecasts in some regions. RCMs are suited to specialised studies and less suited to postprocessing operational GCM forecasts in support of national scale hydrological forecasting services.
We have developed a novel method for postprocessing daily rainfall amounts from seasonal forecasting GCMs. RPPS is a full calibration approach that makes use of the BJP modelling approach to account for predictor–predictand skill relationships in postprocessing. Reliable forecasts of subseasonal and seasonal accumulated totals are produced by linking ensemble members together using the Schaake Shuffle.
We applied RPPS to 12 catchments across Australia in diverse climate zones. The method is robust in terms of being capable of postprocessing forecasts in all cases, even in very dry catchments.
Compared to raw forecasts and QM postprocessing (which does not account for predictor–predictand skill relationships), RPPS performs significantly better in terms of correcting bias, reliability and skill. The only exception to this conclusion is that QM and RPPS are similarly effective for correcting biases in daily amounts. RPPS is particularly effective at delivering reliable, skilful, monthly and seasonal rainfall forecasts. Thus RPPS forecasts are highly suitable for feeding into hydrological models for seasonal streamflow forecasting and other water resources management applications. Pooling multiple GCM runs and data for adjacent days in model parameter inference is a practical measure that can enable statistical postprocessing across a range of perennial and ephemeral streams. There are many avenues of research that could significantly improve the robustness and performance of RPPS forecasts. In the future, more effort could be devoted to applying RPPS more widely and relating forecast performance to catchment characteristics, therefore yielding a better understanding of forecast bias, reliability and skill.
Gridded rainfall forecasts and observations used in this study can be sourced from the Bureau of Meteorology (http://www.bom.gov.au). Other data can be requested by contacting the corresponding author.
The authors declare that they have no conflict of interest.
This article is part of the special issue “Subseasonal to seasonal hydrological forecasting”. It is not associated with a conference.
This research was supported by the Water Information Research and
Development Alliance (WIRADA), a partnership between CSIRO and the Bureau of
Meteorology. We thank the Bureau of Meteorology for providing the ACCESSS
data, AWAP data and catchment information used in this study. We thank
Durga Lal Shrestha for helpful comments on the manuscript and two anonymous
reviewers for their insightful comments that helped improve the manuscript.
Edited by: Andy Wood
Reviewed by: two anonymous referees
Beckers, J. V. L., Weerts, A. H., Tijdeman, E., and Welles, E.: ENSOconditioned weather resampling method for seasonal ensemble streamflow prediction, Hydrol. Earth Syst. Sci., 20, 3277–3287, https://doi.org/10.5194/hess2032772016, 2016.
Bennett, J. C., Wang, Q. J., Li, M., Robertson, D. E., and Schepen, A.: Reliable longrange ensemble streamflow forecasts: Combining calibrated climate forecasts with a conceptual runoff model and a staged error model, Water Resour. Res., 52, 8238–8259, 2016.
Charles, A., Timbal, B., Fernandez, E., and Hendon, H.: Analog downscaling of seasonal rainfall forecasts in the Murray darling basin, Mon. Weather Rev., 141, 1099–1117, 2013.
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., and Wilby, R.: The Schaake shuffle: A method for reconstructing spacetime variability in forecasted precipitation and temperature fields, J. Hydrometeorol., 5, 243–262, 2004.
Crochemore, L., Ramos, M.H., and Pappenberger, F.: Bias correcting precipitation forecasts to improve the skill of seasonal streamflow forecasts, Hydrol. Earth Syst. Sci., 20, 3601–3618, https://doi.org/10.5194/hess2036012016, 2016.
Crochemore, L., Ramos, M.H., Pappenberger, F., and Perrin, C.: Seasonal streamflow forecasting by conditioning climatology with precipitation indices, Hydrol. Earth Syst. Sci., 21, 1573–1591, https://doi.org/10.5194/hess2115732017, 2017.
Ferro, C. A. T., Richardson, D. S., and Weigel, A. P.: On the effect of ensemble size on the discrete and continuous ranked probability scores, Meteorol. Appl., 15, 19–24, 2008.
Fricker, T. E., Ferro, C. A., and Stephenson, D. B.: Three recommendations for evaluating climate predictions, Meteorol. Appl., 20, 246–255, 2013.
Glahn, H. R. and Lowry, D. A.: The Use of Model Output Statistics (MOS) in Objective Weather Forecasting, J. Appl. Meteorol., 11, 1203–1211, 1972.
Hawthorne, S., Wang, Q., Schepen, A., and Robertson, D.: Effective use of general circulation model outputs for forecasting monthly rainfalls to long lead times, Water Resour. Res., 49, 5427–5436, 2013.
Hudson, D., Shi, L., Alves, O., Zhao, M., Hendon, H. H., and Young, G.: Performance of ACCESSS1 for key horticultural regions, Bureau of Meteorology, Melbourne, 39 pp., 2017.
Ines, A. V. and Hansen, J. W.: Bias correction of daily GCM rainfall for crop simulation studies, Agr. Forest Meteorol., 138, 44–53, 2006.
Khan, M. Z. K., Sharma, A., Mehrotra, R., Schepen, A., and Wang, Q.: Does improved SSTA prediction ensure better seasonal rainfall forecasts?, Water Resour. Res., 51, 3370–3383, https://doi.org/10.1002/2014WR015997, 2015.
Kim, H.M., Webster, P. J., and Curry, J. A.: Seasonal prediction skill of ECMWF System 4 and NCEP CFSv2 retrospective forecast for the Northern Hemisphere Winter, Clim. Dynam., 39, 2957–2973, 2012.
Lavers, D., Luo, L., and Wood, E. F.: A multiple model assessment of seasonal climate forecast skill for applications, Geophys. Res. Lett., 36, L23711, https://doi.org/10.1029/2009GL041365, 2009.
Luo, L., Wood, E. F., and Pan, M.: Bayesian merging of multiple climate model forecasts for seasonal hydrological predictions, J. Geophys. Res.Atmos., 112, D10102, https://doi.org/10.1029/2006JD007655, 2007.
Matheson, J. E. and Winkler, R. L.: Scoring rules for continuous probability distributions, Management Science, 22, 1087–1096, 1976.
Peng, Z., Wang, Q., Bennett, J. C., Schepen, A., Pappenberger, F., Pokhrel, P., and Wang, Z.: Statistical calibration and bridging of ECMWF System4 outputs for forecasting seasonal precipitation over China, J. Geophys. Res.Atmos., 119, 7116–7135, 2014.
Pineda, L. E. and Willems, P.: Multisite Downscaling of Seasonal Predictions to Daily Rainfall Characteristics over Pacific–Andean River Basins in Ecuador and Peru Using a Nonhomogeneous Hidden Markov Model, J. Hydrometeorol., 17, 481–498, 2016.
Renard, B., Kavetski, D., Kuczera, G., Thyer, M., and Franks, S. W.: Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, W05521, https://doi.org/10.1029/2009WR008328, 2010.
Robertson, A. W., Kirshner, S., and Smyth, P.: Downscaling of Daily Rainfall Occurrence over Northeast Brazil Using a Hidden Markov Model, J. Climate, 17, 4407–4424, 2004.
Robertson, D. E., Pokhrel, P., and Wang, Q. J.: Improving statistical forecasts of seasonal streamflows using hydrological model output, Hydrol. Earth Syst. Sci., 17, 579–593, https://doi.org/10.5194/hess175792013, 2013a.
Robertson, D. E., Shrestha, D. L., and Wang, Q. J.: Postprocessing rainfall forecasts from numerical weather prediction models for shortterm streamflow forecasting, Hydrol. Earth Syst. Sci., 17, 3587–3603, https://doi.org/10.5194/hess1735872013, 2013b.
Schefzik, R.: A similaritybased implementation of the Schaake shuffle, Mon. Weather Rev., 144, 1909–1921, 2016.
Schepen, A. and Wang, Q.: Ensemble forecasts of monthly catchment rainfall out to long lead times by postprocessing coupled general circulation model output, J. Hydrol., 519, 2920–2931, 2014.
Schepen, A., Wang, Q., and Robertson, D. E.: Seasonal forecasts of Australian rainfall through calibration and bridging of coupled GCM outputs, Mon. Weather Rev., 142, 1758–1770, 2014.
Schepen, A., Zhao, T., Wang, Q. J., Zhou, S., and Feikema, P.: Optimising seasonal streamflow forecast lead time for operational decision making in Australia, Hydrol. Earth Syst. Sci., 20, 4117–4128, https://doi.org/10.5194/hess2041172016, 2016.
Shao, Q. and Li, M.: An improved statistical analogue downscaling procedure for seasonal precipitation forecast, Stoch. Environ. Res. Risk Assess., 27, 819–830, 2013.
Shrestha, D. L., Robertson, D. E., Bennett, J. C., and Wang, Q.: Improving Precipitation Forecasts by Generating Ensembles through Postprocessing, Mon. Weather Rev., 143, 3642–3663, 2015.
Teutschbein, C. and Seibert, J.: Bias correction of regional climate model simulations for hydrological climatechange impact studies: Review and evaluation of different methods, J. Hydrol., 456, 12–29, 2012.
Tian, D., Wood, E. F., and Yuan, X.: CFSv2based subseasonal precipitation and temperature forecast skill over the contiguous United States, Hydrol. Earth Syst. Sci., 21, 1477–1490, https://doi.org/10.5194/hess2114772017, 2017.
Timbal, B. and McAvaney, B.: An analoguebased method to downscale surface air temperature: application for Australia, Clim. Dynam., 17, 947–963, 2001.
Wang, E., Zhang, Y., Luo, J., Chiew, F. H. S., and Wang, Q. J.: Monthly and seasonal streamflow forecasts using rainfallrunoff modeling and historical weather data, Water Resour. Res., 47, W05516, https://doi.org/10.1029/2010WR009922, 2011.
Wang, Q. and Robertson, D.: Multisite probabilistic forecasting of seasonal flows for streams with zero value occurrences, Water Resour. Res., 47, W02546, https://doi.org/10.1029/2010WR009333, 2011.
Wang, Q., Robertson, D., and Chiew, F.: A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites, Water Resour. Res., 45, W05407, https://doi.org/10.1029/2008WR007355, 2009.
Wang, Q. J., Shrestha, D. L., Robertson, D. E., and Pokhrel, P.: A logsinh transformation for data normalization and variance stabilization, Water Resour. Res., 48, W05514, https://doi.org/10.1029/2011WR010973, 2012.
Wilks, D. S. and Hamill, T. M.: Comparison of EnsembleMOS Methods Using GFS Reforecasts, Mon. Weather Rev., 135, 2379–2390, 2007.
Wood, A. W., Hopson, T., Newman, A., Brekke, L., Arnold, J., and Clark, M.: Quantifying Streamflow Forecast Skill Elasticity to Initial Condition and Climate Prediction Skill, J. Hydrometeorol., 17, 651–668, 2016.
Xue, Y., Janjic, Z., Dudhia, J., Vasic, R., and De Sales, F.: A review on regional dynamical downscaling in intraseasonal to seasonal simulation/prediction and major factors that affect downscaling ability, Atmos. Res., 147, 68–85, 2014.
Yuan, X., Wood, E. F., and Ma, Z.: A review on climatemodelbased seasonal hydrologic forecasting: physical understanding and system development, Wiley Interdisciplin. Rev. Water, 2, 523–536, 2015.
Zhao, T., Schepen, A., and Wang, Q.: Ensemble forecasting of subseasonal to seasonal streamflow by a Bayesian joint probability modelling approach, J. Hydrol., 541, 839–849, https://doi.org/10.1016/j.jhydrol.2016.07.040, 2016.
Zhao, T., Bennett, J., Wang, Q. J., Schepen, A., Wood, A., Robertson, D., and Ramos, M.H.: How suitable is quantile mapping for postprocessing GCM precipitation forecasts?, J. Climate, 30, 3185–3196, https://doi.org/10.1175/JCLID160652.1, 2017.