A strategy to overcome adverse effects of autoregressive updating of streamflow forecasts

For streamflow forecasting, rainfall–runoff models are often augmented with updating procedures that correct forecasts based on the latest available observations of streamflow. A popular approach to updating forecasts uses autoregressive (AR) models, which exploit the “memory” in hydrological model simulation errors. AR models may be applied to raw errors directly or to normalised errors. In this study, we demonstrate that AR models applied in either way can sometimes cause over-correction of forecasts. When an AR model is applied to raw errors, over-correction usually occurs when streamflow is receding rapidly. When an AR model is applied to normalised errors, over-correction usually occurs when streamflow is rising rapidly. In addition, when the parameters of a hydrological model and an AR model are estimated jointly, the AR model applied to normalised errors sometimes degrades the stand-alone performance of the base hydrological model. This is undesirable for forecasting applications, as forecasts should rely as much as possible on the base hydrological model, with updating used only to correct minor errors. To overcome the adverse effects of the conventional AR models, we introduce a restricted AR model applied to normalised errors. We show that the new model reduces over-correction and considerably improves the performance of the base hydrological model.


Introduction
Rainfall-runoff models are widely used to generate streamflow forecasts, which provide essential information for flood warning and water resource management. For streamflow forecasting, rainfall-runoff models are often augmented by updating procedures that correct streamflow forecasts based on the latest available observations of streamflow and their departures from model simulations. Model errors reflect limitations of the hydrological models in reproducing physical processes as well as inaccuracies in the data used to force and evaluate the models.
The most popular updating approach uses autoregressive (AR) models, which exploit the "memory" (more precisely, the autocorrelation structure) of errors in hydrological simulations (Morawietz et al., 2011). Essentially, AR updating uses a linear function of the known errors at previous time steps to anticipate errors in a forecast period. Forecasts are then updated according to these anticipated errors. AR updating is conceptually simple and yet generally leads to significantly improved forecasts (World Meteorological Organization, 1992). AR updating has been shown to perform as well as more sophisticated non-linear and non-parametric updating procedures (Xiong and O'Connor, 2002).
There is no agreement on whether it is better to apply an AR model to normalised or raw errors. Recent work by Evin et al. (2013) found that an AR model applied to raw errors may lead to poor performance with exaggerated uncertainty. They demonstrated that such instability can be mitigated by applying an AR model to standardised errors (raw errors divided by standard deviations). Here, standardisation has a similar effect to normalisation in that it homogenises the variance of the errors (but it does not account for the non-Gaussian distribution of errors). Conversely, Schaefli et al. (2007) pointed out that when an AR model is jointly estimated with a hydrological model, there is a clear advantage in applying an AR model to raw errors rather than normalised (or standardised) errors. Schaefli et al. (2007) found that using raw errors leads to more reliable parameter inference and uncertainty estimation, because the mean error is close to zero and therefore the simulations are free of systematic bias. The same is not necessarily true when applying an AR model to normalised errors.
In this study, we evaluate AR models applied to both raw and normalised errors in four Australian catchments and three United States (US) catchments. We show that, when estimated jointly with a hydrological model, the AR model applied to normalised errors sometimes degrades the stand-alone performance of the base hydrological model. We also show that both of these conventional AR models can sometimes cause over-correction of forecasts. We introduce a restricted AR model applied to normalised errors and demonstrate its effectiveness in overcoming the adverse effects of the conventional AR models.

Formulations
A hydrological model is a function of forcing variables (precipitation and potential evapotranspiration), the initial catchment state, $S_0$, and a set of hydrological model parameters, $\theta_H$. We denote the observed and model-simulated streamflow at time $t$ by $Q_t$ and $\hat{Q}_t$, respectively. An error model is used to describe the difference between $Q_t$ and $\hat{Q}_t$. The log-sinh transformation defined by Wang et al. (2012),
$$Z = f(Q) = \frac{1}{b}\log\left[\sinh(a + bQ)\right], \tag{1}$$
where $a$ and $b$ are transformation parameters, is applied to stabilise variance and normalise data.
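The log-sinh transformation in Eq. (1), its back-transformation $f^{-1}$ and its Jacobian can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the numerically stable rearrangements of $\ln\sinh$ and $\operatorname{asinh}$ are our own choice, and the parameter values in the usage lines ($a = 0.01$, $b = 0.05$) are made up.

```python
import math


def log_sinh(q, a, b):
    """Log-sinh transform z = (1/b) * ln(sinh(a + b*q)) (Wang et al., 2012).

    Requires a + b*q > 0. Uses ln(sinh(x)) = x + ln(1 - exp(-2x)) - ln(2),
    which avoids overflow of sinh(x) for large x.
    """
    x = a + b * q
    if x <= 0.0:
        raise ValueError("a + b*q must be positive")
    return (x + math.log1p(-math.exp(-2.0 * x)) - math.log(2.0)) / b


def inv_log_sinh(z, a, b):
    """Back-transformation q = (asinh(exp(b*z)) - a) / b."""
    u = b * z
    if u > 0.0:
        # asinh(e^u) = u + ln(1 + sqrt(1 + e^(-2u))), stable for large u
        s = u + math.log1p(math.sqrt(1.0 + math.exp(-2.0 * u)))
    else:
        s = math.asinh(math.exp(u))
    return (s - a) / b


def jacobian(q, a, b):
    """dz/dq of the log-sinh transform, i.e. 1 / tanh(a + b*q)."""
    return 1.0 / math.tanh(a + b * q)


# Round trip with hypothetical parameters a = 0.01, b = 0.05
z = log_sinh(120.0, 0.01, 0.05)
print(round(inv_log_sinh(z, 0.01, 0.05), 6))  # prints 120.0
```

For large $a + bQ$ the transformation is close to linear ($Z \approx Q + (a - \ln 2)/b$), while for small $a + bQ$ it behaves like a scaled logarithm, which is why it compresses variance mainly at low flows.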
In this study, we first examine two first-order AR error models:
1. An AR error model applied to normalised errors (referred to as AR-Norm), defined by
$$Z_t - \hat{Z}_t = \rho\,(Z_{t-1} - \hat{Z}_{t-1}) + \varepsilon_t, \tag{2}$$
where $Z_t$ and $\hat{Z}_t$ are the log-sinh transformed values of $Q_t$ and $\hat{Q}_t$.
2. An AR error model applied to raw errors (referred to as AR-Raw), defined by
$$Z_t = f\left(\hat{Q}_t + \rho\,(Q_{t-1} - \hat{Q}_{t-1})\right) + \varepsilon_t. \tag{3}$$
For both models, $\rho$ is the lag-1 autoregression parameter, and $\varepsilon_t$ is an independently and identically distributed Gaussian deviate with a mean of zero and a constant standard deviation $\sigma$.
Both the AR-Norm and AR-Raw models represent the lag-1 autocorrelation by an AR process, and both employ the log-sinh transformation. However, the way the log-sinh transformation is applied differs between the two models. The AR-Norm model first applies the log-sinh transformation to the observed and model-simulated streamflow, and then assumes that the error in the transformed space follows an AR(1) process. In contrast, the AR-Raw model essentially assumes that the error in the original space follows an AR(1) process and applies the log-sinh transformation only to fit the asymmetric and non-Gaussian error distribution.
The medians of the updated streamflow forecasts (referred to as updated streamflow), denoted by $Q^*_t$, for the AR-Norm and AR-Raw models (see Appendix A for the derivation) are, respectively,
$$Q^*_t = f^{-1}\left(\hat{Z}_t + \rho\,(Z_{t-1} - \hat{Z}_{t-1})\right) \tag{4}$$
and
$$Q^*_t = \hat{Q}_t + \rho\,(Q_{t-1} - \hat{Q}_{t-1}), \tag{5}$$
where $f^{-1}(x)$ is the inverse of the log-sinh transformation (or back-transformation). The magnitude of the error update by the AR-Raw model, $Q^*_t - \hat{Q}_t$, depends only on the difference between $Q_{t-1}$ and $\hat{Q}_{t-1}$. In contrast, the magnitude of the error update by the AR-Norm model depends not only on the difference between $Q_{t-1}$ and $\hat{Q}_{t-1}$, but also on $\hat{Q}_t$. Put differently, the AR-Norm model uses errors calculated in the transformed domain, so the error update in the original domain can be amplified (or reduced) by the back-transformation (Eq. 4). The AR-Raw model uses errors calculated in the original domain, and no back-transformation is used in calculating $Q^*_t$ (Eq. 5), so the error update in the original domain cannot be amplified (or reduced). In Appendix B, we show that the AR-Norm model gives greater error updates for larger values of $\hat{Q}_t$.
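To make the contrast between Eqs. (4) and (5) concrete, the sketch below computes both updated medians for the same previous-step error ($Q_{t-1} = 12$, $\hat{Q}_{t-1} = 8$) and two values of $\hat{Q}_t$. All numbers, including $\rho = 0.9$ and the log-sinh parameters $a = b = 0.01$, are hypothetical and chosen only for illustration, and the simple (non-overflow-protected) log-sinh helpers are adequate only for moderate flows.

```python
import math


def log_sinh(q, a, b):
    # z = (1/b) ln(sinh(a + b q)); simple form, assumes sinh does not overflow
    return math.log(math.sinh(a + b * q)) / b


def inv_log_sinh(z, a, b):
    # Back-transformation q = (asinh(exp(b z)) - a) / b
    return (math.asinh(math.exp(b * z)) - a) / b


def update_ar_norm(q_hat_t, q_prev, q_hat_prev, rho, a, b):
    """Updated median under AR-Norm (Eq. 4): AR(1) in transformed space."""
    z_err_prev = log_sinh(q_prev, a, b) - log_sinh(q_hat_prev, a, b)
    return inv_log_sinh(log_sinh(q_hat_t, a, b) + rho * z_err_prev, a, b)


def update_ar_raw(q_hat_t, q_prev, q_hat_prev, rho):
    """Updated median under AR-Raw (Eq. 5): AR(1) in original space."""
    return q_hat_t + rho * (q_prev - q_hat_prev)


rho, a, b = 0.9, 0.01, 0.01  # hypothetical parameter values
for q_hat_t in (10.0, 100.0):
    d_norm = update_ar_norm(q_hat_t, 12.0, 8.0, rho, a, b) - q_hat_t
    d_raw = update_ar_raw(q_hat_t, 12.0, 8.0, rho) - q_hat_t
    print(f"forecast {q_hat_t:6.1f}: AR-Norm update {d_norm:5.2f}, AR-Raw update {d_raw:5.2f}")
```

With these values the AR-Raw update is 3.6 in both cases, whereas the AR-Norm update at $\hat{Q}_t = 100$ is several times the previous-step error of 4, which is the amplification mechanism behind over-correction on a rising hydrograph.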
We will demonstrate in Sect. 4 that the AR-Norm and AR-Raw models can sometimes cause over-correction of forecasts. To overcome this potential for over-correction, we introduce a modification of the AR-Norm model, called the restricted AR-Norm model (referred to as RAR-Norm).
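The restriction $|D_t| \le |Q_{t-1} - \hat{Q}_{t-1}|$ is used to limit the correction to an amount not exceeding the raw error at the last time step. The updated streamflow is given by
$$Q^*_t = \begin{cases} \hat{Q}_t + D_t, & |D_t| \le |Q_{t-1} - \hat{Q}_{t-1}|, \\ \hat{Q}_t + (Q_{t-1} - \hat{Q}_{t-1}), & \text{otherwise}, \end{cases}$$
where
$$D_t = f^{-1}\left(\hat{Z}_t + \rho\,(Z_{t-1} - \hat{Z}_{t-1})\right) - \hat{Q}_t$$
is the error update that the unrestricted AR-Norm model would apply (Eq. 4). The full RAR-Norm model in the transformed space is given by
$$Z_t = f(Q^*_t) + \varepsilon_t.$$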
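A minimal sketch of the RAR-Norm update under our reading of the restriction above: the AR-Norm error update $D_t$ is accepted unless its magnitude exceeds the previous raw error, in which case the previous raw error is used instead. The function name and parameter values are hypothetical, and the log-sinh helpers use a simple form valid for moderate flows.

```python
import math


def log_sinh(q, a, b):
    # z = (1/b) ln(sinh(a + b q)); simple form, adequate for moderate a + b q
    return math.log(math.sinh(a + b * q)) / b


def inv_log_sinh(z, a, b):
    # Back-transformation q = (asinh(exp(b z)) - a) / b
    return (math.asinh(math.exp(b * z)) - a) / b


def rar_norm_update(q_hat_t, q_prev, q_hat_prev, rho, a, b):
    """Restricted AR-Norm updated median: |update| never exceeds |Q_{t-1} - Qhat_{t-1}|."""
    raw_err_prev = q_prev - q_hat_prev
    z_err_prev = log_sinh(q_prev, a, b) - log_sinh(q_hat_prev, a, b)
    # D_t: the update the unrestricted AR-Norm model would apply (Eq. 4 minus Qhat_t)
    d_t = inv_log_sinh(log_sinh(q_hat_t, a, b) + rho * z_err_prev, a, b) - q_hat_t
    if abs(d_t) <= abs(raw_err_prev):
        return q_hat_t + d_t          # AR-Norm update accepted
    return q_hat_t + raw_err_prev     # restricted: fall back to the previous raw error


rho, a, b = 0.9, 0.01, 0.01  # hypothetical values
print(rar_norm_update(100.0, 12.0, 8.0, rho, a, b))  # rising limb, restricted: prints 104.0
print(rar_norm_update(2.0, 8.0, 12.0, rho, a, b))    # recession: AR-Norm update kept
```

In the recession case, the AR-Raw update (Eq. 5) would give $2 - 0.9 \times 4 = -1.6$, a physically impossible negative flow, whereas the restricted update keeps the smaller AR-Norm correction and returns a small positive flow.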

Estimation
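The AR-Norm, AR-Raw and RAR-Norm models are each calibrated jointly with the hydrological model. The method of maximum likelihood is used to estimate the error model parameters $\theta_E$ and the hydrological model parameters $\theta_H$. Using a derivation similar to that given by Li et al. (2013), the likelihood functions can be written as follows:
a. for AR-Norm,
$$L(\theta_H, \theta_E) = \prod_{t} \phi\left(Z_t \mid \hat{Z}_t + \rho\,(Z_{t-1} - \hat{Z}_{t-1}),\ \sigma^2\right) J_{Z_t \to Q_t};$$
b. for AR-Raw,
$$L(\theta_H, \theta_E) = \prod_{t} \phi\left(Z_t \mid f\left(\hat{Q}_t + \rho\,(Q_{t-1} - \hat{Q}_{t-1})\right),\ \sigma^2\right) J_{Z_t \to Q_t};$$
c. for RAR-Norm,
$$L(\theta_H, \theta_E) = \prod_{t} \phi\left(Z_t \mid f(Q^*_t),\ \sigma^2\right) J_{Z_t \to Q_t},$$
where $Q^*_t$ is the RAR-Norm updated streamflow, $J_{Z_t \to Q_t} = \{\tanh(a + bQ_t)\}^{-1}$ is the Jacobian determinant of the log-sinh transformation, and $\phi(x \mid \mu, \sigma^2)$ is the probability density function of a Gaussian random variable $x$ with mean $\mu$ and standard deviation $\sigma$. The probability density function is replaced by the cumulative distribution function when evaluating events of zero flow (Wang and Robertson, 2011; Li et al., 2013). The shuffled complex evolution (SCE) algorithm (Duan et al., 1994) is used to minimise the negative log likelihood.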
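The AR-Norm likelihood above can be evaluated as a negative log-likelihood, which is the quantity an optimiser such as SCE would minimise. The sketch below is illustrative only and rests on several assumptions: the simulated series $\hat{Q}_t$ has already been produced by the hydrological model for a given $\theta_H$ (running GR4J is not shown), observed zero flows are treated as censored at $f(0)$, and the first time step, which has no previous error, is skipped. Function names are hypothetical.

```python
import math


def log_sinh(q, a, b):
    # Stable form of (1/b) ln(sinh(a + b q)); requires a + b q > 0
    x = a + b * q
    return (x + math.log1p(-math.exp(-2.0 * x)) - math.log(2.0)) / b


def log_norm_pdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def log_norm_cdf(x, mu, sigma):
    # Gaussian CDF via erfc: Phi(u) = 0.5 * erfc(-u / sqrt(2))
    return math.log(0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0))))


def ar_norm_neg_log_lik(q_obs, q_sim, a, b, rho, sigma):
    """Negative log-likelihood of the AR-Norm error model for given simulations."""
    z_obs = [log_sinh(q, a, b) for q in q_obs]
    z_sim = [log_sinh(q, a, b) for q in q_sim]
    z_zero = log_sinh(0.0, a, b)
    nll = 0.0
    for t in range(1, len(q_obs)):
        mu = z_sim[t] + rho * (z_obs[t - 1] - z_sim[t - 1])  # conditional mean from Eq. 2
        if q_obs[t] <= 0.0:
            # Zero flow: censored, use the probability that Z_t <= f(0)
            nll -= log_norm_cdf(z_zero, mu, sigma)
        else:
            # Gaussian density in transformed space times Jacobian 1 / tanh(a + b Q_t)
            nll -= log_norm_pdf(z_obs[t], mu, sigma) - math.log(math.tanh(a + b * q_obs[t]))
    return nll


print(ar_norm_neg_log_lik([1.0, 0.0], [1.0, 0.0], 0.01, 0.05, 0.5, 0.3))  # prints 0.6931471805599453
```

In a joint calibration, $\hat{Q}_t$ would be regenerated by the hydrological model for each candidate $\theta_H$, so the optimiser searches over $\theta_H$ and $(a, b, \rho, \sigma)$ together; any global optimiser could stand in for SCE in such a sketch.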

Data
We use daily data from four Australian catchments and three catchments from the US (Fig. 1, Table 1).Australian streamflow data are taken from the Catchment Water Yield Estimation Tool (CWYET) data set (Vaze et al., 2011).Australian rainfall and potential evaporation data are derived from the Australian Water Availability Project (AWAP) data set (Jones et al., 2009).All data for the US catchments come from the Model Intercomparison Experiment (MOPEX) data set (Duan et al., 2006).The selected US catchments are amongst the 12 catchments used by Evin et al. (2014) to compare joint and postprocessor approaches to estimating hydrological uncertainty, and allow us to compare results with that study (the other catchments used by Evin et al. (2014) are influenced by snowmelt, which is not considered in the hydrological model used in this study).The Abercrombie River and the Guadalupe River intermittently experience periods of very low (to zero) flow, while the other rivers flow perennially (Table 1).Such dry catchments are challenging for hydrological simulations and error modelling.All catchments have high-quality streamflow records with very few missing data.
We forecast daily streamflow with the GR4J rainfall-runoff model (Perrin et al., 2003) and apply updating procedures to correct these forecasts. All results presented in this paper are based on cross-validation to ensure that the results can be generalised to independent data. We use different cross-validation schemes for the Australian and US catchments because shorter streamflow records are available for the Australian catchments:
1. For the Australian catchments, we use data from 1992 to 2005 (14 years) and generate 14-fold cross-validated streamflow forecasts. The data from 1990 to 1991 are used only to warm up the GR4J model. For a given year, we leave out the data from that year and the following year when estimating the parameters of GR4J and the error models. For example, to forecast streamflows at any point in 1999, we leave out data from 1999 and 2000 when estimating parameters. The removal of data from the following year (2000) is designed to minimise the impact of hydrological memory on model parameter estimation. We then generate streamflow forecasts in that year.

To demonstrate the problems of over-correction of errors in updating and poor stand-alone performance of the base hydrological model, we consider only streamflow forecasts for one time step ahead. We will consider longer lead times in future work. Forecasts are generated using observed rainfall (i.e. a "perfect" rainfall forecast) as input. In streamflow forecasting, forecasts may be generated from rainfall information that comes from a different source (e.g. a numerical weather prediction model). Our study is aimed at streamflow forecasting applications, so we preserve the distinction between observed and forecast forcings by referring to streamflows modelled with observed rainfall as simulations and those modelled with forecast rainfall as forecasts. In this study the forecast rainfall is observed rainfall, so the terms forecast and simulation are interchangeable.
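The leave-two-years-out cross-validation scheme for the Australian catchments can be sketched as follows. The function name is hypothetical, and we assume that when the following year falls outside the record (2005 followed by 2006) only the forecast year itself is left out.

```python
def leave_two_years_out_folds(years):
    """Map each forecast year y to its calibration years: all years except
    y and y + 1 (the following year is dropped to limit hydrological memory)."""
    return {y: [yr for yr in years if yr not in (y, y + 1)] for y in years}


years = list(range(1992, 2006))  # 1992-2005; 1990-1991 are used only for warm-up
folds = leave_two_years_out_folds(years)
print(sorted(set(years) - set(folds[1999])))  # [1999, 2000]
```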

Over-correction of forecasts as the hydrograph rises
The first adverse effect of the conventional AR models is over-correction of errors in updating as streamflows rise. By over-correction, we mean that the AR model updates the hydrological model simulations too much. Over-correction is difficult to define precisely; however, we demonstrate the concept with two examples in the Mitta Mitta catchment: the first illustrates over-correction by the AR-Norm model, and the second illustrates over-correction by the AR-Raw model.
To illustrate the problem of over-correction caused by the AR-Norm model, Fig. 2 presents a 1-week time series for the Mitta Mitta catchment, showing streamflow forecasts from GR4J before error updating (referred to as streamflow forecasts with the base hydrological model) and after error updating. Figure 2 shows that the base hydrological models consistently under-estimate streamflow from 23 to 25 September 2000, and the corresponding updating procedures successfully identify the need to compensate for this under-estimation. For the AR-Norm model, however, the correction for 26 September 2000 is unreasonably large. Because the forecast streamflow on 26 September 2000 is much higher than that of the previous day, the correction is greatly amplified by the back-transformation, leading to over-correction. In contrast, the AR-Raw model works better in this situation because the magnitude of its error update never exceeds the simulation error on the previous day, regardless of whether the forecast streamflow is high or low. The RAR-Norm model behaves similarly to the AR-Raw model in correcting the peak on 26 September 2000 and avoids the over-correction made by the AR-Norm model.
Figure 3 shows instances of possible over-correction by the AR-Norm model, identified by the condition $|D_t| > |Q_{t-1} - \hat{Q}_{t-1}|$, where $D_t$ is the error update given by the AR-Norm formulation. Figure 3 shows that about 10-25 % of the AR-Norm updated forecasts have an error update larger than the forecast error on the previous day and are therefore susceptible to over-correction. The frequency of these instances varies somewhat from catchment to catchment. The RAR-Norm model identifies 10-30 % of the forecasts as possible instances of problematic updating, and the AR-Norm model identifies a similar number of instances (slightly fewer; the instances are not identical because the parameters of each model are inferred independently).
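The fraction of flagged instances plotted in Fig. 3 can be computed as sketched below, assuming the updates passed in are the unrestricted AR-Norm-form updates (for RAR-Norm, the update before the restriction is applied). The series and the function name are hypothetical.

```python
def flagged_fraction(q_obs, q_sim, q_upd):
    """Fraction of time steps t >= 1 where the error update |Q*_t - Qhat_t|
    exceeds the previous raw error |Q_{t-1} - Qhat_{t-1}|."""
    flags = [
        abs(q_upd[t] - q_sim[t]) > abs(q_obs[t - 1] - q_sim[t - 1])
        for t in range(1, len(q_obs))
    ]
    return sum(flags) / len(flags)


q_obs = [10.0, 12.0, 20.0, 18.0]
q_sim = [8.0, 10.0, 30.0, 17.0]
q_upd = [8.0, 15.0, 31.0, 9.0]  # hypothetical AR-Norm updated medians
print(flagged_fraction(q_obs, q_sim, q_upd))  # 0.3333333333333333
```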
Figure 4 presents a time series for the Orara catchment that shows the instances susceptible to over-correction for the AR-Norm model. These instances all occur when streamflow rises. The RAR-Norm model effectively rectifies the over-correction caused by the AR-Norm model. We note that nothing forces the instances susceptible to over-correction identified for the AR-Norm model to be the same as those identified for the RAR-Norm model, because the two models are calibrated independently (and therefore their base hydrological model simulations may differ). However, the restriction defined in the RAR-Norm model is largely applied to the instances where the AR-Norm model is susceptible to over-correction.

Over-correction of forecasts as the hydrograph recedes
The second adverse effect of conventional AR models is over-correction of forecasts as streamflows recede. An example is presented in Fig. 5.

Poor stand-alone performance of the base hydrological model
The third adverse effect of conventional AR error models is poor stand-alone performance of the base hydrological model (GR4J). As noted above, the parameters of the base hydrological model are estimated jointly with each error model. For streamflow forecasting, we expect the base hydrological model to produce a reasonably accurate forecast, with an updating procedure serving as an auxiliary means of improving forecast accuracy. At lead times of many time steps (e.g. streamflow forecasts generated from medium-range rainfall forecasts), the magnitude of AR error updates decays rapidly towards zero, so the performance of the base hydrological model is crucial for realistic forecasts at longer lead times. While we only investigate forecasts at a lead time of one time step in this study, we aim to develop methods that can be applied to forecasts at longer lead times. Furthermore, if the base hydrological model does not replicate important catchment processes realistically, its performance outside the calibration period may be less robust.
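The decay of AR updates with lead time follows from the AR(1) structure: if no new observations arrive, the conditional mean of the error $k$ steps after the last known error $e_t$ is $\rho^k e_t$. The sketch below works in raw-error space for simplicity (for AR-Norm the same geometric decay applies to the transformed error); the function names and $\rho$ values are hypothetical.

```python
def ar1_expected_updates(e_last, rho, max_lead):
    """Expected error update at leads k = 1..max_lead after the last known
    error e_last, for an AR(1) error process: rho**k * e_last."""
    return [e_last * rho ** k for k in range(1, max_lead + 1)]


def lead_time_below(rho, fraction):
    """Smallest lead k at which rho**k drops below `fraction`."""
    if not (0.0 < rho < 1.0 and fraction > 0.0):
        raise ValueError("need 0 < rho < 1 and fraction > 0")
    k = 1
    while rho ** k >= fraction:
        k += 1
    return k


print(ar1_expected_updates(10.0, 0.5, 3))  # [5.0, 2.5, 1.25]
print(lead_time_below(0.9, 0.05))          # 29
```

Even with a strongly persistent error ($\rho = 0.9$), the expected update falls below 5 % of the last known error within about a month of daily steps, after which the forecast is essentially the base model simulation.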
Figure 7 presents the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970) of forecasts from the base hydrological models and of the updated forecasts from the error models. When the AR-Norm model is used, the forecasts from the base hydrological model are very poor for the Orara catchment (NSE < 0). The scatter plot in Fig. 8 shows serious over-estimation of streamflow for the Orara catchment: when the AR-Norm model is used, the base hydrological model greatly over-estimates discharge, and the AR-Norm model then attempts to correct this systematic over-estimation. This is also shown in Fig. 4, where the base hydrological model has a strong tendency to over-estimate streamflows across a range of streamflow magnitudes. The base hydrological model calibrated with the AR-Norm model also performs poorly for the Abercrombie catchment (Fig. 7); in this case, it tends to under-estimate streamflows (results not shown). For the other three catchments, however, the base hydrological model with the AR-Norm model performs reasonably well.
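The NSE used in Fig. 7 is $1 - \sum_t (Q_t - \hat{Q}_t)^2 / \sum_t (Q_t - \bar{Q})^2$. A minimal sketch is given below, with an optional mask so that the same function can score subsets such as the receding and rising time steps used later in Table 2. Computing the observed mean over the masked subset only is our assumption; function names are hypothetical.

```python
def nse(obs, sim, mask=None):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    If a mask is given, only time steps with mask[i] True are used, including
    for the mean of the observations."""
    pairs = [(o, s) for i, (o, s) in enumerate(zip(obs, sim)) if mask is None or mask[i]]
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    sse = sum((o - s) ** 2 for o, s in pairs)
    sst = sum((o - mean_obs) ** 2 for o, _ in pairs)
    return 1.0 - sse / sst


def receding_mask(obs):
    """True where Q_t <= Q_{t-1}; the first step has no predecessor and is excluded."""
    return [False] + [obs[t] <= obs[t - 1] for t in range(1, len(obs))]


def rising_mask(obs):
    """True where Q_t > Q_{t-1}; the first step is excluded."""
    return [False] + [obs[t] > obs[t - 1] for t in range(1, len(obs))]


obs = [5.0, 4.0, 6.0, 3.0, 2.0]
print(receding_mask(obs))  # [False, True, False, True, True]
```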
In general, the AR-Raw base hydrological model performs as well as or better than the AR-Norm base hydrological model. The AR-Raw base hydrological model is notably better than the AR-Norm base hydrological model in the Abercrombie and Orara catchments (Fig. 7). This suggests that more robust performance can be expected of base hydrological models when AR models are applied to raw errors.
The RAR-Norm model generally improves the performance of the AR-Norm base hydrological model to a level similar to the AR-Raw base hydrological model (Fig. 7).The improvement over the AR-Norm base hydrological model is especially evident for the Orara (Figs. 4 and 7) and Abercrombie catchments (Fig. 7).
We note that for the AR-Norm model, the updated forecasts are not always better than forecasts generated by the base hydrological model. For the Tarwin and Guadalupe catchments, AR-Norm forecasts are not as good as the forecasts generated by the AR-Norm base hydrological model. This points to a tendency to overfit the parameters to the calibration period, resulting in the error model undermining the performance of the base hydrological model under cross-validation. Such a lack of robustness is highly undesirable in forecasting applications, where the hydrological models should be able to operate in conditions that differ from those experienced during calibration. Note that this problem also occurs in the RAR-Norm model (Guadalupe) and in the AR-Raw model (Abercrombie, Guadalupe), but to a much smaller degree.
In general, the updated forecasts from the RAR-Norm model show forecast accuracy (as measured by NSE) similar to or better than that of both the AR-Raw and AR-Norm models (Fig. 7). The Orara catchment is an exception: here the AR-Raw model shows slightly better performance than the RAR-Norm model. Conversely, the RAR-Norm model shows notably better performance than both the AR-Norm and AR-Raw models in the Abercrombie and Guadalupe catchments. This suggests that the RAR-Norm model may work better in intermittently flowing catchments, although further testing is required to establish whether this holds for a greater range of catchments.

Further analyses
We further evaluate the NSE of forecasts from the three error models separately for periods when streamflows are receding (i.e. $Q_t \le Q_{t-1}$) and rising (i.e. $Q_t > Q_{t-1}$) (Table 2). For receding streamflows (which constitute 70-85 % of streamflows), the AR-Raw model gives the worst overall forecast accuracy because of the over-correction explained in Sect. 4.1. This is especially evident for the Abercrombie catchment (and, to a lesser degree, the Guadalupe catchment). The RAR-Norm model clearly outperforms the other two models for the Abercrombie catchment and has forecast accuracy similar to the (strongly performing) AR-Norm model for the other catchments. When streamflows are rising (which includes streamflow peaks), the AR-Norm model can cause over-correction and gives the least accurate forecasts (in terms of NSE), while the RAR-Norm model behaves similarly to the AR-Raw model, which consistently provides the most accurate forecasts. (The only exception is the Guadalupe River, where the AR-Raw model clearly outperforms the RAR-Norm model when streamflows are rising. This is somewhat compensated for by the markedly better performance of the RAR-Norm model over the AR-Raw model when streamflows are receding in this catchment, leading to better forecasts overall (Fig. 7).) We conclude that the AR-Raw model generally tends to perform least well when streamflows recede, and that the AR-Norm model tends to perform least well when streamflows rise. We also conclude that the RAR-Norm model tends to combine the best elements of the AR-Norm and AR-Raw models, leading to the best overall performance.

We have shown that over-corrections can lead to inaccurate deterministic forecasts; we now discuss the consequences for the probabilistic predictions given by each of the error models. We assess probabilistic forecast skill with skill scores derived from two probabilistic verification measures: the continuous ranked probability score (CRPS) and the root mean square error in probability (RMSEP), denoted by CRPS_SS and RMSEP_SS, respectively (Wang and Robertson, 2011). Both skill scores are calculated with respect to a reference forecast. The reference forecast is generated by resampling historical streamflows: for a forecast issued for a given month and year (e.g. February 1999), we randomly draw, with replacement, a sample of 1000 daily streamflows that occurred in that month (e.g. February) in other years (e.g. years other than 1999). Table 3 compares these two skill scores for all catchments. The RAR-Norm model performs best across the range of skill scores and catchments, attaining the highest CRPS_SS in 4 of the 7 catchments and the highest RMSEP_SS in 4 of the 7 catchments. Even where RAR-Norm is not the best-performing model, it performs very similarly to the best-performing model in all cases. Interestingly, the AR-Raw model tends to outperform the AR-Norm model in CRPS_SS, while the reverse is true for RMSEP_SS. The CRPS tests how appropriate the spread of uncertainty is for each probabilistic forecast, while RMSEP puts little weight on this. The results suggest that while the median forecasts of the AR-Norm model tend to be slightly more accurate than those of the AR-Raw model, the forecast uncertainty is represented slightly better by the AR-Raw model.
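The reference forecast and the CRPS skill score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the CRPS is computed from an ensemble sample with the standard identity $\mathrm{CRPS} = E|X - y| - \tfrac{1}{2}E|X - X'|$, the skill score is taken as the percentage reduction in mean CRPS relative to the reference, and the `history` layout is hypothetical. RMSEP is not sketched.

```python
import random


def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast for one observation:
    mean|X - y| - 0.5 * mean|X - X'|."""
    x = sorted(members)
    n = len(x)
    term1 = sum(abs(xi - obs) for xi in x) / n
    # sum_{i,j} |x_i - x_j| / n^2 = (2 / n^2) * sum_i (2i - n - 1) x_(i), i = 1..n
    spread = 2.0 * sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / n ** 2
    return term1 - 0.5 * spread


def reference_ensemble(history, year, month, n=1000, rng=None):
    """Climatological reference: n daily flows drawn with replacement from the
    same calendar month in all other years. `history` maps (year, month) to a
    list of daily flows (hypothetical data layout)."""
    pool = [q for (y, m), flows in history.items() if m == month and y != year for q in flows]
    rng = rng or random.Random(0)
    return rng.choices(pool, k=n)


def skill_score(score, ref_score):
    """Skill score in percent: 100 is perfect, 0 means no better than the reference."""
    return (ref_score - score) / ref_score * 100.0


print(crps_ensemble([0.0, 1.0], 0.5))  # 0.25
```

In use, the mean CRPS over all forecasts would be computed once for an error model and once for the reference ensembles, and the two means passed to `skill_score`.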
To understand better how reliably the forecast uncertainty is quantified by each model, we produce probability integral transform (PIT) uniform probability plots (Wang and Robertson, 2011) in Fig. 9.There are two main points to draw from these plots.First, the curves are very similar for all error models (a partial exception is the San Marcos catchment, where the AR-Raw model is slightly closer to the one-to-one line than the other models).This demonstrates that, in general, the models produce similarly reliable uncertainty distributions.Second, all models show an inverted S-shaped curve, which indicates that the uncertainty ranges are too wide.This underconfidence is a result of using a Gaussian distribution to characterise the error.The Gaussian distribution is not flexible enough to represent the high degree of kurtosis in the distribution of the residuals after error updating (partly because the errors become very small after updating).We are presently experimenting with other distributions in order to address this issue, and will seek to publish this work in future.For the purposes of the present study, we conclude that the three error models are similarly reliable.
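Because each error model is Gaussian in the transformed space and the log-sinh transformation is monotonic, the PIT value of an observation can be computed directly as $\Phi\left((Z_t - \mu_t)/\sigma\right)$, where $\mu_t$ is the transformed updated median. The sketch below does this and prepares points for a PIT uniform probability plot. The $i/(n+1)$ plotting position is our assumption, zero flows (which need special treatment) are not handled, and the function names are hypothetical.

```python
import math


def gaussian_pit(z_obs, mu, sigma):
    """PIT value Phi((z_obs - mu) / sigma) for a Gaussian predictive
    distribution in the transformed space."""
    return 0.5 * math.erfc(-(z_obs - mu) / (sigma * math.sqrt(2.0)))


def uniform_plot_points(pits):
    """Pairs (theoretical uniform quantile, sorted PIT value) for a PIT
    uniform probability plot, using i / (n + 1) plotting positions."""
    n = len(pits)
    return [((i + 1) / (n + 1), p) for i, p in enumerate(sorted(pits))]


print(uniform_plot_points([0.9, 0.1, 0.5]))  # [(0.25, 0.1), (0.5, 0.5), (0.75, 0.9)]
```

If $\sigma$ is larger than the spread of the actual residuals, the PIT values cluster around 0.5, so with the uniform quantile on the horizontal axis the sorted PITs lie above the 1:1 line at low quantiles and below it at high quantiles, the departure associated with overly wide uncertainty.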

Discussion and conclusions
For streamflow forecasting, rainfall-runoff models are often augmented with an updating procedure that corrects the forecast using information from recent simulation errors.The most popular updating approach uses autoregressive (AR) models that exploit the "memory" in model errors.AR models may be applied to raw errors directly or to normalised errors.
We demonstrate three adverse effects of AR error updating procedures in seven catchments. The first adverse effect is possible over-correction on the rising limb of the hydrograph. The AR-Norm model can tend to over-correct at the peak or on the rise of a hydrograph, because the error update can be (overly) amplified by the back-transformation. The second adverse effect is the tendency to over-correct receding hydrographs. This tendency is most prevalent in the AR-Raw model, which can fail to recognise that a large error update may not be appropriate for small streamflows.
The third adverse effect is that the stand-alone performance of the base hydrological model can be poor when the parameters of the rainfall-runoff model and the error model are jointly estimated.We show that poor base hydrological model performance is particularly prevalent in the AR-Norm model.The poor performance appears to occur in catchments with highly skewed streamflow observations (the intermittent Abercrombie River, and the Orara River, a catchment in a subtropical climate).For example, in the Orara River, the base hydrological model tends to greatly over-estimate streamflows, and then relies on the error updating to correct the over-estimates.This is not desirable in real-time forecasting applications for two major reasons.First, modern streamflow forecasting systems often extend forecast lead times with rainfall forecast information (Bennett et al., 2014).The magnitude of AR updating decays with lead time, and forecasts at longer lead times rely heavily on the performance of the base hydrological model.Second, hydrological models are designed to simulate various components of natural systems, such as baseflow processes or overland flow.In theory, simulating these processes correctly will allow the model to perform well for climate conditions that may substantially differ from those experienced during the parameter estimation period.If the hydrological model parameters do not reflect the natural processes for a given catchment, the hydrological model may be much less robust outside the parameter estimation period.
We note that the poor performance of the hydrological model may be specific to the GR4J model and may not occur in other hydrological models. Evin et al. (2014) estimated hydrological model and error model parameters jointly using GR4J and another hydrological model, HBV, for the three US catchments tested here. While they did not assess the performance of the base hydrological models, they found that HBV tended to perform more robustly when combined with different error models. It is possible that we would have achieved more stable base model performance had we used HBV or another hydrological model. We note, however, that our conclusions can probably be generalised to other hydrological models that do not offer robust base model performance under joint parameter estimation (e.g. GR4J). Because the RAR-Norm model limits the range of updating that can be applied, it will tend to rely more heavily on the base hydrological model, and therefore will tend to favour parameter sets that encourage good stand-alone performance of the base model. For those hydrological models that already produce robust base model performance under joint parameter estimation (perhaps HBV), RAR-Norm is unlikely to undermine this performance for the same reasons. We see some evidence of this in our experiments with GR4J: when the performance of the base hydrological model is already strong relative to the updated forecasts for the AR-Norm and AR-Raw models (e.g. the Tarwin, Mitta Mitta, or Guadalupe catchments), the RAR-Norm base hydrological model also performs strongly.
The tendency of the AR-Norm model to over-correct rising streamflows is probably generic. In particular, transformations other than the log-sinh transformation may still lead to over-correction at the peak of the hydrograph. The proof in Appendix B shows that if a transformation satisfies certain conditions (its first derivative is positive and its second derivative is negative), it will tend to correct more for higher forecast streamflows and can cause over-correction. These conditions hold for many other transformations used for data normalisation and variance stabilisation in hydrological applications, such as the logarithmic transformation or the Box-Cox transformation with a power parameter less than 1.
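A first-order argument (not the proof in Appendix B) shows why increasing, concave transformations amplify updates at high flows. Writing the AR-Norm update as $D_t = f^{-1}(\hat{Z}_t + \delta_t) - \hat{Q}_t$ with $\delta_t = \rho\,(Z_{t-1} - \hat{Z}_{t-1})$, and linearising the back-transformation about $\hat{Z}_t = f(\hat{Q}_t)$:

```latex
\begin{aligned}
D_t &= f^{-1}\!\left(\hat{Z}_t + \delta_t\right) - f^{-1}\!\left(\hat{Z}_t\right)
     \approx \frac{\delta_t}{f'(\hat{Q}_t)}, \\
\text{log:}\quad f(Q) = \ln Q
  &\;\Rightarrow\; D_t \approx \delta_t\,\hat{Q}_t, \\
\text{Box-Cox } (\lambda < 1):\quad f(Q) = \frac{Q^{\lambda} - 1}{\lambda}
  &\;\Rightarrow\; D_t \approx \delta_t\,\hat{Q}_t^{\,1-\lambda}, \\
\text{log-sinh:}\quad f'(Q) = \coth(a + bQ)
  &\;\Rightarrow\; D_t \approx \delta_t \tanh\!\left(a + b\hat{Q}_t\right).
\end{aligned}
```

Because $f'' < 0$ makes $1/f'$ increase with $\hat{Q}_t$, the same transformed error $\delta_t$ produces a larger update in the original space at higher forecast flows. The same conclusion holds without linearisation: $f^{-1}$ is convex when $f$ is increasing and concave, so $f^{-1}(z + \delta) - f^{-1}(z)$ is non-decreasing in $z$.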
We use joint parameter inference to calibrate the hydrological model and error model parameters in order to address the true nature of the underlying model errors. Inferring the parameters of the error model and the base hydrological model independently (i.e. first inferring the parameters of the base hydrological model, then holding these constant while inferring the error model parameters) relies on simplified and often invalid error assumptions (it assumes independent, homoscedastic and Gaussian errors), but could nonetheless be a pragmatic alternative to joint parameter inference that reduces computational demands. The over-correction by conventional AR models occurs regardless of whether the error and base hydrological model parameters are inferred jointly or independently.
In order to mitigate the adverse effects of conventional AR updating procedures, we introduce a new updating procedure called the RAR-Norm model. The RAR-Norm model is a modification of the AR-Norm model: in most instances it operates as the AR-Norm model, but in instances of possible over-correction it relies on the error in untransformed streamflows at the previous time step. That is, RAR-Norm is essentially a more conservative error model than AR-Norm: in situations where streamflows change rapidly, it opts to update with whichever error (transformed or untransformed) is smaller. This forces greater reliance on the base hydrological model to simulate streamflows accurately, leading to more robust performance in the base hydrological model. The RAR-Norm model clearly outperforms the AR-Norm model in both the updated and base model forecasts, as well as ameliorating the problem of over-correcting rising streamflows. The RAR-Norm model's advantage over the AR-Raw model is less clear: both the base hydrological model and the updated forecasts produced by the AR-Raw model perform similarly to (or sometimes slightly better than) the RAR-Norm model. However, the RAR-Norm model clearly addresses the problem of over-correcting receding streamflows that occurs in the AR-Raw model. As we show, this type of over-correction can seriously distort event hydrographs and cause forecasts of near-zero streamflows when reasonably substantial streamflows are observed. While these instances are not very common, the failure in the forecast is a serious one. As noted earlier, the over-correction of receding streamflows is likely to be exacerbated when producing forecasts at lead times of more than one time step. Accordingly, we contend that the RAR-Norm model is preferable to both the AR-Norm and AR-Raw models for streamflow forecasting applications.

Figure 1.Map of US (top panel) and Australian (bottom panel) catchments.

Figure 2. An example of over-correction caused by the AR-Norm model in the Mitta Mitta catchment. Dashed lines: forecasts from the base hydrological model (i.e. without error updating). Solid lines: forecasts with error updating.

Figure 3. The fraction of instances where $|D_t| > |Q_{t-1} - \hat{Q}_{t-1}|$ (i.e. instances where over-correction may occur in the AR-Norm model and where error updating is restricted in the RAR-Norm model) for the AR-Norm and RAR-Norm models for Australian catchments.

In the example in Fig. 5, the AR-Raw model causes over-correction. Here, the base hydrological model over-estimates the receding hydrograph on 5 October 1993. The magnitude of the error update given by the AR-Raw model cannot adjust according to the value of the forecast. As a result, the AR-Raw model updates the forecast on 6 October 1993 by a large amount, resulting in serious under-estimation (the forecast streamflow is nearly zero) and an artificial distortion of the hydrograph. (We note that we have seen this problem become much worse in unpublished experiments with forecasts made several time steps into the future, sometimes resulting in forecasts of zero flow during large floods.) In contrast, the AR-Norm model performs better in this example, giving a smaller error update by recognising that the hydrograph is moving downward. It is generally true that, when the AR-Raw model is applied, over-correction may occur when streamflow is receding. The RAR-Norm model produces updated streamflow similar to that of the AR-Norm model when the hydrograph recedes rapidly, and avoids the over-correction made by the AR-Raw model on 6 October 1993. Figure 6 provides more examples of over-correction by the AR-Raw model from a longer time series for the Abercrombie catchment. There are three clear instances of over-correction, all occurring on the time step immediately after large peaks in observed streamflow. The RAR-Norm model performs better than the AR-Raw model in these three instances. Overall, the RAR-Norm model takes a conservative position when streamflow changes rapidly, whether rising or falling. When streamflow changes rapidly, it is difficult to anticipate the magnitude of the forecast error; accordingly, the conventional AR models are prone to over-correction in such instances.

Figure 4. Forecast streamflows for the Orara catchment for an example 1-year period. The top panel shows streamflows forecast with the AR-Norm model; the bottom panel shows streamflows forecast with the RAR-Norm model. Dashed lines: forecasts from the base hydrological model (i.e. without error updating). Solid lines: forecasts with error updating. Tick marks on the x axis denote the instances of updating where $|D_t| > |Q_{t-1} - \hat{Q}_{t-1}|$.

Figure 5. An example of over-correction caused by the AR-Raw model in the Mitta Mitta catchment. Dashed lines: forecasts from the base hydrological model (i.e. without error updating). Solid lines: forecasts with error updating.

Figure 6. Forecast streamflows for the Abercrombie catchment for the period between 1 August 1997 and 15 September 1997. The top panel shows streamflows forecast with the AR-Raw model; the bottom panel shows streamflows forecast with the RAR-Norm model. Dashed lines: forecasts from the base hydrological model (i.e. without error updating). Solid lines: forecasts with error updating. Grey shading denotes instances of over-correction caused by the AR-Raw model.

Figure 7. NSE of streamflows forecast with the AR-Norm, AR-Raw and RAR-Norm models (colours).Performance of the corresponding base hydrological models is shown by hatched blocks.

Figure 9. PIT uniform probability plots. Curves on the diagonal indicate perfectly reliable forecasts.

Table 2. Comparison of the NSE calculated for (a) the receding limb and (b) the rising limb of the hydrograph for three different error models.
Figure 8. Comparison of the observed streamflows ($Q_t$) and forecast streamflows ($\hat{Q}_t$), as forecast (1) with the base hydrological model (circles) and (2) with the base hydrological model and error updating models (dots) for the Orara catchment.

Table 3. Comparison of the skill scores based on CRPS and RMSEP (denoted by CRPS_SS and RMSEP_SS) for three different error models.