Adaptive correction of deterministic models to produce probabilistic forecasts

This paper considers the correction of deterministic forecasts given by a flood forecasting model. A stochastic correction based on the evolution of an adaptive, multiplicative gain is presented. A number of models for the evolution of the gain are considered and the quality of the resulting probabilistic forecasts assessed. The techniques presented offer a computationally efficient method for providing probabilistic forecasts based on existing flood forecasting system output.


Introduction
Process based models producing deterministic forecasts form the basis of many operational hydrological forecasting systems. Often significant resources have been invested in acquiring these models and users are familiar with their use and limitations. In many situations such models produce biased or inaccurate predictions of discharge or water level (Aronica et al., 1998; Pappenberger et al., 2007). This makes the issuing of accurate and reliable flood forecasts challenging.
Data assimilation (DA) has been used to address this challenge in two ways: assimilating observations to improve the process model predictions and assimilating observations to improve the representation of the prediction errors. Human forecasters widely practise both forms of DA. Manually altering the internal states of the model based on their interpretation of recent model forecast errors may act to improve future model predictions. The forecaster may use their knowledge of the recent prediction errors of the model in deciding when to issue flood warnings, thereby implicitly utilising the second type of DA. The effectiveness and consistency (across forecasters) of such manual DA techniques is rarely reported formally (Seo et al., 2009).
These manual DA techniques can be formalised to produce deterministic assimilation schemes (e.g. Cole et al., 2009; Moore, 2007). The DA process can also be cast in a probabilistic framework with the aim of constructing the predictive distribution P(y_{T+f} | y_{1:T}) of the observation of some quantity of interest (e.g. water level or discharge) f time steps ahead, given y_{1:T} = (y_1, . . ., y_T), the observations of that quantity up to the current time T.
If the aim of the DA is to improve the predictions of a hydrological model M, a common framework (e.g. Liu and Gupta, 2007) is to cast the model in state space form so that the hydrological states s_t (indexed by time) evolve according to

s_{t+1} = M(s_t, u_t, ε_t)    (1)

where the u_t are observed extraneous inputs (e.g. precipitation) and ε_t a stochastic noise. The model states are then related to the observed values by the observation function H and stochastic noise ζ_{t+1}:

y_{t+1} = H(s_{t+1}) + ζ_{t+1}    (2)

The stochastic term ε_t may be additive, that is s_{t+1} = M(s_t, u_t) + ε_t. It may also act within M to represent a number of features such as noise on the forcing term u_t or time evolving model parameters (e.g. Rajaram and Georgakakos, 1989). By correcting the states of the model it may be hoped that predictions derived for unobserved sites (such as the internal nodes of a hydraulic model) may also be improved. This of course cannot be validated until observations are taken at these points.
The operational usefulness of the predictive distribution constructed from the above state space formulation is dependent upon:

- an appropriate description of the distributions of ε_t and ζ_t;
- an adequate solution of the filtering problem inherent in producing the forecasts.
Addressing both of these topics introduces a number of barriers to the operational implementation of this technique.
If either M or H is non-linear, the solution to the filtering problem is not trivial. Approximate solutions to the filtering problem can be provided by a number of algorithms such as particle filters (e.g. Doucet et al., 2001; Moradkhani et al., 2005a; Weerts and El Serafy, 2006), non-linear extensions to the Kalman filter (Rajaram and Georgakakos, 1989; DaRos and Borga, 1997; Evensen, 2003; Moradkhani et al., 2005b; Reichle et al., 2008) or variational techniques (Li and Navon, 2001; Madsen and Skotner, 2005; Seo et al., 2003). Combinations of these techniques may also be used (e.g. Shamir et al., 2010).
Particle filters, which approximate the desired distributions through Monte-Carlo sampling, can be considered the most flexible, although the computational burden can be large (Smith et al., 2008a) and implementation difficult when ε_t dominates the observation noise (Liu and Chen, 1998). The remaining techniques require less computational resource but introduce assumptions such as unbounded distributions that may require careful reparameterization of the hydrological model if the states are to remain hydrologically interpretable, e.g. volumes of water in the river channel must be greater than or equal to zero.
All the techniques outlined above make multiple calls to the process model at each time step. The computational cost of this may be prohibitive for applications in real time when the lead times required for decisions about warnings are a constraint. This is particularly true if the implementation of the filtering algorithm is achieved by providing code that "wraps" the hydrological model and interacts with it by altering the initial state and parameter files (Weerts et al., 2010).
Regardless of the computational technique utilised, great care should be taken in constructing the description of ε_t and ζ_t (Beven et al., 2008; Kirchner, 2006), particularly if there may be systematic biases, including phase errors, in the data or model (Reichle, 2008). The validation of these choices may require the re-analysis of a significant number of historic events, itself time consuming.
Using DA to improve the forecasts of the difference between the hydrological model output and the observed data can often be performed at minimal computational cost. If a suitable historic record of model output is maintained, the computational cost of setting up the DA may also be minimal. A wide variety of stochastic models have been proposed for this purpose. These range from the classical auto-regressive moving average (ARMA) time series models of Box and Jenkins (1994), used operationally in the UK (Moore, 2007), to more complex semiparametric methods (e.g. Krzysztofowicz and Maranzano, 2004; Maranzano and Krzysztofowicz, 2004).
To provide reliable forecasts (in both the pragmatic and probabilistic sense), these formulations and others (e.g. Montanari and Brath, 2004; Weerts et al., 2011; Seo et al., 2006) have to attempt to capture the potentially complex evolution of the model residuals. These residuals may incorporate a systematic or temporally varying bias. Reliance on temporal correlation within the residuals must be tempered by the fact that the correlation may be non-stationary, often being low at key times such as during the rising limbs of hydrographs (Todini, 2008) and much higher during recession periods. Residuals in extreme situations, such as floods, may also possess characteristics different to those of the majority of the data. Furthermore, each flood may reveal previously unknown shortcomings in the hydrological/hydraulic model(s), making their residuals difficult to predict. In such situations, it may be useful to utilise robust error models and predictive bounds (e.g. Rougier et al., 2009; Vernon et al., 2010).
This paper considers the use of DA to improve the forecasts of the difference between the output of a deterministic hydrological model and the observed data. It explores the use of a multiplicative gain to correct the deterministic forecasts. This gain is evolved stochastically. Forecasts of future observed values, that is future y_t, are expressed as probability distributions dependent upon both the gain and the deterministic forecast of the hydrological model. The approach presented has been utilised previously for operational flood forecasting (Lees et al., 1994). This paper extends previous work by considering a broader family of models for the evolution of the gain and two contrasting parameter estimation techniques.
Section 2 outlines the error model for providing probabilistic forecasts at a single observational site. In doing so, a family of parsimonious models for the evolution of the gain is introduced. The use of the linear Kalman filter for DA and the generation of predictive distributions is presented. Methods for estimating the parameters of the model are discussed in Sect. 3. Section 4 presents an example application using an operational flood forecasting model from the UK. The validity of the assumptions used in parameter estimation is investigated and the usefulness of the uncertainty representations illustrated.

Methodology
This section presents the stochastic error model utilised within this paper and the computation of the predictive distribution. The error model and its evolution are outlined in state space form, which provides a natural framework for computing the predictive distribution as a filtering problem.

Error model
Recall that y_{1:T} = (y_1, . . ., y_T) is a vector of T observations indexed by time with corresponding deterministic hydrological/hydraulic model predictions m_{1:T}. The observation y_t is then related to the prediction m_t by an adaptive gain g_t and noise term ε_t as outlined in Eq. (3):

y_t = g_t m_t + ε_t    (3)
The gain g_t is a time varying correction for the bias in the model forecast. It is evolved stochastically according to the local level (Harvey, 1989) or generalised random walk (Jakeman and Young, 1984; Young et al., 1989) family of models outlined in this section. Alternatives for the evolution of g_t, such as autoregressive moving average processes, could also be considered.
The simplest local level model considered is a random walk where g_t is given as the sum of its previous value and the stochastic noise η_t. That is

g_t = g_{t-1} + η_t    (4)

This is referred to as the random walk (RW) model. The local linear trend (LLT) model generalises this by introducing the slope d_t, which follows a random walk driven by the stochastic noise ξ_t. Thus,

g_t = g_{t-1} + d_{t-1} + η_t
d_t = d_{t-1} + ξ_t

In this paper it is assumed that ε_t, along with the stochastic noise terms η_t and ξ_t, are not correlated with each other or in time. Further, they are realisations of unimodal, symmetric, unbounded random variables that can be summarised by their first two moments, which are defined using the parameters σ², q_η and q_ξ as

E(ε_t) = E(η_t) = E(ξ_t) = 0
Var(ε_t) = σ², Var(η_t) = σ² q_η, Var(ξ_t) = σ² q_ξ

The validity of these assumptions can be assessed from the forecast residuals as shown in Sect. 4. Two methods for estimation of the parameters are presented in Sect. 3.
Three further local level models can be specified by placing constraints on the LLT model. If q_η = q_ξ = 0, the trend in the LLT model is deterministic, resulting in the deterministic local linear trend (DLLT) model. When q_ξ is zero, the slope is fixed and the evolution of the gain becomes a random walk with drift (RWD) model, that is

g_t = g_{t-1} + d + η_t

Setting q_η to zero but allowing positive q_ξ results in an integrated random walk trend, referred to as the IRW model. This often results in a smoother adaptation of g_t compared to the RW model outlined in Eq. (4).
The models outlined above for g_t are parsimonious, the only unknown parameters other than σ² being the values of the noise variance ratios q_η and q_ξ. A further level of complexity can be included by incorporating smoothing or damping parameters. Inclusion of such a parameter α in Eq. (4) results in a first order auto regressive (AR) model for the gain:

g_t = α g_{t-1} + η_t

Inclusion of the smoothing parameters (α and β) in the local linear trend model gives

g_t = α g_{t-1} + β d_{t-1} + η_t
d_t = β d_{t-1} + ξ_t

This is referred to as the smoothed local linear trend (SLLT) model. Two special cases of this are the smoothed random walk (SRW) model, in which β = 1 and q_η = 0, and the damped trend (DT) model, in which q_η = q_ξ and α = 1. More general and higher order representations are also possible, such as doubly integrated random walks. Exploration of these is beyond the scope of this paper.

State Space form
All the models outlined can be conveniently expressed in a state space form with state vector x_t = (g_t, d_t)^T describing the gain (g_t) and its slope (d_t). The state vector evolves according to the state transition matrix F and system noise matrix G as

x_t = F x_{t-1} + G (η_t, ξ_t)^T    (12)

The state vector is related to the observations by

y_t = h_t^T x_t + ε_t

where h_t = (m_t, 0)^T. The values taken by F and G depend upon the local level model selected. Table 1 outlines the values taken, in terms of the matrix forms given in Eq. (12), for the various models considered along with any other parameter constraints.
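The state space recursion above can be sketched numerically. The snippet below is a minimal illustration only; it assumes an SLLT-type transition matrix F = [[α, β], [0, β]] with G equal to the identity, and all numerical values (α, β, the initial state and the noise draw) are invented for the example rather than taken from Table 1.

```python
import numpy as np

def sllt_matrices(alpha, beta):
    """Assumed state transition F and system noise G for an SLLT-type gain model."""
    F = np.array([[alpha, beta],
                  [0.0,   beta]])
    G = np.eye(2)  # the noise vector (eta_t, xi_t) enters each state directly
    return F, G

# Evolve x_t = F x_{t-1} + G (eta_t, xi_t)^T for one fixed noise draw.
alpha, beta = 0.9, 0.8
F, G = sllt_matrices(alpha, beta)
x = np.array([1.0, 0.1])        # initial gain g and slope d (illustrative)
w = np.array([0.05, -0.02])     # one draw of (eta_t, xi_t)
x1 = F @ x + G @ w

# The matrix recursion must reproduce the scalar SLLT equations.
g1 = alpha * x[0] + beta * x[1] + w[0]
d1 = beta * x[1] + w[1]

# Observation equation: y_t = h_t^T x_t + eps_t with h_t = (m_t, 0)^T.
m1 = 2.5                        # deterministic model prediction (assumed value)
y1_mean = np.array([m1, 0.0]) @ x1
```

Setting α = β = 1 recovers the LLT model, so the same two lines of matrix algebra cover the whole family by changing F and the noise variance ratios.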
Two methods for the estimation of the parameters are presented in Sect. 3. The following sub-section discusses the use of the Kalman filter to generate the expected value and covariance of the predictive distributions.

Prediction using the Kalman filter
The assumptions regarding the stochastic noise terms presented in Sect. 2.1 are the minimum required for application of the linear Kalman filter (Kalman, 1960). Suppose the distribution of x_t has similar properties to that of the stochastic noise terms, with its expected value and variance given by x̂_t and σ² P_t (where P_t is a 2 × 2 matrix), respectively. The Kalman filter can be used to predict future states and assimilate the observed data as it becomes available.
The one step ahead predictions of the distribution of the states, conditional upon the data up to time t, are given by the expected value

x̂_{t+1|t} = F x̂_{t|t}    (13)

and variance σ² P_{t+1|t} where

P_{t+1|t} = F P_{t|t} F^T + G Q G^T    (14)

The noise variance ratio matrix Q is constructed as

Q = diag(q_η, q_ξ)

The f-step ahead prediction of the states given the information up to time t can be computed by repeated application of Eqs. (13) and (14). The f-step ahead prediction error ν_{t+f|t} and prediction variance σ² ψ_{t+f|t} can be computed from the forecast states using

ν_{t+f|t} = y_{t+f} − h_{t+f}^T x̂_{t+f|t}    (15)
ψ_{t+f|t} = h_{t+f}^T P_{t+f|t} h_{t+f} + 1    (16)

Evaluation of these expressions requires knowledge of the future predictions of the flood forecasting model.
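The prediction recursions of Eqs. (13) and (14) and the f-step ahead forecast moments can be sketched in a few lines. This is an illustrative sketch only: the matrices below encode an assumed random-walk gain (q_ξ = 0), and all numerical values are invented.

```python
import numpy as np

def kf_predict(x, P, F, G, Q):
    """One step ahead prediction of the state mean and scaled covariance."""
    x_pred = F @ x                        # Eq. (13)
    P_pred = F @ P @ F.T + G @ Q @ G.T    # Eq. (14)
    return x_pred, P_pred

def forecast_moments(x, P, F, G, Q, m_future, f):
    """f-step ahead forecast mean and scaled variance psi (Var = sigma^2 psi)."""
    for _ in range(f):
        x, P = kf_predict(x, P, F, G, Q)
    h = np.array([m_future, 0.0])         # h_{t+f} = (m_{t+f}, 0)^T
    y_hat = h @ x
    psi = h @ P @ h + 1.0                 # +1 from the observation noise variance
    return y_hat, psi

# Illustrative random-walk gain: F = I, noise only on the level.
F = np.eye(2)
G = np.eye(2)
Q = np.diag([0.01, 0.0])                  # q_eta = 0.01, q_xi = 0 (assumed)
x0 = np.array([1.0, 0.0])
P0 = np.eye(2)

y1, psi1 = forecast_moments(x0, P0, F, G, Q, m_future=2.0, f=1)
y4, psi4 = forecast_moments(x0, P0, F, G, Q, m_future=2.0, f=4)
```

As expected, the scaled forecast variance grows with the lead time f because no observations are assimilated along the way.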
When a new observation becomes available it can be used to condition the distribution of the gain by updating the mean and covariance of the one step ahead prediction of the state distribution using Eqs. (17) to (19) (see for example Kalman, 1960, for a derivation):

ν_{t+1|t} = y_{t+1} − h_{t+1}^T x̂_{t+1|t}    (17)
x̂_{t+1|t+1} = x̂_{t+1|t} + P_{t+1|t} h_{t+1} ν_{t+1|t} / ψ_{t+1|t}    (18)
P_{t+1|t+1} = P_{t+1|t} − P_{t+1|t} h_{t+1} h_{t+1}^T P_{t+1|t} / ψ_{t+1|t}    (19)
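The update step of Eqs. (17) to (19) might be coded as below. This is a sketch under assumed values; note that because all covariances are scaled by the common factor σ², the Kalman gain can be formed without knowing σ².

```python
import numpy as np

def kf_update(x_pred, P_pred, y, m):
    """Assimilate observation y given model prediction m (sketch of Eqs. 17-19)."""
    h = np.array([m, 0.0])
    nu = y - h @ x_pred                        # innovation, Eq. (17)
    psi = h @ P_pred @ h + 1.0                 # scaled innovation variance
    k = P_pred @ h / psi                       # Kalman gain; sigma^2 cancels
    x_upd = x_pred + k * nu                    # Eq. (18)
    P_upd = P_pred - np.outer(k, h) @ P_pred   # Eq. (19)
    return x_upd, P_upd, nu

# Illustrative one step ahead prediction and a new observation (assumed values).
x_pred = np.array([1.0, 0.0])
P_pred = np.eye(2)
y_obs, m = 2.4, 2.0
x_upd, P_upd, nu = kf_update(x_pred, P_pred, y_obs, m)
```

The update pulls the gain towards the value implied by the new observation and shrinks the scaled variance of the gain, which is the behaviour exploited when conditioning forecasts on incoming data.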
To evaluate the above recursions some initial values for x̂_{0|0} and P_{0|0} are required. In this paper a representation based on taking a diffuse initial condition is used. The techniques presented in Durbin and Koopman (2001, Chapter 5) allow the conditional distributions of the states after two time steps, summarised by x̂_{2|2} and σ² P_{2|2}, to be computed analytically from the first two observations, the corresponding model predictions and θ. A suitable burn-in period, denoted t_0 (see Sect. 4), is then used before commencing evaluation of the estimation criteria outlined in the following section.

Estimation
This section discusses the estimation of the unknown parameter vector θ defined for the models considered in Table 1. Two estimation techniques are outlined and their results contrasted in Sect. 4. The first technique is maximum likelihood estimation based upon the assumption that the prediction errors are independent realisations of Gaussian random variables. This introduces stronger assumptions about the stochastic noise terms than those introduced in Sect. 2.1. The second method, which is based on minimising the sum of the squared expected forecast errors, is more heuristic. In both cases the validity of the error assumptions can be assessed. This is discussed along with the construction of predictive error bounds.

Gaussian maximum likelihood
In Gaussian maximum likelihood (GML) estimation, the parameters θ are estimated by maximising the likelihood of the f-step ahead predictions when it is believed that ν_{t+f|t} is drawn independently from a zero mean Gaussian distribution with variance σ² ψ_{t+f|t}. Under these assumptions the log likelihood is

log L(θ) = K − (1/2) Σ_t [ log(σ² ψ_{t+f|t}) + ν²_{t+f|t} / (σ² ψ_{t+f|t}) ]    (20)

where K is a constant with respect to θ. The maximum likelihood estimate of σ² can be computed conditional upon the other parameters in θ as

σ̂² = (1/N) Σ_t ν²_{t+f|t} / ψ_{t+f|t}    (21)

where N is the number of forecast errors evaluated. This allows σ² to be concentrated out of Eq. (20), leaving (Schweppe, 1965)

log L(θ\σ²) = K − (1/2) [ N log σ̂² + Σ_t log ψ_{t+f|t} + N ]

which is dependent upon the remaining parameters (denoted θ\σ²). This can be numerically optimised to give maximum likelihood parameter estimates of θ.
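The concentration of σ² out of the likelihood can be checked numerically. The sketch below uses invented innovation sequences ν and scaled variances ψ; it computes σ̂² from Eq. (21) and verifies that no other candidate value of σ² attains a higher Gaussian log likelihood (the constant K is dropped).

```python
import numpy as np

def log_lik(sigma2, nu, psi):
    """Gaussian log likelihood of Eq. (20) with the constant K dropped."""
    return -0.5 * np.sum(np.log(sigma2 * psi) + nu**2 / (sigma2 * psi))

# Illustrative forecast errors and scaled variances (assumed values).
nu = np.array([0.2, -0.1, 0.4, 0.0, -0.3])
psi = np.array([1.1, 1.2, 1.5, 1.3, 1.1])

sigma2_hat = np.mean(nu**2 / psi)   # Eq. (21): concentrates sigma^2 out

# sigma2_hat should beat every other candidate value on a surrounding grid.
grid = np.linspace(0.2, 5.0, 200) * sigma2_hat
best_on_grid = max(log_lik(s, nu, psi) for s in grid)
```

Because σ̂² is the analytic maximiser, only the remaining parameters in θ need to be searched numerically, which keeps the optimisation low dimensional.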
The uncertainty in the predictions can be expressed as percentile confidence intervals for the predictions constructed as

h_{t+f}^T x̂_{t+f|t} ± κ_p σ ψ_{t+f|t}^{1/2}

where κ_p is a constant dependent upon p that can be computed from a standard normal distribution; for example κ_95 ≈ 1.96.
The normality of the forecast residuals and their correlation can be readily assessed using, for example, quantile and auto correlation plots (Box and Jenkins, 1994).

Minimising the sum of the squared expected forecast errors
The second estimation technique, referred to as SEFE for the remainder of this paper, is based on the appeal of minimising the sum of the squared expected forecasting errors

S_f = Σ_t ν²_{t+f|t}

This ensures that the expected value of the forecast is as close as possible (in terms of average squared error) to the observed data. The minimisation of S_f allows the estimation of all the parameters in θ except σ². A value for σ² can then be estimated using Eq. (21) if required. The error assumptions of the Kalman filter (Sect. 2.1) imply that each predictive distribution is unimodal, symmetric and unbounded.
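SEFE estimation can be illustrated for the simplest RW gain model, where the filter reduces to a scalar recursion and S_f can be evaluated for candidate noise variance ratios. Everything below is a sketch: the data are synthetic, the grid of candidate q_η values is arbitrary, and a proper implementation would use a numerical optimiser rather than a grid.

```python
import numpy as np

def sefe_rw(q_eta, y, m, g0=1.0, p0=1.0):
    """Sum of squared one step ahead forecast errors for the RW gain model."""
    g, p = g0, p0
    s = 0.0
    for yt, mt in zip(y, m):
        p = p + q_eta                 # predict: gain unchanged, variance grows
        nu = yt - mt * g              # one step ahead forecast error
        psi = mt**2 * p + 1.0
        s += nu**2
        k = p * mt / psi              # assimilate the observation
        g = g + k * nu
        p = p - k * mt * p
    return s

# Synthetic scenario: a slowly drifting true gain around 1.2 (assumed).
rng = np.random.default_rng(42)
m = 2.0 + rng.uniform(-0.5, 0.5, size=200)        # model predictions
g_true = 1.2 + np.cumsum(rng.normal(0, 0.01, size=200))
y = g_true * m + rng.normal(0, 0.05, size=200)    # observations

grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
scores = {q: sefe_rw(q, y, m) for q in grid}
q_best = min(scores, key=scores.get)
```

The selected q_η trades off responsiveness of the gain against amplification of observation noise, which is exactly the property S_f penalises.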
Testing the symmetry of the forecast residuals, for example using the Wilcoxon sign rank test (Wilcoxon, 1945), can indicate if this assumption is valid. Two methods for construction of predictive confidence intervals are considered. They make use of the theoretical symmetry of the forecast distribution and result in symmetric prediction intervals. The symmetry of the forecast distribution implies that prediction confidence intervals can be expressed as

h_{t+f}^T x̂_{t+f|t} ± ρ_p σ ψ_{t+f|t}^{1/2}    (25)

The value of ρ_p for a given p can be estimated empirically from the standardised forecast residuals. Given the finite population of residuals, this empirical estimate of ρ_p may not be robust at high values of p. The values of ρ_p can be adapted (for given p) as more data becomes available. Sequential tests for symmetry (e.g. Weed and Bradley, 1971) may be of use in such situations. Pukelsheim (1994) gives theoretical results for the upper limits of ρ_p under the unimodal, symmetric and unbounded distributional assumptions. Specifically, for a random variable X with mean μ and standard deviation σ satisfying these assumptions,

P(|X − μ| ≥ r) ≤ (4/9)(σ/r)²

The case r = 3σ is the three sigma rule: there is less than 5 % probability of a sample from a univariate random variable with the aforementioned properties being outside of 3 standard deviations from the mean. These upper limits can be used in two ways. Firstly, they allow for the estimation of conservative prediction confidence intervals, allowing a more cautious view to be taken of the prediction uncertainty. The second use is as a means of analysing the suitability of the adaptive gain models considered, by contrasting the symmetrical empirical estimates and the theoretical upper limits for a given p.
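The empirical and theoretical estimates of ρ_p can be contrasted as below. The residual sample is synthetic (Gaussian, which satisfies the unimodal, symmetric, unbounded assumptions), and the closed form ρ_p = 2 / (3 √(1 − p)) is obtained by inverting the tail bound above; both choices are illustrative assumptions.

```python
import numpy as np

def rho_empirical(z, p=0.95):
    """Symmetrical empirical estimate: p-quantile of |standardised residuals|."""
    return np.quantile(np.abs(z), p)

def rho_pukelsheim(p=0.95):
    """Upper limit from P(|X - mu| >= r) <= (4/9)(sigma/r)^2, solved for r/sigma."""
    return 2.0 / (3.0 * np.sqrt(1.0 - p))

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # synthetic standardised residuals

rho_e = rho_empirical(z, 0.95)    # near 1.96 for Gaussian residuals
rho_u = rho_pukelsheim(0.95)      # about 2.98, close to the three sigma rule
```

A large gap between the empirical estimate and the theoretical upper limit, as here, indicates light-tailed residuals; a small gap would suggest the distributional assumptions are being stretched.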

Upper Severn case study
To illustrate the effectiveness and limitations of the proposed methodology in an operational setting, a case study based on the upper Severn catchment (UK) is presented. The upper Severn river network is situated on the border of England and Wales and shown in Fig. 1. The River Severn rises in the Cambrian mountains (741 mAOD) and flows to the northeast before meeting the Vyrnwy tributary at Crew Green. The valley is wide and flat in this confluence area, with a considerable extent of flood plain. The river then flows east to Shrewsbury. The lower boundary of the 2284 km² upper Severn catchment is defined by the gauge at Welsh bridge in Shrewsbury, where the median annual flood is greater than 284 cumecs. Average annual rainfall can exceed 2500 mm in the headwaters of the catchment. The catchment has seen seven significant flood events in the past twelve years.
The existing flood forecasting model consists of a number of simplified rainfall-runoff representations linked to a one dimensional ISIS hydrodynamic model (see http://www.halcrow.com/isis/default.asp). Results are shown for the two sites seen in Fig. 1: Welsh bridge, the lower boundary of the catchment, and Llandrinio.

Operationally, the flood forecasting model is run with a 15 minute time step and evaluated twice daily. At midnight it is run in a continuous mode using observed inputs and output data assimilation to generate a set of "warm states" which are used to initialise the forecasts. The first set of forecasts, issued at midnight (00:00), gives up to 36 h lead time using forecast precipitation. The second set of forecasts is issued at midday (12:00). These forecasts are initialised by evolving the "warm states" using the observed meteorological variables between midnight and midday. Forecast precipitation is then used to evaluate the flood forecasting model, giving forecasts of the hydrological variables with up to 36 h lead time. Further details can be found in Weerts et al. (2011).
The adaptive gain correction is initialised at the start of each forecast period. The first four hours of forecasts are used as a burn-in period for the adaptive gain. Then, in keeping with the operational system, forecasts are issued based on the most recent forecast run of the flood forecasting model for which the adaptive gain is burnt in. A single year of data (2006), which contains a significant flood event, is used to identify and estimate the adaptive correction. Three years (2007-2009) are used for validation purposes. The results for each of the sites are summarised below.

Welsh bridge
Figure 2 shows the hydraulic model predictions and observed data at Welsh bridge during the calibration period. There is a systematic overestimation of the water level during periods of low flow. Such a systematic bias may arise from the calibration of the hydraulic model. If the model was calibrated to discharge data and the representation of the gauged cross section is poor at shallow depths, such a bias may result. Alternatively, an artificially high low water level within the model, such as that seen here, can arise as a means of achieving acceptable representations of high water level periods.
The adaptive gain correction could be utilised to correct the forecasts in these low flow periods. The difference in the relationship between the hydraulic model and observed data for the low flow periods and for periods when the model is responding to rainfall suggests that a different calibration of the stochastic model for the adaptive gain may be required for each. For flooding purposes the response to rainfall is more important. Therefore, the calibration and validation criteria are only evaluated at Welsh bridge for the simulations of the hydraulic model which forecast a water level greater than 1 m. Results are shown for all the time periods, high levels (> 2 m) and periods where the hydrograph is rising, for different combinations of lead time (hours) and GRW model.
Table 2 presents the log likelihood and root mean squared error (RMSE) summaries of the calibration results at various forecast lead times. For both calibration methodologies, the performance of the LLT model, along with its special cases DLLT, RWD and IRW, decays much more rapidly with lead time than that of the alternative models. This suggests that, as with the SLLT, SRW and DT models, there is a need to recognise the decay in information about the future values of the gain or its gradient through the use of damping parameters. The RMSE of the deterministic model prediction (based on always using the most recent model run) is 0.37. At lead times greater than 12 h, none of the models achieves a significant decrease in the RMSE, though they still provide an estimate of the forecast uncertainty. Calibration with the GML methodology allows model selection to be performed using time series identification criteria, such as the AIC (Akaike, 1974) or BIC (Schwarz, 1978). This leads to the selection of the SLLT at all the lead times considered. The RMSE results of the SEFE calibration suggest that at shorter lead times a more parsimonious model, such as the RW, may be suitable. Further analysis of the predictive performance of the models can be gained by examining the coverage and precision of the forecasts shown for the calibration period in Tables 3 and 4.
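Model selection with the AIC or BIC, as mentioned above, reduces to simple arithmetic on the maximised log likelihoods. The sketch below uses invented log likelihood values and parameter counts for two hypothetical gain models; it is not a reproduction of the Table 2 results.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion; smaller values are preferred."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion; smaller values are preferred."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical maximised log likelihoods at one lead time (assumed values).
n = 500                                     # number of forecast errors
cand = {"RW":   {"loglik": -120.0, "k": 1},
        "SLLT": {"loglik": -112.0, "k": 4}}

aics = {name: aic(c["loglik"], c["k"]) for name, c in cand.items()}
bics = {name: bic(c["loglik"], c["k"], n) for name, c in cand.items()}
```

With these invented numbers the AIC favours the richer SLLT model while the BIC, with its stronger penalty for extra parameters, favours the RW model, illustrating why the two criteria need not agree.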
Table 3 shows the percentage of points falling within the 95 % prediction confidence interval for various lead times during calibration. Results are shown for three categories: the whole period, high water levels greater than 2 m and rising parts of the observed hydrograph where y_{t+1} > y_t. By definition the empirical bounds derived from the SEFE methodology bracket 95 % of all the (calibration) observations. Coverage of the high and rising flow categories for this methodology is less than 95 % for all models and lead times. Coverage of high flows decreases with increasing lead time for all the models. This indicates that the pooling of the residuals in computing the empirical bounds may not be appropriate. The values of κ_p, or indeed the model parameters, may be state or time dependent (Young, 2002; Young et al., 2001).
In all cases the theoretical prediction bounds of the SEFE methodology give coverage percentages, based on all the calibration data, that are consistent with the error assumptions outlined in Sect. 2.1. A similar pattern to the empirical results is shown for the coverage of the high and rising flow categories.
The high coverage percentages obtained when using the Gaussian likelihood calibration at short forecast lead times support the suggestion that the assumptions used in deriving the likelihood may not be valid. However, at lead times up to six hours, there is less evidence of systematic differences in the percentage of observations covered in the three discharge categories. As with the SEFE calibration, at longer lead times the coverage of the high flow category deteriorates for all the models.
The results in Table 4 summarise the precision of the forecasts through the mean of the forecast standard deviation over each flow category. The empirically derived forecast error bounds are, especially at lead times up to six hours, significantly sharper than the theoretical upper limit and those resulting from the Gaussian calibration.
Figure 3 shows two summary plots of the standardised 6 h forecast residuals of the SLLT model calibrated using the Gaussian likelihood methodology. The upper pane shows a quantile-quantile plot of the standardised residuals against a standard normal distribution. The heavier tails present in the distribution of the standardised residuals indicate that the Gaussian assumption may not be valid. The residual distribution is approximately symmetric. Therefore, the use of the Kalman filter may be a valid approximation, even if the resulting prediction confidence intervals are not appropriately defined. The lower plot indicates that the residuals are correlated up to at least a lag of 28 samples, or 7 h, approximately the forecast lead time. This and the low correlation values suggest that the conditional independence assumption, while not strictly valid, may be a reasonable approximation.
The forecast residuals of the SEFE methodology can also be assessed. The lower pane of Fig. 4 shows the autocorrelation pattern in the residuals of the RW model fitted by the SEFE methodology for a 6 h lead time. The correlation within the residuals is lower and less persistent at longer lags than for the Gaussian calibration. The upper pane of Fig. 4 shows the cumulative distribution of the positive standardised residuals along with that of the absolute values of the negative standardised residuals. It indicates a lack of symmetry in the residuals, confirmed by a Wilcoxon sign rank test.
Table 5 shows the percentage of observations falling within the 95 % prediction confidence interval during the validation period. The high percentages for the GML methodology further support the suggestion that the assumptions used in deriving the likelihood may not be valid. The results for the SEFE calibration indicate that the symmetrical empirical estimates of ρ_p derived during calibration perform well in the validation period. The predictive bounds derived using the theoretical upper limit appear to bracket around 99 % of the data during the validation period, suggesting they are unduly conservative.
Figure 5 shows the predictions made using the SLLT model calibrated using the Gaussian methodology for the largest flood events at Welsh bridge in the calibration and validation periods. The results are encouraging, with the adaptive gain acting to correct the model forecast towards the observed values. During both the calibration and validation periods the adaptive gain methodology is able to correct for errors in the timing and magnitude of the hydrograph peaks as well as in the receding limb.
The calibration event (upper pane) highlights that the methodology is not able to correct for highly erroneous model forecasts. The validation results (lower pane) also reveal one aspect of how the adaptive gain behaves in a less than ideal situation. During January there is a period where no observations are taken. During this time, the prediction interval can at first be seen to gradually widen, as no further observations are available to condition the correction of the current simulation of the flood forecasting model. Then, when the next run of the flood forecasting model becomes available, the gain cannot be initialised, so no forecasts are issued. In such situations, the forecaster making operational decisions may wish to consider the past simulation of the flood forecasting model and the corresponding adaptive gain.
Figure 6 shows the predictions made for the same events using the RW model calibrated using the SEFE methodology with empirical prediction bounds. Visual comparison with Fig. 5 shows the forecasts to be more precise, but they fail to capture the rising limb of the validation hydrograph.

Llandrinio
Tables 6, 7 and 8 summarise the performance in calibration of the adaptive gain models at Llandrinio. As at Welsh bridge, there is a bias in the baseflow of the hydraulic model. Performance is assessed only when the flood forecasting model predicts a water level greater than 1.8 m. None of the models for the adaptive gain produces a RMSE lower than that of the uncorrected model (0.39) for lead times greater than 9 h. This time is shorter than that at Welsh bridge, which is consistent with the location of the gauging site closer to the headwaters of the catchment. As for Welsh bridge, the deterioration in performance is more marked for gain models which include a gradient term but do not include a damping parameter.
The coverage probabilities during calibration (Table 7) are less dependent upon the water level than those at Welsh bridge. In keeping with the Welsh bridge results, the precision of forecasts issued using the SEFE methodology, with empirical error bounds, is higher than that of the Gaussian calibration methodology. In validation (Table 9) all the prediction error bounds appear conservative, bracketing a higher proportion of the observations than would be expected.
Figure 7 shows the 6 h lead time forecasts for two large calibration events at Llandrinio. As at Welsh bridge, the technique cannot correct for highly erroneous model outputs such as those around the 6 December 2006. An earlier period on the 3 December 2006 highlights the importance of

Conclusions
The results presented in this paper indicate that comparatively simple error models combined with real time data assimilation can provide probabilistic forecasts.
Results are shown for a medium sized river basin. Two calibration methodologies are presented. Detailed residual analysis suggests that, though the error assumptions used in these derivations are not satisfied, they can produce probabilistic forecasts with conservative coverage and precision of the order of 10 % of the observed water level. The selection of which calibration methodology performs most adequately requires consideration on a case by case basis. Techniques for making this selection are presented.
The lead times at which the expected value of the forecasts can be seen to improve upon the RMSE of the uncorrected model predictions depend upon the time it takes for the catchment to respond at the gauged site. It remains to be seen if such a data assimilation methodology is able to perform adequately on small basins where the response to rainfall is more rapid and timing errors, due perhaps to errors in the rainfall forecast, are more severe (see Alfieri et al., 2011, for some initial results).
In an operational setting, only the linear Kalman filter evolving two states needs to be evaluated. Though not discussed here, this can be readily implemented in spreadsheet packages without recourse to more complex programming. This makes the addition of these data assimilation techniques to existing deterministic forecasting systems comparatively straightforward, with a very low "cost of entry".
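A minimal sketch of one step of such a two-state filter is given below, assuming a local-linear-trend evolution of the multiplicative gain (state = gain level and slope) and the observation structure implied by the paper, namely that the corrected forecast is the gain times the deterministic model output. The noise variances `q` and `r` are illustrative placeholders, not the calibrated values of the study:

```python
import numpy as np

def kalman_gain_step(x, P, y_obs, m_t, q=(1e-4, 1e-5), r=1e-2):
    """One predict/update step of a two-state linear Kalman filter for a
    multiplicative gain g_t, where the corrected forecast is g_t * m_t
    and m_t is the deterministic model output at time t.
    x = [gain, slope]; P = state covariance (2 x 2)."""
    F = np.array([[1.0, 1.0],
                  [0.0, 1.0]])          # local linear trend transition
    Q = np.diag(q)                       # system noise (illustrative)
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: observation y_obs = m_t * gain + noise, so H = [m_t, 0]
    H = np.array([[m_t, 0.0]])
    S = float(H @ P @ H.T) + r           # innovation variance
    K = (P @ H.T) / S                    # Kalman gain (2 x 1)
    innovation = y_obs - float(H @ x)
    x = x + K.flatten() * innovation
    P = P - K @ H @ P
    return x, P
```

With only two states the matrix algebra reduces to a handful of scalar expressions, which is what makes a spreadsheet implementation feasible. The innovation variance `S` also supplies the forecast prediction interval under the Gaussian calibration methodology.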
A number of future developments can be proposed. Firstly, analysis of the results suggests that recognition of state or time dependency within the model parameters (Young, 2002; Smith et al., 2008b) may be worthy of exploration. Recognising that the gain at different sites may be correlated opens up a further line for future research. If such correlations can be successfully exploited, this would both increase the robustness of the data assimilation scheme to missing observations and potentially allow forecast updating at sites where very limited data are available (e.g. locations only observed during the calibration of a hydraulic model). The exploration of such strategies is the subject of on-going research.

Fig. 1. Schematic map showing the upper River and Vyrnwy tributary, which both flow from west to east. Gauged locations considered in the study (circles) and urban areas (hatched) are shown. The inset shows the location of the catchment within the UK.

Fig. 2. Summary plots of the data available for Welsh Bridge during the calibration period. Points represent observed data, with the line being the concatenation of the output of the most recent hydraulic model initialisation. A bias in the prediction of low flows is clearly shown, as are periods of missing data and poor model initialisation.

Fig. 3. Summary plots of the analysis of the 6 h forecast residuals at Welsh Bridge for the DT model calibrated using the Gaussian maximum likelihood methodology. The upper pane shows the quantile-quantile plot of the standardised residuals compared to a standard normal distribution. The lower pane shows the auto-correlation of the residuals.

Fig. 4. Summary plots of the analysis of the 6 h forecast residuals for the RW model calibrated using the SEFE methodology. The upper pane shows the cumulative distributions of the absolute values of the residuals at Welsh Bridge; the solid line shows positive residuals and the dotted line negative residuals. The lower pane shows the auto-correlation of the residuals.

Fig. 5. Examples of 6 h ahead forecasts given at Welsh Bridge for two large flood events during calibration (upper pane) and validation (lower pane) periods, generated using the SLLT model calibrated using the Gaussian methodology. The shaded area represents the 95 % prediction confidence interval, with the solid line being the expected value of the predictions. Observed data points are also shown, along with the deterministic model forecast (dotted line).

Fig. 6. Examples of 6 h ahead forecasts given at Welsh Bridge for two large flood events during calibration (upper pane) and validation (lower pane) periods, generated using the RW model calibrated using the SEFE methodology. The shaded area represents the 95 % prediction confidence interval, with the solid line being the expected value of the predictions. Observed data points are also shown, along with the deterministic model forecast (dotted line).

Fig. 7. Examples of 6 h ahead forecasts given at Llandrinio for two large flood events during calibration (upper pane) and validation (lower pane) periods, generated using the SLLT model calibrated using the SEFE methodology with empirical error bounds. The shaded area represents the 95 % prediction confidence interval, with the solid line being the expected value of the predictions. Observed data points are also shown, along with the deterministic model forecast (dotted line).

Table 2. Calibration results for Welsh Bridge showing the log likelihood and RMSE (bracketed) for various forecast lead times (hours) and GRW models.

Table 3. Fraction of observations at Welsh Bridge bracketed by the estimated 95 % prediction intervals during calibration. Bracketed results are those for the SEFE calibration, with the italicised and bold values corresponding to the empirical and theoretical symmetric bounds. Results are shown for all time periods, high levels (> 2 m) and periods where the hydrograph is rising, for different combinations of lead time (hours) and GRW model.

Table 4. Precision of the forecasts (see text for definition) during calibration at Welsh Bridge. Bracketed results are those for the SEFE calibration, with the italicised and bold values corresponding to the empirical and theoretical symmetric bounds. Results are shown for all time periods, high levels (> 2 m) and periods where the hydrograph is rising, for different combinations of lead time (hours) and GRW model.

Table 5. Fraction of observations at Welsh Bridge bracketed by the estimated 95 % prediction intervals during validation. Bracketed results are those for the SEFE calibration, with the italicised and bold values corresponding to the empirical and theoretical symmetric bounds.

Table 8. Precision of the forecasts (see text for definition) during calibration at Llandrinio. Bracketed results are those for the SEFE calibration, with the italicised and bold values corresponding to the empirical and theoretical symmetric bounds. Results are shown for all time periods, high levels (> 4 m) and periods where the hydrograph is rising, for different combinations of lead time (hours) and GRW model.