A data based mechanistic real-time flood forecasting module for NFFS FEWS



Introduction
New and useful real-time flood forecasting research can be made operational by taking advantage of the DELFT-FEWS open shell framework. The FEWS modular design allows the modeller to concentrate on producing an effective forecasting algorithm, which can later be converted to a FEWS module. The FEWS framework provides tried and tested code for accessing the real-time hydrological and meteorological data that drive the model and permit data assimilation, together with graphing facilities that produce a user-friendly interface for the end user. This paper presents the Data Based Mechanistic (DBM) method for real-time flood forecast modelling incorporating data assimilation, as described by Young (2002). The method is used to form a representation of the Eden catchment network with a termination node at Carlisle (Cumbria, UK), a location prone to flooding (Archer et al., 2007). We then describe the structuring of this model in a format suitable for inclusion within DELFT-FEWS (and, by extension, the UK's NFFS). When several nodes are connected together in a network, the DBM model becomes semi-distributed. While a semi-distributed model adds complexity, it also provides the attractive properties of accounting for spatial variability in the rainfall field, providing forecasts at intermediate sites, and offering a degree of robustness should one or more data streams fail. A number of studies have shown good results using this approach (see, for example, Romanowicz et al., 2006; Leedal et al., 2009; Alfieri et al., 2011).
The DBM approach models the relationship between an upstream level or rainfall as input and the downstream level as output. By choosing level as the model output, it is no longer possible to interpret results according to principles of mass balance; however, this choice provides a number of advantages: (1) it removes the reliance on a rating curve transformation, which will generally introduce additional uncertainty; (2) given present sensor technology, the observation of level is generally more accurate (and cheaper) than measurement of flow; (3) level is often the variable of interest when choosing whether or not to issue a warning; and (4) the relationship between rainfall/upstream level and downstream level is generally closer to linear, and therefore easier to model using the DBM method (at least during the in-bank phase of an event).
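
The DBM flood forecasting model
A DBM catchment model for real-time flood forecasting can contain one or more nodes. The fundamental component of a DBM node is the system transfer function (TF). This is represented in the discrete-time case by Eq. (1).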
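$$y_k = \frac{B(z^{-1})}{A(z^{-1})}\,u_{k-\delta} + \xi_k \qquad (1)$$
where $B(z^{-1})$ and $A(z^{-1})$ are polynomials of order $m$ and $n$, respectively, such that
$$B(z^{-1}) = b_0 + b_1 z^{-1} + \cdots + b_m z^{-m}, \qquad A(z^{-1}) = 1 + a_1 z^{-1} + \cdots + a_n z^{-n};$$
$z^{-1}$ is the discrete-time backward shift operator, i.e., $z^{-i}u_k = u_{k-i}$; $\delta$ is an integer value representing the time lag of the system, i.e., the output ($y$) at time $k$ is a response to the input stimulus ($u$) applied at time $k-\delta$; and $\xi_k$ is a noise input representing all the stochastic components of the system not accounted for by the model. The model orders (values for $m$ and $n$) and the values of the polynomial coefficients are identified and estimated from time series data. A large collection of algorithms for identifying, estimating and processing this type of model is available in the Captain™ toolbox for the Matlab™ numerical programming environment (Taylor et al., 2007).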
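To make Eq. (1) concrete, the following minimal Python sketch (the module itself is written in R; Python is used here only for illustration) simulates the noise-free part of the transfer function as a difference equation. The function name and the convention of storing $a_1, \ldots, a_n$ without the leading 1 of $A(z^{-1})$ are choices made for this example, and inputs before the start of the series are assumed to be zero.

```python
def simulate_tf(b, a, u, delta):
    """Noise-free response of y_k = B(z^-1)/A(z^-1) u_{k-delta}.

    b: [b0, ..., bm]; a: [a1, ..., an] (leading 1 of A omitted).
    Samples before the start of the series are taken as zero.
    """
    y = []
    for k in range(len(u)):
        acc = 0.0
        # numerator terms: b_j * u_{k-delta-j}
        for j, bj in enumerate(b):
            idx = k - delta - j
            if idx >= 0:
                acc += bj * u[idx]
        # denominator terms moved to the right-hand side: -a_i * y_{k-i}
        for i, ai in enumerate(a, start=1):
            if k - i >= 0:
                acc -= ai * y[k - i]
        y.append(acc)
    return y
```

For a first-order node with $b_0 = 0.5$, $a_1 = -0.5$ and $\delta = 2$, a unit step input produces no response for two samples and then rises towards the steady-state gain of 1.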

Mechanistic interpretation
Young (1998) has written extensively on the mechanistic interpretation of DBM models.
A key step in this process is to decompose a DBM model of order greater than one into an equivalent assemblage of first-order components and to interrogate the properties of these. A first-order transfer function ($B(z^{-1}) = b_0$ and $A(z^{-1}) = 1 + a_1 z^{-1}$) can be characterised by its steady-state gain, $b_0/(1+a_1)$, and its time constant, $-\Delta t/\ln(-a_1)$, where $\Delta t$ is the discrete sampling interval (commonly 15 or 60 min in operational flood forecasting applications). An important step in the DBM approach is to guarantee that a meaningful mechanistic interpretation exists for the linkage and characteristics of each of these first-order components. This step is intended to guard against empirical over-fitting that could result in poor extrapolation to system behaviour outside that described by the calibration data.
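The decomposition step can be sketched for the common second-order case. Assuming real, distinct poles, the TF splits by partial fractions into two parallel first-order stores $c_i/(1 - p_i z^{-1})$, each of which is then characterised by the gain and time constant formulas above. This is an illustrative Python sketch with hypothetical function names, not code from the Captain toolbox.

```python
import math


def first_order_properties(b0, a1, dt):
    """Steady-state gain and time constant of b0 / (1 + a1 z^-1)."""
    if not -1.0 < a1 < 0.0:
        raise ValueError("need -1 < a1 < 0 (stable, non-oscillatory store)")
    return b0 / (1.0 + a1), -dt / math.log(-a1)


def decompose_second_order(b0, b1, a1, a2, dt):
    """Split (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2) into two parallel
    first-order stores c_i / (1 - p_i z^-1); requires real, distinct poles."""
    disc = a1 * a1 - 4.0 * a2
    if disc <= 0.0:
        raise ValueError("poles are complex or repeated")
    # poles are the roots of lambda^2 + a1*lambda + a2 = 0
    p1 = (-a1 + math.sqrt(disc)) / 2.0
    p2 = (-a1 - math.sqrt(disc)) / 2.0
    # partial-fraction residues
    c1 = (b0 * p1 + b1) / (p1 - p2)
    c2 = (b0 * p2 + b1) / (p2 - p1)
    # each store has the first-order form c / (1 + a z^-1) with a = -p
    return [first_order_properties(c, -p, dt) for c, p in ((c1, p1), (c2, p2))]
```

With $\Delta t = 0.25$ h, `decompose_second_order(0.1, -0.07, -1.4, 0.45, 0.25)` gives a slow pathway (time constant about 2.4 h) and a fast pathway (about 0.36 h) whose gains sum to the overall steady-state gain of 0.6. A negative residue $c_i$ would produce a negative gain, the kind of component for which a mechanistic interpretation is hard to justify.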

State space formulation
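It is straightforward to convert the transfer function representation of Eq. (1) to the equivalent state space form of Eq. (2).
$$\mathbf{x}_k = \mathbf{F}\mathbf{x}_{k-1} + \mathbf{G}\mathbf{r}_{k-\delta} + \mathbf{q}_k, \qquad y_k = \mathbf{h}^{T}\mathbf{x}_k + \xi_k \qquad (2)$$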
where $\mathbf{x}_k$ is a vector of internal model states; the elements of $\mathbf{F}$, $\mathbf{G}$ and $\mathbf{h}$ are determined by the TF parameters; and $\mathbf{r}_{k-\delta}$ is a vector of suitably lagged input values. For operational flood forecasting, these will generally be rainfall or upstream levels, possibly transformed by a nonlinear function (see below). $\delta$ is the identified advective time delay between input and output; $\mathbf{q}_k$ is a vector of process noise $[q_{1,k} \cdots q_{n,k}]^T$, with each element applied to the associated one of the $n$ internal states; and $\xi_k$ is the observation noise associated with the measurement. Here we make the simplifying assumption that the elements of $\mathbf{q}_k$ and $\xi_k$ are zero-mean, serially uncorrelated, statistically independent, normally distributed random variables, with a variance specified at each sample $k$ for each of $q_{1,k}, \ldots, q_{n,k}$ and $\xi_k$. The facility to specify the variance at each sample period allows heteroskedasticity to be included within the modelling framework (see below).
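The paper does not state which state space realisation the module uses. One standard choice, assumed here purely for illustration, is the observable canonical form, in which $\mathbf{F}$ carries $-a_1, \ldots, -a_n$ in its first column and ones on its super-diagonal, $\mathbf{G}$ carries the numerator coefficients, and $\mathbf{h} = [1, 0, \ldots, 0]^T$. The Python sketch below assumes a single input and $m < n$; the function names are hypothetical.

```python
def tf_to_state_space(b, a):
    """Observable canonical form for y_k = B/A u_{k-delta}, assuming m < n.

    b: [b0, ..., bm]; a: [a1, ..., an]. Returns F (n x n), G and h as lists.
    """
    n = len(a)
    if len(b) > n:
        raise ValueError("this sketch assumes m < n")
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][0] = -a[i]            # first column holds -a_1 ... -a_n
        if i + 1 < n:
            F[i][i + 1] = 1.0      # ones on the super-diagonal
    G = list(b) + [0.0] * (n - len(b))
    h = [1.0] + [0.0] * (n - 1)
    return F, G, h


def step_state(F, G, x, r):
    """One noise-free update x_k = F x_{k-1} + G r_{k-delta} (scalar input)."""
    n = len(x)
    return [sum(F[i][j] * x[j] for j in range(n)) + G[i] * r for i in range(n)]
```

Driving the second-order example $(b_0, b_1) = (0.1, -0.07)$, $(a_1, a_2) = (-1.4, 0.45)$ with a unit impulse and reading $y_k = \mathbf{h}^T\mathbf{x}_k$ reproduces the difference-equation output 0.1, 0.07, 0.053 for the first three samples.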

Input nonlinearity function
To increase the degree of linearity between the input and output of the model, it is often beneficial to apply a transform to the input. The DBM approach does not impose a specific form for the input nonlinearity function; instead, it is left to the modeller to choose an appropriate function. Beven et al. (2011) describe a number of methods, including power law, radial basis functions, piecewise cubic Hermite data interpolation, and Takagi-Sugeno fuzzy inference systems. From a code-writing perspective, any candidate input nonlinearity method can be hidden behind a standard function call template, thus abstracting the specific computational approach from the other components of the forecasting algorithm. In line with best practice in software design, this approach encourages modularity and abstraction as a means to support robust and extensible code, a practice that has been followed as far as possible throughout the DBM FEWS module design.
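The "standard function call template" idea can be illustrated with a registry of interchangeable functions that share one signature. This Python sketch is an assumption about how such a template might look, not the module's R interface; the power law follows the form named above, while the piecewise-linear lookup table is a deliberately simpler stand-in for the piecewise cubic Hermite interpolation mentioned by Beven et al. (2011).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical common template: raw input value + parameter vector -> transformed value.
Nonlinearity = Callable[[float, Sequence[float]], float]


def power_law(u: float, p: Sequence[float]) -> float:
    """u_eff = p[0] * u ** p[1]; negative inputs are clipped to zero."""
    return p[0] * max(u, 0.0) ** p[1]


def piecewise_linear(u: float, p: Sequence[float]) -> float:
    """Lookup table p = [x0, y0, x1, y1, ...] with strictly increasing x,
    interpolated linearly and held constant beyond the ends."""
    xs, ys = p[0::2], p[1::2]
    if u <= xs[0]:
        return ys[0]
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if u <= x1:
            return y0 + (y1 - y0) * (u - x0) / (x1 - x0)
    return ys[-1]


NONLINEARITIES: Dict[str, Nonlinearity] = {
    "power": power_law,
    "table": piecewise_linear,
}


def transform_input(kind: str, series: Sequence[float], params: Sequence[float]) -> List[float]:
    """Apply the named nonlinearity to a whole input series."""
    f = NONLINEARITIES[kind]
    return [f(u, params) for u in series]
```

Adding a radial basis function or fuzzy inference variant only requires registering another function with the same signature; the rest of the forecasting code is unchanged.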

Data assimilation
The state space form of Eq. (2) is well suited to data assimilation schemes using the Kalman Filter (KF) (Kalman, 1960). Young (2002) describes a modified KF designed for the DBM model structure. This two-stage data assimilation approach is shown in Eq. (3), where $\mathbf{P}_k$ is the error covariance matrix associated with the state estimate vector $\hat{\mathbf{x}}_k$, and $\mathbf{Q}$ is a square matrix whose diagonal entries represent the ratio of the variance of the process noise to that of the observation noise for each model state (off-diagonal entries are zero). The heteroskedastic observation noise variance at sample $k$ is estimated using the empirical formula shown by Eq. (4).
where $\theta_0$ and $\theta_1$ are hyperparameters determining the degree of inflation in observation uncertainty with increasing amplitude of the observation. The f-step-ahead forecast (where $f \le \delta$) is produced by iterating the forecast (prediction) step of Eq. (3) the required number of times.
The variance of the forecast output $\hat{y}_{k+f|k}$ is estimated as shown by Eq. (5).
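Because Eqs. (3) to (5) are not reproduced here, the following is a generic sketch, not a transcription of Young's (2002) filter: a scalar ($n = 1$, $\mathbf{h} = 1$) Kalman filter in which the process noise variance is a noise variance ratio (NVR) times the observation noise variance, and the observation noise variance follows an assumed power-law heteroskedastic form $\theta_0 y^{\theta_1}$ standing in for Eq. (4). The f-step forecast iterates the prediction step using inputs that have already been observed, which is possible because $f \le \delta$; the returned forecast variance (state variance plus observation noise variance) is one plausible reading of Eq. (5). All function names are hypothetical.

```python
def obs_variance(level, theta0, theta1, floor=1e-6):
    """Assumed heteroskedastic form (stand-in for Eq. 4): theta0 * level**theta1."""
    return theta0 * max(level, floor) ** theta1


def kf_update(x, p, u_lagged, y_obs, f, g, nvr, theta0, theta1):
    """One predict/correct cycle for the scalar model
    x_k = f*x_{k-1} + g*u_{k-delta} + q_k,  y_k = x_k + xi_k,
    with var(xi_k) from obs_variance and var(q_k) = nvr * var(xi_k)."""
    s2 = obs_variance(y_obs, theta0, theta1)
    x_pred = f * x + g * u_lagged                 # prediction
    p_pred = f * f * p + nvr * s2
    k_gain = p_pred / (p_pred + s2)               # Kalman gain (h = 1)
    x_new = x_pred + k_gain * (y_obs - x_pred)    # correction
    p_new = (1.0 - k_gain) * p_pred
    return x_new, p_new


def forecast(x, p, lagged_inputs, f, g, nvr, theta0, theta1):
    """Iterate the prediction step once per already-observed lagged input
    (lead time f <= delta). Returns (mean, variance) for each lead time."""
    out = []
    for u in lagged_inputs:
        x = f * x + g * u
        s2 = obs_variance(x, theta0, theta1)      # variance model at forecast level
        p = f * f * p + nvr * s2
        out.append((x, p + s2))                   # state + observation noise variance
    return out
```

In use, `kf_update` is called once per new observation until the forecast origin, after which `forecast` is called with the $f$ lagged inputs that are already available.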


Hyperparameter optimisation
The hyperparameters in Eqs. (3) and (4) are multiplicative and cannot be optimised directly. Smith et al. (2012a) have shown that an optimisation up to proportionality is possible if $\theta_0$ from Eq. (4) is fixed at 1 and the optimisation is limited to $\theta_1$ and the diagonal elements of $\mathbf{Q}$. This yields the ratio for the distribution of process noise between the internal state variables and, through $\theta_1$, the degree of inflation of observation noise variance with increasing observation level. Adopting this approach means that during the optimisation process Eq. (4) is replaced by Eq. (6). If we then make the simplifying assumption that the model residuals are normally distributed, the scale factor $c_{\text{scale}}$ can be approximated using Eq. (7).
In the case study described below, the model identification and estimation were performed using the RIV, RIVID and SDP functions from the Captain™ toolbox described earlier. The hyperparameter optimisation uses the Matlab™ lsqnonlin function.
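Eq. (7) is not reproduced here. One way to recover an absolute scale after an optimisation "up to proportionality", assuming normally distributed residuals $e_k$ with variance $c_{\text{scale}}\,v_k$ (where $v_k$ is the unscaled variance obtained with $\theta_0 = 1$), is the maximum likelihood estimate $c_{\text{scale}} = N^{-1}\sum_k e_k^2 / v_k$. The Python sketch below implements that estimate; treat it as an illustrative assumption rather than the published formula.

```python
def estimate_c_scale(residuals, unscaled_variances):
    """Maximum likelihood scale c for residuals e_k ~ N(0, c * v_k):
    c = mean(e_k**2 / v_k)."""
    if not residuals or len(residuals) != len(unscaled_variances):
        raise ValueError("need non-empty sequences of equal length")
    total = sum(e * e / v for e, v in zip(residuals, unscaled_variances))
    return total / len(residuals)
```

Setting the derivative of the normal log-likelihood with respect to $c$ to zero gives exactly this mean of normalised squared residuals, so no further numerical optimisation is needed for the scale once the shape parameters are fixed.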

Forecast uncertainty
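The online updating of the state estimate error covariance matrix $\mathbf{P}_k$, and by extension the estimate of the f-step-ahead forecast error variance, provides important information about the properties of the forecast produced by the DBM node and therefore an estimate of the range within which future observations will fall. It is significant that the KF algorithm operates as an estimator for the probability distribution of the state estimates. The estimate of uncertainty is dependent on the underlying statistical assumptions of the KF, the KF hyperparameters, and the heteroskedastic variance model. All of these assumptions will be simplifications of the true nature of the system; however, it is straightforward to test the performance of the forecast uncertainty estimates as part of the model calibration process and as an ongoing performance measure during operation.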
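Testing the uncertainty estimates can be as simple as counting how often observations fall outside the forecast mean plus or minus one and two standard deviations (the measure reported in Table 3) and comparing the result with what a normal distribution implies (about 31.7 % and 4.6 %). A minimal Python sketch, with hypothetical function names:

```python
import math
from statistics import NormalDist


def percent_outside(obs, means, variances, n_sd):
    """Percentage of observations outside mean +/- n_sd standard deviations."""
    n_out = sum(1 for y, m, v in zip(obs, means, variances)
                if abs(y - m) > n_sd * math.sqrt(v))
    return 100.0 * n_out / len(obs)


def expected_percent_outside(n_sd):
    """Percentage expected outside +/- n_sd if the forecast distribution is correct."""
    return 100.0 * 2.0 * (1.0 - NormalDist().cdf(n_sd))
```

A calibrated forecast should give values of `percent_outside` close to `expected_percent_outside`; markedly larger values indicate that the uncertainty is underestimated.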
The ability to provide an estimate of forecast uncertainty is an increasingly important component of operational flood forecasting (for an extended discussion of this topic see Beven, 2012, pp. 289-311). As an example of the general move towards probabilistic forecasting within FEWS, see Weerts et al. (2011), who describe the quantile regression method applied to deterministic forecasting models. The estimated forecast uncertainty can be displayed clearly alongside the mean forecast using the FEWS data visualisation mechanism.

Adaptive gain
Finally, we include an adaptive gain mechanism such that the true output ($y_k$) is assumed to be the product of a deterministic, time-varying scalar gain term ($g_k$) and the model estimate ($\hat{y}_k$). The gain operates independently of the main data assimilation scheme and is designed to account for long-term bias in the equilibrium response of the catchment, such as seasonal variations or gradual adjustments to channel geometry. The time variation in the gain term is modelled as a random walk process, again employing a KF algorithm to update the gain state on-line in response to long-term mismatch between observed and forecast level. Although the adaptive gain estimator generates a distribution for $g_k$, the approach taken for the DBM FEWS module is to collapse this distribution to its mean and use this as a deterministic scalar; the uncertainty estimate for this term is therefore discarded. While this may result in a loss of information, it greatly simplifies the implementation and optimisation of the forecast model by removing the need to treat the interaction between the adaptive gain and the state estimate uncertainties. A justification for this simplification is that the time frame for adjustment of the adaptive gain term is very much longer than that of the state vector adjustment. In operation, the adaptive gain is held fixed during the f-step forecast period, as observations are not available. The modified scalar Kalman Filter proposed by Young (2002) is used for the adaptive gain (Eq. 8), where $p_k$ is the error variance term associated with the gain estimate ($\hat{g}_k$); $q_g$ is a noise variance ratio that determines how quickly the gain is allowed to adjust; $\hat{y}_k$ is the estimate of level prior to applying the adaptive gain; and $y_k$ is the observed level.
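Eq. (8) is not reproduced here, so the sketch below shows a textbook scalar Kalman filter for a random-walk gain in noise-variance-ratio form, in the spirit of the description above: the observation model is $y_k = g_k \hat{y}_k + e_k$ and $q_g$ sets how quickly $g_k$ may drift. It is an illustrative Python sketch with hypothetical names, not Young's (2002) exact recursion; only the mean $\hat{g}_k$ is used downstream, matching the deterministic treatment described above.

```python
def update_gain(g, p, y_hat, y_obs, q_g):
    """One step of a scalar random-walk Kalman filter for
    y_k = g_k * y_hat_k + e_k, in noise-variance-ratio form
    (p is the gain error variance normalised by var(e_k))."""
    p_pred = p + q_g                              # random walk: uncertainty grows
    denom = 1.0 + y_hat * y_hat * p_pred
    g_new = g + p_pred * y_hat * (y_obs - g * y_hat) / denom
    p_new = p_pred - (p_pred * y_hat) ** 2 / denom
    return g_new, p_new


def adjusted_forecast(g_hat, y_hat_forecast):
    """During the f-step forecast the gain is frozen at its latest mean."""
    return [g_hat * y for y in y_hat_forecast]
```

With $q_g = 10^{-3}$ and a model that persistently underestimates the observed level by 20 %, repeated calls to `update_gain` drive the gain estimate towards 1.2; a smaller $q_g$ slows this adjustment.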
In the Eden example described later, the $q_g$ terms were tuned manually, taking values between $1\times10^{-5}$ and $1\times10^{-2}$ depending on the characteristics of the node. These values were then held constant during the optimisation of the hyperparameters in Eqs. (3) and (6). Ongoing research aims to include the hyperparameters of the adaptive gain mechanism together with the main KF hyperparameters and to optimise both together. As stated above, this is challenging due to interaction among the parameters in an unconstrained optimisation.
The adaptive gain component of the DBM FEWS module is not limited to the DBM modelling scheme and can be used as a simple data assimilation mechanism applied to the output of alternative model schemes. Smith et al. (2012b) describe the use of adaptive gain in a number of real-time flood forecasting examples.

DBM FEWS module
The preceding section outlined the required components of a typical DBM model node. The identification, estimation and optimisation of a node is not part of the FEWS framework (any computational environment could be used to generate the node). The necessary requirements for inclusion of a node as a FEWS module are shown in Table 1. Within the FEWS open shell framework, it greatly aids licensing, and therefore dissemination, if the module can be written in an open source language. The R language (GNU General Public License version 2) proved ideal for implementing the FEWS DBM module (R Development Core Team, 2008).

Overview of FEWS
The FEWS open shell framework provides a central database of hydrological and meteorological time series, together with a set of sophisticated modules, utilities and adapters that provide a linkage between this database and an extensible inventory of forecasting models, external databases, and an interface to the user (Werner, 2008). FEWS deliberately stores data in a model-agnostic format and handles data using a generic XML schema. In this way FEWS avoids the limitation of being a model-specific utility and can instead interface with a range of present and future forecast models.
The FEWS database access module and module adapters can perform sophisticated merging of time-series data.This allows a forecast model to request a segment of time-series processed such that observations take precedence followed in order by forecasts from the least distant temporal index (Deltares, 2012).This is essential for multi-node DBM forecasting where the downstream nodes will themselves generate forecasts based on a cascade of upstream forecasts.
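FEWS performs this merging internally through its own configuration (Deltares, 2012); the Python sketch below only illustrates the precedence rule as described above, interpreting "least distant temporal index" as the most recently issued forecast. The data layout (dictionaries keyed by time step, forecasts tagged with an issue time) and the function name are assumptions for the example.

```python
def merge_series(observed, forecasts):
    """observed: {t: value}; forecasts: list of (issue_time, {t: value}).

    Observations always win; remaining gaps are filled from the most
    recently issued forecast that covers them, then older ones.
    """
    merged = dict(observed)
    for _, series in sorted(forecasts, key=lambda fc: fc[0], reverse=True):
        for t, v in series.items():
            merged.setdefault(t, v)   # keep any value already placed
    return dict(sorted(merged.items()))
```

For a downstream DBM node, the upstream input segment assembled this way consists of observed levels up to the forecast origin followed by the latest upstream forecasts.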

Main function logic
The DBM FEWS module has been designed to be flexible, extensible, and as clear as possible for future code collaboration.We have produced a data specification sheet that describes the format of the data to extract from the central hydroinformatic database.
It is then straightforward for the FEWS DBM model adapter to be configured such that, at each forecast cycle, the correctly formatted data segment is collected and passed to the DBM module. The data segment contains a header line that identifies the required catchment node. This segment is placed in the catchment's work folder, and the catchment's main R script is then called with command line arguments identifying the data file to operate on. The main script changes the working directory to the required catchment node sub-directory and sequentially calls the node's functions. The functions that make up each node are called with a uniform function template. The parameters of the node that are incorporated into the function computations (i.e., those described in Table 1) are loaded from file each time the forecast cycle is invoked and saved back to file upon exit from the forecast cycle. This provides a robust form of state persistence. The actual computation method operating on the node parameters and data can vary between nodes, provided the input and output conform to the template. Following the sequential execution of each node function, the main script deposits the output data in the catchment's work folder for collection by the FEWS adapter and eventual storage in the central hydroinformatic database. The high-level flow control for a single forecast cycle is shown in Fig. 1. The directory structure for a catchment model is shown in Fig. 2.
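The per-node cycle described above (load parameters, call each node function through a uniform template, save parameters back) can be sketched as follows. This is a hypothetical Python illustration of the pattern, not the module's R code: the file name params.json, the JSON format and the two node functions are invented for the example, although outOffset is one of the Table 1 fields.

```python
import json
import tempfile
from pathlib import Path


def run_node(node_dir, data, steps):
    """Load node parameters, apply each step with the common template
    step(params, data) -> (params, data), then save parameters back."""
    param_file = Path(node_dir) / "params.json"      # hypothetical file name
    params = json.loads(param_file.read_text())
    for step in steps:
        params, data = step(params, data)
    param_file.write_text(json.dumps(params))        # persists to next cycle
    return data


# Two hypothetical node functions sharing the same template.
def remove_offset(params, data):
    return params, [v - params["outOffset"] for v in data]


def count_cycle(params, data):
    return dict(params, cycles=params.get("cycles", 0) + 1), data


# Example: two forecast cycles against a temporary node directory.
with tempfile.TemporaryDirectory() as node_dir:
    (Path(node_dir) / "params.json").write_text(json.dumps({"outOffset": 1.0}))
    first = run_node(node_dir, [3.0, 4.0], [remove_offset, count_cycle])
    run_node(node_dir, [3.0, 4.0], [remove_offset, count_cycle])
    saved = json.loads((Path(node_dir) / "params.json").read_text())
```

Because every function has the same signature, a node's computation can be swapped (for example, a different input nonlinearity) without changing the driver loop, and the saved file carries state such as filter estimates from one forecast cycle to the next.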

Eden case study
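The Eden catchment is located in Cumbria, North West England, and has an area of approximately 2400 km². The principal rivers are the Eden, Eamont, Irthing, Petteril and Caldew. The key runoff generating regions are the central Lake District peaks (including Skiddaw and Helvellyn) to the west, the Pennine moors to the east, and the Kielder Forest to the north-east. The high rainfall and steep terrain result in a fast catchment response, and flood warning lead times beyond six hours are difficult without recourse to numerical weather predictions. The catchment is well instrumented by the UK's Environment Agency, with some 31 level and 16 rainfall telemetered gauges. Figure 3 shows the Eden catchment together with the location of the gauge sites used by the DBM model network.
The demonstration Eden DBM FEWS module uses a network of six model nodes. The configuration of the nodes is shown in Fig. 4, together with the names of the Environment Agency gauge sites supplying input (rain or level) and output (level) for each node.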

Calibration
The calibration data set ran from 9 September 2003 to 10 March 2005.This period included the unprecedented 8 January 2005 event (see Mayes et al., 2006;Archer et al., 2007;Roberts et al., 2009).The results of the DBM model at 2, 4, 6 and 7 h forecast lead times are shown in Fig. 5.

Testing
A flood event of significant magnitude (though still in-bank) occurred in November 2009, four years after the end of the calibration data period. The performance of the DBM FEWS module at lead times of 2, 4, 6 and 7 h for this event is shown in Fig. 6.
A summary of the probability of detection (POD) and false alarm rate (FAR) performance of the Eden DBM module over 57 threshold crossing events for the testing period is shown in Table 2. Table 3 shows the Nash-Sutcliffe efficiency score for the 50th percentile forecast using (1) all data and (2) the subset above 3 m depth. The table also shows the percentage of observations that fell outside the one and two standard deviation ranges. Subset (2) makes clear that very good global performance statistics can be somewhat misleading, giving an over-optimistic impression of forecast performance at the high flows relevant to flood risk management (this is not an issue confined to the DBM FEWS model). A more realistic assessment of forecast skill is best obtained from visual inspection of performance for significant flood events, such as those shown in Figs. 5 and 6.
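The effect noted for subset (2) can be reproduced with a toy example. In this Python sketch (hypothetical names), the subset score is computed against the subset's own mean, which is one reasonable convention, although the paper does not say which was used for Table 3; the same forecast can then score well globally and poorly above the threshold.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from mean(obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def nse_above(obs, sim, threshold):
    """NSE restricted to samples whose observation exceeds the threshold."""
    pairs = [(o, s) for o, s in zip(obs, sim) if o > threshold]
    return nash_sutcliffe([o for o, _ in pairs], [s for _, s in pairs])
```

For observations [0, 0, 4, 6] and forecasts [0, 0, 5, 5], the global score is about 0.93 because the low-flow samples inflate the denominator, while the score for observations above 3 m is 0.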

A description of POD and FAR calculation is given in Environment Agency (2006). The POD and FAR results show the ability of the forecast to track observations; this is greatly aided by the data assimilation process. As expected, the number of false alarm events increases as FAR is calculated for percentiles above the mean. The calculation of forecast uncertainty could be used to provide warnings based on a given percentile appropriate to the sensitivity of individual sites, or to allow ranked or continuous probability score statistics to be calculated and used for cost-benefit decision support analysis (Murphy, 1970; see also Jones et al., 2003, for a review of model performance measures including probabilistic methods).
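POD and FAR follow from a contingency count over threshold-crossing events. The Python sketch below uses the common definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms), the latter often called the false alarm ratio; Environment Agency (2006) should be consulted for the exact event-matching rules, which are not reproduced here.

```python
def pod_far(events):
    """events: iterable of (observed_crossing, forecast_crossing) booleans,
    one pair per event window. Returns (POD, FAR)."""
    hits = misses = false_alarms = 0
    for observed, forecast in events:
        if observed and forecast:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1          # correct negatives are ignored
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far
```

Evaluating the forecast crossing at a higher percentile of the forecast distribution makes `forecast` true more often, which raises POD at the cost of more false alarms.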
The uncertainty estimate provides a valuable model evaluation that would not be available were the forecast purely deterministic.

Conclusions
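This paper demonstrates that, consistent with previous studies (Lees et al., 1994; Romanowicz et al., 2006, 2008), the DBM method can produce effective semi-distributed catchment models for real-time flood forecasting. The data assimilation process is based on the KF and is therefore an inherently probabilistic estimator of the river level. This allows the DBM module to provide not only a mean forecast but also an indication of the uncertainty associated with the level forecast, in the form of a heteroskedastic normal distribution. The user then has a useful indication of the scale of spread for the forecast period. Communicating this information to the user is performed by the FEWS software via flexible data visualisation tools. The default configuration for the DBM module renders the mean forecast (the 50th percentile value) for both past values and the forecast, together with a coloured patch indicating the 95 % forecast uncertainty range. Past forecast values and observations are also viewable together with the 95 % uncertainty range for a given f-step value, allowing the user to gain insight into the performance of the uncertainty estimation. Data visualisation methods for communicating probabilistic flood forecasting data are an active area of flood risk management research (see Leedal et al., 2010).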
The forecast probability distribution also allows the user to issue warnings at a range of approximate likelihoods using a consistent, numerically-derived framework; for example, supervisors of sensitive infrastructure can be issued with warnings at a lower level of inundation probability.However, it is still far from clear how best to use the additional information provided by probabilistic forecast methods (see Todini, 2004, for an overview of the issues involved).
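Under the heteroskedastic normal forecast distribution, issuing warnings "at a range of approximate likelihoods" reduces to comparing an exceedance probability with a site-specific trigger. A minimal sketch using Python's `statistics.NormalDist`; the function names and trigger probabilities are hypothetical.

```python
from statistics import NormalDist


def exceedance_probability(mean, variance, threshold):
    """P(level > threshold) for a normal forecast distribution."""
    return 1.0 - NormalDist(mean, variance ** 0.5).cdf(threshold)


def percentile_level(mean, variance, pct):
    """Level at the given percentile (e.g. 97.5) of the forecast distribution."""
    return NormalDist(mean, variance ** 0.5).inv_cdf(pct / 100.0)


def should_warn(mean, variance, threshold, trigger_probability):
    """Warn when the forecast exceedance probability reaches the site's trigger."""
    return exceedance_probability(mean, variance, threshold) >= trigger_probability
```

For example, a general warning might use a trigger of 0.5, while a sensitive site might use a trigger of 0.1 and so be warned earlier: with a forecast mean of 3.0 m, variance 0.04 m² and a 3.2 m threshold, the exceedance probability is about 0.16, which triggers the second warning but not the first.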
The DELFT-FEWS framework provides a flexible and extensible mechanism for embedding flood forecasting models within a wider hydroinformatic infrastructure. The exercise presented here demonstrates how this process can be accomplished. Development of the DBM module for FEWS provided a test case for the knowledge transfer (KT) process between the research and applied fields. The ease with which this transfer was accomplished for the Eden case study demonstrates the strength of the FEWS philosophy and its benefits in achieving KT.


References
Murphy, A.: The ranked probability score and the probability score: a comparison, Mon. Weather Rev., 98, 917-924, 1970.
R Development Core Team: R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, available online: http://www.R-project.org, last access: 28 May 2012, 2008.
Young, P.: Advances in real-time flood forecasting, Philos. T. Roy. Soc. A, 360, 1433-1450, 2002.

Table 1. The data structure for a DBM model node in FEWS. Each node also includes specific input nonlinearity, heteroskedastic variance, and data preprocessing functions.
Parameter        Description
F, G, h, P, Q    KF parameters as defined in Eq. (3)
g                adaptive gain parameters as defined in Eq. (8)
m                order of TF numerator polynomial
n                order of TF denominator polynomial
isLevel          Boolean defining if input is level or rain
outOffset        offset for output
inOffset         offset(s) for input(s)
δ                integer forecast lead time

Table 2. Performance measures for the Eden DBM module at Sheepmount for the 6 h forecast lead time.

Table 3. Summary of model performance for observations above (row 1) 0 m and (row 2) 3 m. Column 3 (N-S) is the Nash-Sutcliffe efficiency measure. Columns 4 and 5 show the percentage of observations falling outside the 1 and 2 standard deviation ranges described by the forecast uncertainty parameterisation. Results are for the Eden DBM module at Sheepmount for the 6 h forecast lead time.