Inundation risk for embanked rivers

Introduction Conclusions References Tables Figures

Abstract.The Flood Frequency Analysis (FFA) concentrates on probability distribution of peak flows of flood hydrographs.However, examination of floods that haunted and devastated the large parts of Poland lead us to revision of the views on the assessment of flood risk of Polish rivers.It turned out that flooding is caused not only by the overflow of the levee crest but also due to the prolonged exposure to high water on levees structure causing dangerous leaks and breaches that threaten their total destruction.This is because the levees are weakened by long-lasting water pressure and as a matter of fact their damage usually occurs after the culmination has passed the affected location.The probability of inundation is the total of probabilities of exceeding embankment crest by flood peak and the probability of washout of levees.Therefore, in addition to the maximum flow one should also consider the duration of high waters in a river channel.
In the paper the new two-component model of flood dynamics: "Duration of high waters-Discharge Threshold-Probability of non-exceedance" (DqF), with the methodology of its parameter estimation was proposed as a completion to the classical FFA methods.Such a model can estimate the duration of stages (flows) of an assumed magnitude with a given probability of exceedance.The model combined with the technical evaluation of the probability of levee breaches due to the duration (d) of flow above alarm stage gives the annual probability of inundation caused by the embankment breaking.
The results of theoretical investigation were illustrated by a practical example of the model implementation to the series of daily flow of the Vistula River at Szczucin.Regardless of promising results, the method of risk assessment due to prolonged exposure of levees to high water is still in its infancy despite its great cognitive potential and practical importance.
Therefore, we would like to point out the need for and usefulness of the DqF model as complementary to the analysis of the flood peak flows, as in classical FFA.The presented twocomponent model combined with the routine flood frequency model constitutes a new direction in FFA for embanked rivers

Introduction
The most popular way of flood protection in Poland is the embankment of the rivers.A consequence of this passive way of protection is that floods in Poland occur mostly due to levee breach or to flow over the crest of dikes.The sense of security in floodplains of embanked rivers results from the belief that levees protect against the flood magnitude for which they were designed.Therefore, it creates the illusion that if the actual forecasted flood peak does not exceed the safety levels related to levee's designed value, one can assume that the risk of water overtopping the dike crest is negligible and so is the risk of flooding in the protected area.The records of floods in Poland show that this is not true; more often the floods are the result of the prolonged exposure to high water on levees.The levees are weakened by water and their disruption occurs when it seems that the danger is over, i.e. after passing culmination.This is particularly dangerous because when the staff responsible for flood protection and local residents breathe a sigh of relief the worst is yet to come.
Therefore, apart from the magnitude of the peak flows another important factor should be taken into consideration, the duration of high water levels, in fact, a parameter of the wave's shape.Long-lasting high stages may weaken the levees' structure (soaking) and cause dangerous leaks, blurs and breaks that threaten their destruction.That is why the classical Flood Frequency Analysis (FFA) concerning only the frequency of the annual maximum (AM) flows is not suitable in this case and ought to be supplemented by the analysis of the duration of flows over the given threshold (Bogdanowicz et al., 2011, also Eagleson, 1972;Sivapalan et al., 1990;Gioia et al., 2008;Iacobellis et al., 2011).The joint risk of inundation making allowance for the two main sources of vulnerability to flood hazard for areas protected by embankments, over-crest flow and levee failure, have been proposed and defined.
In Poland, as in many other countries for each hydrological station, two benchmark water levels, called the warning stage and the alarm stage, have been specified.Although warning and alarm stages are assigned to the places where water levels are observed, to the hydrological stations, their determination procedures as well as other inundation risk characteristics take into account, inter alia, the elevation of the embankment system for the whole river reach.So, the results of below analysis refer to the river reaches represented by data observed at hydrological stations.The frequency of annual maximum uninterrupted duration, D (in days), of flows over the flood alarm stage (Fig. 1) can be used to assess the risk of flooding due to waning of the levees' strength.The aim of this study is to introduce formal aspects of the duration-flow-frequency (DqF) modelling in stationary and non-stationary conditions, to use it to assess the inundation risk due to the levees breach and to combine it with the AM flow model to get the cumulative probability of inundation.In the presented statistical model, the duration is considered as a random variable while the alarm flow discharge is the fixed value.The approach presented here for non-stationary conditions can to some extent resemble the Peak Over Threshold (POT) with covariates techniques developed by Davison and Smith (1990).Looking for similarities to other approaches used in hydrology one can find that the likelihood function of the DqF model is for stationary conditions similar to the likelihood function of the censored sample introduced to FFA by Kaczmarek (1977).This paper's layout is as follows: in the second section the concept of the inundation risk for embanked river is defined.Then a short review of literature on statistical modelling of flood shape hydrographs with emphasis on onedimensional models is presented (Sect.3).In the next section the Duration-Flow discharge-Frequency (DqF) model is introduced and estimations of its parameter for stationary and non-stationary case are described and discussed.Taking into account the embankment resistance, the annual probability of inundation caused by levees breaching is introduced.To illustrate the proposed way of inundation risk assessment the case study for the Szczucin gauging station at the Vistula River (Southern Poland) is presented (Sect.5).The probability of inundation due to levees breaching is compared with the conventional probability of peak flow exceeding the levee crest and the cumulative probability of inundation are computed.Sect.6 concludes the paper.

Flood risk
Floods occur as a result of water spilling over the crest of embankment (Q > Q B ) or more often as a result of prolong existence of high water in the embanked river channel, so when the peak flow discharge exceeds the alarm flow (Q A ) but is lower than the overtopping flow (Q B is the discharge that overtops levee crests) (Q AQ < Q max < Q B ).One can also distinguish many other causes of floods, such as back water and ice-jams, etc., but they do not stem from the embankment failures and will not be considered in this study.
The annual probability of inundation for embanked river reach is expressed as the total of probability of the two exclusive events ("Fl" stands for "flood") (see Fig. 2): where the first term comes from the conventional FFA The second term of Eq. ( 1) defines the probability of inundation caused by levees breaching which depends on both the flood persistency and levees resistance to high water stages which in turns depends on their design and technical condition.Therefore, the P 2 (Fl) is expressed as the integral of the product of the value of the hazard index h(Fl|d) which is defined as the probability density of levee breaching caused by where f (d) -pdf of the duration d of flows above the alarm stage; h(Fl|d) -the hazard index being the probability of levee breaching caused by a high water of the duration d.
The value of the hazard index h(Fl|d) tends to 0 for d going to 0 and to 1 for d going to infinity (e.g.Fig. 6).The hazard index h(Fl|d) is determined administratively for the river reach by the Regional Water Management Board based on the technical assessment of flood embankments.
Note that collating the annual maximum high flow duration data for analysis one puts d t = 0 (index t marks the tth year in a series in which the particular event d = 0 occurred, t = 1, 2,. . ., T and T is the length of the series in years) the both for Q max ≤ Q A and Q max > Q B , so 1 inundation yearly is considered and that caused by spilling over crest has the priority over that caused by prolonged high stages.Furthermore, note that the weaker is the relationship between annual maximal values of peak flow and duration of flows above the alarm flow (Q A ) the more justified is the separate analysis of the both random variables.The DqF approach is the extension of the conventional FFA performed on a single annual peak flow series.But, even though it does not have to be the same flood that gives the annual flood peak and last longest in the year, the annual peak flows are usually assumed to be temporary independent what has been verified by several investigators and so is assumed here for the annual maximal durations.Due to the poor measurement material -short samples -it would be ever harder to analyse autocorrelations in the series of durations.
The ratio of probabilities P 2 to P 1 and their total is helpful to determine the actions to reduce the risk of flooding, namely the strengthening or heighten the levees (or building parallel levees).

The statistical modelling of flood hydrographs shape
Due to complexity of stochastic nature of river flow process one has to accept a rational ignorance while dealing with flood risk management.In response to practical needs, several simple conceptual structures are being developed for statistical modelling of flood hydrographs.The methods of constructing design flood hydrographs are most popular for modelling flood hydrographs.Their reviews are available in, for example, Serinaldi and Grimaldi (2010), Strupczewski (1964Strupczewski ( , 1966) ) and Strupczewski et al. (2013).The design hydrograph Q(t) with the defined return period of its peak serves both in flood-risk mapping procedures and for designing a reservoir storage capacity and other hydraulic structures sensitive to flood hydrograph magnitude and shape.
The common feature of most of the approaches to flood hydrographs analysis is an avoidance of using a joint probability distribution of parameters describing the shape of the hydrographs while limiting multi-dimensional analysis to conditional expectations further reduced to a regression.The most commonly used variables are flood peak and flood volume.
Extension of the standard FFA for statistical analysis of peak part of flood hydrographs is the one-dimensional model Flow-duration Frequency (QdF) initiated by NERC (1975) and Askhar (1980); in the 1990s, Sherwood (1994), Balocki and Burgess (1994).Galéa and Prudhomme (1997) laid out the foundations of the present form of the QdF method.Based on the assumption of the convergence of different flood distributions for small return periods (Javelle et al. 1999), Javelle (2001) introduced a converging approach to the QdF modelling.Here the annual mean maximum peak flood volume (or equivalently the mean excess discharge -Qd ) corresponding to the given duration (d) is taken (Fig. 3a) as the random variable.Therefore, consequently the maximum d−days annual outflow volume V d = d • Qd is the random variable as well.In fact, the above idea of flood peaks analysis is modelled on the analyses of the Intensity-duration-Frequency (IdF) commonly used for stochastic modelling of high intensity rainfalls and of the QdF analysis of low flows.
To cater to the conventional FFA, the flow discharge (Q A ) corresponding to the alarm stage (H A ) is used here, so the upper limb of the rating curve is regarded as time invariant.The frequency of annual maximum uninterrupted duration of flows, D (in hours, days, etc.), over the flood alarm stage (H A ) (or equivalently over the alarm flow (Q A )) but excluding floods pouring over the embankment crest (which corresponds to flows exceeding the overtopping flow Q B ) serves to assess the inundation risk of flood spilling out of river channel caused by scouring the levees (Fig. 2).Therefore, the d t = 0 in the [d] time series means that the threshold discharge, Q A , has not been exceeded during the tth year of the series (Q max (t) < Q A ) or that the peak flow has exceeded the overtopping flow (Q max (t) > Q B ), where Q max (t) denotes the annual maximum discharge occurred in the tth year of the sample series.In other words, there is no risk of the dike's damaging due to the prolonged exposure to the high water because the flood wave was either too small to reach the weaken construction of the levee or, the contrary, the flood is such big and sudden that the water immediately overtops the levee's crest.Note that if more than one flood appears in a year, the D and the annual peak flow (Q max ) can correspond to different floods (Fig. 1).
Using multi-duration approach, by fitting the appropriate statistical distribution to the extracted samples for various durations, from the relations QdF for various d one can roughly construct the scaled Flood-duration-Frequency curve (QdF).To avoid inconsistency of the estimates of quantile Q(d, F ) for various d, the same distribution function is applied for all duration (Javelle et al., 1999;Castellarin et al., 2004;Iacobellis, 2008;Botter et al., 2008) and the quantiles are reduced by the appropriate function φ(d, ν) which is decreasing function of d: where the ν denotes the vector of parameters which are estimated from the data.
It means that differences in the distributions of various d values result from the differences in the mean value only.Note that Q(0, F ) corresponds to the distribution of annual instantaneous peak discharges.The parameters of the function φ(d, ν) and Q(0, F ) (Eq. 5) are estimated separately.
Finding that flood persistence is a factor of flood hazard for embanked rivers, Bogdanowicz et al. (2008) modified the above model redefining Q as the annual maximum flow discharge (Q d ) which is continuously exceeded during the period d, wherein the d variable is still treated as a deterministic value (Fig. 3b).The applied way of determining the scaled distribution function does not differ much from the method described by Javelle et al. (1999).In parallel, the use of ML method in the presence of the d as the covariate (Strupczewski et al. 2001a, b;Katz et al., 2002;Stasinopoulos and Rigby, 2007;Stasinopoulos et al., 2008Stasinopoulos et al., , 2012) ) is demonstrated for Weibull distribution with the lower bound parameter and the constant shape parameter.Here all parameters are estimated jointly.
However, to address the 1-D statistical analysis of the peak part of flood hydrographs directly to the problem of softening and breaching of river embankment, the duration (d) of high stages should be taken as a random variable rather than the mean excess discharge Qd (Javelle, 2001) (Fig. 3a) or the the annual maximum flow discharge (Q d ) (Fig. 3b) (Bogdanowicz et al., 2008).Note that the duration of flood (d) can be more accurately assessed than the peak flow discharge of large floods.

Formal aspects of the duration-flow-frequency modelling
To address the flood risks arising from softening and washing out the river embankments, Bogdanowicz et al. (2011) proposed to take as the subject of analysis the frequency of annual maximum uninterrupted duration, D (in days), of flows over the flood alarm stage (Q A ), the duration (D) is considered as a random variable while the alarm flow discharge (Q A ) is the fixed value (Fig. 1).
The time series of annual maximum uninterrupted duration, D (in days), of flows over the flood alarm flow , is the subject of statistical modelling in stationary and non-stationary conditions.The d t = 0, denotes that the Q A has not been exceeded during the tth year (Q max (t) < Q A ) or that the peak flow has exceeded the overtopping flow (Q max (t) ≥ Q B ), which means that the priority of overtopping over breaching is given and we rule out the possibility of two inundation floods of the two different origins within one year.Note Frequency analyses of hydrological sample with zero discrete values have received relatively little attention.Still there are several approaches for analysis of censored data, including probability plot, regression, weighted-moment estimators, maximum likelihood estimators, and conditional probability analyses (Gilliom and Helsel, 1986;Hass and Scheff, 1990;Harlow, 1989;Helsel, 1990).A consistent approach to the frequency analysis of such data requires using discontinuous probability distribution functions.Jennings and Benson (1969), Interagency Advisory Committee on Water Data (1982), Woo and Wu (1989), Wang and Singh (1995) among others developed empirical three-parameter models for frequency analysis of hydrologic data containing zero values.
When the available data represent mean daily discharge, the d values are in fact the integer numbers (the exposition can last 1, 2, 3, etc., days) but to maintain the continuity of time we treat them as real numbers and consider d as if it corresponded to the duration range (d -0.5 day, d + 0.5 day).In particular, for d = 0 (beginning of the time axis) the interval corresponds to the range (0, d + 0.5 day).If a flood starts before the end of a year and is continuing to the next year, the d value is derived for the entire flood wave (from its beginning in one year to its end in the next year) but attributed to the year t when the flood culmination occurred.To get an insight into flood persistence properties, the several threshold stages (Q T ) are considered but not only the alarm stage Q A .

Stationary conditions
As far as the probability theory is concerned, the occurrence of zero events can be expressed by placing a non-zero probability mass on a zero value: P (D = 0) = 0, where D is the random variable, and P is the probability mass (e.g.Strupczewski et al., 2002Strupczewski et al., , 2003;;Weglarczyk et al., 2005).Therefore, the parent distribution functions of such hydrologic series would be discontinuous (with discontinuity at 0) and, using the theorem of total probability, their forms can be written as where β denotes the probability of the zero event, , which is continuous in the range (0, + ∞) with a lower bound of 0, and g is the vector of parameters (containing β or not), δ(d) is the Dirac's delta function and 1(d) is the unit step function.Assuming the infinite upper bound for D seems acceptable and facilitates modelling.Due to discretisied duration d intervals, the probability of exceeding the Q A flow during one day only is equals to Hydrological samples with zero values are most frequently of exponential-like shape.Weglarczyk et al. (2005) modelled CPDF f • (d; g) of ( 5) by two-parameter distributions, namely by generalized pareto, Weibull and gamma, estimating parameters by the maximum likelihood (ML) and the moments (MOM) methods.From Eq. ( 5) one can write the likelihood function as where n 1 and n 2 denote the number of zeros and non-zeros values, respectively.If β / ∈ g, from ML-equations: one can easily find that the ML-estimate of β is so β and g are estimated by MLM independently.

From CDF of annual maximum floods obtained from FFA
The better estimate of the β parameter in the sense of definition (Eq.9), not its standard error, can be obtained from the CDF of annual peaks providing the selected for Annual Maxima (AM) model fits upper tail data well.Note that the D = 0, denotes that the Q A has not been exceeded during the tth year (Q max (t) < Q A ) or that the peak flow has exceeded the overtopping flow (Q max (t) > Q B ) where Q max denotes the annual maximum discharge, therefore, probability of zero value of D should be estimated from CDF of annual peak flows got from FFA rather than from the (0, 1) time series of the d record.Having derived from FFA the CDF of the annual peaks Ĝ (Q max ) ≡ϕ Q max , ĥ where ĥ is the vector of parameter estimates, one can get the estimate of β as

W. G. Strupczewski et al.: Inundation risk for embanked rivers
Note that if more than one flood appears in a year it may happen that the d t and the annual peak flow Q max (t) correspond to different floods.Floods in excess of Q B are unique in Polish rivers, but if they were they should be in the FFA treated as of unknown magnitude over the thereshold Q B , thus one deals with first order right censored sample.

Estimation of parameters of the continuous part
of Eq. ( 5) ML estimate of the parameters (g) of the continuous part of PDF (Eq.6): the conditional probability density function , can be obtained by solving the ML system of equations: Since the d samples deprived of the zero values are most frequently of exponential-like shape, the distribution functions in Table 1 are recommended as candidates for f • (d j ; g) model.
Note that Exponential distribution is a special case of all other mentioned above distributions, Eqs. ( 12)-( 15).
The detailed information on the models mentioned above with the methods of ML estimation, one can easily find in hydrological and statistical literature, e.g. in Rao and Hamed (2000) and for GE in Gupta and Kundu (2000).

Non-stationary case
The non-stationary Flood Frequency Analysis has been a subject of numerous publications.Davison and Smith (1990) dealt with time-related POT approach, Strupczewski and Mitosek (1991) (later completed and published e.g. in Strupczewski andFeluch, 1997a, b, c, 1998a, b;Strupczewski et al., 2001a, b) dealt with maximum estimation of flood distribution functions within the presence of time as the covariate; a similar approach is presented, for example, in Katz et al. (2002) and Stasinopoulos and Rigby (2007), Stasinopoulos et al. (2008Stasinopoulos et al. ( , 2012)).The basic assumption in the classical Flood Frequency Analysis and the duration-flood-frequency modelling is that neither the adopted distribution function nor its parameters change in time.However, the longer the hydrological series, the harder to maintain the assumption of stationarity in the face of a changing environment and climate (Milly et al., 2008).The non-stationarity of hydrological data ought to be taken into account in FFA for theoretical and empirical reasons, but practical aspects of its introduction into design and planning procedures are not so obvious or simple and pose significant ongoing challenges to the hydrological research and water management policy.One could easily accept the increasing trend in design upper quantiles, but decreasing detected trends may distort decision making in the engineering design, evaluation of flood risk and in other flood-related issues.Especially when statistical inference is based on peak flow series of average length currently covering barely 60, 70 elements or on climate change scenarios and their hydrological response that we presume, we are able to predict in a realistic manner.Herein the formal aspects of at site non-stationary Duration-Flow-Frequency modelling are presented while regional Flow-Duration-Frequency modelling has been introduced by Cunderlik and Ouarda (2006).
Assuming that only the values of parameters of the continuous part of the PDF may vary with time, its form remains unchanged, and the PDF f can be written as Assuming the forms of trends and denoting the vectors of their parameters, respectively, as θ and ξ we have got For compact notation let us define the dichotomous variable Y t given by For the time series d = (d 1 , d 2 ,. . .,d t ,. . ., d T ) of the maximal annual duration of river flows exceeding the given threshold, the likelihood function can be expressed as: and the Log-likelihood function As one can see from Eq. ( 20), the parameters θ and ξ , as they are independent, can be estimated separately.

Estimation of parameters of the continuous part of Eq. (16) [f • (d; t, ξ )]
The ML estimate of the parameters ξ of CPDF (f • (d; t, ξ )) are obtained by solving the system of equations: while the candidate functions f • are given by Eqs. ( 11)-(15), however, with time-dependent parameters in this case (Strupczewski et al., 2001;Strupczewski and Kaczmarek, 2001).The estimates can also be found by direct search for the maximum of log-likelihood function (the last component of Eq. 20) with respect to trend parameter vector ξ .The consequence of making allowance for time-dependent parameters of f • (d; g) is an increase of the number of parameters to be estimated.Given the small number of nonzero elements in the time series d = (d 1 , d 2 ,. . ., d t , . . ., d T ), the number of parameters which can be effectively estimated is small.Therefore, we decided to adopt the values of these parameters as independent of time.Then the only nonstationarity lies in the weighting parameter β (t; θ) which plays the role of the time-dependent function "switching" on and off the event of dikes' prolonged exposure to high waters.Note here that the duration d is a parameter that describes the shape of the flood hydrograph, so we assume that the persistence of flood of magnitude Q A < Q max < Q B is not subject to time variability.

Two ways of estimating the time-dependent weight parameter β (t; θ)
The estimation of parameters θ of the discrete part -weighting parameter β (t; θ), in the joint distribution Eq. ( 17), can be performed in two ways: by regression analysis and on the base of non-stationary distribution of annual maxima with time-dependent parameters.

Regression analysis
The variable Y t represents binary outcomes and has a binomial distribution with parameter: however, the trend in β cannot be found by means of frequently assumed linear regression.The reasons of being that -in general linear trend may take the values of probability β (t, θ) outside the range from 0 to 1; -the error term is not homoscedastic, nor it is normally distributed as in normal regression.
In order to avoid values outside the range from 0 to 1 a monotonic transformation of the interval (0,1) is performed to the range (−∞, +∞).There are many transformations with this property, but the most popular are two: probit and logit transformations.Both give similar results but logit transform is more convenient for calculations.Probit transformation consists in converting the probability to corresponding quantiles of the standard normal distribution.Logit transformation is given by And the trend is modelled as Inverse transformation leads to the logistic (LO) function β of time t with parameter vector θ = [a, b].
Logistic regression is used in many disciplines, medicine, social science, econometrics, in engineering, especially for predicting the probability of failure of a system or product.
The logistic regression coefficients a and b are usually determined using maximum likelihood estimation by iterative process until the improvement of the solution is minute and the procedure is said to have converged.Sometimes, when the considered flow threshold is high and thus number of "ones" greatly exceeds number of zero values of y t , the convergence cannot be reached.The failure to converge may indicate that the trend coefficients are not significant or other methods of inference about the trend in β should be applied.
Several measures enable to evaluate the goodness of fitted trend model.Deviance, pseudo-R 2 and odds ratios confidence intervals are the most frequently used.There are two measures of deviance corresponding to the likelihood ratio.One, called model deviance, to compare fitted model to saturated model (a theoretical model with perfect fit) and second, null deviance, which represents the difference between null model (a model with only intercept, so representing the stationary case, β given by Eq. 8) and saturated model.Model deviance is given by equation: likelihood of the fitted model likelihood of the saturated model ( 24) and similarly, null deviance: Note that in logistic regression the likelihood of the saturated model (y t = β (t; θ)) is equal 1.The deviance has an approximate chi-square distribution with 1 degree of freedom for each predictor, so 1 in our case.Smaller values of deviance indicates better fit what corresponds to non-significant chi-square values.
Pseudo-R 2 is calculated on the base of deviances: and interpreted almost like a coefficient of determination in linear regression.

The method via annual maxima distribution with time-varying parameters
An alternative way of analyzing a trend in β is to use the non-stationary CDF of annual peaks with time-dependent parameters.From NFFA (Strupczewski et al., 2001) one gets G = ϕ (Q, h, t), where h -the vector of PDF parameters of the annual flood peaks distribution.Then per analogy to Eq. (9a) one can write: providing the selected distribution and trend model of its parameters fits the upper tail of data well.It would be advisable to compare the results of both methods.Compatibility of the results could serve as the overall test of correctness of the assumptions made.

Probability of inundation during the period (t 1 , t 2 )
Dealing with hydrologic design, due to non-stationarity, the notion of return period is no longer valid and the probability of inundation should refer to the whole period of life of a hydraulic structure, not to a single year as has been agreed in the stationary case.
When the parameters of DqF distribution are time dependent, consequently the annual probability of leves breach (Eq. 3) becomes time dependent: P 2 (Fl, t).The probability that at least once in the period (t 1 , t 2 ) the inundation caused by levees breach occurs is expressed as Similarly, if the distribution of annual maximum peaks is time dependent, G = ϕ (Q, h, t), the exceedance probability of overflow of the levees' crest, so the probability that (see Eq. 2) Then the probability that the inundation caused by overtopping the embankment crest occurs at least once in the period (t 1 , t 2 ) and can be expressed as The total probability of inundation in the period (t 1 , t 2 ) equals to:

Example -Szczucin at Vistula River (Southern Poland)
To illustrate how the proposed approach works in practice, the Szczucin gauge (southern Poland) at the Vistula River has been selected as an example.Recent flooding in the upper Vistula bared the weakness of the system of flood protection, especially unsatisfactory condition of the embankments in the region of Szczucin.One, but not the only, major reason for the current state of flood protection infrastructure is a complex history of these lands.When Western European countries formed an effective flood protection scheme, Polish south-eastern lands were at the periphery of three empires, two of which were among the most undeveloped countries of the continent.After regaining independence, social and economic problems associated with merging the various districts of the reborn Poland influenced the poor development of an efficient protection system.For these reasons, embankments built during World War II do not meet current requirements which were recently put to a higher level.The Polish People's Republic period did not bring any important changes.Although the embankments have been periodically increased and strengthened, the high cost of post-war reconstruction and industrialization carried out under conditions of socialist economy did not allow for catching up with Western standards.Lately, the material excavated on the flood land, very often at the immediate vicinity of the embankments, was used for the re-construction.As a consequence, the top layer of inactivated meadow was damaged, what facilitated the filtration of water from the horizontal residual layer under the layer of permeable sealer coat.There are present plans to modernise the dikes and the first works have been carried out.The investor claims that the modernisation will reduce the flooding risk by 80 %.To assess the risk before and after modernisation (provided that the statement of the investor is right) the following analysis was performed.
The daily flows record covering the period 1951-2006 (n = 56 yr was used in this study).At first the daily records have been controlled and tested with regard to the sharp discontinuities and jumps in data -no particular irregularities have been detected (Fig. 4).
The overtopping flow Q B was assessed from the rating curve as 10 500 m 3 s −1 which roughly corresponds to twohundred-years return period of annual peak flow (Q 0.5 % ), the base design value for the 1st class embankments.In fact, there are no annual peak flows exceeding this value in the record.Therefore the Q B value does not affect the composition of the vector of observation values [d t ].The alarm threshold for the Szczucin station Q A = 1690 m 3 s −1 (which means flow of ca. 2 yr return period, stage 660 cm); however, for completion a few other thresholds will be analysed, too, namely Q Tr = 700, 1000, 1300 and 2000 m 3 s −1 .The hazard index h(Fl|d) for Q A = 1690 m 3 s −1 (Eq. 3) was assessed as so the embankments cannot withstand the pressure of high waters of more than 20 days.

Stationary case
The weak correlation between the durations when d(t) > 0 (see Fig. 5) and the respective annual maxima Q max (t) indicates the variety of shapes of flood hydrographs and, as a consequence, d cannot be represented (or replaced rather) in FFA by Q max .It implies the analysis of both d and Q max by (perhaps) two different types of models.As a model for the parameters of the f • function Generalised Exponential (GE) distribution has been chosen (e.g.Gupta and Kundu, 2000).Among the distributions presented in Eqs. ( 11)-( 15), the GE distribution Eq. ( 14) performs relatively well in terms of the AIC value and shows stability of numerical ML solutions in  2.
The annual maxima are believed to be adequately described by the heavy-tailed distributions (e.g.Strupczewski et al., 2011), so to cater for the Flood Frequency Analysis (FFA) for extreme values (annual maxima) the β values (Eq.8) and P 1 (Fl) (Eq.2) by means of Q max series were calculated with the three-parameter Generalised Extreme Value distribution: From the AM sample covering the period 1951-2006 (n = 56 yr), we got the ML estimates of GEV parameters equal (for calculations we used our original soft-packages Flood-Durations, NonstationaryMLM and SDEP which we can eagerly share with others): location ≡ ε = 1260.02m 3 s −1 , scale ≡ α = 671.39m 3 s −1 and shape ≡ γ = −0.33.For completion note that the value of log-likelihood function lnL = −463.231and thus AIC = 932.461.Substituting for q into Eq.( 33), the chosen Q Tr and Q B values and then putting the corresponding probabilities to Eq. (9a), one gets the estimates of the weighting parameters display in Table 2.
One can notice from the Table 2 that β obtained by means of Eqs. ( 8) and ( 9) are quite similar particularly for higher values of Q Tr and for all cases the confidence interval for Fig. 6.Components of the integral Eq. ( 34).
proportion β includes the value estimated from AM distribution (Eq.9).

Assessment of probability of levee breach along Szczucin reach
Since the event of levee breach is conditioned by the peak flow being in the range of [Q A , Q B ], Eq. ( 3) can be written as (see also Fig. 6) The pdf of GE (Eq.15) for while the ML estimate of β equals (Table 2) 0.577.Substituting them and the hazard index function defined by Eq. ( 32) into Eq.( 34) and integrating one gets the annual probability of levee breaching P 2 (Fl) = 0.064.Note that at the same time, and when the same GEV distribution is used (see the Eq.33 and its parameters below the equation), the probability of flood caused by exceeding embankment crest by annual peak flow: ) is equal to 0.005, so it is almost insignificant (more than ten times smaller than P 2 ), hence the overall probability of flooding along Szczucin reaches P = P 1 + P 2 = 0.069.Variety of shapes of flood hydrographs one can evaluate by a measure of correlation strength between Q max (t) and d(t).Due to shape similarity of flood peak parts, a strong dependence between the peak flows (Q max ) and the duration above the alarm flow (d) can take place.If it is the case, the probability P 2 (Fl) can be assessed on the base of Q max distribution g(Q max ).Assuming that d = ψ(Q max ), one can express in Eq. ( 34) the d variable by the Q max getting where per analogy to Eq. ( 32) h(Fl|ψ(Q max )) equals 0 and 1 for Q A and Q B , respectively.The Pearson's correlation coefficient r(Q max , d) for Szczucin equals to 0.83.Of course, when estimating the risk of a levee breach, except for during the time of high water residence, more technical parameters of levees should be analysed, such as the construction of the levee, the material used for its building, its age, susceptibility to softening, the regime of the river, wind-induced waving and so on.All in all, those who decided to build their houses in the river's proximity behind the levees, sooner or later do experience a catastrophe.

Non-stationary case
Analysis of long series of hydrological observations on Polish rivers lead us to the conclusion that two random variables whose probability distributions have been considered Hydrol.Earth Syst.Sci., 17, 3111-3125, 2013 www.hydrol-earth-syst-sci.net/17/3111/2013/ as components of DqF analysis show different behaviour versus time.The continuous variable -duration of water level above certain stage -in general, shows no trend.It describes the shape of the flood waves which has been stated to be rather stable and, if any trend there exists, it does not pose any effect on the final results of the DqF calculations.On the other hand, a visual assessment of records for Szczucin and other hydrological stations show that the frequency of occurrence of extreme flows (P 1 ) and flows above (so well below) a given threshold (Q Tr ) may reveal some trend.Therefore in this study we focused only on the search of trends in the probability P 1 and in the weighting factor β that plays the role of the time-dependent function "switching" on and off the event of dikes' prolonged exposure to high waters.These trends have been estimated from the annual peak flow series and by direct analysis of [d t ] vector represented by the sequence of 0 and 1 as given by Eq. ( 18).In both cases the maximum likelihood method (MLM) has been used for calculation, while the logistic function ( 23) serves to model the (0,1) duration series.
The estimation β(t) for the threshold corresponding to the alarm stage (Q A = 1690 m 3 s −1 ) in the form of the logistic function ( 26) revealed the decreasing trend (b < 0), whereas a = 0.405, so the β(t) takes the form: and the parameters of stationary f • function for selected Q Tr values can be found in Table 2.
The above equation (Eq.37a) says that the odds (the ratio of probabilities of events against nonevents: β(t)/(1-β(t)) decreases in average by 0.2 % from year to year, that gives the change of β from ca. 0.60 in 1951 to about 0.58 in 2006.However this trend is not statistically significant.The model deviance D model being equal to 75.8286 and the null deviance D null = 75.8372give the difference with p value of 0.9264 from chi-square distribution.The value of Pseudo-R 2 = 0.046 is close to 0. It is likely that this result points on almost stable risk of inundation caused by dike breaches for summer floods that prevail in the reach of the Vistula river represented by Szczucin hydrological station, where changes in the river bed and on the floodplains have not influenced considerably the transportation of high waters.Winter floods can reveal stronger trends due to greater variability of melting condition and observed temperature rise, therefore, as consequence, the volume of runoff.Small catchments seem to be more susceptible for trends in β.These statements ought to be verified on the larger hydrological data set.
If instead of the logistic (LO), we take the non-stationary Generalised Extreme Value (GEV) distribution function (see stationary case above), assume linear trends in mean value and standard deviation (but not in the parameters of location, scale and shape) and calculate the β(t) by means of Nonstationary Flood Frequency Analysis (e.g.Strupczewski et al., 2001Strupczewski et al., , 2009)), we obtain Eqs.37a and 37b and Fig. 7 point at the difference in trend sign of β between the results received by the two approaches (LO and GEV).However, there are similarities, too.The results for both cases say that the value of β is practically time independent (statistically insignificant) within time period 1951 to 2006 and thus maintain the relatively constant balance between the first and the second terms of the DqF probability density function (Eq.17).In consequence, the durations of water stay above Q A described by the f • function are actually as frequent nowadays as they were in past.On the other hand, the probability P 1 and P 2 (and thus P ) are now the functions of t.If we take the GEV-based β(t) as an example (as more reliable than LO-based β(t)) and t = 1 (year 1951) one obtains P 2 = 0.066.Further, with the non-stationary GEV (by the same parameters as for β(t)): P 1 = 0.007, so in consequence P = 0.073.For t = 56 (year 2006): P 1 = 0.004, P 2 = 0.064, so P = 0.068, thus the probability of flood in Szczucin dropped by 7 % over the half of the century -a judgement whether it is much or not we leave for the reader and decision makers.Please also note that regardless the point in time the ratio P 1 /P 2 is similar to the stationary conditions.
However, the probability for the certain point in time may not carry information sufficient for flood protection authority.Therefore, it is interesting to know what is the probability of inundation over the certain period, e.g.20 yr of the exploitation of the dikes in Szczucin.For the GEV non-stationary model (with the parameters mentioned above) and last 20 yr of the time series  the probability of overtopping over the levee crest is equal to P 1 = 0.048, whereas the dike's breach probability is more than 10 times larger: P 2 = 0.516.Overall risk of inundation, P = 0.563, is almost 10 times larger than for a single year.The reader also notes easily that again the ratio P 1 /P 2 is alike the ratios for the point-in-time non-stationary case as well as for the stationary case.
One has to bear in mind, however, that the linear trend in parameters (in case of the LO) and first two moments (as it was in GEV) is just the simplest of the countless trend patterns that may be employed for the time-dependent models and application of other ways (e.g.parabolic, polynomial, exponential, etc.) usually leads to the overparametrization and noteworthy complication of numerical calculations.It is so, because maximum likelihood estimates for time-dependant models require multi-parameter optimisation of relatively "flat" log-likelihood functions with use of relatively short data series.

Conclusions
In the paper the new two-component model of flood waves, "duration of flooding-discharge-probability of nonexceedance" (DqF), with the methodology of its parameters estimation was proposed as a completion to the classical FFA methods.Such a model can estimate the duration (d) of stages (and flows) exceeding the assumed magnitude with a certain probability which is of key importance when the river's dikes are prone to the prolonged impact of high waters.The embankments may be weaken by the water, soak and eventually break -this is the most frequent cause of floods in Poland.However, in this study the two main causes of inundation of embanked rivers, namely over-crest flow and wash out of the levees, were combined to assess the total risk of inundation.The proposed DqF modelling approach was generalised to the non-stationary conditions.Therefore, in addition to the maximum flow one should consider also the duration of high waters above the alarm flow Q A in a river channel.The model combined with the technical evaluation of probability of levees breach expressed by the hazard index gives the annual probability of inundation caused by the embankment failure.The probability of inundation is the total of probabilities of exceeding embankment crest by flood peak and the probability of washout of levees.
The DqF modelling is the consequence of QdF approach developed by Javelle et al. (1999Javelle et al. ( , 2000Javelle et al. ( , 2002) ) and Bogdanowicz et al. (2008) but in the first model the gravity is put on the probability of the certain duration above alarming stage/discharge (Q A ) rather than on magnitude of flood itself (Q max ) like in the latter case (Fig. 3).
The DqF model in the form of Eq. ( 5) consists of two terms: β × δ(d) deals with the zero event, D = 0, whereas the latter term (1β) × f • (d; g) × 1(d) stands for the events when the duration D > 0. In general both β and f • in non-stationary case may depend on time.The maximum likelihood method (MLM) was proposed for estimation of β and g parameters.In the non-stationary case it is convenient to describe the β(t; θ) by means of the logistic function (23).However, β and β(t; θ) can be also estimated by means of annual peak flows series, Q max , using the routine flood frequency techniques (FF) with distribution functions commonly used in FFA (e.g.GEV) for stationary and nonstationary case, respectively.Note that estimating the weighting factor β and β(t; θ) from the duration d time series the information (0,1) for excess the threshold level Q Tr is used exclusively, while based on the annual peak flow time series Q max , the information from whole range of recorded flood magnitude is used to assess the trend in the alarm flow Q A .For f • (d j ; g) model (both stationary and non-stationary) the exponential-like shaped distribution functions are recommended, such as exponential, Weibull, pareto, generalised exponential, gamma and similar.
The calculations for the Szczucin at the Vistula River case study made for several threshold values (Q Tr ) including the alarm flow (Q A ) have showed the similar results for the weighting factor β estimated by ML method from the duration time series and from annual peaks time series (Table 2).The peak flows that could overtop the embankments have not been detected in the Szczucin's record .According to the hazard function (32), the possibility of levees breaching increases almost tenfold the probability of inundation.
Variability in the Szczucin time series of the d duration (understood as a time-dependence of the g parameters of f • (d t ; g)) has not been subject of modelling because of the insufficient data and the conviction based on the visual judgment of the (d(t) vs. t) diagram that the trend would be negligibly small.The only trend considered is the trend in the weighting factor β. The significant difference in trend estimates of β (t) obtained by ML method from the direct analysis of [d t ] vector represented by the sequence of 0 and 1 (Eq.18) assuming the logistic function (LO) of time and from GEV distributed annual peak flow series is striking.The results for both cases differ in sign (Fig. 6) and moreover they point that the value of β is practically time independent within time period 1951 to 2006.Nevertheless, as long as the change of river regime in time is visible (regardless its origin), one should consider non-stationary modelling accepting (sadly) the fact that the tools available are in their infancy.
The DqF model proved to be the important completion to the traditional FFA concentrating on maximal seasonal or annual discharges.The DqF approach is especially useful in polish specific conditions where the flood protection infrastructure is dated and often does not survive confrontation with prolonged pressure of high waters.
Reliable data and information about floods are indispensable for better understanding the interactions between rivers and flood protection system: embankments, reservoirs and polders.Improvement of statistical models is essential for engineering design in general and in particular for implementation of flood risk mitigation procedures.Not only has the DqF modelling shown that actual flood risk is greater than the risk assessed by means of classical FFA but also provides quantitative measures which can be used in flood protection systems planning, exploitation and conservation.This measures in form of dependence of inundation risk on river flow (or water level) should be established for other hydrological stations on Polish rivers and their dimensionless versions compared.The geographic information systems technique (GIS) could be used to indicate locations prone to inundation; also, the GIS can be a helpful tool to visualisation and testing trends in the structure of river network and to the regional analysis.These results can constitute the theoretical background to a number of practical decisions in water management issues.

Fig. 1 .
Fig. 1.Definition of the threshold flow discharge and duration in DqF model: (a) the flood wave of d t duration entirely in the year t; (b) the flood wave starts in the year t and continues in t + 1.

Fig. 3 .
Fig. 3. Definition of the random variables in the QdF models: (a) the mean maximum d days flow, (b) the annual maximum flow discharge (Q d ) continuously exceeded during the period d.

4. 1 . 1
Estimation of the weight parameter β From the pdf of the duration d (Eq.5) and the records d = (d 1 , d 2 , . . ., d t , d T ) for given alarm flow Q A

Fig. 4 .
Fig. 4. Hydrograph of the daily flows at the Szczucin gauging station.Horizontal dashed lines reflect the Q Tr values used in this study.

Table 1 .
Distribution functions recommended as f °(d j ; g) model.

Table 2 .
The parameters of the two-component DqF model for Szczucin data.Second component, f • is the First component (by two methods) two-parameter Generalised Exponential β = n 1 /n β = β(Q Tr ) Q Tr n 2 by Eq. (9)