Estimation of high return period flood quantiles using additional non-systematic information with upper bounded statistical models

Introduction Conclusions References


Introduction
Flood frequency analysis is one of the most common methods to estimate the design flood for hydraulic structures and for flood hazard/risk mitigation programs.In Europe, the national legislation for flood risk assessment is based on flood frequency analysis to estimate discharges associated with different return periods, from 50 to 500 years (Benito et al., 2004).In some projects the focus is on extreme floods, which have been defined according to different authors as floods with an annual probability of occurrence of about 10 −3 to 10 −7 (Jarret and Tomlinson, 2000), 10 −3 or lesser (Naghettini et al., 1996) and in other cases, as floods with return periods greater than 500 years (England et al., 2003).Traditionally, extreme flood estimates have been associated with large dam projects or with the location of nuclear and other high vulnerable facilities, in which the release of hazardous materials to the environment is in consideration (Stevens, 1992).For some of these projects, the design criteria commonly include the Probable Maximum Flood (PMF) estimation.The PMF is the biggest flood physically possible at a specific catchment (Smith and Ward, 1998).It has a physical meaning and it provides an upper limit of the interval within which the decision maker must operate and design.The PMF is the flood generated by the Probable Maximum Precipitation (PMP) with the worst but reasonable hydrological conditions in the studied basin.The PMP is defined by the World Meteorological Organization as a precipitation upper limit for a given region, duration and time of the year (WMO, 1986).
Related to high return period quantiles estimation, flood frequency analysis has a well known drawback, as pointed out by Merz and Blöschl (2008): the lack of available information about large events in a relatively short data series recorded systematically at a flow gauge station (from now, B. A. Botero and F. Francés: Estimation of high return period flood quantiles Systematic information).This fact involves the extrapolation of very high return period quantiles from data records which rarely exceed a hundred of years, producing quantile estimates with a high level of uncertainty.
Probability distribution functions with 2, 3 or 4 parameters have been used in extreme floods frequency analysis with the common characteristic of having no upper bound, at least for high positive skewness coefficient (γ x ).The use of parametric distribution functions allows the quantile extrapolation as a function of the requested return period as much as it is required (obviously increasing at the same time its uncertainty).However, as the return period increases with unbounded parametric distribution functions, the estimated quantiles increase too with no limit.Though, the question to pose at this point is: would it be possible to have a flood with such a high magnitude, as large as it could be obtained with these unbounded distribution functions, for a certain catchment with specific area and geomorphologic characteristics?The straight answer is no, this is not possible.There must be a limiting flood discharge which is the biggest physically possible flood for the specific climatic and hydrologic characteristics of the catchment, which indeed corresponds with the PMF definition (Enzel et al., 1993).Or in Horton's words: "... a small stream cannot produce a major Mississippi flood for much the same reason a barnyard fowl cannot lay an egg a yard in diameter" (second author's class notes of J. Salas' lecture in 1990).Not considering the existence of this upper limit must introduce an additional significant model error in the high return period estimated quantiles.Moreover, in our opinion, this additional error could produce in most cases the underestimation of the high return period quantiles, which is one of the most frequent causes of dam failure (ASCE, 1988).
In accordance with reality, some distribution functions incorporate an additional parameter, which is actually the upper limit to the random variable.This class of functions has been applied to the extreme frequency analysis of annual maximum daily precipitation by Elíasson (1994 and1997), Takara and Loebis (1996) and Takara and Tosa (1999) and in frequency analysis of annual maximum flood by Takara and Tosa (1999).All these authors concluded that upper bounded distribution functions fit properly to extreme data and improve the quantile estimates.
In this paper we propose the use of upper bounded distribution functions, in order to better estimate high return period quantiles.The upper limit of these distribution functions can be fixed a priori or not.Following the classification of Merz and Blöschl (2008) for additional information, in the first case the PMF value can be considered as a causal information expansion.This was the option followed previously by Elíasson (1997), Takara and Loebis (1996) and Takara and Tosa (1999).In the last case, the PMF can be estimated as one of the parameters of the statistical model, using in this paper additional Non-Systematic information, called temporal information expansion, in terms of Merz and Blöschl (2008) to obtain enough estimation reliability.

Upper bounded distribution functions
If parent distribution is upper bounded, the annual maximum will also be.In this situation, upper unbounded classical limiting functions from Extreme-Value Theory will not be good approximations for the estimation of high return period annual maximum, as it will be shown in the robustness analysis in Sect.6.The three upper bounded distribution functions applied in the case study were chosen because they had been previously successfully applied to hydrological extremes series.Other distribution functions commonly used in Hydrology which have an upper bound are the Generalized Pareto and the GEV.The former has an upper bound when its shape parameter is bigger than 0, which occurs for γ x < 2.0.The latter presents an upper bound when the shape parameter is also positive, but then, γ x becomes less than the Gumbel's constant skewness coefficient, which is equal to 1.14.Our aim is to analyse rivers with high skewness coefficient (γ x clearly bigger than 2), like those with a Mediterranean regime, which is the reason to not include in this paper the GEV and Generalized Pareto distribution functions.
Following paragraphs presents a short description concerning the three selected distributions.More behavioural and statistical details can be found in Botero (2006).

The extreme value with four parameters distribution function (EV4)
This probability distribution function was firstly proposed by Kanda (1981), who empirically derived it from the EV distribution function family.The EV4 cumulative distribution function (cdf) is given by where g and a are respectively the upper and lower bounds of the random variable, and v and k are parameters which characterize the scale and shape of the distribution.Takara and Tosa (1999) applied this distribution to annual maximum daily precipitation and annual maximum daily discharge at Ohtsu (Japan).The authors made a sensitivity analysis fixing the upper and lower bounds to a priori values obtained empirically.They concluded in a comparison with other distribution functions, the EV4 is the most appropriated to datasets with sample skewness coefficient higher than about 2.

The Slade-type four parameter LogNormal distribution function (LN4)
Proposed by Slade (1936) and named in this manner by Takara and Loebis (1996), the LN4 can be obtained if a Slade-type random variable transformation is applied to a Two Parameters LogNormal distributed random variable.This transformation is given by where g and a are respectively the upper and lower bounds.
The resulting LN4 pdf is defined as following where µ y and σ y are the well known LogNormal cdf parameters.From the LN4 application to annual maximum daily precipitation data, Takara and Loebis (1996) concluded that if the four parameters of the distribution are estimated, the variability of quantile estimates is higher than if one or both limits are fixed previously in a known value.In addition, they suggested the use of the PMF as the upper bound when dealing with floods.In a posterior paper, Takara and Tosa (1999) conclude that the LN4 distribution function fits well to many hydrological datasets with sample skewness coefficient less than about 1.5.

The transformed extreme value type distribution function (TDF)
This distribution was proposed by Elíasson (1994) as a statistical model for frequency analysis of extreme precipitation.The author suggested that bounded data fitted by an unbounded distribution as the EV1 (also called Gumbel), must deviate from the distribution at high return periods.In Elíasson (1997) is defined a Transformed Distribution Function (TDF) derived from a Base Distribution Function (BDF) selected by the author, which corresponds with the EV1.In his work, Elíasson (1997) fits the resulting TDF to standardized annual maximum daily precipitation from Iceland and Washington State (USA) with very good results fixing a prior estimate of the PMP.The final expression of the TDF cdf is given by where g is the upper bound, α is a scale parameter, b is a location parameter, and k * is a negative constant.
3 Parameter estimation methodology

Maximum Likelihood method and data classification
In this study, the parameters set for each distribution function is estimated based on the Maximum Likelihood (ML) estimation method, as in many others works dealing with Non-Systematic information (Leese, 1973;Condie and Lee, 1982;Hosking and Wallis, 1986;Cohn and Stedinger, 1987;Phien and Fang, 1989;Guo and Cunnane, 1991;Pilon and Adamowski, 1993;Frances et al., 1994;Kroll and Stedinger, 1996;Frances, 1998;Martins and Stedinger, 2001;O'Connell et al., 2002;Williams, 2002;Naulet et al., 2005;Calenda et al., 2005;Calenda et al., 2009).This methodology has been selected on the basis of its statistical features for large samples, and also because of its ability to incorporate easily in the estimation process any type of additional data.
As it was mentioned in the Introduction, data series recorded systematically at a flow gauge station located in a river section will be called Systematic information.In opposition, the Non-Systematic information is that information not recorded systematically.If there is not a gauge station, all river flow information can be considered as Non-Systematic.The sources for this information can be historical or from palaeofloods studies.The former are associated with past human registered observations (Leese, 1973).The latest are floods identified using physical or botanical indicators irrespective of any direct human observation (Stedinger and Baker, 1987), but not necessarily, previous to human registers.In practice, Non-Systematic information is always censored type I, in such a way we have some information concerning a flood at a given time during the Non-Systematic period because this flood was bigger than a given discharge or threshold level of perception X H (Stedinger and Cohn, 1986;Frances et al., 1994), where H is the threshold return period (the return period is used in order to generalize the results).The value of the peak flow for the floods above X H can be known or not.Concerning the floods below X H , always it is not known their exact values, but at least it is known they were smaller than X H .The threshold level of perception can be, for example, the corresponding discharge to the position of the cave where flood sediments are deposited (for B. A. Botero and F. Francés: Estimation of high return period flood quantiles palaeoflood information) or the minimum discharge which produces damages in a city (for historical information).It can change with time and, in some cases, there can be upper and lower thresholds for the same flood (following the palaeoflood example if there are two caves at different positions, the lower one with sediments of a particular flood and the upper one without any trace of this flood).On the other hand, usually the Systematic data is completely known, but some times the uncertainty in the data forces to treat them also as censored.
Concerning "years without information" within the Non-Systematic period, the statistical treatment depends on the situation.If there is no information during some years (historical or palaeoflood), it will be in the major part of the cases because the flood was below the threshold level of perception and, therefore, they must be considered as UB data.If "we do not know what happened", the solution is the same than we traditionally do when dealing with the Systematic record: do nothing and assume there is not a bias to miss the very high floods or very low ones.But the last situation should be very rare, because very frequently there is more than one source of information.For example, palaeoflood studies look for slackwater deposits in more than one location, in order to reduce the possibility of missing floods over the threshold level of perception.Stedinger and Cohn (1986) and Frances et al. (1994) first classified the flood information from a statistical point of view.However, in a Maximum Likelihood framework, our experience dictates it is more convenient to follow the classification presented by Naulet (2002), in which any piece (Systematic or Non-Systematic) of flood data can be classified in (see Fig. 1): -EX type.The flood peak value is known.It will correspond with most of the Systematic data and with some Non-Systematic floods with enough information to reconstruct the peak discharge.
-LB type.In this case, it is only known that the flood was bigger than a lower bound L, which is the known X H .
-UB type.For this type of data it is known that at the time of the flood, an upper bound (U ) with known magnitude was not exceeded.Again, this U corresponds with the X H .
-DB type.The flood peak is unknown and the only information about this flood is that it has a double bound.This means that the flood was within an interval with known values for the upper (U ) and lower (L) bounds.
For annual maxima analysis of a stationary process, but with variable threshold of perception in time, depending on the type of data each year i contributes to the likelihood function through one of these general expressions:

L LB,i
; where the independent and identically distributed random variable X is described for all years through its probability density function f X (•) or its cumulative distribution function F X (•); represents the parameters set; x i is the magnitude of the flood presented in the i-year; U i is the upper bound which is not exceeded in the i-year; and L i is the lower bound exceeded in the i-year.The ML estimated parameters are obtained by maximizing the logarithm of the likelihood function over the parameter space.
When dealing with Non-Systematic information, what is commonly available is a combination of the different types of data described above, as it is shown in Eq. ( 13) for the likelihood function of the case study (Sect.4).It was stressed by Frances et al. (1994), that from the statistical point of view there is no difference concerning the source of the Non-Systematic information, historical or palaeoflood, and their treatment must be completely similar.More over, with this new data classification there is not any difference also between Systematic and Non-Systematic information, with the additional advantage that always the data can have a time assigned, which will be crucial for future non-stationary flood frequency models.

Upper limit estimation
Dealing with upper bounded distribution functions, it must be carefully undertaken the estimation of the upper limit parameter (g).The first possibility is to prefix g in a specific value previously computed (called G): i.e., the parameter g is not estimated jointly with the other parameters.This was the approach used by Takara and Tosa (1999), Elíasson (1997) and Takara and Loebis (1996).In this case, G is the PMF estimate by traditional means.The PMF can be estimated by different procedures, being the most advisable the rainfall-runoff modelling, where rainfall input is the PMP estimated by the WMO maximization and transposition procedure (WMO, 1986).
A second possibility is to estimate the whole parameters set, including g, using the ML method.However, with EV4 and TDF distribution functions and when the available Non-Systematic information is a mix of EX and UB data (also called Censored Information or CE by Stedinger and Cohn, 1986), the estimated ĝ is equal to the maximum observed value of the data set (which obviously it is not the PMF) and thus, other methods to estimate g must be introduced.This follows from the fact that the likelihood function for these two distributions and this type of information decreases monotonically for g → ∞ (Kijko and Sellevoll, 1989).Kijko (2004) proposed what he called a "Generic Equation" (GE) for the estimation of the maximum earthquake magnitude, which corresponds with the magnitude of the largest possible earthquake, conceptually equivalent to the PMF for floods.This author developed this equation based on the limit of a random variable estimator proposed by Cooke (1979).The GE is valid for any cdf and it is given by where x max is the maximum observed value of the Systematic and Non-Systematic data, and n is the number of observed values (i.e., the x max order).It must be noticed that it is not possible to apply the Generic Equation when there are LB or DB data in the available information, because with this kind of data the x max order cannot be known.

Proposed estimation methodologies
In this paper and in accordance with the selected method to estimate g, the whole parameters set estimation method will be referred as following ML-C: When the whole parameters set of the distribution function is estimated by the ML method, including g as another free parameter in the maximization process: ML-GE: It will be referred to the Maximum Likelihood-Generic Equation estimation method.This method consists on the use of the Generic Equation presented above (Eq.9) to estimate g and the ML method for the rest of parameters: max L( ) where is the parameters set excluding g.The expression in Eq. ( 11) must be solved iteratively, fixing the g when maximizing the likelihood function and fixing the rest of parameters when obtaining a new g value with the Generic Equation.This procedure is repeated until the estimated g converges with the proper tolerance.ML-PG: Finally, in this case, gis fixed at the value previously calculated (G) as the best approximation for the true unknown PMF, and the other parameters are estimated by ML method: 4 The Jucar River case study The statistical models described above have been applied to the available data of the Jucar River at "Huerto-Mulet" flow gauge station, where we had Systematic annual maximum flows from 1946 to 2004 (56 years).This river is located in a semi-arid climate region of Eastern Spain and has a long period with historical information.The point of interest is close to the river sea mouth and has a basin of 21 500 km 2 , but due to meteorological reasons, actually only one third of the basin is contributing to the floods in this area.The basin mean annual precipitation is 450 mm, but it must be underlined that the Jucar River presents the typical high variability (or torrentiallity) of Mediterranean rivers, with frequent observed daily precipitation events with more than 100 mm during the Fall season.These extreme events are generated by strong Convective Mesoscale Systems positioned in the Western Mediterranean Sea (Rigo and Llasat, 2007).The main sample statistical characteristics of the instantaneous annual maximum floods are: mean = 713 m 3 s −1 , coefficient of variation = 2.74 and skewness coefficient = 5.26.The Spanish Centro de Estudios Hidrográficos (Francés, 1998) quantified the peak flow of 4 Non-Systematic floods who reached the level of an ancient convent, located within the floodplain, in 1778, 1805, 1814 and 1864.The discharge required to flood the convent is 6200 m 3 s −1 , which can be considered as the threshold level of perception for this source of information.To reduce the bias in the estimation of the number of known floods during the Non-Systematic period, as studied by Hirsch and Stedinger (1987), we have considered the beginning of this period in the middle year between first and second historical floods, eliminating the first one in the statistical analysis.So, the final selected historical period is 153 years long, since 1792 (the average between 1778 and 1805) to 1945.During this period, the Non-Systematic information can be classified as CE with 3 EX data type.It must be mentioned that, in this case study, the sensitivity of the results to this decision was small: for the three EV4 models, the maximum estimated quantile change was 5%.For more than one hundred years there was not such extraordinary floods until 1982, when the threshold was also exceeded with a flood of 12 000 m 3 s −1 , and it was almost exceeded in 1987 with a flood of 5200 m 3 s −1 .The stationarity of this long information period (more than 200 years) was tested and proved with the test of stationarity described by Lang et al. (1999 and2004), using as random variable the cumulative number of floods over the threshold of perception.As an example of application of Eqs. ( 5) to ( 8), the likelihood function for this case study is: where the first term represents the contribution to the likelihood function of the Non-Systematic UB floods (during the Non-Systematic period the annual maximum flood did not reach the convent 150 years), the y i are the three Non-Systematic EX floods and the x i are the Systematic ones.
Francés (1998) studied this case study, using the unbounded distribution function TCEV, which assumes a mixed population of "ordinary" and "extraordinary" flood events (Rossi et al., 1984), the later originated by Convective Mesoscale Systems.In this work, the statistical models applied for the flood frequency analysis of the Jucar River are combinations of the three bounded distribution functions presented in Sect. 2 and the three parameter estimation methods shown in Sect.3. The lower bound for the EV4 and LN4 has been fixed to zero, hence reducing to three the number of parameters to be estimated.In any case, this is not an influential parameter for high return period quantiles.
In order to apply the ML-PG parameter estimation method, a previous PMF value to the Jucar River catchment must be calculated.Cifres and Abad (1992), using for the PMP the maximization and transposition procedure (WMO, 1986), computed the PMF in 25 000 m 3 s −1 for the Tous dam, which is located upstream our point of interest.Assuming the same specific discharge (overestimating hypothesis), taking into account the catchment area increment and considering only the meteorologically active basin, the PMF can be extrapolated to 33 900 m 3 s −1 at Huerto-Mulet flow gauge station.With a high probability, it will be an overestimation of the PMF.With any better estimation, due to the scope of this study, this value will be used for G.
Usually, to test the model performance (distribution and estimation method) from a "descriptive" point of view (Cunnane, 1986), the fitted cdf and the plotting positions are compared graphically.In this work, the probability plotting positions with Systematic and CE Non-Systematic information were calculated with the E formula proposed by Hirsh and Stedinger (1987).Figure 2 shows the plotting positions for the Jucar data and the fits of the applied distribution functions by each parameter estimation method.
A very interesting first conclusion about the three upper bounded distribution functions is their completely different behaviour approaching the upper limit: the EV4 do it faster than the LN4 and the TDF is the slowest, even if the upper limit is the same as in Fig. 2b or, better, in Fig. 3 (which is a more general view of Fig. 2 (central), including the same upper limit).This different behaviour can be generalized for the usual parameter range of the three functions and results in most cases crucial for the model selection.
For the Jucar sample data, Fig. 2 shows the characteristic "dog-leg" effect in torrential regime rivers.It is clear the TDF distribution can not reproduce the shape of the plotting positions.The reason for this unsuitability could be that the TDF is based on a Gumbel distribution function, which, according to Francés (1998), is not appropriate for Mediterranean rivers.On the other hand, the EV4 distribution function with all the parameter estimation methods is the cdf that better reproduces the shape of the plotting positions.The triangle in Fig. 2 represents the non-exceedence probability for the threshold of perception considering the complete sample: only the EV4 (for the three estimation methods) can approach to it.In fact, the sample skewness coefficient is in the range recommended by Takara and Tosa (1999) for the EV4.This descriptive skill of the EV4 distribution function, when the "dog leg" effect is present, makes the EV4 the recommended distribution function to the Jucar River annual maximum floods.
Concerning the estimated PMF ( ĝ), for the EV4/ML-C and TDF/ML-C models is equal to 13 000 m 3 s −1 , which is the maximum observed data (the 1864 flood).This poor result was expected from the likelihood function properties in these two cases, as it was explained earlier in this paper.The ĝ estimated by models TDF/ML-GE (93 100 m 3 s −1 ) and LN4/ML-C (99 300 m 3 s −1 ) are approximately three times G (the PMF deterministic value), which are unreasonable values, whereas the ĝ estimated with EV4/ML-GE is 18 100 m 3 s −1 , almost half of G value (certainly an overestimation of the PMF) and about 40% above the maximum observed value in two centuries, which is more reasonable.

Uncertainty analysis for the EV4 model
Once the EV4 distribution function has been selected for the Jucar River case study, an uncertainty analysis has been made in order to establish the reliability of the quantiles and PMF, estimated with this distribution function and using the different parameter estimation methods (ML-C, ML-GE and ML-PG).For the sack of simplicity and without loosing generality, it will be presented the results for the case study data types and parameter values.More results can be found in Botero (2006).
The uncertainty analysis has been made by Monte Carlo simulations with two skewness coefficient scenarios.One population has a high skewness coefficient (γ x ) of γ x = 5.77 (scenario 1), which corresponds with the EV4's skewness coefficient calculated with the parameters estimated for the Jucar River.The other population has a γ x = 2.39 (scenario 2), which is lower than the first one, but it can still be considered large and possible in Mediterranean rivers.
The length of the generated series was 450 years, with a Non-Systematic period of M = 400 years with a X H with return period equal to 50 years and a Systematic period of N = 50 years, which can be typical characteristics when dealing with historical information in European cities.
The parameter estimation methods compared here are those exposed in Sect.3, but with a variation in the ML-PG method introducing some random and systematic errors in the G prefixed value.It has been assumed that the error in G value is normal with a coefficient of variation (CV G ) of 0.3 and mean equal to 10% bigger than the PMF (i.e., a systematic additional positive error of a 10% of the theoretical PMF).The aim of this modification in the ML-PG method has been to analyze how the PMF uncertainty is propagated to the quantile estimation uncertainty.Figure 3 shows the uncertainty of qT and ĝ, reflecting how it varies with the quantile return period (T ).The uncertainty is measured with the next error index: where S is the number of generated samples; θi is the estimated quantile or upper limit; and θ is the theoretical quantile or upper limit value of the distribution function.In terms of Eq. ( 14), the error introduced in G has an equivalent error index of 32%, which can be assumed reasonable in our experience.
From Fig. 4 left, which corresponds with scenario 1, it can be seen that the three estimation methods have an error between 15% and 25%, from 50 to 500 years quantiles.Below 200 years, the quantile errors are similar, but for quantiles larger than q500 , the parameter estimation method with less error is the ML-GE, which gives the less sensitive error to the quantile return period.ML-PG is the method whit higher error, which results in an error of 30% for the q10 000 , in opposition to the ML-GE with only 18%.Obviously, the errors for the ML-PG method can be reduced if the error in the a priori G value is reduced, either, its coefficient of variation or its bias.
On the other hand, results for scenario 2 (Fig. 4 right) show that for ML-C and ML-GE methods, the errors for qT and ĝ are limited to about 10%.The quantiles errors with ML-PG method are strongly controlled by the G error, even for quantile return periods smaller than 1000 years.In both scenarios ML-PG gives q10 000 and ĝ, errors close to 30%, which corresponds with the assumed G error index.It is clear that if the G estimation error were zero, this method would be the best, but it deteriorates as this error increases.
A second Monte Carlo simulation was performed in order to analyze how is the influence of the X H , characterized by its return period H , in the qT and ĝ uncertainty.The return period H of the generated series was established at 25, 50, 100 and 250 years.Figure 5 shows the quantile and ĝ estimation errors with the ML-GE method.For both scenarios, it can be observed a minimum error, located near to the qT B. A. Botero and F. Francés: Estimation of high return period flood quantiles  with a return period equal to H .After this point, the estimation error is slightly higher, but remains in a similar order.It means that the Non-Systematic information contributes to reduce the error in the flood quantiles estimation only for those quantiles of equal or larger return period than the threshold of perception.

Robustness analysis
Robustness analysis has been also made by Monte Carlo simulations.Series have been generated coming from 3 different populations: (1) EV4 population, to analyse the robustness with respect to the selected upper bounded distribution in the case study; (2) TCEV population, in order to analyse robustness with respect to an unbounded distribution with 4 parameters, which have shown good results in Mediterranean rivers; and (3) GEV population, to analyse robustness with respect to an unbounded distribution commonly used when dealing with flood frequency analysis.In this section, we will assume "similar error" if the estimation error increment is smaller than 50%.The estimation method was ML-GE for the EV4 and ML for the TCEV and GEV.
When an EV4 population is assumed (Fig. 6), or in other words, if the flood population has an upper bound, the quantiles estimated with EV4 and TCEV distributions give similar errors in scenario 1, at least up to the estimation of 10 000 years return period quantiles, whereas for scenario 2 (lower skewness coefficient), the TCEV can be used with confidence below 1000 years.On the other hand, the GEV qT error is larger than 80% for T > 200 years in both scenarios.
Assuming a GEV population (Fig. 7), the three distribution functions give similar qT errors, being the maximum difference of an absolute increase of only about 20%, compared with the GEV qT error.For low skewness coefficient (scenario 2, Fig. 7 right), the EV4 gives almost the same errors that the GEV quantile estimations for all return periods, and the TCEV for very large ones (T > 1000 years).Reader must take into account that the TCEV has four parameters (enough flexibility for a population derived from a GEV with three parameters), but on the other hand the EV4 has a fixed parameter in this study (i.e., three parameters to be estimated).
Finally, for samples with a TCEV population (Fig. 8), the quantiles estimated with EV4 give similar errors to the TCEV in scenario 2 for return periods smaller than 1000 years, thought the qT error increment in scenario 1 is limited but significant (Fig. 8 left).In both scenarios, the GEV cannot reproduce the ordinary and extraordinary flood populations contained in the TCEV and gives an estimation error increment increasing with return period, only acceptable for low quantiles.If TCEV and EV4 behaviours are compared, it should be underlined how similar they are from robustness point of view: both distributions have a limited estimation error increment for high skewness coefficient (scenario 1, Figs. 6 and 8 left) and increasing with return period for relative low skewness coefficient (scenario 2, Figs.6 and 8 right).If we consider just the "descriptive" ability of the model (as Cunnane, 1986, refers to), it means mixed population and upper bound hypothesis can be interchanged below the PMF, particularly for very high skewness coefficient populations.
Dealing with PMP or PMF, a general assumption in the hydrological community is that these upper bounds can be only estimated deterministically, with the single exception of Elíasson (1994) for PMP estimation.However, it has been rigorously shown in this paper that, using statistical analysis with upper bounded distribution functions and introducing additional Non-Systematic information, it is possible to estimate very high return period quantiles and the PMF with acceptable and similar estimation errors, as it is shown in Figs. 4 and 5.
Based on the ML method, three different methodologies to estimate the parameters of the bounded distribution functions can be implemented, depending on the available flood information.If there is a good deterministic a priori estimation of the PMF, the ML-GP method is advised.Otherwise, ML-C can be used considering the upper limit as one more parameter.But in some combinations of distribution function and information (for example, the EV4 and TDF with additional CE Non-Systematic information), the upper limit estimated by ML-C method is equal to the maximum observation.In these situations, only the ML-GE method can be used.
Between the LN4, TDF and EV4 upper bounded distribution functions, the latest is the distribution function which better represents the shape of the empirical distribution function of the Jucar River, which has a high skewness coefficient and, consequently, in accordance with the results obtained by Takara and Loebis (1996) and Takara and Tosa (1999).Combining this distribution function with the three proposed parameter estimation methods, the Jucar series has been fitted and it has been possible to estimate statistically a PMF value for this river with an error of 50% (obtained by Monte Carlo analysis).In any case, the resulting estimate is not out the possible range of the Jucar River PMF at its sea mouth.
The uncertainty analysis shows that the EV4/ML-GE statistical model is the most adequate among those presented in this paper, for the estimation of high return period quantiles and PMF when dealing with CE type Non-Systematic information.
For the sensitivity analysis case study (with more historical information than in the Jucar River), the q10 000 and ĝ (PMF estimate) estimation error using the EV4/ML-GE model is approximately about 20%.Nevertheless, this model shows a slight sensitivity to the sample skewness, giving a reduction for the qT and ĝ estimation error as the skewness coefficient is reduced.It must be pointed out the ĝ error with all methods is close to the q10 000 error.Actually, if we acknowledge the 10 000 years return period quantile estimation error is admissible, we must admit also the ĝ estimation error using the statistical approach given by ML-C or ML-GE.
The q10 000 error obtained with the model EV4/ML-PG is approximately the error associated with the deterministic estimation of the PMF.It means that if it is available a previous value of the PMF, with its associated uncertainty, it is possible to estimate high return period quantiles with equal or less uncertainty using the EV4/ML-PG model.However, introducing a prior estimation of G with relative small errors (as we are using in this work) is worst than do not use it and estimate it statistically by ML-GE or ML-C methods (Fig. 4).
For the information scenario used in the robustness analysis, it can be concluded the EV4/ML-GE model can satisfactory fit samples coming from unbounded populations (GEV, Fig. 7) or unbounded mixed populations (TCEV, Fig. 8).On the contrary, if the flood population is upper bounded, the TCEV distribution function in general cannot be used for very high return periods and the GEV distribution function only can be accepted for the estimation of low return period quantiles, as it is shown in Fig. 6.So, it can be generalized with an upper bounded population, the quantiles estimated using unbounded distributions with return period large enough, will be higher than the PMF and, consequently, with large unacceptable errors.For return periods producing quantiles similar to or smaller than the PMF, the error increment compared with the use of the "true" upper bounded distribution will depend of the fitted right tail distribution properties.

Fig. 2 .
Fig. 2. Applied models to the Jucar River.Stars are the sample plotting positions given by the E formula.Triangle represents the X H = 6200 m 3 s −1 plotting position.Figures correspond to each estimation method: ML-C (left), ML-PG (central) and ML-GE (right).

Fig. 3 .
Fig. 3. TDF, EV4 and LN4 different behaviour approaching the same upper limit.Parameters for each distribution function are the same than in Fig. 2 (central): the case study data with the ML-PG estimation method.