Probability distribution of flood flows in Tunisia

L-moments (linear moments) are used to identify regional flood frequency distributions for different zones of Tunisia. A total of 1134 site-years of annual maximum stream flow data from 42 stations, with an average record length of 27 years, are considered. The country is divided into two homogeneous regions (northern and central/southern Tunisia) using a heterogeneity measure based on the spread of the sample L-moments among the sites in a given region. The corresponding distribution is then selected through goodness-of-fit comparisons in L-moment diagrams and verified using an L-moment-based regional test that compares observed to theoretical values of L-skewness and L-kurtosis for various candidate distributions. The candidates, which represent five of the most frequently used distributions in the analysis of hydrologic extreme variables, are: (i) Generalized Extreme Value (GEV), (ii) Pearson Type III (P3), (iii) Generalized Logistic (GLO), (iv) Generalized Normal (GN), and (v) Generalized Pareto (GPA). Spatial trends with respect to the best-fit flood frequency distribution are distinguished: northern Tunisia is best represented by the GN distribution, while the GN and GEV distributions give the best fit in central/southern Tunisia.


Introduction
Peak or flood flow is an important hydrologic parameter in the determination of flood risk, management of water resources and design of hydraulic structures such as dams, [...] the same region are pooled to get an efficient estimate of the parameters of a chosen distribution and hence a more robust quantile estimate. Bobee and Rasmussen (1995) reported that the use of regional information allows a reduction of sampling uncertainty by introducing more data, as well as a reduction of model uncertainty by facilitating a better choice of distribution.
Recently, research efforts have focused more on regional than on conventional at-site flood frequency analysis. Hosking and Wallis (1993) organized regional flood frequency analysis into 4 stages: (i) screening of the data, (ii) identification of homogeneous regions, (iii) choice of a regional probability distribution, and (iv) estimation of the regional flood frequency distribution. Recent research efforts have also focused on the use of L-moment diagrams for the identification of flood frequency distributions, such as the studies performed in Bangladesh (Abdul Karim and Chowdhury, 1995), New Zealand (Pearson, 1991), Australia (Nathan and Weinmann, 1991), Canada (Pilon and Adamowski, 1992; Nguyen, 2006), United States (Wallis, 1988; Vogel and Wilson, 1996), China (Jingyi and Hall, 2004), India (Rakesh and Chandranath, 2006) and the globe (Onoz and Bayazit, 1995). In fact, there appears to be a general worldwide agreement among agencies and governments to re-evaluate their standard flood frequency procedures using L-moment-based techniques.
In this context, this study uses L-moment diagrams to select the flood frequency distribution that best fits the annual maximum flood flows in Tunisia. The paper first presents a survey of similar previous L-moment-based studies all over the world. Then, the study area and the data used in the numerical analysis are described. Next, the flood frequency identification procedure is presented. Finally, the results of the analysis are discussed and summarized.
Many statistical distributions for flood frequency analysis have been investigated in hydrology. Annual flood series were often found to be skewed, which led to the development and use of many skewed distributions, with the most commonly applied distributions now being the Gumbel (EV1), the Generalized Extreme Value (GEV), the Log Pearson Type III (LP3), and the 3-parameter Lognormal (LN3) (Pilon and Harvey, 1994). The proponents of each distribution have been able to show some degree of confirmation for their particular distribution by comparing theoretical results and measured values. However, there is no theoretical basis for justifying the use of one specific distribution for modeling flood data, and long-term flood records show no justification for the adoption of a single type of distribution (Benson, 1968).
Different studies have been undertaken on distribution selection for flood data in different countries all over the world. Beard (1974) estimated the 1000-year floods at 300 stations in the USA, with 14200 station-years of data, by eight different models and concluded, based on split-sample experiments, that the two-parameter lognormal (LN2) and the log Pearson Type III (LP3) were the best. Gunasekara and Cunnane (1992) repeated the split-sample experiments of Beard (1974) with synthetic data consisting of samples of 40 events. They concluded that the GEV distribution with parameters estimated by probability weighted moments (PWMs) was the best at-site method to estimate the 100- and 1000-year floods, and that the LP3 with regional skew yielded comparable results. McMahon and Srikanthan (1981) used moment ratio diagrams to compare various distributions with the data from 172 streams in Australia and concluded that LP3 was the only suitable one. Farquharson et al. (1987) fitted a GEV distribution to annual flood flow data at 1121 gauging stations in 70 different countries using probability weighted moments. McMahon et al. (1992) and Finlayson and McMahon (1992) analyzed annual maximum flood flow data at 974 stations around the world using ordinary product moment diagrams. The authors tested several probability distributions and concluded that the LP3 distribution provided the best fit to observed flood flow data. However, other testing methods should have been used in this study because the estimates of ordinary product moment ratios, such as the coefficient of variation and skewness, contain significant bias (Vogel and Fennessey, 1993), especially for small and highly skewed samples.

Standard distributions adopted by National Institutions in the World
Based on large-scale studies of their own flood data, many countries adopted standard methods to be used by governments or private agencies to achieve uniformity in flood frequency analysis and estimation. A working group in the USA (US Water Resources Council; Benson, 1968) recommended the LP3 distribution, whereas a similar study in the United Kingdom (NERC, 1975) proposed the GEV distribution as a standard. The generalized gamma distribution was recommended in the former USSR (Kritsky and Menkel, 1969), while the P3 and the LP3 distributions were generally recommended in West Germany. The LP3 distribution was also advocated by the Institution of Engineers, Australia (IEA, 1977).

More recently, a worldwide survey of flood frequency methods, prepared for the World Meteorological Organization in 1984 and involving 55 agencies from 28 countries, reported the use of six distributions, namely EV1, EV2, GEV, LN2, P3, and LP3. The survey, which was summarized by Cunnane (1989), revealed that EV1, LN2, P3, and LP3 were the most common distributions, while only one country used the GEV distribution in spite of its recent popularity.

L-moments and flood frequency analysis
Probably one of the most significant scientific contributions to statistical hydrology in the last century is the L-moments of Hosking (1990). The advantages of L-moments are that (i) they characterize a wider range of distributions than conventional moments, (ii) they are less sensitive to outliers in the data, (iii) they approximate their asymptotic normal distribution more closely, and (iv) they are nearly unbiased for all combinations of sample sizes and populations (Hosking and Wallis, 1990). Wallis (1988), Cunnane (1989) and Hosking (1990) illustrated that, compared to product moment ratio diagrams, L-moment ratio diagrams possess a better ability to discriminate between distributions. Vogel and Fennessey (1993) reported that conventional product moment estimators should be replaced by L-moment estimators for most goodness-of-fit applications in hydrology. They showed that L-moment diagrams always perform better than ordinary product moment diagrams, regardless of the sample sizes, probability distributions, or skews involved. Cong et al. (1993) reported that L-moment goodness-of-fit tests are more robust than classical single-site goodness-of-fit tests since they use regional rather than single-site data to discriminate between alternative distributions.
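To make the quantities discussed above concrete, the following minimal sketch computes the first two sample L-moments and the L-skewness and L-kurtosis ratios from a single annual flood series, using the standard unbiased probability weighted moment estimators. The function name `sample_l_moments` and the flow values are hypothetical and are not taken from the study.

```python
from math import comb


def sample_l_moments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are needed for L-kurtosis")
    # Unbiased probability weighted moments b_0..b_3 of the ordered sample.
    b = [
        sum(comb(i, r) * x[i] for i in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    # L-moments are fixed linear combinations of the PWMs.
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical annual maximum flows (m3/s), for illustration only.
flows = [120.0, 85.0, 310.0, 64.0, 150.0, 97.0, 540.0, 132.0, 76.0, 205.0]
l1, l2, t3, t4 = sample_l_moments(flows)
```

Because each L-moment is a linear function of the ordered observations, a single very large flood shifts the ratios far less than it shifts the conventional skewness, which cubes the deviations.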
Numerous studies have used L-moment diagrams in regional flood frequency analysis, most of which are summarized in Table 1. In spite of this recent tendency to use L-moments worldwide, Klemes (2000a, b) articulated some cautionary notes about their use in hydrological frequency analysis. He claimed that L-moments artificially impose a structure upon a data set and de-emphasize the importance of observed extremes, which leads to the underestimation of extreme design events. However, Alila and Mtiraoui (2002) argued that if the annual floods in a sample are identically distributed and the outliers are caused by sampling variability (for instance, a 100-year event in a 10-year sample), they should not be given undue weight. If any historic information can be found for a high outlier, a reasonably well-established method, referred to as flood frequency analysis with historic information, could be used (Pilon and Harvey, 1994). Unfortunately, in the absence of any historic information, such high outliers are often either removed from the sample or simply ignored and, consequently, the use of conventional moments would either over- or underestimate the T-year flood event. Therefore, in this case, it is more rational to use a method that is less sensitive to outliers in the data, such as L-moments.
In conclusion, L-moments provide undeniable advantages over conventional moments when flood frequency analysis is used for the estimation of flood quantiles. This is particularly true when considering regional trends in higher-order moment statistics. The use of L-moments permits the delineation of regional trends that otherwise might be obscured by biases and sampling variability (Cathcart, 2001).

Study area
Tunisia is a relatively small (162 155 km²) North African country, located at the northeastern tip of Africa at the center of the Mediterranean Sea. Linked on the west to the rest of North Africa by the mighty ridges of the Atlas Mountains, it stretches out to the south into the Sahara, of which it occupies a small part.
Opening on its northern and eastern fronts to the Mediterranean Sea, Tunisia enjoys a clement and mild although notoriously capricious climate. By its latitude it is situated halfway between the temperate zone and the tropical zone, thus forming a meeting place at which cold air masses are confronted by the masses of warm air coming from the tropical regions. It has a rather unstable climate. When it is swept at the equinoxes by tides of opposing depressions, the result is severe cold fronts along with violent storms and frequent downpours. With a general profile stretching lengthwise from north to south, Tunisia shows some climatic variations accentuated by its diversified geographical aspect. The Atlas Mountains stretching from east to west create a variety of large climatic areas distinct from each other mainly by their rainfall. Rainfall in Tunisia might be crudely characterized by its shortage, irregularity and [...]
-The High Tell or Northern Tunisia, characterized by its fertile soil and its high degree of moisture. It is an area of high mountains surrounding plains irrigated by the Medjerda River and its tributaries. The western Tell is continued by the northeastern Tell, a maritime area on account of its being deeply penetrated by the Gulf of Tunis and its climatic influences. This is an area of plains and hills crossed by large rivers such as the Medjerda and some of its tributaries and the Oued Miliane.
-Central Tunisia is the region that covers the high and low steppes stretching out to the eastern coast. The high steppes represent a region of lofty mountains and wide hollow dips, cut across by large creeks (wadis). The vegetation is made up of forests, often stunted, and fields of alfalfa grass. The continental climate contributes to the barrenness of the region. Further to the east the low steppe stretches over wide alluvial plains and hills cut across by large creeks running down the Atlas Mountains.

-Southern Tunisia, bordered on the west by Algeria and on the east by Libya, juts out into the Sahara, of which it occupies a part.

Data used
A total of 49 annual flood series representing natural hydrologic regimes, obtained from the publications of the Tunisian Ministry of Agriculture and Water Resources, were used for the identification of the appropriate flood frequency distribution. Discharges were estimated by observing water levels and employing pre-calibrated rating curves to convert measured stages to observed flow rates. Rating curves were determined through velocity measurements using a current meter and graphic integration of the velocity distribution over the entire cross-section. Regulated stations, influenced by the existence of hydraulic structures, were eliminated.

The annual flood data series need to be independent, random, homogeneous, and without trends. These properties were verified by four nonparametric tests using the Consolidated Frequency Analysis (CFA) package of Environment Canada (Pilon and Harvey, 1994).
Only 37 gauged stations met the screening criteria of having a minimum record length of 10 years, representing unregulated natural flow regimes, and passing all of the nonparametric tests at the 5% level of significance.

Procedure used to select a distribution
The study area was divided arbitrarily into 3 sub-regions, for which separate flood frequency analyses were performed. The procedure adopted to select appropriate flood frequency distributions first uses the three statistical measures for regional flood frequency analysis of Hosking and Wallis (1993): (i) a discordance measure, for identifying unusual sites in a region, (ii) a heterogeneity measure, for assessing whether a proposed region is homogeneous, and (iii) a goodness-of-fit measure, for assessing whether a given distribution provides an adequate fit to the regional annual maximum flood flow data. Then, flood frequency distributions are selected from L-moment diagrams that compare observed to theoretical values of L-skewness and L-kurtosis for various candidate distributions. In the selection process, either the weighted sample average or the line of best fit through the data points is used in the comparison with theoretical curves, depending on the outcome of the heterogeneity test, as recommended by Peel et al. (2001).
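The comparison between a regional weighted average and the theoretical curves of an L-moment diagram can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, sites are weighted by record length, the distance to a curve is taken as the vertical gap in L-kurtosis, and only the three candidates whose L-skewness/L-kurtosis relation has a simple closed form (GLO, GPA, GEV) are covered. GN and P3 are omitted because they are usually handled with published approximations.

```python
from math import expm1, log


def regional_average(ratios, record_lengths):
    """Record-length-weighted average of an at-site L-moment ratio."""
    total = sum(record_lengths)
    return sum(n * r for r, n in zip(ratios, record_lengths)) / total


def tau4_glo(tau3):
    """Theoretical L-kurtosis of the Generalized Logistic distribution."""
    return (1 + 5 * tau3 ** 2) / 6


def tau4_gpa(tau3):
    """Theoretical L-kurtosis of the Generalized Pareto distribution."""
    return tau3 * (1 + 5 * tau3) / (5 + tau3)


def _g(k, a):
    # (1 - a**(-k)) / k, with its limit log(a) at k = 0 (the Gumbel case).
    return log(a) if k == 0 else -expm1(-k * log(a)) / k


def _gev_tau3(k):
    # L-skewness of the GEV as a function of its shape parameter k.
    return 2 * _g(k, 3) / _g(k, 2) - 3


def tau4_gev(tau3, k_lo=-0.99, k_hi=5.0):
    """Theoretical GEV L-kurtosis, found by bisection on the shape k."""
    if not _gev_tau3(k_hi) < tau3 < _gev_tau3(k_lo):
        raise ValueError("tau3 outside the range covered by the bisection")
    for _ in range(100):
        k_mid = 0.5 * (k_lo + k_hi)
        # GEV L-skewness decreases with k, so a too-large skew means k is too small.
        if _gev_tau3(k_mid) > tau3:
            k_lo = k_mid
        else:
            k_hi = k_mid
    k = 0.5 * (k_lo + k_hi)
    return (5 * _g(k, 4) - 10 * _g(k, 3) + 6 * _g(k, 2)) / _g(k, 2)


def closest_curve(t3_regional, t4_regional):
    """Name of the candidate whose curve is vertically closest, plus all gaps."""
    curves = {"GLO": tau4_glo, "GEV": tau4_gev, "GPA": tau4_gpa}
    gaps = {name: abs(f(t3_regional) - t4_regional) for name, f in curves.items()}
    return min(gaps, key=gaps.get), gaps
```

For a heterogeneous region, the same distance would be evaluated along a smoothed line through the site points instead of at the single weighted-average point.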

Discordance and heterogeneity tests
First, data screening was performed using the discordance measure of Hosking and Wallis (1993).

The discordance screening reduced the number of stations from 37 to 31. Homogeneity testing was then performed through the heterogeneity measure H, which is based on the spread of the sample L-moments among the sites in a given region. This statistic compares the between-site variations in sample L-moments for the group of sites with what would be expected for a homogeneous region. Hosking and Wallis (1993) suggested that the region under testing should be regarded as "acceptably" homogeneous if H<1, "possibly" heterogeneous if 1<H<2, and "definitely" heterogeneous if H>2.
Homogeneity was investigated only with respect to skewness and kurtosis because these dimensionless statistical characteristics are commonly used to identify candidate regional flood frequency distributions. Homogeneity in the coefficient of variation was not considered because this statistic was shown to vary, among other things, with the size of the catchments, and therefore constancy cannot be achieved in any geographical region (Gupta et al., 1994).
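A heterogeneity measure restricted to L-skewness and L-kurtosis can be sketched as below. The dispersion statistic is assumed to be the record-length-weighted mean distance of the sites from the regional average in the L-skewness/L-kurtosis plane, and the simulated dispersion values for a homogeneous region are taken as an input because the simulation itself is not reproduced here. All names are hypothetical, and the treatment of the boundary values H = 1 and H = 2 is an assumption, since the quoted criteria use strict inequalities.

```python
from math import hypot
from statistics import mean, stdev


def dispersion_skew_kurt(t3, t4, record_lengths):
    """Weighted mean distance of sites from the regional (t3, t4) average."""
    total = sum(record_lengths)
    t3_reg = sum(n * a for a, n in zip(t3, record_lengths)) / total
    t4_reg = sum(n * b for b, n in zip(t4, record_lengths)) / total
    return sum(
        n * hypot(a - t3_reg, b - t4_reg)
        for a, b, n in zip(t3, t4, record_lengths)
    ) / total


def heterogeneity(v_observed, v_simulated):
    """H = (observed dispersion - simulated mean) / simulated std. deviation."""
    return (v_observed - mean(v_simulated)) / stdev(v_simulated)


def classify(h):
    """Apply the H < 1, 1 < H < 2, H > 2 criteria (boundaries by assumption)."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"
```

A region is then judged by comparing its observed dispersion with the spread that sampling variability alone would produce in a homogeneous region of the same size.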

The goodness-of-fit test used compares the observed regional L-skewness and L-kurtosis to the theoretical values of various candidate distributions (Hosking and Wallis, 1993):

$$Z^{DIST} = \frac{\bar{t}_4 - \tau_4^{DIST}}{\sigma_{\bar{t}_4}}$$

where $\bar{t}_4$ is the regional average L-kurtosis of the observed network in the homogeneous region, $\tau_4^{DIST}$ is the theoretical L-kurtosis of the candidate distribution DIST, and $\sigma_{\bar{t}_4}$ is the standard deviation of $\bar{t}_4$ obtained by repeated simulations of the homogeneous region with the DIST frequency distribution as a parent. Based on Monte Carlo simulations performed by Hosking and Wallis (1993), the goodness of fit of a particular distribution should be considered acceptable at the 90% confidence level if |Z| ≤ 1.64. The Z-test uses regional data as opposed to single-site information. Therefore, it is more reliable than single-site goodness-of-fit testing. The Z-test discriminates between five of the most frequently used distributions in the analysis of hydrologic extreme variables, namely: (i) Generalized Extreme Value (GEV), (ii) Pearson Type III (P3), (iii) Generalized Logistic (GLO), (iv) Generalized Normal (GN), and (v) Generalized Pareto (GPA) distributions.
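The decision rule of the Z-test can be expressed in a few lines. The sketch below assumes the statistic is the difference between the regional average and theoretical L-kurtosis divided by the simulated standard deviation, without any bias correction term; the sign convention does not affect the |Z| ≤ 1.64 criterion. Function names and the numerical inputs are hypothetical.

```python
def z_statistic(t4_regional, tau4_dist, sigma4):
    """Standardized gap between regional and theoretical L-kurtosis."""
    if sigma4 <= 0:
        raise ValueError("sigma4 must be positive")
    return (t4_regional - tau4_dist) / sigma4


def is_acceptable(z, critical=1.64):
    """Fit is acceptable at the 90% confidence level if |Z| <= 1.64."""
    return abs(z) <= critical


def rank_candidates(t4_regional, sigma4, tau4_by_dist):
    """Return (name, Z) pairs ordered from best (smallest |Z|) to worst."""
    scores = {
        name: z_statistic(t4_regional, tau4, sigma4)
        for name, tau4 in tau4_by_dist.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]))


# Hypothetical regional values and theoretical L-kurtosis per candidate.
ranking = rank_candidates(
    t4_regional=0.20,
    sigma4=0.025,
    tau4_by_dist={"GLO": 0.23, "GEV": 0.19, "GPA": 0.10},
)
accepted = [name for name, z in ranking if is_acceptable(z)]
```

When several candidates pass the criterion, the one with the smallest |Z| is the natural first choice, which is why the ranking is sorted on the absolute value.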

Graphical goodness-of-fit from L-moment diagrams
An L-moment ratio diagram of L-kurtosis versus L-skewness compares sample estimates of the dimensionless ratios with their population counterparts for a range of statistical distributions. It has the advantage of comparing the fit of several statistical distributions with observed data using a single graphical instrument. L-moment diagrams are useful for discerning groupings of sites with similar flood frequency behavior, and for identifying the statistical distribution likely to adequately describe this behavior. The distances separating sample points from the curve for a certain distribution can be taken as a measure of the goodness of fit. Peel et al. (2001) demonstrated that the graphical selection of a distribution from the L-moment ratio diagram depends on the homogeneity of the regional data. If the regional data are homogeneous, the selection should be based on comparison of theoretical curves with the weighted sample average. On the other hand, for very heterogeneous regional data, the line of best fit through the data points, known as LOWESS (LOcally WEighted Scatterplot Smoothing), should be considered instead. In this study, the delineated regions correspond to the three arbitrarily chosen areas described above, which cannot be claimed to be homogeneous. Therefore, similarity between theoretical distribution curves and LOWESS was adopted in the selection of the most suitable flood frequency distribution from the L-moment diagram for any particular region.

Results and discussions
The regional weighted average L-skewness and L-kurtosis were computed for the three regions considered, based on flood data series for only 31 stations, and the corresponding results are shown in Table 3. L-skewness and L-kurtosis values were also computed for the entire territory. The values obtained were very close to those of northern Tunisia, since 75% of the stations were located in the north. The smallest L-moment values were obtained for the north, while the highest were associated with the south. These differences are generally small in spite of the contrasting climatic and physiographic differences, which affect flood flows from one region to another in Tunisia. However, small differences in L-skewness and L-kurtosis values usually result in substantial differences in the tail characteristics of flood frequency distributions and therefore in different flood quantile estimates.
Table 3 also presents H-values and recommended best-fit distributions for all regions considered, based on the graphical fit between LOWESS and theoretical distribution curves in L-moment ratio diagrams on the one hand, and the Z-test of Hosking and Wallis (1993) on the other. It can be seen from Table 3 that all H-values, except for that of northern Tunisia, are smaller than 2, and therefore they pass the homogeneity test. It is also interesting to note that the H-values, which describe homogeneity, generally increase from south to north.
Based on stream flow data properties [...]ing the boundaries of the previously delineated regions. The final delineation is presented in Fig. 2, and the corresponding H values are 1.9 and 0.68 for northern and central/southern Tunisia respectively.
Figures 3 to 5 compare the observed relationships between L-kurtosis and L-skewness of annual maximum flood flows with the theoretical probability distributions: GLO, GEV, GPA, P3, and GN. Also shown on the same figures are the locally weighted scatterplot smoothings (LOWESS) of the L-skewness/L-kurtosis data, with the corresponding correlation coefficients. The distributions selected based on this graphical exercise are GLO for the whole country and the central part, and GEV for northern Tunisia. The distribution selection from L-moment ratio diagrams was confirmed using the Z-test. The only exception was central and southern Tunisia, where the L-moment ratio diagrams using LOWESS yielded the GN distribution while the Z-test recommended the GLO distribution. Since the distributions selected from L-moment ratio diagrams using LOWESS, as recommended by Peel et al. (2001), should only be used for heterogeneous zones, the Z-test selection is considered more appropriate. The weighted average L-skewness and L-kurtosis for the region were also plotted in the diagram. Therefore, the GLO distribution is assigned to central and southern Tunisia.
The final outcome of both the L-moment diagrams and the Z statistical test is therefore the GEV distribution for northern Tunisia and the GLO distribution for central Tunisia. These zones are in fact characterized by relatively different physiographic and climatic conditions, which reflects the importance of these characteristics in selecting the appropriate flood frequency model.