A multiple threshold method for fitting the generalized Pareto distribution to rainfall time series

Previous studies indicate the generalized Pareto distribution (GPD) as a suitable distribution function to reliably describe the exceedances of daily rainfall records above a proper optimum threshold, which should be selected as small as possible to retain the largest sample while assuring an acceptable fitting. Such an optimum threshold may differ from site to site, affecting consequently not only the GPD scale parameter, but also the probability of threshold exceedance. Thus a first objective of this paper is to derive some expressions to parameterize a simple threshold-invariant threeparameter distribution function which assures a perfect overlapping with the GPD fitted on the exceedances over any threshold larger than the optimum one. Since the proposed distribution does not depend on the local thresholds adopted for fitting the GPD, it is expected to reflect the on-site climatic signature and thus appears particularly suitable for hydrological applications and regional analyses. A second objective is to develop and test the Multiple Threshold Method (MTM) to infer the parameters of interest by using exceedances over a wide range of thresholds applying again the concept of parameters threshold-invariance. We show the ability of the MTM in fitting historical daily rainfall time series recorded with different resolutions and with a significative percentage of heavily quantized data. Finally, we prove the supremacy of the MTM fit against the standard single threshold fit, often adopted for partial duration series, by evaluating and comparing the performances on Monte Carlo samples drawn by GPDs with different shape and scale parameters and different discretizations. Correspondence to: R. Deidda (rdeidda@unica.it)


Introduction
Several rainfall modelling approaches for hydrological applications use a simple representation of the rainfall process and assume that the marginal distribution of rainy and nonrainy values x at daily or any other fixed time scale can be described by the following Cumulative Distribution Function (CDF): x ≥ 0 where ζ 0 =Pr{X>0|X≥0} represents the probability of occurrence of rainy days, while F 0 (x)=Pr{X≤x|X>0} is the CDF of only rainy values.
Equation (1) has some advantages, but it also presents some potential problems that must be taken into account and properly managed.A great advantage obviously relies on the simplicity of this representation that allows to easily simulate rainfall time series by reproducing separately the binary process of rainfall occurrences (i.e. the succession of wet and dry periods) on one hand, and the distribution of rainfall values on rainy days on the other hand.E.g. this is the working mode of simple weather simulators implemented in some widely used models, such as EPIC (Erosion-Productivity Impact Calculator) and SWAT (Soil and Water Assessment Tool), in which the temporal sequence of wet/dry days is often modelled by Markov chains, while the distribution F 0 (x) R. Deidda: A multiple threshold method for fitting the generalized Pareto distribution is fitted on all strictly positive rainfall records and then used to fill in the records of rainy days in the Markov chain (e.g., Nicks, 1974;Nicks et al., 1995;Williams, 1995).
The simple form of Eq. ( 1) suggests fitting F 0 (x) on all strictly positive rainy observations, but particular care should be taken in this (seemingly very simple) approach.Indeed, the distribution of very small values may not be clearly definite and may depart from the distribution of the bulk of higher records for several reasons, including: (i) small values may be due to dew processes rather than being the result of true rainfall events; (ii) measurements of very small rainfall values may be seriously affected by local atmospheric interactions (e.g.evaporation and wind); (iii) small rainfall amounts manually collected by non-recording rain gauges may be sometimes classified as rainy or non-rainy records depending on the subjective judgment of the person in charge of the observation.Moreover, whatever the cause may be, there is empirical evidence that small values often depart from the distribution of the bulk of rainfall observations.Thus, whatever distribution F 0 (x) is candidate to describe daily rainfall records, a robust criterion is needed to infer parameter values only on records exceeding a proper optimum threshold, in order to be confident that all the censored values belong to the same distribution.
We also want to highlight that fitting a distribution function F u (x)=Pr{X≤x|X>u} on the records above a given threshold u leads in general to parameter estimates that differ from those of F 0 (x), even if F 0 (x) and F u (x) belong to the same family.For practical applications it is thus particularly useful to derive relationships to parameterize Eq. ( 1) with threshold-invariant parameters by assuring a perfect overlapping with the distribution F u (x) for any x>u, regardless the value of the threshold u.
The first objective of our work is thus the derivation of such relations.Although some developments presented in this paper hold for any distribution function F u (x), we specifically focus on the generalized Pareto distribution (GPD) (Pickands, 1975) for the following reasons.
First, under certain conditions, the GPD family has important connections with the generalized extreme value distribution (GEV) family (e.g. the shape parameter is expected to be the same asymptotically as the threshold u → ∞, while the other parameters are linked through theoretical relations), thus fitting GPD can give us a more accurate insight into the maxima.Referring the reader to Gumbel (1958), Castillo (1988), and Coles (2001) for a review of the GEV and GPD properties and derivations, we just remark that if there exists a limiting distribution of the block-maxima extracted from our samples (usually yearly maxima in Earth sciences), this distribution belongs to the domain of attraction of the GEV family (Fisher and Tippett, 1928;Gnedenko, 1943).In addition, under these conditions, the Balkema -De Haan -Pickands theorem (Balkema and de Haan, 1974;Pickands, 1975) states that the limit distribution of scaled excesses over high enough thresholds has a corresponding approximate distribution within the GPD family.For hydrological extreme events modelling, Madsen et al. (1997a,b) generalized previous findings by Cunnane (1973) and showed that fitting a GPD on a reasonable number of exceedances over a proper threshold leads to more accurate extreme quantile estimates than fitting a GEV on annual maxima.We remark also that it would be desiderable to select as low an optimum threshold as possible in order to minimize the estimation variance when fitting the GPD on observed samples (Coles, 2001).With this aim, graphical and numerical methods have been proposed and applied by several authors (e.g., Davison and Smith, 1990;Smith, 1994;Lang et al., 1999;Dupuis, 1998;Choulakian and Stephens, 2001;Guillou and Hall, 2001;Peng and Qi, 2004), but what can be assumed as an optimal threshold for rainfall observations is still an open question without a definitive answer.In a recent study, under the hypothesis that the rainfall process can be described by multiplicative models, Veneziano et al. (2009) highlighted some convergence problems for the GPD shape parameter when fitting the distribution function on records above a finite threshold.Finally, the detection of an optimum threshold becomes even more difficult, if not impossible using available methods, on heavily quantized records (Deidda and Puliga, 2006).In this framework, the GPD fitting approach proposed in this paper makes it possible to overcome some of these problems, such as the estimation bias related to heavily quantized records and to non asymptotic thresholds (Veneziano et al., 2009).
A second reason for the adoption of the GPD is that its mathematical form leads to very simple equations for the parameterization of Eq. (1) using results of inference on records censored with any threshold.Indeed, for thresholds larger than the optimum, the shape parameter of the GPD is expected to be constant, while the scale parameter should linearly depend on the threshold value.Thus simple linear equations for reparameterization of the scale parameter have been proposed (see e.g., Madsen et al., 1997b;Coles, 2001).Beguería (2005) analyzed several daily time series in Spain and used these expressions to estimate the scale parameters corresponding to the on-site optimum thresholds by averaging the reparameterized scale values obtained for a range of thresholds.Nevertheless, a drawback of this approach is that the final scale parameter estimates depend not only on the local climatic conditions but also on the on-site optimum threshold.In this paper we generalize these concepts in order to eliminate the dependence of the scale parameter on the threshold and we also provide a threshold-invariant parameterization for the ζ 0 parameter.Specifically, we rewrite Eq. (1) using only three parameters to describe the rainy and non-rainy records, regardless of the thresholds adopted to fit the GPD on the exceedances.In such a way Eq.(1) becomes independent from the threshold with undoubted advantages for practical applications and regional analyses.
The second objective of this paper is to propose and test the Multiple Threshold Method (MTM) which is based on the threshold-invariant GPD parameterization of Eq. ( 1) and provides a fitting to Eq. ( 1) on the excesses above a proper range of thresholds.Although the motivation for the development of the MTM comes from the need to improve the fitting on irregularly discretized records, as is often the case for manually collected rainfall measurements, we show that its performances are superior anyway to standard singlethreshold fitting on regularly discretized data.The need of such a technique is motivated by the discretization usually adopted for rainfall records, which can be the common standard resolution of 0.2 mm for tipping-bucket rain gauges in Europe (or 0.254 mm in the US), but can also become higher for records manually collected by non-recording rain gauges.For example, Deidda (2007) highlighted that many time series collected by the Sardinian Hydrological Survey (Italy) contain anomalous quantities of daily rainfall records rounded off at unexpected resolutions of 0.5, 1 and 5 mm/d.Recently, Deidda and Puliga (2009) evaluated and compared the performances of several estimators of the GPD parameters on discretized samples.Specifically, they considered some widely used estimators such as those based on maximum likelihood, simple moments and probability weighted moments (Hosking and Wallis, 1987), as well as other recently proposed GPD estimators such as those based on the maximum penalized likelihood (Coles and Dixon, 1999), the minimum density power divergence (Juárez and Schucany, 2004), the likelihood moment estimator (Zhang, 2007), the median estimator (Peng and Welsh, 2001).Nevertheless, Deidda and Puliga (2009) concluded that none of the considered methods provides acceptable estimates when records are discretized at a resolution of 1 mm or larger.Indeed bias and root mean square errors of parameter estimates are often of the same magnitude as the site-to-site variability of the parameter values to be estimated.In this paper we show, using observed as well as synthetic time series, how the Multiple Threshold Method is able to overcome these fitting problems even on roughly rounded-off and heavily quantized records.
The paper is organized as follows.Section 2 briefly describes the database.In Sect.3, we derive some relations between Eq. ( 1) and distribution functions fitted on the exceedances above any threshold, then we provide specific equations to reparameterize the GPD and finally rewrite Eq. (1) with only three threshold-invariant parameters.In Sect.4, we introduce the MTM and present some examples of application on daily rainfall time series.In Sect.5, the performances of the MTM are evaluated on Monte Carlo samples drawn by GPD, while Sect.6 is devoted to the conclusions.

Database
Some of the analyses and figures presented in the following sections were performed on daily rainfall time series collected by the Sardinian Hydrological Survey (Italy) from 1922 to 1996: specifically, we used 217 time series with more than 40 complete years of records.Most of the series were collected by non-recording standard rain gauges and discretized with resolutions up to 1 and 5 mm (Deidda, 2007), while only a subset was obtained by tipping-bucket rain gauges and was correctly discretized at 0.2 mm.Time series are used with a twofold objective: to show the MTM working on historical records and to select representative GPD parameters for evaluation of MTM performances on synthetic samples.

Some basic relationships
We derive here some general relationships among the marginal distribution F (x) in Eq. ( 1) and distribution functions F u (x) of the exceedances over any threshold u≥0 (Sect.3.1).Results are then applied to parameterize Eq. (1) using GPD parameter estimates on left-censored records in order to obtain a three-parameter distribution which describes rainy and non-rainy values (Sect.3.2).A reader who is not interested in the details of derivation of such relationships may skip Sect.3, just keeping in mind Eqs. ( 8) and ( 9), which are needed to reparameterize the GPD in Eq. ( 10) using estimates obtained with any threshold u.

Some relations among uncensored and left-censored distribution functions
We want first to derive some relationships among F (x) = Pr{X≤x|X≥0}, F 0 (x) = Pr{X≤x|X>0}, and F u (x) = Pr{X≤x|X>u}, in order to obtain a perfect overlapping among these Cumulative Distribution Functions (CDFs) for any x>u as sketched in Fig. 1.
Using simple arguments of probability we can write for x>u.These equalities lead to the following relationship between F (x) and F u (x) for any x>u: where ζ u =Pr{X>u|X≥0}=1−F (u) represents the survival function (i.e. the probability to observe excesses of u), while F u (x) is the CDF of x>u only.Nevertheless, since x−axis for x−axis for F(x) x−axis for Fig. 1.
The sketch depicts some relations among the cumulative distribution functions (CDFs) F (x)=Pr{X≤x|X≥0}, F 0 (x)=Pr{X≤x|X>0}, and F u (x)=Pr{X≤x|X>u}, which are used in the text to determine the constraints for overlapping of all the CDFs for any x above the threshold u.Cartesian axes of F (x) are drawn with a thin line and characteristic values are reported on the left side, while the axes of F 0 (x) and F u (x) are drawn with dashed and solid thick lines, respectively, with values reported on the right side.
F u (u)=lim x→u + F u (x)=0, Eq. ( 2) becomes valid for any x≥u and thus includes also Eq. ( 1) as a special case for u=0.
Using similar arguments we can write u) to obtain a relationship between F 0 (x) and F u (x): Finally, computing Eqs.(1) and (2) for x=u, eliminating F (u) among the equations, and putting F u (u)=0 we obtain: We highlight that all the above equations hold for any distribution function F u (x) adopted to fit the exceedances above a threshold u.The same equations can be derived by the following proportions in Fig. 1:

GPD reparameterization
Now let us assume that for a given threshold u the exceedances of our sample could be reliably described by a generalized Pareto distribution (GPD): where ξ is the shape parameter, α u the scale parameter, while u is the threshold value.
The ξ parameter controls the tail behaviour of the distribution and the attitude to originate heavy extremes.For ξ =0 the distribution has the ordinary exponential form.For ξ >0 the distribution has a long right tail, thus it is often referred to as "heavy tailed distribution": in this case it is worth noticing that simple moments of order greater than or equal to 1/ξ are degenerate, thus estimators based on ordinary moments can be applied to fit Eq. ( 6) only for ξ 1/2 to prevent degeneration of the first two moments and consequent parameter estimation biases (Hosking and Wallis, 1987).For ξ <0 the distribution is short tailed with an upper bound value (u−α u /ξ ).For a given ξ , the scale parameter α u controls the mean of the exceedances above the threshold u.Finally, the threshold u cannot be considered a true distribution parameter: indeed, the value of u must be specified (and used for left-censoring the sample) before fitting Eq. ( 6) since the GPD is a distribution of threshold excesses.
As discussed in the Introduction, in literature several methods have been proposed to infer the shape ξ and the scale α u parameters of the GPD once the threshold u has been set.Concerning the probability ζ u to observe an exceedance of the threshold u, since the number of exceedances follows a binomial distribution, the same following estimator can be derived by the maximum likelihood, the simple moments, and the probability weighted moments methods: where N u is the number of records above the threshold u and N is the sample size (including the zeros).
The generalized Pareto distribution has an important property.If a sample can be reasonably considered drawn by a GPD with threshold u * and shape parameter ξ , then the excesses of any other threshold u>u * should also follow a GPD with the same shape parameter ξ and a scale parameter α u which will linearly change with the threshold u.
Now, let us assume that GPD in Eq. ( 6) is a reasonable model for the exceedances over a given threshold u and that parameters ξ , α u and ζ u have been estimated using exceedances over this threshold.Our objective is to parameterize equations F (x) and F 0 (x) by imposing a perfect overlapping with F u (x) for any x>u, as depicted in Fig. 1 and formalized by the equations derived in Sect.3.1.In doing so let us assume that also F 0 (x) is a GPD with threshold u=0 and parameters α 0 and ξ , and that it can be expressed by Eq. ( 6) with u=0.
Substituting F 0 (x) and F u (x) from Eq. ( 6) into Eq.( 3) we can easily obtain: where the subscript u is used to label parameter estimates (including ξ ) on the basis of the threshold used.
Thus, if a suitable threshold has been selected (so that the excesses can be reliably represented by a GPD), by virtue of Eq. ( 8) the α 0 reparameterization should be invariant for any higher threshold (even if α u changes with u).As discussed in the introduction, similar equations have been proposed and used to reparameterize the scale parameter α u * corresponding to the optimum threshold u * by using α u estimates obtained for thresholds u>u * (see e.g., Madsen et al., 1997b;Beguería, 2005).Nevertheless, in such approaches α u * estimates will depend not only on the local climatic conditions but also on the local optimum threshold u * , which may be different from site to site.In contrast, results from Eq. ( 8) do not depend on the on-site optimum threshold.Finally we highlight that Eq. ( 8) can also be derived by the linkage between GPD and GEV distribution in the asymptotic limit (see e.g., Coles, 2001, p. 83), but here it was obtained more simply without this assumption.
Computing now F 0 (u) from Eq. ( 6), i.e. putting first u=0 and then computing for x=u, substituting F 0 (u) in Eq. ( 4), and (optionally) using Eq. ( 8) we obtain: As Eq. ( 8), this last equation states that the ζ 0 reparameterization is threshold-invariant, although the probability ζ u of exceeding u obviously decreases as u increases.
Finally, a threshold-invariant GPD parameterization is obtained by substituting F 0 (x) from Eq. ( 6) into Eq.(1), and using α 0 and ζ 0 values calculated from Eqs. ( 8) and (9): Assuming x as an i.i.d.random variable, the distribution function of annual maxima G(x) is related to F (x) and the yearly return period T by the relation =1−1/T , where n=365.25 is the average number of days in a year.Thus obtaining an expression for the T -year return period quantile is straightforward: We highlight two important properties of Eq. ( 10).Firstly, it perfectly overlaps any GPD fitted on the exceedances over thresholds larger than the optimum one u * : the only minor drawback is that there can be small departures from records smaller than u * , but this does not affect extreme quantile estimations by Eq. ( 11).Secondly, the three parameters in Eq. ( 10) do not depend on the threshold used for GPD fitting, but only on the local climatic features: this property is particularly helpful to investigate the spatial pattern of rainfall signature in regional analyses.

The multiple threshold method
By virtue of the GPD properties and of the derivations presented in Sect.3, if a sample can be reasonably considered drawn from a GPD with threshold u * and shape parameters ξ , then for any other threshold u>u * we should expect threshold-invariance not only for the estimates of the shape parameter ξ , but also for the reparameterizations α 0 and ζ 0 provided by Eqs. ( 8) and ( 9).This concept is used in the development of the Multiple Threshold Method (MTM) which is based on the parameter estimates within a range of thresholds u>u * and provides robust GPD fitting regardless of the data resolution or rounding off.Concerning the choice of the optimum threshold u * we remark that it should be selected large enough to reliably consider the distribution of the exceedances closely approximated by a GPD, but low enough to keep small the estimation variance.
For the sake of clarity, we first present in Sect.4.1 the MTM with an application on a time series in our database which was recorded at 0.2 mm resolution, deferring the problems related to data discretization and MTM application on roughly rounded-off records to Sect.4.2.

MTM rationale
To show how the threshold-invariant properties of the parameterizations derived in Sect. 3 hold for rainfall time series and to convey better the MTM rationale, in Figs. 2 and 3 we present the results obtained on a 58-yr long time series recorded by a tipping-bucket rain gauge at a 0.2 mm resolution.
We first obtained the ξ and α u estimates on the excesses of a range of thresholds u by maximizing the likelihood function in Grimshaw (1993), and the ζ u estimates by Eq. ( 7).Then we used Eqs.( 8) and ( 9) to calculate the parameters α 0 and ζ 0 for each threshold u.The first three plots from the top of Fig. 2 show these estimates ξ , α 0 and ζ 0 as a function of thresholds u ranging from 0 to 20 mm.We can clearly observe a stabilization of the ξ estimates for thresholds larger than u * ≈3 mm, indicating that the tail behaviour becomes stable and thus u * can be considered an optimal threshold.A similar behaviour can be observed for the estimates of α 0 and ζ 0 which become stable for u>u * , as expected by the theoretical derivations presented in previous Sect.3. Finally, for thresholds larger than about 10 mm, we can observe all the estimates starting to visibly fluctuate, and moreover the deviations of the ξ parameter seem to be amplified in the α 0 and ζ 0 estimates.We also remark that the increasing variability of all the estimates should be expected since, despite the fact The first plot from the top displays the ξ estimates as the threshold u ranges from 0 to 20 mm: the ξ M MTM estimate is the median value (horizontal line) within the range of thresholds between 2.5 and 12.5 mm suggested for practical applications.Similarly, the second and third plots display the unconditioned α 0 and ζ 0 estimates provided by Eqs. ( 8) and ( 9) as a function of u.In the fourth plot the α M 0 MTM estimate is obtained as the median value of the reparameterized α C 0 estimates conditioned to the ξ M MTM estimate, while in the fifth plot the ζ M 0 MTM estimate is obtained by the ζ C 0 estimates conditioned to both ξ M and α M 0 MTM estimates.The sixth plot shows the sizes of the records exceeding the thresholds u.The starting point of stabilization of all estimates suggests u * ≈3 mm as an optimum threshold.
that thresholds between 10 and 20 mm may be considered modest, the corresponding number of exceedances becomes very small, as shown in the last plot of Fig. 2.
Although the rigorous assessment of the optimum threshold u * goes beyond the main scope of this paper, we performed the same analysis on the other time series that were  10) parameterized with MTM estimates (line): we can observe how the fitting can reliably capture the highest records, despite the fact that the MTM was applied with a moderate range of thresholds up to 12.5 mm.The bottom plot shows a zoom of the empirical CDF and the MTM-GPD fit with the same symbols: we can observe again a good fitting, except for very small records below the optimum threshold u * ≈3 mm detected in previous Fig. 2. recorded with a 0.2 mm discretization.The results were very similar to those presented in Fig. 2, revealing the the optimal threshold u * in our dataset is always smaller than 5 mm and generally around 3-4 mm.
Starting from these observations and from the results on roughly discretized time series presented in Fig. 4 and discussed later, the main idea of the Multiple Threshold Method (MTM) is to estimate the ξ , α 0 and ζ 0 parameters in Eq. ( 10) using a convenient statistic of the estimates obtained from a range of thresholds.As a convenient statistic, we suggest the adoption of the median value since it is quite robust to the asymmetric distribution of the estimates obtained for different thresholds on discretized samples (see e.g., Fig. 4).Concerning the range of thresholds to be adopted, we calculate the median of the estimates obtained for thresholds ranging from 2.5 to 12.5 mm: for our time series this represents a trade-off among the need to (i) have a range large enough to filter out and smooth the departures artificially driven by large roundings (as those shown in the left column of Fig. 4), (ii) hold enough exceedances in order to keep small the estimation variance, and (iii) perform almost all the estimates using thresholds u>u * .
The horizontal lines of the first three plots in Fig. 2 show preliminary MTM results, i.e. the median of the ξ , α 0 and ζ 0 estimates on a range of thresholds u from 2.5 to 12.5 mm.We can observe how the parameter estimates within the adopted range of thresholds are very close to the lines representing the MTM estimates.The departures on the left hand side indicate that the exceedances over thresholds smaller than 3 mm are not fitted by a GPD, while the departures observed for the larger thresholds are due, as already discussed, to the increasing estimation variance associated with the small number of exceedances.
Although results in the first three plots in Fig. 2 can be considered already satisfactory, we suggest to improve further the behaviour of our estimates by applying the MTM through www.hydrol-earth-syst-sci.net/14/2559/2010/ Hydrol.Earth Syst.Sci., 14, 2559-2575, 2010 the following hierarchical steps, where the final MTM estimates will be denoted as ξ M , α M 0 , and ζ M 0 and will be used to parameterize Eq. ( 10).
-Step 1: ξ M estimate.We first obtain the MTM estimate ξ M of the shape parameter as the median of the ξ estimates on the suggested range of thresholds as shown in first plot of Fig. 2.
-Step 2: α M 0 estimate.In order to filter out the variability of the α 0 estimates driven by the fluctuations of ξ we estimate again the α u values conditioned to ξ M estimate obtained at step 1 (i.e.we maximize the likelihood function with ξ =ξ M known) and use again the reparameterization in Eq. ( 8) with the new α u estimates and ξ =ξ M constant.Results from Eq. ( 8) are now denoted as α C 0 to remark that they are conditioned to ξ M and are shown in the fourth plot of Fig. 2: comparing with the second plot of the same figure we can observe a minor dispersion of the new α C 0 estimates.Finally, the MTM estimate α M 0 of the scale parameter is the median of the new α C 0 estimates within the range of thresholds.
-Step 3: ζ M 0 estimate.In a similar way we can reduce the variability of ζ 0 by introducing the ζ u estimates provided by Eq. ( 7) together with the MTM estimates ξ M and α M 0 (obtained at step 1 and 2) into Eq.( 9).Results from Eq. ( 9) are now denoted as ζ C 0 to remark again that they are conditioned to ξ M and α M 0 and are shown in the fifth plot of Fig. 2 which again displays a reduction of variability with respect to the unconditioned estimates in the third plot of the same figure.Finally, the MTM estimate ζ M 0 is the median of the new ζ C 0 estimates within the range of thresholds.
The described procedure provides the MTM estimates ξ M =0.15, α M 0 =4.95 mm, and ζ M 0 =0.20 that are used to parameterize Eq. ( 10) for the analyzed time series.Figure 3 (top) shows the excellent fitting of Eq. ( 10) to our sample from moderate to the highest rainfall values, while Fig. 3 (bottom) provides a zoom of the empirical CDF to show departures from very small rainfall values, consistently with results of parameter estimates presented in previous Fig. 2.However, adopting the optimal threshold u * ≈3 mm, with the exception of the records x ∈ (0,u * ), Eq. ( 10) allows modelling in a simple way (i.e. with only three thresholdinvariant parameters) the whole rainfall marginal distribution and gives a very good representation of the higher records providing a reliable insight on the extreme behaviour.

MTM on roughly rounded-off records
We want now to discuss the MTM application on time series with significant percentages of records rounded off at large discretizations.Deidda (2007) analyzed the database described in Sect. 2 and found that many daily rainfall time series collected by non-recording rain gauges contain anomalous percentages of records discretized at multiples of 0.1, 0.2, 0.5, 1 and 5 mm.Columns from left to right in Fig. 4 show the results of the MTM on three of these time series with different discretizations and shape parameter values: the first time series contains more than 30% of records anomalously discretized at multiples of 5 mm and is characterized by ξ ≈0; the second one has about 70% of records discretized at 1 mm and ξ ≈0.25; the third one counts about 35% of values at 1 mm resolution and ξ ≈0.35.
As in Fig. 2, the first three rows of subplots in Fig. 4 show the ξ , α 0 , and ζ 0 estimates as a function of the threshold u.If we compare these results with those presented in Fig. 2 we can observe an increased dispersion and a wide spread of all the estimates, and we can also observe the repetition of some patterns at multiple intervals of the discretizations of the records.The fourth and fifth rows show the conditioned estimates α C 0 and ζ C 0 : we can observe a stabilization of these estimates, although the signatures of roundings are still present.As previously described, the MTM estimates ξ M , α M 0 , and ζ M 0 are obtained as the median of ξ , α C 0 , and ζ C 0 values (displayed in the first, fourth and fifth rows of subplots in Fig. 4) within the range of thresholds between 2.5 and 12.5 mm.
Analyzing the results of Fig. 4, it should now be clearer the rationale of our suggestion to apply the MTM in a range of thresholds between 2.5 and 12.5 mm.Indeed, since we often observed an anomalous percentage of roundings with 5 mm resolution, the adopted range corresponds to joining two intervals of thresholds of 5 mm in size and centered on 5 mm and 10 mm, where we observe the jumps of the estimates.At the same time applying the median operator to the estimates on the proposed range of thresholds should guarantee that the MTM estimates are not affected by errors due to an imprecise location of the optimal threshold u * .Indeed, we can also notice how determining the optimal threshold u * by looking for the starting point of constant parameter estimates is quite difficult, if not impossible here.
Finally the last row of Fig. 4 compares the empirical survival functions of the three time series with Eq. ( 10) parameterized by the MTM estimates ξ M , α M 0 , and ζ M 0 .As already noticed for Fig. 3 we can observe again the good performances of the MTM in capturing the tail of the empirical distributions, despite the roundings.Thus, regardless of the exponential or heavy tailed shape behaviour, the proposed approach is robust also when fitting time series with significant percentages of roughly rounded-off and heavily quantized records.
In order to evaluate the performances on synthetic samples that can be considered representative of our daily rainfall records, we preliminarily evaluated the GPD parameters on the longest time series belonging to the dataset described in Sect.2: namely, 217 time series with more than 40 complete years of records.With this aim, the MTM presented in Sect. 4 was first applied on these time series with a range of thresholds between 2.5 and 12.5 mm and using three different GPD parameters estimators: the Simple Moments (SM), the Probability Weighted Moments (PWM), and the Maximum Likelihood (ML) methods based on the expression reported in Hosking and Wallis (1987), Stedinger et al. (1993), andGrimshaw (1993).The MTM estimates of ξ M and α M 0 parameters obtained for each station using the three estimators are shown in the scatterplot of Fig. 5.We can observe how the ξ M estimates derived from the SM method are never larger than 0.35: this can be explained by the bias of the estimator related to the divergence of ordinary moments on heavy tailed distributions (Hosking and Wallis, 1987), thus we discarded the SM approach for our analysis.We can also observe that the ξ M estimates by ML are slightly more spread than the PWM ones.We investigated the issue to some detail and the largest ML estimates should be due to the higher sensitivity of the ML method to the presence of outliers or to convergence problems as argued by Hosking and Wallis (1987).We also visually inspected the CDFs of the few time series with a negative shape parameter and found that they can be reliably described by exponential distributions (ξ =0).
On the basis of this preliminary analysis, we decided to explore the MTM performances with the ML estimator on Monte Carlo samples generated by Eq. ( 10) with the following 7 couples (ξ , α 0 ) of GPD parameters (displayed in Fig. 5 with square symbols): (0, 9), (0, 12), (0.2, 6), (0.2, 9), (0.2, 12), (0.4, 6), (0.4, 9).90% of the MTM ζ M 0 estimates resulted in a range between 0.15 and 0.25 with a median value very close to 0.20, while the lengths of the considered time series range between 40 and 60 yr.Thus, for the sake of simplicity, we decided to generate all synthetic daily rainfall time series by Eq. ( 10) using only the value ζ 0 =0.20 and a length of 50 yr, since choosing different values has the only effect to slightly change the number of strictly positive records.
To evaluate the MTM performances on records with different discretizations we considered the following groups of tests: -Test A: all records are discretized with a 0.2 mm resolution.This corresponds to the standard resolution of most tipping-bucket rain gauges in Europe.
-Test B: all records are discretized with a 1 mm resolution, as most time series in our database contain large amounts of records discretized at multiples of 1 mm.Fig. 5.The scatterplot displays the couples of (ξ M , α M 0 ) MTM estimates of GPD parameters for the 217 daily rainfall time series (which are more than 40-yr long) collected by the Sardinian Hydrological Survey (Italy).Parameters estimates were obtained by applying the MTM within a range of thresholds between 2.5 and 12.5 mm: plus signs, circles, and diamonds refer to estimates based on maximum likelihood (ML), probability weighted moments (PWM) and simple moments (SM), respectively.Finally the seven couples of GPD parameters used in Sect. 5 to explore the performances of the MTM on Monte Carlo samples are drawn with square symbols.
-Test C: 30% of records are discretized with a 5 mm resolution, 40% are discretized with a 1 mm resolution, while the remaining 30% are discretized at 0.2 mm.This is the case of a large number of time series in which we detected a mixture of discretizations up to 5 mm.
In summary, we generated 5000 samples of 50-yr synthetic daily rainfall time series from Eq. ( 10) with a probability of rainfall ζ 0 =0.20 and (ξ , α 0 ) parameters taking the values of the 7 couples reported above.Each sample was then discretized according to the three group of tests.
On each sample we estimated the ξ , α 0 , and ζ 0 parameters with two different approaches.In the first approach the ξ and α 0 values are simply estimated on all strictly positive records, thus adopting a single threshold u=0, while ζ 0 is estimated as the ratio between the number of all strictly positive records and the sample size: this will be referred to as "standard fit".In the second approach the ξ , α 0 , and ζ 0 parameters are provided by the MTM on a range of thresholds between 2.5 and 12.5 mm as described in Sect.4: this approach will be referred to as "MTM fit".Finally, parameterizing Eq. ( 11) with ξ , α 0 , and ζ 0 parameters obtained by the two fitting approaches we estimated also the 50-yr return period quantile x 50 from each sample.In both approaches estimates are always obtained by maximizing the likelihood function.Examples of MTM application on 50-yr synthetic time series generated by Eq. ( 10) with parameters ξ =0.2, α 0 =9 mm, and ζ 0 =0.2 are shown in Fig. 6: each column reports the results for a sample extracted from one the groups of tests A, B, and C. As in previous Figs. 2 and 4, the first three rows of subplots show the unconditioned estimates of ξ , α 0 , and ζ 0 as a function of the threshold, while the fourth and fifth rows show the reduction of the spread for conditioned estimates α C 0 , and ζ C 0 , but again the signature of the roundings is still visible.Comparing these results with those in the previous Fig. 4, we can observe a strong similarity with the patterns obtained for historical daily rainfall time series.Moreover, in the first column of subplots of Fig. 6 (time series discretized with 0.2 mm resolution) we can again observe the increasing dispersion of the unconditioned estimates of ξ , α 0 , and ζ 0 for thresholds larger than the MTM range.As already discussed for Fig. 2, this dispersion can be related to the increasing estimation variance as the number of excesses decreases.It is worth noticing on the second and third columns of subplots of Fig. 6 how the increasing dispersion is hidden by the effects of roundings.Finally, the last row of subplots in Fig. 6 shows a comparison between the empirical survival functions and Eq. ( 10) parameterized with ξ M , α M 0 , and ζ M 0 MTM estimates: again we can visually appreciate the good results of the proposed approach and the reliable fitting to the highest quantiles.
Figure 7 shows the relative frequency distributions of ξ , α 0 , ζ 0 , and x 50 estimates provided by the standard fit (left column) and the MTM fit (right column) on 5000 Monte Carlo samples discretized according to tests A, B, and C. The vertical lines in each subplot show the parameter values used for generations (ξ =0.2, α 0 =9 mm, and ζ 0 =0.2) and the expected 50-yr return period quantile x 50 =187 mm.A visual analysis of the subplots in the left column of Fig. 7 gives us  Relative frequency distributions of ξ , α 0 and ζ 0 and corresponding 50-yr quantiles on 5000 GPD random samples discretized according to test A (resolution =0.2 mm, dotted lines), B ( =1 mm, dashed lines), and C (mixing of resolutions up to =5 mm, solid lines).Results from the standard fit method with a single threshold u=0 (all strictly positive records are used) and from the MTM applied in a range of thresholds between 2.5 mm and 12.5 mm are shown in the left and right column, respectively.From top to bottom, the plots display results for ξ , α, ζ 0 and 50-yr quantile estimates.Vertical thick solid lines show the parameters of Eq. ( 10) used for Monte Carlo simulations and the expected quantile.
Table 1.Bias (top part of the table) and RMSE (bottom part) of ML ξ estimates obtained by the standard fit with a single threshold u=0 and the MTM fit on a range of thresholds between 2.5 to 12.5 mm.Parameters are estimated from synthetic samples generated by Eq. ( 10) with different couples of shape ξ and scale α 0 GPD parameters and ζ 0 =0.2.Each sample is 50-yr long and is discretized according to test A (0.2 mm resolution), B (1 mm resolution), and C (mixing of resolutions up to 5 mm).For test A, results for the standard fit with a single threshold u=5 mm are also presented.a clear picture of the bias affecting the standard fit estimates: the larger the discretization, the higher the bias.On the other hand, looking at the corresponding subplots in the right column we can observe how the MTM is not affected by these bias problems: the only visible drawback is a slight increase of the estimation variance related to the lower number of exceedances used for MTM estimations.
Figures presented and discussed till now give us a qualitative but quite clear idea of MTM supremacy on the standard fit.Nevertheless, in order to provide an objective evaluation of the MTM performances, we evaluated bias and RMSE of the two fitting approaches for each group of rounding-off tests and GPD parameters: bias where θ is an estimator (provided by the standard or the MTM approach) of the parameter θ.In our case the θ parameter can be ξ , α 0 , ζ 0 , or the 50-yr return period quantile x 50 .For each parameter, results in term of bias and RMSE are presented in Tables 1, 2, 3, and 4, respectively.We do not show results in terms of estimation variance, since it can be easily obtained as var( θ )=RMSE( θ ) 2 −bias( θ ) 2 .But we would like to highlight that the estimation variance of the standard fit (on all strictly positive records) is expected to be lower than the one of the MTM fit, since var( θ ) of ML estimators is asymptotically inversely proportional to the sample size: as shown in the sixth row of subplots in Fig. 6, the number of exceedances of the MTM range of thresholds varies between about 75% and 25% of all strictly positive records.
An overall look at the tables clearly reveals how performances can drastically change depending on the resolution of the sample (test A, B, and C) and also on the shape and scale parameter values.However some general behaviours can be identified.
The top part of each table shows the bias for each parameter: we can observe a clear supremacy of the MTM against the standard fit for all the considered discretizations and for all the couples (ξ , α 0 ) of GPD parameters.The qualitative conclusions from Fig. 7 are objectively confirmed here also for the other couples of GDP parameters: the MTM is able to correct most of the bias affecting the standard fit.
The bottom part of the tables reports the evaluation of performances in term of RMSE.We can still observe a clear supremacy of the MTM for samples discretized according to test B and C. For test B, where samples are rounded off at a 1 mm resolution, the RMSE for the standard fit is about 2-3 times larger than the one for the MTM fit, while for test C (roundings up to 5 mm) the ratio of RMSEs of the two fitting approaches increases to about 3-4.Thus there is no doubt about the advantage of applying the MTM approach on samples with records rounded off at 1 mm or higher resolutions.Moreover, it is worthwhile noticing that in the case of test C, where 30% of records were rounded off at a 5 mm resolution, the RMSE of the standard fit assumes unacceptable values if compared to the range of GPD parameters estimated on our database (Fig. 5): e.g. the RMSE(ξ ) is of order 0.1 while the range of estimates in our database is between 0 and 0.4, similar arguments hold also for α 0 since RMSE(α 0 )≈2 mm while α 0 estimates range from 6 to 12 mm.Now, let us draw some considerations on the RMSE for test A, where all records were discretized at a resolution 0.2 mm.Although this is the standard resolution of most tipping-bucket rain gauges in Europe, results for this test can be considered representative also for many tipping-bucket rain gauges in the US where the standard resolution is often 0.254 mm, very close to our choice.Looking through the tables, we can observe for test A that the RMSE for all considered parameters is slightly better for the standard fit (with a single threshold u=0) rather than for the MTM one: as already discussed this result is obviously expected since the standard fit is performed on all strictly positive records while the MTM is based on estimates coming from left-censored samples with size ranging from about 25% to 75% of all strictly positive data (see e.g. the plots in the sixth row of Fig. 6), thus the slight worsening of the MTM RMSE with respect to the standard fit cannot be attributed to a deficiency of the MTM itself, but to the increasing estimation variance as the the sample size decreases.In order to support this argument, for test A we also report in all the tables the results of the standard fit with a single threshold u=5 mm, which appears to be reasonable for our time series since we found an optimum threshold around u * ≈3−4 mm for daily rainfall data in our database: we can observe how the RMSE for the standard fit with u=5 mm becomes the same as the one for the MTM fit, despite the fact that the latter is penalized by the higher estimation variance due to the smaller number of exceedances.selecting only those time series that were discretized at a 0.2 mm resolution, we found a stabilization of the shape parameter for thresholds larger than 3-4 mm.Thus, we assumed an optimum threshold located around these values.Nevertheless, should other values be more reliable in other regions, the methods proposed here can be applied anyhow on revised ranges of thresholds.
The GPD is usually fitted on rainfall time series (or other hydrological variables) through the following steps: (i) identification of a single optimum threshold u * for each time series with any numerical or graphical method; (ii) estimation of the probability of threshold excesses, e.g.counting the number of exceedances; (iii) inference of GPD shape and scale parameters on the exceedances over the selected optimum threshold with any parameter estimator.In order to represent and reproduce a time series, or to estimate extreme quantiles, four parameters for each site should be determined: the shape ξ and scale α u * parameters of the fitted GPD, the threshold u * and the probability ζ u * to observe exceedances over the threshold.Nevertheless, α u * and ζ u * estimates are not the best indicators of climatological spatial patterns, because of their dependence on the threshold u * .
In this paper we provided equations to eliminate this dependence of parameters on the threshold and to describe the rainfall distribution with the simple representation in Eq. ( 10), where only the three threshold-invariant parameters ξ , α 0 , and ζ 0 are used.In such a way, even if we are analyzing different stations where we observe a good fitting of exceedances distributions with different values of the thresholds, once the fitting has been completed we can forget the threshold values, and use Eq. ( 10) without any explicit parameter dependence on the thresholds.This is a desired property for regional analyses since the three parameters ξ , α 0 , and ζ 0 reflect only the climatic signature.
Using this threshold-invariance property of the ξ , α 0 , and ζ 0 parameters we developed the Multiple Threshold Method (MTM) which provides the three estimates as the median values of reparameterizations over a proper range of thresholds.We have also shown how the MTM is particularly able to filter out the deviations from thresholdinvariance which are artificially driven by the presence of roughly rounded-off records.Indeed, despite the fact that ξ , α 0 , and ζ 0 reparameterizations are expected to be constant for any threshold larger than the optimum one, the presence of rounded-off records leads to fluctuations around the expected values, but the median operator is robust even in case of asymmetric fluctuations of the estimates, as found on historical time series with roughly rounded-off records and on synthetic samples discretized at different resolutions.The range of thresholds adopted here for the MTM is between 2.5 and 12.5 mm: in our opinion this is the best trade-off between the need to have a range wide enough to filter out fluctuations artificially driven by the roundings, but also small enough to have an acceptable estimation variance, and finally we are quite confident that most of the thresholds are larger than the optimum one, at least in the database analyzed here.However, as already remarked, if there is evidence of different optimum threshold values in other regions, the MTM range of thresholds can be consequently revised.
The Monte Carlo method was systematically applied to evaluate and compare the performances of the MTM against the single-threshold standard fit in terms of bias and RMSE, considering different discretizations, as well as different shape and scale GPD parameters.Results of our analysis clearly prove the supremacy of the MTM with respect to the standard fit in case of roughly rounded-off records, while in the test devised for records discretized at a 0.2 mm resolution (like most EU tipping-bucket rain gauges) the RMSE for the MTM resulted about the same as the standard fit with a single threshold around u=5 mm, but MTM has the smallest bias.Thus in conclusion the MTM always performs better than the standard single-threshold fit regardless of the record discretizations.Moreover, we strongly recommend the MTM also because the results provided by the median operator over a wide range of thresholds should not be affected by small errors in the location of the optimum threshold, as conversely happens for the single-threshold standard fit.
All the analyses were performed with the ML, nevertheless the MTM can be applied anyhow on the estimates provided by any other estimator.

Fig. 2 .
Fig. 2.Example of MTM application on a daily rainfall time series collected by a tipping-bucket rain gauge with a 0.2 mm resolution.The first plot from the top displays the ξ estimates as the threshold u ranges from 0 to 20 mm: the ξ M MTM estimate is the median value (horizontal line) within the range of thresholds between 2.5 and 12.5 mm suggested for practical applications.Similarly, the second and third plots display the unconditioned α 0 and ζ 0 estimates provided by Eqs.(8) and (9) as a function of u.In the fourth plot the α M 0 MTM estimate is obtained as the median value of the reparameterized α C 0 estimates conditioned to the ξ M MTM estimate, while in the fifth plot the ζ M 0 MTM estimate is obtained by the ζ C 0 estimates conditioned to both ξ M and α M 0 MTM estimates.The sixth plot shows the sizes of the records exceeding the thresholds u.The starting point of stabilization of all estimates suggests u * ≈3 mm as an optimum threshold.

Fig. 3 .
Fig.3.These figures display the good GPD fitting obtained by MTM application showed in Fig.2.The top plot shows the empirical survival function (circles) and Eq.(10) parameterized with MTM estimates (line): we can observe how the fitting can reliably capture the highest records, despite the fact that the MTM was applied with a moderate range of thresholds up to 12.5 mm.The bottom plot shows a zoom of the empirical CDF and the MTM-GPD fit with the same symbols: we can observe again a good fitting, except for very small records below the optimum threshold u * ≈3 mm detected in previous Fig.2.

Fig. 4 .
Fig. 4.Other examples of application of the MTM estimator as in Fig.2, but here columns show the results on time series containing anomalous percentages of roughly rounded-off records.The selection of the time series was made to give examples of MTM working with different roundings (the left column shows results for a sample with many records discretized at 5 mm, the other ones for samples containing many roundings at 1 mm) and different values of the shape parameter (from left to right ξ ≈0, 0.25, 0.35).Again, comparing the fourth row of subplots against the second one, and the fifth one against the third one we can observe the benefit of hierarchical MTM application.Finally the last row compares the empirical survival function (circles) with Eq. (10) parameterized with ξ M , α M 0 and ζ M 0 MTM estimates (line) for each time series.

Fig. 6 .
Fig.6.Same as Fig.4, but here the MTM is applied on three synthetic samples generated by Eq. (10) and discretized according to the rounding rules of test A (0.2 mm resolution), B (1 mm resolution), and C (mixing of resolutions up to 5 mm): results of each test are shown in columns from left to right, respectively.The sixth row of subplots now shows the percentage of records exceeding each threshold u with respect to the number of strictly positive records.
of x T estimates − MTM on threshold range 2.5 ÷ 12.

Fig
Fig. 7.Relative frequency distributions of ξ , α 0 and ζ 0 and corresponding 50-yr quantiles on 5000 GPD random samples discretized according to test A (resolution =0.2 mm, dotted lines), B ( =1 mm, dashed lines), and C (mixing of resolutions up to =5 mm, solid lines).Results from the standard fit method with a single threshold u=0 (all strictly positive records are used) and from the MTM applied in a range of thresholds between 2.5 mm and 12.5 mm are shown in the left and right column, respectively.From top to bottom, the plots display results for ξ , α, ζ 0 and 50-yr quantile estimates.Vertical thick solid lines show the parameters of Eq. (10) used for Monte Carlo simulations and the expected quantile.

Table 2 .
Same as Table 1 but for α 0 estimates.