Effects of disregarding seasonality on the distribution of hydrological extremes

This paper deals with the seasonality of hydroclimatic extremes and with the problem of accounting for their non-homogeneous character in determining the design value. To this aim we devise a simple stochastic experiment in which extremes are produced by a non-homogeneous extreme value generation process. The design values are estimated in closed analytical form both in a peak over threshold framework and by using the standard annual maxima approach. In this completely controlled world of generated hydrological extremes, a statistical measure of the error associated to the adoption of a homogeneous model is introduced. The sensitivity of this measure, named return period ratio, to the typology and strength of seasonality is investigated. We find that neglecting seasonality induces a downward bias in design value estimators. The magnitude of the bias may be large when the peak over threshold approach is adopted, while the return period distortion is limited when the annual maxima are considered. An application to rainfall data from a 30 000 km2 region located in North-Western Italy is presented to better clarify the effects of disregarding seasonality in a real case.


Introduction
Hydrological variables, like precipitation depth and river discharge, often exhibit marked periodic variability at the annual time scale (Black and Werritty, 1996;Sivapalan et al., 2005;Viglione et al., 2010;Villarini et al., 2011).As a consequence, also the corresponding extreme values, i.e. the precipitation or discharge values associated to relevant and unfrequent events, will likely be dependent on the Correspondence to: P. Allamano (paola.allamano@polito.it)non-homogeneity of their respective date of occurrence (e.g.Maraun et al., 2009).The seasonal variability of extreme events represents an issue for the standard applications of the frequency analysis, because the presence of seasonality can make some commonly used assumptions inappropriate.In particular, one should relax either the assumption of a standard probability distribution to represent the data (e.g.Gumbel or GEV) or the assumption that all available data are sampled from the same distribution (see Allamano et al., 2011b, p.3684-5).The manner how these date-dependent sample values should be treated represents an open problem in statistical hydrology, tackled in different ways.
The standard procedures for design value estimation are based on the analysis of series of annual maxima (AM).Estimation of the distribution of annual maxima from data with seasonal variability has received attention in the literature.Attempts to obtain the annual maximum distribution by fitting different distributions to the maxima in separate seasons -and to derive the expressions for the bias and variance of the resulting annual quantile estimators -were done by Lamberti and Pilati (1985) and Buishand and Demaré (1990).The basic idea is to have, in each season, random variables that satisfy the "identically distributed" hypothesis, which is essential in standard statistical inference.The use of monthly maxima has also been advocated (e.g.Carter and Challenor, 1981;Ettrick et al., 1987;Rust et al., 2009) because the hypothesis of identically distributed random variables in classical extreme-value theory is better satisfied for monthly than for annual maxima.
The problem with these approaches is that the number of parameters to be estimated increases, while the amount of information useful to describe the upper tail of the distribution increases at a lower rate, thus inflating the estimation uncertainty of design values.In addition, the application of these methods requires non-standard data (maxima in separate seasons) which may not be available for long time periods.P. Allamano et al.: Effects of disregarding seasonality on the distribution of hydrological extremes Maxima in separate seasons are, in fact, usually extracted from the continuous time measurements, which are systematically collected since the beginning of the digital era.
If a continuous time series is available, a valid alternative is offered by the peak over threshold (POT) or partial duration series approach.This approach considers several over-threshold values instead of a unique extreme value per year (Madsen et al., 1997;Lang et al., 1999;Claps and Laio, 2003).The strongest motivation for adopting the POT approach is that over-threshold values constitute precious additional information about the upper tail of the distribution, which is very important in hydrological applications (Revfeim, 1991;Katz et al., 2002).The POT approach naturally lends itself to being used with a rate of occurrence of the exceedances (λ) and an exceedances distribution having an annual cycle (Todorovic and Zelenhasic, 1970;Rasmussen and Rosbjerg, 1991).However, there are numerous applications of the POT approach which do not consider timedependent parameters (e.g.Madsen et al., 1997;Claps and Laio, 2003;Bacova-Mitkova and Onderka, 2010;Fruh et al., 2010).
The aim of this paper is to provide indications about the effects of neglecting seasonality within standard POT or AM approaches.A very simple and controllable framework (which we call "toy-model" in Sect.2) is adopted, suitable for sharpening the questions regarding the possible effects of seasonality on design event estimation.The scope of our contribution, hence, is not to propose a new model for use in POT or AM analyses, but to analyze the magnitude of under-(or over-) estimation of design events in the presence of seasonality.In other words, our aim is to address the question "what is the impact of neglecting seasonality on design event estimation?".

A "toy-model" for understanding seasonality effects on the distributions of extremes
A simplified analytical model of the distribution of hydroclimatic extremes is devised, in which hydroclimatic variables are assumed to follow a non-homogeneous Poisson process of event arrivals in time with rate λ(t).The intensity x of each event is distributed according to an exponential function where the mean α(t) of the distribution is dependent on time.Despite its simplicity, the model is widely adopted in hydrology, as demonstrated by the numerous applications that can be found in the literature (e.g.Eagleson, 1978;Rodriguez Iturbe et al., 1987;Rasmussen and Rosbjerg, 1989;Laio et al., 2001).Observe that the exponential is merely used as an example (simple and commonly used) distribution.To mimic the effects of seasonality, a sinusoidal (timedependent) variation is imposed on the parameters of the  2).The temporal shift δ between the seasonal peaks is indicated, as well as their amplitudes, a α and a λ .In this case n = 2.
Poisson process.The sine assumption is taken as a compromise between the need to provide a realistic representation of a periodic time series and the quest for simplicity that drives our research.We write α(t) and λ(t) as where α 0 and λ 0 are the average intensity and rate values; t is time in days (t ∈ [0; 365]); n is the number of cycles per year; h(t) is the non-dimensional α regime and k(t) the nondimensional λ regime; a α and a λ are, respectively, the amplitudes of the sinusoids h(t) and k(t); and δ is the temporal shift between the two (i.e. if the sinusoids are in phase events are more intense when they are more frequent).Note that, to constrain h(t) and k(t) above zero, a α and a λ can vary in the range (0, 1).Observe also that the initial time t = 0 is taken at any point where α(t) = α 0 and dα(t)/dt > 0. An example for the h(t) and k(t) curves is reported in Fig. 1.The distribution in Eq. ( 1) is the conditioned distribution F (x|t) of the event intensities x on the date of occurrence t.The marginal distribution of the exceedances F (x) can then be obtained by applying the Law of Total Probability (or Bayes theorem): An analytical solution to the integral in Eq. ( 3) is not available.To overcome this problem we substitute F (x|t) with ] (see the Appendix for details).This guarantees the analytical tractability of Eq. ( 3).Of course F (x|t) and F (x|t)) behave similarly, however it is not essential to have a precise correspondence between the two functions.In reality, in fact, the average intensities are Hydrol.Earth Syst.Sci., 15,[3207][3208][3209][3210][3211][3212][3213][3214][3215]2011 www.hydrol-earth-syst-sci.net/15/3207/2011/ not strictly sinusoidal and (in a toy-model) could be represented also by an inverse sinusoidal function, α (t).Our initial guess to start with a standard sinusoidal form for the event intensities serves the purpose of making our results more easy to interpret.Setting F (x|t) for F (x|t) in Eq. ( 3) leads to an expression of the distribution of exceedances in closed analytical form: x cos(δ) where [•] is the modified Bessel function of the first kind (Abramowitz and Stegun, 1965, ch. 9) and k(t) is as defined in Eq. ( 2).An important property of F (x) is that its expression does not change by changing the oscillation frequency (parameter n in Eq. ( 2)).Observe also that the mean of F (x) can be obtained analytically as the first moment of the distribution in Eq. ( 4) Moreover, the distribution in Eq. ( 4) tends to an exponential distribution (also called "base POT distribution") in the homogeneous case, i.e. when a α = a λ = 0.
The distribution of exceedances Eq. ( 4) can then be transposed into the correspondent annual maximum distribution.In fact, despite the non-homogeneity of the process, the standard relation F (AM) (x) = e −λ 0 (1−F (x)) (relating the distribution of exceedances to the distribution of annual maxima, F (AM) (x)) still holds (see Allamano et al., 2011a.On this premise, the following expression for the distribution of annual maxima is obtained in which α 0 plays the role of the scale parameter.Observe that the distribution of maxima becomes a Gumbel distribution (also called "base distribution") when a α = a λ = 0 in Eq. ( 2).Observe also that for Eq. ( 6) the mean and standard deviation of the distribution are not analytical.

Model application
Ignoring the possible effects of seasonality when adapting a distribution to a set of data is a rather common simplification, especially when few observations are available.In the following we explore the consequences of such a simplification, as they could affect the return period estimation both for the distribution of exceedances and the distribution of maxima.
To quantify the error associated to the selection of the base (non-seasonal) distribution, we consider the T -year event with the non-seasonal distribution and compute the corresponding return period, T * , with the (correct) seasonal distribution.The return period ratio provides an indication of the magnitude of the underestimation (overestimation) in design values related to neglecting seasonality.In particular, if R T > 1 the adoption of the base distribution corresponds to an underestimation of the real design value.

Peak over threshold
Consider process exceedances to follow the distribution in Eq. ( 4), having mean µ x as defined by Eq. ( 5).As anticipated, in the homogeneous case a α = a λ = 0, and the distribution of exceedances becomes an exponential distribution, also called "base distribution".The design event x (POT) T with the base distribution is determined as Using the seasonal distribution Eq. ( 4) one finds that the same event x (POT) T has a return period T * , obtained by setting Eq. ( 8) for x in Eq. ( 4), and considering that λ 0 T * = 1/(1 − F (x (POT) T )) (see e.g.Claps and Laio, 2003).Given the simplicity of the model the return period ratio is obtained in analytical form, and reads Note that Eq. ( 9) is independent of the mean intensity α 0 because α 0 is a scale parameter for both the base and the seasonal distribution.
The sensitivity of the return period ratio to the parameters of Eq. ( 4) is explored in Fig. 2 (gray-shaded contour areas).The results are obtained by varying two parameters at a time and by fixing the other parameters.In particular, in Fig. 2, λ 0 and a α are assumed to vary.In each point of the plane the intensity of the grey-shade is proportional to the value of the return period ratio, with large values of the return period ratio R (POT) T corresponding to events R (POT) T -times more frequent in the "seasonal reality" than if the exponential distribution was adopted.We find a strong sensitivity of the return period ratio to large values of a α and λ 0 .The design event dependence on this latter parameter (λ 0 ) is formalized by Eq. ( 8).From this equation it can be inferred that considering larger λ 0 values correspond to move further towards the right tail of the distribution, thus exacerbating the differences between the base and seasonal distribution.
The sensitivity of R (POT) T to the variation of the two other parameters T and δ is shown, instead, in Fig. 3: it emerges that R (POT) T significantly increases with both the return period T and the phase shift δ.The reason behind the dependence of R (POT) T on δ is to be found in the event generation mechanism.In fact, when the two regimes are in phase we have a very high probability of picking all the exceedances in the same season, with the consequence that they will be almost identically and exponentially distributed.Conversely, when the two regimes are out of phase, we will have a wellmixed distribution of the events deriving by the coexistence of one season characterized by few very intense events with another season with a lot of small events.In this case, the non-exponentiality of the distribution will be very marked, with consequent high R T values.From Fig. 3 it can be also deduced that R (POT) T assumes slightly lower values when the regimes are in phase.
One could argue that the effects of seasonality would be recognized if the data were closely scrutinized.This is verified by applying the Anderson-Darling goodness of fit test for exponentiality to the samples generated by the seasonal model.The Anderson-Darling test is especially well suited to recognize deviations in the tails of the hypothetical and operational distributions, and its failure to recognize the difference between the two distributions is precisely a demonstration that there might be a serious problem of model selection in the presence of seasonality (see Laio, 2004).In details, 1000 samples of length N , generated from the distribution Eq. ( 4) under different parameterizations, are tested against the hypothesis of exponential distribution.In Fig. 2, black labelled contour lines show the percentage of cases recognized as non-exponential by the Anderson-Darling test (at a 5 % level) for the case of N = 20 years (which entails a sample length of N = λ 0 N ).It is observed that, for small a α values, the percentages in the graph are very close to the significance level of the test, meaning that the distribution in this region is substantially undistinguishable from an exponential distribution.The percentage of cases recognized as non-exponential increases up to 50 % for larger values of a α (i.e. for very peaky seasonal regimes) and λ 0 .
By comparing the two sets of curves (gray-shaded areas and contours) we observe that the pattern of the R (POT) T values and of the test performances (as functions of λ 0 and a α ) are rather similar: low efficiencies of the Anderson-Darling test emerge in correspondence of low R (POT) T values, whereas better performances are found where R (POT) T is high.However, it is clear that the power of the test remains rather low even when the R (POT) T values are large, with a 70 % probability of not being able to recognize a 5-fold increase in return period, with possible serious detrimental consequences in design-value estimation.

Annual maxima
Suppose that a sample of annual maxima follows the distribution function in Eq. ( 6), which is described by the five parameters α 0 , λ 0 , a α , a λ and δ.The parameters α 0 and λ 0 are estimated in each station as the average (over-threshold) rainfall intensity and rainfall rate.a α and a λ are estimated as functions of the coefficients of variation (CV) of α(t) and λ(t).The reasoning to find the relation between CV λ and a λ is described hereinafter.Similarly one can find the relation between CV α and a α .
Under the hypothesis of poissonianity, the cumulative distribution of λ, F (λ), can be obtained as a derived distribution of the times of occurrence, provided that the relation λ(t) is monotonic.To meet this condition, we restrict the domain of λ(t) to the interval [π/2n; 3π/2n] (which should be multiplied by 365/2π to be expressed in days).The distribution of λ is unaffected by this assumption, due to the periodicity of the λ(t) function.On this domain, the cumulative distribution of times F T (t) is uniform, Eq. ( 2) relates λ to t, and the cumulative distribution function of λ reads values are then obtained from Eq. ( 9) for each station, as shown in Fig. 6.Rather large values are found, varies between 1 and 1.45 (not shown), demonstrating that in this case the errors induced by neglecting seasonality are not extremely relevant.

Discussion and conclusions
The effects of disregarding seasonality of extremes when evaluating the return period of a design event are investigated.Two very simple examples are illustrated: (1) the case of using an exponential distribution to describe POT values derived from a non-homogeneous Poisson process (Sect.3.1); and (2) the case of using a Gumbel distribution for AM values extracted from a seasonal (i.e. with time dependent parameters) distribution (Sect.3.2).The entity of the return period overestimation (i.e.apparently less frequent extreme events) when seasonality is not accounted for is quantified by a coefficient R T .It is found that, when seasonality is ignored and threshold exceedances (POT) are analyzed, the return period of an event can be significantly overestimated.Neglecting seasonality therefore corresponds to significantly underestimating the design values; moreover, standard goodness-of-fit techniques are not very efficient at recognizing the unsuitability of the non-seasonal distribution.
Resorting to peak over threshold analysis is, nevertheless, an advantageous technique, especially in the presence of few years of observation, as acknowledged by numerous works (see e.g.Wilks, 1993;Lang et al., 1999).This allows, in fact, to capture more accurately the upper tail of the distribution, P. Allamano et al.: Effects of disregarding seasonality on the distribution of hydrological extremes provide an analytical result, but one can observe that the integral on the right hand side of Eq. (A2) can be seen as a weighted average of the distance between the curves, where the weighting factor is w(t) = e −2x/α(t) .It is clear that this weighting factor is maximum where α(t) is largest, i.e. in t = 365 (π/2n + 2π/n)/2π and minimum in t = 365 (3π/2n + 2π/n)/2π.The ratio between the minimum and maximum weighting factors reads r = e and rapidly converges to zero for large values of x/α 0 , which are those of interest in this paper (see Allamano et al., 2011b, p.3684-5 for discussion).The differences between F (x|t) and F (x|t) are therefore negligible when α(t) is small, and it is sensible to determine α 0 and a 0 when α(t) attains its maximum, i.e. in t .By setting t for t in F (x|t) and F (x|t) and equating the two expressions one easily obtains α 0 = 1/α 0 and a α = a α /(1 + a α ).

Fig. 1 .
Fig. 1.Example of the h(t) (solid) and k(t) (dashed) curves, see Eq. (2).The temporal shift δ between the seasonal peaks is indicated, as well as their amplitudes, a α and a λ .In this case n = 2.

Fig. 2 .
Fig. 2. Sensitivity of the return period R (POT) T to variations of λ 0 and a α , with T = 100 years, δ = π , n = 1 and a λ = 0.5.In each point of the plane the intensity of the grey-shade is proportional to the value of the return period ratio R (POT) T .The percentage of cases recognized as non-exponential by the Anderson-Darling test (at a 5 % level) is indicated by the black (labelled) contour lines.N = 20 years is considered.

Fig. 6 .
Fig. 6.Return period ratios (R (POT) T ) evaluated, for T = 100 years, in 294 rain gauging stations of the North-West of Italy.The color scale refers to the resulting range of variation of R (POT) T.
The Gumbel distribution (which is characterized by two parameters, called α Non-dimensional monthly average rainfall intensity h(t) and rainfall rate k(t) for 294 gauging stations in North-Western Italy (grey lines).The overall spatial mean values are reported in black.