Estimating strategies for multiparameter Multivariate Extreme Value copulas

. Multivariate Extreme Value models are a fundamental tool in order to assess potentially dangerous events. Exploiting recent theoretical developments in the theory of Copulas, new multiparameter models can be easily constructed. In this paper we suggest several strategies in order to estimate the parameters of the selected copula, according to different criteria: these may use a single station approach, or a cluster strategy, or exploit all the pair-wise relationships between the available gauge stations. An application to ﬂood data is also illustrated and discussed.

The investigation of multivariate phenomena is best carried out via copulas.The use of copulas in hydrology, as well as in other geophysical and environmental sciences, is recent and rapidly growing.Incidentally, we observe that all the multivariate distributions present in literature can be described in a straightforward way in terms of suitable copulas.For a thorough bibliography see Nelsen, 2006;Salvadori et al., 2007.The problem of measuring the amount of dependence between the variables involved is a central issue when modeling multivariate extremes.For instance, in literature the Correspondence to: G. Salvadori (gianfausto.salvadori@unisalento.it)pair-wise dependence is generally measured via the canonical Pearson's correlation coefficient.However, it may not be the best measure of dependence when dealing with extremes (Joe, 1997), since it does not exist for heavy-tailed variables with infinite variance, and only involves a linear kind of dependence.Recently, other quantities were considered (Nelsen, 2006) to measure the association between pairs of random variables (hereafter, r.v.s): among others, Kendall's τ and Spearman's ρ rank correlation coefficients, or the Blomqvist's β medial correlation coefficient.These measures always exist (being based on the ranks), and model several types of association (for a practical discussion see, e.g., the case studies illustrated in Salvadori et al., 2007).
Instead, the notion of cluster-type dependence, when the size of the cluster is larger than two (i.e., beyond the simple pair-wise case), has only been partially explored.Generalizations of Kendall's τ (Nelsen, 1996), Spearman's ρ (Schmid and Schmidt, 2007a,b), and Blomqvist's β (Durante et al., 2007;Schmid and Schmidt, 2007c) to the d-variate case (d > 2) were only recently introduced -see below.These extensions may be of practical importance: on the one hand, they provide useful tools to quantify the dependence within clusters; on the other hand, they can be used to estimate the parameters of the multivariate model at play (see later).However, at present the application of these measures in actual case studies is still quite limited.
Another important issue is represented by the construction of Multivariate Extreme Value (hereafter, MEV) models involving a significant number of parameters.Using the results of Liebscher (2008), recent works (Durante and Salvadori, 2010;Salvadori and De Michele, 2010) have shown how multiparameter MEV models can be easily constructed via copulas and suitable techniques of extra-parameterization, leading either to the formulation of new models, or to the generalization of existing ones.
A further fundamental question is represented by the estimate of the parameters of the multivariate copulas considered Published by Copernicus Publications on behalf of the European Geosciences Union.
G. Salvadori and C. De Michele: Estimating strategies for MEV copulas (see Genest et al., 1995;Shih and Louis, 1995;Joe, 1997;Genest and Favre, 2007, and references therein).Maximum Likelihood (hereafter, ML) or Pseudo-likelihood procedures involving the ranks of the data are generally used to fit these parameters.Alternatively, the parameters may be sometimes estimated via the Method of Moments and some pair-wise measures of association (usually, the Kendall's τ , the Spearman's ρ, or the Blomqvist's β).Apparently, no application of the d-variate generalizations of these measures to the parameters' estimation is available in literature.
In this paper we focus the attention on the estimation of the parameters in copula-based MEV models, presenting some new fitting strategies.In particular, each procedure exploits a different source of information: (i) a suitable single station, (ii) an appropriate cluster of stations, (iii) all the pairs of the available stations.Below, in Sect. 2 we introduce the concept of multivariate Extreme Value copulas, describing some of the mathematical features of interest here.In Sect. 3 we show several strategies for estimating the relevant parameters.In Sect. 4 an application to maximum annual flood data is presented and discussed.

MEV copulas: an overview
In this Section we briefly outline the mathematics of copulas needed in the sequel; for a thorough theoretical and practical introduction see, respectively, Joe (1997); Nelsen (2006), andSalvadori et al. (2007).Hereafter, for any integer d > 1, we use the vector notation in R d , i.e. x = (x 1 ,...,x d ); operations and inequalities are to be intended componentwise.Also, I = [0,1] will denote the unit interval, and I d the d-dimensional unit cube.
The main target pursued here is to provide a general multivariate framework for modeling non-independent extreme observations sampled via a network of gauge stations; the particular situation of independent ones will be included as a special case.As shown below, this can easily be achieved by using copulas.The r.v.s used in the sequel may represent, for instance, rainfall or flood measurements collected in a given basin, or pollution samples in a region, or wave measurements collected by marine buoys.Below, S = {S 1 ,...,S d } will denote a set of d gauge stations.
The problem of specifying a probability model for dependent multivariate observations can be simplified by expressing the corresponding d-dimensional joint distribution F in terms of its margins F 1 ,...,F d , and the associated copula C, implicitly defined through the following functional identity stated by Sklar's Theorem (Sklar, 1959): (1) A multivariate copula C(u 1 ,...,u d ) is simply a joint distribution over I d with Uniform margins.The link between dcopulas and multivariate distributions is provided by Eq. ( 1).
If F 1 ,...,F d are all continuous, then C is unique.
A copula C is MEV if it is max-stable, i.e. if it satisfies the equation for all u ∈ I d and all t > 0. As a simple example consider the following two copulas: (4) The former one models independent variates, while the latter one models comonotone dependent ones, where each variable is a monotone increasing function of the others.Evidently, both d and M d are max-stable, and hence MEV.
A distribution F is MEV if, and only if, all its margins F i 's are Generalized Extreme Value laws (hereafter, GEV), and the corresponding copula C is MEV.Note that not all copulas are MEV (i.e., satisfy the max-stability property (2)), and consequently should not be used to construct consistent MEV models.In addition, since the GEV law is continuous, the representation F = C(F 1 ,...,F d ) of a MEV distribution F is unique.Most importantly, by exploiting the invariance property of copulas (Nelsen, 2006), we may restrict our attention to copulas only, and do not worry about the GEV margins, as we shall do hereinafter.The construction of multivariate measures of association and/or dependence is an involved mathematical problem, and is still an open question in statistics.Several ideas were developed in the last few years, and various measures were introduced in order to describe concepts like, e.g., concordance for random vectors (Joe, 1990;Nelsen, 1996Nelsen, , 2002;;Úbeda-Flores, 2005;Schmid and Schmidt, 2007a;Taylor, 2007).
For bivariate problems, several measures of association are available (Joe, 1997;Nelsen, 2006).Among others, Kendall's τ and Spearman's ρ are frequently used in applications.The former one is the difference between the probability of concordance and discordance of the variables, the latter one measures the average distance between 2 (i.e., independence) and the bivariate copula of interest.As is well known, these measures only depend upon the copula joining the variables under investigation, and not upon the margins (i.e., they are scale invariant).As already mentioned above, a further advantage is that, if the variables involved are characterized by heavy-tailed distributions, then the second order moment (and, in turn, Pearson's coefficient) may not exist, whereas these latter measures always exist, being based on the ranks.
Interesting extensions of Kendall's τ (Nelsen, 1996) and Spearman's ρ (Schmid and Schmidt, 2007a,b) to a general dvariate framework (d > 2) were recently proposed, and several new measures involving the generic d-copula C were introduced: where h(d) = (d + 1)/(2 d − (d + 1)), and C ij is the bivariate (i,j )-margin of C. Note that ρ d,3 is essentially the average Spearman's ρ for all the pairs in a set of d variables.
Another useful multivariate measure of association is the medial correlation coefficient β d (see Durante et al., 2007;Schmid and Schmidt, 2007c, and references therein), which generalizes the well known Blomqvist's β coefficient (Nelsen, 2006): where C is the survival function associated with C, given by C(u) = P{C > u}, and 1/2 = (1/2,...,1/2).Clearly, also β d is invariant with respect to the distributions of the margins.
As pointed out in Schmid and Schmidt (2007c), β d has some advantages over competing measures such as τ d or ρ d,i 's.In fact, it can explicitly be derived whenever the copula is of explicit form, which is often not possible for other measures, and its estimation requires a low computational complexity.Thus, β d may represent a fast alternative for estimating the copula parameters (see below).
A further notion of interest is represented by Pickands' dependence function A (Pickands, 1981).Recall that a bivariate copula C is MEV if there exists a convex function Conversely, given a bivariate MEV copula C, the corresponding dependence function A is given by where t ∈ I.It is worth noting that the value τ C of the Kendall's τ associated with C, as well as the one of the Spearman's ρ, can be expressed in terms of A via (Nelsen, 2006;Salvadori et al., 2007) and A generalization of Pickands' dependence function to the multivariate case is shown in Falk and Reiss (2005).Since A can be estimated via empirical data (Genest and Segers, 2009), then it may be used to check the statistical adequacy of different models.We shall see later how to use Pickands' dependence function.
Finally, below we shall also use the Kendall's measure function K C (Genest andRivest, 1993, 2001) given by where t ∈ I is a probability level, W = C(U 1 ,...,U d ) is a univariate r.v.taking value on I, and the U i 's are Uniform r.v.s on I with copula C. In the bivariate Extreme Value case, K C is given by (Ghoudi et al., 1998) where τ C is the value of the Kendall's τ associated with the copula C. Clearly, bivariate MEV copulas with the same value of τ share the same function K C .Unfortunately, at present no useful expressions similar to Eq. ( 15) are known for the general multivariate case d > 2. The Kendall's measure K C is a fundamental tool for introducing a mathematically consistent (copula-based) definition of the return period for multivariate events (see also the discussion in Salvadori, 2004;Salvadori and De Michele, 2004;Salvadori et al., 2007;Durante and Salvadori, 2010;Salvadori and De Michele, 2010).In fact, Eq. ( 14) represents a multivariate quantile relationship, since it corresponds to a multidimensional Probability Integral Transform (Genest et al., 2006).
Let µ be the average interarrival time of the events in the sequence observed (e.g., µ = 1 year for annual maxima), and let p ∈ I be an arbitrary critical probability level (usually, p = 90,95,99%, or any other threshold of interest).The multivariate return period T p associated with p (hereafter, Kendall's return period) is defined as where the critical threshold t ∈ I is given by by analogy with the correct definition of quantile.Here

C
indicates the generalized (or pseudo-) inverse (Nelsen (2006)) of the corresponding function.Since K C is generally non-linear (K C (t) = t only if C = M d ), then t = p.More particularly, the relation K C (t) ≥ t holds (Capéraà et al. (1997)), and therefore where u ∈ I d is such that C(u) = t.The right-most term corresponds to the standard definition of multivariate return period (for a thorough review see Zhang (2005); Singh et al. (2007), and references therein).Evidently, the traditional approach may yield an incorrect calculation of the return period, and, in turn, a wrong estimation of the risk.Since empirical estimators of the Kendall's measure function are available (Genest et al., 2009;Salvadori and De Michele, 2010), we shall see later how to use them to perform a return period analyses of practical utility.

Parameters' estimation
As is well known, the estimate of the parameters of multivariate distributions is an involved problem, and still an open question in mathematical statistics.Usually, procedures like Maximum Likelihood are used to simultaneously fit all the parameters of interest.However, if, e.g., the copulas under investigation have singular components, then ML may be difficult to implement and use.Below, we outline several approaches for estimating the parameters of interest: each procedure will exploit different sources of information, and estimations achieved via different techniques will generally differ from one another.For instance, the estimate may rely only upon the information drawn from a suitable single station (Sect.3.1), or an appropriate cluster of stations (Sect.3.2), or the set of pairs of all the stations (Sect.3.3).The methods are general, and can be applied to any MEV copula, including those with singular components.Clearly, other approaches are possible, depending upon the specific needs.Note that the estimates calculated via the methods mentioned above could be used as starting guesses for running other procedures (e.g., ML).
Generally, in the strategies presented below, the fitting criterion is represented by the best agreement, in the Least Squares sense (hereafter, LS), with the "local" dependence structures or association measures: clearly, this may yield estimates different from the ones achieved via other procedures (e.g., the global ML).However, the overall fitting ability will always be certified via global Goodness-of-Fit tests (see Sect. 4), in order to verify whether the resulting parametric model could be accepted or not.

The single station approach
The first approach we propose for the estimate of the parameters of interest consists in using the information drawn from a single station at a time.Practically, for each of the available gauge stations S i 's, a suitable "companion" station S j = S j (i) is identified, possibly according to specific physio-geomorphological conditions and/or hydrological constraints.Then, we may estimate the parameters via a LS fit, involving the empirical estimates of the Pickands' dependence functions A ij 's of the companion pairs.As an alternative, also the Kendall's measure function K C could be used.However, while the former is specific for any copula, the latter is not, for it only depends upon the corresponding value of the Kendall's τ -see Eq. ( 15), and the comment following it.Therefore, we suggest to use Pickands' representation.
The procedure is as follows.Let n be the sample size, i.e. the number of available d-dimensional observations.For each station S i , i = 1,...,d, a companion station S j = S j (i) is identified, and an estimate A ij of the dependence function A of the model under investigation is calculated (Genest and Segers, 2009).In particular, in order to use all the information, since only n bivariate pairs are available, and given the constraints A(0) = A(1) = 1, the unit interval I is partitioned into n uniformly spaced intervals via the set of abscissas x k = k/n, k = 0,...,n (clearly, other choices are possible).Then, the A ij (x k )'s are estimated over the given grid, and the LS objective function is minimized, yielding the LS estimates of the parameters of interest.Note that, if {S i ,S j (i) } forms a pair-cluster (i.e., S j (i) is the station companion to S i , and vice-versa), then there is no need to compute also the symmetric contribution {S i =j (i) ,S j (i )=i }: this may reduce the computational burden.Essentially, the single station approach (hereafter, 1-MEV) exploits the relationships of the S i 's with the corresponding companion stations, i.e. it only uses the local (station based) bivariate dependence structures.
A natural criterion for selecting the companion station would be the use of the Euclidean distance: then S j (i) would simply be the station closest to S i .Note that, except for mathematically "pathological" cases of no interest here, usually S j is unique: a counter-example is given by a (practically improbable) situation in which several stations are exactly positioned on a circle centered in S i .Denoting by ij the distance between S i and S j , only two things may happen: 1. either {S i ,S j } forms a pair-cluster, 2. or, there exists another station S k closer to S j than S i ; clearly, S k may belong to a pair-cluster.
From a geometrical point of view, at least a couple of stations must form a pair-cluster.In fact, the set of N d = d(d − 1)/2 pair distances ij 's is finite, and hence it has (at least) a minimum: this corresponds to a pair-cluster.We stress that the use of the Euclidean distance as a criterion for choosing the source of information (i.e., adopting a nearest neighbor principle) may not always be the most advisable strategy.In fact, it has been shown (see, e.g., GREHYS, 1996;St-Hilaire et al., 2003;Merz and Bloeschl, 2004;Galéa and Canali, 2005;Wagener and Wheater, 2006;Ouarda et al., 2008;Shu and Ouarda, 2008) that the geometrical distance may not completely explain the dependence structure of the hydrological behavior of catchments: indeed, Hydrol.Earth Syst.Sci., 15,[141][142][143][144][145][146][147][148][149][150]2011 www.hydrol-earth-syst-sci.net/15/141/2011/ several are the physio-geomorphological factors that may influence it.Therefore, the validity of the nearest neighbor approach should be tested out by carefully checking the practical case study under investigation.
It is worth pointing out that, if the model involves global parameters (i.e., common to all stations), and these can be estimated a priori via other techniques, then the local parameters (if any) can be calculated as follows.For each station S i , the companion station S j is identified, and the local parameters are estimated via a LS fit of the dependence function A ij , using the values of the global parameters already estimated (i.e., only Z (1) i is minimized).If {S i ,S j (i) } is a pair-cluster, then all the estimates of the local parameters associated with S i and S j are kept; otherwise, only those associated with S i are stored.This latter strategy can easily deal with sets of stations of any size: in fact, only two stations at a time are considered for estimating the local parameters.In other words, a global estimate is necessary only if the global parameters cannot be estimated otherwise.

The cluster approach
The 1-MEV approach adopted in the previous Section only exploited the information drawn by a single station.This strategy can be generalized: in fact, a full cluster of companion gauge stations (instead of just one) may be chosen as a source of information.Clearly, the cluster can be fixed according to specific physio-geomorphological conditions and/or hydrological constraints (e.g., by identifying a homogeneous region, or a basin of influence).
Let S i be the i-th station, and let C (i) m i be a cluster of m i stations "pertinent" to S i , with 1 ≤ m i < d.Clearly, the choice of m i , as well as the selection of the set of relevant companion stations belonging to C (i) m i , can be made dependent upon specific basin characteristics, and changed when considering different stations S i 's.Evidently, the case m i = 1 for all i's corresponds to the 1-MEV approach.Here the idea is to estimate the parameters by exploiting suitable multivariate measures of association φ (i) C calculated over the families of stations For instance, any of the five measures outlined in Eqs. ( 5)-( 9) could be used.
The procedure is as follows.First, for each station S i , an estimate φ i of φ (i) C is calculated over the cluster F i .Then, the LS objective function is minimized, yielding the LS estimates of the parameters of interest.Note that, if F i is a closed cluster (i.e., if the station S j ∈ C (i) m i , then F j = F i ), then the contribution of the cluster can be calculated only once: this may reduce the computational burden.Essentially, the cluster approach (hereafter, c-MEV) exploits the relationships of the S i 's with suitable cluster of stations, i.e. it is based on the local m i -variate association structures.
The c-MEV strategy is quite flexible, and potentially most promising.Unfortunately, its efficacy may be limited by the current lack of easily usable mathematical tools, which at present may turn it into a "weak" approach.In fact, the use of measures of association φ (i) C 's for ruling the fits (instead of full dependence structures, as in the 1-MEV approach) may discard some important details: roughly speaking, a few "moments" of a distribution may not provide the same information as of the distribution itself.As an obvious alternative, we might suggest to use in the fits some multivariate equivalents of the bivariate Pickands' dependence functions (Falk and Reiss, 2005), but the research in this area is still in its infancy, and it is not yet clear how to proceed.
Again, it is worth pointing out that, if the model involves global parameters, and these can be estimated a priori via other techniques, then the local parameters (if any) can be calculated as follows.For each station S i , the family F i is identified, and the local parameters are estimated via a LS fit of φ (i) C , using the values of the global parameters already estimated.If F i is a closed cluster, then all the estimates of the local parameters associated with the stations in F i are kept; otherwise, only those associated with S i are stored.Thus, a global estimate is necessary only if the global parameters cannot be estimated otherwise.

The all-pairs approach
A further approach to the estimate of the parameters may rely upon the use of all the d(d − 1)/2 bivariate margins, by simultaneously considering the dependence structures of all the pairs of stations.The strategy is to fix all the parameters in such a way that the Pickands' dependence functions A ij 's best fit (in the LS sense) the corresponding empirical ones.From a practical point of view, this approach provides the closest "bivariate" approximation to a global fit: mathematically speaking, it is a "combinatorial" strategy.Clearly, as d gets larger and larger, the calculations required may become computationally demanding.
The LS objective function to be minimized is given by yielding the LS estimates of the parameters of interest.We call this method p-MEV approach.By exploiting the same strategy, a faster alternative would be to calculate the parameters by simultaneously fitting all the bivariate Kendall's τ , or Spearman's ρ, or Blomqvist's β coefficients (or any other measure of association): essentially, this corresponds to a Method of Moments procedure.However, while the use of Pickands' function involves the full functional form of the dependence structure (which is   specific for every copula), the coefficients mentioned above may not distinguish between different copulas.For this reason, we neither suggest nor investigate this alternative.

Case study
For the sake of illustration, here we consider the same data and copulas as used in Salvadori and De Michele (2010), to which we make reference for further details: a short summary is reported below.Also, in the following, the use of the Euclidean distance as a criterion for choosing the stations of interest is motivated by illustrative purposes only.
The data are maximum annual flood measurements collected in the Spey catchment (northern Highlands of Scotland).The basin is equipped with a network of 17 flow gauge stations, and is managed by the Scottish Environment Protection Agency (2009).Further details can be found in Gilvear (2004) and Black and Fadipe (2009).In this study we consider four gauge stations located in the middle and lower part of the Spey catchment (see Fig. 1): three on the main stream (i.e., S 2 , S 10 , and S 6 ), and one on Dulnain tributary (i.e., S 9 ).
The available observations amount to 37 quadruples of maximum annual floods.Evidently, from a statistical point of view, the sample size is very small for investigating a multivariate problem: unfortunately, this is a typical situation when extreme data bases are considered.However, here the target is not to provide an ultimate extreme flood model, and no practical project of hydrological works is undertaken.Instead, our point is only to show, in a relatively simple case, how the techniques outlined above can be used in practice: in other words, this is a methodological paper.
As a dependence model, here we use the multiparameter 4-variate MEV copula H introduced in Salvadori and De Michele (2010): with Gumbel parameters ξ,χ ≥ 1, and "extra-parameters" a 1 ,a 2 ,a 3 ,a 4 ∈ I, which represents a 4-variate generalization of the well known Gumbel copula G θ (Nelsen (2006); Salvadori et al. (2007)) with parameter θ ≥ 1.Note that the Gumbel copula G θ represents a sort of "standard" MEV model in hydrology (see, e.g., Yue (2000a,b); Zhang and Singh (2007), and references therein).A straightforward interpretation of the parameters a i 's is as follows.Suppose that a = 1: then, H = G ξ .Conversely, should it be a = 0, then H = G χ .For other values of a, H is a sort of "mixture" between G ξ and G χ : in particular, the a i 's play the role of "local" mixing parameters.
The generic bivariate dependence function i.e. a non-linear, possibly asymmetric, function, able to model non-exchangeable variables (an important issue in applications, not shared by G θ -see, e.g., the discussion in Grimaldi and Serinaldi, 2006).From a practical point of view, this latter feature may provide a consistent model of the asymmetric relationship between upstream and downstream river stations, viz. the upstream stations may "influence" the downstream ones, but the converse may be difficult to prove.
In Table 1 (upper triangular) we show the inter-station distances 's.It is then immediate to identify, for each site, the nearest neighbor station: namely, S 2 ← S 9 , S 6 ← S 10 ,  S 9 ↔ S 10 , i.e. the latter two stations form a pair-cluster (see Table 1 -diagonal).In Table 1 (lower triangular) we show the empirical estimates of the bivariate Kendall's τ , for all the pairs of the four stations of interest here.It is interesting to note that the coefficient is very small for the two farthest stations {S 2 ,S 6 }: this means that the association between the two is negligible, as confirmed by the corresponding p-values, though this does not imply that the stations are statistically independent (as, instead, is commonly misinterpreted).On the contrary, the analysis of the p-values shows that the estimates of the coefficients for all the other pairs are statistically significantly different from zero: this means that the corresponding stations are definitely dependent.
Below we shall statistically compare the performances of the copula model provided by Eq. ( 22), using sets of param-eters fitted via different methods.The estimates are reported in Table 2.The six parameters of H have been estimated in Salvadori and De Michele (2010) via ML (see the first row of Table 2): this will give us the possibility to compare and discuss the results of fitting techniques different from the standard one.The estimates of the same parameters according to, respectively, the 1-MEV, the c-MEV, and the p-MEV strategies are also reported in Table 2.For illustrative purposes, the c-MEV approach is run using as multivariate measure of association the Spearman's ρ d,3 -see Eq. ( 8), and considering the following four clusters of stations, having different sizes: F 2 = {S 2 ,S 9 ,S 10 }, F 6 = {S 6 ,S 9 ,S 10 }, and F 9 = F 10 = {S 9 ,S 10 }.As a variant (not shown), also the multivariate Blomqvist's β d -see Eq. ( 9) -was used, but the results did not significantly change.It is interesting to note that, independently of the fitting procedure, G ξ ≈ 4 -the copula of independence -see Eq. ( 3), -whereas G χ ≈ M 4 -the copula of full dependence -see Eq. ( 4).Thus, as already mentioned, the extra-parametrized copula H is a sort of "mixture" between 4 and M 4 , ruled by the "local" mixing parameters a i 's.
In Fig. 2 we plot the empirical and fitted Pickands' functions A's for all the pairs of stations and the models of interest.The graphs allow for a preliminary visual analysis of the different performances: clearly, being just low-dimensional bivariate slices of the four-dimensional copula H, they cannot (and should not) be used to judge the overall fitting ability of the different models -see below.Furthermore, it must be stressed that the empirical estimates of the true (but unknown) dependence functions do not generally respect the convexity constraint, i.e. they are not intrinsic estimatorssee also the discussion in Genest and Segers (2009), and references therein.
Apparently, none of the ML, 1-MEV, and c-MEV strategies seem to provide uniformly consistent fits, whereas the p-MEV method overall provides valuable approximations.However, the lacks of fit are more apparent than real: in fact, due to the small sample size, the confidence bands are expected to be quite large.Most interestingly, the copula H fitted via the "local" strategies is able to match the asymmetries shown by the empirical functions, and may adapt itself to the "in situ" behaviors of the data.Furthermore, the "degree of dependence", as measured via the Kendall's τ , ranges from ≈ 0.1 to ≈ 0.6 (see Table 3, reporting the estimates for the 1-MEV and p-MEV strategies), whereas the corresponding values fitted via ML only range from ≈ 0.2 to ≈ 0.4 (see Salvadori and De Michele, 2010).
However, when the problem is multivariate, what should always be analyzed is the full dependence structure, and its global ability to fit the actual data.For this purpose, we exploit some robust Goodness-of-Fit tests for multivariate copulas (Genest et al., 2009).These tests use Cramér-von-Mises statistics, and acceptance or rejection of a model is based on the p-values calculated via bootstrap techniques: small ones suggest to discard the corresponding copula, whereas large ones support its suitability.In our case, the p-values are as reported in Table 2.In turn, all the models investigated here S 2 1 0.12 0.37 0.55 S 6 0.12 1 0.60 0.32 S 9 0.39 0.39 1 0.57 S 10 0.43 0.29 0.54 1 should be accepted, since the p-values are much larger than 5%.It is worth mentioning that the p-values should only be used to reject a copula, according to some standard criterion (like, e.g., a value smaller than 1%).It is a common error to consider as "better" those models yielding the highest pvalues: mathematically speaking, this is generally false.
A further issue of interest concerns the investigation of the Kendall's return period: this is a fundamental point in applications, since it provides crucial information of practical utility.In principle, it might be possible to use the estimates of the multivariate return period function in order to choose between models fitted via different strategies.In Fig. 3 we show the empirical and the fitted Kendall's return periods for all the four stations and the models of interest: the plot shows the return periods associated with all critical probability levels t ∈ I.Note that, due to the limited sample size, the estimates of the largest empirical return periods are spoiled (as is well evident in Fig. 3).Visually, both the ML and the p-MEV fits are valuable, whereas the 1-MEV and c-MEV ones apparently fail to provide a consistent approximation: this may not be surprising, since these latter strategies either use the least amount of information, or rely upon measures of association instead of full distributional functions.
As illustrated and discussed in Salvadori and De Michele (2010), these multivariate return periods are generally much larger than the ones calculated via the formulas usually found in literature -see Eq. ( 18), -and the ensuing discussion).Clearly, the underestimates provided by the standard approach, i.e. a return period much smaller than the correct one, may have sizable consequences.Instead, following the Kendall's measure approach illustrated here, a correct risk analysis can be performed.

Conclusions
In order to properly assess the risk, MEV models are fundamental in all areas of geophysics.This paper is of methodological nature, and introduces new estimation techniques for dealing with extremes.In particular, we outline several strategies in order to estimate the parameters of MEV copulas according to different criteria: we use either a "companion" station/cluster approach, or exploit all the pair-wise  relationships between the available gauge stations.The techniques suggested may offer interesting alternatives to standard fitting methods (e.g., ML).An application to flood data is also presented, and a comparison between different estimating strategies is illustrated: this shows how the techniques outlined in the paper can be used in practice.

Fig. 1 .
Fig. 1.Map of the Spey catchment.The black circles indicate the four gauge stations of interest -see text. 17

Fig. 1 .
Fig. 1.Map of the Spey catchment.The black circles indicate the four gauge stations of interest -see text.

Fig. 2 .
Fig. 2. Plots of empirical and fitted Pickands' dependence functions for all the pairs of stations and the models of interest -see text.

18Fig. 2 .
Fig. 2. Plots of empirical and fitted Pickands' dependence functions for all the pairs of stations and the models of interest -see text.

Fig. 3 .
Fig. 3. Plot of empirical and fitted return periods for all the four stations and the models of interest -see text.

19Fig. 3 .
Fig. 3. Plot of empirical and fitted return periods for all the four stations and the models of interest -see text.

Table 1 .
(Upper triangular) Inter-station distances (in km).(Diagonal) Labels of the nearest neighbor station.(Lower triangular) Empirical estimates of the Kendall's τ for all the pairs of the four stations, with the p-values in parentheses -see text.

Table 2 .
Estimates of the parameters of the 4-variate copula H using different fitting techniques -see text.Also shown are the p-values of the corresponding models.

Table 3 .
Values of the Kendall's τ for all the pairs of the four stations -see text: (upper triangular) estimates using the 1-MEV strategy; (lower triangular) estimates using the p-MEV strategy.