Some regional procedures to estimate hydrological quantiles at ungauged sites, such as the index-flood method, require the delineation of homogeneous regions as a basic step for their application. The homogeneity of these delineated regions is usually tested, yielding a yes/no decision. However, complementary measures that are able to quantify the degree of heterogeneity of a region are needed to compare regions, evaluate the impact of particular sites, and rank the performance of different delineating methods. Well-known existing heterogeneity measures are not well-defined for ranking regions, as they entail drawbacks such as assuming a given probability distribution, providing negative values and being affected by the region size. Therefore, a framework for defining and assessing desirable properties of a heterogeneity measure in the regional hydrological context is needed. In the present study, such a framework is proposed through a four-step procedure based on Monte Carlo simulations. Several heterogeneity measures, some of which are commonly known and others of which are derived from recent approaches or adapted from other fields, are presented and developed to be assessed. The assumption-free Gini index applied to the at-site L-variation coefficient (L-CV) over a region led to the best results. The measure of the percentage of sites for which the regional L-CV is outside the confidence interval of the at-site L-CV is also found to be relevant, as it leads to more stable results regardless of the regional L-CV value. An illustrative application is also presented for didactical purposes, through which the subjectivity of commonly used criteria to assess the performance of different delineation methods is underlined.

Regional hydrological frequency analysis (RHFA) is needed to estimate extreme hydrological events when no hydrological data are available at a target site or to improve at-site estimates, especially for short data records (e.g. Burn and Goel, 2000; Requena et al., 2016). This is usually done by transferring information from hydrologically similar gauged sites. Delineation of regions formed by hydrologically similar gauged sites is a basic step for the application of a number of regional procedures such as the well-known index-flood method (Dalrymple, 1960; Chebana and Ouarda, 2009). Such a method employs information from sites within a given “homogeneous” region to estimate the magnitude of extreme events related to a given probability (or return period) at a target site, which are called quantiles. Regional homogeneity is often defined as the condition that floods at all sites in a given region have the same probability distribution except for a scale factor (e.g. Cunnane, 1988). The present paper focuses on the heterogeneity concept in hydrology derived from this “regional homogeneity”, which is different from the heterogeneity concept considered in other fields, such as ecology, geology, and information sciences (e.g. Li and Reynolds, 1995; Mays et al., 2002; Wu et al., 2010).
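
The scale-factor definition of regional homogeneity given above can be written compactly as follows. The notation is assumed here for illustration and is not taken from the text: $Q_i(F)$ is the quantile at site $i$ for non-exceedance probability $F$, $\mu_i$ is the site-specific scale factor (the index flood), $q(F)$ is the dimensionless regional quantile function shared by all $N$ sites, and $T$ is the return period.

```latex
Q_i(F) = \mu_i \, q(F), \qquad i = 1, \dots, N, \qquad T = \frac{1}{1 - F}
```

Under this reading, a region departs from homogeneity to the extent that no single $q(F)$ fits all sites once each series is divided by its own $\mu_i$.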

In order to delineate homogeneous regions, numerous studies have proposed
and compared similarity measures entailing climatic (e.g. mean annual
rainfall), hydrologic (e.g. mean daily flow), physiographic (e.g. drainage
area), and combined descriptors (see Ali et al., 2012, and references therein)
to be used as input to statistical tools for grouping sites. The selection
of these descriptors is carried out by stepwise regression, principal
components, or canonical correlation, among others (e.g. Brath et al., 2001;
Ouarda et al., 2001; Ilorme and Griffis, 2013). Traditional statistical
tools, such as cluster analysis, or new approaches, such as the affinity
propagation algorithm, are considered to form homogeneous regions based on
the previously identified similarity measures (e.g. Burn, 1989; Ouarda and
Shu, 2009; Ali et al., 2012; Wazneh et al., 2015). For further references on
regional flood frequency analysis, please see Ouarda (2013) and Salinas et
al. (2013), and references therein. Moreover, many tests have been introduced and
compared throughout the literature to decide whether a given delineated
region can be considered as homogeneous (e.g. Dalrymple, 1960; Wiltshire,
1986; Scholz and Stephens, 1987; Chowdhury et al., 1991; Fill and Stedinger,
1995; Viglione et al., 2007). The homogeneity test proposed by Hosking and
Wallis (1993) is usually utilised. In this test the statistic

In practice, apart from determining if a region can be considered as homogeneous by making a yes/no binary decision (e.g. Warner, 2008) generally based on a significance test, the quantification of the degree of heterogeneity is also necessary. Heterogeneity measures are required for such a task. Two approaches can be considered in this regard: (i) the use of heterogeneity measures for determining the effect of the departure from the homogeneous region assumption on quantile estimation; and (ii) the use of heterogeneity measures for ranking regions according to their degree of heterogeneity. Regarding the former, quantifying the degree of heterogeneity provides a notion of the inaccuracy incurred through the estimation of quantiles by a regional method, for which homogeneous regions are assumed but a “non-perfect” homogeneous region is used. This approach has already been studied, being closely related to the homogeneity test notion (e.g. Hosking and Wallis, 1997; Wright et al., 2014), which is further explained below.

The second approach corresponds to the focus of the present paper. Through this second approach, different regional delineation methods can be properly compared to identify the best one. This will be the method delineating the “most homogeneous region”. Also, heterogeneity measures can be helpful in ranking potential homogeneous regions formed by removing discordant sites. By analogy with distribution selection (e.g. Laio et al., 2009), the concept of heterogeneity measure considered here plays the role of a “model selection criterion”, such as the Akaike information criterion (Akaike, 1973), whereas the homogeneity test plays the role of a “goodness-of-fit test”. The former ranks delineated regions by providing unambiguous results to identify the best one in terms of heterogeneity, whereas the latter indicates if the given region can be considered as homogeneous or not.

In relation to the use of heterogeneity measures as a proxy for quantile
error – approach (i), the test statistic

A number of studies have proposed and compared methods in which different
combinations of similarity measures and/or statistical tools are considered
for delineating regions (references below). These studies usually consider
measures based either on

However,

Instead of using measures based on

Therefore, a general framework is needed to allow the definition and assessment of desirable properties of a heterogeneity measure in the regional hydrological context in order to properly identify a suitable measure. Such a measure should overcome the aforementioned drawbacks: it should be free of assumptions, positive, and unaffected by region size. Furthermore, the use of a heterogeneity measure should allow direct comparison of the heterogeneity of regions delineated by different methods. Indeed, it should allow ranking of the heterogeneity degree of several regions to identify “the most homogeneous region” or to assess the effect of some sites on the “heterogeneity degree” of the region. In the present paper, such a framework is proposed under an evaluation of the heterogeneity measures based on Monte Carlo simulations. Several measures extracted from literature in hydrology and other fields are presented and/or adapted to be assessed as well-justified heterogeneity measures. The present paper is organised as follows. The procedure for the assessment of a heterogeneity measure is presented in Sect. 2. The heterogeneity measures considered to be checked by the proposed procedure are introduced in Sect. 3. Results of the assessment are illustrated in Sect. 4. Discussion of results is presented in Sect. 5. An illustrative application is shown in Sect. 6 and conclusions are summarised in Sect. 7.

A simulation-based procedure consisting of four steps is proposed to study
the behaviour of a given heterogeneity measure (generically denoted

Before further describing the aforementioned steps and desirable properties,
elements of the framework needed for performing the assessment procedure are
presented. The procedure is based on synthetic regions with flood data
samples generated through Monte Carlo simulations from a representative
flood parent probability distribution commonly used in frequency analysis,
the generalised extreme value (GEV) distribution. A region is defined by its
number of gauging sites (

Finally, a given region consists of at-site data generated from a GEV
distribution with parameters obtained through at-site L-moments. At-site
data are standardised by their sample mean to frame them in the regional
context (e.g. Bocchiola et al., 2003; Requena et al., 2016). Note that
heterogeneity measures directly based on L-moments lead to the same results
for standardised or non-standardised data. A region with

It is important to highlight that the use of simulated data in the assessment of new techniques in regional frequency analysis is a well-established approach, and it has been used in a number of publications (e.g. Hosking and Wallis, 1997; Seidou et al., 2006; Chebana and Ouarda, 2007).
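
As a rough illustration of how one synthetic site could be generated and standardised, the sketch below draws a sample from a GEV distribution by inverting its quantile function and computes the sample L-CV from probability-weighted moments. The parameter values, the function names, and the use of the Hosking parametrisation of the GEV are assumptions made for this example only; they are not the settings used in the study. The sketch also checks the remark that an L-moment ratio such as L-CV is unchanged when the data are divided by their sample mean.

```python
import math
import random


def gev_quantile(u, loc, scale, shape):
    """GEV quantile function, Hosking parametrisation, valid for shape != 0."""
    return loc + scale * (1.0 - (-math.log(u)) ** shape) / shape


def sample_l_cv(data):
    """Sample L-CV (l2 / l1) from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # b1 weights the j-th smallest value by (j - 1) / (n - 1)
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0
    return l2 / l1


def simulate_site(n_years, loc, scale, shape, rng):
    """Draw one at-site sample and return it raw and mean-standardised."""
    raw = []
    for _ in range(n_years):
        # Keep u strictly inside (0, 1) so the logarithm stays finite
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        raw.append(gev_quantile(u, loc, scale, shape))
    mean = sum(raw) / n_years
    return raw, [v / mean for v in raw]


# Hypothetical parameter values, chosen only for illustration
rng = random.Random(42)
raw, standardised = simulate_site(50, 100.0, 30.0, -0.1, rng)
print(sample_l_cv(raw), sample_l_cv(standardised))
```

The two printed values should agree up to floating-point rounding, because dividing every observation by a positive constant rescales the first two L-moments by the same factor.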

The first step of the assessment of a heterogeneity measure

Effect of the heterogeneity rate: The degree of heterogeneity of a region is
the aimed value to be quantified by

Effect of the number of sites: The size of a region, represented by the
number of sites

Effect of the regional average L-moment ratios:

Effect of the record length: The amount of available at-site information,
represented by the data length

The second step in the assessment of

The third step of the assessment of

We consider two regions A and B, without loss of generality. The idea is
that (sub)regions delineated by a given method should theoretically entail
different

The fourth step of the assessment of

The procedure is described below. Note that the values of the factors used
in this section are selected to facilitate the graphical representation.
Thus, a homogeneous region (i.e.

The aim of this section is to present and develop heterogeneity measures based on different approaches to be assessed by the procedure proposed in Sect. 2. Heterogeneity measures are selected as a result of a general and comprehensive literature review in a number of fields, including hydrology. We can distinguish three types of measures: (a) known in RHFA; (b) derived from recent approaches in RHFA; and (c) used in other fields and adapted here to the regional hydrological context. In total, eight measures are considered.

The first group consists of the well-known statistics

Even though

The extensions of

The second group is represented by a measure derived from a relatively novel
approach in which the confidence interval for the at-site L-CV

Viglione (2010) proposed a procedure for obtaining the confidence interval
for L-CV without considering a given parent distribution of the data,
applying it to a didactic illustration for comparing several regional
approaches. The procedure is summarised below: the variance of the sample
L-CV

In the present study, the heterogeneity measure considered regarding this
approach is named as

The last group consists of the Gini index (GI) (Gini, 1912; Ceriani and Verme, 2012), which is a measure of inequality of incomes in a population commonly used in economics, and of a measure based on the entropy-based Kullback–Leibler (KL) divergence (Kullback and Leibler, 1951), which estimates the distance between two probability distributions and is used for different purposes in a number of fields, including hydrology (e.g. Weijs et al., 2010).
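
To make the idea of a Gini index over at-site L-CV values concrete, the following minimal sketch uses one common sample form of the index, based on the mean absolute difference between all pairs of values. Whether this matches the exact estimator adopted in the study is an assumption, and the function name and the example L-CV values are hypothetical.

```python
def gini_index(values):
    """Sample Gini index in the mean-absolute-difference form.

    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean), which is 0 when all
    values are equal and grows as the values become more unequal.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    abs_diff = sum(abs(a - b) for a in values for b in values)
    # 2 * n**2 * mean simplifies to 2 * n * total
    return abs_diff / (2.0 * n * total)


# Hypothetical at-site L-CV values for two small regions
near_homogeneous = [0.18, 0.20, 0.22]
more_heterogeneous = [0.10, 0.20, 0.30]
print(gini_index(near_homogeneous), gini_index(more_heterogeneous))
```

In this form the index is non-negative and needs no assumption about the parent distribution, which are two of the properties the text asks of a heterogeneity measure; the region with the more dispersed L-CV values receives the larger index.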

The definition of the GI is usually given according to the Lorenz
curve (Gastwirth, 1972), but it can be expressed in other ways.
Specifically, the sample GI

The KL divergence (so-called relative entropy) of the probability
distribution

Simulation results obtained by the application of the proposed assessment procedure (Sect. 2) to the considered heterogeneity measures (Sect. 3) are presented in this section. Note that a summary of the results obtained from each step is presented in Table 1.

Summary of the results of the studied measures for the four-step
assessment procedure. The behaviour of a given measure for each sensitivity
analysis in step (i) is graded as follows: good (G), acceptable (A), bad (B), or
unacceptable (U). Measures entailing an “unacceptable (U)” behaviour are not
assessed in the remaining steps; yet a complete assessment of

Sensitivity analysis:

Results of the effect of varying factors defining a region (Sect. 2.2) are
presented through box plots and mean values of the heterogeneity measure over


The effect of

It is also found that results for

Sensitivity analysis: mean of

Summary of the success rate (SR) minimum, average, and maximum of the
considered measures (


The influence of varying regional average L-moments is shown by comparing
the

Finally, the effect of varying the record length

As a result of the aforementioned qualitative sensitivity analysis results
(see Table 1 for a summary),

The ability of the measures to identify the most heterogeneous region
between two regions A and B is shown via the success rate SR (Sect. 2.3).
A summary of the results obtained for

The SR average is shown as a notion of the overall behaviour of the
measures. Recall that the larger SR is, the better

Box plots of representative values of the heterogeneity measure average
obtained for 22 cases, varying the heterogeneity rate

Mean values of the heterogeneity measures over

The SR minimum and SR maximum are displayed as a notion of
the variability of the SR results (Table 2). Results related to the
SR minimum are analogous to those obtained by the SR
average, with

The variability of the heterogeneity measures as a function of the degree of
regional heterogeneity, represented by

The effect of discordant sites (Sect. 2.5) is shown in Fig. 7. The mean
values of the heterogeneity measures over

However, when

Overall, GI can be considered as the best heterogeneity measure
among all the evaluated measures, closely followed by

The

As indicated in Sect. 1, the heterogeneity measures selected in this study
may be used for the assessment of the degree of heterogeneity of regions
obtained through the use of different delineation methods. When a region is
divided into several sub-regions by a given delineation method, the
GI (or

Summary of the statistics of descriptors, spring maximum peak flow series, and available at-site quantiles for the 44 sites considered in the illustrative application.

Results of the illustrative application: heterogeneity measures

An illustrative application on observed data is presented for didactical purposes. The considered case study consists of 44 sites from the hydrometric station network of the southern part of the province of Quebec, Canada (for a fuller description of the data and the region, see Chokmani and Ouarda, 2004). The flow data are managed by the Ministry of the Environment of Quebec Services. Descriptors and at-site spring flood quantiles are available for the considered sites (Kouider et al., 2002). A summary of the statistics associated with spring maximum peak-flow data, relevant descriptors for flood frequency analysis (e.g. Shu and Ouarda, 2007), and at-site spring flood quantiles is shown in Table 3. Note that because the data used in this application are observed rather than simulated, the real degree of heterogeneity of the regions, as well as the real parent distribution of the data, is unknown. Thus, it is not possible to truly compare the performance of the different heterogeneity measures. The purpose of this illustrative application is therefore to show that commonly used criteria for identifying the best method for delineating regions may be subjective, and to guide practitioners in the use of heterogeneity measures.

The heterogeneity of the whole study region is evaluated by using a
homogeneity test (Hosking and Wallis, 1997), resulting in a heterogeneous
region (

The results obtained by applying the best heterogeneity measure found in the
present study, the GI, are shown in Table 4. For comparison
purposes, the results obtained by applying commonly used criteria for
identifying the best delineation method are also shown. They are

According to the results in Table 4, the

RRMSE average for

In the present application, the GI identifies Clustering A as the best delineation method. The GI seems to be a more objective criterion for identifying the heterogeneity of a region than criteria commonly used in practice. Besides, its use as a heterogeneity measure is supported by the four-step simulation-based assessment procedure performed in the present paper. It is worth mentioning that Clustering A could be ideally assumed to be the best setting for forming sub-regions, as it is based on relevant descriptors for flood frequency analysis. However, this would just be an assumption that cannot be verified due to the use of observed data.

Delineation of homogeneous regions is required for the application of regional frequency analysis methods such as the index flood procedure. The availability of an estimate of the degree of heterogeneity of these delineated regions is necessary in order to compare the performances of different delineation methods or to evaluate the impact of including particular sites. Due to the unavailability of a well-justified and generally recognised measure for performing such comparisons, a number of studies have relied on measures that are not well-defined or approaches that involve additional steps during the delineation stage of regional frequency analysis.

In the present paper, a simulation-based general framework is presented for
assessing the performance of potential heterogeneity measures in the field
of regional hydrological frequency analysis (RHFA), according to a number of
desirable properties. The proposed four-step assessment procedure consists
of the following: sensitivity analysis by varying the factors of a region; evaluation of
the success rate for identification of the most heterogeneous region;
estimation of the evolution of the variability for the heterogeneity measure
average with respect to the degree of regional heterogeneity; and study of
the effect of discordant sites. The procedure is applied to a set of
measures including commonly used ones, measures that are derived from recent
approaches, and measures that are adapted from other fields to the regional
hydrological context. The assumption-free Gini index (GI),
frequently considered in economics and applied here to the L-variation
coefficient (L-CV) of the regional sites, obtained the best results. A lower
performance was obtained for the measure of the percentage of sites (

Simulated data may be reproduced by following the indications
given throughout the paper. Raw observed hydrological data, as well as raw catchment
descriptors used in the illustrative application, may be obtained from the
Environment Ministry of the Province of Quebec (

The authors declare that they have no conflict of interest.

The financial support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Merit scholarship program for foreign students – Postdoctoral research fellowship of the Ministère de l'Éducation et de l'Enseignement Supérieur du Québec managed by the Fonds de recherche du Québec – Nature et technologies is gratefully acknowledged. The authors are also grateful to the editor S. Archfield, and to the reviewers W. Farmer, C. Rojas-Serna and K. Sawicz, as well as one other anonymous reviewer whose comments helped improve the quality of this paper.

Edited by: S. Archfield
Reviewed by: C. Rojas-Serna, K. A. Sawicz, W. H. Farmer, and one anonymous referee