Optimal depth-based regional frequency analysis

Classical methods of regional frequency analysis (RFA) of hydrological variables face two drawbacks: (1) the restriction to a particular region which can lead to a loss of some information and (2) the definition of a region that generates a border effect. To reduce the impact of these drawbacks on regional modeling performance, an iterative method was proposed recently, based on the statistical notion of the depth function and a weight function φ. This depth-based RFA (DBRFA) approach was shown to be superior to traditional approaches in terms of flexibility, generality and performance. The main difficulty of the DBRFA approach is the optimal choice of the weight function φ (e.g.,φ minimizing estimation errors). In order to avoid a subjective choice and näıve selection procedures of φ, the aim of the present paper is to propose an algorithm-based procedure to optimize the DBRFA and automate the choice of φ according to objective performance criteria. This procedure is applied to estimate flood quantiles in three different regions in North America. One of the findings from the application is that the optimal weight function depends on the considered region and can also quantify the region’s homogeneity. By comparing the DBRFA to the canonical correlation analysis (CCA) method, results show that the DBRFA approach leads to better performances both in terms of relative bias and mean square error.


Introduction
Due to the large territorial extents and the high costs associated to installation and maintenance of monitoring stations, it is not possible to monitor hydrologic variables at all sites of interest.Consequently, hydrologists have often to provide estimates of design event quantiles QT, corresponding to a large return period T at ungauged sites.In this situation, regionalization approaches are commonly used to transfer information from gauged sites to the target site (ungauged or partially gauged) (e.g., Burn, 1990b;Dalrymple, 1960;Ouarda et al., 2000).A number of estimation techniques in regional frequency analysis (RFA) have been proposed and applied in several countries (De Michele and Rosso, 2002;Haddad and Rahman, 2012;Madsen and Rosbjerg, 1997;Nguyen and Pandey, 1996;Ouarda et al., 2001).
In general, RFA consists of two main steps: (1) grouping stations with similar hydrological behavior (delineation of hydrological homogeneous regions) (e.g., Burn, 1990a) and (2) regional estimation within each homogenous region at the site of interest (e.g., GREHYS, 1996;Ouarda et al., 2000Ouarda et al., , 2001)).The two main disadvantages of this type of regionalization methods are (i) a loss of information due to the exclusion of a number of sites in the step of delineation of hydrological homogeneous region, and (ii) a border effect problem generated by the definition of a region.
To reduce or eliminate the negative impact of these disadvantages on the estimation quality, a number of regional methods have been proposed that combine the two stages (delineation and estimation) and use all stations (e.g., Ouarda et al., 2008;Shu andOuarda, 2007, 2008).One of these regional methods was developed recently by Chebana and Ouarda (2008).This RFA method is based on statistical depth functions (denoted by DBRFA for depth-based RFA).The DBRFA approach focuses directly on quantile estimation using the weighted least squares (WLS) method to estimate parameters and avoids the delineation step.It employs the multiple regression (MR) model that describes the relation between hydrological and physio-meteorological variables of sites (Girard et al., 2004).
After Chebana and Ouarda (2008), statistical depth functions are used in a number of hydrological and environmental Published by Copernicus Publications on behalf of the European Geosciences Union.
H. Wazneh et al.: Optimal depth-based regional frequency analysis studies.For instance, Chebana and Ouarda (2011a) used these functions in an exploratory study of a multivariate sample including location, scale, skewness and kurtosis as well as outlier detection.In another study, Chebana and Ouarda (2011b) combined depth functions with the orientation of observations to identify the extremes in a multivariate sample.Bardossy and Singh (2008) used the statistical notion of depth to detect unusual events in order to calibrate hydrological models.Recently, some studies present further developments of the approach that calibrate hydrological models by a depth function (e.g., Krauße and Cullmann, 2012;Krauße et al., 2012).
The DBRFA method consists generally of ordering sites by using the statistical notion of depth functions (Zuo and Serfling, 2000).This order is based on the similarity between each gauged site and the target one.Accordingly, a weight is attributed to each gauged site using a weight function denoted ϕ.This function, with a suitable shape, eliminates the border effect and includes all the available sites proportionally to their hydrological similarity to the target site.Note that classical RFA approaches correspond to a special weight function with value 1 inside the region and 0 outside.The definition of a region in the classical RFA approaches becomes rather a question of choice of weight function ϕ according to a given criterion (e.g., relative root mean square error RRMSE).
By construction, the estimation performance in the MR model using the DBRFA approach depends on the choice of the weight functions ϕ.Chebana and Ouarda (2008) applied several families of functions ϕ, where the corresponding coefficients were chosen arbitrarily and after several trials.In addition, even though the obtained results are an improvement of the traditional approaches, they are not necessarily the best ones.
The aim of the present paper is to propose a procedure to optimize the DBRFA approach over ϕ.This aim has theoretical as well as practical considerations.This procedure allows an optimal choice of the weight function ϕ and makes the DBRFA approach automatic and objective.It should be noted that Ouarda et al. (2001) determined the optimal homogenous neighborhood of a target site in the canonical correlation analysis (CCA) based approach.In Ouarda et al. (2001) the optimization corresponds to the selection of the neighborhood coefficient, denoted by α, according to the bias or the squared error.The optimal choice of weight functions has been the topic of numerous studies in the field of statistics (e.g., Chebana, 2004).
To optimize the choice of ϕ, suitable families of functions as well as algorithms are required.In the present context, four families of ϕ are considered: Gompertz (ϕ G ) (Gompertz, 1825), logistic (ϕ logistic ) (Verhulst, 1838), linear (ϕ Linear ) and indicator (ϕ I ).The three families ϕ G , ϕ logistic and ϕ Linear are regular, flexible, S-shaped and have other suitable properties.
Several appropriate algorithms can be considered (Wright, 1996).They are appropriate when the objective function ζ (criterion to be optimized) is not differentiable or the gradient is unavailable and must be calculated by a numerical method (e.g., finite differences).Among these algorithms, the most commonly used are the simplex method (Nelder and Mead, 1965), the pattern search method of Hooke and Jeeves (Hooke and Jeeves, 1961;Torczon, 2000) and the Rosenbrock methods (Rao, 1996;Rosenbrock, 1960).These methods are used successfully in several domains, and are particularly popular in chemistry, engineering and medicine.Specifically, in this paper the simplex and the pattern search algorithms are used because of their advantages.Indeed, they are very robust (e.g., Dolan et al., 2003;Hereford, 2001;Torczon, 2000), simple in terms of programming, valid for nonlinear optimization problems with real coefficients (McKinnon, 1999) and helpful in solving optimization problems with and without constraints (e.g., Lewis andTorczon, 1999, 2002).
In this study, the proposed optimization procedure is applied to the flood data from three different regions of the United States and Canada (Texas, Arkansas and southern Quebec).For each region, the obtained results are compared with those of the CCA approach.
The present paper is organized as follows.Section 2 describes the used technical tools including depth functions, the WLS method and the definitions of the considered weight functions.Section 3 describes the proposed procedure.Then Sect. 4 presents the application to the three case studies as well as the obtained results.The last section is devoted to the conclusions of this work.

Background
In this section, the background elements required to introduce and apply the optimization procedure of the DBRFA approach are briefly presented.This section contains a number of basic notions.

Mahalanobis depth function
The absence of a natural order to classify multivariate data led to the introduction of the depth functions (Tukey, 1975).They are used in many research fields, and were introduced in water science by Chebana and Ouarda (2008).Several depth functions were introduced in the literature (Zuo and Serfling, 2000).Depth functions have a number of features that fit well with the constraint of RFA (Chebana and Ouarda, 2008).
In this study, the Mahalanobis depth function is used to sort sites where the deeper the site is the more it is hydrologically similar to the target site.This function is used for its simplicity, value interpretability, and for the relationship with the CCA approach used in RFA.The Mahalanobis depth function is defined on the basis of the Mahalanobis distance given by d 2 A (x, y) = (x − y) A −1 (x − y) between two points x, y ∈ R d (d ≥ 1) where A is a positive definite matrix (Mahalanobis, 1936).This distance is used by Ouarda et al. (2001) in the development of the CCA approach.The Mahalanobis depth of x with respect to µ is given by for a cumulative distribution function F characterized by a location parameter µ and a covariance matrix A. Note that the Mahalanobis depth function has values in the interval [0, 1].An empirical version of the Mahalanobis depth of x with respect µ is defined by replacing F by a suitable empirical function FN for a sample of size N (Liu and Singh, 1993).In the context of the present paper, the notation in Eq. ( 1) is replaced by where μ and Â are respectively the location and covariance matrix estimated from the observed sample.

Weight functions
Below are the definitions of the four families of weight functions ϕ G , ϕ logistic , ϕ Linear and ϕ I considered in this paper along with special cases of functions ϕ for comparison purposes.

Gompertz function
The Gompertz function is usually employed as a distribution in survival analysis.This function was originally formulated by Gompertz (1825) for modeling human mortality.A number of authors contributed to the studies of the characterization of this distribution (e.g., Chen, 1997;Wu and Lee, 1999).In the field of water resources, the Gompertz function was adopted by Ouarda et al. (1995) to estimate the flood damage in the residential sector.The function ϕ G is increasing, flexible and continuous (Zimmerman and Núñez-Antón, 2001).The Gompertz distribution has different formulations one of which is given by where c is its upper limit, a and b are two coefficients which respectively allow to translate and change the spread of the curve.Figure 1 shows the effects of these coefficients on the form of ϕ G .Note that this function starts at zero (starting phase), then increases exponentially (growth phase) and finally stabilizes by approaching the upper limit c (stationary phase) with 0 ≤ ϕ G (x) ≤ c.The inflection point of this function is ln a b , c e .(1838) proposed this function to study population growth.It is given by

Verhuls
where the coefficients c, a and b play the same role as in ϕ G .This function has similar properties to those of ϕ G (increasing, flexible, continuous and with three phases).However, ϕ logistic is symmetric around its inflection point ln a b , c 2 which is not the case for ϕ G .

Linear function
It is a simple function, linear over three pieces corresponding to the three previous phases.Explicitly it is given by This function is considered as a weight function in the study of Chebana and Ouarda (2008).

Indicator function
This function is given by where A is a subset in R (set of real numbers), such as an interval.The subset A represents the neighborhood or the region in the classical RFA approaches.The weight is equal to 1 if the site is included in the region, otherwise, it is 0.
In the case where the set and χ 2 α,p is the (1 − α) quantile associated to the chi-squared distribution with p degrees of freedom, the DBRFA reduces to the traditional CCA approach (e.g., Bates et al., 1998).The corresponding weight function is denoted by ϕ CCA .
If A = [0, 1], i.e., α = 0, then the DBRFA represents the uniform approach, which includes all available sites with similar importance.The corresponding weight function is denoted by ϕ U .

Weighted least squares estimation
In the RFA framework, the MR model is generally used to describe the relationship between the hydrological variables and the physiographical and climatic variables of the sites of a given region.This model has the advantage to be simple, fast, and not requiring the same distribution for hydrological data at each site within the region (Ouarda et al., 2001).
Let QT be the quantile corresponding to the return period T .It is often assumed that the relationship between QT, as the hydrological variable, and the physio-meteorological variables and basin characteristics A 1 , A 2 , . . ., A r takes the form of a power function (Girard et al., 2004): where e is the model error.
Let s be the number of quantiles QT corresponding to s return periods and N be the total number of sites in the region.A matrix of hydrological variables Y = (QT 1 , QT 2 , . . ., QT s ) of dimension N × s is then constructed.With a log-transformation in Eq. ( 7) we obtain the multivariate log-linear model in the following form: where log X = (1, log A 1 , log A 2 , . . ., log A r ) is the N × (r + 1) matrix formed by (r) physio-meteorological variables series, β is the (r + 1) × s matrix of parameters and ε = (ε 1 , . . ., ε s ) is the N × s matrix that represents the model error (residual) with null mean vectors and variancecovariance matrix : The parameter matrix β can be estimated, using the WLS estimation, by where = diag (w 1 , . . ., w N ) is the diagonal matrix with diagonal elements w i where w i is the weight for the site i.The matrix is estimated by Note that the log-transformation induces generally a bias in the estimation of QT (Girard et al., 2004).

Methodology
This section describes a general procedure for optimizing the DBRFA approach and treats special cases where this procedure is applied using the weight functions defined in Sect.2.2.

General procedure
In order to find the optimal weight function ϕ Optimal in the DBRFA approach, the procedure is composed of three main steps.They are summarized as follows: 1.For a given class of weight functions ϕ and a set of gauged sites (region), use a jackknife procedure to assess the regional flood quantile estimators (Eq.8) for the sites of the region using the DBRFA approach.These estimators depend on the weight function ϕ through its coefficients.
2. For a pre-selected criterion, calculate its value to quantify the performance of the estimates obtained from step (i).
3. Using an optimization algorithm, optimize the criterion (objective function) calculated in step (ii).The parameters of the optimization problem are the coefficients of the weight function.The outputs of this step are ϕ Optimal and the value of the selected criterion.

Description of the procedure
In the first step of the procedure, we use a jackknife resampling procedure to assess the regional flood quantile estimators for the sites of the region.This jackknife procedure consists in considering each site l (l = 1, . . ., N) in the region as an ungauged one by removing it temporarily from the region (i.e., we assume that the hydrological variable Y l of site l is unknown and the physio-meteorological variable X l is known since it can be easily estimated from existing physiographic maps and climatic data).Then we calculate the regional estimator Ŷl ϕ of site l by the iterative WLS regression, using the N − 1 remaining sites, which is related to the given weight function ϕ.The parameters of the starting estimator (initial point) of DBRFA, denoted by β1,l and ˆ 1,l , are calculated by assuming that X = X <−l> , Y = Y <−l> and = I N −1 in Eqs. ( 10) and ( 11), where X <−l> represents the matrix of physio-meteorological variables excluding site l, Y <−l> is the matrix of hydrological variables excluding site l and I N−1 is the identity matrix of dimension (N − 1) × (N − 1).The starting estimator ( Ŷ1,l ) ϕ is obtained by replacing β with β1,l in Eq. ( 8).Then for each depth iteration k, k = 2, 3, . . ., k iter , we calculate the Mahalanobis depth (Eq.2) of the gauged site i, i = 1, . . ., N − 1, with respect to the ungauged site l denoted by D k,(i,l) ϕ = The number of iterations k iter is fixed to ensure the convergence of the depth function (generally k iter = 25 is appropriate).The weight matrix at iteration k is defined by applying the function ϕ to the depth calculated at this iteration.The parameters of the MR model at the k-th iteration are estimated by where k,l ϕ is a N − 1 diagonal matrix with elements: Note that all these parameters depend on ϕ.Then, the regional quantile estimator for the site l in this iteration is In the second step of the procedure, we use the regional estimators at the last iteration since their associated estimation errors are the minimum possible by construction.Consequently, in order to simplify the notations in the rest of this paper, we denote After calculating ( Ŷl ) ϕ , l = 1, . . ., N in step (i), we consider and evaluate one or several performance criteria in step (ii).The considered criteria are employed as objective functions in the optimization step (iii).
The relative bias (RB) and the relative root mean square error (RRMSE) are widely used in hydrology, particularly in RFA, as criteria to evaluate model performances.These two criteria are defined using an element-by-element division by where Y l is the local quantile estimation for the l-th site, ( Ŷl ) ϕ is the regional estimation by DBRFA approach according to ϕ and excluding site l, and N is the number of sites in the region.The RB ϕ measures the tendency of quantile estimates to be uniformly too high or too low across the whole region and the RRMSE ϕ measures the overall deviation of estimated quantiles from true quantiles (Hosking and Wallis, 1997).Note that other criteria can also be considered such as the Nash criterion (NASH) and the coefficient of determination (R 2 ).In the hydrological framework, the previously defined criteria are used as key performance indicators (KPI) to compare different RFA approaches (e.g., Gaál et al., 2008).
Finally in step (iii), we apply an optimization algorithm on the selected and evaluated criterion in step (ii).The algorithms to be considered are indicated in the introduction section.The formulation of the criteria to be optimized, generally complex and non-explicit, suggests the use of zeroorder algorithms.The application of these algorithms allows us to find the optimal function ϕ Optimal with respect to selected criteria.An overview diagram summarizing the optimization procedure of the DBRFA approach is illustrated in Fig. 2.
The procedure described above aims to calculate ϕ Optimal according to the desired criterion.In order to estimate the quantile Y u of an ungauged site u using the optimal DBRFA approach, the user simply repeats step (i) of the procedure without excluding any site and while fixing the weight function, i.e., step (i) with ϕ = ϕ Optimal .
Compute 1, ˆl  and 1, ˆl  Eq. ( 10) and ( 11) using 1 Update  using φ and D: Based on the optimization procedure of the DBRFA approach described previously, the parameters of the optimization problem are the coefficients of the weight function.Consequently, reducing the number of coefficients in ϕ can make the algorithm more efficient and less expensive in terms of memory and computing time.If the weight function is one of the two functions Gompertz (Eq. 3) or logistic (Eq.4), the coefficient c represents the upper limit of these functions.As in the DBRFA approach, the upper limit of ϕ is 1; namely the gauged site is completely similar to the target site, hence the value c = 1 is fixed.In this case, the problem is reduced to find the couple ( âN , bN ) that optimizes one of the preselected criteria, such as Eqs.( 16) and ( 17).
Moreover, in the classes ϕ = ϕ G or ϕ = ϕ logistic , the optimization problem is applied in semi-bounded domain (i.e., a > 0 and b > 0) and without other constraints (linear or nonlinear).In this case, the Nelder-Mead algorithm can also be applied as well as the pattern search one (Luersen and Le Riche, 2004).
On the other hand, in the case where ϕ = ϕ Lineair (Eq.5), the inequality constraint d 2 > d 1 > 0 is imposed.Therefore, the Nelder-Mead algorithm cannot be considered.
Theoretically and generally, the two optimization algorithms used in this paper (i.e., the Nelder-Mead and the pattern search algorithms) converge to a local minimum (or maximum) according to the initial point.To overcome this problem and make the algorithm more efficient, two solutions are proposed in the literature: (a) for each objective function, use several starting points and calculate the optimum for each of these points; the optimum of the function will be the best value of these local optima (Bortolot and Wynne, 2005); or (b) use a single starting point and each time the algorithm converges, the optimization algorithm restarts again using the local optimum as a new starting point.This procedure is repeated until no improvement in the optimal value of the objective function is obtained (Press et al., 2002).

Data sets for case studies
In this section we present the data sets on which the DBRFA approach will be applied in the following section.These data come from three geographical regions located in the states of Arkansas and Texas (USA) and in the southern part of the province of Quebec (Canada).The first region is located between 45 and 55 • N in the southern part of Quebec, Canada.The data set of this region is composed of 151 stations, each station has a flood record of more than 15 yr.The conditions of application of frequency analysis (i.e., homogeneity, stationary and independence) are tested on the historical data of these stations in several studies (Chokmani and Ouarda, 2004;Ouarda and Shu, 2009;Shu and Ouarda, 2008).Three types of variables are considered: physiographical, meteorological and hydrological.The selected variables for the regional modeling are also used in Chokmani and Ouarda (2004).The selected physiographical variables are the basin area (AREA) in km 2 , the mean basin slope (MBS) in % and the fraction of the basin area covered with lakes (FAL) in %.The meteorological variables are the annual mean total precipitation (AMP) in mm and the annual mean degree days over 0 • C (AMD) in degree-day.The selected hydrological variables are represented by at-site specific flood quantiles (QST) in m 3 km −2 s, corresponding to return periods T = 10 and 100 yr.
The two other considered regions correspond to a database of the United States Geological Survey (USGS).This database, called Hydro-Climatic Data Network (HCDN), consists of observations of daily discharges from 1659 sites across the United States and its Territories (Slack et al., 1993).The sites included in this database contain at least 20 yr of observations.As part of the HCDN project, the United States is divided into 21 large hydrological regions.
In this study, the data of the states of Arkansas and Texas (USA) are used for comparison purposes.The applicability conditions of frequency analysis as well as the variables to consider are justified in the study of Jennings et al. (1994).The physiographical and climatological characteristics are the area of drainage basin (AREA) in km 2 , the slope of main channel (SC) in m km −1 , the annual mean precipitation (AMP) in cm, the mean elevation of drainage basin (MED) in m and the length of main channel (LC) in km.The selected hydrological variables in these two regions are the atsite flood quantiles (QT), in m 3 s −1 corresponding to the return periods T = 10 and 50 yr.
The data set of the state of Arkansas is composed of 204 sites.These data and the at-site frequency analysis are published in the study of Hodge and Tasker (1995).Tasker et al. (1996) used these data to estimate the flood quantiles corresponding to the 50 yr return period by the region of influence method (Burn, 1990b).
The Texas database is composed of 90 sites but due to the lack of some explanatory variables at several sites, modeling was performed with only 69 stations.The data set used in this region is the same used by Tasker and Slade (1994).

Results
The results obtained from the CCA-based approach are first presented and then compared to those obtained by the optimized DBRFA approach.
The variations of the two performance criteria RB and RRMSE, obtained by the CCA approach, as a function of the coefficient α (neighborhood coefficient) for the three regions are presented in Fig. 3.The complete variation range of α is the interval [0, 1].However, in this application, the range is [0, 0.30] for Quebec and Arkansas regions and [0, 0.17] for the Texas region.These upper bounds of α are fixed to ensure that all neighborhoods of the sites contain sufficient stations to allow the estimation by the MR model.Note that it is appropriate to have at least three times more stations than the number of parameters in the MR model (Haché et al., 2002).Figure 3 indicates that, for a given region, the same value of α optimizes the two criteria for the various return periods, even though this is not a general result (Ouarda et al., 2001).The optimal α values are 0.25, 0.01 and 0.05 respectively for Quebec, Arkansas and Texas.
The coefficients λ 1 and λ 2 correspond respectively to the correlations of the first and the second couples of the canonical variables.Their values for Arkansas (λ 1 = 0.973, λ 2 = 0.470) and Texas (λ 1 = 0.923, λ 2 = 0.402) are larger than those of Quebec (λ 1 = 0.853, λ 2 = 0.281).This corresponds to a large optimal value of α for the latter region.Indeed, the higher the canonical correlation, the smaller the size of the ellipse defining the homogeneous neighborhood (Ouarda et al., 2001).The value of α should be small enough so that the neighborhood contains an appropriate number of stations to perform the estimation in the MR model, and large enough to ensure an adequate degree of homogeneity within the neighborhood.
Figure 4 shows the projection sites of the three regions in the two canonical spaces (V1, W1) and (V2, W2) corresponding respectively to λ 1 and λ 2 .This figure shows that for these three regions, the relationship between V1 and W1 is approximately linear, in contrast to V2 and W2.The presentation of a site in the space (V1, W1) is useful for an a priori information on the estimation error of this site.For example, in the Quebec region, the two sites 66 and 122 are poorly estimated.By fitting a linear model between V1 and W1 for each region, it is seen that the linearity assumption is more respected in Arkansas and Texas than in Quebec (R 2 Arkansas = 0.94, R 2 Texas = 0.85 and R 2 Quebec = 0.73).The previous results show that the values of λ 1 , λ 2 , α and R 2 can be used as indicators of the quality of the homogeneity in a given region.In this application, the lower values of λ 1 , λ 2 and R 2 as well as the higher value of α for Quebec compared to the values of the other two regions indicate that the Quebec region is less homogeneous than the two others.This conclusion needs to be verified by other criteria or statistical tests.
The DBRFA approach is applied by using the Mahalanobis depth function (Eq.2).The optimal weight functions, from each one of the three considered families, are obtained on the basis of the indicated optimization algorithms (i.e., ϕ G and ϕ logistic using Nelder-Mead and ϕ Linear using pattern search).They are presented in Fig. 5.The corresponding results are summarized in Table 1.The optimization is made with respect to the RB and RRMSE criteria.Note that, for a given region, the regional flood quantile estimation is more accurate for small return periods.This result is valid for local as well as regional frequency analysis approaches (Hosking and Wallis, 1997).In addition, Table 1 shows that the worst estimates are obtained using the uniform approach (weight function ϕ U ).This justifies the usefulness of considering the regional approaches.Note that for all regions, DBRFA with ϕ Optimal leads to more accurate estimates in terms of RB and RRMSE than those obtained using the CCA approach with optimal α.These results show also that the optimal coefficients of a given weight function depend on the chosen criterion (objective function).Finally, for the southern Quebec region, the results of Chebana and Ouarda (2008) are very close to those in the present paper (Table 1).The reason for this closeness is that the above authors forced the DBRFA approach to provide good results by trying several different combinations of values of ϕ coefficients (i.e., iteration loop of coefficients).Consequently, their trials took a long time and did not ensure the optimality of the approach, which is not the case for the present study.
According to Fig. 5, the form of optimal weight function depends on the considered region.For instance, the steep S curve (with long upper extremity) of the two regions, Arkansas and Texas, depicts a large number of gauged sites similar to the target one; however, the high S curve (with short upper extremity) of Quebec shows a small number of gauged sites similar to the target one.This result supports the previously mentioned conclusion about the homogeneity level for these regions.
In order to visualize the influence of gauged sites on the regional estimation of a target site in the DBRFA and CCA approaches, assume that Texas site number 25 is a target site and has to be estimated using the remaining 68 gauged sites.Figure 6 illustrates the weights allocated to each gauged site in the canonical hydrological space (W1, W2) instead of the geographical space.The estimate is made with the optimal α for the CCA approach and the optimal ϕ G for the DBRFA approach.We observe that the influence of a gauged site on the estimation of the target site in the DBRFA approach is proportional to the hydrological similarity between these two sites.Hence, the weight function takes a bell shape in a 3-D presentation (Fig. 6b).However, with the CCA approach, the weight function (Eq.6) takes only two values: 1 within the neighborhood of the target site or 0 otherwise (Fig. 6a).
To study the impact of depth iterations on the performance of the DBFRA method, this approach is applied to the three regions but without iterations on the Mahalanobis depth (i.e., k iter = 2 in step (i) in the DBRFA optimization procedure).The outputs of this application, with ϕ = ϕ G and ζ (.) = RRMSE, are shown in Table 2.These results indicate that the optimal weight function changes depending on the case (with or without iterations) but keeps the S shape (for space limitation, the associated figure is not presented).In addition, using the iterations, we observe an improvement in the performance of the DBRFA method.This improvement varies from one region to another, where it is more significant in Quebec than in Texas and Arkansas (Table 2).This is another result indicating a difference between Quebec and the two other regions.Note that similar results are found for other families of weight functions and for different optimization criteria.In conclusion, the depth iterative step in the DBRFA before weight optimization is important.
In order to examine the convergence speed in terms of the performance criteria, we present the variations of these criteria as a function of depth iteration for different weight functions (Fig. 7).The employed coefficient values of the weight functions are those minimizing the RRMSE (Table 1).We observe a rapid convergence (5 iterations) to the RRMSE values in Table 1 for Arkansas and Texas (Fig. 7b, c), whereas, for Quebec (Fig. 7a) it requires more than 20 iterations to converge to the results in Table 1.These results could be again due to the level of homogeneity in the region.
To compare the relative errors of flood quantile estimates obtained by different approaches for the three regions, Fig. 8 Table 1.Quantile estimation result for Quebec with available approaches and their references.illustrates these errors with respect to the logarithm of basin area.The weight functions used are those optimizing the RRMSE.It is generally observed that the DBRFA relative errors are lower than those obtained with the CCA approach.We also observe large negative errors for some sites, such as number 64 and 66 in southern Quebec, 180 and 175 in Arkansas and 62 and 69 in Texas.
In this paper, the optimal DBRFA approach is mainly compared with the basic formulation of one of the most popular RFA approaches, which is the CCA approach.However, different variants of the latter are developed and are available in the literature, such as the ensemble artificial neural networks-CCA approach (EANN-CCA) (Shu and Ouarda, 2007) and the kriging-CCA approach (Chokmani and Ouarda, 2004).In order to insure the optimality of the optimal DBRFA, it is of interest to expend the above comparison to those approaches.
A comprehensive comparison requires presentation of these approaches as well a number of data sets for the considered regions.Some of the data sets are not available for the regions of Texas and Arkansas, e.g., at-site peak flows to estimate at-site quantiles as hydrological variables.However, all these approaches are already applied to the region of Quebec in different studies.Table 3 summarizes the obtained results for all those methods along with those of the DBRFA approach.The results indicate that the optimal DBRFA performs better than the available approaches both in terms of RB and RRMSE, except a very slight difference of 1 % in the RRMSE of QS10 with EANN-CCA.This could be related to the numerical approximations in the computational algorithms.

Fig. 1 .
Fig. 1.Illustration of Gompertz function: (a) c varies with fixed a and b, (b) a varies with fixed b and c and (c) b varies with fixed a and c.
class ofφ, criterion and optimization method Initialization site l=1 Exclude site l temporarily

Fig. 2 .
Fig. 2.An overview diagram summarizing the optimization procedure of the DBRFA approach.

Fig. 3 .
Fig. 3. Optimal value of the neighborhood coefficient α for the CCA approach for (a) southern Quebec, (b) Arkansas and (c) Texas.The first column illustrates the RB and the second column illustrates the RRMSE.

Fig. 5 .
Fig. 5. Optimal weight functions for (a) southern Quebec, (b) Arkansas and (c) Texas.The first column illustrates the weight function's optimal with respect to RRMSE and the second column illustrates the weight function's optimal with respect to RB.

Fig. 6 .
Fig. 6.Weight allocated to each gauged site to estimate the targetsite number 25 in the Texas region in the canonical hydrological space (W1, W2) using (a) CCA with optimal α and (b) the DBRFA approach with optimal ϕ G .

Fig. 7 .
Fig. 7. Variation of criteria (RB and RRMSE) as a function of the depth iteration number for the estimation of (a) QS100 -southern Quebec, (b) Q50 -Arkansas and (c) Q50 -Texas.

Fig. 8 .
Fig. 8. Relative quantile errors using (a) ϕ CCA and (b) ϕ G .The first column illustrates the error of QS100 in southern Quebec, the second column illustrates the errors of Q50 in Arkansas and the third column illustrates the errors of Q50 in Texas.