Comparison of region-of-influence methods for estimating high quantiles of precipitation in a dense dataset in the Czech Republic

Abstract. In this paper, we implement the region-of-influence (ROI) approach for modelling probabilities of heavy 1-day and 5-day precipitation amounts in the Czech Republic. The pooling groups are constructed according to (i) the regional homogeneity criterion (assessed by a built-in regional homogeneity test), which requires that in a pooling group the distributions of extremes are identical after scaling by the at-site mean; and (ii) the 5T rule, which sets the minimum number of stations to be included in a pooling group for estimation of a quantile corresponding to return period T. The similarity of sites is evaluated in terms of climatological and geographical site characteristics. We carry out a series of sensitivity analyses by means of Monte Carlo simulations in order to explore the importance of the individual site attributes, including hybrid pooling schemes that combine both types of site attributes with different relative weights. We conclude that in a dense network of precipitation stations in the Czech Republic (on average 1 station in a square of about 20 × 20 km), the actual distance between the sites plays the most important role in determining the similarity of probability distributions of heavy precipitation. There are, however, differences between the optimum pooling schemes depending on the duration of the precipitation events. While in the case of 1-day precipitation amounts the pooling scheme based on the geographical proximity of sites outperforms all hybrid schemes, for multi-day amounts the inclusion of climatological site characteristics (although with much lower weights compared to the geographical distance) enhances the performance of the pooling schemes. This finding is in agreement with the climatological expectation since multi-day heavy precipitation events are more closely linked to some typical precipitation patterns over central Europe (related e.g. to the varied roles of Atlantic and Mediterranean influences) while the dependence of 1-day extremes on climatological characteristics such as mean annual precipitation is much weaker. The findings of the paper show a promising perspective for an application of the ROI methodology in evaluating outputs of regional climate models with high resolution: the pooling schemes might serve for defining weighting functions, and the large spatial variability in the grid-box estimates of high quantiles of precipitation amounts may efficiently be reduced.

Correspondence to: L. Gaál (ladislav.gaal@gmail.com)


Introduction
Frequency analysis, which aims at estimating recurrence probabilities of rare events, is a specific field of statistical hydrology and climatology that has been intensively developed over recent decades and widely applied in studies of hydrological and climatological phenomena. Frequency analysis usually benefits from a regional approach, applicable if the regional homogeneity criterion is met; that is, the sites that form a given region share the same distribution function of the examined variable apart from a site-specific scaling factor called the index value (Dalrymple, 1960). Different aspects of the regional approach to frequency analysis have been examined in connection with heavy precipitation (e.g. Gellens, 2002; Sveinsson et al., 2002; Fowler and Kilsby, 2003; Boni et al., 2006; Wallis et al., 2007), floods (e.g. Burn, 1997; Madsen and Rosbjerg, 1997; Adamowski, 2000; Kjeldsen et al., 2002; Jingyi and Hall, 2004; Solín, 2008), droughts (e.g. Clausen and Pearson, 1995; Chen et al., 2006), extreme sea levels (e.g. van Gelder et al., 2000) and wind speeds (e.g. Sotillo et al., 2006; Modarres, 2008).
L. Gaál and J. Kyselý: ROI precipitation frequency analysis in the Czech Republic
Advantages of regional frequency models over the at-site approach (which utilizes data from the site of interest only) stem from the reduced uncertainty of the estimated high quantiles at the upper tails of the distributions (e.g. Lettenmaier et al., 1987; Cunnane, 1988; Stedinger et al., 1993) and the fact that the regional methods allow for the estimation of design values at ungauged locations (e.g. GREHYS, 1996a, b; Kohnová et al., 2006).
In the traditional approach to regional frequency analysis, the regions are kept fixed. That is to say, when changing the focus from one site to another within a given region, the information source for the regional transfer remains unchanged (e.g. Hosking and Wallis, 1997). An alternative to regional frequency estimation, the region-of-influence (ROI) approach (Burn, 1990a, b) introduced a fundamentally different concept: the idea of focused pooling. Its main feature is the uniqueness of the "regions" (more precisely, the pooling groups; Reed et al., 1999b), wherein each site under study has its own group of adequately similar sites that form the basis for the transfer of information on extremes to the site of interest. The idea of focused pooling has been adopted in studies of flood flows (e.g. Zrinji and Burn, 1994, 1996; Castellarin et al., 2001; Cunderlik and Burn, 2002; Holmes et al., 2002; Shu and Burn, 2004) and precipitation extremes (Schaefer, 1990; Alila, 1999; Di Baldassare et al., 2006), as well as in complex nationwide projects devoted to the frequency analysis of hydro-climatological extremes (Reed et al., 1999a; Thompson, 2002).
In an analysis of extreme precipitation amounts in Slovakia, Gaál et al. (2008a) adopted the original concept of the ROI approach (Burn, 1990b), even though Burn's original methodology had previously been criticized for requiring a relatively large number of parameters to be set according to subjective considerations (e.g. Hosking and Wallis, 1997). Zrinji and Burn (1994) revisited the ROI methodology: instead of subjectively selected threshold values, they used a built-in regional homogeneity test based on the $\chi^2_R$ statistic (Chowdhury et al., 1991) for assigning sites to a given pooling group. Later, Zrinji and Burn (1996) extended the ROI methodology by a hierarchical feature (Gabriele and Arnell, 1991) that implemented several alternatives to the homogeneity test of Hosking and Wallis (1993). Castellarin et al. (2001) applied the hierarchical pooling methodology of Zrinji and Burn (1996) for a flood frequency analysis in north-central Italy.
The present study attempts to overcome some shortcomings of the methodology applied in Gaál et al. (2008a), particularly with respect to the subjective decisions made in the process of forming the pooling groups. For that purpose, a test of regional homogeneity is incorporated. Further improvements include a detailed sensitivity analysis which examines the performance of various ROI pooling schemes by means of simulation experiments: in addition to those schemes based purely on climatological or geographical site attributes, hybrid pooling schemes are constructed and compared. The performance of the ROI methodology for modelling probabilities of extreme 1-day and multi-day precipitation amounts is evaluated using data from a dense network of rain gauges in the Czech Republic.

Precipitation data
Daily precipitation totals measured at 209 stations mostly operated by the Czech Hydrometeorological Institute (CHMI) were used as the input dataset (Fig. 1). The altitudes of the stations range from 150 to 1490 m a.s.l., and the observations at most sites span the period from 1961 to 2005. Three main criteria were applied when selecting the stations and forming the dataset:
1. spatial coverage: the stations cover the territory of the Czech Republic approximately evenly,
2. relocations of stations: no significant station moves during 1961-2005 (all sites where any location changes exceeded 50 m in altitude were excluded from the analysis), and
3. continuity of records: uninterrupted daily series of precipitation records (except for the sites discussed below).
The data underwent standard quality checking for gross errors. A large majority of the station records cover the whole period of 1961-2005. At 36 of the 209 stations, the daily data cover shorter sub-periods of at least 31 consecutive years (mostly between 38 and 43 years, as the stations started to operate after 1961 or closed before 2005) and/or minor parts of the records had to be omitted owing to station relocations. The overall average record length is 43.9 years.
The dataset is superior to the one employed in Kyselý and Picek (2007a), especially since it involves a much larger number of sites with complete daily records, more evenly covers the territory of the Czech Republic, and extends to the very recent past (December 2005). Furthermore, a few errors were identified in the original dataset and have been corrected.
At 45 stations, minor gaps in the daily records occurred (a total of up to 1 month over 45 years at 32 sites; not exceeding 3 months at any of the 45 sites). We decided to preserve these stations in the analysis because of their locations in areas that are insufficiently covered by rain gauges with complete records. The missing daily data were estimated using measurements at the 2 to 5 nearest locations available in the climatological database of the CHMI; the methodology is described in Kyselý (2008). (Note that the mean distance to the nearest measuring site was 15.4 km for the locations where the missing data were estimated, and the percentage of the missing daily records in the entire dataset was only 0.05%.) All other station records with more than 3 months of missing values were excluded from the analysis.
Samples of annual maxima of 1-day and 5-day precipitation amounts were drawn from each station record and are further examined. The percentage of stations with a trend significant at the 0.05 level is low and close to the nominal value for both characteristics, so the data do not violate the assumption of stationarity.
Basic features of the precipitation regime of the Czech Republic, with a focus on extremes, may be found in Kyselý and Picek (2007a) and Kyselý (2008).

Pooling attributes
The ROI approach is one of the methods of focused pooling and aims at finding groups of sites that share similar statistical properties of the observed hydro-climatological extremes. It is assumed that the frequency distribution of extremes at a given site is related to its climatological, hydrological, geographical, geomorphological or similar attributes. Therefore, one of the basic issues of the pooling procedure is to select site attributes that are useful for explaining the observed distributions of extremes.
In this study, the similarity of sites is evaluated during the pooling process using two different sets of site attributes. The first group of site attributes consists of general climatological characteristics that describe a long-term precipitation regime:
1. mean annual precipitation (MAP),
2. mean ratio of the precipitation totals for warm/cold seasons (RWC), and
3. mean annual number of dry days (DRY), defined as days with precipitation amount ≤0.1 mm.
The warm (cold) season is defined as April-September (October-March). The basic idea of choosing characteristics of the precipitation regime is that the atmospheric mechanisms generating heavy precipitation are similar under similar climatological conditions, particularly when the small extent of the study area is taken into account.
Geographical site characteristics comprise the second group of attributes that are employed to define the sites' proximity:
1. latitude (φ),
2. longitude (λ), and
3. elevation above sea level (h).
The geographical co-ordinates are chosen since the actual proximity of the sites may also result in similar regimes of extreme precipitation.

Concepts of pooling
Since the pooling scheme adopted herein originates from that described in detail in Gaál et al. (2008a), we confine the description to the cornerstones of the procedure and accentuate the changes and improvements in the methodology.
The similarity of sites in the attribute space is usually evaluated by means of a weighted Euclidean distance metric:

$$D_{ij} = \left[ \sum_{m=1}^{M} W_m \left( Y_{im} - Y_{jm} \right)^2 \right]^{1/2}, \qquad (1)$$

where $D_{ij}$ is the weighted Euclidean distance between sites i and j; $W_m$ is the weight associated with the m-th site attribute, expressing its relative importance; $Y_{im}$ is the value of the m-th attribute at site i; and M is the number of attributes. However, we slightly modified this formula in the following way:

$$D_{ij} = \left[ W_G\, G_{ij}^2 + \sum_{m=1}^{M} W_m \left( Y_{im} - Y_{jm} \right)^2 \right]^{1/2}, \qquad (2)$$

where $G_{ij}$ is the actual geographical distance between sites i and j, and $W_G$ is its weighting coefficient. $G_{ij}$ is determined according to the relationship for the distance between pairs of points $[\varphi_i, \lambda_i]$ and $[\varphi_j, \lambda_j]$ on the surface of a sphere (Weisstein, 2002a):

$$G_{ij} = R \arccos\left[ \sin\varphi_i \sin\varphi_j + \cos\varphi_i \cos\varphi_j \cos\left( \lambda_i - \lambda_j \right) \right], \qquad (3)$$

where R denotes the Earth's radius (R = 6371 km). Before determining the elements $D_{ij}$ of the distance metric or dissimilarity matrix D, the attributes undergo standardization in order to remove possible bias from the estimation due to different magnitudes. In this study, the attributes (except for the latitude and longitude) were divided by their sample standard deviations while the values of $G_{ij}$ were divided by the standard deviation of non-zero elements of the distance matrix G. For settings of $W_m$ and $W_G$ see Sect. 4.2.
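The distance computation described above can be illustrated with a minimal Python sketch. The function names `great_circle_km` and `dissimilarity` are hypothetical, and the sketch assumes that the geographical distance and the attribute values passed to `dissimilarity` have already been standardized as described in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Earth's radius R as given in the text


def great_circle_km(lat_i, lon_i, lat_j, lon_j):
    """Spherical distance between two points given in degrees (arccos form)."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_lambda = math.radians(lon_i - lon_j)
    cos_angle = (math.sin(phi_i) * math.sin(phi_j)
                 + math.cos(phi_i) * math.cos(phi_j) * math.cos(d_lambda))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def dissimilarity(g_ij, attrs_i, attrs_j, w_g, w_m):
    """Weighted distance combining a geographical term and M other attributes.

    g_ij    : standardized geographical distance between sites i and j
    attrs_* : standardized attribute values (same order at both sites)
    w_g     : weight of the geographical distance
    w_m     : weights of the remaining attributes
    """
    total = w_g * g_ij ** 2
    total += sum(w * (a - b) ** 2 for w, a, b in zip(w_m, attrs_i, attrs_j))
    return math.sqrt(total)
```

Setting `w_g` to zero reduces the second function to the purely attribute-based metric, while an empty attribute list leaves only the geographical term.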
It is important to point out the difference between two types of the site attributes, which are usually termed "characteristics" and "statistics". Site characteristics are quantities independent of whether or not daily measurements of precipitation are carried out at a given site. These include geographical co-ordinates, geomorphological attributes and, to some extent, descriptors of the long-term precipitation regime. On the other hand, site statistics result from statistical processing of the data observed at a given site. It is generally recommended (Hosking and Wallis, 1997; Castellarin et al., 2001) to use site characteristics in the process of forming the regions or pooling groups, while one should take advantage of site statistics in the process of testing the homogeneity of a proposed group of sites.
Pooling groups in the ROI approach are generally constructed using elements $D_{ij}$ arranged in ascending or descending order, but there are basically two different ways to accomplish this. The core idea of the first method lies in gradually building up the pooling groups (termed herein as the "forward" approach). Starting with the target site i, which represents a single-site pooling group at the very beginning of the process, the next closest site (i.e. the site with the next lowest value of $D_{ij}$, $j = 1, \ldots, N$) is appended to the existing ROI in each turn as long as a given condition for forming the ROI is met. The process of building up the ROI may be terminated (i) at a given point, defined as a function of the selected quantiles of the dissimilarity matrix D (Burn, 1990b); (ii) when the measure of the regional homogeneity of the proposed group of sites reaches or exceeds an unacceptable level (Castellarin et al., 2001); or (iii) when the size of the proposed pooling group reaches or exceeds a desired threshold value (Jakob et al., 1999). A reversed procedure ("backward" approach) is adopted in the second method of pooling: in its initial stage, all sites in the analysis are supposed to form a "superregion" and, step by step, the most dissimilar sites are removed from the bulk of the sites until the remaining group of sites is homogeneous (Zrinji and Burn, 1994).
Point (iii) above is particularly appealing for pooling methodologies such as the ROI since the composition of a site's pooling group may be accommodated to the target return period. The "5T rule" (Jakob et al., 1999) is one of the most frequently referenced rules of thumb to account for the need for different amounts of information for different target return periods. The 5T rule suggests that one needs 5×T station-years of data for a reliable estimation of a quantile corresponding to the return period T. Considering the fact that the average length of observations at the stations involved in the present analysis is ∼44 years (see Sect. 2.1), for estimation of the 5-year quantiles the at-site approach is acceptable, while it is desirable to have at least $N_T = 2$ (12) sites in a pooling group for a reliable estimation of the 10 (100)-year precipitation quantiles.
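The arithmetic behind the 5T rule can be written as a short Python sketch. The function name `target_group_size` is hypothetical, and the sketch assumes that every station contributes the average record length of 43.9 years quoted for this dataset.

```python
import math


def target_group_size(return_period, mean_record_years=43.9):
    """Smallest number of sites providing at least 5*T station-years of data."""
    station_years_needed = 5 * return_period
    return max(1, math.ceil(station_years_needed / mean_record_years))


if __name__ == "__main__":
    for T in (5, 10, 100, 200):
        print(T, target_group_size(T))
```

For T = 5, 10 and 100 years this gives 1, 2 and 12 sites, in line with the values quoted above; T = 200 years gives 23 sites.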
We implemented the regional homogeneity test of Lu and Stedinger (1992) when forming the pooling groups, for two reasons: (i) its application is computationally straightforward, and (ii) according to the comparative study of Fill and Stedinger (1995), it is one of the most powerful homogeneity tests. A brief description of Lu and Stedinger's homogeneity test, also called the X10 test, is given in the Appendix.
We tested both the "forward" and "backward" approaches to forming homogeneous ROIs, and then decided to form them primarily by building them up gradually (i.e. using the "forward" approach). The main deficiency of the "backward" procedure was that, in some cases, it tended to produce very large homogeneous pooling groups that did not vary much from site to site. This resulted in undesirable spatial smoothing of the estimated quantiles (cf. Castellarin et al., 2001).
Two requirements are imposed on the pooling groups in the present study: they should meet (i) the homogeneity criterion, and (ii) the 5T rule. With only the first criterion, the following simple iteration procedure is applicable: in each step, the next most similar site is added to the existing ROI and the homogeneity of the proposed pooling group is tested. If the proposed ROI is homogeneous, then the procedure goes on with the next loop; otherwise (i.e. if heterogeneity is detected), the procedure is stopped and the formation of the given ROI is finished. When the 5T rule must be met at the same time, however, the result of the iterative procedure may not be sufficient. Problems may occur when the heterogeneity is reached relatively early, i.e. after a few (<4-6) iterations, which is not plausible for longer return periods (T = 50 years or more). We tried adopting the idea suggested by Castellarin et al. (2001) to stop the iteration procedure when the heterogeneity of the ROI is detected for the second time (instead of the first time), but the number of groups consisting of a small number of sites was still large. Therefore, the following scheme is proposed for the pooling procedure in the present study:
- At the very beginning, the ROI of the target site consists of the target number of stations $N_T$ (i.e. it comprises the site itself and the $N_T - 1$ closest sites).
- If the initial pooling group of size $N_T$ is homogeneous, there is no need to start iterations; $N_T$ defines the final size of the pooling group.
- If the initial pooling group of size $N_T$ is heterogeneous, the iteration procedure of testing the homogeneity and adding the next closest site to the pooling group starts. It goes on until the first homogeneous pooling group is found or the set of remaining sites to add is empty. The (first) homogeneous pooling group defines the final composition of the pooling group for the site.
- If no homogeneous stage is reached by successively adding sites to the initial ROI, the program code returns to the initial stage with $N_T$ sites and starts looking for a homogeneous composition by removing the least similar sites from the pooling group. The first homogeneous stage then defines the final composition of the pooling group for the site.
- In the worst case, when neither the building-up nor the removal procedure leads to a homogeneous stage, the ROI consists of nothing but the target site (i.e. it is a single-site pooling group).
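The steps listed above can be sketched in Python as follows. This is an illustration rather than the code used in the study: the function name `form_pooling_group` is hypothetical, and the homogeneity test is passed in as a generic callback `is_homogeneous` (for instance a wrapper around an X10-type test) that returns True or False for a proposed group.

```python
from typing import Callable, List, Sequence


def form_pooling_group(ranked_sites: Sequence[str],
                       n_target: int,
                       is_homogeneous: Callable[[List[str]], bool]) -> List[str]:
    """Form the pooling group of one target site.

    ranked_sites   : site identifiers sorted by ascending dissimilarity to the
                     target site, with the target site itself in first position
    n_target       : target group size N_T implied by the 5T rule
    is_homogeneous : callback returning True if a proposed group passes the
                     regional homogeneity test (an assumed interface)
    """
    n_sites = len(ranked_sites)
    n_target = max(1, min(n_target, n_sites))

    # Step 1: start from the target site and its N_T - 1 closest sites.
    group = list(ranked_sites[:n_target])
    if is_homogeneous(group):
        return group

    # Step 2: add the next closest site until a homogeneous group is found.
    for size in range(n_target + 1, n_sites + 1):
        candidate = list(ranked_sites[:size])
        if is_homogeneous(candidate):
            return candidate

    # Step 3: go back to N_T sites and remove the least similar ones in turn.
    for size in range(n_target - 1, 1, -1):
        candidate = list(ranked_sites[:size])
        if is_homogeneous(candidate):
            return candidate

    # Step 4: fall back to a single-site pooling group.
    return [ranked_sites[0]]
```

Keeping the test behind a callback makes it possible to exercise the control flow with simple stand-ins before a real homogeneity test is plugged in.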
The application of this procedure means that, in contrast to the scheme adopted in Gaál and Kyselý (2009), the size of the final pooling groups depends on T .
The T-year growth factors (i.e. the T-year values of the cumulative distribution function of dimensionless data; furthermore, the term "growth curve" denotes a set of growth factors for different return periods; cf. Stewart et al., 1999) and precipitation quantiles are estimated using the L-moment-based index storm procedure (Hosking and Wallis, 1997). In the initial step, dimensionless data are calculated by rescaling the original data by the sample mean $\mu_j$ (index storm):

$$x_{jk} = \frac{X_{jk}}{\mu_j}, \quad j = 1, \ldots, N; \; k = 1, \ldots, n_j, \qquad (4)$$

where $X_{jk}$ ($x_{jk}$) denotes the original (dimensionless or rescaled) data, N is the number of sites, and $n_j$ denotes the sample size of the j-th site. The dimensionless values of $x_{jk}$ at site j are then used to compute the sample L-moments $l_1^{(j)}, l_2^{(j)}, \ldots$ and L-moment ratios:

$$t^{(j)} = \frac{l_2^{(j)}}{l_1^{(j)}} \qquad (5)$$

and

$$t_r^{(j)} = \frac{l_r^{(j)}}{l_2^{(j)}}, \quad r = 3, 4, \ldots, \qquad (6)$$

where $t^{(j)}$ is the sample L-coefficient of variation (L-CV) and $t_r^{(j)}$, $r = 3, 4, \ldots$ are the sample L-moment ratios at site j (Hosking, 1990). The case r = 3 defines the sample L-skewness $t_3^{(j)}$; the L-moment ratios of higher degree (r > 3) are not of practical use herein.
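The rescaling and the sample L-moment ratios can be computed as in the Python sketch below. It uses the standard unbiased estimators based on probability-weighted moments (Hosking, 1990); the function names are hypothetical, and the sketch assumes at least three observations that are not all equal.

```python
def rescale_by_mean(values):
    """Divide the at-site data by the sample mean (the index storm)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def sample_l_moment_ratios(values):
    """Return (l1, L-CV, L-skewness) of a sample.

    Uses the unbiased probability-weighted moments b0, b1, b2 computed
    from the sample sorted in ascending order.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2 / l1, l3 / l2
```

Because both ratios are scale-invariant, applying the function to the original or to the rescaled data gives the same L-CV and L-skewness; only the first L-moment changes, and it equals 1 for the rescaled data.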
The pooled L-moment ratios $t^{(i)R}$ and $t_3^{(i)R}$ for the target site i are derived from the at-site sample L-moment ratios as their weighted averages:

$$t^{(i)R} = \frac{\sum_{j=1}^{N} W_{ij}\, t^{(j)}}{\sum_{j=1}^{N} W_{ij}}, \qquad (7)$$

where $W_{ij}$ are the weights associated with the j-th site in the analysis. A relationship analogous to Eq. (7) holds true also for $t_3^{(i)R}$. In a traditional regional analysis, based on regions with a fixed structure (e.g. Hosking and Wallis, 1997), the weighting coefficients $W_{ij}$ are proportional only to the record length $n_j$ for all sites j within a given region (i.e. sites with longer observations provide more information in the regionally averaged statistics). While this concept is also retained in focused pooling, an additional factor, the reciprocal value of the distance metric element $D_{ij}$, is introduced (Castellarin et al., 2001):

$$W_{ij} = \begin{cases} n_j / D_{ij} & \text{for } j \in \mathrm{ROI}_i,\ j \neq i, \\ n_j / D_{ij,\min} & \text{for } j = i, \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

where $\mathrm{ROI}_i$ stands for the region of influence of the site i, and where $D_{ij,\min}$ is the lowest non-zero value of the distance metric between the target site i and all other sites j (Castellarin et al., 2001). (Note that $D_{ii} = 0$ for j = i, which would lead to $W_{ii} = \infty$ if $D_{ij}$ was used in Eq. 8.) Using the reciprocal value of the distance metric element $D_{ij}$ as the pooled weighting factor is equivalent to assigning higher weights to sites that lie in the proximity of the target site in the attribute space: the smaller $D_{ij}$ is for site j, the greater the amount of information it brings to the procedure for the growth curve estimation at site i. The (weighted) pooled L-moment ratios $t^{(i)R}$ and $t_3^{(i)R}$ are then used to estimate the parameters of the GEV distribution and the pooled growth curve. A quantile corresponding to the return period T at site i is calculated as a product of the dimensionless T-year growth factor $x_i^T$ and the index storm $\mu_i$:

$$X_i^T = x_i^T\, \mu_i.$$

Throughout this paper, however, results of the simulation experiments are shown for dimensionless T-year growth factors (cf. Gaál et al., 2008a).
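A minimal Python sketch of the pooled estimation is given below. The weighting follows the description above; the GEV fit uses the widely used approximation of Hosking and Wallis (1997) for the shape parameter, which is an assumption here since the text does not spell out the estimator. All function names are hypothetical, and the inputs refer only to the sites inside the pooling group.

```python
import math

EULER_GAMMA = 0.5772156649015329


def pooling_weights(record_lengths, distances):
    """Weights n_j / D_ij for the members of one pooling group.

    distances holds D_ij from the target site to each member; the zero
    distance of the target site to itself is replaced by the smallest
    non-zero distance.
    """
    d_min = min(d for d in distances if d > 0)
    return [n / max(d, d_min) for n, d in zip(record_lengths, distances)]


def pooled_ratio(ratios, weights):
    """Weighted average of at-site L-moment ratios."""
    return sum(w * t for w, t in zip(weights, ratios)) / sum(weights)


def gev_growth_factor(t, t3, return_period):
    """T-year growth factor of a GEV fitted to dimensionless data (l1 = 1, l2 = t)."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c          # approximate shape parameter
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(k) < 1e-8:                        # Gumbel limit of the GEV
        alpha = t / math.log(2.0)
        xi = 1.0 - EULER_GAMMA * alpha
        return xi - alpha * math.log(y)
    gamma_1k = math.gamma(1.0 + k)
    alpha = t * k / ((1.0 - 2.0 ** (-k)) * gamma_1k)
    xi = 1.0 - alpha * (1.0 - gamma_1k) / k
    return xi + alpha * (1.0 - y ** k) / k
```

Multiplying the returned growth factor by the at-site mean of the annual maxima then gives the T-year precipitation quantile at the target site.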

Framework for the inter-comparison of pooling schemes
The performance of the individual pooling schemes based on different combinations of climatological and geographical site attributes (Sect. 2.2) is assessed by means of Monte Carlo simulation procedures.
The essential issue of the Monte Carlo simulation is the way the unknown parent (or "true") distribution of the extremes is estimated. We decided to estimate the "true" at-site distribution by adopting a region-of-influence approach in which the similarity of sites is determined according to statistical properties of the at-site data samples of 1-day/5-day precipitation maxima (abbr. ROIsta), as in Castellarin et al. (2001) and Gaál et al. (2008a). Three site statistics were selected (cf. Burn, 1990b; Gaál et al., 2008a):
1. the coefficient of variation, $c_v = \sigma / \mu$, where μ (σ) is the sample mean (standard deviation);
2. Pearson's 2nd skewness coefficient, $PS = 3(\mu - m) / \sigma$, where m is the sample median (Weisstein, 2002b); and
3. the 10-year growth factor of precipitation, estimated using the GEV distribution ($x_{10}$).
The selected statistics characterize the scale ($c_v$), shape (PS) and location ($x_{10}$) of the empirical distribution. A pooling scheme based on the site statistics is supposed to result in groups of sites that have a frequency distribution of extremes similar to the target site.
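The first two site statistics can be obtained directly from a sample of annual maxima, as in the following Python sketch. The function name `site_statistics` is hypothetical, and the use of the sample standard deviation with the n − 1 denominator is an assumption. The third statistic, the 10-year growth factor, requires a GEV fit and is left out here.

```python
import statistics


def site_statistics(annual_maxima):
    """Return (coefficient of variation, Pearson's 2nd skewness coefficient)."""
    mu = statistics.mean(annual_maxima)
    sigma = statistics.stdev(annual_maxima)   # n - 1 denominator (assumed)
    median = statistics.median(annual_maxima)
    c_v = sigma / mu
    ps = 3.0 * (mu - median) / sigma
    return c_v, ps
```

A positive PS indicates that the mean exceeds the median, i.e. a right-skewed sample, which is typical of precipitation maxima.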
The "reference" ROI pooling group for estimating the "true" growth curve is constructed in a slightly different way than the examined ROI pooling schemes. While in the latter case, the size of the pooling groups is adjusted to the target return period T, the ROI pooling group for estimating the "true" growth curve at a given site is independent of the actual target return period. We require that the size of the pooling groups for constructing the "true" quantiles be about $N_{T\,\mathrm{ref}} = 23$, which corresponds to the (sufficiently large) return period of 200 years according to the 5T rule. The idea behind this approach is that if there is a single "true" (and unknown) distribution for a given site, data used for estimating the "true" quantiles should not depend on the actual target return period. Therefore, the choice of a fixed size $N_{T\,\mathrm{ref}}$ allows for having the same platform for comparison of the examined pooling schemes. Except for different $N_{T\,\mathrm{ref}}$ (see also Sect. 4.3, in which the size of pooling groups is discussed), the procedure is the same as described in Sect. 3.1 for the examined ROI pooling schemes (i.e. homogeneity of the ROI is required).
In each loop of the Monte Carlo simulation procedure, samples of annual maxima that resemble the real world (in terms of the actual number of sites, length of the observations, and spatial correlations between the sites) are drawn for each site from the parent GEV distribution, parameters of which are given by the pooled L-moments according to the ROIsta pooling scheme. Having simulated the at-site samples, the pooling schemes specified in Sect. 4.1 and 4.2 are applied to estimate the T-year growth factors, which are then compared with the "true" ones obtained by the ROIsta pooling scheme. The loops of the Monte Carlo simulations are repeated 5000 times.
The different pooling schemes are compared by means of the bias and (primarily) the root mean square error (RMSE) statistics. For a given return period T, both statistics are evaluated over all sites and repetitions, where i (m) is the index over the sites (repetitions); N (M) is the number of sites (repetitions); $x_i^T$ is the "true" T-year growth factor at site i; and $\hat{x}_{i,m}^T$ is the estimated T-year growth factor at site i from the m-th sample of the Monte Carlo simulation.
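One way such statistics could be computed is sketched below in Python. The sketch assumes relative errors with respect to the "true" growth factor, evaluated per site over the repetitions and then averaged over the sites, which is a common convention in regional frequency analysis; the exact definitions used in the study may differ. The function name `bias_and_rmse` is hypothetical.

```python
import math


def bias_and_rmse(true_factors, estimates):
    """Average relative bias and relative RMSE over sites.

    true_factors : true T-year growth factor for each site i
    estimates    : estimates[i][m] is the growth factor estimated at site i
                   from the m-th simulated sample
    """
    site_bias, site_rmse = [], []
    for x_true, simulated in zip(true_factors, estimates):
        rel_errors = [(x_hat - x_true) / x_true for x_hat in simulated]
        site_bias.append(sum(rel_errors) / len(rel_errors))
        site_rmse.append(math.sqrt(sum(e * e for e in rel_errors) / len(rel_errors)))
    n_sites = len(true_factors)
    return sum(site_bias) / n_sites, sum(site_rmse) / n_sites
```

Keeping the at-site values in `site_bias` and `site_rmse` before averaging also makes it straightforward to summarize their spread across sites, for example in box-and-whisker plots.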
The Monte Carlo simulation procedure and some related considerations are described in more detail in Gaál et al. (2008a, Sect. 4).

Sensitivity analysis
In the first step, a sensitivity analysis was performed to explore the role of different site characteristics and site statistics entering the dissimilarity matrix D (Eq. 1) while considering the weighting coefficients as unimportant ($W_m = W_G = 1$). The basic ROI schemes were analogous to those used in Gaál et al. (2008a). The models were based on 3 climatological (geographical) site characteristics (Sect. 2.2) and labelled as ROIcli3 (ROIgeo3), and both were associated with the model ROIsta based on 3 site statistics (ROIsta3) used for estimating the "true" quantiles during the simulation procedures (Sect. 3.3). The sensitivity analysis examined the performance of the ROI models after removing one or two site attributes from the basic ROI pooling scheme (ROIcli3, ROIgeo3) or the "true" frequency model (ROIsta3).
The analysis was divided into two parts: (i) examining the effects of changes made to the basic ROI schemes while keeping the "true" model unchanged, and (ii) examining the effects of changes in the "true" frequency model while using the basic ROI schemes with 3 parameters. The different alternatives of the newly constructed ROI pooling schemes and the modified "true" frequency models are summarized in Table 1. The number of alternative ROIgeo models is reduced: while the ROIcli alternatives make use of all 6 possible combinations of the 3 available climatological attributes into singles (labelled as ROIcli1a, b and c) or pairs (labelled as ROIcli2a, b and c), there are no reasons for using simplified ROIgeo models other than those based purely on elevation (ROIgeo1) or the pair of geographical co-ordinates (ROIgeo2). Furthermore, the modified "true" frequency models are based only on pairs of possible combinations of the site statistics defined in Sect. 3.3 (labelled as ROIsta2a, b and c) since it is unreasonable to construct "true" models based purely on one statistic. The sensitivity analysis was performed for both datasets of the 1-day and 5-day annual maxima.

Changes in the ROI pooling schemes
First, we focus on the consequences of the changes made to the basic pooling schemes ROIcli3 and ROIgeo3. The summary statistics of the models' performance in terms of the average RMSE for the quantiles of the estimated distributions of the 1-day (5-day) maxima corresponding to T = 10, 20, 50 and 100 years are given in Table 2. The box-and-whisker plots of both statistics in Figs. 2 and 3 illustrate the spread statistics for return periods T = 20 and 100 years.
As expected, the frequency behaviour of the precipitation extremes cannot be explained by a single climatological characteristic. This is demonstrated by the fact that the ROI models based purely on a single site attribute show clearly the poorest performance (Figs. 2 and 3). The ROIcli2 models based on two site attributes perform generally better. Of the three models working with the climatological characteristics, the one with MAP and DRY is inferior, which suggests that RWC is the most important attribute. However, the best average RMSE among the models based on the climatological characteristics is obtained for the basic ROIcli3 model for both datasets (Table 2). These findings are supported also by the box plots (Figs. 2 and 3), as the smallest values of the 5th and 25th percentiles and median of the RMSE, and all quantiles of the bias, are found for the ROIcli3 pooling scheme. While for the ROIcli models more site attributes improve performance, a similar conclusion cannot be drawn for the ROIgeo pooling schemes: the ROIgeo2 model always outperforms the basic ROIgeo3 model, both in terms of the RMSE and bias statistics (Table 2, Figs. 2 and 3). Such behaviour is accounted for by the role of elevation in ROIgeo3. While in ROIgeo2 sites are pooled according to the geographical distance from the site of interest, ROIgeo3 gives preference to sites that are located at altitudes similar to that of the target site (cf. Fig. 4 and related discussion in Gaál and Kyselý, 2009).
model in terms of RMSE (except for high return levels and the ROIsta2c model, which is the inferior one), and both ROIgeo2 and ROIgeo3 pooling schemes perform clearly better than ROIcli3 (Table 3).

Hybrid pooling schemes
A further extensive simulation experiment was carried out in order to identify the optimal setting of the ROI pooling schemes when merging both climatological and geographical site characteristics in hybrid pooling schemes and assigning different weighting coefficients to the selected site attributes of the hybrid pooling schemes.
We constructed seven series of hybrid pooling schemes based on different combinations of site attributes, which are further differentiated according to the values of the weights assigned. The series of the pooling schemes are labelled as ROIhybA to ROIhybG (Table 4). Note that the pooling scheme ROIhybA is not a hybrid in a strict sense since it only makes use of the geographical site characteristics. Inasmuch as the same simulation strategy is applied, however, ROIhybA is also included in this series of experiments. ROIhybB includes all the 6 site attributes appearing in this study.
In the other models, we excluded some of the less important climatological and/or geographical site attributes (based on the results of the sensitivity analysis, Sect. 4.1), which are DRY and MAP on the one hand and elevation on the other. The weighting coefficients were assigned to the series of the pooling schemes ROIhybA-ROIhybG according to the following considerations: since the geographical distance is an important indicator of sites' similarity (Sect. 4.1), the weight W_G is chosen as the basic parameter. W_G takes values from 1.00 to 0.00 in constant steps of 0.05. So that the weights sum to one, the remaining value of (1 − W_G) is evenly distributed between the other site attributes involved in the given series of the pooling schemes. Mathematically:

$$W_G + \sum_{m=1}^{M} W_m = 1$$

and

$$W_m = \frac{1 - W_G}{M}, \quad m = 1, \dots, M,$$

where M is the total number of the site attributes other than latitude and longitude (Table 4). The individual pooling schemes are therefore labelled as geo1.00, geo0.95, ..., geo0.00. Note that in each series ROIhybA-ROIhybG, the pooling scheme geo1.00 is the same and corresponds to the ROIgeo2 pooling scheme from Sect. 4.1. Other special cases are the following:
- ROIgeo3 = geo0.50 in ROIhybA,
- ROIcli3 = geo0.00 in ROIhybC,
- ROIcli2a = geo0.00 in ROIhybE, and
- ROIcli2c = geo0.00 in ROIhybG.
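The weighting rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used in the study: the function names `hybrid_weights` and `scheme_labels` are hypothetical, and exact fractions are used so that values such as 1/15 are reproduced without rounding.

```python
from fractions import Fraction


def hybrid_weights(w_g, n_other):
    """Return (W_G, W_m) for a hybrid scheme (hypothetical helper).

    w_g     -- weight of the geographical distance, e.g. "0.80"
    n_other -- M, the number of site attributes other than latitude/longitude
    The remainder (1 - W_G) is split evenly among the M other attributes.
    """
    w_g = Fraction(w_g)
    if not 0 <= w_g <= 1:
        raise ValueError("W_G must lie between 0 and 1")
    if n_other == 0:
        # Without other attributes, all the weight must sit on the distance.
        if w_g != 1:
            raise ValueError("W_G must equal 1 when M = 0")
        return w_g, Fraction(0)
    return w_g, (1 - w_g) / n_other


def scheme_labels(step=Fraction(1, 20)):
    """Labels geo1.00, geo0.95, ..., geo0.00 for a constant step of 0.05."""
    n_steps = int(1 / step)
    return [f"geo{float(1 - k * step):.2f}" for k in range(n_steps + 1)]


# ROIhybC-geo0.80: three climatological attributes share 1 - 0.80 = 0.20.
w_g, w_m = hybrid_weights("0.80", 3)
print(w_g, w_m)            # 4/5 1/15
print(scheme_labels()[:3])  # ['geo1.00', 'geo0.95', 'geo0.90']
```

With M = 1 and W_G = 0.50 the same helper returns equal weights of 1/2 for the distance and the single remaining attribute, which matches the special case ROIgeo3 = geo0.50 in ROIhybA.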
The results of the simulation experiments are summarized in Tables 5 and 6 and Figs. 4 and 5. The box plots are shown for the 100-year quantiles only since the results are similar for shorter return periods.
For 1-day precipitation amounts, the average values of RMSE in Table 5 as well as the median and inter-quartile range (75%-25% percentiles) in Fig. 4 show an increasing tendency from left to right (i.e. the error in the quantile estimates increases with decreasing weight put onto the geographical distance). For 5-day precipitation amounts (Fig. 5), this tendency is overlaid by local minima (characterized by the lowest values of median and the narrowest boxes) around pooling schemes geo0.80-geo0.70, depending on the site attributes involved in a given hybrid scheme. A pattern similar to that of the box plots (Figs. 4 and 5) is also found in terms of the average RMSE values (Tables 5 and 6). In the case of 1-day precipitation maxima, the pooling schemes with the lowest RMSE statistics, for all return periods, are located at the very left side of the table, while for 5-day maxima the best pooling schemes are more scattered. The most remarkable result is that for the ROIhybC series: the model geo1.00 loses its superiority, and the best performance is related to pooling schemes that additionally utilize climatological characteristics (geo0.85-geo0.80).
We conclude that (i) for 1-day precipitation maxima, there is no hybrid pooling scheme that outperforms the pooling scheme ROIgeo2 (i.e. geo1.00 in each series ROIhybA-ROIhybG) based on the actual geographical distance between sites, while (ii) for 5-day precipitation maxima, a few hybrid pooling schemes with performance superior to the ROIgeo2 model in terms of the RMSE statistic can be constructed (ROIhybC: geo0.80 and geo0.95, ROIhybG: geo0.85 and geo0.95). The best of these hybrid pooling schemes (i.e. the one with the lowest RMSE values) is ROIhybC-geo0.80, which utilizes all three climatological site characteristics with equal weights 1/15, and the geographical distance between sites with a weighting factor of 12/15.
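To make the role of the weights in a scheme such as ROIhybC-geo0.80 concrete, the following sketch evaluates a weighted Euclidean dissimilarity between a target site and a candidate site. It rests on assumptions that are not spelled out in this section: the weighted Euclidean form is the one commonly used in region-of-influence studies, all inputs are taken to be standardized beforehand, and the function name `hybrid_dissimilarity` is hypothetical.

```python
import math


def hybrid_dissimilarity(geo_dist, cli_diffs, w_g):
    """Weighted Euclidean dissimilarity between two sites (illustrative only).

    geo_dist  -- standardized geographical distance between the two sites
    cli_diffs -- standardized differences in the other site attributes
    w_g       -- weight W_G of the geographical distance (0..1)
    Assumption: the remaining weight (1 - W_G) is shared equally by the
    attributes in cli_diffs, as in the weighting rule of Sect. 4.2.
    """
    m = len(cli_diffs)
    w_m = (1.0 - w_g) / m if m else 0.0
    squared = w_g * geo_dist ** 2 + sum(w_m * d ** 2 for d in cli_diffs)
    return math.sqrt(squared)


# Example with made-up standardized values for MAP, RWC and DRY differences.
d = hybrid_dissimilarity(geo_dist=0.5, cli_diffs=[1.2, 0.3, 0.8], w_g=0.80)
print(round(d, 3))
```

Setting `w_g=1.0` reduces the measure to the geographical distance alone, i.e. the ordering of sites used by a purely distance-based scheme.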

Inter-comparison of the frequency models
Table 7 and Fig. 6 summarize the performance of the pooling schemes (in terms of the RMSE statistics), corresponding to various concepts of forming the ROIs, for 1-day and 5-day precipitation amounts. Three ROI pooling schemes are compared: (i) the ROIcli3 model based on three climatological characteristics MAP, RWC and DRY, all with equal (unit) weights; (ii) the ROIgeo2 model based only on the actual geographical distance between the sites; and (iii) a hybrid pooling scheme, ROIhybC-geo0.80, which is a weighted combination of all climatological characteristics MAP, RWC and DRY with equal weights W_m = 1/15 and the geographical distance with weight W_G = 12/15. Note that while the selected hybrid pooling scheme is seen as the best one for multi-day maxima, we include it for 1-day maxima in the comparison of models for the sake of completeness. These ROI pooling schemes are further compared with the at-site (local) estimates.

Table 5. Performance of the hybrid ROI pooling schemes based on different combinations of climatological and geographical site characteristics for annual maxima of 1-day precipitation amounts. RMSE_T denotes average root mean square error of the estimated growth factors corresponding to return period T [years], expressed in %. The three smallest values of the statistics are marked in bold, and the smallest value is underlined. The heading shows the weighting coefficient for actual geographical distance between sites. The settings of the series of individual pooling schemes are summarized in Table 4.
The average values of the root mean square error in Table 7 reveal that the at-site estimation is inferior to the pooling approaches for all return periods. This also holds true for the shortest return period T = 10 years, for which the smallest pooling groups (usually of size N = 2, cf. Table 8) are constructed according to the requirements of the 5T rule (Sect. 3.1). With increasing return period, the at-site estimation falls increasingly behind the pooling schemes in terms of RMSE (Table 7). The poor performance of the at-site approach is explained by the enhanced effects of sampling fluctuations, which are reduced by the multi-site approach in the pooling schemes (cf. Hosking and Wallis, 1997; Gaál et al., 2008a). Among the ROI models, ROIcli3 clearly shows the worst performance for both durations (Fig. 6, Table 7), obviously owing to there being no specific weights assigned to the three site characteristics and no information on geographical distance involved (cf. Sect. 4.2). On the other hand, there is no universal "best" ROI pooling scheme: in the case of 1-day maxima, the ROIgeo2 model is superior, while for the multi-day maxima, the ROIgeo2 model is outperformed by the hybrid pooling scheme.

Hydrol. Earth Syst. Sci., 13, 2203–2219, 2009, www.hydrol-earth-syst-sci.net/13/2203/2009/
Table 8 summarizes the size of pooling groups when the three selected pooling schemes are applied to estimate the T-year growth factors of 1-day and 5-day precipitation maxima. It is obvious that the majority of the pooling groups are homogeneous according to the Lu and Stedinger test of regional homogeneity in the initial stage of their forming, corresponding to the target size N_T given by the 5T rule (column "N = N_T" in Table 8). Provided that the pooling group is heterogeneous for N_T, the procedure of successively adding similar sites to the pooling group (or removing dissimilar sites if necessary) mostly results in a homogeneous stage. Note that at the end of a pooling procedure, no heterogeneous pooling groups appear. This fact stems from the way the pooling groups are constructed (Sect. 3.1). In the worst case, the pooling procedure ends up with a single-site pooling group; this is observed altogether in 6 cases related to 5 different stations. The reason the pooling procedure fails in these specific cases can be generalised as follows: for any of these 5 stations, the sites that show a considerable degree of similarity with the target site in terms of site attributes (in the attribute space) appear highly dissimilar in the site statistics. In other words, once a small heterogeneous pooling group is constructed, its degree of heterogeneity cannot be considerably reduced either by assigning the next closest sites to this pooling group or by gradually removing sites from it, since the core of the pooling group (the target site and the next few closest sites) still remains heterogeneous.
A further remarkable feature of Table 8 is that precipitation maxima of longer durations show a higher degree of homogeneity compared to those of 1-day duration. This is underpinned by the fact that for the 5-day precipitation amounts the regional homogeneity is reached more often for the target size of the pooling groups N_T.
Generally, inter-comparison of the ROI pooling methods suggests that the hybrid pooling schemes that also include climatological characteristics may surpass the ROI method based on geographical distance for multi-day precipitation extremes, the spatial variability of which is less affected by random (sampling) variations and more closely linked to some regional patterns in central Europe related to atmospheric circulation and orographic features. The regional differences in distributions of the multi-day extremes reflect, for example, the varied influences of cyclones of Mediterranean origin (which often produce heavy multi-day precipitation) between the eastern and western parts of the Czech Republic (e.g. Kyselý and Picek, 2007b). For 1-day precipitation extremes, which are mostly related to convective phenomena in the warm season (88% of annual maxima of 1-day amounts occur in April-September), the ROI method based on geographical characteristics is clearly superior to all other pooling schemes.

Discussion and conclusions
Based on data from a dense network of rain gauges in the  Lu and Stedinger (1992) is incorporated in order to avoid subjective decisions concerning the parameters involved in the ROI methodology and to avoid forming heterogeneous pooling groups for the estimation.The target number of sites in a pooling group is chosen according to the 5T rule (Jakob et al., 1999), i.e. a rule of thumb for the minimum number of sites within a pooling group needed for reliable estimation of a T -year quantile (or growth factor).Consequently, the size of the pooling groups varies with the return period of the growth factor to be estimated.
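As an illustration of how the 5T rule translates into a target group size, the sketch below assumes the usual station-years reading of the rule (the pooled record lengths should add up to at least 5T years) and adds sites in order of decreasing similarity until that total is reached. The function name `target_group_size` and the 40-year record lengths in the example are hypothetical and are not taken from the dataset.

```python
def target_group_size(record_lengths, return_period):
    """Smallest number of sites whose pooled record reaches 5T station-years.

    record_lengths -- record lengths in years, ordered from the target site
                      to progressively less similar sites
    return_period  -- return period T in years
    Returns None if even the full list of sites falls short of 5T.
    """
    needed = 5 * return_period
    pooled = 0
    for n_sites, years in enumerate(record_lengths, start=1):
        pooled += years
        if pooled >= needed:
            return n_sites
    return None


# Hypothetical network in which every station has a 40-year record.
lengths = [40] * 30
for T in (10, 20, 50, 100):
    print(T, target_group_size(lengths, T))
```

Under these assumed record lengths the target size grows from 2 sites for T = 10 years to 13 sites for T = 100 years, which is why the size of the pooling groups varies with the return period.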
The first part of the sensitivity analysis, which examined the consequences of the changes made to the site attribute sets of the pooling schemes (while neglecting the relative weights), confirmed the simple principle "the more attributes included, the better the performance" in the case of climatological site characteristics (used in the ROIcli models). On the other hand, in the case of geographical site characteristics (ROIgeo models), the pooling scheme ROIgeo2 based on geographical distance was found superior to ROIgeo3, which makes use of all three geographical co-ordinates (i.e. latitude, longitude and altitude). In general, both alternatives of the ROIgeo models have their pros and cons. The main drawback of the ROIgeo3 pooling scheme is the tendency to pool sites from considerable distances away from the target site, while the disadvantage of the ROIgeo2 pooling scheme is that it pools sites regardless of their altitudinal zonality. In the light of the dense precipitation dataset available, however, the drawbacks of the ROIgeo2 model are less pronounced. Therefore, ROIgeo2 always outperforms the ROIgeo3 pooling scheme in terms of RMSE of the estimated growth factors in the present application.
The simulation experiments investigated also the effect of changes made to the "true" frequency model, which was used as a common platform for comparison of the examined pooling schemes in the Monte Carlo simulation.The results show that the relative performance of the selected pooling schemes (ROIgeo2, ROIgeo3 and ROIcli3) does not depend on changes made to the reference frequency model: the most (least) acceptable spread statistics appear in the case of the ROIgeo2 (ROIcli3) pooling scheme.
The second part of the sensitivity analysis focused on reasonable combinations of the geographical and climatological site attributes into hybrid pooling schemes by assigning different weights to the selected site attributes.The extensive simulation procedure showed that the actual proximity of sites is the most important factor in determining their similarity for the frequency analysis of precipitation extremes.However, there is a difference between the hybrid models for the two durations: while in the case of 1-day precipitation amounts there is no pooling scheme making use of a combination of climatological and geographical site characteristics that would outperform the pooling scheme based on the distance between sites (ROIgeo2), further climatological site characteristics do enhance the performance of the hybrid pooling schemes for multi-day amounts.This reflects the fact that multi-day extremes are more strongly linked to basic climatological characteristics of precipitation regimes than are 1-day extremes.
The comparison of the ROI pooling schemes with the at-site approach shows that the local estimates are not satisfactory when one is interested in estimating quantiles with longer return periods (T ≥ 10 years). The benefits of the pooling approaches over single-site analysis become obvious with increasing T.
We also examined the possible role of the number of simulations used in the Monte Carlo experiments.In general, differences between the series of results based on 1000, 5000 and 10 000 repetitions are small.Therefore, the choice of 5000 is seen as a compromise between a sufficiently high number of simulation loops, and acceptable time demands necessary for accomplishing the numerical calculations on a PC.
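For orientation, the RMSE and bias of a growth-factor estimate over a set of Monte Carlo repetitions can be computed as in the sketch below. The normalization by the "true" growth factor and the expression in % are assumptions made for this illustration (the exact definitions belong to the methods section), and the function name `rmse_and_bias_percent` is hypothetical.

```python
import math


def rmse_and_bias_percent(estimates, true_value):
    """Relative RMSE and bias (both in %) of simulated estimates.

    estimates  -- growth factors estimated in the individual repetitions
    true_value -- growth factor of the "true" frequency model
    Assumption: errors are taken relative to the true value.
    """
    rel_errors = [(est - true_value) / true_value for est in estimates]
    n = len(rel_errors)
    bias = 100.0 * sum(rel_errors) / n
    rmse = 100.0 * math.sqrt(sum(err * err for err in rel_errors) / n)
    return rmse, bias


# Made-up estimates of a growth factor whose "true" value is 2.0.
rmse, bias = rmse_and_bias_percent([2.1, 1.9, 2.2, 1.8], 2.0)
print(round(rmse, 2), round(bias, 2))
```

Repeating such a calculation for subsets of 1000, 5000 and 10 000 repetitions is one simple way to check whether the statistics have stabilized.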
Since the spatial resolution of the examined precipitation datasets (the density of sites corresponds to 1 station per area of 19.4 × 19.4 km) is comparable to the resolution of most current regional climate model simulations over Europe (about 25 km, see e.g. http://ensembles-eu.metoffice.com/results.html), the present findings may have implications for pooling schemes applicable to estimating high quantiles of daily precipitation (and constructing their possible scenarios) in climate change simulations. It is often necessary to "smooth" the estimated distributions and/or quantiles of precipitation amounts (e.g. Semmler and Jacob, 2004) in order to reduce large spatial variability that is related to random fluctuations, and the ROI approach (with the geographical distance in the dissimilarity matrix) appears to be a useful methodology that may easily and naturally be transferred to the context of climate model outputs. With increasing resolution of climate model simulations (and more data available for the estimation), the issue of how to efficiently reduce random sampling variability becomes more pressing.

Fig. 1. 209 climatological stations available for a regional frequency analysis of heavy precipitation amounts in the Czech Republic.
MAP = mean annual precipitation, RWC = mean ratio of the precipitation totals for warm/cold seasons, DRY = mean annual number of dry days, φ = latitude, λ = longitude, h = altitude, c_v = coefficient of variation, PS = Pearson's 2nd skewness coefficient, x_10 = 10-year growth factor of precipitation.

Fig. 2. Root mean square error (RMSE) and bias of growth factors corresponding to return periods T =20 and 100 years for annual maxima of 1-day precipitation amounts in a sensitivity analysis when changes made to the basic ROI pooling schemes are examined.

Fig. 3. Root mean square error (RMSE) and bias of growth factors corresponding to return periods T =20 and 100 years for annual maxima of 5-day precipitation amounts in a sensitivity analysis when changes made to the basic ROI pooling schemes are examined.

Fig. 4. Root mean square error (RMSE) of growth factors corresponding to return period T = 100 years for annual maxima of 1-day precipitation amounts in a sensitivity analysis when the performance of different series of hybrid pooling schemes (ROIhybA-ROIhybG; see also Table 4) is compared. The labels of the individual pooling schemes (geo1.00, ..., geo0.00) reflect the weighting coefficient for actual geographical distance between sites (see Sect. 4.2).

Fig. 5. Root mean square error (RMSE) of growth factors corresponding to return period T = 100 years for annual maxima of 5-day precipitation amounts in a sensitivity analysis when the performance of different series of hybrid pooling schemes (ROIhybA-ROIhybG; see also Table 4) is compared. The labels of the individual pooling schemes (geo1.00, ..., geo0.00) reflect the weighting coefficient for actual geographical distance between sites (see Sect. 4.2).

Fig. 6. Root mean square error (RMSE) of growth factors corresponding to return periods T = 10, 20, 50 and 100 years for annual maxima of 1-day and 5-day precipitation amounts in a comparison of the performance of three selected pooling schemes (ROIcli3, ROIgeo2 and ROIhyb = ROIhybC-geo0.80, see Sect. 4.2) with the at-site frequency model.

Table 1. Summary of site characteristics and site statistics used in individual ROI pooling schemes. The sign √ indicates that the given site characteristic or statistic is included in the pooling scheme.

Table 2. Performance of the ROI pooling schemes based on different combinations of site characteristics as measures of similarity for annual maxima of 1-day and 5-day precipitation amounts. RMSE_T denotes average root mean square error of the estimated growth factors corresponding to return period T [years], expressed in %. The smallest values of the statistics are marked in bold, separately for climatological and geographical characteristics.

Table 3. Performance of the ROI pooling schemes ROIcli3, ROIgeo3 and ROIgeo2 based on different combinations of site statistics in the "true" frequency model for annual maxima of 1-day and 5-day precipitation amounts. RMSE_T denotes average root mean square error of the estimated growth factors corresponding to return period T [years], expressed in %. The smallest values of the statistics are marked in bold, separately for the "true" frequency models evaluated.

Table 4. Summary of site characteristics used in hybrid ROI pooling schemes. The sign √ indicates that the given site characteristic is included in the pooling scheme.

Table 6. Performance of the hybrid ROI pooling schemes based on different combinations of climatological and geographical site characteristics for annual maxima of 5-day precipitation amounts. RMSE_T denotes average root mean square error of the estimated growth factors corresponding to return period T [years], expressed in %. The three smallest values of the statistics are marked in bold, and the smallest value is underlined. The heading shows the weighting coefficient for actual geographical distance between sites. The settings of the series of individual pooling schemes are summarized in Table 4.

Table 7. Average root mean square error (RMSE) of growth factors of annual maxima of 1-day and 5-day precipitation amounts for return period T [years], expressed in %. The smallest values of RMSE are marked in bold, separately for both durations. ROIhyb denotes the hybrid ROI pooling scheme ROIhybC-geo0.80.