Hydrology and Earth System Sciences Region-of-influence Approach to a Frequency Analysis of Heavy Precipitation in Slovakia

The paper compares different approaches to regional frequency analysis with the main focus on the implementation of the region-of-influence (ROI) technique for the modelling of probabilities of heavy precipitation amounts in the area of the Western Carpathians. Unlike the conventional regional frequency analysis where the at-site design values are estimated within a fixed pooling group (region), the ROI approach as a specific alternative to focused pooling techniques makes use of flexible pooling groups, i.e. each target site has its own group of sufficiently similar sites. In this paper, various ROI pooling schemes are constructed as combinations of different alternatives of sites' similarity (pooling groups defined according to climatological characteristics and geographical proximity of sites, respectively) and pooled weighting factors. The performance of the ROI pooling schemes and statistical models of conventional (regional and at-site) frequency analysis is assessed by means of Monte Carlo simulation studies for precipitation annual maxima for the 1-day and 5-day durations in Slovakia. It is demonstrated that a) all the frequency models based on the ROI method yield estimates of growth curves that are superior to the standard regional and at-site estimates at most individual sites, and b) the selection of a suitable ROI pooling scheme should be adjusted to the dominant character of the formation of heavy precipitation.


Introduction
Information on design values (quantiles) of heavy one-day and multi-day precipitation is important in various fields of water resources engineering, e.g. the design of dams and sewer systems, flood prevention, protection against soil and vegetation loss, etc. With a traditional at-site approach to frequency analysis, the precipitation quantiles have long been estimated using a data sample at the site of interest only. A recognized drawback of this approach is related to the estimation of rare events, i.e. in practice, one often needs design values corresponding to return periods (T) that are much larger than the lengths of the available series of observations (n). In the at-site approach, it is not advisable to extrapolate T far beyond n; moreover, according to the "5T rule" (Jakob et al., 1999), 5 times T station-years of data is necessary for a reliable estimation of a design value corresponding to the return period T.
In order to overcome the lack of at-site observations, a regional approach to a frequency analysis that "traded space for time" was developed in the 1960s (Dalrymple, 1960) for the estimation of design floods. This approach, based on the index-flood method, has gained wider popularity since the 1980s (e.g. Wiltshire, 1986; Lettenmaier et al., 1987; Hosking and Wallis, 1993). The core idea of the regional approach is that one can obtain more reliable quantile estimates for a given site based on a multi-site analysis compared to a single-site approach. The proposed groups of sites should meet the requirement of homogeneity; that is, sites pooled together have to exhibit, except for a scaling factor, similar probability distribution curves (growth curves) of extremes. Provided the groups of sites (regions) are homogeneous, the regional methods not only enhance the reliability of the at-site estimates at observing stations, but also allow for the estimation of design values at ungauged locations.
Published by Copernicus Publications on behalf of the European Geosciences Union.
L. Gaál et al.: Precipitation frequency analysis by the ROI method in Slovakia
One of the most discussed issues of regional frequency analysis is a method for pooling groups of sites (e.g. Acreman and Sinclair, 1986; Wiltshire, 1986; Burn, 1990b). For heavy precipitation, the design values are usually estimated based on the fixed structure of the regions, which are drawn either according to political (Pilon et al., 1991; Adamowski et al., 1996; Gellens, 2002), geographical (Sveinsson et al., 2002; Kohnová et al., 2005), or climatological considerations (Smithers and Schulze, 2001; Fowler and Kilsby, 2003; Kyselý and Picek, 2007). The regionalization methods may be further differentiated as to a) whether the final groups of sites form contiguous geographical units (regions in a traditional sense) or they are scattered in the space (pooling groups; Reed et al., 1999b), or b) whether they employ subjective or objective criteria for defining groups of sites.
The region-of-influence (ROI) method introduced by Burn (1990a, b) is an alternative approach to regional frequency estimation. It was proposed for flood frequency analysis in order to overcome possible inconsistencies that may occur on the boundaries of pooling groups (Acreman and Wiltshire, 1989). In such cases, a classical approach to regional analysis may lead to undesirable step changes of the variables and estimated quantiles. To avoid the issue of inconsistency, Wallis et al. (2007) define narrow transition zones between adjacent climatic regions where the geographical interpolation of statistical characteristics of precipitation is adopted. Nevertheless, the ROI method eliminates these deficiencies by defining groups of sites in a flexible way. This means that each site has its own "region", a unique set of sufficiently similar stations, which do not necessarily form contiguous geographical units and from which the information about the extremes is transferred to the site of interest. Such an approach is termed focused pooling (Reed et al., 1999b), as each site is regarded as the center of its own region, and a pooling group "is specifically tailored to a target site of interest and a given return period" (Cunderlik and Burn, 2002, 2006). The ROI method has been used in a number of flood frequency studies (Burn, 1990a, b; Zrinji and Burn, 1994, 1996; Tasker et al., 1996; Burn, 1997; Provaznik and Hotchkiss, 1998; Castellarin et al., 2001; Holmes et al., 2002; Eng et al., 2005; Merz and Blöschl, 2005).
Although the issue of the discontinuity of the variables and quantiles is related mainly to flood risk assessments where catchments have actual physical boundaries, the concept of focused pooling is also applicable in the case of precipitation where the raingauges represent only single points in space. In a frequency analysis of precipitation extremes, focused pooling concepts different from the ROI approach were evaluated by Schaefer (1990), Alila (1999), Di Baldassare et al. (2006) and Wallis et al. (2007). Their common feature is that growth curves of precipitation extremes are estimated within pooling groups, which are defined only in terms of the mean annual precipitation (MAP): the statistical properties (standard moments or L-moments) of the rainfall extremes are considered to be smooth variables that only depend on the value of MAP. The Focused Rainfall Growth Extension (FORGEX) method is a unique alternative to focused pooling that was developed for precipitation frequency estimation in Great Britain (Reed et al., 1999a). The High Intensity Rainfall Design System (HIRDS) in New Zealand (Thompson, 2002) adopted the original ROI concept for extreme precipitation.
To a certain extent, the aforementioned methodological considerations have been implemented at national meteorological offices worldwide where detailed statistical methods for regional rainfall frequency analysis have been developed during recent decades. In Great Britain, the Flood Studies Report (NERC, 1975) and later the Flood Estimation Handbook (FEH, 1999), which aimed at methods of estimation of extraordinary flood events, developed sophisticated procedures for the estimation of design precipitation. The German KOSTRA project (e.g. Malitz, 1999; Malitz and Ertel, 2001), the Italian VAPI project (e.g. Cannarozzo et al., 1995), the Australian Guide to Rainfall and Runoff (Pilgrim, 1987) and the internet-based Precipitation Frequency Atlas of the United States (Bonnin et al., 2006a, b) are other examples of complex national studies on risk assessment of heavy precipitation.
In Slovakia, there is also a need for a new and comprehensive rainfall frequency study, since the design values of precipitation for water-related projects are still often estimated by means of the Gumbel and/or Pearson type III distribution, using the at-site approach (e.g. Šamaj et al., 1985). Thus, Slovak hydrologists and climatologists aim at adopting different regional approaches to precipitation frequency analysis with the long-term ambition of developing proper methods for mapping the risk of heavy precipitation in the area of the Western Carpathians. Gaál (2006) showed that, in principle, the whole area of Slovakia may be treated as a compact homogeneous region, regardless of the duration of precipitation events (1 to 5 days) and/or seasons considered (warm/cold half-years). Such a view, although it correctly fits the statistical concepts of regional frequency analysis, does not seem acceptable from a climatological point of view. Long-term experience indicates that even though the area of Slovakia is relatively small, it is unreasonable to treat the country as a single region (Faško and Lapin, 1996). Several precipitation regimes do exist in the area of the Western Carpathians which are formed by western circulation, and the Mediterranean and continental influences, and these are further differentiated by altitudinal zonality due to the rugged topography (Lapin and Tomlain, 2001). This naturally implies a need for a division of the country into sub-regions. Attempts to delineate homogeneous sub-regions by objective and subjective methods (cluster analysis and process-based regionalization according to the effects of different patterns of general air-mass circulation, respectively) did not lead to unambiguous and satisfactory results (Gaál, 2005, 2006), so we turned our attention to the ROI method (Burn, 1990b) as an alternative to focused pooling techniques. The implementation of such a methodology, particularly due to its flexibility, seemed very promising in a complex terrain like Slovakia.
Hydrol. Earth Syst. Sci., 12, 825-839, 2008 www.hydrol-earth-syst-sci.net/12/825/2008/
The present work is a case study-based inter-comparison of different approaches to the frequency analysis of precipitation extremes in Slovakia. The analysis aims at a) evaluating the applicability of the ROI pooling approach in the given climatological and physical-geographical settings, b) selecting a superior technique for the estimation of the growth curves of extreme precipitation by means of statistical methods (i.e. a Monte Carlo simulation experiment), and c) examining whether the choice of the most suitable frequency model depends on the time scale of the extreme precipitation events. The main innovation in the ROI methodology is the incorporation of a specific index (Lapin's index of the Mediterranean influence) as a measure of the long-term precipitation regime. Even though the methodology applied herein allows, in principle, for the estimation of design precipitation at ungauged sites, the present study only focuses on the estimation of the growth curves at the sites equipped with raingauges.
The paper is structured as follows: after a short description of the selected stations and their data in Sect. 2, a detailed overview of the background of the ROI method is presented in Sect. 3. Sect. 4 introduces further frequency models and a common basis for their inter-comparison through a Monte Carlo simulation procedure. Different frequency models are evaluated using the observed data in Sect. 5. A discussion and conclusions follow in Sect. 6.

Data
Daily precipitation amounts measured at 56 stations (Fig. 1) operated by the Slovak Hydrometeorological Institute (SHMI) were used as the input data set. The altitudes of the stations range from 100 to 2635 m a.s.l., which covers the whole range of elevations in Slovakia, and the density of the selected sites is approximately one per 900 km².
Observations of the daily precipitation amounts without gaps over the period 1961-2003 (in some cases since 1951) are available at 29 climatological stations. Since these sites do not cover the area of Slovakia evenly, it was necessary to extend the data set with sites having minor gaps in their daily rainfall records (breaks of several months). Each of the additional 27 sites has at least 35 complete years of observations. Fig. 1 shows that the central and north parts of Western Slovakia were the main areas where the data set needed to be supplemented with other stations in order to ensure a more homogeneous spatial coverage. The basic data set at the selected 56 sites makes up 2464 station-years.
The block maxima approach (Coles, 2001; Hosking and Wallis, 1997) to the selection of extremes has been adopted in the present study, and samples of precipitation annual maxima for the 1 to 5-day durations were drawn from each station record. All the years with incomplete daily records at a given station were excluded from the analysis.
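The selection of annual maxima described above can be illustrated by a minimal sketch. The function name `annual_maxima`, the use of NaN to mark missing days and the restriction of multi-day windows to a single calendar year are assumptions made for illustration only; they are not taken from the study.

```python
import numpy as np

def annual_maxima(daily, years, duration=1):
    """Annual maxima of k-day precipitation totals (hypothetical helper).

    daily    : daily precipitation amounts [mm]; NaN marks a missing day
    years    : calendar year of each daily value
    duration : window length in days (1 to 5 in the study)
    Assumption: a multi-day window does not cross a year boundary.
    """
    daily = np.asarray(daily, dtype=float)
    years = np.asarray(years)
    maxima = {}
    for year in np.unique(years):
        values = daily[years == year]
        # Years with incomplete daily records are excluded from the analysis.
        if np.isnan(values).any() or values.size < duration:
            continue
        # Moving sums over `duration` consecutive days.
        window_sums = np.convolve(values, np.ones(duration), mode="valid")
        maxima[int(year)] = float(window_sums.max())
    return maxima
```

Calling the function with `duration=1` and `duration=5` would give samples analogous to the 1-day and 5-day maxima used in this paper.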
The entire data set of 5 durations has been confined to the data sets of 1 and 5-day precipitation maxima for this study.The preliminary analyses showed that the statistical behaviour of data sets corresponding to two adjacent durations (1-day vs. 2-day, 2-day vs. 3-day etc.) is rather similar, and the most remarkable difference appears between 1 and 5-day precipitation maxima.
The data underwent standard quality checking for gross errors as well as checking in terms of a discordancy measure based on L-moments (Hosking and Wallis, 1993). A very small number of the data series were flagged as discordant; a detailed scrutiny revealed no gross errors in the data, since the uniqueness of each discordant data series was induced by extraordinary local precipitation events (for further details, see Gaál, 2006).
Due to the fact that the selected stations form a rather low density network and the precipitation extremes show a high temporal and spatial variability, it was not possible to test the homogeneity of the data series by comparing a site's data with a reliable and homogeneous reference series. Instead, each site's homogeneity was examined individually. Possible step-like changes in the data series were analyzed by applying four different homogeneity tests (Wijngaard et al., 2003): the standard normal homogeneity test (Alexandersson, 1986), Buishand's range test (Buishand, 1982), Pettitt's test (Pettitt, 1979) and von Neumann's ratio test (von Neumann, 1941). All but one of the data sets have been categorized as useful (according to Wijngaard's definition) at the significance level of α=1%, i.e. no clear signal of inhomogeneity is detected, and the data sets are sufficiently homogeneous for further analyses (Wijngaard et al., 2003). These findings were reaffirmed by testing for trends using the nonparametric method of Wald and Wolfowitz (1943): no significant trends were detected in the individual series (except for the highest elevated station of Lomnický štít, 2635 m a.s.l.), so the data sets can be regarded as stationary as well.
The settings of the ROI method are presented in more detail in the following sub-sections.

Distance metric
The distance metric serves to determine the proximity of sites in an attribute space. There are a number of alternative definitions of a distance metric reported mostly in connection with a cluster analysis (Cormack, 1971, classifies 10 different definitions of the distance metric). In the context of the ROI method, the Euclidean distance metric is mostly used (e.g. Burn, 1990a, b; Zrinji and Burn, 1994; Castellarin et al., 2001; Holmes et al., 2002), probably due to the fact that it is the most intuitive one. Nevertheless, Cunderlik and Burn (2006) proposed an alternative definition of the proximity of sites by introducing the Mahalanobis similarity distance.
The Euclidean distance metric is defined as
$$D_{ij} = \left[ \sum_{m=1}^{M} W_m \left( X_m^i - X_m^j \right)^2 \right]^{1/2} \qquad (1)$$
where $D_{ij}$ is the weighted Euclidean distance between sites i and j; $W_m$ is the weight associated with the m-th site attribute; $X_m^i$ is the value of the m-th attribute at site i; and M is the number of attributes. The distance metric matrix D is symmetrical ($D_{ij} = D_{ji}$) with zeros on its diagonal ($D_{ii} = 0$).
The region of influence for a given station is formed according to the following scheme: First, the site with the lowest value from the whole set of $D_{ij}$, $j = 1, \ldots, N$ is added to the ROI for site i. In the very first step it is the site i itself, for which the distance metric $D_{ii} = 0$ is always the one with the smallest value. Then, the next site with the second smallest value of $D_{ij}$ is added into the ROI for site i. The sites are successively pooled into the ROI as long as a given condition (see Sect. 3.2) is satisfied.
As the site attributes $X_m$ may have substantially different magnitudes, a transformation of the initial values before calculating $D_{ij}$ (Eq. 1) is usually applied. The simplest alternative is a standardization of the variables:
$$X' = \frac{X - \bar{X}}{\sigma_X} \qquad (2)$$
where $\bar{X}$ is the mean, and $\sigma_X$ is the standard deviation of attribute X. As a result, all the site attributes $X_m$ are of a comparable magnitude, i.e. they have a zero mean and unit variance.
In the current analysis, equal (unit) weights $W_m = 1$, $m = 1, \ldots, M$ in Eq. (1) have been chosen for both alternatives to the site attribute sets (Sects. 3.1.1-3.1.2). We did not attempt to adjust the relative importance of the site attributes because we did not find physically justifiable reasons for assigning different weights to the site attributes.
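The standardization of the site attributes and the weighted Euclidean distance with unit weights can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was applied.

```python
import numpy as np

def standardize(attributes):
    """Transform each attribute (column) to zero mean and unit variance."""
    X = np.asarray(attributes, dtype=float)
    # Assumption: sample standard deviation (ddof=1).
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def distance_matrix(attributes, weights=None):
    """Weighted Euclidean distance between all pairs of sites.

    attributes : array of shape (N sites, M attributes)
    weights    : M attribute weights W_m; unit weights if omitted
    """
    Z = standardize(attributes)
    W = np.ones(Z.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # Pairwise differences of standardized attributes, shape (N, N, M).
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((W * diff ** 2).sum(axis=2))
```

The resulting matrix is symmetrical with zeros on its diagonal, as required of the distance metric. An attribute that is constant over all sites would have a zero standard deviation and would need to be removed before the standardization.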
The wider the database of the site attributes, the greater the potential for successful pooling/clustering/etc. However, one should be aware of redundant information involved among the selected attributes. They are often cross-correlated, i.e. carrying the same information about the analyzed phenomena. Clustering techniques are sensitive to the redundancy of information (Guttman, 1993; Gong and Richman, 1995); in practice, the selected attributes should be as independent as possible. There are several multivariate statistical methods for filtering redundant information, such as a principal component analysis, factor analysis or Procrustean analysis; Dinpashoh et al. (2004) is an example of a thorough examination of a wide database of site attributes. The risk of having unduly correlated attributes becomes greater when increasing the number of attributes, and that is why we confined the analysis to three attributes in both alternatives to the site attribute sets (Sects. 3.1.1-3.1.2).
When dealing with site attributes, it is important to distinguish site characteristics from site statistics. Site characteristics are quantities that are known a priori to the frequency analysis at a given site (e.g. the site's location, its physical-geographical properties or mean annual precipitation), while site statistics are simply the measurements or results of the statistical processing of the observed data at a given site (Hosking and Wallis, 1997). It is strongly recommended (Hosking and Wallis, 1997; Castellarin et al., 2001) that the pooling process be based only on site characteristics since (i) site statistics do not usually reflect the climatological or geographical aspects of the mechanisms of precipitation generation, (ii) site statistics should be exclusively used in the homogeneity testing of pooling groups, and (iii) site characteristics also allow for the extension of the regional analysis to ungauged sites.
The selection of the site attributes plays a key role in the ROI method: the success of the whole procedure depends on finding the right number and combination of proper site characteristics. In Sects. 3.1.1-3.1.2 below, two different alternatives to the selection of the site attributes are evaluated.

Alternative #1: general climatological site characteristics
The set of the climatological site attributes consists of characteristics that describe the long-term precipitation regime of the country. Slovakia is a relatively small, landlocked country in Central Europe with an area of 49 035 km². Its topography is complex: rugged mountains in its central and northern parts (the Western Carpathian Mountains, encompassing the High and Low Tatras), and lowland areas in its southern parts. 60% (15%, 1%) of the area of the country is located at altitudes above 300 (800, 1500) m a.s.l. (Marečková et al., 1997).
Slovakia lies in an area where various maritime and continental influences meet. The precipitation regime is affected by different factors; the dominant ones are the effect of a) the Mediterranean area, b) the western circulation (the Atlantic Ocean), and c) the European continent. The influence of the Mediterranean area has a pronounced role in the inter-annual variability of the monthly precipitation amounts, mainly in autumn and in South Slovakia. In general, the annual cycle of precipitation has a maximum in June (on average 95 mm, especially in the south) and a minimum in February (on average 43 mm). However, due to cyclones moving from the area of the Ligurian Sea (in the Mediterranean), a secondary autumn maximum (in October/November) appears at the majority of stations of South Slovakia. Lapin's index of the Mediterranean effect $L_M$ is a quantitative characteristic of the magnitude of this influence. It is defined using 3 ratios of certain monthly precipitation amounts (Eq. 3), where the indices denote May (V), July (VII) and the months with the maximum (Max), secondary maximum (Max2) and secondary minimum (Min2) precipitation amount in the annual cycle. The number 2.5 in Eq. (3) is a correction factor. For a detailed description of the $L_M$ index, refer to Gaál (2005).
The spatial distribution of the mean annual precipitation (MAP) exhibits strong variability. A general slight decrease of the MAP from the west to the east is overlain by an altitudinal zonality due to the topography; therefore, the lowest values of the MAP occur in the south-west Danubian lowlands (about 500 mm), while the largest precipitation amounts are observed at the highest windward slopes of the Carpathian Mountains (more than 1500 mm). The daily precipitation amounts may, in extreme cases (due to heavy convective storms), exceed 150 mm.
Considering the general precipitation climate in Slovakia, the following variables have been selected in alternative #1 to the distance metric:
1. the mean annual precipitation;
2. the ratio of the precipitation amounts for the warm/cold season (warm season: April-September, cold season: October-March);
3. Lapin's index of the Mediterranean effect (Eq. 3).
Using the site characteristics of alternative #1 in the distance metric would, in principle, result in groups of sites with similar climatological conditions that may, to some extent, also be related to mechanisms generating heavy precipitation.In practice, however, there is no guarantee that the proximity of sites in the M-dimensional space of the climatological site characteristics implies a similarity in the extreme precipitation regimes.

Alternative #2: geographical site characteristics
The geographical proximity of sites is considered as a further indicator of similar regimes of heavy precipitation.Thunderstorms occurring predominantly in the warm season are local phenomena that usually affect only several neighbouring sites.It is expected that pooling relatively close sites together may result in enhanced estimates of heavy precipitation quantiles mainly for shorter durations (for 1-day maxima in the present paper).
The second set of site attributes consists of the following basic geographical characteristics:
1. latitude;
2. longitude;
3. elevation above sea level.
Alternative #2 yields groups of sites whose members are similar to the site of interest in a geographical sense. Nevertheless, this cannot be interpreted as a simple geographical proximity between two points; e.g. higher-elevated sites are usually grouped together and are not necessarily joined with other nearby sites in the traditional sense of latitude and longitude.

Pooling a station's ROI
When the appropriate site attributes are selected, and the distance metric matrix is calculated, two other issues need to be addressed. The first is to determine the cutoff point of the distance metric for the i-th site: only sites below a selected threshold will be included in the i-th site's ROI:
$$ROI_i = \left\{ j : D_{ij} \le \theta_i \right\} \qquad (4)$$
where $ROI_i$ is the set of stations in the pooling group for site i, and $\theta_i$ is the threshold distance value for site i (Burn, 1990b).
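A minimal sketch of how a station's ROI may be assembled from one row of the distance metric matrix is given below. The function name `region_of_influence` is hypothetical, and the inclusive comparison with the threshold is an assumption.

```python
import numpy as np

def region_of_influence(D, i, theta_i):
    """Indices of the sites in the ROI of site i, nearest first.

    D       : distance metric matrix of shape (N, N)
    i       : index of the target site
    theta_i : threshold distance for site i
    """
    D = np.asarray(D, dtype=float)
    # Sites are considered in the order of increasing distance from site i;
    # the target site itself (distance zero) comes first.
    order = np.argsort(D[i], kind="stable")
    return [int(j) for j in order if D[i, j] <= theta_i]
```

For example, with distances of 0, 1 and 3 from the target site and a threshold of 1.5, the ROI consists of the target site and its nearest neighbour only.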
The other important issue is a determination of the pooled weighting coefficients, which must reflect the relative proximity of any site in the pooling group to the site of interest. The closer a site of the $ROI_i$ is to the site i according to the distance metric, the greater the amount of information it provides in the pooled frequency analysis. The weight $\eta_{ij}$ for site j in the $ROI_i$ is a function of the distance metric $D_{ij}$ and several other parameters (see Sects. 3.2.1-3.2.3). Obviously, sites that are not included in the $ROI_i$ have zero weights. Following Burn's (1990b) framework, the threshold distance $\theta_i$ and the weights $\eta_{ij}$ in the current analysis are determined according to 3 different options that reflect 3 diverse concepts of pooling information from the sites of the ROI.

Option #1: "Fewer sites with high values of the weights"
The basic idea of option #1 is that the ROI for a given site encompasses only a limited number of stations; however, all of the selected stations are assigned weights markedly different from zero.
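The threshold value $\theta_i$ (Eq. 4) is defined as follows:
$$\theta_i = \theta_L \quad \text{if } NS_i \ge NST \qquad (5)$$
and
$$\theta_i = \theta_L + \left( \theta_U - \theta_L \right) \frac{NST - NS_i}{NST} \quad \text{if } NS_i < NST \qquad (6)$$
where $\theta_L$ ($\theta_U$) is the lower (upper) threshold value, $NS_i$ is the number of stations in the $ROI_i$ with a threshold at $\theta_L$, and NST is the target number of stations for the ROI.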
The weighting function for this option includes two parameters (TP, n) to be determined:
$$\eta_{ij} = 1 - \left( \frac{D_{ij}}{TP} \right)^n \qquad (7)$$
The settings of the 5 parameters to be initialized are in accordance with Burn's original concept: $\theta_L$ ($\theta_U$, TP) is the 25th (75th, 85th) percentile of the distance metric distribution, NST=15, and n=2.5. For a detailed line of reasoning concerning the parameter settings, see Burn (1990b).
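Option #1 can be sketched in a few lines. The forms of the threshold and of the weighting function used below are assumed forms in the spirit of Burn's (1990b) framework: the threshold is relaxed from $\theta_L$ towards $\theta_U$ when fewer than NST stations lie within $\theta_L$, and the weights decay as $1 - (D_{ij}/TP)^n$. The function names and the inclusion of the target site in the station count are illustrative assumptions.

```python
import numpy as np

def option1_threshold(d_i, theta_L, theta_U, nst=15):
    """Threshold distance for site i under option #1 (assumed form).

    d_i : distances from site i to all sites, including itself
    """
    d_i = np.asarray(d_i, dtype=float)
    ns_i = int(np.sum(d_i <= theta_L))  # stations within the lower threshold
    if ns_i >= nst:
        return theta_L
    # Too few stations: move the threshold towards theta_U.
    return theta_L + (theta_U - theta_L) * (nst - ns_i) / nst

def option1_weights(d_i, theta_i, tp, n=2.5):
    """Weights eta_ij under option #1; zero outside the ROI."""
    d_i = np.asarray(d_i, dtype=float)
    weights = 1.0 - (d_i / tp) ** n
    return np.where(d_i <= theta_i, weights, 0.0)
```

In practice, $\theta_L$, $\theta_U$ and TP would be taken as percentiles of the distance metric distribution, e.g. with `np.percentile` applied to the off-diagonal elements of the distance matrix; which elements enter that distribution is an assumption not settled by the text.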

Option #2: "More sites with different values of the weights"
In option #2, a relatively large number of sites are included in the ROI for a given site. Stations sufficiently similar to the site of interest have unit weights, while lower values of weights are assigned to those less similar.
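The threshold value $\theta_i$ in Eq. (4) is constant,
$$\theta_i = \theta_U \qquad (8)$$
The weighting function for this option is defined as:
$$\eta_{ij} = 1 \quad \text{if } D_{ij} \le \theta_L \qquad (9)$$
and
$$\eta_{ij} = 1 - \left( \frac{D_{ij} - \theta_L}{TN - \theta_L} \right)^n \quad \text{if } \theta_L < D_{ij} \le \theta_U \qquad (10)$$
where $\theta_L$ is a lower threshold value, and TN and n are the parameters of the weighting function. TN is defined using a further parameter TPP (Eq. 11). There are 4 parameters of the weighting function for option #2 to be initialized. $\theta_L$ ($\theta_U$, TPP) is selected as the 25th (75th, 85th) percentile of the distance metric distribution, and n=0.1 (Burn, 1990b).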
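The weighting concept of option #2 may be sketched as below. The piecewise form (unit weights up to $\theta_L$, decaying weights between $\theta_L$ and $\theta_U$, zero beyond) is an assumed form following Burn's (1990b) framework. The parameter TN is passed in directly, because its definition in terms of TPP is not reproduced here; the function name is hypothetical.

```python
import numpy as np

def option2_weights(d_i, theta_L, theta_U, tn, n=0.1):
    """Weights eta_ij under option #2 (assumed form).

    d_i : distances from site i to all sites
    tn  : parameter TN of the weighting function, supplied by the caller
    """
    d_i = np.asarray(d_i, dtype=float)
    weights = np.zeros_like(d_i)
    # Sufficiently similar stations receive unit weights.
    weights[d_i <= theta_L] = 1.0
    # Less similar stations inside the ROI receive reduced weights.
    inside = (d_i > theta_L) & (d_i <= theta_U)
    weights[inside] = 1.0 - ((d_i[inside] - theta_L) / (tn - theta_L)) ** n
    return weights
```

Setting `theta_U` to the largest distance in the row includes all stations in the ROI, which corresponds to the idea of option #3.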

Option #3: "All sites with different values of the weights"
Option #3 is similar to option #2 with the only difference being that all the available stations are included in the ROI for a given site, with appropriate values of the weighting function.
The threshold value $\theta_i$ (Eq. 12) is set so that no station is excluded from the ROI. The definition of the weighting function and the parameter settings are the same as in option #2 (Eqs. 9-11). There is no need to deal with the selection of the upper threshold $\theta_U$; the number of the parameters to be initialized is 3 ($\theta_L$, TPP, n) (Burn, 1990b).
When each station's ROI and the appropriate weighting coefficients are known, it is possible to estimate the at-site precipitation quantiles using information from the ROI by means of the L-moment-based index storm procedure (Hosking and Wallis, 1997). The at-site data $X_{j,k}$, $j = 1, \ldots, N$, $k = 1, \ldots, n_j$ (where N stands for the number of sites, and $n_j$ denotes the sample size of the j-th site) are rescaled by the sample mean $\mu_j$ (index storm) in order to get dimensionless data:
$$x_{j,k} = \frac{X_{j,k}}{\mu_j} \qquad (13)$$
The dimensionless values $x_{j,k}$ at site j are then used to compute the sample L-moments $l_1^{(j)}, l_2^{(j)}, \ldots$ and L-moment ratios:
$$t^{(j)} = \frac{l_2^{(j)}}{l_1^{(j)}} \qquad (14)$$
and
$$t_r^{(j)} = \frac{l_r^{(j)}}{l_2^{(j)}}, \quad r = 3, 4, \ldots \qquad (15)$$
where $t^{(j)}$ is the sample L-coefficient of variation (L-CV) and $t_r^{(j)}$, $r = 3, 4, \ldots$ are the sample L-moment ratios at site j (for a definition and description of the L-moments, see Hosking and Wallis, 1997).
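The rescaling by the index storm and the computation of the sample L-moment ratios can be illustrated as follows. The sketch uses the unbiased probability-weighted-moment estimators commonly associated with Hosking and Wallis (1997); whether the study used exactly these estimators is an assumption, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def sample_l_moments(x):
    """First four sample L-moments from unbiased PWM estimators (needs n >= 4)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # b_r = (1/n) * sum_k C(k, r) / C(n-1, r) * x_(k), with 0-based rank k.
    b = [sum(comb(k, r) * x[k] for k in range(n)) / (n * comb(n - 1, r))
         for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

def l_moment_ratios(sample):
    """L-CV, L-skewness and L-kurtosis of the sample rescaled by its mean."""
    x = np.asarray(sample, dtype=float)
    scaled = x / x.mean()  # dimensionless data (index storm rescaling)
    l1, l2, l3, l4 = sample_l_moments(scaled)
    return l2 / l1, l3 / l2, l4 / l2
```

Because the rescaled sample has a unit mean, the L-CV of the dimensionless data equals its second L-moment.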
The regional (pooled) L-moment ratios $t^{(i)R}$ and $t_r^{(i)R}$, $r = 3, 4, \ldots$, within the ROI for site i are derived from the at-site sample L-moment ratios as weighted regional averages. Two weights are applied, sample size $n_j$ (the length of the observations) and the weighting function based on the ROI distance metric $\eta_{ij}$:
$$t^{(i)R} = \frac{\sum_{j \in ROI_i} n_j \, \eta_{ij} \, t^{(j)}}{\sum_{j \in ROI_i} n_j \, \eta_{ij}} \qquad (16)$$
and
$$t_r^{(i)R} = \frac{\sum_{j \in ROI_i} n_j \, \eta_{ij} \, t_r^{(j)}}{\sum_{j \in ROI_i} n_j \, \eta_{ij}}, \quad r = 3, 4, \ldots \qquad (17)$$
where $ROI_i$ is the set of stations forming the ROI for site i (Eq. 4), for which the weighted regional L-moment ratios are calculated. The regionally weighted values $t^{(i)R}$ and $t_r^{(i)R}$, $r = 3, 4, \ldots$ are then used to estimate the parameters of the selected distribution function in order to get the dimensionless cumulative distribution function (growth curve).
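The pooling of the at-site L-moment ratios with the two weights (record length and ROI weight) amounts to a weighted average. A minimal sketch, with a hypothetical function name, is:

```python
import numpy as np

def pooled_ratio(t_site, n_years, eta_i):
    """Pooled L-moment ratio for target site i.

    t_site  : at-site L-moment ratio (e.g. L-CV) of every site j
    n_years : record length n_j of every site j
    eta_i   : ROI weights eta_ij for target site i (zero outside the ROI)
    """
    t_site = np.asarray(t_site, dtype=float)
    w = np.asarray(n_years, dtype=float) * np.asarray(eta_i, dtype=float)
    # Sites outside the ROI carry zero weight and drop out of both sums.
    return float(np.sum(w * t_site) / np.sum(w))
```

The same function applies to the L-CV and to the higher-order ratios, since all of them are pooled with identical weights.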
The precipitation quantiles with a return period T are obtained by multiplying the dimensionless T-year growth curve value $x_i^T$ with the index storm $\mu_i$:
$$X_i^T = \mu_i \, x_i^T \qquad (18)$$
A universal parametric model for extremes, the generalized extreme value (GEV) distribution (e.g. Coles, 2001), is applied as the pooled distribution function in the current analysis. It has been identified as a suitable statistical model for 1-day as well as multi-day precipitation extremes in central Europe, including the area of Slovakia (Gaál, 2006; Kohnová et al., 2005).
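A growth curve value and the corresponding design value can be computed from pooled L-moment ratios as sketched below. The parameter estimation uses the well-known approximation for the GEV shape parameter published by Hosking and Wallis (1997), with their sign convention (k < 0 denotes a heavy upper tail); that this particular estimator was used in the study is an assumption, and the function names are hypothetical.

```python
import math

def gev_from_l_moments(l1, t, t3):
    """GEV parameters (xi, alpha, k) from the mean, L-CV and L-skewness."""
    l2 = t * l1
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2  # approximation of the shape parameter
    if abs(k) < 1e-8:
        # Gumbel limit of the GEV distribution.
        alpha = l2 / math.log(2.0)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k

def gev_quantile(xi, alpha, k, T):
    """Quantile with return period T (non-exceedance probability 1 - 1/T)."""
    y = -math.log(1.0 - 1.0 / T)
    if k == 0.0:
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k

def design_value(index_storm, growth_value):
    """Design precipitation as index storm times growth curve value."""
    return index_storm * growth_value
```

For dimensionless data the first L-moment equals 1, so a call such as `gev_from_l_moments(1.0, t_pooled, t3_pooled)` yields the parameters of the growth curve.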

A background for inter-comparison of different frequency models in Slovakia
In the present paper, 6 different pooling schemes of the ROI method are created as a combination of 2 proposed alternatives to site attributes (Sects. 3.1.1-3.1.2) and 3 options of the construction of the pooling groups (Sects. 3.2.1-3.2.3).
For the sake of convenience, the ROI pooling schemes are labelled as RCo1, RCo2, RCo3, RGo1, RGo2 and RGo3, where R denotes the region-of-influence method, C (G) stands for climatological (geographical) site attributes, and o1/o2/o3 shows which of the three options for the transfer of the regional information is applied in the particular scheme. The performance of the different ROI pooling schemes is further compared (i) with the results of a regional frequency analysis using the conventional regionalization approach of Hosking and Wallis (1997), in which 3 homogeneous regions are delineated within Slovakia (HW3r, Fig. 2; Gaál, 2005, 2006); (ii) with the results of a regional frequency analysis which treats the whole country as a single homogeneous region (HW1r); and (iii) with the results of a traditional at-site (local) frequency analysis lacking a regional approach.
In order to evaluate the uncertainty associated with the estimated quantiles and the performance of various frequency models, Monte Carlo simulations are carried out. The crucial problem of the simulation procedure is the assessment of the unknown parent (or true) distribution for the individual sites. We decided to estimate the true at-site quantiles by means of the ROI methodology, using option #3 (Sect. 3.2.3) in combination with a new alternative to the definition of the proximity of sites in which selected site statistics are used in Eq. (1). This procedure is in line with the considerations of Castellarin et al. (2001). Such a pooling scheme (abbr. RSo3, where S stands for statistics) ensures that a piece of information is transferred to the site of interest from all the sites under study, while the most remarkable contribution is associated with the sites for which the characteristics of the probability distribution functions are most similar to those at the target site. The following site statistics are considered:
1. coefficient of variation ($c_v$), a traditional characteristic of the scale of a data sample:
$$c_v = \frac{\sigma}{\mu} \qquad (19)$$
where $\mu$ ($\sigma$) is the sample mean (standard deviation);
2. Pearson's second skewness coefficient (PS), a less traditional characteristic of the skewness of a data sample (Weisstein, 2002):
$$PS = \frac{3 \left( \mu - m \right)}{\sigma} \qquad (20)$$
where m is the median of the sample;
3. 10-year design precipitation estimated using the generalized extreme value (GEV) distribution, a characteristic of the extreme value magnitudes of a data sample.
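The first two site statistics listed above are straightforward to compute. The sketch below uses the standard definitions of the coefficient of variation and of Pearson's second skewness coefficient; the function names and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def coefficient_of_variation(sample):
    """Ratio of the sample standard deviation to the sample mean."""
    x = np.asarray(sample, dtype=float)
    return float(x.std(ddof=1) / x.mean())

def pearson_second_skewness(sample):
    """Three times (mean minus median), divided by the standard deviation."""
    x = np.asarray(sample, dtype=float)
    return float(3.0 * (x.mean() - np.median(x)) / x.std(ddof=1))
```

Together with a 10-year quantile from a fitted GEV distribution, these values would form the three attributes of a site in the statistics-based distance metric.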
The Monte Carlo simulations are carried out according to the following scheme:
1. For all the ROI pooling schemes, the distance metric matrix $D_{ij}$ is calculated, and the region of influence $\mathrm{ROI}_i$ and weighting function values $\eta_{ij}$ are determined for each station. Note that the values of $D_{ij}$, $\mathrm{ROI}_i$ and $\eta_{ij}$ have to be set only once, i.e. before starting the simulation procedures, since in each repetition of the Monte Carlo simulation, the distance metric is determined from invariable site characteristics.
2. Samples of annual maxima are generated at each station, having the same record lengths as their real-world counterparts. The simulated data samples at the i-th site have the GEV distribution as the parent, with parameters corresponding to the pooled L-moments (Eqs. 16-17) according to the RSo3 pooling scheme.
3. The at-site estimates of the L-moments, the regional (pooled) L-moments within each region (pooling group), the GEV parameters corresponding to the regional (pooled) L-moments, and finally the simulated extreme precipitation quantiles for each station are determined according to the above-described ROI pooling schemes and the traditional (HW and at-site) frequency models, respectively.
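Steps 2 and 3 of the scheme can be sketched for a single site as follows. The sketch assumes the GEV parametrization and the L-moment parameter estimators of Hosking and Wallis (1997), including their polynomial approximation for the shape parameter; the parent parameters, the random seed and all function names are hypothetical and serve only for illustration. It does not reproduce the pooling of L-moments across sites.

```python
import numpy as np
from math import gamma, log

def gev_quantile(F, xi, alpha, k):
    """GEV quantile x(F) = xi + alpha * (1 - (-ln F)^k) / k, valid for k != 0."""
    return xi + alpha * (1.0 - (-np.log(F)) ** k) / k

def sample_lmoments(x):
    """Unbiased sample L-moments l1, l2 and the L-skewness t3."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1, dtype=float)
    # Probability weighted moments b0, b1, b2
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2

def gev_from_lmoments(l1, l2, t3):
    """GEV parameters (xi, alpha, k) from L-moments (approximate shape k)."""
    c = 2.0 / (3.0 + t3) - log(2.0) / log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * gamma(1.0 + k))
    xi = l1 - alpha * (1.0 - gamma(1.0 + k)) / k
    return xi, alpha, k

rng = np.random.default_rng(0)
parent = (0.85, 0.25, -0.10)   # hypothetical parent parameters (xi, alpha, k)
n_years = 44                   # record length of the simulated sample

# Step 2: draw one sample from the parent by inverse-transform sampling
sample = gev_quantile(rng.uniform(size=n_years), *parent)

# Step 3: at-site L-moments -> GEV parameters -> 100-year quantile
xi_hat, alpha_hat, k_hat = gev_from_lmoments(*sample_lmoments(sample))
x100_hat = gev_quantile(1.0 - 1.0 / 100.0, xi_hat, alpha_hat, k_hat)
print(f"estimated 100-year quantile: {x100_hat:.3f}")
```

In a pooled model, the at-site L-moments passed to `gev_from_lmoments` would be replaced by the regional (pooled) L-moments of the respective pooling group.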
The relative performance of the various frequency models is evaluated through the root mean square error (RMSE) and bias for each quantile at site i:
$$\mathrm{RMSE}_i^T = \left[ \frac{1}{NR} \sum_{m=1}^{NR} \left( \frac{\hat{x}_{i,m}^T - x_i^T}{x_i^T} \right)^2 \right]^{1/2} \tag{21}$$
and
$$\mathrm{BIAS}_i^T = \frac{1}{NR} \sum_{m=1}^{NR} \frac{\hat{x}_{i,m}^T - x_i^T}{x_i^T}. \tag{22}$$
Eqs. (21) and (22) are summations over repetitions of the Monte Carlo experiment (m = 1 to NR); $\mathrm{RMSE}_i^T$ and $\mathrm{BIAS}_i^T$ are the root mean square error and the relative bias for the return period T at site i, respectively; $x_i^T$ is the "true" value for the T-year event at site i, and $\hat{x}_{i,m}^T$ is the simulated value of the T-year event at site i from the m-th sample of the Monte Carlo simulations. A summary characteristic describing the performance of the given frequency model is the average root mean square error ($\mathrm{RMSE}^T$) and average bias ($\mathrm{BIAS}^T$), respectively, obtained by averaging over all the N stations:
$$\mathrm{RMSE}^T = \frac{1}{N} \sum_{i=1}^{N} \mathrm{RMSE}_i^T \tag{23}$$
and
$$\mathrm{BIAS}^T = \frac{1}{N} \sum_{i=1}^{N} \mathrm{BIAS}_i^T. \tag{24}$$
In the current study, only the statistical properties of the dimensionless growth curves are examined. We decided not to focus on an analysis of the design values since the simulated design values $\hat{X}_{i,m}^T$ (within the Monte Carlo experiments) are a product of the simulated growth curves $\hat{x}_{i,m}^T$ and simulated index storm values $\hat{\mu}_{i,m}$ (analogously to Eq. 18). The uncertainty of the design values is affected by the uncertainties of both factors, which makes the interpretation more difficult and less relevant to the aims of the study.

A summary of the results of the Monte Carlo simulations is given in Tables 1-2 in terms of the average root mean square error ($\mathrm{RMSE}^T$, Eq. 23) and the bias ($\mathrm{BIAS}^T$, Eq. 24) of the simulated growth curves (averaged over the stations). The box plots (Figs. 3-4) offer a broader overview of the models analyzed in a more transparent form. Besides displaying the point characteristics (median), they enable a comparison of the spread of the statistical characteristics among the stations, in terms of the inter-quartile range (25%-75%) and the 5% and 95% quantiles. Even though only the box plots of $\mathrm{RMSE}_i^T$, i = 1, ..., 56, corresponding to the return periods of T = 10, 20, 50 and 100 years are presented, the general conclusions are drawn according to the whole set of results.
The description of the results is organized as follows: the 6 ROI pooling schemes (the 2 Hosking-Wallis regional models) are inter-compared in Sect. 5.1 (5.2); the local models are briefly evaluated in Sect. 5.3; and the performance of all the frequency models examined is compared in Sect. 5.4.
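The evaluation measures in Eqs. (21)-(24), together with the spread characteristics shown in the box plots, can be illustrated by a short Python sketch. It assumes that the errors are taken relative to the "true" quantile and expressed in %, as suggested by the units of Tables 1-2; the function name and the array layout (sites in rows, Monte Carlo repetitions in columns) are assumptions of this sketch, and the numbers are invented.

```python
import numpy as np

def rmse_and_bias(simulated, true):
    """Relative RMSE and bias (in %) of simulated growth curve values.

    simulated : array of shape (n_sites, NR), one row per site
    true      : array of shape (n_sites,), the "true" quantile at each site
    Returns per-site RMSE and bias and their averages over the stations.
    """
    simulated = np.asarray(simulated, dtype=float)
    true = np.asarray(true, dtype=float)[:, None]
    rel_err = (simulated - true) / true
    rmse_i = 100.0 * np.sqrt(np.mean(rel_err ** 2, axis=1))
    bias_i = 100.0 * np.mean(rel_err, axis=1)
    return rmse_i, bias_i, rmse_i.mean(), bias_i.mean()

# Invented example: 2 sites, NR = 2 repetitions, one return period
true_q = [2.0, 4.0]
sim_q = [[2.2, 1.8],
         [4.4, 4.4]]
rmse_i, bias_i, rmse_avg, bias_avg = rmse_and_bias(sim_q, true_q)

# Spread characteristics across the stations, as displayed in a box plot
spread = np.percentile(rmse_i, [5, 25, 50, 75, 95])
print(rmse_i, bias_i, rmse_avg, bias_avg, spread)
```

The first site illustrates why the bias alone is a weak criterion: errors of opposite sign cancel in the bias but not in the RMSE.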

Evaluation of the ROI pooling schemes
Focusing on the 6 ROI pooling schemes only, all of them have a small positive bias (regardless of the return period), i.e. they slightly overestimate the actual growth curve values (Tables 1-2). The RCo3 model has the smallest average bias for both data sets (Tables 1-2). On the other hand, no ROI scheme stands out when the spread characteristics of the bias are analyzed (not shown). Since the inter-model variability of the bias is relatively small and does not yield a clear picture that would support a distinction between the various ROI pooling schemes, the evaluation of the frequency models is based on RMSE in the following parts of the paper.
The root mean square error of the simulated growth curves enables a more efficient comparison of the ROI pooling schemes. Even though the differences between the ROI schemes are rather small, the average values of $\mathrm{RMSE}^T$ (Tables 1-2) show that the RGo3 (RCo3) model performs best for 1-day (5-day) maxima for return periods T ≥ 10 years. Such a pattern is partially captured in the box plots (Figs. 3-4): with the exception of the upper whiskers, the 5, 25, 50 and 75% percentiles are most favourable for the RGo3 (RCo3) model in Fig. 3 (Fig. 4) for the longer recurrence intervals (T ≥ 20 years).

Table 1. Average root mean square error ($\mathrm{RMSE}^T$) and average bias ($\mathrm{BIAS}^T$) of growth curves of 1-day annual maxima for return periods T (in %). The smallest values of $\mathrm{RMSE}^T$ and $\mathrm{BIAS}^T$ are indicated in bold.

According to the presented results of the simulation procedure, it is more straightforward to compare the different alternatives to the distance metric (i.e. the group of RC models versus the group of RG models) than the individual ROI models. For 5-day precipitation maxima (Fig. 4), the statistical properties of the RC models seem to indicate a better performance for T ≥ 20 years at all the stations. However, the situation is more complicated in the case of 1-day annual maxima (Fig. 3). For the return periods T ≥ 20 years, the RG models generally demonstrate better statistical properties (the lowest 5, 25 and 75% percentiles and the narrowest boxes) than the RC models. The only exception is the position of the upper whiskers of the RG models, which are among the worst overall. This may imply that although the ROI pooling schemes based on the geographical proximity of sites are generally suitable for the growth curve estimation of 1-day maxima, there is a small number of stations where their performance is rather poor. However, a detailed analysis of the results revealed that the odd position of the upper whiskers is not necessarily a property of the RG models; rather, it might be caused by the reference RSo3 model. There are three "problematic" stations in the northern and central parts of the country, which exhibit a few outliers of 1-day precipitation maxima. These extraordinary values induce outlying positions of the given sites in the three-dimensional space of the site attributes, which finally results in an enhanced variance of the simulated quantiles (higher RMSE values) at these sites. This line of reasoning is supported by the fact that the "problematic" stations a) are also among the worst ones in the case of the RC models for 1-day precipitation maxima, and b) show an average behaviour in the case of the longer duration, since they do not have outlying 5-day maxima.
The general behaviour of the ROI pooling schemes in the case of a shorter (longer) duration is in accordance with the climatological expectations. Alternative #2 (the RG models) performs slightly better than alternative #1 in the case of the 1-day duration (Fig. 3). That is likely a consequence of the fact that a) the 1-day annual maxima occur mostly during the warm season (April to September), and b) extreme precipitation events in the warm season are rather local phenomena, so a distance metric based on geographical characteristics copes with warm season precipitation more efficiently than the one in alternative #1 (the RC models). The somewhat better performance of the RC models in the case of 5-day maxima (Fig. 4) may be related to the effects of the complex topography of the area under study on the formation of the precipitation climate. Precipitation extremes of longer durations are predominantly of frontal origin (larger-scale phenomena, comparable with the extent of the country), so different precipitation regimes develop on the windward and leeward sides of the mountains in Slovakia. Since the windward/leeward effects are also partially captured in the climatological characteristics used in alternative #1, the RC models show slightly better general behaviour.
Among the various options for the transfer of regional information (o1/o2/o3) for a given alternative to the distance metric, it is rather difficult to determine the best one. However, in most cases, option #3 benefits from incorporating each station into a given site's ROI.
The histograms in Fig. 5 serve to verify whether the number of sites in the individual pooling groups meets the 5T rule. The "strict threshold" (patterned bars in Fig. 5) is related to Eq. (5) in the definition of the first pooling option. Based on this rule, approximately half of the total number of sites have pooling groups consisting of ≤15 sites. Employing the "looser threshold" in option #1 (Eq. 6) changes the histogram of the size of the pooling groups in such a way that the majority of pooling groups in option #1 consist of at least 13-15 sites (the grey bars in Fig. 5). Considering that the average length of observation at the stations is 44 years, such groups contain roughly 570-660 station-years of data, so according to the 5T rule a "reliable" estimation of quantiles with a recurrence interval of T ≈ 110-130 years is possible at most of the stations. Keeping in mind the 5T rule, the quantile estimates corresponding to the return period T = 200 years serve only for informative purposes, and should be used in engineering applications with caution.
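The arithmetic behind this estimate can be checked with a few lines of Python. The helper `max_reliable_return_period` is a hypothetical name; it simply divides the number of station-years in a pooling group by 5, following the 5T rule, and assumes that every site contributes the average record length of 44 years.

```python
def max_reliable_return_period(n_sites, mean_record_length, factor=5):
    """Largest return period T (years) supported by the 5T rule.

    The rule requires factor * T station-years of data, so
    T_max = n_sites * mean_record_length / factor.
    """
    return n_sites * mean_record_length / factor

for n_sites in (13, 15):
    t_max = max_reliable_return_period(n_sites, 44)
    print(f"{n_sites} sites x 44 years -> T up to about {t_max:.0f} years")
```

By the same reasoning, a 200-year quantile would call for about 1000 station-years, i.e. roughly 23 sites with 44 years of record each.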
According to option #2 (Eq. 8; black bars in Fig. 5), the pooling groups are rather large; their size in alternative #1 approaches the maximum possible one. However, in alternative #1 (#2), there are 4 (3) stations for which it was impossible to find a sufficient number of similar sites based on the selected attribute sets, even using option #2. These sites are either the highest-elevation ones (Lomnický štít, 2635 m; Chopok, 2008 m; Skalnaté pleso, 1783 m a.s.l.) or they are outlying in the space of climatological characteristics due to their slightly odd precipitation regime.
Hydrol. Earth Syst. Sci., 12, 825-839, 2008, www.hydrol-earth-syst-sci.net/12/825/2008/. L. Gaál et al.: Precipitation frequency analysis by the ROI method in Slovakia

Evaluation of the Hosking-Wallis regional models
Surprisingly, of the two Hosking-Wallis models of regional frequency analysis, HW1r demonstrates a better performance than HW3r for both durations (Tables 1-2) and both statistical characteristics (RMSE and bias) for the return periods T ≥ 50 years. Such a feature is also supported by the spread characteristics: the single-region HW model performs slightly better for 1-day annual maxima (Fig. 3). This implies that the delineation of smaller sub-regions in the country does not necessarily improve the quantile estimates. The annual maxima for the 1-day duration are likely to arise mostly from strong, local convective rain showers that depend little on topography; thus, in principle, similar properties of heavy rainfall may be found at relatively distant locations. Therefore, pooling such geographically weakly dependent information from all the available sites may result in a better performance of the frequency model. For the duration of 5 days (Fig. 4), the statistical characteristics of the models are ambivalent when evaluated in terms of RMSE box plots: the HW1r model provides better lower percentiles (5 and 25%) and the overall best median values; however, the high position of the upper whisker demonstrates that this model fails in the estimation of growth curves at some stations. The narrower range of boxes and whiskers with the HW3r model is favourable for all the return periods.

Evaluation of the local models
The at-site method of estimation shows better behaviour than the regional models only in the case of the bias characteristics and for low return periods (T ≤ 20 years, Tables 1-2). The RMSE of the at-site estimates is clearly the highest (Tables 1-2) for any return period T: at least 3-4 times the RMSE values of the regional models for the corresponding T. The same feature is captured by the outlying position of the box plots of the at-site model in Figs. 3-4. It is explained by the fact that the at-site estimates, particularly for lower probabilities of occurrence, are unduly influenced by sampling variability. The regional approach, whether traditional or ROI-based, considerably reduces this impact and leads to more reliable quantile estimates.
As a consequence, the results demonstrate that the at-site approach to frequency analysis is the least suitable method for the estimation of heavy precipitation quantiles.

Comparison of the ROI, Hosking-Wallis and local models
The results of the comparison of the 9 examined models of the frequency analysis (6 ROI pooling schemes, 2 HW models, and the at-site model) are summarized as follows:
- The selection of the most appropriate model for the pooled frequency analysis depends on the duration of the analyzed precipitation data sets: in the case of 1-day (5-day) annual maxima, it is the RGo3 model at most stations (RCo3 at all the stations), in which the between-site similarity is determined according to the geographical (climatological) characteristics. The common feature of the best ROI pooling schemes is that the regional information is pooled with appropriate weighting coefficients from all the stations under study.
- In the case of 1-day annual maxima, the ROI approach, regardless of the definition of the similarity measures and the pooling options, outperforms the traditional (HW) frequency models at most stations.
- In the case of 5-day annual maxima, the ROI approach, regardless of the definition of the similarity measures and the pooling options, outperforms the traditional HW3r frequency model. However, a comparison of the HW1r model with the ROI pooling schemes does not yield a clear pattern, since the HW1r model shows both the best properties (the 5, 25 and 50% percentiles) and the worst ones (the 95% percentiles for each T and the 75% percentiles for lower T) at the same time (Fig. 4). The fact that the traditional single-region model performs better at no fewer than half of the stations might be explained by the homogeneity of the country. Nevertheless, if the RCo3 and/or RGo3 models are considered, which use information from the same pool as the HW1r model (i.e. all the available stations), the average performance of the ROI models is more favourable (Table 2).
- Local at-site models should be avoided since they lead to a large variance in the estimated growth curves.

Conclusions
The region-of-influence (ROI) method, which was designed in flood frequency studies to avoid inconsistencies at the boundaries of regions involved in conventional regional approaches, is shown to be a very useful tool for the frequency modelling of heavy precipitation events. Six different combinations of the site attributes (that enter the distance metric) and weighting functions (used to pool regional information) were evaluated using a Monte Carlo experiment. The simulation procedure implies that the selection of a suitable alternative to the distance metric should be adapted to the dominant character of the precipitation formation. In the case of a shorter duration (1-day annual maxima), the better alternative is the one based on the geographical proximity of the sites, while in the case of a longer duration (5-day annual maxima), the alternative based on the climatological characteristics of the long-term precipitation regime shows more favourable statistical properties. It is less obvious which option for the transfer of regional information is the most suitable one, but the approach that makes use of all the available observations (with different weights) seems to be superior to the other two. The smaller number of parameters that need to be initialized is a further advantage of this option. Nevertheless, the superiority of the best ROI pooling schemes (which use all the sites available in the analysis) may stem from the fact that the area under study is relatively small and that the analyzed data sets of 1 to 5-day precipitation extremes are sufficiently homogeneous (Gaál, 2006). An analysis carried out using heterogeneous data sets covering a considerably larger geographical area may lead to different results.
The most important finding stems from a comparison of the two basic approaches to regional frequency analysis: the ROI method, which makes use of flexible regions in order to pool regional information to the target site, is generally superior to the traditional approach of a regional analysis based on firmly separated groups of sites. There are a few cases where the traditional regional frequency models perform better; nevertheless, such behaviour might be explained by the way the reference frequency model (based on selected site statistics) was designed. In future experiments, site statistics based on conventional moments will be replaced by L-moments, due to their lower sensitivity to outliers in the data series (Vogel and Fennessey, 1993). The fact that the traditional single-region approach resulted in the best performance at a number of sites in the case of the 5-day duration may again be attributed to the homogeneity of the country.
In the present paper, no attention has been paid to the estimation of the index storm. It is usually estimated with a considerable degree of uncertainty, mainly in a low-density network of sites such as the one employed herein. We therefore chose not to complicate the interpretation of the simulation results by also considering the uncertainty of the index storm estimates.
In connection with the estimation of growth curves at ungauged sites, there are two main issues to be discussed. First of all, one may argue that the climatological characteristics used in the first alternative of the ROI approach are not applicable to ungauged sites since they are based on the observed precipitation data (monthly, seasonal or annual totals). Strictly speaking, this is true. Nevertheless, the long-term characteristics of the precipitation regime can be estimated at ungauged sites from climatological maps with a smaller degree of uncertainty than any extremes or the index storm; therefore, the derived site attributes are considered to be usable in the ungauged cases as well. Secondly, although the methodology applied allows for the estimation of growth curves at ungauged sites, we only focused on a regional analysis of extreme precipitation from sites with direct meteorological observations. The main goal of the paper was to verify the applicability of the ROI methodology in a network of selected raingauges and to find the most appropriate setting of the method; the performance of the frequency models at ungauged locations will be the subject of further investigation.
The results presented were obtained using simulation experiments based on precipitation data in a particular area in central Europe (Slovakia); however, it is likely that at least some of the methodological findings may be rather general and independent of the target area.We recommend the ROI method for frequency estimates of heavy precipitation events in different climatological conditions in other parts of the world, particularly in areas with complex topography.

Fig. 1. The 56 climatological stations in Slovakia selected for a regional frequency analysis of heavy precipitation amounts.

Fig. 2. Delineation of 3 homogeneous regions for frequency analysis of heavy precipitation amounts using the conventional regionalization approach of Hosking and Wallis.

Fig. 5. The size of the individual pooling groups (a) in alternative #1 (based on climatological site characteristics), and (b) in alternative #2 (based on geographical site characteristics). The patterned (grey) bars are related to the strict (looser) threshold of the definition of pooling option #1 in Eq. (5) (Eq. 6); the black bars show the distribution of the number of sites in the pooling groups according to option #2 (Eq. 8).

Table 2. Average root mean square error ($\mathrm{RMSE}^T$) and average bias ($\mathrm{BIAS}^T$) of growth curves of 5-day annual maxima for return periods T (in %). The smallest values of $\mathrm{RMSE}^T$ and $\mathrm{BIAS}^T$ are indicated in bold.