Regional frequency analysis of heavy precipitation in the Czech Republic by improved region-of-influence method

L. Gaál and J. Kyselý
Institute of Atmospheric Physics, Academy of Sciences of the Czech Republic, Boční II 1401, 141 31 Prague 4, Czech Republic
Department of Land and Water Resources Management, Faculty of Civil Engineering, Slovak University of Technology, Radlinského 11, 813 68 Bratislava, Slovakia
Received: 16 December 2008 – Accepted: 18 December 2008 – Published: 14 January 2009
Correspondence to: L. Gaál (ladislav.gaal@stuba.sk)
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Frequency analysis, which aims at estimating recurrence intervals of heavy hydroclimatological phenomena, is a specific field of applied statistics that has been intensively developed over recent decades. Frequency analysis usually benefits from a regional approach, which is applicable if the regional homogeneity criterion is met; that is, the sites that form a given region share the same distribution function of the examined variable apart from a site-specific scaling factor called the index value (Dalrymple, 1960). Different aspects of the regional approach to frequency analysis have been examined in connection with heavy precipitation (e.g. Gellens, 2002; Sveinsson et al., 2002; Fowler and Kilsby, 2003; Boni et al., 2006; Wallis et al., 2007), floods (e.g. Burn, 1997; Madsen and Rosbjerg, 1997; Adamowski, 2000; Kjeldsen et al., 2002; Jingyi and Hall, 2004; Solín, 2008), droughts (e.g. Clausen and Pearson, 1999; Chen et al., 2006), extreme sea levels (e.g. van Gelder et al., 2000) and wind speeds (e.g. Sotillo et al., 2006; Modarres, 2008). The superiority of regional frequency models over the conventional at-site approach (which only utilizes data from the site of interest itself) stems not only from the reduced uncertainty of the estimated high quantiles at the right tails of the distributions (e.g. Lettenmaier et al., 1987; Cunnane, 1988; Stedinger et al., 1993), but also from the fact that the regional methods allow for the estimation of design values at ungauged sites (e.g. GREHYS, 1996a, b; Kohnová et al., 2006b).
In the traditional approach to regional frequency analysis, the regions are kept fixed; that is, when changing the focus from one site to another within a given region, the information source for the regional transfer remains unchanged (e.g. Hosking and Wallis, 1997). An alternative to regional frequency estimation, the region-of-influence approach (ROI; Burn, 1990a, b), introduced a fundamentally different concept: the idea of focused pooling. Its main feature is the uniqueness of the "regions" (more precisely, the pooling groups; Reed et al., 1999b): each site under study has its own group of adequately similar sites that form the basis for the transfer of information on extremes to the site of interest. The idea of focused pooling has been adopted in studies of flood flows (e.g. Zrinji and Burn, 1994, 1996; Castellarin et al., 2001; Cunderlik and Burn, 2002; Holmes et al., 2002; Shu and Burn, 2004) and precipitation extremes (Schaefer, 1990; Alila, 1999; Di Baldassare et al., 2006), as well as in complex nationwide projects devoted to the frequency analysis of hydro-climatological extremes (Reed et al., 1999a; Thompson, 2002).
In an analysis of extreme precipitation amounts in Slovakia, Gaál et al. (2008) adopted the original concept of the ROI approach (Burn, 1990b), which, however, had been criticized for requiring a relatively large number of parameters to be chosen according to subjective considerations (e.g. Hosking and Wallis, 1997). Zrinji and Burn (1994) revisited the ROI methodology: instead of subjectively selected threshold values, a built-in regional homogeneity test was used for assigning sites to a given pooling group. They employed the $\chi^2_R$ statistic (Chowdhury et al., 1991) for testing the regional homogeneity of the proposed pooling groups. Later, Zrinji and Burn (1996) extended the ROI methodology by a hierarchical feature (Gabriele and Arnell, 1991), which implemented different alternatives to the homogeneity test of Hosking and Wallis (1993). Castellarin et al. (2001) applied the hierarchical pooling methodology of Zrinji and Burn (1996) for a flood frequency analysis in northern central Italy.
In this study we modify the pooling methodology of Castellarin et al. (2001), carry out a sensitivity analysis by examining the performance of the ROI models after changing the set of input site attributes that serve for determining the sites' similarity, and compare the improved ROI model with other frequency models by means of simulation experiments. The methods are applied to modelling probabilities of extreme one-day and multi-day precipitation amounts in the Czech Republic (central Europe).

Precipitation data
Daily precipitation totals measured at 209 stations mostly operated by the Czech Hydrometeorological Institute (CHMI) were used as the input dataset (Fig. 1). The altitudes of the stations range from 150 to 1490 m a.s.l., and the observations at most sites span the period from 1961 to 2005. Three main criteria were applied when selecting the stations and forming the dataset: (i) the stations approximately evenly cover the area of the Czech Republic; (ii) there were no significant station moves during 1961-2005 (all sites where any location changes exceeded 50 m in altitude were excluded from the analysis), and no other sources of inhomogeneities were reported; (iii) the daily series of precipitation records are uninterrupted (except for the sites discussed below).

The data underwent standard quality checking for gross errors (cf. Coufal et al., 1992).
A large majority of the station records cover the whole period of 1961-2005; 36 out of the 209 stations have daily data over shorter subperiods of at least 31 consecutive years (mostly between 38 and 43 years; the stations started to operate after 1961 or closed before 2005) and/or minor parts of the records had to be omitted owing to the stations' relocations.
The dataset is superior to the one employed in Kyselý and Picek (2007a): it involves a much larger number of sites with complete daily records, covers the area of the Czech Republic more evenly, extends to the very recent past (December 2005), and corrects a few errors in the original dataset (a missing month was identified in the records of 3 stations, and the data were supplemented).
At 45 stations, minor gaps in the daily records occurred (a total of up to 1 month over 45 years at 32 sites and not exceeding 3 months at any of the 45 sites). We decided to preserve these stations in the analysis because of their locations in areas that are insufficiently covered by rain-gauges with complete records. The missing daily data were estimated using measurements at the 2 to 5 nearest locations available in the climatological database of the CHMI; the methodology is described in Kyselý (2008). (Note that the mean distance to the nearest measuring site was only 15.4 km for the locations where the missing data were estimated, and the percentage of the missing daily records in the entire dataset was only 0.05%.) All other station records with more than 3 months of missing values were excluded from the analysis.
The basic features of the precipitation climate of the Czech Republic, with a focus on extremes, may be found in Kyselý and Picek (2007a) and Kyselý (2008).
Samples of the annual maxima of 1-day and 5-day precipitation amounts were drawn from each station record and are further examined.
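As an illustration of how such samples can be drawn, the following minimal sketch extracts annual maxima of running multi-day totals from a daily series. The function name `annual_maxima` and the convention of assigning a multi-day total to the calendar year of its last day are assumptions made for this example; the study does not state how totals spanning two years are treated.

```python
import numpy as np

def annual_maxima(daily, years, window=1):
    """Annual maxima of `window`-day running precipitation totals.

    daily  : daily precipitation totals [mm] in chronological order
    years  : calendar year of each day (same length as `daily`)
    window : accumulation length in days (1 or 5 in the study)
    """
    daily = np.asarray(daily, dtype=float)
    years = np.asarray(years)
    # Running sums over `window` consecutive days.
    sums = np.convolve(daily, np.ones(window), mode="valid")
    # Assumption: each running total belongs to the year of its last day.
    end_years = years[window - 1:]
    return {int(y): float(sums[end_years == y].max())
            for y in np.unique(end_years)}
```

Calling `annual_maxima(daily, years, window=5)` would give the 5-day samples; the series is assumed to be gap-free after the infilling described above.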

Alternatives of site attributes
The ROI approach, as one of the focused pooling techniques, is aimed at finding groups of sites that share similar statistical properties of the observed hydro-climatological extremes. It is assumed that the frequency distribution of the extremes at a given site is closely related to its climatological, hydrological, geographical, geomorphological and other attributes. Therefore, one of the basic issues of the pooling procedures is the selection of site attributes that are useful for explaining the observed behaviour of the extremes.
The similarity of sites in the pooling process is evaluated using two different sets of site attributes in this study.
The first group of site attributes consists of general climatological characteristics (abbr. ROIcli) that describe a long-term precipitation regime. The following variables are considered to be the basic characteristics of the precipitation climate: 2. mean ratio of the precipitation totals for the warm/cold seasons [-], and 3. mean annual number of dry days [-] (defined as days with a precipitation amount ≥0.1 mm).
The warm (cold) season is defined as April-September (October-March). The basic idea behind the choice of the characteristics of the precipitation climate is that the atmospheric mechanisms generating heavy precipitation are similar under similar climatological conditions, particularly when the small extent of the study area is taken into account.
Geographical site characteristics (abbr. ROIgeo) represent the second group of attributes that are employed to define the sites' proximity: 1. latitude [degrees], 2. longitude [degrees] and 3. elevation above sea level [m].
The geographical co-ordinates are chosen since the actual proximity of the sites may also result in similar regimes of extreme precipitation.


Description of the pooling scheme
Since the pooling scheme adopted herein originates from that described in detail in Gaál et al. (2008), we confine the description to the cornerstones of the procedure and accentuate the changes and improvements in the methodology.
The similarity of sites in the attribute space is evaluated by means of a weighted Euclidean distance metric:
$$D_{ij} = \left[ \sum_{m=1}^{M} W_m \left( X_m^i - X_m^j \right)^2 \right]^{1/2}, \qquad (1)$$
where $D_{ij}$ is the weighted Euclidean distance between sites $i$ and $j$; $W_m$ is the weight associated with the $m$-th site attribute; $X_m^i$ is the value of the $m$-th attribute at site $i$; and $M$ is the number of attributes. Before determining the elements $D_{ij}$ of the distance metric or dissimilarity matrix $\mathbf{D}$, the attributes undergo normalization in order to remove any possible bias from the estimation due to different magnitudes. The weight $W_m$ in Eq. (1) is used to express the relative importance of the site attributes. In the current analysis, we use unit weighting coefficients for all the attributes ($W_m = 1$, $m = 1, \ldots, M$), because we did not find any reasons to prefer one site attribute over another.
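The construction of the dissimilarity matrix can be sketched in a few lines. The sketch below assumes that "normalization" means standardizing each attribute to zero mean and unit standard deviation, which is one common choice but is not specified in the text; the function name `dissimilarity_matrix` is likewise hypothetical.

```python
import numpy as np

def dissimilarity_matrix(attributes, weights=None):
    """Weighted Euclidean distances between all pairs of sites.

    attributes : array of shape (N sites, M attributes)
    weights    : M attribute weights W_m (unit weights if omitted)
    """
    X = np.asarray(attributes, dtype=float)
    # Assumed normalization: zero mean and unit standard deviation per attribute.
    # An attribute that is constant across sites would give a zero divisor here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    W = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # Pairwise differences, shape (N, N, M), then the weighted sum over attributes.
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((W * diff ** 2).sum(axis=2))
```

Sorting row `i` of the returned matrix in ascending order gives the order in which sites would be considered for the pooling group of site `i`.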
It is important to point out the difference between two types of site attributes, which are usually termed characteristics and statistics. Site characteristics are quantities independent of whether or not daily measurements of precipitation are carried out at a given site. These include geographical co-ordinates, geomorphological attributes and/or descriptors of the long-term (precipitation) climate. On the other hand, site statistics are the results of the statistical processing of the data observed at a given site. It is generally recommended (Hosking and Wallis, 1997; Castellarin et al., 2001) to use site characteristics in the process of the formation of the regions or pooling groups, while one should take advantage of site statistics in the process of testing the homogeneity of a proposed group of sites.
Pooling groups in the ROI approach are generally constructed using the elements $D_{ij}$ arranged in ascending or descending order; however, there are basically two different ways to accomplish this. The core idea of the first method lies in the gradual building up of the pooling groups (termed the "forward" approach herein). Starting with the target site $i$, which represents a single-site pooling group at the very beginning of the process, the next closest site (i.e. the site $j$ with the next lowest value of $D_{ij}$, $j = 1, \ldots, N$) is appended to the existing ROI in each turn, until a given condition for forming the ROI is met. The process of building up the ROI is terminated (i) at a given point, which is defined as a function of the selected quantiles of the dissimilarity matrix $\mathbf{D}$ (Burn, 1990b) or (ii) when the measure of the regional homogeneity of the proposed group of sites reaches or exceeds an unacceptable level (Castellarin et al., 2001). A reversed "backward" procedure is adopted in the second method of pooling: in its initial stage, all the sites in the analysis are supposed to form a superregion, and, step by step, the most dissimilar sites are removed from the bulk of the sites until the remaining group of sites is homogeneous (Zrinji and Burn, 1994).
We implemented the regional homogeneity test proposed by Lu and Stedinger (1992) when forming the regions, for two reasons: (i) its application is computationally straightforward, and (ii) according to the comparative study of Fill and Stedinger (1995), it is one of the most powerful homogeneity tests. A brief description of Lu and Stedinger's homogeneity test, also called the X10 test, is given in Appendix B.
We tested both the "forward" and "backward" approaches to forming homogeneous ROIs and decided to form the ROIs primarily by gradual building up ("forward") and using a homogeneity test for finding the cutoff point for the inclusion of the sites. The main deficiency of the "backward" procedure was that in some cases, it tended to produce very large homogeneous regions, which did not vary much from site to site; this resulted in the undesirable spatial smoothing of the estimated growth curves in the regional analysis (cf. Castellarin et al., 2001).

As mentioned above, the basic idea of the "forward" pooling procedure is an iterative building up of the ROI for the site of interest.In each step of the iteration, the next similar site is added to the existing ROI, and the homogeneity of the proposed pooling group is tested.In the event the proposed ROI is homogeneous, the procedure goes on with the next loop; otherwise, the procedure is stopped, and the formation of the given ROI is finished.
After a detailed scrutiny of the preliminary outcomes of the analysis, we found it useful to slightly modify this pooling procedure. In a number of cases, heterogeneity was reached relatively early, i.e. after a few (<4-6) iterations. The idea suggested by Castellarin et al. (2001), to stop the iteration procedure when the heterogeneity of the ROI is detected for the second time (instead of the first time), did not help either. In both cases, the difficulty is that a relatively high number of pooling groups consist of a small number of sites and do not meet the "5T rule" (Jakob et al., 1999), which suggests that for a reliable estimation of a design value corresponding to the return period $T$, one needs $5T$ station-years of data. Considering that the average length of observations at the selected stations is ∼44 years (see Sect. 2.1), it is desirable to have at least 10-11 sites included in a pooling group for a reliable estimation of the 100-year precipitation quantiles.
We modified the pooling procedure of Castellarin et al. (2001) as follows:
- At the very beginning, the ROI of the target site consists of 11 stations (i.e. it comprises the site itself and the 10 closest sites), regardless of whether this initial pooling group is homogeneous or not.
- The iteration procedure of testing the homogeneity and adding the next closest site to the pooling group goes on until the first heterogeneous pooling group is found that is preceded by at least one homogeneous pooling group. In this case, the last homogeneous stage defines the final composition of the pooling group for the site.
- In the case when no homogeneous stage is reached by successively adding sites to the initial ROI, the program code returns to the initial stage with 11 sites and starts looking for a homogeneous composition by removing the least similar sites from the pooling group. The first homogeneous stage then defines the final composition of the pooling group for the site. In the worst case, when neither the building nor the removing procedure leads to a homogeneous stage, the ROI consists of nothing but the target site (i.e. a "single-site pooling group").
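The modified procedure can be sketched as follows. This is an illustrative reading of the three steps above, not the authors' program code: the function `build_roi` and the callback `is_homogeneous` (standing in for the X10 test) are hypothetical names, and the behaviour when every enlarged group stays homogeneous up to the full set of sites is an assumption, since the text does not cover that case.

```python
from typing import Callable, List, Sequence

def build_roi(order: Sequence[int],
              is_homogeneous: Callable[[List[int]], bool],
              n_init: int = 11) -> List[int]:
    """Pooling group for one target site.

    order          : site indices sorted by increasing distance from the
                     target site; order[0] is the target site itself
    is_homogeneous : homogeneity test applied to a candidate group
    n_init         : size of the initial pooling group (11 in the study)
    """
    n_init = min(n_init, len(order))
    last_homogeneous = None
    # Forward phase: start from the initial group and keep adding sites.
    for size in range(n_init, len(order) + 1):
        group = list(order[:size])
        if is_homogeneous(group):
            last_homogeneous = group
        elif last_homogeneous is not None:
            # First heterogeneous group preceded by a homogeneous one.
            return last_homogeneous
    if last_homogeneous is not None:
        # Assumption: all sites were added without meeting heterogeneity again.
        return last_homogeneous
    # Backward phase: remove the least similar sites from the initial group.
    for size in range(n_init - 1, 1, -1):
        group = list(order[:size])
        if is_homogeneous(group):
            return group
    # Worst case: single-site pooling group.
    return [order[0]]
```

Passing the homogeneity test as a callback keeps the pooling logic independent of the particular test, so the same sketch could be run with an X10-type or an H1-type test.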

Frequency model of Hosking and Wallis
The classical regional analysis of Hosking and Wallis (1993, 1997) consists in delineating fixed regions that are homogeneous according to the statistical characteristics of the probability distributions of the extremes, i.e. the at-site distributions are identical except for a site-specific scaling factor. The delineation of homogeneous regions for 1-day and 5-day precipitation extremes in the Czech Republic has been updated and modified with respect to the original one presented in Kyselý and Picek (2007a). The reason for revisiting the original results of the homogeneity tests was a new dataset of daily precipitation amounts, which consists of 209 stations (compared to 78 in Kyselý and Picek, 2007a), extends to a more recent past (2005), and altogether covers 9197 station-years (compared to the original 3120). The two largest regions of the original regionalization became heterogeneous when the new data were considered; they have been split into 3 and 2 smaller (homogeneous) areas. The new regionalization recognizes 9 regions (Fig. 1) that are homogeneous according to the X10 test of Lu and Stedinger (1992) as well as the H1 test of Hosking and Wallis (1993) with respect to the statistical distributions of the annual maxima of the 1-day to 7-day precipitation amounts.

Estimation of growth curves and quantiles
For the construction of regional (pooled) growth curves and the estimation of the precipitation quantiles, the generalized extreme value (GEV) distribution was applied (see

The growth curves and precipitation quantiles are estimated using the L-moment-based index storm procedure (Hosking and Wallis, 1997). In the initial step, dimensionless data are calculated by rescaling by the sample mean $\mu_j$ (index storm):
$$x_{j,k} = \frac{X_{j,k}}{\mu_j}, \quad j = 1, \ldots, N, \quad k = 1, \ldots, n_j, \qquad (2)$$
where $X_{j,k}$ ($x_{j,k}$) denotes the original (dimensionless, rescaled) data; $N$ is the number of sites; and $n_j$ denotes the sample size of the $j$-th site.
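The dimensionless values of $x_{j,k}$ at site $j$ are then used to compute the sample L-moments $l_1^{(j)}, l_2^{(j)}, \ldots$ and L-moment ratios:
$$t^{(j)} = \frac{l_2^{(j)}}{l_1^{(j)}}, \qquad (3)$$
$$t_r^{(j)} = \frac{l_r^{(j)}}{l_2^{(j)}}, \quad r = 3, 4, \ldots, \qquad (4)$$
where $t^{(j)}$ is the sample L-coefficient of variation (L-CV) and $t_r^{(j)}$, $r = 3, 4, \ldots$ are the sample L-moment ratios at site $j$ (Hosking, 1990).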
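For readers who want to reproduce this step, the sample L-moment ratios can be computed from the unbiased probability-weighted moment estimators of Hosking (1990). The sketch below is a generic implementation of those standard formulas; the function name `sample_lmoment_ratios` is hypothetical, and at least four observations are assumed.

```python
import numpy as np

def sample_lmoment_ratios(sample):
    """Return (L-CV, L-skewness, L-kurtosis) of a sample with n >= 4."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    k = np.arange(1, n + 1)  # ranks of the ordered observations
    # Unbiased estimators of the probability-weighted moments b0..b3.
    b0 = x.mean()
    b1 = np.sum((k - 1) / (n - 1) * x) / n
    b2 = np.sum((k - 1) * (k - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((k - 1) * (k - 2) * (k - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    # First four sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2
```

Because the ratios are scale-invariant, the function gives the same result for the original and the rescaled data; for data rescaled by the sample mean, $l_1 = 1$ and the L-CV equals $l_2$.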
The pooled (regional) L-moment ratios $t^{(i)R}$ and $t_r^{(i)R}$, $r = 3, 4, \ldots$ for the target site $i$ are derived from the at-site sample L-moment ratios as their weighted averages:
$$t^{(i)R} = \frac{\sum_{j=1}^{N} W_{ij}\, t^{(j)}}{\sum_{j=1}^{N} W_{ij}}, \qquad (5)$$
where $W_{ij}$ are the weights associated with the $j$-th site in the analysis. The relationships analogous to Eq. (5) also hold true for $t_r^{(i)R}$, $r = 3, 4, \ldots$. From a mathematical point of view, the most important difference between the traditional regionalization and the focused pooling consists in the way the weighting coefficients $W_{ij}$ in Eq. (5) are defined. In the traditional regional analysis, $W_{ij}$ are proportional only to the record length $n_j$ for all the sites $j$ within a given region:
$$W_{ij} = \begin{cases} n_j, & j \in R_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$$
where $R_i$ denotes the region to which the target site $i$ belongs. Equation (6) is in accordance with the concept of Hosking and Wallis (1997): sites with longer observations provide more information in the regionally averaged statistics. Note that the regional weighting coefficients in Eq. (6) do not change when changing the focus from one site to another within a given region.
In the focused pooling, the length of observations $n_j$ is retained in the relationship for $W_{ij}$; however, compared to Eq. (6), the reciprocal value of the distance metric element $D_{ij}$ is introduced as an additional factor (Castellarin et al., 2001):
$$W_{ij} = \begin{cases} n_j / D^{*}_{ij}, & j \in \mathrm{ROI}_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$
where $\mathrm{ROI}_i$ stands for the region of influence of the site $i$, and
$$D^{*}_{ij} = \begin{cases} D_{ij}, & i \neq j, \\ D_{ij,\min}, & i = j, \end{cases} \qquad (8)$$
where $D_{ij,\min}$ is the lowest non-zero value of the distance metric between the target site $i$ and all the other sites $j$ (Castellarin et al., 2001). (The expression for $D^{*}_{ij}$ in Eq. (8) is more complex in order to avoid inconsistent results for $i = j$: in this special case $D_{ii} = 0$, which would lead to $W_{ii} = \infty$.) Using the reciprocal value of the distance metric element $D_{ij}$ as the pooled weighting factor is equivalent to assigning higher weights to sites that lie in the proximity of the target site in the attribute space: the smaller $D_{ij}$ for a given site $j$ is, the greater the amount of information it brings to the procedure for the growth curve estimation at site $i$. The regionally weighted (pooled) L-moment ratios $t^{(i)R}$ and $t_r^{(i)R}$, $r = 3, 4, \ldots$ are then used to estimate the parameters of the GEV distribution and the dimensionless cumulative distribution function (the growth curve). A quantile corresponding to the return period $T$ is calculated as a product of the dimensionless $T$-year growth curve value $x_T^i$ and the index storm $\mu_i$:
$$X_T^i = \mu_i\, x_T^i. \qquad (9)$$
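The chain from pooled weights to a design value can be illustrated with a short sketch. It follows the verbal description of Eqs. (5)-(8); treating the self-distance as the smallest non-zero distance is the reading assumed here. The GEV parameters are obtained with the L-moment approximation of Hosking and Wallis (1997), and all function names are hypothetical.

```python
import math
import numpy as np

def roi_weights(n_years, d_row, target, members):
    """Weights n_j / D*_ij for the sites in the ROI of `target`, zero elsewhere."""
    n_years = np.asarray(n_years, dtype=float)
    d_star = np.asarray(d_row, dtype=float).copy()
    others = np.arange(d_star.size) != target
    # Assumed Eq. (8): the zero self-distance is replaced by the smallest
    # non-zero distance between the target site and the other sites.
    d_star[target] = d_star[others & (d_star > 0)].min()
    w = np.zeros_like(d_star)
    idx = np.asarray(members)
    w[idx] = n_years[idx] / d_star[idx]
    return w

def pooled(values, weights):
    """Weighted average of at-site L-moment ratios (Eq. 5)."""
    values = np.asarray(values, dtype=float)
    return float(np.sum(weights * values) / np.sum(weights))

def gev_growth_value(t, t3, T):
    """T-year value of a GEV growth curve with mean 1, L-CV t, L-skewness t3."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2           # shape (undefined for k == 0)
    alpha = t * k / ((1.0 - 2.0 ** (-k)) * math.gamma(1.0 + k))  # scale
    xi = 1.0 - alpha * (1.0 - math.gamma(1.0 + k)) / k           # location
    F = 1.0 - 1.0 / T                          # non-exceedance probability
    return xi + alpha * (1.0 - (-math.log(F)) ** k) / k
```

With equal record lengths of 40 years, at-site L-CVs of 0.2, 0.3 and 0.4 and distances of 0, 1 and 2 from the target site, `roi_weights` yields weights of 40, 40 and 20 and a pooled L-CV of 0.28, whereas the record-length weights of Eq. (6) alone would give 0.30. Multiplying `gev_growth_value(...)` by the index storm of the target site gives the quantile in millimetres.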

Framework for an inter-comparison of the frequency models
In this paper, ROI pooling schemes based on climatological and geographical site attributes (Sect. 2.2) are compared with other frequency models, which include (i) the conventional regionalization approach of Hosking and Wallis (1997; abbr. HWreg), and (ii) the at-site frequency analysis. The performance of the different frequency models is assessed by means of Monte Carlo simulation procedures.
The essential issue of the Monte Carlo simulation is the way the unknown parent distribution (the "true" distribution) of the extremes is estimated. We decided to estimate the "true" at-site distribution by adopting the region-of-influence approach in which the similarity of sites is determined according to the statistical properties of the at-site data samples (abbr. ROIsta), as in Castellarin et al. (2001) and Gaál et al. (2008). Three site statistics were selected (cf. Burn, 1990b; Gaál et al., 2008): 1. the coefficient of variation ($c_v$), 2. Pearson's 2nd coefficient of skewness ($PS$), and 3. the normalized 10-year precipitation quantile, estimated using the GEV distribution ($x_{10y}$).
The selected statistics characterize the scale ($c_v$), shape ($PS$) and location ($x_{10y}$) of the empirical distribution of the samples. A pooling scheme based on the site statistics is supposed to result in groups of sites that have a frequency distribution of extremes similar to the target site.
In each loop of the Monte Carlo procedure, samples of the annual maxima that resemble the real world (the actual number of sites, the length of the observations, and the spatial correlations between the sites) are simulated. At each site, the parent distribution is the GEV; its parameters correspond to the pooled L-moments according to the ROIsta pooling scheme. Having simulated the at-site samples, the pooling schemes and frequency models described above are applied to estimate the $T$-year quantiles of the precipitation extremes, which are then compared with the "true" quantiles obtained by the ROIsta pooling scheme. The loops of the Monte Carlo simulations are repeated 5000 times.
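A heavily simplified sketch of the sampling step in one loop is given below. It draws independent GEV samples by inverting the distribution function and therefore ignores the spatial correlations between the sites, which the study does take into account; the function name `simulate_gev_samples` and its parameter names are hypothetical.

```python
import numpy as np

def simulate_gev_samples(xi, alpha, k, n_years, rng):
    """One synthetic sample of annual maxima per site.

    xi, alpha, k : GEV location, scale and shape (k != 0) at each site
    n_years      : record length at each site
    rng          : numpy random Generator
    """
    samples = []
    for xi_j, alpha_j, k_j, n_j in zip(xi, alpha, k, n_years):
        u = rng.uniform(size=n_j)  # uniform non-exceedance probabilities
        # Inverse of the GEV distribution function in the Hosking
        # parametrization: x(F) = xi + alpha * (1 - (-ln F)^k) / k.
        samples.append(xi_j + alpha_j * (1.0 - (-np.log(u)) ** k_j) / k_j)
    return samples
```

In a full experiment this step would be repeated 5000 times, with each frequency model re-estimating the quantiles from the synthetic samples.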
The different frequency models are compared by means of the bias and the root mean square error (RMSE) statistics, evaluated for a given return period $T$. In these statistics, $i$ ($m$) is the index over the sites (repetitions); $N$ ($M$) is the number of sites (repetitions); $x_T^i$ is the "real" $T$-year value at site $i$; and $\hat{x}_T^{i,m}$ is the estimated $T$-year value at site $i$ from the $m$-th sample of the Monte Carlo simulation.
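One common way to compute such statistics, in the style of Hosking and Wallis (1997), is to form the relative error of each estimate, average it over the repetitions at each site, and then average over the sites. The sketch below assumes this relative form; whether the study normalizes by the "true" quantile in exactly this way is an assumption, and the function name `bias_rmse` is hypothetical.

```python
import numpy as np

def bias_rmse(true_q, est_q):
    """Site-averaged relative bias and relative RMSE for one return period.

    true_q : "true" T-year values, shape (N,)
    est_q  : estimated T-year values, shape (M, N), one row per repetition
    """
    true_q = np.asarray(true_q, dtype=float)
    est_q = np.asarray(est_q, dtype=float)
    rel_err = (est_q - true_q) / true_q               # shape (M, N)
    bias = rel_err.mean(axis=0).mean()                # mean over m, then over i
    rmse = np.sqrt((rel_err ** 2).mean(axis=0)).mean()
    return float(bias), float(rmse)
```

For two sites with "true" values 10 and 20 and two repetitions giving (11, 18) and (9, 22), the relative errors are ±0.1 everywhere, so the bias is 0 and the RMSE is 0.1.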

The Monte Carlo simulation procedure and some related considerations are described in more detail in Gaál et al. (2008, Sect. 4).

Sensitivity analysis
A sensitivity analysis was performed in order to explore the role of different site characteristics and site statistics entering the dissimilarity matrix $\mathbf{D}$ (Eq. 1) and to identify the optimum setting of the ROI pooling schemes. The basic ROI schemes were analogous to those used in Gaál et al. (2008); the models were based on 3 climatological (geographical) site characteristics (Sect. 2.2), and labelled as ROIcli3 (ROIgeo3); both were connected with the model ROIsta based on 3 site statistics (ROIsta3) used for the estimation of the "true" quantiles during the simulation procedures (Sect. 3.4). The sensitivity analysis examined the performance of the ROI models after removing one or two site attributes from the basic ROI pooling schemes (ROIcli3, ROIgeo3) or from the "true" frequency model (ROIsta3). The analysis was divided into two parts: (i) to examine the effects of changes made to the basic ROI schemes, while keeping the "true" model unchanged, and (ii) to examine the effects of changes in the "true" frequency model, while using the basic ROI schemes with 3 parameters. The different alternatives of the newly constructed ROI pooling schemes and the modified "true" frequency models are summarized in Table 1. Note that the number of alternatives to the ROIgeo models is reduced: while the ROIcli alternatives make use of all 6 possible combinations of the 3 available climatological attributes into singles (labelled as ROIcli1a, b and c) or pairs (labelled as ROIcli2a, b and c), there are no reasons for using simplified ROIgeo models other than those based purely on elevation (ROIgeo1) or the pair of geographical co-ordinates (ROIgeo2). Furthermore, the modified "true" frequency models are based only on pairs of possible combinations of the site statistics defined in Sect. 3.4 (labelled with the suffixes "2Sa", "2Sb" and "2Sc") since it is unreasonable to construct "true" models based purely on one statistic. The sensitivity analysis was performed for both data sets of the 1-day and 5-day annual maxima.
First, we focus on the consequences of the changes made to the basic pooling schemes ROIcli3 and ROIgeo3. The summary statistics of the models' performance in terms of the average RMSE and the average bias for the quantiles of the estimated distributions of the 1-day (5-day) maxima corresponding to $T$ = 5, 10, 20, 50, 100 and 200 years are given in Table 2 (Table 3); the box-and-whisker plots of the RMSE in Fig. 2 (Fig. 3) illustrate the spread statistics. In general, the growth curves of the various models do not differ much in terms of the bias; that is why the box plots of the bias are not shown herein.
The ROI models based purely on a single site attribute clearly perform very poorly: the average RMSE is approximately 50% higher (in some cases, especially for the 5-day duration, by ∼100%; Table 3) than for the best ROI models based on more than one site attribute. These results are in accordance with expectations: the frequency behaviour of the precipitation extremes cannot be explained by a single climatological characteristic. The ROIcli2 models based on two site attributes clearly perform better; however, the best average and spread characteristics of the RMSE are obtained for the basic ROIcli3 model for both data sets (for 5-day durations, there are three models based on ≥2 attributes with a comparable performance; the odd one is the ROIcli2b model, which suggests that the missing site characteristic of the warm/cold season precipitation ratio (Sect. 2.2) plays an important role in the other ROIcli models).
While for the ROIcli models, more site attributes improve performance, a similar conclusion cannot be drawn for the ROIgeo pooling schemes: the ROIgeo2 model always outperforms the basic ROIgeo3 model (Tables 2-3, Figs. 2-3). Such behaviour is accounted for by the role of the elevation in ROIgeo3: while in ROIgeo2, sites are pooled according to the geographical distance from the site of interest, ROIgeo3 gives preference to sites that are located at altitudes similar to that of the target site.
We demonstrate this with an example of two selected stations, Červená and Olomouc, which are located not far from each other (40 km) but at different altitudes (750 and 225 m a.s.l., respectively). Pooling groups for the 1-day annual maxima at these stations according to both models are shown in Fig. 4, and some basic statistical characteristics of the composition of the pooling groups are summarized in Table 4. Figure 4a (Fig. 4b) illustrates the pooling group for the Červená station according to ROIgeo2 (ROIgeo3). There is a relatively clear pattern of the distribution of the sites belonging to the ROI around the target site in ROIgeo2 (Fig. 4a): the further apart they are located, the less weight is assigned to them, and the weights are obviously independent of the elevation. A larger spread of sites in a broader neighbourhood of the target station appears in ROIgeo3 (Fig. 4b): sites located at similar altitudes are preferred over those in the immediate proximity of the target site, which is also underpinned by the fact that even sites from over 200 km away are included in the ROI. Similar spatial patterns are observed for the Olomouc station (Figs. 4c-d) when comparing ROIgeo2 and ROIgeo3, although the differences between the two pooling groups are not that large, mainly because many neighbouring stations are also located at similar altitudes.
Generally, when comparing the pooling groups in both models, the pooling groups of the ROIgeo3 model exhibit a greater average distance between the target site and the other sites within the ROI and a narrower elevation range (Table 4).Analogous findings hold true for the other pairs of stations, which are located not far from each other and at different altitudes.
The effects of the changes made to the "true" frequency model are summarized in Tables 5-6 and Figs. 5-6. Both the average values and the spread statistics of the RMSE indicate that the best "true" model is the one based on all three statistical characteristics, regardless of the data set (1-day/5-day precipitation maxima) and the pooling scheme (ROIcli3/ROIgeo3). Such a conclusion is also underpinned by the spread characteristics of the bias: the narrowest box-and-whisker plots are always related to the models based on the basic version of the "true" model, ROIsta3.
The performance of the modified "true" models with the suffixes "2Sa", "2Sb" and "2Sc" allows for the ranking of the individual site statistics according to their importance. The ROIcli2Sc/ROIgeo2Sc models show the worst statistical properties, which suggests that the coefficient of variation, which is missing from these models, plays the most important role among the selected site statistics in the "true" frequency models. On the other hand, Pearson's 2nd coefficient is the least important statistic, since when it is ignored, the ROIcli2Sb/ROIgeo2Sb models perform nearly as well as the best "true" model.
Similar Monte Carlo simulations (examining changes made to the "true" frequency model) were also carried out for the ROIgeo2 pooling scheme (not shown); the general features of its performance in the light of "true" frequency models based on different input statistics are analogous to those for the basic ROIcli3 and ROIgeo3 pooling schemes.

Inter-comparison of the frequency models
Tables 7-8 and Figs. 7-8 summarize the performance of the frequency models for 1-day and 5-day precipitation amounts, which correspond to different concepts: the two superior ROI models (based on three climatological and two geographical characteristics, ROIcli3 and ROIgeo2, respectively; Sect. 4.1), the Hosking-Wallis approach based on fixed regions (HWreg; Sect. 3.2), and the at-site (local) estimates. The average values of the root mean square error in Tables 7-8 reveal that the at-site approach, regardless of the duration, is clearly inferior, since the RMSE statistics of the local model are nearly an order of magnitude worse than those of the other models. The poor performance of the at-site approach is explained by the enhanced effects of the sampling fluctuations, which are reduced by the multi-site approach in the regional models/pooling schemes (cf. Hosking and Wallis, 1997; Gaál et al., 2008). That is why we focus on a comparison of the regional/pooling approaches hereafter.
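For orientation, the two performance statistics can be sketched as follows. The sketch assumes that both the bias and the RMSE are taken relative to the "true" quantile and expressed in %, as the table captions indicate; the function names and the sample values are hypothetical, and the exact definitions used in the simulations are those given in the methodological part of the paper.

```python
import math

def relative_bias(estimates, true_value):
    """Mean relative deviation of the simulated estimates from the true quantile, in %."""
    n = len(estimates)
    return 100.0 * sum((e - true_value) / true_value for e in estimates) / n

def relative_rmse(estimates, true_value):
    """Root mean square of the relative deviations from the true quantile, in %."""
    n = len(estimates)
    return 100.0 * math.sqrt(
        sum(((e - true_value) / true_value) ** 2 for e in estimates) / n
    )

# Hypothetical growth-curve estimates from four simulation runs; true value 2.0
simulated = [1.8, 2.2, 2.0, 2.4]
print(round(relative_bias(simulated, 2.0), 2))   # 5.0
print(round(relative_rmse(simulated, 2.0), 2))   # 12.25
```

The RMSE combines the systematic error (bias) with the scatter of the estimates, which is why a model with a small average bias may still rank poorly when its estimates fluctuate strongly from sample to sample.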
The ROIgeo2 pooling scheme outperforms the other models in terms of the average RMSE for both durations (Tables 7-8). The spread statistics of the RMSE in Fig. 8 clearly confirm this fact for the 1-day maxima. For the 5-day maxima, the box-and-whisker plots for the ROIgeo2 and HWreg models show a much more balanced performance; however, the lower values of the 75% and 95% quantiles of the RMSE for all the return levels suggest that ROIgeo2 is superior for the 5-day maxima, too (the percentage of sites at which the RMSE is large is reduced more efficiently compared to the other models). Another disadvantage of the HWreg model is a slight tendency towards a negative bias for both the 1-day and 5-day durations (Fig. 7).
The results suggest that the Hosking-Wallis regional analysis may compete with the ROI method based on geographical characteristics only for multi-day precipitation extremes, the spatial variability of which is less affected by random (sampling) variations and more closely linked to some regional patterns in central Europe, which are related to atmospheric circulation and orographic features. The regional differences in the distributions of the multi-day extremes reflect, for example, the varied influences of Mediterranean cyclones (which often produce heavy multi-day precipitation) between the eastern and western parts of the country (e.g. Kyselý and Picek, 2007b). For one-day precipitation extremes, which are mostly related to convective phenomena in the warm season (88% of one-day maxima occur in April-September), the ROI method based on geographical characteristics is clearly superior to all other frequency models, including the Hosking-Wallis regional analysis. On the other hand, it is worth noting that the Hosking-Wallis regional analysis outperforms the ROI model based on climatological characteristics for both durations of precipitation extremes, which contrasts with the results for Slovakia (Gaál et al., 2008). This is likely related to the choice of the climatological characteristics in Slovakia, particularly the availability of Lapin's index of Mediterranean influence (Gaál, 2005), which is closely linked to the occurrence of heavy precipitation (see also Sect. 5 below); no analogous index is available for the area of the Czech Republic. Another reason may be a different approach to the delineation of the homogeneous regions for the Hosking-Wallis analysis in the two countries (3 contiguous regions were used in Slovakia).

Discussion and conclusions
The paper deals with the estimation of growth curves of the annual maxima of 1-day and 5-day precipitation amounts in the Czech Republic by an improved region-of-influence (ROI) methodology. The improvements consist in the way the pooling groups are constructed. The regional homogeneity test of Lu and Stedinger (1992) is incorporated in order to avoid subjective decisions concerning the parameters involved in the ROI methodology, and to avoid forming heterogeneous pooling groups for the estimation.
The remaining parameter in the improved ROI method, the "baseline" number of sites in a pooling group, is chosen according to the "5T rule" (Jakob et al., 1999), i.e. a rule of thumb for the minimum number of sites within a pooling group needed for a reliable estimation of a T-year quantile (T is set to 100 years herein). The proposed ROI methodology combines two different concepts of constructing the pooling groups described previously in hydrological studies (the "backward" approach of Zrinji and Burn, 1994, and the "forward" approach of Castellarin et al., 2001), and preserves some of their beneficial features:

1. Reasonably large numbers of sites in the pooling groups, typical for a pooling scheme based on the strategy of gradual building up (Castellarin et al., 2001); the "backward" approach tends to form pooling groups that are too large, which may smooth out spatial details. For example, applying the ROIgeo3 scheme for the 1-day maxima in the present study, the average number of stations in the pooling groups is 49.0, 49.3, and 70.5 according to the "forward" approach of Castellarin et al. (2001), our modified method, and the "backward" approach of Zrinji and Burn (1994), respectively.

2. A small number of pooling groups with an insufficient number of sites, typical for a pooling scheme based on the strategy of cutting down (Zrinji and Burn, 1994). Using the ROIgeo3 scheme for the 1-day maxima again, the number of ROIs that do not meet the "5T rule" (i.e. small pooling groups of a size ≤11) is 40, 20, and 14, and the number of single-site pooling groups is 8, 1, and 1, according to the "forward" approach of Castellarin et al. (2001), our modified method, and the "backward" approach of Zrinji and Burn (1994), respectively.
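The size criterion can be sketched in a few lines. The sketch assumes the common station-year form of the "5T rule", in which the pooled record length of a group should reach at least 5T years; the text above states the rule in terms of the number of sites, so this reading, the function names and the common record length of 45 years are assumptions made for illustration only.

```python
import math

def meets_5t_rule(record_lengths, return_period):
    """True if the pooled record length reaches at least 5T station-years (assumed form)."""
    return sum(record_lengths) >= 5 * return_period

def min_sites_for_5t(mean_record_length, return_period):
    """Smallest number of sites of a given mean record length that satisfies the rule."""
    return math.ceil(5 * return_period / mean_record_length)

T = 100                 # target return period in years, as in the study
record_years = 45       # hypothetical common record length at every site

print(min_sites_for_5t(record_years, T))          # 12
print(meets_5t_rule([record_years] * 11, T))      # False: 495 station-years
print(meets_5t_rule([record_years] * 12, T))      # True: 540 station-years
```

Under these assumptions, groups of 11 or fewer sites fall short of the requirement for T = 100 years, which agrees with the size threshold quoted in item 2.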
A sensitivity analysis, which examined the consequences of the changes made to the input attribute sets of the pooling schemes, confirmed a simple principle, "the more input variables, the better the performance", in the case of climatological site characteristics (used in the ROIcli models) and site statistics (used in the ROIsta pooling scheme as the "true" frequency model in the simulation procedure). On the other hand, in the case of geographical site characteristics (used in the ROIgeo models), the pooling scheme based on two co-ordinates (latitude and longitude) was found to be superior to the one that makes use of all three co-ordinates (including altitude). In general, however, both variants of the ROIgeo models have their own pros and cons. The main drawback of the ROIgeo3 pooling scheme is the tendency to pool sites from considerable distances away from the target site, while the disadvantage of the ROIgeo2 pooling scheme is that it pools sites regardless of their altitudinal zonality. However, the drawbacks of the ROIgeo2 model are less pronounced; therefore, it always outperforms the ROIgeo3 pooling scheme in terms of the RMSE of the estimated quantiles in the present application. An open question is whether some combination of the geographical and climatological characteristics would result in a model that outperforms the ROI schemes based on either geographical or climatological attributes, and whether there are other useful site attributes in addition to those examined; these issues are beyond the scope of the study and deserve further investigation.
The main finding concerning the inter-comparison of various regional frequency models is that the ROI pooling scheme based on the actual proximity of sites (latitude and longitude) outperforms the other models (including the Hosking-Wallis regional analysis), regardless of the duration of the precipitation extremes. Such a conclusion, in general, is in good agreement with the findings of Gaál et al. (2008), who also showed the superiority of the pooling approach over the other frequency models when analyzing the precipitation data in Slovakia. Nevertheless, they pointed out that different durations need different ROI approaches: while the pooling scheme based on geographical attributes is preferable for 1-day maxima, the ROI model based on climatological attributes shows better statistical properties for 5-day durations. The difference between the main findings of the two studies may be related to the following causes:

- The more rugged terrain of Slovakia (the mountain range of the West Carpathians belt, including the High and Low Tatras, extending from west to east), which may enhance the role of climatological characteristics in the identification of similar patterns of the precipitation extremes;

- A different suite of climatological characteristics used in the analysis for Slovakia, with Lapin's index of Mediterranean influence (Gaál, 2005); no similar index of precipitation climate related to extremes is available for the Czech Republic;

- The poorer performance of the pooling schemes based on three geographical characteristics in Slovakia, which is likely due to the higher altitudinal variability of the selected sites in the country (a much larger percentage of higher-elevated sites at altitudes >1000 m a.s.l.);

- The denser network of the sites available in the Czech Republic (about one site per 400 km²) compared to Slovakia (about one site per 900 km²), which tends to give preference to the similarity of sites based on geographical proximity rather than climatological characteristics.
One may argue that in the present study, pooling schemes based on climatological characteristics show a poorer performance due to a failure to choose the right attributes for the analysis. We used the same set of site characteristics (Sect. 2.2) that had constituted the basis for the identification of homogeneous regions in the traditional regionalization (Sect. 3.2; Kyselý and Picek, 2007a), in order to preserve consistency. Furthermore, the selected climatological attributes are among the most appropriate descriptors of the precipitation climate. Therefore, it may be more reasonable to expend efforts on finding a suitable combination of both climatological and geographical attributes for a single pooling scheme rather than searching for some more descriptive climatological characteristics. An objective method for identifying the proper weights W_m for the site attributes in the dissimilarity matrix (Eq. 1) would have to be implemented for this purpose.

Appendix A
The generalized extreme value distribution

The cumulative distribution function of the generalized extreme value (GEV) distribution is

$$F(x) = \exp\left\{ -\left[ 1 - \frac{k\,(x - \xi)}{\alpha} \right]^{1/k} \right\}, \qquad k \neq 0, \tag{A1}$$

where $\xi$, $\alpha$ and $k$ are the location, scale and shape parameters, respectively (Hosking and Wallis, 1997). The parameters satisfy $-\infty < \xi < \infty$, $\alpha > 0$ and $-\infty < k < \infty$. For an estimation of the parameters, we use the approximation of Hosking et al. (1985):

$$k \approx 7.8590\,c + 2.9554\,c^2, \qquad c = \frac{2}{3 + t_3} - \frac{\ln 2}{\ln 3}, \tag{A2}$$

and

$$\alpha = \frac{l_2\,k}{\left(1 - 2^{-k}\right)\Gamma(1 + k)}, \qquad \xi = l_1 - \frac{\alpha\left[1 - \Gamma(1 + k)\right]}{k}, \tag{A3}$$

where $t_3$ is the sample L-skewness, $l_1$ and $l_2$ are the first two sample L-moments (cf. Hosking and Wallis, 1997) and $\Gamma(\cdot)$ denotes the gamma function.

The X10 test of Lu and Stedinger (1992) is based on the normalized 10-year GEV quantile at each site, which follows from Eqs. (A1) and (A3) with $F = 0.9$ after division by $l_1$:

$$x_{10}^{(i)} = 1 + \frac{t^{(i)}}{1 - 2^{-k}} \left[ 1 - \frac{(-\ln 0.9)^{k}}{\Gamma(1 + k)} \right],$$

where $t^{(i)}$ is the sample L-Cv (Eq. 3 in Sect. 3.3) at the $i$-th site, and the shape parameter $k$ is estimated by Eq. (A2).

The heterogeneity measure of the X10 test is then

$$\chi_R^2 = \sum_{i=1}^{N} \frac{\left( x_{10}^{(i)} - \bar{x}_{10}^{R} \right)^2}{\operatorname{var} x_{10}^{(i)}},$$

where $N$ is the total number of sites in the region, $\bar{x}_{10}^{R}$ is the weighted regional average of $x_{10}^{(i)}$ (with the weights proportional to the record length $n_i$), and $\operatorname{var} x_{10}^{(i)}$ is the asymptotic variance of $x_{10}^{(i)}$. The statistic $\chi_R^2$ has an approximate chi-square distribution with $N-1$ degrees of freedom. If $\chi_R^2 < \chi^2_{0.95,\,N-1}$, the null hypothesis is not rejected at the 0.05 level and the region may be considered homogeneous. In the opposite case, one rejects the null hypothesis and the region is considered heterogeneous (Lu and Stedinger, 1992).
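The GEV estimation by means of L-moments can be illustrated with a minimal sketch. It follows the approximation of Hosking et al. (1985) in the Hosking-Wallis sign convention for the shape parameter k; the function names and the sample L-moment values are hypothetical, and the Gumbel limit (k = 0) is deliberately not handled.

```python
import math

def gev_from_lmoments(l1, l2, t3):
    """Estimate GEV location, scale and shape from sample L-moments (Hosking et al., 1985)."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2           # approximate shape parameter
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)  # scale parameter
    xi = l1 - alpha * (1.0 - g) / k             # location parameter
    return xi, alpha, k

def gev_quantile(F, xi, alpha, k):
    """Quantile of the GEV distribution for non-exceedance probability F (k != 0)."""
    return xi + alpha * (1.0 - (-math.log(F)) ** k) / k

def gev_cdf(x, xi, alpha, k):
    """GEV cumulative distribution function (k != 0), valid where 1 - k(x - xi)/alpha > 0."""
    y = 1.0 - k * (x - xi) / alpha
    return math.exp(-y ** (1.0 / k))

# Hypothetical L-moments of a sample rescaled by its mean (index value), so l1 = 1
xi, alpha, k = gev_from_lmoments(l1=1.0, l2=0.2, t3=0.25)

# Growth-curve values for selected return periods T (F = 1 - 1/T)
for T in (10, 20, 50, 100):
    print(T, round(gev_quantile(1.0 - 1.0 / T, xi, alpha, k), 3))
```

In this convention a negative k corresponds to a heavy upper tail; some software packages use the opposite sign for the shape parameter, so the convention should be checked before the estimates are passed on.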

Fig. 1. 209 climatological stations in 9 homogeneous regions available for a regional frequency analysis of heavy precipitation amounts in the Czech Republic.

Fig. 2. Root mean square error (RMSE) of growth curves of 1-day annual maxima in a sensitivity analysis in which changes made to the basic ROI pooling schemes are examined. T denotes the return period. (a) T = 10 years, (b) T = 20 years, (c) T = 50 years, (d) T = 100 years.

Fig. 3. Root mean square error (RMSE) of growth curves of 5-day annual maxima in a sensitivity analysis in which changes made to the basic ROI pooling schemes are examined. T denotes the return period. (a) T = 10 years, (b) T = 20 years, (c) T = 50 years, (d) T = 100 years.

Fig. 5. Bias and root mean square error (RMSE) of growth curves of 1-day annual maxima in a sensitivity analysis in which changes made to the "true" frequency models are examined. T denotes the return period. (a) and (c) T = 20 years, (b) and (d) T = 100 years.

Fig. 6. Bias and root mean square error (RMSE) of growth curves of 5-day annual maxima in a sensitivity analysis in which changes made to the "true" frequency models are examined. T denotes the return period. (a) and (c) T = 20 years, (b) and (d) T = 100 years.

Table 2. Performance of ROI pooling schemes based on different combinations of site characteristics as measures of similarity for 1-day annual precipitation maxima. RMSE_T denotes the average root mean square error of the estimated growth curves corresponding to the return period T [years], expressed in %; the smallest values of the statistics are marked in bold.

Table 3. Performance of ROI pooling schemes based on different combinations of site characteristics as measures of similarity for 5-day annual precipitation maxima. RMSE_T denotes the average root mean square error of the estimated growth curves corresponding to the return period T [years], expressed in %; the smallest values of the statistics are marked in bold.

Table 4. Basic characteristics of the pooling groups constructed according to two different pooling schemes, ROIgeo2 and ROIgeo3, for two selected stations, Červená and Olomouc. The target sites are not included in the number of sites in the ROI.

Table 5. Performance of ROI pooling schemes ROIcli3 and ROIgeo3 based on different combinations of site statistics in the "true" frequency model for 1-day annual precipitation maxima. RMSE_T (BIAS_T) denotes the average root mean square error (average bias) of the estimated growth curves corresponding to the return period T [years], expressed in %; the smallest values of the statistics (in an absolute sense) are marked in bold.

Table 6. Performance of ROI pooling schemes ROIcli3 and ROIgeo3 based on different combinations of site statistics in the "true" frequency model for 5-day annual precipitation maxima. RMSE_T (BIAS_T) denotes the average root mean square error (average bias) of the estimated growth curves corresponding to the return period T [years], expressed in %; the smallest values of the statistics (in an absolute sense) are marked in bold.

Table 8. Average root mean square error (RMSE_T) and average bias (BIAS_T) of growth curves of 5-day annual precipitation maxima for return period T [years], expressed in %. The smallest values of the statistics (in an absolute sense) are marked in bold.