A spatial bootstrap technique for parameter estimation of rainfall annual maxima distribution

Paper summary The reviewed manuscript presents a novel methodology, based on multiple-station probability weighted spatial bootstraping, to estimate the parameters of a GEV distribution model for rainfall maxima at different (ungauged) locations in a geographical region. The applied probability weights are proportional to the product of two Gaussian kernels: the first accounts for the reduced influence of rainfall observations with increasing distance from the location were the GEV parameters are estimated, while the other accounts for orographic effects. Assuming simple scaling of the annual rainfall maxima, the authors: a) standardize rainfall averages at any measuring location by the sample mean, b) construct synthetic samples at ungauged locations inside the catchment (i.e. by probability weighted spatial bootstrap resampling from the standardized data), and c) use the synthetic samples to obtain the spatial variation of GEV parameters, as duration independent. The authors proceed one step further and derive depthduration-frequency relationships.


Introduction
Depth-Duration-Frequency (DDF) curves, as well as Intensity-Duration-Frequency (IDF) curves, are widely used to characterise frequency of rainfall annual maxima in a geographical area.Such characterisation is an essential requirement for architecture and civil engineering, from housing to large industrial plant design.Stewart et al. (1999) reviewed actual applications of estimates of rainfall frequency and estimation methods.
Recently, Svensson and Jones (2010) have reviewed methods applied in 9 countries worldwide to characterise rainfall and flood return periods relevant for dam design.Most methods are based on parameter estimation of Gumbel or GEV distribution and make use of a growth curve to relate a statistical rainfall index to a design rainfall/flood estimate (indexflood method).What is of interest for the present work is that many methods reviewed by Svensson and Jones (2010) make use of some form of regionalisation, either by spatially interpolating the distribution parameters after estimation at station points, or by using together data from homogeneous areas, either defined by geographical boundaries, or centred around a location of interest.
The idea of using an observational network to characterise a region has been exploited by Overeem et al. (2008), who use 12 stations for estimating (by L-moments and assuming stationarity) a single GEV distribution in the Netherlands, after verifying that the spatial dependence of the maximum rainfall is weak.Overeem et al. (2008) also evaluate uncertainties due to sample variability associated to estimation of GEV and DDF curves parameters, by making use of a bootstrap algorithm (GREHYS, 1996) to estimate standard deviations (SDV).
Building on the concept of estimating over an area, rather than a collection of single points, Overeem et al. (2009) make use of a reliable 11 yr series of radar-estimated rainfall to compute DDF curves over the Netherlands.Overeem et al. (2010) take into account spatial variability, by defining a local maximum in a box surrounding each radar pixel and letting the GEV parameters depend on the box size.Regionalisation is possible when the local spatial dependence is weak enough to enable transferring information to a site of interest from surrounding observing sites (Buishand, 1991).Spatial dependence may be important, though, at a larger scale, comparable with the domain size.Lombardy (Italy) has a central position in the Southern Alps, it includes part of the Po Plain and, in the South, it reaches the Appennines (Fig. 1).Despite its limited spatial extension (see the distance scale in Fig. 1), Lombardy contains several different climatic areas, presenting significant rainfall differences within short distances.For example, the northwestern area of Lake Como is adjacent to the area Piedmont-Ticino-Lake Maggiore, which is identified as a mesoscale hotspot for heavy precipitation in the Alps (Isotta et al., 2013).Conversely, the eastern part of Lombardy's Po Plain presents rather low values of mean annual precipitation (Brunetti et al., 2009).
As an example of station-point estimation in an area close to Lombardy, Borga et al. (2005) estimate DDF curves in Trentino (Italy), an area adjacent to northeastern Lombardy, assuming stationarity and making use of a simple scaling law.
After separating the dataset in convective and non-convective events, they estimate all parameters at station locations, and subsequently spatially interpolate them over the area.Separately estimating at station points, however, requires the availability of long, continuous time series of reliable observations at fixed sites.This can constitute a strong limitation in the practice, imposing a drastic reduction on the number of usable observations.In addition, as it is shown in Sect. 3 of the present document, such a station-point approach is sensitive to the possible presence of rare events, which, if untreated, may cause unrealistic spatial patterns in the interpolated fields.
For these reasons, and because of the mentioned characteristics of Lombardy's topography and precipitation climatology, it is necessary to both introduce a regionalisation and to allow for spatial dependence.
In order to estimate distribution parameters at any location, it appears appropriate to take into account all data observed at surrounding stations, at least until a prescribed distance (e.g.Reed et al., 1999), or, as in the approach presented here, with decreasing importance when distance increases.In fact, whenever time series from station sites are used to estimate DDF curves in all the points of a given area, it is implicitly assumed that each time series is representative for an area surrounding the observing station.Conversely, for any given site where an observing station is not present, all data from surrounding stations are relevant: more those from stations located nearby, less those from stations located far away.This assumption does not depend on the actual presence or absence, in the present or in the past, of an observing station at the considered site.
Considering time series observed at several different stations may effectively reduce the influence of outliers on parameter estimation, while appropriately allowing them to influence the evaluation of the estimate uncertainty.Outliers, for the underlying extreme event distribution, are events with a return period much longer than the actual length of the time series, such as very high annual maxima, representing meteorological events that are extremely infrequent, but nevertheless possible.Such maxima, observed very few times at very few locations may have excessive influence on distribution parameters estimated separately at each station point.
In the present work, standard estimation methods, such as L-moments and maximum likelihood, are applied to synthetic series generated at any desired location by resampling from several stations, irregularly distributed in space.By taking into account the length of each time series to avoid the possibility of over-sampling, the use of short time series is enabled, greatly increasing the dataset size.The method is applied over a 1.5 km × 1.5 km regular grid covering the whole area of Lombardy.
In this document, Sect. 2 presents the dataset.In Sect.3, the sensitivity to rare events of point-by-point estimation is discussed with reference to a clarifying example.The multiple-station spatial bootstrap method is presented in Sect. 4 and results are presented in Sect. 5. Section 6 describes the evaluation of uncertainty on the estimated parameters, and discusses how this is transferred into uncertainty on the computed rainfall depth thresholds.

Rainfall annual maxima dataset
Data used in this study are rainfall annual maxima recorded by tipping-bucket catching gauges (Vuerich et al., 2009).Two are the main data sources: an "historical" dataset and the present-day automatic network.
The historical dataset is composed of rainfall annual maxima from "mechanical" stations (hereafter indicated as stations of type M), equipped with analog instrumentation and originally managed by the National Italian Hydrographic Service (Servizio Idrografico e Mareografico Italiano, 1917-1998): these data were digitised in the (recent) past and are now included in the database of the Hydrographic Service at ARPA Lombardia.Data from 1929 to 2001 from 69 observing sites of the historical dataset, selected with a minimum series length of 24 yr (in total 2753 annual records, each with 5 maxima for different event durations), were used in a study (De Michele et al., 2005) which originated the previous operational scheme.In that work, a separate estimation is computed at each of the 69 stations, then a kriging spatial interpolation is performed to assign DDF curve parameters and quantiles to every location in the domain.Few more series from type M stations, not used by De Michele et al. (2005), have been recently acquired and entered the dataset.Stations of type M were manned and underwent frequent maintenance.They continued to be operational until 2005.
Since the 1990s, automatic meteorological stations (hereafter indicated as stations of type A) started to be independently installed by several different public institutions, with different aims and criteria, and with inhomogeneous instrumentation.Unfortunately, the continuity of long data series has not been guaranteed by such an important change of the observational network: the dismissal of a particular "mechanical" station did not automatically imply that a new automatic station was installed at the same location.Starting from 2004, the different existing networks merged in a single observational network, the ARPA Lombardia mesonet.An operational quality assurance system started to be set up since 2008 (Ranci and Lussana, 2009;Lussana et al., 2010).Nowadays, precipitation data undergo both automatic plausibility checks and human operator controls, aimed at assessing its quality and at preventing observations affected by gross errors from entering automatic elaborations.
The high spatial resolution of the present-day station distribution is considered sufficient to appropriately describe stratiform precipitation mesoscale events, and convective events too, though with approximation depending on local station density (Lussana et al., 2009).However, the data distribution is not completely homogeneous in space: border ef-fects are likely to occur where observational information is lacking outside Lombardy; in mountain areas (Alps, Prealps and Apennines), stations are mostly located in the populated valley floors.In such areas, the estimated parameters are expected to have larger uncertainty: it is then important to evaluate it and to assess its spatial distribution, as it can be done by means of the method proposed in this document.
All data used in this study underwent supplementary quality control, including comparison of individual series with data of the same time period from other stations in the same area.Moreover, every annual maximum obtained from less than 90 % of acceptable observations in the year was rejected.Homogenisation and comparison of different data sources is still under way at ARPA Lombardia, also with the purpose of evaluating the stationarity of the statistics.In order to do that, regional averaging is recommended (Klein Tank et al., 2009), then the method proposed in this work might also be useful in that context.For the present study, however, stationarity has been assumed as a working hypothesis.
After quality control, with the consequent exclusion of suspect observations and even of few whole series, it has been possible to merge the historical dataset and the presentday mesoscale network.The resulting dataset ranges from 1929 to 2011.It includes 4510 annual records (each with 5 maxima for different event durations: 22 431 values, excluding missing or invalidated data) from 312 stations, shown in Fig. 1, where different symbols are used to indicate stations of type M (circles, 125 stations) and A (triangles, 187 stations).The longest time series belong, of course, to stations of type M. In Fig. 1, different colours are used to distinguish short series, yellow (less than 10 annual maxima), from longer series, blue.The presence of short series (of both types M and A) is important both for the size of this portion of the dataset, and because one of the advantages of the approach presented in this work is that it enables their use, as opposed to separate station-point estimation that needs to discard them (Sect.4).
The dataset used in this work includes annual rainfall maxima for event durations D = 1, 3, 6, 12 and 24 h.

Sensitivity to rare events
Figure 2 shows the time series of annual maxima for the event duration D = 24 h for two stations located in the Po Plain and distant about 40 km from each other.Their locations are shown in Fig. 1 as the green square, LODI, and the red square, CREMONA.The distance between these two stations is rather small if compared with the expected scale of spatial variations of precipitation climatology in the Po Plain (Isotta et al., 2013;Brunetti et al., 2009).
The average behaviour of the two time series is in fact quite similar, except for the occurrence of two main maxima, much higher for CREMONA than for LODI.The high values When estimation is computed separately at each station, as in the previous operational scheme (De Michele et al., 2005) of the Hydrographic Service of ARPA Lombardia, the presence of the two spikes results in important differences in the L-moments-estimated parameters of both the GEV and Gumbel distributions and, consequently, on the quantiles.The resulting DDF curves are compared for the two stations in Fig. 3, where large differences can be seen, both in return periods for an assigned threshold depth, and in depths associated to an assigned probability of occurrence.It should be remarked that removing the two spikes from both series resulted in much similar extreme event distributions and DDF curves.Finally, the estimation has been repeated using the maximum likelihood method and the resulting DDF curves (not shown) still show a similar sensitivity to the presence of the two spikes in CREMONA's series.
Another difference between the two series, appearing from Fig. 2, is that observations start about 30 yr earlier at CREMONA.However, discarding the first part of CRE-MONA's series so that the two samples have comparable size (i.e.length in time), does not change the parameter estimation in a way comparable with the effect of the two spikes.
The differences found in the estimated probability of rainfall events at LODI and CREMONA are not justified for sites that are so close to each other and so similar to each other, with regard to the surrounding orography.Moreover, estimating parameters of the rainfall annual maxima distributions at station locations is only an intermediate step towards characterising the occurrence probability of extreme rainfall events for the whole Lombardy.In other words, station data need to be used to characterise every point in the area.Any place which is close enough to both LODI and CREMONA station is expected, as a result of station-point estimation, to be influenced by both time series in a way that rapidly vary in space, because of the occurrence of two outliers.

Spatial bootstrap technique
With the purpose of estimating extreme value statistical distribution parameters (Sect.4.4) in any point X (defined by its geographical coordinates), data observed at several surrounding stations are taken into account.At each target point X, the spatial bootstrap procedure can be sketched as follows.
1. Sampling (details in Sect.4.1): observations from different time series are used to resample a new, synthetic, "time series", which is attributed to point X.
3. Points (1, sampling) and (2, parameter estimation) are repeated several times (1000 in the case of the presented results) to obtain, for each parameter, a distribution of estimates: the mean of such distribution is finally used as the actual estimate, while the standard deviation (SDV) measures its associated uncertainty, due to sample variability.
In the following part of this section, implementation details and working hypothesis are presented.Remark that neither the stationarity assumption (Sect.a simple scaling law (Sect.4.4), even if they do influence the obtained estimate, are critically required for the spatial bootstrap technique, which can be applied using alternative assumptions as well.

Sampling
An ensemble of prescribed size M of observed values is extracted randomly among all the N available data.The probability of extraction of each observation is proportional to a prescribed function γ of the distance between station location and point X.In the present case, a function with Gaussian shape is used.If the nth observation has been taken at the k n th station, then: where d (X, k n ) is the horizontal distance between station k n and point X, z and z k n are the elevations above mean sea level, respectively of point X and of station k n .D h and D v are scale parameters which have been set at D h = 30 km (horizontal scale) and D v = 600 m (vertical scale).These values, constrained from below by the observational network resolution, are chosen to impose a weak smoothing (Uboldi et al., 2008).By Eq. ( 1), time series from stations nearby are more frequently sampled from, with respect to time series from stations located far away from X. Remark that the chosen function accounts for elevation difference as well, and that distance-dependent probabilities are assigned to individual observations, rather than to stations, to avoid oversampling from "short" time series located near the target point X.
In traditional estimation, separately performed at each station point, "short" time series (series with few annual maxima) are necessarily discarded, because they do not provide a sufficient number of data for the estimation.Moreover, they are useless for "long" return period, with the practical rule that the maximum significant return period is about twice the series length.Many data would be discarded in this way, in particular from stations which have been installed recently.
With the proposed method the use of very short time series is enabled.For each location X (gridpoint or station), by prescribing distance-dependent extraction probabilities, observations from nearby stations are selected more often than observations from stations located far away.Oversampling from nearby stations with only few data is effectively avoided.In fact, if a station were chosen with probability proportional to its associated γ and, for the selected station, an observation were chosen with uniform probability afterwards, then nearby observing locations with only few observation could be over-sampled: the same few observations could be chosen many times in each synthetic series.This is avoided by assigning distance-dependent probabilities to individual observations, rather than to stations, and normal-ising with the sum of probabilities of all observed values: In this way, all observations from the same station still have the same probability, but the probability of extracting from a particular station depends both on distance and on the size of its observation set, i.e. the length of its time series.For example, if, for a particular location only two stations existed: "a" with only 3 observations, but very close, with γ a = 0.9; and "b" far away, with γ b = 0.1, but with 100 observations, then the sum of all (observational) γ s would be = 3 × 0.9 + 100 × 0.1 = 12.7.The individual probability would be 0.9/12.7 for each observation from "a" and 0.1/12.7 for each observation from "b", but the "station" probability would be 2.7/12.7 for station "a" and 10/12.7 for station "b".
Appendix A describes how the γ s and their sum (over all observations) are used, starting from a pseudo-random real number extracted with uniform probability in the interval (0, 1).
The total number of available observations is about N = 4500 (slightly different for each duration because of missing data, Sect.2).
The length of the synthetic series obtained (i.e. the sample size) for the presented results (Sect.5) is M = 50, considered sufficient to obtain stable estimates and to span the time period covered by the dataset .A larger size, M = 100 has been tested, without any noticeable difference in estimates.

Parameter estimation
Parameter estimation is obtained for each synthetic series, i.e. at each gridpoint, by applying the L-moments algorithm first, then using the L-moments estimate to initialise a maximum log-likelihood estimation, solved by means of a nonlinear conjugate gradient iteration.
Since the GEV distribution (Sect.4.4) has a lower bound when ξ > 0, and a higher bound when when ξ < 0, it may well happen that one or more observation w in the sample do not satisfy the condition 1 + ξ (w − ε) /α > 0 (Sect.4.4) for some set of parameters (ε, α, ξ ) possibly met after the Lmoments estimation, or during the iterative minimisation.It has then been necessary to make sure that, before starting the line minimisation required at each conjugate gradient iteration step (Press et al., 1992), the sought minimum was effectively "bracketed" by two points, i.e. two sets of parameters (ε, α, ξ ), both satisfying the GEV condition.

Stationarity
In the stationarity hypothesis, the actual order in time of the sampled observations is not important, then the time coordinate is not used in the parameter estimation: this is the case of the results presented in Sect. 5. On the contrary, if stationarity were an issue, sampling could be performed by prescribing extraction probabilities that effectively ensure uniformity in time (for example, by extracting one observation, or a fixed number of them, for each year).In that case the time coordinate could be used in the estimation, for example by prescribing a linear dependence on time of one or more distribution parameters, in order to estimate a trend when applying the maximum-likelihood method (Coles, 2001).

Depth-Duration-Frequency curves and scale invariance
The cumulative GEV (Generalised Extreme Value) distribution is: where µ, σ and ξ are known as the location, scale and shape parameters, respectively.The GEV expression is defined for 1+ξ (h − µ) /σ > 0 and includes Gumbel distribution as the particular case ξ = 0 (Wilks, 2011;Coles, 2001).If 1 − F (h) is interpreted as the probability that, in a year, a rainfall event (of a given duration) exceeds the "threshold" h, then the return period is measured in years and defined as: The proposed spatial bootstrap method is quite general and does not depend on particular assumptions with regard to scaling.Stochastic self similarity, or multifractality, is acknowledged as the general scale invariance condition for rainfall fields (Veneziano et al., 2006).Here, as a working hypothesis for the operational implementation, we follow Burlando and Rosso (1996) and De Michele et al. (2005).Rainfall annual maxima follow a scale invariance law such that the statistical distribution for an event duration λD, and that of the reference duration D are the same after scaling by a power λ n , with exponent n to be determined depending on the process and on the site.Quantiles and momenta of the distribution scale then with the same factor λ n .This corresponds to assuming that time series related to different event durations for the same site, after normalisation with the mean, are sampled from the same extreme event statistical distribution.If h D is the sample mean for event duration D, the distribution parameters are redefined as: The same is done for the quantiles: By inverting the expression of cumulative distribution function, Eq. ( 3), the normalised rainfall depth threshold is expressed as a function of the return period: The analytical DDF curve expression for duration D is, finally: The scale invariance coefficient, a 1 , and the scaling exponent, n, are obtained by least square optimisation: 10) where the averages are taken on event durations.
The assumption that standardized maxima of precipitation follow the GEV model with constant parameters is based on standard asymptotic results from Extreme Value theory (Coles, 2001).
Making use of a simple scaling law such as Eq. ( 9), though, is not mandatory.
In recent works (Veneziano et al., 2009;Langousis et al., 2013) a GEV with shape parameter ξ dependent on duration is advocated as the most appropriate distribution for rainfall annual maxima and for durations of finite length.We remark that, by means of the spatial bootstrap technique proposed here, the sensitivity to outliers is effectively reduced (Sect.5), the sampling size is increased (M = 50, Sect.4.1) and the final estimate for each parameter is obtained as the sample mean, effectively reducing uncertainty.The shape parameter ξ remains however the most uncertain (Sect.6): allowing a direct dependence on duration might improve the accuracy of the estimate.
Moreover, Di Baldassarre et al. ( 2006) analyse short durations, up to 1 h, by testing different analytical formulations for the DDF curves and point out the limitations of simple formulations of scale invariance, arguing that a transition in scaling should be acknowledged for storm durations of about 1 h.Overeem et al. (2008), in their study, use event durations from 1 h to 24 h, describing the dependence of GEV parameters on duration by a generalised least-square method.They find that the ratio scale/location σ/µ, and, as a consequence, the coefficient of variation, increase with decreasing duration, concluding that a simple scaling formula cannot hold.In other works (Borga et al., 2005;Svensson and Jones, 2010),   Boxplot labelled A refers, in both panels, to the previous operational method, i.e. separate estimation at each of 69 stations; boxplot B is obtained by applying the proposed method using the same 69 stations only; boxplot C is obtained estimating at the same 69 locations, by applying the proposed method using the complete dataset.Comparing boxplot A and B shows the effect of applying the new method at the same dataset, while comparing boxplot B and C shows the effect of the dataset extension (when estimating at the same, old 69 sites).The combined effect of the method change and of the dataset extension is seen by comparing boxplot A and C. in fact, annual maxima from convective and non-convective events are separately treated.Such criticism could lead us, in the future, to refine the scale invariance assumption, by testing the appropriateness of separating the 1 h duration from the rest of the dataset, or by replacing Eq. ( 9) with an alternative expression, or again allowing a direct dependence on duration of the distribution parameters.

Estimated parameters and quantiles
The estimates of the scale invariance coefficient a 1 and of the scaling exponent n are shown as maps in Fig. 4. Figure 5 shows the GEV distribution parameters: "location" ε (left panel), "scale" α (central panel) and "shape" ξ (right panel).For all these maps, the estimate is obtained as the mean of the ensemble of estimates computed by repeating 1000 times the resampling and estimation procedure at each gridpoint.
Since scale invariance has been assumed and Eqs. ( 5)-( 7) have been used, GEV location parameter ε and GEV scale parameter α (Fig. 5) are dimensionless: the only dimensional parameter is the scale invariance coefficient a 1 (left panel in Fig. 4), measured in mm and representing the average rainfall depth for unity event duration.
The shape parameter ξ assumes values around zero: the GEV distribution reduces to a Gumbel distribution (with the same location and scale parameters) in the limit ξ = 0.The estimate of ξ is obtained by the mean, and the uncertainty on the estimate is evaluated by the SDV: this information is used in the right panel of Fig. 5: the field is masked out when the absolute value of the mean is smaller than the corresponding SDV.In this way the unshaded (masked) areas are attributed to a Gumbel distribution and areas shaded in red to a "heavytailed" Frechét distribution.No negative values of the mean exceed, in absolute value, the corresponding SDV, then no gridpoints are attributed to a reversed Weibull distribution.
For a numeric comparison with the previous operational scheme, the presented method has been applied to the same "old" dataset, composed by 69 stations, used by De Michele et al. ( 2005) and described in Sect. 2. These results are shown in Fig. 6, where each boxplot shows the distribution of the rainfall depth threshold h T values estimated for the return period T = 200 yr at the 69 sites: boxplot labelled A corresponds to the previous operational scheme (De Michele et al., 2005), i.e. separate estimation at each station; boxplot B is obtained by using the same dataset, but with the new method described in Sect.4; boxplot C is obtained by applying the new method to the complete dataset described in Sect.2, but only estimating, in this case, at the 69 sites composing the old dataset.The two panels of Fig. 6 refers to event duration D = 3 h, left, and to event duration D = 24 h, right.The two durations are chosen as representative of annual maxima due to mainly convective events, in the case of D = 3 h, and of longer, stratiform events, in the case of D = 24 h.
Remark that possible deformations due to the interpolation step of the old method are excluded in this comparison.
Comparing boxplot A and boxplot B demonstrates the effect of the method change with the same, old, dataset.Comparing boxplot B and boxplot C shows the effect of the dataset extension with the same, new, method.The cumulative effect of method change and dataset extension can then be seen by comparing boxplot A and boxplot C. The most relevant effects are, in both panels: the upward shift of the box (first to third quartiles including the median) from boxplot A to boxplot B, when the estimation is influenced by nearby stations, and the progressive dispersion decrease, as measured by the decreasing interquartile range (box height), when going from boxplot A to boxplot B, then to boxplot C, with more and more data conditioning the estimation.
The root-mean-square differences between estimates at the 69 sites for the three schemes are shown in Table 1, showing that method change and dataset extension do have an impact, larger for method change.
Comparing the scheme performances at a common subset of observing sites has the advantage of excluding from the analysis most effects of local differences in spatial data density and of differences in elevation above mean sea level.It is however an important goal of the present work to estimate over the whole of an extended and complex area such as Lombardy.The three schemes A, B and C are also compared, then, as rainfall depth threshold maps, in Figs.7-10.
Maps of the rainfall depth h T (mm) exceeded with return period T = 200 yr are shown, for event duration D = 3 h, in Fig. 7 for scheme A (left) and B (right), and in Fig. 8 for the new proposed scheme with the complete dataset, C. Similar Table 1.Root-mean-square differences (mm) of rainfall depth thresholds estimated at the 69 sites composing the "old" dataset, with the three schemes.A: old method, old dataset; B: new method, old dataset; C: new method, new extended dataset (but only estimates at the 69 sites are considered for this comparison).

RMSD (mm) D
maps are shown, for event duration D = 24 h, in Fig. 9 for scheme A (left) and B (right), and in Fig. 10 for scheme C. For schemes B and C, these maps are produced by evaluating Eq. ( 9) at each gridpoint and estimating all parameters by their means.
Using the new method, even with the same, old, 69-station dataset, has the effect, evident when comparing left and right panels in both Figs.7 and 9, that the unjustified maxima and minima appearing in the Po Plain are eliminated: these are due, in scheme A, to the combined effects of outliers present in individual time series, "CREMONA" in particular, as discussed in Sect.3, and of the subsequent spatial interpolation.This problem is straightforwardly overcome in the new Since the interpolation step of scheme A does not account for elevation difference, no orographic signal can be seen in its maps.In schemes B and C, the fields appear correlated with the orographic features for duration D = 24 h, associated to long stratiform events, and much less for duration D = 3 h, associated to important thunderstorms.
Differences between scheme B and C, due to the dataset extension, regard details of the fields.In particular, the correction in the Plain discussed above is further improved by using more data in scheme C, including comparatively short time series from "recent" automatic stations, progressively installed starting from the 1990s.
It is worthwhile remarking that at the locations of the two stations discussed in Sect.3, LODI and CREMONA (Fig. 1), the h T fields assume similar values both in Fig. 8 and in Fig. 10, and that the (weak) gradient across the Po Plain even opposes the difference found in Sect. 3 by separately estimating at the two locations and, consequently, appearing in the left panels of Figs. 7 and 9. To comment on the maps of parameters and of rainfall depth threshold estimated by applying the new proposed method to the complete extended dataset, one has to consider that the Alpine climatology is characterised by moist zones extending along the external flanks of the Alps and dry conditions in the interior of the mountain range (Frei and Schär, 1998).Furthermore, the western part of Lombardy's Prealpine area is among the alpine areas with the higher values of mean annual accumulated precipitation (Isotta et al., 2013;Brunetti et al., 2009).These differences are due to the complex interaction between the moist and unstable largescale southwesterly atmospheric flows with the orography.In our study, this interaction leads to the occurrence, in the distribution parameter fields, of a south-westward gradient across the Alpine area (i.e. the northern part of the domain).This is apparent, for example, in the a 1 field, Fig. 4. Consequently, the most exposed southwestern ridges, the Prealps, present the highest h T values in Figs. 8 and 10, while the lowest h T values are found in the shielded Alta Valtellina, the alpine area in the extreme northeast.
The Po Plain is characterised by a significant westward gradient in many parameters (for example, the a 1 field in Fig. 4), resulting in larger h T values in the west with respect to the east (Figs. 8 and 10).
Comparing h T for D = 3 h, Fig. 8, and for D = 24 h, Fig. 10 (both for a return period of 200 yr), the relative difference between eastern and western Plain for event duration of D = 3 h appears smaller (about 20 %), than the same difference relative to D = 24 h (30 % at least).This could be related to the remark of Isotta et al. (2013), that in the eastern Plain a larger fraction of the mean annual accumulated precipitation comes from the most intense precipitation events.

Uncertainty
An important feature of the proposed method is that it enables evaluating uncertainty by means of the SDV of the 1000 estimates calculated at each target point.Table 2 shows the minimum and maximum value, among gridpoints, of the ratio between SDV and mean (i.e. the coefficient of variation) of each estimated parameter, except ξ : for this parameter, which has values around zero, the ratio between the SDV and the maximum of the absolute value of the mean is shown.For a 1 , n, ε and α the SDV is always below or about 10 % of the mean, so that the uncertainty on these parameter can be considered small.On the contrary, for ξ , the SDV is about 40 % of the maximum (positive) value of the mean.
Uncertainty on parameters does have an impact on rainfall depth threshold.For each of the estimated parameters, the partial relative uncertainty on h T ,D is evaluated as: where the generic parameter π {a 1 , n, ε, α, ξ } appears and σ π is the corresponding standard deviation.Simplified expressions are easily obtained for each parameter using the scale invariance expression (Eq.9): where now, in Eq. ( 15), π {ε, α, ξ } represents any of the three GEV parameters.As it appears in Eqs. ( 13)-( 15), the  a 1 component of relative uncertainty on h T ,D is simply the coefficient of variation for a 1 ; the n component only depends on the duration, D; the ε, α and ξ components coincide with the corresponding partial components of relative uncertainty on the normalised quantile w T : they depend on the return period and on all three GEV parameter estimates, but not on the duration.
Table 3 shows, as percentage, the minimum and maximum value, among all gridpoints in the domain, of partial uncertainty components relative to all parameters, for different values of duration and return period, when this is relevant.These values are generally below 10 %, except for the components Figure 11 shows, as percentage, the relative uncertainty component ρ ξ (Eq.15) on the rainfall depth threshold h T ,D , for return period T = 200 yr and for any duration D, due to the uncertainty on the estimate of the parameter ξ .As it appears in Table 3, for long return periods this is the most important uncertainty component.It is higher in the Alpine area, where the elevation difference between gridpoints and station points effectively reduces the spatial data density with respect to its bi-dimensional appearance.Its maximum can be found in the extreme northwest (Val Chiavenna), where only few stations are located and distributed along a direction which stretches outward (Switzerland) with respect to the more densely and uniformly observed part of the domain (Fig. 1).This is a typical "border" effect, which also necessarily affects any interpolation method, and which can be easily solved by the acquisition of external observations, when they are available.Swiss observations would be necessary in this case, but, for Lombardy, data from other Italian departments would be very useful as well: only stations from Piedmont (west) and Emilia-Romagna (south) were used in this study (Fig. 1).
Maps of uncertainty components on h T due to uncertainty on other parameters are not shown here, but all are geographically distributed in a way similar to ρ ξ (Fig. 11).

Conclusions
The multiple-station spatial bootstrap technique makes use of spatially distributed series of rainfall annual maxima to estimate parameters of the GEV distribution and of DDF curves in every location of a geographical domain.In the case of Lombardy, the domain is characterised by a complex topography, including several climatic areas within short distances, such as the wet Prealpine areas and the dry eastern Po Plain, requiring appropriate techniques to propagate meteorological information in space.In the presence of a mesoscale meteorological network, whenever the goal is to characterise a geographical area rather than a single point, the spatial bootstrap appears more appropriate than interpolating from estimates separately obtained at station points.In fact, the method effectively makes use of the whole dataset, even short time series from stations recently installed, or active for only few years in the past, that must be discarded in the traditional point-by-point estimation.Uncertainties on all estimates are directly evaluated and represent an important information, both for further studies and for users.
In general, as it has been shown, parameter estimation algorithms are sensitive to the presence of outliers in the observed series.The combined use of multiple stations in the estimation enables reducing the impact of such very rare events, whereas these are appropriately allowed to affect the evaluated uncertainties.
L-moments or maximum likelihood algorithms can be indifferently applied to the synthetic series resampled at every desired location.In this work, the log-likelihood function has been maximised by means of a nonlinear conjugate gradient iteration, initialised with L-moments estimates.The whole procedure, from sampling to estimation, has been repeated 1000 times at each point of a regular 1.5 km × 1.5 km grid covering the spatial domain.For each parameter, the mean and the SDV of such an ensemble of estimates provide the final estimate and its associated uncertainty, respectively.
A comparison with the previous operational scheme, based on separate station estimation, has been realised by applying the new method to the same (old) dataset and comparing the distributions of rainfall depth thresholds estimated at observing sites.With the new method all quartiles values including the median are increased and including information from surrounding stations reduces the estimate dispersion.This is further reduced by the dataset extension enabled by spatial bootstrap.
The presence of outliers in the time series, emphasised by separate station estimation, results in unjustified gradients in rainfall depth threshold maps obtained with the old scheme.Such unrealistic signals do not appear in maps obtained with the new method presented here, as a consequence of its ability to appropriately deal with outliers, in particular when the complete dataset is used.
For all event durations and return periods, the maps obtained by the spatial bootstrap method with the complete dataset for distribution parameters and rainfall depth thresholds appear in agreement with the known climatology of the geographical domain (Frei and Schär, 1998;Isotta et al., 2013;Brunetti et al., 2009), and realistically account for elevation above sea level.
The spatial bootstrap technique enables evaluating uncertainty due to sample variability.The partial relative uncertainty component on rainfall depth threshold with respect to each parameter has been evaluated: the largest values are found in the Alpine area, generally below 10 %, with few exceptions, notably the ξ component, reaching 19 % in the extreme northwest.In principle, these results could be further improved by allowing a direct dependence on duration of the GEV shape parameter (Veneziano et al., 2009;Langousis et al., 2013).
The estimated parameters and quantiles can be made available to users as gridded fields, as precomputed interactive maps, for example in a web-based GIS (Geographic Information System), such as that of ARPA Lombardia (http: //idro.arpalombardia.it),or, in principle, even by allowing direct estimation at any requested location in the domain.Besides the engineering applications cited in the Introduction, information on rainfall depths associated to different event durations and return periods, such as the maps shown in Figs. 8 and 10, might also be used to characterise and differentiate the critical precipitation events for Lombardy's climatic areas in civil protection emergency plans.In that context, the availability of thresholds uncertainties is of particular interest, because it enables defining precautionary alert thresholds in a realistic and statistically founded way.
Assuming stationarity and including short time series enables using a large amount of observations.Conversely, this relative data abundance could be used in a stationarity study, with stringent quality control and homogenisation.As a result of the stationarity study, in the future both the parameter estimation and DDF curves might be recomputed, either assuming stationarity in a recent portion of the dataset only (the reference period 1981-2010, for example), or by allowing for, and estimating, a trend in the parameters (such as GEV location and scale, for example).
Finally, as discussed in Sect.4.4, the scale invariance formulation could be refined in the future to allow for a possible different behaviour of short-duration convective events and for the use of more sophisticated GEV models.

Fig. 1 .
Fig. 1.Observing stations providing rainfall annual maxima in Lombardy's orography.Lakes are shaded in cyan.The black line shows the administrative boundary.Circles: "mechanical" stations; triangle: "automatic" stations.Yellow: less than 10 observations; blue: more than 10 observations.The inset shows the geographical position of Lombardy.The squares mark the stations discussed in Sect.3: LODI, green, and CREMONA, red.

Fig. 2 .
Fig. 2. Time series of rainfall annual maxima for event duration D = 24 h for the stations: CREMONA (solid line) and LODI (dotted line)

Fig. 3 .
Fig. 3. Depth-Duration-Frequency curves for return periods T = 100 yr (thick lines) and T = 20 yr (thin lines) for the stations: CRE-MONA (solid lines) and LODI (dotted lines).The assumed distribution is GEV and its parameters have been estimated by the Lmoments method, separately on each station.

Fig. 4 .
Fig. 4. Left: map of the estimated scale invariance coefficient a 1 (mm).Right: map of the estimated scaling exponent n (dimensionless).

Fig. 5 .
Fig. 5. Maps of the estimated GEV parameters ε (location, left) α (scale, centre) and ξ (shape, right), all dimensionless.In the ξ map, the red scale shows positive values, corresponding to a Fréchet distribution; the (not appearing) blue scale would show negative values, corresponding to a reverse Weibull distribution.Values of the ξ mean that are smaller, in absolute value, than the standard deviation are masked out: in such areas the GEV distribution is considered equivalent to a Gumbel distribution.

Fig. 6 .
Fig. 6.Distributions of rainfall depth thresholds (mm) obtained for T = 200 yr and D = 3 h (left panel), D = 24 h (right panel).Boxplot labelled A refers, in both panels, to the previous operational method, i.e. separate estimation at each of 69 stations; boxplot B is obtained by applying the proposed method using the same 69 stations only; boxplot C is obtained estimating at the same 69 locations, by applying the proposed method using the complete dataset.Comparing boxplot A and B shows the effect of applying the new method at the same dataset, while comparing boxplot B and C shows the effect of the dataset extension (when estimating at the same, old 69 sites).The combined effect of the method change and of the dataset extension is seen by comparing boxplot A and C.

Fig. 7 .
Fig. 7. Raindepth threshold h T (mm) for event duration D = 3 h and return period T = 200 yr.Left panel: estimate obtained with the previous operational scheme, based on station-point estimation at 69 sites and subsequent interpolation.Right panel: estimate obtained by applying the proposed method to the same 69 stations.

Fig. 8 .
Fig. 8. Map of the rainfall depth threshold h T (mm) for event duration D = 3 h and return period T = 200 yr, obtained by applying the proposed method to the complete dataset.

Fig. 10 .
Fig. 10.Map of the rainfall depth threshold h T (mm) for event duration D = 24 h and return period T = 200 yr, obtained by applying the proposed method to the complete dataset.

Fig. 11 .
Fig. 11.Relative partial uncertainty component ρ ξ (%) on the rainfall depth threshold h T ,D for return period T = 200 yr (and for any duration D), due to the uncertainty on the estimate of the parameter ξ .

Table 2 .
Ranges of values (in the grid) of the ratio SDV/mean for a 1 , n, ε and α and of ratio SDV/ max(|mean|) for ξ .

Table 3 .
Relative partial uncertainty component (%) on h T ,D with respect to each estimated parameter, minimum and maximum value among all gridpoints.