At-site and regional frequency analysis of extreme precipitation from radar-based estimates

In this paper a peak-over-threshold method is used to perform an extreme rainfall analysis and to derive return levels from weather radar and rain gauges in Belgium. The importance of this work is high, as radar archives are nowadays long enough to permit the development of extreme rainfall analyses which are of fundamental importance for many applications, but the common annual maxima approach needs even longer time series. However, some important explanations and discussions, in addition to those already highlighted by the first review by F. Marra, are missing in the manuscript, and need to be provided before the article can be accepted for publication.


25
A Hellmann-Fuess pluviograph has been in operation in Uccle (RMIB) from 1898 to 2008 and has enabled the compilation of a continuous time series of 10 min precipitation (?) :::::: rainfall ::::::::::::::::::::: (Demarée, Gaston, 2003). The 10 min precipitation :::::: rainfall : values had to be manually extracted from line graphs on papers. Starting from the fifties, additional rain gauges were installed to constitute a network (BUL) for hydrological research. Since the rain gauges underestimate the rainfall by 5-10% due to its mechanism, its records have been calibrated using a collocated gauge from the CLIM network.

20
It should be noted that using the lowest available duration for each network would result in an underestimation of the extremes due to the discrete time sampling :::::::::::::::::::: (Marra and Morin, 2015). Additionally, random errors and time sampling difference can be compensated by performing the sum. For both the radar and the gauge, no missing data is tolerated in the sum to avoid underestimation. Furthermore, only timestamps with both radar and gauge data are kept.
Due to the amount of stations, it is not possible to analyse in details the results at each station. Therefore a few stations are 25 picked at different distances from the radar (see Tab. 1 and Fig. 1). The Uccle station is chosen because it is included in the 3 networks, which makes intercomparison ::: inter :::::::::: comparison : possible. The availability of the 1 h accumulation data is about 95 % for the radar products and close to 100 % for the AWS gauges. The radar availability of the 24h accumulation is lower than the 1 h accumulation because a significant part of the intervals without data are short. The availability of the SPW gauges is around 90 % but this is mainly due to the removal of snow events, when no extreme rainfall :::::::::: precipitation is expected. 3 At-site frequency analysis

Methodology
Extremes are often extracted using block maxima of one year but it is not recommended for small sample size. The peak-over-threshold (POT) method, where values exceeding a given threshold are kept, is prefered here. It has been shown by ? :::::::::::::::: Pickands III (1975) that the extreme values converge asymptotically to a generalized Pareto Distribution (GPD) : (1) with ξ, µ and σ commonly defined as the shape, location and scale parameters. The special case when the shape parameter is equal to zero is defined as the Exponential disbritution ::::::::: distribution : (EXP).
The choice of the threshold has an important impact on the estimation of the distribution parameters. When the number of selected values increases, the variance naturally decreases but the bias increases (due to the deviation from the theoretical distribution). It is more practical to use a threshold rank instead of a threshold value to control the sample size.
The maximum at Humain has been observed by both the radar and the gauge on 7 June 2016. This relatively high value can be due to randomness and the short period of records. But it is also possible that the other quantiles are underestimated 30 (the maximum was recorded by the new weighted gauge). There is generally a good match between the radar and the gauge quantiles except for the following events : event 2 : the radar underestimates globally event 6 : 7 : : the gauge is located at the boundary of a convective cell (most probably with hail) :: the ::::::::: convective :::: cell event 11 : the radar signal is strongly attenuated by a mesoscale convective system.
event 13 : there was probably snow in the gauge event 14 : the gauge is located at the boundary of a convective cell. 5 The second highest quantile at Uccle has been observed by both the radar and the gauge on the 7th of October 2009. There is generally a good match between the two datasets. A few events are problematic : event 1,4 : the gauge is at the boundary of a cell (most probably with hail) event 9 : there is a stationary storm underestimated by the gauge event 10 : the gauge is at the boundary of a cell and the radar is attenuated (same as event 2 in Humain)
When using the CAP product, the higher quantiles are overestimated especially for Uccle. This can be mainly attributed to 5 the effect of hail. This results in an overestimation of the scale parameter.
For Uccle there are not many differences between QPE and MFB because most events are associated with convective 25 storms. Compared to the gauge quantiles, the radar quantiles are lower below 1-year and higher between 1-year and 5-year return periods. This can be attributed mainly to hail overestimation by the radar and gauge losses. It results in a higher scale for the radar, which is close the upper bound of the gauge confidence interval.
Except for the Uccle station, the scale parameter is the lowest for the QPE dataset due to the bias as a result of the small sample size. The scale parameter of the pooled radar datasets is slightly higher at Deurne and significantly higher in Uccle. For Gosselies and Nadrin, the R10 and BUL data have similar scales while it is slighltly :::::: slightly : higher for the R50 :::: RFA data. The 10 fit to the R10 and R50 :::: RFA ::: and :::: R10 : data is within the uncertainty bound of the fit to the BUL data. For those two stations, the fit to the BUL data is even in the small uncertainty bound of the fit to the R50 :::: RFA : data.

Spatial maps
QQR estimator of the scale parameter of the Exponential distribution fitted for each pixel to 2005-2016 QPE data in a radius of 10 km.

15
In Belgium, ? derived a spatial GEV model depending linearly on the altitude. ? found for 6-hourly precipitation in the Czech Republic that the assumption of a linear model might be too restrictive, especially for convective precipitation. Here we are able to use the radar data to study directly the spatial variations of the extremes. It also allows to verify our hypothesis that the distribution parameters do not vary on a 40 km scale.
These results suggest that considering constant extreme statistics over small regions is valid for the 1-hour duration. The potential of a radar-based precipitation dataset ::::::: datasets to study extreme precipitation ::::: rainfall : at a given location is evaluated. The quantitative precipitation estimate ::::::::: estimation (QPE) is obtained by a careful processing of the volumetric reflectivity measurements from a single weather radar in Belgium. The radar dataset covers the period 2005-2016, has a resolution of 1 km, and is available every 5 minutes.
For 24 h accumulation there is a mix of summer and winter events, with more of the latter for stations with higher altitude.
There is a clear benefit of bias correction for the highest station, making the distribution fits similar for both stations. For both 1 h and 24 h accumulations, the basic radar product exhibits unrealistic high extremes, which results in an overestimated scale 10 parameter. Such product is therefore not suitable for an extreme value analysis.
5.2 :::::::: Prospects There is still some room to improve the quality of the radar and gauge datasets. The recently installed weighting gauges are 30 able to cope with intense rainfall and snowfall. One will have to wait a few decades before it can produce reliable statistics.

Code availability
The code used in this study is part of the RMIB radar library.
This study will provide new, interesting information to the field and deserves publication. However, a number of issues are 10 currently preventing it from being published in its present form: literature review is missing key papers that need to be mentioned and, in some cases, discussed; methods are not sufficiently described, motivated and supported by literature; some of the results need to be re-considered and discussed, also in light of the new literature review; presentation and language need some improvement. Below a list of major and minor comments. 15 We would like to thank Francesco Marra for summarising the value of the paper and his in-depth analysis of our work. It allowed us to improve the paper and to put some results in perspective.

Literature review
Literature review is missing some key papers of the field.

20
-Panziera et al., 2016 developed regional rainfall frequency analysis and implemented (and tested) them in an earlywarning system for Switzerland -this is probably the first study deriving rainfall frequency analysis from remote sensing data and providing an actual operational, quantitative product; This paper focusing on areal maximum extremes has been added in the introduction. Please see page 3, line 33.

25
-Peleg et al., 2017 analyzed the impact of small scale rainfall variability on frequency analysis from point (rain gauges) and areal (radar) estimates -they found that, due to the relatively short record length, point and areal estimates, are expected to differ (even if both measurements are 'true'), and observed large differences between frequency analyses from rain gauges located within a 1 km 2 pixel. This means that no exact match between point and areal results should be expected (not only because of the areal reduction factors issue). This contribution needs to be mentioned by the literature 11, lines 18-19] could be updated accordingly); The sub-pixel rainfall spatial variability explains partly the different frequencies. This reference and a related paper are now used in the text. Please see page 3, line 12 ; page 16, line 1 and page 16, line 6. I understand that there are spatial dependence within the pixel and that for higher return periods the higher 'climatic' variability is dominant. It is 5 unclear however if one can extrapolate this result for a larger region. In our study we make the assumption that the 1 hour extremes occurring in the 20 km region on different days are independent.
-Wright et al., 2013 proposed the use of stochastic storm transposition for radar rainfall frequency analysis in order to overcome the limitations due to the short radar records.  The QQR method is now briefly described in the text. Please see from page 9, line 3.
The comment refers to the empirical quantiles and has therefore been moved to the corresponding paragraph. Please see page 10, line 28.
-9, 7-8: this sentence should be somehow supported/motivated; This is a consequence of the previous sentences. The text has been adapted on page 13, line 8. The text has been clarified. Please see page 13, line 25.

10
-Previous studies found important instability of the MFB factor for short periods (1 h), especially in convective conditions.
The use of hourly mean field bias adjustment needs to be supported by sensitivity analyses or references. Please discuss this and provide information on the stability of the factors; As stated, the bias corrected hourly accumulations are only used to study 24 hours precipitation extremes. We added some discussions on page 7, line 19 and page 12, line 1. We are not interested in the regional maximum extreme but in the extreme at a given location, hence the comparison We think that the quality of our datasets can be seen as an improvement compared to previous studies. Thank you for pointing out clearly the originality of our methods in the study. We agree that this originality did not appear explicitly.
The text has been substantially improved.
the presentation of the gauge networks in [2, 20-25] and in section 2.1 is difficult to follow, I recommend reorganizing 25 and rephrasing these parts (how many networks are used?, why are they considered separately?, what are the differences? What the advantages of including each of them? Why not using them together?); The gauge networks used in the study are presented in section 2.1. The rationale for using them is presented in section 2.2. Additional information have been added on page 7, line 16.
These sections have been clarified.
section 3.2: what does "problematic events" mean? How are they identified?; why did the authors focus on 10 extremes? 5 This has been clarified in the text. Please see from page 9, line 24.

10
how does the 20 km regionalization of the parameters relate to the 10 km and 50 km used in the following parts of the study? how did the authors check/motivate that 20 km "provides a sufficiently large data sample"?
The decorellation distance (50 km) and the size of the region (20 km) are two different concepts. To avoid confusion the text has been modified from page 12, line 18. The results suggest that the sample is large enough.
it is not clear whether the method by Reed et al., 1999 is the one the authors used in this paper; 15 The reference has been dropped.
-9, 28-29: did the authors check for non-stationarity in the data (e.g. changes in the instrumentation, or other)?
As stated in the text, there are no instrumentation changes. The annual maxima have been found stationary (Vannitsem and Naveau, 2007). No statistical test for the stationarity of peaks over threshold timeseries have been done since it is beyond the scope of this study.
Not necessarily. Important implications have been derived, thank you. Please see from page 13, line 30.
-11, 24-25: this problem should be solved by the adopted regionalization (20 km); No, it shouldn't. Consider a very intense but super fast storm. The 1-hour accumulation extreme will be overestimated.
Please see page 16, line 33. The abstract has been modified and we think the structure is now more clear.

5
-1, 9: natural rainfall variability in combination with short record lengths is also to be mentioned as a cause of the mismatch between point and areal frequency analyses (see Peleg et al., 2017 and major comment above); Please see response above.
-1, 11: "assuming that the extremes are correlated": this is not clear to me, I guess it is related to the regionalization, but needs to be better written;

10
It has been removed.
-1, 18: please remove "very" and "very"; please add a comma after "activities"; please provide a reference for this sentence; A reference has been added. Please see page 2, line 2.
-1, 19: sewer systems are an example, but I'd insert an example from other applications, such as dams design/management;

15
[1, 21] sewer systems are usually designed for relatively short return periods, applications requiring long return periods are dams, bridges, etc.; Done.

20
-2, 3: an example of what the authors mean with "high-resolution" would be helpful for the reader; This is now specified in the paragraph.
-2, 5-6: "the best potential is provided by radar QPE". Satellite products are fruitfully being used too and are, often, characterized by longer records. This sentence should be motivated and supported by references; Please see page 3, line 7.

25
-2, 11-12: the reference to Saito and Matsuyama, 2015 looks unrelated to the rest of the text, can you provide some information on its relevant parts; Done.
- Figure 1: what do the areas in the figure represent? Are they catchments? Are they used in the manuscript?
These are the provinces of Belgium. They are not used but are of interest for climatological purposes.
-5, 6-10: is this done with a moving window? Or on 24 h blocks? 5 The term "sliding" means that we are using a moving window.
- Table 1: what is the meaning of the "Avail. All" column? Does it mean that "Both" were available?
Yes. This has been clarified.

10
-6, 22 and 7, 25: I'd suggest to change these titles to something focusing on the tested product rather than on the rain gauges against which it is compared; Good idea. The titles have been changed. The references have been added.
In this paper a peak-over-threshold method is used to perform an extreme rainfall analysis and to derive return levels from weather radar and rain gauges in Belgium. The importance of this work is high, as radar archives are nowadays long enough to permit the development of extreme rainfall analyses which are of fundamental importance for many applications, but the common annual maxima approach needs even longer time series. However, some important explanations and discussions, in 5 addition to those already highlighted by the first review by F. Marra, are missing in the manuscript, and need to be provided before the article can be accepted for publication.
We would like to thank Luca Panziera for underlying the importance of our work. The detailed comments allowed us to improve the presentation and the discussion of our results.

Major comments
10 What is new?
It is somehow difficult to understand which new contribution this paper brings with respect to previous studies, and I think that this should be better highlighted in the text. To my understanding, the main novelty of this paper is the use of a POT method for an extreme rainfall analysis for weather radar data.
As pointed out by the first reviewer, the originality of our work did not appear clearly. The use of the POT method together 15 with the other novelties of our approach are now highlighted in the text. Please see page 4, line 3.

Temporal Declustering
As rainfall data need to be declustered in order to remove the temporal correlation in the time series before GPD parameters estimation, the authors choose an interval of 12 hours (for 1-hour rainfall) and 3 days (for daily rainfall) in order for two threshold exceedances to be considered as independent. The choice of these intervals, which should be referred to as run 20 length or run parameters according to the literature, seems reasonable, but it could potentially have a big impact on the derived return levels, as it shapes the exceedances time series whose maxima are used for the parameters estimation. If the data are temporally clustered, such temporal lags could not be long enough to remove dependency, but if the temporal clustering occurs rarely, they could actually lead to a significant bias of the return levels estimates. What do the authors mean as temporal independence? How did the authors choose such temporal lags? Did the authors investigate the effect of changing these values 25 on the parameters estimation and final return levels? The subjective choice of these values should be motivated and discussed in the text.
How to deal with declustering is indeed a crucial point to address in extreme rainfall analysis. It is now properly discussed from page 8, line 12.

Exponential distribution
As the choice of a null shape parameter is fundamental for this work, I think that it should be motivated more in the text.
Therefore, I suggest to briefly report and discuss the main results of Willems (2000), in order to better understand the motivation of this choice The text states also that this choice was taken because of the short period. However, with a POT approach the shortness of the period should not be a limiting factor, as many events are considered. It should also be discussed if this is the 5 best choice for both 1-hour and 24-hours accumulations. Did the authors try to estimate also the shape parameters, to see if from the data a value different from 0 could be derived?
The short period remains a limiting factor to model the tail of the GPD. The choice of the Exponential distribution is further justified from page 8, line 26. We therefore did not try to estimate the shape parameter.

Radar and gauge comparison 10
The authors present an interesting comparison between the radar and gauges extremes, for 1-hour and 24-hours accumulations.
Despite this being very interesting and instructive, the implications for this study are not very clear. I suggest the authors discuss at least qualitatively the influence of this investigations on the overall results of the study. The implications of the radar and gauge comparisons have been added from page 10, line 15 and from page 11, line 20.
Regional frequency analysis 15 The regional frequency analysis needs also to be better explained and the choices which were taken need to be motivated and discussed. How did the authors choose the 20-km radius for the analysis? How the resulting return levels at a given pixel should be interpreted, as they stem from exceedances in rainfall values which were observed all around it? Does it still make sense to speak about point measurement? How are the maps of GPD parameters affected by the choice of the 20-km radius circles?
Our methodology should be better explained indeed. An extended literature review is given from page 2, line 28. Our 20 methodology is discussed from page 12, line 12. We think this explains why we can still speak about point measurements. For the derivation of spatial maps, please see from page 14, line 20.

Return levels maps
I guess the final goal of the study is to derive maps of return levels with relative uncertainty for Belgium. Despite the return levels are shown for given rain gauge locations, it would be desirable to show also maps of return levels for selected return 25 periods. Would it be possible to insert a map or two of the return levels? How would those maps be affected by the 20-km radius selected for the regional frequency analysis? How these maps should be interpreted? Since you are using a constant shape parameter (equal to 0), and the longest return levels are shaped by it, long return periods map will tend to produce maps less variable in space with respect to short return periods. This should be discussed in the text.
Two return level maps have been added and discussed. Please see page 14, line 30.
1. The title is rater general, and you might want to consider adding the name of the region for which this study was performed (Belgium) Good suggestion.
2. In the introduction some relevant papers are missing. I strongly encourage the authors to discuss also the papers refer-5 enced by F. Marra in his review.
The papers are now discussed in the text.
3. Pag.2, line22."in this study, we want to demonstrate the potential of this radar-based QPE to derive point rainfall statistics". I don't think that the aim of this study is this, as the radar pixel does not represent point rainfall statistics. As the authors know, the radar rainfall estimate comes from the reflectivity measured within the sample volume, representing an 10 area-not a point-measurement. The intrinsic difference among radar and gauges measurements should be at discussed in the paper, since a comparison between rain gauges and radar return levels is performed (see also major comment 1 by F. Marra).
The text has been adapted to reflect this important fact. Please see page 3, line 10.

15
The reference is the one mentioned above in the text.
6. Pag. 4, line 25: please clarify the last sentence of section 2.2 which, in its present form, it is not correct. Could change 20 " In addition, the increasing radar sample volume will give lower extreme values" to "In addition, the increasing radar sample volume will produce an underestimation of local small-scale extremes".
Your suggestion has been integrated. 7. Pag. 5, line 24. First two sentences of section 3.1 need to be reformulated as they are very colloquial.
The sentences have been reformulated in the introduction. We don't think there is a risk of overfitting since we are using a simple exponential model. 9. pag.7, line 13-14 and pag.8 lines 9-10. How this percentage would vary by changing the temporal lags considered for independence? (see major comment 2). "This is what we expect from ....". According to which theory/observations? Please clarify and give references.
Please see the response to major comment 2. The sentence has been reformulated on page 10, line 22.
10. Pag. 8, lines 21-28. It would be more appropriate move the literature review to the Introduction, instead of leaving it in 5 this Methodology section.
This part has been moved to the Introduction.
11. Pag. 8, second paragraph of section 4.1: please clarify the explanation of the regional frequency analysis. Given that your circle has a radius of 20 km, what is the aim of considering all the events within a 50 km radius dependent? Isn't this the same as taking just the max value within the 20-km radius? In case it is, wouldn't be easier just say that you take this 10 maximum within the 20-km radius circle?
The decorrelation distance (50 km) and the size of the region (20 km) are not directly related. But the former implies that all extremes observed within the region are independent. We acknowledge that the explanation was a bit confusing. It has been reformulated from page 12, line 18.
12. Pag.10, lines 6-9. Also here I suggest to move the references to other studies in the Introduction. 15 This part has been moved to the Introduction.
13. Pag. 10, line 13: "a few pixels having too much (50) .... removed" . This sentence is rather unclear, and this seems a rather subjective choice which can hardly be motivated.
This part has been reformulated from page 14, line 24.
14. Figure 2. I suggest to rename "Extreme 1-hour precipitation quantiles" to "1-hour return levels", to be consistent with 20 theory and common nomenclature in the field.
The figures legends have been modified.
15. Tables 2 and 4. I actually miss how the events in the tables are ordered, if there is a logic.
This is now explained in the table description.
16. Figures 1, 6 and 7: a scale in km would help the interpretation of the figure, for those who are not familiar with Belgium

25
We added a 100 km circle to the maps.
The authors apply local and regional frequency analysis (RFA) for extreme rainfall on two radar data products (advanced QPE and basic CAP) for Belgium and compare the results with station based extreme value statistics. They find that the basic radar product shows unrealistic high extremes, the 24h extremes need bias correction and that the fit of the QPE probability 5 distribution is within the confidence interval of the point distribution. The results for RFA are more complex. The topic of the paper is very important and of high relevance for the community. The results are interesting. However, the description of methodology is not clear enough to follow the procedures and understand all the results. This concerns especially the sampling strategy for RFA. Also the presentation of results could be more distinct. Details are given below. However, the research is worth of publication after the authors have the opportunity to make some revisions.

10
The authors would like to thank Referee 3 for his encouraging comments and suggestions to improve the paper.
The sentences have been reformulated. Please see from page 1, line 10. In the abstract "rain gauges and collocated radar estimates" is mentioned, so I assume the highest gauge values with collocated radar data are used. This should be stated clearly here in the text as well. The rational for this choice should also be discussed.
This has been reformulated from page 9, line 23. This has been reformulated from page 11, line 8. What about the "index rainfall"? How did you regionalise it? etc.
The decorrelation distance (50 km) and the size of the region (20 km) are not directly related. But the former implies that all extremes observed within the region are independent. The sample is indeed different for each target location.
Since we consider that the extreme statistics are the same for the region, no "index rainfall" is used. The section has been rewritten for the sake of clarity.
the 1-km pixels within the 20-km radius area (statistics of regional maximum extremes) and the probability that a given value is exceeded at a given location within that area (statistics of extremes at a given location). In this study we are using the regional maximum peaks to derive statistics of extremes at a given location. If the goal was to obtain statistics on regional maximum 5 extremes, we would have taken 10 years as the effective length of the timeseries (i.e. the length of the radar dataset). Our goal is to obtain the probability of exceeding a value for a given location in the region and, therefore, we use an effective length based on the number of pixels within the area and the number of independent peaks. This length is much larger than 10 years and gives realistic return period estimates. It is directly related to the average over all pixels of the mean number of exceedance per year. That 's why our approach is similar to the one of Wright et al., 2013. More advanced approaches to spatio-temporal 10 extremes can be considered but these are beyond the scope of the present study.

Temporal Declustering
Choosing 3 days instead of 12 hours for the temporal lag removes rank 30 (radar) and rank 14 (gauge) at station Humain ; it removes rank 27 (gauge) at station Uccle. This changes very slightly the scale parameter but only for the gauge : from 7.5 to 5 7.6 and from 6.8 to 6.9, respectively. Using 6 hours instead of 12 hours does not change the extremes up to rank 30 for the radar and the gauge at the two stations.

Radar and gauge comparison
The text has been clarified as followed : "Since the level of missingness is limited, the impact on the statistics is expected to be small".

Return levels maps
There is indeed an impact of the circle from the RFA but it is relatively limited in most of the study area. Since this is mainly due to radar artifacts we don't consider it as a drawback of the proposed RFA method. The discussion has been improved as follows : "Circular patterns appear on the maps due to the influence of the pixels located at their centers. The high values are caused by pixels contaminated by non-meteorological echoes (e.g. at the German border) and hail. A stronger filter for 15 non-meteorological echoes is not used because it could remove actual precipitation information. The circular effect might be reduced by using a larger radius or a higher threshold rank but this is computationally expensive." The RFA has been limited to 1 hour extremes in this paper since it has the best potential for radar data. Extending the approach to other durations is interesting for future research.
Other comments