Enhancing the usability of weather radar data for the statistical analysis of extreme precipitation events

Hänsler, Andreas; Weiler, Markus

doi:https://doi.org/10.5194/hess-2021-366

Preprints

12 Jul 2021

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abstract. Spatially explicit quantification of design storms is essential for flood risk assessment and planning. Because weather radar records cover only a limited period, design storms are usually estimated from the rainfall records of a few precipitation stations with substantially long time coverage. To achieve a regional picture, these station-based estimates are spatially interpolated, which introduces a large source of uncertainty due to the typically low station density, in particular for short event durations.

In this study we present a method to estimate spatially explicit design storms with a return period of up to 100 years on the basis of statistically extended weather radar precipitation estimates based on the ideas of regional frequency analyses and subsequent bias correction. Associated uncertainties are quantified using an ensemble-sampling approach and event-based bootstrapping.

With the resulting dataset, we compile spatially explicit design storms for various return periods and event durations for the federal state of Baden-Württemberg, Germany. We compare our findings with two reference datasets based on interpolated station estimates. We find that the transition in the spatial patterns from short-duration (15-minute) to long-duration (2-day) events seems to be much more realistic in the weather radar based design storm product. However, the absolute magnitude of the design storms, although bias-corrected, is still generally lower in the weather radar product, which should be addressed in more detail in future studies.

Received: 08 Jul 2021 – Discussion started: 12 Jul 2021

Competing interests: Markus Weiler is editor of HESS

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on hess-2021-366', Francesco Marra, 27 Jul 2021
This paper presents a new dataset of extreme precipitation return levels derived from radar estimates (RAD-BC). The methodology builds on the sampling approach presented by Goudenhoofdt et al. (2017) and improves the sampling strategy by using spatially varying sampling probabilities which depend on both horizontal distance and terrain elevation. The topic is of interest to the readers of HESS, and the study is timely as it tackles a state-of-the-art problem.

While some results are encouraging as the spatial artifacts generated in the Goudenhoofdt methods are removed and the orographic influence on precipitation statistics is better represented in the product, the final product shows important systematic bias despite the application of a bias correction procedure.

Overall, I found some aspects of the study to be insufficiently robust, as highlighted in the “major comments” below. While I sincerely appreciate the efforts of the authors, these aspects prevent me from recommending acceptance of the paper in its present version. I’d invite the authors to consider my comments, and I’ll be happy to discuss them further in the open discussion in case I misunderstood some parts.

Please do not consider the references below as recommended for inclusion in the paper, they are meant to be examples only.

Kind regards,

Francesco Marra

Major comments:

The reference dataset is sometimes used to support the goodness of the new dataset and sometimes regarded as less accurate (e.g. in the patterns of sub-daily precipitation, see lines 16-18 in the abstract). Although the reasons behind this can be understood to some extent, this is a problematic issue. On what basis is the dataset trusted as a reference (perhaps some durations are and some are not, some return periods are and some are not)?

I think a proper evaluation should rely on a trusted dataset. For example, rain gauges could provide a quantitatively trusted reference to gather information on the quantitative accuracy of the method on some selected locations. This might allow us to understand what aspects the radar product is or isn’t able to reproduce (orographic influence at different durations, different return levels, etc). Alternatively, the trusted parts of the available dataset should be defined a priori and used for the validation, while the parts which are not trusted should be only used for comparison and discussion.

While I understand the need to avoid winter periods due to the known issues of weather radar monitoring with solid precipitation, it is not clear to me how it is possible to compare return levels derived from summer only (Apr-Oct as in this paper) with return levels derived from stations (the reference products) for durations up to 24 hours. The authors mention this at lines 82-84 ("Since we are mainly interested in short to medium range storm events that are mainly of convective nature, we only use data for the (summer) months from April to October, representing the main season for these kind of storm events"), but then durations up to 24 hours (e.g. see lines 244-255) are examined and discussed. This mismatch, which is not discussed by the authors, could also contribute to the overall bias found by the authors. I fear this might represent an important drawback of the presented product and of the presented comparison.

I like the idea of sampling the surrounding pixels using probabilities, and I like the idea of basing the properties of the sampling pdf on the typical size of convective rain cells in the region, but I am missing why the same mask is used for all durations. Since precipitation accumulations over longer durations are characterized by larger autocorrelation, my guess would be that 4 km might be good for short durations (even 1 hour could be borderline according to what is said above), but too short for longer durations.
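The point about a duration-dependent mask can be made concrete with a minimal sketch of distance- and elevation-weighted sampling. It is not the authors' implementation: the Gaussian form of both weights, the function names `sampling_weights` and `draw_cells`, and all parameter values are assumptions chosen for illustration. Only the 4 km minimum distance and the dependence on horizontal distance and terrain elevation come from the discussion above.

```python
import math
import random


def sampling_weights(cells, min_dist_km, sigma_dist_km, sigma_elev_m):
    """Return normalised sampling probabilities for candidate cells.

    cells: list of (distance_km, elevation_difference_m) tuples, both taken
    relative to the cell of interest. Cells closer than min_dist_km get a
    probability of zero. The Gaussian decay in distance and in elevation
    difference is an assumed functional form, not taken from the manuscript.
    """
    raw = []
    for dist, dz in cells:
        if dist < min_dist_km:
            raw.append(0.0)
        else:
            w_dist = math.exp(-0.5 * (dist / sigma_dist_km) ** 2)
            w_elev = math.exp(-0.5 * (dz / sigma_elev_m) ** 2)
            raw.append(w_dist * w_elev)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no candidate cell lies outside the minimum distance")
    return [w / total for w in raw]


def draw_cells(weights, k, seed=0):
    """Draw k cell indices (with replacement) according to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)
```

With such a function, a duration-dependent mask would simply mean passing a larger `min_dist_km` (and possibly a larger `sigma_dist_km`) for the longer durations, for instance 4 km for sub-hourly events and a larger, separately justified value for daily events.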

Lines 107-111: this is presented in a confusing way. There is no guarantee that 100 years of data will provide perfect (or good for what matters) estimates of the 100-year return levels. Monte Carlo simulations run under realistic precipitation statistics show that empirical estimates will be subject to ~90% uncertainty (computed as the 90% confidence interval), while a simple GEV fit (method of the L-moments) will be subject to ~50% uncertainty. The advantage of using ~100 years of data instead of ~20 is clear, but should be presented in a better way.
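A Monte Carlo experiment of the kind referred to here can be sketched as follows. This is an illustrative setup, not the simulation behind the quoted figures: the "true" GEV parameters, the number of replicates, and the function names are assumptions, and the empirical estimate is taken to be the sample maximum (whose Weibull plotting-position return period for 100 years of data is 101 years). The GEV is fitted with Hosking's L-moment approximation, using his parametrisation in which `k` is the negative of the usual shape parameter.

```python
import math
import random


def gev_quantile(p, loc, scale, k):
    """GEV quantile in Hosking's parametrisation (k = -shape)."""
    y = -math.log(p)
    if abs(k) < 1e-9:  # Gumbel limit
        return loc - scale * math.log(y)
    return loc + scale * (1.0 - y ** k) / k


def fit_gev_lmoments(sample):
    """Fit GEV parameters from sample L-moments (Hosking's approximation)."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-9:  # Gumbel limit
        scale = l2 / math.log(2)
        return l1 - 0.5772156649 * scale, scale, 0.0
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    loc = l1 - scale * (1.0 - g) / k
    return loc, scale, k


def relative_width(estimates, truth):
    """Width of the central 90 % interval, relative to the true value."""
    s = sorted(estimates)
    lo = s[int(0.05 * (len(s) - 1))]
    hi = s[int(0.95 * (len(s) - 1))]
    return (hi - lo) / truth


def simulate(n_years=100, n_rep=1000, seed=42, loc=30.0, scale=8.0, k=-0.1):
    """Compare empirical and fitted 100-year estimates (assumed true GEV)."""
    rng = random.Random(seed)
    truth = gev_quantile(0.99, loc, scale, k)
    empirical, fitted = [], []
    for _ in range(n_rep):
        sample = [gev_quantile(max(rng.random(), 1e-12), loc, scale, k)
                  for _ in range(n_years)]
        empirical.append(max(sample))
        fitted.append(gev_quantile(0.99, *fit_gev_lmoments(sample)))
    return {"empirical": relative_width(empirical, truth),
            "gev_lmom": relative_width(fitted, truth)}
```

Calling `simulate()` returns the relative width of the 90 % interval for both estimators. The exact numbers depend on the assumed parent distribution, so they should not be expected to reproduce the percentages quoted above, only the ordering between the two estimators.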

The results show an important systematic bias (as can be inferred from Fig. 4). This bias concerns most of the study area and cannot be seen as related to stochastic uncertainty; therefore the uncertainty quantification in section 3.4 cannot account for it. This is an important issue and I wonder what the added value of such quantitative information is for the final user.

In my view, this issue is related to a sub-optimal choice of the bias correction method (see details below), and addressing it should therefore be part of this study. The bias correction described in section 2.5 seems to me insufficient. Basically, this correction includes an additive adjustment to the data (it changes the location parameter of the GPD). Since radar errors are far from being only additive, the resulting product is necessarily biased. Eventually, the results presented in the paper confirm this: the underestimation increases with return period, meaning that the other parameters are wrongly represented by the product and therefore also need to be adjusted. While the authors mention these efforts as future directions, I think that the results presented here are not sufficient to justify this publication and that these additional efforts have to be invested here.
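The argument that a purely additive correction leaves a bias growing with return period can be checked with a small numerical sketch. All parameter values below are hypothetical, as are the names `gpd_return_level` and `residual_bias`. The sketch assumes a peaks-over-threshold GPD with an average of e events per year and a radar product whose scale parameter is too small.

```python
import math


def gpd_return_level(T, u, sigma, xi, lam=math.e):
    """Return level for return period T (years) of a peaks-over-threshold GPD.

    u: threshold (location), sigma: scale, xi: shape,
    lam: mean number of threshold exceedances per year (assumed to be e).
    """
    if abs(xi) < 1e-9:  # exponential limit
        return u + sigma * math.log(lam * T)
    return u + sigma / xi * ((lam * T) ** xi - 1.0)


# Hypothetical parameters: the radar product underestimates location and scale.
ref = {"u": 20.0, "sigma": 8.0, "xi": 0.1}
radar = {"u": 15.0, "sigma": 6.0, "xi": 0.1}

# Additive correction: shift the radar data so that the 1-year levels match.
delta = gpd_return_level(1, **ref) - gpd_return_level(1, **radar)


def residual_bias(T):
    """Corrected radar return level minus reference return level."""
    corrected = gpd_return_level(T, radar["u"] + delta,
                                 radar["sigma"], radar["xi"])
    return corrected - gpd_return_level(T, **ref)
```

By construction `residual_bias(1)` is zero, while `residual_bias(10)` and `residual_bias(100)` become increasingly negative because the shift does not touch the scale parameter. Under these assumptions, the increase of the underestimation with return period is what one would expect from a location-only adjustment.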

Moderate comments:

It seems to me that larger ensembles could produce more accurate estimates (for example they could reduce the stochastic noise still present in the data and which required the smoothing of the maps). Why is a factor of 5 chosen? Are there only statistical-independence limitations or is it also a matter of computational time?

Lines 40-41: this is an over-simplification. The short record length is indeed among the important drawbacks of weather radar archives, but other issues were highlighted in literature. The most important one is definitely estimation inaccuracy: large systematic over- and under- estimations were found due to measurement errors (e.g. Eldardiry et al., 2015; Haberlandt and Berndt, 2016, among others), but in a recent review on the topic we also highlighted the inadequacy of the adopted statistical methods (Marra et al., 2019). As these aspects are somehow addressed by the methodology in this paper, I think the introduction should better present them.

Section 2.2: information on the extreme value methodology used in the reference products has to be provided. Something is said later in the text, but the information should be presented in an organized manner here. Also, the implications of these choices should be discussed. For example, distributions with different tail heaviness will unavoidably show different biases at different return levels. If indeed different methodologies are used, the impact of these aspects on the comparison and on the results have to be discussed.

Lines 116-121: I am missing the relation between the typical size of the convective cells and sampling radius and normal distribution parameters.

Line 132: similar to the previous comment, why is 4 km chosen here?

Line 220: It would be nice to see the results also for 1-year or 2-yr return levels. Since the adjustment is basically done on the 1-yr event, they should well isolate the quality of the product in relation to the bootstrap sampling method.

Line 232: why is the map smoothed? It seems this is to remove some noise. However, the noise we would see in these maps is a direct representation of the stochastic uncertainties affecting the overall methodology. I think the maps would be more informative without the smoothing.

Lines 310-313: I might agree that higher-order moments are more difficult to estimate and to rely on, especially from “indirect” datasets such as the ones used here as a reference. I think, however, that this problem can be addressed to some extent by using a more trusted reference and by using corresponding statistical methods.

Although not a native speaker myself, I felt that the language level could be improved, in part due to missing use or misuse of technical terms.

Minor comments:

Lines 16-18: this sentence is not completely clear. I could understand it only after reading the paper. Since this is the abstract, I suggest rewording it.

Line 32: some change-permitting GEV methods allow for changes also of the scale parameter (e.g. see Prosdocimi and Kjeldsen, 2021)

Lines 41-42: I personally disagree on this point. While this is very true for traditional methods based on extreme value analysis, there are some novel statistical methods which show promising results in this sense. They have been published for a few years now (the first papers are by Marani and Ignaccolo, 2015; Zorzetto et al., 2016), and many came after, providing evidence (with applications to rain gauge data as well as satellite data) that 20 years might be sufficient for at-site estimates of even 100-year return levels. I believe it is time to recognize this by specifying that this limit concerns the traditional methods based on extreme value analysis.

Line 112: it is not clear to me what the authors mean with "with underlying sampling probabilities"

Line 115: what does "not necessarily present" mean exactly? Is it a way to say “independent”?

Line 127: I suggest to include this information on the elevation range earlier in the text. Perhaps a short section describing the study area could help also in the following discussion.

Line 222: with "lower time steps", do you mean “shorter durations”?

References

Eldardiry, H., Habib, E., Zhang, Y., 2015. On the use of radar-based quantitative precipitation estimates for precipitation frequency analysis. J. Hydrol. 531, 441–453. https://doi.org/10.1016/j.jhydrol.2015.05.016

Goudenhoofdt, E., Delobbe, L., Willems, P., 2017. Regional frequency analysis of extreme rainfall in Belgium based on radar estimates. Hydrol. Earth Syst. Sci. 21, 5385–5399

Haberlandt, U., Berndt, C., 2016. The value of weather radar data for the estimation of design storms – an analysis for the Hannover region. In: Schumann, A. (Ed.), The Spatial Dimensions of Water Management – Redistribution of Benefits and Risks Data. IAHS, pp. 81–85

Marani, M., Ignaccolo, M., 2015. A metastatistical approach to rainfall extremes. Adv. Water Resour. 79, 121–126. https://doi.org/10.1016/j.advwatres.2015.03.001

Marra, F., Nikolopoulos, E. I., Anagnostou, E. N., Bárdossy, A., Morin, E., 2019. Precipitation frequency analysis from remotely sensed datasets: A focused review. J. Hydrol. 574, 699–705. https://doi.org/10.1016/j.jhydrol.2019.04.081

Prosdocimi, I., Kjeldsen, T., 2021. Parametrisation of change-permitting extreme value models and its impact on the description of change. Stoch. Environ. Res. Risk Assess. 35, 307–324. https://doi.org/10.1007/s00477-020-01940-8

Zorzetto, E., Botter, G., Marani, M., 2016. On the emergence of rainfall extremes from ordinary events. Geophys. Res. Lett. 43, 8076–8082. https://doi.org/10.1002/2016GL069445
Citation: https://doi.org/10.5194/hess-2021-366-RC1
- AC1: 'Reply on RC1', Andreas Hänsler, 08 Nov 2021
  
  Dear Francesco Marra,
  thank you very much for your valuable criticism, remarks and suggestions for improving our manuscript. Please find our response to the various points you raised, indicating what we adapted in the manuscript, in the attached pdf (supplementary file).
  Additionally, I would like to apologize for not being active in the open discussion during the review phase. The reason for this is that I was on parental leave, which actually started a bit earlier than was originally foreseen.
  Best regards, also on behalf of my co-author,
  Andreas Hänsler
  
  Citation: https://doi.org/10.5194/hess-2021-366-AC1
RC2:
'Comment on hess-2021-366', Anonymous Referee #2, 12 Aug 2021

This manuscript deals with the important and timely topic of determining design storms with return periods of up to 100 years from rather short time series of precipitation data from radar observations. The authors present a method to statistically extend time series of weather radar rainfall estimates by combining regional frequency analyses with subsequent bias correction. The results show improvement over the sampling approach by Goudenhoofdt et al. (2017) that is used as basis for their method, but uncertainties, e.g. a bias in the radar data for design storms with large return periods, still remain.

The study fits in the scope of HESS and is of interest to the research community. However, I suggest addressing some major aspects, listed below, before publishing the paper. I’d be happy to discuss my suggestions with the authors in the open discussion and clear up possible misunderstandings.

Major comments:

1. A major concern is the minimum distance of the radar cells that are considered to statistically extend the time series of the cell of interest. As far as I understand the cells have to be at least 4 km apart. The authors mention that the typical size of a convective cell in Germany is 40 km for hourly events according to Lengfeld et al. 2019 (p.4, l.121 in this manuscript). Therefore, the minimum distance of 4 km seems a bit too small to me, especially when considering also daily events that have a much larger typical spatial extent. Did the authors perform any kind of independence check for the time series from the cells that are combined to a long time series, e.g. the correlation of the time series or the percentage of time steps with rainfall in the cell of interest but no rainfall in the other cells of the sample? How do you make sure that the 258 events are actually taken from all 5 time series and not only taken from the 19 year time series of the COI? I was also wondering if the same set of cells are used throughout the study or if the samples vary for the three durations that are considered.

2. The authors only consider precipitation data from April to October, because this is the main season of convective events of short durations. The statistical approach to determine design storms is based on a partial time series consisting of e (Euler’s number) times the number of years. I was wondering if this approach is still valid if only 7 out of 12 months of the year are considered. Although it is common knowledge that most of the convective storms occur during summer, some events might still be missed, especially for the design storms with 24 h durations that might also be associated with advective weather situations. To my understanding, the reference data sets KOSTRA and BW-Stat consider all months and might not be comparable to the radar-based data set. I would suggest taking all months into account, or the authors should provide some kind of validation for their choice of selecting only summer months.

3. Section 2.2 about the reference data sets is quite short. More information about both data sets (e.g. how many stations are considered, length of the time series, interpolation methods, etc.) and on the differences in the statistical approaches to determine design storms from those data sets is desirable. The method for BW-Stat is briefly described in section 2.4. Maybe it would be better to have a general section about the methods first and then describe the data sets and their differences. E.g. that a two-parameter GEV distribution is used in KOSTRA, instead of GPD for BW-Stat and the radar-based data set, is only mentioned in the discussion. This is important information that should be given in the method section.

4. A more detailed description of the sampling process, the generation of the ensemble members, the bootstrapping method and the bias correction is needed to allow for better understanding of the results and of the choices made by the authors (e.g. why 5 ensemble members?).
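The independence checks asked for in comment 1 could look like the following minimal sketch. It is an assumption-laden illustration rather than the authors' procedure: the function names and the wet threshold are hypothetical, and each time step is treated as one event (no declustering). The partial-series size follows the rule described in comment 2, e times the number of years, which for 5 series of 19 years gives the 258 events mentioned in comment 1.

```python
import math
from collections import Counter


def partial_series_size(years_per_series, n_series):
    """Number of events in the partial series: e times the pooled years."""
    return round(math.e * years_per_series * n_series)


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def wet_only_at_coi(coi, other, wet=0.1):
    """Fraction of time steps with rain at the cell of interest only.

    wet is an assumed threshold (same unit as the series) above which a
    time step counts as rainy.
    """
    hits = sum(1 for a, b in zip(coi, other) if a >= wet and b < wet)
    return hits / len(coi)


def origin_of_top_events(series_by_cell, n_events):
    """Count how many of the n_events largest pooled values come from each cell."""
    pooled = [(value, name)
              for name, series in series_by_cell.items()
              for value in series]
    pooled.sort(key=lambda item: item[0], reverse=True)
    return Counter(name for _, name in pooled[:n_events])
```

A low `pearson` value and a high `wet_only_at_coi` fraction would support treating a sampled cell as an additional, largely independent record. `origin_of_top_events(series, partial_series_size(19, 5))` shows whether the selected events are spread across all five series or dominated by the cell of interest.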

Minor comments:

p.3, l.75: “… which leads to a spatially…” → “…which leads to spatially…”

p.3, l.92: To my knowledge, the KOSTRA-DWD-2010R data set has a resolution of about 8.2 x 8.2 km. Did the authors perform some kind of remapping to achieve the 5 km x 5 km resolution?

p.5, l.151-152: Almost the same sentence is repeated on p.6, l.161-162.

p.6, l.174-176: The radar data are adjusted to the 1 year design storms of the station-based BW-Stat data set. In the results section both data sets are also compared to design storms with 100 year return period derived from KOSTRA. For a better assessment of the differences between 100 year design storms from KOSTRA and the other two data sets it would be beneficial to also compare the 1 year design storms. Do they show the same features in the spatial pattern? How large are the differences?

p.7, l.205: “… located more the centre…” → “…located more to the centre…”

p.8, l.222: “…time steps can attributed…” → “…time steps can be attributed…”

What is meant by “lower spatial distribution”? Lower spatial resolution?

p.8, l.230: “…as well as the for the bias-corrected…” → “…as well as for the bias-corrected…”

p.8, l.238: Isn’t the 24 h design storm from KOSTRA shown in Fig.4? Or is that something else the authors refer to here?

p.8, l.249: REGNIE is first mentioned here and should be explained.

p.9, l.257: Which figure do the authors refer to here regarding the 20 year design storms?

p.9, l.269: 10 th → 10th

p.9, l.276: “… in in the case…” → “…in the case…”

p.9, l.283: “…relatively larger uncertainty…” → “…relatively large uncertainty…” or “…larger uncertainty…”

p.10, l.299-303: This might fit better in the result section.

p.10, l.310: “…can be seen as a rather robust…” → “…can be seen as rather robust…”

p.11, l.318: “…difference…become…” → “…difference…becomes…”

p.11, l.319: “…between in rainfall estimates…” → “…between rainfall estimates…”

p.11, l.331: “…in future” → “…in the future”

p.14, l.417-418: “…based on distance to cell of interest…” → ”…based on the distance to the cell of interest…”

p.14, l.420: “…is marked with in red…” → “…is marked in red…”

April to October of which years?

p.15, l.426: It should either be “for a single member” or “for single members”

p.15, l.430: Remove one of the brackets.

p.16, Figure 4: It would also be interesting to see the differences between KOSTRA and RAD-BC.

p.17, l.440: What is meant by “four different event”? I assume it is supposed to be “for different event durations”?

p.17, l.443: Why is the 96th percentile chosen here instead of the 95th percentile?

p.18, l.450: There is no comma needed after “regions”.

Citation: https://doi.org/10.5194/hess-2021-366-RC2
- AC2: 'Reply on RC2', Andreas Hänsler, 08 Nov 2021
  
  Dear Anonymous Referee,
  
  thank you very much for your valuable criticism, remarks and suggestions for improving our manuscript. Please find our response to the various points you raised, indicating what we adapted in the manuscript, in the attached pdf-file (supplement).
  
  Additionally, I would like to apologize for not being active in the open discussion during the review phase. The reason for this is that I was on parental leave, which actually started a bit earlier than was originally foreseen.
  
  Best regards, also on behalf of my co-author,
  
  Andreas Hänsler
  
  Citation: https://doi.org/10.5194/hess-2021-366-AC2

Status: closed

RC1:
'Comment on hess-2021-366', Francesco Marra, 27 Jul 2021
This paper presents a new dataset of extreme precipitation return levels derived from radar estimates (RAD-BC). The methodology builds over the sampling approach presented by Goudenhoofdt et al. (2017) and improves the sampling strategy by using spatially-varying sampling probabilities which depend on both horizontal distance and terrain elevation. The topic is of interest to the readers of HESS, and the study is timely as it tackles a state-of-the-art problem.

While some results are encouraging as the spatial artifacts generated in the Goudenhoofdt methods are removed and the orographic influence on precipitation statistics is better represented in the product, the final product shows important systematic bias despite the application of a bias correction procedure.

Overall, I found some aspects of the study to be insufficiently robust, as highlighted in the “major comments” below. While I sincerely appreciate the efforts of the authors, these aspects prevent me from recommending acceptance of the paper in its present version. I’d invite the authors to consider my comments, and I’ll be happy to discuss them further in the open discussion in case I misunderstood some parts.

Please do not consider the references below as recommended for inclusion in the paper, they are meant to be examples only.

Kind regards,

Francesco Marra

Major comments:

The reference dataset is sometimes used to support the goodness of the new dataset and sometimes regarded as less accurate (e.g. in the patterns of sub-daily precip – see lines 16-18 in the abstract). Although the reasons behind this can be somehow understood, this is a problematic issue. On what bases is the dataset trusted as a reference (perhaps some durations are and some are not, some return periods are and some are not)?

I think a proper evaluation should rely on a trusted dataset. For example, rain gauges could provide a quantitatively trusted reference to gather information on the quantitative accuracy of the method on some selected locations. This might allow us to understand what aspects the radar product is or isn’t able to reproduce (orographic influence at different durations, different return levels, etc). Alternatively, the trusted parts of the available dataset should be defined a priori and used for the validation, while the parts which are not trusted should be only used for comparison and discussion.

While I understand the need to avoid winter periods due to the known issues of weather radar monitoring with solid precipitation, it is not clear to me how it is possible to compare return levels derived from summer only (Apr-Oct as in this paper) with return levels derived from stations (the reference products) for durations up to 24 hours. The authors mention this at lines 82-84 ("Since we are mainly interested in short to medium range storm events that are mainly of convective nature, we only use data for the (summer) months from April to October, representing the main season for these kind of storm events"), but then durations up to 24 hours (e.g. see lines 244-255) are examined and discussed. This mismatch, which is not discussed by the authors, could also contribute to the overall bias found by the authors. I fear this might represent an important drawback of the presented product and of the presented comparison.

I like the idea of sampling the surrounding pixels using probabilities, and I like the idea of basing the properties of the sampling pdf based on the typical size of convective rain cells in the region, but I am missing why the same mask is used for all durations. Since precipitation accumulations over longer durations are characterized by larger autocorrelation, my guess would be that 4 km might be good for short durations (even 1 hour could be border line according with what is said above), but too short for longer durations.

Lines 107-111: this is presented in a confusing way. There is no guarantee that 100 years of data will provide perfect (or good for what matters) estimates of the 100-year return levels. Monte Carlo simulations run under realistic precipitation statistics show that empirical estimates will be subject to ~90% uncertainty (computed as the 90% confidence interval), while a simple GEV fit (method of the L-moments) will be subject to ~50% uncertainty. The advantage of using ~100 years of data instead of ~20 is clear, but should be presented in a better way.

The results show an important systematic bias (as it can be inferred from fig. 4). This bias concerns most of the study area and cannot be seen as related to stochastic uncertainty, therefore the uncertainty quantification at section 3.4 cannot be accounted for explaining it. This is an important issue and I wonder what is the added value of such a quantitative information for the final user.

To my view, this issue is related to a sub-optimal choice of the bias correction method (see details below), and addressing it should therefore be part of this study. The bias correction described in section 2.5 seems to me insufficient. Basically, this correction includes an additive adjustment to the data (changes the location parameter of the GPD). Since radar errors are far from being only additive, the resulting product is necessarily biased. Eventually, the results presented in the paper confirm this: the underestimation increases with return period, meaning that the other parameters are wrongly represented by the product and therefore also need to be adjusted. While the authors mention these efforts as future directions, I think that the here presented results are not sufficient to justify this publication and that these additional efforts have to be invested here.

Moderate comments:

It seems to me that larger ensembles could produce more accurate estimates (for example they could reduce the stochastic noise still present in the data and which required the smoothing of the maps). Why is a factor of 5 chosen? Are there only statistical-independence limitations or is it also a matter of computational time?

Lines 40-41: this is an over-simplification. The short record length is indeed among the important drawbacks of weather radar archives, but other issues were highlighted in literature. The most important one is definitely estimation inaccuracy: large systematic over- and under- estimations were found due to measurement errors (e.g. Eldardiry et al., 2015; Haberlandt and Berndt, 2016, among others), but in a recent review on the topic we also highlighted the inadequacy of the adopted statistical methods (Marra et al., 2019). As these aspects are somehow addressed by the methodology in this paper, I think the introduction should better present them.

Section 2.2: information on the extreme value methodology used in the reference products has to be provided. Something is said later in the text, but the information should be presented in an organized manner here. Also, the implications of these choices should be discussed. For example, distributions with different tail heaviness will unavoidably show different biases at different return levels. If indeed different methodologies are used, the impact of these aspects on the comparison and on the results have to be discussed.

Lines 116-121: I am missing the relation between the typical size of the convective cells and sampling radius and normal distribution parameters.

Line 132: similar to the previous comment, why is 4 km chosen here?

Line 220: It would be nice to see the results also for 1-year or 2-yr return levels. Since the adjustment is basically done on the 1-yr event, they should well isolate the quality of the product in relation to the bootstrap sampling method.

Line 232: why is the map smoothed? It seems this is to remove some noise. However, the noise we would see in these maps is a direct representation of the stochastic uncertainties affecting the overall methodology. I think the maps would be more informative without the smoothing.

Line 310-313: I might agree on the fact that higher-order moments are more difficult to estimate and to rely on, especially from “indirect” datasets such as the ones used here as a reference. I however, think that this problem can be somehow addressed by using a more trusted reference and by using corresponding statistical methods.

Although not a native speaker myself, I felt that the language level could be improved, in part due to missing use or misuse of technical terms.

Minor comments:

Lines 16-18: this sentence is not completely clear. I could understand it only after reading the paper. Since this is the abstract, I suggest rewording it.

Line 32: some change-permitting GEV methods allow for changes also of the scale parameter (e.g. see Prosdocimi and Kjeldsen, 2021)

Lines 41-42: I personally disagree on this point. While this is very true for traditional methods based on extreme value analysis, there are some novel statistical methods which show promising results in this sense. They are now published since few years (the first papers are by Marani and Ignaccolo, 2015; Zorzetto et al., 2016), and many came after providing evidence (with applications to rain gauge data as well as satellite data) of the fact that 20 years might be sufficient for at-site estimates of even 100-year return levels. I believe it is time to recognize this by specifying that this limit concerns the traditional methods based extreme value analyses.

Line 112: it is not clear to me what the authors mean with "with underlying sampling probabilities"

Line 115: what does "not necessarily present" mean exactly? Is it a way to say “independent”?

Line 127: I suggest including this information on the elevation range earlier in the text. Perhaps a short section describing the study area could also help in the following discussion.

Line 222: with "lower time steps", do you mean “shorter durations”?
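
For reference regarding the comment on lines 41-42: the metastatistical extreme value (MEV) cumulative distribution of Marani and Ignaccolo (2015), with Weibull-distributed ordinary events as in Zorzetto et al. (2016), can be sketched as below. The symbols are introduced here for illustration only: M is the number of years on record, n_j the number of ordinary events in year j, and C_j and w_j the Weibull scale and shape parameters fitted to the ordinary events of year j.

```latex
\zeta(x) = \frac{1}{M} \sum_{j=1}^{M}
  \left[ 1 - \exp\!\left( -\left( \frac{x}{C_j} \right)^{w_j} \right) \right]^{n_j},
\qquad
\zeta(x_T) = 1 - \frac{1}{T}
```

The second relation defines the T-year return level x_T, which in general has to be solved for numerically.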

References

Eldardiry, H., Habib, E., Zhang, Y., 2015. On the use of radar-based quantitative precipitation estimates for precipitation frequency analysis. J. Hydrol. 531, 441–453. https://doi.org/10.1016/j.jhydrol.2015.05.016

Goudenhoofdt, E., Delobbe, L., Willems, P., 2017. Regional frequency analysis of extreme rainfall in Belgium based on radar estimates. Hydrol. Earth Syst. Sci. 21, 5385–5399.

Haberlandt, U., Berndt, C., 2016. The value of weather radar data for the estimation of design storms – an analysis for the Hannover region. In: Schumann, A. (Ed.), The Spatial Dimensions of Water Management – Redistribution of Benefits and Risks Data. IAHS, pp. 81–85.

Marani, M., Ignaccolo, M., 2015. A metastatistical approach to rainfall extremes. Adv. Water Resour. 79, 121–126. https://doi.org/10.1016/j.advwatres.2015.03.001

Marra, F., Nikolopoulos, E.I., Anagnostou, E.N., Bárdossy, A., Morin, E., 2019. Precipitation frequency analysis from remotely sensed datasets: A focused review. J. Hydrol. 574, 699–705. https://doi.org/10.1016/j.jhydrol.2019.04.081

Prosdocimi, I., Kjeldsen, T., 2021. Parametrisation of change-permitting extreme value models and its impact on the description of change. Stoch. Environ. Res. Risk Assess. 35, 307–324. https://doi.org/10.1007/s00477-020-01940-8

Zorzetto, E., Botter, G., Marani, M., 2016. On the emergence of rainfall extremes from ordinary events. Geophys. Res. Lett. 43, 8076–8082. https://doi.org/10.1002/2016GL069445
Citation: https://doi.org/10.5194/hess-2021-366-RC1
- AC1: 'Reply on RC1', Andreas Hänsler, 08 Nov 2021
  
  Dear Francesco Marra,
  thank you very much for your valuable criticism, remarks and suggestions for improving our manuscript. Please find our response to the various points you raised, indicating what we adapted in the manuscript, in the attached PDF (supplementary file).
  Additionally, I would like to apologize for not being active in the open discussion during the review phase. The reason for this is that I was on parental leave, which actually started a bit earlier than was originally foreseen.
  Best regards, also on behalf of my co-author,
  Andreas Hänsler
  
  Citation: https://doi.org/10.5194/hess-2021-366-AC1
RC2: 'Comment on hess-2021-366', Anonymous Referee #2, 12 Aug 2021

This manuscript deals with the important and timely topic of determining design storms with return periods of up to 100 years from rather short time series of precipitation data from radar observations. The authors present a method to statistically extend time series of weather radar rainfall estimates by combining regional frequency analyses with subsequent bias correction. The results show an improvement over the sampling approach by Goudenhoofdt et al. (2017), which is used as the basis for their method, but uncertainties, e.g. a bias in the radar data for design storms with large return periods, still remain.

The study fits the scope of HESS and is of interest to the research community. However, I suggest addressing some major aspects, listed below, before publishing the paper. I’d be happy to discuss my suggestions with the authors in the open discussion and clear up possible misunderstandings.

Major comments:

1. A major concern is the minimum distance of the radar cells that are considered to statistically extend the time series of the cell of interest (COI). As far as I understand, the cells have to be at least 4 km apart. The authors mention that the typical size of a convective cell in Germany is 40 km for hourly events according to Lengfeld et al. 2019 (p.4, l.121 in this manuscript). Therefore, the minimum distance of 4 km seems a bit too small to me, especially when also considering daily events, which have a much larger typical spatial extent. Did the authors perform any kind of independence check for the time series from the cells that are combined into a long time series, e.g. the correlation of the time series or the percentage of time steps with rainfall in the cell of interest but no rainfall in the other cells of the sample? How do you make sure that the 258 events are actually taken from all 5 time series and not only from the 19-year time series of the COI? I was also wondering whether the same set of cells is used throughout the study or whether the samples vary for the three durations that are considered.

2. The authors only consider precipitation data from April to October, because this is the main season of convective events of short durations. The statistical approach to determine design storms is based on a partial time series consisting of e (Euler’s number) times the number of years. I was wondering whether this approach is still valid if only 7 out of 12 months of the year are considered. Although it is common knowledge that most convective storms occur during summer, some events might still be missed, especially for the design storms with 24 h durations, which might also be associated with advective weather situations. To my understanding, the reference data sets KOSTRA and BW-Stat consider all months and might not be comparable to the radar-based data set. I would suggest taking all months into account; otherwise, the authors should provide some kind of validation for their choice of selecting only summer months.

3. Section 2.2 about the reference data sets is quite short. More information about both data sets (e.g. how many stations are considered, length of the time series, interpolation methods, etc.) and on the differences in the statistical approaches to determine design storms from those data sets would be desirable. The method for BW-Stat is briefly described in section 2.4. Maybe it would be better to have a general section about the methods first and then describe the data sets and their differences. For example, the fact that a two-parameter GEV distribution is used in KOSTRA, instead of the GPD used for BW-Stat and the radar-based data set, is only mentioned in the discussion. This is important information that should be given in the method section.

4. A more detailed description of the sampling process, the generation of the ensemble members, the bootstrapping method and the bias correction is needed to allow for better understanding of the results and of the choices made by the authors (e.g. why 5 ensemble members?).
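
The independence check suggested in comment 1 could be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the wet threshold of 0.1 mm and the array layout are assumptions made here. For each neighbouring cell it returns the Pearson correlation with the COI series and the share of wet COI time steps during which the neighbour is dry. The second helper only reproduces the partial-series size of e times the number of years; counting the pooled length as 5 × 19 years is likewise an assumption, although it does round to the 258 events mentioned above.

```python
import math
import numpy as np


def independence_diagnostics(coi, others, wet_threshold=0.1):
    """Return (correlation, dry-neighbour share) for each neighbouring cell.

    coi:    1-D array of rainfall at the cell of interest, shape (time,)
    others: 2-D array of rainfall at the other sampled cells, shape (n_cells, time)
    wet_threshold: rainfall above this value counts as "wet" (assumed value)
    """
    coi = np.asarray(coi, dtype=float)
    others = np.atleast_2d(np.asarray(others, dtype=float))
    coi_wet = coi > wet_threshold
    n_coi_wet = max(int(coi_wet.sum()), 1)  # avoid division by zero

    results = []
    for series in others:
        # Pearson correlation between the COI and the neighbouring cell
        r = float(np.corrcoef(coi, series)[0, 1])
        # Share of wet COI time steps during which the neighbour is dry
        dry_share = float((coi_wet & (series <= wet_threshold)).sum()) / n_coi_wet
        results.append((r, dry_share))
    return results


def partial_series_size(n_years, n_series=1):
    """Number of events in a partial series of e times the pooled record length."""
    return int(round(math.e * n_years * n_series))


if __name__ == "__main__":
    coi = [0, 5, 0, 3, 0, 2]
    others = [[0, 5, 0, 3, 0, 2],   # identical to the COI: fully dependent
              [4, 0, 0, 0, 1, 0]]   # rains only when the COI is dry
    for r, dry_share in independence_diagnostics(coi, others):
        print(f"correlation={r:.2f}, dry-neighbour share={dry_share:.2f}")
    print(partial_series_size(19, 5))
```

A high correlation combined with a low dry-neighbour share would indicate that a neighbouring cell mostly repeats the events of the COI and adds little independent information to the extended series.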

Minor comments:

p.3, l.75: “… which leads to a spatially…” → “…which leads to spatially…”

p.3, l.92: To my knowledge, the KOSTRA-DWD-2010R data set has a resolution of about 8.2 x 8.2 km. Did the authors perform some kind of remapping to achieve the 5 km x 5 km resolution?

p.5, l.151-152: Almost the same sentence is repeated on p.6, l.161-162.

p.6, l.174-176: The radar data are adjusted to the 1 year design storms of the station-based BW-Stat data set. In the results section both data sets are also compared to design storms with 100 year return period derived from KOSTRA. For a better assessment of the differences between 100 year design storms from KOSTRA and the other two data sets it would be beneficial to also compare the 1 year design storms. Do they show the same features in the spatial pattern? How large are the differences?

p.7, l.205: “… located more the centre…” → “…located more to the centre…”

p.8, l.222: “…time steps can attributed…” → “…time steps can be attributed…”

What is meant by “lower spatial distribution”? Lower spatial resolution?

p.8, l.230: “…as well as the for the bias-corrected…” → “…as well as for the bias-corrected…”

p.8, l.238: Isn’t the 24 h design storm from KOSTRA shown in Fig.4? Or is that something else the authors refer to here?

p.8, l.249: REGNIE is first mentioned here and should be explained.

p.9, l.257: Which figure do the authors refer to here regarding the 20 year design storms?

p.9, l.269: 10 th → 10th

p.9, l.276: “… in in the case…” → “…in the case…”

p.9, l.283: “…relatively larger uncertainty…” → “…relatively large uncertainty…” or “…larger uncertainty…”

p.10, l.299-303: This might fit better in the result section.

p.10, l.310: “…can be seen as a rather robust…” → “…can be seen as rather robust…”

p.11, l.318: “…difference…become…” → “…difference…becomes…”

p.11, l.319: “…between in rainfall estimates…” → “…between rainfall estimates…”

p.11, l.331: “…in future” → “…in the future”

p.14, l.417-418: “…based on distance to cell of interest…” → “…based on the distance to the cell of interest…”

p.14, l.420: “…is marked with in red…” → “…is marked in red…”

April to October of which years?

p.15, l.426: It should either be “for a single member” or “for single members”

p.15, l.430: Remove one of the brackets.

p.16, Figure 4: It would also be interesting to see the differences between KOSTRA and RAD-BC.

p.17, l.440: What is meant by “four different event”? I assume it is supposed to be “for different event durations”?

p.17, l.443: Why is the 96th percentile chosen here instead of the 95th percentile?

p.18, l.450: There is no comma needed after “regions”.

Citation: https://doi.org/10.5194/hess-2021-366-RC2
- AC2: 'Reply on RC2', Andreas Hänsler, 08 Nov 2021
  
  Dear Anonymous Referee,
  
  thank you very much for your valuable criticism, remarks and suggestions for improving our manuscript. Please find our response to the various points you raised, indicating what we adapted in the manuscript, in the attached PDF (supplement).
  
  Additionally, I would like to apologize for not being active in the open discussion during the review phase. The reason for this is that I was on parental leave, which actually started a bit earlier than was originally foreseen.
  
  Best regards, also on behalf of my co-author,
  
  Andreas Hänsler
  
  Citation: https://doi.org/10.5194/hess-2021-366-AC2

Viewed

Total article views: 1,638 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,231	344	63	1,638	82	91

Views and downloads (calculated since 12 Jul 2021)

Month	HTML	PDF	XML	Total
Jul 2021	233	48	4	285
Aug 2021	41	20	1	62
Sep 2021	23	7	1	31
Oct 2021	30	8	1	39
Nov 2021	46	13	2	61
Dec 2021	27	8	1	36
Jan 2022	18	2	0	20
Feb 2022	19	7	0	26
Mar 2022	15	1	1	17
Apr 2022	16	7	0	23
May 2022	11	8	1	20
Jun 2022	5	4	2	11
Jul 2022	20	2	0	22
Aug 2022	9	10	3	22
Sep 2022	5	7	0	12
Oct 2022	8	2	1	11
Nov 2022	4	6	0	10
Dec 2022	10	5	0	15
Jan 2023	7	13	1	21
Feb 2023	9	3	0	12
Mar 2023	7	3	1	11
Apr 2023	5	6	0	11
May 2023	13	6	2	21
Jun 2023	13	9	1	23
Jul 2023	31	9	3	43
Aug 2023	29	5	1	35
Sep 2023	14	7	2	23
Oct 2023	6	7	0	13
Nov 2023	13	1	1	15
Dec 2023	8	1	1	10
Jan 2024	12	4	0	16
Feb 2024	10	7	0	17
Mar 2024	9	11	2	22
Apr 2024	13	8	5	26
May 2024	31	3	4	38
Jun 2024	34	2	3	39
Jul 2024	28	2	3	33
Aug 2024	32	5	0	37
Sep 2024	33	8	0	41
Oct 2024	25	10	2	37
Nov 2024	28	2	1	31
Dec 2024	24	1	0	25
Jan 2025	35	2	0	37
Feb 2025	36	6	0	42
Mar 2025	33	5	4	42
Apr 2025	28	5	2	35
May 2025	37	6	1	44
Jun 2025	37	8	1	46
Jul 2025	43	12	2	57
Aug 2025	8	2	2	12

Viewed (geographical distribution)

Total article views: 1,469 (including HTML, PDF, and XML) Thereof 1,469 with geography defined and 0 with unknown origin.

Latest update: 08 Aug 2025

Short summary

Spatially explicit quantifications of design storms are essential for flood risk assessment. However, this information can only be obtained from sufficiently long records of rainfall measurements, which are usually available for only a few stations. Hence, design storm estimates from these few stations are spatially interpolated, which introduces a major source of uncertainty. We therefore defined a methodology to extend spatially explicit weather radar data for use in the estimation of design storms.