Evaluation of high-intensity rainfall observations from personal weather stations in the Netherlands

Rombeek, Nathalie; Hrachowitz, Markus; Droste, Arjan; Uijlenhoet, Remko

doi:https://doi.org/10.5194/hess-29-4585-2025

Articles | Volume 29, issue 18

https://doi.org/10.5194/hess-29-4585-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

https://doi.org/10.5194/hess-29-4585-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Articles | Volume 29, issue 18

Research article

|

24 Sep 2025

Research article |

| 24 Sep 2025

Evaluation of high-intensity rainfall observations from personal weather stations in the Netherlands

Nathalie Rombeek, Markus Hrachowitz, Arjan Droste, and Remko Uijlenhoet

Download

Final revised paper (published on 24 Sep 2025)
Preprint (discussion started on 08 Nov 2024)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3207', Anonymous Referee #1, 17 Jan 2025
Review report on Evaluation of high-intensity rainfall observations from personal weather stations in the Netherlands from Rombeek et al. 2024.
The authors evaluate the robustness of rainfall estimates from personal weather stations (PWSs) by comparing them to automatic weather stations in the Netherlands over six years, identifying significant underestimation of PWS rainfall, especially for extreme events. They select rain events for different aggregations and seasons and apply part of a previously published QC method. Adjustments like bias correction improved the accuracy for moderate events, but limitations persist for high-intensity rainfall, suggesting the need for dynamic calibration and additional filtering techniques. The overall quality of the manuscript and research is good and well within the scope of HESS. To enhance its depth and utility for future readers, the authors should provide a stronger justification for their choice of QC methods, or better, show some analysis in this regarding who their choice affects the results, and propose a specific bias correction factor for hourly time scales (details provided in specific comments). Otherwise I only have minor comments and would recommend the publication of this manuscript after the above issue (selected as major, but IMO a minor manjor) is addressed.

Major Comments
Influence of QC/Bias correction on performance.

You already point out the importance of QC and bias correction throughout the manuscript. Therefore, my main question is, how did the choice and the parameters of the HI and FZ filter and bias correction influence the results. I miss the reasoning for the two used filters (and not using the SO filter from de Vos et al. 2019) and not the method from Bardossy et al. (2021) or other QC methods typically used for rain gauge data. I am not asking you to compare all available methods or a detailed sensitivity analysis for each parameter, but the choice of methods and parameters will have an effect over different seasons.
One example: the FZ filter discards a value if half of its neighbors are also zero and the HI filter relies on a maximum allowed factor that a station can deviate from the surrounding ones. Winter and summer precipitation might cause a different need of these parameters.
You discuss the bias correction well and it is reasonable to use a default value from a previous study. By showing the residual bias over different aggregations e.g. in Fig 8c you already indirectly give the optimal bias correction factor. You could add this as a result to the paper extending its scope a little bit. A suggestion would be to support the bias correction factor it by giving the uncertainty through a bootstrap sampling. Checking both the filtering and bias correction would allow you to further assess the robustness of the PWS rainfall rates.
Structure of motivation for this study

I find the reasoning and structure of uncertainty factors of rain gauges in general and PWS specifically given in L 85ff to be unclear and not exhaustive. Errors for rain gauges and personal weather station are for example also undercatch due to wind, solid precipitation or evaporation, which are missing already in L39. They could be a fourth group of errors in L58ff.

Also, the phrase “in addition” in L61 after a list of three points suggests that you could add a (4) item to the list?

It would be good to have a more structured and complete list of the potential errors as they motive they choice for the quality control routine. You could even link individual QC methods to errors i.e. (3) setup and maintenance à bias correction and FZ filter

Minor comments
Abstract: You could sharpen the scope of the paper by including the content from L79 (the gap you aim to close) in it.
Introduction:
You may include following literature if you find them fitting
https://doi.org/10.1016/j.atmosres.2024.107228 give additional motivation to evaluate the performance of PWS as they are already used in applications like radar adjustment

https://www.jstor.org/stable/26496995 also investigate PWS in the Netherlands, focusing e.g. on the difference between urbn and rural stations

https://doi.org/10.1145/3276774.3276792 have a bit a different perspective on QC while also relying an station clustering

https://doi.org/10.1016/j.ejrh.2021.100883 compared (interpolated) PWS data for different time scales

L29 560 ha seems very specific, could you better give a range of what is considered a small, fast reacting catchment
L52 For Netatmo I think users can decide whether data is uploaded or not? For other platforms, that is certainly the case.
L77 gives the impression that you also use the QC from Bardossy et al (2021) – which would be interesting, but might be too much here
L93 data availability was too low before, right? You could state this more specifically here
L102 You could add a statement about spatial correlation from de Beek (2012) already here to describe the area further
Fig 2. You could add IQR or min/max to the monthly barplots to give a feeling for variability
L146 Did you use a fixed window for resampling i.e. always to the full hour like XX:00 to XX:55? This could be important for the selection of events.
L189 You could add the best/worst/range for all validation metrics
L195 You mix bias and relative bias here and, in the text, please clarify
L 220 Will the
L276 with an average PWS intensity
Figure 8: Any idea why JJA and MAM seem to be very similar and DJF and SON?
L345 and L353 are a bit counter-intuitive. You want to give insight in he uncertainty, but at the same try to reduce uncertainty due to aggregation. Please clarify.
L364 to 366 Do you refer to the two highest hourly events in JJA? Maybe you could check the radar images for those two events and check the spatial distribution of rainfall during these events? Similar, looking at the 5/10 minute time series from AWS and surrounding PWS could give some insight on those two events.

Technical issues
L15 duplicated “with”
L52 “are” instead of “is”
Citation: https://doi.org/10.5194/egusphere-2024-3207-RC1
- AC1: 'Reply on RC1', Nathalie Rombeek, 21 Feb 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-3207/egusphere-2024-3207-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3207-AC1
RC2:
'Comment on egusphere-2024-3207', Anonymous Referee #2, 24 Jan 2025

In this manuscript, the authors present a systematic study of uncertainties from personal weather stations for different seasons and rainfall durations which were evaluated against a reference dataset from the Royal Netherlands Meteorological Institute. The paper is well written and the topic fits into the scope of HESS. However, the are some aspects in the research design which need to be addressed before publication.
One major point of criticism from my perspective is the approach to bias correction used in this study. If the results show that the bias varies with different durations, then why is a single static value used? When you choose a MFBC of 1.24, the bias is reduced for longer durations, but this choice appears somewhat arbitrary. For instance, a higher MFBC value could potentially reduce the bias for shorter durations but might lead to an overestimation of longer durations. I miss the reasoning behind the choice of this value. Rather than applying a single MFBC value, the study could focus on providing insights into the biases of PWS with respect to duration an intensity, as already indicated in Table 2 and Figure 8 for example. Additionally, more sophisticated methods, such as quantile mapping, might be more appropriate for bias correction, as biases in PWS data can be either positive or negative. It may also be worth considering applying bias correction individually for each PWS.
Another aspect you should consider are outliers in the winter: Why didn’t you apply the temperature filter directly and exclude obvious snowfall events from the analysis? Data from unheated rain gauges, like those from Netatmo, can be inaccurate in winter, potentially leading to outliers and other issues. From my perspective, demonstrating the process of detecting these outliers and showing how results improve after their removal doesn't offer any new insights. Removing events with snowfall a priori would prevent the evaluation of obviously erroneous data, as shown in figures 7b) and 8a) and b).
Last but not least, it isn’t entirely clear to me what the motivation behind the ARFs is. They are only briefly addressed in section 3.6 and 4.5. Are you trying to compare ARFs from Radar with PWS? Or are using the ARFs to adjust the values of the AWS to the cluster size? This topic should either be motivated and discussed in more detail in the manuscript or omitted entirely.

Minor comments
-Regarding the selection of the events, why didn't you select events such that the PWS data for the events have no missing values, i.e. are complete? This would also avoid the problem of missing data and underestimation as discussed in l. 240ff. If only one PWS in the cluster has missing data around the AWS infilling could also be an option.
- Did you check whether the PWS used in this study were calibrated by the user, i.e. if the amount of each tip differs from 0.101mm? This might also have an influence on the bias. For consistent results, only PWS with the default calibration could be used - or the number of tips could be determined from the PWS where a calibration was done by the user and converted to the original default value.
- Section 5.4. All rain gauges (AWS and PWS) suffer from undercatch alike, maybe PWS a bit more depending on how they are installed. This section could be omitted from my point of view as it offers no new insights.

Specific comments:
l.27ff consider deleting the brackets or rephrasing in more general terms
l.40ff Radar attenuation should be mentioned here, this is a major source for underestimation of radar data.
l. 47 Are these all PWS? Or just the ones with rain gauges?
l.62 use
l. 78 El Hachem et al. 2024 is published in the meantime, please update the citation:
El Hachem, A., Seidel, J., O'Hara, T., Villalobos Herrera, R., Overeem, A., Uijlenhoet, R., Bárdossy, A., and de Vos, L.: Technical note: A guide to using three open-source quality control algorithms for rainfall data from personal weather stations, Hydrol. Earth Syst. Sci., 28, 4715–4731, https://doi.org/10.5194/hess-28-4715-2024, 2024.
l. 89 Consider rephrasing this sentence, correcting limitations sounds awkward. Furthermore the the potential of PWS is not only limited to forecasts, it is also a major benefit for hydrological modelling, urban hydrology etc.
l. 107 maybe mention that is this done via software by changing the volume per tip.
l. 109 consider rephrasing, e.g. “the data is transmitted via WiFi to the Netatmo platform“
l. 126f mentioning this limit here is not necessarily required.
l. 128 „exact recording stations“? You are referring to the PWS that were online at that time I suppose.
l. 153ff you are using „selected“ quite a few times, consider rephrasing
l. 172f How many time steps were removed by applying the HI and FZ filter?
l. 217f try to avoid 2 -line paragraphs. And consider providing more detailed information on the retained data (with respect to seasons, duration, etc.) here.
Figure 7 It's difficult to distinguish the number of PWS in this plot. I suggest to use a different more differentiable colour ramp
l. 170ff High influx can also result from a connection interruption between the rain gauge module and the base station.
l. 184 ff Instead of percentages consider stating how many records per aggregation were required for the valid value (e.g. 10 out of 12 5 minutes for one hour)
l. 259 ff Could you please provide more details or discuss why longer transferring and processing errors are reduce for longer intervals or be more specific about intervals for which this holds

Citation: https://doi.org/10.5194/egusphere-2024-3207-RC2
- AC2: 'Reply on RC2', Nathalie Rombeek, 21 Feb 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-3207/egusphere-2024-3207-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3207-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (18 Mar 2025) by Efrat Morin

AR by Nathalie Rombeek on behalf of the Authors (29 Apr 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (01 May 2025) by Efrat Morin

RR by Anonymous Referee #1 (08 May 2025)

RR by Anonymous Referee #2 (23 May 2025)

ED: Publish subject to technical corrections (14 Jun 2025) by Efrat Morin

AR by Nathalie Rombeek on behalf of the Authors (21 Jun 2025) Author's response Manuscript

Short summary

Rain gauge networks from personal weather stations (PWSs) have a network density 100 times higher than dedicated rain gauge networks in the Netherlands. However, PWSs are prone to several sources of error, as they are generally not installed and maintained according to international guidelines. This study systematically quantifies and describes the uncertainties arising from PWS rainfall estimates. In particular, the focus is on the highest rainfall accumulations.

Evaluation of high-intensity rainfall observations from personal weather stations in the Netherlands

Download

Interactive discussion

Peer review completion

Suggestions for revision or reasons for rejection