This work is distributed under the Creative Commons Attribution 4.0 License.
On the visual detection of non-natural records in streamflow time series: challenges and impacts
Laurent Strohmenger
Eric Sauquet
Claire Bernard
Jérémie Bonneau
Flora Branger
Amélie Bresson
Pierre Brigode
Rémy Buzier
Alban de Lavenne
Olivier Delaigue
Alexandre Devers
Guillaume Evin
Maïté Fournier
Shu-Chen Hsu
Sandra Lanini
Thibault Lemaitre-Basset
Claire Magand
Guilherme Mendoza Guimarães
Max Mentha
Simon Munier
Charles Perrin
Tristan Podechard
Léo Rouchy
Malak Sadki
Myriam Soutif-Bellenger
François Tilmant
Yves Tramblay
Anne-Lise Véron
Jean-Philippe Vidal
Guillaume Thirel
Abstract. Large datasets of long-term streamflow measurements are widely used to infer and model hydrological processes. However, streamflow measurements may suffer from what users may consider anomalies, i.e., non-natural records that may be erroneous streamflow values or anthropogenic influences that can lead to misinterpretation of actual hydrological processes. Since identifying anomalies is time-consuming for humans, no study has investigated their proportion, temporal distribution, and influence on hydrological indicators over large datasets. This study summarizes the results of a large visual inspection campaign of 674 streamflow time series in France by 43 evaluators, who were asked to identify anomalies falling into five categories: linear interpolation, drops, noise, point anomaly, and other. We examined the evaluators’ individual behavior in terms of severity and agreement with other evaluators, as well as the temporal distribution of the anomalies and their influence on commonly used hydrological indicators. We found that inter-evaluator agreement was surprisingly low, with an average of 12 % of overlapping periods reported as anomalies. These anomalies were mostly identified as linear interpolation and noise, and they were reported most frequently during low-flow periods in summer. Removing the identified anomalous values affected low-flow indicators more than high-flow indicators, with change rates below 5 % most of the time.
We conclude that the identification of anomalies in streamflow time series is highly dependent on the aims and skills of each evaluator, which raises questions about the best practices to adopt for data cleaning.
Status: final response (author comments only)
RC1: 'Comment on hess-2023-58', Martin Gauch, 16 May 2023
Summary
The paper describes the outcomes of an effort to have human annotators detect non-natural/incorrect streamflow measurements. The authors find remarkable inconsistency in different annotators' responses.

Overall, I think this paper is well-written and to the point. I appreciated the discussion and suggestions for future related studies. I think that a few changes in the analysis of results could further improve the paper. Please find my detailed comments below.

Major Comments
- I would guess that it is not always clear which exact timesteps around an anomaly should be annotated. I think it might therefore be interesting to see if the consistency increases noticeably when we consider a windowed approach: e.g., consider two annotations to agree when they fall within a 1-week window (a sketch of such windowed matching follows this list).
- I appreciate the effort to measure the impact of removing annotations through the "change rate", but it is hard to put the values into perspective. For that, I would propose the following baseline: report the change rate obtained when randomly removing the same fraction of time steps (ideally, do this a few times and report the average change rate; see the second sketch after this list).
- It might be interesting to see if there is a noticeable difference in the responses/consistency of novice vs. advanced vs. senior annotators.
- Reproducibility/data availability: For purposes of reproducibility, I would appreciate the download to include:
- the streamflow data (and ideally also the supplemental information like precipitation, temperature, GR5J simulation, etc.), such that others can reproduce and build upon the analyses. The French-only HydroPortail link is not helpful and it's unclear what data one would need to collect from there.
- the code used in the study (e.g., to calculate the various statistics)
- information on the evaluators (background academic/operational, level of experience)
- It would be really interesting to see how removing bad data impacts the quality of hydrological models. Is a model that is calibrated/trained only on "good" timesteps better than one that is calibrated on all timesteps (on a validation period)? This might be out of scope for this paper, but could at least be mentioned as future work.
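A minimal sketch of the windowed-matching idea from the first major comment, assuming each evaluator's annotations are boolean flags on a shared daily time axis; the function name and the 7-day default are illustrative, not from the manuscript:

```python
import numpy as np

def windowed_agreement(flags_a, flags_b, window_days=7):
    """Fraction of days flagged by evaluator A that fall within
    `window_days` of at least one day flagged by evaluator B."""
    days_a = np.flatnonzero(flags_a)
    days_b = np.flatnonzero(flags_b)
    if days_a.size == 0 or days_b.size == 0:
        return np.nan
    # Distance from each day flagged by A to the nearest day flagged by B.
    dist = np.abs(days_a[:, None] - days_b[None, :]).min(axis=1)
    return float(np.mean(dist <= window_days))

# Two annotations with zero exact overlap still agree within one week.
a = np.zeros(30, dtype=bool); a[10] = True
b = np.zeros(30, dtype=bool); b[14] = True
print(windowed_agreement(a, b))  # 1.0
```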
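A possible implementation of the random-removal baseline proposed in the second major comment, using an illustrative low-flow quantile as the indicator (the paper's own indicators may differ):

```python
import numpy as np

def change_rate(q, keep, indicator=lambda x: np.nanquantile(x, 0.05)):
    """Relative change (%) of `indicator` after dropping masked-out values."""
    return 100.0 * (indicator(q[keep]) - indicator(q)) / indicator(q)

def random_removal_baseline(q, anomaly_mask, n_draws=100, seed=0):
    """Mean change rate from removing as many randomly chosen timesteps
    as there are annotated anomalies, averaged over `n_draws` draws."""
    rng = np.random.default_rng(seed)
    n_remove = int(anomaly_mask.sum())
    rates = []
    for _ in range(n_draws):
        keep = np.ones(q.size, dtype=bool)
        keep[rng.choice(q.size, size=n_remove, replace=False)] = False
        rates.append(change_rate(q, keep))
    return float(np.mean(rates))
```

Comparing the annotation-driven change rate against this baseline would indicate whether the annotated periods are more influential than arbitrary gaps of the same size.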
Minor Comments
- L90 states >30 years of available records as a condition, but then L96 reports that some rivers have only 26 years of streamflow. Am I missing something?
- L224 "little impact on the change rate". If I understand correctly, the change rate is exactly what measures the magnitude of impact. I.e., the change rate only exists after removing the anomalies (and not before). So shouldn't this rather be reworded to something like "did not result in large change rates"?
- L242f Regarding the first reason you provide for the stabilized variability: I don't think it is the variability that lowers as the number of stations increases. I think it is the quality of the estimate that improves as more samples are available. Both station quality and human randomness have some variance, which will be better estimated from larger sample sizes.
- L266: This notion of annotators getting "bored" could be verified by checking whether the number of annotations also decreases across stations (those rated first vs. those rated last)
- L281: Do you know how this "good-quality" label was assigned? Does this mean another human had visually deemed these records as good? Or via automatic checks?
- L290: The logarithmic scale is an important detail to me, because I think it makes linear interpolation much harder to notice. This might be something to add to L349f.
- L328ff: I would move this brief literature review to the related work section and remove it from the discussion.
- Out of curiosity: would a map of France show any spatial patterns in the annotations?
Typos
- L125: rainfall--runoff

Citation: https://doi.org/10.5194/hess-2023-58-RC1
RC2: 'Comment on hess-2023-58', Frederik Kratzert, 25 May 2023
The manuscript, by Laurent Strohmenger et al., presents the results of a survey in which 42 hydrologists were asked to annotate anomalies in streamflow time series. In my eyes, the main results of this manuscript are:
- A dataset of anomalies in streamflow time series annotated by human experts.
- Another proof of the subjectivity and inconsistency of human experts, when tasked to rate/compare/annotate hydrographs.
- Recommendations for future studies of this kind.
Overall, the manuscript is very well written and easy to follow. Given the already published comments by Martin Gauch, I have very little to add (see below) and recommend the publication of this manuscript after considering these minor comments.
Data sharing.
The authors state in L354ff. the following:
“An automatic detection of anomalies could avoid these issues of subjectivity and weariness. Using the bias between model simulations and measured time series could be a starting point for identifying potential anomalies. Unfortunately, to our knowledge, these techniques still require improvements. Such an algorithm should be flexible regarding the types of anomaly to identify, and might be trained for each study to avoid the risk of removing data of interest (e.g., using a visual inspection such as the one reported in this study). Ideally, hydrologists should share a common library of anomaly types such as suggested by Wilby et al. (2017).”
I 100% agree with this statement, and, e.g., the automatic quality control functions in the GSIM paper yield very suboptimal results. I therefore value that the authors opted for publishing the annotations from their study, which could be beneficial when testing future approaches for automatic detection algorithms. In this regard, though, I agree with the review comment by Martin Gauch that it would be very helpful if you would also include the streamflow time series in the published data, and maybe the GR5J simulations. As Martin Gauch mentioned, the linked homepage is in French, which, e.g., I do not speak. I tried to use a translation tool, but after 10 minutes of trying to figure out how to download data, I gave up. It also seems like there is no API access for downloading the data, which makes the effort to get the ~600 station time series quite cumbersome. If this data already exists in the hands of the authors and there are no constraints from the data provider that prohibit the publication of their data, then why not include the streamflow time series as well? Otherwise I see limited use in the published annotations, which would be a shame.
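A minimal sketch of the bias-based flagging idea quoted above, assuming daily observed and simulated series (e.g., from GR5J); the smoothing window and the 100 % relative-bias threshold are illustrative assumptions, not a tested algorithm:

```python
import numpy as np

def flag_bias_anomalies(q_obs, q_sim, window=7, threshold=1.0):
    """Flag days whose smoothed relative bias |obs - sim| / sim
    exceeds `threshold` (a 100 % departure by default)."""
    rel_bias = np.abs(q_obs - q_sim) / np.maximum(q_sim, 1e-6)
    # Smooth with a centred moving average to avoid flagging single spikes.
    smoothed = np.convolve(rel_bias, np.ones(window) / window, mode="same")
    return smoothed > threshold
```

As the quoted passage notes, such a detector would still need per-study tuning to avoid discarding genuinely informative records.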
Citation: https://doi.org/10.5194/hess-2023-58-RC2
CC1: 'Referee Comment on hess-2023-58', Alexander Gelfan, 03 Jun 2023
Publisher’s note: this comment is a copy of RC3 and its content was therefore removed.
Citation: https://doi.org/10.5194/hess-2023-58-CC1
RC3: 'Comment on hess-2023-58', Alexander Gelfan, 03 Jun 2023
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2023-58/hess-2023-58-RC3-supplement.pdf