Articles | Volume 29, issue 3
https://doi.org/10.5194/hess-29-767-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Creating a national urban flood dataset for China from news texts (2000–2022) at the county level
- Final revised paper (published on 13 Feb 2025)
- Preprint (discussion started on 27 May 2024)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on hess-2024-146', Anonymous Referee #1, 24 Jun 2024
  - AC1: 'Reply on RC1', Heng Lyu, 06 Sep 2024
- RC2: 'Comment on hess-2024-146', Anonymous Referee #2, 01 Aug 2024
  - AC2: 'Reply on RC2', Heng Lyu, 06 Sep 2024
- RC3: 'Comment on hess-2024-146', Anonymous Referee #3, 02 Aug 2024
  - AC3: 'Reply on RC3', Heng Lyu, 06 Sep 2024
- RC4: 'Comment on hess-2024-146', Anonymous Referee #4, 03 Aug 2024
  - AC4: 'Reply on RC4', Heng Lyu, 06 Sep 2024
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to revisions (further review by editor and referees) (27 Sep 2024) by Marnik Vanclooster
AR by Heng Lyu on behalf of the Authors (07 Nov 2024)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (07 Nov 2024) by Marnik Vanclooster
RR by Anonymous Referee #1 (30 Nov 2024)
ED: Publish subject to technical corrections (03 Dec 2024) by Marnik Vanclooster
AR by Heng Lyu on behalf of the Authors (12 Dec 2024)
Manuscript
Post-review adjustments
AA – Author's adjustment | EA – Editor approval
AA by Heng Lyu on behalf of the Authors (10 Feb 2025)
Author's adjustment
Manuscript
EA: Adjustments approved (10 Feb 2025) by Marnik Vanclooster
Manuscript Review of "Extracting Spatiotemporal Flood Information from News Texts Using Machine Learning for a National Dataset in China"
Overview
The authors developed a machine learning approach based on Natural Language Processing (NLP) and applied it to news data to identify urban-flood-related news items and extract the timing and location of thousands of flood events in China from 2000 to 2022. The research is original and aligns with the burgeoning trend of NLP-driven research and applications, including in disaster risk research. The proposed approach is efficient, with high scores on performance metrics for both event detection and information extraction.
The proposed approach and resulting dataset could serve as a valuable, significant, and complementary basis for future research and for improving risk management and modeling practices. Historical catalogs of flood hazards, crucial for understanding flood risks, remain scarce and biased, both regionally and globally, whether they are constructed from textual documents or satellite imagery. Given the high performance of the event detection and information extraction approach and the large number of retrieved events, I conclude that the research provides significant results.
However, I primarily expected the introduction, and its extension in the discussion section, to give more insight into how this approach compares with existing NLP-based flood event extraction studies and how the resulting dataset fills gaps in existing catalogs. In the current version of the manuscript, these aspects are not sufficiently covered. While some NLP approaches are introduced, the manuscript lacks an adequate overview of the state of the art on this hot topic despite a large number of references, not all of which are directly relevant. Moreover, the performance of the proposed method is not compared with that of existing methods, leaving the reader without a clear understanding of how this method ranks within its application domain (considering language differences). Similarly, while a few event catalogs are introduced, the resulting catalog is not sufficiently compared with these existing ones, so the paper does not demonstrate that it narrows the knowledge gap it targets.
On the other hand, eight figures are devoted to the spatiotemporal and environmental context of the resulting dataset; this could be done more efficiently, freeing space for a more balanced discussion. Additionally, in the general comments I raise some concerns about the flood types considered, the analysis of GDP as an economic driver of flood reporting (which does not account for population density), the chosen flood susceptibility indicators, and the FAIR compliance and format of the shared dataset.
In conclusion, while I consider the paper a valuable piece of research and application, in my opinion it is weakened by insufficient contextualization and by a lack of focus on how the results rank against, and fit within, existing work.
General Comments
Note: I compare the authors' catalog initiative with the Global Landslide Catalog (GLC) to support some general comments (see Kirschbaum et al., 2010, 2015; Dandridge et al., 2023).
G1. Flood Query Keywords
The flood query was limited to "flood" and "flood disasters" (L142, L154), whereas many other terms could point to flood events in news items, e.g., "typhoon," "cyclone," "mud," "heavy rainfall," "inundated areas,"… Query terms are an essential aspect of event detection, and this restriction could limit the detection power of the proposed approach. It raises some questions: Should this be documented as a limitation? Was it a deliberate decision to limit the size of the corpus? Does the Q&A approach mitigate this concern?
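To make the recall concern concrete, the sketch below compares which items a narrow and an expanded term list retrieve from a handful of headlines. It is a minimal illustration only, assuming plain substring matching with Python's `re` module; the English term lists and the headlines are hypothetical stand-ins (the actual corpus and queries are in Chinese) and do not reproduce the authors' retrieval pipeline.

```python
import re

# Hypothetical query term lists (English stand-ins for the Chinese queries).
NARROW_TERMS = ["flood", "flood disaster"]
EXPANDED_TERMS = NARROW_TERMS + [
    "typhoon", "cyclone", "mud", "heavy rainfall", "inundated", "waterlogging",
]

def build_pattern(terms):
    """Compile a case-insensitive pattern matching any of the query terms."""
    # Longest terms first so multi-word terms win in the alternation.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile("|".join(escaped), re.IGNORECASE)

def matching_items(texts, terms):
    """Return the indices of news items mentioning at least one term."""
    pattern = build_pattern(terms)
    return [i for i, text in enumerate(texts) if pattern.search(text)]

# Toy headlines, invented purely for illustration.
headlines = [
    "City streets flooded after overnight storm",
    "Heavy rainfall leaves underpass inundated",
    "Typhoon brings waterlogging to coastal districts",
    "Local market reopens after renovation",
]

print(matching_items(headlines, NARROW_TERMS))
print(matching_items(headlines, EXPANDED_TERMS))
```

In this toy case the expanded list retrieves two additional flood-related items; the trade-off is more false positives, which the downstream classifier would then have to filter out.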
G2. Flood Types and Multi-Hazard Concerns
The paper focuses on urban floods and excludes other types of floods, yet flood types are interrelated and very often not mutually exclusive. Referring, for instance, to the Hazard Information Profiles (HIPs, https://www.preventionweb.net/drr-glossary/hips), an urban flood could also be related to a flash flood (despite the exclusion of the query "flash flood," L151), a riverine flood, a coastal flood, or a groundwater flood. Floods are also secondary hazards associated with other hazards: a flood can result from a typhoon, heavy rainfall, a storm surge, an intense monsoon, etc. Floods are also associated with geo-hazards such as landslides (see GLC studies). I found the typhoon case study in the paper interesting; it also illustrates the multi-hazard nature of floods well. As in GLC studies, I would be interested in the authors' view on multi-hazard, cascading, and co-occurring hazard issues, on the possibility of detecting multi-type floods, and on the challenges, limitations, and perspectives of their proposed approach in this respect.
G3. A More Balanced Discussion: Trend Analyses vs. Gap Filling Potential
The manuscript extensively discusses spatiotemporal trend analysis, which calls for more caution and clarity regarding the factors influencing trends. I understand the need to illustrate trends in the resulting dataset, but, in my opinion, this could be summarized more efficiently, and the paper could be more descriptive and less assertive in its interpretation. Some analyses are simplistic and do not go deep enough. Rather than making the paper even longer, I invite the authors to distinguish more clearly between the essential and the accessory and, if planned, to cover the spatiotemporal analysis of events and cross-referencing with third-party data in greater depth in other papers (see GLC studies).
Some figures may be grouped, e.g., maps in different panels of one figure, allowing the paper to focus not only on the trends in the output data but also on how the output data compare with other datasets; this comparison is currently limited to Figure 4, despite the numerous datasets listed in the introduction. The reader has little clue as to what gap is being filled. In particular, the Chinese bulletin appears to be a more exhaustive (although coarser) dataset. This point may be worth further discussion.
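One simple way to report how the output compares with another catalog is a cell-level overlap table. The sketch below is hypothetical: it assumes both catalogs have already been brought to a common spatial and temporal resolution (here county-month cells, which for the coarser bulletin would first require harmonization), and the cell identifiers are invented.

```python
def catalog_overlap(a: set, b: set) -> dict:
    """Cell-level agreement between two catalogs of (county, "YYYY-MM") cells."""
    union = a | b
    shared = a & b
    return {
        "only_a": len(a - b),   # cells reported only by catalog a
        "only_b": len(b - a),   # cells reported only by catalog b
        "shared": len(shared),  # cells reported by both
        # Jaccard index: shared cells over all cells reported by either catalog.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Hypothetical cells, assumed to be at a common county-month resolution.
news_catalog = {("C1", "2020-07"), ("C1", "2020-08"), ("C2", "2020-07")}
bulletin_catalog = {("C1", "2020-07"), ("C3", "2020-07")}

print(catalog_overlap(news_catalog, bulletin_catalog))
```

Reporting the "only_a" and "only_b" counts separately, rather than a single agreement score, shows directly which events each source contributes that the other misses.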
Note regarding temporal trends:
Trends in hazard occurrences are complex, influenced by variations in hazard intensity and alteration of environmental susceptibility, as well as demographic shifts that alter exposure or vulnerability. Moreover, climatic cycles (e.g., ENSO or other climate indices) can distort linear trend estimations over brief periods due to their cyclical nature.
The complexity is further compounded when analyzing trends from news data. Changes in reporting capacity, especially in remote areas, along with new communication technologies such as satellites and social media, may introduce significant biases. The proliferation of the internet during the 1990s and 2000s has notably impacted flood event reporting (Gall et al., 2009; Kron et al., 2012; Delforge et al., 2023). Kron et al. (2012) illustrate well the challenges of building a hazard database, using flood examples. These works underscore the necessity of standardized flood event definitions to mitigate discrepancies in reporting scales. In the case of news scraping, the framing by journalists can significantly alter the perceived frequency, spatial representation, and type of events.
In conclusion, the total number of flood events is a highly relative figure. It is essential to acknowledge that while flood hazards are natural phenomena, flood disasters and their reporting are social phenomena with potentially distinct and diverging trend patterns. Given these complexities, attributing trends depicted in the news (i.e., social variables, not physical ones) to climate change or land use changes requires careful consideration.
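A toy calculation illustrates why raw counts are relative: if the number of flood-related items grows at the same pace as overall news output, the raw series shows a clear upward slope while the share of flood-related items stays flat. The yearly figures below are invented, and dividing by total news volume is only one possible, assumed proxy for reporting effort.

```python
# Invented yearly figures for illustration only (not from the manuscript).
years = [2000, 2005, 2010, 2015, 2020]
flood_items = [20, 40, 80, 120, 160]        # flood-related news items
all_items = [1000, 2000, 4000, 6000, 8000]  # all news items published

def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

raw_slope = linear_slope(years, flood_items)
share = [f / a for f, a in zip(flood_items, all_items)]
share_slope = linear_slope(years, share)

print(f"raw trend: {raw_slope:.2f} items/yr, share trend: {share_slope:.6f} /yr")
```

Here the raw trend is driven entirely by the growth of news output, which is exactly the kind of artifact that should be ruled out before attributing a trend to climate or land use change.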
G4. Analyses of GDP
The manuscript highlights GDP as the primary driver of media attention. However, the boxes in Figure 5 do not seem to show any significant difference in flood occurrence between GDP groups. To highlight a possible effect of GDP on media attention, it is therefore vital to use GDP per capita (see GLC studies).
Population is a critical factor in both media attention and hazard exposure. More densely populated cities should receive more media attention in the event of a flood, and population is likely the primary factor explaining the spatial patterns in the dataset. It is also likely to be correlated with GDP, as well as with other factors such as elevation, distance to a river or the coast, or climate (see G5). Therefore, controlling for population when investigating other effects is essential.
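The confounding effect of population can be seen with three hypothetical counties (A, B, and C, with invented numbers). Raw GDP and raw event counts both scale with population, so they appear associated; once both are divided by population, event rates are identical even though GDP per capita differs. This is a minimal sketch of per-capita normalization only; a regression or partial correlation controlling for population would be the more complete analysis.

```python
from dataclasses import dataclass

@dataclass
class County:
    name: str
    population: int     # inhabitants
    gdp_cny: float      # annual GDP in CNY (hypothetical values)
    flood_events: int   # events recorded in the dataset (hypothetical)

    @property
    def gdp_per_capita(self) -> float:
        return self.gdp_cny / self.population

    @property
    def events_per_million(self) -> float:
        # Multiply first so the integer arithmetic stays exact.
        return self.flood_events * 1_000_000 / self.population

counties = [
    County("A", 2_000_000, 200e9, 40),
    County("B", 500_000, 50e9, 10),
    County("C", 500_000, 100e9, 10),
]

for c in counties:
    print(f"{c.name}: GDP/cap={c.gdp_per_capita:,.0f} CNY, "
          f"events per million={c.events_per_million:.1f}")
```

County A has the highest GDP and the most events, yet its event rate per inhabitant equals that of B and C, so the raw GDP-events association says nothing about media attention per se.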
G5. Analyses of Flood Susceptibility
Figure 7 and the underlying analysis of flood susceptibility present some issues and do not bring much to the paper. The proposed pattern is not very neat (the points also overlap with no transparency), likely because the chosen indicators are quite remote proxies of flood susceptibility and should not be presented as acknowledged indicators in hydrology (the supporting references are weak).
Average daily precipitation depicts a hydrological equilibrium rather than extreme events. Naturally, arid regions are less susceptible (and also less populated, hence less exposed). However, the indicator becomes less relevant for other hydrological systems with higher average precipitation (a mixture of blue and red dots). Likewise, elevated areas are also likely to be less populated and therefore less exposed, and the elevation effect tends to disappear at lower elevations. Flow accumulation or topographic wetness indices could have been more reliable indicators of flood susceptibility.
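For reference, the topographic wetness index is commonly defined as TWI = ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width, typically derived from a flow accumulation grid) and β is the local slope. The sketch below computes it for single grid cells; the 30 m cell size, the slope floor, and the two example cells are assumptions for illustration.

```python
import math

def topographic_wetness_index(upslope_area_m2: float, contour_width_m: float,
                              slope_rad: float, min_tan_slope: float = 1e-3) -> float:
    """TWI = ln(a / tan(beta)), with a = upslope area per unit contour width."""
    a = upslope_area_m2 / contour_width_m
    # Floor tan(beta) (assumed value) so flat cells do not divide by zero.
    tan_beta = max(math.tan(slope_rad), min_tan_slope)
    return math.log(a / tan_beta)

cell = 30.0  # assumed DEM cell size in metres, used as the contour width

# Valley-bottom cell draining 1000 upslope cells on a gentle 1-degree slope.
valley = topographic_wetness_index(1000 * cell**2, cell, math.radians(1.0))
# Ridge cell draining only itself on a 15-degree slope.
ridge = topographic_wetness_index(cell**2, cell, math.radians(15.0))

print(f"valley TWI = {valley:.2f}, ridge TWI = {ridge:.2f}")
```

Unlike mean precipitation or elevation, the index directly encodes where water converges and accumulates, which is why it is a closer proxy for flood susceptibility.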
I would recommend removing this analysis given its low informative value and also because these variables are related to climate variability, which is already pictured in Figure 12. See GLC studies for comparisons.
G6. Flood Events Dataset Resolution
While the final dataset is reported at the county-month level, the reader is left with little insight into the level of detail directly resulting from the information extraction process, which remains unclearly described. Based on Figures 4 and 6, it appears that information was collected at the city-daily level. A much more precise dataset could apparently have been shared without much additional effort, which raises questions about the motivation for aggregating the data to such a coarse level.
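The information loss from temporal aggregation can be shown with a few records. In the sketch below (hypothetical records and schema, not the authors' data model), two distinct events in the same county and month remain separate at daily resolution but collapse into a single cell at monthly resolution.

```python
from datetime import date

# Hypothetical extracted records (county, event date); the third one
# duplicates the second, as when several news items report the same event.
records = [
    ("County1", date(2020, 7, 3)),
    ("County1", date(2020, 7, 21)),
    ("County1", date(2020, 7, 21)),
    ("County2", date(2020, 8, 1)),
]

def county_day_events(recs):
    """Distinct (county, day) events: duplicate reports of the same day merge."""
    return {(county, day) for county, day in recs}

def county_month_cells(recs):
    """Distinct (county, month) cells: separate events within a month also merge."""
    return {(county, day.strftime("%Y-%m")) for county, day in recs}

print(len(county_day_events(records)), len(county_month_cells(records)))
```

Sharing the daily records alongside the aggregated table would let users choose the resolution that suits their application.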
G7. Data Content, FAIR Principles, and Reusability
Also, given that a central outcome of the paper is a dataset, alignment with the FAIR principles (https://www.go-fair.org/) should be particularly encouraged. Regarding the shared data, GitHub is not considered FAIR as it does not provide persistent identifiers. A few additional elements could also greatly increase the reusability of the dataset, e.g., precise column descriptions in the readme, the reference for the administrative unit shapefile used to link the data with the postcodes or administrative units described in the paper (L275-278), the use of international time standards, and possibly English translations of region names to maximize reuse in a global context.
Regarding reproducibility, the data and code availability section could be improved. The input news data and their conditions of (re-)use are not described in this section. The tools and libraries used to develop the approach are not referenced (except for the Python "re" module at L187). There is no statement about whether the developed models are accessible and under which conditions of use.
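As an illustration of these reusability suggestions, the sketch below writes records with a documented set of columns and ISO 8601 dates using Python's standard `csv` module. All column names, the placeholder code, and the URL are hypothetical; the point is that each column is described once in the readme and dates follow an international standard.

```python
import csv
import io
from datetime import date

# Hypothetical data dictionary; the same descriptions would go in the readme.
DATA_DICTIONARY = {
    "county_code": "Administrative code of the county, from a cited shapefile",
    "county_name_en": "County name in English",
    "event_date": "Event date in ISO 8601 format (YYYY-MM-DD)",
    "source_url": "Link to the news item the record was extracted from",
}

def to_csv(rows):
    """Serialize records to CSV with a fixed column order and ISO 8601 dates."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(DATA_DICTIONARY))
    writer.writeheader()
    for row in rows:
        # date.isoformat() returns an ISO 8601 calendar date string.
        writer.writerow({**row, "event_date": row["event_date"].isoformat()})
    return buf.getvalue()

rows = [{
    "county_code": "XXXXXX",  # placeholder, not a real administrative code
    "county_name_en": "Example County",
    "event_date": date(2020, 7, 3),
    "source_url": "https://example.org/news/1",
}]
print(to_csv(rows))
```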
There are no links or references to the news articles that have been used to construct the dataset. Sharing the links could drastically increase the paper's outreach and support future research and NLP applications to extract additional information, such as flood impact variables or associated hazard types, without redeveloping an NLP flood event detection model. Annotated corpora are also valuable datasets in the context of NLP for future benchmarking. Consider commenting on that dataset as well.
References
Kirschbaum, D., Stanley, T., and Zhou, Y.: Spatial and temporal analysis of a global landslide catalog, Geomorphology, 249, 4–15, https://doi.org/10.1016/j.geomorph.2015.03.016, 2015.
Specific Comments
S1. L8: "similar" could be more nuanced.
S2. L9-L10: "the connection between…": this connection is not supported with sufficient accuracy, and the analysis is oversimplistic (see G5).
S3. L43 (and after): "natural disaster" is a controversial terminology often avoided by Disaster Risk experts, acknowledging that a disaster is not natural (as opposed to natural hazards).
S4. L43-L52: Table 2 could distinguish between catalogs derived from remote sensing and from social sensing, e.g., DFO is based on remote sensing, whereas EM-DAT relies on the collection of text documents and manual extraction of information. Some recent initiatives missing from the table could be worth mentioning, e.g., the Global Flood Database (a global remote sensing catalog) and a global catalog obtained from social media:
S5. L65: Beyond cloud cover in optical imagery, mapping urban floods is challenging per se.
S6. L75: "Yang et al. (2023)": such a highly relevant paper should be discussed again in the discussion section, among others, to position the proposed approach relative to existing NLP-based studies (see Overview).
S7. L77: The authors acknowledge the multi-hazard nature of floods here and after, but the issue is not discussed in light of their own work (see G2).
S8. L90: "Conditional Random Fields (CRF) layer" appears to be a central part of the methodology appearing multiple times in the paper; however, it lacks a clear explanation of what it is and why it is used.
S9. L110-L116: since the paper follows a conventional structure, it is unnecessary to detail it in the introduction.
S10. Table 2: EM-DAT is continuously updated (see Delforge et al., 2023). I would also refer to the Global Flood Awareness System (https://global-flood.emergency.copernicus.eu/), the flood component of CEMS, instead of CEMS. See also S4.
S11. L134: check url link (404 error).
S12. Figure 1: I appreciate the availability of an example. However, consider selecting a more topic-appropriate example, or asking a where/when question for more relevance.
S13. L142, L151, and L154: See G1.
S14. L145-148: The description of the data and its processing, including the train/test split, may be confusing. It may be more appropriate to move it to the methods section.
S15. L157: "Validation": unless the China Flood and Drought Bulletin is considered a gold standard, I think referring to comparative data and cross-comparison, rather than validation, is more appropriate.
S16. L168-L174: oversimplistic view of hydrology and weak references. See G5.
S17. L190-199: This section could indicate the total/train/test sample sizes more clearly.
S18. L235: the words should be singular in "and does contain the words 'will'…". Also, I wonder whether this approach successfully separates actual events from forecasts. Is there any Chinese-language specificity involved here?
S19. Figure 3: Is [SEP] a requirement given the specificity of the Chinese language?
S20. L243: In the first sentence, correct "flood information extraction" into "(i) flood event detection and (ii) flood information extraction" for clarity.
S21. L259: it is not clear to me how Exact Match behaves in the case of multiple locations: is it zero if there is any error? What exactly is meant by location data: city or county? How is location handled before flood location recognition is explained in Section 3.2? Perhaps Section 3.2 should come first.
S22. L276: consider adding the reference of the used administrative unit shapefile. See also G7.
S23. L285, Section 4.1: The performance seems good in absolute terms, but the reader has no clue how it compares with other work on the social sensing of floods or on Chinese NLP. This is quite important to document.
S24. Figure 4: the bulletin seems more exhaustive. This could be discussed further, and the authors could better highlight the complementarities between data collection approaches, e.g., how could the proposed approach improve the Chinese bulletin?
S25. L298-L308: The analysis of media attention due to GDP biases is not significant and does not control for the population bias (see G4).
S26. L313-314: The two case studies were selected because the authors assumed good coverage given their high hazard magnitude and impact. This is a known bias and an issue worth mentioning, as small-impact disasters tend to be less well covered and documented. See Kron et al. (2012), Gall et al. (2009), and Delforge et al. (2023) and references therein for more insights into hazard catalog biases.
S27. L328-339 + Figure 7. These selected indicators are bad proxies of flood susceptibility, and I do not see how this analysis validates something about the spatial distribution of floods (see G5). Consider removing.
S28. L340: it is unclear how the information was structured prior to harmonizing the data into the urban flood dataset. See also G6.
S29. Figures 8 and 9: it would be useful to have an additional column, or a time series along the Y axis, with the annual total. This could help identify multi-year cycles driven by climate indices. Consider adding the total number of occurrences and items to the figure caption.
S30. L354: "seasonality" instead of "climate's tendency" could be more appropriate.
S31. L390: "exposure" or "susceptibility" (the environmental side of vulnerability) is maybe more appropriate than vulnerability because the latter also encompasses social vulnerability.
S32. Maps Figures 10, 11, and 12 could be grouped into a multipanel figure for conciseness. Consider adding population density as well since it drives hazard exposure. DEM and river networks may also be considered as information to include (parsimoniously).
S33. L409: The comparison with other datasets is quite limited, and the Chinese bulletin seems more exhaustive if one can trace the original data. The extent to which the proposed dataset fills gaps is thus not well documented (see G1). Adding more than one catalog from Tables 1 and 2 to Figure 4 for comparison could improve this discussion.
S34. L473: The data availability section does not include information on the accessibility of the input news data. In line with HESS recommendations and FAIR standards, I also encourage the authors to share information about code and model availability.
S35. L414-L416: this sentence (and the section in general) gives the impression that the authors are trying hard to fit their results into the context of climate change and urbanization, even excluding some peak values to retrieve a positive trend. Trends, in particular in disaster news, are much more complex than trends in physical variables and include important social drivers and biases. The discussion is oversimplified; the authors should take more distance and examine the biases arising from the social sensing of hazards. See G3 and references.
S36. L445: Perspectives are neither exhaustive nor detailed. Consider adding more relevant perspectives, differentiating those related to the method (NLP-detection, extraction) and those related to the valorization of the resulting dataset.
S37. L473: data and code availabilities: see G7.
S38. Table A2: same remark as for Figure 4. In my opinion, it may be removed.
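Regarding S21, the difference between an all-or-nothing score and a partial-credit score for multi-location answers can be illustrated as follows. This sketch assumes that Exact Match is computed on the set of predicted locations (the manuscript does not state this), and the location names are placeholders; a set-based F1 is shown as one common alternative, not as the authors' metric.

```python
def exact_match(pred, gold):
    """1.0 only if the predicted set of locations equals the gold set, else 0.0."""
    return 1.0 if set(pred) == set(gold) else 0.0

def set_f1(pred, gold):
    """Partial credit: F1 between the predicted and gold location sets."""
    p, g = set(pred), set(gold)
    if not p and not g:
        return 1.0  # nothing to find and nothing predicted
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)

# Placeholder locations: the model finds two of the three affected places.
gold = ["Location A", "Location B", "Location C"]
pred = ["Location A", "Location B"]

print(exact_match(pred, gold), round(set_f1(pred, gold), 3))
```

Under this assumption, a single missed location drives Exact Match to zero while F1 still credits the two correct ones, so reporting both would clarify how multi-location news items affect the scores.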