27 May 2024
Status: this preprint is currently under review for the journal HESS.

Extracting Spatiotemporal Flood Information from News Texts Using Machine Learning for a National Dataset in China

Shengnan Fu, David M. Schultz, Heng Lyu, Zhonghua Zheng, and Chi Zhang

Abstract. Urban floods pose a threat in China, and addressing them requires an understanding of their spatiotemporal distribution. Current flood datasets primarily offer provincial-scale insights and lack temporal continuity, which hinders detailed analysis. To create a consistent national dataset of flood events, this study introduces a machine learning framework that uses online news media as the primary data source to construct a county-level dataset of urban flood events from 2000 to 2022. Using the Bidirectional Encoder Representations from Transformers (BERT) model, we achieved robust performance in information extraction, with an F1 score of 0.86 and an exact match score of 0.82. Further, a Bidirectional Long Short-Term Memory (BiLSTM) network combined with a Conditional Random Field (CRF) layer effectively identified flood locations. Our analysis reveals that the temporal trend of flooded cities in the news-based dataset is similar to that in the China Flood and Drought Bulletin. Furthermore, the consistency of flood events in the news with the typhoon trajectory in two cases, and the connection between flood occurrences and flood conditioning factors, confirm the accuracy of the spatial distribution. We then use the validated news-based dataset to analyze urban floods in China from both temporal and spatial perspectives. Temporally, flood events are seasonal and concentrated in the summer. From 2000 to 2022, the peak year for floods was 2010; excluding that peak year, total flood occurrence shows an overall increasing trend. Spatially, the number of floods decreases from southeast to northwest, with Guangxi Province having the highest number of floods. The Yangtze and Pearl River basins are the most frequently affected by urban floods, and the subtropical climate zone is the most susceptible to flooding. This study provides an automated and effective method for constructing a national flood event dataset and reveals the spatiotemporal characteristics of urban flooding in China.
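For readers unfamiliar with the two extraction metrics quoted in the abstract, the sketch below shows one common way such scores are computed for extractive question answering. It is illustrative only: the abstract does not state the exact metric definitions used in the study, so the SQuAD-style token-overlap F1, the character-level tokenization (assumed here because the news texts are Chinese), and the function names `exact_match` and `token_f1` are all assumptions rather than the authors' implementation.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the predicted span equals the reference span, else 0.0."""
    return float(prediction.strip() == reference.strip())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Tokens are individual non-whitespace characters (an assumption that
    suits Chinese text, where words are not separated by spaces).
    """
    pred_tokens = [ch for ch in prediction if not ch.isspace()]
    ref_tokens = [ch for ch in reference if not ch.isspace()]

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions, a prediction that captures only part of the reference span scores zero on exact match but receives partial credit on F1, which is why an F1 score is typically at least as high as the corresponding exact match score.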

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 22 Jul 2024)

Data sets

China-urban-flood-dataset Shengnan Fu, David M. Schultz, Heng Lyu, Zhonghua Zheng, and Chi Zhang

Short summary
To address the lack of a county-level flood dataset in China, we used machine learning techniques to accurately identify flood events and locations from news reports. This dataset offers crucial insights into the spatiotemporal distribution of urban flooding from 2000 to 2022, highlighting increases in flood occurrences and identifying key vulnerable areas. These findings are vital for improving urban planning in China and mitigating the impact of future floods.