Regionalization of IDF curves for mainland China: a comparative evaluation of machine learning versus spatial interpolation techniques

Jiang, Yuantian; Wang, Wenting; Fullhart, Andrew T.; Yu, Bofu; Chen, Bo

doi:10.5194/hess-30-2931-2026

Articles | Volume 30, issue 10

https://doi.org/10.5194/hess-30-2931-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 18 May 2026

Download

Final revised paper (published on 18 May 2026)
Supplement to the final revised paper
Preprint (discussion started on 23 Jul 2025)
Supplement to the preprint

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-3228', Anonymous Referee #1, 28 Jul 2025

The manuscript presents a comprehensive and methodologically sound comparison of traditional interpolation and machine learning methods for regionalizing Intensity-Duration-Frequency (IDF) curves across mainland China. The evaluation of five interpolation methods and five machine learning methods is a significant strength, demonstrating robust performance metrics and providing a valuable dataset for flood risk assessment and infrastructure planning. It is particularly noteworthy that the authors find machine learning methods, using only daily data, can achieve accuracy comparable to interpolation methods relying on hourly data, as this highlights the potential for IDF regionalization in regions lacking data coverage. The use of four representative IDF cases effectively captures the variability in prediction challenges across durations and return periods. The manuscript is well-organized, with a clear structure that guides readers through the methodology, results, and implications. However, the study could be strengthened by addressing some scientific gaps, such as the mechanisms behind machine learning's temporal downscaling, the reliability of results in data-limited regions like the southwest, and the lack of comprehensive uncertainty quantification. Additionally, minor typographical errors and inconsistent figure formatting slightly detract from the presentation. Overall, this is a high-quality study with significant contributions to hydrology and climate adaptation, but it requires minor revisions to enhance clarity, rigor, and practical applicability.
Specific comments are as follows:
The study demonstrates that ML models, like gradient boosting, can estimate sub-daily intensities from daily gridded data with accuracy comparable to interpolation methods using hourly data. However, the manuscript lacks a detailed explanation of how ML achieves this temporal downscaling. What specific features or model structures enable this capability? For example, are statistical features like daily extreme precipitation or skewness critical? A discussion or sensitivity analysis of key input variables (Table 1) would clarify this process.
Table 1 lists geographic coordinates, elevation, and precipitation statistics as independent variables for ML. Why were these variables chosen, and were other meteorological variables, such as temperature or humidity, tested? Given their potential influence on extreme precipitation, justifying their exclusion or inclusion would enhance the robustness of the ML approach.
Line 291: you mention that the procedure was repeated five times. Clarify whether the sampling was done with or without replacement.
Section 2.1, the division of mainland China into four regions (NE, SE, NW, SW) is based on climate and topography, with the Eastern Monsoon region split along the Qinling-Huaihe line due to its heterogeneity. Was this subdivision sufficient to capture regional variability, particularly in the SE region with extreme precipitation? Could further sub-regionalization or alternative regionalization schemes improve model performance?
The study interpolates missing hourly data for gaps <12 hours and assigns zero for gaps ≥12 hours (beginning on line 157). How was the impact of this imputation strategy assessed, and what are its implications for IDF curve accuracy in regions with frequent missing data?
The SW region shows significantly lower accuracy (KGE as low as 0.31 for KED_AP and 0.14 for GB), attributed to sparse station density and complex topography. Given the lack of validation stations in parts of the NW and SW regions, how reliable are the IDF curves in these areas? Should users be explicitly cautioned against using these curves without further validation?
The manuscript notes that hyperparameter tuning via grid search did not significantly improve ML performance, so default settings were used. Why do you think tuning was ineffective? Were the default parameters near-optimal, or were the tuning ranges too narrow? Clarifying this would help readers assess the robustness of the ML models.
The introduction references non-stationarity in IDF curves due to climate change, but the methodology does not account for it (for example, under different RCP scenarios). Were tests conducted to evaluate the impact of non-stationarity, particularly for long return periods such as 100 or 1000 years? A brief discussion or analysis of this issue would align the study with current climate research trends.
The manuscript cites a high-resolution IDF dataset in the introduction for the Qinghai-Tibet Plateau (Ren et al., 2025). A quantitative comparison with this dataset in the SW region would benchmark the study’s results and highlight its unique contributions.
The IDF curves are provided at 0.1° and 0.5° resolutions, but their alignment with specific applications (such as urban drainage design, flood modeling) is unclear. Are these resolutions optimized for particular use cases, and how should users select between them? Providing guidance would enhance the dataset’s practical utility.
GB outperforms other ML methods, but the manuscript does not discuss its interpretability or the relative importance of input features. A feature importance analysis would provide insights into which variables drive performance, aiding future model development.
Inconsistent spacing in "machine learning" vs. "machinelearning" appears in several instances (for example lines 103, 408). Standardize to "machine learning."
Inconsistent spacing before references (for example, lines 108, 110, 113). Check formatting.
Line 735: "Deepseek R1": clarify the tool’s name and provide a citation or link for transparency.
Figure 1: Include a description of the inset in the caption.
Figure 6: Standardize color scales across panels (a–d for KED_AP, e–h for GB) to facilitate direct comparisons. Ensure units (mm/h) are explicitly labeled in the caption or legend.
Figure 7: The caption mentions 500 samples but does not explain the sampling method (for example, bootstrap or Monte Carlo). Add a brief clarification.
Table 2 and Table 3: Ensure consistent formatting of numerical values (for example, PBIAS values should all include the % symbol). Add a footnote clarifying that negative PBIAS indicates underestimation.
The manuscript contains numerous acronyms and would benefit from a consolidated list or table of definitions.
Final recommendation: Minor revisions. The manuscript is of high quality and makes significant contributions to IDF curve regionalization and the scientific community, but addressing the specific comments (like clarifying temporal downscaling, quantifying uncertainty, and improving regional performance in the southwest) and technical corrections will enhance its scientific rigor and accessibility.
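As a reading aid for the KGE and PBIAS values cited in the comments above, the following is a minimal sketch of how these two metrics are commonly computed. It assumes the original Kling-Gupta efficiency formulation (Gupta et al., 2009) and a PBIAS sign convention in which negative values indicate underestimation, as the referee's note on Tables 2 and 3 suggests. The manuscript's exact variants are not stated on this page, and the function names and sample values are hypothetical.

```python
import math


def pbias(sim, obs):
    """Percent bias; negative values mean estimates are too low on average."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def kge(sim, obs):
    """Kling-Gupta efficiency, original form (assumed variant); 1 is a perfect score."""
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations and covariance
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical rainfall intensities in mm/h, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [0.9 * o for o in obs]  # every estimate is 10 % too low

print(round(pbias(sim, obs), 2))
print(round(kge(sim, obs), 3))
```

In this toy case the estimates are perfectly correlated with the observations, so the KGE penalty comes entirely from the variability and bias ratios. This illustrates why a low KGE in a sparsely gauged region can reflect systematic bias rather than a poorly captured spatial pattern.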

Citation: https://doi.org/10.5194/egusphere-2025-3228-RC1
- AC1: 'Reply on RC1', Wenting Wang, 28 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3228/egusphere-2025-3228-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-3228-AC1
RC2: 'Comment on egusphere-2025-3228', Anonymous Referee #2, 23 Aug 2025
The study investigates two methodologies (i.e., interpolation and machine learning (ML)) for regionalising Intensity-Duration-Frequency (IDF) curves across various time scales. The findings indicate that machine learning outperforms interpolation, both in terms of accuracy and data requirements. Given its methodological rigour and practical relevance, the study holds significant importance for publication in HESS.
However, the following questions could enhance the depth of the manuscript:
With extreme events predicted to become more likely in the future, why hasn’t the study considered durations shorter than 1 hour (e.g., 30 min)?

The study mentions using widely adopted ML methods from previous research. Given LSTM’s proven effectiveness in IDF studies, why hasn’t it been considered?

What are the limitations of the study?

Is it possible to transfer the outcomes to other geographically similar regions? If so, what considerations or adaptations would be necessary?
Citation: https://doi.org/10.5194/egusphere-2025-3228-RC2
- AC2: 'Reply on RC2', Wenting Wang, 28 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3228/egusphere-2025-3228-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-3228-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (27 Nov 2025) by Lelys Bravo de Guenni

AR by Wenting Wang on behalf of the Authors (06 Dec 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (13 Jan 2026) by Lelys Bravo de Guenni

RR by Anonymous Referee #1 (17 Jan 2026)

RR by Anonymous Referee #2 (31 Mar 2026)

ED: Publish as is (21 Apr 2026) by Lelys Bravo de Guenni

AR by Wenting Wang on behalf of the Authors (27 Apr 2026) Manuscript

Short summary

Intensity-Duration-Frequency (IDF) curves are important for designing infrastructure that can withstand floods. We compared traditional interpolation methods with machine learning to map these curves across mainland China. Machine learning using widely available daily gridded data can estimate sub-daily intensity as accurately as methods that need rarer hourly site data. This study offers insight into IDF estimation in data-limited regions and generates a new IDF dataset for mainland China.