Preprints
https://doi.org/10.5194/hess-2021-642
17 Jan 2022
Status: a revised version of this preprint was accepted for the journal HESS and is expected to appear here in due course.

A two-step merging strategy for incorporating multi-source precipitation products and gauge observations using machine learning classification and regression over China

Huajin Lei¹, Hongyu Zhao², and Tianqi Ao¹
  • ¹ State Key Laboratory of Hydraulics and Mountain River Engineering, College of Water Resource and Hydropower, Sichuan University, Chengdu 610065, China
  • ² State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing 100875, China

Abstract. Although many multi-source precipitation products (MSPs) with high spatio-temporal resolution are widely used in water cycle research, they remain subject to considerable uncertainties due to the spatial variability of terrain. Effective detection of precipitation occurrence is key to improving precipitation accuracy. This study presents a two-step merging strategy that combines MSPs (GSMaP, IMERG, TMPA, PERSIANN-CDR, CMORPH, CHIRPS, and ERA-Interim) with rain gauge observations to improve both the detection of precipitation occurrence and the estimation of precipitation intensity over China during 2000–2017. Multiple environmental variables and the spatial autocorrelation between precipitation observations are used as auxiliary variables in the merging process. Three machine learning (ML) algorithms, each used for both classification and regression, are adopted and compared: gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and random forest (RF). The strategy first employs classification models to identify wet and dry days in the warm and cold seasons, then applies regression models to predict precipitation amounts on the wet days. The results are also compared with those of traditional methods, including multiple linear regression (MLR), ML regression models, and gauge-based Kriging interpolation. A total of 1680 (70 %) rain gauges are randomly chosen for model training and 692 (30 %) for performance evaluation. The results show that: (1) The multi-source merged precipitation products (MSMPs) detect precipitation occurrence under different intensities best, followed by Kriging and then the original MSPs. The average Heidke Skill Score (HSS) of MSPs, Kriging, and MSMPs is 0.30–0.69, 0.71, and 0.79–0.80, respectively. (2) The proposed method significantly reduces the bias and deviation of the original MSPs in both time and space. The MSMPs correlate strongly with gauge observations, with a correlation coefficient (CC) of 0.85. Moreover, the modified Kling-Gupta efficiency (KGE) improves by 17 %–62 % (MSMPs: 0.74–0.76) compared with MSPs (0.34–0.65). (3) The spatial autocorrelation factor (KP) is the most important variable in the models and contributes considerably to model accuracy. (4) The proposed method outperforms MLR and ML regression models, and the XGBoost algorithm is recommended for large-scale data merging owing to its high computational efficiency. This study provides a robust and reliable method to improve the performance of precipitation data with full consideration of multi-source information. The method could be applied globally to produce large-scale precipitation products wherever rain gauge observations are available.
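The two-step idea described in the abstract (classify wet and dry days first, then regress amounts on wet days only) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's random forest as a stand-in for the three models compared, a hypothetical wet-day threshold of 0.1 mm, and synthetic data in place of the seven products and the gauge observations. It also omits the warm/cold season split and the auxiliary variables such as the spatial autocorrelation factor.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Assumed wet-day threshold in mm/day; the abstract does not state one.
WET_THRESHOLD_MM = 0.1


def two_step_merge(X_train, y_train, X_new, wet_threshold=WET_THRESHOLD_MM, seed=0):
    """Step 1: classify wet/dry days. Step 2: regress amounts on wet days only."""
    wet_train = y_train >= wet_threshold

    # Step 1: occurrence model trained on all days.
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, wet_train)

    # Step 2: amount model trained on gauge-observed wet days only.
    reg = RandomForestRegressor(n_estimators=50, random_state=seed)
    reg.fit(X_train[wet_train], y_train[wet_train])

    # Days classified as dry keep an amount of zero.
    wet_new = clf.predict(X_new).astype(bool)
    merged = np.zeros(len(X_new))
    if wet_new.any():
        merged[wet_new] = reg.predict(X_new[wet_new])
    return merged, wet_new


# Synthetic stand-in data: a "gauge" series and seven noisy "product" columns.
rng = np.random.default_rng(42)
n_samples, n_products = 2000, 7
gauge = np.where(rng.random(n_samples) < 0.4, rng.gamma(2.0, 5.0, n_samples), 0.0)
products = np.clip(
    gauge[:, None] + rng.normal(0.0, 2.0, (n_samples, n_products)), 0.0, None
)

# 70/30 split by sample here; the study splits by rain gauge instead.
split = int(0.7 * n_samples)
merged, wet = two_step_merge(products[:split], gauge[:split], products[split:])
```

Training the regressor only on wet days keeps the many zero-rain samples from pulling predicted amounts toward zero, which is the motivation for separating occurrence from intensity.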

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-642', Anonymous Referee #1, 07 Mar 2022
    • AC1: 'Reply on RC1', Huajin Lei, 06 Apr 2022
  • RC2: 'Comment on hess-2021-642', Oscar Manuel Baez Villanueva, 11 Mar 2022
    • AC2: 'Reply on RC2', Huajin Lei, 06 Apr 2022

Viewed

Total article views: 704 (including HTML, PDF, and XML)
  • HTML: 573
  • PDF: 114
  • XML: 17
  • Total: 704
  • BibTeX: 6
  • EndNote: 8
Views and downloads calculated since 17 Jan 2022.

Viewed (geographical distribution)

Total article views: 633 (including HTML, PDF, and XML), of which 633 have a defined geography and 0 are of unknown origin.
Latest update: 26 May 2022
Short summary
How to combine multi-source precipitation data effectively is an active topic in hydrometeorological research. This study presents a two-step merging strategy based on machine learning to merge multi-source precipitation data over China. The results demonstrate that the proposed method effectively distinguishes the occurrence of precipitation events and reduces the error in precipitation amounts. XGBoost is more computationally efficient and can serve as a reference for other regions with large data volumes.
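The two skill measures quoted in the abstract, the Heidke Skill Score (HSS) for occurrence and the modified Kling-Gupta efficiency (KGE) for amounts, can be computed as in the sketch below. It uses the standard textbook definitions, which may differ in detail from the authors' code. The function names and the 0.1 mm wet-day threshold are assumptions made for illustration, and the sample values are invented.

```python
import numpy as np


def heidke_skill_score(obs, est, threshold=0.1):
    """HSS from the 2x2 wet/dry contingency table (threshold is an assumption)."""
    obs_wet = np.asarray(obs) >= threshold
    est_wet = np.asarray(est) >= threshold
    hits = np.sum(obs_wet & est_wet)
    false_alarms = np.sum(~obs_wet & est_wet)
    misses = np.sum(obs_wet & ~est_wet)
    correct_negatives = np.sum(~obs_wet & ~est_wet)
    numerator = 2.0 * (hits * correct_negatives - false_alarms * misses)
    denominator = (hits + misses) * (misses + correct_negatives) + (
        hits + false_alarms
    ) * (false_alarms + correct_negatives)
    # Undefined (division by zero) if every day is wet or every day is dry.
    return numerator / denominator


def modified_kge(obs, est):
    """Modified KGE: 1 - sqrt((r-1)^2 + (beta-1)^2 + (gamma-1)^2)."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    r = np.corrcoef(obs, est)[0, 1]  # linear correlation
    beta = est.mean() / obs.mean()  # bias ratio
    # Variability ratio based on coefficients of variation.
    gamma = (est.std() / est.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


# Invented daily values in mm, for illustration only.
gauge_mm = np.array([0.0, 0.0, 5.0, 10.0, 0.0, 2.0])
perfect_mm = gauge_mm.copy()
inverted_mm = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])

hss_perfect = heidke_skill_score(gauge_mm, perfect_mm)
hss_inverted = heidke_skill_score(gauge_mm, inverted_mm)
kge_perfect = modified_kge(gauge_mm, perfect_mm)
```

Both scores equal 1 for a perfect estimate. An HSS of 0 indicates no skill beyond random chance, so the reported range of 0.30–0.69 for the original products and 0.79–0.80 for the merged products reflects how much better than chance each separates wet from dry days.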