https://doi.org/10.5194/hess-2021-65

  01 Mar 2021


Review status: this preprint is currently under review for the journal HESS.

Gaining Hydrological Insights Through Wilk's Feature Importance: A Test-Statistic Interpretation method for Reliable and Robust Inference

Kailong Li (1), Guohe Huang (1), and Brian Baetz (2)
  • (1) Faculty of Engineering, University of Regina, Regina, Saskatchewan, Canada S4S 0A2
  • (2) Department of Civil Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4L8

Abstract. Feature importance is a popular approach for investigating the relative significance of predictors in machine learning models. In this study, we developed a Wilk's feature importance (WFI) method for hydrological inference. Compared with conventional feature importance methods such as permutation feature importance (PFI) and mean decrease in impurity (MDI), the proposed WFI aims to provide more reliable importance scores that could partially address the equifinality problem in hydrology. To achieve this, the WFI computes importance scores from Wilk's Λ (a test statistic that can be used to distinguish differences between two or more groups of variables) throughout a decision tree. The WFI has an advantage over PFI and MDI in that it does not rely on predictive accuracy, so the risk of overfitting is expected to be greatly reduced. The proposed WFI was applied to three interconnected irrigated watersheds located in the Yellow River Basin, China. Using the recursive feature elimination approach, our results indicated that the WFI could generate more stable relative importance scores in response to the removal of irrelevant predictors, as compared with PFI and MDI embedded in three different machine learning algorithms. In addition, the comparative study showed that the predictors identified by WFI achieved the highest predictive accuracy on the testing dataset, which indicates that the proposed WFI could identify more informative predictors among many irrelevant ones. We also extended the WFI to local importance scores to reflect the varying characteristics of a predictor in hydrological processes. The related findings could help to gain insights into different hydrological behaviours.
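
As a rough illustration of the statistic the abstract refers to, the sketch below computes the textbook form of Wilks' Λ, the ratio det(W) / det(T) of the within-group to the total sum-of-squares-and-cross-products matrices. Values near 0 suggest well-separated groups and values near 1 suggest little separation. This is not the authors' WFI implementation: the function name `wilks_lambda`, the toy numbers, and the idea of treating the two child nodes of a candidate tree split as the groups are assumptions made for illustration only. How WFI aggregates the statistic across a tree is not described in the abstract.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Textbook Wilks' lambda, det(W) / det(T), for samples X of shape (n, p).

    Hypothetical helper for illustration; not taken from the authors' code.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single variable
    groups = np.asarray(groups)

    # Total sum-of-squares-and-cross-products (SSCP) matrix.
    centered = X - X.mean(axis=0)
    T = centered.T @ centered

    # Within-group SSCP matrix, accumulated over each group.
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dg = Xg - Xg.mean(axis=0)
        W += dg.T @ dg

    return float(np.linalg.det(W) / np.linalg.det(T))


# Toy example (made-up numbers): a response variable and one predictor.
response = np.array([0.0, 1.0, 10.0, 11.0])
predictor = np.array([0.2, 0.3, 0.8, 0.9])

# Assumed use: a candidate split "predictor > 0.5" defines two groups,
# and a smaller lambda indicates a split that separates the response better.
split_groups = (predictor > 0.5).astype(int)
lam = wilks_lambda(response, split_groups)
print(f"Wilks' lambda for the candidate split: {lam:.4f}")
```

Because the statistic is computed from the grouping alone, it needs no held-out predictions, which is consistent with the abstract's point that the score does not depend on predictive accuracy.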

Kailong Li et al.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-65', Anonymous Referee #1, 01 Apr 2021
    • AC1: 'Reply on RC1', Kailong Li, 26 Apr 2021
  • RC2: 'Comment on hess-2021-65', Anonymous Referee #2, 09 Apr 2021
    • AC2: 'Reply on RC2', Kailong Li, 26 Apr 2021


Data sets

Gaining Hydrological Insights Through Wilk's Feature Importance: A Test-Statistic Interpretation method for Reliable and Robust Inference. Kailong Li. https://doi.org/10.5281/zenodo.4568482

Model code and software

Gaining Hydrological Insights Through Wilk's Feature Importance: A Test-Statistic Interpretation method for Reliable and Robust Inference. Kailong Li. https://doi.org/10.5281/zenodo.4568482


Viewed

Total article views: 420 (including HTML, PDF, and XML)
  • HTML: 344
  • PDF: 66
  • XML: 10
  • Total: 420
  • Supplement: 20
  • BibTeX: 1
  • EndNote: 2
Latest update: 18 Jun 2021
Short summary
We proposed a test-statistic method to quantify the importance of predictors for decision-tree-based hydrological models. The proposed method does not account for model predictive accuracy when estimating importance, so that an unbiased hydrological inference could be obtained. Compared with those found by conventional feature importance methods, the most relevant predictors identified by the proposed method could be more informative in explaining the hydrological process, as judged by model performance.