https://doi.org/10.5194/hess-2021-481

  22 Sep 2021

Review status: this preprint is currently under review for the journal HESS.

Parsimonious statistical learning models for low flow estimation

Johannes Laimighofer¹, Michael Melcher², and Gregor Laaha¹
  • ¹ Institute of Statistics, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
  • ² Institute of Information Management, FH JOANNEUM – University of Applied Sciences, Graz, Austria

Abstract. Statistical learning methods offer a promising approach for low flow regionalization. We examine seven statistical learning models (lasso, linear and non-linear model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector machine regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. To produce sparse models, we adapt recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low flow characteristic Q95 (Pr(Q > Q95) = 0.95), standardized by catchment area, using a repeated nested cross-validation scheme. We found generally high prediction accuracy for winter (cross-validated R², R²CV, of 0.66 to 0.7) and summer (R²CV of 0.83 to 0.86). The models perform similarly to or slightly better than a Top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector machine regression (winter) and non-linear model-based boosting (summer), but linear models exhibit similar prediction accuracy. Variable preselection can significantly reduce the complexity of all models with only a small loss of performance. The resulting learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and non-linear models reveals that non-linear relationships can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add non-linear effects.
When performing low flow regionalization in a seasonal climate, temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to the catchment grouping that is otherwise recommended.
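To make the target variable concrete, the following is a minimal sketch of how a Q95 value standardized by catchment area could be computed from a daily discharge series. It is not the authors' code. The function names, the linear interpolation between order statistics, and the output unit (l s⁻¹ km⁻²) are assumptions made for illustration; the abstract only states that Q95 satisfies Pr(Q > Q95) = 0.95 and is standardized by catchment area.

```python
def q95(daily_flows):
    """Flow exceeded 95 % of the time, i.e. the empirical 5 % quantile.

    Uses linear interpolation between order statistics (an assumption;
    other quantile estimators exist).
    """
    flows = sorted(daily_flows)
    n = len(flows)
    pos = 0.05 * (n - 1)          # fractional index of the 5 % quantile
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return flows[lo] + (pos - lo) * (flows[hi] - flows[lo])


def specific_q95(daily_flows_m3s, area_km2):
    """Q95 standardized by catchment area, here in l s^-1 km^-2 (assumed unit)."""
    return q95(daily_flows_m3s) * 1000.0 / area_km2


# Hypothetical example: 101 daily flows of 1, 2, ..., 101 m^3/s
# from a catchment of 200 km^2.
example_flows = [float(i) for i in range(1, 102)]
example_q95 = q95(example_flows)
example_specific = specific_q95(example_flows, 200.0)
print(example_q95, example_specific)
```

For seasonal low flows, the same functions would be applied separately to the winter and summer parts of the record; how the seasons are delimited is not given here and would have to be taken from the full paper.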

Status: open (until 17 Nov 2021)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-481', Anonymous Referee #1, 15 Oct 2021

Viewed

Total article views: 413 (including HTML, PDF, and XML)
  • HTML: 345
  • PDF: 62
  • XML: 6
  • Total: 413
  • BibTeX: 4
  • EndNote: 2
Views and downloads (calculated since 22 Sep 2021)

Latest update: 21 Oct 2021
Short summary
The aim of this study is to predict long-term averages of low flow for a hydrologically diverse dataset in Austria. We compared seven statistical learning methods and included a backward variable selection approach. We found that separating low flow processes into winter and summer low flow leads to good performance for all models. Variable selection results in more parsimonious and more interpretable models. Linear approaches for prediction and variable selection are sufficient for our dataset.
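The idea behind backward variable selection can be sketched as follows. This is an illustrative toy version under stated assumptions, not the procedure used in the study: variables are ranked by the absolute size of standardized ordinary least squares coefficients (a stand-in for the conditional forest, lasso, and boosting rankings named in the abstract), the number of variables to keep is fixed in advance rather than chosen by nested cross-validation, and all function names and the synthetic data are hypothetical.

```python
import random


def standardize(col):
    """Centre a column and scale it to unit (population) standard deviation.

    Assumes the column is not constant.
    """
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_coefs(cols, y_centred):
    """Least-squares coefficients via the normal equations (no intercept,
    because columns and target are centred)."""
    p = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(cols[i], y_centred)) for i in range(p)]
    return solve(xtx, xty)


def backward_elimination(features, y, n_keep):
    """Refit on the remaining variables and drop the weakest one each round."""
    remaining = {name: standardize(col) for name, col in features.items()}
    y_mean = sum(y) / len(y)
    y_centred = [v - y_mean for v in y]
    dropped = []
    while len(remaining) > n_keep:
        names = list(remaining)
        coefs = ols_coefs([remaining[n] for n in names], y_centred)
        weakest = min(zip(names, coefs), key=lambda t: abs(t[1]))[0]
        del remaining[weakest]
        dropped.append(weakest)
    return list(remaining), dropped


# Synthetic data: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n_obs = 60
features = {name: [rng.uniform(0, 1) for _ in range(n_obs)]
            for name in ("x1", "x2", "x3")}
y = [3 * a - 2 * b + rng.gauss(0, 0.01)
     for a, b in zip(features["x1"], features["x2"])]
kept, dropped = backward_elimination(features, y, n_keep=1)
print(kept, dropped)
```

In a realistic setting, the number of retained variables would be chosen by comparing cross-validated prediction error across subset sizes, with the selection step nested inside the resampling loop so that the reported accuracy is not biased by the selection itself.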