21 Nov 2022
Status: this preprint is currently under review for the journal HESS.

Comparing machine learning and deep learning models for probabilistic post-processing of satellite precipitation-driven streamflow simulation

Yuhang Zhang1, Aizhong Ye1, Phu Nguyen2, Bita Analui2, Soroosh Sorooshian2, Kuolin Hsu2, and Yuxuan Wang3
  • 1State Key Laboratory of Earth Surface Processes and Resource Ecology, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
  • 2Center for Hydrometeorology and Remote Sensing, Department of Civil and Environmental Engineering, University of California, Irvine, Irvine, California, CA 92697, USA
  • 3College of Arts and Sciences, University of Virginia, Charlottesville, Virginia, 22903, USA

Abstract. Deep learning (DL) models are popular but computationally expensive, whereas machine learning (ML) models are older but more efficient. How the two differ in hydrological probabilistic post-processing is not yet clear. This study systematically compares the quantile regression forest (QRF) model and the probabilistic long short-term memory (PLSTM) model as hydrological probabilistic post-processors. Specifically, we compare how the two models handle biased streamflow simulations driven by three satellite precipitation products in 522 sub-basins of the Yalong River basin in China. Model performance is assessed with a series of scoring metrics from both probabilistic and deterministic perspectives. In general, the QRF and PLSTM models are comparable in terms of probabilistic prediction, and their relative performance is closely related to the flow accumulation area of the sub-basin. For sub-basins with a flow accumulation area of less than 60,000 km², the QRF model outperforms the PLSTM model in most sub-basins. For sub-basins with a flow accumulation area larger than 60,000 km², the PLSTM model has an indisputable advantage. In terms of deterministic prediction, the PLSTM model is preferable to the QRF model, especially when the raw streamflow used as input is poorly simulated. Setting model performance aside, the QRF model is more efficient in all cases, taking half the time of the PLSTM model. This study deepens our understanding of ML and DL models in hydrological post-processing and supports more appropriate model selection in practice.
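
As a rough illustration of what scoring a probabilistic (quantile) prediction involves, the sketch below computes the pinball (quantile) loss, a standard score for quantile forecasts. The abstract does not name the metrics used in the study, so the choice of this score, the function name `pinball_loss`, and the toy streamflow values are all assumptions made for illustration only.

```python
def pinball_loss(observed, predicted, tau):
    """Mean pinball (quantile) loss of predictions for quantile level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = 0.0
    for y, q in zip(observed, predicted):
        diff = y - q
        # Under-prediction (y > q) is weighted by tau,
        # over-prediction (y < q) by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observed)


# Hypothetical values (not from the study): observed streamflow and a
# post-processor's predicted 0.9 quantile, both in m^3/s.
obs = [10.0, 20.0, 30.0]
q90 = [12.0, 18.0, 33.0]
loss = pinball_loss(obs, q90, 0.9)
print(f"pinball loss at tau=0.9: {loss:.4f}")
```

A lower value indicates a better quantile prediction. Averaging the loss over several quantile levels gives one way to compare two post-processors on the same sub-basin.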

Status: open (until 16 Jan 2023)

Total article views: 332 (including HTML, PDF, and XML)
  • HTML: 260
  • PDF: 69
  • XML: 3
  • Total: 332
  • Supplement: 9
  • BibTeX: 1
  • EndNote: 1
Latest update: 30 Nov 2022
Short summary
We compared the probabilistic long short-term memory (PLSTM) model and the quantile regression forest (QRF) model. The results show that the QRF model is more efficient, taking only half the time of the PLSTM model to complete all the experiments. The two models are comparable in terms of probabilistic (multi-point) prediction: the QRF model performs better in small watersheds and the PLSTM model performs better in large watersheds.