https://doi.org/10.5194/hess-2024-169
14 Jun 2024
Status: a revised version of this preprint is currently under review for the journal HESS.

A diversity centric strategy for the selection of spatio-temporal training data for LSTM-based streamflow forecasting

Everett Snieder and Usman T. Khan

Abstract. Deep learning models are increasingly being applied to streamflow forecasting problems. Their success is in part attributed to the large and hydrologically diverse datasets on which they are trained. However, common data selection methods fail to explicitly account for the hydrological diversity contained within training data. In this research, clustering is used to characterise temporal and spatial diversity, in order to better understand the importance of hydrological diversity within regional training datasets. This study presents a novel, diversity-based resampling approach to creating hydrologically diverse datasets. First, the procedure is used to undersample temporal data, showing that the amount of temporal data needed to train models can be halved without any loss in performance. Next, it is applied to reduce the number of basins in the training dataset. While basins cannot be omitted from training without some loss in performance, we show that hydrologically dissimilar basins are highly beneficial to model performance. This is shown empirically for Canadian basins: models trained on sets of basins separated by thousands of kilometres outperform models trained on localised clusters. We strongly recommend an approach to training data selection that encourages a broad representation of diverse hydrological processes.
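
To make the idea of diversity-based undersampling concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not state which clustering algorithm, features, or sampling rule are used, so everything here is an assumption: plain k-means on a generic feature matrix, followed by round-robin selection across clusters so that small (rarely observed) clusters are retained preferentially. The function names `kmeans` and `diversity_undersample` and the parameters `keep_fraction` and `n_clusters` are hypothetical.

```python
import numpy as np


def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct samples.
    start = rng.choice(len(features), n_clusters, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    return labels


def diversity_undersample(features, keep_fraction, n_clusters=8, seed=0):
    """Return sorted indices of a subset drawn evenly across clusters.

    Assumption: "diversity" is approximated by cluster membership, and
    samples are taken one per cluster in turn until the budget is met.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    labels = kmeans(features, n_clusters, seed=seed)
    n_keep = int(round(keep_fraction * len(features)))
    # Shuffle the members of each cluster so the choice within it is random.
    pools = [rng.permutation(np.flatnonzero(labels == k))
             for k in range(n_clusters)]
    chosen = []
    depth = 0
    while len(chosen) < n_keep:
        for pool in pools:
            if depth < len(pool) and len(chosen) < n_keep:
                chosen.append(pool[depth])
        depth += 1
    return np.sort(np.array(chosen, dtype=int))


if __name__ == "__main__":
    # Synthetic stand-in for per-sample hydrometeorological features.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    idx = diversity_undersample(X, keep_fraction=0.5, n_clusters=4)
    print(len(idx), "of", len(X), "samples kept")
```

Under these assumptions, halving the data corresponds to `keep_fraction=0.5`. Compared with uniform random undersampling, the round-robin step exhausts small clusters before large ones are fully sampled, which is one simple way to favour diversity over frequency.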

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2024-169', Anonymous Referee #1, 16 Jul 2024
    • AC1: 'Reply on RC1', Everett Snieder, 06 Sep 2024
  • RC2: 'Comment on hess-2024-169', Anonymous Referee #2, 16 Jul 2024
    • AC2: 'Reply on RC2', Everett Snieder, 06 Sep 2024

Viewed

Total article views: 474 (including HTML, PDF, and XML)
  • HTML: 306
  • PDF: 109
  • XML: 59
  • Total: 474
  • Supplement: 39
  • BibTeX: 12
  • EndNote: 11

Viewed (geographical distribution)

Total article views: 454 (including HTML, PDF, and XML), of which 454 have geography defined and 0 are of unknown origin.
Latest update: 20 Nov 2024
Short summary
Improving the accuracy of flood forecasts is paramount to minimising flood damage. Machine-learning models are increasingly being applied to flood forecasting. Such models are typically trained on large historical hydrometeorological datasets. In this work, we evaluate methods for selecting training datasets that maximise the spatiotemporal diversity of the represented hydrological processes. Empirical results showcase the importance of hydrological diversity in training ML models.