https://doi.org/10.5194/hess-2024-169
14 Jun 2024
Status: a revised version of this preprint was accepted for the journal HESS and is expected to appear here in due course.

A diversity centric strategy for the selection of spatio-temporal training data for LSTM-based streamflow forecasting

Everett Snieder and Usman T. Khan

Abstract. Deep learning models are increasingly applied to streamflow forecasting. Their success is partly attributed to the large, hydrologically diverse datasets on which they are trained. However, common data selection methods do not explicitly account for the hydrological diversity contained in training data. In this research, clustering is used to characterise temporal and spatial diversity, in order to better understand the importance of hydrological diversity within regional training datasets. This study presents a novel, diversity-based resampling approach for creating hydrologically diverse datasets. First, the procedure is used to undersample temporal data, showing that the amount of temporal data needed to train models can be halved without any loss in performance. Next, it is applied to reduce the number of basins in the training dataset. Although basins cannot be omitted from training without some loss in performance, we show that including hydrologically dissimilar basins is highly beneficial to model performance. This is demonstrated empirically for Canadian basins: models trained on sets of basins separated by thousands of kilometres outperform models trained on localised clusters. We strongly recommend an approach to training data selection that encourages a broad representation of diverse hydrological processes.
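
The abstract does not spell out the clustering algorithm, the features, or the sampling rule behind the diversity-based resampling. As a rough illustration only, the following sketch assumes that each training sample (for example, a window of hydrometeorological inputs) has already been summarised as a numeric feature vector. It clusters those vectors with plain k-means and then draws the retained samples round-robin across clusters, so that rare regimes such as floods are kept rather than thinned out as they would be under uniform random sampling. The function names `kmeans` and `diversity_undersample`, their parameters, and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
import random


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with farthest-point (max-min) initialisation.

    `points` is a list of equal-length tuples of floats (hypothetical
    per-sample summary features). Returns one cluster label per point.
    """
    # Initialisation: start at the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels


def diversity_undersample(features, fraction, k=5, seed=0):
    """Keep about `fraction` of the samples, drawn evenly across k clusters.

    Returns the sorted indices of the retained samples.
    """
    labels = kmeans(features, k)
    rng = random.Random(seed)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    budget = min(len(features), round(fraction * len(features)))
    # Shuffle each cluster's members, then take one sample from each cluster
    # in turn (round-robin) until the budget is spent. Small (rare) clusters
    # are kept in full whenever the budget allows, and large clusters absorb
    # most of the reduction.
    pools = [rng.sample(members, len(members)) for members in clusters.values()]
    chosen = []
    while len(chosen) < budget:
        for pool in pools:
            if pool and len(chosen) < budget:
                chosen.append(pool.pop())
    return sorted(chosen)


# Toy data: 90 common low-flow samples and 10 rare flood samples.
low_flow = [(i * 0.01, 0.0) for i in range(90)]
floods = [(10.0 + i * 0.01, 10.0) for i in range(10)]
keep = diversity_undersample(low_flow + floods, fraction=0.5, k=2)
print(len(keep), sum(i >= 90 for i in keep))  # 50 10
```

Under uniform random sampling at the same 50 % rate, roughly half of the flood samples would be expected to be discarded; the round-robin draw keeps all ten here because the small cluster is exhausted first. Whether this resembles the paper's procedure, in particular its choice of features and number of clusters, is an assumption.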


Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2024-169', Anonymous Referee #1, 16 Jul 2024
    • AC1: 'Reply on RC1', Everett Snieder, 06 Sep 2024
  • RC2: 'Comment on hess-2024-169', Anonymous Referee #2, 16 Jul 2024
    • AC2: 'Reply on RC2', Everett Snieder, 06 Sep 2024

Viewed

Total article views: 494 (including HTML, PDF, and XML)
  • HTML: 319
  • PDF: 115
  • XML: 60
  • Total: 494
  • Supplement: 42
  • BibTeX: 13
  • EndNote: 12
[Figures: Views and downloads; Cumulative views and downloads. Both calculated since 14 Jun 2024.]

Viewed (geographical distribution)

Total article views: 474 (including HTML, PDF, and XML), of which 474 have a defined geographic origin and 0 have an unknown origin.
Latest update: 13 Dec 2024
Short summary
Improving the accuracy of flood forecasts is paramount to minimising flood damage. Machine-learning models are increasingly being applied to flood forecasting. Such models are typically trained on large historical hydrometeorological datasets. In this work, we evaluate methods for selecting training datasets that maximise the spatiotemporal diversity of the represented hydrological processes. Empirical results demonstrate the importance of hydrological diversity in training ML models.
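
The abstract reports that training sets of widely separated, hydrologically dissimilar basins outperform localised clusters, but this summary page does not state how such a set is assembled. One generic way to build a diverse subset is greedy farthest-point (max-min) selection over basin attribute vectors, sketched here next to the "localised cluster" alternative of taking a seed basin's nearest neighbours. The basin names, the two-dimensional attribute values (imagined as standardised climate or catchment descriptors) and the function names `select_diverse_basins` and `select_local_cluster` are hypothetical, and this is not presented as the authors' selection method.

```python
import math


def select_diverse_basins(attributes, n_select, start=0):
    """Greedy max-min selection in attribute space.

    `attributes` maps basin ID -> tuple of standardised attributes. Each new
    basin is the one whose nearest already-selected basin is farthest away.
    """
    names = list(attributes)
    chosen = [names[start]]
    # Distance from every basin to its nearest selected basin.
    nearest = {b: math.dist(attributes[b], attributes[chosen[0]]) for b in names}
    while len(chosen) < min(n_select, len(names)):
        nxt = max((b for b in names if b not in chosen), key=lambda b: nearest[b])
        chosen.append(nxt)
        # Refresh each basin's distance to its nearest selected basin.
        for b in names:
            nearest[b] = min(nearest[b], math.dist(attributes[b], attributes[nxt]))
    return chosen


def select_local_cluster(attributes, n_select, start=0):
    """Contrast case: the seed basin plus its nearest neighbours."""
    names = list(attributes)
    seed = attributes[names[start]]
    return sorted(names, key=lambda b: math.dist(attributes[b], seed))[:n_select]


# Hypothetical basins with made-up standardised attributes.
basins = {
    "coastal_1": (0.0, 0.0),
    "coastal_2": (0.1, 0.0),
    "coastal_3": (0.0, 0.1),
    "prairie": (5.0, 0.0),
    "mountain": (0.0, 5.0),
}
print(select_diverse_basins(basins, 3))  # ['coastal_1', 'prairie', 'mountain']
print(select_local_cluster(basins, 3))   # ['coastal_1', 'coastal_2', 'coastal_3']
```

Farthest-point selection is a simple heuristic for picking mutually distant items, not an exact optimiser; with real basins the result depends on which attributes are used and how they are scaled, both of which are assumptions in this sketch.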