09 Sep 2022
Status: a revised version of this preprint is currently under review for the journal HESS.

Streamflow Estimation in Ungauged Regions using Machine Learning: Quantifying Uncertainties in Geographic Extrapolation

Manh-Hung Le, Hyunglok Kim, Stephen Adam, Hong Xuan Do, Peter Beling, and Venkataraman Lakshmi

Abstract. The majority of ungauged regions around the world are in protected areas and rivers with non-perennial flow regimes, which are vital to water security and conservation. There is a limited amount of ground data available in such regions, making it difficult to obtain streamflow information. This study examines how in situ streamflow datasets in data-rich regions can be used to extrapolate streamflow information into regions with poor data availability. These data-rich regions include North America (987 catchments), South America (813 catchments), and Western Europe (457 catchments). South Africa and Central Asia are defined as data-poor regions. We obtained 81 catchments and 133 catchments for these two data-poor regions, respectively, and treated them as pseudo-ungauged regions for our analysis. We trained machine learning (ML) algorithms using climate and catchment attributes as input variables in data-rich (i.e., source) regions and analyzed the possibility of using these pre-trained ML models to estimate climatological monthly streamflow over data-poor (i.e., target) regions. We found that including diverse climate and catchment attributes in training datasets can greatly improve ML algorithms' performance regardless of significant geographical distance between input datasets. The pre-trained ML models over North America and South America could be used effectively to estimate streamflow over data-poor regions. This study provides insight into the selection of input datasets and ML algorithms with different sets of hyperparameters for geographic streamflow extrapolation.

Manh-Hung Le et al.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on hess-2022-320', Alex Sun, 14 Sep 2022
    • CC2: 'Reply on CC1', Manh-Hung Le, 05 Nov 2022
  • RC1: 'Comment on hess-2022-320', Anonymous Referee #1, 08 Oct 2022
    • AC1: 'Reply on RC1', Hyunglok Kim, 19 Dec 2022
  • RC2: 'Comment on hess-2022-320', Anonymous Referee #2, 20 Oct 2022
    • AC2: 'Reply on RC2', Hyunglok Kim, 19 Dec 2022
  • RC3: 'Comment on hess-2022-320', Anonymous Referee #3, 21 Oct 2022
    • AC3: 'Reply on RC3', Hyunglok Kim, 19 Dec 2022


Total article views: 1,235 (including HTML, PDF, and XML)
  • HTML: 838
  • PDF: 373
  • XML: 24
  • Total: 1,235
  • Supplement: 54
  • BibTeX: 4
  • EndNote: 6
Views and downloads (calculated since 09 Sep 2022)
Cumulative views and downloads (calculated since 09 Sep 2022)

Viewed (geographical distribution)

Total article views: 1,142 (including HTML, PDF, and XML) Thereof 1,142 with geography defined and 0 with unknown origin.
Latest update: 14 May 2023
Short summary
Limited ground data make streamflow information difficult to obtain in ungauged regions. We demonstrate how data-rich areas (North America, South America, and Western Europe) can provide streamflow information to data-poor areas (South Africa, Central Asia). Using machine learning algorithms, we found that diverse climate and catchment attributes could be useful for this extrapolation. We also attempt to understand the uncertainty associated with geographic extrapolation.