09 Sep 2022
Status: this preprint is currently under review for the journal HESS.

Streamflow Estimation in Ungauged Regions using Machine Learning: Quantifying Uncertainties in Geographic Extrapolation

Manh-Hung Le [1,2,a], Hyunglok Kim [3,4,a], Stephen Adam [5], Hong Xuan Do [6,7], Peter Beling [5], and Venkataraman Lakshmi [8]
  • [1] Hydrological Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
  • [2] Science Applications International Corporation, Greenbelt, MD 20771, USA
  • [3] USDA Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705, USA
  • [4] Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
  • [5] Virginia Tech National Security Institute, Arlington, VA 22201, USA
  • [6] Faculty of Environment and Natural Resources, Nong Lam University - Ho Chi Minh City, Ho Chi Minh City, 700000 Vietnam
  • [7] Center for Technology Business Incubation, Nong Lam University - Ho Chi Minh City, Ho Chi Minh City, 700000 Vietnam
  • [8] Department of Engineering Systems and Environment, University of Virginia, Charlottesville, VA 22904, USA
  • [a] Former affiliation: Department of Engineering Systems and Environment, University of Virginia, Charlottesville, VA 22904, USA

Abstract. The majority of ungauged regions around the world are in protected areas and rivers with non-perennial flow regimes, which are vital to water security and conservation. Ground data are limited in such regions, making it difficult to obtain streamflow information. This study examines how in situ streamflow datasets in data-rich regions can be used to extrapolate streamflow information into regions with poor data availability. The data-rich regions are North America (987 catchments), South America (813 catchments), and Western Europe (457 catchments). South Africa and Central Asia are defined as data-poor regions. We obtained 81 and 133 catchments for these two data-poor regions, respectively, and treated them as pseudo-ungauged regions for our analysis. We trained machine learning (ML) algorithms using climate and catchment attributes as input variables in data-rich (i.e., source) regions and analyzed whether these pre-trained ML models can be used to estimate climatological monthly streamflow over data-poor (i.e., target) regions. We found that including diverse climate and catchment attributes in the training datasets can greatly improve the ML algorithms' performance, regardless of significant geographical distance between the source and target regions. The ML models pre-trained over North America and South America could be used effectively to estimate streamflow over data-poor regions. This study provides insight into the selection of input datasets and ML algorithms with different sets of hyperparameters for geographic streamflow extrapolation.
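The source-to-target workflow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the three attribute columns are hypothetical, ridge regression stands in for the ML algorithms (which the abstract does not name), and the Nash-Sutcliffe efficiency is used as a common hydrological skill score that the abstract does not itself mention. Only the catchment counts (987 source, 81 target) are borrowed from the text.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Fit ridge regression with an unpenalized intercept (stored last)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(weights, X):
    """Apply fitted weights to a new attribute matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ weights

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals predicting the mean."""
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance

def make_region(rng, n_catchments, attr_shift):
    """Synthetic region: three hypothetical climate/catchment attributes."""
    X = rng.normal(loc=attr_shift, scale=1.0, size=(n_catchments, 3))
    noise = rng.normal(0.0, 0.1, n_catchments)
    flow = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + noise
    return X, flow

rng = np.random.default_rng(0)
# Source (data-rich) region used for training.
X_source, y_source = make_region(rng, 987, attr_shift=0.0)
# Target (pseudo-ungauged) region with shifted attributes; its observed
# flow is held out and used only for evaluation.
X_target, y_target = make_region(rng, 81, attr_shift=0.5)

weights = fit_ridge(X_source, y_source)
target_nse = nse(y_target, predict(weights, X_target))
print(f"NSE on pseudo-ungauged target region: {target_nse:.3f}")
```

Because the synthetic target region follows the same attribute-to-flow relationship as the source, the transfer works well here; real regions need not share that relationship, which is the uncertainty the study sets out to quantify.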


Status: open (until 04 Nov 2022)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on hess-2022-320', Alex Sun, 14 Sep 2022



Total article views: 473 (including HTML, PDF, and XML)
  • HTML: 340
  • PDF: 126
  • XML: 7
  • Total: 473
  • Supplement: 18
  • BibTeX: 1
  • EndNote: 3
Views and downloads (calculated since 09 Sep 2022)
Latest update: 28 Sep 2022
Short summary
Limited ground data make streamflow information difficult to obtain in ungauged regions. We demonstrate how data-rich areas (North America, South America, and Western Europe) can provide streamflow information for data-poor areas (South Africa, Central Asia). Using machine learning algorithms, we found that training on diverse climate and catchment attributes improves this transfer. We also examine the uncertainty associated with geographic extrapolation.