09 Aug 2022
Status: a revised version of this preprint is currently under review for the journal HESS.

Benchmarking High-Resolution, Hydrologic Performance of Long-Term Retrospectives in the United States

Erin Towler¹, Sydney S. Foks², Aubrey L. Dugger¹, Jesse E. Dickinson³, Hedeff I. Essaid⁴, David Gochis¹, Roland J. Viger², and Yongxin Zhang¹
  • ¹National Center for Atmospheric Research (NCAR), Boulder, CO, USA
  • ²U.S. Geological Survey (USGS), Lakewood, CO, USA
  • ³U.S. Geological Survey, Arizona Water Science Center, Tucson, AZ, USA
  • ⁴U.S. Geological Survey, Moffett Field, CA, USA

Abstract. As high-resolution hydrologic models become more widespread, there is a pressing need for systematic evaluation and documentation of their performance. This paper develops and demonstrates a benchmark statistical design that evaluates the long-term performance of two process-oriented, high-resolution, continental-scale hydrologic models that have been developed to assess water availability and risks in the United States (US): the National Water Model v2.1 application of WRF-Hydro (NWMv2.1) and the National Hydrologic Model v1.0 application of the Precipitation-Runoff Modeling System (NHMv1.0). The evaluation is performed on 5,390 streamflow gages from 1983 to 2016 (~33 years) at a daily time step, including both natural and human-impacted catchments, representing one of the most comprehensive evaluations over the conterminous US. The benchmark consists of a suite of metrics for overall performance, the components of those metrics, and hydrologic-specific signatures. Overall, the model applications show similar performance, with better performance at sites that are less disturbed by human activities, particularly in the West. Both model applications exhibit better performance in the Northeast, Southeast, Pacific Northwest, and high-elevation sites in the West. Relatively worse performance is found in the Central region, Southwest, and lower-elevation West. Both models overestimate streamflow volumes at disturbed gages in the West, which could be attributed to not accounting for human activities, such as active management. Both models underestimate flow variability, especially the highest flows. The model applications differ in their estimation of low flows, with consistent overestimation by the NWMv2.1 and both over- and under-estimation by the NHMv1.0. This benchmark provides a baseline to document performance and measure the evolution of each model application.
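To make the two headline findings concrete (overestimated volumes and underestimated variability), the following is a minimal sketch of how such component metrics could be computed for one gage from paired daily simulated and observed flows. The abstract does not name the specific metrics in the benchmark suite, so percent bias and the ratio of standard deviations are assumed stand-ins here, and the function names `percent_bias`, `variability_ratio`, and `benchmark_gage` are hypothetical.

```python
import numpy as np

def percent_bias(sim, obs):
    """Percent difference in total volume; positive means the model overestimates."""
    return 100.0 * (np.sum(sim) - np.sum(obs)) / np.sum(obs)

def variability_ratio(sim, obs):
    """Ratio of simulated to observed standard deviation; below 1 means
    the model underestimates flow variability."""
    return np.std(sim) / np.std(obs)

def benchmark_gage(sim, obs):
    """Compute both metrics for one gage (assumed metric choice, not the paper's suite).

    Days where either series is missing (NaN) are dropped pairwise first.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~(np.isnan(sim) | np.isnan(obs))
    sim, obs = sim[valid], obs[valid]
    return {
        "pbias": percent_bias(sim, obs),
        "sd_ratio": variability_ratio(sim, obs),
    }

# Illustrative (made-up) daily flows for a single gage, with one missing observation.
observed = [1.0, 2.0, 3.0, 4.0, np.nan]
simulated = [2.0, 4.0, 6.0, 8.0, 10.0]
print(benchmark_gage(simulated, observed))
```

Applied per gage and then summarized by region or by disturbance class, metrics of this kind would support the comparisons described in the abstract. The sketch assumes the observed total volume and standard deviation are nonzero.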

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2022-276', Anonymous Referee #1, 11 Aug 2022
    • AC1: 'Reply on RC1', Erin Towler, 16 Sep 2022
    • AC2: 'General Response to All Reviewers', Erin Towler, 15 Nov 2022
    • AC3: 'RC1 Point-by-Point Response', Erin Towler, 15 Nov 2022
  • RC2: 'Comment on hess-2022-276', Robert Chlumsky, 29 Sep 2022
  • RC3: 'Comment on hess-2022-276', Anonymous Referee #3, 03 Oct 2022

Total article views: 900 (including HTML, PDF, and XML)
  • HTML: 659
  • PDF: 220
  • XML: 21
  • Total: 900
  • Supplement: 40
  • BibTeX: 5
  • EndNote: 6
Views and downloads (calculated since 09 Aug 2022)

Viewed (geographical distribution)

Total article views: 844 (including HTML, PDF, and XML) Thereof 844 with geography defined and 0 with unknown origin.
Latest update: 28 Jan 2023
Short summary
Models that have been developed to assess water availability and risks need to be evaluated for skill and utility. This study documents baseline performance for two hydrologic models that simulate streamflow across the continental United States. Both models show similar overall performance, and better performance at sites that are less disturbed by human activities. The models differ most in their estimates of low streamflows.