Preprints
https://doi.org/10.5194/hess-2024-279
18 Sep 2024
Status: this preprint is currently under review for the journal HESS.

Technical note: How many models do we need to simulate hydrologic processes across large geographical domains?

Wouter J. M. Knoben, Ashwin Raman, Gaby J. Gründemann, Mukesh Kumar, Alain Pietroniro, Chaopeng Shen, Yalan Song, Cyril Thébault, Katie van Werkhoven, Andrew W. Wood, and Martyn P. Clark

Abstract. Robust large-domain predictions of water availability and threats require models that work well across the different basins in the model domain. It is currently common to express a model's accuracy through aggregated efficiency scores such as the Nash-Sutcliffe Efficiency and Kling-Gupta Efficiency, and these scores often form the basis for selecting among competing models. However, recent work has shown that such scores are subject to considerable sampling uncertainty: the exact selection of time steps used to calculate the scores can have large impacts on the scores obtained. Here we explicitly account for this sampling uncertainty to determine the number of models needed to simulate hydrologic processes across large spatial domains. Using a selection of 36 conceptual models and 559 basins, our results show that model equifinality, the fact that very different models can produce simulations with very similar accuracy, makes it very difficult to unambiguously select one model over another. If models were selected based on their validation KGE scores alone, almost every model would be selected as the best model in at least some basins. When sampling uncertainty is accounted for, this number drops: 4 models suffice to cover 95% of the investigated basins, and 10 models cover all basins. We obtain similar conclusions for an objective function focused on low flows. These results suggest that, under conditions typical of many current modeling studies, there is limited evidence that using a wide variety of different models leads to appreciable differences in simulation accuracy compared to using a smaller number of carefully chosen models.
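For reference, the two efficiency scores named in the abstract, and the sampling-uncertainty idea of resampling the time steps used to compute a score, can be sketched as follows. This is a minimal NumPy illustration under standard definitions (Nash and Sutcliffe, 1970; Gupta et al., 2009), not the authors' code; function names and the bootstrap settings are illustrative choices.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 matches a mean-flow benchmark.
    Undefined for a constant observation series (zero variance)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency: 1 is perfect."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]    # linear correlation component
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def kge_bootstrap(obs, sim, n_samples=1000, seed=0):
    """Sampling distribution of KGE: recompute the score on bootstrap
    resamples of the time steps (paired obs/sim draws with replacement)."""
    rng = np.random.default_rng(seed)
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    idx = rng.integers(0, len(obs), size=(n_samples, len(obs)))
    return np.array([kge(obs[i], sim[i]) for i in idx])
```

The spread of the bootstrap scores, rather than the single aggregated value, is what indicates whether one model can be distinguished from another on a given basin.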

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 20 Nov 2024)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2024-279', Anonymous Referee #1, 15 Oct 2024

Viewed

Total article views: 405 (including HTML, PDF, and XML)
  • HTML: 334
  • PDF: 63
  • XML: 8
  • Total: 405
  • BibTeX: 3
  • EndNote: 4
Views and downloads (calculated since 18 Sep 2024)

Viewed (geographical distribution)

Total article views: 388 (including HTML, PDF, and XML) Thereof 388 with geography defined and 0 with unknown origin.
Latest update: 17 Nov 2024
Short summary
Hydrologic models are needed to provide simulations of water availability, floods and droughts. The accuracy of these simulations is often quantified with so-called performance scores. A common thought is that different models are more or less applicable to different landscapes, depending on how the model works. We show that performance scores are not helpful in distinguishing between different models, and thus cannot easily be used to select an appropriate model for a specific place.