Articles | Volume 30, issue 8
https://doi.org/10.5194/hess-30-2337-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: High Nash–Sutcliffe Efficiencies conceal poor simulations of interannual variance in seasonal regimes
Download
- Final revised paper (published on 23 Apr 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 20 Oct 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-3851', Anonymous Referee #1, 26 Nov 2025
- AC2: 'Reply on RC1', Sacha Ruzzante, 19 Dec 2025
- RC2: 'Comment on egusphere-2025-3851', Anonymous Referee #2, 02 Dec 2025
- AC1: 'Reply on RC2', Sacha Ruzzante, 19 Dec 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to revisions (further review by editor and referees) (13 Jan 2026) by Elena Toth
AR by Sacha Ruzzante on behalf of the Authors (20 Feb 2026)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (07 Mar 2026) by Elena Toth
RR by Anonymous Referee #2 (10 Mar 2026)
RR by Anonymous Referee #1 (20 Mar 2026)
ED: Publish subject to technical corrections (21 Mar 2026) by Elena Toth
AR by Sacha Ruzzante on behalf of the Authors (09 Apr 2026)
Author's response
Manuscript
The authors address a very important and, fortunately, increasingly discussed topic: should we blindly trust our traditional performance metrics for hydrological modelling? Among other very interesting insights, they discuss a sad (although necessary) truth: high NSE (or even KGE) values do not necessarily mean that the simulations are adequate. This underlines the need for us, as modellers, to improve our optimization metrics. The paper is definitely a fit for HESS and should be published, but, as is to be expected, some concerns should be clarified/corrected/improved first, alongside several suggestions.
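To make this central point concrete, the following is a purely synthetic sketch (made-up flows and amplitudes, not the paper's data, models, or method): a "simulation" that reproduces only the long-term mean seasonal regime, and none of the year-to-year variability, can still score a high NSE when the seasonal cycle dominates the variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily flow for 20 years: a strong seasonal cycle whose
# amplitude varies from year to year (the interannual variance).
n_years, n_days = 20, 365
doy = np.arange(n_days)
seasonal = 1.0 + 0.8 * np.sin(2 * np.pi * doy / n_days)
amplitude = rng.normal(1.0, 0.15, size=n_years)
obs = np.concatenate([a * seasonal for a in amplitude])

# A "simulation" that only reproduces the mean seasonal regime:
# every year is identical, so interannual variability is zero.
sim = np.tile(seasonal, n_years)

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of obs about its mean."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# High NSE despite the simulation carrying no interannual signal at all.
print(f"NSE of climatology-only simulation: {nse(obs, sim):.3f}")
```

Because the seasonal cycle explains most of the total variance, the NSE stays high even though the simulated annual series are identical every year; this is exactly the failure mode the title describes.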
1. I believe that the methodology used for the time-series decomposition needs to be explained in more detail; if needed, the authors could make use of an Appendix or the Supporting Information. This is a crucial part of the paper and must be easy for readers to follow.
2. Related to this, I feel the authors could better justify the choice of decomposition. Was it motivated by previous work? Are there further references? This should be made clear in the text.
3. The authors call the seasonal component the long-term seasonality of the basins. Our rivers are undergoing change, and seasonality is consequently shifting in many of them. I think this deserves fuller treatment in the text. I understand the choice (L85-89), and I believe much of the change is captured in the irregular component, but the text would benefit from some clarification of these choices.
4. Simulations: If I understood correctly, the authors used simulated data from several models (and in one case ran the simulations themselves). Did the authors check the calibration/evaluation/test periods of all the models, or check for an overlapping period? Did they use only what was classified as the test period? My main concern is that, during the model comparison, the authors might be using streamflow simulated in the test period for some models but in the calibration period for others, or mixing single-basin and regional simulations. I see no problem in using different settings, but this needs to be reported and discussed extensively in the results. For example, I have the feeling that for the PREVAH-CH simulations the authors may have used the entire simulation (including calibration) rather than only the evaluation period (I might be wrong). My suggestion is to review these aspects and incorporate this information into the manuscript.
5. Regarding Figure 3 (and also L275 onwards): were the models that performed better for highly seasonal catchments the ones with the lowest performance overall, or is that just my impression? I think this should be discussed more thoroughly, perhaps by showing median performances, or a box plot in an appendix: something to clarify whether the models that do better on the seasonal component simply had poor overall performance. Also, touching on point 4: how were these simulations obtained by the original authors? Did they report them for the evaluation phase, or are they actually for the calibration period? This would be worth clarifying for the readers.
6. L328-332: may need to be rephrased after reviewing points 4 and 5.
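Points 1-3 concern the time-series decomposition into a long-term seasonal component and an irregular remainder. As a hedged illustration only (a generic day-of-year climatology split; the paper's own decomposition, described around L85-89, may differ in detail), one common way to perform such a split is:

```python
import numpy as np

def decompose(daily_flow):
    """Split a (years, 365) array of daily flows into a long-term seasonal
    component (the day-of-year mean across all years) and an irregular
    residual. A generic climatology split, not necessarily the paper's
    exact method.
    """
    flow = np.asarray(daily_flow, dtype=float)
    seasonal = flow.mean(axis=0)      # one value per day of year
    irregular = flow - seasonal       # year-by-year departures
    return seasonal, irregular

# Demo on synthetic flows: 10 years of noisy seasonal discharge.
rng = np.random.default_rng(0)
doy = np.arange(365)
flow = 1.0 + 0.5 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 0.1, (10, 365))
seasonal, irregular = decompose(flow)
# seasonal + irregular reconstructs the original series exactly.
```

Under this construction the seasonal component is time-invariant by design, so any change in seasonality over the record necessarily ends up in the irregular component, which is the trade-off point 3 asks the authors to clarify.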