Technical note: PMR – a proxy metric to assess hydrological model robustness in a changing climate

Royer-Gaspard, Paul; Andréassian, Vazken; Thirel, Guillaume

doi:https://doi.org/10.5194/hess-25-5703-2021

Articles | Volume 25, issue 11

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Technical note | 08 Nov 2021

Final revised paper (published on 08 Nov 2021)
Preprint (discussion started on 04 Feb 2021)

Interactive discussion

Status: closed

RC1:
'Comment on hess-2021-58', Anonymous Referee #1, 22 Mar 2021

I have completed my review of the technical note “PMR - a proxy metric to assess hydrological model robustness in a changing climate”, by P. Royer-Gaspard et al., submitted to HESS. The paper presents a metric for evaluating model robustness in estimating flow volumes over disparate historical climate conditions. It then applies this metric to a set of 337 catchment models and compares it to the conventional differential split-sample test (DSST) approach.

The paper is concise, clear, well-written and well-organized. The readily calculated metric has the potential to be of use in model intercomparison studies, for characterizing model robustness in model selection, and potentially for multi-objective calibration. I recommend acceptance subject to moderate revision, as outlined below. I have also included minor comments in the supplemental document as a PDF.

1) My one main issue with the evaluation approach used here is the exclusive use of the absolute model bias from the DSST as a ‘default’ indicator of robustness, with the expectation that if the PMR metric is correlated to the absolute model bias (determined from DSST testing), then the PMR is an adequate proxy for robustness. The problem lies in the use of absolute model bias, which I will illustrate with an example. In a standard DSST, the model is calibrated on one period of the historical record and validated on another. Performance is deemed “robust” if it is minimally sensitive to the characteristics of the calibration and validation periods. For instance, if a model calibrated during wet years and validated during dry years exhibits similar validation performance (in terms of NSE, KGE, bias, etc.) to the same model calibrated during dry years and validated during wet years, then it would be deemed robust to changes in climate. Thus, if these two model configurations both had a percent bias of 20%, the model is robust to changes in climate, even if not particularly accurate. If one model configuration had a percent bias of 20% in the validation period and the other one of -20%, then the model is not robust: it exhibits strong sensitivity to climate conditions. However, this sensitivity would not be picked up in a comparison of absolute model bias as calculated using equation 2, nor is it fully picked up by the raw value of model bias in validation, which is a measure of accuracy rather than robustness (though I recognize that a robust model should ideally minimize the variance of this model bias on an annual basis). A better indicator of robustness in this context might be the absolute difference in bias exhibited by the two alternate configurations of the model, e.g.,

$$\left|\frac{\bar{Q}_{sim,i}}{\bar{Q}_{obs,i}} - \frac{\bar{Q}_{sim,j}}{\bar{Q}_{obs,j}}\right|$$

where i and j denote the dry/humid or warm/cold sub-periods. While I am not averse to the additional comparisons made to the absolute bias metrics, these are not themselves particularly strong indicators of robustness because they do not compare two different climate conditions, which is the whole value of the DSST. I think that the authors therefore need to use a more appropriate DSST-derived robustness metric (such as this one) as an additional basis for comparison. Because they have already done the analysis and would only have to post-process model results, I hope that such an addition would be relatively straightforward, and it could add much to the paper.
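To make the referee's argument concrete, the Python sketch below contrasts a single-period absolute bias with the proposed between-period indicator. The form |Q̄sim/Q̄obs − 1| used for the absolute bias is an assumption standing in for the paper's equation 2 (not reproduced here), and the mean flows are hypothetical values chosen to mirror the ±20% example.

```python
# Sketch: single-period absolute bias vs. the referee's between-period
# bias difference. ASSUMPTION: absolute bias = |Qsim/Qobs - 1|, which may
# differ from the paper's equation 2. All flow values are hypothetical.


def bias_ratio(q_sim_mean: float, q_obs_mean: float) -> float:
    """Mean simulated flow divided by mean observed flow (1.0 means unbiased)."""
    return q_sim_mean / q_obs_mean


def absolute_bias(q_sim_mean: float, q_obs_mean: float) -> float:
    """Assumed absolute bias for one validation period: |Qsim/Qobs - 1|."""
    return abs(bias_ratio(q_sim_mean, q_obs_mean) - 1.0)


def bias_difference(sim_i: float, obs_i: float, sim_j: float, obs_j: float) -> float:
    """Referee's indicator: |Qsim,i/Qobs,i - Qsim,j/Qobs,j| between sub-periods i and j."""
    return abs(bias_ratio(sim_i, obs_i) - bias_ratio(sim_j, obs_j))


if __name__ == "__main__":
    # Hypothetical (sim, obs) mean flows in the dry (i) and wet (j) validation periods.
    model_a = {"dry": (1.2, 1.0), "wet": (1.2, 1.0)}  # +20% in both periods
    model_b = {"dry": (1.2, 1.0), "wet": (0.8, 1.0)}  # +20% dry, -20% wet

    for name, m in (("A", model_a), ("B", model_b)):
        abs_dry = absolute_bias(*m["dry"])
        abs_wet = absolute_bias(*m["wet"])
        diff = bias_difference(*m["dry"], *m["wet"])
        print(name, round(abs_dry, 3), round(abs_wet, 3), round(diff, 3))
```

Both hypothetical models score an absolute bias of 0.2 in each period, but only model B, whose bias changes sign between the dry and wet periods, gets a non-zero bias difference (0.4). That is the climate sensitivity the referee argues absolute bias alone cannot reveal.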

2) I also believe that the authors should make it clear that this metric only addresses one form of model robustness – robustness in estimating annual volumes. Other approaches would be needed to examine robustness with respect to peak flows, baseflows, etc.

3) Lastly, this analysis should really have been carried out in terms of water years rather than Julian years, but I see no reference to this in the text. It would be appreciated if this could be clarified.
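As an illustration of the referee's suggestion, the following Python sketch labels daily flows by water year and averages them per water year. The October start month and the convention of naming a water year after the calendar year in which it ends are assumptions chosen for illustration; the appropriate start depends on the regional flow regime, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def water_year(d: date, start_month: int = 10) -> int:
    """Water year of a date, named after the calendar year in which it ends.

    ASSUMPTION: start_month=10 (October) is illustrative only.
    """
    # With start_month == 1 the water year coincides with the calendar year.
    if start_month > 1 and d.month >= start_month:
        return d.year + 1
    return d.year


def annual_means(daily, start_month: int = 10) -> dict:
    """Average (date, flow) pairs per water year, in chronological order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, q in daily:
        wy = water_year(d, start_month)
        sums[wy] += q
        counts[wy] += 1
    return {wy: sums[wy] / counts[wy] for wy in sorted(sums)}
```

With the default October start, 30 Sep 2020 falls in water year 2020 and 1 Oct 2020 in water year 2021. In practice, incomplete water years at either end of the record would normally be discarded before computing annual biases.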

Citation: https://doi.org/10.5194/hess-2021-58-RC1
- AC2: 'Reply on RC1', Paul Royer-Gaspard, 31 May 2021
  
  The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-58/hess-2021-58-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/hess-2021-58-AC2
- AC3: 'Supplement for the reply on RC1', Paul Royer-Gaspard, 31 May 2021
  
  The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-58/hess-2021-58-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/hess-2021-58-AC3
RC2:
'Comment on hess-2021-58', Anonymous Referee #2, 22 Apr 2021

Firstly, apologies for my delay.

This is an interesting, well written and presented manuscript proposing a simple metric for evaluation of hydrological model robustness to climate variability during the calibration period. Overall I think that the paper is suitable for publication with some minor revisions and clarifications.

First, while the motivation for such an indicator is framed in the context of climate change impact assessment and the simulation of conditions different from those observed, the metric is limited to assessing transferability within the context of observed climate variability/change. In the conclusion it is stated that the new metric can be used to help select models for climate change impact assessment. I think it should be clarified that just because a model is more robust in the period of observations, that does not necessarily mean it will be robust to changes outside the range of variability in those observations.

Upon reading, I was left wondering how the idea of moving biases might help inform model selection. The authors indicated that they saw little clustering of biases by indicators of catchment topography or climate. Did this assessment (not presented) include information on groundwater storage, which may be more challenging for GR4J to capture?

The authors identify sub-periods without any recognition of the drivers of climate variability and their periodicities. Might it be possible to condition the selection of L on predominant modes of variability, e.g. the NAO?

I agree with the other reviewer on the limitation of examining absolute bias and urge the authors to consider the solution offered. I was also a little concerned that using the average to limit the effect of ‘drastically wrong’ years may in fact be ignoring the most informative data points we have. Understanding why such years are so poor surely offers important insight. Perhaps some further discussion of this could be offered.
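The trade-off raised here can be illustrated with a minimal Python sketch of moving-window biases. It assumes annual mean flows as input, a window of L consecutive years, and deviations measured from the whole-period bias ratio; it is written in the spirit of the moving biases discussed above, not as the paper's exact PMR formula, and the function names are hypothetical. Comparing the mean of the absolute deviations with their maximum shows how averaging damps the influence of a few badly simulated windows.

```python
from statistics import mean


def window_biases(sim, obs, window):
    """Bias ratio sum(sim) / sum(obs) over every run of `window` consecutive years.

    With equal-length windows this equals the ratio of mean annual flows.
    """
    n = len(obs)
    if len(sim) != n or not 1 <= window <= n:
        raise ValueError("need equal-length series and 1 <= window <= len(obs)")
    return [sum(sim[k:k + window]) / sum(obs[k:k + window]) for k in range(n - window + 1)]


def moving_bias_spread(sim, obs, window):
    """Mean and maximum absolute deviation of window biases from the whole-period bias.

    ASSUMPTION: illustrative aggregation, not the published PMR definition.
    """
    overall = sum(sim) / sum(obs)
    deviations = [abs(b - overall) for b in window_biases(sim, obs, window)]
    # The mean damps the effect of a few badly simulated windows; the max exposes them.
    return mean(deviations), max(deviations)
```

For example, with simulated annual flows [1, 1, 1, 3], observed flows [1, 1, 1, 1] and L = 1, the whole-period bias ratio is 1.5 and `moving_bias_spread` returns (0.75, 1.5): the single badly simulated year dominates the maximum but is diluted in the mean.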

Does the metric assume that the observational uncertainties are stationary?

Finally, the aim of the modelling exercise is often important to study design. The metric is limited here to assessing annual flows. How might it be used if the emphasis of the modelling study is on low flows or high flows under climate change, for example?

Citation: https://doi.org/10.5194/hess-2021-58-RC2
- AC1: 'Reply on RC2', Paul Royer-Gaspard, 31 May 2021
  
  The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-58/hess-2021-58-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/hess-2021-58-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (07 Jun 2021) by Genevieve Ali

AR by Paul Royer-Gaspard on behalf of the Authors (03 Aug 2021) Author's response Manuscript

EF by Sarah Buchmann (04 Aug 2021) Author's tracked changes

ED: Referee Nomination & Report Request started (12 Aug 2021) by Genevieve Ali

RR by Anonymous Referee #1 (06 Sep 2021)

ED: Publish as is (22 Sep 2021) by Genevieve Ali

AR by Paul Royer-Gaspard on behalf of the Authors (05 Oct 2021)

Short summary

Most evaluation studies based on the differential split-sample test (DSST) endorse the consensus that rainfall–runoff models lack climatic robustness. In this technical note, we propose a new performance metric that evaluates model robustness without applying the DSST and that can be used with a single hydrological model calibration. Our work makes it possible to evaluate the temporal transferability of any hydrological model, including uncalibrated models, at a very low computational cost.