26 Jul 2023
Status: this preprint is currently under review for the journal HESS.

Metamorphic Testing of Machine Learning and Conceptual Hydrologic Models

Peter Reichert, Kai Ma, Marvin Höge, Fabrizio Fenicia, Marco Baity-Jesi, Dapeng Feng, and Chaopeng Shen

Abstract. Predicting the response of hydrologic systems to modified driving forces, beyond patterns that have occurred in the past, is of high importance for estimating climate change impacts or the effect of management measures. Such predictions require a model, but because they cannot be tested against observed data, their reliability remains difficult to estimate. Metamorphic testing offers a methodology for assessing models beyond validation with real data. It consists of defining input changes for which the expected responses are assumed to be known at least qualitatively, and testing model behavior for consistency with these expectations. To increase the information gained and reduce the subjectivity of this approach, we extend this methodology to a multi-model approach and include a sensitivity analysis of the predictions to training or calibration options. This allows us to quantitatively analyse differences in predictions between different model structures and calibration options, in addition to qualitatively testing them against the expectations. In our case study, we apply this approach to selected conceptual and machine learning hydrological models calibrated to basins from the CAMELS data set. Our results confirm the superiority of the machine learning models over the conceptual hydrologic models regarding the quality of fit during calibration and validation periods. However, we also find that the response of machine learning models to modified inputs can deviate from the expectations, and that the magnitude and even the sign of the response can depend on the training data. In addition, even when all models passed the metamorphic test, the quantitative response sometimes differed between model structures. This demonstrates the importance of this kind of testing beyond the usual calibration-validation analysis, both to identify potential problems and to stimulate the development of improved models.
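The metamorphic test described in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: it assumes that a model is any function mapping daily precipitation and temperature series to a discharge series, it uses two hypothetical toy models (`toy_bucket`, `toy_regression`) in place of the conceptual and machine learning models of the study, and it assumes two qualitative expectations (10 % more precipitation should increase total discharge; warming by 2 °C should decrease it through higher evapotranspiration). The forcing data are made up. Applying the same input change to several models and reporting the relative change in discharge mirrors, in a very reduced form, the multi-model comparison.

```python
from typing import Callable, Dict, List, Sequence

# Assumed model interface: (precipitation [mm/d], temperature [degC]) -> discharge [mm/d]
Model = Callable[[Sequence[float], Sequence[float]], List[float]]


def toy_bucket(precip: Sequence[float], temp: Sequence[float]) -> List[float]:
    """Hypothetical single-reservoir model; evapotranspiration rises with temperature."""
    storage, discharge = 0.0, []
    for p, t in zip(precip, temp):
        # Hypothetical ET rule: 0.1 mm/d per degC, limited by available water.
        et = min(storage + p, max(0.0, 0.1 * t))
        storage += p - et
        q = 0.2 * storage  # linear reservoir outflow
        storage -= q
        discharge.append(q)
    return discharge


def toy_regression(precip: Sequence[float], temp: Sequence[float]) -> List[float]:
    """Hypothetical 'data-driven' stand-in with a positive temperature coefficient."""
    return [max(0.0, 0.3 * p + 0.05 * t) for p, t in zip(precip, temp)]


def response(model: Model, precip: Sequence[float], temp: Sequence[float],
             p_factor: float = 1.0, dt: float = 0.0) -> float:
    """Relative change in total discharge after scaling precipitation and
    shifting temperature (assumes a non-zero baseline discharge)."""
    base = sum(model(precip, temp))
    modified = sum(model([p * p_factor for p in precip], [t + dt for t in temp]))
    return (modified - base) / base


def metamorphic_tests(model: Model, precip: Sequence[float],
                      temp: Sequence[float]) -> Dict[str, bool]:
    """Check the sign of the response against assumed qualitative expectations."""
    return {
        "precip +10%": response(model, precip, temp, p_factor=1.1) > 0,  # expect more discharge
        "temp +2 degC": response(model, precip, temp, dt=2.0) < 0,  # expect less discharge
    }


if __name__ == "__main__":
    # Made-up 60-day forcing, for illustration only.
    precip = [5.0, 0.0, 10.0, 2.0, 0.0, 8.0] * 10
    temp = [10.0, 12.0, 8.0, 15.0, 20.0, 5.0] * 10
    # Multi-model view: the same tests and input change are applied to every model.
    for name, model in {"bucket": toy_bucket, "regression": toy_regression}.items():
        dq = response(model, precip, temp, dt=2.0)
        print(f"{name}: {metamorphic_tests(model, precip, temp)}, "
              f"temp +2 degC -> {dq:+.1%} discharge")
```

With these synthetic inputs, the bucket model passes both tests, while the regression stand-in fails the temperature test because its temperature coefficient is positive by construction. The sign check is the qualitative metamorphic test; the printed relative change is the quantitative quantity that can be compared across model structures.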

Peter Reichert et al.

Status: open (until 21 Oct 2023)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on hess-2023-168', Scott Steinschneider, 29 Jul 2023
    • CC2: 'Reply on CC1', Peter Reichert, 03 Aug 2023

Total article views: 630 (including HTML, PDF, and XML), calculated since 26 Jul 2023
  • HTML: 460
  • PDF: 158
  • XML: 12
  • Supplement: 25
  • BibTeX: 5
  • EndNote: 5

Viewed (geographical distribution)

Total article views: 608 (including HTML, PDF, and XML). Of these, 608 have a defined geographic origin and 0 have an unknown origin.
Latest update: 28 Sep 2023
Short summary
We compared the changes in catchment outlet discharge predicted by conceptual and machine-learning hydrological models in response to changes in precipitation and temperature. We found that machine-learning models, despite providing excellent fit and prediction capabilities, can be unreliable in predicting the effect of temperature change in low-elevation catchments. This indicates the need for caution when applying them to predict the effects of climate change.