This work is distributed under the Creative Commons Attribution 4.0 License.
Metamorphic Testing of Machine Learning and Conceptual Hydrologic Models
Peter Reichert
Kai Ma
Marvin Höge
Fabrizio Fenicia
Marco Baity-Jesi
Dapeng Feng
Chaopeng Shen
Abstract. Predicting the response of hydrologic systems to modified driving forces, beyond patterns that have occurred in the past, is of high importance for estimating climate change impacts or the effect of management measures. Such predictions require a model, but the impossibility of testing them against observed data makes it difficult to estimate their reliability. Metamorphic testing offers a methodology for assessing models beyond validation with real data. It consists of defining input changes for which the expected responses are assumed to be known, at least qualitatively, and testing model behavior for consistency with these expectations. To increase the information gained and reduce the subjectivity of this approach, we extend the methodology to a multi-model approach and include a sensitivity analysis of the predictions to training or calibration options. This allows us to quantitatively analyse differences in predictions between different model structures and calibration options, in addition to the qualitative test against expectations. In our case study, we apply this approach to selected conceptual and machine learning hydrological models calibrated to basins from the CAMELS data set. Our results confirm the superiority of the machine learning models over the conceptual hydrologic models regarding the quality of fit during calibration and validation periods. However, we also find that the response of machine learning models to modified inputs can deviate from expectations, and that the magnitude and even the sign of the response can depend on the training data. In addition, even in cases in which all models passed the metamorphic test, the quantitative response can differ between model structures. This demonstrates the importance of this kind of testing, beyond the usual calibration-validation analysis, to identify potential problems and stimulate the development of improved models.
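The metamorphic testing idea described in the abstract can be illustrated with a minimal sketch. The model and function names below are hypothetical: `linear_reservoir` is a toy single-reservoir stand-in, not one of the conceptual or machine learning models used in the paper, and the metamorphic relation (scaling precipitation up should not decrease long-term mean simulated streamflow) is just one plausible example of the qualitative expectations the authors test.

```python
import numpy as np

def linear_reservoir(precip, k=0.1, s0=0.0):
    """Toy conceptual model: a single linear reservoir.

    Hypothetical stand-in for a calibrated hydrologic or ML model;
    returns simulated streamflow for a daily precipitation series.
    """
    s = s0
    q = np.empty_like(precip, dtype=float)
    for t, p in enumerate(precip):
        s += p            # add precipitation to storage
        q[t] = k * s      # outflow proportional to storage
        s -= q[t]         # drain the reservoir
    return q

def metamorphic_test_precip_scaling(model, precip, factor=1.1):
    """Metamorphic relation: increasing precipitation by `factor`
    should not decrease mean simulated streamflow."""
    q_base = model(precip)
    q_mod = model(precip * factor)
    return q_mod.mean() >= q_base.mean()

# Synthetic one-year daily precipitation series for illustration.
rng = np.random.default_rng(0)
precip = rng.exponential(2.0, size=365)

print(metamorphic_test_precip_scaling(linear_reservoir, precip))
```

In the paper's multi-model extension, such a relation would be evaluated for each model structure and each training/calibration option, comparing not only pass/fail but also the magnitude of the quantitative responses.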
Status: open (until 21 Oct 2023)
CC1: 'Comment on hess-2023-168', Scott Steinschneider, 29 Jul 2023
I read this paper with interest, as I agree that such metamorphic tests on ML hydrologic models are needed to assess their appropriateness for certain hydrologic modeling applications like projections under climate change.
My main comment is that I think the authors could contextualize their study with past work that has conducted a similar exploration. The first paper that I am aware of which attempted a metamorphic test on an LSTM was Razavi (2021) (see their Figure 11). They only considered an LSTM fit to one site, so there are limitations to that work, but I think it is important to recognize it. Afterwards, Wi and Steinschneider (2022) conducted a metamorphic test similar to that in the present study, using 1) an LSTM and physics-informed LSTMs fit to 15 sites across California, and 2) an LSTM fit across the entire CAMELS dataset. They found challenges with LSTM projections under warming related to those found in this work.
Therefore, I recommend that the authors adjust their Introduction to recognize these past studies, and then articulate how their work provides a contribution beyond them. I believe this is very straightforward, as the present study 1) also considers changes in precipitation; 2) explores responses separately by basin elevation and temperature; and 3) explores sensitivity to calibration choices (this latter point was particularly helpful to see). In addition, I might adjust the Summary and Conclusion to discuss the results of the present study in comparison to the metamorphic results seen in Wi and Steinschneider (2022), in order to help synthesize related results in the literature.
References
Razavi, S. (2021). Deep learning, explained: Fundamentals, explainability, and bridgeability to process-based modelling, Environmental Modelling and Software, 105159, https://doi.org/10.1016/j.envsoft.2021.105159.
Wi, S., & Steinschneider, S. (2022). Assessing the physical realism of deep learning hydrologic model projections under climate change. Water Resources Research, 58, e2022WR032123. https://doi.org/10.1029/2022WR032123
Citation: https://doi.org/10.5194/hess-2023-168-CC1
CC2: 'Reply on CC1', Peter Reichert, 03 Aug 2023
Thank you very much for these hints. We will expand our introduction and conclusions as recommended.
Citation: https://doi.org/10.5194/hess-2023-168-CC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 460 | 158 | 12 | 630 | 25 | 5 | 5 |