Evaluation of water flux predictive models developed using eddy covariance observations and machine learning: a meta-analysis
- 1State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, Xinjiang, 830011, China
- 2University of Chinese Academy of Sciences, 19 (A) Yuquan Road, Beijing, 100049, China
- 3Research Centre for Ecology and Environment of Central Asia, Chinese Academy of Sciences, Urumqi, China
- 4Department of Geography, Ghent University, Ghent 9000, Belgium
- 5Sino-Belgian Joint Laboratory of Geo-Information, Ghent, Belgium and Urumqi, China
- 6Department of Computer Vision & Remote Sensing, Technische Universität Berlin, 10587 Berlin, Germany
Abstract. With the rapid accumulation of water flux observations from global eddy-covariance flux sites, many studies have used data-driven approaches to model site-scale water fluxes, employing various predictors and machine learning algorithms. However, systematic evaluation of such models is still limited. We therefore performed a meta-analysis of 32 such studies, derived 139 model records, and evaluated the impact of various features on model accuracy throughout the modeling workflow. Support vector machines (SVM; average R-squared = 0.82) and random forests (RF; average R-squared = 0.81) outperformed the other evaluated algorithms in both cross-study and intra-study (with the same training dataset) comparisons. The average accuracy of models applied to arid regions was higher than that for other climate classes. The average model accuracy was slightly lower for forest sites (average R-squared = 0.76) than for cropland and grassland sites (average R-squared = 0.80 and 0.79, respectively), but higher than for shrub sites (average R-squared = 0.67). Among the various predictor variables, the use of net/solar radiation, precipitation, air temperature, and the fraction of absorbed photosynthetically active radiation improved model accuracy. Among the different validation methods, random cross-validation yielded higher model accuracy than spatial or temporal cross-validation, but spatial cross-validation is more important when water flux predictive models are used for spatial extrapolation. The findings of this study can help guide future research on such machine learning-based modeling.
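The three validation schemes compared in the abstract differ only in how samples are assigned to training and test sets. The following is a minimal sketch of that difference, not the procedure of any reviewed study: the function names, the leave-one-site-out variant of spatial cross-validation, and the single-cutoff variant of temporal validation are all assumptions chosen for illustration.

```python
import random


def random_kfold(n_samples, k, seed=0):
    """Random K-fold: indices are shuffled, so samples from the same
    site can appear in both the training and the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


def leave_one_site_out(site_ids):
    """Spatial cross-validation (illustrative variant): each test fold
    holds all samples of one site, so the model is scored on a site
    it never saw during training."""
    splits = []
    for site in sorted(set(site_ids)):
        test = [i for i, s in enumerate(site_ids) if s == site]
        train = [i for i, s in enumerate(site_ids) if s != site]
        splits.append((train, test))
    return splits


def temporal_split(timestamps, cutoff):
    """Temporal validation (illustrative variant): train on samples
    before the cutoff, test on samples at or after it."""
    train = [i for i, t in enumerate(timestamps) if t < cutoff]
    test = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train, test
```

Because random splitting lets a model see other samples from the test site during training, its scores are not a direct measure of how well the model extrapolates to unobserved locations, which is the case the abstract highlights for spatial cross-validation.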
Haiyang Shi et al.
Status: open (until 27 May 2022)
RC1: 'Comment on hess-2022-90', Anonymous Referee #1, 07 May 2022
The authors conducted a meta-analysis to evaluate the performance of machine learning (ML) algorithms in the estimation of evapotranspiration. I believe this topic is timely and of interest to the HESS community. The motivation of the study, method, and results are clearly outlined, and they reach clear conclusions. Overall, this manuscript is informative and well structured. However, I believe there are several minor aspects which can be improved. Therefore, I support its publication in HESS with minor revisions.
1) L34 “ET is the most important indicator of the water cycle”: ET is not an indicator. It is a water balance component. Also, it may not be the most important component. I suggest writing “ET is one of the most important components of the water cycle ~”
2) L51-53: add examples and references to support the argument.
3) L82: define NDVI, EVI and LAI.
4) L83: define GPP
5) L153-155: I agree with the authors' point, but RMSE is still an important measure of model performance. I think there is a way to normalize the RMSE when the magnitude or standard deviation of the water flux is available. If possible, I recommend analyzing RMSE as well.
6) L225-229, Figure 5, and Figure 7: I think the authors should discuss the variables that decrease the performance of the ML models (NDVI etc.). To do this, the authors may need to refer to Figure 7. Therefore, I suggest reordering the figures (i.e., 7 -> 6 and 6 -> 7). Figure 7 implies that the performance decreases attributed to NDVI (and other variables) may be spurious. To overcome this limitation, I suggest performing an additional analysis by grouping the ML models that use Rn/Rs and Ta and then regenerating Figure 5.
7) Figure 5: it is difficult to compare variables. I think the visualization can be improved by grouping the variables according to whether or not they improve performance.
8) L261-263: I cannot agree. Data-driven and process-based approaches are complementary. This should be revised.
9) L336-338: As the authors briefly mention here, eddy covariance observations are subject to random, gap-filling, and systematic (energy balance closure) uncertainty. There are several ways to address this uncertainty. For example, some studies may use a gap-filled dataset, whereas others may use observations only. Also, the energy balance closure problem can be addressed differently (uncorrected, Bowen-ratio corrected, or use of the energy balance residual). Depending on this choice, the performance of ML algorithms may vary significantly (the energy balance closure problem is particularly important). Although the authors mention observational uncertainty as a limitation of this research in L336-338, I believe this brief mention is not enough. If you can extract this information from the literature, I suggest performing an additional analysis (e.g., a performance comparison for energy-balance-corrected vs. uncorrected data). If it is indeed difficult to extract this information from the literature, the topic should at least be discussed more thoroughly.
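As an illustration of the normalization suggested in comment 5, RMSE can be divided by the standard deviation or by the mean of the observed flux so that values become comparable across sites with different flux magnitudes. This is only a minimal sketch: the function names and the `by` parameter are hypothetical and are not taken from the manuscript or from any reviewed study.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_rmse(observed, predicted, by="std"):
    """RMSE divided by a scale of the observed flux.

    by="std"  -> population standard deviation of the observations
    by="mean" -> mean of the observations
    Both options are illustrative choices, not a prescribed standard.
    """
    if by == "std":
        scale = statistics.pstdev(observed)
    elif by == "mean":
        scale = statistics.fmean(observed)
    else:
        raise ValueError("by must be 'std' or 'mean'")
    if scale == 0:
        raise ValueError("scale is zero; normalized RMSE is undefined")
    return rmse(observed, predicted) / scale
```

In a meta-analysis setting, the same division could be applied to a reported RMSE whenever a study also reports the standard deviation or mean of the observed water flux, without access to the raw data.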