the Creative Commons Attribution 4.0 License.
Comparison of four machine learning models for forecasting daily reference evaporation based on public weather forecast data
Yunfeng Liang
Dongpu Feng
Zhaojun Sun
Abstract. Accurate real-time prediction of daily reference evapotranspiration (ETo) is critical for irrigation decisions and water resource management. Although many machine learning models driven by public weather forecasts have been used successfully to predict daily ETo, these models are developed with long-term historical daily observed meteorological data. Using training and testing samples from different data sources can affect which model is selected as the best, and the selected model may then predict daily ETo poorly. In this study, with daily ETo calculated by the Food and Agriculture Organization (FAO) 56 Penman–Monteith (PM) equation as the reference, four machine learning models (multilayer perceptron (MLPo), extreme gradient boosting (XGBoosto), light gradient boosting machine (LightGBMo), and gradient boosting with categorical features support (CatBoost1o)) were trained and validated with daily observed meteorological data from 1995–2015 and 2016–2019, respectively, and five machine learning models (MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) were trained and validated with daily public weather forecast data with a 1-day lead time from 2014–2018 and 2019, respectively. The daily ETo prediction performance of all nine models (MLPo, XGBoosto, LightGBMo, CatBoost1o, MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) was then compared using public weather forecast data (model inputs) and daily observed meteorological data (reference ETo) from 2020–2021. The results show that in all three studied climate zones (arid, AR; semi-arid, SAR; and semi-humid, SHZ), the four models developed with 1-day lead time public weather forecast data outperform the four models developed with daily observed meteorological data for the corresponding input combinations: across the three climate zones, the mean MAE and RMSE of the four models (MLP, XGBoost, LightGBM, and CatBoost1) were reduced by 2.93 %–11.67 % and 2.20 %–9.46 %, respectively, and the mean R increased by 1.31 %–5.31 %.
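All models are scored against daily ETo from the FAO-56 PM equation. The minimal Python sketch below implements that daily equation (FAO-56 Eq. 6). It assumes net radiation `rn` and actual vapour pressure `ea` are already available (deriving them from sunshine duration, humidity, latitude and date takes further FAO-56 steps not shown). It also takes soil heat flux G ≈ 0 for daily steps and converts 10 m wind to 2 m with FAO-56 Eq. 47. Whether the forecast wind in this study refers to 10 m is an assumption. All function names are hypothetical.

```python
import math


def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure e°(T) in kPa at air temperature t_c (°C), FAO-56 Eq. 11."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def wind_at_2m(u_z: float, z_m: float) -> float:
    """Convert wind speed measured at height z_m (m) to 2 m height, FAO-56 Eq. 47."""
    return u_z * 4.87 / math.log(67.8 * z_m - 5.42)


def fao56_pm_eto(tmax: float, tmin: float, u2: float, rn: float,
                 ea: float, elevation_m: float, g: float = 0.0) -> float:
    """Daily grass reference evapotranspiration ETo in mm/day, FAO-56 Eq. 6.

    tmax, tmin: daily max/min air temperature (°C)
    u2: wind speed at 2 m (m/s)
    rn: net radiation at the crop surface (MJ m-2 day-1), assumed precomputed
    ea: actual vapour pressure (kPa), assumed precomputed
    g: soil heat flux (MJ m-2 day-1), approximately 0 for daily steps
    """
    tmean = (tmax + tmin) / 2.0
    # Mean saturation vapour pressure from Tmax and Tmin (FAO-56 Eq. 12).
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0
    # Slope of the saturation vapour pressure curve at Tmean (FAO-56 Eq. 13).
    delta = 4098.0 * sat_vapour_pressure(tmean) / (tmean + 237.3) ** 2
    # Atmospheric pressure from elevation, then psychrometric constant (Eqs. 7-8).
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (tmean + 273.0)) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


if __name__ == "__main__":
    # Inputs from FAO-56's daily worked example (Brussels, 6 July).
    u2 = wind_at_2m(2.78, 10.0)
    print(round(fao56_pm_eto(21.5, 12.3, u2, 13.28, 1.409, 100.0), 1))  # about 3.9 mm/day
```

Passing forecast Tmax, Tmin, wind and radiation terms through such a function instead of observed values is one way forecast errors propagate into the predicted ETo.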
The top three models were XGBoostp, LightGBMp, and MLPp for the AR climate zone; MLPp, XGBoostp, and LightGBMp for the SAR climate zone; and XGBoostp, MLPp, and LightGBMp for the SHZ climate zone. In addition, daily ETo prediction performance was highest in winter and lowest in summer in all three climate zones. In the AR climate zone, forecast wind speed (Wspd) was the most important source of error in the predicted daily ETo, followed by sunshine duration (SDun), maximum temperature (Tmax), and minimum temperature (Tmin). In the SAR and SHZ climate zones, forecast SDun was the most important error source, followed by Wspd, Tmax, and Tmin in the SAR zone and by Tmax, Wspd, and Tmin in the SHZ zone.
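The abstract reports results as MAE, RMSE and R, as percentage changes in those metrics, and broken down by season. The following is a minimal sketch of those calculations using the standard metric definitions. Two things are assumptions: grouping months into Northern Hemisphere meteorological seasons (DJF winter, and so on), and computing percentage changes relative to the observation-trained ("o") model. The excerpt does not state how the authors did either. The helper names are hypothetical.

```python
import math
from collections import defaultdict

# Assumed Northern Hemisphere meteorological seasons (DJF, MAM, JJA, SON);
# the study's own season definition is not given in this excerpt.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the same units as the inputs (e.g. mm/day)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient; undefined if either series is constant."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def metrics_by_season(months, obs, pred):
    """Group daily (month, reference ETo, predicted ETo) triples by season and score each group."""
    groups = defaultdict(lambda: ([], []))
    for month, o, p in zip(months, obs, pred):
        season_obs, season_pred = groups[SEASON_OF_MONTH[month]]
        season_obs.append(o)
        season_pred.append(p)
    return {season: {"MAE": mae(o, p), "RMSE": rmse(o, p), "R": pearson_r(o, p)}
            for season, (o, p) in groups.items()}


def relative_change_pct(baseline, new):
    """Percentage change of a metric from a baseline model to a new model."""
    return 100.0 * (new - baseline) / baseline
```

For example, an MAE that falls from 0.50 to 0.45 mm/day gives `relative_change_pct(0.50, 0.45)` of about -10, i.e. a 10 % reduction.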
Yunfeng Liang et al.
Status: open (until 21 Oct 2023)
CC1: 'Comment on hess-2023-158', quanrong wang, 03 Sep 2023
This manuscript develops nine machine learning models to predict daily reference evaporation. Four of them (multilayer perceptron (MLPo), extreme gradient boosting (XGBoosto), light gradient boosting machine (LightGBMo), and gradient boosting with categorical features support (CatBoost1o)) were trained and validated with daily observed meteorological data from 1995-2015 and 2016-2019, respectively, and five (MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) were trained and validated with daily public weather forecast data with a 1-day lead time (2014-2018 and 2019, respectively). The daily reference evaporation predictions of all nine models (MLPo, XGBoosto, LightGBMo, CatBoost1o, MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) were then compared using public weather forecasts and daily observed meteorological data (2020-2021). However, the focus of this study is to compare the predictive performance of different machine learning models, not to propose new predictive models or predictive frameworks. Previous researchers have done a lot of work in this field (as mentioned in the Introduction), and it is not clear what new insights the authors have gained beyond that work. I am also not sure whether your models are superior to previous ones. The authors need to identify and highlight the new findings made here.
If I had reviewed this paper for another journal, I would probably have recommended major revisions. However, in my opinion, this is not a significant enough advance to warrant publication in HESS. I hope my candor does not discourage the authors.
With appropriate revisions, it could make a good contribution elsewhere. I list a few general and specific comments below to help improve the current manuscript.
General comments
The manuscript compares the performance of nine data-driven machine learning models for daily reference evaporation. The topic is interesting and worth investigating, but the manuscript is not presented well, and the authors should revise it to build a proper narrative. The results are also not adequately discussed: a detailed exploration of how the diverse climate regimes influence model performance is missing. Overall, I recommend declining the manuscript; the authors need to revise it substantially.
Neither the abstract nor the Introduction states the novel contribution. The authors need to be more precise and add insightful statements rather than just a literature review; the objectives and research gaps are missing.
Again, I see a lack of novelty in this article; the authors need to highlight what new findings have been made here and provide a detailed analysis of their results. At present I see only model calibration, validation, and a comparison of the simulated daily reference evaporation. The authors should support this comparison with quantitative statistical indicators and visualizations such as radar plots, because it is hard to grasp the results from tables alone.
If possible, I would like to see the three study cases expanded to more representative situations, such as when the aquifer medium is heterogeneous. This, I believe, is easily achievable through numerical modeling.
A few specific comments
1. The same machine learning model appears with too many different subscripts in the text, and the text repeatedly refers to "four machine learning models" and "five machine learning models", a formulation that is easily confusing, e.g., Line 116, Line 122…
2. Table 4: Does the RMSE represent standardized values? If so, the values must be de-standardized before model performance is compared, and the de-standardized values should be reported.
3. Lines 69-77: The research gap is not established. The authors need to differentiate their study from the existing literature strongly.
4. Lines 105-122: It is recommended that the literature cited in the abstract be listed in chronological order.
5. Lines 241-245: The format of Tmax and Tmin in C1 differs from that in C2 to C9.
6. Line 362: There is a logical problem with Figure 3, which shows that Wspd corresponds to the smallest MAE and RMSE but also to the smallest R; R should instead be the largest.
7. Conclusions need revision. They fail to highlight the significant outcomes of the study.
8. The manuscript also needs significant language correction.
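Comment 2 turns on whether an RMSE was computed on standardized (z-scored) ETo. The minimal sketch below shows the relationship, assuming both observed and predicted ETo are z-scored with the same mean μ and standard deviation σ taken from the observations. Under that assumption, μ cancels in the difference, so the RMSE on z-scores is the RMSE in mm/day divided by σ, and de-standardizing means multiplying by σ. The data values and helper names are made up for illustration and are not results from the study.

```python
import math
from statistics import mean, pstdev


def rmse(obs, pred):
    """Root mean square error in the units of the inputs."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def standardize(values, mu, sigma):
    """Z-score values with a fixed mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


def destandardize_rmse(rmse_z, sigma):
    """Map an RMSE computed on z-scores back to physical units.

    The mean cancels in (pred - obs), so only the scale factor sigma remains.
    """
    return rmse_z * sigma


# Made-up illustrative daily ETo values (mm/day), not results from the study.
obs = [2.1, 3.4, 5.0, 6.2, 4.3]
pred = [2.4, 3.1, 4.6, 6.8, 4.0]

mu, sigma = mean(obs), pstdev(obs)
rmse_mm = rmse(obs, pred)
rmse_z = rmse(standardize(obs, mu, sigma), standardize(pred, mu, sigma))
```

If an RMSE is instead computed directly on ETo in mm/day, no conversion is needed and values from different models can be compared as they are.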
Citation: https://doi.org/10.5194/hess-2023-158-CC1
AC1: 'Reply on CC1(Response to review queries and review suggestions from Quanrong Wang)', Zhaojun Sun, 07 Sep 2023
Dear Quanrong Wang,
I would like to express my deep appreciation for your hard and conscientious work in reviewing the manuscript.
Below, I explain the main innovations of this article:
First, this paper is the first to use public weather forecast data (with a lead time of 1 day) to train and validate daily ETo models. Five models (marked with the subscript p at the lower right of the model name) are trained and validated with public weather forecast data, and four models (marked with the subscript o) are trained and validated with daily observed meteorological data. Training and validating models with daily observed meteorological data is common in the existing literature, whereas training and validating them with public weather forecast data has not been reported. These nine models (five trained and validated with 1-day lead time public weather forecast data plus four trained and validated with daily observed meteorological data) are then driven by public weather forecast data with lead times of 1-7 days (2020-2021) to predict daily ETo. The predictions are compared with standard daily ETo values calculated by the FAO-56 PM equation from daily observed meteorological data (2020-2021) to test the performance of the nine models. The results show that, for the same input combination, the models trained and validated with public weather forecast data outperform those trained and validated with daily observed meteorological data.
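The reply's central claim is that, when predictions must be driven by forecasts, models trained on forecast inputs ("p") beat models trained on observed inputs ("o"). The toy sketch below shows one mechanism that can produce this. It is not the authors' experiment. It assumes, hypothetically, that reference ETo depends linearly on observed wind speed and that the forecast overestimates wind speed by 30 % plus random noise. A straight-line least-squares fit stands in for the machine learning models. All names and numbers are illustrative.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit y ≈ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def compare(seed=0, n_train=500, n_test=200):
    """Return (MAE of 'o' model, MAE of 'p' model), both driven by forecast inputs.

    Hypothetical set-up: reference ETo depends linearly on observed wind speed,
    and the forecast overestimates wind speed by 30 % with some random noise.
    """
    rng = random.Random(seed)

    def make_day():
        wind_obs = rng.uniform(1.0, 5.0)
        wind_fc = 1.3 * wind_obs + rng.gauss(0.0, 0.3)
        eto = 2.0 + 0.8 * wind_obs
        return wind_obs, wind_fc, eto

    train = [make_day() for _ in range(n_train)]
    test = [make_day() for _ in range(n_test)]
    eto_train = [d[2] for d in train]
    # "o" model: trained on observed inputs; "p" model: trained on forecast inputs.
    a_o, b_o = fit_line([d[0] for d in train], eto_train)
    a_p, b_p = fit_line([d[1] for d in train], eto_train)

    # At prediction time only forecasts are available, so both models use wind_fc.
    wind_fc = [d[1] for d in test]
    eto_ref = [d[2] for d in test]
    mae_o = mae(eto_ref, [a_o + b_o * w for w in wind_fc])
    mae_p = mae(eto_ref, [a_p + b_p * w for w in wind_fc])
    return mae_o, mae_p
```

In this set-up the "p" model learns to shrink the inflated forecast wind, whereas the "o" model passes the forecast bias straight into the predicted ETo. Real forecast errors are less regular, so this only illustrates why matching the data source of training and prediction inputs can matter.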
Second, apart from MLP, the other three selected machine learning models (XGBoost, LightGBM, and CatBoost) have so far been used in the existing literature only to estimate daily ETo (that is, they are trained and validated with daily observed meteorological data and then applied to daily observed meteorological data to estimate daily ETo), and among estimation models these three show high accuracy and good stability. However, their use for predicting daily ETo has not been reported. In addition, existing studies must convert the categorical variables in public weather forecast data (wind level to wind speed, and weather type to sunshine hours) before using them in a model to predict daily ETo, and this conversion introduces large errors that reduce prediction accuracy. In this paper, we therefore use the wind levels and weather types from public weather forecast data directly to train and validate the CatBoost2 model (which can handle categorical variables directly) and test its daily ETo predictions for lead times of 1-7 days. Using forecast wind levels and weather types directly in a machine learning model to predict daily ETo has not been reported in the existing literature either; this approach is also presented for the first time in this paper.
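CatBoost can take categorical inputs such as forecast wind levels and weather types without first converting them to wind speed or sunshine hours. In the catboost library this is switched on through the `cat_features` argument. Internally, CatBoost encodes each category with "ordered target statistics": a row's category is replaced by a smoothed mean of the targets of earlier rows with the same category, so the row's own target never leaks into its encoding. The following is a minimal plain-Python sketch of that idea. It is a simplification: real CatBoost averages over several random permutations and adds feature combinations and other details. Taking rows in chronological order is an assumption that suits daily time series. The weather-type labels, ETo values and function name are hypothetical.

```python
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode a categorical column using only rows that come *before* each row.

    For row i with category c, the code is
        (sum of targets of earlier rows with category c + a * prior)
        / (number of earlier rows with category c + a),
    so a row's own target never leaks into its encoding.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        encoded.append((sums[cat] + a * prior) / (counts[cat] + a))
        sums[cat] += y
        counts[cat] += 1
    return encoded


# Hypothetical forecast weather types and ETo targets (mm/day), in time order.
weather_type = ["sunny", "cloudy", "sunny", "light rain", "sunny"]
eto = [5.0, 3.0, 6.0, 1.0, 4.0]
prior = sum(eto) / len(eto)
codes = ordered_target_statistics(weather_type, eto, prior)
```

The first occurrence of each category falls back to the prior (here the mean ETo). Later "sunny" days are pulled toward the ETo observed on earlier sunny days, which is how a category can carry information without a hand-made conversion table.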
Finally, for agricultural irrigation, the most important requirement is to predict daily ETo accurately during the crop's irrigation season, so this paper recommends, for each study site, the best of the machine learning models for predicting daily ETo in the irrigation season. In addition, the sources of error in the daily ETo predicted by the machine learning models were analyzed. For the study site in the arid region, forecast Wspd was the most important source of error in the predicted daily ETo, followed by SDun; for the study sites in the semi-arid and semi-humid regions, forecast SDun was the most important source of error, followed by Wspd. Therefore, forecast SDun and Wspd should be considered carefully when selecting the input combination of a machine learning model, because adding these two variables may degrade the model's ETo predictions.
Responses to the recommendations:
- The four machine learning models marked with the subscript o (at the lower right of the model name) are trained and validated with daily observed meteorological data; the five machine learning models marked with the subscript p are trained and validated with public weather forecast data with a 1-day lead time. Daily ETo was predicted using these nine models and public weather forecast data (2020-2021) and compared with the standard daily ETo values calculated by the FAO-56 PM equation from daily observed meteorological data (2020-2021) to test model performance.
- RMSE (root mean square error) is used here to measure the deviation between the predicted values of a variable and the corresponding observed values, and it does not need to be standardized.
- It has been modified according to the requirements of review.
- It has been adjusted in chronological order.
- Thank you very much for your careful and thorough review and for pointing out the subscript formatting problem; I have corrected it.
- Line 362: Note that in Figure 3, panel (c) shows R and panel (d) shows RM. Wspd corresponds to the smallest MAE and RMSE and also to the smallest R, whereas its RM should be the largest (the forecast Wspd is overestimated, so the prediction performance for Wspd is poor, which makes it one of the main error sources in ETo prediction).
- The conclusions have been revised.
- The language of the manuscript is being corrected.
Citation: https://doi.org/10.5194/hess-2023-158-AC1
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
227 | 53 | 10 | 290 | 22 | 3 | 3 |