Comparison of four machine learning models for forecasting daily reference evaporation based on public weather forecast data

Liang, Yunfeng; Feng, Dongpu; Sun, Zhaojun

doi:https://doi.org/10.5194/hess-2023-158

Preprints

31 Jul 2023

Status: this preprint has been withdrawn by the authors.

Abstract. Real-time accurate prediction of daily reference evapotranspiration (ET_o) is critical for real-time irrigation decisions and water resource management. Although many public weather forecast-based machine learning models have been successfully used for daily ET_o prediction, these models are developed with long-term historical daily observed meteorological data. The use of training and testing samples from different data sources can lead to the selection of the best model, and the performance of the best model for predicting daily ET_o is not ideal. In this study, based on Food and Agriculture Organization (FAO) 56 Penman–Monteith (PM) equations, four machine learning models (multilayer perceptron (MLP_o), extreme gradient boosting (XGBoost_o), light gradient boosting machine (LightGBM_o), and gradient boosting with categorical features support (CatBoost1_o)) were trained and validated with daily observed meteorological data from 1995–2015 and 2016–2019, respectively, and five machine learning models (MLP_p, XGBoost_p, LightGBM_p, CatBoost1_p, and CatBoost2) were trained and validated with daily public weather forecast data with a 1-day lead time (2014–2018 and 2019, respectively). Based on public weather forecast and daily observed meteorological data (2020–2021), the predicted daily ET_o performance of nine machine learning models (MLP_o, XGBoost_o, LightGBM_o, CatBoost1_o, MLP_p, XGBoost_p, LightGBM_p, CatBoost1_p, and CatBoost2) was compared. 
The results show that for all three studied climate zones, the performance of the four models developed based on public weather forecast data with a 1-day advance is better than that of the four models developed based on daily observed meteorological data with corresponding input combinations, and the mean MAE and RMSE ranges for the four models (MLP, XGBoost, LightGBM, and CatBoost1) in the three studied climate zones were reduced by 2.93 %–11.67 % and 2.20 %–9.46 %, respectively, and the mean R range was improved by 1.31 %–5.31 %. The top three models for the AR climate zone were XGBoost_p, LightGBM_p, and MLP_p, the top three models for the SAR climate zone were MLP_p, XGBoost_p, and LightGBM_p, and the top three models for the SHZ climate zone were XGBoost_p, MLP_p, and LightGBM_p. In addition, the prediction performance for daily ET_o is found to be highest in winter and lowest in summer in all three climate zones. Wspd from public weather forecasts was the most important source of daily ET_o error in model predictions for the AR climate zone, followed by SDun, T_max, and T_min, while SDun from public weather forecasts was the most important source of daily ET_o error in model predictions for the SAR (SHZ) climate zone, followed by Wspd, T_max, and T_min (T_max, Wspd, and T_min).
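
For reference, the FAO-56 Penman–Monteith equation that the abstract takes as the standard for daily ET_o is usually written as below. This is the general FAO-56 form, not a reproduction of the manuscript's own equations, which do not appear on this page.

```latex
ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

Here ET_o is in mm day^-1, Δ is the slope of the saturation vapour pressure curve (kPa °C^-1), R_n is net radiation and G is soil heat flux density (both MJ m^-2 day^-1), γ is the psychrometric constant (kPa °C^-1), T is mean daily air temperature at 2 m height (°C), u_2 is wind speed at 2 m height (m s^-1), and e_s - e_a is the saturation vapour pressure deficit (kPa). In the usual FAO-56 procedure, wind speed enters through u_2, sunshine duration through R_n, and T_max and T_min through T, Δ, and e_s, which is presumably how forecast errors in Wspd, SDun, T_max, and T_min propagate into the predicted ET_o.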

Received: 27 Jun 2023 – Discussion started: 31 Jul 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1926 KB)

Supplement (338 KB)

Interactive discussion

Status: closed

CC1:
'Comment on hess-2023-158', Quanrong Wang, 03 Sep 2023

This manuscript developed nine machine learning models to predict daily reference evaporation. Four of them (multilayer perceptron (MLP_o), extreme gradient boosting (XGBoost_o), light gradient boosting machine (LightGBM_o), and gradient boosting with categorical features support (CatBoost1_o)) were trained and validated with daily observed meteorological data from 1995-2015 and 2016-2019, respectively, and five (MLP_p, XGBoost_p, LightGBM_p, CatBoost1_p, and CatBoost2) were trained and validated with daily public weather forecast data with a 1-day lead time (2014-2018 and 2019, respectively). Based on public weather forecasts and daily observed meteorological data (2020-2021), the daily reference evaporation predictions of the nine machine learning models (MLP_o, XGBoost_o, LightGBM_o, CatBoost1_o, MLP_p, XGBoost_p, LightGBM_p, CatBoost1_p, and CatBoost2) were compared. However, the focus of this study is to compare the predictive performance of different machine learning models, not to propose new predictive models or predictive frameworks. A lot of work has been done in this field by previous researchers (as mentioned in the introduction), and it is not clear that the authors have achieved new insights based on the previous work. In addition, I am not sure if your model is superior to the previous ones. The authors need to highlight what new findings have been made here.
If I had reviewed this paper for other journals, I would have probably just recommended major revisions. However, in my opinion, this is not a significant enough advancement to warrant publication in HESS. I hope my candor does not discourage the authors.
It could make a good contribution somewhere else with some appropriate revisions. I have a few general and specific comments that I will list below to improve the current manuscript.
General comments
The manuscript compares the performance of nine data-driven machine-learning models for daily reference evaporation. The topic is interesting and worthy of investigation, but the manuscript is not presented well. The authors must consider revising the manuscript to build a proper narrative. The results are also not aptly discussed, wherein a detailed exploration of the diverse climate regimes that influence the model performance is missing. Overall, I recommend a decline of the manuscript. The authors need to revise their manuscript substantially.
There is a lack of a novel contribution statement in the abstract as well as the Introduction. The authors need to be more precise and add insightful statements rather than just a literature review; the objectives and research gaps are missing.
Again, I see a lack of novelty in this article; the authors need to highlight what new findings have been made here. The authors need to provide a detailed analysis of their results. I only see model calibration and validation and a comparison of the simulated daily reference evaporation. They need to highlight this with some quantification using statistical indicators, radar plots, etc., as it is hard to read or grasp anything from tables.
If possible, I would like to see the three study cases expanded to more representative situations, such as when the aquifer medium is heterogeneous. This, I believe, is easily achievable through numerical modeling.
A few specific comments
1. There are too many subscripts for the same machine learning model in the text. In addition, the text makes several references to "four machine learning models" and "five machine learning models", a formulation that can easily be confusing, e.g., Line 116, Line 122…
2. Table 4: Does the RMSE represent standardized values? If so, then while comparing the model performance, one has to de-standardize and report the same.
3. Lines 69-77: The research gap is not established. The authors need to differentiate their study from the existing literature strongly.
4. Lines 105-122: Literature in the abstract is recommended to be listed in chronological order.
5. Lines 241-245: The format of Tmax and Tmin in C1 is different from that in C2 to C9.
6. Line 362: There is a logical problem with Figure 3, which shows that Wspd corresponds to the smallest MAE and RMSE, while R is also the smallest; in fact, R should be the largest.
7. Conclusions need revision. They fail to highlight the significant outcomes of the study.
8. The manuscript also needs significant language correction.

Citation: https://doi.org/10.5194/hess-2023-158-CC1
- AC1:
  'Reply on CC1 (Response to review queries and review suggestions from Quanrong Wang)', Zhaojun Sun, 07 Sep 2023
  Dear Quanrong Wang,
  I would like to express my deep appreciation for your hard and conscientious work in reviewing the manuscript.
  Now I explain the main innovations of this article:
  First of all, this paper proposes for the first time to use public weather forecast data (with a lead time of 1 day) to train and validate five models (marked with the subscript p at the lower right of the model name). Four further models were trained and validated with daily observed meteorological data (marked with the subscript o at the lower right of the model name). Please note that training and validating models with daily observed meteorological data is a commonly used method in the existing literature, while training and validating models with public weather forecast data has not been reported in the existing literature. These nine machine learning models (five trained and validated on public weather forecast data with a lead time of 1 day plus four trained and validated on daily observed meteorological data) and the public weather forecast data with lead times of 1-7 days (2020-2021) are then used to predict daily ET_o. The predictions are compared with the standard values of daily ET_o calculated with the FAO-56 PM equation and daily observed meteorological data (2020-2021) to test the performance of the nine models. The results show that, under the same input combination, the models trained and validated on public weather forecasts perform better than the models trained and validated on daily observed meteorological data.
  Secondly, apart from the MLP model, the other three selected machine learning models (XGBoost, LightGBM, and CatBoost) have only been used in the existing literature for the estimation of daily ET_o (that is, they are trained and validated with daily observed meteorological data, and daily ET_o is then estimated with daily observed meteorological data and the developed models). Among the models used for estimation, these three have high accuracy and good stability. However, their use for predicting daily ET_o has not been reported in the existing literature. In addition, in the existing literature the category data of public weather forecasts must be converted (wind level to wind speed and weather type to sunshine hours) before they can be used in a model to predict daily ET_o, but this conversion introduces large errors, which in turn reduce the accuracy of the predicted daily ET_o. Therefore, in this paper, we use the wind levels and weather types from public weather forecast data directly for the training and validation of the CatBoost2 model (which can handle categorical variables directly) and test its prediction of daily ET_o for lead times of 1-7 days. This direct use of wind levels and weather types from public weather forecast data for the prediction of daily ET_o by a machine learning model has not been reported in the existing literature either, and it is presented for the first time in this paper.
  
  Finally, for agricultural irrigation, the most important thing is to accurately predict daily ET_o during the irrigation season of the crop, so this paper recommends, among the machine learning models used for each research site, the best model for predicting daily ET_o in the irrigation season. In addition, the sources of error in the daily ET_o predicted by the machine learning models were analyzed. For the research sites in the arid area, Wspd from the public weather forecast was the most important source of error in the predicted daily ET_o, followed by SDun; for the study sites in the semi-arid and semi-humid regions, SDun from the public weather forecast was the most important source of error, followed by Wspd. Therefore, when selecting the input combination of machine learning models, SDun and Wspd from public weather forecasts should be considered carefully, because adding these two quantities may degrade the performance of the machine learning model in predicting ET_o.
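
The conversion error described in this reply (wind level converted to wind speed) can be illustrated with a minimal sketch. It assumes, hypothetically, that the forecast wind level follows the Beaufort scale and that a level is converted to a single speed by taking the midpoint of its range; the manuscript's actual conversion rule is not given on this page. The names `BEAUFORT_RANGES`, `level_to_speed`, and `max_conversion_error` are illustrative only.

```python
# Beaufort wind-force levels 0-6 mapped to wind-speed ranges in m/s.
# Assumption: the public forecast "wind level" is a Beaufort level.
BEAUFORT_RANGES = {
    0: (0.0, 0.2),
    1: (0.3, 1.5),
    2: (1.6, 3.3),
    3: (3.4, 5.4),
    4: (5.5, 7.9),
    5: (8.0, 10.7),
    6: (10.8, 13.8),
}


def level_to_speed(level):
    """Convert a wind level to one speed using the range midpoint (assumed rule)."""
    low, high = BEAUFORT_RANGES[level]
    return (low + high) / 2.0


def max_conversion_error(level):
    """Largest gap between the midpoint and any true speed inside the range."""
    low, high = BEAUFORT_RANGES[level]
    return (high - low) / 2.0


if __name__ == "__main__":
    for level in sorted(BEAUFORT_RANGES):
        speed = level_to_speed(level)
        err = max_conversion_error(level)
        print(f"level {level}: {speed:.2f} m/s, up to +/- {err:.2f} m/s")
```

Under these assumptions, any true speed between 3.4 and 5.4 m/s at level 3 is replaced by 4.4 m/s, so the conversion alone can be off by up to 1.0 m/s. Passing the level to a model as a categorical variable avoids committing to one representative speed.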
  Responses to the recommendations.
  The four machine learning models are trained and validated on daily observed meteorological data (marked with the subscript o at the lower right of the model name); the five machine learning models are trained and validated on public weather forecast data with a 1-day lead time (marked with the subscript p at the lower right of the model name). Daily ET_o was predicted using these nine models and public weather forecast data (2020-2021) and compared with the standard values of daily ET_o calculated using the FAO-56 PM equation and daily observed meteorological data (2020-2021) to test the performance of the models.
  
  RMSE stands for root mean square error. It measures the deviation between the predicted value of a variable and the corresponding observed value, and it does not need to be standardized.
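
The relationship between RMSE on raw data and RMSE on standardized data, which is what the reviewer's second comment asks about, can be sketched in a few lines. The ET_o values below are hypothetical and the function names are illustrative; this is not the authors' evaluation code.

```python
import math


def mae(pred, obs):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error, in the units of the inputs."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Hypothetical daily ET_o values in mm/day (illustrative only, not from the study).
observed = [2.0, 3.5, 5.0, 4.0]
predicted = [2.5, 3.0, 5.5, 3.0]

# Standardize both series with the mean and standard deviation of the observations.
mean_obs = sum(observed) / len(observed)
std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / len(observed))
z_obs = [(o - mean_obs) / std_obs for o in observed]
z_pred = [(p - mean_obs) / std_obs for p in predicted]

rmse_raw = rmse(predicted, observed)  # mm/day
rmse_std = rmse(z_pred, z_obs)        # dimensionless
# De-standardizing: multiplying by the standard deviation recovers mm/day.
rmse_back = rmse_std * std_obs
```

RMSE computed on raw values carries the units of the variable (mm/day here). If it were computed on standardized values instead, multiplying by the standard deviation used for standardization would recover the raw-unit figure.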
  
  It has been modified according to the requirements of review.
  
  It has been adjusted in chronological order.
  
  Thank you very much for your careful and serious work and for pointing out the subscript formatting problem; I have corrected it.
  
  Line 362: Note that in Figure 3, (c) represents R and (d) represents RM. Wspd corresponds to the smallest MAE and RMSE, and R is also the smallest; RM should be the largest (the Wspd in the weather forecast is overestimated, so the Wspd prediction performance is poor, which is one of the main error sources of ET_o prediction).
  
  The conclusion has been revised.

  The language of the manuscript is being corrected.
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC1
RC1:
'Comment on hess-2023-158', Anonymous Referee #1, 30 Sep 2023
This paper presents an interesting study that compared four machine learning models for forecasting daily reference evaporation based on public weather forecast data. However, the current manuscript suffers from several drawbacks that are mentioned below. Therefore, I think this paper should be rejected, but encourage re-submission after a major revision.
General comments:
This paper lacks innovation as it adopts mature algorithms, and the studied region is not representative enough. We suggest that the authors emphasize the innovative aspects of this paper.

The language and structure of the manuscript are not very reader-friendly. We strongly recommend that the authors polish the language of this article. Meanwhile, the results section of this article seems slightly redundant, with an excessive number of tables. We suggest condensing and reducing them after summarizing.

The use of abbreviations in this article is confusing. On one hand, many abbreviations are not defined when first introduced, and on the other hand, some abbreviations are reiterated excessively. For example, in the Abstract, the full names of “MLP_p”, “XGBoost_p”, “CatBoost”, “AR”, “SHZ”, and “Wspd” are not given when they are first introduced. The abbreviations “AR” and “SHZ” are not defined anywhere in the manuscript, even in Lines 128-130.

Specific comments:
Lines 166-176: We suggest listing the sources and links of these models in a table.

Section 2.1: We recommend adding specific descriptions of the differences in the three climate zones, and it would be better to illustrate meteorological data using graphs if possible.

Section 2.2.2 mostly provides an overview of several methods, which is not suitable for the Methods section. The Methods section should describe the four methods in terms of formulas and theories, as well as state the reasons for choosing these methods.

Equation (3): How was the coefficient of sunshine duration obtained? If it was derived from previous research, please indicate the source. If it was self-defined, please explain the criteria used for the definition.

We recommend carefully organizing and categorizing the abbreviations used throughout the text. The current labeling of abbreviations makes the text harder to read.

For better understanding, we suggest summarizing all the test case designs used in this study in a table.

Line 252: This should be the standard deviation, not the variance.

Figure 2: The current flowchart is too complex. We suggest simplifying it to enhance readability.

In Section 2.3, it is recommended to separately introduce different statistical indicators.

The Results and Discussion section of this paper mainly consists of a simple statement of the results, lacking in-depth analysis and discussion. For example, in Section 3.2, the reasons for the algorithm’s differences in different climatic regions are not adequately addressed, and in Section 3.1.2, the reasons for seasonal variations are not analyzed.

There are excessive tables in this paper (23 tables). It is recommended to explore alternative ways of presenting the information.

The generalizability of the conclusions in this paper may be limited. Do these conclusions apply to other regions as well?
Citation: https://doi.org/10.5194/hess-2023-158-RC1
- AC2:
  'Reply on RC1', Zhaojun Sun, 02 Oct 2023
  Dear Anonymous Referee #1,
  I would like to express my deep gratitude to you for your hard and serious review work.
  
  The main innovations of this paper are described:
  
  First, this paper presents for the first time the training and validation of five models (labeled with the subscript p at the lower right of the model name) with public weather forecast data (with a lead time of 1 day). Four further models were trained and validated with day-by-day observed meteorological data (labeled with the subscript o at the lower right of the model name). Note that training and validating models with day-by-day observed meteorological data is a common approach in the literature, while training and validating models with public weather forecast data has not been reported in the literature. These nine machine learning models (five trained and validated on 1-day-ahead public weather forecast data plus four trained and validated on day-by-day observed meteorological data) are then used with 1-7-day-ahead public weather forecast data (2020-2021) to predict daily ETo. The performance of the nine models is tested by comparison with the standard values of daily ETo computed with the FAO-56 PM equation and day-by-day observed meteorological data (2020-2021). The results show that, with the same input combinations, the models trained and validated on public weather forecasts outperform the models trained and validated on day-by-day observed meteorological data in predicting daily ETo.
  
  Second, apart from the MLP model, the other three selected machine learning models (XGBoost, LightGBM, and CatBoost) have only been used for daily ETo estimation in the literature (that is, they are trained and validated with day-by-day meteorological data, and the daily ETo is then estimated using the day-by-day meteorological data and the developed machine learning models). These three models have high accuracy and stability among the models used for estimation, but their use for the prediction of daily ETo has not been reported in the literature. In addition, in the literature the category data of public weather forecasts must be converted (wind level to wind speed and weather type to sunshine hours) before they can be used in a model to predict daily ETo, but this conversion introduces large errors, which in turn reduce the accuracy of the predicted daily ETo. Therefore, in this paper, we use the wind levels and weather types from public weather forecast data directly for the training and validation of the CatBoost2 model (which can handle categorical variables directly) and test its prediction of daily ETo for lead times of 1-7 days. This direct use of wind levels and weather types from public weather forecast data for the prediction of daily ETo by a machine learning model has not been reported in the literature either, and it is presented for the first time in this paper.
  
  Finally, for agricultural irrigation, the most important thing is the accurate prediction of daily ETo during the irrigation season of the crop, so this paper recommends the best of the machine learning models used for the research sites to predict daily ETo during the irrigation season. In addition, the sources of error in the daily ETo predicted by the machine learning models were analyzed. For the study sites in the arid zone, Wspd from the public weather forecast was the most important source of error in the predicted daily ETo, followed by SDun; for the study sites in the semiarid and semihumid zones, SDun from the public weather forecast was the most important source of error, followed by Wspd. SDun and Wspd from public weather forecasts should therefore be considered carefully when choosing the inputs to the machine learning model, because the inclusion of these two quantities may degrade the performance of the model in predicting daily ETo.
  
  In addition, the full names of the abbreviations you mentioned are given at the relevant places in the paper. The full names of the abbreviations "MLP_p", "XGBoost_p", and "CatBoost" are specified in lines 8-10 of the Abstract; the full names of the abbreviations "AR", "SHZ", and "SAR" are specified in Table 1, line 135; and the full name of the abbreviation "Wspd" is specified in lines 203-204.

  Response to specific comments:
  This is a good suggestion. As long as it meets the formatting requirements of the journal, I will list the sources and links of the models in a table.
  
  The three climatic zones studied, the mesothermal arid zone (northern yellow irrigation zone), the mesothermal semiarid zone in the center (central arid zone), and the mesothermal semihumid zone in the south, were obtained according to the Köppen classification, which is also in agreement with what is expressed in the literature. Meteorological data for the three climatic zones are described in Table 2.
  
  The reasons for choosing these methods have been explained both in the introduction (lines 113-116) and in the methods section (lines 197-199).
  
  The daylight time factor has already been explained in lines 204-205.
  
  In fact, the acronyms used in this paper have all been explained at the appropriate places, but I might consider starting the paper with a uniform list of the acronyms used in the text.
  
  A tabular overview of all the test case designs used in this study is presented later in this paper.
  
  Note that here, the sample data are normalized, and σ is the variance of the sample data, not the standard deviation.
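
As a point of reference for this exchange, the conventional z-score standardization is sketched below. It is an assumption that this is the normalization used at Line 252 of the manuscript, which is not reproduced on this page; the symbols μ, σ, and n follow common usage rather than the manuscript's notation.

```latex
z_i = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
```

In this conventional form the divisor σ is the standard deviation, that is, the square root of the variance σ², so that the standardized values z_i are dimensionless with zero mean and unit variance.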
  
  The flowchart (Fig. 2) mainly expresses the process of hyperparameter tuning of the four machine learning algorithms using appropriate methods to obtain the hyperparameters of the optimal results. I have tried to make it as simple as possible.
  
  These metrics are commonly used to measure model performance and have been described in the text.
  
  When meteorological data (either public weather forecast data or day-by-day observed meteorological data) are used as inputs to train the model, the predicted ETo differs for two reasons (whether the performance of the model is analyzed day by day or by season). The first is the input meteorological data, such as the different combinations of weather variables and the forecast errors of the weather variables from public weather forecasts, which have been analyzed and presented in Section 3.2 and Section 3.1. The second is the model itself. The model is first trained and validated with weather data samples and then tested with public weather forecast data, which is a test of the stability of the model itself. The four different types of models are compared to obtain a better model for predicting daily ETo. These analyses address the reasons why the algorithms differ in different climate zones.
  
  In addition, the purpose of analyzing the performance of models for predicting daily ETo according to seasonality is to recommend the best model for predicting daily ETo for crops in different growing seasons and then to use the best model for predicting daily ETo in subsequent irrigation forecasting and irrigation decision scheduling for the crops concerned.
  
  The relevant tables have been replaced with radar charts to streamline and simplify the presentation of the results.
  
  The conclusions of this paper are of course fully generalizable to other regions.
  
  The conclusion that "the performance of training and validating models based on public weather forecasts to predict daily ETo is better than that of training and validating models based on day-by-day observations with the same input combinations" can be generalized to other regions. That is, when there are enough samples of public weather forecast data, it is recommended to train and validate the model with public weather forecast data first, test the model with future public weather forecast data, and then generalize the use of the model.
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC2
RC2:
'Comment on hess-2023-158', Anonymous Referee #2, 08 Oct 2023

The study undertakes an in-depth examination of various machine learning models to achieve a comprehensive grasp of the most efficient techniques for ET prediction. It differentiates its approach by comparing the efficacy of models trained on diverse data sources, particularly highlighting the potential value of publicly available weather forecasts for these predictions. While the research provides valuable insights into the effectiveness of machine learning models in predicting daily ET using different data sources, there are areas for improvement.
It is clear that significant dedication and effort have been poured into the manuscript by the authors. However, as a reader, I sometimes find myself navigating through an overwhelming expanse of information. There's a pressing need for a clearer presentation of results. Prioritizing pivotal findings and possibly relegating secondary results to supplementary material might streamline the narrative. One example is the extensive analysis of public forecast weather data, which doesn't seem central to the main objective. Besides, the authors calculated the four evaluation metrics for different variables, different climate zones, different datasets, and different models. There are 23 tables that show these statistics in the manuscript, and it is very difficult for readers to extract the pivotal results from the massive amount of information.
The study is geographically limited to three different climates. The results' global applicability remains in question. Can we anticipate consistent model performance across diverse global climates, especially given the shifts in weather patterns due to phenomena like global warming? The selection of just nine stations for such a vast expanse warrants further discussion about potential uncertainties.
Besides, a deeper dive into the rationale behind selecting these specific machine learning models would be beneficial for readers and researchers looking to replicate or build upon this study. The introduction notes that multiple similar studies have been conducted in other regions across China. How does this research differentiate itself from that previous work?
Specific comments:
Line 44-45: This expression is a little confusing; prediction is a process of estimating.
Line 50-51: The statement isn't quite right as ET can be affected by historical conditions such as water availability, which is associated with past weather conditions such as historical precipitation and temperature.
Line 61-68: It might be better to move this information to the methods section.
Line 105-115: While these methods have been extensively verified in China, it's essential to clarify the unique contribution of this study.
Line 116: Use the term "applied" instead of "developed" to accurately reflect the models' utilization in this study.
Consider reevaluating objective 2 to ensure it brings a unique value to the study. The testing of methods should ideally be integrated into the overall research framework rather than presented as a standalone objective. I am not sure the testing of methods can stand alone as one of the three objectives. This would affect the novelty of this work.
Line 135: Clearly define acronyms like "SDun." Specify the range of weather data used in table 1 and include these details in the table caption.
Line 155: Need to describe how the FAO 56 PM is used in this study.
Line 167: Provide a rationale for selecting these three bagging methods. Are they especially suited for this type of data or problem?
Line 183: This section does not flow well, line 183-201, the authors introduced a few machine learning methods and then made an introduction about the MLP. The authors should consider improving the logic flow. Start by giving an overview of the machine learning methods explored, followed by a detailed introduction of the MLP.
Line 189: “…Brazil using the first four days of data…”, what does this mean?
Line 299-422: While the evaluation of public forecast weather data is commendable, it's essential to keep the focus on the study's primary objectives. Consider summarizing these results to maintain brevity and relevance.
Line 362: What do the different colors mean?
Line 423: Begin this section by detailing the chosen input variable combinations and the rationale behind their selection. This provides readers with context before diving into the results.

Citation: https://doi.org/10.5194/hess-2023-158-RC2
- AC3:
  'Reply on RC2', Zhaojun Sun, 10 Oct 2023
  Dear Anonymous Referee #2,
  I would like to express my deep gratitude to you for your hard and serious review work.
  First, the main innovations of this paper are described:
  This paper presents for the first time the training and validation of five models (with subscript p labeled at the bottom right of the model name) with public weather forecast data (with a lead time of 1 day). Note that training and validating models with day-by-day observed meteorological data is a common approach in the literature, while training and validating models with public weather forecast data is not reported in the literature.
  In addition, in the literature, the categorical data in public weather forecasts must be converted (wind level to wind speed and weather type to hours of sunshine) before they can be used in a model to predict daily ET_o, but this conversion introduces a large error, which affects the accuracy of the predicted daily ET_o. Therefore, in this paper, we directly use the wind level and weather type from public weather forecast data for the training and validation of the CatBoost2 model (which can handle categorical variables directly) and test the prediction of daily ET_o for lead times of 1-7 days. This direct use of the wind level and weather type from public weather forecast data in the prediction of daily ET_o by a machine learning model has not been reported in the literature either, and it is presented for the first time in this paper.
  Second, the main conclusions of this paper are as follows:
  Five models (subscript p at the bottom right of the model name) were trained and validated with public weather forecast data (1-day lead time), and four models (subscript o at the bottom right of the model name) were trained and validated with day-by-day observed weather data. These nine machine learning models were then used with 1-7-day-ahead public weather forecast data (2020-2021) to predict daily ET_o. Their performance was tested by comparison with the standardized values of daily ET_o computed with the FAO-56 PM equation and day-by-day observed weather data (2020-2021). The results show that, with the same input combinations, the models trained and validated on public weather forecasts predict daily ET_o better than the models trained and validated on day-by-day observed meteorological data.
  Finally, a note on a few points of doubt:
  I have modified the 23 tables, e.g., to radar plots, so that readers can extract the key results more easily;
  The introduction of this paper describes several similar studies conducted in other parts of China, which are different in that the models are trained and validated only with day-by-day observed meteorological data, and the developed models are only tested with public weather forecasts, whereas in this paper, the models are trained, validated and tested with public weather forecasts;
  Training and validating the model requires a long series of historical public weather forecast data. I can only obtain the relevant historical public weather forecast data from meteorological stations in Ningxia, China, and cannot obtain such data from meteorological stations elsewhere. Therefore, only nine study stations in Ningxia were selected. Nevertheless, combined with the conclusions of this paper, the proposed method still has considerable value for wider application. At stations with access to historical public weather forecast data, models for predicting ET_o can be trained and validated using different input combinations of historical public weather forecast variables, so that the best models for predicting ET_o at the study sites can be obtained.
  
  Responses to specific comments:
  Forecasting ET_o is a process of estimating ET_o, and here, the emphasis is on forecasting future ET_o. However, the estimate in lines 44-45 is an estimate of past ET_o using historical day-by-day actual measurements.
  
  Note that lines 50-51 refer to ET_o, the reference crop evapotranspiration; according to the FAO-56 PM equation, daily ET_o is largely governed by weather conditions (i.e., weather variables).
  
  Very good suggestion, which I have modified.
  
  These three ensemble learning models, XGBoost, LightGBM and CatBoost, outperform other machine learning models in estimating ET_o using historical day-by-day observed meteorological data. However, they have not been used in studies that predict future ET_o. Therefore, in this paper, these three ensemble learning models are selected as methods for predicting ET_o.
  
  I have made the suggested changes.
  
  Related note: All other eight models in this paper need to convert the category data (wind level to wind speed and weather type to sunshine hours) in the public weather forecast data before they can be used for the prediction of daily ET_o by these eight models. Since the CatBoost model can handle type variables directly, in this paper, wind levels and weather types from public weather forecast data are used directly as inputs to the CatBoost model (the model is denoted as CatBoost2). Due to the worse prediction performance of wind levels and weather types in the public weather forecasts for 1-7 days in the forecast period and the poor stability performance of the CatBoost model itself, the direct use of wind levels and weather types from the public weather forecast data as inputs to the CatBoost model did not improve the performance of the CatBoost2 model for predicting the daily ET_o in the test period. Nonetheless, this approach of using wind levels and weather types from public weather forecast data directly for machine learning models to predict daily ET_o has not been reported in the literature. This approach is also presented for the first time in this paper.
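
To make the conversion step discussed above concrete, the following is a minimal sketch of how a forecast wind level might be turned into a 2 m wind speed. The mapping is an assumption for illustration only: it uses Beaufort-scale class midpoints at 10 m height, which may differ from the conversion used in the manuscript, and the names `BEAUFORT_MS` and `wind_level_to_u2` are hypothetical. The height adjustment is the logarithmic wind profile relation given in FAO-56.

```python
import math

# Assumed mapping: Beaufort force -> (lower, upper) wind speed in m/s at 10 m.
# The manuscript's own wind-level conversion may differ.
BEAUFORT_MS = {
    0: (0.0, 0.2),
    1: (0.3, 1.5),
    2: (1.6, 3.3),
    3: (3.4, 5.4),
    4: (5.5, 7.9),
    5: (8.0, 10.7),
    6: (10.8, 13.8),
}


def wind_level_to_u2(level, height_m=10.0):
    """Convert a categorical wind level to an approximate wind speed at 2 m (m/s)."""
    low, high = BEAUFORT_MS[level]
    u_z = (low + high) / 2.0  # class midpoint stands in for the unknown true speed
    # FAO-56 logarithmic profile: u2 = uz * 4.87 / ln(67.8 * z - 5.42)
    return u_z * 4.87 / math.log(67.8 * height_m - 5.42)


def level_half_width(level):
    """Largest possible gap between the class midpoint and a speed in the class."""
    low, high = BEAUFORT_MS[level]
    return (high - low) / 2.0


u2_level3 = wind_level_to_u2(3)
print(f"level 3 -> u2 of about {u2_level3:.2f} m/s")
print(f"midpoint may be off by up to {level_half_width(3):.1f} m/s at 10 m")
```

Every speed inside one class collapses to the same number, so the width of the class puts a floor on the conversion error. A model that accepts the wind level as a categorical input skips this step, although it still inherits any error in the forecast level itself.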
  
  I can consider your suggestion and make necessary changes.
  
  Changes have been made in accordance with the recommendations, and the acronyms have been harmonized at the beginning of the document.
  
  Please note that this paper only uses the equation proposed by FAO to calculate the daily ET_o: the FAO-56 PM equation.
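
For reference, the FAO-56 PM equation mentioned here can be written as a short function. This is a sketch of the standard daily form of the equation, not the authors' code; the function name and the input values in the example are illustrative and are not taken from the manuscript.

```python
def fao56_pm_eto(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration ET_o (mm/day), FAO-56 Penman-Monteith.

    delta  : slope of the saturation vapour pressure curve (kPa/degC)
    rn, g  : net radiation and soil heat flux (MJ m-2 day-1)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean daily air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m/s)
    es, ea : saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs for a mild summer day (assumed values).
eto = fao56_pm_eto(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                   t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
print(f"ET_o = {eto:.2f} mm/day")
```

Wind speed enters both the numerator and the denominator, and sunshine duration enters through net radiation, which is why forecast errors in these two variables carry through to the computed ET_o.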
  
  In general, there are three types of ensemble algorithms: bagging, boosting and stacking.
  
  Bagging constructs multiple independent estimators and then averages their predictions or uses majority voting to determine the outcome of the ensemble. Representative algorithm: random forest.
  Boosting constructs multiple weak learners and combines them, according to their weights, into a strong learner. Representative algorithms: AdaBoost, GBDT, XGBoost, LightGBM and CatBoost.
  All of these methods are suitable for the data and problem in this study; XGBoost, LightGBM and CatBoost simply perform better than random forest. Specific details can be found in the official documentation: XGBoost at http://xgboost.readthedocs.io, LightGBM at http://lightgbm.readthedocs.io, and CatBoost at https://catboost.ai/en/docs/.
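
The difference between the two ensemble styles described above can be sketched with a toy example. This is a deliberately simplified illustration in plain Python using one-split regression stumps on one-dimensional data; it is not how XGBoost, LightGBM or CatBoost are implemented, and all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Least-squares one-split stump; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # no valid split (all x identical): predict the mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def bagging(xs, ys, n_estimators, rng):
    """Independent stumps on bootstrap resamples; predictions are averaged."""
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(predict_stump(s, x) for s in stumps) / len(stumps)


def boosting(xs, ys, n_estimators, learning_rate=0.5):
    """Sequential stumps, each fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_estimators):
        s = fit_stump(xs, residuals)
        stumps.append(s)
        residuals = [r - learning_rate * predict_stump(s, x)
                     for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(50)]
ys = [x ** 2 for x in xs]
bagged = bagging(xs, ys, 20, random.Random(0))
boosted_1 = boosting(xs, ys, 1)
boosted_20 = boosting(xs, ys, 20)
print("bagging  (20 stumps) training MSE:", mse(bagged, xs, ys))
print("boosting ( 1 stump ) training MSE:", mse(boosted_1, xs, ys))
print("boosting (20 stumps) training MSE:", mse(boosted_20, xs, ys))
```

In the bagging function the estimators never see each other's output, whereas each boosting round depends on the residuals of all earlier rounds; that sequential dependence is the structural difference the paragraph above describes.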
  
  MLP, the multilayer perceptron, is a well-established algorithm that has been applied to the prediction of daily ET_o, but most of the reported studies use an MLP with a single hidden layer. In this study, based on the TensorFlow 2.8.0 framework, the parameters of the MLP hidden layers are treated as hyperparameters and tuned using RandomizedSearchCV. The results show that MLPs with 2-3 hidden layers all predict daily ET_o better than an MLP with a single hidden layer.
  
  Please note that lines 188-190 are a one-sentence citation. Ferreira et al. (2019) estimated daily ET_o for all of Brazil using the first four days of data, and an ANN (model structure 16-50-50-1) was the best choice among temperature- and relative humidity-based models.
  
  Changes have been made accordingly, as recommended.
  
  This is a box plot, in which the different colors represent statistical results for different climatic zones.
  
  The selected combination of input variables has been explained in lines 235-245. Since the public weather forecast provides only four variables, which are the inputs to the machine learning model, the selection of the input combinations is made based on the existing research reports and this paper's evaluation of the prediction performance of weather variables in the public weather forecast (lines 299-422).
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC3

Interactive discussion

Status: closed

CC1:
'Comment on hess-2023-158', quanrong wang, 03 Sep 2023

This manuscript developed nine machine learning models to predict the daily reference evaporation among which four machine learning models (multilayer perceptron (MLPo), extreme gradient boosting (XGBoosto), light gradient boosting machine (LightGBMo), and gradient boosting with categorical features support (CatBoost1o)) were trained and validated with daily observed meteorological data from 1995-2015 and 2016-2019, respectively, and five machine learning models (MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) were trained and validated with daily public weather forecast data with a 1-day lead time (2014-2018 and 2019, respectively). Based on public weather forecasts and daily observed meteorological data (2020-2021), the predicted daily reference evaporation performance of nine machine learning models (MLPo, XGBoosto, LightGBMo, CatBoost1o, MLPp, XGBoostp, LightGBMp, CatBoost1p, and CatBoost2) was compared. However, the focus of this study is to compare the predictive performance of different machine learning models, not to propose new predictive models or predictive frameworks. A lot of work has been done in this field by previous researchers (as mentioned in the introduction), and it is not clear that the authors have achieved new insights based on the previous work. In addition, I am not sure if your model is superior to the previous ones. Authors need to highlight or come up with what new findings have been made here.
If I had reviewed this paper for other journals, I would have probably just recommended major revisions. However, in my opinion, this is not a significant enough advancement to warrant publication in HESS. I hope my candor does not discourage the authors.
It could make a good contribution somewhere else with some appropriate revisions. I have a few general and specific comments that I will list below to improve the current manuscript.
General comments
The manuscript compares the performance of nine data-driven machine-learning models for daily reference evaporation. The topic is interesting and worthy of investigation, but the manuscript is not presented well. The authors must consider revising the manuscript to build a proper narrative. The results are also not aptly discussed, wherein a detailed exploration of the diverse climate regimes that influence the model performance is missing. Overall, I recommend a decline of the manuscript. The authors need to revise their manuscript substantially.
There is a lack of a novel contribution statement in the abstract as well as the Introduction. The authors need to be more precise and add insightful statements rather than just a literature review - there is a lack of objective and research gaps.
Again, I see a lack of novelty in this article; the authors need to highlight or come up with what new findings have been made here. Authors need to provide detailed analysis for their results. I only see model calibration and validation and comparison of the simulated daily reference evaporation. They need to highlight this with some quantification with statistical indicators, radar plots, etc., as it is hard to read or grasp anything from tables.
If possible, I would like to see the three study cases expanded to more representative situations, such as when the aquifer medium is heterogeneous. This, I believe, is easily achievable through numerical modeling.
A few specific comments
1. There are too many subscripts for the same machine learning model in the text. In addition, the text makes several references to "four machine learning models" and "five machine learning models", a formulation that can easily be confusing, e.g., Line 116, Line 122…
2. Table 4: Does the RMSE represent standardized values? If so, then while comparing the model performance, one has to de-standardize and report the same.
3. Lines 69-77: The research gap is not established. The authors need to differentiate their study from the existing literature strongly.
4. Lines 105-122: Literature in the abstract is recommended to be listed in chronological order.
5. Lines 241-245: The format of Tmax and Tmin in C1 is different from C2 to C9.
6. Line 362: There is a logical problem with Figure 3, which shows that Wspd corresponds to the smallest MAE and RMSE, while R is also the smallest; in fact, R should be the largest.
7. Conclusions need revision. They fail to highlight the significant outcomes of the study.
8. The manuscript also needs significant language correction.

Citation: https://doi.org/10.5194/hess-2023-158-CC1
- AC1:
  'Reply on CC1 (Response to review queries and review suggestions from Quanrong Wang)', Zhaojun Sun, 07 Sep 2023
  Dear Quanrong Wang,
  I would like to express my deep appreciation for your hard and conscientious work in reviewing the manuscript.
  Now I explain the main innovations of this article:
  First of all, this paper proposes for the first time to use public weather forecast data (with a lead time of 1 day) to train and validate five models (the subscript p is marked at the bottom right of the model name). Four models were trained and validated with daily observational meteorological data (the subscript o is marked at the lower right of the model name). Please note that training and validating models with daily observational meteorological data is a commonly used method in the existing literature, while training and validating models with public weather forecast data is not reported in the existing literature. Then, these nine machine learning models (five trained and validated on public weather forecast data with a lead time of 1 day plus four trained and validated on daily observation meteorological data) and the public weather forecast data with lead times of 1-7 days (2020-2021) are used to predict daily ET_o, and the predictions are compared with the standard values of daily ET_o calculated with the FAO-56 PM equation and daily observation meteorological data (2020-2021) to test the performance of the nine models. The results show that, under the same input combination, the models trained and validated on public weather forecasts perform better than the models trained and validated on daily observation meteorological data.
  Secondly, apart from the MLP, the three other selected machine learning models (XGBoost, LightGBM and CatBoost) have only been used in the existing literature for the estimation of daily ET_o (that is, they are trained and validated with daily observation meteorological data, and daily ET_o is then estimated with daily observation meteorological data and the developed models); among the models used for estimation, these three have high accuracy and good stability. However, the use of these three machine learning models (XGBoost, LightGBM and CatBoost) for predicting daily ET_o has not been reported in the existing literature. In addition, in the existing literature, the categorical data of public weather forecasts must be converted (wind level to wind speed and weather type to sunshine hours) before they can be used in a model to predict daily ET_o, but this conversion brings large errors, which in turn affect the accuracy of the predicted daily ET_o. Therefore, in this paper, we use the wind levels and weather types from public weather forecast data directly for the training and validation of the CatBoost2 model (which can handle categorical variables directly) and test the prediction of daily ET_o for lead times of 1-7 days. This direct use of the wind levels and weather types from public weather forecast data for the prediction of daily ET_o by a machine learning model has not been reported in the existing literature either, and it is presented for the first time in this paper.
  
  Finally, for agricultural irrigation, the most important thing is to accurately predict the daily ET_o in the irrigation season of the crop, so this paper recommends, among the machine learning models used for the research sites, the best model for predicting the daily ET_o in the irrigation season. In addition, the sources of error in the daily ET_o predicted by the machine learning models were analyzed. For the research sites in the arid area, the public weather forecast Wspd was the most important source of error in the predicted daily ET_o, followed by SDun; for the study sites in the semi-arid and semi-humid regions, the public weather forecast SDun was the most important source of error, followed by Wspd. Therefore, when selecting the input combination of machine learning models, careful consideration should be given to the public weather forecast SDun and Wspd, because the addition of these two quantities may lead to a decline in the performance of the machine learning model in predicting ET_o.
  Responses to the recommendations.
  The four machine learning models are trained and validated with daily meteorological data (indicated by the subscript o at the lower right of the model name); the five machine learning models are trained and validated with public weather forecast data with a lead time of one day (indicated by the subscript p at the lower right of the model name). Daily ET_o was predicted using these nine models and public weather forecast data (2020-2021) and compared with the standard values of daily ET_o calculated using the FAO-56 PM equation and daily observational meteorological data (2020-2021) to test the performance of the models.
  
  RMSE is the abbreviation of root mean square error. It measures the deviation between the predicted value of a variable and the corresponding observed value, and it does not need to be standardized.
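
As a concrete reference for the indicators under discussion, here is a minimal sketch of MAE, RMSE and the Pearson correlation coefficient R in plain Python. The function names and the sample values are illustrative only and are not results from the manuscript.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of the variable."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the units of the variable."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Illustrative daily ET_o values in mm/day (made up for this example).
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [2.5, 4.5, 5.0, 7.0]

print("MAE  =", mae(observed, predicted))
print("RMSE =", round(rmse(observed, predicted), 4))
print("R    =", round(pearson_r(observed, predicted), 4))

# Dividing both series by a constant scales RMSE by the same constant,
# so an RMSE computed on rescaled data is not in mm/day.
scale = 2.0
rescaled_rmse = rmse([o / scale for o in observed], [p / scale for p in predicted])
print("RMSE on rescaled data =", round(rescaled_rmse, 4))
```

The last lines show why the referee's question matters: RMSE carries the units of whatever series it is computed on, so values are only comparable across studies when they are computed on ET_o in its physical units.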
  
  It has been modified according to the requirements of review.
  
  It has been adjusted in chronological order.
  
  Thank you very much for your careful and serious work in pointing out the subscript formatting problem, which I have corrected.
  
  Line 362: Note that in Figure 3, (c) represents R and (d) represents RM. Wspd corresponds to the smallest MAE and RMSE, and its R is also the smallest; RM should be the largest (the Wspd in the weather forecast is overestimated, so the Wspd prediction performance is poor, which is one of the main error sources of ET_o prediction).
  
  The conclusion has been revised.
  
  The language of the manuscript is being corrected.
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC1
RC1:
'Comment on hess-2023-158', Anonymous Referee #1, 30 Sep 2023
This paper presents an interesting study that compared four machine learning models for forecasting daily reference evaporation based on public weather forecast data. However, the current manuscript suffers from several drawbacks that are mentioned below. Therefore, I think this paper should be rejected, but encourage re-submission after a major revision.
General comments:
This paper lacks innovation as it adopts mature algorithms, and the studied region is not representative enough. We suggest that the authors emphasize the innovative aspects of this paper.

The language and structure of the manuscript are not very reader-friendly. We strongly recommend that the authors polish the language of this article. Meanwhile, the results section of this article seems slightly redundant, with an excessive number of tables. We suggest summarizing them and then condensing and reducing them.

The use of abbreviations in this article is confusing. On the one hand, many abbreviations are not defined when first introduced; on the other hand, some abbreviations are repeated excessively. For example, in the Abstract, the full names of “MLP_p”, “XGBoost_p”, “CatBoost”, “AR”, “SHZ” and “Wspd” are not given when they are first introduced. The abbreviations “AR” and “SHZ” are not defined anywhere in the manuscript, not even in Lines 128-130.

Specific comments:
Lines 166-176: We suggest listing the sources and links of these models in a table.

Section 2.1: We recommend adding specific descriptions of the differences in the three climate zones, and it would be better to illustrate meteorological data using graphs if possible.

Section 2.2.2 mostly provides an overview of several methods, which is not suitable for the Methods section. The Methods section should describe the four methods in terms of formulas and theories, as well as state the reasons for choosing these methods.

Equation (3): How was the coefficient of sunshine duration obtained? If it was derived from previous research, please indicate the source. If it was self-defined, please explain the criteria used for the definition.

We recommend carefully organizing and categorizing the abbreviations used throughout the text. The current labeling of abbreviations has rather increased the difficulty of reading.

For better understanding, we suggest summarizing all the test case designs used in this study in a table.

Line 252: should be the standard deviation, not variance.

Figure 2: The current flowchart is too complex. We suggest simplifying it to enhance readability.

In Section 2.3, it is recommended to separately introduce different statistical indicators.

The Results and Discussion section of this paper mainly consists of a simple statement of the results, lacking in-depth analysis and discussion. For example, in Section 3.2, the reasons for the algorithm’s differences in different climatic regions are not adequately addressed, and in Section 3.1.2, the reasons for seasonal variations are not analyzed.

There are excessive tables in this paper (23 tables). It is recommended to explore alternative ways of presenting the information.

The generalizability of the conclusions in this paper may be limited. Do these conclusions apply to other regions as well?
Citation: https://doi.org/10.5194/hess-2023-158-RC1
- AC2:
  'Reply on RC1', Zhaojun Sun, 02 Oct 2023
  Dear Anonymous Referee #1,
  I would like to express my deep gratitude to you for your hard and serious review work.
  
  The main innovations of this paper are described:
  
  First, this paper presents for the first time the training and validation of five models (labeled with the subscript p at the lower right of the model name) with public weather forecast data (with a lead time of 1 day). Four models were trained and validated with day-by-day observed weather data (subscript o at the lower right of the model name). Note that training and validating models with day-by-day observed meteorological data is a common approach in the literature, while training and validating models with public weather forecast data is not reported in the literature. These nine machine learning models (five trained and validated on 1-day-ahead public weather forecast data plus four trained and validated on day-by-day observed meteorological data) are then used with 1-7-day-ahead public weather forecast data (2020-2021) to predict the daily ETo, and their performance is tested by comparison with the standardized values of daily ETo computed with the FAO-56 PM equation and day-by-day observed meteorological data (2020-2021). The results show that, with the same input combinations, the models trained and validated on public weather forecasts predict daily ETo better than the models trained and validated on day-by-day observed meteorological data.
  
  Second, apart from the MLP, the three other selected machine learning models (XGBoost, LightGBM, and CatBoost) have only been used for daily ETo estimation in the literature (that is, they are trained and validated with day-by-day meteorological data, and the daily ETo is then estimated using the day-by-day meteorological data and the developed machine learning models). These three models have high accuracy and stability among the models used for estimation, but their use for the prediction of daily ETo has not been reported in the literature. In addition, in the literature, the categorical data of public weather forecasts must be converted (wind level to wind speed and weather type to sunshine hours) before they can be used in a model to predict the daily ETo, but this conversion brings large errors, which in turn affect the accuracy of the predicted daily ETo. Therefore, in this paper, we use the wind levels and weather types from public weather forecast data directly for the training and validation of the CatBoost2 model (which can handle categorical variables directly) and test the prediction of daily ETo for lead times of 1-7 days. This direct use of the wind levels and weather types from public weather forecast data for the prediction of daily ETo by a machine learning model has not been reported in the literature either, and it is presented for the first time in this paper.
  
  Finally, for agricultural irrigation, the most important thing is the accurate prediction of daily ETo during the irrigation season of the crop, so this paper recommends the best of the machine learning models used for the research sites to predict daily ETo during the irrigation season. In addition, the sources of error in the daily ETo predicted by the machine learning models were analyzed. For the study sites in the arid zone, Wspd from the public weather forecast was the most important source of error in the predicted daily ETo, followed by SDun; for the study sites in the semiarid and semihumid zones, SDun from the public weather forecast was the most important source of error, followed by Wspd. The inclusion of SDun and Wspd from public weather forecasts should therefore be carefully considered when choosing the inputs to the machine learning model, because these two quantities may degrade the performance of the model in predicting daily ETo.
  
  In addition, the full names of the abbreviations you mentioned are explained at the relevant places in this paper. The full names of the abbreviations "MLP_p", "XGBoost_p", and "CatBoost" are specified in lines 8-10 of the Abstract; the full names of the abbreviations "AR", "SHZ", and "SAR" are specified in Table 1, line 135; and the full name of the abbreviation "Wspd" is specified in lines 203-204.
  
  Response to specific comments:
  This is a good suggestion. As long as it meets the formatting requirements of the journal, I will list the sources and links of the models in a table.
  
  The three climatic zones studied, the mesothermal arid zone (northern yellow irrigation zone), the mesothermal semiarid zone in the center (central arid zone), and the mesothermal semihumid zone in the south, were obtained according to the Köppen classification, which is also in agreement with what is expressed in the literature. Meteorological data for the three climatic zones are described in Table 2.
  
  The reasons for choosing these methods have been explained both in the introduction (lines 113-116) and in the methods section (lines 197-199).
  
  The daylight time factor has already been explained in lines 204-205.
  
  In fact for the acronyms used in this paper, they have all been explained at the appropriate places in this paper, but of course I might consider starting this paper with a uniform description of the acronyms used in the text.
  
  A tabular overview of all the test case designs used in this study is presented later in this paper.
  
  Note that here, the sample data are normalized, and σ is the variance of the sample data, not the standard deviation.
  
  The flowchart (Fig. 2) mainly expresses the process of hyperparameter tuning of the four machine learning algorithms using appropriate methods to obtain the hyperparameters of the optimal results. I have tried to make it as simple as possible.
  
  These metrics are commonly used to measure model performance and have been described in the text.
  
  Using meteorological data (either public weather forecast data or day-by-day observed meteorological data) as model inputs for training causes the predicted ETo to differ for two reasons, whether model performance is analyzed day-by-day or by season. The first is the input meteorological data, such as the different combinations of weather variables and the forecast errors of weather variables from public weather forecasts, which have been analyzed and presented in Section 3.2 and Section 3.1. The second is the model itself. The model is first trained and validated with weather data samples and then tested with public weather forecast data, which is a test of the stability of the model itself. The four different types of models are compared to obtain a better model for predicting daily ETo. These processes address the reasons why the algorithms differ in different climate zones.
  
  In addition, the purpose of analyzing the performance of models for predicting daily ETo according to seasonality is to recommend the best model for predicting daily ETo for crops in different growing seasons and then to use the best model for predicting daily ETo in subsequent irrigation forecasting and irrigation decision scheduling for the crops concerned.
  
  The relevant tables have been replaced using radar charts to rationalize and simplify the presentation of relevant results.
  
  The conclusions of this paper are of course fully generalizable to other regions.
  
  The conclusion that "the performance of training and validating models based on public weather forecasts to predict daily ETo is better than that of training and validating models based on day-by-day observations with the same input combinations" can be generalized to other regions. That is, when there are enough samples of public weather forecast data, it is recommended to train and validate the model with public weather forecast data first, test the model with future public weather forecast data, and then generalize the use of the model.
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC2
RC2:
'Comment on hess-2023-158', Anonymous Referee #2, 08 Oct 2023

The study undertakes an in-depth examination of various machine learning models to achieve a comprehensive grasp of the most efficient techniques for ET prediction. It differentiates its approach by comparing the efficacy of models trained on diverse data sources, particularly highlighting the potential value of publicly available weather forecasts for these predictions. While the research provides valuable insights into the effectiveness of machine learning models in predicting daily ET using different data sources, there are areas for improvement.
It is clear that significant dedication and effort have been poured into the manuscript by the authors. However, as a reader, I sometimes find myself navigating through an overwhelming expanse of information. There's a pressing need for a clearer presentation of results. Prioritizing pivotal findings and possibly relegating secondary results to supplementary material might streamline the narrative. One example is the extensive analysis of public forecast weather data, which doesn't seem central to the main objective. Besides, the authors calculated the four evaluation metrics for different variables, different climate zones, different datasets, and different models. There are 23 tables that show these statistics in the manuscript, and it is very difficult for readers to extract the pivotal results from the massive amount of information.
The study is geographically limited to three different climates. The results' global applicability remains in question. Can we anticipate consistent model performance across diverse global climates, especially given the shifts in weather patterns due to phenomena like global warming? The selection of just nine stations for such a vast expanse warrants further discussion about potential uncertainties.
Besides, a deeper dive into the rationale behind selecting these specific machine learning models would be beneficial for readers and researchers looking to replicate or build upon this study. The introduction notes that multiple similar studies have been conducted in other regions across China. How does this research differentiate itself from that previous work?
Specific comments:
Line 44-45: This expression is a little confusing; prediction is a process of estimating.
Line 50-51: The statement isn't quite right as ET can be affected by historical conditions such as water availability, which is associated with past weather conditions such as historical precipitation and temperature.
Line 61-68: It might be better to move this information to the methods section.
Line 105-115: While these methods have been extensively verified in China, it's essential to clarify the unique contribution of this study
Line 116: Use the term "applied" instead of "developed" to accurately reflect the models' utilization in this study.
Consider reevaluating objective 2 to ensure it brings a unique value to the study. The testing of methods should ideally be integrated into the overall research framework; I am not sure it can stand alone as one of the three objectives. This would affect the novelty of this work.
Line 135: Clearly define acronyms like "SDun." Specify the range of weather data used in table 1 and include these details in the table caption.
Line 155: Need to describe how the FAO 56 PM is used in this study.
Line 167: Provide a rationale for selecting these three bagging methods. Are they especially suited for this type of data or problem?
Line 183: This section does not flow well. In lines 183-201, the authors introduce a few machine learning methods and then give an introduction to the MLP. The authors should consider improving the logic flow: start by giving an overview of the machine learning methods explored, followed by a detailed introduction of the MLP.
Line 189: “…Brazil using the first four days of data…”, what does this mean?
Line 299-422: While the evaluation of public forecast weather data is commendable, it's essential to keep the focus on the study's primary objectives. Consider summarizing these results to maintain brevity and relevance.
Line 362: What do the different colors mean?
Line 423: Begin this section by detailing the chosen input variable combinations and the rationale behind their selection. This provides readers with context before diving into the results.

Citation: https://doi.org/10.5194/hess-2023-158-RC2
- AC3:
  'Reply on RC2', Zhaojun Sun, 10 Oct 2023
  Dear Anonymous Referee #2,
  I would like to express my deep gratitude to you for your hard and serious review work.
  First, the main innovations of this paper are described:
  This paper presents for the first time the training and validation of five models (with subscript p labeled at the bottom right of the model name) with public weather forecast data (with a lead time of 1 day). Note that training and validating models with day-by-day observed meteorological data is a common approach in the literature, while training and validating models with public weather forecast data is not reported in the literature.
  In addition, in the literature, the categorical data in the public weather forecast must be converted (wind level to wind speed and weather type to hours of sunshine) before they can be used in a model to predict daily ET_o. However, this conversion introduces a large error, which affects the accuracy of the predicted daily ET_o. Therefore, in this paper, we directly use the wind level and weather type from public weather forecast data to train and validate the CatBoost2 model (which can handle categorical variables directly) and test its prediction of daily ET_o for lead times of 1-7 days. This direct use of the wind level and weather type from public weather forecast data in a machine learning model for predicting daily ET_o has not been reported in the literature either and is presented for the first time in this paper.
  Second, the main conclusions of this paper are as follows:
  Five models (subscript p at the bottom right of the model name) were trained and validated with public weather forecast data (1-day lead time), and four models (subscript o at the bottom right of the model name) were trained and validated with day-by-day observed weather data. These nine machine learning models were then used with 1-7-day-ahead public weather forecast data (2020-2021) to predict daily ET_o. The performance of the nine models was tested by comparison with the reference values of daily ET_o computed with the FAO-56 PM equation and day-by-day observed weather data (2020-2021). The results show that the models trained and validated with public weather forecasts predict daily ET_o better than the models trained and validated with day-by-day observed meteorological data with the same input combinations.
  Finally, a note on a few points of doubt:
  I have modified the 23 tables, e.g., to radar plots, so that readers can extract the key results more easily;
  The introduction of this paper describes several similar studies conducted in other parts of China, which are different in that the models are trained and validated only with day-by-day observed meteorological data, and the developed models are only tested with public weather forecasts, whereas in this paper, the models are trained, validated and tested with public weather forecasts;
  Training and validating the model requires a long series of historical public weather forecast data. I could obtain such data only from meteorological stations in Ningxia, China, and not from stations elsewhere. Therefore, only nine study stations in Ningxia were selected. Nevertheless, combined with the conclusions of this paper, the proposed method still has considerable value for wider application. At stations with access to historical public weather forecast data, models for predicting ET_o can be trained and validated using different input combinations of the historical public weather forecast variables so that the best models for predicting ET_o at the study sites can be obtained.
  
  Responses to specific comments:
  Forecasting ET_o is a process of estimating ET_o, and here, the emphasis is on forecasting future ET_o. However, the estimate in lines 44-45 is an estimate of past ET_o using historical day-by-day actual measurements.
  
  Note that lines 50-51 refer to ET_o, the reference crop evapotranspiration; according to the FAO-56 PM equation, daily ET_o is largely governed by weather conditions (i.e., weather variables).
  
  Very good suggestion, which I have modified.
  
  These three ensemble learning models, XGBoost, LightGBM and CatBoost, outperform other machine learning models in estimating ET_o using historical day-by-day observed meteorological data. However, they have not been used in studies predicting future ET_o. Therefore, in this paper, these three ensemble learning models are selected as methods for predicting ET_o.
  
  I have made the suggested changes.
  
  Related note: The other eight models in this paper require the categorical data in the public weather forecast to be converted (wind level to wind speed and weather type to sunshine hours) before they can be used to predict daily ET_o. Since the CatBoost model can handle categorical variables directly, in this paper, wind levels and weather types from public weather forecast data are used directly as inputs to the CatBoost model (denoted as CatBoost2). Because wind levels and weather types are forecast poorly at lead times of 1-7 days and the CatBoost model itself is not very stable, this direct use did not improve the performance of the CatBoost2 model for predicting daily ET_o in the test period. Nonetheless, this approach of using wind levels and weather types from public weather forecast data directly in machine learning models to predict daily ET_o has not been reported in the literature and is presented for the first time in this paper.
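
  As a rough illustration of the kind of category-to-number conversion mentioned above, the following sketch turns a forecast wind level into a wind speed at 2 m. It is not the conversion used in the paper, which is not reproduced here. It assumes that the wind level is a Beaufort-scale number, that the empirical relation v = 0.836 B^1.5 gives the speed at 10 m, and that the FAO-56 logarithmic profile is used to adjust to 2 m; the function names are hypothetical.

```python
import math


def beaufort_to_wind_speed_10m(level):
    """Approximate wind speed (m/s) at 10 m for a Beaufort wind level.

    Assumption: empirical relation v = 0.836 * B**1.5. The lookup table
    actually used in the paper may differ.
    """
    if level < 0:
        raise ValueError("wind level must be non-negative")
    return 0.836 * level ** 1.5


def wind_speed_at_2m(speed, height=10.0):
    """Adjust a wind speed measured at `height` metres to 2 m (FAO-56 log profile)."""
    return speed * 4.87 / math.log(67.8 * height - 5.42)


def wind_level_to_u2(level):
    """Hypothetical helper: forecast wind level -> wind speed at 2 m (m/s)."""
    return wind_speed_at_2m(beaufort_to_wind_speed_10m(level))


u2_level3 = wind_level_to_u2(3)
print(f"wind level 3 -> u2 = {u2_level3:.2f} m/s")
```

  Any rounding of a continuous wind speed into a few levels loses information, which is one way such a conversion can add error to the predicted daily ET_o.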
  
  I can consider your suggestion and make necessary changes.
  
  Changes have been made in accordance with the recommendations, and the acronyms have been harmonized at the beginning of the document.
  
  Please note that this paper only uses the equation proposed by FAO to calculate the daily ET_o: the FAO-56 PM equation.
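
  For readers who want to see the calculation, the daily FAO-56 PM equation can be sketched as a small function. The inputs below are illustrative values only and are not taken from the stations in this paper; the intermediate quantities (slope of the vapour pressure curve, net radiation, vapour pressures) are assumed to have been computed beforehand, and the function name is hypothetical.

```python
def fao56_pm_eto(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration ET_o (mm/day), FAO-56 Penman-Monteith.

    delta  : slope of the saturation vapour pressure curve (kPa/degC)
    rn     : net radiation at the crop surface (MJ m-2 day-1)
    g      : soil heat flux density (MJ m-2 day-1), close to 0 for daily steps
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean daily air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m/s)
    es, ea : saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs only (assumed values, not data from the paper).
eto = fao56_pm_eto(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                   t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
print(f"ET_o = {eto:.2f} mm/day")
```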
  
  In general, there are three types of ensemble algorithms: bagging, boosting and stacking.
  
  Bagging constructs multiple independent estimators and then averages their predictions, or uses majority voting, to determine the output of the ensemble. Representative algorithm: random forest.
  Boosting constructs multiple weak learners sequentially and combines them, according to their weights, into a strong learner. Representative algorithms: AdaBoost, GBDT, XGBoost, LightGBM and CatBoost.
  All of these methods are suitable for the data and problem in this study; XGBoost, LightGBM and CatBoost simply perform better than random forest. Specific details can be found in the official documentation of the three open-source libraries: XGBoost at http://xgboost.readthedocs.io, LightGBM at http://lightgbm.readthedocs.io, and CatBoost at https://catboost.ai/en/docs/.
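
  The difference between bagging and boosting described above can be sketched with a toy example. This is a minimal pure-Python illustration using one-split regression stumps on made-up 1-D data; it is not XGBoost, LightGBM or CatBoost, and all names and settings here are assumptions chosen for the illustration.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data and return a predict function."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_models=25, seed=0):
    """Bagging: fit independent stumps on bootstrap samples and average them."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_models=25, lr=0.5):
    """Boosting: fit each stump to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_models):
        m = fit_stump(xs, residuals)
        models.append(m)
        residuals = [r - lr * m(x) for x, r in zip(xs, residuals)]
    return lambda x: base + lr * sum(m(x) for m in models)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data: y = x^2 on a regular grid.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)

baseline_mse = mse(lambda x: mean_y, xs, ys)
bagging_mse = mse(bagging(xs, ys), xs, ys)
boosting_mse = mse(boosting(xs, ys), xs, ys)
print(baseline_mse, bagging_mse, boosting_mse)
```

  In bagging the stumps never see each other's errors, whereas in boosting each new stump is fitted to what the earlier ones got wrong.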
  
  MLP (multilayer perceptron) is a well-established algorithm that has been applied to the prediction of daily ET_o, but most reported studies use an MLP with a single hidden layer. In this study, based on the TensorFlow 2.8.0 framework, the hidden-layer parameters of the MLP are treated as hyperparameters and tuned using RandomizedSearchCV. The results show that MLPs with 2-3 hidden layers all predict daily ET_o better than an MLP with a single hidden layer.
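
  The idea of treating the hidden-layer layout as a hyperparameter can be sketched with a plain random search, which is the principle behind RandomizedSearchCV. The sketch below is not the tuning code used in this study: the search space is hypothetical, and `toy_score` is a placeholder standing in for a cross-validated error of a trained MLP.

```python
import random

# Hypothetical search space: 1-3 hidden layers, each with one of these widths.
LAYER_COUNTS = [1, 2, 3]
LAYER_WIDTHS = [16, 32, 64, 128]


def sample_layout(rng):
    """Draw one hidden-layer layout, e.g. (64, 32)."""
    n_layers = rng.choice(LAYER_COUNTS)
    return tuple(rng.choice(LAYER_WIDTHS) for _ in range(n_layers))


def random_search(score_fn, n_iter=20, seed=0):
    """Return (best_layout, best_score, history); a lower score is better."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        layout = sample_layout(rng)
        history.append((layout, score_fn(layout)))
    best_layout, best_score = min(history, key=lambda item: item[1])
    return best_layout, best_score, history


def toy_score(layout):
    """Placeholder for a cross-validated error; NOT a real model fit."""
    return abs(len(layout) - 2) + abs(sum(layout) - 96) / 96


best_layout, best_score, history = random_search(toy_score)
print(best_layout, best_score)
```

  In practice `toy_score` would be replaced by a function that builds an MLP with the sampled layout, trains it, and returns its validation error.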
  
  Please note that lines 188-190 are a one-sentence citation. Ferreira et al. (2019) estimated daily ET_o for all of Brazil using the first four days of data, and an ANN (model structure 16-50-50-1) was the best choice among temperature- and relative humidity-based models.
  
  Changes have been made accordingly, as recommended.
  
  These are box plots, where different colors represent statistical results for different climatic zones.
  
  The selected combination of input variables has been explained in lines 235-245. Since the public weather forecast provides only four variables, which are the inputs to the machine learning model, the selection of the input combinations is made based on the existing research reports and this paper's evaluation of the prediction performance of weather variables in the public weather forecast (lines 299-422).
  
  Citation: https://doi.org/10.5194/hess-2023-158-AC3

Supplement

https://doi.org/10.5194/hess-2023-158-supplement

Viewed

Total article views: 1,531 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,160	319	52	1,531	107	57	81

Views and downloads (calculated since 31 Jul 2023)

Month	HTML	PDF	XML	Total
Jul 2023	39	8	1	48
Aug 2023	140	34	4	178
Sep 2023	50	11	5	66
Oct 2023	61	17	8	86
Nov 2023	9	8	0	17
Dec 2023	14	7	0	21
Jan 2024	21	10	1	32
Feb 2024	16	13	1	30
Mar 2024	30	11	4	45
Apr 2024	28	6	3	37
May 2024	36	11	7	54
Jun 2024	16	2	0	18
Jul 2024	28	7	1	36
Aug 2024	24	5	0	29
Sep 2024	23	3	0	26
Oct 2024	19	5	0	24
Nov 2024	22	5	1	28
Dec 2024	26	2	0	28
Jan 2025	16	9	2	27
Feb 2025	24	7	1	32
Mar 2025	25	13	4	42
Apr 2025	24	22	0	46
May 2025	19	16	2	37
Jun 2025	33	19	1	53
Jul 2025	26	14	0	40
Aug 2025	67	23	2	92
Sep 2025	295	15	1	311
Oct 2025	29	16	3	48

Viewed (geographical distribution)

Total article views: 1,498 (including HTML, PDF, and XML) Thereof 1,498 with geography defined and 0 with unknown origin.

Latest update: 24 Oct 2025

Download

This preprint has been withdrawn.

Short summary

During the testing period, the performance of the predicted ET_o from the machine learning model trained and validated based on the public weather forecast 1 day before outperforms the performance of the predicted ET_o from the machine learning model trained and validated based on the daily observed meteorological data. Wspd and SDun in the public weather forecast are the most important sources of daily ET_o errors in the model predictions for the AR and SAR (SHZ) climate zone, respectively.

