the Creative Commons Attribution 4.0 License.
Spatiotemporal changes of drought area as input for a machine-learning approach for crop yield prediction
Ahmed A. A. Osman
Gerald A. Corzo Perez
Henny A. J. Lanen
Shreedhar Maskey
Dimitri Solomatine
Abstract. Climate change has increased the possibility of more severe and prolonged droughts worldwide, which requires innovative methods to predict their impacts on different sectors such as agriculture. Crop growth models calculate yield and variables related to plant development and are used for crop yield estimation, a useful variable for monitoring drought impacts. Although used for prediction, these crop models are not explicit forecasting models; they are limited to the physical assumptions reflected in their conceptual model. In addition, the input data availability, the spatial and temporal aggregation, and different sources of uncertainty make the crop yield prediction challenging. Given these limitations, machine learning (ML) models are often utilised following a multivariable forecasting approach, but their use with the spatial characteristics of droughts as input data is limited. This research explored the spatial extent of drought as input data for building an approach for predicting seasonal crop yield. This ML approach is made up of two components. The first includes polynomial regression (PR) models, and the second considers artificial neural network (ANN) models. This approach aimed to evaluate both types of ML models (PR and ANN) and integrate them into one operational tool. The logic is as follows: ANN models determine the most accurate predictions, but in practice, issues regarding data retrieval and processing can make the use of equations, i.e. PR, preferable. The proposed approach provides these PR equations with early and preliminary input to perform such calculations. The estimates can be further improved when the ANN models are run with the final input data. The results indicated that the empirical equations (PR) produced good predictions when using drought area as the input. ANN provides better estimates, in general. 
Research results show that the spatiotemporal changes of drought area and its temporal aggregation provide an important pre-processing alternative to implement ML models for drought impact prediction.
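As a rough illustration of the PR component described above, the following minimal sketch fits a second-degree polynomial that maps a drought-area value to a yield anomaly and then evaluates it for a preliminary input. It is not the authors' implementation: the data are synthetic, the polynomial degree is an assumption, and the names `fit_polynomial` and `predict` are hypothetical.

```python
# Hypothetical illustration only: synthetic numbers, not data from the manuscript.

def fit_polynomial(x, y, degree=2):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the resulting upper-triangular system.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at a single drought-area value."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic example: fraction of a region in drought vs. detrended yield anomaly.
drought_area = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
yield_anomaly = [0.30 - 0.5 * d - 0.8 * d ** 2 for d in drought_area]

coeffs = fit_polynomial(drought_area, yield_anomaly, degree=2)
# Early estimate from a preliminary drought-area value (assumed here to be 0.45).
preliminary = predict(coeffs, 0.45)
```

In the workflow the abstract outlines, an equation of this kind would give the early estimate, which could later be refined by running the ANN models on the final input data.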
Vitali Diaz et al.
Status: open (until 28 Mar 2023)
RC1: 'Comment on hess-2022-252', Anonymous Referee #1, 08 Jan 2023
General comments
The manuscript (MS) “Spatiotemporal changes of drought area as input for a machine-learning approach for crop yield prediction” by Diaz et al. argues that dynamic crop models are limited in predicting crop yield and therefore introduces a machine learning (ML) method for yield forecasting in three main rice-growing regions in India (1967-2015). Two ML approaches, polynomial regression (PR) and artificial neural network (ANN), were investigated in separate or combined modes using drought area as the single input for grain yield prediction. Since ML has come into practice and is a helpful tool in many applications nowadays, especially in agriculture (e.g. yield prediction and remote sensing), this study could provide a meaningful approach for yield forecasting that complements other existing approaches, especially in India.
The figures and visual features are informative and easy to follow. The English is well written. The long data record (1967-2015) is also a strong point of this MS. However, there are some major issues: (i) the objectives of the work were not well determined or clearly stated; (ii) the structure of the MS was not well designed or formulated around concrete objectives; (iii) there is a lot of repetition and redundant information among sections, and figures and tables do not follow the main text; (iv) there is a lack of more detailed discussion of how other work and other approaches (crop models + ML) have been done elsewhere (in the introduction and discussion); (v) a critical issue is the use of drought area as model input without clarification of other factors or drought intensity. Given these issues, the MS cannot be accepted in its current state. Please see the detailed comments and suggestions below.
Abstract
Line 20-28: the description of the approach is a bit too long, while concrete (overall) statistical numbers for the results are lacking
Line 26: explicitly mentioned to PR, only two approaches here
Line 33: space after “implement”
Introduction
There is redundant information in the first paragraph (line 38-51) that needs to be rewritten.
The MS emphasized the limitations of crop modeling, which has long been well established in crop yield simulation, yield prediction, and climate change impact assessment, as well as in understanding crop responses to different abiotic or biotic stresses. Both crop models and ML have uncertainties with regard to spatiotemporal input data when applied at larger scales and over the long term. The comparison between ML and crop models should be further elaborated in the text to convince the reader of the case for ML (line 52-59).
Similarly, the MS focused on the spatial extent of drought and presented it as an issue that ML models could cover, but no detailed literature or references on this are given in the MS (line 68). Why is it important?
Line 78: what are the specific objectives: the impact of spatial extent on grain yield prediction in ML, determining which ML approach is best, or the effects of temporal aggregation? Please state them clearly
Line 89-123: the paragraph “Crop yield prediction in India” comes at this point. This section should be rewritten or merged with the section above to make the introduction more streamlined, with clear issues and associated objectives. The information in this section is repeated in Section 2
Line 99-109: writing needs to be improved
Line 119: which are “other solutions”?
Is there any study using the drought area for yield prediction before?
Materials and Methods
Sections 2 and 3 need to be restructured to be more concise and easier to read. It would be better to merge them into one, e.g. “Materials and methods”, with further subheadings.
Line 131-135 repeats lines 99-102
Line 131: accessed when? Also, the DAC name is not the same as the name in line 95
Line 143, separately for each state?
Line 145: it is not clear. Is it the spatial aggregation of two states with the average yield?
Figure 1: Why are the colors of the left and right figures so different? Is the same color scale used? What is the spatial resolution of the grid in the legend?
Line 156: this reference is not in the reference list
Line 160: accessed when?
Line 162-163: this information is really important for the whole MS and should not need repeated explanations. Please state the aggregation clearly: how are DI and DA obtained? What is aggregated in DA1, and from when to when? And so on for DA3, 6, 9, 12, because it is confusing whether 12 months or 24 months is meant (line 245, 246).
Line 185-203 and Section 2.2 largely repeat each other.
It is really important to explain further how SPEI is estimated, in terms of equations and variables, since this is the only input for the model. The MS mentioned many times the limitations of different drought types; with further explanation, could this SPEI determine or clearly show drought? Which ET approach and climatic variables were used? Information on irrigation (if it is available) should be mentioned and described for all years.
Using a single input variable like DA might not be concrete enough for yield prediction, and the soundness of the approach is rather weak. What about other climatic factors like temperature? What are the uncertainties of SPEI at the global scale?
Figure 2 should come right after line 203
Line 207: what about pests and diseases, heat stress, ozone?
Line 229-237 was repeated somewhere else before, for instance line 160-163 or 199-203
Section 3.2: it is too long and needs to be sharpened because of a lot of repeated information
Line 280: Table 2 should be mentioned right away. Line 280 to 289 should be in the result and discussion section, i.e. line 457
Section 3.3 needs to be restructured following the subsequent equations
Section 3.4 is also too long and overlaps with the introduction. Did the work choose the FFNN?
Line 346: is that a common threshold for different objects? Is there any justification for using this threshold for a single-input-variable model?
Line 350: is that “period” or whole dataset?
Section 3.5: various approaches are mentioned, but which one did you choose and what criteria were used?
Results and Discussion
It is too lengthy and contains repeated information. Substantial improvement in writing is required to make the MS well structured, following the objectives, with good discussion and reflection on previous studies
Line 362-366: legend does the job.
Line 368: “theree” -> “three”
Line 394: the decrease and maximum of what?
Line 394: where is Figure 4? It should be shown directly.
Is there any explanation why the de-trended yield of region 1 from 2003-2015 fluctuated much more than that of regions 2 and 3 in the same period?
Line 403: why are the three regions so different although only the yield from Kharif was presented? Any studies before?
Line 407: what is SPEI6?
Line 411-416 about Figure 5: peak of what and in which figure? 5a, 5b or 5c? Please be more precise
Figure 5: how many samples (n) does each point in 5a, b, and c come from?
Line 440: “rein” -> “rain”
Line 441: data for 2014, or for which years? Or the average of which years? This is very important information, together with SPEI and DA, that should be used to interpret the input data and yield prediction results.
Line 447 (Figure 6): “, respectively”. Is that the correlation coefficient at the 95 % significance level?
Line 466-470 is redundant since it has been mentioned in the materials and methods.
Section 4.3: too much information is shown at the same time (Fig. 7, 8, 9 at once), with little discussion and comparison with other literature for this section. Has any study been done elsewhere?
Is there any explanation why both models are less accurate from around 2000-2015 compared to 1967-2000, for instance for region 1 and region 3? The authors mentioned the “spatial extent”, which was considered in the models, but this was not well discussed.
Section 4.4: Tables 4, 5, 6 could be moved to the Supplementary material if possible, since they have not been discussed much or are not informative.
Line 539, 547, and 556: “moth” -> “month”
Section 4.5: the limitations were listed, but the discussion of results does not show how they affected the model performance. Nor have they been clearly discussed and compared with other studies.
Point 6 (line 580-581) is not clear. In fact, India could provide three sets of yield data per year (three growing seasons). Three sets of yield could correspond to at least three periods of temporal aggregation. Why did the work not use three sets of yield data, thereby having more grain yield data with monthly DA?
Section 4.6: Repetition of introduction and too general without literature comparison and discussion.
Line 596-598: is similar to point 2 Section 4.5
Section 4.7: a lot of information was mentioned and repeated from the previous Sections 4.5 and 4.6