Analysis of flash droughts in China using machine learning

Zhang, Linqi; Liu, Yi; Ren, Liliang; Teuling, Adriaan J.; Zhu, Ye; Wei, Linyong; Zhang, Linyan; Jiang, Shanhu; Yang, Xiaoli; Fang, Xiuqin; Yin, Hang

doi:https://doi.org/10.5194/hess-26-3241-2022

Articles | Volume 26, issue 12

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 24 Jun 2022

Final revised paper (published on 24 Jun 2022)
Preprint (discussion started on 03 Nov 2021)

Interactive discussion

Status: closed

RC1:
'Comment on hess-2021-541', Anonymous Referee #1, 04 Dec 2021

This paper studies the predictability of flash drought over China using machine learning methods. The starting point is ERA5 soil moisture over China for the period 1979-2021. The authors use a definition of flash drought based on changes in soil moisture percentiles (SMP), which they term the rate of intensification (RI) during periods when SMP is decreasing. They define flash droughts as occurring when SMP crosses the 40th percentile and is decreasing at a rate of at least 6.5 percent per week (the time step is weekly). There is some confusion in Figure 1 and the surrounding text as to whether crossing the 20th percentile of SMP is also required (the figure implies this, but the text does not). There is also a criterion for a termination time Tn “when the rapid decline of soil moisture ceases”, but this is not shown in Figure 1, nor are specifics given in the text.
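To make the definition summarized above concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that "6.5 percent per week" means 6.5 percentile points per week, that RI is the mean weekly decline from the last week at or above the 40th percentile to the end of the decline, and that the decline ends at the first week in which SMP stops decreasing. The function name `find_flash_droughts` and the optional `floor_level` argument (which stands in for the ambiguous 20th-percentile condition) are hypothetical.

```python
from typing import List, Optional, Tuple


def find_flash_droughts(
    smp: List[float],
    onset_level: float = 40.0,            # percentile that must be crossed (from the review text)
    min_ri: float = 6.5,                  # minimum mean decline per week (from the review text)
    floor_level: Optional[float] = None,  # assumption: e.g. 20.0 if the 20th percentile is required
) -> List[Tuple[int, int, float]]:
    """Return (onset_week, end_week, ri) for each detected event.

    onset_week is the first week below onset_level; end_week is the last
    week of the uninterrupted decline (an assumed reading of Tn).
    """
    events = []
    t = 1
    while t < len(smp):
        crossed = smp[t - 1] >= onset_level and smp[t] < onset_level
        if not crossed:
            t += 1
            continue
        start = t - 1  # last week at or above the onset level
        end = t
        # Follow the decline until the percentile stops decreasing.
        while end + 1 < len(smp) and smp[end + 1] < smp[end]:
            end += 1
        ri = (smp[start] - smp[end]) / (end - start)
        deep_enough = floor_level is None or smp[end] <= floor_level
        if ri >= min_ri and deep_enough:
            events.append((t, end, ri))
        t = end + 1
    return events


# Made-up weekly percentiles: one rapid decline, then one slow decline.
weekly_smp = [55, 48, 41, 30, 22, 15, 18, 35, 45, 39, 37, 36]
events = find_flash_droughts(weekly_smp)
```

In this made-up series only the first decline qualifies; the second crosses the 40th percentile but falls too slowly. Restricting the search to a seasonal window would simply mean passing a sub-series.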

My main problem with this paper is philosophical. Why are you using machine learning at all? It reflects no understanding of the physical processes involved: you just throw in a bunch of variables that you think could possibly have something to do with RI and turn the crank. Rather obviously, flash droughts are going to occur during dry periods (during precipitating periods, presumably soil moisture increases rather than decreases). So given that it’s dry, it must have to do with evaporative demand and the soil moisture you start with. We do understand those processes (albeit imperfectly), so surely you could use a physically based model to predict the RI. Now, if you did that first, and then applied ML and could somehow (it is not at all clear to me how) use the ML predictions to diagnose the physically based ones so as to improve them, I would be interested. But I don’t really see where the hydrologic content is in this paper.

My other complaint is that key information needed to understand the results is either buried in text or missing altogether. For instance, were flash drought periods extracted from the entire period of record, without regard for season? Ordinarily, one would expect such events to occur primarily in summer, when evaporative demand is the highest. But RI is determined in terms of soil moisture percentage changes, which complicates the picture considerably. In winter, for instance, evaporative demand will be reduced, but the range of soil moisture percentages likely is also reduced, so it could be that the statistics of RIs are being dominated by events that in a practical sense aren’t really droughts at all. I don’t know if this is true but constraining the analysis to a window in the summer (if this hasn’t already been done – I searched the document and didn’t find any indication that it was) would make the most sense.

My suggestion is that the paper needs to go around the track again, and the authors need to include a physically based alternative. If ML provides better predictions, they need a very good explanation for why, and some diagnosis of why the physically based predictions are failing. At this point, I don’t see that this paper is really about hydrology.

Citation: https://doi.org/10.5194/hess-2021-541-RC1
- AC1: 'Reply on RC1', Linqi Zhang, 01 Jan 2022
  
  The comment was uploaded in the form of a supplement: https://www.hydrol-earth-syst-sci-discuss.net/hess-2021-541/hess-2021-541-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/hess-2021-541-AC1
RC2:
'Comment on hess-2021-541', Anonymous Referee #2, 11 Dec 2021

Overall, I consider this to be a worthwhile contribution to the rapidly expanding flash drought literature. The authors provide a new definition that can be compared to other proposed definitions and they examine association with a range of potential drought predictors. My two major comments are on the framing and the comparison between flash droughts and "slow droughts."

Major comments:

1. The methods applied in the study are, formally, supervised statistical learning algorithms. While one can debate what "AI" means, I think it's fair to assume that very few people think of linear regression, or even nonparametric statistical approaches like Random Forest, as AI. LSTM does sometimes get put in the AI basket, but it's no longer really a leading-edge, advanced AI application. All that to say, I was surprised by the content of the manuscript after reading the title, and I suspect others may be as well. The paper simply does not provide an AI-oriented methodological advance, nor does it present results that are interesting because of a novel application of relatively new methods. For this reason I recommend retitling and reframing the paper to focus on the flash drought findings, and removing the prominent use of the term AI in the title, abstract, and throughout the paper. There are many published studies in many fields that compare the performance of parametric and nonparametric methods for various applications, sometimes including NN as well, and at this point I really think that the difference in performance between those methods is best presented as a comparison of statistical methods that is useful but not particularly innovative. Instead, I recommend that the authors focus on their actual flash drought results in the framing of the paper, as those results are quite interesting for the flash drought community.

2. I appreciate the section of the manuscript that compares the predictability of flash drought to conventional drought. But in making this distinction the authors implicitly assume that flash and slow droughts, as distinguished using the RI threshold employed in this paper, are meaningful and relatively homogeneous types of drought with respect to the predictor variables. Are the flash droughts and slow droughts in the inventory relatively homogeneous and separable with respect to these predictors, when evaluated using standard clustering or homogeneity tests? And is there evidence of the greater spread in meteorological predictors for slow drought relative to flash drought, as the authors suggest when explaining poorer performance in predicting slow droughts as a function of meteorology?

Other comments:

1. I have no issue with the authors using their own, new definition to define flash drought events in their inventory, but it would be useful to, at a minimum, see a discussion of how the choice of definition is expected to influence results. Ideally, a comparison of inventories generated using one or two other definitions would be included.

2. The authors use a combination of ERA5 and meteorological station data. Can they show or cite a study that shows how consistent ERA5 is with meteorological station data in China?
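
As an illustration of the kind of consistency check requested in comment 2, the sketch below computes three common agreement statistics (mean bias, RMSE, and Pearson correlation) between a reanalysis series and a co-located station series. This is only an assumed form of such a comparison, not something taken from the manuscript; the function name `agreement_metrics` and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def agreement_metrics(reanalysis: Sequence[float], station: Sequence[float]) -> Dict[str, float]:
    """Mean bias, RMSE and Pearson correlation of reanalysis against station data."""
    if len(reanalysis) != len(station) or len(station) < 2:
        raise ValueError("series must have equal length and at least two values")
    n = len(station)
    mean_r = sum(reanalysis) / n
    mean_s = sum(station) / n
    # Bias is defined here as reanalysis minus station.
    bias = mean_r - mean_s
    rmse = math.sqrt(sum((r - s) ** 2 for r, s in zip(reanalysis, station)) / n)
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(reanalysis, station))
    var_r = sum((r - mean_r) ** 2 for r in reanalysis)
    var_s = sum((s - mean_s) ** 2 for s in station)
    if var_r == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant series")
    corr = cov / math.sqrt(var_r * var_s)
    return {"bias": bias, "rmse": rmse, "corr": corr}


# Made-up example: the reanalysis is uniformly 1 unit below the station record.
metrics = agreement_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

A constant offset like the one in this example leaves the correlation at 1 while the bias and RMSE are nonzero, which is why the three statistics are usually reported together.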

Citation: https://doi.org/10.5194/hess-2021-541-RC2
- AC2: 'Reply on RC2', Linqi Zhang, 01 Jan 2022
  
  The comment was uploaded in the form of a supplement: https://www.hydrol-earth-syst-sci-discuss.net/hess-2021-541/hess-2021-541-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/hess-2021-541-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (18 Jan 2022) by Rohini Kumar

AR by Linqi Zhang on behalf of the Authors (23 Feb 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (25 Feb 2022) by Rohini Kumar

RR by Anonymous Referee #2 (31 Mar 2022)

RR by Anonymous Referee #3 (02 May 2022)

Suggestions for revision or reasons for rejection

Zhang et al. (2022); HESSD

Zhang et al. (2022) quantified the relationship between the rate of intensification (RI) of flash drought and nine climate variables using three machine learning methods across China. The manuscript is written clearly, and it is an interesting study, particularly in linking different climate variables to the rate of intensification of drought. The results show that the random forest is preferable for estimating the flash drought rate of intensification and for monitoring flash droughts in the weeks adjacent to drought onset. This is my first time reviewing this manuscript. Judging from the earlier discussions with the reviewers and the authors’ replies, the manuscript has improved significantly since the original submission (e.g., the analysis of the spatial distribution of the frequency of occurrence of flash droughts in different seasons over China). I have just a few extra comments, which the authors should clarify:
The reanalysis ERA-Interim soil moisture product is independent of the meteorological data used in the study. Did you consider checking the differences between the observation-based meteorological forcing data and the built-in meteorological forcing data of ERA-Interim? Additionally, please clarify why you are using ERA-Interim rather than ERA5; the website behind your data link states: “ERA Interim is being phased out. Users are strongly advised to migrate to ERA5.”

I missed a discussion linking the results of your study to impacts. Please mention in the discussion/outlook how your results can be linked with impacts on agricultural production. Is flash drought more impactful than slowly evolving drought?

Last but not least: a code and data availability statement is missing from your manuscript. For reproducibility, please make your analysis available to the general public.

------------
Line 61: typo: researches => researchers
Line 188: blow => below
Line 188: decreases => decrease
Figure 1: correct typo in figure legend: “anmoly” => ”anomaly”
Line 195: represent => represents
Line 277: not => do not
Line 283: captured => are captured
Line 294: were serves => served
Line 347: remove “model”
Line 362: were => was

ED: Publish subject to minor revisions (review by editor) (02 May 2022) by Rohini Kumar

AR by Linqi Zhang on behalf of the Authors (12 May 2022) Author's response Author's tracked changes Manuscript

ED: Publish as is (13 May 2022) by Rohini Kumar

AR by Linqi Zhang on behalf of the Authors (21 May 2022)

Short summary

In this study, three machine learning methods showed good capacity for detecting flash droughts. The RF model is recommended for estimating the depletion rate of soil moisture and simulating flash drought from anomalies of multiple meteorological variables in the period adjacent to drought onset. Anomalies of precipitation and potential evapotranspiration exhibited a stronger synergistic but asymmetrical effect on flash droughts than on slowly developing droughts.