Analysis of Flash Drought in China using Artificial Intelligence models
- 1State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China
- 2College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
- 3Hydrology and Quantitative Water Management Group, Wageningen University, Wageningen 6708PB, The Netherlands
- 4College of Hydrology and Water Resources, Nanjing University of Information Science & Technology, Nanjing 210044, China
- 5Institute of Water Resources for Pastoral Area, Ministry of Water Resources, Inner Mongolia 010020, China
Abstract. The term “flash drought” describes a type of drought with rapid onset and strong intensity, which is co-affected by both water-limited and energy-limited conditions. It has attracted widespread attention in related research communities because of its devastating impacts on agricultural production and natural systems. Based on a global reanalysis dataset, we identify flash droughts across China during 1979–2016 by focusing on the depletion rate of the weekly soil moisture percentile. The relationship between the rate of intensification (RI) and nine related climate variables is constructed using three artificial intelligence (AI) technologies, namely, multiple linear regression (MLR), long short-term memory (LSTM), and random forest (RF) models. On this basis, the capabilities of these algorithms for estimating RI and detecting droughts (flash droughts and traditional slowly evolving droughts) were analyzed. Results showed that the RF model achieved the highest skill in RI estimation and flash drought identification among the three approaches. Spatially, the RF-based RI performed best in southeastern China, with an average CC of 0.90 and an average RMSE of 2.6 percentiles per week, while poor performance was found in the Xinjiang region. For drought detection, all three AI technologies performed better in monitoring flash droughts than in monitoring conventional slowly evolving droughts. In particular, the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) of flash drought derived from RF were 0.93, 0.15, and 0.80, respectively, indicating that RF is preferable for estimating RI and monitoring flash droughts when multiple meteorological variable anomalies in the weeks adjacent to drought onset are considered.
In terms of the meteorological driving mechanism of flash drought, negative precipitation (P) anomalies and positive potential evapotranspiration (PET) anomalies exhibited a stronger synergistic effect on flash droughts than on slowly developing droughts, along with asymmetrical compound influences in different regions of China. In the Xinjiang region, P deficit played a dominant role in triggering the onset of flash droughts, while in southwestern China, the lack of precipitation and enhanced evaporative demand contributed almost equally to the occurrence of flash drought. This study helps enhance the understanding of flash drought and highlights the potential of AI technologies in flash drought monitoring.
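For readers unfamiliar with the detection scores quoted in the abstract, the sketch below computes POD, FAR, and CSI from their standard contingency-table definitions. It assumes week-by-week binary drought labels; the function name `detection_scores` and the sample arrays are illustrative only, and the paper's own event-matching rules may differ.

```python
import numpy as np

def detection_scores(observed, predicted):
    """Return (POD, FAR, CSI) for two equal-length binary sequences.

    observed  : 1/True where a drought was identified in the reference data
    predicted : 1/True where the model flagged a drought
    """
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)

    hits = int(np.sum(observed & predicted))           # drought observed and predicted
    misses = int(np.sum(observed & ~predicted))        # drought observed, not predicted
    false_alarms = int(np.sum(~observed & predicted))  # predicted, not observed

    nan = float("nan")
    pod = hits / (hits + misses) if (hits + misses) else nan
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi

# Hypothetical labels for five weeks (not data from the study).
pod, far, csi = detection_scores([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(pod, far, csi)
```

Under these definitions the three scores are linked by 1/CSI = 1/POD + 1/(1 - FAR) - 1, so the reported POD of 0.93 and FAR of 0.15 imply a CSI of about 0.80, consistent with the value given in the abstract.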
Linqi Zhang et al.
Status: closed
- RC1: 'Comment on hess-2021-541', Anonymous Referee #1, 04 Dec 2021
This paper studies the predictability of flash drought over China using machine learning methods. The starting point is ERA5 soil moisture over China for the period 1979-2021. They use a definition of flash drought based on changes in soil moisture percentiles (SMP), which they term the rate of intensification (RI), during periods when SMP is decreasing. They define flash droughts as occurring when SMP crosses the 40th percentile and is decreasing at a rate of at least 6.5 percent per week (time step is weekly). There is some confusion in Figure 1 and the text surrounding it as to whether crossing of the 20th percentile of SMP is also required (the figure implies this, but the text does not). There is also a criterion for a termination time Tn “when the rapid decline of soil moisture ceases”, but this is not shown in Figure 1, nor are specifics given in the text.
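The identification rule as summarized in this comment can be sketched as follows. This is a minimal illustration, not the manuscript's procedure: it assumes a one-dimensional weekly SMP series, applies only the 40th-percentile crossing and the 6.5-per-week decline rate, and ignores the 20th-percentile and termination criteria that the comment describes as unclear. The function name `flag_rapid_onsets` and the sample values are hypothetical.

```python
import numpy as np

def flag_rapid_onsets(smp, onset_pct=40.0, min_ri=6.5):
    """Return indices of weeks in which SMP falls below `onset_pct`
    while dropping by at least `min_ri` percentile points in that week.

    smp : weekly soil moisture percentiles (0-100), oldest first
    """
    smp = np.asarray(smp, dtype=float)
    weekly_drop = smp[:-1] - smp[1:]  # positive values mean drying
    crossed = (smp[:-1] >= onset_pct) & (smp[1:] < onset_pct)
    rapid = weekly_drop >= min_ri
    # +1 so the index refers to the week that ends below the threshold
    return np.flatnonzero(crossed & rapid) + 1

# Hypothetical series: one rapid crossing (week 2) and one slow crossing (week 6).
weeks = flag_rapid_onsets([55, 48, 38, 30, 35, 42, 39, 33])
print(weeks.tolist())
```

In this toy series the second crossing of the 40th percentile is not flagged because the weekly decline is only 3 points, which is the kind of event the comment would class as a slowly evolving drought rather than a flash drought.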
My main problem with this paper is philosophical. Why are you using machine learning at all? It reflects no understanding of physical water processes – you just throw in a bunch of variables that you think could possibly have something to do with RI and turn the crank. Rather obviously, flash droughts are going to occur during dry periods (during precipitating periods, presumably soil moisture increases rather than decreases). So given that it’s dry, it must have to do with evaporative demand, and the soil moisture you start with. We do understand those processes (albeit imperfectly), so surely you could use a physically based model to predict the RI. Now, if you did that first, and then applied ML and could somehow (not clear at all to me how) use the ML predictions to diagnose the physically based ones so as to improve them, I would be interested. But I don’t really see where the hydrologic content is in this paper.
My other complaint is that key information needed to understand the results is either buried in text or missing altogether. For instance, were flash drought periods extracted from the entire period of record, without regard for season? Ordinarily, one would expect such events to occur primarily in summer, when evaporative demand is the highest. But RI is determined in terms of soil moisture percentage changes, which complicates the picture considerably. In winter, for instance, evaporative demand will be reduced, but the range of soil moisture percentages likely is also reduced, so it could be that the statistics of RIs are being dominated by events that in a practical sense aren’t really droughts at all. I don’t know if this is true but constraining the analysis to a window in the summer (if this hasn’t already been done – I searched the document and didn’t find any indication that it was) would make the most sense.
My suggestion is that the paper needs to go around the track again, and the authors need to include a physically based alternative. If ML provides better predictions, they need a very good explanation for why, and some diagnosis of why the physically based predictions are failing. At this point, I don’t see that this paper is really about hydrology.
- AC1: 'Reply on RC1', Linqi Zhang, 01 Jan 2022
- RC2: 'Comment on hess-2021-541', Anonymous Referee #2, 11 Dec 2021
Overall, I consider this to be a worthwhile contribution to the rapidly expanding flash drought literature. The authors provide a new definition that can be compared to other proposed definitions and they examine association with a range of potential drought predictors. My two major comments are on the framing and the comparison between flash droughts and "slow droughts."
Major comments:
1. The methods applied in the study are, formally, supervised statistical learning algorithms. While one can debate what "AI" means, I think it's fair to assume that very few people think of linear regression, or even nonparametric statistical approaches like Random Forest, as AI. LSTM does sometimes get put in the AI basket, but it's no longer really a leading-edge, advanced AI application. All that to say, I was surprised by the content of the manuscript after reading the title, and I suspect others may be as well. The paper simply does not provide an AI-oriented methodological advance, nor does it present results that are interesting because of novel application of relatively new methods. For this reason I recommend retitling and reframing the paper to focus on the flash drought findings, and removing the prominent use of the term AI in the title, abstract, and throughout the paper. There are many published studies in many fields that compare the performance of parametric and nonparametric methods for various applications, sometimes including NN as well, and at this point I really think that the difference in performance between those methods is best presented as a comparison of statistical methods that is useful but not particularly innovative. Instead, I recommend that the authors focus on their actual flash drought results in the framing of the paper, as those results are quite interesting for the flash drought community.
2. I appreciate the section of the manuscript that compares the predictability of flash drought to conventional drought. But in making this distinction the authors implicitly assume that flash and slow droughts, as distinguished using the RI threshold employed in this paper, are meaningful and relatively homogeneous types of drought with respect to the predictor variables. Are the flash droughts and slow droughts in the inventory relatively homogeneous and separable with respect to these predictors, when evaluated using standard clustering or homogeneity tests? And is there evidence of the greater spread in meteorological predictors for slow drought relative to flash drought, as the authors suggest when explaining poorer performance in predicting slow droughts as a function of meteorology?
Other comments:
1. I have no issue with the authors using their own, new definition to define flash drought events in their inventory, but it would be useful to, at a minimum, see a discussion of how the choice of definition is expected to influence results. Ideally, a comparison of inventories generated using one or two other definitions would be included.
2. The authors use a combination of ERA5 and meteorological station data. Can they show or cite a study that shows how consistent ERA5 is with meteorological station data in China?
- AC2: 'Reply on RC2', Linqi Zhang, 01 Jan 2022