Analysis of Flash Drought in China using Artificial Intelligence models

The term “flash drought” describes a type of drought with rapid onset and strong intensity, which is co-affected by both water-limited and energy-limited conditions. It has attracted widespread attention in related research communities due to its devastating impacts on agricultural production and natural systems. Based on a global reanalysis dataset, we identify flash droughts across China during 1979~2016 by focusing on the depletion rate of the weekly soil moisture percentile. The relationship between the rate of intensification (RI) and nine related climate variables is constructed using three artificial intelligence (AI) technologies, namely, multiple linear regression (MLR), long short-term memory (LSTM), and random forest (RF) models. On this basis, the capabilities of these algorithms for estimating RI and detecting droughts (flash droughts and traditional slowly-evolving droughts) were analyzed. Results showed that the RF model achieved the highest skill in terms of RI estimation and flash drought identification among the three approaches. Spatially, the RF-based RI performed best in southeastern China, with an average CC of 0.90 and average RMSE of 2.6th percentile per week, while poor performance was found in the Xinjiang region. For drought detection, all three AI technologies performed better in monitoring flash droughts than conventional slowly-evolving droughts. In particular, the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) of flash droughts derived from RF were 0.93, 0.15, and 0.80, respectively, indicating that RF technology is preferable for estimating the RI and monitoring flash droughts by considering multiple meteorological variable anomalies in the adjacent weeks of drought onset.
In terms of the meteorological driving mechanism of flash drought, negative precipitation (P) anomalies and positive potential evapotranspiration (PET) anomalies exhibited a stronger synergistic effect on flash droughts than on slowly-developing droughts, along with asymmetrical compound influences in different regions of China. For the Xinjiang region, P deficit played a dominant role in triggering the onset of flash droughts, while in southwestern China, the lack of precipitation and enhanced evaporative demand contributed almost equally to the occurrence of flash drought. This study is valuable for enhancing the understanding of flash drought and highlights the potential of AI technologies in flash drought monitoring.
https://doi.org/10.5194/hess-2021-541 Preprint. Discussion started: 3 November 2021. © Author(s) 2021. CC BY 4.0 License.


Introduction
Drought is generally regarded as a slowly-evolving climate phenomenon, which may persist for several months or even years (Allen et al., 2010; Mishra and Singh, 2010). Several recent studies suggested that drought can also develop in a more intense and rapid manner under extreme atmospheric anomalies (Ford and Labosier, 2017; Hunt et al., 2014; Otkin et al., 2013), for instance, large precipitation deficits or increases in evaporative demand derived from unusual climate conditions (e.g., elevated air temperatures, strong wind, or low humidity). This type of drought is usually termed "flash drought", a term that has been used to describe an additional type of drought characterized by rapid onset and strong intensification (Senay et al., 2008; Svoboda et al., 2002). Compared with conventional droughts, flash droughts may lead to more severe heat through land-atmosphere feedbacks (AghaKouchak et al., 2015; Ford et al., 2015; Hunt et al., 2009; Yuan et al., 2017).
The fifth assessment report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) provided a comprehensive assessment of recent and future changes in various types of droughts, and suggested that they should be considered separately (IPCC, 2013). Climate change has raised the land surface temperature, which has led droughts to occur with higher frequency and greater intensity (Trenberth et al., 2014). Moreover, in the context of global warming, high temperatures and heat waves occur more frequently due to land-atmosphere interaction, providing a favourable environment for the rapid intensification of drought (Teuling et al., 2018; Wang et al., 2016). From the perspective of physical mechanisms, the evolution of flash drought involves complicated processes. Though a lack of precipitation for a certain period is a necessary requirement for droughts to develop, precipitation deficit alone is not likely to induce flash droughts (Otkin et al., 2018). Rather, the joint effects of multiple meteorological variables, e.g., a lack of precipitation and enhanced evaporative demand caused by unusually high temperature, low humidity, strong wind, and sunshine duration, are likely to induce a rapid intensification of soil moisture depletion (Hobbins et al., 2016). In other words, the occurrence of flash droughts is related to a variety of climate variables associated with water-limited and energy-limited conditions (Pendergrass et al., 2020).
In the context of global climate change, China has also experienced flash droughts frequently in recent years (Feng et al., 2014; Sun and Yang, 2012; Wang et al., 2011; Yuan et al., 2015). For example, the 2013 summer drought affected 13 provinces in southern China and caused great losses in Guizhou and Hunan provinces, with damage to over 2 million hectares of crops. To improve the understanding of short-term droughts across China, Wang et al. (2016) applied temperature, evapotranspiration, and soil moisture anomalies to examine the variability of flash droughts and revealed that their increasing trends are mainly related to long-term warming. Liu et al. (2020b) investigated the temporal and spatial distribution of flash droughts over China from 1979 to 2018 and analyzed the coexisting relationship between flash droughts and seasonal droughts. It is necessary to further enrich the knowledge of flash droughts and their mechanisms in order to better guide the development of drought early warning systems. To date, there have been limited studies on monitoring and simulating flash droughts from a climatic perspective, especially for China with its strong climate gradients and complicated spatial heterogeneity.

Artificial intelligence (AI) technologies, as well-known data-driven methods, provide an opportunity to describe and predict complicated physical processes based on a combination of abundant data and advanced model architectures (Kadow et al., 2020; Pan et al., 2019; Pradhan et al., 2020). In recent years, AI models have achieved considerable progress in hydrological processes (Bennett et al., 2021; Kim et al., 2021), climate change (Li et al., 2020; Mokhtar et al., 2021), earth system research (Cui et al., 2016; Zhang et al., 2021) and their sub-fields owing to their efficient computation and self- [...]
[...] 0-7, 7-28, 28-100, 100-289 cm). Meanwhile, ECMWF can provide SM at different spatial resolutions based on its platform for optional interpolation calculation. In this study, the daily SM data of the top layer (0-7 cm) at a spatial resolution of 0.25° during 1979-2016 were collected and aggregated into weekly values for intercomparison.

Meteorological forcing
Daily point-scale meteorological observations, including precipitation (P), average air temperature (Tmean), maximum air temperature (Tmax), minimum air temperature (Tmin), air pressure (PRS), relative humidity (RHU), wind speed (WIN), and sunshine duration (SSD), from 756 national stations were employed. All these data have complete records from 1979 to 2016 and can be acquired from the China Meteorological Administration website (CMA, http://data.cma.cn/). The potential evapotranspiration (PET) was calculated using the physically-based Penman equation (Penman, 1948), which involves a variety of meteorological variables such as air temperature, RHU, and WIN.
These point-based data were interpolated into gridded data at a spatial resolution of 0.25° using the inverse distance weighting (IDW) method.
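The IDW step described above can be illustrated with a minimal sketch. The function name `idw_interpolate`, the power parameter of 2, and the use of plain Euclidean distance in degrees are assumptions made for illustration only; the text does not state which settings were used.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, grid_xy, power=2.0, eps=1e-12):
    """Interpolate station values onto target points by inverse distance weighting.

    station_xy: (n_station, 2) coordinates; grid_xy: (n_grid, 2) coordinates.
    power and eps are illustrative defaults, not values taken from the paper.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)

    # Pairwise distances between every target point and every station.
    diff = grid_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Closer stations get larger weights; eps avoids division by zero
    # when a target point coincides with a station.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ station_values
```

A target point that coincides with a station essentially reproduces that station's value, while a point midway between two stations receives their average.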

Flash drought identification
There is no consistent definition of flash drought. In this study, we adopt a quantitative method to identify flash droughts by focusing on the rate of intensification (RI) during their onset-development phase (Liu et al., 2020a). Fig. 1 depicts the unusually rapid development process of a flash drought, characterized by the significant depletion of the soil moisture percentile and the anomalies of precipitation, temperature, and potential evapotranspiration in the adjacent weeks of drought onset. The upper limit (see the yellow line in Fig. 1a) represents the 40th percentile threshold, below which the soil is suffering abnormally dry conditions, while the lower limit (see the red line in Fig. 1a) denotes the 20th percentile, at which moisture deficits have the potential to cause severe impacts on the environment. As shown in Fig. 1, [...] phase, this leads to a sharp reduction in the soil moisture percentile from above the 40th to the 5th percentile within 3 weeks.
Suppose T0 is the onset time when drought occurs, and Tn denotes the termination time of the onset-development stage, when the rapid decline of soil moisture ceases and turns into smooth fluctuations or even an increasing tendency. Tn can be determined by fitting a polynomial function and locating the point where the first derivative of the constructed polynomial equals zero (Liu et al., 2020a). With the onset time and termination time, the intensification rate of a drought event can be calculated as:
$$RI = \frac{1}{T_n - T_0}\sum_{i=0}^{n-1}\left[SM(t_{i+1}) - SM(t_i)\right]$$
where $t_0 = T_0$ is the onset time, $t_n = T_n$ denotes the termination time, and $SM(t_i)$ is the soil moisture percentile at time $t_i$ in the rapid intensification process of drought.
In this method, a flash drought event is recognized when RI exceeds a predetermined threshold. We followed the suggestion of Liu et al. (2020a) by using a criterion of -6.5th percentile per week to identify flash drought events. This value is comparable to the criterion suggested by Ford and Labosier (2017), who defined a flash drought event as a decrease in the soil moisture percentile from above the 40th percentile to below the 20th percentile within 20 days. In this study, we used the absolute value of RI to indicate the depletion rate of the soil moisture percentile for convenience of expression, i.e., a flash drought event was recognized when RI exceeded 6.5th percentile per week. Besides, the nonlinear relationship between RI and nine meteorological variables in the adjacent weeks (T0-7 ~ T0+7) was constructed based on the RF models.
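The identification procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: RI is taken as the mean weekly change of the soil moisture percentile between T0 and Tn, the polynomial degree is a free choice (the text does not give it), and the function names are hypothetical.

```python
import numpy as np

FLASH_THRESHOLD = 6.5  # percentile per week, the criterion quoted in the text

def termination_index(sm_pct, degree=3):
    """Estimate Tn (in weeks after onset) from a weekly percentile series.

    The series starts at the onset time T0. A polynomial is fitted and Tn is
    placed at the first point after onset where its first derivative is zero.
    The degree is an assumed setting, not one reported in the paper.
    """
    sm_pct = np.asarray(sm_pct, dtype=float)
    t = np.arange(sm_pct.size)
    coeffs = np.polyfit(t, sm_pct, degree)
    roots = np.roots(np.polyder(coeffs))
    real = roots[np.abs(roots.imag) < 1e-9].real
    inside = np.sort(real[(real > 0) & (real <= t[-1])])
    # Fall back to the end of the series if no stationary point is found.
    t_n = int(round(inside[0])) if inside.size else int(t[-1])
    return max(t_n, 1)

def rate_of_intensification(sm_pct, t_n):
    """Mean weekly change of soil moisture percentile from T0 to Tn."""
    sm_pct = np.asarray(sm_pct, dtype=float)
    return (sm_pct[t_n] - sm_pct[0]) / t_n

def is_flash_drought(ri, threshold=FLASH_THRESHOLD):
    """Flag an event whose absolute RI exceeds the threshold."""
    return bool(abs(ri) > threshold)
```

For a series that drops from the 41st to the 5th percentile in three weeks and then recovers, this sketch places Tn at week 3 and yields an RI of -12 percentile per week, which is classified as a flash drought.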
$$\widehat{RI}_j = \beta_0 + \sum_{i=1}^{m} \beta_i x_{ij}, \quad j = 1, 2, \dots, n$$
where $x_{ij}$ represents the anomaly value for meteorological variable $i$ in drought event $j$; $\beta_0$ and $\beta_i$ are the intercept and corresponding regression coefficients, respectively; $n$ is the number of drought events at a given grid cell; $m$ is the number of input variables in the adjacent weeks of drought onset time; and $\widehat{RI}_j$ represents the estimated RI for drought event $j$ at a given grid cell based on the MLR method. The regression coefficients in each equation reflect the importance of each independent variable to the dependent variable, and thus serve the same function as regression weights. The importance of meteorological variables (i.e., P and PET) to RI is presented in the Discussion section.
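As an illustration of the MLR step, the sketch below fits an intercept and regression coefficients by ordinary least squares for the drought events at one grid cell. The function names and the use of `numpy.linalg.lstsq` are assumptions for illustration; the text does not say how the regression was solved.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit RI = b0 + sum_i b_i * x_i by ordinary least squares.

    X: (n_events, m_variables) anomaly values; y: (n_events,) observed RI.
    Returns the intercept and the coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict_mlr(intercept, weights, X):
    """Estimate RI for new events from fitted coefficients."""
    return intercept + np.asarray(X, dtype=float) @ weights
```

If the input anomalies are standardized beforehand, the magnitudes of the fitted coefficients can be compared across variables as weights.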

Long short-term memory
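Long short-term memory (LSTM), proposed by Hochreiter et al. (1997), is a special type of recurrent neural network (RNN). Compared with traditional RNNs, it has memory structures that can carry previous information into the current time step to deal with long-term dependencies between input and output features. The input of an LSTM cell is composed of three parts: the input vector at the current time $x^{(t)}$, the output of the LSTM cell at the previous time $h^{(t-1)}$, and the cell state at the previous time $c^{(t-1)}$. An LSTM cell has two output values: the output of the LSTM cell at the current time $h^{(t)}$ and the current cell state $c^{(t)}$. Each LSTM cell has three gates: the input gate $i^{(t)}$, forget gate $f^{(t)}$, and output gate $o^{(t)}$. The input gate decides what new information is added to the current cell state $c^{(t)}$; the forget gate determines how much of the previous cell state needs to be forgotten, using a sigmoid function of the current input $x^{(t)}$ and the previous output $h^{(t-1)}$; and the output gate controls how much of the cell state is retained in $h^{(t)}$ at the current time. $\tilde{c}^{(t)}$ is the candidate for the new cell state values, which is calculated by an activation function (tanh in the standard LSTM formulation) applied to a linear combination of $x^{(t)}$ and $h^{(t-1)}$:
$$
\begin{aligned}
i^{(t)} &= \sigma\left(W_i x^{(t)} + U_i h^{(t-1)} + b_i\right) \\
f^{(t)} &= \sigma\left(W_f x^{(t)} + U_f h^{(t-1)} + b_f\right) \\
\tilde{c}^{(t)} &= \tanh\left(W_c x^{(t)} + U_c h^{(t-1)} + b_c\right) \\
c^{(t)} &= f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)} \\
o^{(t)} &= \sigma\left(W_o x^{(t)} + U_o h^{(t-1)} + b_o\right) \\
h^{(t)} &= o^{(t)} \odot \tanh\left(c^{(t)}\right)
\end{aligned}
$$
where $\sigma$ is the sigmoid function, $\sigma(x) = 1/(1+e^{-x})$; $\odot$ denotes element-wise multiplication; $W$ and $U$ are the weights from the input layer and from the previous hidden state to the hidden layer, respectively; and $b$ (i.e., $b_i$, $b_f$, $b_c$, $b_o$) are bias parameters associated with the input gate $i^{(t)}$, forget gate $f^{(t)}$, cell state $c^{(t)}$, and output gate $o^{(t)}$. $W$, $U$, and $b$ are adjusted using backpropagation through time in the training period.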
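The gate logic described above can be made concrete with a single forward step of an LSTM cell. This sketch follows the standard LSTM formulation, in which the candidate cell state uses a tanh activation; that choice, the dictionary layout of the parameters, and the function names are assumptions for illustration rather than the configuration used in this study.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W, U, b are dicts keyed by 'i', 'f', 'c', 'o' (input gate, forget gate,
    candidate cell state, output gate). W maps the input, U maps the
    previous hidden state, b holds the biases.
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde   # keep part of the old state, add new
    h_t = o_t * np.tanh(c_t)             # expose part of the state as output
    return h_t, c_t
```

In practice the weekly anomalies of the nine meteorological variables would be fed in sequence, with the final hidden state mapped to the estimated RI by an additional output layer.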

Evaluation metrics
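The objectives of this study were to evaluate the performance of MLR, LSTM, and RF in estimating the RI of drought events and to assess their capabilities in capturing flash droughts. Four evaluation metrics were employed: the correlation coefficient (CC) was used to assess the consistency between the simulated and observed RI, with a perfect value of 1; the root mean squared error (RMSE) and mean error (ME) estimate their errors, with an optimal value of 0; and the relative bias (BIAS) was employed to calculate the deviation of the simulated RI from the observed RI, with an optimal value of 0. These evaluation metrics are specified by Eqs. (10)-(13) below:
$$CC = \frac{\sum_{i=1}^{n}\left(RI_{obs}(i) - \overline{RI_{obs}}\right)\left(RI_{sim}(i) - \overline{RI_{sim}}\right)}{\sqrt{\sum_{i=1}^{n}\left(RI_{obs}(i) - \overline{RI_{obs}}\right)^2}\sqrt{\sum_{i=1}^{n}\left(RI_{sim}(i) - \overline{RI_{sim}}\right)^2}} \tag{10}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(RI_{sim}(i) - RI_{obs}(i)\right)^2} \tag{11}$$
$$ME = \frac{1}{n}\sum_{i=1}^{n}\left(RI_{sim}(i) - RI_{obs}(i)\right) \tag{12}$$
$$BIAS = \frac{\sum_{i=1}^{n}\left(RI_{sim}(i) - RI_{obs}(i)\right)}{\sum_{i=1}^{n} RI_{obs}(i)} \tag{13}$$
where $RI_{obs}(i)$ is the observed RI at grid $i$, $RI_{sim}(i)$ is the simulated RI at grid $i$, $\overline{RI_{obs}}$ is the mean observed RI value, $\overline{RI_{sim}}$ is the mean simulated RI value, and $n$ is the number of samples.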
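The four metrics described above can be computed as in the sketch below. BIAS is taken here as the sum of errors divided by the sum of observations, which is a common definition of relative bias but an assumption with respect to this study; the function name is hypothetical.

```python
import numpy as np

def evaluate_ri(obs, sim):
    """Return CC, RMSE, ME and relative BIAS of simulated vs. observed RI."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    return {
        "CC": np.corrcoef(obs, sim)[0, 1],   # Pearson correlation
        "RMSE": np.sqrt(np.mean(err ** 2)),  # root mean squared error
        "ME": np.mean(err),                  # mean error
        "BIAS": err.sum() / obs.sum(),       # relative bias (assumed form)
    }
```

For example, a simulation that overestimates every observed RI by exactly one unit has a CC of 1 but non-zero RMSE, ME, and BIAS, which shows why the correlation and error metrics are reported together.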
In addition, three skill scores, including the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI), were employed to measure the performance of the three AI technologies in flash drought detection. All three indices range between 0 and 1. POD and CSI measure how well the flash droughts detected by the AI technologies match the observed flash droughts, with higher values indicating better detection performance. FAR reflects the fraction of detected flash droughts that do not occur in the observations, with an optimal value of 0. These evaluation metrics can be expressed as follows:
$$POD = \frac{H}{H + M}$$
$$FAR = \frac{F}{H + F}$$
$$CSI = \frac{H}{H + M + F}$$
where H (hits) represents flash droughts both detected by the AI methods and recorded in the observations; F (false alarms) represents flash droughts captured by the AI approaches but not recorded in the observations; and M (misses) represents flash droughts recorded in the observations but not captured by the AI approaches.
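The three skill scores follow directly from the counts of hits, false alarms, and misses. The sketch below derives these counts by applying the 6.5 percentile-per-week criterion to paired observed and simulated RI values; the function name and the pairing of events are assumptions for illustration.

```python
import numpy as np

def detection_scores(obs_ri, sim_ri, threshold=6.5):
    """Compute POD, FAR and CSI for flash drought detection.

    An event counts as a flash drought when |RI| exceeds the threshold.
    A score is returned as NaN when its denominator is zero.
    """
    obs_flash = np.abs(np.asarray(obs_ri, dtype=float)) > threshold
    sim_flash = np.abs(np.asarray(sim_ri, dtype=float)) > threshold

    hits = int(np.sum(obs_flash & sim_flash))
    false_alarms = int(np.sum(~obs_flash & sim_flash))
    misses = int(np.sum(obs_flash & ~sim_flash))

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }
```

Because CSI penalizes both misses and false alarms, it cannot exceed POD, which makes it a stricter summary of detection skill.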

General framework
The general flowchart for evaluating the performance of AI technologies (i.e., MLR, LSTM, and RF models) in flash drought detection is presented in Fig. 2. We used a global reanalysis soil moisture dataset (i.e., ERA-Interim SM) to identify drought events and calculate their RI. Also, nine climate variables (i.e., P, PET, Tmean, Tmin, Tmax, RHU, PRS, SSD, and WIN) collected from in-situ observations were converted into spatially consistent climate element series by the IDW method. The process for flash drought identification includes the following steps. Firstly, the original time series of these data were aggregated into weekly series, and the SM data were further transformed into SM percentiles based on the optimal selection of the theoretical probability distribution function (PDF) [...]

Figure 2:
Flow chart for evaluating the performance of AI models in flash drought detection.

Evaluation of the intensification rate of soil moisture
The capabilities of AI technologies in simulating the RI of soil moisture were assessed through intercomparison with the observed RI derived from ERA soil moisture. As shown in Fig. 3, higher RIs (up to 12.5th percentile per week for certain areas) were mostly concentrated in the southern part of China, e.g., the east of QTP, the east of SW, and the [...]

Comparison of the RI of flash droughts and slowly-evolving droughts
RI is an important metric for distinguishing flash droughts from traditional slowly-evolving droughts. To evaluate the capabilities of the three AI models in detecting drought events, we analyzed the correlation between model-simulated RI and observed RI for flash droughts and slowly-evolving droughts, respectively (Fig. 6). Based on the above analysis, we further evaluated the capabilities of the MLR, LSTM, and RF models for capturing flash drought events and slowly-evolving drought events in eight different sub-regions using three skill scores (i.e., POD, FAR, and CSI) (Fig. 8). For flash droughts, the average POD (FAR) of the MLR and LSTM models ranged from 0.58 to 0.88 (0.08 to 0.41) and 0.68 to 0.94 (0.10 to 0.44), respectively, which were much lower (higher) than those of the RF algorithm (Figs. 8a and b). Likewise, the CSI of the MLR and LSTM models was much lower than that of the RF [...] Spatially, with the highest POD and CSI scores and the lowest FAR scores, the SE region exhibited the best detection results, and the poorest performance was in the XJ region. In general, all three AI models provided more reliable information in detecting flash droughts than slowly-evolving droughts. Meanwhile, the RF is more recommended for [...]

Spatiotemporal evolution of typical flash drought events
The ability to capture the migration trajectories of droughts over time and space is also important for evaluating the capabilities of candidate AI models in drought detection. [...] 10g and l). Similarly, the 2013 summer flash droughts were mostly concentrated in MLYR areas with an average RI of 15.2th percentile per week (Fig. 10q). After 12 weeks, the flash droughts occurred on 17 October, and were mainly located in the SW area (Figs. 10v and aa). In terms of the accuracy of RI simulation, the MLR-estimated RI was generally higher than the observed RI in the SE and SW regions (Figs. 10h, m, w and ab). Compared with the MLR algorithm, the RI simulated by the LSTM and RF approaches followed a pattern largely consistent with the observed RI, suggesting that they were superior to MLR in monitoring flash droughts.

Performance of AI technologies for RI estimation
In this study, we evaluated three AI technologies and found that RF provided the best estimations of RI, with higher CC and lower RMSE relative to the observed RI (Figs. 4 and 5). It is not surprising that MLR did not perform well, given that its simple linear regression scheme is insufficient to describe the complicated nonlinear relationships among variables. With its more complicated model structure, the LSTM performed slightly better than MLR, but its efficiency is not [...] (Naghibi et al., 2016; Rodriguez-Galiano et al., 2012; Wang et al., 2015).
Regarding the spatial heterogeneity of RI, we found that the RF performed best in southern China, while the estimation errors were high in the XJ region. This might be related to the local climate and soil conditions. Fig. 11 compares the variation of soil moisture and of moisture-related (i.e., P and RHU) and energy-related (i.e., PET, Tmean, Tmax, Tmin, PRS, SSD, and WIN) meteorological factors in the adjacent weeks (i.e., T0-7~T0+7) of the onset of drought events during 1979~2016 in the XJ and SW regions of China. The XJ region is climatically drier, with relatively thick soil layers and sparse vegetation, and these climate and underlying surface conditions may not favour a rapid response of soil moisture to meteorological anomalies. From Figs. 11a, c, and e, we can see that for the XJ region, the variation of soil moisture was not consistent with the changes in meteorological anomalies for flash droughts. The sharp decline of soil moisture (from the 55.05th to the 8.87th percentile within 2 weeks) in Fig. 11a is a typical rapid rate of intensification for flash droughts. However, the meteorological variables did not change synchronously, and some (e.g., P, PET, and Tmean) even showed lagged variations after the onset of flash drought. By contrast, the consistency between soil moisture and meteorological variables was considerably improved for slowly-evolving droughts (Figs. 11b, d, and f). As expected, the degree of consistency was generally high in the SW region, with better behavior for flash droughts. As shown in Fig. 11g, soil moisture decreased from the 55.25th to the 10.54th percentile within two weeks.
Regarding meteorological variables, both P and RHU showed relatively stable negative anomalies (e.g., the P anomaly and RHU anomaly at T0 were -0.43 and -0.69, respectively), and energy-related variables (e.g., PET, T, WIN) presented continuously positive anomalies (e.g., the Tmean anomaly and PET anomaly at T0 were 0.28 and 0.59, respectively). All these contribute to the rapid decline of soil moisture. Different from the XJ region, the SW region belongs to the humid climate zone, with abundant soil moisture from the top to the deep layers, accompanied by dense vegetation and well-developed root systems. Under the joint effects of P deficit and high temperature or heat waves (Figs. 11g, i, and k), evapotranspiration from vegetation can be enhanced within a very short time period, leading to a rapid response of soil moisture to the unusual climate conditions.
[...] b, g, and h) denote the 25th~75th percentile range of soil moisture values. The dark yellow shadows in all 12 panels represent the onset-development phase of drought.

Comparison of AI technologies for flash droughts and slowly-evolving droughts
In this study, all three AI models produced better RI estimations for flash droughts than for conventional droughts (Figs. 6 and 8), suggesting that they are more competent at monitoring the rapid onset of droughts. From the perspective of physical mechanisms, the formation of traditional slowly-evolving droughts commonly takes a rather long time (e.g., several months or years) and is driven by a variety of meteorological factors (Mishra and Singh, 2010).
Precipitation deficits, enhanced evaporative demand, and their joint or alternating effects can all impose cumulative effects on soil moisture. Given the different climate and underlying surface conditions, the response time of the hydrological system can differ, which manifests as varied time scales of droughts (Zhu et al., 2021). In particular, the driving forces of slowly-evolving droughts could be more diverse when considering abnormal atmospheric circulation, which is the origin of meteorological droughts and is also responsible for soil moisture drought. For example, several previous studies suggest that droughts essentially result from sea- and land-atmosphere interactions, and that large-scale circulation factors such as the sea surface temperature, 500 hPa geopotential height, and 850 hPa vertical velocity all influence the development of drought (Xiao et al., 2016; Zeng et al., 2019). In other words, the complicated driving forces of slowly-evolving droughts at varying time scales make it difficult to simulate the variation of soil moisture from a climatic perspective.
By contrast, flash drought refers specifically to the time period in which rapid depletion of soil moisture occurs (Otkin et al., 2018), which usually requires simultaneous anomalies in precipitation, relative humidity, potential evapotranspiration, temperature, sunshine duration, wind speed, and other meteorological variables to combine into strong climatic forcing (Liu et al., 2020a; Hobbins et al., 2016; Hunt et al., 2014). Such rigorous atmospheric driving conditions theoretically would not be sustained for a long time, and a pentad or weekly time scale is recommended for monitoring flash droughts. A comparison of the individual roles of precipitation (representing the water supply condition) and PET (representing the limits of evaporative demand) in forming flash droughts and slowly-evolving droughts also showed this difference. Taking the MLR method as an example, Fig. 12 exhibits the weights of P and PET anomalies in the adjacent weeks (T0-7~T0+7, as in Fig. 11) of drought onset for the XJ and SW regions. It can be seen that the weights of P and PET anomalies for flash droughts were generally higher than those for traditional droughts, suggesting a closer relationship between meteorological variables (i.e., P and PET) and flash droughts. Meanwhile, regional differences associated with the individual roles of P and PET were also observed. For the XJ region, the weights of the negative P anomaly were generally high at the beginning of both types of drought, while in the SW region both the negative P anomalies and the positive PET anomalies presented high weights during the onset time of droughts. The results suggest that P deficit played an important role during drought onset in the XJ region, whereas in the SW region the lack of precipitation and elevated evaporative demand both played important roles in the occurrence of droughts, and this synchronous combined effect on the depletion of soil moisture is particularly significant for flash droughts.
In general, the AI models are more competent at capturing the variation of RI for flash droughts than for slowly-evolving droughts, owing to the close causative relationship between meteorological forcing and flash droughts.

Figure 12:
The weights of P (blue bar) and PET (yellow bar) for flash droughts and slowly-evolving droughts based on the MLR method in the adjacent weeks of drought onset in the XJ and SW regions. T0-1 denotes 1 week prior to the onset time, while T0+1 represents 1 week after the onset time.

Conclusions
Based on the depletion rate of soil moisture derived from the ERA-Interim dataset, we identified flash droughts across China during 1979~2016. Furthermore, the linear and nonlinear relationships between ERA-Interim soil moisture and multiple climate variables were constructed using the MLR, LSTM, and RF technologies. On this basis, we evaluated the performance of these models in estimating the rate of intensification (RI) of soil moisture and analyzed their capabilities in flash drought detection. Overall, the RF model displayed the best performance for the whole of China, which was much better than that of the MLR and LSTM models. The best RF estimates were in the SE region, with an average CC of 0.90 and average RMSE of 2.6th percentile per week, while the poorest estimates were found in the XJ area, with an average CC of 0.75 and average RMSE of 3.3th percentile per week. A specific investigation of the summer and autumn droughts in 2006 and 2013 indicated that RF and LSTM can reveal the spatial patterns of RI well. They provided a better simulation of flash drought than MLR, which gave the poorest estimates. Furthermore, these AI methods displayed a relatively higher detection capacity for flash droughts than for traditional slowly-evolving droughts. The RF model is recommended for simulating flash drought by considering multiple meteorological variable anomalies in the time period adjacent to drought onset. The POD, FAR, and CSI of flash droughts captured by the RF were 0.93, 0.15, and 0.80, respectively. In terms of the meteorological driving mechanism of flash droughts, negative precipitation (P) anomalies and positive potential evapotranspiration (PET) anomalies exhibited a stronger synergistic effect on flash droughts than on slowly-developing droughts. Such compound effects on flash drought also presented asymmetrical characteristics over two regions in China.
For the XJ region, P deficit played a dominant role in driving the onset of droughts, while for the SW region, the lack of precipitation and elevated evaporative demand contributed almost equally to the occurrence of droughts. This work helps enhance the understanding of flash droughts and provides a reference for the application of AI models in simulating flash droughts.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.