the Creative Commons Attribution 4.0 License.
            Understanding meteorological and physio-geographical controls of variability of flood event classes in headstream catchments of China
Yongyong Zhang
Yongqiang Zhang
Xiaoyan Zhai
Jun Xia
Qiuhong Tang
Wei Wang
Jian Wu
Xiaoyu Niu
Bing Han
Classification is beneficial for understanding flood variabilities and their formation mechanisms from massive flood event samples for both flood scientific research and management purposes. Our study investigates comprehensive manageable flood event classes from 1446 unregulated flood events in 68 headstream catchments of China using hierarchical and partitional clustering methods. Control mechanisms of meteorological and physio-geographical factors (e.g., meteorology or land cover and catchment attributes) on spatial and temporal variabilities of individual flood event classes are explored using constrained rank analysis and a Monte Carlo permutation test. We identify five robust flood event classes, i.e., moderately, highly, and slightly fast floods as well as moderately and highly slow floods, which account for 24.0 %, 21.2 %, 25.9 %, 13.5 %, and 15.4 %, respectively, of the total number of events. All of the classes are evenly distributed in the entire period, but the spatial distributions are quite distinct. The fast flood classes are mainly in southern China, and the slow flood classes are mainly in northern China and the transition region between southern and northern China. The meteorological category plays a dominant role in flood event variabilities, followed by catchment attributes and land covers. Precipitation factors, such as volume and intensity, and the aridity index during the events are the significant control factors. Our study provides insights into flood event variabilities and aids in flood prediction and control.
Flood events usually show tremendous spatial and temporal variabilities in behavior due to heterogeneities in meteorological and underlying surface conditions over large basins or entire regions (e.g., country, continent, and world) (Berger and Entekhabi, 2001). Existing studies provide insights into the impacts of changes in meteorological or underlying surface conditions on specific flood metrics (e.g., magnitude, peak, timing, or seasonality) and their changes using the trend separation method, correlation testing, and mathematical modeling (Berghuijs et al., 2016; Tarasova et al., 2018; Liu et al., 2020; Wang et al., 2024). However, all of these studies are implemented at the event scale or in catchments with certain landscapes and climates, which is insufficient for comprehensive flood change investigation and generalized results (Tarasova et al., 2019; Zhang et al., 2020). Flood event similarity analysis is beneficial for investigating comprehensive dynamic characteristics of flood events in space and time by grouping massive heterogeneous events into a few manageable classes with significant statistical differences in flood responses (e.g., great or small floods, fast or slow floods, or rain or snowmelt floods) (Brunner et al., 2018). Flood event class determines hydrological response characteristics, longitudinal and lateral transfers of energy and material, and structures and functions of riverine ecosystems (Arthington et al., 2006; Poff and Zimmerman, 2010). The class also directly determines flood disaster losses for human society and affects the strategy formulations of flood control and management (Hirabayashi et al., 2013; Jongman et al., 2015). Hence, for both flood scientific research and management purposes, it is fundamentally important to identify the flood event classes and their formation mechanisms (Sikorska et al., 2015).
Inductive and deductive approaches are reported for the flood event similarity classification according to the clustering objectives (Olden et al., 2012). The inductive approach focuses directly on the shape similarity of flood events by clustering the response characteristics extracted from the flood event hydrographs. The response characteristics include magnitude, frequency, duration, timing, seasonality and variability metrics, which are considered the critical components for characterizing the entire range of flood events (Poff et al., 1997; Kuentz et al., 2017; Zhang et al., 2020). The reported flood event classes are the fast events with steep rising and falling limbs, the slow events with both elongated rising and falling limbs, the sharp or fast flood event, and the flash flood (Kuentz et al., 2017; Brunner et al., 2018; Zhai et al., 2021; Zhang et al., 2020). The deductive approach mainly focuses on the similarity of environmental factors which control flood events, such as meteorological variables (e.g., storm intensity, duration, and snowmelt) and physio-geographical conditions (e.g., soil moisture, land cover, and topography) (Merz and Blöschl, 2003; Ali et al., 2012; Brunner et al., 2018; Zhang et al., 2022). The reported flood event classes are the long-rain floods, short-rain floods, flash floods, rain-on-snow floods, and snowmelt floods (Merz and Blöschl, 2003; Sikorska et al., 2015; Brunner et al., 2018; Zhang et al., 2022). However, the control relationships of environmental factors with flood event shapes are not defined well, so that the identified classes are not exactly helpful for investigating the flood change patterns at the event scale. Therefore, it is a challenge to better understand the formation mechanisms of individual flood event classes.
The main procedure of existing flood event classification is to cluster the similarity of flood event attributes (e.g., flood response characteristics or control factors) across the spatial and temporal scales. According to the classification procedure, there are two widely adopted approaches, i.e., tree clustering methods (e.g., decision tree, regression tree, fuzzy tree, and random forest) (Sikorska et al., 2015; Brunner et al., 2017) and non-tree clustering methods (e.g., single linkage, complete linkage, average linkage, centroid linkage, ward linkage, k-means, and k-medoids) (Zhang et al., 2020; Zhai et al., 2021). The tree clustering methods are implemented successively for binary splitting of all of the flood events into smaller classes of similar flood events according to the thresholds of flood response metrics until the final classes are obtained (Sikorska et al., 2015; Brunner et al., 2017). The classification results could be applicable to other basins, and the flood response characteristics of different studies would be directly comparable if the same thresholds were to be adopted. However, these methods assume that the boundaries of flood response metrics in different classes are clear and that the thresholds of flood response metrics should be predefined and not overlap between different classes (Olden et al., 2012; Sikorska et al., 2015; Zhai et al., 2021). Additionally, the classification is very sensitive to the thresholds, whose small changes would create different flood event classes (Olden et al., 2012; Sikorska et al., 2015). Therefore, it will be difficult to define the thresholds clearly to get a robust classification performance. The non-tree clustering methods are implemented to directly split all of the flood events according to the different division rules of the comprehensive similarity measures of flood event shapes or metrics (Olden et al., 2012; Zhang et al., 2020). 
The class boundaries of the flood response metrics are vague, and the flood event classes are mainly based on the class membership degree deduced from sufficient heterogeneous flood events (Sikorska et al., 2015). The flood response characteristics of the individual classes were usually described qualitatively in order to distinguish between the differences in the classes (Olden et al., 2012; Tarasova et al., 2019; Zhang et al., 2020). Therefore, the classification results obtained from the different flood event samples are still difficult to compare quantitatively, even though the flood response characteristics or hydrographs in a certain class are similar (e.g., high or low floods or fast or slow floods) (Zhang et al., 2024). The determinations of the clustering method and the final cluster number are subjective in most existing studies, and the assessment of the clustering performance is usually unavailable (Olden et al., 2012; Sikorska et al., 2015; Brunner et al., 2017). Therefore, the robustness of flood event classification should be explored further.
The main aim of this study is to investigate the flood event similarity and control mechanisms of meteorological and physio-geographical factors in space and time at the class scale across China. A total of 1446 unregulated flood events in 68 heterogeneous catchments covering a wide range of meteorological and physio-geographical conditions were selected for our study. The specific objectives are as follows:
- Determine the optimal flood event classes by comparing multiple classification performance criteria of both the hierarchical and partitional clustering methods.
- Identify the main flood response characteristics of individual classes and their spatial and temporal variabilities.
- Quantify the effects of meteorological and physio-geographical factors on the variabilities of individual flood event classes.
This study provides more comprehensive insights into meteorological and physio-geographical controls of variabilities of flood event classes and provides mechanistic support for predicting flood event classes.
According to the Köppen–Geiger climate classification (Peel et al., 2007), China has diverse climate types, including alpine tundra (Köppen–Geiger code ET); tropical (A); arid, steppe, and cold (BSk); arid, desert, and cold (BWk); cold without a dry season (Df); cold with a dry winter (Dw); temperate without a dry season (Cf); and temperate with a dry winter (Cw). Most Köppen–Geiger climate types in China (i.e., A, Dw, Cf, and Cw) are controlled by the southeastern and southwestern monsoons in the summer, which bring temperate and humid conditions, and by the northwestern and northeastern monsoons in the winter, which bring cold and dry conditions. In these monsoon-controlled climate types, the mean annual precipitation was 365–2654 mm with a mean of 1184 mm, of which over 65 % fell between May and September according to the gauged daily precipitation observations from 2001 to 2020 in these regions. This led to frequent flooding, and thus the region in the monsoon-controlled climate types is usually considered the flood-prone area of China (China Institute of Water Resources and Hydropower Research and Research Center on Flood and Drought Disaster Prevention and Reduction, the Ministry of Water Resources, 2021). In the last decade, flooding occurred in 455 rivers annually, which affected 822 million people and averaged over USD 10 billion in losses (Ministry of Water Resources of the People's Republic of China, 2020a).
Sixty-eight headstream stations spread across flood-prone areas of major upper-river basins in China were selected with catchment areas ranging from 21 to 4830 km², covering all of the monsoon-controlled climate types of China, except for the tropical climate of the islands (i.e., A) (Fig. 1). Most of the catchments had large forest coverage, with a mean area percentage of 67.0 %, particularly in the Yangtze (69.9 %) and Pearl (68.7 %) river basins. A total of 1446 unregulated flood events with hourly time steps were collected from the hydrological yearbooks of the Songliao, Yellow, Huaihe, Yangtze, Southeast, and Pearl river basins over the period 1993–2015. The events were extracted following the standard of the Ministry of Water Resources of the People's Republic of China, i.e., the code for hydrological data processing (SL/T 247–2020) (Ministry of Water Resources of the People's Republic of China, 2020b). The extracted flood events at the individual stations usually had a maximum flood peak or flood volume, an isolated flood peak, continuous flood peaks, or a flood peak after prolonged drought during high- and normal-flow years (Ministry of Water Resources of the People's Republic of China, 2020b). In summary, the events were collected in the upper tributaries of six basins: 53 events at 4 stations in the Songliao River Basin (i.e., the Songhua and Wusuli rivers), 104 events at 4 stations in the Yellow River Basin (i.e., the Huangshui, Jinghe, and Yiluo rivers), 215 events at 13 stations in the Huaihe River Basin (i.e., the northern and southern tributaries), 844 events at 38 stations in the Yangtze River Basin (i.e., the Hanjiang, Wujiang, Lake Dongting, Lake Poyang, and lower Yangtze rivers), 90 events at 5 stations in the Southeast River Basin (i.e., the Qiantang and Jinjiang rivers), and 140 events at 4 stations in the Pearl River Basin (i.e., the Beijiang, Xijiang, and Dongjiang rivers). No fewer than 10 flood events were collected for every station to ensure representativeness. 
The densities of flood events and gauges in southern China (i.e., the Huaihe, Yangtze, Southeast, and Pearl river basins) were 1.25–11.01 times and 2.94–9.15 times greater than those in northern China (i.e., the Songliao and Yellow river basins) because of the higher occurrences of flood events (Table S1 in the Supplement).
Meteorological, catchment, and land cover data sources were collected together to calculate the potential meteorological and physio-geographical control factors and quantify their contributions to the spatial and temporal variabilities of flood event classes. The meteorological data sources were the synchronous hourly precipitation events extracted from the hydrological yearbooks and the daily precipitation, maximum temperature, and minimum temperature observations from 1993 to 2015 at the meteorological stations within or around the catchments downloaded from the China Meteorological Data Sharing Service System. All of the meteorological stations in the buffer zone with a radius of 100 km from every catchment center were selected. The station number was 466 in total, with no fewer than 8 stations for each catchment. The daily meteorological variables were interpolated to the catchment using the inverse-distance-weighting method, a commonly used meteorological interpolation method (Ahrens, 2006; Tan et al., 2021). The geographic information system (GIS) data contained the digital elevation model and the land cover data series in six periods (i.e., 1990, 1995, 2000, 2005, 2010, and 2015) whose spatial resolution is 30 m × 30 m. The GIS data were downloaded from the Data Center of Resources and Environmental Science, Chinese Academy of Sciences, and were adapted to extract catchment attributes and area percentages of individual land cover types. All of these data sources for control factor calculations have been widely used to represent the meteorological and underlying surface conditions in China for hydrometeorological change detection, causal analysis, and hydrological modeling (Zhang et al., 2020; Du et al., 2022; Zhang et al., 2024).
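The inverse-distance-weighting step described above can be illustrated with a minimal sketch. The station coordinates, the precipitation values, and the distance exponent of 2 are assumptions chosen for illustration; the source does not state the exponent or the exact implementation.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations.

    power=2 is a common default and an assumption here, not a value from the study.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a station: use its value directly
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: (x in km, y in km, daily precipitation in mm),
# all 10 km from the catchment center at the origin.
stations = [(0.0, 10.0, 12.0), (10.0, 0.0, 8.0), (-10.0, 0.0, 4.0)]
estimate = idw_interpolate(stations, target=(0.0, 0.0))
```

Because the three hypothetical stations are equidistant from the target, the weights are equal and the estimate reduces to the arithmetic mean; closer stations would dominate otherwise.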
3.1 Flood response metrics
The flood classification in our study mainly focuses on the detailed response characteristics of flood hydrographs using the inductive approach. The magnitude, variability, timing, duration, and rate of change are widely accepted as the five main components for characterizing all of the flood events (Arthington et al., 2006; Kennard et al., 2010; Poff et al., 2007; Zhang et al., 2012) and thus are also adopted to characterize the detailed flood responses in our study. Additionally, flood peak number is one of the most important metrics for flood control (Aristeidis et al., 2010; Rustomji et al., 2009). Therefore, nine metrics are used to fully characterize the responses of flood events (Table 1). In particular, Tbgn is characterized using the circular statistical approach, which translates the calendar date into the polar coordinates on the circumference of a circle and is beneficial for distinguishing the seasonal pattern (Dhakal et al., 2015).
Table 1. Metrics used to characterize the flood responses in our study.

Note: Qt is the flood magnitude on day t (m3 s−1), Qav is the mean flood magnitude (m3 s−1), Qbgn and Qend are the flood magnitudes at the beginning and end of an event (m3 s−1), σ is the standard deviation of the flood magnitude (m3 s−1), TD is the total number of days of a calendar year (d, i.e., 365 for a common year or 366 for a leap year), TFbgn and TFend are the beginning and end dates of flood events, TFpk is the occurrence date of the maximum flood peak, A is the catchment area (km2), and 86.4 is the unit conversion factor (m3 s−1 km−2 to mm).
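The circular treatment of the event start date can be sketched as follows. This is an illustrative reading of the circular statistical approach (date mapped to an angle on the annual circle, then averaged as unit vectors); the function names are hypothetical, and the exact formula used for Tbgn in Table 1 is not reproduced in the text above.

```python
import calendar
import math
from datetime import date

def date_to_angle(d):
    """Map a calendar date to an angle (radians) on a circle spanning one year."""
    days_in_year = 366 if calendar.isleap(d.year) else 365  # TD in the table note
    day_of_year = d.timetuple().tm_yday
    return 2.0 * math.pi * day_of_year / days_in_year

def circular_mean(angles):
    """Return (mean angle in [0, 2*pi), mean resultant length in [0, 1])."""
    n = len(angles)
    x = sum(math.cos(a) for a in angles) / n
    y = sum(math.sin(a) for a in angles) / n
    return math.atan2(y, x) % (2.0 * math.pi), math.hypot(x, y)

# Two hypothetical event start dates on either side of the New Year.
angles = [date_to_angle(date(1999, 12, 31)), date_to_angle(date(2001, 1, 1))]
mean_angle, concentration = circular_mean(angles)
```

An arithmetic mean of the two day-of-year values (365 and 1) would point to early July, whereas the circular mean stays close to the turn of the year, which is why the circular form helps when distinguishing seasonal patterns.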
3.2 Flood event classification
High dimensionality and multicollinearity exist among flood response metrics and affect the flood event classification when a large number of metrics are considered (Olden et al., 2012; Zhang et al., 2012). Here, principal component analysis (PCA) is used to transform the high-dimensional metrics into a few principal components (PCs) based on the orthogonal transform. The first m PCs whose cumulative explained variance exceeds 85 % of the total variance of all of the flood response metrics are selected for classification. The main flood response metrics in the individual PCs are determined according to the loading coefficient matrix. If the loading coefficient is over 0.45, the corresponding flood response metric is considered to be highly correlated with the PC.
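The two selection rules in this paragraph can be sketched in a few lines. The explained-variance percentages are the values reported for the five retained PCs in Sect. 4.1; the loading values are hypothetical placeholders, and applying the 0.45 cutoff to the absolute loading is an assumption.

```python
def select_principal_components(explained_pct, threshold_pct=85.0):
    """Return (m, cumulative %) for the first m PCs whose cumulative variance exceeds the threshold."""
    cumulative = 0.0
    for m, pct in enumerate(explained_pct, start=1):
        cumulative += pct
        if cumulative > threshold_pct:
            return m, cumulative
    return len(explained_pct), cumulative  # threshold never reached: keep all PCs

def dominant_metrics(loadings, cutoff=0.45):
    """List the metrics whose absolute loading on a PC exceeds the cutoff (abs is an assumption)."""
    return [name for name, value in loadings.items() if abs(value) > cutoff]

# Explained variances (%) of the first five PCs as reported in Sect. 4.1.
explained = [33.3, 17.0, 16.0, 10.8, 8.6]
n_pcs, cumulative_pct = select_principal_components(explained)

# Hypothetical loadings of three metrics on one PC, for illustration only.
example_loadings = {"R": 0.48, "Qpk": 0.51, "Tdrn": -0.12}
main_metrics = dominant_metrics(example_loadings)
```

With the reported percentages, four PCs reach 77.1 % and the fifth is needed to pass the 85 % threshold.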
Subsequently, both the hierarchical (Ward) and partitional (k-medoids) clustering methods are used to cluster flood events based on the similarity of the selected PCs. Euclidean distance is the distance measure. Twenty-two criteria are used to assess the classification performance and determine the best number of clusters, i.e., KL, CH, Hartigan, CCC, Scott, Marriot, TrCovW, TraceW, Friedman, Silhouette, Ratkowsky, Ball, Ptbiserial, Dunn, Rubin, Cindex, DB, Duda, Pseudot2, McClain, SDindex, and SDbw (Table S2 in the Supplement) (Charrad et al., 2014). Greater values of the first 14 indexes (i.e., KL to Dunn) or smaller values of the 8 remaining indexes (i.e., Rubin to SDbw) indicate a better classification. For each candidate cluster number, the number of criteria that rank it best is counted; the cluster number supported by the most criteria is considered optimal, and the clustering method that produces it is selected. The implementations of all of the multivariable statistical analyses are given in Appendix A.
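The criterion-counting rule can be sketched as a simple vote. Only five of the 22 criteria are shown, all scores are hypothetical, and the plain maximum/minimum rule follows the description above rather than the index-specific rules that a dedicated package may apply; breaking ties toward the smaller cluster number is also an assumption.

```python
from collections import Counter

# Subsets of the criteria named in the text, grouped by the direction of "better".
HIGHER_IS_BETTER = {"Silhouette", "CH", "Dunn"}
LOWER_IS_BETTER = {"DB", "SDbw"}

def best_k_for(criterion, scores_by_k):
    """Return the cluster number that a single criterion ranks best."""
    if criterion in HIGHER_IS_BETTER:
        return max(scores_by_k, key=scores_by_k.get)
    if criterion in LOWER_IS_BETTER:
        return min(scores_by_k, key=scores_by_k.get)
    raise ValueError(f"unknown criterion: {criterion}")

def optimal_cluster_number(scores):
    """Count, per cluster number, how many criteria rank it best and return the winner."""
    votes = Counter(best_k_for(name, by_k) for name, by_k in scores.items())
    top = max(votes.values())
    best_k = min(k for k, v in votes.items() if v == top)  # tie-break is an assumption
    return best_k, votes

# Hypothetical criterion values for cluster numbers 3 to 6.
scores = {
    "Silhouette": {3: 0.21, 4: 0.24, 5: 0.29, 6: 0.25},
    "CH": {3: 410.0, 4: 395.0, 5: 430.0, 6: 400.0},
    "Dunn": {3: 0.05, 4: 0.04, 5: 0.03, 6: 0.06},
    "DB": {3: 1.6, 4: 1.5, 5: 1.3, 6: 1.4},
    "SDbw": {3: 0.9, 4: 0.8, 5: 0.7, 6: 0.75},
}
best_k, votes = optimal_cluster_number(scores)
```

Running the same vote separately for the Ward and k-medoids results and comparing the winning vote counts would then give the method choice described in the text.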
3.3 Control mechanisms of meteorological and physio-geographical factors of the variabilities of flood event classes
3.3.1 Meteorological and physio-geographical factors
The meteorological (e.g., precipitation intensity, timing and duration, or evapotranspiration volume) and physio-geographical factors (e.g., land covers and catchment attributes) directly affect the flood generation and routing processes, which thus cause the diversity of flood event shapes (Ali et al., 2012; Brunner et al., 2018; Merz and Blöschl, 2003; Zhang et al., 2022). As many potential control factors as possible are selected to investigate the control mechanisms on the variability of flood event classes according to existing studies. There are 34 meteorological, catchment, and land cover factors selected in all of the catchments (Table 2). In the meteorological factor category, 17 factors related to precipitation, potential evapotranspiration, and aridity index are selected, including the amounts, intensities, and timing factors during flood events, in the antecedent period and at the annual scale. All of the precipitation factors during the flood events are extracted using the hourly precipitation observations. The precipitation factors at the daily or annual scale are extracted using the daily precipitation observations. The potential evapotranspiration at a daily or annual scale is estimated using the Hargreaves method (Hargreaves and Samani, 1982), and the aridity index is the ratio of potential evapotranspiration to precipitation. All of these factors mainly affect the flood yield processes (Merz and Blöschl, 2003; Aristeidis et al., 2010; Zhang et al., 2022).
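For orientation, the temperature-based potential evapotranspiration and the aridity index can be sketched as below. The code uses the widely cited Hargreaves–Samani form together with the FAO-56 expression for extraterrestrial radiation; the source cites Hargreaves and Samani (1982) without reproducing the equation, so the exact variant and coefficients used in the study are an assumption, as are the example inputs.

```python
import math

def extraterrestrial_radiation_mm(lat_deg, day_of_year):
    """Daily extraterrestrial radiation as equivalent evaporation (mm/day), FAO-56 form."""
    phi = math.radians(lat_deg)
    inv_dist = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    sunset = math.acos(-math.tan(phi) * math.tan(decl))  # sunset hour angle
    ra_mj = (24.0 * 60.0 / math.pi) * 0.0820 * inv_dist * (
        sunset * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(sunset)
    )
    return 0.408 * ra_mj  # MJ m-2 d-1 converted to mm/day

def hargreaves_pet(tmax_c, tmin_c, ra_mm):
    """Hargreaves-type daily potential evapotranspiration (mm/day)."""
    tmean = 0.5 * (tmax_c + tmin_c)
    return 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(max(tmax_c - tmin_c, 0.0))

def aridity_index(pet_mm, precip_mm):
    """Ratio of potential evapotranspiration to precipitation, as defined in the text."""
    return pet_mm / precip_mm

# Hypothetical catchment at 30 degrees N on day 180 with Tmax = 30 C and Tmin = 20 C.
ra = extraterrestrial_radiation_mm(30.0, 180)
pet = hargreaves_pet(30.0, 20.0, ra)
```

Only daily maximum and minimum temperature are needed, which matches the meteorological variables listed in the data section.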
In the physio-geographical factor category, 10 catchment attributes are selected, including catchment location, area, elevation and slope, and river density and slope. Seven land cover factors for the six land cover periods are selected, including the area fractions of paddy, dry land, forest, grassland, water, and urban and rural areas in the entire catchment. All of these physio-geographical factors mainly affect the flood yield and routing processes (Ali et al., 2012; Kuentz et al., 2017; Zhai et al., 2021).
3.3.2 Effect quantifications of meteorological and physio-geographical factors
The constrained rank analysis is adopted to quantify the direct and interactive effects of multiple control factor categories on spatial and temporal variabilities of individual flood event classes for both distributed and lumped analyses. The widely adopted methods of constrained rank analysis are redundancy analysis (RDA) and canonical correspondence analysis (CCA). RDA is a linear model and CCA is a unimodal model, both of which extend principal component analysis by combining it with regression analysis. These methods have great advantages for solving multiple linear regressions and interactions between dependent and independent variable matrices that are transformed into a few independent composite factors (ter Braak, 1986; Legendre and Anderson, 1999), and they are beneficial for quantifying the effects of an independent variable matrix on a dependent variable matrix and finding the most important factors. Both methods have commonly been used to test the multispecies response to environmental variables in the biological and ecological sciences (Legendre and Anderson, 1999) or the effects of physio-geographical factors and human activities on diffuse nutrient losses or water quality (Zhang et al., 2016; Shi et al., 2017).
The selection of the CCA and RDA is based on the first axis length of the detrended correspondence analysis. The CCA is proposed when the first axis length is greater than 4.0, while the RDA is proposed when the first axis length is less than 3.0. Otherwise, both CCA and RDA are proposed (ter Braak, 1986; Zhang et al., 2020). Additionally, because of the multiple control factor categories considered, two constrained rank analyses are implemented, i.e., entire and partial analyses. The entire analysis is implemented by involving all of the control factors as the independent variable matrix, and the variance percentage explained by the independent variable matrix of the total variance of the dependent variable matrix is considered to be the entire contribution of all of the control factors or categories to the total variabilities of the flood event classes. The partial analyses of the individual control factor categories are also implemented by involving a certain control factor category as the independent matrix, and the effects of the other control factor categories are held constant. The percentage of the constrained variance is considered to be the individual contribution of the involved control factor category. The meteorological, land cover, and catchment categories are adopted individually for the analysis, and their individual contributions are determined. If the sum of all of the individual contributions is less than the entire contribution of all of the factors, the interactive effects exist among the control factors and the difference between the summed and entire contributions is the interactive contribution (Legendre and Anderson, 1999; Zhang et al., 2016).
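The two decision rules in this paragraph, method selection from the first axis length and the interactive contribution as a difference, can be written out as a minimal sketch. The function names and all numeric inputs are hypothetical; the ordination itself is not implemented here.

```python
def choose_constrained_method(dca_first_axis_length):
    """Select CCA, RDA, or both from the first axis length of the detrended correspondence analysis."""
    if dca_first_axis_length > 4.0:
        return ["CCA"]
    if dca_first_axis_length < 3.0:
        return ["RDA"]
    return ["CCA", "RDA"]  # lengths from 3.0 to 4.0: both are proposed

def interactive_contribution(entire_pct, individual_pcts):
    """Difference between the entire contribution and the summed individual contributions (%)."""
    summed = sum(individual_pcts.values())
    return max(entire_pct - summed, 0.0)  # no interaction is reported if the sum is not smaller

# Hypothetical axis length and hypothetical explained-variance percentages.
methods = choose_constrained_method(2.1)
interaction = interactive_contribution(
    60.0, {"meteorological": 30.0, "catchment": 15.0, "land cover": 5.0}
)
```

In this made-up case the three categories individually explain 50 % of the variance, so the remaining 10 % of the entire contribution is attributed to interactions among them.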
Furthermore, the Monte Carlo permutation test is adopted to test the statistical significance of the control factors and obtain the correlation coefficients (r) between the flood response matrix and the control factor matrix in the individual catchments (i.e., a distributed analysis) and the entire region (i.e., a lumped analysis), respectively. All of the meteorological and physio-geographical factors are included for the lumped analysis, while the catchment attributes are excluded for the distributed analysis because they are not dynamic in the individual catchments. The significance level is set to 0.05 (i.e., a 95 % confidence level).
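The idea behind a Monte Carlo permutation test can be shown with a simplified one-factor sketch: the observed correlation is compared with correlations obtained after randomly shuffling one variable. This is a stand-in for the test applied to the constrained analysis, not the authors' implementation; the data, the number of permutations, and the seed are assumptions.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_test(x, y, n_perm=999, seed=0):
    """Return (r, p) where p is the share of permutations with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association between x and y
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical control factor (x) and flood response metric (y) for 20 events.
x = list(range(1, 21))
y = [2 * v + (-1) ** v for v in x]
r, p_value = permutation_test(x, y)
is_significant = p_value < 0.05
```

The factor is flagged as significant when the permutation p value falls below the 0.05 level stated above.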
4.1 Flood event classification
Using the tests of independence and the linear correlation for all of the flood response metrics, Tbgn is independent of R, RQr, RQd, and Npk. Qpk is independent of Tpk. Npk is independent of RQr and RQd. Except for these independent metrics, all of the others have linear correlations with each other (Table S3 in the Supplement). Using the principal component analysis, five independent PCs are found with a total cumulative variance of 85.7 %, all of which are selected in our study (Table 3). The first PC is related to magnitude, variability, and rates of change with an explained variance of 33.3 %. The second PC is related to magnitude, variability, and peak number with an explained variance of 17.0 %. The third to fifth PCs are mainly related to the flood event duration, the start time of the flood event, and the flood peak timing with explained variances of 16.0 %, 10.8 %, and 8.6 %, respectively. Furthermore, optimal classification of all 1446 flood events is determined by comparing the classification performance between the hierarchical and k-medoids clustering methods. The five clusters using the k-medoids clustering method are optimal for further analysis in our study (Fig. B1 in Appendix B).
4.2 Flood response characteristics in different classes
The value ranges of flood response characteristics in different classes are presented in Fig. 2 and Table S4 in the Supplement. For the magnitude metrics, total flood volume (R) and maximum flood peak (Qpk) follow the same ordering across the classes. That is to say, the metric values are largest in Class 3, followed by Classes 5, 2, 1, and 4. For the variability metric (coefficient of variation – CV), the events are most variable in Class 5 and slightly variable in the other classes, with the mean CV being less than 1.0. For the timing and duration metrics (i.e., Tbgn, Tdrn, and Tpk), 73.2 % of the flood events in Class 1 occur before the wet season (i.e., January–May), 58.5 %, 67.7 %, and 57.0 % of the flood events in Classes 2, 3, and 5 occur in the earlier wet season (i.e., June–July), and 52.8 % of the flood events in Class 4 occur in the later wet season (i.e., August–September). The mean duration (Tdrn) is longest in Class 5, followed by Classes 3 and 1. The mean Tdrn values in Classes 4 and 2 are the shortest ones. The timings of the maximum flood peaks (Tpk) are usually largest in Class 2 with a mean of 50.6±10.3 %, which means that the flood peaks mainly occur in the middle or late stages of flood events. The flood peaks usually occur in the early stages of flood events in the other classes (i.e., Classes 1, 3, 4, and 5). In particular, in Class 3, the mean Tpk value is only 23.7±13.6 %.
For the rates of change, RQr in most classes is much greater than RQd because the flood peaks usually occur in the early stages of flood events, except for Class 2. The largest values of both RQr and RQd are in Class 3 because of the greatest flood peak. The smallest RQr values are mainly in Class 2 because of the late occurrences of the flood peaks, while the smallest RQd values are mainly in Class 5 because of the long durations of flood recession. For the flood peak number (Npk), 71.2 %, 69.9 %, 76.5 %, and 77.1 % of the flood events have one flood peak in Classes 1, 2, 4, and 5, respectively, and multiple flood peaks (i.e., two to four) exist in 94.4 % of the total flood events in Class 3, accounting for 33.8 % (two peaks), 48.7 % (three peaks), and 11.8 % (four peaks), respectively.
According to the metric distributions (Fig. 2) and the hydrographs and duration frequencies (Fig. 3) of the individual flood event classes, we can conclude that Class 1 represents moderately fast flood events occurring before the wet season and is characterized by a single peak and moderate duration. It is referred to as the “moderately fast flood event class”. Class 2 represents highly fast flood events with a single peak in the late stage and short duration and is designated the “highly fast flood event class”. Class 3 represents highly slow flood events during the earlier part of the wet season, featuring multiple peaks and long duration, and is known as the “highly slow and multi-peak flood event class”. Class 4 represents slightly fast flood events occurring in the latter part of the wet season and having a single peak and short duration; it is named the “slightly fast flood event class”. Lastly, Class 5 represents moderately slow flood events with a single peak and long duration and is designated the “moderately slow flood event class”.

Figure 2. Variations of the flood response metrics among Classes 1–5. The solid dark-red dot and the gray dot represent the mean and 50th percentile values, respectively. Each black box spans the 25th and 75th percentile values, and the vertical line defines the minimum and maximum values without outliers. The violin shape shows the frequency distribution of the flood response metric.
4.3 Spatial and temporal distributions of flood event classes
The spatial distributions of the individual classes are shown in Figs. 4 and S1 and Table S5 in the Supplement. The moderately fast flood event class (i.e., Class 1) is mainly in the upper Dongjiang River of the Pearl River Basin and the Lake Poyang and Lake Dongting tributaries of the Yangtze River Basin, accounting for 37.1 % and 29.7 % of the total number of events in the respective river basins. Specifically, Class 1 is dominant at the Yanling (54.5 %) and Tongtang (50.0 %) stations in the Lake Dongting tributaries, the Shanggao (52.6 %) station in the Lake Poyang tributaries, and the Hezikou (47.2 %) station in the Dongjiang River. The highly fast flood event class (i.e., Class 2) is mainly in the upper Beijiang River of the Pearl River Basin and the Lake Dongting tributaries of the Yangtze River Basin, accounting for 31.4 % and 22.5 % of the total number of events in the respective river basins. Class 2 is particularly dominant at the Xiaogulu (80.0 %) station in the Beijiang River and the Tangdukou (57.6 %) station in the Lake Dongting tributaries. The highly slow and multi-peak flood event class (i.e., Class 3) is mainly in the upper Jinjiang, Qiantang, and Minjiang rivers in the Southeast River Basin, accounting for 42.2 % of the total number of events, particularly at the Longshan (69.6 %) station in the Jinjiang River. The slightly fast flood event class (i.e., Class 4) is mainly in the upper Huangshui, Jinghe, and Yiluo rivers of the Yellow River Basin and the upper Songhua and Wusuli rivers of the Songliao River Basin, accounting for 64.4 % and 60.4 % of the total number of events in the respective river basins. This class is dominant at the Qiaotou (77.3 %) station in the Huangshui River, the Huating (63.6 %) station in the Jinghe River, the Luanchuan (69.2 %) station in the Yiluo River, the Jingyu (69.2 %) and Dongfeng (64.3 %) stations in the Songhua River, and the Muling (58.3 %) station in the Wusuli River. 
The moderately slow flood event class (i.e., Class 5) is mainly in the southern tributaries of the Huaihe River Basin, accounting for 47.4 % of the total number of events, particularly at the Beimiaoji (100 %) and Qilin (70.0 %) stations. Therefore, Classes 1 to 3 are mainly in the temperate climate region without a dry season in southern China (Fig. 1), Class 4 is mainly in the cold climate region with a dry winter in northern China, and Class 5 is mainly in the transition region between these two climate regions.

Figure 4. Spatial variabilities of the individual flood event classes at the headstream stations of the major river basins.
According to the interannual distributions of the individual classes (Fig. 5a), all of the classes are evenly distributed. Their annual mean percentages are 24.0±5.9 %, 21.2±6.4 %, 13.5±7.7 %, 25.9±6.2 %, and 15.4±12.5 %, respectively. However, the interannual distributions of the individual classes are quite distinct at different stations, particularly in the upper Songhua and Wusuli rivers of the Songliao River Basin. At the headstream stations of the Songliao River Basin (Fig. 5b), Class 4 is dominant with an annual mean percentage of 26.1±38.3 % (n=32), though flood events are missed in several years due to the dry period. The dominance of Class 4 is most considerable in 1996, 1998, 2002, and 2009 at the Muling station in the upper Wusuli River. At the headstream stations of the Yellow River Basin (Fig. 5c), Class 4 is also dominant across the whole period with an annual mean percentage of 58.1±33.9 % (n=67), particularly in 1994–1996, 1999, and 2007. The dominance of Class 4 is most considerable in 1993–1995 and 2001–2004 at the Huating station in the upper Jinghe River. At the headstream stations of the Huaihe River Basin (Fig. 5d), Class 5 gradually prevails with an annual mean percentage of 41.5±23.7 % (n=102), particularly after 2007, when the percentage reaches 63.2±15.8 % (n=79). The dominance of Class 5 is most considerable in 2007–2014 at the Beimiaoji station in the southern tributaries. The event numbers of Classes 1 and 2 decrease gradually, accounting for 33.1±24.4 % (n=11) and 8.7±7.1 % (n=5) of the annual flood events in the periods 1993–1999 and 2011–2015 for Class 1 and 20.3±20.9 % (n=9) and 2.7±1.3 % (n=1) in the periods 1993–1999 and 2011–2015 for Class 2. The decreases in Classes 1 and 2 are remarkable at the Peihe station in the southern tributaries and the Ziluoshan station in the northern tributaries. 
A probable explanation is that the total precipitation amount and duration have increased due to climate change (Dong et al., 2011; Jin et al., 2024). At the headstream stations of the Yangtze River Basin (Fig. 5e), Classes 1, 2, and 4 are dominant, accounting for 29.3±9.6 % (n=251), 23.0±11.5 % (n=197), and 21.1±7.0 % (n=181) of the annual mean flood events. Although the interannual changes in the event numbers of Classes 1 (n=1–21), 2 (n=1–14), and 4 (n=1–16) are considerable, those of the class percentages are relatively uniform, except for 2015. The class dominance is most pronounced in 1993 and 1995–1998 at the Yanling station in the Lake Dongting tributaries for Class 1, in 1993, 1994, and 1997 at the Dutou station in the Lake Poyang tributaries for Class 2, and in 1998, 2000, 2001, 2004, 2005, 2007, and 2010–2013 at the Biyang station in the tributaries of the Hanjiang River for Class 4. At the headstream stations of the Southeast River Basin (Fig. 5f), Class 3 gradually prevails after 2000 with an annual mean percentage of 46.2±32.5 % (n=39), which is remarkable at the Longshan station in the upper Jinjiang River. At the headstream stations of the Pearl River Basin (Fig. 5g), Class 1 is dominant with an annual mean percentage of 36.0±24.0 % (n=52) but gradually shifts to Class 2, which accounts for 30.0±25.2 % of the annual mean flood events (n=40), particularly after 2008. The class dominance is most pronounced from 1993 to 2007 at the Hezikou station in the upper Dongjiang River for Class 1 and in 1993, 1994, 1996, 2005, 2006, and 2009–2011 at the Xiaogulu station in the upper Beijiang River for Class 2.
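The annual mean percentages quoted above (e.g., 24.0±5.9 %) can be illustrated with a minimal sketch. It assumes that each value is the mean and standard deviation, across years, of the yearly share of a class among all events of that year; the text does not state the exact estimator, so the use of the population standard deviation, the function name `annual_class_percentages`, and the toy event list are illustrative assumptions only.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def annual_class_percentages(events, classes):
    """Return {class: (mean %, SD %)} of the yearly class shares.

    events: iterable of (year, class_label) pairs, one pair per flood event.
    Years without any event do not appear and are therefore skipped.
    """
    per_year = defaultdict(Counter)
    for year, label in events:
        per_year[year][label] += 1

    summary = {}
    for c in classes:
        # Share of class c among all events of each year, in percent.
        shares = [100.0 * counts[c] / sum(counts.values())
                  for counts in per_year.values()]
        # Assumption: population SD across years (the paper does not specify).
        summary[c] = (mean(shares), pstdev(shares))
    return summary


# Hypothetical toy record: two years, two classes.
events = [(1993, 1), (1993, 1), (1993, 4), (1993, 4),
          (1994, 1), (1994, 4), (1994, 4), (1994, 4)]
summary = annual_class_percentages(events, classes=[1, 4])
print(summary)
```

In this toy record, Class 1 accounts for 50 % of the events in 1993 and 25 % in 1994, which gives 37.5±12.5 %. With only a few events per year, as at some stations of the Songliao River Basin, the yearly shares jump between 0 % and 100 %, which is one way a standard deviation can exceed the mean.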
4.4 Control mechanisms of the meteorological and physio-geographical factors
4.4.1 Control factors and their contributions to the distributed analysis
According to the Monte Carlo permutation test between the flood response matrix and the control factor matrix in the individual catchments of Class 1, the total and mean precipitation and the aridity index during the event (r=0.65–0.99 for pcp_dur, n=14; r=0.70–0.97 for pcp_av, n=7; r=0.52–0.97 for ADI_dur, n=7) are the major control factors in 44.7 %, 20 %, and 25 % of the total number of catchments of the Yangtze, Southeast, and Pearl river basins, respectively (Fig. 6 and Table 4). The contributions of the control factors are statistically significant only in the Liangshuikou catchment of the Yangtze River Basin and the Hezikou catchment of the Pearl River Basin. In the Liangshuikou catchment, 96.3 % of the temporal differences are explained, of which the meteorological and land cover categories explain 92.5 % and 3.8 %, respectively. In the Hezikou catchment, 66.7 % of the temporal differences are explained, of which the meteorological category and the interactive impact explain 49.4 % and 17.3 %, respectively. The major control factors and their contributions for Classes 2–5 are also presented in Sect. S1 and Figs. S2–5 of the Supplement. For all of the classes, only the factors in the meteorological category are statistically significant, particularly the precipitation amount and intensity and the aridity index during the events. Most of the control factors with statistical significance are in Class 1, followed by Classes 4, 5, 3, and 2. These control factors for the individual classes are mainly detected in the catchments of the Yangtze (Class 1), Yellow and Pearl (Class 4), Huaihe (Class 5), Southeast (Class 3), and Pearl (Class 2) river basins, respectively. 
This is probably because the precipitation amount and potential evapotranspiration during the event usually differ markedly between events and directly determine the spatial and temporal heterogeneities of the flood generation process and, consequently, the flood event hydrograph. In contrast, the land covers usually change slowly in the headstream catchments because disturbances by human activities and climate change are slight.
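The idea behind a Monte Carlo permutation test can be shown with a minimal sketch. It does not reproduce the constrained rank analysis used in the study; as a simplification, it tests a single Pearson correlation between one control factor and one flood response metric. The function names and the toy values for `pcp_dur` and `peak` are hypothetical and are not data from the study.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p value for the correlation between x and y.

    The response y is shuffled n_perm times; the p value is the fraction of
    shuffles (plus the observed arrangement) whose |r| reaches the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical event precipitation (mm) and flood peak values for one catchment.
pcp_dur = [12.0, 25.0, 31.0, 48.0, 55.0, 63.0, 77.0, 90.0]
peak = [0.10, 0.22, 0.25, 0.41, 0.43, 0.58, 0.66, 0.79]

r = pearson_r(pcp_dur, peak)
p = permutation_p_value(pcp_dur, peak)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

Because the null distribution is built from the data themselves, the test needs no normality assumption. Its resolution is limited by the number of permutations: the smallest attainable p value is 1/(n_perm + 1).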

Figure 6. Significant control factors and their correlation coefficients for the temporal variabilities of flood event Class 1 in the individual catchments. The gray color means the control factor without statistical significance. Note: the Anhe, Anren, Chengcun, Jiahe, Liangshuikou, Loudi, Pingshi, Shanggao, Shimenkan, Shuangjiangkou, Tangdukou, Tongtang, Xiawan, Yanling, Yanta, Yucun, and Yuexi catchments are from the Yangtze River Basin. The Tunxi catchment is from the Southeast River Basin. The Hezikou catchment is from the Pearl River Basin.
4.4.2 Control factors and their contributions to the lumped analysis
The Monte Carlo permutation tests across the entire study area suggest that the meteorological category is also the most important one (Fig. 7), particularly the precipitation amount and intensity (i.e., pcp_ant, pcp_dur, pcp_max, pcp_av, pcp_Tbeg, and pcp_Tdur) and the aridity index during the events (ADI_dur) with correlation coefficients of 0.33–0.74, 0.20–0.38, and 0.29–0.41, respectively. Few factors in the catchment attribute category are significant; the most relevant ones are the mean catchment length (Length), river density (Rivden), and ratio of river width to depth (Rwd) with correlation coefficients of 0.18–0.32, 0.15–0.24, and 0.21–0.30, respectively. In the land cover category, only the grassland area ratio (Rgrass) is significant in Class 1, with a correlation coefficient of 0.21.

Figure 7. Significant control factors and their correlation coefficients for the variabilities of the individual flood event classes (i.e., Classes 1–5). The gray color means the control factor without statistical significance.
In Class 1, the significant control factors are the precipitation, potential evapotranspiration, and aridity index in the antecedent 7 d (i.e., pcp_ant, pet_ant, and ADI_ant) and during the events (i.e., pcp_dur, pcp_av, pcp_max, pcp_Tbeg, pet_dur, pet_max, and ADI_dur); the potential evapotranspiration at the annual scale (i.e., pet_ann and pet_year) in the meteorological category; the area (Area), Length, maximum elevation (MaxiElev), Rivden, RivSlope, and ratio of river width to depth (Rwd) in the catchment attribute category; and Rgrass in the land cover category. Additionally, 72.7 % of the total spatial and temporal variabilities of the flood events are explained by all of the control factor categories, of which 43.9 % of the total variabilities are explained by the meteorological category (particularly the factors during the events), followed by the interactive impact (22.7 %), catchment attribute category (4.2 %), and land cover category (1.5 %), respectively (Fig. 8a).
The significant control factors of Class 2 are mainly in the meteorological factor category, including precipitation and potential evapotranspiration in the antecedent 7 d (i.e., pcp_ant and pet_ant) and the precipitation and aridity index during the flood events (i.e., pcp_dur, pcp_av, pcp_max, pcp_Tbeg, pcp_Tdur, and ADI_dur). In Class 3, the significant control factors are mainly the precipitation and aridity index during the flood events (i.e., pcp_dur, pcp_av, pcp_max, and ADI_dur) and the catchment elevation (i.e., Elevation and MaxiElev). In Classes 4 and 5, most of the meteorological and catchment factors are significant. For Class 4, the significant factors in the meteorological factor category are the precipitation and potential evapotranspiration in the antecedent 7 d and during the events (i.e., pcp_ant, pcp_dur, pcp_av, pcp_max, pcp_Tbeg, pcp_Tdur, pet_ant, pet_dur, and pet_max), the aridity index during the events (i.e., ADI_dur), and the precipitation at the annual scale (i.e., pcp_year); those in the catchment attribute category are Area, Length, Rivden, and Rwd. For Class 5, the significant factors in the meteorological factor category are the precipitation factors (i.e., pcp_ant, pcp_dur, pcp_av, pcp_max, pcp_Tbeg, and pcp_year) and the aridity index during the events and at the annual scale (i.e., ADI_dur and ADI_year); those in the catchment attribute category are Length, Rivden, and Rwd. Considering all of the control factor categories together, 73.3 %, 85.4 %, 65.9 %, and 65.7 % of the total spatial and temporal variabilities of the flood events are significantly explained in Classes 2–5, respectively (Fig. 8b–e). For the individual contributions, the meteorological factor category explains the largest variabilities (i.e., 36.5 %–50.5 %), followed by the catchment attribute category (i.e., 5.1 %–6.1 %), and the land cover category explains the lowest variabilities, i.e., 0.0 %–2.4 %. 
The interactive impacts of all of the control factor categories also explain 17.5 %–33.0 % of the total variabilities, particularly in Class 3.
Therefore, the total variabilities of flood events in Class 1 are mainly controlled by the total precipitation amount and its intensity during the events, which determine the magnitudes of the total flood yield and the flood peak, and by the catchment slope length and the river slope, which affect the flood routing processes, e.g., the total duration of a flood event and the occurrence time of a flood peak. The total variabilities in Class 2 are also mainly controlled by the total precipitation amount and its intensity during the events. The total variabilities in Class 3 are mainly controlled by the total precipitation amount, its intensity, and the aridity index during the events, which determine the total magnitude and occurrence time of the flood yield, and by the catchment elevation, which affects the flood routing time. The total variabilities in Class 4 are mainly controlled by the total precipitation amount, the potential evapotranspiration, and the aridity index during the events, which determine the total magnitude and occurrence time of the flood yield and evapotranspiration, and by the catchment area, the slope, and the river morphology, which affect the flood routing time and the river storage capacity. The total variabilities in Class 5 are mainly controlled by the total precipitation amount and the aridity index during the events, which determine the total magnitude and occurrence time of the flood yield, and by the river density, which affects the flood routing time in the river system.
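The split of the explained variability into category contributions and an interactive (shared) part can be sketched for two factor groups and a single response metric. This is an illustration only: it uses ordinary least-squares R² rather than the constrained rank analysis of the study, and the standard partition unique(A) = R²(A∪B) − R²(B), unique(B) = R²(A∪B) − R²(A), shared = R²(A) + R²(B) − R²(A∪B). The function names and toy values are hypothetical.

```python
def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns (with intercept)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda idx: abs(aug[idx][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for idx in range(k):
            if idx != c:
                factor = aug[idx][c] / aug[c][c]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def partition(group_a, group_b, y):
    """Split the explained variance into unique and shared fractions."""
    r2_a = r_squared(group_a, y)
    r2_b = r_squared(group_b, y)
    r2_ab = r_squared(group_a + group_b, y)
    return {"unique_a": r2_ab - r2_b,
            "unique_b": r2_ab - r2_a,
            "shared": r2_a + r2_b - r2_ab,
            "total": r2_ab}


# Hypothetical values: one meteorological factor, one catchment attribute.
pcp = [12.0, 25.0, 31.0, 48.0, 55.0, 63.0, 77.0, 90.0]
length = [30.0, 12.0, 45.0, 20.0, 60.0, 15.0, 50.0, 35.0]
peak = [0.10, 0.30, 0.18, 0.45, 0.35, 0.62, 0.60, 0.80]

parts = partition([pcp], [length], peak)
print({name: round(value, 3) for name, value in parts.items()})
```

By construction, the three fractions add up to the total explained variance. The shared fraction can be negative when the two groups suppress each other, which is one reason why interactive impacts are hard to interpret from statistics alone.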
4.4.3 Control mechanisms in the individual flood event classes
In both the individual catchments and the entire region, the dominant control factors of all of the flood event classes are the total and mean precipitation volumes, the maximum precipitation intensity, the aridity index and precipitation timing during the events, and the precipitation in the antecedent days in the meteorological category (Figs. 9 and S6 in the Supplement). Therefore, the flood events in Class 1 are mainly caused by the rainfall with low volume and intensity before the wet season in the wet, steep, and low-latitude catchments. The events in Class 2 are mainly caused by the short rainfall with high mean intensity in the wet low-latitude catchments. The events in Class 3 are mainly caused by the long rainfall with high volume and intensity in the small high-altitude and low-latitude catchments. The events in Class 4 are mainly caused by the short rainfall with low volume and intensity in the latter part of the wet season in the small, dry, steep, high-altitude, and high-latitude catchments. The events in Class 5 are mainly caused by the long rainfall with high volume and low mean intensity in the dry, gentle, and large mid-latitude catchments.

Figure 9. Variations of the four critical control factors in Classes 1–5. The solid dark-red dot and the gray dot define the mean and 50th percentile values, respectively. Each black box means the 25th and 75th percentile values, and the vertical line defines the minimum and maximum values without outliers. The violin shape means the frequency distribution of the control factor, and the unfilled shape means the control factor without statistical significance.
Flood classification has great advantages in systematically identifying manageable classes from a large number of historical flood events based on the similarity of flood response characteristics (Arthington et al., 2006; Kuentz et al., 2017; Poff et al., 2007; Sikorska et al., 2015; Sivakumar et al., 2015). Flood events in the same class are widely accepted as having similar hydrological responses caused by similar meteorological or underlying surface conditions (Sikorska et al., 2015). Therefore, it is more efficient to investigate flood event changes and their causal mechanisms in a comprehensive manner than through individual event analyses (Zhang et al., 2012). This is expected to provide more useful flood response characteristics for flood disaster management purposes (e.g., early warning and quick design of flood control plans) and deep insights for investigating riverine ecological and environmental response mechanisms.
In our study, the flood event classes are identified based on all of the flood response characteristics, which cover not only the flood magnitude metrics (e.g., large, moderate, and small floods), but also the event shape metrics (e.g., fast or slow floods). Therefore, our study captures more detailed response dynamics of flood events than the predefined classes reported by several existing studies, such as flash floods, short-rain floods, rain-on-snow floods, or snowmelt floods (Brunner et al., 2018; Merz and Blöschl, 2003; Sikorska et al., 2015). The specific values and boundaries of the flood response metrics of the individual classes were difficult to compare quantitatively with most existing studies because the adopted classification methods were usually different. However, flood event classes with similar hydrographs or response mechanisms were also found in the existing studies. Classes 1 and 2 are mainly in southern China, particularly in the Pearl and Yangtze river basins, which are controlled by the temperate climate without a dry season. Storms with high intensities and short durations before the wet season in southern China are likely to cause flood events with great magnitudes and variabilities (Class 1) or fast flood events with a high single peak and short durations (Class 2) (Gao et al., 2018). The flood response characteristics in these two classes are similar to those of flash floods and short-rain floods in Austria (Merz and Blöschl, 2003) and fast events in Switzerland (Brunner et al., 2018) and China (Zhai et al., 2021). Class 3 is mainly in the Southeast River Basin controlled by the tropical cyclone climate. Severe storms with high intensities and long durations are likely to cause high slow flood events with multiple peaks (Class 3) (Yin et al., 2010; Zhang et al., 2020). 
The flood response characteristics are similar to the high unit peak flood on the western coast of the USA (Saharia et al., 2017) because both of the response characteristics were mainly controlled by subtropical or tropical storms near the ocean in the Cf climate type. They are also similar to the slow events in China (Zhai et al., 2021) because the rates of positive changes are 0.01–0.94 h−1 in our study and 0.04–1.78 h−1 in China (Zhai et al., 2021) and the rates of negative changes are 0.01–0.33 h−1 in our study and 0.02–0.25 h−1 in China (Zhai et al., 2021). Class 4 is mainly in northern China, controlled by the cold climate with dry winters. The heavy storms ahead of the westerly trough mainly occur in the latter wet season in this region and usually have low intensities and short durations (Gao et al., 2018). Thus, they are likely to cause the small fast flood events (Class 4), whose mean flood peak magnitude and coefficient of variation are 0.47 m3 s−1 km−2 and 0.86, respectively. Similar flood events are also reported, e.g., the low-flashiness floods with mean flood peak magnitudes of 0.20–0.25 m3 s−1 km−2 and a mean coefficient of variation of approximately 0.90 in the northern part of central–eastern Europe (Kuentz et al., 2017), which is also controlled by the similar climate type (i.e., Df). Class 5 is mainly in the south–north climate zone of China (i.e., the Huaihe River Basin), which has the dual climate characteristics of both the southern and northern monsoons. Storms characterized by a long period of continuous rainy meteorology with high frequency and low intensities (e.g., Meiyu rainfalls) in the earlier wet season are likely to cause moderate slow flood events with long durations (Gao et al., 2018; Sampe and Xie, 2010). The flood response characteristics are similar to the intermediate flood events in China (Zhai et al., 2021). For example, the coefficients of variation are 0.65–3.15 in our study and 0.78–3.07 in China (Zhai et al., 2021). 
The rates of positive and negative changes are 0.02–8.00 h−1 and 0.01–0.64 h−1 in our study, while those reported in Zhai et al. (2021) are 0.36–4.90 and 0.09–0.46 h−1 in China. Therefore, the classification is helpful for deep investigation of the control mechanisms of flood events, which is easy to transfer to prediction of flood events with similar control factors (Sikorska et al., 2015).
The meteorological, land cover, and catchment attribute categories are mainly reported to affect the flood generation and routing processes and could be widely accepted as the critical control factors of spatial and temporal differences of flood event classes (Ali et al., 2012; Brunner et al., 2018; Merz and Blöschl, 2003; Zhang et al., 2022). We also find that the meteorological factor category is dominant, which explains 49.4 %–95.9 % and 36.5 %–50.5 % of the flood event differences in the individual classes at the catchment scale and in the entire region, respectively. Similar results were reported in Kuentz et al. (2017), i.e., the climatic variables (e.g., precipitation, temperature, and aridity index) play the most important role in 75 % of the total flow signatures, whereas the catchment attributes (e.g., area, elevation, slope, and river density) are more important for flood flashiness. The main significant meteorological factors are the precipitation volume, intensity, and aridity index during the events. The main explanation is that the precipitation and aridity index during the flood events directly affect the hydrograph through flood generation, e.g., total volume and peak, variability, duration, rates of change, and peak number (Merz and Blöschl, 2003; Aristeidis et al., 2010). Additionally, these control factors in the antecedent days directly affect the antecedent soil moisture, which determines the initial losses of precipitation and the runoff generation timing during the flood events (Hall and Blöschl, 2018; Xu et al., 2023). The contribution of the meteorological factor category is highest in Class 2, particularly in the Tangdukou catchment of the Yangtze River Basin, because the flood events in this class usually show quick responses to the precipitation, while the contribution is lowest in Class 5, because the river density and river morphology play important roles in the flood storage capacity and routing time in the river system.
Secondly, the catchment attributes (e.g., geographical location and topography) mainly affect the hydrograph patterns through flood routing (Berger and Entekhabi, 2001; Ali et al., 2012), and the factors identified in our study are the catchment area and length, the river density, and the ratio of river width to depth. For example, a catchment with a longer routing length, a larger routing area, a higher river density, and a larger ratio of river width to depth usually has more flood regulation and storage capacity and thus generates slow flood events, while a catchment with a shorter routing length, a smaller routing area, a lower river density, and a smaller ratio of river width to depth usually has a weaker flood regulation and storage capacity and thus generates fast flood events (Zhang et al., 2020). However, the comprehensive contributions of catchment attributes are not considerable, i.e., only 0.0 %–6.1 % in the entire region, because the catchment attributes do not always match the flood event responses well (Kuentz et al., 2017; Ali et al., 2012). The contributions of the catchment attribute category to the slow flood event classes (e.g., Classes 3 and 5) are usually larger than those in the fast flood event classes (e.g., Classes 1, 2, and 4) because the catchment attribute factors are significantly correlated with the flood response metrics in Classes 3 and 5, particularly the catchment maximum elevation and river density. Furthermore, the location, annual precipitation, potential evapotranspiration, and aridity index mainly affect the overall catchment hydrological conditions (Berger and Entekhabi, 2001; Kennard et al., 2010). Finally, the land covers mainly determine the precipitation intercept and retention processes, which directly affect the flood variability and rates of change (Kuentz et al., 2017; Merz et al., 2020). 
For example, catchments with more vegetation cover (e.g., forest and grassland) usually generate slow flood events, while catchments with less vegetation cover (e.g., rural and urban lands) usually generate fast flood events (Kuentz et al., 2017; Zhai et al., 2021). However, all of the catchments selected in our study are mainly in the river source regions with good vegetation cover and mean area percentages of 67.0 % for forest and 6.6 % for grassland. The spatial and temporal differences in the land covers are not remarkable because they only explain 3.8 % and 1.5 % of the flood event differences in Class 1 at the Liangshuikou catchment of the Yangtze River Basin and in the entire region.
Our study provides an approach to investigate some manageable flood event classes from massive large-scale events and quantify the meteorological and physio-geographical controls on the spatial and temporal variabilities of flood event classes. The approach can easily be applied to other regions or countries if a great number of flood events are collected. All of the selected flood events were sufficient to represent the flood response characteristics of headstream catchments in the main river basins of China. Thus, our classification results and the control mechanisms of the variability of the flood event classes could be applied in other regions with similar climate types. However, several aspects should be taken into account for further improvements to our study. Firstly, the total flood event number is the main restrictive factor in the classification performance, the flood event class representativeness, and the control mechanisms at the catchment scale (Merz and Blöschl, 2003; Olden et al., 2012; Sikorska et al., 2015; Tarasova et al., 2020). This could be overcome effectively by adopting large flood event numbers of individual classes (i.e., at least approximately 10 % of the total number of events in our study) for the classification (Zhang et al., 2020). However, not all of the control mechanisms of the flood event classes were explained well because the flood events were insufficient, mainly in the Songliao and Yellow river basins and in most of the individual catchments, except for the Shimenkan, Liangshuikou, and Tangdukou catchments of the Yangtze River Basin and the Xiaogulu and Hezikou catchments of the Pearl River Basin. The representativeness of the individual classes should be investigated further, particularly in basins with low densities of flood events. 
Secondly, the class boundaries of most of the flood response metrics were not clear when using inductive classification approaches (Parajka et al., 2005; Sikorska et al., 2015), e.g., the flood magnitude and the rates of positive and negative changes in our study. Although the predefined sharp thresholds of all of the flood response metrics are beneficial for clearly separating the flood events using the classification tree methods (e.g., decision tree and crisp tree), the predefinition is still challenging (Sikorska et al., 2015; Brunner et al., 2017; Tarasova et al., 2020). Finally, the control mechanism deduction was mainly based on the statistical detection of the control factors and their contributions. The interactive impacts of the different control factor categories were still difficult to explain clearly using the adopted statistical analysis method (i.e., the constrained rank analysis in our study).
In our study, the main flood event classes characterized by multiple flood response metrics are identified in 68 headstream catchments using the hierarchical and partitional clustering methods. The control mechanisms of the different flood event classes are investigated using the constrained rank analysis and Monte Carlo permutation test. The results are summarized as follows: the partitional clustering method (i.e., k-medoids) performs better than the hierarchical method, and the five optimal flood event classes are identified, which are the moderately fast flood event class (Class 1), the highly fast flood event class (Class 2), the highly slow and multi-peak flood event class (Class 3), the slightly fast flood event class (Class 4), and the moderately slow flood event class (Class 5). Most of the flood event differences between the individual classes are explained by the meteorological, land cover, and catchment attribute factors. The flood event differences in Class 3 (85.4 %) are explained well, followed by Classes 2 (73.3 %), 1 (72.7 %), 4 (65.9 %), and 5 (65.7 %). The meteorological category is the most significant of all the control factors, particularly the precipitation factors (e.g., volume and intensity) and the aridity index during the flood events.
This study preliminarily investigates the flood event classes in space and time at some headstream stations of China, which is beneficial for exploring the comprehensive formation mechanisms of flood events and critical control factors. It provides the scientific foundation for flood event prediction and control. In future, more unimpaired flood events could be collected to strengthen the representativeness of flood event classes and to further support the control mechanism analysis of flood classes at individual catchments. The interactive impacts of control factor categories could also be decomposed further into the impacts of individual factors using the hydrological model with strong physical mechanisms.
All of the multivariable statistical analyses are implemented using the R software (version 3.1.1) (R Development Core Team, 2010), involving the aov, cor, and princomp functions in the stats package (version 4.1.3) for the independence test, linear correlation test, and principal component analysis (Mardia et al., 1979), the hcluster function in the amap package (version 0.8-18) for the hierarchical cluster analysis (Antoine and Sylvain, 2006), the clara function in the cluster package (version 2.1.3) for the k-medoids cluster analysis (Kaufman and Rousseeuw, 1990), and the NbClust function in the NbClust package (version 3.0.1) for the optimal class number determination and classification performance assessment (Charrad et al., 2014). The Monte Carlo permutation test is implemented using the envfit, decorana, rda, cca, and permutest functions in the vegan package (version 2.5-7) of the R software (version 3.1.1) (ter Braak, 1986; R Development Core Team, 2010).
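The partitional clustering step can be illustrated with a minimal k-medoids sketch. It is not the clara routine of the R cluster package used in the study, which works on subsamples; it is a plain alternating k-medoids scheme (assign each event to its nearest medoid, then re-select each medoid as the member with the smallest total distance to its cluster). The function name, the Euclidean distance, the initial medoids, and the two-dimensional toy points are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_medoids(points, init, max_iter=100):
    """Alternating k-medoids on a list of coordinate tuples.

    init: indices of the initial medoids (must refer to distinct points).
    Returns (medoid_indices, labels), where labels[i] is the cluster of point i.
    """
    medoids = list(init)
    k = len(medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
                  for p in points]
        # Update step: the new medoid minimizes the summed distance to members.
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            best = min(members,
                       key=lambda i: sum(dist(points[i], points[m])
                                         for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Hypothetical standardized response metrics for six events (two metrics each).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
medoids, labels = k_medoids(points, init=[0, 1])
print(medoids, labels)
```

Unlike k-means centroids, each medoid is an observed event, so every class can be summarized by a representative hydrograph from the record.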
The optimal classification method and cluster number are determined by comparing the classification performance between the hierarchical and k-medoids clustering methods among the individual cluster numbers. Figure B1 shows that the optimal criterion number is largest when the cluster number is five (i.e., 22.7 % of the total) for the k-medoids clustering method. The optimal criteria are CCC, TrCovW, Silhouette, Ratkowsky, and PtBiserial with values of −2.98, 1.39×10¹⁵, 4.12×10⁶, 0.20, 0.29, and 0.39, respectively. Therefore, the five clusters using the k-medoids clustering method are optimal for further analysis in our study. The flood event numbers in the individual classes are 347, 306, 195, 375, and 223, accounting for 24.0 %, 21.2 %, 13.5 %, 25.9 %, and 15.4 % of the total number of events.
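As a quick arithmetic check, the class shares follow directly from the event counts given above. The short sketch below only reuses those counts; the variable names are illustrative.

```python
# Event counts of Classes 1-5 as reported in the text.
counts = {1: 347, 2: 306, 3: 195, 4: 375, 5: 223}

total = sum(counts.values())
# Share of each class in percent, rounded to one decimal place.
shares = {c: round(100 * n / total, 1) for c, n in counts.items()}

print(total)
print(shares)
```

The counts sum to the 1446 events analyzed in the study, and the rounded shares reproduce the reported 24.0 %, 21.2 %, 13.5 %, 25.9 %, and 15.4 %.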
The DEM data is sourced from SRTM (Shuttle Radar Topography Mission) (Farr et al., 2007). The land use data are sourced from CNLUCC (https://doi.org/10.12078/2018070201, Xu et al., 2018), available from the Data Center of Resources and Environmental Science, Chinese Academy of Sciences. The historical flood events and synchronous precipitation were collected from the hydrological yearbooks of the Songliao, Yellow, Huaihe, Yangtze, Southeast, and Pearl river basins which were available in the Annual Hydrological Reports of the main river basins (http://www.mwr.gov.cn/, Ministry of Water Resources of China, 1993–2015), and the readers may contact them to request access. The daily precipitation and temperature observations were collected from the basic surface meteorological observation data of China, which were available in the China Meteorological Data Sharing Service System (http://data.cma.cn/data/cdcdetail/dataCode/A.0012.0001.html, China Meteorological Data Service Centre, 1993–2015).
The supplement related to this article is available online at https://doi.org/10.5194/hess-29-3257-2025-supplement.
YongyZ: conceptualization, methodology, formal analysis, writing – original draft preparation, writing – review and editing, funding acquisition. YongqZ: conceptualization, writing – review and editing. XZ: data curation, formal analysis, writing – review and editing, funding acquisition. JX: conceptualization, writing – review and editing. QT: conceptualization, writing – review and editing. WW: data processing, formal analysis. JW: formal analysis. XN: formal analysis. BH: formal analysis.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
This study was supported by the National Natural Science Foundation of China (grant no. 42171047) and the CAS-CSIRO Partnership Joint Project of 2024 (grant no. 177GJHZ2023097MI). We extend our gratitude to the editors and anonymous reviewers for their insightful feedback and constructive comments.
This paper was edited by Yue-Ping Xu and reviewed by two anonymous referees.
Ahrens, B.: Distance in spatial interpolation of daily rain gauge data, Hydrol. Earth Syst. Sci., 10, 197–208, https://doi.org/10.5194/hess-10-197-2006, 2006.
Ali, G., Tetzlaff, D., Soulsby, C., McDonnell, J. J., and Capell, R.: A comparison of similarity indices for catchment classification using a cross-regional dataset, Adv. Water Resour., 40, 11–22, https://doi.org/10.1016/j.advwatres.2012.01.008, 2012.
Antoine, L. and Sylvain, J.: Using amap and ctc Packages for Huge Clustering, R News, 6, 58–60, 2006.
Aristeidis, G. K., Tsanis, I. K., and Daliakopoulos, I. N.: Seasonality of floods and their hydrometeorologic characteristics in the island of Crete, J. Hydrol., 394, 90–100, https://doi.org/10.1016/j.jhydrol.2010.04.025, 2010.
Arthington, A. H., Bunn, S. E., Poff, N. L., and Naiman, R. J.: The challenge of providing environmental flow rules to sustain river ecosystems, Ecol. Appl., 16, 1311–1318, https://doi.org/10.1890/1051-0761(2006)016[1311:TCOPEF]2.0.CO;2, 2006.
Berger, K. P. and Entekhabi, D.: Basin hydrologic response relations to distributed physiographic descriptors and climate, J. Hydrol., 247, 169–182, https://doi.org/10.1016/S0022-1694(01)00383-3, 2001.
Berghuijs, W. R., Woods, R. A., Hutton, C. J., and Sivapalan, M.: Dominant flood generating mechanisms across the United States, Geophys. Res. Lett., 43, 4382–4390, https://doi.org/10.1002/2016GL068070, 2016.
Black, A. R. and Werritty, A.: Seasonality of flooding: a case study of North Britain, J. Hydrol., 195, 1–25, https://doi.org/10.1016/S0022-1694(96)03264-7, 1997.
Brunner, M. I., Viviroli, D., Sikorska, A. E., Vannier, O., Favre, A. C., and Seibert, J.: Flood type specific construction of synthetic design hydrographs, Water Resour. Res., 53, 1390–1406, https://doi.org/10.1002/2016WR019535, 2017.
Brunner, M. I., Viviroli, D., Furrer, R., Seibert, J., and Favre, A. C.: Identification of flood reactivity regions via the functional clustering of hydrographs, Water Resour. Res., 54, 1852–1867, https://doi.org/10.1002/2017WR021650, 2018.
Charrad, M., Ghazzali, N., Boiteau, V., and Niknafs, A.: NbClust: An R package for determining the relevant number of clusters in a data set, J. Stat. Softw., 61, 1–36, https://doi.org/10.18637/jss.v061.i06, 2014.
China Institute of Water Resources and Hydropower Research and Research Center on Flood and Drought Disaster Prevention and Reduction, the Ministry of Water Resources: Atlas of Flash Flood Disasters in China, Sinomap Press, 2021 (in Chinese).
China Meteorological Data Service Centre: Basic surface meteorological observation data of China from 1993 to 2015, China Meteorological Data Service Centre [data set], http://data.cma.cn/data/cdcdetail/dataCode/A.0012.0001.html (last access: 27 April 2024), 1993–2015 (in Chinese).
Dhakal, N., Jain, S., Gray, A., Dandy, M., and Stancioff, E.: Nonstationarity in seasonality of extreme precipitation: a nonparametric circular statistical approach and its application, Water Resour. Res., 51, 4499–4515, https://doi.org/10.1002/2014WR016399, 2015.
Dong, Q., Chen, X., and Chen, T.: Characteristics and changes of extreme precipitation in the Yellow-Huaihe and Yangtze-Huaihe Rivers Basins, China, J. Climate, 24, 3781–3795, https://doi.org/10.1175/2010JCLI3653.1, 2011.
Du, Y., Wang, D., Zhu, J., Lin, Z., and Zhong, Y.: Intercomparison of multiple high-resolution precipitation products over China: Climatology and extremes, Atmos. Res., 278, 106342, https://doi.org/10.1016/j.atmosres.2022.106342, 2022.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., and Alsdorf, D.: The shuttle radar topography mission, Rev. Geophys., 45, RG2004, https://doi.org/10.1029/2005RG000183, 2007.
Gao, S. T., Zhou, Y. S., and Ran, L. K.: A review on the formation mechanisms and forecast methods for torrential rain in China, Chinese Journal of Atmospheric Sciences, 42, 833–846, https://doi.org/10.3878/j.issn.1006-9895.1802.17277, 2018 (in Chinese).
Hall, J. and Blöschl, G.: Spatial patterns and characteristics of flood seasonality in Europe, Hydrol. Earth Syst. Sci., 22, 3883–3901, https://doi.org/10.5194/hess-22-3883-2018, 2018.
Hargreaves, G. H. and Samani, Z. A.: Estimating potential evapotranspiration, J. Irrig. Drain. Div., 108, 225–230, https://doi.org/10.1016/0022-1694(82)90165-2, 1982.
Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., and Kanae, S.: Global flood risk under climate change, Nat. Clim. Change, 3, 816–821, https://doi.org/10.1038/NCLIMATE1911, 2013.
Jin, H., Chen, X., and Adamowski, J. H. S.: Determination of duration, threshold and spatiotemporal distribution of extreme continuous precipitation in nine major river basins in China, Atmos. Res., 300, 107217, https://doi.org/10.1016/j.atmosres.2023.107217, 2024.
Jongman, B., Winsemius, H. C., Aerts, J. C. J. H., Coughlan de Perez, E., van Aalst, M. K., Kron, W., and Ward, P. J.: Declining vulnerability to river floods and the global benefits of adaptation, P. Natl. Acad. Sci. USA, 112, E2271–E2280, https://doi.org/10.1073/pnas.1414439112, 2015.
Kaufman, L. and Rousseeuw, P. J.: Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, ISBN 0471878766, 1990.
Kennard, M. J., Pusey, B. J., Olden, J. D., Mackay, S. J., Stein, J. L., and Marsh, N.: Classification of natural flow regimes in Australia to support environmental flow management, Freshwater Biol., 55, 171–193, https://doi.org/10.1111/j.1365-2427.2009.02307.x, 2010.
Kuentz, A., Arheimer, B., Hundecha, Y., and Wagener, T.: Understanding hydrologic variability across Europe through catchment classification, Hydrol. Earth Syst. Sci., 21, 2863–2879, https://doi.org/10.5194/hess-21-2863-2017, 2017.
Legendre, P. and Anderson, M. J.: Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments, Ecol. Monogr., 69, 1–24, https://doi.org/10.1890/0012-9615(1999)069[0001:DBRATM]2.0.CO;2, 1999.
Liu, J. Y., Feng, S. Y., Gu, X. H., Zhang, Y. Q., Beck, H. E., Zhang, J. W., and Yan, S.: Global changes in floods and their drivers, J. Hydrol., 614, 128553, https://doi.org/10.1016/j.jhydrol.2022.128553, 2022.
Mardia, K. V., Kent, J. T., and Bibby, J. M.: Multivariate Analysis, Academic Press, London, ISBN 9780080570471, 1979.
Merz, R. and Blöschl, G.: A process typology of regional floods, Water Resour. Res., 39, 1340, https://doi.org/10.1029/2002WR001952, 2003.
Merz, R., Tarasova, L., and Basso, S.: The Flood Cooking Book: ingredients and regional flavors of floods across Germany, Environ. Res. Lett., 15, 114024, https://doi.org/10.1088/1748-9326/abb9dd, 2020.
Ministry of Water Resources of the People's Republic of China: Annual hydrological report of China, Ministry of Water Resources of the People’s Republic of China [data set], http://www.mwr.gov.cn/ (last access: 27 April 2024), 1993–2015 (in Chinese).
Ministry of Water Resources of the People's Republic of China: Bulletin of flood and drought disasters in China from 2009 to 2018, China Water & Power Press, http://www.mwr.gov.cn/sj/tjgb/zgshzhgb/ (last access: 27 April 2024), 2020a (in Chinese).
Ministry of Water Resources of the People's Republic of China: Code for hydrologic data processing (SL/T 247—2020), ISBN 155170681, 2020b (in Chinese).
Olden, J. D., Kennard, M. J., and Pusey, B. J.: A framework for hydrologic classification with a review of methodologies and applications in ecohydrology, Ecohydrology, 5, 503–518, https://doi.org/10.1002/eco.251, 2012.
Parajka, J., Merz, R., and Blöschl, G.: A comparison of regionalisation methods for catchment model parameters, Hydrol. Earth Syst. Sci., 9, 157–171, https://doi.org/10.5194/hess-9-157-2005, 2005.
Peel, M. C., Finlayson, B. L., and McMahon, T. A.: Updated world map of the Köppen-Geiger climate classification, Hydrol. Earth Syst. Sci., 11, 1633–1644, https://doi.org/10.5194/hess-11-1633-2007, 2007.
Poff, N. L. and Zimmerman, J. K. H.: Ecological responses to altered flow regimes: a literature review to inform environmental flows science and management, Freshwater Biol., 55, 194–205, https://doi.org/10.1080/09647778409514905, 2010.
Poff, N. L., Allan, J. D., Bain, M. B., Karr, J. R., Prestegaard, K. L., Richter, B. D., Sparks, R. E., and Stromberg, J. C.: The natural flow regime: a paradigm for river conservation and restoration, Bioscience, 47, 769–784, https://doi.org/10.1080/09647778409514905, 1997.
Poff, N. L., Olden, J. D., Merritt, D., and Pepin, D.: Homogenization of regional river dynamics by dams and global biodiversity implications, P. Natl. Acad. Sci. USA, 104, 5732–5737, https://doi.org/10.1073/pnas.0609812104, 2007.
R Development Core Team: R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org (last access: 27 April 2024), 2010.
Rustomji, P., Bennett, N., and Chiew, F.: Flood variability east of Australia's great dividing range, J. Hydrol., 374, 196–208, https://doi.org/10.1016/j.jhydrol.2009.06.017, 2009.
Saharia, M., Kirstetter, P. E., Vergara, H., Gourley, J. J., and Hong, Y.: Characterization of floods in the United States, J. Hydrol., 548, 524–535, https://doi.org/10.1016/j.jhydrol.2017.03.010, 2017.
Sampe, T. and Xie, P.: Large-scale dynamics of the Meiyu baiu rainband: Environmental forcing by the westerly jet, J. Climate, 23, 113–133, https://doi.org/10.1175/2009JCLI3128.1, 2010.
Shi, P., Zhang, Y., Li, Z., Li, P., and Xu, G.: Influence of land use and land cover patterns on seasonal water quality at multi-spatial scales, Catena, 151, 182–190, https://doi.org/10.1016/j.catena.2016.12.017, 2017.
Sikorska, A. E., Viviroli, D., and Seibert, J.: Flood-type classification in mountainous catchments using crisp and fuzzy decision trees, Water Resour. Res., 51, 7959–7976, https://doi.org/10.1002/2015WR017326, 2015.
Sivakumar, B., Singh, V., Berndtsson, R., and Khan, S.: Catchment classification framework in hydrology: challenges and directions, J. Hydrol. Eng., 20, A4014002, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000837, 2015.
Tan, J., Xie, X., Zuo, J., Xing, X., Liu, B., Xia, Q., and Zhang, Y.: Coupling random forest and inverse distance weighting to generate climate surfaces of precipitation and temperature with multiple-covariates, J. Hydrol., 598, 126270, https://doi.org/10.1016/j.jhydrol.2021.126270, 2021.
Tarasova, L., Basso, S., Zink, M., and Merz, R.: Exploring controls on rainfall-runoff events: 1. Time series-based event separation and temporal dynamics of event runoff response in Germany, Water Resour. Res., 54, 7711–7732, https://doi.org/10.1029/2018WR022587, 2018.
Tarasova, L., Merz, R., Kiss, A., Basso, S., Günter, B., Merz, B., Viglione, A., Plötner, S., Guse, B., Schumann, A., Fischer, S., Ahrens, B., Anwar, F., Bárdossy, A., Bühler, P., Haberlandt, U., Kreibich, H., Krug, A., Lun, D., Müller-Thomy, H., Pidoto, R., Primo, C., Seidel, J., Vorogushyn, S., and Wietzke, L.: Causative classification of river flood events, WIRES Water, 6, e1353, https://doi.org/10.1002/wat2.1353, 2019.
Tarasova, L., Basso, S., Wendi, D., Viglione, A., Kumar, R., and Merz, R.: A process-based framework to characterize and classify runoff events: The event typology of Germany, Water Resour. Res., 56, e2019WR026951, https://doi.org/10.1029/2019WR026951, 2020.
ter Braak, C. J. F.: Canonical Correspondence Analysis: a new eigenvector technique for multivariate direct gradient analysis, Ecology, 67, 1167–1179, https://doi.org/10.2307/1938672, 1986.
Villarini, G.: On the seasonality of flooding across the continental United States, Adv. Water Resour., 87, 80–91, https://doi.org/10.1016/j.advwatres.2015.11.009, 2016.
Wang, H., Liu, J. G., Klaar, M., Chen, A. F., Gudmundsson, L., and Holden, J.: Anthropogenic climate change has influenced global river flow seasonality, Science, 383, 1009–1014, https://doi.org/10.1126/science.adi9501, 2024.
Xu, X. L., Liu, J. Y., Zhang, S. W., Li, R. D., Yan, C. Z., and Wu, S. X.: CNLUCC: Multi‐period land use remote sensing monitoring dataset in China, Data Center of Resources and Environmental Science, Chinese Academy of Sciences [data set], https://doi.org/10.12078/2018070201 (last access: 27 April 2024), 2018.
Xu, Z., Zhang, Y., Blöschl, G., and Piao, S.: Mega forest fires intensify flood magnitudes in southeast Australia, Geophys. Res. Lett., 50, e2023GL103812, https://doi.org/10.1029/2023GL103812, 2023.
Yin, Y. Z., Gemmer, M., Luo, Y., and Wang, Y.: Tropical cyclones and heavy rainfall in Fujian Province, China, Quatern. Int., 226, 122–128, https://doi.org/10.1016/j.quaint.2010.03.015, 2010.
Zhai, X. Y., Guo, L., and Zhang, Y. Y.: Flash flood type identification and simulation based on flash flood behavior indices in China, Sci. China Earth Sci., 64, 1140–1154, https://doi.org/10.1007/s11430-020-9727-1, 2021.
Zhang, S. L., Zhou, L. M., Zhang, L., Yang, Y. T., Wei, Z. W., Zhou, S., Yang, D. W., Yang, X. F., Wu, X. C., Zhang, Y. Q., and Dai, Y. J.: Reconciling disagreement on global river flood changes in a warming climate, Nat. Clim. Change, 12, 1160–1167, https://doi.org/10.1038/s41558-022-01539-7, 2022.
Zhang, Y., Ren, Y., Ren, G., and Wang, G.: Precipitation trends over mainland China from 1961–2016 after removal of measurement biases, J. Geophys. Res.-Atmos., 125, e2019JD031728, https://doi.org/10.1029/2019JD031728, 2020.
Zhang, Y. Y., Arthington, A. H., Bunn, S. E., Mackay, S., Xia, J., and Kennard, M.: Classification of flow regimes for environmental flow assessment in regulated rivers: the Huai River Basin, China, River Res. Appl., 28, 989–1005, https://doi.org/10.1002/rra.1483, 2012.
Zhang, Y. Y., Zhou, Y. Y., Shao, Q. Y., Liu, H. B., Lei, Q. L., Zhai, X. Y., and Wang, X. L.: Diffuse nutrient losses and the impact factors determining their regional differences in four catchments from North to South China, J. Hydrol., 543, 577–594, https://doi.org/10.1016/j.jhydrol.2016.10.031, 2016.
Zhang, Y. Y., Chen, Q. T., and Xia, J.: Investigation on flood event variations at space and time scales in the Huai River Basin of China using flood behavior classification, J. Geogr. Sci., 30, 2053–2075, https://doi.org/10.1007/s11442-020-1827-3, 2020.
Zhang, Y. Y., Zhang, Y. Q., Zhai, X., Xia, J., Tang, Q., Zhao, T., and Wang, W.: Predicting flood event class using a novel class membership function and hydrological modelling, Earth's Future, 12, e2023EF004081, https://doi.org/10.1029/2023EF004081, 2024.