A novel method to identify sub-seasonal clustering episodes of extreme precipitation events and their contributions to large accumulation periods

. Temporal (serial) clustering of extreme precipitation events on sub-seasonal time scales is a type of compound event. It can cause large precipitation accumulations and lead to ﬂoods. We present a novel, count-based procedure to identify episodes of sub-seasonal clustering of extreme precipitation. We introduce two metrics to characterise the frequency of sub-seasonal clustering episodes and their relevance for large precipitation accumulations. The procedure does not require the investigated variable (here precipitation) to satisfy any speciﬁc statistical properties. Applying this procedure to daily precipitation from the 5 ERA5 reanalysis data set, we identify regions where sub-seasonal clustering occurs frequently and contributes substantially to large precipitation accumulations. The regions are the east and northeast of the Asian continent (north of Yellow Sea, in the Chinese provinces of Hebei, Jilin and Liaoning; North and South Korea; Siberia and east of Mongolia), central Canada and south of California, Afghanistan, Pakistan, the southeast of the Iberian Peninsula, and the north of Argentina and south of Bolivia. Our method is robust with respect to the parameters used to deﬁne the extreme events (the percentile threshold and the 10 run length) and the length of the sub-seasonal time window (here 2 – 4 weeks). This procedure could also be used to identify temporal clustering of other variables (e.g. heat waves) and can be applied on different time scales (sub-seasonal to decadal).

For catchment boundaries we use the HydroBASINS data set format 2 (with inserted lakes) (Lehner and Grill, 2013).
HydroBASINS contains a series of polygon layers that delineate catchment area boundaries at a global scale. This data set has a grid resolution of 15 arc-seconds, corresponding to approximately 500 m at the equator. The HydroBASINS product provides 60 12 levels of catchment area delineations. The first 3 levels are assigned manually, with level 1 distinguishing 9 continental regions. From Level 4 onward, the breakdown follows a Pfafstetter coding, where a larger basin is sequentially subdivided into 9 smaller units: the 4 largest tributaries and the 5 inter-basins. A basin is divided into two sub-basins at every location where two river branches meet and where they have an individual upstream area of at least 100 km 2 . We use level 6 of HydroBASINS for our study.
Daily precipitation aggregated by catchment area was computed by taking the average of all ERA5 grid points values located within the catchment area (see Fig. 1 for an illustration). Computations were performed using the GeoPandas (version 0.6.0 and onward) Python library (Jordahl et al., 2019). Some small or elongated catchments had few or no grid points inside their boundaries. This is a consequence of the Pfafstatter coding used to construct the HydroBASINS division, where large differences can exist in the catchment areas for a given level. We retained only catchments containing at least five ERA5 grid 70 points for our analyses.
Further, we kept only catchments located in two latitudinal bands between 20°and 70°with a catchment 99 th annual percentile (99p) of daily precipitation above 10 mm. Those criteria remove catchments from the tropics and the poles, as well as dry areas. The timing of precipitation extremes is important for our analyses and Rivoire et al. (18 January 2021) showed that the timing of extreme precipitation is well captured by ERA5 in the extratropics but less so in the tropics.

Identification of extreme precipitation events
We used a Peak-Over-Threshold approach to identify extreme precipitation events from the time series of daily precipitation per catchment (Coles, 2001). We consider only the precipitation values exceeding the local annual 99 th percentile. We use annual percentiles rather than seasonal percentiles because they are more impact relevant. To analyse sub-seasonal serial 80 clustering, high frequency clustering had to be removed from the daily precipitation time series. High frequency clustering, i.e. successive days of extreme precipitation, can be caused by a stationary synoptic system (e.g., an extratropical cut-off cyclone). We employed the "runs declustering" method to account for the high frequency clustering (Ferro and Segers, 2003). Thereby, given a run length r and a threshold t, days with precipitation exceeding t that are separated by less than r days with precipitation below t were grouped into one high-frequency cluster (see Fig. 3a for an illustration). After applying the 85 declustering approach, a series of independent extreme daily precipitation events was defined. In the case of a high frequency cluster, the first day of the cluster was retained as the representative day for the event.
The choice of the two parameters (t and r) affects the distribution of independent extreme events (Coles, 2001). We followed the empirical approach of Barton et al. (2016) to determine reasonable values for the parameters. First, we selected two different thresholds: the 98 th and 99 th annual percentiles (further denoted as 98p and 99p) of the catchment area daily precipitation 90 distribution. These thresholds have been used in previous studies (e.g. Fukutome et al., 2015).
The run length can either be determined with an objective method (Barton et al., 2016;Fukutome et al., 2015) or chosen based on meteorological process arguments (Lenggenhager and Martius, 2019). Following the approach of Lenggenhager and Martius (2019), we tested run lengths of both one and two days, corresponding to the influence time of a cyclone at one location (Lackmann, 2011). The R package evd (Stephenson, 2002) was used for the computation of the yearly percentiles and the identification of independent peaks over the threshold, i.e. for the removal of the high-frequency clusters with the runs declustering described above.

Identification of sub-seasonal clustering episodes
The following procedure is used to automatically identify sub-seasonal clustering episodes. We start by counting the number 100 of extreme precipitation events (n w ) contained in a moving time window of w days after applying a runs declustering (as illustrated in Fig. 3b). In parallel, we calculate the precipitation accumulation (acc w ) for the moving time window. n w and acc w are computed for each day of the time series over the next w − 1 days (not w, as the starting day is included in the time window length). Our results are robust to the choice of a centred or lagged time window, except at the boundaries of the series. We then run our automated clustering episode identification algorithm that consists of the following steps: (i) isolate time windows with the highest count of extreme events n w ; (ii) from these, select the time window with the largest precipitation accumulation acc w , this is the first clustering episode; (iii) remove all days within w − 1 days before and after the starting day of the first episode from the initial time series, to avoid any overlap between the selected episodes; (iv) repeat steps (ii) and 110 (iii) on the reduced time series, until a pre-determined number N ep of clustering episodes is identified (the choice of N ep is discussed further below). This iterative selection results in the identification of non-overlapping clustering episodes sorted in a decreasing order by extreme event counts, and then by precipitation accumulations. We denote this classification as Cl n . In addition, we identify and classify episodes with high to extreme precipitation accumulation, denoted as Cl acc . This is done by applying steps (ii) to (iv) of our automated identification algorithm to the original precipitation time series. The episodes 115 picked out by the clustering episode identification and the extreme precipitation accumulation identification can be partly or completely identical. Examples of Cl n and Cl acc for the time series of Fig. 4 are shown in Table 1. Sub-seasonal clustering frequency and contribution to large accumulations can now be assessed based on the two classifications.

120
As a preliminary remark, we note that if the Cl n classification of a given catchment has many clustering episodes that contain several extreme events, then sub-seasonal clustering is occurring frequently in that catchment. Similarly, if the two classifica- Table 1. Sub-seasonal clustering episodes corresponding to Fig. 4 and their respective rank in the Cln and Clacc classifications. The columns Cln and Clacc are empty for episodes excluded from the classifications. For this example, S f = i∈Cln nw(i)·qi = 3·1+1·0.38+0·0.16 = 3.38, S = i∈Clacc nw(i) · qi = 3 · 1 + 0 · 0.38 + 1 · 0.16 = 3.16 and Sr = tions Cl n and Cl acc have episodes with the same number of extreme events at identical ranks, this implies that the episodes with the largest number of extreme events correspond to the episodes with the largest precipitation accumulations. In this case, the contribution of sub-seasonal clustering to large precipitation accumulations is maximised. We would like to build metrics 125 that synthesize the properties of the two classifications and allow us to directly compare catchments. This problem is equivalent to defining a scoring system, where each episode is given a weight q i depending on its position in the classification, and by taking into account the number of extreme events in each episode. We will use the method of the incenter of a convex cone following (Sitarz, 2013) to construct our weighting scheme. Sitarz (2013) assume two intuitive conditions for a scoring system.
First, they assign more points for the first place than for the second place, and more for the second than for the third, and so 130 on. Second, the difference between the i th place and the (i+1) th place should be larger than the difference between the (i+1) th place and the (i+2) th place. This is equivalent to considering the following set of points: where x 1 denotes the points for the first place, x 2 the points for the second place,. . . , and x N the points for the N th place.
Any choice of points in K would satisfy the two conditions for a scoring system, however we would like to have a unique 135 and representative value. The option chosen by Sitarz (2013) is to look for the equivalent of a mean value: the incenter of K.
Formally, the incenter is defined as an optimal solution of the following optimization problem by Henrion and Seeger (2010): where S x denotes the unit sphere, ∂K denotes the boundary of set K and dist denotes the distance in the Euclidean space. By using the calculation presented in the Appendix of Sitarz (2013), and dividing by the parameter λ and the points of the first place (x 1 ) to get the weights (q i ), we obtain: The weight q 1 is always 1 but the values of weights q 2 to q N depend on N and in our case N is the number of clustering episodes N ep . The first metric S f that describes the propensity of a catchment for sub-seasonal clustering is defined as the product of 150 the weight q i by the corresponding number of extreme events in the i th episode n w (i) summed over all N ep episodes in the Cl n classification (Eq. (3)): We refer to S f as the frequency metric, since it measures how often sub-seasonal clustering episodes happen and how many extreme events these episodes contain. High values of S f imply that the first N ep sub-seasonal clustering episodes contain a 155 large number of extreme events.
The second metric S r describes how sub-seasonal clustering episodes contribute to large sub-seasonal precipitation accumulations. It is defined as the ratio of S f to S f , where S f is computed the same way as S f , but this time using the Cl acc classification (Eq. (4)): We refer to S r as the relevance metric. S r is unit-less. It ranges between 0 and 1 and measures the degree of similarity between the two classifications. S r is equal to 1 when S f = S f , i.e. when the two classifications have episodes with the same number of extreme events at identical ranks. The episodes may not be classified in the exact same order, however, they are ranked by their respective n w in a strict descending order in both classifications. S r equal to 0 implies that the N ep episodes in the S f classification contain no extreme events (n w (i) = 0 ∀i ∈ [1, N ep ]). Thus, the episodes with the largest precipitation 165 accumulations contain no extreme events. In the example of Tab. 1, S r is close to 1, hence sub-seasonal clustering episodes contribute substantially to the three top 14-day precipitation accumulations.
The exact interpretation of intermediary values of S r requires to look at both classifications (Cl n and Cl acc ) in detail to see where they differ from each other. For example, S r = 0.8 means that both classifications have a high degree of similarity and that sub-seasonal clustering episodes contribute to large precipitation accumulations. However, it does not imply that 80% of the episodes have ranked equally in both classifications. The fact that S r is normalised allows to compare different catchments and assess their sensitivity to the choice of the parameters. Note that performing a regression between Cl n and Cl acc would require to give a unique identifier to each episode according to its starting day. In that case, the strength of the regression would be lowered when two episodes containing the same number of extreme events just swap their ranks in the two classifications.
Such a change does not affect S r . Hence a regression would be a more conservative approach in assessing the contribution of the contribution of the i th term to the sums in S f and S r becomes smaller as N ep increases. We have tested several values of N ep ranging from 10 to 50 and found that the results with N ep ranging from 30 to 50 are comparable. Hence, we selected N ep = 50 for our analysis.

Correlations with index of dispersion and significance test
We computed the index of dispersion φ for each catchment (Cox and Isham, 1980;Mailier et al., 2006) to compare our results

185
to a more traditional method. For an homogeneous Poisson process, φ = 1. When φ > 1, the process is more clustered than random. When φ < 1, the process is more regular than random (Mailier et al., 2006). To estimate φ for a given catchment, we separated the precipitation time series in successive intervals of w days and counted the number of extreme events in each interval. An estimator of φ is then given by (Mailier et al., 2006): We computed S f andφ, and calculated their Spearman rank correlation coefficient (Wilks, 2011) for all catchments and for each parameter combination (Table 2). All correlation coefficients are positive with values between 0.738 and 0.885, and significant with p-values < 10 −5 . Figure 5 displays a scatter plot of S f versusφ for all catchments for the initial parameter 195 combination (r = 2 days, t = 99p, w = 21 days) and illustrates this correlation. This significant positive correlation means that the use of S f andφ lead to similar conclusions about the clustering of extreme precipitation events. This is illustrated in Fig. 6, which shows S f andφ for the initial parameter combination and where it can be seen that regions of high (low) S f correspond to regions of high (low)φ. Figure 6a is further discussed in the results section.

200
First, world maps of the frequency and relevance metrics for all selected catchments are shown using the initial combination of parameters (r = 2 days, t = 99p, w = 21 days). These maps indicate regions where sub-seasonal clustering is prevalent. Then, the sensitivity of the sub-seasonal clustering to the parameter choice is assessed by testing 12 different parameter combinations: w = 14, 21, 28 days; u = 98p, 99p; r = 1, 2 days. Finally, we identify regions with values of S f above the 75 th percentile and values of S r below the 25 th percentile (Fig. 8c).

Frequency and precipitation accumulation contributions of sub-seasonal clustering episodes
The high values of S f mean that the clustering episodes identified by our algorithm contain a relatively large number of extreme 235 events, whereas the low values of S r mean that episodes leading to the largest accumulations contain a low number or even no extreme events. Such regions that exhibit frequent clustering, but where this frequent clustering has a limited contribution to large accumulations are the following: the south of Tibet, the south of the Qinghai and west of the Sichuan Chinese provinces and central Bolivia. Again, small groups of two to three or isolated catchments can be found on every continent. Only a few catchments exhibit this combination of high S f and low S r values, highlighting the importance of the clustering of extreme 240 events for generating the largest accumulations for the majority of the catchments.

Sensitivity analysis of S r
The choice of the parameters will affect S f for a given catchment. A decrease in the threshold t and a decrease in the run length r both increase the number of extreme events per episode. An increase in the time window w increases the likelihood of capturing more extreme events in a single episode. Together, those changes are expected to increase S f for most catchments.

245
However, the variations of S r with the parameters depends on the variations of both S f and S f . If the variations of S f and S f are of the same order of magnitude, then S r will change only slightly. It is therefore of interest to perform a sensitivity analysis on S r by modifying the parameters used to define the clustering episodes to see if the distribution of S r remains similar. Figure 9a shows the distributions of S r for all parameters combinations, while Figure 9b displays the distributions of the difference between the initial parameter combination (r = 2 days, t = 99p, w = 21 days) and the other combinations. The data 250 used to draw the boxplots can be found in tables A1 and A2 in the appendix. The median value of S r , indicated by the green lines in the boxplots, exhibits very low sensitivity to changes in the parameters with a minimum value of 0.79 (for r = 2 days, t = 98p, w = 14 days, see Fig. 9a) and a maximum value of 0.84 (r = 1 days, t = 98p, w = 28 days). The same constant (r = 1 days, t = 99p, w = 21 days), results in an absolute difference in S r smaller than 0.05 for almost all catchments.
However, the variation can be more substantial for other parameter combinations. For example, a change in t from 99p to 98p and in w from 21 to 14 days, while keeping r constant (e.g. r = 2 days, t = 98p, w = 14 days), leads to much larger absolute 260 differences in S r that can reach up to 0.35. Moreover, S r at a given catchment can exhibit a wide range of variations when looking at all parameters combinations (not shown).
Taking into account the potential for high sensitivity to the parameters, we counted the number of parameter combinations where catchments are above the 75 th percentile of both the S f and S r distributions to reach more robust conclusions. Areas with high counts, i.e. where catchments have been selected in several parameter combinations, are almost identical to the 265 ones identified with the initial parameter combination (Fig. 10a). This means that the parameters selection does not have a substantial impact on the identified regions where sub-seasonal clustering occurs frequently and contributes substantially to large accumulations. This robustness with respect to variations in the parameters is also found for the catchments with S f < 25p and S r > 75p (rare clustering with substantial contribution), and S f < 75p and S r > 25p (frequent clustering with limited contribution), We present a novel count-based procedure to analyse sub-seasonal clustering of extreme precipitation events. The procedure identifies individual clustering episodes and introduces two metrics to characterise the frequency of sub-seasonal clustering episodes (S f ) and their relevance for large precipitation accumulations (S r ). The procedure is an avowedly simple countbased approach that has its advantages and drawbacks. Conceptually, our approach differs from previously proposed methods

275
to quantify sub-seasonal clustering that are based on parametric distributions with associated assumptions on the underlying distributions of the data. A major advantage of our method is that it does not require the investigated variable (here precipitation) to satisfy any specific statistical properties. This allowed us to study annual percentiles, which in most catchments exhibit a strong seasonal cycle. The seasonal cycle violates the independence assumptions underlying the parametric approaches. The seasonality issue is countered in the parametric approaches by either focusing on a single season (e.g., Mailier et al., 2006) 280 or by including a seasonally varying occurrence rate in the models (Villarini et al., 2013). Working with annual percentiles allows us to focus on high-impact events. This comes at the cost of not being able to distinguish seasonal drivers from other drivers of sub-seasonal clustering. If precipitation in some regions occurs more often or with more intensity during a specific period of the year, then the use of an annual thresholds will result in a more frequent detection of extremes during this specific period. Consequently, extremes will also be more likely to happen successively in a sub-seasonal time window. Hence, a 285 catchment exhibiting a strong seasonality of extreme precipitation would likely show higher values of S f than a catchment where precipitation shows no or weak seasonality.
One shortcoming of our method is the lack of a simple assessment of the significance of the clustering. In mitigation, we note that this can be done using the established methods and that our procedure introduces valuable practical refinements to the established methods. First, the identification of individual clustering episodes allows researchers to study the atmospheric 290 conditions that prevailed before and during an episode and hence the processes leading to clustering. An illustration is given in Figure 11a, which shows a 21-days clustering episode identified with our procedure for a catchment of the Iberian Peninsula (HydroBASINS ID n°2060654920), with the corresponding Potential Vorticity and Integrated Vapor Transport composites ( Fig. 11b and Fig. 11c, respectively). Second, knowing when clustering episodes happen enables researchers to study their medium-range to seasonal predictability (see Webster et al. (2011) for an example). Third, the episode identification makes 295 possible to link the precipitation clustering to hydrological impacts (e.g., using disasters data bases or hydrological models).
And finally, the S r metric allows to globally assess the contribution of sub-seasonal clustering to high precipitation accumulations, which to our knowledge cannot be done with any existing method.
Applying this methodology to the recent ERA5 data set, we identify regions where sub-seasonal clustering of annual high precipitation percentiles occurs frequently and contributes substantially to large precipitation accumulations. Those regions   Table A2. Descriptive statistics of the distributions of the difference between the initial parameter combination (r = 2 days, t = 99p, w = 21 days) and the other combinations. The measures are the same as in Table A1.