the Creative Commons Attribution 4.0 License.
Constructing a geography of heavy-tailed flood distributions: insights from common streamflow dynamics
Abstract. Heavy-tailed flood distributions depict the higher occurrence probability of extreme floods. Understanding the spatial distribution of heavy-tailed floods is essential for effective risk assessment. Conventional methods often encounter data limitations, leading to uncertainty across regions. To address this challenge, we utilize hydrograph recession exponents derived from common streamflow dynamics, which have been shown to be a robust indicator of flood tail propensity across analyses with varying data lengths. Analyzing extensive datasets covering Atlantic Europe, Northern Europe, and the continental United States, we uncover distinct patterns: prevalent heavy tails in Atlantic Europe, diverse behavior in the continental United States, and predominantly nonheavy tails in Northern Europe. The regional tail behavior has been observed in relation to the interplay between terrain and meteorological characteristics, and we further conducted quantitative analyses to assess the influence of hydroclimatic conditions using Köppen classifications. Notably, temporal variations in catchment storage are a crucial mechanism driving highly nonlinear catchment responses that favor heavy-tailed floods, often intensified by concurrent dry periods and high temperatures. Furthermore, this mechanism is influenced by various flood generation processes, which can be shaped by both hydroclimatic seasonality and catchment scale. These insights deepen our understanding of the interplay between climate, physiographical settings, and flood behavior, while highlighting the utility of hydrograph recession exponents in flood hazard assessment.
Status: closed
RC1: 'Comment on hess-2024-159', Anonymous Referee #1, 02 Jul 2024
Comments on “Constructing a geography of heavy-tailed flood distributions: insights from common streamflow dynamics” by Wang et al., submitted to HESS for possible publication.
The authors provide empirical analyses of the patterns and drivers of tail heaviness. They adopt the hydrograph recession exponent as an indicator of whether watersheds have a propensity for heavy tails of flood peak distributions. The contrasts in tail heaviness across watersheds, climate regions, and seasons shed light on the importance of characterizing catchment storage in dictating flood regimes. The analyses are interesting and robust. A major concern of mine is that the manuscript is lengthy, which obscures the new insights obtained. I would suggest that the authors further refine the text and remove unnecessary details.
Specific comments:
- The part that emphasizes the utility of hydrograph recession exponents in characterizing tail heaviness is lengthy and needs to be shortened. One question: as far as can be seen from the dataset, the record lengths are quite adequate despite their variance, so some other tail heaviness indicators might perform just as well.
- What is the rationale of using five days as the minimum duration, considering the vast variance in drainage areas?
- “The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”. This is not quite clear. How the upper tail is defined and statistically modelled is important. Section 3.2 also needs to be concise and informative. Please reconstruct.
- Line 269, by “majority” you use “50 %” as the threshold?
- From Figure 1 and Figure 2, we can see that there is considerable overlap between heavy tails and nonheavy tails. This is especially evident in Figure 2, where the scatter points are well mixed. These results make me question the utility of the recession exponent (a=2) as the criterion. I would suggest that the authors explain and discuss this limitation.
- Figure 4 and the text, please explain the rationale of using percentage. The absolute count of watersheds would matter, as can be seen that there is only one watershed in Bwk. This can be due to sampling uncertainties.
- What do the authors mean by “catchment storage”? Please clarify.
- Line 571, I would suggest that the role of ET alone might not be that important. The ratio of ET to P is worth exploring.
- Line 605-606, the three references use indicators that quantify the heaviness of upper tails, while in this study, the authors are in fact addressing “propensity”.
- Line 649-650, this is obvious.
- Section 5, I enjoy reading this section overall, but it can be further improved by explicitly highlighting what are found in this study, and what are proposed by previous studies, especially the review paper by Merz et al.
Citation: https://doi.org/10.5194/hess-2024-159-RC1
AC1: 'Reply on RC1', Hsing-Jui Wang, 24 Jul 2024
Reply on Reviewer 1
We thank the Reviewer for providing valuable comments and suggestions. We have addressed each point below and will incorporate the comments into the revised manuscript after considering feedback from other reviewers. The Reviewer's comments are marked in italic font, while our replies are indicated in normal font.
===
The authors provide empirical analyses of the patterns and drivers of tail heaviness. They adopt the hydrograph recession exponent as an indicator of whether watersheds have a propensity for heavy tails of flood peak distributions. The contrasts in tail heaviness across watersheds, climate regions, and seasons shed light on the importance of characterizing catchment storage in dictating flood regimes. The analyses are interesting and robust. A major concern of mine is that the manuscript is lengthy, which obscures the new insights obtained. I would suggest that the authors further refine the text and remove unnecessary details.
Thank you for the summarized review and positive feedback. We will streamline the details, particularly addressing the specific comments below, to better highlight the key findings of this work as suggested by the reviewer.
Specific comments:
1) The part that emphasizes the utility of hydrograph recession exponents in characterizing tail heaviness is lengthy and needs to be shortened. One question: as far as can be seen from the dataset, the record lengths are quite adequate despite their variance, so some other tail heaviness indicators might perform just as well.
We will shorten the sections that describe the utility of hydrograph recession exponents in characterizing tail heaviness, as suggested by the reviewer. In particular, we will shorten Section 3.1 by relying more on references to a previous publication where this approach was first introduced (Wang et al., 2023). The records in the dataset employed in this study span 24 to 148 years. We acknowledge that other indicators could also be used; however, we are specifically interested in the recession exponent because it is a novel index that allows us to infer the propensity of rivers to experience extreme floods. This index enables us to identify potential risks even in the absence of recorded extreme floods, which is not possible with other indicators. Additionally, the recession exponent is suggested to mitigate the bias often introduced by the variance in dataset lengths across cases (Smith et al., 2018; Wietzke et al., 2020; Wang et al., 2023).
We have discussed the literature review on this topic in lines 64-84. We plan to supplement this with the following statement after line 84: “Nonetheless, we acknowledge that other indicators could also be used; however, we are specifically interested in the recession exponent because it is a novel index that allows us to infer the propensity of rivers to experience extreme floods. Such an index enables us to identify potential risks even in the absence of recorded extreme floods, which is not possible with other indicators. Its stability provides additional value to mitigate the bias often introduced by the variance in dataset lengths across cases.”
2) What is the rationale of using five days as the minimum duration, considering the vast variance in drainage areas?
Event-scale recession analyses of daily data typically require a minimum of 3 to 5 days of recession (e.g., Shaw and Riha, 2012; Biswal and Marani, 2010) to minimize noise from short events (Ye et al., 2014) and ensure sufficient sample sizes for proper data quality (Shaw, 2016). We acknowledge that the recession period may vary depending on the drainage area. The recessions we identify and analyze do indeed have different durations in different catchments. In this study, we select a fixed minimum duration of 5 days (Dralle et al., 2017) to ensure a sufficient sample size for suitably characterizing recession attributes.
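The event selection described in this reply can be illustrated with a minimal sketch. It assumes a recession is a run of strictly decreasing daily flows and that the run length (peak day included) must reach the minimum duration; the function name `recession_segments` and both of these rules are illustrative assumptions, not the authors' exact extraction criteria.

```python
def recession_segments(q, min_days=5):
    """Return (start, end) index pairs, end exclusive, for runs of strictly
    decreasing daily flow that last at least `min_days` days.

    Hypothetical helper: the decreasing-flow rule and the way days are
    counted are assumptions made for illustration only.
    """
    segments = []
    start = 0
    for i in range(1, len(q) + 1):
        # A run ends at the end of the series or when flow stops decreasing.
        if i == len(q) or q[i] >= q[i - 1]:
            if i - start >= min_days:
                segments.append((start, i))
            start = i
    return segments


daily_flow = [1.0, 5.0, 4.0, 3.0, 2.5, 2.0, 1.8, 3.0, 2.0, 1.0]
print(recession_segments(daily_flow))  # [(1, 7)]
```

In this toy series the six-day recession starting at the peak is retained, while the final three-day recession is discarded for being shorter than five days.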
3) “The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”. This is not quite clear. How the upper tail is defined and statistically modelled is important. Section 3.2 also needs to be concise and informative. Please reconstruct.
Thank you for your comment. We will modify lines 232-233 (“The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”) as below to improve the clarity:
“Empirical data following a power-law distribution (if applicable) typically do so above a certain lower bound, defining the analyzed tail (Clauset et al., 2009). Therefore, we employ the approach proposed by Clauset et al. (2007) to determine the optimized lower boundary. This method selects the boundary where the probability distributions of the data and the best-fit power-law model are most similar. If the optimized lower boundary is higher than the true lower boundary, the reduced data set size leads to a poorer match due to statistical fluctuations. If it is lower, the distributions differ fundamentally. The KS statistic is employed to quantify the distance between these distributions.”
We will also revise the remainder of Section 3.2 with the goal of shortening it.
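The lower-bound selection described in the revised text can be sketched as follows. This is a minimal illustration of the general KS-based idea, assuming a continuous power law fitted by maximum likelihood above each candidate lower bound; the function name `fit_power_law_tail`, the `min_tail` parameter, and the choice of estimator are assumptions and not necessarily the authors' implementation.

```python
import math


def fit_power_law_tail(data, min_tail=10):
    """Scan candidate lower bounds and keep the one with the smallest KS distance.

    Returns (xmin, alpha, ks), or None if no candidate tail has at least
    `min_tail` points. Assumes strictly positive data. The density above
    xmin is taken to be proportional to x ** (-alpha).
    """
    xs = sorted(data)
    best = None
    for k in range(len(xs) - min_tail + 1):
        if k > 0 and xs[k] == xs[k - 1]:
            continue  # same candidate bound as the previous one
        xmin = xs[k]
        tail = xs[k:]
        n = len(tail)
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum <= 0:
            continue  # all tail values equal xmin, exponent undefined
        alpha = 1.0 + n / log_sum  # maximum-likelihood exponent
        ks = 0.0
        for i, x in enumerate(tail):
            model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
            # Compare the model with the empirical CDF just before and at x.
            ks = max(ks, abs(i / n - model_cdf), abs((i + 1) / n - model_cdf))
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best
```

A bound that is too high leaves few points and a noisy fit, while a bound that is too low includes data that do not follow the power law; both cases inflate the KS distance, which is why its minimum is used to pick the bound.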
4) Line 269, by “majority” you use “50 %” as the threshold?
Yes, we do. We will specify it in the revised paper.
5) From Figure 1 and Figure 2, we can see that there is considerable overlap between heavy tails and nonheavy tails. This is especially evident in Figure 2, where the scatter points are well mixed. These results make me question the utility of the recession exponent (a=2) as the criterion. I would suggest that the authors explain and discuss this limitation.
Ideally, the validation of the new index should include benchmarks for both heavy-tailed and non-heavy-tailed case studies. However, we can only statistically establish the benchmark for the former (power-law-tailed case studies, black dots in Figure 2), but not for the latter (uncertain case studies, gray dots in Figure 2). This is because the latter category encompasses case studies that either do not follow a power-law distribution or whose underlying distributions cannot be determined due to high uncertainty from small sample sizes, which contributes to the ‘mixed pattern’ noted by the reviewer. Note that several years of data are still a small sample for reliably characterizing the tail of empirical and purely statistical distributions fitted to the data.
Because of this limitation of the approach, the effectiveness of the recession exponent should be evaluated only on the former group (black dots), as highlighted in Figure 2. To rigorously validate the effectiveness of the recession exponent criterion, we also need Figure 1, where we statistically confirm the hypothesized heavy- and non-heavy-tailed groups using a=2 as the criterion.
This confusion seems unavoidable due to the inherent limitations of data analysis. Meanwhile, we acknowledge that misattribution can occur because the recession exponent cannot always properly distinguish between heavy and light tails, particularly when a is around the threshold value of 2. This issue is shown in the case studies in Norway, which we discuss in lines 356-363. We have addressed this in lines 242-246:
"We term such a case study a 'power-law-tailed case study,' while cases that don't meet these criteria are labeled as 'uncertain case studies' in subsequent analyses. The latter label acknowledges the awareness that we cannot definitively conclude whether these case studies are indeed not power-law-tailed or if their underlying distributions cannot be determined due to the high uncertainty caused by the small sample sizes of available observations."
Moreover, we will insert additional discussion after line 331 to improve clarity:
"We cannot conclude whether uncertain case studies (gray dots) represent cases that are indeed not power-law-tailed or if their underlying distributions cannot be determined due to the high uncertainty caused by small sample sizes. Therefore, we benchmark the recession exponent against the empirical power law exponent by focusing on the 'certain group,' i.e., power-law-tailed case studies (black dots)."
The following statement will also be inserted at the end of line 365:
“However, we acknowledge that misattributions can still occur, particularly when a is around the threshold value.”
6) Figure 4 and the text, please explain the rationale of using percentage. The absolute count of watersheds would matter, as can be seen that there is only one watershed in Bwk. This can be due to sampling uncertainties.
We recognize that the varying number of cases across climate types might introduce bias due to sample sensitivity (as we have mentioned in lines 490-493). Nonetheless, to estimate the propensity of tail behavior, the ratio (i.e., percentage) of heavy- to non-heavy-tailed case studies in each climate region is considered one of the most direct approaches. We will revise lines 490-493 to emphasize this limitation of the employed dataset and approach:
Original lines 490-493: “We acknowledge that these results are based on overarching conditions and do not encompass all climate types, and achieving an equal number of study sites across various climate regions might not always be feasible. Expanding the number of study sites could further enhance our understanding, especially for extreme cases.”
Modified version: “We acknowledge that these results are based on overarching conditions and do not encompass all climate types, and achieving an equal number of study sites across various climate regions might not always be feasible. We should be mindful of potential bias caused by sample sensitivity, particularly in regions with a limited number of cases (e.g., Csa, BSh, BWk in this study). Expanding the number of study sites could further enhance our understanding, especially for extreme cases.”
7) What do the authors mean by “catchment storage”? Please clarify.
Thank you for pointing this out. We will improve the clarity by adding the following description after lines 487-489 (“we have identified the conjunction of dry periods and higher temperatures as crucial meteorological factors significantly contributing to the dynamics of catchment storage, thereby influencing the nonlinearity of hydrological responses.”):
“We refer to catchment storage hereafter as the water contained in a catchment at a certain moment, which contributes to defining its wetness status (this is chiefly the degree of saturation of the critical zone). This storage is dynamic and depends on various factors, such as soil moisture states, precipitation, and evapotranspiration (Merz and Blöschl, 2009; Zhou et al., 2022).”
8) Line 571, I would suggest that the role of ET alone might not be that important. The ratio of ET to P is worth exploring.
We agree that, based on our analyses and findings, the temporal characteristics of rainfall and evapotranspiration collaboratively influence this seasonality, as discussed in detail in lines 539-548. We will therefore revise lines 569-571 as follows:
Original lines 569-571: “Regions with pronounced temperature variations across seasons, particularly with higher temperature in summer, tend to display such dynamics and highlight the role of evapotranspiration in catchments in driving this seasonality.”
Modified version: “Regions with pronounced temperature variations across seasons, particularly with higher temperatures in summer, and characterized by relatively evenly distributed rainfall throughout the year tend to display such dynamics. This highlights the importance of both evapotranspiration and the temporal characteristics of rainfall in shaping flood tail behavior across seasons, aligning with previous studies (Guo et al., 2014; Basso et al., 2023).”
9) Line 605-606, the three references use indicators that quantify the heaviness of upper tails, while in this study, the authors are in fact addressing “propensity”.
Thank you for pointing this out. We will improve the clarity as below:
Original lines 604-606: “These findings align with previous discussions on this matter (e.g., Merz and Blöschl, 2009; Villarini and Smith, 2010; Smith et al., 2018), which have suggested a relatively weak inverse correlation between catchment area and the occurrence of heavy-tailed flood behavior.”
Modified version: “In a similar context, previous studies, using different heavy-tailed flood indices, have suggested a relatively weak inverse correlation between catchment area and the occurrence of heavy-tailed flood behavior (e.g., Merz and Blöschl, 2009; Villarini and Smith, 2010; Smith et al., 2018).”
It is worth noting that the heavy-tail propensity identified in this study encompasses: 1) case studies confirmed to exhibit power-law tail behavior, 2) case studies where power-law tail behavior could not be confirmed due to insufficient samples, and 3) case studies that do not show power-law tail behavior based on historical data but are suggested to exhibit such behavior due to high catchment nonlinearity. The first type is likely to be identified in studies employing different heavy-tailed flood indices as well.
10) Line 649-650, this is obvious.
This sentence (lines 649-650) serves as a contrasting pattern with the following one (lines 650-652). We will revise the terms to tone it down and present a more neutral statement as follows:
Original lines 649-650: “Our findings first indicate that regions with relatively uniform hydroclimatic conditions (the Atlantic Europe and Northern Europe) tend to exhibit a single/dominant propensity of flood tail behavior.”
Modified version: “As expected, the results show regions with relatively uniform hydroclimatic conditions (the Atlantic Europe and Northern Europe) tend to exhibit a single/dominant propensity of flood tail behavior.”
11) Section 5, I enjoy reading this section overall, but it can be further improved by explicitly highlighting what are found in this study, and what are proposed by previous studies, especially the review paper by Merz et al.
Thank you for the suggestion. We will enhance the clarity of this section by highlighting the comparison between current understanding and the new findings contributed by this study in the revised version.
References
- Basso, S., Merz, R., Tarasova, L., & Miniussi, A. (2023). Extreme flooding controlled by stream network organization and flow regime. Nature Geoscience, 16(April), 339–343. https://doi.org/10.1038/s41561-023-01155-w
- Biswal, B., & Marani, M. (2010). Geomorphological origin of recession curves. Geophysical Research Letters, 37(24), 1–5. https://doi.org/10.1029/2010GL045415
- Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111
- Clauset, A., Young, M., & Gleditsch, K. S. (2007). On the Frequency of Severe Terrorist Events. Journal of Conflict Resolution, 51(1), 58–87. https://doi.org/10.1177/0022002706296157
- Dralle, D. N., Karst, N. J., Charalampous, K., Veenstra, A., & Thompson, S. E. (2017). Event-scale power law recession analysis: Quantifying methodological uncertainty. Hydrology and Earth System Sciences, 21(1), 65–81. https://doi.org/10.5194/hess-21-65-2017
- Guo, J., Li, H.-Y., Leung, L. R., Guo, S., Liu, P., & Sivapalan, M. (2014). Links between flood frequency and annual water balance behaviors: A basis for similarity and regionalization. Water Resources Research, 50, 937–953. https://doi.org/10.1002/2013WR014374
- Merz, R., & Blöschl, G. (2009). Process controls on the statistical flood moments - a data based analysis. Hydrological Processes, 23(5), 675–696. https://doi.org/10.1002/hyp
- Shaw, S. B. (2016). Investigating the linkage between streamflow recession rates and channel network contraction in a mesoscale catchment in New York state. Hydrological Processes, 30(3), 479–492. https://doi.org/10.1002/hyp.10626
- Shaw, S. B., & Riha, S. J. (2012). Examining individual recession events instead of a data cloud: Using a modified interpretation of dQ/dt-Q streamflow recession in glaciated watersheds to better inform models of low flow. Journal of Hydrology, 434–435, 46–54. https://doi.org/10.1016/j.jhydrol.2012.02.034
- Smith, J. A., Cox, A. A., Baeck, M. L., Yang, L., & Bates, P. (2018). Strange Floods: The Upper Tail of Flood Peaks in the United States. Water Resources Research, 54(9), 6510–6542. https://doi.org/10.1029/2018WR022539
- Villarini, G., & Smith, J. A. (2010). Flood peak distributions for the eastern United States. Water Resources Research, 46(6), 1–17. https://doi.org/10.1029/2009WR008395
- Wang, H., Merz, R., Yang, S., & Basso, S. (2023). Inferring heavy tails of flood distributions through hydrograph recession. Hydrology and Earth System Sciences, 27(24), 4369–4384. https://doi.org/10.5194/hess-27-4369-2023
- Wietzke, L. M., Merz, B., Gerlitz, L., Kreibich, H., Guse, B., Castellarin, A., & Vorogushyn, S. (2020). Comparative analysis of scalar upper tail indicators. Hydrological Sciences Journal, 65(10), 1625–1639. https://doi.org/10.1080/02626667.2020.1769104
- Ye, S., Li, H. Y., Huang, M., Alebachew, M. A., Leng, G., Leung, L. R., et al. (2014). Regionalization of subsurface stormflow parameters of hydrologic models: Derivation from regional analysis of streamflow recession curves. Journal of Hydrology, 519(PA), 670–682. https://doi.org/10.1016/j.jhydrol.2014.07.017
- Zhou, X., Sheng, Z., Yang, Y., Han, S., Zhang, Q., Li, H., & Yang, Y. (2022). Catchment water storage dynamics and its role in modulating streamflow generation in spectral perspective: a case study in the headwater of Baiyang Lake, China. Hydrology and Earth System Sciences, (November). Retrieved from https://doi.org/10.5194/hess-2022-357
Citation: https://doi.org/10.5194/hess-2024-159-AC1
RC2: 'Comment on hess-2024-159', Anonymous Referee #2, 07 Aug 2024
This paper investigates an interesting hypothesis, that the tail heaviness of flood distributions can be inferred from a recession analysis, through a large sample analysis with data from Europe and the United States. In hydrology, a heavy tail distribution of floods means that extreme floods are more likely to occur than would be predicted by distributions that have exponential asymptotic behaviour. Very large extremes may therefore happen with a probability that is not as small as one would expect if using, for example, a Gumbel distribution to model flood probabilities, and may result in huge damages due to their surprising nature. Identifying the properties of the distribution tails is a hard task which requires, usually, very long series of data and/or regional analyses when using statistical techniques. Linking tail heaviness to catchment behaviour and process understanding is therefore a very interesting avenue to follow in order to increase our confidence in the behaviour of the extremes. This is the aim of this paper and it is therefore of interest to the hydrology community. The paper is well-written with high-quality figures.
Having said that, I am not convinced by the research conducted here. My concerns are essentially two:
1) The first aim of the paper is to validate the effectiveness of the method in identifying heavy tail flood behavior (line 95). However, the validation is based on analysing whether samples of (on average) 10 years of length (line 253), at the daily to monthly maxima timescales, can be represented in the tails by an empirical power law. My question is whether this "tail analysis" is representative of the extremes of practical interest? In hydrological practice, we typically focus on return periods of 100 or 200 years and sometimes more. It is unclear to me what are the return periods of interest in this paper. It is important to reflect on the range of return periods for which the analyses in this paper are meant because processes may emerge with increasing return periods which are not at work for less extreme floods. I am not convinced that the paper demonstrates that the method is relevant for extremes that are of interest in hydrology. Benchmarking the recession analysis on a statistical analysis on short samples does not constitute a proper validation of the method. In order to better evaluate the effectiveness of the method in identifying heavy tail flood behavior, an additional, and to me more convincing, benchmark should be the analysis of (many) long timeseries with methods usually adopted in flood frequency analysis (e.g., fit of the GEV shape parameter, better if using regional analysis).
2) The second aim of the paper is the evaluation of the causes for differences in the recession coefficient and therefore, based on the hypothesis made here, of the tail heaviness of floods. The results are not that easy to interpret, since the method proposed uses a sharp threshold on the recession coefficient (a=2) to distinguish between heavy tail behaviour and (possibly) non-heavy tail behaviour. So only a binary distinction is made and differences between Germany and UK, for example, cannot be clearly identified. Since different degrees of tail heaviness exist, wouldn't it be more useful to link the recession coefficient to, for example, the exponent of the empirical power law b? The Authors show something like this in Figure 2 even though the relationship doesn't seem to be so strong. But wouldn't that be more useful in hydrological practice where, for instance, the estimation of the GEV shape parameter is of interest? Besides, I think the spatial results obtained here should be compared to regional studies on flood frequency? One example is Macdonald et al. (2022) who identify the GEV shape parameter as a quantification of tail heaviness in Germany. Figure 3a here seems consistent with Figure 4 of Macdonald et al. (2022). What about the other regions? In the US there are maps of the regional skewness in the Bulletin 17b. These comparisons could strengthen the confidence in the effectiveness of the method, since they are based on longer timeseries and on regional analyses.
Given these concerns, I am sorry I cannot recommend publication of this work in HESS.
Citation: https://doi.org/10.5194/hess-2024-159-RC2
AC2: 'Reply on RC2', Hsing-Jui Wang, 20 Aug 2024
Reply on Reviewer 2
We thank the Reviewer for providing comments. We have addressed the concerns below and will incorporate them into the revised manuscript. For ease of reference, the Reviewer's comments are marked in italic font, while our replies are indicated in normal font.
===
This paper investigates an interesting hypothesis, that the tail heaviness of flood distributions can be inferred from a recession analysis, through a large sample analysis with data from Europe and the United States. In hydrology, a heavy tail distribution of floods means that extreme floods are more likely to occur than would be predicted by distributions that have exponential asymptotic behaviour. Very large extremes may therefore happen with a probability that is not as small as one would expect if using, for example, a Gumbel distribution to model flood probabilities, and may result in huge damages due to their surprising nature. Identifying the properties of the distribution tails is a hard task which requires, usually, very long series of data and/or regional analyses when using statistical techniques. Linking tail heaviness to catchment behaviour and process understanding is therefore a very interesting avenue to follow in order to increase our confidence in the behaviour of the extremes. This is the aim of this paper and it is therefore of interest to the hydrology community. The paper is well-written with high-quality figures.
Thank you for summarizing the aims of this study.
1) The first aim of the paper is to validate the effectiveness of the method in identifying heavy tail flood behavior (line 95). However, the validation is based on analysing whether samples of (on average) 10 years of length (line 253), at the daily to monthly maxima timescales, can be represented in the tails by an empirical power law. My question is whether this "tail analysis" is representative of the extremes of practical interest? In hydrological practice, we typically focus on return periods of 100 or 200 years and sometimes more. It is unclear to me what are the return periods of interest in this paper. It is important to reflect on the range of return periods for which the analyses in this paper are meant because processes may emerge with increasing return periods which are not at work for less extreme floods. I am not convinced that the paper demonstrates that the method is relevant for extremes that are of interest in hydrology. Benchmarking the recession analysis on a statistical analysis on short samples does not constitute a proper validation of the method. In order to better evaluate the effectiveness of the method in identifying heavy tail flood behavior, an additional, and to me more convincing, benchmark should be the analysis of (many) long timeseries with methods usually adopted in flood frequency analysis (e.g., fit of the GEV shape parameter, better if using regional analysis).
The concern raised pertains to line 253, where we mention that the average sample size used for fitting empirical power laws on monthly maximum streamflow is 132. We believe the reviewer interpreted this as representing roughly 10 years of observations, which is considered too short to represent extreme behavior. We would like to clarify that the average sample size of 132 refers only to the size of the identified 'tail' of the frequency distribution, not the length of the entire observation period. The full observation period, however, averages 62 years (ranging from 24 to 148 years) across the dataset used in this study (see line 125). It is important to note that empirical data, if they follow a power-law distribution, typically do so only for values above a certain threshold (i.e., the tail). Consequently, it is standard practice to first identify this threshold (i.e., where the tail begins) before fitting a power law. The sample size of 132 hence refers to the most extreme monthly maxima above this threshold observed within an average 62-year-long data series.
Although the data series used in this study (62 years long on average) are too short to derive the magnitude of events with 100- or 200-year return periods directly from the data, such an observation period aligns with what is normally available (Bertola et al., 2023) and used in flood frequency analysis for estimating 100- or 200-year floods (Zhao et al., 2021).
Several studies have suggested that the shape parameter of the GEV may not be a reliable indicator of tail heaviness because it is highly sensitive to the length of the observation series and the occurrence of outliers (Hu et al., 2020; Cai and Hames, 2010). However, to address the concerns of the reviewer and support the benchmark employed in this study, we also calculated L-moment ratio diagrams, which have been demonstrated to be a more robust approach, particularly for evaluating highly skewed samples (Vogel and Fennessey, 1993). We found that 97.8%, 100%, and 94.1% of the identified heavy-tailed case studies (based on empirical power law fitting in the analyses of daily streamflow, ordinary peaks, and monthly maxima, respectively) exhibit greater L-skewness and L-kurtosis than the exponential distribution, indicating heavy-tailed behavior. (See details in lines 77-80, 326-330, and Figure S5.)
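The comparison against the exponential distribution can be sketched with sample L-moment ratios. The snippet below assumes the standard unbiased probability-weighted-moment estimators and uses the theoretical exponential values (L-skewness 1/3, L-kurtosis 1/6) as the reference point; the function names are hypothetical, and the actual diagrams in the manuscript may have been produced differently.

```python
def l_moment_ratios(data):
    """Return the sample (L-skewness, L-kurtosis) of `data`.

    Uses unbiased probability-weighted moments b0..b3; needs at least
    four values that are not all identical.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 values")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the zero-based rank
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


# Theoretical L-moment ratios of the exponential distribution.
EXP_L_SKEWNESS = 1.0 / 3.0
EXP_L_KURTOSIS = 1.0 / 6.0


def heavier_than_exponential(data):
    """True if both sample ratios exceed the exponential reference values."""
    t3, t4 = l_moment_ratios(data)
    return t3 > EXP_L_SKEWNESS and t4 > EXP_L_KURTOSIS
```

A sample that plots above and to the right of the exponential point in the L-skewness/L-kurtosis plane would be flagged by `heavier_than_exponential`, which mirrors the criterion quoted in the reply.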
We regret any confusion caused by the previous wording and intend to clarify this point in the text from line 253 as follows: “It's important to clarify that these sample sizes refer specifically to the tail of the empirical distributions. In other words, only the most extreme observations are analyzed to determine whether the empirical distributions exhibit power-law behavior in their tails. For the overview of the entire data series analyzed in this study, please refer to Section 2.”
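As a rough illustration of the L-moment check mentioned in this reply, the Python sketch below computes sample L-skewness and L-kurtosis from unbiased probability-weighted moments and compares them with the theoretical values of the exponential distribution (1/3 and 1/6). The function names are hypothetical, and the rule "both ratios exceed the exponential values" is an assumption about how the comparison could be coded; it is not the authors' implementation.

```python
import numpy as np

# Theoretical L-moment ratios of the exponential distribution.
EXP_L_SKEW = 1.0 / 3.0
EXP_L_KURT = 1.0 / 6.0


def sample_l_moment_ratios(x):
    """Return (L-skewness, L-kurtosis) from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 values")
    i = np.arange(1, n + 1)  # ranks of the sorted sample
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


def heavier_than_exponential(x):
    """Assumed rule: both ratios exceed the exponential reference values."""
    t3, t4 = sample_l_moment_ratios(x)
    return bool(t3 > EXP_L_SKEW and t4 > EXP_L_KURT)
```

Because L-moment ratios are invariant to location and scale, the check can be applied to the raw tail sample without normalisation.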
2) The second aim of the paper is the evaluation of the causes for differences in the recession coefficient and therefore, based on the hypothesis made here, of the tail heaviness of floods. The results are not that easy to interpret, since the method proposed uses a sharp threshold on the recession coefficient (a=2) to distinguish between heavy tail behaviour and (possibly) non-heavy tail behaviour. So only a binary distinction is made and differences between Germany and UK, for example, cannot be clearly identified. Since different degrees of tail heaviness exist, wouldn't it be more useful to link the recession coefficient to, for example, the exponent of the empirical power law b? The Authors show something like this in Figure 2 even though the relationship doesn't seem to be so strong. But wouldn't that be more useful in hydrological practice where, for instance, the estimation of the GEV shape parameter is of interest?
The identification of heavy-tailed floods through hydrograph recession analysis (employed in this study) uses a threshold of two on the recession exponent to distinguish heavy-tailed cases from non-heavy-tailed ones. The method further allows for evaluating the tail heaviness based on the specific exponent values, as noted by the reviewer. Such an approach resembles what is done for other indices, such as the GEV shape parameter, where a threshold value of zero is used to differentiate between heavy-tailed and non-heavy-tailed distributions. Unlike the latter case, where the threshold of zero has only a statistical meaning, the threshold of two in the method adopted in this study also has a physical meaning, as it represents a degree of non-linearity of the catchment hydrologic response which causes a shift in the resulting streamflow and flood distributions (see Botter et al., 2009; Kirchner, 2009; Basso et al., 2023).
This study emphasizes a binary distinction between heavy and non-heavy-tailed distributions - rather than discussing the degree of heaviness - for two reasons. First, a reliable identification of heavy-tailed distributions (i.e., even without any claim about their degree of heaviness) is per se a difficult task, as noted by the reviewer in the previous comment. Second, the identification itself holds significant hydrological importance, regardless of the degree of heaviness. In fact, the presence of a heavy tail alone can serve as a critical warning of a relatively high probability of extreme events. For the latter reason, other studies, even those using indices like the GEV shape parameter, also often focus on distinguishing cases with heavy-tailed flood distributions from those without (e.g., Macdonald et al., 2022). In addition, there is a notable knowledge gap in conducting such investigations on an extensive international scale. This gap is largely due to the fact that quantifying such behavior remains highly sensitive to the sample size (Wietzke et al., 2020; Hu et al., 2020), making reliable identification across different datasets challenging (Merz et al., 2022).
We acknowledge uncertainties in the estimation of the degree of heaviness, particularly for cases with values close to the employed threshold. However, the recession exponent used in this study has been tested and found to be more reliable to distinguish between heavy and non-heavy-tailed distributions than, e.g., the GEV shape parameter, especially in analyses with short data lengths, as shown in a previous work (see Hu et al., 2020 and Wang et al., 2023). This justifies the selection of such an index for investigations across a broader range of study areas. We will improve the relevant statement as follows: Line 556-560 “We acknowledge that the hydroclimatic factors analyzed in this study may not account for all cases... This discrepancy could be attributed to either the uncertainty in inferring heavy-tailed floods through recession exponents (particularly for cases with values close to the threshold) or the presence of additional factors or mechanisms influencing flood tail behavior in these regions.”
Besides, I think the spatial results obtained here should be compared to regional studies on flood frequency? One example is Macdonald et al. (2022) who identify the GEV shape parameter as a quantification of tail heaviness in Germany. Figure 3a here seems consistent with Figure 4 of Macdonald et al. (2022). What about the other regions? In the US there are maps of the regional skewness in the Bulletin 17b. These comparisons could strengthen the confidence in the effectiveness of the method, since they are based on longer timeseries and on regional analyses.
We agree that comparing our findings with previous regional studies would strengthen the conclusions of this work, as we described in lines 390-424. We thank the reviewer for suggesting further comparisons. Accordingly, we will enhance this section with the following modifications:
For Germany: Line 395: “This finding aligns with Macdonald et al. (2022), who used GEV shape parameters as an indicator of heavy-tailed behavior for gauges with more than 50 years of observations.”
For the UK: Line 400-401: “According to our findings, heavy-tailed flood behavior is prevalent in the UK, with a prevalence of 77%, particularly in the eastern and southern coastal regions. This aligns with previous regional findings (European Environmental Agency, 2010; Robson, 2002).”
For the US: Line 419: “In particular, catchments on the eastern side of the Appalachian Mountains exhibit pronounced heavy-tailed flood behavior, while those on the western side mostly exhibit non-heavy-tailed behavior. This is consistent with several previous findings based on the skewness of annual maximum streamflow (Bulletin 17b, 1982), the GEV shape parameters (Villarini and Smith, 2010), and the upper tail ratio (Smith et al., 2018).”
We would like to highlight that the analyses of this study, which are based on shorter and more variable lengths of data (24-148 years) and on analyzing hydrograph recessions from ordinary flows rather than flood records, provide findings in agreement with studies that rely on longer data records (e.g., only gauges with more than 75 years in the work of Villarini and Smith (2010), and more than 50 years in Macdonald et al. (2022)). Such an agreement not only confirms the effectiveness of this new approach but also highlights its advantages. While most previous studies have been confined to specific regional areas, the newly proposed approach allows for the analysis across a broader geographical area, facilitating the investigation of diverse conditions.
Given these concerns, I am sorry I cannot recommend publication of this work in HESS.
We hope the response provided above satisfactorily addresses the reviewer’s concerns.
We would like to emphasize that this work not only aims at validating the effectiveness of the newly proposed approach (as demonstrated in greater detail in our previous work, Wang et al., 2023) but also at using this index to shed light on the relationships between heavy-tailed flood behavior and critical environmental factors (e.g., climate, catchment areas) that remain poorly understood.
References
Basso, S., Merz, R., Tarasova, L., & Miniussi, A. (2023). Extreme flooding controlled by stream network organization and flow regime. Nature Geoscience, 16(April), 339–343. https://doi.org/10.1038/s41561-023-01155-w
Bertola, M., Blöschl, G., Bohac, M., Borga, M., Castellarin, A., Chirico, G. B., et al. (2023). Megafloods in Europe can be anticipated from observations in hydrologically similar catchments. Nature Geoscience, 16(11), 982–988. https://doi.org/10.1038/s41561-023-01300-5
Botter, G., Porporato, A., Rodriguez-Iturbe, I., & Rinaldo, A. (2009). Nonlinear storage-discharge relations and catchment streamflow regimes. Water Resources Research, 45(10), 1–16. https://doi.org/10.1029/2008WR007658
Bulletin17B: Interagency Committee on Water Data. (1982). Guidelines for determining flood flow frequency. Hydrology Subcommittee, Technical Report.
Cai, Y., & Hames, D. (2010). Minimum sample size determination for generalized extreme value distribution. Communications in Statistics: Simulation and Computation, 40(1), 87–98. https://doi.org/10.1080/03610918.2010.530368
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111
European Environmental Agency. (2010). Mapping the impacts of natural hazards and technological accidents in Europe An overview of the last decade. Publications Office of the European Union. https://doi.org/10.2800/62638
Hu, L., Nikolopoulos, E. I., Marra, F., & Anagnostou, E. N. (2020). Sensitivity of flood frequency analysis to data record, statistical model, and parameter estimation methods: An evaluation over the contiguous United States. Journal of Flood Risk Management, 13(1), 1–13. https://doi.org/10.1111/jfr3.12580
Kirchner, J. W. (2009). Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology backward. Water Resources Research, 45(2), 1–34. https://doi.org/10.1029/2008WR006912
Merz, B., Basso, S., Fischer, S., Lun, D., Blöschl, G., Merz, R., et al. (2022). Understanding heavy tails of flood peak distributions. Water Resources Research, 1–37. https://doi.org/10.1029/2021wr030506
Robson, A. J. (2002). Evidence for trends in UK flooding. Philosophical Transactions of the Royal Society A, 360, 1327–1343. https://doi.org/10.1098/rsta.2002.1003
Smith, J. A., Cox, A. A., Baeck, M. L., Yang, L., & Bates, P. (2018). Strange Floods: The Upper Tail of Flood Peaks in the United States. Water Resources Research, 54(9), 6510–6542. https://doi.org/10.1029/2018WR022539
Villarini, G., & Smith, J. A. (2010). Flood peak distributions for the eastern United States. Water Resources Research, 46(6), 1–17. https://doi.org/10.1029/2009WR008395
Vogel, R. M., & Fennessey, N. M. (1993). L moment diagrams should replace product moment diagrams. Water Resources Research, 29(6), 1745–1752. https://doi.org/10.1029/93WR00341
Wang, H., Merz, R., Yang, S., & Basso, S. (2023). Inferring heavy tails of flood distributions through hydrograph recession. Hydrol. Earth Syst. Sci, 27(24), 4369–4384. https://doi.org/10.5194/hess-27-4369-2023
Wietzke, L. M., Merz, B., Gerlitz, L., Kreibich, H., Guse, B., Castellarin, A., & Vorogushyn, S. (2020). Comparative analysis of scalar upper tail indicators. Hydrological Sciences Journal, 65(10), 1625–1639. https://doi.org/10.1080/02626667.2020.1769104
Zhao, G., Bates, P., Neal, J., & Pang, B. (2021). Design flood estimation for global river networks based on machine learning models. Hydrology and Earth System Sciences, 25(11), 5981–5999. https://doi.org/10.5194/hess-25-5981-2021
Citation: https://doi.org/10.5194/hess-2024-159-AC2
-
AC2: 'Reply on RC2', Hsing-Jui Wang, 20 Aug 2024
-
EC1: 'Comment on hess-2024-159', Serena Ceola, 25 Nov 2024
I was asked to review this paper, which received contrasting comments. I appreciate the overall scientific objective of the manuscript, even though I have some doubts on the effectiveness of estimating heavy tail distribution starting from a few years of observations. I encourage the authors to provide detailed comments about this. In particular, authors should clearly and concisely highlight the approximations and limitations of the methodology they outlined. They should, e.g., clearly state that they assume that severe floods are caused by the same drivers as smaller floods. Overall, I found the manuscript quite lengthy and difficult to read. Even though the manuscript has been substantially improved since the first review round, I invite the authors to shorten it and improve its readability.
Citation: https://doi.org/10.5194/hess-2024-159-EC1
-
RC1: 'Comment on hess-2024-159', Anonymous Referee #1, 02 Jul 2024
Comments on “Constructing a geography of heavy-tailed flood distributions: insights from common streamflow dynamics” by Wang et al., submitted to HESS for possible publication.
The authors provide empirical analyses on the pattern and drivers of the tail heaviness. They adopt the hydrograph recession exponent as the indicator of watersheds with or without propensity for heavy tails of flood peak distributions. The contrasts in the tail heaviness across watersheds, climate regions, and seasons shed light on the importance of characterizing catchment storage in dictating flood regimes. The analyses are interesting and robust. A major concern of mine is that the manuscript is lengthy, which obscures the new wisdom obtained. I would suggest that the authors further refine it and remove unnecessary details.
Specific comments:
- The part that emphasizes the utility of hydrograph recession exponents in characterizing tail heaviness is lengthy and needs to be shortened. One question might be: as far as can be seen from the dataset, the record lengths are quite adequate despite their variance, so some other tail heaviness indicators would be able to perform as well.
- What is the rationale of using five days as the minimum duration, considering the vast variance in drainage areas?
- “The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”. This is not quite clear. How the upper tail is defined and statistically modelled is important. Section 3.2 also needs to be concise and informative. Please reconstruct.
- Line 269, by “majority” you use “50 %” as the threshold?
- From Figure 1 and Figure 2, we can see there are many overlaps between heavy tails and nonheavy tails. This is especially evident in Figure 2 where we see the scatters are well mixed. These results make me wonder the utility of recession exponent (a=2) as the criteria. I would suggest the authors to explain and discuss the limitation.
- Figure 4 and the text, please explain the rationale of using percentage. The absolute count of watersheds would matter, as can be seen that there is only one watershed in Bwk. This can be due to sampling uncertainties.
- What do the authors mean by “catchment storage”? Please clarify.
- Line 571, I would suggest the role of ET alone might not be that important. The ratio of ET to P is worth exploring.
- Line 605-606, the three references use indicators that quantify the heaviness of upper tails, while in this study, the authors are in fact addressing “propensity”.
- Line 649-650, this is obvious.
- Section 5, I enjoy reading this section overall, but it can be further improved by explicitly highlighting what are found in this study, and what are proposed by previous studies, especially the review paper by Merz et al.
Citation: https://doi.org/10.5194/hess-2024-159-RC1 -
AC1: 'Reply on RC1', Hsing-Jui Wang, 24 Jul 2024
Reply on Reviewer 1
We thank the Reviewer for providing valuable comments and suggestions. We have addressed each point below and will incorporate the comments into the revised manuscript after considering feedback from other reviewers. The Reviewer's comments are marked in italic font, while our replies are indicated in normal font.
===
The authors provide empirical analyses on the pattern and drivers of the tail heaviness. They adopt the hydrograph recession exponent as the indicator of watersheds with or without propensity for heavy tails of flood peak distributions. The contrasts in the tail heaviness across watersheds, climate regions, and seasons shed light on the importance of characterizing catchment storage in dictating flood regimes. The analyses are interesting and robust. A major concern of mine is that the manuscript is lengthy, which obscures the new wisdom obtained. I would suggest that the authors further refine it and remove unnecessary details.
Thank you for the summarized review and positive feedback. We will streamline the details, particularly addressing the specific comments below, to better highlight the key findings of this work as suggested by the reviewer.
Specific comments:
1) The part that emphasizes the utility of hydrograph recession exponents in characterizing tail heaviness is lengthy and needs to be shortened. One question might be: as far as can be seen from the dataset, the record lengths are quite adequate despite their variance, so some other tail heaviness indicators would be able to perform as well.
We will shorten the sections that describe the utility of hydrograph recession exponents in characterizing tail heaviness, as suggested by the reviewer. In particular, we will shorten section 3.1 by making larger use of references to a previous publication where this approach was first introduced (Wang et al., 2023). The dataset employed in this study spans 24 to 148 years. We acknowledge that other indicators could also be used; however, we are specifically interested in the recession exponent because it is a novel index that allows us to infer the propensity of rivers to experience extreme floods. This index enables us to identify potential risks even in the absence of recorded extreme floods, which is not possible with other indicators. Additionally, the recession exponent is suggested to mitigate the bias often introduced by the variance in dataset lengths across cases (Smith et al., 2018; Wietzke et al., 2020; Wang et al., 2023).
We have discussed the literature review on this topic in lines 64-84. We plan to supplement this with the following statement after line 84: “Nonetheless, we acknowledge that other indicators could also be used; however, we are specifically interested in the recession exponent because it is a novel index that allows us to infer the propensity of rivers to experience extreme floods. Such an index enables us to identify potential risks even in the absence of recorded extreme floods, which is not possible with other indicators. Its stability provides additional value to mitigate the bias often introduced by the variance in dataset lengths across cases.”
2) What is the rationale of using five days as the minimum duration, considering the vast variance in drainage areas?
Event-scale recession analyses typically choose a minimum of 3 to 5 days of recession for daily data (e.g., Shaw and Riha, 2012; Biswal and Marani, 2010) to minimize noise from short events (Ye et al., 2014) and ensure sufficient sample sizes for proper data quality (Shaw, 2016). We acknowledge that the recession period may vary depending on the drainage area. The recessions we identify and analyze have indeed different durations in different catchments. In this study, we select a fixed minimum number of 5 days (Dralle et al., 2017) to ensure sufficient sample size for suitably characterizing recession attributes.
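To make the event selection described above concrete, here is a minimal Python sketch, not the authors' code. It assumes the commonly used power-law recession form -dQ/dt = K·Q^a, defines a recession event as a run of strictly decreasing daily flows with at least `min_days` values, fits `a` per event by least squares in log-log space, and takes the median across events as the catchment value. The event definition, the median aggregation, and all function names are assumptions made for illustration.

```python
import numpy as np


def recession_events(q, min_days=5):
    """Return runs of strictly decreasing daily flow with at least min_days values."""
    q = np.asarray(q, dtype=float)
    events, start = [], 0
    for t in range(1, q.size + 1):
        # A run ends at the end of the series or when flow stops decreasing.
        if t == q.size or q[t] >= q[t - 1]:
            if t - start >= min_days:
                events.append(q[start:t])
            start = t
    return events


def recession_exponent(event):
    """Fit a in -dQ/dt = K * Q**a for one event (daily time step)."""
    event = np.asarray(event, dtype=float)
    dq = -np.diff(event)                      # positive daily decline
    q_mid = 0.5 * (event[1:] + event[:-1])    # flow at the middle of each step
    a, _log_k = np.polyfit(np.log(q_mid), np.log(dq), 1)
    return float(a)


def catchment_exponent(q, min_days=5):
    """Median event exponent (an assumed aggregation rule)."""
    events = recession_events(q, min_days)
    if not events:
        raise ValueError("no recession event of sufficient length")
    return float(np.median([recession_exponent(e) for e in events]))
```

Under the criterion discussed in this thread, a catchment whose exponent exceeds two would be flagged as prone to heavy-tailed floods. Raising `min_days` reduces noise from short events at the cost of fewer events per catchment.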
3) “The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”. This is not quite clear. How the upper tail is defined and statistically modelled is important. Section 3.2 also needs to be concise and informative. Please reconstruct.
Thank you for your comment. We will modify lines 232-233 (“The upper tail is defined by an optimized lower boundary of the discharge, determined by selecting the best fit based on the KS statistic”) as below to improve the clarity:
“Empirical data following a power-law distribution (if applicable) typically do so above a certain lower bound, defining the analyzed tail (Clauset et al., 2009). Therefore, we employ the approach proposed by Clauset et al. (2007) to determine the optimized lower boundary. This method selects the boundary where the probability distributions of the data and the best-fit power-law model are most similar. If the optimized lower boundary is higher than the true lower boundary, the reduced data set size leads to a poorer match due to statistical fluctuations. If it is lower, the distributions differ fundamentally. The KS statistic is employed to quantify the distance between these distributions.”
We will also revise the remainder of section 3.2 with the goal of shortening it.
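The lower-bound selection described in the proposed text can be sketched as follows. This is an illustrative Python version of the Kolmogorov-Smirnov (KS) minimisation for a continuous power law, in the spirit of Clauset et al. (2009), and not the authors' implementation. The function name, the symbol `alpha` for the power-law exponent, and the `min_tail` cut-off are assumptions; the goodness-of-fit test that decides whether a tail is accepted as power-law is not included.

```python
import numpy as np


def fit_power_law_tail(x, min_tail=10):
    """Scan candidate lower bounds and keep the one with the smallest KS distance.

    Returns (xmin, alpha, ks), or None if no candidate leaves min_tail points.
    """
    x = np.sort(np.asarray(x, dtype=float))
    best = None
    for xmin in np.unique(x):          # ascending, so tails shrink as we go
        tail = x[x >= xmin]
        n = tail.size
        if n < min_tail:
            break
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0:               # all tail values equal xmin
            continue
        alpha = 1.0 + n / log_sum      # maximum-likelihood exponent
        model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
        empirical_cdf = np.arange(n) / n
        ks = float(np.max(np.abs(empirical_cdf - model_cdf)))
        if best is None or ks < best[2]:
            best = (float(xmin), float(alpha), ks)
    return best
```

A lower bound set too low pulls non-power-law values into the fit and inflates the KS distance, while one set too high leaves few points and a noisier fit, which is the trade-off the proposed text describes.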
4) Line 269, by “majority” you use “50 %” as the threshold?
Yes we do. We will specify it in the revised paper.
5) From Figure 1 and Figure 2, we can see there are many overlaps between heavy tails and nonheavy tails. This is especially evident in Figure 2 where we see the scatters are well mixed. These results make me wonder the utility of recession exponent (a=2) as the criteria. I would suggest the authors to explain and discuss the limitation.
Ideally, the validation of the new index should include benchmarks for both heavy-tailed and non-heavy-tailed case studies. However, we can only statistically establish the benchmark for the former (power-law-tailed case studies, black dots in Figure 2), but not for the latter (uncertain case studies, gray dots in Figure 2). This is because the latter category encompasses case studies that either do not follow a power-law distribution or whose underlying distributions cannot be determined due to high uncertainty from small sample sizes, and thus contributes to the ‘mixed pattern’ indicated by the reviewer. Note that several years of data are still a small sample for reliably characterizing the tail of empirical and purely statistical distributions fitted to the data.
Because of this limitation of the approach, the effectiveness of the recession exponent should be assessed only on the basis of the former group (black dots), as highlighted in Figure 2. To rigorously validate the effectiveness of the recession exponent criterion, we also need Figure 1, where we statistically confirm the hypothesized heavy- and non-heavy-tailed groups using a=2 as the criterion.
This confusion seems unavoidable due to the inherent limitations of data analysis. Meanwhile, we acknowledge that misattribution can occur due to the recession exponent not always being able to properly distinguish between heavy and light tails, particularly when a is around the threshold value of 2. This issue is shown in the case studies in Norway, which we discuss in lines 356-363. We have addressed this in lines 242-246:
"We term such a case study a 'power-law-tailed case study,' while cases that don't meet these criteria are labeled as 'uncertain case studies' in subsequent analyses. The latter label acknowledges the awareness that we cannot definitively conclude whether these case studies are indeed not power-law-tailed or if their underlying distributions cannot be determined due to the high uncertainty caused by the small sample sizes of available observations."
Moreover, we will insert additional discussion after line 331 to improve clarity:
"We cannot conclude whether uncertain case studies (gray dots) represent cases that are indeed not power-law-tailed or if their underlying distributions cannot be determined due to the high uncertainty caused by small sample sizes. Therefore, we benchmark the recession exponent against the empirical power law exponent by focusing on the 'certain group,' i.e., power-law-tailed case studies (black dots)."
The following statement will also be inserted at the end of line 365:
“However, we acknowledge that misattributions can still occur, particularly when a is around the threshold value.”
6) Figure 4 and the text, please explain the rationale of using percentage. The absolute count of watersheds would matter, as can be seen that there is only one watershed in Bwk. This can be due to sampling uncertainties.
We recognize that the varying number of cases across climate types might introduce bias due to sample sensitivity (as we have mentioned in lines 490-493). Nonetheless, to estimate the propensity of tail behavior, the ratio (i.e., percentage) of heavy- to non-heavy-tailed case studies in each climate region is considered one of the most direct approaches. We will revise lines 490-493 to emphasize this concern regarding the employed dataset and approach:
Original lines 490-493: “We acknowledge that these results are based on overarching conditions and do not encompass all climate types, and achieving an equal number of study sites across various climate regions might not always be feasible. Expanding the number of study sites could further enhance our understanding, especially for extreme cases.”
Modified version: “We acknowledge that these results are based on overarching conditions and do not encompass all climate types, and achieving an equal number of study sites across various climate regions might not always be feasible. We should be mindful of potential bias caused by sample sensitivity, particularly in regions with a limited number of cases (e.g., Csa, BSh, BWk in this study). Expanding the number of study sites could further enhance our understanding, especially for extreme cases.”
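One simple way to keep the sample-size caveat visible is to report the absolute count next to each percentage. The Python sketch below does this for a list of (climate class, recession exponent) pairs. It is an illustration only: the function name, the `min_cases` cut-off used to flag small groups, and the record layout are assumptions, and any values passed to it would be the reader's own data.

```python
from collections import defaultdict


def heavy_tail_share_by_class(records, threshold=2.0, min_cases=10):
    """Summarise heavy-tail propensity per climate class.

    records: iterable of (climate_class, recession_exponent) pairs.
    Returns {class: {"n": count, "heavy_share": fraction, "small_sample": flag}}.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [heavy count, total count]
    for climate, exponent in records:
        counts[climate][0] += int(exponent > threshold)
        counts[climate][1] += 1
    return {
        climate: {
            "n": total,
            "heavy_share": heavy / total,
            "small_sample": total < min_cases,  # flag groups prone to sampling noise
        }
        for climate, (heavy, total) in counts.items()
    }
```

A class with a single gauge then shows a share of 0 or 1 together with `n = 1` and the small-sample flag, which makes it clear that the percentage alone should not be over-interpreted.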
7) What do the authors mean by “catchment storage”? Please clarify.
Thank you for pointing this out. We will improve the clarity by adding the following description after lines 487-489 (“we have identified the conjunction of dry periods and higher temperatures as crucial meteorological factors significantly contributing to the dynamics of catchment storage, thereby influencing the nonlinearity of hydrological responses.”):
“We refer to catchment storage hereafter as the water contained in a catchment at a certain moment, which contributes to defining its wetness status (chiefly the degree of saturation of the critical zone). This capacity is dynamic and depends on various factors, such as soil moisture states, precipitation, and evapotranspiration (Merz and Blöschl, 2009; Zhou et al., 2022).”
8) Line 571, I would suggest the role of ET alone might not be that important. The ratio of ET to P is worth exploring.
We agree that, based on our analyses and findings, the temporal characteristics of rainfall and evapotranspiration collaboratively influence this seasonality, as discussed in detail in lines 539-548. We will therefore revise lines 569-571 as follows:
Original lines 569-571: “Regions with pronounced temperature variations across seasons, particularly with higher temperature in summer, tend to display such dynamics and highlight the role of evapotranspiration in catchments in driving this seasonality.”
Modified version: “Regions with pronounced temperature variations across seasons, particularly with higher temperatures in summer, and characterized by relatively evenly distributed rainfall throughout the year tend to display such dynamics. This highlights the importance of both evapotranspiration and the temporal characteristics of rainfall in shaping flood tail behavior across seasons, aligning with previous studies (Guo et al., 2014; Basso et al., 2023).”
9) Line 605-606, the three references use indicators that quantify the heaviness of upper tails, while in this study, the authors are in fact addressing “propensity”.
Thank you for pointing this out. We will improve the clarity as below:
Original lines 604-606: “These findings align with previous discussions on this matter (e.g., Merz and Blöschl, 2009; Villarini and Smith, 2010; Smith et al., 2018), which have suggested a relatively weak inverse correlation between catchment area and the occurrence of heavy-tailed flood behavior.”
Modified version: “In a similar context, previous studies, using different heavy-tailed flood indices, have suggested a relatively weak inverse correlation between catchment area and the occurrence of heavy-tailed flood behavior (e.g., Merz and Blöschl, 2009; Villarini and Smith, 2010; Smith et al., 2018).”
It is worth noting that the heavy-tail propensity identified in this study encompasses: 1) case studies confirmed to exhibit power-law tail behavior, 2) case studies where power-law tail behavior could not be confirmed due to insufficient samples, and 3) case studies that do not show power-law tail behavior based on historical data but are suggested to exhibit such behavior due to high catchment nonlinearity. The first type is likely to be identified in studies employing different heavy-tailed flood indices as well.
10) Line 649-650, this is obvious.
This sentence (lines 649-650) serves as a contrasting pattern with the following one (lines 650-652). We will revise the terms to tone it down and present a more neutral statement as follows:
Original lines 649-650: “Our findings first indicate that regions with relatively uniform hydroclimatic conditions (the Atlantic Europe and Northern Europe) tend to exhibit a single/dominant propensity of flood tail behavior.”
Modified version: “As expected, the results show regions with relatively uniform hydroclimatic conditions (the Atlantic Europe and Northern Europe) tend to exhibit a single/dominant propensity of flood tail behavior.”
11) Section 5, I enjoy reading this section overall, but it can be further improved by explicitly highlighting what are found in this study, and what are proposed by previous studies, especially the review paper by Merz et al.
Thank you for the suggestion. We will enhance the clarity of this section by highlighting the comparison between current understanding and the new findings contributed by this study in the revised version.
References
- Basso, S., Merz, R., Tarasova, L., & Miniussi, A. (2023). Extreme flooding controlled by stream network organization and flow regime. Nature Geoscience, 16(April), 339–343. https://doi.org/10.1038/s41561-023-01155-w
- Biswal, B., & Marani, M. (2010). Geomorphological origin of recession curves. Geophysical Research Letters, 37(24), 1–5. https://doi.org/10.1029/2010GL045415
- Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111
- Clauset, A., Young, M., & Gleditsch, K. S. (2007). On the Frequency of Severe Terrorist Events. Journal of Conflict Resolution, 51(1), 58–87. https://doi.org/10.1177/0022002706296157
- Dralle, D. N., Karst, N. J., Charalampous, K., Veenstra, A., & Thompson, S. E. (2017). Event-scale power law recession analysis: Quantifying methodological uncertainty. Hydrology and Earth System Sciences, 21(1), 65–81. https://doi.org/10.5194/hess-21-65-2017
- Guo, J., Li, H.-Y., Leung, L. R., Guo, S., Liu, P., & Sivapalan, M. (2014). Links between flood frequency and annual water balance behaviors: A basis for similarity and regionalization. Water Resources Research, 50, 937–953. https://doi.org/10.1002/2013WR014374
- Merz, R., & Blöschl, G. (2009). Process controls on the statistical flood moments - a data based analysis. Hydrological Processes, 23(5), 675–696. https://doi.org/10.1002/hyp
- Shaw, S. B. (2016). Investigating the linkage between streamflow recession rates and channel network contraction in a mesoscale catchment in New York state. Hydrological Processes, 30(3), 479–492. https://doi.org/10.1002/hyp.10626
- Shaw, S. B., & Riha, S. J. (2012). Examining individual recession events instead of a data cloud: Using a modified interpretation of dQ/dt-Q streamflow recession in glaciated watersheds to better inform models of low flow. Journal of Hydrology, 434–435, 46–54. https://doi.org/10.1016/j.jhydrol.2012.02.034
- Smith, J. A., Cox, A. A., Baeck, M. L., Yang, L., & Bates, P. (2018). Strange Floods: The Upper Tail of Flood Peaks in the United States. Water Resources Research, 54(9), 6510–6542. https://doi.org/10.1029/2018WR022539
- Villarini, G., & Smith, J. A. (2010). Flood peak distributions for the eastern United States. Water Resources Research, 46(6), 1–17. https://doi.org/10.1029/2009WR008395
- Wang, H., Merz, R., Yang, S., & Basso, S. (2023). Inferring heavy tails of flood distributions through hydrograph recession. Hydrology and Earth System Sciences, 27(24), 4369–4384. https://doi.org/10.5194/hess-27-4369-2023
- Wietzke, L. M., Merz, B., Gerlitz, L., Kreibich, H., Guse, B., Castellarin, A., & Vorogushyn, S. (2020). Comparative analysis of scalar upper tail indicators. Hydrological Sciences Journal, 65(10), 1625–1639. https://doi.org/10.1080/02626667.2020.1769104
- Ye, S., Li, H. Y., Huang, M., Alebachew, M. A., Leng, G., Leung, L. R., et al. (2014). Regionalization of subsurface stormflow parameters of hydrologic models: Derivation from regional analysis of streamflow recession curves. Journal of Hydrology, 519(PA), 670–682. https://doi.org/10.1016/j.jhydrol.2014.07.017
- Zhou, X., Sheng, Z., Yang, Y., Han, S., Zhang, Q., Li, H., & Yang, Y. (2022). Catchment water storage dynamics and its role in modulating streamflow generation in spectral perspective: a case study in the headwater of Baiyang Lake, China. Hydrology and Earth System Sciences, (November). Retrieved from https://doi.org/10.5194/hess-2022-357
Citation: https://doi.org/10.5194/hess-2024-159-AC1
-
RC2: 'Comment on hess-2024-159', Anonymous Referee #2, 07 Aug 2024
This paper investigates an interesting hypothesis, that the tail heaviness of flood distributions can be inferred from a recession analysis, through a large sample analysis with data from Europe and the United States. In hydrology, a heavy tail distribution of floods means that extreme floods are more likely to occur than would be predicted by distributions that have exponential asymptotic behaviour. Very large extremes may therefore happen with a probability that is not as small as one would expect if using, for example, a Gumbel distribution to model flood probabilities, and may result in huge damages due to their surprising nature. Identifying the properties of the distribution tails is a hard task which requires, usually, very long series of data and/or regional analyses when using statistical techniques. Linking tail heaviness to catchment behaviour and process understanding is therefore a very interesting avenue to follow in order to increase our confidence in the behaviour of the extremes. This is the aim of this paper and it is therefore of interest to the hydrology community. The paper is well-written with high-quality figures.
Having said that, I am not convinced by the research conducted here. My concerns are essentially two:
1) The first aim of the paper is to validate the effectiveness of the method in identifying heavy tail flood behavior (line 95). However, the validation is based on analysing whether samples of (on average) 10 years of length (line 253), at the daily to monthly maxima timescales, can be represented in the tails by an empirical power law. My question is whether this "tail analysis" is representative of the extremes of practical interest? In hydrological practice, we typically focus on return periods of 100 or 200 years and sometimes more. It is unclear to me what are the return periods of interest in this paper. It is important to reflect on the range of return periods for which the analyses in this paper are meant because processes may emerge with increasing return periods which are not at work for less extreme floods. I am not convinced that the paper demonstrates that the method is relevant for extremes that are of interest in hydrology. Benchmarking the recession analysis on a statistical analysis on short samples does not constitute a proper validation of the method. In order to better evaluate the effectiveness of the method in identifying heavy tail flood behavior, an additional, and to me more convincing, benchmark should be the analysis of (many) long timeseries with methods usually adopted in flood frequency analysis (e.g., fit of the GEV shape parameter, better if using regional analysis).
2) The second aim of the paper is the evaluation of the causes for differences in the recession coefficient and therefore, based on the hypothesis made here, of the tail heaviness of floods. The results are not that easy to interpret, since the method proposed uses a sharp threshold on the recession coefficient (a=2) to distinguish between heavy tail behaviour and (possibly) non-heavy tail behaviour. So only a binary distinction is made and differences between Germany and UK, for example, cannot be clearly identified. Since different degrees of tail heaviness exist, wouldn't it be more useful to link the recession coefficient to, for example, the exponent of the empirical power law b? The Authors show something like this in Figure 2 even though the relationship doesn't seem to be so strong. But wouldn't that be more useful in hydrological practice where, for instance, the estimation of the GEV shape parameter is of interest? Besides, I think the spatial results obtained here should be compared to regional studies on flood frequency? One example is Macdonald et al. (2022) who identify the GEV shape parameter as a quantification of tail heaviness in Germany. Figure 3a here seems consistent with Figure 4 of Macdonald et al. (2022). What about the other regions? In the US there are maps of the regional skewness in the Bulletin 17b. These comparisons could strengthen the confidence in the effectiveness of the method, since they are based on longer timeseries and on regional analyses.
Given these concerns, I am sorry I cannot recommend publication of this work in HESS.
Citation: https://doi.org/10.5194/hess-2024-159-RC2
-
AC2: 'Reply on RC2', Hsing-Jui Wang, 20 Aug 2024
Reply on Reviewer 2
We thank the Reviewer for providing comments. We have addressed the concerns below and will incorporate them into the revised manuscript. For ease of reference, the Reviewer's comments are marked in italic font, while our replies are indicated in normal font.
===
This paper investigates an interesting hypothesis, that the tail heaviness of flood distributions can be inferred from a recession analysis, through a large sample analysis with data from Europe and the United States. In hydrology, a heavy tail distribution of floods means that extreme floods are more likely to occur than would be predicted by distributions that have exponential asymptotic behaviour. Very large extremes may therefore happen with a probability that is not as small as one would expect if using, for example, a Gumbel distribution to model flood probabilities, and may result in huge damages due to their surprising nature. Identifying the properties of the distribution tails is a hard task which requires, usually, very long series of data and/or regional analyses when using statistical techniques. Linking tail heaviness to catchment behaviour and process understanding is therefore a very interesting avenue to follow in order to increase our confidence in the behaviour of the extremes. This is the aim of this paper and it is therefore of interest to the hydrology community. The paper is well-written with high-quality figures.
Thank you for summarizing the aims of this study.
1) The first aim of the paper is to validate the effectiveness of the method in identifying heavy tail flood behavior (line 95). However, the validation is based on analysing whether samples of (on average) 10 years of length (line 253), at the daily to monthly maxima timescales, can be represented in the tails by an empirical power law. My question is whether this "tail analysis" is representative of the extremes of practical interest? In hydrological practice, we typically focus on return periods of 100 or 200 years and sometimes more. It is unclear to me what are the return periods of interest in this paper. It is important to reflect on the range of return periods for which the analyses in this paper are meant because processes may emerge with increasing return periods which are not at work for less extreme floods. I am not convinced that the paper demonstrates that the method is relevant for extremes that are of interest in hydrology. Benchmarking the recession analysis on a statistical analysis on short samples does not constitute a proper validation of the method. In order to better evaluate the effectiveness of the method in identifying heavy tail flood behavior, an additional, and to me more convincing, benchmark should be the analysis of (many) long timeseries with methods usually adopted in flood frequency analysis (e.g., fit of the GEV shape parameter, better if using regional analysis).
The concern raised pertains to line 253, where we mention that the average sample size used for fitting empirical power laws to monthly maximum streamflow is 132. We believe the reviewer interpreted this as representing roughly 10 years of observations, which would be too short to represent extreme behavior. We would like to clarify that the average sample size of 132 refers only to the length of the identified 'tail' of the frequency distribution, not to the length of the entire observation period. The full observation period averages 62 years (ranging from 24 to 148 years) across the dataset used in this study (see line 125). Empirical data that follow a power-law distribution typically do so only for values above a certain threshold (i.e., in the tail). Consequently, it is standard practice to first identify this threshold (i.e., where the tail begins) before fitting a power law. The sample size of 132 hence refers to the most extreme monthly maxima above this threshold observed within a data series that is, on average, 62 years long.
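The two-step procedure described above (locate the tail threshold, then fit a power law to the values above it) can be illustrated with a minimal sketch in the spirit of Clauset et al. (2009). It is an assumption-laden illustration, not the code used in the manuscript: the function names `fit_power_law_tail` and `synthetic_pareto_sample`, the `min_tail` guard, and the synthetic data are all hypothetical.

```python
import math
import random


def fit_power_law_tail(values, min_tail=10):
    """Choose the tail threshold xmin by minimizing the Kolmogorov-Smirnov
    distance, with the exponent estimated by maximum likelihood
    (continuous power law). Returns (ks, xmin, alpha, tail_size) or None."""
    data = sorted(v for v in values if v > 0)
    best = None
    for xmin in sorted(set(data)):
        tail = [v for v in data if v >= xmin]
        n = len(tail)
        if n < min_tail:
            break  # higher thresholds leave even fewer points
        log_sum = sum(math.log(v / xmin) for v in tail)
        if log_sum == 0:
            continue  # all tail values equal xmin, exponent undefined
        alpha = 1.0 + n / log_sum  # MLE of the density exponent
        ks = 0.0
        for i, v in enumerate(tail):
            model_cdf = 1.0 - (v / xmin) ** (1.0 - alpha)
            ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
        if best is None or ks < best[0]:
            best = (ks, xmin, alpha, n)
    return best


def synthetic_pareto_sample(size, alpha=3.0, xmin=1.0, seed=0):
    """Hypothetical test data: inverse-transform sampling of a Pareto law."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]
```

For example, a 62-year record yields 62 × 12 = 744 monthly maxima; calling `fit_power_law_tail(synthetic_pareto_sample(744))` returns the selected threshold together with the number of observations above it, which is the quantity the sample size of 132 refers to.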
Although the length of the data series used in this study (62 years on average) does not allow the magnitude of events with 100- or 200-year return periods to be derived directly from the data, such an observation period aligns with what is normally available (Bertola et al., 2023) and used in flood frequency analysis for estimating 100- or 200-year floods (Zhao et al., 2021).
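A small worked example may help show why a 62-year record cannot directly yield 100- or 200-year events. The sketch below assumes one value per year (annual maxima) and the Weibull plotting position, neither of which is stated in the reply; the function name `empirical_return_periods` is hypothetical.

```python
def empirical_return_periods(annual_maxima):
    """Empirical return periods from the Weibull plotting position,
    T = (n + 1) / rank, where rank 1 is the largest observation."""
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)
    return [(q, (n + 1) / rank) for rank, q in enumerate(ranked, start=1)]
```

Under these assumptions the largest of 62 annual maxima is assigned a return period of (62 + 1) / 1 = 63 years, so estimates for rarer events necessarily rely on extrapolation from a fitted distribution.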
Several studies suggest that the shape parameter of the GEV may not be a reliable indicator of tail heaviness because it is highly sensitive to the length of the observation series and the occurrence of outliers (Hu et al., 2020; Cai and Hames, 2010). However, to address the concerns of the reviewer and support the benchmark employed in this study, we also computed L-moment ratio diagrams, which have been demonstrated to be a more robust approach, particularly for evaluating highly skewed samples (Vogel and Fennessey, 1993). We found that 97.8%, 100%, and 94.1% of the heavy-tailed case studies identified by empirical power law fitting in the analyses of daily flows, ordinary peaks, and monthly maxima, respectively, exhibit greater L-skewness and L-kurtosis than the exponential distribution, indicating heavy-tailed behavior (see details in lines 77-80, 326-330, and Figure S5).
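The L-moment check described above can be sketched as follows. This is a minimal illustration assuming the standard unbiased probability-weighted-moment estimators; the manuscript's exact estimator is not stated here, and the function names are hypothetical. The reference values for the exponential distribution (L-skewness 1/3, L-kurtosis 1/6) are standard results.

```python
def sample_l_moment_ratios(values):
    """Return (L-skewness, L-kurtosis) from unbiased sample
    probability-weighted moments b0..b3."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the 0-based rank in ascending order
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (b / n for b in (b0, b1, b2, b3))
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero (all values identical)")
    return l3 / l2, l4 / l2


def exceeds_exponential(values):
    """True if both ratios exceed those of the exponential distribution."""
    tau3, tau4 = sample_l_moment_ratios(values)
    return tau3 > 1.0 / 3.0 and tau4 > 1.0 / 6.0
```

A sample that plots above and to the right of the exponential point in the L-moment ratio diagram is flagged by `exceeds_exponential`, which mirrors the comparison reported in the percentages above.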
We regret any confusion caused by the previous wording and intend to clarify this point in the text from line 253 as follows: “It's important to clarify that these sample sizes refer specifically to the tail of the empirical distributions. In other words, only the most extreme observations are analyzed to determine whether the empirical distributions exhibit power-law behavior in their tails. For the overview of the entire data series analyzed in this study, please refer to Section 2.”
2) The second aim of the paper is the evaluation of the causes for differences in the recession coefficient and therefore, based on the hypothesis made here, of the tail heaviness of floods. The results are not that easy to interpret, since the method proposed uses a sharp threshold on the recession coefficient (a=2) to distinguish between heavy tail behaviour and (possibly) non-heavy tail behaviour. So only a binary distinction is made and differences between Germany and UK, for example, cannot be clearly identified. Since different degrees of tail heaviness exist, wouldn't it be more useful to link the recession coefficient to, for example, the exponent of the empirical power law b? The Authors show something like this in Figure 2 even though the relationship doesn't seem to be so strong. But wouldn't that be more useful in hydrological practice where, for instance, the estimation of the GEV shape parameter is of interest?
The identification of heavy-tailed floods through hydrograph recession analysis (employed in this study) uses a threshold of two on the recession exponent to distinguish heavy-tailed cases from non-heavy-tailed ones. The method further allows for evaluating the tail heaviness based on the specific exponent values, as noted by the reviewer. Such an approach resembles what is done for other indices, such as the GEV shape parameter, where a threshold value of zero is used to differentiate between heavy-tailed and non-heavy-tailed distributions. Unlike the latter case, where the threshold of zero has a statistical meaning only, the threshold of two in the method adopted in this study also has a physical meaning, as it represents a degree of non-linearity of the catchment hydrologic response which causes a shift in the resulting streamflow and flood distributions (see Botter et al., 2009; Kirchner, 2009; Basso et al., 2023).
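To make the role of the threshold concrete, the following minimal sketch estimates a recession exponent from a single recession event and compares it with two. It assumes the common power-law recession form -dQ/dt = K Q^a, midpoint finite differences on a daily series, and ordinary least squares in log-log space; these choices and the function names are illustrative assumptions and not necessarily the procedure used in the manuscript.

```python
import math


def recession_exponent(flows):
    """Estimate a in -dQ/dt = K * Q**a from one recession event
    (flows at a unit time step) by log-log least squares."""
    xs, ys = [], []
    for q_now, q_next in zip(flows, flows[1:]):
        drop = q_now - q_next
        if drop <= 0 or q_next <= 0:
            continue  # keep only strictly receding, positive flows
        xs.append(math.log(0.5 * (q_now + q_next)))  # midpoint discharge
        ys.append(math.log(drop))                    # -dQ/dt for dt = 1
    if len(xs) < 2:
        raise ValueError("not enough receding steps to fit a slope")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("discharge does not vary along the recession")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def is_heavy_tail_candidate(exponent, threshold=2.0):
    """Binary classification against the threshold of two."""
    return exponent > threshold
```

In practice an exponent would be estimated for many events and summarized per catchment before applying the threshold; how that summary is computed is left open here.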
This study emphasizes a binary distinction between heavy and non-heavy-tailed distributions, rather than discussing the degree of heaviness, for two reasons. First, a reliable identification of heavy-tailed distributions (i.e., even without any claim about their degree of heaviness) is in itself a difficult task, as noted by the reviewer in the previous comment. Second, the identification itself holds significant hydrological importance, regardless of the degree of heaviness. In fact, the presence of a heavy tail alone can serve as a critical warning of a relatively high probability of extreme events. For the latter reason, other studies, even those using indices like the GEV shape parameter, also often focus on distinguishing cases with heavy-tailed flood distributions from those without (e.g., Macdonald et al., 2022). In addition, there is a notable knowledge gap in conducting such investigations on an extensive international scale. This gap arises largely because quantifying such behavior remains highly sensitive to the sample size (Wietzke et al., 2020; Hu et al., 2020), making reliable identification across different datasets challenging (Merz et al., 2022).
We acknowledge uncertainties in the estimation of the degree of heaviness, particularly for cases with values close to the employed threshold. However, the recession exponent used in this study has been tested and found to be more reliable for distinguishing between heavy and non-heavy-tailed distributions than, e.g., the GEV shape parameter, especially in analyses with short data lengths, as shown in previous work (see Hu et al., 2020 and Wang et al., 2023). This justifies the selection of such an index for investigations across a broader range of study areas. We will improve the relevant statement as follows: Line 556-560 “We acknowledge that the hydroclimatic factors analyzed in this study may not account for all cases... This discrepancy could be attributed to either the uncertainty in inferring heavy-tailed floods through recession exponents (particularly for cases with values close to the threshold) or the presence of additional factors or mechanisms influencing flood tail behavior in these regions.”
Besides, I think the spatial results obtained here should be compared to regional studies on flood frequency? One example is Macdonald et al. (2022) who identify the GEV shape parameter as a quantification of tail heaviness in Germany. Figure 3a here seems consistent with Figure 4 of Macdonald et al. (2022). What about the other regions? In the US there are maps of the regional skewness in the Bulletin 17b. These comparisons could strengthen the confidence in the effectiveness of the method, since they are based on longer timeseries and on regional analyses.
We agree that comparing our findings with previous regional studies would strengthen the conclusions of this work, as we described in lines 390-424. We thank the reviewer for suggesting further comparisons. Accordingly, we will enhance this section with the following modifications:
For Germany: Line 395: “This finding aligns with Macdonald et al. (2022), who used GEV shape parameters as an indicator of heavy-tailed behavior for gauges with more than 50 years of observations.”
For the UK: Line 400-401: “According to our findings, heavy-tailed flood behavior is widespread in the UK, with a prevalence of 77%, particularly in the eastern and southern coastal regions. This aligns with previous regional findings (European Environmental Agency, 2010; Robson, 2002).”
For the US: Line 419: “In particular, catchments on the eastern side of the Appalachian Mountains exhibit pronounced heavy-tailed flood behavior, while those on the western side mostly exhibit non-heavy-tailed behavior. This is consistent with several previous findings based on the skewness of annual maximum streamflow (Bulletin 17b, 1982), the GEV shape parameters (Villarini and Smith, 2010), and the upper tail ratio (Smith et al., 2018).”
We would like to highlight that the analyses of this study, which are based on shorter and more variable lengths of data (24-148 years) and on hydrograph recessions from ordinary flows rather than flood records, provide findings in agreement with studies that rely on longer data records (e.g., only gauges with more than 75 years of records in the work of Villarini and Smith (2010), and more than 50 years in Macdonald et al. (2022)). Such an agreement not only confirms the effectiveness of this new approach but also highlights its advantages. While most previous studies have been confined to specific regions, the newly proposed approach allows for analysis across a broader geographical area, facilitating the investigation of diverse conditions.
Given these concerns, I am sorry I cannot recommend publication of this work in HESS.
We hope the response provided above satisfactorily addresses the reviewer’s concerns.
We would like to emphasize that this work not only aims at validating the effectiveness of the newly proposed approach (as demonstrated in greater detail in our previous work, Wang et al., 2023) but also at using this index to shed light on the relationships between heavy-tailed flood behavior and critical environmental factors (e.g., climate, catchment areas) that remain poorly understood.
References
Basso, S., Merz, R., Tarasova, L., & Miniussi, A. (2023). Extreme flooding controlled by stream network organization and flow regime. Nature Geoscience, 16(April), 339–343. https://doi.org/10.1038/s41561-023-01155-w
Bertola, M., Blöschl, G., Bohac, M., Borga, M., Castellarin, A., Chirico, G. B., et al. (2023). Megafloods in Europe can be anticipated from observations in hydrologically similar catchments. Nature Geoscience, 16(11), 982–988. https://doi.org/10.1038/s41561-023-01300-5
Botter, G., Porporato, A., Rodriguez-Iturbe, I., & Rinaldo, A. (2009). Nonlinear storage-discharge relations and catchment streamflow regimes. Water Resources Research, 45(10), 1–16. https://doi.org/10.1029/2008WR007658
Bulletin17B: Interagency Committee on Water Data. (1982). Guidelines for determining flood flow frequency. Hydrology Subcommittee, Technical Report.
Cai, Y., & Hames, D. (2010). Minimum sample size determination for generalized extreme value distribution. Communications in Statistics: Simulation and Computation, 40(1), 87–98. https://doi.org/10.1080/03610918.2010.530368
Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703. https://doi.org/10.1137/070710111
European Environmental Agency. (2010). Mapping the impacts of natural hazards and technological accidents in Europe An overview of the last decade. Publications Office of the European Union. https://doi.org/10.2800/62638
Hu, L., Nikolopoulos, E. I., Marra, F., & Anagnostou, E. N. (2020). Sensitivity of flood frequency analysis to data record, statistical model, and parameter estimation methods: An evaluation over the contiguous United States. Journal of Flood Risk Management, 13(1), 1–13. https://doi.org/10.1111/jfr3.12580
Kirchner, J. W. (2009). Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology backward. Water Resources Research, 45(2), 1–34. https://doi.org/10.1029/2008WR006912
Merz, B., Basso, S., Fischer, S., Lun, D., Blöschl, G., Merz, R., et al. (2022). Understanding heavy tails of flood peak distributions. Water Resources Research, 1–37. https://doi.org/10.1029/2021wr030506
Robson, A. J. (2002). Evidence for trends in UK flooding. Philosophical Transactions of the Royal Society A, 360, 1327–1343. https://doi.org/10.1098/rsta.2002.1003
Smith, J. A., Cox, A. A., Baeck, M. L., Yang, L., & Bates, P. (2018). Strange Floods: The Upper Tail of Flood Peaks in the United States. Water Resources Research, 54(9), 6510–6542. https://doi.org/10.1029/2018WR022539
Villarini, G., & Smith, J. A. (2010). Flood peak distributions for the eastern United States. Water Resources Research, 46(6), 1–17. https://doi.org/10.1029/2009WR008395
Vogel, R. M., & Fennessey, N. M. (1993). L moment diagrams should replace product moment diagrams. Water Resources Research, 29(6), 1745–1752. https://doi.org/10.1029/93WR00341
Wang, H., Merz, R., Yang, S., & Basso, S. (2023). Inferring heavy tails of flood distributions through hydrograph recession. Hydrology and Earth System Sciences, 27(24), 4369–4384. https://doi.org/10.5194/hess-27-4369-2023
Wietzke, L. M., Merz, B., Gerlitz, L., Kreibich, H., Guse, B., Castellarin, A., & Vorogushyn, S. (2020). Comparative analysis of scalar upper tail indicators. Hydrological Sciences Journal, 65(10), 1625–1639. https://doi.org/10.1080/02626667.2020.1769104
Zhao, G., Bates, P., Neal, J., & Pang, B. (2021). Design flood estimation for global river networks based on machine learning models. Hydrology and Earth System Sciences, 25(11), 5981–5999. https://doi.org/10.5194/hess-25-5981-2021
Citation: https://doi.org/10.5194/hess-2024-159-AC2
-
EC1: 'Comment on hess-2024-159', Serena Ceola, 25 Nov 2024
I was asked to review this paper, which received contrasting comments. I appreciate the overall scientific objective of the manuscript, even though I have some doubts about the effectiveness of estimating heavy-tailed distributions starting from a few years of observations. I encourage the authors to provide detailed comments about this. In particular, the authors should clearly and concisely highlight the approximations and limitations of the methodology they outlined. They should, e.g., clearly state that they assume that severe floods are caused by the same drivers as smaller floods. Overall, I found the manuscript quite lengthy and difficult to read. Even though the manuscript has been substantially improved since the first review round, I invite the authors to shorten it and improve its readability.
Citation: https://doi.org/10.5194/hess-2024-159-EC1
Data sets
Global Runoff Data Centre (GRDC) Federal Institute for Hydrology (BfG) http://www.bafg.de/GRDC/EN
Shuttle Radar Topography Mission (SRTM) A. Jarvis et al. https://cgiarcsi.community/data/srtm-90m-digital-elevation-database-v4-1/
High-Resolution Present-Day Köppen Climate Map H. E. Beck et al. https://doi.org/10.1038/sdata.2018.214
High-Resolution Map of Derived Potential Evapotranspiration R. J. Zomer et al. https://doi.org/10.1038/s41597-022-01493-1
Global Dams and Reservoirs Dataset: GeoDAR v.1.0 J. Wang et al. https://doi.org/10.5281/zenodo.6163413
Viewed
- HTML: 348
- PDF: 103
- XML: 74
- Total: 525
- Supplement: 39
- BibTeX: 14
- EndNote: 15