the Creative Commons Attribution 4.0 License.
Modelling flood frequency and magnitude in glacially conditioned settings: land use matters
Pamela Elizabeth Tetford
Joseph Robert Desloges
Abstract. A reliable flood frequency analysis (FFA) requires selection of an appropriate statistical distribution to model historic streamflow data and, where streamflow data are not available (ungauged sites), a regression-based regional flood frequency analysis (RFFA) often correlates well with downstream channel discharge to drainage area relations. However, the predictive strength of the accepted RFFA relies on an assumption of homogeneous watershed conditions. For glacially conditioned fluvial systems, inherited glacial landforms, sediments, and variable land use can alter flow paths and modify flow regimes. This study compares a multi-variate RFFA that considers 28 explanatory variables to characterize variable watershed conditions (i.e., surficial geology, climate, topography, and land use) to an accepted power-law relationship between discharge and drainage area. Archived gauge data from southern Ontario, Canada, are used to test these ideas. Mathematical goodness-of-fit criteria best estimate flood discharge for a broad range of flood recurrence intervals, i.e., 1.25, 2, 5, 10, 25, 50, and 100 years. The log-normal (LN), Gumbel (EV1), log-Pearson type III (LP3), and generalized extreme value (GEV) distributions are found most appropriate in 42.5 %, 31.9 %, 21.7 %, and 3.9 % of cases, respectively, suggesting that a systematic model selection criterion is required for FFA in heterogeneous landscapes. Multi-variate regression of estimated flood quantiles with backward elimination of explanatory variables using principal component and discriminant analyses reveals that precipitation provides a greater predictive relationship for more frequent flood events, whereas surficial geology demonstrates more predictive ability for high magnitude, less frequent flood events.
In this study, all seven flood quantiles identify a statistically significant two-predictor model that incorporates upstream drainage area and the percentage of naturalized landscape with 5 % improvement in predictive power over the commonly used single-variable drainage area model (p < 2.2e−16). An analysis of variance (ANOVA) further supports the two-predictor model, indicating a decrease in the sum of squares of residuals and an F statistic (p < 0.001) that demonstrates very strong evidence in favour of the two-predictor model (i.e., drainage area and land use) when estimating flood discharge in this low-relief landscape with pronounced glacial legacy effects and heterogeneous land use.
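The comparison described in the abstract can be written as a pair of nested regressions. The block below is a sketch only: the log-transform, the untransformed percentage term, and the symbols $Q_T$ (T-year flood quantile), $A$ (upstream drainage area), $N$ (percentage of naturalized landscape), $a$, $b$, $c$, $n$ (number of gauges), and $p_1 = 2$, $p_2 = 3$ (number of fitted coefficients) are assumptions made for illustration, not the authors' stated specification.

```latex
% Single-predictor power law in drainage area
Q_T = a A^{b}
\quad\Longleftrightarrow\quad
\log Q_T = \log a + b \log A + \varepsilon

% Two-predictor model adding the naturalized percentage N (assumed form)
\log Q_T = \log a + b \log A + c\,N + \varepsilon

% Partial F statistic comparing the nested fits (RSS = residual sum of squares)
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2)}
```

Under the null hypothesis that $c = 0$, $F$ follows an $F(p_2 - p_1,\, n - p_2)$ distribution, so a decrease in the residual sum of squares that is large relative to the remaining residual variance favours the two-predictor model.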
Status: final response (author comments only)
-
RC1: 'Comment on hess-2022-411', Anonymous Referee #1, 27 Mar 2023
The authors present a study that implements an additional environmental variable into a commonly used approach for RFFA through a series of statistical methods. The objective is to incorporate the spatial heterogeneity of basins into the approach, which is interesting in its nature. The results demonstrate improvement in the model prediction, and various variables are utilized in statistical methods to produce more rational results. I have two major comments regarding the description that I believe the authors could address. Furthermore, I have listed several minor comments in order to maximize the impact of this study.
Major comments:
- The methods section of the current manuscript is somewhat scattered, with several critical details not explicitly stated and only supplied later in the results section. As a result, the reader may find it challenging to understand the entire procedure used in this study and would need considerable effort to piece together all the information, while some crucial details may still be missing. I have listed some examples below, but there may be more. Addressing these concerns would aid in maximizing the reproducibility of this method.
- L210: It is not clear how the authors define "the theoretical association." I believe it refers to the Pearson correlation, which is mentioned much later in the paper.
- Section 3.4 consists of only one sentence, which indicates that several details are missing.
- L218-220: It is unclear how the spatial patterns of 45 sub-basins were incorporated into the analysis of 207 gauges. Additionally, it is not clear whether "sub-watersheds" and "sub-basin units" refer to the same concept or different concepts.
- L221-222: This information should have been included in Section 3.1 because it pertains to the fundamental information regarding the hydrological data used in the study.
- L224-226: This information should also have been included in Section 3.1 because it pertains to a strategy used in the study for applying the model selection criteria. The first part of this information is already included in Section 3.1, while the second part is in the results section, leading to confusion. Additionally, the authors should provide relevant references.
- L235: This sentence is unclear and difficult to comprehend.
- L258: It is unclear how the authors define "strongly significant."
- Section 4.3.1 is a method, rather than results.
- The authors aimed to improve the RFFA method by incorporating variables that consider the spatial heterogeneity of sub-basins and investigated the influence of 28 variables on the RFFA. While the final results indicated an overall 5% improvement in adjusted R2, with the value increasing from approximately 0.85 to 0.9, the authors employed several statistical methods to eliminate 27 out of the 28 variables. The backward elimination process was used to identify the most critical variables and remove those that did not significantly contribute to the model. However, the authors' justification for discarding the 27 variables is not entirely convincing, especially considering that 27 is a significant portion of the total 28 variables investigated (e.g., the relatively confusing statement in lines 347-370). To reinforce the validity of this analysis, a clear and systematic procedure should be presented: the current procedure appears to be in a relatively random order, and discarding such a large portion of variables without sufficient justification may raise questions about the validity and robustness of the proposed method. The inclusion of tables or hierarchical diagrams could potentially aid in clarifying the methodology. Additionally, the authors should explain how they selected the different but parallel statistical methods, such as correlation analysis and PCA, and how the elimination process was objectively determined. By reorganizing the description in a more structured and coherent way, the authors could strengthen the justification for the proposed model.
Minor comments:
- It is not clear which method the authors used to compare the three (or, in some cases, four) model selection criteria (i.e., AIC, BIC, ADC, AICc) when determining the best-fit distributions.
- Concerning the Pearson correlation: (1) Why use a threshold of 0.6 (instead of a more common one, e.g., 0.5)? Are there any references? (2) Has normality been checked? If not, would the Spearman correlation be a better option?
- L327-331 state that WS_Area and Stream_Length are removed whereas WS_Perimeter and Basin_Compactness are retained, due to the intention to focus on basin shape rather than basin size and so concentrate on the transport efficiency of the fluvial system. However, the authors should be aware that the length-area relationship has been found to be a function of basin shape (Sassolas-Serrayet et al., 2018). Additionally, several studies have shown that basin size plays a critical role in catchment hydrological response and determines the curve of the flood frequency analysis (Merz & Blöschl, 2009). While it is recognized that drainage size appears in the final model because of its origin as a one-variable model, the description provided by the authors may be misleading.
- It is unclear why drainage size still appears in the 28 variables if it has already been recognized as a prominent variable in previous RFFA.
- The link between the interpretation of the PCA results and Figure 6 needs improvement, and it is unclear what procedure/standard the authors used to objectively eliminate variables based on the results of the Pearson correlations and PCA.
- “the observed correlation” in L326 should be stated in a quantitative way.
- It is stated in L227-233 and L428-436 that the results of the best-fit distributions show an extremely low percentage for the GEV distribution, and the authors attribute (or hint at attributing?) this to the number of parameters. Although this is not incorrect, it should be noted that the two two-parameter distributions used in this study (EV1 and LN) have lighter-tailed behavior than the two three-parameter distributions (GEV and LP3) (El Adlouni et al., 2008). The authors' results therefore imply that the estimates of the extreme flood quantiles are relatively small in this region. Such a result may be due to the short records used in this study, which should be roughly 37-48 years long based on the mean and standard deviation of the record length provided in this paper. As a result, there may be considerable uncertainty in estimating extreme quantiles for GEV (e.g., Papalexiou et al., 2013), which should be properly referenced in the discussion.
- The title suggests a potential linkage or interaction between these conditions and flood frequency analysis, so it is expected that more discussion of this topic would be included (which is still weak, e.g., 450-452, and supporting references are expected). Otherwise, it may be advisable to reconsider the title to avoid any potential for confusion.
- In L463-464, the authors imply that they intend to implement another modified model to properly capture spatial variability in a glacial region, but the resulting model obtained in this study eventually excluded all these variables. Please comment on this. It is also not clear from the current title whether the scope of this study is "glacial conditions", "land use", or both.
- Please provide more information on the dataset used in Sec. 3.3, such as whether the same basin boundary was used for the analyses of the one-variable model and the two-variable model, the duration of streamflow data, and whether the analysis is representative if the duration of each variable does not overlap (as shown, the precipitation data is from 1981-2000, whereas the land use data is based on the dataset from 2014-2017).
- L160-161: Since the authors explain the reason for using annual maxima instead of partial duration series (PDS), it would be appropriate to acknowledge other commonly used metrics, e.g., the peak-over-threshold (POT) analysis.
- There is no clear discussion in the previous section concerning the statement in L537-538, which appears as the closing sentence of the conclusion.
- It could be beneficial for the authors to consider providing some additional discussion on how the findings of this study may be transferable to other regions. Do the authors suggest directly including naturalized variables in RFFA for other regions, or do they recommend conducting a full elimination process based on the different environmental variables obtained for each study area?
- Throughout:
- There appear to be a relatively large number of minor mistakes, such as unclear meanings of italics and special font sizes (e.g., L83, L86, L93, L310-312, L365, L366), using abbreviations before their definitions (e.g., L16, L87), mistakes in figures (e.g., the legend of Fig. 2: HYDAT is not a GIS dataset; the y-ticks of Fig. 7 are missing; Dim1 explains 27.7% in L318 but 28% in Fig. 6), and typos and grammatical errors. It may be beneficial to have someone do a thorough proofreading to address these issues.
- It is suggested that several statements should provide supporting references (e.g., L155, L225 (n/p<40), L469-471).
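The question above about how AIC, BIC, and AICc were compared can be made concrete with a small sketch. The formulas are the standard definitions; the log-likelihood values, the 40-year record, and the helper names `information_criteria` and `prefer_aicc` are hypothetical, and treating n/p < 40 as the switch to AICc is an assumption about how the manuscript applies that rule. The Anderson-Darling criterion (ADC) is not covered here.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for one fitted distribution.

    log_likelihood: maximized log-likelihood of the fit
    k: number of fitted parameters (2 for LN/EV1, 3 for LP3/GEV)
    n: number of annual maxima in the record (must exceed k + 1)
    """
    aic = 2 * k - 2 * log_likelihood
    # Small-sample correction added on top of AIC
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

def prefer_aicc(n, k, threshold=40):
    """Rule of thumb (assumed usage): prefer AICc when n / k is below the threshold."""
    return n / k < threshold

# Hypothetical (log-likelihood, parameter count) pairs for a single gauge
fits = {"LN": (-210.0, 2), "EV1": (-211.5, 2), "LP3": (-209.6, 3), "GEV": (-209.9, 3)}
n_years = 40  # hypothetical record length

scores = {name: information_criteria(ll, k, n_years) for name, (ll, k) in fits.items()}
# The distribution with the lowest criterion value is selected
best = min(scores, key=lambda name: scores[name]["AICc"])
print(best, scores[best])
```

With records this short, the extra parameter of LP3 and GEV is penalized more heavily under AICc than under AIC, which is one mechanism by which two-parameter distributions can be selected more often.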
References
El Adlouni, S., Bobée, B., and Ouarda, T. B. M. J.: On the tails of extreme event distributions in hydrology, J. Hydrol., 355, 16–33, https://doi.org/10.1016/j.jhydrol.2008.02.011, 2008.
Merz, R. and Blöschl, G.: Process controls on the statistical flood moments - a data based analysis, Hydrol. Process., 23, 675–696, https://doi.org/10.1002/hyp, 2009.
Papalexiou, S. M., Koutsoyiannis, D., and Makropoulos, C.: How extreme is extreme? An assessment of daily rainfall distribution tails, Hydrol. Earth Syst. Sci., 17, 851–862, https://doi.org/10.5194/hess-17-851-2013, 2013.
Sassolas-Serrayet, T., Cattin, R., and Ferry, M.: The shape of watersheds, Nat. Commun., 9, 1–8, https://doi.org/10.1038/s41467-018-06210-4, 2018.
-
AC1: 'Reply on RC1', Pamela Tetford, 06 Jun 2023
We thank Referee 1 for their valuable insights, and we appreciate the feedback regarding our research paper. We would welcome the opportunity to provide a revised version that takes into account many of their helpful suggestions. Based on the comments of Referee 1, we feel this research paper is revisable, with elements elevated to integrate the changing importance of other variables as flood frequency changes. We agree with Referee 1 that some sections of the current manuscript require reorganization to enhance reader understanding and maximize the reproducibility of our approach.
With regard to the first major comment from Referee 1, a revised version of our manuscript would move all procedural details from the results section to the methods section (e.g., L218-226). Additionally, all details related to input variables (Section 4.3.1) would be moved from the results section to a section dedicated to Data Input.
Regarding the second major comment of Referee 1, systematic diagnostics were applied to justify the elimination of variables during backward elimination; however, we agree with Referee 1 that a clearer, systematic procedure should be presented. A revised version of the manuscript would include a diagram clarifying the methodology (i.e., t-tests, examination of standardized residuals, added variable plots, variance inflation factors, marginal model plots, and F-tests). A revised version of our manuscript would also include a leave-one-out cross validation that tests model performance. Our leave-one-out cross validation supports the findings of our research, reporting a lower RMSE and MAE for the 2 variable model compared to the single variable model for all flood quantiles tested.
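Of the diagnostics listed above, the variance inflation factor (VIF) is the simplest to illustrate. The sketch below covers only the special case of two predictors, where the VIF of each equals 1 / (1 - r²) for their Pearson correlation r. The data values and the cutoff of 10 are hypothetical (10 is a common rule of thumb, not a threshold taken from the manuscript), and the variable names merely echo those mentioned in the referee comments.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_predictor_vif(x, y):
    """VIF shared by both predictors when the model contains exactly these two."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical values for six gauges (illustrative only)
log_area = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]         # log drainage area
log_stream_len = [1.1, 1.4, 2.1, 2.4, 3.1, 3.4]   # nearly collinear with area
natural_pct = [40.0, 10.0, 70.0, 30.0, 60.0, 20.0]  # % naturalized landscape

vif_collinear = two_predictor_vif(log_area, log_stream_len)
vif_independent = two_predictor_vif(log_area, natural_pct)

VIF_CUTOFF = 10.0  # assumed rule-of-thumb threshold
print(f"area vs stream length: VIF={vif_collinear:.1f}, drop one: {vif_collinear > VIF_CUTOFF}")
print(f"area vs naturalized %: VIF={vif_independent:.3f}, drop one: {vif_independent > VIF_CUTOFF}")
```

For more than two predictors, each VIF would instead come from regressing that predictor on all the others, which is the form a 28-variable elimination would require.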
Regarding the minor comments of Referee 1, a revised manuscript would include minor editorial modifications to provide more clarity and address their concerns. We thank Referee 1 for the additional references regarding basin shape, basin size, and statistical distributions. Additional references would be included in a revised manuscript to address L327-331, PCA interpretation, L227-233, L428-436, and L155 (concerns 3, 5, 7, and 14). We agree there is considerable uncertainty in estimating extreme quantiles in the study region due to the limited record length of many gauges.
Regarding minor comment 4, it was the goal of our research to improve upon the predictive strength of the drainage area to discharge relationship in a heterogeneous landscape. The predictive drainage area approach to estimating channel discharge is straightforward and widely used by the geomorphic and earth science community. Our research investigated variables that may improve upon the predictive strength of this relationship. While catchment area is expectedly the leading attribute for estimating channel discharge, our research determined that including a second variable, the percentage of vegetated (naturalized) upstream area, can improve the prediction of channel discharge by approximately 5% across a broad range of flood frequencies. We feel this information would be of interest to members of the geomorphic earth science community, particularly those interested in estimating discharge in a heterogeneous landscape when greater precision is required.
To address minor concerns regarding the title and the glacially conditioned landscape (concerns 8, 9, and 13), a revised manuscript would also summarize 3-variable and 4-variable regression results to emphasize the importance of different variables as flood frequency shifts from high frequency, low magnitude flood events to more extreme, low frequency overbank events.
Citation: https://doi.org/10.5194/hess-2022-411-AC1
-
RC2: 'Comment on hess-2022-411', Anonymous Referee #2, 18 May 2023
This paper presents a Regional Flood Frequency Analysis (RFFA) applied to 207 gauging stations in the Great Lakes region of Ontario, Canada, and proposes a 2-variable relationship for estimating flood quantiles (with return periods ranging from 1.25 to 100 years) at ungauged catchments. The paper is well written and the topics of ungauged catchments and glacial environments are both very interesting. However, I don't think this paper could be published in HESS for the following main reasons:
1) In my opinion, the novelty aspect is not strong enough for a publication in HESS. The methods used are not new, and the results are expected: the catchment area is the main explanatory factor, and all other catchment attributes are far behind. However, with some improvements, this paper could be resubmitted to other journals in Geography (for the GIS treatment aspects) or in Hydrology (but as a regional study focusing on glacially influenced catchments).
2) Some important references in RFFA are missing. Typically, RFFA (which consists of transferring information available at different gauges to a target station) is used to achieve two different objectives. The first is to improve the estimation of flood quantiles at the target station for high return periods. The second is to estimate quantiles when no discharge data are available (ungauged target station). Neither of these issues is addressed in this paper. High return period quantiles (up to 100 years) are estimated locally (sometimes with only a few years of observation), and the results of the models obtained are not tested for "ungauged" cases. To do this, a leave-one-out (LOO) cross-validation methodology would have been required.
Minor comments:
L160: This statement is not true. I don't think Cunnane said it. There is a lot of literature explaining the interest of partial duration series (or peaks over threshold).
L171: A large error is associated with a 100-year quantile.
L186: Your 28 attributes should be presented in a data section at the beginning of the paper (not in Table 3, in the results section).
L188: I did not understand these "sub-basin" characteristics.
L218-227: This is not a result. It should be moved to the methods part.
L278-288: This is not a result. It should be moved to the data part (Figure 5 as well).
L300-305: This is not a result. It should be moved to the data part (Table 3 as well).
L350-360: Please show the results that lead to this conclusion.
L469-471: Please show the results that lead to this conclusion.
Citation: https://doi.org/10.5194/hess-2022-411-RC2
AC2: 'Reply on RC2', Pamela Tetford, 06 Jun 2023
We thank Referee 2 for their valuable and important feedback regarding our research paper and we would appreciate the opportunity to provide a revised version that takes into account many of their constructive suggestions.
Regarding the first main comment from Referee 2, we disagree that the findings of our paper are not strong enough for a publication in HESS. The drainage area to discharge relationship is a widely used approach for estimating channel discharge that is frequently applied by the geomorphic and earth science community as well as watershed planners. Our findings highlight the central importance of considering the appropriate flood frequency distribution and channel forming discharges (not considered by hydrologists). It was the goal of our research to investigate variables in a heterogeneous landscape that may improve upon the predictive strength of this relationship. While catchment area is expectedly the leading attribute for estimating channel discharge, our research determined that including a second variable, the percentage of vegetated (naturalized) upstream area, can improve the prediction of channel discharge across a range of flood frequencies. While the addition may seem small at approximately 5%, we strongly believe that land use is not given appropriate consideration by the geomorphic earth science community, particularly those interested in estimating discharge in a heterogeneous landscape when greater precision is required.
Regarding the second main comment from Referee 2, a revised version of our manuscript would include a leave-one-out cross validation that tests for “ungauged” cases. Our leave-one-out cross validation supports the findings of our research, reporting a lower RMSE and MAE for the 2 variable model compared to the single variable model for all flood quantiles tested.
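A leave-one-out test of the "ungauged" case can be sketched as follows: each gauge is withheld in turn, both regressions are refitted on the remaining gauges, and the withheld quantile is predicted. This is an illustration under stated assumptions, not the authors' code. The data are synthetic and deterministic, the log-linear model form is assumed, the helper names `solve`, `ols`, and `loo_errors` are hypothetical, and the errors are reported in log space.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients for y ~ intercept + predictors (normal equations)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def loo_errors(rows, y):
    """Leave-one-out RMSE and MAE: refit without gauge i, then predict gauge i."""
    residuals = []
    for i in range(len(y)):
        beta = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = beta[0] + sum(b * v for b, v in zip(beta[1:], rows[i]))
        residuals.append(y[i] - pred)
    rmse = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    mae = sum(abs(e) for e in residuals) / len(residuals)
    return rmse, mae

# Synthetic, deterministic "gauges" (illustrative only, not the study's data)
n_gauges = 30
log_area = [1.0 + 0.1 * i for i in range(n_gauges)]        # log drainage area
natural = [10.0 + (37 * i) % 80 for i in range(n_gauges)]  # % naturalized area
noise = [0.05 * math.sin(1.7 * i) for i in range(n_gauges)]
log_q = [-0.5 + 0.8 * a - 0.01 * p + e for a, p, e in zip(log_area, natural, noise)]

rmse_one, mae_one = loo_errors([[a] for a in log_area], log_q)
rmse_two, mae_two = loo_errors([[a, p] for a, p in zip(log_area, natural)], log_q)
print(f"area only:       RMSE={rmse_one:.3f}  MAE={mae_one:.3f}")
print(f"area + land use: RMSE={rmse_two:.3f}  MAE={mae_two:.3f}")
```

Because the synthetic discharge was built with a land-use term, the two-predictor model scores lower errors here by construction; the sketch shows the mechanics of the comparison rather than evidence for the result reported in the reply.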
To address the minor comments of Referee 2, a revised manuscript would include minor editorial modifications to provide more clarity and address their concerns. A summary of the 19-variable regression results would be included to support L350-360. Additional summaries of the 3-variable/4-variable regression would also be included to support L469-471 and emphasize the importance of different variables as flood frequency shifts from high frequency, low magnitude flood events to more extreme, low frequency overbank events. A revised manuscript would also move L278-288 and L300-305 (Figure 5 and Table 3) from the Results section to a dedicated Data Input section to enhance reader understanding and maximize the reproducibility of our approach.
Citation: https://doi.org/10.5194/hess-2022-411-AC2