Articles | Volume 27, issue 21
Research article
09 Nov 2023
Research article |  | 09 Nov 2023

Modelling flood frequency and magnitude in a glacially conditioned, heterogeneous landscape: testing the importance of land cover and land use

Pamela E. Tetford and Joseph R. Desloges

A reliable flood frequency analysis (FFA) requires selection of an appropriate statistical distribution to model historical streamflow data and, where streamflow data are not available (ungauged sites), a regression-based regional flood frequency analysis (RFFA) often correlates well with downstream channel discharge to drainage area relations. However, the predictive strength of the accepted RFFA relies on an assumption of homogeneous watershed conditions. For glacially conditioned fluvial systems, inherited glacial landforms, sediments, and variable land use can alter flow paths and modify flow regimes. This study compares a multivariate RFFA that considers 28 explanatory variables to characterize variable watershed conditions (i.e., surficial geology, climate, topography, and land use) to an accepted power-law relationship between discharge and drainage area. Archived gauge data from southern Ontario, Canada, are used to test these ideas. Mathematical goodness-of-fit criteria best estimate flood discharge for a broad range of flood recurrence intervals, i.e., 1.25, 2, 5, 10, 25, 50, and 100 years. The log-normal, Gumbel, log-Pearson type III, and generalized extreme value distributions are found most appropriate in 42.5 %, 31.9 %, 21.7 %, and 3.9 % of cases, respectively, suggesting that systematic model selection criteria are required for FFA in heterogeneous landscapes. Multivariate regression of estimated flood quantiles with backward elimination of explanatory variables using principal component and discriminant analyses reveal that precipitation provides a greater predictive relationship for more frequent flood events, whereas surficial geology demonstrates more predictive ability for high-magnitude, less-frequent flood events. In this study, all seven flood quantiles identify a statistically significant two-predictor model that incorporates upstream drainage area and the percentage of naturalized landscape with 5 % improvement in predictive power over the commonly used single-variable drainage area model (p<2.2×10-16). Leave-one-out model testing and an analysis of variance (ANOVA) further support the parsimonious two-predictor model when estimating flood discharge in this low-relief landscape with pronounced glacial legacy effects and heterogeneous land use.

1 Introduction

A reliable assessment of flood frequency and flood magnitude over space and time is critical for urban planning and infrastructure engineering that depends on flood probability (Basso et al., 2016). Flood magnitude, frequency, and duration are primary drivers of channel erosion and stream morphology (Taniguchi and Biggs, 2015) as a self-shaping alluvial channel entrains and transports sediment to adjust its dimensions, planform pattern, bed characteristics, and gradient in response to varying flow levels (Church and Ferguson, 2015). So reliable estimates of flood frequency are important for understanding geomorphic channel change.

A regional flood frequency analysis (RFFA) can be very important in determining the probability of extreme flood events where streamflow data are not readily available (Ahn and Palmer, 2016) by transferring observed hydrological information from a group of gauged sites to comparative ungauged sites as a representation of flow statistics using hydrological variables (Odry and Arnaud, 2017). A common approach to RFFA consolidates data samples from many measuring sites and uses ordinary least-squares (OLS) regression to identify a relationship between mean annual floods of multiple basins and some basin characteristic (e.g., drainage area). As the source area for channel discharge, drainage area is a widely used proxy for channel discharge (Galster et al., 2006; Knighton, 1999). It has become an accepted practice to model discharge using a single-variable power-law relationship between discharge (Q) and drainage area (Ad) of the form

(1) Q = α A d β ,

where Ad is the upstream drainage area and the coefficient α and exponent β are empirically derived by statistical regression (Dunne and Leopold, 1978; Knighton, 1999; Phillips and Desloges, 2014). This scaling relationship can be rewritten as

(2) log Q = log α + β log A d .

The reliability of this single-variable predictive relationship, however, relies on the relative regional homogeneity of the landscape, with similar basin conditions and climate (Ahn and Palmer, 2016; Hosking and Wallis, 1993; Phillips and Desloges, 2014).

To estimate how often a specified flood event (or channel discharge) will occur, flood frequency analysis (FFA) is widely used (Farooq et al., 2018). Most often, an FFA uses the occurrence of extreme flood events to estimate the return period, T, of flood quantiles, Q(T), using a fixed probability model based on long-term, historical flow data from a gauge station (Di Baldassarre et al., 2009). This probabilistic approach “fits” the site-specific data to a statistical distribution to predict the likelihood of future flood events. To provide flexibility of fit, statistical probability distributions require two to four parameters (Zhang et al., 2020). The choice of the probabilistic model that best represents the observed data and the estimation of a distribution's parameters affects the reliability of flood prediction (Cunnane, 1973; Farooq et al., 2018; Laio et al., 2009). Poor model application and fit can lead to unreliable estimates (Basso et al., 2016). The generalized extreme value (GEV) distribution, Gumbel maximum or extreme value type I (EV1) distribution, log-normal (LN) distribution, and log-Pearson type III (LP3) distribution have traditionally been recommended to characterize flood probability based on goodness of fit (Onen and Bagatur, 2017; Laio et al., 2009). The LP3 and GEV distributions use three parameters, i.e., location, scale, and shape, and the EV1 and LN distributions use two parameters, i.e., location and scale, to fit data distributions (see Appendix A). In Canada, it is recommended that FFA studies draw from the Normal, GEV, and Pearson distribution families. Distribution fitting with more than three parameters is not recommended due to the limited record lengths of Canadian gauge stations (Natural Resources Canada, 2019). For regions with diverse flood characteristics, multiple distributions may apply for different catchments requiring site-specific selections (Zhang et al., 2020). Since 1967, Guidelines for Determining Flood Flow Frequency, Bulletin 17C of the US Geological Survey (USGS), recommends the use of the LP3 distribution as an appropriate statistical distribution to characterize the probabilities of annual flood series (USGS, 2019), but the recent HEC-SSP statistical software package (version 2.2) includes the ability to perform two goodness-of-fit tests for up to 19 statistical distributions (Hydrologic Engineering Center, 2019). Recent research indicates that estimation of flood frequency and magnitude improves with the application of a systematic and objective model selection criteria when fitting observed flow data to a statistical probabilistic curve (Di Baldassarre et al., 2009).

Research suggests that the spatial variability of basin attributes (i.e., topographic relief, climate, vegetation, and land use) and sub-surface characteristics which influence hydrological and fluvial function are controlling factors of a fluvial system's drainage efficiency and are relevant to the flow response in a catchment (Di Lazzaro et al., 2015; Fryirs and Brierley, 2012; Galster et al., 2006; Oudin et al., 2008). Additionally, landscape modifications that decrease infiltration will impose changes to river hydrology (Ashmore, 2015; Ghunowa et al., 2021; Taniguchi and Biggs, 2015; Winter, 2001) with a downstream cascading effect on flow regime (Royall, 2013). Human occupation, landscape manipulation, and the generation of impervious surfaces associated with urbanization have the most profound impact on hydrogeomorphic responses, particularly in smaller watersheds (Pasternack, 2013; Royall, 2013). Moreover, a fluvial system's response to human-induced land use change (or its sensitivity to change) will vary, depending on basin attributes (i.e., configuration, geomorphology, and sediment retention) (Royall, 2013). For this reason, the spatial heterogeneity across a landscape will likely produce a variation in flood response that may best be captured using a multivariate RFFA approach that considers parameterization of relevant basin characteristics (i.e., topographic relief, land use, vegetation, and sub-surface geology) as a set of explanatory variables to estimate flood discharge (Ahn and Palmer, 2016).

Recent works have highlighted the impact of geomorphic spatial heterogeneity on the basin hydrological response (Ahn and Palmer, 2016; Di Lazzaro et al., 2015; Taniguchi and Biggs, 2015). However, many rapid geomorphic studies have relied on just catchment area as the leading attribute for estimating channel forming discharge (Ashmore et al., 2023). This study seeks to explore additional explanatory hydrological and land use controls that improve the predictive strength of this relationship in a heterogeneous landscape. This multivariate approach uses exploratory statistical analysis to better understand the link between intra-catchment variability and hydrological function. This study explores the following:

  1. An FFA is completed to model reliable estimations of discharge for a broad range of flood recurrence intervals (i.e., Q1.25, Q2, Q5, Q10, Q25, Q50, and Q100). Model selection is determined by applying systematic and objective model selection criteria to optimize model fit to long term site-specific flow data (>10 years). A test sample of 207 individual gauge sites within a glacially conditioned regional setting is used.

  2. The widely used single-variable RFFA (Eq. 1) is derived to characterize the relationship between discharge (Q) and site-specific drainage area (Ad) for the test region using optimized estimates of a broad range of flood quantiles to test the predictive power of a single hydrological variable in a spatially heterogeneous landscape.

  3. A multivariate, regression-based RFFA is presented that considers the spatially variability of hydrological controls in the context of inherited glacial landforms, sediments, and land use. To achieve this goal, 28 predictor variables are explored representing basin characteristics (i.e., topographic relief, climate, land use, vegetation, and sub-surface geology). A backward elimination approach is employed (i.e., discriminant and principal component analyses, and regression diagnostics) to identify the most parsimonious discharge models for recurrence intervals of 1.25, 2, 5, 10, 25, 50, and 100 years.

  4. The predictive power of a multivariate derived RFFA that considers multiple basin hydrological controls is compared with a generally accepted single-variable RFFA in a spatially heterogeneous setting.

2 Regional setting

This flood frequency study focuses on a test region of peninsular southern Ontario, Canada (Fig. 1), that is bounded by the Canadian Shield to the north, the three lower Great Lakes – Huron, Erie, and Ontario – to the southwest, and the Ottawa and St. Lawrence rivers to the east. Located within the North American Great Lakes watershed, it is a region of modest relief, with elevation ranging from 544 m a.s.l. near Lake Huron draining by way of the St. Lawrence River lowlands at less than 70 m to the Atlantic Ocean (Larson and Schaetzl, 2001). Convective, synoptic, and tropical systems that influence the humid, continental climate of the region are enhanced by local, regional, and topographic conditions (Paixao et al., 2011). Moisture and temperature associated with the Great Lakes influence inland precipitation for up to 50 km. Consequently, the mean annual precipitation varies regionally from 800 to 1200 mm (Paixao et al., 2011). During winter months, precipitation typically accumulates in the form of snow, generating spring snowmelt floods that dominate river flow regimes (Javelle et al., 2003). The surficial geology of the region, and the hydrological controls exerted by the parent materials, are the product of the region's glacial history (Chapman and Putman, 1984). Recurring continental glaciations over the past ∼2 million years have topographically influenced the fluvial drainage networks of southern Ontario (Desloges et al., 2020; Fulton et al., 1986).

Deglaciation, approximately 12–13 000 years ago, has left pronounced glacial legacy effects with complex sequences of sub-glacial, ice-contact, and proglacial sediments deposited during the final retreat of the Laurentide Ice Sheet (Larson and Schaetzl, 2001; Phillips and Desloges, 2014, 2015). The most common physiographic features include sheets of till, finer glaciolacustrine plains of sand or clay, glaciofluvial outwash deposits of sand, gravel, silts and clays, and a configuration of moraines (Thayer et al., 2016). Two significant post-glacial geomorphic features are the Niagara Escarpment and the Oak Ridges Moraine (Fig. 1). The Niagara Escarpment is a Paleozoic limestone bedrock ridge resulting from differential glacial erosion and weathering of harder and softer rock that arches from the region between lakes Ontario and Erie, bypassing Lake Ontario and extending northward to Georgian Bay (Chapman and Putman, 1984; Phillips and Desloges, 2014).

Figure 1Map identifying the study area (indicated in orange) and two significant post-glacial geomorphic features that influence drainage networks of southern Ontario (i.e., the Niagara Escarpment and the Oak Ridges Moraine). The inset map (upper right) indicates the study region within the Ontario portion of the Laurentian Great Lakes catchment relative to Canada.

Several pre-glacial rivers have carved deep valleys into the Niagara Escarpment; however, Late Pleistocene glaciations have infilled these valleys with varying thicknesses of till (Chapman and Putman, 1984) directing catchment flow mostly away from the escarpment crest. The Oak Ridges Moraine is a stratified kame moraine of glacial drift that extends from the Niagara Escarpment 160 km eastward across south-central Ontario (Phillips and Desloges, 2014). This massive ridge forms a drainage divide, separating catchments flowing north to Georgian Bay/Lake Huron and south to Lake Ontario. Glacial sediments typically blanket the study area at a thickness of 50 m, and up to 350 m in some places (Larson and Schaetzl, 2001). In many areas, where stratified limestones and shales of the Palaeozoic age lie beneath the thick glacial overburden, fertile soils rich in calcium carbonate and clay are produced (Desloges et al., 2020; Phillips and Desloges, 2014, 2015). These fertile soils support southern Ontario's widespread agricultural development (Donnan, 2008).

More recent European settlement and regional expansion have resulted in differentiated land use with extensive agricultural land, natural and reforested areas, and clustered urban settlement (Chapman and Putman, 1984). The southern Ontario region continues to accommodate an increasing population. Drawn by employment, most people settle in built-up cities and surrounding areas, driving clustered regional urbanization that consumes surrounding rural lands. However, a comparable demand to expand the total area of cropland has also occurred to support larger farming operations (Donnan, 2008).

3 Methods and data collection

An overview of the methodology for this study is provided in Fig. 2.

Figure 2Flowchart of FFA and comparison of a multivariate RFFA to a single-variable RFFA that uses a discharge to drainage area relationship. The regression-based multivariate RFFA employs sub-basin characterization and backward elimination of explanatory variables to determine the most parsimonious model to predict discharge over seven flood quantiles.


3.1 Estimation of flood frequency

A Station Meta Data Index for 1188 Ontario georeferenced stream gauges from the HYDAT database of the Water Survey of Canada (WSC) monitoring program is accessible online at (last access: 11 August 2019) (ECCC, 2019). The quality of gauge data depends on the selected measurement techniques, computation methods, and physical conditions at the monitoring sites (i.e., ice and other influences). However, the WSC performs regular audits of field activities and adheres to standard operating procedures to improve data quality (ECCC, 2019). Gauge locations are sorted by catchment and synthesized to identify gauges specific to the southern Ontario region.

Retention of station data is based on three criteria: (1) the gauge station lies within the peninsular region of southern Ontario, (2) the gauge station exists for a fluvial system with known field survey data (i.e., Annable 1995, 1996; Phillips, 2014), and (3) streamflow data represent a minimum of 10 years of continuous (non-seasonal) year-round operation. These criteria yield 207 gauge stations from the HYDAT database with a minimum operation period of 10 years, an average of 42.5 years (±1.7 years, median = 42 years), and a maximum operation period of 106 years. Two approaches are commonly used for FFA: the annual maxima series (AMS) and the partial duration series (PDS). The AMS approach uses the highest annual discharge from the recorded mean daily discharge values at a gauge, ensuring statistical independence of observations between years (USGS, 2019). Although some research suggests that the instantaneous maximum discharge may command greater geomorphic significance (Phillips and Desloges, 2014), the mean daily discharges provide a larger dataset with fewer gaps in the discharge records. Alternatively, the PDS approach (or peak-over-threshold) uses floods that exceed some base threshold discharge (q0), regardless of the time distribution (USGS, 2019). The AMS approach has been shown to be more efficient than the PDS approach for floods Q(T) when T>10 (Cunnane, 1973). This study uses the AMS of mean daily discharge (m3 s−1) for flood recurrence computations at each gauge station. The MSClaio2008 R function, part of the package nsRFA in R, is used to compare the LP3, EV1, GEV, and LN distributions to the annual maximum discharge data. No prior processing is implemented to fit the distributions. Each flood dataset is fit to each of the four candidate models (i.e., GEV, EV1, LN, and LP3) in the form of probability distributions with parameters estimated using the maximum likelihood method. Model selection criteria, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Anderson–Darling criterion (ADC), and the corrected Akaike information criterion (AICc) where the sample size, n, is small with respect to the number of estimated parameters, p, such that n/p<40, are separately applied to each candidate model to evaluate the model which most closely fits the flood data, similar to the methods of others (Di Baldassarre et al., 2009; Farooq et al., 2018; Laio et al., 2009). These model selection criteria are shown to provide good operational strategy when applied to frequency analysis of hydrological extremes by enabling a systematic and objective mathematical test of model fit (Laio et al., 2009). For each flood dataset, the MSClaio2008 function returns the distribution that is most often selected by the selection criteria. The selected optimal distribution for each gauge dataset is used to model flood recurrence using the cumulative probability. Flood quantiles for seven recurrence intervals (RIs) of 1.25, 2, 5, 10, 25, 50, and 100 years (i.e., Q1.25, Q2, Q5, Q10, Q25, Q50, and Q100) are derived directly from individual gauge data and, therefore, reflect the upstream conditions of the corresponding drainage basin.

3.2 Selection of explanatory variables

3.2.1 Gauge station drainage area

Catchment basins of the study area are delineated based on the hierarchical framework of the Ontario Watershed Boundaries, published by the Ontario Ministry of Natural Resources and Forestry (OMNRF, 2020). The digital geospatial datasets are accurate to within 100 m and accessed online from (last access: 26 April 2021). Basins are first identified according to tertiary level watersheds.

The site-specific drainage area for each gauge station is evaluated based on Ontario's hydrologically enforced provincial digital elevation model (DEM; version 2.0.0) of the Ontario Ministry of Natural Resources (OMNR, 2005) following Phillips and Desloges (2014). Hydrological enforcement ensures that drainage occurs in a down-slope direction, facilitating the construction of a flow accumulation raster necessary to establish the upstream drainage area of each gauge station.

3.2.2 Sub-basin attributes

To characterize the upstream hydrological, geomorphic, and land use conditions affecting channel discharge, the 16 tertiary level catchments of southern Ontario are subdivided into 45 sub-basins demarcated by quaternary level boundaries. Georeferenced gauge stations are clustered within sub-basin units to best represent the immediate upstream hydrological conditions. Sub-basin attributes are selected to characterize the drainage area conditions for each gauge station. Using digital cell counts and zonal statistics from multiple sources, sub-basin characteristic variables are extracted from four geospatial raster datasets to represent topography, land use, precipitation, and hydrological properties from a geomorphic perspective:

  • a.

    Ontario's provincial DEM (version 2.0.0), a hydrologically enforced tiled raster dataset with a cell resolution of 10 m and vertical accuracy of 5 m

  • b.

    the southern Ontario Land Resource Information System (SOLRIS; version 3.0), accessed online at (last access: 27 January 2020), a comprehensive, digital landscape level inventory published by the OMNRF(2019) that identifies urban, rural, and natural features at a 15 m resolution derived from Landsat-8 OLI imagery acquired from 2014 to 2017

  • c.

    the Canadian Climate Normals 1981–2010, accessed at (last access: 10 August 2020), which are commonly used to assess regional climate and adhere to the accepted standards of the World Meteorological Organization which recommends 30-year records to eliminate year to year variation (ECCC, 2020)

  • d.

    the revised Surficial Geology of Southern Ontario (MRD 128 – Revised), accessed online at (last access: 3 February 2020) which provides a seamless, standardized map of the geology, primary material, genesis, and formation coverages for southern Ontario (OGS, 2010).

For each predictor variable (i.e., attribute), a single output value is produced and applied to the gauge(s) within a sub-basin. Since many sub-basins contain more than one gauge station, some gauges share the same topographic, land use, climate, and geomorphic values but possess their own unique drainage area value. All mapping and spatial analyses use a combination of standard GIS software and spreadsheet algorithms. Maps are projected to the Universal Transverse Mercator (UTM, Zone 17N), referenced to the North American Datum of 1983.

3.3 Comparison of single-variable RFFA to multivariate RFFA

For each of the seven flood quantiles, a single-variable relationship (Eq. 1) between discharge and drainage area is obtained by statistical regression. Multivariate relationships between the explanatory variables and each of the quantile discharge datasets are assessed by applying OLS regression. OLS assumes that the set of explanatory variables (i.e., basin characteristics) and errors must be independent to avoid bias. When characterizing natural systems, the potential exists for some variables to correlate with other variables due to their representation of related natural phenomena, often indicated by high correlations between variables suggesting a duplication of information captured (Ahn and Palmer, 2016; Phillips and Desloges, 2015). To identify the most parsimonious discharge model for RIs of 1.25, 2, 5, 10, 25, 50, and 100 years, regression models are developed using a backward elimination strategy (Fig. 3):

  1. Discriminant analysis, similar to that of Ahn and Palmer (2016), tests for variable independence and identifies highly correlated variables. A principal components analysis (PCA) explores the most important influences on channel discharge. PCA has been shown to be an effective tool for variable reduction that provides a statistical basis to discard redundant variables (King and Jackson, 1999). A simple Pearson correlation and a Spearman correlation are applied to all predictor variables (criteria |r|0.6> and |ρ|>0.6). Preference is granted to predictors with a stronger theoretical association to channel discharge. Similarly, Ahn and Palmer (2016) apply a criterion of |r|>0.7 to eliminate strongly correlated variables.

  2. An iterative process of multivariate regression diagnostics is applied, following others (Roman et al., 2012; Sheather and Oostrom, 2009), to remove variables that demonstrate little or no predictive power.

  3. Models are evaluated for performance using an analysis of variance (ANOVA) that compares the residual sum of squares of the multivariate models to the single-variate models, and leave-one-out cross validation (LOOCV) assesses the predictive capabilities of the models in practice.

Figure 3Flowchart of discriminant analysis and backward elimination strategy employing tests for variable independence, multivariate regression diagnostics, and model evaluation.


4 Regression model inputs

4.1 Flood quantiles as dependent variables

The model selection criteria determine that 42.5 % of the 207 hydrometric gauge records are most suited to an LN distribution, 31.9 % to an EV1 distribution, 21.7 % to an LP3 distribution, and 3.9 % to a GEV distribution (Fig. 4). Goodness-of-fit tests suggest that all four distributions are potentially suitable for modelling flood extremes from gauges in southern Ontario. For 74.4 % of the gauge records tested, the selection criteria chose a two-parameter model (i.e., LN or EV1) over a three-parameter model (i.e., LP3 or GEV). The two-parameter EV1 model is found to be five times more likely to be selected as the optimal distribution over its three-parameter parent model, GEV. The GEV distribution is only selected in a limited number of cases. In general, there is no single “best fit” distribution type indicated based on geographic location within a sub-basin unit (Fig. 4).

Figure 4Map identifying the geographic locations of 207 gauge stations and the optimal statistical distribution selected to model the AMS data. Sub-divisions of the tertiary level watershed boundaries are indicated. No sub-basin indicates a single “best fit” distribution type.

The optimal probability distribution curve is used to estimate the flood quantiles for RIs of 1.25, 2, 5, 10, 25, 50, and 100 years for each of the 207 gauge stations. These flood quantiles are consistent with return periods explored in other flood frequency analyses (Ahn and Palmer, 2016; Basso et al., 2016; Hollis, 1975; Onen and Bagatur, 2017). A Shapiro–Wilk analysis tests the null hypothesis that the flood quantile datasets are normally distributed (Table 1). The dataset for each flood quantile does not meet the assumption of normality (p<0.05) and the null hypothesis is rejected. A logarithmic transformation is applied to all flood quantile values. A subsequent Shapiro–Wilk test of the log-transformed flood quantile datasets fails to reject the null hypothesis (p>0.05), suggesting that the log-transformed flood quantiles are normally distributed.

Table 1Results of Shapiro–Wilks normality tests for each flood quantile, with and without logarithmic transformation of data.

Download Print Version | Download XLSX

4.2 Attributes as predictor variables

Twenty-eight attributes are selected (Table 2) to characterize the drainage area conditions representing the topography, precipitation, land use, and hydrological properties from a geomorphological perspective. The drainage area conditions influence channel discharge (across all seven flood frequency quantiles) in terms of the regional geomorphic, hydrological, topographic, and land use properties within each sub-basin.

Table 2Twenty-eight variables representing geomorphic, hydrological, land use, and topographic variability between sub-basins.

Download Print Version | Download XLSX

The upstream drainage area for each georeferenced gauge station is extracted from the hydrologically enforced DEM. A logarithmic transformation is applied to the drainage area variable values to ensure normality (W=0.994; p= 0.522). The geomorphic sub-basin attributes are represented by the percentage of the dominant surficial material within the geographic area of each sub-basin: glaciolacustrine clay, glacial till, glaciofluvial/glaciolacustrine gravel, wetland, Paleozoic or Precambrian bedrock, glaciofluvial/fluvial/glaciolacustrine sand, and glaciolacustrine/fluvial silt (Fig. 5a). The hydrological conditions are characterized by an interpolation of Canadian climate normals (Fig. 5b).

Figure 5Raster datasets illustrating (a) the dominant surficial material, (b) the mean annual precipitation, (c) the hydrologically enforced DEM, and (d) the categorical land use for the peninsular region of southern Ontario.

Point information for mean annual precipitation, annual number of precipitation days, mean annual rainfall, and annual number of rainfall days from 65 observation stations is converted to raster coverage using several interpolation techniques. Inverse distance weighting (IDW) and ordinary kriging (OK) using a stable model and an exponential model are compared. OK has been shown to produce accurate results when used to describe spatially heterogeneous natural phenomena (Bevan and Conolly, 2009) such as precipitation. Cross validation suggests fitting an OK exponential model for annual mean precipitation, annual mean rainfall, and the annual number of rainfall days, and an OK stable model for the annual number of precipitation days. The topographic conditions of the sub-basins are extracted and quantified from the hydrologically enforced DEM (Fig. 5c). Percent land use is quantified using the SOLRIS categories for each sub-basin (Fig. 5d). For this study, three land use categories are established: %Urban, %Cropland, and %Naturalized area. %Urban regions combine all transportation and built-up areas. %Cropland is defined by tilled agricultural areas. %Naturalized regions combine all tall-grass land cover, mixed forests, cultivated tree plantations, swamps, wetlands, and open water areas as indicated by the SOLRIS version 3.0 dataset.

5 Results

5.1 Single-variate regression RFFA

Regression of the logDrainage variable against each of the seven flood quantile datasets (i.e., Q1.25, Q2, Q5, Q10, Q25, Q50, and Q100) establishes seven single-variable power relationships (Table 3). Statistically significant relations (p<0.001) for logDrainage area are indicated across all RIs with a minor, but consistent, decrease in adjusted R2 values as RI increases indicating greater uncertainty in prediction as flood magnitude increases. Research indicates that the Q2 flood quantile (bolded in Table 3) represents a flow magnitude and frequency that is important to the maintenance of channel morphology and, therefore, has been used in a discharge–drainage area relation in numerous other studies of the southern Ontario region (Annable et al., 2011; Phillips and Desloges, 2014; Thayer et al., 2016; Vocal Ferencevic and Ashmore, 2012).

Table 3Single-variate RFFA models for each flood quantile. The Q2 flood quantile (bolded) is compared to discharge–drainage area relations of other studies in the southern Ontario region in Fig. 6.

Download Print Version | Download XLSX

Figure 6The single-variable discharge–drainage area relationship for the Q2 flood quantile of this study (solid black line) compared with the findings of others. The Q2 relationship of Annable (1995) is indicated by the dashed black line. The Q2 relationship of Phillips and Desloges (2014) is given by the dotted black line.


Expressing the Q2 results (bolded in Table 3) in a power-law format (Eq. 1), the Q2 model is found to be similar to the findings of other southern Ontario models similarly derived from annual maximum series datasets of the southern Ontario region (Annable, 1995; Phillips and Desloges, 2014). The Q2 power relationship identified in this study indicates a slightly lower estimate of Q2 discharge for smaller drainage areas (<100 km2) compared with the research of others (Fig. 6). For larger drainage areas (>100 km2), this study predicts similar discharge estimates compared with the relationship described by Phillips and Desloges (2014) but greater discharge estimates than those in Annable (1995). Neither of those studies specified best-fit RFFA distributions, so the relationship presented here is considered more robust.

5.2 Multivariate regression RFFA with parameterized basin characteristics

5.2.1 Basin characterization parsimony

A PCA of the 28 explanatory variables produces seven components with eigenvalues greater than 1.00 (eigenvalues for Dim.1 through Dim.7 are 7.85, 5.48, 2.87, 2.25, 1.84, 1.09, and 1.04, respectively) that explain 80.1 % of the total variability of the dataset. This suggests an absence of strong inter-correlations among many of the 28 variables. The latent root criterion (also known as the Kaiser or eigenvalue-one criterion) suggests retaining and interpreting principal components if the eigenvalue is greater than 1.00 (Kaiser, 1960). However, using the point where the first few eigenvalues depart from the more similar lesser eigenvalues (i.e., the broken-stick model) (Jackson, 1993), suggests retaining the first three dimensions which account for almost 58 % of the total variability of the dataset and are the most interpretable. The correlation circles illustrate the projections of the first three principal components (Dim1, Dim2, and Dim3) (Fig. 7). Highly correlated variables project in the same direction. The first principal component (Dim1) tends towards a land use composition grouping with some loading from gradient variables and precipitation variables (Dim1 explains 28.0 % of the variance). The second principal component (Dim2) tends towards an elevation cluster with additional loading from precipitation variables (Dim2 explains 19.6 % of the variance). The third principal component (Dim3) is a weakly defined land surface grouping with loading from surficial geology classifications and basin geometry (Dim3 explains 10.2 % of the variance). While elevation is a clear contributor to the variance of the dataset based on the PCA tests, the directional indicators suggest the presence of multicollinearity among the elevation predictors which is reinforced by the results of correlation detection.

Figure 7Principal component analysis (PCA) highlighting the most contributing variables of the 28-variable dataset for each dimension and illustrating the correlation circles for principal components one, two, and three (Dim1, Dim2, and Dim3).


Simple Pearson correlation tests (|r|>0.6) suggest that 29 correlated relationships exist among 11 basin characteristics including %Diamicton and %Sand, %Cropland and %Naturalized, Gradient_Mean, Gradient_StDev, Stream_Length, WS_Area, Mean_Rainfall, Min_Elevation, Max_Elevation, Elev_Mean, and Elev_StDev. Similarly, Spearman correlation tests (|ρ|>0.6) suggest 31 correlated relationships among the same basin characteristics. PCA and correlation tests support elimination of elevation variables (i.e., Min_Elevation, Max_Elevation, Elev_Mean, Elev_StDev, and Elev_Range) except for Gradient_Mean which is retained as the sole predictor to represent the variability of topography among the sub-basins. Correlations between Gradient_Mean and Gradient_StDev (r=0.795; ρ=0.628) and between Mean_Rainfall and Mean_Precip (r=0.697; ρ=0.755) also result in the elimination of Gradient_StDev and Mean_Rainfall from the potential predictor variables. The Mean_Precip variable is retained as it explains snowmelt floods which dominate the flow regime of southern Ontario rivers (Javelle et al., 2003). Other correlated variables are removed from further analysis due to high correlation: WS_Area versus WS_Compactness (r=0.717; ρ=0.739) and Stream_Length versus WS_Perimeter (r=0.884; ρ=0.943). Subyani et al. (2012) similarly find significant correlation between stream length and basin perimeter. The variable WS_Area is removed in favour of WS_Compactness which is a descriptor of basin shape (Apaydin et al., 2006). Although total catchment size has been shown to have a role in the catchment hydrological response (Merz and Blöschl, 2009), the contributing upstream drainage area is more proportionally relevant to channel discharge at individual gauge stations (Dunne and Leopold, 1978; Prancevic and Kirchner, 2019). Stream_Length is also removed in favour of retaining WS_Perimeter which is a descriptor of basin shape. The retention of WS_Compactness and WS_Perimeter enables greater focus on basin shape which impacts hydrological relationships and the efficiency with which a fluvial system can evacuate precipitation from the region (Apaydin et al., 2006; Fryirs and Brierley, 2012). For example, elongated catchments (WS_Compactness  0.4) have been shown to have slower runoff (Fryirs and Brierley, 2012). Mean_Rainfall is also eliminated due to correlation with multiple variables (i.e., Mean_Precip, %Cropland, Gradient_StDev). By eliminating nine sub-basin characteristics (Min_Elevation, Max_Elevation, Elev_Mean, Elev_StDev, Elev_Range, Gradient_StDev, Mean_Rainfall, WS_Area, and Stream_Length), 27 correlated relationships identified by the correlation tests are removed. Contrasting geomorphic conditions between catchments (i.e., surficial geology) are represented by a negative correlation (r=-0.707; ρ=-0.739 between %Sand or %Diamicton; however, both are initially retained for multivariate modelling to represent the primary substrate of each sub-basin.

Table 4Results from model regression of all 19 predictors indicating predictor variables found to be statistically significant in the 19-variable model. The remaining variables are not found to be statistically significant in 19-variable regressions.

Download Print Version | Download XLSX

5.2.2 Multivariate regression analysis with backward elimination to identify the most parsimonious models

It can be a practice to test and transform independent variables to ensure a normal distribution of a multivariate dataset; however, tests for multivariate normality are rarely performed (Tacq, 2010). Alternatively, the plots of standardized residuals from combinations of predictor variables are examined for a desired elliptically symmetric distribution (Sheather and Oostrom, 2009). To enable model comparison, the gauge drainage area predictor variable included in the multivariate regression is logarithmically transformed, consistent with the single-variable power model.

Multiple linear regression is applied to the remaining 19-variable dataset for each of the seven flood RIs, i.e., 1.25, 2, 5, 10, 25, 50, and 100 years, using an OLS approach. The fitted values of the model are compared to the dependent variables (i.e., Q1.25, Q2, Q5, Q10, Q25, Q50, and Q100) to detect heteroscedasticity. Regression of all 19 predictor variables for each flood quantile reveals statistically significant relationships (p<2.2×10-16) suggesting at least one of the predictor variables is significantly related to the quantile discharge (Table 4). The multiple coefficients of determination are greater than 0.8 (R2>0.8) for all quantiles.

Examination of the associated p values for the t statistic of each predictor variable indicates that, for all seven flood quantiles, logDrainage overwhelmingly contributes to the 19-variable prediction models (>70 %), although its importance decreases as flood frequency decreases. The 19-variable regression (logDrainage included) indicates that either %Naturalized or Gradient_Mean is statistically significant for predicting flood quantiles Q1.25, Q2, Q5, Q10, and Q25. This is consistent with the results of the PCA, which identifies the first principal component as a land use grouping and the second principal component as an elevation grouping. Other potential predictor variables do not indicate statistical significance in the 19-variable models. During backward elimination, added-variable plots (or partial regression plots) indicating low statistical significance are consistent with the t statistic results confirming a lack of significance. Variables indicating no relationship are eliminated. A high variance inflation factor (VIF > 5) confirms %Diamicton is unsuitable as a predictor variable due to multicollinearity. Marginal model plots indicate somewhat linear but weak relationships for %Organic, %Sand, and %Gravel. For both %Clay and %Bedrock, high incidences of zero values produce non-linear relationships suggesting a lack of significance as predictor variables. A high VIF is indicated for %Cropland. High negative correlation (r=-0.775; ρ=-0.780) is also observed between %Cropland and Gradient_Mean resulting in the elimination of %Cropland as a predictor. All five-predictor models demonstrate statistical significance; however, the identified variables are not consistent over all seven flood quantiles (Table 5).

Table 5Regression results of five- and three-predictor models over all flood quantiles tested. Variables with some explanatory contribution in five- and three-predictor models are indicated by an “X”.

Download Print Version | Download XLSX

Table 6The most parsimonious two-variate RFFA models for each flood quantile.

Download Print Version | Download XLSX

The five- and three-predictor models retain surficial material variables (%Organic, %Sand, and %Gravel) and climate variables (Mean_Precip, Precip_Days, and Rainfall_Days). Reducing from five-variable models to three-variable models decreases the adjusted R2 value and increases the F statistic over all flood quantiles. Three-predictor models demonstrate statistical significance; however, the third variable is not consistent over all seven flood frequency quantiles. Results show that including Mean_Precip as a variable increases model fit for flood quantiles Q1.25, Q2, Q5, and Q25, whereas including Rainfall_Days improves the goodness of fit for the Q10 model. Alternatively, the inclusion of %Organic is shown to have some statistical significance (p<0.05) as a third predictor variable for flood quantiles Q50 and Q100. For all seven flood quantiles, i.e., Q1.25, Q2, Q5, Q10, Q25, Q50, and Q100, backward elimination reduces to the same single independent variable, logDrainage. As the most parsimonious model, logDrainage is shown to significantly predict discharge (p<2.2×10-16) confirming the often used single-variable relationship between drainage area and discharge. However, all seven flood quantiles also identify a two-predictor model using the variables logDrainage and %Naturalized where t tests demonstrate that both logDrainage and %Naturalized are statistically significant (p<2.2×10-16) (Table 6).

Figure 8The single-variable and two-variable fitted discharge models are plotted against the best-fit estimated discharge derived from the flood frequency curves for (a) a 1.25-year RI, (b) a 2-year RI, (c) a 5-year RI, (d) a 10-year RI, (e) a 25-year RI, and (f) a 100-year RI. The two-predictor logDrainage and %Naturalized model shows less scatter and a lower standard error than the single logDrainage predictor for all six flood RIs shown. The explanation of variability in flood discharge is improved by nearly 5 % using the two-predictor model for all seven flood quantiles.


Table 7Leave-one-out cross validation (LOOCV) and analysis of variance (ANOVA) comparing the single-variable models to the two-variable models.

Download Print Version | Download XLSX

5.3 Model evaluation

For all seven flood quantile models, the addition of the %Naturalized predictor variable reduces the residual standard error (Res SE) and increases the adjusted coefficient of determination (adj R2). Plots of the fitted single-variable models and the two-variable models, versus gauge derived estimates of discharge, demonstrate less scatter (Fig. 8 illustrates six quantiles). The two-variable model (i.e., logDrainage and %Naturalized) results in an increased adjusted R2 value of approximately 0.05 and a lower standard error suggesting that the addition of the second variable (i.e., %Naturalized) improves the goodness of fit for all seven flood quantile models (six illustrated). The two-predictor combination of variables provides an improved explanation for the variations in discharge by nearly 5 %. Generally, an increase in model scatter is observed for both one-variable and two-variable prediction as the RI increases suggesting increasing uncertainty in prediction of discharge moving from Q1.25 to Q100. This is consistent with the higher variance in discharge observed for large, infrequent flood events within the original gauge datasets. The predictability of larger flood events is limited by both the low frequency with which they occur and the length of the gauge records analyzed (average 42.5 years).

An analysis of variance (ANOVA) (Table 7) comparing the single-variable models to the two-variable models further indicates an improved prediction of discharge using the two-predictor model compared with the one-predictor model. For all seven flood quantiles, a decrease in the sum of squares of residuals (RSS) is observed with the addition of the %Naturalized predictor, and an F statistic (p<0.001) demonstrates very strong evidence in favour of the two-predictor model. These results are supported by leave-one-out cross validation (LOOCV) (Table 7). For all seven flood quantiles, LOOCV demonstrates a reduction in the root mean square error (RMSE), an improvement of the R2 value, and a reduction in the mean absolute error (MAE) for the two-variable model compared with the single-variable model.

6 Discussion

Flood magnitude, frequency, and duration are primary drivers of channel erosion and stream morphology (Taniguchi and Biggs, 2015). High-magnitude, less-frequent floods will undoubtedly result in significant alterations to a channel's morphology and are more important when considering hazards, loss of life and infrastructure damage (Onen and Bagatur, 2017). However, the cumulative effects of more frequent, lower-magnitude floods can also be geomorphically more effective in altering channel form (Church and Ferguson, 2015; Wolman and Gerson, 1978; Wolman and Miller, 1960). Consequently, for effective risk management and hazard prevention, it is useful to model flows of different flood RIs when considering flood frequency as a predictive tool to better understand a river's morphological response to discharge (Basso et al., 2016). The best estimation of extreme flood events, however, is limited by the availability and accuracy of recorded gauge data, the length of the observed flood series, and the presence or absence of extreme flood occurrences within a flow record (Odry and Arnaud, 2017). This analysis uses a broad range of high- and low-frequency flood estimates from long-term historical flow data to develop a reliable RFFA for urban planning and infrastructure engineering. It is common practice to develop an RFFA relating the drainage area of a catchment to channel discharge using a single-variable power-law relationship. Research suggests that physiographic features, such as those inherited by southern Ontario's glacial legacy, and anthropogenic land use, for example, southern Ontario's clustered urbanization and widespread agricultural development, can influence a region's hydrogeomorphic response, particularly in smaller watersheds (Royall, 2013). Seeking to improve upon a widely accepted single-variate RFFA model in a heterogeneous landscape, the objective of this study was to explore a dependable RFFA using a multivariate approach for a region influenced by glacial conditioning and varying land use while also considering the hydrological influences of climate and topography.

In this study, rigorous goodness-of-fit testing of annual maximum mean daily discharge data series from 207 hydrometric gauge stations in a heterogeneous landscape shows that 42.5 % of gauge records are most suited to a two-parameter LN distribution, 31.9 % to a two-parameter EV1 distribution, 21.7 % to a three-parameter LP3 distribution, and 3.9 % to a three-parameter GEV distribution. This suggests that all four distributions are potentially suitable for modelling flood extremes in heterogeneous regions. The model selection criteria favoured a two-parameter model over a three-parameter model in 74.4 % of cases, consistent with other studies which found that selection criteria demonstrate a predisposition towards the most parsimonious model (i.e., fewest distribution parameters) (Farooq et al., 2018; Laio et al., 2009; Onen and Bagatur, 2017). Most notably, the two-parameter EV1 model is optimal five times more frequently than its three-parameter parent model, the GEV distribution, which is only found appropriate for use in 3.9 % of cases. This finding is similar to that of Laio et al. (2009) where the GEV distribution was only selected in a limited number of cases when modelling the annual maxima of peak discharge in 1000 United Kingdom basins. However, the GEV and LP3 distributions are heavier tailed than the LN or EV1 distributions (El Adlouni et al., 2008; Merz et al., 2022; Papalexiou et al., 2013) suggesting the upper tail behaviour of a flood time series may be underestimated when estimating flood frequency from small sample sizes (less than 50) while a single extreme flood event may lead to overestimation of the upper tail (Papalexiou et al., 2013). The average hydrological record in this study was 42.5 years which implies uncertainty in estimating extreme quantiles in the study region due to a limited record length of some gauges, but research has indicated that basins with snowmelt-dominated regimes tend towards lighter tailed distributions (Merz et al., 2022).

Flood estimation will often apply a universal, fixed probabilistic model to historical gauge data (Di Baldassarre et al., 2009). Other southern Ontario studies have employed a blanket LP3 probability distribution to model the Q2 flood frequency (Annable, 1995; Phillips and Desloges, 2014). However, the variation in statistical distributions identified as an optimal fit in this study suggests a need for careful, systematic model selection criteria when fitting observed flow data in regions with variable land use or other hydraulic influences (i.e., geomorphology, substrate materials, climate, or topography). To prevent an over-estimation or, more importantly, an under-estimation of discharge when predicting flood recurrence, model goodness of fit should be evaluated. The results of this study indicate that a two-parameter LN statistical distribution will provide an optimal fit for 43 % of the southern Ontario flood records when a broad range of flood quantiles are being examined.

Other studies have explored a variety of novel regionalization approaches. Di Lazzaro et al. (2015) presented an RFFA using a single-variable parameterization of drainage density. Ahn and Palmer (2016) estimated flood frequency using the GEV distribution and then proposed regionalization methods using a spatial proximity approach. However, regionalization based on spatial proximity assumes that nearby sites are more similar than distal sites (Odry and Arnaud, 2017). In a glacially conditioned landscape, such as the southern Ontario region, the configuration of glacial deposits (Fig. 5a) often forms drainage divides that segregate neighbouring catchments with diverse flood characteristics. This study, therefore, explores regionalization through a multivariate regression-based approach to capture the variability of upstream hydrological controls that are often dependent on the spatial arrangement of post-glacial physiographic features and, in the case of southern Ontario, the variable land use (i.e., regionally clustered urbanization and agricultural development). The mapping of surficial material, climate conditions, topography, and land use illustrates the variability of hydrological influences on the region (Fig. 5). Consistent with the agricultural land use of southern Ontario, analysis reveals a negative correlation between %Cropland and Gradient_Mean. Regions of steep gradient are not typically associated with areas of high agricultural activity, whereas lower gradient regions provide much of the agricultural/cropping activity. Crops are typically cultivated in areas with favourable conditions for growth (i.e., rainfall and gradient) producing collinear relationships with key elevation and precipitation variables relevant to channel discharge. Likewise, the high spatial variability in surficial geology of southern Ontario (due to its glacial conditioning) can be problematic. Contrasting geomorphic conditions between catchments are represented by, for example, high negative correlations among %Diamicton and %Sand, and an absence of surficial material types in many areas (e.g., %Bedrock and %Clay) produces high incidences of zero values and non-linear relationships. Conversely, a measure of natural land use is available across the study region, making a linear relationship between %Naturalized and discharge possible.

During the backward elimination process, different land use, geomorphic, climatic, and topographic variables assume different importance in predicting channel flow depending on the flood magnitude being modelled. The influence of the glacial legacy is captured by the inclusion of surficial materials in the five- and three-variable models (Table 5). The less parsimonious, but still statistically valid, five- and three-predictor models show the importance of land cover/glacial legacy (%Organics, %Sand, and %Gravel) and climate variables (rainfall days). In three-variable models, precipitation (i.e., Mean_Precip or Rainfall_Days) increases model fit for lower magnitude and more frequent flood events (i.e., Q1.25, Q2, Q5, Q10, and Q25), suggesting a greater predictive relationship of channel discharge. In contrast, surficial geology (i.e., %Organic) has more predictive value for high-magnitude, less-frequent flood events (Q50 and Q100). During low-magnitude flood events, it is unsurprising that a fluvial system's hydrological response is more directly related to the amount of rainfall or snowmelt infiltration, whereas during less-frequent, high-magnitude or flash flood events, surface saturation across an increasing area of the watershed is more closely tied to surficial material properties that limit or enhance infiltration, impacting surface runoff. The surficial material %Organic is retained for high-magnitude, low-frequency floods (i.e., Q50 and Q100) in the three-variable models suggesting that the percentage of a basin with highly organic surficial material (e.g., wetlands) can effectively increase infiltration, limiting overland flow and the magnitude of channel discharge during high-magnitude flood events. Surficial material with higher organic content has been shown to significantly increase infiltration capacity and porosity (Luna et al., 2018).

Although the most parsimonious model for estimating discharge is found to be the generally accepted and efficient single-variable relation between discharge and drainage area, when considering model variance, the two-predictor combination of upstream drainage area and the regional percentage of naturalized landscape (%Naturalized) shows a 5 % improvement when explaining variation in flood discharge for all RIs tested (i.e., 1.25, 2, 5, 10, 25, 50, and 100 years). An analysis of variance further indicates a statistically significant improvement in prediction of discharge using the two-predictor model (i.e., logDrainage and %Naturalized) compared with the single-predictor model (i.e., logDrainage). The percentage of naturalized landscape is important because it reflects areas within a catchment that have enhanced water storage compared with urban or agricultural areas. These findings are important for situations when it is necessary to reduce uncertainty in flood prediction. Plots comparing the single- and two-predictor models demonstrate less scatter for all seven flood quantiles. Generally, an increase in model scatter is observed for both one-variable and two-variable prediction as the RI increases suggesting the predictive capability lessens moving from Q1.25 to Q100. This finding is similar to that of Basso et al. (2016) where model performance is better for short and intermediate return intervals. Any flood frequency analysis is limited by the length of the flow records being analyzed. Since the average length of gauge records used in this study is 42.5 years, a decrease in model reliability is anticipated as the non-linear hydrological processes of the region are extrapolated. Despite careful selection of the candidate statistical distributions to “best fit” the observed flow records, the absence of large flood events captured within the sample data can skew the estimation of flood frequency for low-probability, low-frequency events (Odry and Arnaud, 2017).

The findings of this research demonstrate that land use has greater predictive power than surficial geology when coupled with drainage area to estimate channel discharge in a heterogeneous landscape over a broad range of flood quantiles. While the methodology used in this study is transferable to other regions, this finding may also be transferable. However, a new scenario would require recalibration of the drainage area relationship and possibly reclassification of land use types to suit the spatial variation of the new location. Human landscape alterations that impact drainage density will influence rates of overland flow and channel flow, exerting additional influence on hydrological processes and stream response and, subsequently, impacting the magnitude and frequency of peak channel flows (Taniguchi and Biggs, 2015). Changes to land cover, such as deforestation, conversion to cropping, and urbanization, typically decrease infiltration which increases discharge, and alters flood magnitude (Chin et al., 2013; Royall, 2013). It follows that the presence of reforested or natural areas will have a significant influence on modelled discharge. Since the early 1900s, select areas of southern Ontario have been reforested in recognition of wasteful clearing of marginal and submarginal agricultural lands by early settlers (Armson et al., 2001). The %Naturalized variable includes tall-grass land cover, mixed forests, cultivated tree plantations, swamps, wetlands, and open water areas, representing areas of high infiltration or the surface storage of water. The negative coefficient for the percentage of naturalized area reduces the weight of the drainage area input. This is consistent with the theoretical expectation that drainage areas of sub-basins with a high percentage of naturalized areas may be overemphasized without the appropriate correction for surface water storage. Although urbanization has been shown to have the most profound influence on fluvial system response, altering hydrological processes through (a) a decrease in infiltration, (b) an increase in overland flow, and (c) a potential decrease in groundwater recharge (Chin et al., 2013), the regional impact of clustered urban populations of southern Ontario is diluted by the expansive regions of cropland, grazing, and naturalized areas that separate them. Consequently, the %Urban variable shows minimal significance in the multivariate regression. Similarly, %Cropland was shown to be a poor regional predictor for discharge due to a collinear relationship with other predictors. The statistical significance of %Naturalized, however, suggests that the percentage of a sub-basin that is naturalized can be an effective variable to represent temporary surface water storage, limiting the impact to a channel during flood events.

7 Conclusion

To transfer flood discharge information from gauged sites to ungauged sites in a heterogeneous landscape (e.g., a low-relief, glacially conditioned landscape with variable land use), the primary objective of this research was to explore additional explanatory hydrological controls to improve the predictive strength of a well-known regional flood frequency approach that correlates drainage area to discharge. The main conclusions of this analysis are as follows:

  1. When modelling the annual maximum mean daily discharge records for southern Ontario, 42.5 % are most suited to a two-parameter LN distribution, 31.9 % to EV1, and 21.7 % to LP3, and 3.9 % to a GEV distribution suggesting all four distributions tested are potentially suitable for modelling flood extremes in a heterogeneous landscape. The variation of “best fit” probability distributions indicates that systematic model selection criteria is necessary when fitting observed flow data in regions with variable land use or other hydraulic influences (i.e., geomorphology, climate, or topography).

  2. For lower-magnitude, more-frequent flood events (i.e., Q1.25, Q2, Q5, Q10, and Q25), precipitation shows a greater predictive relationship with channel discharge in three- and five-variable models, whereas for high-magnitude, less-frequent flood events surficial geology has more predictive value. For high-magnitude, low-frequency floods (i.e., Q50 and Q100) highly organic surficial material (e.g., wetlands) can effectively increase infiltration, limiting overland flow and the magnitude of channel discharge during high-magnitude flood events.

  3. While land use, geomorphology, material type, climate, and topographic variables are variably important on the flood magnitude being modelled, the results here show the most parsimonious predictor for estimating discharge in ungauged streams is the accepted and efficient single-variable drainage area.

  4. When considering model variance, a two-predictor combination of upstream drainage area and the regional percentage of naturalized landscape shows a statistically significant 5 % improvement when explaining variation in flood discharge for a broad range of recurrence intervals tested (i.e., 1.25, 2, 5, 10, 25, 50, and 100 years). The negative coefficient associated with the percentage of naturalized area serves as a correction to the drainage area relationship to account for surface water storage. This finding is important for situations when it is necessary to reduce uncertainty in flood prediction.

  5. The findings suggest that applying a zonal two-variable model, which accounts for drainage area and the percentage of upstream naturalized land use, serves as a correction for surface water storage when modelling flood magnitude for high- and low-frequency flood events. This improvement is of value in a heterogeneous landscape when considering the geomorphic response of channels to predicted channel discharge for a broad range of flood recurrence intervals and greater precision is required.

Appendix A: Probability distribution functions

The GEV distribution uses a three-parameter probability distribution function such that

(A1) F ( x ) = exp - 1 - ε x - μ σ 1 / ε ε 0 exp - exp - x - μ σ ε = 0 ,

where μ, σ, and ε are the location, scale, and shape parameters of the flow data, respectively. The location parameter describes the shift of a distribution along the horizontal axis, while the scale and shape parameters describe the spread (Zhang et al., 2020). The GEV blends the Gumbel (EV1), Frechet, and Weibull distributions which are nested models within the GEV distribution (Laio et al., 2009). The simplified EV1 distribution uses the GEV function where the shape parameter, ε, is reduced to zero, giving the two-parameter probability distribution function

(A2) F ( x ) = exp - exp - x - μ σ ,

where μ is the location parameter and σ is the scale parameter. Consideration of the three-parameter GEV distribution balances model bias versus model variance. The more complicated three-parameter GEV distribution reduces model bias compared with the two-parameter EV1 distribution; however, as the number of parameters increases, variance typically increases (Laio et al., 2009). The LN distribution is the log-transformed two-parameter normal or Gaussian distribution represented by the probability distribution function

(A3) F ( x ) = 1 σ 2 π exp - 1 2 x - μ σ 2 ,

also applying μ and σ as location and scale parameters, respectively. Similarly, the LP3 distribution is the log-transformed three-parameter gamma or Pearson type III identified by the probability distribution function

(A4) F ( x ) = 1 | σ | Γ ε x - μ σ ε - 1 exp - x - μ σ ,

where μ, σ, and ε are the location, scale, and shape parameters, respectively. Pearson type III and normal distributions are converted to LP3 and LN distributions when the data are log-transformed at the outset (Di Baldassarre et al., 2009).

Data availability

The data that support the findings of this study are available from the corresponding author, Pamela E. Tetford, upon reasonable request.

Author contributions

PET: conceptualization (lead); methodology (lead); investigation (lead); formal analysis (lead); writing – original draft (lead); writing – review and editing (equal). JRD: conceptualization (supporting); methodology (supervision); investigation (supervision); formal analysis (supervision); writing – original draft (supporting); writing – review and editing (equal).

Competing interests

The contact author has declared that neither of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSER-CGS-D graduate scholarship to Pamela E. Tetford), and research funding from the University of Toronto to Joseph R. Desloges. Many thanks to Esther Bushuev who helped compile data of the WSC flow gauges for analysis. Academic advice and insights are greatly appreciated from Marney Isaac, Carl Mitchell, and Michael Widener. We also thank our anonymous peer reviewers for their insightful comments which improved the manuscript.

Financial support

This research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC-CGS-D graduate scholarship to Pamela E. Tetford) and research funding from the University of Toronto to Joseph R. Desloges.

Review statement

This paper was edited by Efrat Morin and reviewed by two anonymous referees.


Ahn, K. H. and Palmer, R.: Regional flood frequency analysis using spatial proximity and basin characteristics: Quantile regression vs. parameter regression technique, J. Hydrol., 540, 515–526,, 2016. 

Annable, W. K.: Morphological relationships of rural water courses in southwestern Ontario for use in natural channel designs (Order No. MM00797), ProQuest Dissertations & Theses Global, Doctoral dissertation, University of Guelph, (last access: 27 June 2022), 1995.  

Annable, W. K.: Database of morphologic characteristics of watercourses in Southern Ontario [Data set], Ontario Ministry of Natural Resources, ISBN 0777851121, 1996. 

Annable, W. K., Lounder, V. G., and Watson, C. C.: Estimating channel-forming discharge in urban watercourses, River Res. Appl., 27, 1535–1459,, 2011. 

Apaydin, H., Ozturk, F., Merdun, H., and Aziz, N. M.: Determination of the drainage basin characteristics using vector GIS, Hydrol. Res., 37, 129–142,, 2006. 

Armson, K. A., Grinnell, W. R., and Robinson, F. C.: History of reforestation in Ontario, in: Regenerating the Canadian Forest: Principles and Practice for Ontario, Ontario Ministry of Natural Resources, edited by: Wagner, R. G. and Colombo, S. J., Fitzhenry & Whiteside Limited, 3–22, ISBN 1550413783, 2001. 

Ashmore, P.: Towards a sociogeomorphology of rivers, Geomorphology, 251, 149–156,, 2015. 

Ashmore, P., McDonald, J., and Barlow, V.: Multi-decadal changes in river morphology in an urbanizing watershed: Highland Creek, Toronto, Canada, Geomorphology, 433, 108710,, 2023. 

Basso, S., Schirmer, M., and Botter, G.: A physically based analytical model of flood frequency curves, Geophys. Res. Lett., 43, 9070–9076,, 2016. 

Bevan, A. and Conolly, J.: Modelling spatial heterogeneity and nonstationarity in artifact-rich landscapes, J. Archaeol. Sci., 36, 956–964,, 2009. 

Chapman, L. J., and Putman, D. F. (Eds.): The physiography of southern Ontario, in: 3rd Edn., Ontario Ministry of Natural Resources, ISBN 9780774394222, 1984. 

Chin, A., O'Dowd, A. P., and Gregory, K. J.: Urbanization and River Channels, in: Treatise on Geomorphology, Vol. 9, Elsevier Ltd,, 2013. 

Church, M. and Ferguson, R. I.: Morphodynamics: Rivers beyond Steady State, Water Resour. Res., 51, 1883–1897,, 2015. 

Cook, R. and Weisberg, S.: Graphics for assessing the adequacy of regression models, J. Archaeol. Sci., 92, 490–499,, 1997. 

Cunnane, C.: A particular comparison of annual maxima and partial duration series methods of flood frequency prediction, J. Hydrol., 18, 257–271,, 1973. 

Desloges, J. R., Phillips, R. T. J., Byrne, M. L., and Cockburn, J. M. H.: Geomorphology of the Great Lakes Lowlands of Eastern Canada, in: Landscapes and Landforms of Eastern Canada, Springer International Publishing AG, 259–275,, 2020. 

Di Baldassarre, G., Laio, F., and Montanari, A.: Design flood estimation using model selection criteria, Phys. Chem. Earth, 34, 606–611,, 2009. 

Di Lazzaro, M., Zarlenga, A., and Volpi, E.: Hydrological effects of within-catchment heterogeneity of drainage density, Adv. Water Resour., 76, 157–167,, 2015. 

Donnan, J. A.: Economic implications and consequences of population growth, land use trends, and urban sprawl in southern Ontario, Environmental Commissioner of Ontario, (last access: 3 December 2021), 2008. 

Dunne, T. and Leopold, L. B.: Water in environmental planning, Freeman and Company, p. 818, ISBN 9780716700791, 1978. 

ECCC – Environment and Climate Change Canada Historical Hydrometric Data,, (last access: 11 August 2019), 2019. 

ECCC – Environment and Climate Change Canada: Canadian Climate Normals, (last access: 10 August 2019), 2020. 

El Adlouni, S., Bobée, B., and Ouarda, T. B. M. J.: On the tails of extreme event distributions in hydrology, J. Hydrol., 355, 16–33,, 2008. 

Farooq, M., Shafique, M., and Khattak, M. S.: Flood frequency analysis of river swat using Log Pearson type 3, Generalized Extreme Value, Normal, and Gumbel Max distribution methods, Arab. J. Geosci., 11, 216,, 2018. 

Fryirs, K. A. and Brierley, G. J.: Geomorphic analysis of river systems: an approach to reading the landscape, John Wiley & Sons, ISBN 9781118305454, 2012. 

Fulton, R. J., Karrow, P. F., LaSalle, P., and Grant, D. R.: Summary of quaternary stratigraphy and history, eastern Canada, Quaternary Sci. Rev., 5, 211–228,, 1986. 

Galster, J. C., Pazzaglia, F. J., Hargreaves, B. R., Morris, D. P., Peters, S. C., and Weisman, R. N.: Effects of urbanization on watershed hydrology: The scaling of discharge with drainage area, Geology, 34, 713–716,, 2006. 

Ghunowa, K., MacVicar, B. J., and Ashmore, P.: Stream power index for networks (SPIN) toolbox for decision support in urbanizing watersheds, Environ. Model. Softw., 144, 105185,, 2021. 

Hollis, G.: The effect of urbanization on floods of different recurrence interval, Water Resour. Res., 11, 431–435,, 1975. 

Hosking, J. R. M. and Wallis, J. R.: Some statistics useful in regional frequency analysis, Water Resour. Res., 29, 271–281,, 1993. 

Hydrologic Engineering Center: HEC-SSP Statistical Software Package Version 2.2 User's Manual, (last access: 16 August 2021), 2019. 

Jackson, D. A.: Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches, Ecology, 74, 2204–2214,, 1993. 

Javelle, P., Ouarda, T. B. M. J., and Bobée, B.: Spring flood analysis using the flood-duration-frequency approach: Application to the provinces of Quebec and Ontario, Canada, Hydrol. Process., 17, 3717–3736,, 2003. 

Kaiser, H. F.: The Application of Electronic Computers to Factor Analysis, Educ. Psychol. Meas., 20, 141–151,, 1960. 

King, J. R. and Jackson, D. A.: Variable selection in large environmental data sets using principal components analysis, Environmetrics, 10, 67–77,<67::aid-env336>;2-0, 1999. 

Knighton, A. D.: Downstream variation in stream power, Geomorphology, 29, 293–306,, 1999. 

Laio, F., Di Baldassarre, G., and Montanari, A.: Model selection techniques for the frequency analysis of hydrological extremes, Water Resour. Res., 45, 1–6,, 2009. 

Larson, G. and Schaetzl, R.: Origin and evolution of the Great Lakes, J. Great Lakes Res., 27, 518–546,, 2001. 

Luna, L., Vignozzi, N., Miralles, I., and Solé-Benet, A.: Organic amendments and mulches modify soil porosity and infiltration in semiarid mine soils, Land Degrad. Dev., 29, 1019–1030,, 2018. 

Merz, R. and Blöschl, G.: Process controls on the statistical flood moments – a data based analysis, Hydrol. Process., 23, 675–696,, 2009. 

Merz, B., Basso, S., Fischer, S., Lun, D., Bloeschl, G., Merz, R., Guse, B., Viglione, A., Vorogushyn, S., Macdonald, E., Wietzke, L., and Schumann, A.: Understanding heavy tails of flood peak distributions, Water Resour. Res., 58, e2021WR030506,, 2022. 

Natural Resources Canada: Federal hydrologic and hydraulic procedures for flood hazard delineation, Federal Flood Mapping Guidelines Series, 19–24, (last access: 22 April 2021), 2019. 

Odry, J. and Arnaud, P.: Comparison of flood frequency analysis methods for ungauged catchments in France, Geosci. J., 7, 88,, 2017. 

OGS – Ontario Geological Survey: Surficial Geology of Southern Ontario (MRD 128 – Revised), (last access: 3 Febuary 2020), 2010. 

OMNR – Ontario Ministry of Natural Resources: Provincial Digital Elevation Model— Tiled Dataset, version 2.0.0, Ontario Ministry of Natural Resources, Peterborough, Ontario, Canada, (last access: 13 November 2018), 2005. 

OMNRF – Ontario Ministry of Natural Resources and Forestry: Southern Ontario Land Resource Information System (SOLRIS) – version 3.0, (last access: 27 January 2020), 2019. 

OMNRF – Ontario Ministry of Natural Resources and Forestry: Ontario Watershed Boundaries,,-82.339420,2.93 (last access: 26 April 2021), 2020. 

Onen, F. and Bagatur, T.: Prediction of Flood Frequency Factor for Gumbel Distribution Using Regression and GEP Model, Arab. J. Sci. Eng., 42, 3895–3906,, 2017. 

Oudin, L., Andréassian, V., Perrin, C., Michel, C., and Le Moine, N.: Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments, Water Resour. Res., 44, 1–15,, 2008. 

Paixao, E., Auld, H., Mirza, M., Klaassen, J., and Shephard, M.: Regionalization of heavy rainfall to improve climatic design values for infrastructure: case study in Southern Ontario, Canada, Hydrolog. Sci. J., 56, 1067–1089,, 2011. 

Papalexiou, S. M., Koutsoyiannis, D., and Makropoulos, C.: How extreme is extreme? An assessment of daily rainfall distribution tails, Hydrol. Earth Syst. Sci., 17, 851–862,, 2013. 

Pasternack, G. B.: Geomorphologist's Guide to Participating in River Rehabilitation, in: Treatise on Geomorphology, Vol. 9, Elsevier Ltd.,, 2013. 

Phillips, R. T. J.: Alluvial Floodplain Classification and Organization in Low-Relief Glacially Conditioned River Catchments (Order No. 3687518), ProQuest Dissertations & Theses Global, Doctoral dissertation, University of Toronto, (last access: 13 November 2018), 2014. 

Phillips, R. T. J. and Desloges, J. R.: Glacially conditioned specific stream powers in low-relief river catchments of the southern Laurentian Great Lakes, Geomorphology, 206, 271–287,, 2014. 

Phillips, R. T. J. and Desloges, J. R.: Alluvial floodplain classification by multivariate clustering and discriminant analysis for low-relief glacially conditioned river catchments, Earth Surf. Proc. Land., 40, 756–770,, 2015. 

Prancevic, J. P. and Kirchner, J. W.: Topographic controls on the extension and retraction of flowing streams, Geophys. Res. Lett., 46, 2084–2092,, 2019. 

Roman, D. C., Vogel, R. M., and Schwarz, G. E.: Regional regression models of watershed suspended-sediment discharge for the eastern United States. J. Hydrol., 472–473, 53–62,, 2012. 

Royall, D.: Land-Use Impacts on the Hydrogeomorphology of Small Watersheds, in: Treatise on Geomorphology, Vol. 13, Elsevier Ltd., 2013. 

Sheather, S. and Oostrom, C. G.: A Modern Approach to Regression with R, in: 1st Edn., Springer, New York,, 2009.  

Subyani, A. M., Qari, M. H., and Matsah, M. I.: Digital elevation model and multivariate statistical analysis of morphometric parameters of some wadis, western Saudi Arabia, Arab. J. Geosci., 5, 147–157,, 2012. 

Tacq, J.: Multivariate Normal Distribution, in: International Encyclopedia of Education, 3rd Edn., edited by: Peterson, P., Baker, E., and McGaw, B., Elsevier, 332–338,, 2010. 

Taniguchi, K. T. and Biggs, T. W.: Regional impacts of urbanization on stream channel geometry: A case study in semiarid southern California, Geomorphology, 248, 228–236,, 2015. 

Thayer, J. B., Phillips, R. T. J., and Desloges, J. R.: Downstream Channel Adjustment in a Low-Relief, Glacially Conditioned Watershed, Geomorphology, 262, 101–111,, 2016. 

USGS – US Geological Survey: Guidelines for Determining Flood Flow Frequency Bulletin 17C Book 4, Hydrologic Analysis and Interpretation, US Department of the Interior, US Geological Survey, 23–34, (last access: 25 August 2021), 2019. 

Vocal Ferencevic, M. and Ashmore, P.: Creating and evaluating digital elevation model-based stream-power map as a stream assessment tool, River Res. Appl., 28, 1394–1416,, 2012. 

Winter, T. C.: The Concept of Hydrologic Landscapes, J. Am. Water Resour. Assoc., 37, 335–49,, 2001. 

Wolman, M. and Miller, J.: Magnitude and Frequency of Forces in Geomorphic Processes, J. Geol., 68, 54–74, 1960. 

Wolman, M. G. and Gerson, R.: Relative scales of time and effectiveness of climate in watershed geomorphology, Earth Surf. Process., 3, 189–208,, 1978. 

Zhang, Z., Stadnyk, T. A., and Burn, D. H.: Identification of a preferred statistical distribution for at-site flood frequency analysis in Canada, Can. Water Resour. J., 45, 43–58,, 2020. 

Short summary
An efficient regional flood frequency model relates drainage area to discharge, with a major assumption of similar basin conditions. In a landscape with variable glacial deposits and land use, we characterize varying hydrological function using 28 explanatory variables. We demonstrate that (1) a heterogeneous landscape requires objective model selection criteria to optimize the fit of flow data, and (2) incorporating land use as a predictor variable improves the drainage area to discharge model.