A reliable flood frequency analysis (FFA) requires selection of an appropriate statistical distribution to model historical streamflow data and, where streamflow data are not available (ungauged sites), a regression-based regional flood frequency analysis (RFFA) often correlates well with downstream channel discharge to drainage area relations. However, the predictive strength of the accepted RFFA relies on an assumption of homogeneous watershed conditions. For glacially conditioned fluvial systems, inherited glacial landforms, sediments, and variable land use can alter flow paths and modify flow regimes. This study compares a multivariate RFFA that considers 28 explanatory variables to characterize variable watershed conditions (i.e., surficial geology, climate, topography, and land use) to an accepted power-law relationship between discharge and drainage area. Archived gauge data from southern Ontario, Canada, are used to test these ideas. Mathematical goodness-of-fit criteria best estimate flood discharge for a broad range of flood recurrence intervals, i.e., 1.25, 2, 5, 10, 25, 50, and 100 years. The log-normal, Gumbel, log-Pearson type III, and generalized extreme value distributions are found most appropriate in 42.5 %, 31.9 %, 21.7 %, and 3.9 % of cases, respectively, suggesting that systematic model selection criteria are required for FFA in heterogeneous landscapes. Multivariate regression of estimated flood quantiles with backward elimination of explanatory variables using principal component and discriminant analyses reveals that precipitation provides a greater predictive relationship for more frequent flood events, whereas surficial geology demonstrates more predictive ability for high-magnitude, less-frequent flood events.
In this study, all seven flood quantiles identify a statistically significant two-predictor model that incorporates upstream drainage area and the percentage of naturalized landscape with 5 % improvement in predictive power over the commonly used single-variable drainage area model (

A reliable assessment of flood frequency and flood magnitude over space and time is critical for urban planning and infrastructure engineering that depends on flood probability (Basso et al., 2016). Flood magnitude, frequency, and duration are primary drivers of channel erosion and stream morphology (Taniguchi and Biggs, 2015) as a self-shaping alluvial channel entrains and transports sediment to adjust its dimensions, planform pattern, bed characteristics, and gradient in response to varying flow levels (Church and Ferguson, 2015). Reliable estimates of flood frequency are therefore important for understanding geomorphic channel change.

A regional flood frequency analysis (RFFA) can be very important in determining the probability of extreme flood events where streamflow data are not readily available (Ahn and Palmer, 2016) by transferring observed hydrological information from a group of gauged sites to comparative ungauged sites as a representation of flow statistics using hydrological variables (Odry and Arnaud, 2017). A common approach to RFFA consolidates data samples from many measuring sites and uses ordinary least-squares (OLS) regression to identify a relationship between mean annual floods of multiple basins and some basin characteristic (e.g., drainage area). As the source area for channel discharge, drainage area is a widely used proxy for channel discharge (Galster et al., 2006; Knighton, 1999). It has become an accepted practice to model discharge using a single-variable power-law relationship between discharge (

To estimate how often a specified flood event (or channel discharge) will occur, flood frequency analysis (FFA) is widely used (Farooq et al., 2018). Most often, an FFA uses the occurrence of extreme flood events to estimate the return period,

Research suggests that the spatial variability of basin attributes (i.e., topographic relief, climate, vegetation, and land use) and sub-surface characteristics which influence hydrological and fluvial function are controlling factors of a fluvial system's drainage efficiency and are relevant to the flow response in a catchment (Di Lazzaro et al., 2015; Fryirs and Brierley, 2012; Galster et al., 2006; Oudin et al., 2008). Additionally, landscape modifications that decrease infiltration will impose changes to river hydrology (Ashmore, 2015; Ghunowa et al., 2021; Taniguchi and Biggs, 2015; Winter, 2001) with a downstream cascading effect on flow regime (Royall, 2013). Human occupation, landscape manipulation, and the generation of impervious surfaces associated with urbanization have the most profound impact on hydrogeomorphic responses, particularly in smaller watersheds (Pasternack, 2013; Royall, 2013). Moreover, a fluvial system's response to human-induced land use change (or its sensitivity to change) will vary, depending on basin attributes (i.e., configuration, geomorphology, and sediment retention) (Royall, 2013). For this reason, the spatial heterogeneity across a landscape will likely produce a variation in flood response that may best be captured using a multivariate RFFA approach that considers parameterization of relevant basin characteristics (i.e., topographic relief, land use, vegetation, and sub-surface geology) as a set of explanatory variables to estimate flood discharge (Ahn and Palmer, 2016).

Recent works have highlighted the impact of geomorphic spatial heterogeneity on the basin hydrological response (Ahn and Palmer, 2016; Di Lazzaro et al., 2015; Taniguchi and Biggs, 2015). However, many rapid geomorphic studies have relied solely on catchment area as the leading attribute for estimating channel-forming discharge (Ashmore et al., 2023). This study seeks to explore additional explanatory hydrological and land use controls that improve the predictive strength of this relationship in a heterogeneous landscape. This multivariate approach uses exploratory statistical analysis to better understand the link between intra-catchment variability and hydrological function. This study explores the following:

An FFA is completed to model reliable estimations of discharge for a broad range of flood recurrence intervals (i.e.,

The widely used single-variable RFFA (Eq. 1) is derived to characterize the relationship between discharge (

A multivariate, regression-based RFFA is presented that considers the spatial variability of hydrological controls in the context of inherited glacial landforms, sediments, and land use. To achieve this goal, 28 predictor variables are explored representing basin characteristics (i.e., topographic relief, climate, land use, vegetation, and sub-surface geology). A backward elimination approach is employed (i.e., discriminant and principal component analyses, and regression diagnostics) to identify the most parsimonious discharge models for recurrence intervals of 1.25, 2, 5, 10, 25, 50, and 100 years.

The predictive power of a multivariate derived RFFA that considers multiple basin hydrological controls is compared with a generally accepted single-variable RFFA in a spatially heterogeneous setting.

This flood frequency study focuses on a test region of peninsular southern Ontario, Canada (Fig. 1), that is bounded by the Canadian Shield to the north, the three lower Great Lakes – Huron, Erie, and Ontario – to the southwest, and the Ottawa and St. Lawrence rivers to the east. Located within the North American Great Lakes watershed, it is a region of modest relief, with elevation ranging from 544 m a.s.l. near Lake Huron draining by way of the St. Lawrence River lowlands at less than 70 m to the Atlantic Ocean (Larson and Schaetzl, 2001). Convective, synoptic, and tropical systems that influence the humid, continental climate of the region are enhanced by local, regional, and topographic conditions (Paixao et al., 2011). Moisture and temperature associated with the Great Lakes influence inland precipitation for up to 50 km. Consequently, the mean annual precipitation varies regionally from 800 to 1200 mm (Paixao et al., 2011). During winter months, precipitation typically accumulates in the form of snow, generating spring snowmelt floods that dominate river flow regimes (Javelle et al., 2003). The surficial geology of the region, and the hydrological controls exerted by the parent materials, are the product of the region's glacial history (Chapman and Putman, 1984). Recurring continental glaciations over the past

Deglaciation, approximately 12–13 000 years ago, has left pronounced glacial legacy effects with complex sequences of sub-glacial, ice-contact, and proglacial sediments deposited during the final retreat of the Laurentide Ice Sheet (Larson and Schaetzl, 2001; Phillips and Desloges, 2014, 2015). The most common physiographic features include sheets of till, finer glaciolacustrine plains of sand or clay, glaciofluvial outwash deposits of sand, gravel, silts and clays, and a configuration of moraines (Thayer et al., 2016). Two significant post-glacial geomorphic features are the Niagara Escarpment and the Oak Ridges Moraine (Fig. 1). The Niagara Escarpment is a Palaeozoic limestone bedrock ridge resulting from differential glacial erosion and weathering of harder and softer rock that arches from the region between lakes Ontario and Erie, bypassing Lake Ontario and extending northward to Georgian Bay (Chapman and Putman, 1984; Phillips and Desloges, 2014).

Map identifying the study area (indicated in orange) and two significant post-glacial geomorphic features that influence drainage networks of southern Ontario (i.e., the Niagara Escarpment and the Oak Ridges Moraine). The inset map (upper right) indicates the study region within the Ontario portion of the Laurentian Great Lakes catchment relative to Canada.

Several pre-glacial rivers have carved deep valleys into the Niagara Escarpment; however, Late Pleistocene glaciations have infilled these valleys with varying thicknesses of till (Chapman and Putman, 1984) directing catchment flow mostly away from the escarpment crest. The Oak Ridges Moraine is a stratified kame moraine of glacial drift that extends from the Niagara Escarpment 160 km eastward across south-central Ontario (Phillips and Desloges, 2014). This massive ridge forms a drainage divide, separating catchments flowing north to Georgian Bay/Lake Huron and south to Lake Ontario. Glacial sediments typically blanket the study area at a thickness of 50 m, and up to 350 m in some places (Larson and Schaetzl, 2001). In many areas, where stratified limestones and shales of the Palaeozoic age lie beneath the thick glacial overburden, fertile soils rich in calcium carbonate and clay are produced (Desloges et al., 2020; Phillips and Desloges, 2014, 2015). These fertile soils support southern Ontario's widespread agricultural development (Donnan, 2008).

More recent European settlement and regional expansion have resulted in differentiated land use with extensive agricultural land, natural and reforested areas, and clustered urban settlement (Chapman and Putman, 1984). The southern Ontario region continues to accommodate an increasing population. Drawn by employment, most people settle in built-up cities and surrounding areas, driving clustered regional urbanization that consumes surrounding rural lands. However, a comparable demand to expand the total area of cropland has also occurred to support larger farming operations (Donnan, 2008).

An overview of the methodology for this study is provided in Fig. 2.

Flowchart of FFA and comparison of a multivariate RFFA to a single-variable RFFA that uses a discharge to drainage area relationship. The regression-based multivariate RFFA employs sub-basin characterization and backward elimination of explanatory variables to determine the most parsimonious model to predict discharge over seven flood quantiles.

A Station Meta Data Index for 1188 Ontario georeferenced stream gauges from the HYDAT database of the Water Survey of Canada (WSC) monitoring program is accessible online at

Retention of station data is based on three criteria: (1) the gauge station lies within the peninsular region of southern Ontario, (2) the gauge station exists for a fluvial system with known field survey data (i.e., Annable 1995, 1996; Phillips, 2014), and (3) streamflow data represent a minimum of 10 years of continuous (non-seasonal) year-round operation. These criteria yield 207 gauge stations from the HYDAT database with a minimum operation period of 10 years, an average of 42.5 years (

Catchment basins of the study area are delineated based on the hierarchical framework of the Ontario Watershed Boundaries, published by the Ontario Ministry of Natural Resources and Forestry (OMNRF, 2020). The digital geospatial datasets are accurate to within 100 m and accessed online from

The site-specific drainage area for each gauge station is evaluated based on Ontario's hydrologically enforced provincial digital elevation model (DEM; version 2.0.0) of the Ontario Ministry of Natural Resources (OMNR, 2005) following Phillips and Desloges (2014). Hydrological enforcement ensures that drainage occurs in a down-slope direction, facilitating the construction of a flow accumulation raster necessary to establish the upstream drainage area of each gauge station.
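The role of a flow accumulation raster in deriving upstream drainage area can be illustrated with a minimal sketch. It assumes a simple D8 (steepest-descent) routing scheme on a small synthetic grid; the function name `d8_flow_accumulation`, the toy DEM, and the choice of D8 are illustrative assumptions and do not represent the GIS workflow used in this study. The 10 m cell size matches the resolution reported for the provincial DEM.

```python
import numpy as np


def d8_flow_accumulation(dem):
    """Count the cells draining through each cell (including itself) using D8 routing."""
    rows, cols = dem.shape
    acc = np.ones(dem.shape, dtype=float)  # every cell contributes itself
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    # Visit cells from highest to lowest so upstream totals are complete
    # before they are passed downslope.
    for flat in np.argsort(dem, axis=None)[::-1]:
        r, c = divmod(int(flat), cols)
        best_slope, target = 0.0, None
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                slope = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                if slope > best_slope:  # strictly downslope neighbours only
                    best_slope, target = slope, (rr, cc)
        if target is not None:
            acc[target] += acc[r, c]
    return acc


# Hypothetical DEM: a plane tilted towards the bottom row (elevations in metres).
toy_dem = np.arange(4, 0, -1, dtype=float)[:, None] * np.ones((1, 3))
accumulation = d8_flow_accumulation(toy_dem)

cell_size_m = 10.0  # cell resolution reported for the provincial DEM
drainage_area_m2 = accumulation * cell_size_m**2
print(accumulation)
```

The sketch relies on the property that hydrological enforcement is intended to provide: cells with a downslope neighbour route their accumulated count onward, so accumulation multiplied by cell area approximates upstream drainage area at a gauge location.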

To characterize the upstream hydrological, geomorphic, and land use conditions affecting channel discharge, the 16 tertiary level catchments of southern Ontario are subdivided into 45 sub-basins demarcated by quaternary level boundaries. Georeferenced gauge stations are clustered within sub-basin units to best represent the immediate upstream hydrological conditions. Sub-basin attributes are selected to characterize the drainage area conditions for each gauge station. Using digital cell counts and zonal statistics from multiple sources, sub-basin characteristic variables are extracted from four geospatial raster datasets to represent topography, land use, precipitation, and hydrological properties from a geomorphic perspective:

Ontario's provincial DEM (version 2.0.0), a hydrologically enforced tiled raster dataset with a cell resolution of 10 m and vertical accuracy of 5 m

the southern Ontario Land Resource Information System (SOLRIS; version 3.0), accessed online at

the Canadian Climate Normals 1981–2010, accessed at

the revised Surficial Geology of Southern Ontario (MRD 128 – Revised), accessed online at

For each of the seven flood quantiles, a single-variable relationship (Eq. 1) between discharge and drainage area is obtained by statistical regression. Multivariate relationships between the explanatory variables and each of the quantile discharge datasets are assessed by applying OLS regression. OLS assumes that the set of explanatory variables (i.e., basin characteristics) and errors must be independent to avoid bias. When characterizing natural systems, the potential exists for some variables to correlate with other variables due to their representation of related natural phenomena, often indicated by high correlations between variables suggesting a duplication of information captured (Ahn and Palmer, 2016; Phillips and Desloges, 2015). To identify the most parsimonious discharge model for RIs of 1.25, 2, 5, 10, 25, 50, and 100 years, regression models are developed using a backward elimination strategy (Fig. 3):

Discriminant analysis, similar to that of Ahn and Palmer (2016), tests for variable independence and identifies highly correlated variables. A principal components analysis (PCA) explores the most important influences on channel discharge. PCA has been shown to be an effective tool for variable reduction that provides a statistical basis to discard redundant variables (King and Jackson, 1999). A simple Pearson correlation and a Spearman correlation are applied to all predictor variables (criteria

An iterative process of multivariate regression diagnostics is applied, following others (Roman et al., 2012; Sheather and Oostrom, 2009), to remove variables that demonstrate little or no predictive power.

Models are evaluated for performance using an analysis of variance (ANOVA) that compares the residual sum of squares of the multivariate models to the single-variate models, and leave-one-out cross validation (LOOCV) assesses the predictive capabilities of the models in practice.
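The model evaluation step can be sketched as follows, assuming a log-transformed discharge regressed on log drainage area (the single-variable power-law form) and on log drainage area plus a second predictor. All data here are synthetic stand-ins, and the helper names (`fit_ols`, `loocv_rmse`, `nested_f_test`) are hypothetical; this illustrates a nested-model ANOVA F-test and LOOCV for OLS and is not the authors' code.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Fit y = b0 + X b by ordinary least squares."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Diagonal of the hat matrix, used for the LOOCV shortcut below.
    leverage = np.einsum("ij,ji->i", design, np.linalg.pinv(design))
    return beta, resid, leverage


def loocv_rmse(resid, leverage):
    """Leave-one-out RMSE for OLS via e_i / (1 - h_ii), without refitting."""
    return float(np.sqrt(np.mean((resid / (1.0 - leverage)) ** 2)))


def nested_f_test(resid_small, resid_big, k_small, k_big):
    """F-test comparing a smaller model nested inside a larger one."""
    n = len(resid_big)
    rss_small = float(resid_small @ resid_small)
    rss_big = float(resid_big @ resid_big)
    f_stat = ((rss_small - rss_big) / (k_big - k_small)) / (rss_big / (n - k_big))
    p_value = float(stats.f.sf(f_stat, k_big - k_small, n - k_big))
    return rss_small, rss_big, f_stat, p_value


# Synthetic stand-in data (NOT the study's gauge records).
rng = np.random.default_rng(0)
n_gauges = 207
log_area = rng.uniform(0.5, 3.5, n_gauges)          # log10 drainage area
pct_naturalized = rng.uniform(5.0, 60.0, n_gauges)  # percent naturalized cover
log_q = (-0.5 + 0.8 * log_area - 0.005 * pct_naturalized
         + rng.normal(0.0, 0.15, n_gauges))

beta1, resid1, lev1 = fit_ols(log_area[:, None], log_q)
beta2, resid2, lev2 = fit_ols(np.column_stack([log_area, pct_naturalized]), log_q)

rss1, rss2, f_stat, p_value = nested_f_test(resid1, resid2, k_small=2, k_big=3)
loo1, loo2 = loocv_rmse(resid1, lev1), loocv_rmse(resid2, lev2)
print(f"RSS: {rss1:.3f} -> {rss2:.3f}, F = {f_stat:.2f}, p = {p_value:.3g}")
print(f"LOOCV RMSE: {loo1:.3f} (one predictor) vs {loo2:.3f} (two predictors)")
```

Adding a predictor can never increase the residual sum of squares of an OLS fit, so the F-test asks whether the reduction is larger than chance alone would produce, while LOOCV checks whether the gain persists for gauges left out of the fit.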

Flowchart of discriminant analysis and backward elimination strategy employing tests for variable independence, multivariate regression diagnostics, and model evaluation.

The model selection criteria determine that 42.5 % of the 207 hydrometric gauge records are most suited to an LN distribution, 31.9 % to an EV1 distribution, 21.7 % to an LP3 distribution, and 3.9 % to a GEV distribution (Fig. 4). Goodness-of-fit tests suggest that all four distributions are potentially suitable for modelling flood extremes from gauges in southern Ontario. For 74.4 % of the gauge records tested, the selection criteria chose a two-parameter model (i.e., LN or EV1) over a three-parameter model (i.e., LP3 or GEV). The two-parameter EV1 model is found to be five times more likely to be selected as the optimal distribution over its three-parameter parent model, GEV. The GEV distribution is only selected in a limited number of cases. In general, there is no single “best fit” distribution type indicated based on geographic location within a sub-basin unit (Fig. 4).
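A per-gauge selection among the four candidate distributions can be sketched as below. The sketch assumes maximum-likelihood fitting with SciPy and selection by the Akaike information criterion (AIC); both are assumptions made for illustration, since the specific selection criteria are not restated here. The annual maximum series is synthetic, and `fit_candidates` and `select_best` are hypothetical helper names. Quantiles use the standard relation between recurrence interval T and non-exceedance probability, p = 1 - 1/T.

```python
import numpy as np
from scipy import stats

RETURN_PERIODS = np.array([1.25, 2, 5, 10, 25, 50, 100], dtype=float)


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def fit_candidates(ams):
    """Fit LN, EV1, GEV and LP3 by maximum likelihood; return {name: (AIC, quantile_fn)}."""
    ams = np.asarray(ams, dtype=float)
    fits = {}

    ln = stats.lognorm.fit(ams, floc=0)  # location fixed at 0: two free parameters
    fits["LN"] = (aic(stats.lognorm.logpdf(ams, *ln).sum(), 2),
                  lambda p, a=ln: stats.lognorm.ppf(p, *a))

    ev1 = stats.gumbel_r.fit(ams)
    fits["EV1"] = (aic(stats.gumbel_r.logpdf(ams, *ev1).sum(), 2),
                   lambda p, a=ev1: stats.gumbel_r.ppf(p, *a))

    gev = stats.genextreme.fit(ams)
    fits["GEV"] = (aic(stats.genextreme.logpdf(ams, *gev).sum(), 3),
                   lambda p, a=gev: stats.genextreme.ppf(p, *a))

    # LP3: Pearson type III fitted to log10(Q). The log(Q ln 10) term converts
    # the log-space likelihood to the discharge scale so AIC values are comparable.
    y = np.log10(ams)
    lp3 = stats.pearson3.fit(y)
    loglik = (stats.pearson3.logpdf(y, *lp3) - np.log(ams * np.log(10.0))).sum()
    fits["LP3"] = (aic(loglik, 3),
                   lambda p, a=lp3: 10.0 ** stats.pearson3.ppf(p, *a))
    return fits


def select_best(fits):
    """Return the name of the candidate with the lowest finite AIC."""
    finite = {name: fit for name, fit in fits.items() if np.isfinite(fit[0])}
    return min(finite, key=lambda name: finite[name][0])


# Synthetic annual maximum series (NOT a HYDAT record); 43 hypothetical years.
ams = stats.lognorm.rvs(s=0.5, scale=100.0, size=43, random_state=1)
fits = fit_candidates(ams)
best = select_best(fits)

non_exceedance = 1.0 - 1.0 / RETURN_PERIODS
quantiles = fits[best][1](non_exceedance)
for T, q in zip(RETURN_PERIODS, quantiles):
    print(f"{best}: T = {T:g} yr -> Q = {q:.1f}")
```

Because AIC penalizes each additional parameter, a three-parameter model is selected only when its likelihood gain outweighs the penalty, which is one mechanism by which selection criteria can favour two-parameter models for short records.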

Map identifying the geographic locations of 207 gauge stations and the optimal statistical distribution selected to model the AMS data. Sub-divisions of the tertiary level watershed boundaries are indicated. No sub-basin indicates a single “best fit” distribution type.

The optimal probability distribution curve is used to estimate the flood quantiles for RIs of 1.25, 2, 5, 10, 25, 50, and 100 years for each of the 207 gauge stations. These flood quantiles are consistent with return periods explored in other flood frequency analyses (Ahn and Palmer, 2016; Basso et al., 2016; Hollis, 1975; Onen and Bagatur, 2017). A Shapiro–Wilk analysis tests the null hypothesis that the flood quantile datasets are normally distributed (Table 1). The dataset for each flood quantile does not meet the assumption of normality (

Results of Shapiro–Wilk normality tests for each flood quantile, with and without logarithmic transformation of data.

Twenty-eight attributes are selected (Table 2) to characterize the drainage area conditions representing the topography, precipitation, land use, and hydrological properties from a geomorphological perspective. The drainage area conditions influence channel discharge (across all seven flood frequency quantiles) in terms of the regional geomorphic, hydrological, topographic, and land use properties within each sub-basin.

Twenty-eight variables representing geomorphic, hydrological, land use, and topographic variability between sub-basins.

The upstream drainage area for each georeferenced gauge station is extracted from the hydrologically enforced DEM. A logarithmic transformation is applied to the drainage area variable values to ensure normality (

Raster datasets illustrating

Point information for mean annual precipitation, annual number of precipitation days, mean annual rainfall, and annual number of rainfall days from 65 observation stations is converted to raster coverage using several interpolation techniques. Inverse distance weighting (IDW) and ordinary kriging (OK) using a stable model and an exponential model are compared. OK has been shown to produce accurate results when used to describe spatially heterogeneous natural phenomena (Bevan and Conolly, 2009) such as precipitation. Cross validation suggests fitting an OK exponential model for annual mean precipitation, annual mean rainfall, and the annual number of rainfall days, and an OK stable model for the annual number of precipitation days. The topographic conditions of the sub-basins are extracted and quantified from the hydrologically enforced DEM (Fig. 5c). Percent land use is quantified using the SOLRIS categories for each sub-basin (Fig. 5d). For this study, three land use categories are established: %Urban, %Cropland, and %Naturalized area. %Urban regions combine all transportation and built-up areas. %Cropland is defined by tilled agricultural areas. %Naturalized regions combine all tall-grass land cover, mixed forests, cultivated tree plantations, swamps, wetlands, and open water areas as indicated by the SOLRIS version 3.0 dataset.
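The cross-validation used to compare interpolation techniques for the climate variables can be sketched for the IDW case as follows. The station coordinates and precipitation values are synthetic, the helper names `idw_predict` and `idw_loocv_rmse` are hypothetical, and the kriging variants are omitted because they require a fitted variogram model; the same leave-one-out loop would apply to any interpolator.

```python
import numpy as np


def idw_predict(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query point."""
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at station locations
    weights = 1.0 / dist**power
    return (weights * values).sum(axis=1) / weights.sum(axis=1)


def idw_loocv_rmse(xy, values, power=2.0):
    """Leave each station out in turn, predict it from the rest, and return the RMSE."""
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        pred = idw_predict(xy[keep], values[keep], xy[i:i + 1], power)[0]
        errors.append(pred - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic stand-in for 65 climate stations (coordinates in km, precipitation in mm).
rng = np.random.default_rng(42)
stations_xy = rng.uniform(0.0, 400.0, size=(65, 2))
precip_mm = 800.0 + stations_xy[:, 0] + rng.normal(0.0, 20.0, 65)

# Compare candidate IDW exponents by their cross-validated error.
rmse_by_power = {p: idw_loocv_rmse(stations_xy, precip_mm, power=p) for p in (1.0, 2.0, 3.0)}
for p, rmse in rmse_by_power.items():
    print(f"IDW power {p:g}: LOOCV RMSE = {rmse:.1f} mm")

# Interpolate onto a coarse grid, as would be done to build a raster surface.
gx, gy = np.meshgrid(np.linspace(0.0, 400.0, 9), np.linspace(0.0, 400.0, 9))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
grid_precip = idw_predict(stations_xy, precip_mm, grid_xy)
```

IDW estimates are weighted averages of the observations, so they cannot exceed the observed range; this is one reason a model-based method such as kriging may perform better where the underlying field has a trend.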

Regression of the logDrainage variable against each of the seven flood quantile datasets (i.e.,

Single-variate RFFA models for each flood quantile. The

The single-variable discharge–drainage area relationship for the

Expressing the

A PCA of the 28 explanatory variables produces seven components with eigenvalues greater than 1.00 (eigenvalues for Dim.1 through Dim.7 are 7.85, 5.48, 2.87, 2.25, 1.84, 1.09, and 1.04, respectively) that explain 80.1 % of the total variability of the dataset. This suggests an absence of strong inter-correlations among many of the 28 variables. The latent root criterion (also known as the Kaiser or eigenvalue-one criterion) suggests retaining and interpreting principal components if the eigenvalue is greater than 1.00 (Kaiser, 1960). However, using the point where the first few eigenvalues depart from the more similar lesser eigenvalues (i.e., the broken-stick model) (Jackson, 1993), suggests retaining the first three dimensions which account for almost 58 % of the total variability of the dataset and are the most interpretable. The correlation circles illustrate the projections of the first three principal components (Dim1, Dim2, and Dim3) (Fig. 7). Highly correlated variables project in the same direction. The first principal component (Dim1) tends towards a land use composition grouping with some loading from gradient variables and precipitation variables (Dim1 explains 28.0 % of the variance). The second principal component (Dim2) tends towards an elevation cluster with additional loading from precipitation variables (Dim2 explains 19.6 % of the variance). The third principal component (Dim3) is a weakly defined land surface grouping with loading from surficial geology classifications and basin geometry (Dim3 explains 10.2 % of the variance). While elevation is a clear contributor to the variance of the dataset based on the PCA tests, the directional indicators suggest the presence of multicollinearity among the elevation predictors which is reinforced by the results of correlation detection.
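The variance percentages quoted above follow directly from the eigenvalues, assuming the PCA is performed on standardized variables (a correlation-matrix PCA), so that the total variance equals the number of variables, 28. That assumption is made here for illustration; the short check below uses only the eigenvalues reported in the text.

```python
import numpy as np

# Eigenvalues for Dim.1 through Dim.7 as reported in the text.
eigenvalues = np.array([7.85, 5.48, 2.87, 2.25, 1.84, 1.09, 1.04])
n_variables = 28  # total variance of standardized variables (assumed correlation-matrix PCA)

explained_pct = eigenvalues / n_variables * 100.0
cumulative_pct = np.cumsum(explained_pct)

# Latent root (Kaiser) criterion: retain components with eigenvalue > 1.
n_kaiser = int((eigenvalues > 1.0).sum())

for i, (e, c) in enumerate(zip(explained_pct, cumulative_pct), start=1):
    print(f"Dim{i}: {e:5.2f} % of variance, cumulative {c:5.2f} %")
print(f"Components retained by the latent root criterion: {n_kaiser}")
```

Under this assumption the seven retained components sum to 22.42 of 28 units of variance (about 80.1 %), and the first three sum to 16.20 (about 57.9 %), matching the figures given in the text.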

Principal component analysis (PCA) highlighting the most contributing variables of the 28-variable dataset for each dimension and illustrating the correlation circles for principal components one, two, and three (Dim1, Dim2, and Dim3).

Simple Pearson correlation tests (

Results from model regression of all 19 predictors indicating predictor variables found to be statistically significant in the 19-variable model. The remaining variables are not found to be statistically significant in 19-variable regressions.

A common practice is to test and transform independent variables to ensure that a multivariate dataset is normally distributed; however, tests for multivariate normality are rarely performed (Tacq, 2010). Alternatively, the plots of standardized residuals from combinations of predictor variables are examined for a desired elliptically symmetric distribution (Sheather and Oostrom, 2009). To enable model comparison, the gauge drainage area predictor variable included in the multivariate regression is logarithmically transformed, consistent with the single-variable power model.

Multiple linear regression is applied to the remaining 19-variable dataset for each of the seven flood RIs, i.e., 1.25, 2, 5, 10, 25, 50, and 100 years, using an OLS approach. The fitted values of the model are compared to the dependent variables (i.e.,

Examination of the associated

Regression results of five- and three-predictor models over all flood quantiles tested. Variables with some explanatory contribution in five- and three-predictor models are indicated by an “X”.

The most parsimonious two-variate RFFA models for each flood quantile.

The five- and three-predictor models retain surficial material variables (%Organic, %Sand, and %Gravel) and climate variables (Mean_Precip, Precip_Days, and Rainfall_Days). Reducing from five-variable models to three-variable models decreases the adjusted

The single-variable and two-variable fitted discharge models are plotted against the best-fit estimated discharge derived from the flood frequency curves for

Leave-one-out cross validation (LOOCV) and analysis of variance (ANOVA) comparing the single-variable models to the two-variable models.

For all seven flood quantile models, the addition of the %Naturalized predictor variable reduces the residual standard error (Res SE) and increases the adjusted coefficient of determination (adj

An analysis of variance (ANOVA) (Table 7) comparing the single-variable models to the two-variable models further indicates an improved prediction of discharge using the two-predictor model compared with the one-predictor model. For all seven flood quantiles, a decrease in the sum of squares of residuals (RSS) is observed with the addition of the %Naturalized predictor, and an

Flood magnitude, frequency, and duration are primary drivers of channel erosion and stream morphology (Taniguchi and Biggs, 2015). High-magnitude, less-frequent floods will undoubtedly result in significant alterations to a channel's morphology and are more important when considering hazards, loss of life and infrastructure damage (Onen and Bagatur, 2017). However, the cumulative effects of more frequent, lower-magnitude floods can also be geomorphically more effective in altering channel form (Church and Ferguson, 2015; Wolman and Gerson, 1978; Wolman and Miller, 1960). Consequently, for effective risk management and hazard prevention, it is useful to model flows of different flood RIs when considering flood frequency as a predictive tool to better understand a river's morphological response to discharge (Basso et al., 2016). The best estimation of extreme flood events, however, is limited by the availability and accuracy of recorded gauge data, the length of the observed flood series, and the presence or absence of extreme flood occurrences within a flow record (Odry and Arnaud, 2017). This analysis uses a broad range of high- and low-frequency flood estimates from long-term historical flow data to develop a reliable RFFA for urban planning and infrastructure engineering. It is common practice to develop an RFFA relating the drainage area of a catchment to channel discharge using a single-variable power-law relationship. Research suggests that physiographic features, such as those inherited by southern Ontario's glacial legacy, and anthropogenic land use, for example, southern Ontario's clustered urbanization and widespread agricultural development, can influence a region's hydrogeomorphic response, particularly in smaller watersheds (Royall, 2013). 
Seeking to improve upon a widely accepted single-variate RFFA model in a heterogeneous landscape, the objective of this study was to explore a dependable RFFA using a multivariate approach for a region influenced by glacial conditioning and varying land use while also considering the hydrological influences of climate and topography.

In this study, rigorous goodness-of-fit testing of annual maximum mean daily discharge data series from 207 hydrometric gauge stations in a heterogeneous landscape shows that 42.5 % of gauge records are most suited to a two-parameter LN distribution, 31.9 % to a two-parameter EV1 distribution, 21.7 % to a three-parameter LP3 distribution, and 3.9 % to a three-parameter GEV distribution. This suggests that all four distributions are potentially suitable for modelling flood extremes in heterogeneous regions. The model selection criteria favoured a two-parameter model over a three-parameter model in 74.4 % of cases, consistent with other studies which found that selection criteria demonstrate a predisposition towards the most parsimonious model (i.e., fewest distribution parameters) (Farooq et al., 2018; Laio et al., 2009; Onen and Bagatur, 2017). Most notably, the two-parameter EV1 model is optimal five times more frequently than its three-parameter parent model, the GEV distribution, which is only found appropriate for use in 3.9 % of cases. This finding is similar to that of Laio et al. (2009) where the GEV distribution was only selected in a limited number of cases when modelling the annual maxima of peak discharge in 1000 United Kingdom basins. However, the GEV and LP3 distributions are heavier tailed than the LN or EV1 distributions (El Adlouni et al., 2008; Merz et al., 2022; Papalexiou et al., 2013) suggesting the upper tail behaviour of a flood time series may be underestimated when estimating flood frequency from small sample sizes (less than 50) while a single extreme flood event may lead to overestimation of the upper tail (Papalexiou et al., 2013). 
The average hydrological record in this study was 42.5 years which implies uncertainty in estimating extreme quantiles in the study region due to a limited record length of some gauges, but research has indicated that basins with snowmelt-dominated regimes tend towards lighter tailed distributions (Merz et al., 2022).

Flood estimation will often apply a universal, fixed probabilistic model to historical gauge data (Di Baldassarre et al., 2009). Other southern Ontario studies have employed a blanket LP3 probability distribution to model the

Other studies have explored a variety of novel regionalization approaches. Di Lazzaro et al. (2015) presented an RFFA using a single-variable parameterization of drainage density. Ahn and Palmer (2016) estimated flood frequency using the GEV distribution and then proposed regionalization methods using a spatial proximity approach. However, regionalization based on spatial proximity assumes that nearby sites are more similar than distal sites (Odry and Arnaud, 2017). In a glacially conditioned landscape, such as the southern Ontario region, the configuration of glacial deposits (Fig. 5a) often forms drainage divides that segregate neighbouring catchments with diverse flood characteristics. This study, therefore, explores regionalization through a multivariate regression-based approach to capture the variability of upstream hydrological controls that are often dependent on the spatial arrangement of post-glacial physiographic features and, in the case of southern Ontario, the variable land use (i.e., regionally clustered urbanization and agricultural development). The mapping of surficial material, climate conditions, topography, and land use illustrates the variability of hydrological influences on the region (Fig. 5). Consistent with the agricultural land use of southern Ontario, analysis reveals a negative correlation between %Cropland and Gradient_Mean. Regions of steep gradient are not typically associated with areas of high agricultural activity, whereas lower gradient regions provide much of the agricultural/cropping activity. Crops are typically cultivated in areas with favourable conditions for growth (i.e., rainfall and gradient) producing collinear relationships with key elevation and precipitation variables relevant to channel discharge. Likewise, the high spatial variability in surficial geology of southern Ontario (due to its glacial conditioning) can be problematic. 
Contrasting geomorphic conditions between catchments are represented by, for example, high negative correlations among %Diamicton and %Sand, and an absence of surficial material types in many areas (e.g., %Bedrock and %Clay) produces high incidences of zero values and non-linear relationships. Conversely, a measure of natural land use is available across the study region, making a linear relationship between %Naturalized and discharge possible.

During the backward elimination process, different land use, geomorphic, climatic, and topographic variables assume different importance in predicting channel flow depending on the flood magnitude being modelled. The influence of the glacial legacy is captured by the inclusion of surficial materials in the five- and three-variable models (Table 5). The less parsimonious, but still statistically valid, five- and three-predictor models show the importance of land cover/glacial legacy (%Organics, %Sand, and %Gravel) and climate variables (rainfall days). In three-variable models, precipitation (i.e., Mean_Precip or Rainfall_Days) increases model fit for lower magnitude and more frequent flood events (i.e.,

Although the most parsimonious model for estimating discharge is the generally accepted and efficient single-variable relation between discharge and drainage area, the two-predictor combination of upstream drainage area and the regional percentage of naturalized landscape (%Naturalized) shows a 5 % improvement in explaining variation in flood discharge for all RIs tested (i.e., 1.25, 2, 5, 10, 25, 50, and 100 years). An analysis of variance further indicates a statistically significant improvement in the prediction of discharge using the two-predictor model (i.e., logDrainage and %Naturalized) compared with the single-predictor model (i.e., logDrainage). The percentage of naturalized landscape is important because it reflects areas within a catchment that have enhanced water storage compared with urban or agricultural areas. These findings are important for situations in which it is necessary to reduce uncertainty in flood prediction. Plots comparing the single- and two-predictor models show less scatter for the two-predictor model across all seven flood quantiles. Generally, an increase in model scatter is observed for both one-variable and two-variable prediction as the RI increases, suggesting the predictive capability lessens moving from
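The comparison between the single- and two-predictor models can be sketched as a nested-model F test, which is one standard form of the analysis of variance for regression. The sketch below is illustrative only: the synthetic data, the coefficient values used to generate them, and the helper names `fit_ols`, `adjusted_r2`, and `nested_f` are assumptions and do not reproduce the fitted models of this study.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, RSS)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def adjusted_r2(rss, y, n_params):
    """Adjusted R^2 for a model with n_params coefficients (incl. intercept)."""
    n = len(y)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (rss / (n - n_params)) / (tss / (n - 1))

def nested_f(rss_small, rss_big, n, p_small, p_big):
    """F statistic testing whether the larger nested model fits better."""
    return ((rss_small - rss_big) / (p_big - p_small)) / (rss_big / (n - p_big))

# Synthetic gauges (hypothetical values, for illustration only).
rng = np.random.default_rng(42)
n = 150
log_area = rng.uniform(1.0, 4.0, n)   # assumed log10 of drainage area
pct_nat = rng.uniform(5.0, 60.0, n)   # assumed %Naturalized
log_q = -0.5 + 0.8 * log_area - 0.006 * pct_nat + rng.normal(0.0, 0.15, n)

# Single-predictor model: logQ ~ logDrainage
beta1, rss1 = fit_ols(log_area[:, None], log_q)
# Two-predictor model: logQ ~ logDrainage + %Naturalized
beta2, rss2 = fit_ols(np.column_stack([log_area, pct_nat]), log_q)

f_stat = nested_f(rss1, rss2, n, p_small=2, p_big=3)
print("adjusted R2, one predictor :", adjusted_r2(rss1, log_q, 2))
print("adjusted R2, two predictors:", adjusted_r2(rss2, log_q, 3))
print("F statistic for adding %Naturalized:", f_stat)
```

The F statistic is compared against an F distribution with 1 and n − 3 degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, 1, n - 3)` gives the corresponding p value. In practice the same comparison would be repeated for each of the seven flood quantiles.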

The findings of this research demonstrate that land use has greater predictive power than surficial geology when coupled with drainage area to estimate channel discharge in a heterogeneous landscape over a broad range of flood quantiles. The methodology used in this study is transferable to other regions, and this finding may be as well. However, a new scenario would require recalibration of the drainage area relationship and possibly reclassification of land use types to suit the spatial variation of the new location. Human landscape alterations that impact drainage density will influence rates of overland flow and channel flow, exerting additional influence on hydrological processes and stream response and, subsequently, impacting the magnitude and frequency of peak channel flows (Taniguchi and Biggs, 2015). Changes to land cover, such as deforestation, conversion to cropping, and urbanization, typically decrease infiltration, which increases discharge and alters flood magnitude (Chin et al., 2013; Royall, 2013). It follows that the presence of reforested or natural areas will have a significant influence on modelled discharge. Since the early 1900s, select areas of southern Ontario have been reforested in recognition of wasteful clearing of marginal and submarginal agricultural lands by early settlers (Armson et al., 2001). The %Naturalized variable includes tall-grass land cover, mixed forests, cultivated tree plantations, swamps, wetlands, and open water areas, representing areas of high infiltration or surface storage of water. The negative coefficient for the percentage of naturalized area reduces the weight of the drainage area input. This is consistent with the theoretical expectation that the drainage areas of sub-basins with a high percentage of naturalized areas may be overemphasized without an appropriate correction for surface water storage. 
Although urbanization has been shown to have the most profound influence on fluvial system response, altering hydrological processes through (a) a decrease in infiltration, (b) an increase in overland flow, and (c) a potential decrease in groundwater recharge (Chin et al., 2013), the regional impact of clustered urban populations of southern Ontario is diluted by the expansive regions of cropland, grazing, and naturalized areas that separate them. Consequently, the %Urban variable shows minimal significance in the multivariate regression. Similarly, %Cropland was shown to be a poor regional predictor for discharge due to a collinear relationship with other predictors. The statistical significance of %Naturalized, however, suggests that the percentage of a sub-basin that is naturalized can be an effective variable to represent temporary surface water storage, limiting the impact to a channel during flood events.
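The way a negative %Naturalized coefficient acts as a correction to the drainage area relationship can be made explicit by writing the two-predictor model in a generic form. The notation is assumed here for illustration and is not taken from the fitted models: $Q_T$ is the flood quantile for recurrence interval $T$, $A$ is upstream drainage area, $N$ is the percentage of naturalized landscape, $\beta_0$, $\beta_1$, and $\beta_2$ are regression coefficients, and the base-10 logarithm is an assumption.

$$
\log_{10} Q_T = \beta_0 + \beta_1 \log_{10} A + \beta_2 N, \qquad \beta_2 < 0
$$

$$
Q_T = 10^{\beta_0} \, A^{\beta_1} \, 10^{\beta_2 N}
$$

Under these assumptions, the first two factors reproduce the familiar power-law relation between discharge and drainage area, and the third factor is at most 1 for $N \ge 0$, so the predicted discharge for a given drainage area is scaled down as the naturalized share of the sub-basin increases.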

The primary objective of this research was to explore additional explanatory hydrological controls that improve the predictive strength of a well-known regional flood frequency approach correlating drainage area to discharge, so that flood discharge information can be transferred from gauged sites to ungauged sites in a heterogeneous landscape (e.g., a low-relief, glacially conditioned landscape with variable land use). The main conclusions of this analysis are as follows:

When modelling the annual maximum mean daily discharge records for southern Ontario, 42.5 % are best suited to a two-parameter LN distribution, 31.9 % to EV1, 21.7 % to LP3, and 3.9 % to a GEV distribution, suggesting that all four distributions tested are potentially suitable for modelling flood extremes in a heterogeneous landscape. The variation in “best fit” probability distributions indicates that systematic model selection criteria are necessary when fitting observed flow data in regions with variable land use or other hydraulic influences (i.e., geomorphology, climate, or topography).

For lower-magnitude, more-frequent flood events (i.e.,

While land use, geomorphology, material type, climate, and topographic variables vary in importance depending on the flood magnitude being modelled, the results here show that the most parsimonious predictor for estimating discharge in ungauged streams is the accepted and efficient single-variable drainage area.

When considering model variance, a two-predictor combination of upstream drainage area and the regional percentage of naturalized landscape shows a statistically significant 5 % improvement when explaining variation in flood discharge for a broad range of recurrence intervals tested (i.e., 1.25, 2, 5, 10, 25, 50, and 100 years). The negative coefficient associated with the percentage of naturalized area serves as a correction to the drainage area relationship to account for surface water storage. This finding is important for situations when it is necessary to reduce uncertainty in flood prediction.

The findings suggest that applying a zonal two-variable model, which accounts for drainage area and the percentage of upstream naturalized land use, serves as a correction for surface water storage when modelling flood magnitude for high- and low-frequency flood events. This improvement is of value in a heterogeneous landscape where greater precision is required, for example when considering the geomorphic response of channels to predicted channel discharge for a broad range of flood recurrence intervals.

The GEV distribution uses a three-parameter probability distribution function such that

The data that support the findings of this study are available from the corresponding author, Pamela E. Tetford, upon reasonable request.

PET: conceptualization (lead); methodology (lead); investigation (lead); formal analysis (lead); writing – original draft (lead); writing – review and editing (equal). JRD: conceptualization (supporting); methodology (supervision); investigation (supervision); formal analysis (supervision); writing – original draft (supporting); writing – review and editing (equal).

The contact author has declared that neither of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC-CGS-D graduate scholarship to Pamela E. Tetford) and research funding from the University of Toronto to Joseph R. Desloges. Many thanks to Esther Bushuev, who helped compile data from the WSC flow gauges for analysis. Academic advice and insights from Marney Isaac, Carl Mitchell, and Michael Widener are greatly appreciated. We also thank our anonymous peer reviewers for their insightful comments, which improved the manuscript.

This research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC-CGS-D graduate scholarship to Pamela E. Tetford) and research funding from the University of Toronto to Joseph R. Desloges.

This paper was edited by Efrat Morin and reviewed by two anonymous referees.