Advancing stream classification and hydrologic modeling of ungaged basins for environmental flow management in coastal southern California

Environmental streamflow management can improve the ecological health of streams by returning modified flows to more natural conditions. The Ecological Limits of Hydrologic Alteration (ELOHA) framework for developing regional environmental flow criteria has been implemented to reverse hydromodification across the heterogeneous region of coastal southern California (So. CA) by focusing on two elements of the flow regime: streamflow permanence and flashiness. Within ELOHA, classification groups streams by hydrologic and geomorphic similarity to stratify flow-ecology relationships. Analogous grouping techniques are used by hydrologic modelers to facilitate streamflow prediction in ungaged basins (PUB) through regionalization. Most watersheds, including those needed for stream classification and environmental flow development, are ungaged. Furthermore, So. CA is a highly heterogeneous region spanning a gradient of urbanization, which presents a challenge for regionalizing ungaged basins. In this study, we develop a novel classification technique for PUB modeling that uses an inductive approach to group regional streams by modeled hydrologic similarity, followed by deductively determining class membership with hydrologic model errors and watershed metrics. As a new type of classification, this “Hydrologic Model-based Classification” (HMC) prioritizes modeling accuracy, which in turn provides a means to improve model predictions in ungaged basins, while complementing traditional classifications and improving environmental flow management. HMC is developed by calibrating a regional catalog of process-based rainfall-runoff models, quantifying the hydrologic reciprocity of calibrated parameters that would be unknown in ungaged basins, and grouping sites according to hydrologic and physical similarity.
HMC was applied to 25 USGS streamflow gages in the south coast region of California and was compared to other hybrid PUB approaches combining inductive and deductive classification. Using an Average Cluster Error metric, results show HMC provided the most hydrologically similar groups according to calibrated parameter reciprocity. Hydrologic Model-based Classification is relatively complex and time-consuming to implement, but it shows potential for advancing ungaged basin management. This study demonstrates the benefits of thorough stream classification using multiple approaches, and suggests that Hydrologic Model-based Classification has advantages for PUB and building the hydrologic foundation for environmental flow management.
Deductive classification produced relatively low uncertainty of model parameters, with all five classes containing ACE values between 0.2 and 0.6. The relatively tight spread, coupled with a low overall ACE (Fig.), implicates deductive classification as a worthy alternative to HMC for regionalization of ungaged basins. These results are consistent with the most common implementation of regionalization, wherein models are typically grouped by spatial proximity, physical similarity, or parameter regression (Oudin et al., 2008; Razavi and Coulibaly, 2013; Samuel et al., 2011). This study has shown how a new type of “streamflow regionalization”, akin to Hydrologic Model-based Classification, might edge out traditional “hydrologic regionalization” from deductive classification at estimating streamflow in ungaged basins. “Hydrologic regionalization” and “streamflow regionalization” both implement watershed characteristics to separate sites for high utility in modeling ungaged basins; however, “streamflow regionalization” improves modeling by directly incorporating a quantifiable measure of ungaged model accuracy. This important addition to “streamflow regionalization” directly captures regional model uncertainty and strengthens the science supporting modeling ungaged basins.


Introduction
The natural variability of streamflow regimes, including flow magnitude, duration, frequency, timing, and rate of change (Poff et al., 1997), is crucial for maintaining the ecological integrity of streams (Bunn and Arthington, 2002).
Maintenance of aquatic and riparian ecosystem functions is a major priority for water managers; however, streamflow regimes have been altered globally as population growth and development lead to urbanization, dams, flow extraction, and other land use changes (Naiman et al., 1995; Richter et al., 1997). Environmental flow criteria frameworks, such as the Ecological Limits of Hydrologic Alteration (ELOHA) (Poff et al., 2010), are methods for protecting the ecological health of streams from hydrologic alteration by reestablishing essential elements of streamflow and sediment regimes. The ELOHA framework is robust because it synthesizes many flow-ecology relationships from a study area to provide a foundation for developing environmental flow recommendations within an entire municipality or management region (Poff et al., 2010). Such a regional approach has been recommended for the widespread implementation of environmental flows because it allows for effective and comprehensive estimation of environmental streamflow regimes at a wide variety of streams in a large and diverse study area (Arthington et al., 2006). The coastal area of southern California (So. CA) is experiencing substantial hydrologic alteration (Hawley and Bledsoe, 2011) and associated ecological decline (Stein et al., 2012), which has prompted application of ELOHA (Parker et al., 2019; Pyne et al., 2017; Sengupta et al., 2018; Stein et al., 2017). The region is highly heterogeneous, spanning an extensive range of geology, stream types, and land uses, which presents unique challenges for implementing ELOHA.
Stream classification is one of four major steps within the scientific process of ELOHA used to group hydrologically, or otherwise similar, streams (Poff et al., 2010). Its primary role towards developing environmental flows is to stratify flow-ecology relationships by regional stream type, and to help determine where new bioassessment sites should be placed to strengthen the variety of sites within a region. Olden et al. (2012) outlined two overarching approaches to hydrologic classification: those utilizing inductive reasoning (observed or modeled flows) and those utilizing deductive reasoning (watershed data characterizing flow). While the inductive approach benefits from actual measures of discharge, it is often plagued by insufficient gauging networks (Olden et al., 2012) and uncertainty modeling ungaged basins (Blöschl et al., 2013).
Two mirroring state-wide stream classification studies utilizing both inductive and deductive approaches have recently been performed across California. Pyne et al. (2017) first clustered all stream reaches based on similarity of watershed characteristics, then used hydrologic metrics to determine cluster membership and separate reference reaches. Conversely, Lane et al. (2017) grouped the natural streamflow regime of all reaches before using watershed characteristics to determine flow type. A third state-wide classification study was performed by Lane et al. (2018), which unified the classifications of Pyne et al. (2017) and Lane et al. (2017) by using daily-scale hydrologic baseline archetypes based on dimensionless reference hydrographs. These three stream classification studies focused on characterizing natural flow regimes across California, which is a challenge in the heavily hydrologically modified and heterogeneous Southern Coast hydrologic region of CA (Waananen and Crippen, 1977). Sites from this region did not show strong separation from the rest of CA in previous classifications. While most South Coast streams were classified as "rain and seasonal groundwater" (Lane et al., 2017) or "rain and seasonal groundwater" and "flashy, ephemeral rain" (Lane et al., 2018), not one of the 91 reference gages used to drive the Lane et al. (2017) classification fell in the South Coast. Furthermore, streams in the Mojave Desert and Central Valley shared the same "rain and seasonal groundwater" classification as South Coast streams (Lane et al., 2017). Central Valley streams remained grouped with South Coast streams in the unified classification (Lane et al., 2018). Finally, none of the seven classes produced by Pyne et al. (2017) were dominated by South Coast streams. The results of these three state-wide classifications indicate that developing environmental streamflow criteria for South Coast streams could benefit from a more targeted classification focused on the diverse regional landscape.
https://doi.org/10.5194/hess-2021-553 Preprint. Discussion started: 10 January 2022 © Author(s) 2022. CC BY 4.0 License.
Regionalization is a common framework for predicting streamflow in ungaged basins (PUB) that is performed by transferring hydrologic information from gaged systems to ungaged systems (Blöschl et al., 2013; Razavi and Coulibaly, 2013). While regionalization often employs regression equations to compute singular streamflow metrics, such as peak flow, continuous hydrologic models offer process-based analyses with full hydrograph outputs that can be used to analyze past and future climate, land use, and management scenarios. The application of hydrologic models to these alternative scenarios makes them important for developing the hydrologic foundation within ELOHA (Poff et al., 2010). Additionally, a hydrologic foundation often necessitates modeling of ungaged basins because crucial bioassessment sites used to develop flow-ecology relationships often occur on small streams without available representative streamflow data (Poff and Ward, 1989). Despite the clear importance of PUB to ELOHA and other stream management efforts, no superior method for regionalizing hydrologic models has emerged (Blöschl et al., 2013).
In a typical flow regionalization effort with hydrologic models, a network of models is created and calibrated at gaged sites across a study area. For ungaged sites within the network, model parameters that cannot be calculated directly are estimated and/or transferred from the catalog of calibrated models, typically using a measure of spatial proximity, physical similarity, or parameter regression (Oudin et al., 2008; Razavi and Coulibaly, 2013; Samuel et al., 2011). While spatial proximity is generally the preferred regionalization approach (Razavi and Coulibaly, 2013), it is not always superior and is less applicable in highly heterogeneous regions, such as So. CA, where neighboring watersheds may have substantially different geology, land use, and/or climate. These challenges with applying a traditional regionalization approach in a highly heterogeneous region provide opportunities for PUB innovations. Furthermore, the technique of grouping similar streams is shared by ELOHA and PUB, which provides an excellent opportunity to explore new approaches for classifying streams with the intention of modeling ungaged basins while developing environmental flow criteria in a highly heterogeneous region. This study was motivated by a desire to improve the science supporting environmental streamflows in So. CA, where flow criteria are under development (Parker et al., 2019; Sengupta et al., 2018; Stein et al., 2017). In this study, we develop a new method of stream classification that quantifies hydrologic similarity for regionalizing ungaged basins in a heterogeneous region. We compare this new approach to traditional methods of stream classification using hydrologic and watershed characteristics. Towards this end, this study has three specific objectives:

2) Develop and implement a new approach for stream classification that prioritizes the accuracy of regional hydrologic models; and
3) Compare the accuracy of traditional classifications versus the new approach for estimating streamflow and flow-ecology relationships in heterogeneous ungaged basins.
We hypothesize that directly incorporating regional model accuracy into a stream classification scheme will provide information complementary to existing deductive and inductive schemes and demonstrate greater ability to accurately model ungaged basins through regionalization, compared to the traditional classifications.

Study Area
This study was focused within the large coastal region of southern California, which is roughly bounded by the transverse mountain ranges to the north, Mexico to the south, the peninsular mountain ranges to the east, and the Pacific Ocean to the west. Study watersheds lie within the coastal regions of San Diego, Riverside, Orange, San Bernardino, Los Angeles, Ventura, and Santa Barbara Counties, and are considered within the "South Coast" hydrologic region of CA according to the U.S. Geological Survey (USGS) (Waananen and Crippen, 1977). The climate is characterized as semi-arid and Mediterranean with hot, dry summers and mild, wet winters. Diverse regional topography, geology, and precipitation patterns allow for the natural existence of many stream types, spanning perennial, intermittent, and ephemeral. Land use varies dramatically across the region, ranging from heavily urban and suburban sprawl, to significantly agricultural, to rural coastal and mountainous.
These diverse land uses profoundly influence streamflows, with particular deviation from natural flow regimes occurring due to the urban centers of Los Angeles and San Diego concurrently with the California State Water Project.
As a first step towards developing environmental flow criteria, only USGS stream gage sites were considered with neighboring bioassessment sites from the California Water Boards' Perennial Streams Assessment (PSA) within the Surface Water Ambient Monitoring Program (SWAMP). This provided gaged flow estimates at bioassessment sites. Hydrologic surrogacy between gage and bioassessment sites was assumed by ensuring a difference in watershed area of less than 15% with no intervening dams, diversions, reservoirs, or interbasin transfers. Gages from the region were selected to contain high-resolution hourly streamflow data for water years (WY) 2005-2007, which typify relatively wet, average, and dry years, respectively, in So. CA (WRCC, 2015). Finally, watersheds of selected gages required sufficient meteorological and landscape data to build minimally calibrated rainfall-runoff models (Sect. 2.3.1). An exhaustive search for suitable streamflow records yielded 25 USGS gage sites for classification (Fig. 1; Table A1).

Traditional Classification
Three types of traditional classification were used in this study: an inductive approach with gaged flow data, a deductive approach utilizing watershed characteristics, and a combined inductive and deductive approach applying both types of data.

Inductive Approach
Previous research in So. CA has shown that streamflow flashiness and drying have an important influence on shaping local benthic macroinvertebrate assemblages (Gasith and Resh, 1999; Mazor et al., 2018; Parker et al., 2019). This influence makes them strong metrics for developing flow-ecology relationships to guide environmental flow recommendations. To this end, flashiness and drying have been extensively studied for developing regional flow criteria (Parker et al., 2019; Pyne et al., 2017; Sengupta et al., 2018; Stein et al., 2017); however, additional elements of the natural flow regime are also important drivers of ecological health in CA (Yarnell et al., 2020). For this study, the Richards-Baker Flashiness Index (RBI) (Baker et al., 2004) and a metric quantifying the frequency of extremely low flows indicative of drying were computed from the 25 hourly time series of discharge. RBI was calculated according to Eqn. 1, wherein $Q_t$ is the discharge at time $t$, $Q_{t+1}$ is the discharge at the time step after $t$, and $T$ is the final time step.

$$\mathrm{RBI} = \frac{\sum_{t=1}^{T-1} \left| Q_{t+1} - Q_t \right|}{\sum_{t=1}^{T} Q_t} \qquad (1)$$

To quantify the frequency of extremely low flows indicative of drying, the fraction of the flow record with flow less than 1 cfs was calculated according to Eqn. 2, wherein $N_{Q<1\,\mathrm{cfs}}$ is the number of time steps containing streamflow less than 1 cfs and $N$ is the total number of time steps containing flow data.

$$\mathrm{Fraction}_{<1\,\mathrm{cfs}} = \frac{N_{Q<1\,\mathrm{cfs}}}{N} \qquad (2)$$

Although flows less than 1 cfs are recorded by USGS, this threshold was chosen to indicate stream drying given the inherent measurement error associated with stream gage data at extreme low flows. Due to So. CA's heterogeneous landscape, large variations in land use, topography, and precipitation shape flow permanence and flashiness across the region (Table A1).
To better discern the effects of these heterogeneities on streamflow, and to more accurately capture time-sensitive environmental flow metrics on a scale relevant to benthic macroinvertebrates, hourly data were chosen over daily.
Additionally, high resolution hourly data across So. CA provide an opportunity to complement the previous state-wide classifications (Lane et al., 2017;Lane et al., 2018;Pyne et al., 2017), which used daily data, at finer temporal and spatial scales.
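As an illustration of the two flow metrics described above, the following is a minimal Python sketch that computes RBI and the fraction of time steps below 1 cfs from a discharge series. The function names and the sample hourly record are hypothetical and are not taken from the study; the RBI form follows the verbal definition given for Eqn. 1 (sum of absolute step-to-step changes divided by total discharge).

```python
def richards_baker_index(q):
    """RBI: sum of absolute step-to-step changes divided by total flow."""
    if len(q) < 2:
        raise ValueError("need at least two time steps")
    total = sum(q)
    if total == 0:
        raise ValueError("RBI is undefined for an all-zero record")
    changes = sum(abs(q[t + 1] - q[t]) for t in range(len(q) - 1))
    return changes / total


def low_flow_fraction(q, threshold_cfs=1.0):
    """Fraction of time steps with flow strictly below the threshold."""
    if not q:
        raise ValueError("empty flow record")
    return sum(1 for value in q if value < threshold_cfs) / len(q)


# Hypothetical hourly discharge record in cfs (illustrative values only).
hourly_q = [0.5, 0.5, 4.0, 12.0, 6.0, 2.0, 0.8, 0.2]
print("RBI:", richards_baker_index(hourly_q))
print("Fraction < 1 cfs:", low_flow_fraction(hourly_q))
```

Because RBI accumulates every step-to-step change, the same storm sampled hourly generally yields a different value than when sampled daily, which is one reason the temporal resolution of the input series matters for this metric.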
Inductive classification was performed to group sites based on similarity of streamflow flashiness (RBI) and permanence (< 1 cfs). To achieve this, a variety of exploratory ordination analyses were conducted to develop an initial understanding of how gages might classify. Weighted classical (metric) multidimensional scaling within the "vegan" package of R (Oksanen et al., 2019) complemented principal component analysis (PCA) and a scree plot from the "stats" package (R Core Team, 2019). Classification was ultimately determined using K-means clustering from the NbClust package in R (Charrad et al., 2014) after assessing the following indices: C-Index, Dunn, McClain, and Silhouette.

Deductive Approach
For traditional deductive classification, watershed data describing USGS streamflow gages were retrieved from the USGS's GAGES-II database (Falcone, 2011) and the U.S. Environmental Protection Agency's (EPA) NHDPlusV2 database (McKay et al., 2012). Correlation analysis was performed with the "stats" package in R (R Core Team, 2019) to remove highly correlated watershed metrics. Finally, the same exploratory ordination analyses and clustering process as the inductive approach provided results for traditional deductive classification.

Combined Inductive and Deductive Approaches
Inductive and deductive methods of stream classification were combined in multiple ways. First, a single K-means clustering analysis was performed using the hydrologic metrics (RBI and < 1 cfs) and the best performing watershed variables from the deductive classification. Next, multinomial logistic regression within the "nnet" package of R (Venables and Ripley, 2002) was used to determine if flow metrics could predict deductively produced clusters, and likewise used to see if landscape metrics could predict inductively produced clusters. Finally, the USGS has categorized streamflow gages containing minimally disturbed watersheds without significant flow alteration as "reference" within the GAGES-II database (Falcone, 2011).
Multinomial logistic regression with flow and watershed metrics was again used to predict whether a gage was reference or non-reference.

Hydrologic Model-based Classification
Hydrologic Model-based Classification (HMC) first requires the accurate creation and calibration of rainfall-runoff models across a region, exactly like regionalization for estimating streamflow in ungaged basins. Parsimonious and minimally calibrated models are important to HMC so that physical relationships between regional watershed variables and highly uncertain model parameters might be established. Rather than using traditional inductive measures of streamflow to assess hydrologic similarity for classification, HMC quantifies the hydrologic similarity between two sites as the reciprocating model accuracy when calibrated parameters from one model are donated to the other and vice versa. Representing hydrologic similarity with model errors produced by a regional range of parameters is a new idea in regionalization that can be used to quantify and reduce parameter uncertainty. Calibrated parameters inherently have greater uncertainty than directly calculated parameters, and this uncertainty is substantially increased in ungaged basins where calibration cannot occur. HMC uses jackknife resampling of complete calibrated parameter sets for all models across the region to generate a model-error matrix of hydrologic similarity spanning the region. The regional error matrix can be interpreted as quantitatively describing parameter uncertainty for the most uncertain parameters across a region. In HMC, the error matrix is used as an inductive basis of hydrologic similarity and combined with a deductive approach to produce a new combined classification that directly incorporates regionalization and reduces parameter uncertainty in models of ungaged basins. Ultimately, classifying models with reciprocally low errors provides a subset of parameters from a calibrated regional catalog with reduced uncertainty.
[...] (Fry et al., 2011) were verified by USGS StreamStats data (USGS, 2019).
Inverse distance was used to weight precipitation gages from each watershed's centroid. Simple canopy (interception and transpiration) and surface (infiltration) parameters were estimated from delineated data. HEC-HMS model parameters associated with the deficit and constant loss element (infiltration) were calculated directly using soil and imperviousness data available from USGS GAGES-II (Falcone, 2011).
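The inverse-distance weighting of precipitation gages can be sketched as follows. This is a minimal illustration under stated assumptions: the exponent (`power=2.0`), the planar coordinates, and the gage names are hypothetical, since the text does not state which exponent or coordinate system was used.

```python
import math


def idw_weights(centroid, gages, power=2.0):
    """Return normalized inverse-distance weights for each gage.

    centroid: (x, y) of the watershed centroid.
    gages: dict mapping gage name -> (x, y) in the same planar units.
    power: distance exponent (assumed; not specified in the source).
    """
    inverse = {}
    for name, (x, y) in gages.items():
        distance = math.hypot(x - centroid[0], y - centroid[1])
        if distance == 0:
            # A gage located exactly at the centroid receives all the weight.
            return {g: (1.0 if g == name else 0.0) for g in gages}
        inverse[name] = 1.0 / distance ** power
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical gage layout (km); names are placeholders.
weights = idw_weights((0.0, 0.0), {"gage_A": (1.0, 0.0), "gage_B": (2.0, 0.0)})
print(weights)
```

Basin-average precipitation at each time step would then be the weighted sum of the gage records using these weights.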
Similarly, the time of concentration and Clark unit hydrograph storage coefficient used within the Clark unit hydrograph transform element were calculated directly using the Kirpich method (Kirpich, 1940) and standard approaches utilized by the Arizona Department of Transportation (ADOT, 2014). To produce minimally calibrated models, methods were selected to balance simplicity and parameter parsimony with reliable and process-based hydrology. The Kirpich method, for example, contains only two parameters, which facilitates straightforward calculations in data-scarce areas. It is a long-trusted method for estimating time of concentration (USDA NRCS, 2007) that is highly effective across a wide range of conditions in a similar region (Roussel et al., 2005).
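For readers unfamiliar with the Kirpich method, a minimal sketch of the commonly cited form of the equation is given below, with time of concentration in minutes, flow-path length in feet, and slope in ft/ft. The coefficient and units shown are those of the widely used textbook form and are an assumption here; the text does not state which variant or unit system the models used, and the input values are hypothetical.

```python
def kirpich_tc_minutes(length_ft, slope_ft_per_ft):
    """Time of concentration (minutes) from the common form of Kirpich (1940).

    tc = 0.0078 * L**0.77 * S**-0.385, with L in feet and S in ft/ft.
    """
    if length_ft <= 0 or slope_ft_per_ft <= 0:
        raise ValueError("length and slope must be positive")
    return 0.0078 * length_ft ** 0.77 * slope_ft_per_ft ** -0.385


# Hypothetical channel: 1000 ft long with a 1% slope.
print(kirpich_tc_minutes(1000.0, 0.01))
```

The two inputs (length and slope) can both be derived from terrain data, which is what makes the method attractive when no calibration data are available.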
After directly estimating and calculating parameters associated with precipitation losses and hydrograph transformation, only two linear reservoir baseflow parameters were calibrated for the 25 modeled watersheds. Initial flow values were known using streamflow gage data, and a single linear reservoir was used for each of the two groundwater layers.
These two layers were connected in parallel, with both groundwater layers combining to produce a total baseflow (USACE, 2000). As such, only the groundwater storage coefficient for each layer was altered during calibration.
Flashy floods and periods of little precipitation have strongly influenced the evolution of healthy freshwater aquatic ecosystems in So. CA (Gasith and Resh, 1999). In continuing with this study's focus on streamflow flashiness and permanence as ecologically relevant management metrics, models were calibrated to optimize RBI and < 1 cfs. While the accuracy of a singular measure of overall fit is typically used for hydrologic model calibration (Bardossy, 2007; Beven, 2012), environmental flow studies have shown it is not ideal for modeling ecological flow metrics (Cassin et al., 2005; Murphy et al., 2013; Parker et al., 2019; Vis et al., 2015). As a result, calibration accuracy of flashiness and flow permanence were equally considered and combined into one "Ecologically-Focused Combined Calibration" (EFCC), which has been used to calibrate hydrologic models for ecological applications in So. CA (Parker et al., 2019). EFCC (Eqn. 4) equally weights the percent error (Eqn. 3) of RBI (Eqn. 1) and < 1 cfs (Eqn. 2).
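A minimal sketch of how such an equally weighted calibration objective could be evaluated is shown below. Eqns. 3 and 4 are not reproduced in this text, so the absolute percent-error form, the function names, and the metric values are assumptions made for illustration only.

```python
def percent_error(simulated, observed):
    """Absolute percent error of a simulated metric (assumed form of Eqn. 3)."""
    if observed == 0:
        # The source does not say how a zero observed value is handled.
        raise ValueError("percent error is undefined when observed is zero")
    return abs(simulated - observed) / abs(observed) * 100.0


def efcc(sim_rbi, obs_rbi, sim_low_flow, obs_low_flow):
    """Equal-weight combination of the two percent errors (assumed Eqn. 4)."""
    return 0.5 * percent_error(sim_rbi, obs_rbi) + 0.5 * percent_error(
        sim_low_flow, obs_low_flow
    )


# Hypothetical metric values for one gage (illustrative only).
score = efcc(sim_rbi=0.55, obs_rbi=0.50, sim_low_flow=0.18, obs_low_flow=0.20)
print(score)
```

A lower score indicates that the simulated hydrograph reproduces both flashiness and low-flow frequency more closely, regardless of how well it matches the overall hydrograph shape.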

Jackknife Resampling Error Matrix
To compute hydrologic similarity among the regional network of minimally calibrated hydrologic models, storage coefficients and initial discharges of both groundwater layers were donated from one model to all 24 remaining models. This was done for every model in the region in a process known as jackknife resampling (Efron, 1982;Friedl and Stampfer, 2014).
Model parameters directly calculated or estimated from available landscape data were not jackknifed. Initial baseflow discharges were included in the jackknife analysis and are treated as calibrated parameters because they would be unknown in a PUB analysis. For each individual model's calibrated parameters, jackknife resampling generated 24 time series characterizing streamflow across the region. The accuracy of each simulated hydrograph resulting from jackknifed parameters was assessed by comparing to the 24 observed USGS streamflow gages. The true gage streamflow data do not affect the jackknifing process because they are only used to determine the accuracy of the output flow data resulting from the jackknifed parameters. The accuracy of each jackknifed parameterization was calculated for the entire 25x24 matrix of time series data using the EFCC (Eqn. 4) scaled by minimum and maximum errors, resulting in a normalized 25x24 matrix quantifying the accuracy of each calibrated model when its calibrated parameters were directly input into all other models. Each site's original calibration error was added to the matrix such that a normalized 25x25 matrix was produced with very small calibration errors spanning the diagonal.
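The construction of the error matrix can be sketched as below. This is an illustrative outline only: `score` stands in for running a receiver's model with a donor's calibrated parameters and evaluating the resulting error, and the toy scoring function, site names, and parameter values are hypothetical. The treatment of the diagonal (scaled with the same bounds and clamped at zero) is an assumption, since the text only states that calibration errors were added to the normalized matrix.

```python
def jackknife_error_matrix(sites, calibrated, score, calibration_error):
    """Build a normalized donor-by-receiver error matrix.

    Rows are donor sites and columns are receiver sites.
    score(receiver, params) is a hypothetical callable returning the model
    error at `receiver` when run with the donated parameter set `params`.
    """
    n = len(sites)
    raw = [[0.0] * n for _ in range(n)]
    for i, donor in enumerate(sites):
        for j, receiver in enumerate(sites):
            if i != j:
                raw[i][j] = score(receiver, calibrated[donor])

    # Min-max scale using only the donated (off-diagonal) errors.
    donated = [raw[i][j] for i in range(n) for j in range(n) if i != j]
    low, high = min(donated), max(donated)
    span = high - low
    if span == 0:
        raise ValueError("all donated errors are identical; cannot normalize")
    matrix = [[(raw[i][j] - low) / span for j in range(n)] for i in range(n)]

    # Place each site's own calibration error on the diagonal (assumed scaling).
    for i, site in enumerate(sites):
        matrix[i][i] = max(0.0, (calibration_error[site] - low) / span)
    return matrix


# Toy stand-in: each "model" has one storage coefficient, and the error is the
# absolute difference between the donated and the receiver's own value.
toy_params = {"A": 1.0, "B": 1.2, "C": 3.0}


def toy_score(receiver, donated_k):
    return abs(toy_params[receiver] - donated_k)


toy_matrix = jackknife_error_matrix(
    ["A", "B", "C"], toy_params, toy_score, {"A": 0.0, "B": 0.0, "C": 0.0}
)
for row in toy_matrix:
    print([round(value, 3) for value in row])
```

In the toy example, sites A and B exchange parameters with low error while C is dissimilar to both, which is the kind of reciprocal structure the subsequent clustering is meant to detect.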

Combined Inductive and Deductive Approach
Combining inductive and deductive approaches for Hydrologic Model-based Classification was very similar to the combined approach under traditional classification that implemented multinomial logistic regression. Using the jackknife error matrix of hydrologic similarity, weighted classical (metric) multidimensional scaling, PCA, and a scree plot provided a sense of how sites might cluster. K-means clustering with C-Index, Dunn, McClain, and Silhouette indices was used to split sites into reciprocating low model-error clusters. This inductive approach produced groups of hydrologically similar gages, as measured by a site's ability to accurately model all other sites within its group. A deductive approach was added to HMC by using multinomial logistic regression to determine if watershed variables could predict low-error cluster membership.

Classification Assessment
To better understand the utility of each classification towards estimating flow in ungaged basins, a performance metric dubbed "average cluster error" (ACE) was developed for this study. ACE characterizes the errors produced by donated parameters within a classification method and its classes. Low-error classifications and classes indicate greater certainty in donated calibrated parameters, which inherently contain high uncertainty in models of ungaged basins. Classifications and classes with low ACE values may provide the foundation for accurately modeling ungaged basins with regionalization. ACE was modeled after the cross-validation standard error (CVSE) statistic presented by Wortman (2005) and is displayed in Eqn. 5, wherein C is the total number of clusters produced by a specific classification, c represents each cluster, S is the total number of sites within the given cluster, s is each site from the cluster, Normalized Errors is taken directly from the jackknife error matrix, and P is the total number of sites (25 in this study).
The following example helps explain how Eqn. 5 was used: Say a specific classification divided the 25 sites into 5 equal groups split chronologically (Sites 1-5, 6-10, 11-15, etc.). Total error for the first group would be computed by summing all within-cluster errors (when Site 1 parameters were applied to Sites 2, 3, 4, and 5; when Site 2 parameters were applied to Sites 1, 3, 4, and 5; etc. for Site 3, 4, and 5 parameters). This same process would be repeated for the four remaining groups and summed to produce a final total error. The total error would be divided by 25 sites to yield a single metric quantifying the average model error across all sites, exclusive to a specific classification. Following this procedure, ACE values can also be computed for individual clusters unique to one classification, wherein the number of sites assigned to the specific group of interest would take the place of P (P = 5 when only considering one cluster from the example above), and the summation over clusters ($\sum_{c=1}^{C}$) would not be used because only one cluster from the classification is considered. Because all sites receiving each model's parameters were treated as ungaged basins during jackknife resampling, the ACE statistic provides insight regarding how well different classifications, or different groups within one classification, might be incorporated into regionalization.
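The worked example above can be expressed as a short Python sketch. It follows the verbal procedure literally (sum all within-cluster donor-to-receiver errors, then divide by the number of sites); Eqn. 5 itself is not reproduced in this text, so the exact published form may differ, and the function names and the 4-site matrix are hypothetical.

```python
def average_cluster_error(error_matrix, labels):
    """ACE for a whole classification, following the worked example.

    error_matrix[d][r] is the normalized error when donor d's parameters
    are applied to receiver r; labels[i] is the cluster of site i.
    """
    p = len(labels)
    total = 0.0
    for donor in range(p):
        for receiver in range(p):
            if donor != receiver and labels[donor] == labels[receiver]:
                total += error_matrix[donor][receiver]
    return total / p


def single_cluster_error(error_matrix, labels, cluster):
    """ACE for one cluster: the cluster size takes the place of P."""
    members = [i for i, label in enumerate(labels) if label == cluster]
    if not members:
        raise ValueError("no sites assigned to this cluster")
    total = sum(
        error_matrix[d][r] for d in members for r in members if d != r
    )
    return total / len(members)


# Hypothetical 4-site normalized error matrix and a 2-cluster classification.
errors = [
    [0.0, 0.1, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.8, 0.9, 0.0, 0.3],
    [0.9, 0.8, 0.4, 0.0],
]
labels = [0, 0, 1, 1]
print(average_cluster_error(errors, labels))
print(single_cluster_error(errors, labels, 0))
print(single_cluster_error(errors, labels, 1))
```

Only errors between sites that share a cluster contribute, so a classification that groups mutually good parameter donors together produces a low ACE.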

Additionally, the adjusted Rand index (ARI) was computed between each traditional classification technique and Hydrologic Model-based Classification to compare the similarity of any two unique classifications. ARI typically ranges from 0 to 1, wherein a value of 0 indicates no similarities between clusters and a value of 1 represents identical clusters; however, negative values can occur if class similarity is less than what would be expected during random clustering (Hubert and Arabie, 1985). Essentially, ARI values near 0 indicate a classification scheme provides unique groups that do not overlap. Specifically, the "clues" package in R (Chang et al., 2010) was implemented to compute an ARI between all suitable classifications.
Between the two measures for assessing classifications in this study, ARI provides an understanding of each classification's ability to separate its data, while ACE reflects the ability of a classification, or cluster within a classification, to estimate streamflow in ungaged basins. ARI is a more general metric for insight into data clustering, while ACE is a specific metric focused on cluster performance in ungaged basins. More generally, ARI quantifies between-cluster variability while ACE quantifies within-cluster variability.
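For reference, the adjusted Rand index of Hubert and Arabie (1985) can be computed from two label vectors with the standard pair-counting formula. The sketch below is a plain-Python illustration rather than the R "clues" implementation used in the study, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two partitions of the same sites."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two sites")

    # Pairs of sites grouped together in both, in A only, and in B only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)
    maximum = 0.5 * (in_a + in_b)
    if maximum == expected:
        # Degenerate case, e.g. both partitions place all sites in one group.
        return 1.0
    return (both - expected) / (maximum - expected)


# Identical groupings with different label names give an ARI of 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Groupings that agree less than chance give a negative ARI.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The second call illustrates the negative values mentioned above: no pair of sites is grouped together in both partitions, which is less agreement than random assignment would be expected to produce.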

Inductive Approach
Classification of hourly flashiness and flow permanence metrics in coastal southern CA resulted in three classes (Fig. 3). Sites were essentially split according to flow permanence, with intermittent streams containing below-average flashiness (Class 1 with 6 sites), perennial streams spanning the full range of flashiness (Class 2 with 10 sites), and ephemeral streams spanning the full range of flashiness (Class 3 with 9 sites). The intermittent class contained the smallest average cluster error with the least within-cluster variability (ACE = 0.2, Fig. 3), indicating calibrated parameters from models of these streams possessed the least uncertainty. Conversely, the perennial class had the least utility towards ungaged basins because it contained the most within-cluster variability (ACE = 0.9, Fig. 3). When considering all three clusters produced by traditional inductive classification, the ACE was 0.6 (Fig. 3).

Deductive Approach
Classification of watershed characteristics yielded five classes with drainage area and soil content, specifically the percentage of Hydrologic Soil Group C (HGC), providing a parsimonious classification (Fig. 4). These two watershed variables were log-transformed within the K-means algorithm to address the right-skewed nature of drainage area caused by a few large basins. Sites were primarily divided by drainage area, and secondarily by HGC, to generate classes of small basins with low HGC (Class 3 with 3 sites), small basins with high HGC (Class 5 with 7 sites), medium-sized basins with low HGC (Class 1 with 5 sites), medium-sized basins with high HGC (Class 2 with 7 sites), and large basins with high HGC (Class 4 with 3 sites). The large basin with high HGC class contained the smallest ACE (0.2, Fig. 4), while the medium-sized basin with low HGC class provided the largest (0.6, Fig. 4). An ACE of 0.4 was computed after considering all five clusters produced by traditional deductive classification (Fig. 4).
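The role of the log transform can be illustrated with a small K-means sketch in Python. It is a simplified stand-in for the clustering software used in the study: the function name, the base-10 logarithm, the random initialization, and the watershed values are assumptions chosen only for illustration.

```python
import numpy as np

def log_kmeans(drainage_area, hgc_percent, k, n_iter=100, seed=0):
    """Plain K-means on log10-transformed drainage area and HGC percentage.

    Both inputs must be strictly positive for the log transform to be defined.
    """
    points = np.column_stack([np.log10(drainage_area), np.log10(hgc_percent)])
    rng = np.random.default_rng(seed)
    # Start from k distinct sites chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

# Hypothetical watersheds (NOT study data): area in km^2, HGC in percent.
area = [5.0, 8.0, 12.0, 150.0, 200.0, 260.0, 900.0, 1200.0]
hgc = [10.0, 60.0, 15.0, 55.0, 12.0, 70.0, 65.0, 58.0]
labels, centers = log_kmeans(area, hgc, k=3)
```

Without the transform, the few largest basins would dominate the Euclidean distances; on a log scale, a tenfold difference in area counts the same at any basin size.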

Combined Inductive and Deductive Approaches
Neither an expanded cluster analysis, nor predicting the inductively and deductively produced clusters with the selected watershed characteristics and flow metrics, respectively, improved classification over the individual inductive and deductive approaches. New multinomial regression models were developed to accurately predict traditional inductive clusters with drainage area, % clay soil, minimum elevation, and annual minimum precipitation, and to predict gage reference status with drainage area, % silt soil, baseflow index, and relative humidity.

Models
Calibration of the 25 HEC-HMS models at USGS gages was successful. Overall, the flashiness and flow permanence calibration criteria were modeled extremely accurately. Average percent errors of both RBI and < 1 cfs were well under 1%.
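To make the two calibration criteria concrete, the sketch below computes a flashiness index, a flow permanence metric, and their percent errors in Python. It assumes RBI denotes the Richards-Baker flashiness index (sum of absolute changes between consecutive flows divided by total flow) and that "< 1 cfs" is the fraction of time steps with discharge below 1 cfs; the function names, the percent-error definition, and the short flow series are hypothetical.

```python
import numpy as np

def richards_baker_index(flow):
    """Flashiness: sum of absolute step-to-step changes divided by total flow."""
    q = np.asarray(flow, dtype=float)
    total = q.sum()
    if total == 0:
        return float("nan")  # undefined for a record with no flow
    return float(np.abs(np.diff(q)).sum() / total)

def fraction_below(flow, threshold_cfs=1.0):
    """Flow permanence: fraction of time steps with discharge below the threshold."""
    q = np.asarray(flow, dtype=float)
    return float(np.mean(q < threshold_cfs))

def percent_error(simulated, observed):
    """Absolute percent error of a simulated metric (observed must be nonzero)."""
    return 100.0 * abs(simulated - observed) / abs(observed)

# Hypothetical hourly discharges in cfs (NOT study data).
observed = np.array([0.0, 0.0, 5.0, 2.0, 0.5, 0.0])
simulated = np.array([0.0, 0.1, 4.8, 2.1, 0.6, 0.0])
rbi_error = percent_error(richards_baker_index(simulated),
                          richards_baker_index(observed))
permanence_error = percent_error(fraction_below(simulated),
                                 fraction_below(observed))
```

Both metrics summarize an entire hydrograph in a single number, so a model can match them closely even when individual time steps differ.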

Combined Inductive and Deductive Approach
Hydrologic Model-based Classification combined inductive and deductive classification to produce a multinomial logistic regression model (deductive classification) that uses landscape variables to predict membership of five hydrologically similar groups of models (inductive classification) (Fig. 5). The inductive approach used in HMC does not group sites by the similarity of measured or modeled metrics, as is done traditionally, but instead groups sites to maximize model accuracy when calibrated models' parameters are donated to all other sites within a group. Despite this important distinction, streamflow flashiness and permanence were well distributed across the five hydrologic model-based clusters (Fig. 5). A multinomial logistic regression model was able to predict low-error class membership with 4% error (24 of 25 sites matched correctly) using drainage area, sandy soil content, mean annual precipitation, and mean annual minimum precipitation. The number of sites was distributed less evenly across classes for Hydrologic Model-based Classification than for traditional methods, with the first two clusters containing two sites each, the third cluster containing three, the fourth containing five, and the final cluster containing over half the sites with 13. As such, it is no surprise that Class 5 contained the largest within-cluster variability (ACE = 0.5, Fig. 5) and is consequently the worst-performing HMC group in ungaged basins. However, no other class within HMC produced an ACE greater than 0.1, which contributes to HMC having the lowest within-cluster variability across all classifications (ACE = 0.3, Fig. 5).
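The deductive step, predicting class membership from landscape variables, can be sketched as a multinomial logistic (softmax) regression. The Python example below fits such a model with plain gradient descent; the fitting procedure, function name, learning rate, and the two-variable synthetic data are assumptions for illustration and do not reproduce the software or predictors used in the study.

```python
import numpy as np

def fit_multinomial_logit(features, classes, n_iter=2000, learning_rate=0.5):
    """Fit a softmax regression and report its training misclassification rate.

    Features are standardized first; a constant feature would give a zero
    standard deviation and must be removed beforehand.
    """
    x = np.asarray(features, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    x = np.column_stack([np.ones(len(x)), x])  # intercept column
    class_names, y = np.unique(classes, return_inverse=True)
    onehot = np.eye(len(class_names))[y]
    weights = np.zeros((x.shape[1], len(class_names)))
    for _ in range(n_iter):
        logits = x @ weights
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy loss with respect to the weights.
        weights -= learning_rate * x.T @ (probs - onehot) / len(x)
    predicted = class_names[(x @ weights).argmax(axis=1)]
    error_rate = float(np.mean(predicted != np.asarray(classes)))
    return weights, predicted, error_rate

# Synthetic sites (NOT study data): two landscape variables, three classes.
demo_features = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [9.1, 1.2]]
demo_classes = [1, 1, 2, 2, 3, 3]
weights, predicted, error_rate = fit_multinomial_logit(demo_features, demo_classes)
```

The reported error rate here is a training error; with only 25 sites, an assessment on sites withheld from fitting would give a more conservative estimate of how well class membership transfers to ungaged basins.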
Stream classes produced by HMC include medium-sized basins with flashiness on both the high (Class 1) and low (Class 4) end. Flashy Class 1 streams receive the least precipitation and are located in southern San Diego County. Non-flashy Class 4 streams comprise the two eastern-most sites. Medium-small basins (Class 3) receive relatively little precipitation and are located near the coast, while large-medium basins (Class 5) receive the most precipitation and are spread throughout the study area. The largest basins (Class 2) are slightly flashier and drier than the large-medium basins (Class 5). These Class 2 streams are concentrated in the northern part of the study area.

Adjusted Rand Index (ARI)
The geographical distributions of four unique classifications are displayed in the Appendix (Fig. A1), including traditional inductive (flow metrics), traditional deductive (watershed characteristics), a hybrid inductive/deductive (GAGES-II reference sites), and hydrologic model-based as a hybrid inductive/deductive (model accuracy and watershed characteristics). Results of the ARI analysis show no major similarities and large variability between classifications, with the strongest relationship between GAGES-II reference sites and inductive classification (ARI = 0.12, Table 1). Inductive and Hydrologic Model-based Classifications were most different, with an ARI of -0.04 (Table 1).

Discussion
Hydrologic Model-based Classification introduces a new way to think about stream similarity, which can improve the accuracy of hydrologic modeling and environmental flow management in ungaged basins. For hydrologic modeling, HMC can be incorporated into iterative development of a hydrologic foundation, and it supplies the foundation for an improved approach to regionalization of ungaged basins. As a management tool, HMC streamlines priority environmental flow metrics in ungaged basins.

Hydrologic Model-based Classification and environmental flow management
Using Hydrologic Model-based Classification to incorporate regionalization for modeling ungaged basins into stream classification provides an opportunity to improve environmental streamflow studies that require ungaged data. ELOHA is an iterative process with significant feedback loops; however, stream classification is recommended to occur second, after developing a hydrologic foundation, and no guidance is provided on how classification might inform the hydrologic foundation or vice versa (Poff et al., 2010). Because the hydrologic foundation generates baseline and current hydrographs at sites with bioassessment data, many of which are ungaged, reciprocally low-error classes produced by HMC could be utilized in a modeling framework to increase the hydrologic foundation's accuracy. Switching the order of the first two steps in ELOHA, and first classifying sites using HMC, could improve streamflow estimation in ungaged basins as a part of the hydrologic foundation. At the very least, developing the hydrologic foundation could be iterative with classification as key characteristics of the sites become better understood, especially if ungaged basins must be modeled.
The primary role of stream classification, as one of the four major steps of ELOHA, is to strengthen and standardize regional flow-ecology relationships so that they may be better implemented for water management (Poff et al., 2010); however, it is the one step of ELOHA some studies have determined unnecessary and bypassed (Kendy et al., 2012). To this point, large-scale classifications in the Chesapeake Bay watershed (Buchanan et al., 2011) and the Western US, including a separate classification in California (Hawkins and Vinson, 2000), did not improve benthic macroinvertebrate explanatory power. While this study has demonstrated how the primary application of stream classification is useful in coastal southern California, it has also introduced HMC to extend classification beyond its traditional role to modeling ungaged basins for developing a hydrologic foundation in any region. It is likely that a more accurate hydrologic foundation would create more accurate flow-ecology relationships and stronger environmental flow criteria, and it could also improve the utility of stream classification within ELOHA. This should be evaluated through additional analysis and application.
Modeled streamflow data do not always classify streams the same as gage data for the same sites. Peñas et al. (2016) showed daily and monthly gage data clustered better than monthly modeled data in Spain. Similarly, modeled data provided different classes than gaged data in North Carolina (Eddy et al., 2017). While model accuracy is always a high priority in hydrologic applications, stream classification is very sensitive to this accuracy, which underscores the importance of accurate models within ELOHA. Poor model accuracy not only directly diminishes the utility of flow-ecology relationships and subsequent environmental flow recommendations, but it can also indirectly hamper management efforts by providing inconsistent stream classes. When ungaged basins are considered in ELOHA, model accuracy must be highly prioritized, or else lingering and compounding errors might spoil otherwise legitimate efforts.
From an operational perspective, Hydrologic Model-based Classification is more time-consuming than traditional classifications and might become unwieldy when applied across an expansive geographic region with many sites to classify. This is because not only must hydrologic models be created and calibrated for every classified site, but each model must also be analyzed with every other model's calibrated parameters to produce the critical jackknife resampling error matrix. If ungaged basins are to be included, however, some extra time spent building models is recouped, as they would have been built anyway under traditional classifications. This study has demonstrated that HMC is feasible for 25 sites spanning a fairly large and highly heterogeneous region in the south coast of California. If a significantly larger region or denser network were the focus of this study, HMC would likely provide even more precise classes and accurate streamflow estimates, but with a substantially greater time investment. Realistically, HMC becomes less feasible at a state-wide scale or for a large network (~50 sites). These issues make HMC most effective when used in concert with large-scale classification methods to enhance classification for relatively small-scale environmental flow development, which might range from basin-level to spanning multiple counties, or with expeditious hydrologic models.

Stream classification for regionalizing ungaged basins
Hydrologic Model-based Classification not only provides new information characterizing regional streams complementary to traditional classifications, but it can also be used to accurately model ungaged basins across a heterogeneous area through regionalization, as evidenced through the average cluster error metric describing within-cluster variability. ACE unpacks important information buried inside the jackknife resampling matrix describing how accurately a set of calibrated parameters can be donated from its original model to all other models in the region, as if the other models were ungaged. Error values from the matrix can be assessed for each model in the region or, when performing stream classification, can be aggregated to quantify ACE for every class within a given classification. Further aggregation can provide an overall measure of ungaged modeling accuracy for an entire classification approach to compare to other classification schemes. A comparison of these overall ACE values shows Hydrologic Model-based Classification containing the least within-cluster variability, which provides the most certainty regarding parameters in models of ungaged basins (ACE 0.3; Fig. 5). HMC was followed by deductive classification with drainage area and HGC (ACE 0.4; Fig. 4), inductive classification with < 1 cfs and RBI (ACE 0.6; Fig. 3), and lastly GAGES-II reference status (ACE 1.4).
By providing a method for reducing parameter uncertainty in models of ungaged basins, HMC has demonstrated utility beyond complementary classification. Modeling ungaged basins is fundamental to ELOHA (Poff et al., 2010) and many other hydrology applications, but different approaches vary significantly, contain uncertainty, and do not perform particularly well across a geologically and hydroclimatically diverse area (Arsenault et al., 2019; Blöschl et al., 2013). This study provides a foundation for directly incorporating the regional accuracy of a catalog of hydrologic models into a framework for improving ungaged modeling within a heterogeneous region.
This study has shown flow permanence and flashiness were more consistently modeled in ungaged basins containing intermittent streams than ephemeral or perennial streams. Extreme sensitivity to precipitation explains why ephemeral streams did not produce a low ACE. While it may initially be surprising to see baseflow parameters more accurately interchanged between models of intermittent streams than perennial streams, the effluent nature of perennial streams, especially in a region as rapidly urbanizing as So. CA, inconsistently augments the natural flow regime (Ponce and Lindquist, 1990) and likely prevented accurate modeling in this study. Similarly, flows were modeled with more certainty at GAGES-II reference sites (ACE 0.4) than non-reference sites (ACE 1.9), wherein flow alteration restricts the ability to transform precipitation into streamflow. Based on the results of this study, intermittent reference streams are likely most accurately regionalized in the south coast.

While no combined classification in coastal southern CA was able to predict class membership of all 25 sites with 100% accuracy, HMC came the closest. This finding underscores the potential for using a measure of model accuracy across a region to define hydrologic similarity within stream classification. Olden et al. (2012) split deductive classification into three sub-approaches: "environmental regionalization" to provide a spatial representation of stream similarity, "hydrologic regionalization" using models to estimate flow in ungaged basins, and "environmental classification" for geographically independent classification; however, only one inductive approach, ideal for geographic independence, is described: "streamflow classification". The new Hydrologic Model-based Classification developed in this study is based on inductive reasoning but is not "streamflow classification". Instead, HMC is a type of "streamflow regionalization" wherein each region is a reciprocally low-error class. Instead of defining geographic areas of assumed flow similarity using watershed characteristics, "streamflow regionalization" directly groups sites based on modeled flow similarity. This new approach essentially hybridizes "hydrologic regionalization" and "streamflow classification".
Deductive classification produced relatively low uncertainty of model parameters, with all five classes containing ACE values between 0.2 and 0.6 (Fig. 4). The relatively tight spread coupled with a low overall ACE (0.4; Fig. 4) indicates that deductive classification is a worthy alternative to HMC for regionalization of ungaged basins. These results are consistent with the most common implementation of regionalization, wherein models are typically grouped by spatial proximity, physical similarity, or parameter regression (Oudin et al., 2008; Razavi and Coulibaly, 2013; Samuel et al., 2011). This study has shown how a new type of "streamflow regionalization", akin to Hydrologic Model-based Classification, might edge out traditional "hydrologic regionalization" from deductive classification at estimating streamflow in ungaged basins. "Hydrologic regionalization" and "streamflow regionalization" both implement watershed characteristics to separate sites for high utility in modeling ungaged basins; however, "streamflow regionalization" improves modeling by directly incorporating a quantifiable measure of ungaged model accuracy. This important addition to "streamflow regionalization" directly captures regional model uncertainty and strengthens the science supporting modeling ungaged basins.

Stream classification in coastal southern California
As measured by ARI, traditional inductive classification and reference status classification were the two most similar, but still contained high variability (ARI = 0.12, Table 1). This finding is consistent with how GAGES-II primarily uses flow alteration to classify reference streams (Falcone, 2011), and with how ELOHA recommends classifying by hydrologic similarity to develop flow-ecology relationships (Poff et al., 2010). Furthermore, the reference status classification established a relationship, predominantly with drainage area, but also silt content, baseflow index, and relative humidity, which could help water managers identify streams facing potential flow alteration.
The two most different classifications in this study were traditional inductive and hydrologic model-based (ARI = -0.04, Table 1). Hydrologic Model-based Classification is primarily based on an inductive approach; however, it quantifies hydrologic similarity completely differently than traditional inductive classification. The negative non-random relationship between these classifications is explained by the traditional approach considering gage data similarity while Hydrologic Model-based Classification considers model data similarity of the same metrics. The differences in these two inductively-based classifications underscore the complexity in modeling streamflow permanence and flashiness in So. CA and suggest great effort must be taken when modeling ungaged basins in the south coast region.
Using ARI, this study has demonstrated how four unique stream classifications can each provide important, complementary information regarding how streams across a region may be grouped for management. While the two inductively-based classifications appear the most useful for separating gaged and ungaged sites, respectively, important relationships and management opportunities can be revealed through a robust regional stream classification using multiple approaches.

Conclusions
Accurately modeling ungaged basins is often necessary for quantification and management of environmental streamflows (Poff et al., 2010), but it is a difficult undertaking with no consensus approach among the hydrology community, especially in heterogeneous regions (Arsenault et al., 2019; Blöschl et al., 2013). Furthermore, stream classification is one of the four major steps used to develop environmental flow criteria within ELOHA (Poff et al., 2010), but it is not always used in the framework (Kendy et al., 2012). This study sought to increase the utility of classification within ELOHA while simultaneously strengthening the science supporting modeling and management of ungaged basins in heterogeneous regions.
To this end, Hydrologic Model-based Classification was developed to provide complementary classification information, improved ungaged model accuracy, and new opportunities for stream management. Iterating between the first two steps of ELOHA (hydrologic foundation and classification) within HMC has the potential to improve both steps and produce stronger environmental flow criteria.
While this study focused on streamflow permanence and flashiness due to their known ecological importance in the study region (Gasith and Resh, 1999; Mazor et al., 2018; Parker et al., 2019), additional flow metrics corresponding to other elements of the flow regime are ecologically relevant in So. CA (Yarnell et al., 2020) and could be incorporated. To develop a better understanding of HMC in general, it could be extended to new regions and compared to the results of this study. This could produce general relationships between different classifications and provide insight into which classification approach might be most appropriate for specific applications and regions. Likewise, a type of nested classification similarly implemented across many regions would help different stakeholders understand how management actions at multiple geographic scales might affect streams and would foster coordinated management relationships. As HMC is expanded to additional regions, a better understanding of the similarity of within-class management plans will be developed. These findings will be highly dependent on the management metrics and regions, but a general sense for management plan transferability within low-error classes will offer a clearer understanding of how Hydrologic Model-based Classification might assist in ungaged stream management without ever modeling the basin.
For coastal southern California, HMC results from this study should be further developed into a full framework for modeling time series of discharge in new ungaged basins from the heterogeneous region. This would foster a better understanding of the modeling complexities within Hydrologic Model-based Classification, and its associated new regionalization framework, and would provide the basis of a hydrologic foundation prioritizing ungaged basins, which is needed to develop robust regional environmental flow criteria in So. CA.

Acknowledgements
We would like to thank our technical advisory and stakeholder workgroups for their continued participation throughout this project. Their input improved the technical quality and management applicability of this study. Support for this project was provided by the California State Water Resources Control Board. The contents of this document do not necessarily reflect the views and policies of the State Water Resources Control Board, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.