The influence of methodological procedures on hydrological classification performance

Hydrological classification has emerged as a suitable procedure to disentangle the inherent hydrological complexity of river networks. This practice has contributed to determining key biophysical relations in fluvial ecosystems and the effects of flow modification. Thus, a plethora of classification approaches, which agreed in general concepts and methods but differed largely in specific procedures, have emerged in the last decades. However, few studies have compared the implication of applying contrasting approaches and specifications over the same hydrological data. In this work, using cluster analysis and modelling approaches, we classify the entire river network covering the northern third of the Iberian Peninsula. Specifically, we developed classifications of increasing level of detail, ranging from 2 to 20 class levels, either based on raw and normalized daily flow series and using two contrasting approaches to determine class membership: classify-then-predict (ClasF) and predict-thenclassify (PredF). Classifications were compared in terms of their statistical strength, the hydrological interpretation, the ability to reduce the bias associated with underrepresented parts of the hydrological space and their spatial correspondnece. The results highlighted that both the data processing and the classification strategy largely influenced the classification outcomes and properties, although differences among procedures were not always statistically significant. The normalization of flow data removed the influence of flow magnitude and generated more complex classifications in which a wider range of hydrologic characteristics were considered. The application of the PredF strategy produced, in most of the cases, classifications with higher discrimination ability and presented greater ability to deal with the presence of distinctive gauges in the data set than using the ClasF strategy.


Introduction
Understanding the natural variability of hydrology at the regional scale has become crucial for river ecology and management for three main reasons: (i) it is a primary factor influencing river geomorphology (Peñas et al., 2012;Richter et al., 1998;Benda et al., 2004), water (Álvarez-Cabria et al., 2010;Chinnayakanahalli et al., 2011) and biological characteristics (Poff and Zimmerman, 2010); (ii) its variability reflects climate (Morán-Tejeda et al., 2011) and catchment attributes (second order driver; Monk et al., 2007); and (iii) freshwater resources are essential to maintain many human activities (Naiman and Dudgeon, 2011).
Much progress has been made over the last 20 years in understanding hydrologic variability and how it promotes self-sustaining ecosystems (Poff et al., 2006;Gurnell et al., 2000).However, the inherent complexity of flow regimes hinders both the quantification of direct responses of hydrology to catchment characteristics, and the identification of key hydrology and ecology relationships.The identification and characterization of relevant ecological aspects of the flow (Poff, 1996), through the definition of hydrological classifications, has emerged as a relevant procedure to structure analyses in hydroecological studies.Specifically, inductive hydrological classification approaches have been used to group river reaches into classes within similar attributes regarding the flow regime (Snelder et al., 2009) and ecological attributes (McManamay et al., 2012).
Many of the existing hydrological classifications following the inductive approach rely on the use of statistical procedures to minimize the redundancy of the hydrological information (Olden and Poff, 2003) and, also, to reduce the intra-group variability and increase the inter-groups variability (Snelder and Booker, 2013).Nevertheless, many specific steps within the classification process may be influenced by a series of subjective decisions depending on the rationale, objectives, and available data.For example, many hydrological classifications are based on normalized flow data (McManamay et al., 2012;Kennard et al., 2010;Reidy Liermann et al., 2012) while others used raw flow series (Zhang et al., 2012;Belmar et al., 2011;Alcázar and Palau, 2010).However, normalization can be viewed as a completely subjective choice that depends on the purpose of the classification (Olden et al., 2012).If the range of flow magnitude varies largely within a region, classification based on the raw flow series would be subjected uniquely to this attribute.In contrast, other flow attributes that present a lower degree of variability and that are not affect by the normalization of the series, would be masked in classifications.The main reason for normalization is to remove the scale dependence of flow magnitude indices to promote the classification of rivers according to a larger set of hydrological attributes.Therefore, the larger the number of hydrological aspects taken into account in the classification the larger its potential uses.For example, the normalization of flow series allows for the segregation of rivers attending the intra-annual variability of flows magnitude, i.e. the shape of the hydrographs.Undoubtedly the shape of the hydrograph influences river reach ecology (Bunn and Arthington, 2002;Richter et al., 1998) and are key elements for understanding the relationship between climatic and streamflow patterns (Gámiz-Fortis et al., 2011).Nonetheless, the size of a river reach and the absolute magnitude of flows also play a key role in ecological processes (Bunn and Arthington, 2002;Vannote et al., 1980) and it is a critical element to manage water resources.
In addition, the scientific and management utility of hydrologic classifications relies on the capacity to extrapolate the class membership to ungauged sites, providing a map of natural flow regimes at the regional scale (Snelder et al., 2009;Reidy Liermann et al., 2012).The classify-thenpredict (ClasF) strategy has been the most common approach to fulfil this objective (e.g.Kennard et al., 2010;Reidy Liermann et al., 2012).ClasF predicts class membership to ungauged sites based on environmental data (climate, topography, geology or land use).However, this method might pose some flaws when predicted onto an entire region, especially if the distribution of gauges is biased, i.e. specific kinds of rivers are under-or overrepresented (Snelder and Booker, 2013).If this is the case, the cluster step would fail in accounting for those hydrological features underrepresented in the data set.This is a critical issue since the low representation in the gauged network does not imply a low representation in the entire river network.The way in which these underrepresented data or distinctive gauges (i.e.those presenting a large hydrologic dissimilarity to the others present in the data set) are classified may lead to the loss of their "rare" hydrologic character when classes are predicted to the whole river network.Due to this reason, some researchers have attempted other approaches such as the predict-thenclassify (PredF) strategy (Ferrier and Guisan, 2006;Snelder and Booker, 2013).Using this approach, hydrological indices obtained from the flow series are predicted onto the entire river network based on climate and catchment characteristics.Then, classification of all river segments is performed as a final stage within the procedure.
The aim of this study was to investigate how the normalization of flow series data previous to the classification procedure and the use of ClasF and PredF influences (i) the classification performance, (ii) the hydrological interpretation of the classifications, (iii) their ability to reduce the bias associated with the underrepresented parts of the hydrological space, and (iv) the degree of spatial correspondence between classifications.To achieve this aim we will develop hydrological classifications of natural conditions over an entire river network in the northern third of the Iberian Peninsula, covering catchments of contrasting climate and spatial configuration.We hypothesized that normalization of river flow data will tend to classify rivers according to their annual regime and not only to the size of the river and also increase the contribution of other hydrological variables not related to flow magnitude.In addition, we hypothesized that the application of the PredF classification procedure will reduce within class heterogeneity.

Study area
The study area comprises the northern third of the Iberian Peninsula (Fig. 1) covering a total area greater than 124 000 km 2 .It represents heterogeneous environmental conditions and can be broadly segregated in three main zones.On the one hand, the area draining into the Cantabric sea encompass several small basins with drainage areas ranging from 30 to 4907 km 2 covering a total area of 22 000 km 2 .Rivers are confined by the Cantabrian Cordillera, which reaches up to 2600 m a.s.l. and runs parallel to the coast.Thus, they are characterized by high slopes and short main stream lengths.This region has a humid oceanic temperate climate (Rivas-Martínez et al., 2004).Precipitation is abundant throughout the year with a mean of 1300 mm year −1 , with maximum rainfall in December (150 mm month −1 ) and minimum in July (50 mm month −1 ).However, the precipitation magnitude and distribution varies significantly according to local topography.Snow precipitation is frequent in winter above 1000 m a.s.l.More than 50 % of the surface is occupied by deciduous forest, scrubs, and grasslands, while 10 % is occupied by agriculture.The population in this area amounts to almost 3 500 000 inhabitants with a population density of 175 inhabitants km −2 although it varies between regions.On the other hand, the Mediterranean area is mainly occupied by the Ebro Basin along with a set of medium size basins in the eastern zone.The Ebro Basin covers a total extension of 85 530 km 2 .It is enclosed by the Cantabrian Mountains and the Pyrenees (3400 m a.s.l.) in the north, by the Catalan Coastal Chain (1712 m a.s.l.) in the east and from the north-west to the south-east by the Iberian massif (2300 m a.s.l.), which creates a dense river network in the catchment boundaries and an extended flat surface in the interior.The Ebro Basin receives both temperate and Mediterranean climate influences.The Pyrenean area (northwest) and the northern part of the Iberian massif present oceanic temperate climates that change gradually to a typical Mediterranean climate in the central Ebro depression (annual precipitation is 656 mm); however, it varies from 300 mm in the centre to 1700 mm in the highest mountains (Bejarano et al., 2010) where snow is also common during the winter months.The precipitation regime in the Mediterranean region has its maxima in autumn and spring and minima in winter and summer.The temperature regime also oscillates through the year with temperatures over 30 • C in summer and below 5 • C during winter.Population density is below 35 inhabitants km −2 which could be considered low; however, more than 40 % of the surface is occupied by agricultural land and, thus, the catchment is subjected to an intensive water resource control by more than 216 large dams and other water engineering systems.The eastern zone of the study area comprises several medium catchments ranging from 72 to 5000 km 2 , occupying a total extension of 16 500 km 2 that drain directly from the Pyrenees or the Catalan costal chain to the sea.This area is dominated by the Mediterranean oceanic climate in the coast and by a temperate climate in the mountains.Precipitation declines from an annual mean of 1200 mm year −1 in the northern river heads to less than 500 mm year −1 in the southern catchments.Coniferous and broadleaf forest, scrubs, and grasslands occupy more than 60 % of the surface in the northern catchments which are progressively replaced by agricultural lands in the south.There are a total of 6 600 000 inhabitants in this area, mostly concentrated in the city of Barcelona and its metropolitan area.Therefore, most of the water resources are allocated to urban and industrial uses.

Hydrologic data
The initial data set consisted in a series of mean daily flow recorded at 428 gauging stations operated by different Spanish water agencies and regional governments.Only gauges unaffected by impoundments (defined as large engineering structures) or large upstream abstractions were selected for analyses.In addition, we selected those gauges with available data for the period 1976-2010 and analysed the quality of the series.First, an analysis of the flow series was carried out to eliminate those years without desirable data quality, which could be due to the presence of (i) periods of consecutive repeated values; (ii) non-natural extreme low flows for short time periods; (iii) periods of zero flow values in non-intermittent rivers; (iv) non-natural flow magnitude rises and falls; or (v) large differences between two periods, probably due to changes of flow recorder method.Years with more than 30 days of missing data were removed from the analysis.In the last step, we discarded the gauges that accounted for less than 8 years.After applying these restrictions, 156 gauges were selected with an average length of 17 years of data (Table 1).
In this study we developed four sorts of classifications (Fig. 2).Two of them were obtained from normalized flow series and the other two from non-normalized (raw) series.Normalization is used to eliminate the influence of flow magnitude (Snelder et al., 2009).Flow series were normalized by dividing all daily flow values by the mean annual flow (Poff et al., 2006).
A set of 103 and 101 hydrologic indices, which represent a wide range of ecologically meaningful aspects of the flow regime (Olden and Poff, 2003), were calculated for the raw and normalized flow series, respectively (Appendix A).These indices characterize the central tendency and dispersion of (i) magnitude of annual and monthly flow conditions, (ii) magnitude of severe-high and low-flow conditions, (iii) timing of flows, (iv) frequency and duration of high-flow pulses and (v) rate of change of flow (Richter et al., 1996;Olden and Poff, 2003).It must be pointed out that among the indices representing flow magnitude, l1 and lcv, were excluded from the set of indices extracted from the normalized flow series.After dividing each daily flow data by the mean annual flow, l1 became equal to 1 in all the gauges.In addition, lcv became equal to lca (as lcv = lca/l1).
Given the strong correlation between several indices, the initial set of indices was reduced to a set of non-correlated synthetic indices using the procedure outlined in Olden and Poff (2003) and followed by many others (Chinnayakanahalli et al., 2011;Zhang et al., 2012;Belmar et al., 2011).According to Olden and Poff (2003), a principal components analysis (PCA) was used to determine the patterns of correlation between the hydrological indices.It allowed one to identify the subsets of synthetic indices, that describe the major sources of variation while minimize redundancy.The broken stick method (Jackson, 1993) was applied to obtain and define the optimal set of PCs to be retained.Each of the selected PC was used as a hydrologic synthetic index in subsequent analysis.Two PCAs were carried out independently, one for the hydrologic indices calculated from the raw flow series and another for the hydrologic indices calculated from the normalized flow series.Each PC was standardized before conducting further analysis to give them equal weights.Snelder and Booker (2013) demonstrated that this additional step increased classification performance.

Environmental data
A synthetic river network (SRN) was delineated using a 25 m digital elevation model (DEM) using the NestStream software (Benda et al., 2007).The SRN comprises 667 406 segments with lengths ranging from 16 to 800 m and was used as a spatial network to integrate the hydrological and environmental information.Climate, topography, land cover and geology are hypothesized to be important discriminators of the hydrologic regime regardless of geographic location.Thus, environmental variables were used to explain the hydrological character of the recorded flow series and predict this character onto the whole river network.Predictor variables describing several environmental attributes including climate, topography, land cover, and geology were extracted from existing databases provided by several national and regional organizations.The variables for each segment represented the mean value of the variables in the upstream catchment.An initial set of 25 environmental variables with potential influence on the hydrological regimes were selected.Pearson's correlation coefficient between each pair of variables was calculated and variables with correlation higher than 0.7 were discarded.A final set of 16 variables were selected (Table 2): i. Climate (n = 3): precipitation, precipitation range, and evapotranspiration were derived from monthly climate series calculated in a 1 km × 1 km grid map.This map was obtained by means of an interpolation procedure based on data recorded in more than 5000 weather stations of the Spanish network.These data were originally developed to be implemented into the Integrated System for Rainfall-Runoff modelling (in Spanish SIMPA model) by the Centre for Hydrographic Studies (CEDEX, Ministry of Public works and Ministry of Agriculture and Environment, Spain).
ii. Topography (n = 5): catchment area, slope, elevation, confluence density, and drainage density were derived from the 25 m DEM.
iii.Land cover (n = 6): the percentage surface occupied by broadleaf forest, coniferous forest, pasture, agricultural land, denuded areas, and urban areas were derived from the Soil Occupancy Information System (in Spanish SIOSE) developed by the National Geographic Institute of the Spanish Government.SIOSE presents a scale of 1 : 25 000 and integrates satellite and aerial images from several sources of information.
iv. Geology (n = 2): the average rock hardness and the terrain permeability were derived from the lithostratigraphic and permeability map at scale 1:200,000 developed by the Spanish Geologic and Miner Institute (in Spanish IGM) of the Spanish Government.The base of the calculation of these variables was the percentage of area occupied by the original classes of rocks included in the data layer.These classes were then reclassified into broader ones and then, we assigned them a numerical value based on geological hardness and soil permeability (see Snelder et al., 2008 for details).

Classification procedures
In this study, we derived classifications with increasing numbers of levels using the synthetic hydrologic indices extracted from the raw or the normalized flow series and using two contrasting strategies (sensu Snelder and Booker, 2013): (i) the classify-then-predict (rawClasF and norClasF) and the (ii) predict-then-classify (rawPredF and norPredF).The prefix raw and nor indicated whether classification was based on the hydrological indices extracted from the raw or normalized flow series, respectively.Given the high number of gauges removed due to the presence of impoundments or abstraction upstream, it is probable that selected gauges do not represent the whole spectrum of natural hydrologic conditions in the study area.In addition, the SRN developed for this study presented many rivers of first and second order which are underrepresented in the gauge database.The prediction of the class membership (ClasF) or the hydrological synthetic indices (PredF) beyond the hydrological space represented in the selected gauges could lead to misleading results.Therefore, the prediction stage of the ClasF and PredF approaches was not based on the whole SRN (667 406 segments) but in a reduced SRN.All the segments of the SRN that presented values of the predictor variables out of the range (maximum/minimum) defined by these predictors in the selected gauges were discarded.The reduced SRN kept 178 297 segments.

Classify-then-predict classification (ClasF)
Partitioning around medoids (PAM; Kauffman and Rousseeuw, 1990) algorithm based on the synthetic indices was used to cluster gauges (Fig. 2).This technique allowed for the specification of the number of clusters.We produced classifications with numbers of classes ranging from 2 to 20.We then used random forest (RF; Breiman, 2001) to develop predictive models that relate class memberships and the environmental variables (Fig. 2).We fitted one specific RF for each classification level (2 to 20 class level) and then, these models were used to establish the most probable class of each segments of the SRN for each classification, i.e. 19 sets of predictions.

Predict-then-classify classification (PredF)
For the PredF strategy, empirical models were first fitted to each of the standardized synthetic indices as a function of environmental variables using RFs (Fig. 2).Then predictions of the synthetic indices are made for each segment of the SRN.Finally, classifications were produced by clustering all the modelled segments using the PAM algorithm varying again between 2 and 20 class levels.
As stated before, ClasF and PredF strategies are based in the use of RF (Breiman, 2001).RF fits many classification and regression trees (CART; Breiman et al., 1984), each of them grown with a randomized subset of sites and predictor variables from the initial data.Each CART is then used to predict the sites initially excluded from the data set, named the out-of-bag (OOB) samples.These predictions are used to calculate the predictive accuracy of the model and the importance of each predictor variable (Snelder et al., 2011).

Comparison of classification performance
The performance of the classifications was measured using the classification strength (CS; Van Sickle, 1997) and ANOVA (analysis of variance).
CS estimate the degree of dissimilarity between gauges explained by the classifications (Snelder and Booker, 2013).This analysis was performed on the hydrological indices with the highest loading on each of the retained PCs.Briefly, CS resulted from the difference between the mean dissimilarity of the gauges in the same class (D within ) and the mean dissimilarity of gauges in the other classes (D between ).Higher values of CS indicated a greater uniformity within classes and greater differences between classes (Van Sickle, 1997).We calculated CS for each classification (rawClasF, rawPredF, norClasF, and norPredF each with 2 to 20 class levels).We applied the restriction that classes comprising a minimum of five gauges to reduce the influence in the analysis of classes represented by a very low number of gauges.
In addition, we performed an ANOVA on all the hydrological indices (103 and 101 for raw and normalized series, respectively) with the class membership as the explanatory variable.ANOVA allowed for the analysis of the potential of classifications to discriminate each of the hydrological indices.The coefficient of determination (r 2 ) was calculated for each level (2 to 20 class level) of the four classifications.The restriction of the five gauges per class was also applied.
Following the procedure outlined in Snelder and Booker (2013) and Snelder et al. (2012), both the CS and ANOVA analysis were performed on gauges not used in the fitted models by means of a fivefold cross-validation procedure (Hastie et al., 2001).This allowed us to focus on the "predictive performance" of the classifications.Each cross-validation procedure was repeated 5 times in order to "smooth out" the variability inherent to each subset.Therefore, results of 25 estimates of predictive CS and r 2 statistics for each hierarchical level of classifications were obtained.Based on the "one standard error rule", two classifications were assumed significantly different if standard errors of the statistics did not intersect.

Hydrological interpretation of classifications
We selected the five hydrological indices included in the initial set (103 and 101 indices for the raw and normalized series, respectively) with the highest values in each retained PCs to interpret the hydrological meaning of the new synthetic indices.The retained PCs accounted for the greatest part of the hydrological variability so, they are the major determinants of the classification patterns.In addition, we used the ANOVA results to interpret each classification by looking at the different coefficients of determination for specific indices.We assumed that the higher the coefficient of determination the higher the importance of that index to discriminate among classes.

Analysis of distinctive gauges
We also analyzed how each classification strategy resolved the problem associated with the presence of distinctive gauges (DGs).DG can be defined as those that showed the most distinctive regimes (i.e.gauges presenting the largest hydrologic dissimilarity relative to the other ones present in the data set).The way the classification procedure deals with the DGs is very important.For instance, DGs can be grouped to other gauges that are completely dissimilar or in very exclusive classes with lower dissimilarity between gauges but a very restricted number.In both cases, the hydrologic character represented by the DGs may be underrepresented when classes are predicted to the whole river network.We quantified how the different strategies deal with the presence of DGs in the data set.
Independent analyses were made for classifications based on raw and normalized flow series.First, we calculated, based on the synthetic indices scores, the dissimilarity between each pair of gauges and then, the corresponding mean dissimilarity for each gauge.This value allowed for the selection of the gauges with the most distinctive hydrological regime, i.e. the DGs.We ordered the gauges from the most to the less dissimilar gauge and analyzed how the dissimilarity values decayed.We selected four DGs for each type of series (raw or normalized), corresponding to the first important inflexion point in the decay trend of the dissimilarity.It is important to stress than dissimilarity values decreased from DG1 to DG4.Finally, we recorded the classes where the DGs belong after classifying the SRN.
For each DG two analyses were performed.First, we calculated the distance between the DG and the medoid of the classes.This value was weighted by the mean distance between the medoid and all the other gauges belonging to the class.This distance indicated how much different the DG is relative to the other gauges included in the class.Second, we analyzed the proportion of the classification domain assigned to the classes where the distinctive gauges were included.Low frequency of a class in the observed space (i.e. in the gauge network) does not implied low frequency in the SRN.Therefore, we expected higher frequencies of the class in the SRN than those observed in the gauge network.Low frequency of these classes indicated the inability of the procedure to predict properly the hydrological characteristics represented by the DGs.

Correspondence between classifications
The spatial agreement between each pair of classifications was evaluated by means of the adjusted Rand index (ARI; Hubert and Arabie, 1985).ARI analyses the relationship of each pair of gauges and how they differ between two cluster solutions.It ranges between 0 (indicating that agreement between two clustering solutions is not better than chance) and 1 (indicating perfect agreement).Given the large number of segments in the SRN, we randomly selected a subset of 1000 segments and computed ARI for all pairs of the four classifications.This process was repeated 10 times to avoid the effect of the variability in the selected data set.
Bespoke functions written in R were used to analyze flow series and calculate hydrological indices (Snelder and Booker 2013).

PCA and Predictive mapping
The broken stick method selected the first five PCs of the PCA performed on the raw series.They explained 91 % of the variance, accounting for the PC1 alone for the 68 % (Table 3).The OOB misclassification rate of the RF models in the rawClassF ranged from 0.13 for the 2 classes level to 0.77 for the 20 classes level (Fig. 3).The most important predictor variables of the RF were catchment area, precipitation, agriculture, pasture, and elevation.For the rawPredF classification, the mean OOB r 2 for the RF models of the five synthetic indices was 0.4, decreasing from 0.65 for PC1 to 0.18 for the PC5.Predictors varied according to the modelled PC, but most of them included topography (catchment area, slope), climate (precipitation) and land cover (agriculture, coniferous and broadleaf forest) variables.
Parallel, the first six PCs of the PCA performed on the normalized flow series were retained.They explained 83.3 % of the variance (Table 3).The OOB misclassification rate of the RF models in the norClasF strategy ranged from 0.22 to 0.66 (Fig. 3).The most important variables differed between classifications comprising different class levels but in general precipitation, elevation, gradient, and broadleaf forest were present in most models.For the norPredF strategy the mean OOB r 2 was 0.31 for the six PCs, decreasing from 0.63 for PC2 to 0.08 for the PC6.The most important variables were not consistent between RF models although precipitation, elevation, pasture, and broadleaf forest were present in most of them.

Comparison of classification performance
CS statistics for the classifications based on the raw flow series (rawClasF and rawPredF) showed similar patterns.CS increased from 2 to 6 class level and in general, the analysis did not reveal significant differences (i.e.overlapped among standard error bars) beyond the 6 class level (Fig. 4a).Raw-PredF showed generally higher CS values than rawClasF, although differences were not always significant.
The discrimination power of classifications for each of hydrological index (ANOVA) got higher with increasing number of classes (Fig. 5 and Table S1 in the Supplement).However, in most cases there were not significant differences between classifications comprising a number of classes ranging from 6 to 20 classes.Moreover, rawPredF outperformed raw-ClasF, especially for those indices representing flow magnitude and duration (Fig. 5 and Table S1 in the Supplement).
NorPredF presented a progressive increment of CS from 2 to 10 class level where it reached the maximum value, suffering then only slight variations (Fig. 4b).NorClasF presented a more unstable CS pattern than norPredF.Except for specific class levels (2 and 4 class levels), norPredF reached higher CS than norClasF.
The discrimination ability of norClasF and norPredF on individual indices showed similar patterns to those found for classifications based on raw series.An increase in r 2 with increasing number of classes and the presence of an inflexion located between 6 and 10 class levels were observed (Fig. 6 and Table S2 in the Supplement).In addition, although nor-PredF performed better than norClasF, differences were not significant in several cases.In general, the classifications based on the raw flow series (rawClasF and rawPredF) provided slightly higher CS (Fig. 4) and r 2 values (Figs. 5 and 6) than those based on normalized series (norClasF and norPredF).

Hydrological Interpretation of classifications
According to the hydrological indices with the highest values on each axis in the PCA performed on the raw flow series, PC1 represented the magnitude of the mean annual and high flows, while PC2 represented the frequency of high-flow events and the magnitude of low flows.PC3 was also related to the frequency of high-flow events while PC4 and PC5 represented the interannual variability of different hydrological characteristics (Table 3).The hydrological interpretation of the PCs became more difficult as explained variance decreased.In addition, ANOVA analysis revealed higher r 2 values of indices related to flow magnitude, durantion and frequency than those representing other aspects of the flow regime (Fig. 5 and Table S1 in the Supplement).S1 in the Supplement).
The PCA performed on the normalized flow series showed that PC1 represented the variability of the annual mean flow and the magnitude and duration of extreme low flows and PC2 represented the variability of the magnitude and duration of high-flow events (Table 3).PC3 to PC6 are mainly related with indices representing flow magnitude in different months.Thus, they represented the shape and variability of the hydrograph across the year.In regard to the ANOVA, the highest r 2 values were obtained for the indices representing mean monthly flows.The maxima reached by the indices representing magnitude and duration of extreme flows was 0.3 (Fig. 6 and Table S2 in the Supplement).In addition, both norClasF and norPredF showed high discrimination ability on indices representing the frequency of high-flow events, despite these indices not identified as important in any PCs.

Analysis of distinctive gauges
Three of the four DGs selected from the raw flow series were situated in the Ebro catchment and one in the Cantabric region.The distance between each DG and its respective class medoid in the rawClasF classifications was lower than the distance in the rawPredF classification more than two thirds of the times.However, the relative differences were generally below 20 % (Table 4).In addition, for the rawClasF it was observed that the proportion of the classification domain assigned to the classes in which the distinctive gauges were included presented very low frequencies.This was especially visible beyond the 6 class level where this proportion was below 1 % for the four distinctive gauges (Fig. 7a).Regarding the rawPredF the proportions of the classes containing the distinctive gauges were higher than those for the rawClasF (Fig. 7b).
The classifications based on the normalized flow series presented two distinctive gauges situated in the Ebro catchment and the other two in two Catalan catchments.NorPredF showed smaller distances between the distinctive gauges and their respective class medoids than norClasF 95 % of the times.In addition, more than half of the times differences were over 40 % (Table 4).The comparison of the frequency of the classes containing the distinctive gauges did  S2 in the Supplement).
not reveal important differences between norClasF and nor-PredF (Fig. 7c and d).

Correspondence between classifications
The ARIs for each pair of classifications were in the range 0.12-0.4for the 6 class level and in the range 0.14-0.34for the 11, 16 class level and the mean of all classification levels (Table 5).The highest ARI was obtained between rawPredF and norPredF (≥ 0.3).Contrary, rawClasF and nor-ClasF showed the lowest correspondence (≤ 0.15).

Discussion
As expected the different data specification and classification procedures analyzed in this study exerted a significant influence in the classifications outcomes.The normalization of flow data generated hydrological classifications that were not completely subjected to the flow magnitude and the size of the river as if data were not normalised.Consequently, classifications based on normalized series were more difficult to interpret and predict.In addition, classifications based on PredF outperformed those obtained with ClasF and presented a greater ability than ClasF to deal with the underrepresented parts of the hydrological space.

Comparison of classification performance
Similar classification performance measured through CS and ANOVA was observed in relation to the results obtained by Snelder and Booker (2013) in New Zealand rivers.The specific classification characteristics depend upon the selected gauged network and the hydrological behaviour of the rivers in the target study zone.However, the similarity of the results with those obtained by Snelder and Booker (2013) highlights the possibility to discern more clearly the benefits and drawbacks of the different classification strategies and data specification.Our analysis demonstrated that the PredF performed better than ClasF and significant differences in the ability to discriminate hydrological characters were found for several class levels.The higher performance of PredF classifications is supported by the conceptual basis of this approach and can be explained by two main reasons: the equalization of data and the loss of information produced in ClasF strategy coupled with the effective data processing of the PredF strategy.First, ClasF imposes sharp barriers to the observed hydrological space, i.e. the gauged network, and not over the whole hydrologic space of the fluvial network.The creation of classes produced a equalization of hydrologic data within classes.Given that the subsequent prediction step enforces congruence of all the river segments of the SRN with those previously created classes, the equalization of data could be linked to a loss of information when classes are predicted.Moreover, the real extent to which such discrete groupings exist is uncertain (Kennard et al., 2010).In contrast, the aim of PredF is to account for the whole hydrological variability in the SRN before conducting the classification.This process generated a more complete distribution of the hydrologic variables which is in accordance with the actual hydrology of the SRN.This avoided the bias associated with gauge location.Moreover, PredF does not assume any interactions between the various dependent variables for each RF, which is true as the PCA created orthogonal and independent variables.
In addition, it must be pointed out that although prediction of classes and synthetic indices is not entirely comparable, results indicated that similar prediction performance can be assumed for both strategies.Hence, the prediction of classes or synthetic indices is not a major determinant in the better results obtained by PredF.
In general, the specification of the initial hydrological data also has significant consequences in the classification performance.Classifications based on raw flow series had higher discrimination ability for individual indices than those based on normalized flow series (Figs. 5 and 6).As discussed below, classifications based on raw series discriminated rivers based almost exclusively on flow magnitude, which greatly depends on river size.In contrast, classifications based on normalized flow series considered a greater range of hydrological aspects.Obviously, the variability of river size shows a clear pattern within river networks and thus, it is a Table 4. Euclidean distance between the distinctive gauges (DG) and the medoid of the classes in which they were included for the 4, 6, 8, 10, 12, 16, and 20 class levels classification.Distances were weighted by the mean difference of all the gauges included in the same class as the DG.Empty cells indicated that the gauge is the unique gauge in the class.Bold letters indicate the procedure that showed the lowest distance.straightforward approach to segregate river reaches.In contrast, the consideration of a higher spectrum of hydrologic aspects hampered the creation of evident classes and thus classifications achieved lower discrimination ability.

Hydrological interpretation of classifications
To our knowledge this is the first study that has compared the consequences of classifying river networks attending to the initial data specification: the use of raw or normalized flow series.The PCA performed on the raw series showed that the first PC explained more than two thirds of the hydrological variability in the study region.This PC was mainly related to the magnitude of mean annual and high flows.Thereby, the magnitude of flow was the major determinant to segregate rivers, as expected.In addition, indices accounting for the frequency of high-flow events were also represented in other PCs and therefore, this flow attribute also showed a relatively important contribution in the classifications (Table 6).Moreover, the ANOVA analysis also showed that all the indices related to flow magnitude, even those not included as the most important ones in any PC presented important differences between classes.This is not surprising given the high correlation between all the flow magnitude indices.However, although these classifications segregated river reaches according to flow magnitude, they were unable to incorporate the severity of droughts, i.e. the magnitude that these episodes represent in relation to the mean flow condition.The pattern of droughts in the study area is an essential element that should be considered in the classifications given the Mediterranean character of the study zone.The fact that the high differences in flow magnitude between large and small rivers  have accounted for the largest percentage of variability, have probably masked the effects of low-flow attributes.Contrary to our results, Belmar et al. (2011) and Chinnayakanahalli et al. (2011) working in areas influenced by the Mediterranean climate found that several hydrologic characteristics related to drought were considered in the synthetic hydrologic indices, even if the series were not normalized by the mean annual flow.We expected that the characteristic intermittency of many Mediterranean streams had been represented in the synthetic indices.However, the lack of this attribute in our classifications may be attributed to the scarcity of gauges situated in intermittent streams.
On the other hand, the interpretation of the classifications based on normalized flow series differed completely to those derived from raw flow series (Table 6).The main differences can be summarized in two essential aspects.First, the proportion of variance explained by the different PCs was more evenly distributed in the normalized than in the raw flow series; therefore, these classifications were not uniquely subjected to just one hydrologic attribute.Second, it was observed that the indices with the highest loading in each PC, and hence their interpretation, varied considerably depending on the data processing and specification (Table 6).Given the higher number of flow attributes influencing the classifications based on normalized flow series, their interpretation was more difficult than those based on raw flow series.In this regard, magnitude and duration of low-flow conditions were represented in PC1.Hence, the Mediterranean character of the rivers was one of the main attributes for classification.In addition, PC3 to PC6 were related to the magnitude of flows in different months and periods through the year; therefore, classification accounted for the shape of the hydrograph as it has been observed in other works (Bejarano et al., 2010;Solans and Poff, 2013;Snelder et al., 2009).Contrary to expectations, other indices not related to flow magnitude, such as the frequency of high-flow events were not included as important indices in any PC.Nonetheless, the ANOVA analysis showed the high ability of classifications based on normalized flow data to discriminate the indices representing frequency (Fig. 6).Therefore, it was assured that such an important hydrological aspect played an important role to define the classification patterns.In addition, classifications and hydrologic attributes based on normalized flow series were also more difficult to predict.For instance, while flow magnitude indices depended almost uniquely on the catchment area and climate patterns, other hydrologic attributes such as duration or frequency of different flow events were related with other environmental variables that were more difficult to characterize.
The interpretation and meaning of classifications are essential to determine their further use.As stated before, classifications based on raw flow series segregated rivers according almost exclusively to flow magnitude.This provides an important loss of hydrologic information which limits its use to water resource and flooding management issues.However, results demonstrated that these classification did not take into account the drought patterns in the study area.Hence, even within the water resources field, these classification would not be effective in dealing with low-flow issues, such as the environmental flows or reliability of water supply in drought situations.In contrast, segregating rivers according to a larger spectrum of hydrological attributes widens its potential applications.For instance, many other hydrological attributes different from magnitude may be potentially altered by human perturbations.Hence, following the principles established in the ELOHA framework (Poff et al., 2010), classifications based on normalized flow series may be more valuable in evaluating the hydrologic alteration caused by human perturbations or the response of freshwater ecosystems to these flow alteration.
Finally, it must be pointed out that any of the classifications, whether they were based on raw or normalized data, failed to represent some other important hydrologic aspects such as timing of extreme flow events and rate of change (Table 6).These attributes presented a modest spatial variability within the study area which ultimately resulted in a small contribution to the hydrologic classifications.

Analysis of distinctive gauges
The analyses demonstrated that the PredF approach presented greater capability than ClasF to deal with the underrepresented parts of the hydrological space in the data set.If data were not normalized, rawClasF approach generated classes that were comprised by the DG plus a very limited number of gauges, in most of the cases less than four.In these cases, the distance between the DG and the medoid of the class was similar to the mean distance calculated for the other gauges included in the class.Therefore, it can be assumed that these classes were relatively homogeneous in regard to its hydrologic characteristics.However, when classes where predicted by the SRN, their frequencies were normally lower than 1 %.This means that the hydrological characteristics accounted for in these classes where almost lost after the prediction step in of the rawClasF.Moreover, their frequencies were probably well below the actual frequencies of those river classes.
On the other hand, the normalization of the flow series smoothed the differences between gauges due to the reduction of the influence of low magnitude, which implied that DGs in the norClasF classifications were not isolated into such exclusive classes as those found in rawClasF.This greatly reduced the problem associated with the low frequency of these classes when they were predicted by the SRN.However, the distance between the DGs and the medoid was normally over two times the mean distance of the other gauges included in the class.This indicated that DGs were grouped to other gauges that are not hydrologically similar.Hence, it is assumed that the hydrologic characteristics accounted by the DG were not represented at all in any of the classes.
By contrast, when the PredF approach was applied, these rare hydrologic characteristics are predicted to a larger number of segments before classifying the SRN.Consequently, the proportion of segments accounting for these rare characteristics increased.In the subsequent step of classification, these segments accounting for the rare hydrological characteristics were grouped in specific classes and hence, the frequencies of these classes were more adjusted to the actual distribution of river types in the study area.

Correspondence between classifications
The ARI analysis showed that the correspondence between rawClasF and rawPredF and between norClasF and norPredF presented a similar pattern.The ARI values in these two cases were around 0.2 which implies important differences in the spatial distribution of classes.This indicated that the strategy used to predict class membership to the SRN (ClasF vs. PredF) is a critical specification in the classification procedure.In contrast to the expected outcome, ARI analyses also showed that classifications obtained through the PredF approach, regardless of the initial data processing (i.e.raw-PredF or norPredF), presented the highest spatial correspondence.This result highlights that the prediction of the hydrological characteristics to the SRN before classifying is probably generating classifications more adjusted to the actual spatial distribution of river types, even if classifications presented different interpretation.

Conclusions
In conclusion, this study showed that the methodological specifications used throughout the classification process greatly influences classification outcomes and performance.Although the comparison between ClasF and PredF did not reveal significant differences for several classification levels, the classifications based on PredF produced, in general, higher classification performance, greater ability to deal with the presence of distinctive gauges in the data set, and a spatial distribution of classes more adjusted to the actual river types.PredF produced classes that presented higher intra-class homogeneity and higher inter-class heterogeneity than ClasF.In general, the segregation of gauges before the prediction step in the ClasF produced a loss of information due to the presence of under-and overrepresented hydrologic characteristics.In contrast, the prediction of the hydrologic characteristics previous to the classification step avoided these bias associated with gauge location.These features are very valuable when applying these classifications with different objectives.For instance, classifications developed through PredF represents the best strategy to further detect not only the hydrological alteration caused by human perturbations but also the ecological impact associated with this alteration.Given all these strengths, we recommend the application of the PredF strategy to develop hydrological classifications at the regional scale.Finally, the specification of flow data influenced the interpretation of the hydrological classes.The normalization of flow data removed the effect of flow magnitude and generated classifications in which a larger spectrum of hydrologic characteristics was considered.This widens the potential range of management and ecological applications of the classification, as classifications would not be subjected to a unique hydrological attribute.In all the cases, the selection of the most suitable number of classes is difficult to accomplished from completely objective criteria, as, many times, classifications with different levels of detail presented similar statistical performances.
The Supplement related to this article is available online at doi:10.5194/hess-18-3393-2014-supplement.

Figure 1 .
Figure 1.Map of unregulated gauges (•; n = 156) in the study area.Black lines divide the Cantabric, the Ebro and the Catalan catchments (CS: Cantabric sea; MS: Mediterranean sea).

Figure 2 .
Figure 2. Schematic diagram summarising the four classifications strategies.

Figure 3 .
Figure 3. Out-of-bag misclassification rate of the random forest models developed for the 2 to 20 class level classifications using ClasF strategy based on the synthetic indices derived from the raw ( ; rawClasF) and the normalized flow series (♦: norClasF).

Figure 5 .
Figure 5. Performance of the classifications derived from the raw flow series based on the ANOVA on individual indices (•: rawPredF; ♦: rawClasF).We selected one index representing each aspect of the natural flow regime to illustrate the results (the values obtained for the 103 indices are included in TableS1in the Supplement).

Figure 6 .
Figure 6.Performance of the classifications derived from the normalized flow series based on the ANOVA analysis on indicidual indices ( norPredF; ♦ norClasF).We selected one index representing each aspect of the natural flow regime to illustrate results (the values obtained for the 101 indices are included in TableS2in the Supplement).

Table 1 .
Number of retained years for flow time series used in the analysis.

Table 2 .
Environmental variables used to predict classes or the synthetic hydrologic indices onto the ungauged segments of the river network (TG: topography; CL: climatic; LC: land cover; GL: geology).

Table 3 .
The 5 hydrologic indices with the highest loadings in each PC and variation explained by the retained PCs using the raw (above) and the normalized flow series (below).A minus sign indicates negative relation with the PC.

Table 6 .
Relative representativeness of each flow regime aspect according to the data processing previous to classification procedure.limited; * * moderate; * * * high. *