Hydrology and Earth System Sciences

An Objective Approach for Feature Extraction: Distribution Analysis and Statistical Descriptors for Scale Choice and Channel Network Identification

A statistical approach to LiDAR-derived topographic attributes for the automatic extraction of the channel network and for the choice of the scale at which to evaluate parameters is presented in this paper. The basis of this approach is to use distribution analysis and statistical descriptors to identify channels where terrain geometry denotes significant convergences. Two case study areas with different morphology and degree of organization are used, with their 1 m LiDAR Digital Terrain Models (DTMs). Topographic attribute maps (curvature and openness) for various window sizes are derived from the DTMs in order to detect surface convergences. A statistical analysis of the value distributions for each window size is carried out to choose the optimum kernel. We propose a three-step method to extract the network based on (a) the normalization and overlapping of openness and minimum curvature to highlight the most likely surface convergences, (b) a weighting of the upslope area according to these normalized maps to identify drainage flow paths and flow accumulation consistent with terrain geometry, and (c) the standard-score normalization of the weighted upslope area and the use of standard-score values as a non-subjective threshold for channel network identification. As a final step, for an optimal definition and representation of the whole network, a noise-filtering and connection procedure is applied. The advantages of the proposed methodology, and the efficiency and accurate localization of the extracted features, are demonstrated using LiDAR data of two different areas and comparing both extractions with field-surveyed networks.


Introduction
Recent advances in data collection technology, such as airborne and terrestrial laser scanning, have enabled rapid, accurate, and effective acquisition of topographic information (Ackermann, 1999; Kraus and Pfeifer, 2001; Briese, 2004; Slatton et al., 2007; Tarolli et al., 2009). A generation of high-resolution (≤3 m) Digital Terrain Models (DTMs) is nowadays widely available, offering new opportunities for the scientific community to use detailed representations of surfaces.
Terrain geometry defines flow paths across a watershed, and raster-based DTMs have been widely applied to derive hydrogeomorphic features by using primary topographic attributes such as slope, aspect, and curvature (Florinsky, 1998). The accuracy of feature identification depends on that of the initial dataset, but remains a challenge, due partly to the multi-scale nature of geomorphological processes and partly to the absence of objective thresholds for feature classification.
Extracting drainage networks from DTMs is one of the most important digital terrain analyses. Traditionally, extraction methodologies are based on flow routing models. Various drainage algorithms offer the possibility of computing drainage networks over the whole raster surface (O'Callaghan and Mark, 1984; Quinn et al., 1991; Tarboton, 1997; Orlandini et al., 2003). They generally follow the procedure of filling pits, computing flow directions, and computing the contributing area draining to each grid cell (Tarboton, 2003).
The conversion from a drainage flow path to a meaningful network requires a further step. The traditional approach is to use a unique contributing-area or slope-area threshold beyond which the hydrographic network is chosen (O'Callaghan and Mark, 1984; Band, 1986; Mark, 1988; Tarboton et al., 1991; Montgomery and Dietrich, 1994; Dietrich et al., 1993; Dalla Fontana and Marchi, 2003). This approach, however, shows variable reliability and sensitivity over different drainage basins and grid cell sizes, with a general tendency to predict more channel heads than can be observed in the field (Orlandini et al., 2011).
These approaches share the idea that flow direction is strictly dependent on the topographic surface. The physical location of channel heads, however, is not related just to topographic slope, but in some cases depends also on several factors such as the geomorphic processes involved, soil properties, climatic environment, land use, etc. (Montgomery and Dietrich, 1988; Prosser and Abernethy, 1996; Wemple et al., 1996; Beven and Kirkby, 1979; McGlynn and McDonnell, 2003). In these contexts, the identification of the drainage network according to area or slope-area thresholds does not necessarily correspond to the actual channel head location (Orlandini et al., 2011), because the use of a unique value for the area or slope-area threshold is not enough to characterize all channels (Passalacqua et al., 2010b). Alternatively, some authors proposed morphological reasoning to establish this threshold (Rodriguez-Iturbe and Rinaldo, 1997; Heine et al., 2004).
Several studies pointed out that a robust delineation of stream networks should be based on direct detection of morphology, underlining how specific geometric properties of surfaces calculated directly from DTMs can effectively avoid the thresholding issue of classical methods of channel network extraction. The core idea of these approaches is to label convergent cells and connect them in a second step using classical flow routing procedures or cost functions based upon them. A large number of indexes directly derived from LiDAR DTMs exists to correctly describe the geometric properties of surfaces and to identify terrain convergences (Gallant and Wilson, 2000). Some of them have already been used for network extraction (Tarboton and Ames, 2001; Molloy and Stepinski, 2007; Lashermes et al., 2007; Tarolli and Dalla Fontana, 2009; Thommeret et al., 2010; Pirotti and Tarolli, 2010; Passalacqua et al., 2010a,b). Tarboton and Ames (2001) suggested the use of a proxy of curvature stemming from the Peuker and Douglas (1975) algorithm to account for spatially variable drainage density. Upwards-curved grid cells have been used by other authors to derive channel networks from digital elevation data (Band, 1986; Gallant and Wilson, 2000). Tarboton (2003) proposed a procedure to provide a weight matrix to apply in the drainage area computation. He suggested the use of a statistical threshold based on the constant-drop property of channel networks (Broscoe, 1959) in order to choose the most suitable weighted support-area threshold to map channels. However, some authors argued that this thresholding procedure is not applicable when the network topology needs to be related to morphology (Thommeret et al., 2010).
Wavelet analysis to locally filter LiDAR elevation data and to detect thresholds of topographic curvature and slope-direction change has been used by Lashermes et al. (2007) to define valleys and portions of probable channelized areas within the valley. Curvature maps derived from LiDAR DTMs have been used by Tarolli and Dalla Fontana (2009) and Pirotti and Tarolli (2010) to assess the capability of high-resolution topography for the recognition of the convergent hollow morphology of channel heads and for channel network extraction, respectively. Thommeret et al. (2010) used a data-driven and data-derived threshold based on DTM noise to extract a badlands network, identifying convergent areas from a combination of terrain morphology indices and a single-flow drainage algorithm. Passalacqua et al. (2010a,b) applied nonlinear diffusion filtering combined with a geomorphically informed geodesic cost function to automatically identify channel initiation points and extract channel paths from LiDAR data.
The referenced studies relied on a prior assessment of the input data (Thommeret et al., 2010), on calibration of the kernel size by interactively testing its effectiveness on the investigated features (Pirotti and Tarolli, 2010), or on fixed, arbitrarily chosen scales to evaluate topographic parameters (Tarboton and Ames, 2001; Molloy and Stepinski, 2007; Tarolli and Dalla Fontana, 2009; Passalacqua et al., 2010a,b). Some open questions still remain, such as how to identify thresholds that are not data-driven, and how to objectively select the scale without calibrating it on the results and without requiring a previous analysis of the data and of the study area.
In the present work, we propose a methodology relatively independent of the input dataset and of the size of the analyzed features. The approach uses normalized topographic attributes, such as openness (Yokoyama et al., 2002; Prima et al., 2006) and minimum curvature (Evans, 1979), as a weight for the upslope area. The identification of the optimum scale at which to evaluate topographic parameters is based upon distribution analysis, and statistical thresholds provide the key for the choice of the parameters controlling the extracted network.

Study and test areas
We selected two study areas based on the availability of high-resolution LiDAR surveys and detailed, independent, field-based network location datasets. The main study site (Cordon) is a geometrically simple area, defined intuitively so as to include a network and a pour point (maximum flow convergence), without any a priori reference to a single well-defined hydrological unit. The test area, instead, refers to a more complex structure: a headwater catchment (Miozza). This choice was made in order to show the accuracy and the objectivity of the proposed methodology for an alpine environment with a complex morphology.
The test area refers to the headwater catchment of the Miozza basin (panel b, Fig. 1): the area covers 4.4 km². Basin elevation ranges from 834 to 2075 m a.s.l., with an average value of 1530 m a.s.l. The geomorphological settings of the basin are typical of the north-eastern alpine region: deep valleys with high slope values and erosion areas. The basin is quite wild, and the only human activity is related to occasional forest management. Available data consist of field surveys conducted during the past few years, including a LiDAR survey (Tarolli and Tarboton, 2006; Tarolli and Dalla Fontana, 2008, 2009).
Channel heads were mapped in the field for both areas (Pirotti and Tarolli, 2010; Tarolli and Dalla Fontana, 2009): the contributing area at channel head locations varies significantly. For the Cordon area, it ranges between approximately 110 m² and 13 000 m². The same analyses carried out for the Miozza basin (Tarolli and Dalla Fontana, 2009) show values of contributing area ranging from 128.6 m² to 96 680 m². Considering this high variability, area and slope-area threshold procedures using a unique value have been proven not to be reliable for channel network extraction when compared with the real channel network, especially in morphologically complex areas (Passalacqua et al., 2010b; Orlandini et al., 2011).
Figure 3 shows a flow chart of the complete procedure. We propose a three-step method based on (a) the normalization of topographic parameters (openness and minimum curvature) to highlight local morphology, (b) a weighting of the upslope area according to such normalization to identify flow convergences, and (c) the choice of a statistical parameter as an objective threshold for channel network detection. As a final step, according to the presence or absence of noise, an indication is provided on how to perform a filtering approach and then how to connect the network.

Local morphology analysis
The objective of this study is to delineate the network where the surface allows flows to converge. This analysis is done through a direct detection of morphology from the digital terrain maps. For the present work, DTMs were derived with two different interpolation procedures: the natural neighbor interpolator (Sibson, 1981) for the Cordon study site, and an algorithm with a spline function in the ESRI TOPOGRID tool for the Miozza one (Tarolli and Tarboton, 2006; Tarolli and Dalla Fontana, 2008, 2009). Depending on the spatial variation of the accuracy and density of the data, and on the suitability of the interpolation method for a certain relief, DTM quality might vary locally and regionally (Karel et al., 2006). For some parts of the Cordon study area we registered the presence of some small artifacts, probably due to changes of the bare-ground point density resulting from the filtering procedure (Cavalli and Tarolli, 2011). One of the limits of surface derivatives is that artifacts present in the input DTMs, even when controlled and limited by appropriate methods, might be amplified by differentiation (Burrough and McDonnell, 1998; Gallant and Wilson, 2000).
With the aim of proposing an unsupervised extraction of the network that does not require an assessment of the input data, we decided not to apply any correction of these artifacts, but to enforce local concavity detection by joining two indexes: a direct surface derivative (minimum curvature, Evans, 1979) and an angular measure related to shaded relief and slope (openness, Yokoyama et al., 2002; Prima et al., 2006). Curvature was chosen considering that numerous works proved its effectiveness for feature extraction (e.g. Molloy and Stepinski, 2007; Lashermes et al., 2007; Tarolli and Dalla Fontana, 2009; Thommeret et al., 2010; Pirotti and Tarolli, 2010; Passalacqua et al., 2010a,b). The choice of openness is based on the fact that its measure of convergence relies on an averaging procedure: openness values are calculated by averaging angles along azimuths (Yokoyama et al., 2002; Prima et al., 2006). We assumed that this averaging procedure might be less affected by artifacts in the input data due to interpolation techniques. As suggested by Yokoyama et al. (2002), values of both positive and negative openness have been compiled.

Minimum curvature
Curvature is defined as the second spatial derivative of the terrain (Evans, 1972). The local surface is approximated by a quadratic function with reference to a coordinate system (x, y, z) and six parameters (a to f) (Evans, 1979; Wood, 1996):

z = ax² + by² + cxy + dx + ey + f

Several methods exist to calculate curvature from such an approximation (Evans, 1972; Horn, 1981; Zevenbergen and Thorne, 1987; Mitasova and Hofierka, 1993; Shary et al., 2002). Evans' (1979) method, however, is one of the most suitable, at least for first-order derivatives (Shary et al., 2002), and it performs well in the presence of elevation errors (Albani et al., 2004; Florinsky, 1998). The two most frequently calculated forms are profile and plan curvature (Gallant and Wilson, 2000). These two measures involve the calculation of the slope vector. Therefore, they remain undefined for quadratic patches with zero gradient (i.e. the planar components d and e are both zero).
In such cases, alternative measures independent of slope need to be substituted. Evans (1979) suggests two measures, maximum and minimum curvature:

C_max = −a − b + √((a − b)² + c²)    (1)

C_min = −a − b − √((a − b)² + c²)    (2)

where a, b and c derive from the quadratic function for surface approximation (Evans, 1979; Wood, 1996).
To perform terrain analysis across a variety of spatial scales, different authors (Yokoya and Levine, 1989; Wood, 1996) solved the bi-quadratic equation using an n × n window with a local coordinate system (x, y, z) defined with the origin at the pixel of interest (central pixel). Equations (1) and (2) can therefore be modified by generalizing the calculation for different window widths (Wood, 1996):

C_max = g n (−a − b + √((a − b)² + c²))    (3)

C_min = g n (−a − b − √((a − b)² + c²))    (4)

where g is the grid resolution of the DTM and n is the size of the moving window. These two formulae have been widely used in the literature for multi-scale terrain analysis (Wilson et al., 2007) and for morphometric feature parameterization (Ehsani and Quiel, 2008), since they are directly related to geomorphologic form, where surface concavities and convexities are detected. Maximum curvature has been applied successfully by Tarolli et al. (2010) for the extraction of geomorphic features, such as landslide crowns and debris-flow banks. A mean curvature (C_mean) derived from these two formulae has been used by Pirotti and Tarolli (2010) for channel network extraction. Cavalli and Marchi (2008) applied the same generalization procedure to the plan curvature formulation, for the characterization of surface morphology.
Channelized landform elements form around depressions in curvature and are thus referred to as concave elements. Therefore, we decided to use C_min (Eq. 4) as optimal for feature recognition (Figs. 8c and 9c). A progressively increasing moving-window size (from 3 × 3 to 33 × 33 cells) has been considered for the calculation of curvature, in order to reduce the effect of noise and small-scale variation in the DTM (Fig. 4a). The choice of the kernel range is based on a computational constraint for the minimum value and on a literature review for the maximum. The minimum width (3 cells) relies on the fact that the sampling window needs to be centered on the cell of interest; thus the smallest area must comprise this cell and its eight nearest neighbors. The literature (Pirotti and Tarolli, 2010; Tarolli et al., 2010) shows that, within a window-width range of 3-33 cells, the quality of the extracted features tends to worsen progressively when windows are greater than ∼25 cells. We decided to apply the same kernel size range (3-33) to account also for a margin of uncertainty, considering the different interpolation techniques of the study DTMs and the different morphological characterization of the study cases.
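As a minimal sketch of the windowed curvature calculation described above, the following Python fragment fits the quadratic patch z = ax² + by² + cxy + dx + ey + f over each n × n window by least squares and evaluates C_min as in Eq. (4). Function and variable names are ours, and the cell-by-cell loop is written for clarity rather than speed.

```python
import numpy as np

def min_curvature(dtm, n=11, g=1.0):
    """Minimum curvature C_min over an n x n moving window (Evans, 1979;
    Wood, 1996): fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f by least
    squares around each cell, then C_min = g*n*(-a - b - sqrt((a-b)^2 + c^2)).
    Border cells are left as NaN. Illustrative sketch, not optimized."""
    half = n // 2
    ii, jj = np.mgrid[-half:half + 1, -half:half + 1]   # local window coords
    x = (jj * g).ravel().astype(float)
    y = (ii * g).ravel().astype(float)
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    solve = np.linalg.pinv(A)                           # least-squares solver
    out = np.full(dtm.shape, np.nan)
    for r in range(half, dtm.shape[0] - half):
        for c in range(half, dtm.shape[1] - half):
            z = dtm[r - half:r + half + 1, c - half:c + half + 1].ravel()
            a, b, cc = (solve @ z)[:3]                  # quadratic coefficients
            out[r, c] = g * n * (-a - b - np.sqrt((a - b)**2 + cc**2))
    return out
```

On a bowl-shaped (convergent) test surface the centre cell yields a negative C_min, consistent with the convention, used later in the paper, that convergent topography lies on the negative side of the C_min distribution.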

Openness
Openness is a morphometric parameter developed by Yokoyama et al. (2002), expressing the degree of dominance or enclosure of a location on an irregular surface.It is an angular measure of the relation between surface relief and horizontal distance (Prima et al., 2006).
Topographic openness is calculated as the average of either zenith (φ) or nadir (ψ) angles along eight azimuths D (0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°) within a radial distance L (Yokoyama et al., 2002). Openness always assumes a positive sign, and its values range from 0° to 180°. The parameter is designated "positive" and "negative" in the same sense as has been used to express terrain-slope curvature (Pike et al., 1988). Positive openness φ_L is convex-upward and refers to the calculation with zenith angles; negative openness ψ_L is concave-upward and refers to the evaluation with nadir angles (Yokoyama et al., 2002).
Along an azimuth D, the zenith angle Dφ_L at a grid point within a radial distance L is

Dφ_L = 90° − Dβ_L    (5)

and the nadir angle Dψ_L is

Dψ_L = 90° − Dδ_L    (6)

where Dβ_L and Dδ_L are, respectively, the maximum elevation and depression angles of the terrain profile along azimuth D within the distance L. Positive openness φ_L of a location on the surface within a distance L on DTMs is the mean of the zenith angles along the eight azimuths,

φ_L = (1/8) Σ_D Dφ_L    (7)

and negative openness ψ_L within the distance L is

ψ_L = (1/8) Σ_D Dψ_L    (8)

The representation of positive openness is designed to highlight topographic convexities, showing higher openness values for ridges and lower values for concavities. Maps of negative openness emphasize drainage (higher values) at the expense of convex features (Yokoyama et al., 2002). To perform terrain analyses maintaining homogeneity with curvature, openness maps have been evaluated considering an n × n moving window (Wood, 2009) (Fig. 4b). Instead of a radial distance L, we considered the distances between the centre of the inner pixel and the centres of the surrounding pixels within the window size (n).
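A minimal sketch of the windowed openness computation may help fix the definitions. It follows the paper's choice of pixel-centre distances inside an n × n window rather than a radial L; the function name and the slow per-cell loop are ours, for illustration only.

```python
import numpy as np

# the eight azimuths as (row, col) unit steps: N, NE, E, SE, S, SW, W, NW
DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def openness(dtm, n=15, g=1.0):
    """Positive and negative openness (degrees) after Yokoyama et al. (2002),
    evaluated in an n x n moving window. Border cells are NaN. Sketch only."""
    half = n // 2
    rows, cols = dtm.shape
    pos = np.full(dtm.shape, np.nan)
    neg = np.full(dtm.shape, np.nan)
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            z0 = dtm[r, c]
            phis, psis = [], []
            for dr, dc in DIRS:
                step = g * np.hypot(dr, dc)           # distance per cell step
                dz = np.array([dtm[r + k * dr, c + k * dc] - z0
                               for k in range(1, half + 1)])
                dist = step * np.arange(1, half + 1)
                beta = np.degrees(np.arctan(np.max(dz / dist)))    # max elevation angle
                delta = np.degrees(np.arctan(np.max(-dz / dist)))  # max depression angle
                phis.append(90.0 - beta)              # zenith angle along D
                psis.append(90.0 - delta)             # nadir angle along D
            pos[r, c] = np.mean(phis)                 # positive openness
            neg[r, c] = np.mean(psis)                 # negative openness
    return pos, neg
```

On a flat surface both openness values equal 90°, while at the bottom of a bowl positive openness drops below 90° and negative openness rises above it, matching the behavior described in the text (drainage emphasized by high negative openness).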

Optimum kernel size evaluation
For this work, channel pattern recognition and classification are based on the assumption that a deviation of values from their normal distribution can delineate a threshold between well-organized valley axes and the occurrence of localized convergent topography (Lashermes et al., 2007).
The degree of divergence of the distribution from normality is related to the shape of the distribution itself and to its skewness. Values of the height of a smoothed surface tend to have a symmetric distribution in a well-selected window with slow-changing terrain (Bartels et al., 2005), but in a complex and hilly region imbalanced terrain elevation affects the histogram, increasing or decreasing its skewness (Yuan et al., 2008). Values of concavities connected to channels, furthermore, have low frequencies, and are usually located in the tails of the distribution. The shape of the distribution, therefore, is influenced both by the morphology and by the kernel size used to evaluate the topographic parameters, since the aim of the kernel-size approach is to provide a filter to mask noise and bring out the meaningful concavities related to channels.
The shape of a distribution can be measured through skewness, defined as:

Skew = E[(X − µ)³] / σ³    (9)

where µ and σ are respectively the mean and the standard deviation of the distribution, and E() represents the expected value of the quantity.
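The skewness definition above translates directly into a few lines of Python (the helper name is ours):

```python
import numpy as np

def skewness(values):
    """Sample skewness E[(X - mu)^3] / sigma^3, as defined in the text;
    non-finite entries (e.g. NaN border cells of an attribute map) are ignored."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    mu, sigma = v.mean(), v.std()
    return np.mean((v - mu)**3) / sigma**3
```

A symmetric sample gives a skewness near zero, while a sample with a heavy right tail gives a positive value.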
Considering minimum curvature in the Cordon area (Fig. 5a), increasing the width of the moving window causes a progressive, slight decrease of the skewness of the distributions. This is related to the presence of small artifacts in the original DTM that are magnified by the use of a direct surface derivative (minimum curvature).
Analyzing positive and negative openness in both study cases, instead, and observing minimum curvature for the Miozza basin, it is clear that increasing the window size skews the distributions of these parameters up to a certain point: skewness values become higher in the negative or positive domain until a maximum is reached. After that, a further increase of the window size is associated with a decrease of skewness: the distributions slowly move toward less skewed shapes (Fig. 5a and b). This behavior is related to the fact that, up to a certain scale (the optimal one), the kernel-size approach succeeds in enhancing channelized features in spite of noise; but when the kernel size becomes too wide, the resulting maps are too smooth, masking all noise but, at the same time, losing important feature details (Pirotti and Tarolli, 2010; Tarolli et al., 2010). The distributions therefore move toward less skewed shapes (skewness slightly decreases in the negative domain).
Based on this consideration, the optimum kernel width is identified at the first point where a change in the window size loses its meaningful effect: it refers to the minimum kernel size that determines a break in the feature-enhancing and noise-decreasing effects. By considering skewness as a function of window size, in a mathematical context this point can be defined as a "stationary point", a point where the function stops increasing or decreasing. Note that the term "stationary point" is used here in its mathematical sense; it does not imply constancy of the values.
To correctly represent the effect of kernel size on skewness, for each of our datasets (minimum curvature, positive and negative openness) we apply a procedure that can be defined as polynomial "fitting/enforcing", expressing skewness as a function of the window width in a least-squares sense (Skew_n). The identification of stationary points is then mathematical and objective, and it is based on differentiation: the stationary point is the point where the derivative of Skew_n is equal to zero.
The description of this approach as polynomial "fitting/enforcing" refers to the fact that the polynomial order is chosen automatically and iteratively in order to provide a curve having a derivative equal to zero at at least one point n within the adopted kernel size range (3-33). It is a fitting procedure that is enforced, when necessary, to obtain a stationary point within the kernel size range. The polynomial "fitting/enforcing" is fully automatic and does not require the user to observe or assess the data: it is a recursive fitting/differentiating procedure executed in a loop, which continues until the condition (the stationary point is a real number within the window size range) is verified. The fulfilment of the condition forces an immediate exit from the loop, providing the equation of the polynomial and the value of the optimum kernel size to apply.
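The loop just described can be sketched as follows. The starting order and the cap on the polynomial order are our assumptions for illustration; the paper only requires that the loop exit as soon as a real stationary point falls inside the kernel range.

```python
import numpy as np

def optimum_kernel(widths, skews, n_min=3, n_max=33, max_order=9):
    """Polynomial "fitting/enforcing" sketch: raise the polynomial order until
    the fitted Skew_n curve has a real stationary point inside [n_min, n_max],
    then return the smallest such point."""
    for order in range(2, max_order + 1):
        coeffs = np.polyfit(widths, skews, order)   # least-squares fit of Skew_n
        roots = np.roots(np.polyder(coeffs))        # zeros of d(Skew_n)/dn
        real = roots[np.abs(roots.imag) < 1e-9].real
        inside = real[(real >= n_min) & (real <= n_max)]
        if inside.size:                             # condition met: exit the loop
            return float(inside.min())
    raise RuntimeError("no stationary point within the kernel range")
```

For a skewness curve with an actual maximum, the first (order-2) fit already recovers the stationary point; for a monotonic curve, higher orders progressively "enforce" one.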
The polynomial "fitting/enforcing" approach has two benefits for future practical applications of the method. The first is to reduce the number of topographic parameter evaluations: fewer maps need to be computed, considering fewer windows; the only constraint is to have the minimum (n = 3), some average sizes (n = 15; n = 17) and the maximum width (n = 33) to evaluate the behavior of Skew_n. The second refers to the objectivity of the method. In our study cases, while for all parameters in the Miozza basin and for positive and negative openness in the Cordon one each Skew_n has an actual stationary point, the actual skewness values for minimum curvature in the Cordon basin are represented by an increasing function with no stationary points (Fig. 5a). In this case, the "fitting/enforcing" approach allows a stationary point to be defined anyway, by forcing the representative Skew_n.
The optimum kernel sizes, referring to the minimum n value where the derivative of each Skew_n is zero, are shown in Fig. 6, panels i and ii. For minimum curvature, this value is n = 11.56 for the Cordon area (Fig. 6, panel i) and n = 10.54 for the Miozza one (Fig. 6, panel ii). In the Cordon area, considering positive and negative openness, the skewness derivative vanishes at n = 15.54 in both cases (Fig. 6, panel i). In the Miozza basin, the positive and negative openness derivatives vanish at n = 15.16 in both cases (Fig. 6, panel ii).
For the elaboration of topographic parameters, the kernel size has to be an odd integer. Therefore, stationary point values have been rounded to the closest odd integer. According to the proposed procedure, a kernel of n = 11 has been chosen for minimum curvature in both areas. To give homogeneity to the evaluation of the parameter between zenith and nadir angles, the positive and negative openness stationary-point values (15.54 and 15.16 for the Cordon and the Miozza site, respectively) have been rounded to n = 15.
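The rounding step is trivial but worth making explicit (the helper name is ours):

```python
def nearest_odd(x):
    """Round a stationary-point coordinate to the closest odd integer,
    as required for a moving-window size centered on a pixel."""
    return int(2 * round((x - 1) / 2) + 1)
```

Applied to the stationary points reported above, it reproduces the kernel sizes chosen in the text: 11.56 and 10.54 both map to 11, while 15.54 and 15.16 both map to 15.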
One should note that, for both study areas, the same window sizes have been selected according to this methodology, without any subjective decision. These values are statistically derived, but they do not imply a similarity of the areas' morphology. The optimum values (15 and 11 cells, corresponding to 15 and 11 m) identified with the proposed procedure coincide with the kernel widths that proved to be the best for feature extraction in the works by Pirotti and Tarolli (2010) and Tarolli et al. (2010), respectively.

Flow convergence recognition
Flow convergence has been evaluated through a multiple flow direction algorithm (Quinn et al., 1991), slightly modified in order to incorporate local morphological conditions depending on local concavity (evaluated according to the two proposed indexes).

Multiple flow upslope area
The literature suggests the use of suitable algorithms for handling multiple flow directions to properly compute the contributing area when dealing with divergent topography (Tucker et al., 2001). Multiple-flow methods, such as those proposed by Quinn et al. (1991), Costa-Cabral and Burges (1994) and Tarboton (1997), appear to produce generally better results, avoiding the concentration of flow in distinct, often artificially straight lines, typical of single-flow direction algorithms (e.g., Erskine et al., 2006). Multiple-flow algorithms make it possible to identify the parts of channels likely to be active also under conditions of low or moderate flow, highlighting minor channel features, which are involved in hydrological processes during floods. Furthermore, previous studies demonstrated that multiple-flow algorithms should be preferred for applications of upslope contributing area derived from high-resolution DTMs (Erskine et al., 2006). Among the available algorithms (Quinn et al., 1991; Costa-Cabral and Burges, 1994; Tarboton, 1997), those such as digital elevation model networks (DEMON) (Costa-Cabral and Burges, 1994) are too complex and case-specific to be implemented for most applications, even if they might have theoretical advantages (Tarboton, 1997).
The choice of the Quinn et al. (1991) multiple-flow algorithm was based on two further considerations: (a) previous studies (Endreny and Wood, 2001, 2003) demonstrated that, compared to other flow algorithms, the MDF of Quinn et al. (1991) was the least sensitive to terrain uncertainties; (b) the main disadvantage of Quinn's MDF (a large degree of dispersion even for a convergent hillslope) was mitigated by incorporating a weight depending on local topographic conditions.
According to the literature, Quinn's MDF algorithm is thus more robust than others for a method that is not constrained by a priori knowledge of the dataset. We considered this algorithm in order to observe all the possible downslope directions and obtain a better depiction of topography, while the high dispersion of the algorithm itself is reduced by incorporating local morphological conditions (see Sect. 3.2.2).
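A minimal sketch of a weighted multiple-flow accumulation in the spirit of Quinn et al. (1991) is given below. Cells are visited from high to low elevation, and each cell routes its accumulated value to all lower neighbours in proportion to slope times contour length (0.5·g for cardinal, 0.354·g for diagonal neighbours, the factors of the original algorithm); each cell's own contribution is its area scaled by a morphological weight W. Names are ours, pits are not filled, and the code is illustrative rather than production-grade.

```python
import numpy as np

def weighted_mfd_area(dtm, W, g=1.0):
    """Weighted multiple-flow-direction upslope area, after Quinn et al. (1991).
    dtm and W are 2-D arrays of the same shape; g is the grid spacing."""
    rows, cols = dtm.shape
    acc = (W * g * g).astype(float)                 # weighted own-cell area
    nbrs = [(-1, 0, 0.5), (1, 0, 0.5), (0, -1, 0.5), (0, 1, 0.5),
            (-1, -1, 0.354), (-1, 1, 0.354), (1, -1, 0.354), (1, 1, 0.354)]
    for idx in np.argsort(dtm.ravel())[::-1]:       # process highest cell first
        r, c = divmod(int(idx), cols)
        frac = {}
        for dr, dc, cl in nbrs:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and dtm[rr, cc] < dtm[r, c]:
                slope = (dtm[r, c] - dtm[rr, cc]) / (g * np.hypot(dr, dc))
                frac[(rr, cc)] = cl * g * slope     # slope * contour length
        total = sum(frac.values())
        for (rr, cc), f in frac.items():
            acc[rr, cc] += acc[r, c] * f / total    # proportional routing
    return acc
```

On a simple one-row ramp with unit weights, the accumulation grows monotonically downslope, and raising W on an upslope cell propagates the extra contribution to every cell below it.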

Weighting procedure
For the present work, the Quinn et al. (1991) flow accumulation algorithm was modified using a weight factor W dependent on local morphology:

A_w(r) = W(r) + Σ_{i ∈ upslope(r)} p_i A_w(i)    (10)

where A_w is the weighted upslope contributing area for a given pixel, r is the pixel location on the DTM, and p_i are the fractions of the upslope neighbours' accumulation routed to r by the multiple-flow-direction rule. The main difference from a conventional MDF flow accumulation is the map of W, directly related to geomorphologic form, where surface concavities and convexities are detected. Similar procedures can be found in Tarboton (2003) and Liu et al. (2007).
The weighted upslope area is an implicit description of how much water can be accumulated according to the degree of convergence of the surface. Given a defined upslope value, the weighted amount depends on the upslope contributing area and on the local convergence of morphology, represented by the weight matrix W (Eq. 15, Figs. 8d and 9d). This weight is identified through normalized values of openness and curvature. If a pixel lies on a convergent morphology, the value of the upslope area will be increased proportionally to its degree of convergence, while if it lies on a divergent morphology, its value will be diminished.
For map normalization, we evaluated a Quantile-Quantile plot (QQ-plot) for each attribute map (Fig. 7a). This graphical operator compares the ordered values of a variable with the quantiles of a specific theoretical distribution (here Gaussian). This distribution represents the relative likelihood of the random variable occurring at a given point in the observation space. The divergence from a straight line indicates a deviation of the probability density function from the Gaussian and, therefore, a deviation of the values from the overall pattern of points. In the works of Lashermes et al. (2007) and Passalacqua et al. (2010a,b), QQ-plots of landform curvature were used to objectively define curvature thresholds for network extraction. In this study, we suggest that the deviation from the normal distribution recorded for both openness (ψ_L and φ_L) and C_min QQ-plots represents the likely threshold for channel identification. For ψ_L, we consider the break at higher values (right tail of the distribution), while for φ_L we consider the break at lower values (left tail). For C_min, we evaluate the break on the negative side (El. i, Fig. 7a) that, following Evans' approach, corresponds to convergent topography (Evans, 1979; Wood, 1996). Summarizing these formulations, channels are identified where

ψ_L > QQplot_thr    (11)

φ_L < QQplot_thr    (12)

C_min < QQplot_thr    (13)

where the term "thr" (threshold) represents, for each map, the value corresponding to the break in the QQ-plot (El. i, Fig. 7a). Map normalization has been evaluated according to QQplot_thr using the procedure

N_TA(x,y) = TA(x,y) / QQplot_thr    (14)

where N_TA stands for the normalized parameter (openness or curvature) considered for a given pixel, and TA(x,y) is the topographic attribute at the pixel of interest (Fig. 7c).
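One way the break of a normal QQ-plot could be located automatically is sketched below: fit a line through the inter-quartile portion of the plot and return the first tail value whose departure from that line exceeds a tolerance. The function name, the tolerance `tol`, and the fitting choices are our assumptions for illustration, not prescriptions from the paper.

```python
import numpy as np
from scipy.stats import norm

def qqplot_break(values, tail="left", tol=0.5):
    """Locate the break of a normal QQ-plot in the requested tail.
    tol is expressed in multiples of the sample standard deviation."""
    v = np.sort(np.asarray(values, dtype=float))
    v = v[np.isfinite(v)]
    q = norm.ppf((np.arange(1, v.size + 1) - 0.5) / v.size)  # theoretical quantiles
    core = (q > norm.ppf(0.25)) & (q < norm.ppf(0.75))       # inter-quartile portion
    slope, icept = np.polyfit(q[core], v[core], 1)           # central straight line
    resid = v - (slope * q + icept)                          # departure from the line
    lim = tol * v.std()
    if tail == "left":   # e.g. C_min and positive openness: break at low values
        hits = np.where((q <= norm.ppf(0.25)) & (resid < -lim))[0]
        return v[hits.max()] if hits.size else v[0]
    # right tail, e.g. negative openness: break at high values
    hits = np.where((q >= norm.ppf(0.75)) & (resid > lim))[0]
    return v[hits.min()] if hits.size else v[-1]
```

For roughly Gaussian data contaminated with a few strongly negative outliers, the left-tail break lands among the outlying values, mimicking the visual identification of the break in Fig. 7a.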
The obtained weight grid W (Figs. 8d and 9d) is

W = N(C_min) · N(ψ_L) · (1/N(φ_L))    (15)

where N stands for the normalization procedure according to Eq. (14). To assign higher values to convergent topography, the positive openness normalized map appears as 1/N(φ_L).

Network detection
Field-surveyed channel heads for the study area show that observed contributing areas vary significantly, suggesting that a constant threshold value for network extraction might not be a good assumption (Passalacqua et al., 2010b). An average contributing area can be used, but the resulting drainage densities are too high (Passalacqua et al., 2010b). Accurate and objective location of the channel network therefore remains a challenge. In the present work, we propose a method to identify these features using an objective threshold dependent on the weighted upslope area distribution. Because our datasets derive from different distributions, we standardized both maps to allow comparison. To identify the threshold, the weighted upslope area was normalized according to a standard score approach, indicating how many standard deviations each observation lies above or below the mean. The standard score z_i is a dimensionless quantity and, for the i-th observation x_i of a random variable X, it is given by

z_i = (x_i − µ) / σ    (16)

where µ and σ are, respectively, the mean and the standard deviation of the distribution. The z-score of any observation can be interpreted as a measure of its relative location in a dataset: observations in two different datasets with the same z-score have the same relative location, being the same number of standard deviations from the mean. Values larger than the mean have positive standard scores, values smaller than the mean have negative scores, and a value equal to the mean has a standard score of 0.
This standardization changes the central location and the average variability of the distribution, but it does not change its shape. We chose to define the threshold for channel head and network identification at z_i equal to 0, the exact middle of the distribution. The channel network is therefore identified by those pixels that satisfy

z_Aw > 0    (17)

where z_Aw is the standard score (Eq. 16) of the weighted upslope area at each pixel.
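The standard-score thresholding can be written compactly; a minimal sketch (function names are ours) is:

```python
# Standard-score thresholding of the weighted upslope area (Eqs. 16-17):
# cells whose weighted area lies above the mean belong to the network.
import numpy as np

def zscore(a):
    # Eq. (16): number of standard deviations above/below the mean
    return (a - a.mean()) / a.std()

def extract_network(weighted_area):
    # Eq. (17): threshold at z = 0, i.e. above-average weighted area
    return zscore(weighted_area) > 0
```

Because z = 0 corresponds exactly to the mean, the threshold adapts automatically to each dataset without any user-tuned parameter.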

Noise filtering
Applying openness and curvature independently typically produces a fragmented raster, because of the numerous local convergences that exist in real surfaces due to inherent noise. The use of the weighted upslope area allows a better connection of the network, but noise is still significant. Filtering the input data to regularize the map before computing attributes is a different approach to topographic attribute evaluation for feature extraction (Lashermes et al., 2007; Passalacqua et al., 2010a,b), whose uncertainties have already been underlined (Passalacqua et al., 2010b). In our case studies, for areas with a low degree of morphological complexity (Miozza), noise can easily be discarded from the produced map through simple filtering based on the majority of contiguous neighboring cells. When the procedure is applied to areas with complex morphology (Cordon, Fig. 2), noise detection becomes more challenging, and it is generally difficult to obtain relevant indicators automatically without user interaction. We therefore suggest an approach that is not fully automatic, but that can be used to assist in identifying the network and in objectively discarding meaningless extractions.
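A sketch of the simple contiguous-neighbor filtering used for the low-complexity case is given below; the minimum neighbor count of 5 (a strict majority of the 8 neighbors) is our assumption for illustration.

```python
# Hypothetical sketch of majority filtering of the binary extraction.
import numpy as np

def majority_filter(mask, min_neighbors=5):
    m = mask.astype(int)
    p = np.pad(m, 1)
    # count network pixels among the 8 neighbors of every cell
    neigh = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:]
             + p[1:-1, :-2] + p[1:-1, 2:]
             + p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:])
    # keep a pixel only if most of its neighbors are also network pixels
    return mask & (neigh >= min_neighbors)
```

Isolated convergences produced by surface noise are removed, while pixels embedded in coherent network patches survive.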
To discard false positives (noise), it is useful to analyze regions that show high fragmentation. For these areas, it is possible to evaluate the magnitude of the disruption as a marker of morphological disorder. We suggest that noise relates to a higher randomness of values in the original elevation data, while concavities related to channels show better-structured patterns. A good representation of the organization of the elevation data can be obtained by analyzing water movement for each cell: flow is defined by any cell within a neighborhood that has a higher value than the processing cell. The output map resulting from this function represents the pattern of flow into each cell, measured by an integer in the range 0 to 255. To test the degree of organization of this map, we evaluate a statistical measure of randomness, referred to as entropy (Gonzales et al., 2003), which measures the distribution of values among classes within a neighborhood. We considered 256 classes, so that the pixel values correspond directly to bin values in the range 0-255 (Gonzales et al., 2003). To maintain homogeneity with the full procedure, the cell neighborhood was defined according to a window size (n = 13) averaged between those chosen for curvature and openness.
We produce a raster map where each output pixel contains the entropy value of its neighborhood, evaluated as

E = − Σ_i p_i log2(p_i)    (18)

where p_i is the proportion of pixels assigned to class i.
Minimum entropy occurs when the cell values all fall in the same class, while maximum entropy occurs when each cell value falls in a different class. The most homogeneous areas therefore have low spatial entropy; this is the case for channel patterns. The most irregular regions have high entropy.
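The local entropy map of Eq. (18) can be sketched as follows (a plain, unoptimized loop; the 256 classes match the 0-255 flow-pattern values and n = 13 is the window size used in the paper):

```python
# Local Shannon entropy (Eq. 18) over an n x n neighborhood.
import numpy as np

def local_entropy(img, n=13, bins=256):
    r = n // 2
    p = np.pad(img, r, mode="edge")
    rows, cols = img.shape
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            win = p[i:i + n, j:j + n].ravel()
            counts = np.bincount(win, minlength=bins)
            pi = counts[counts > 0] / win.size
            # Eq. (18): E = -sum(p_i * log2(p_i))
            out[i, j] = -(pi * np.log2(pi)).sum()
    return out
```

Uniform (well-organized) neighborhoods score 0, while neighborhoods whose values scatter across many classes score high and flag likely noise.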
We suggest observing the entropy in areas where the extraction produces unclear results. Extractions obtained in areas with entropy values higher than the average should be discarded (Fig. 10b). Some noise can be removed according to this analysis (El. i, ii in Fig. 10). Element iii in Fig. 10 refers to a channel referenced on maps but not actually active in the area; therefore, consistently with other works (Passalacqua et al., 2010b), it has not been considered for the quality evaluation.
Once the two filtering approaches have been applied (majority filter for the Miozza basin and entropy analysis for the Cordon area), the remaining network needs a connection procedure. Considering a pour point (maximum value of flow convergence), we identified the most suitable (shortest) flow path from each channel head to the pour point. This path was identified as the least accumulative cost distance to the pour point over a cost surface set as the Euclidean distance of each pixel from the correct extraction. A fuller discussion of accumulated-cost-surface methods and representational accuracy can be found in Douglas (1994) and Eastman (1989). Other approaches to network connection have been successfully tested in the work of Passalacqua et al. (2010a,b), where channel networks were detected using nonlinear diffusion and geodesic paths.
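The least-accumulated-cost connection can be sketched with a standard Dijkstra search over an 8-connected grid (in the paper the cost surface is the Euclidean distance of each pixel from the extracted network; the uniform toy grid in the test is only an assumption):

```python
# Sketch of the connection step: least-accumulated-cost path from a
# channel head to the pour point over an 8-connected cost surface.
import heapq
import numpy as np

def least_cost_path(cost, start, goal):
    rows, cols = cost.shape
    dist = np.full((rows, cols), np.inf)
    prev = {}
    dist[start] = cost[start]
    pq = [(cost[start], start)]
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if (i, j) == goal:
            break
        if d > dist[i, j]:
            continue  # stale heap entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                    nd = d + cost[ni, nj]
                    if nd < dist[ni, nj]:
                        dist[ni, nj] = nd
                        prev[(ni, nj)] = (i, j)
                        heapq.heappush(pq, (nd, (ni, nj)))
    # walk back from the pour point to recover the connected path
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

Running this from each retained channel head to the pour point yields the connected network used in the final map.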

Results and discussion
The final product is a map representing the channel network. For accuracy assessment, the extracted networks (Figs. 11b and 12b) were compared with a DGPS-surveyed network (Tarolli and Dalla Fontana, 2009; Pirotti and Tarolli, 2010) (Figs. 11a and 12a). The overall quality was evaluated using Cohen's k index of agreement (Cohen, 1960).
The quality measure used for this accuracy assessment was defined as

k = (P_a − P_e) / (1 − P_e)    (19)

where P_a is the total agreement probability evaluated according to Eq. (20), and P_e is the agreement probability due to chance, according to the formulation in Eq. (21) (Cohen, 1960). The strength of agreement was interpreted according to the scale of Landis and Koch (1977).
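For two binary (channel/no-channel) maps, Cohen's k reduces to a few lines; a minimal sketch using the standard two-class formulation is:

```python
# Cohen's k for two binary channel-network maps (Cohen, 1960).
import numpy as np

def cohens_kappa(extracted, reference):
    a = np.ravel(extracted).astype(bool)
    b = np.ravel(reference).astype(bool)
    p_a = np.mean(a == b)                     # total agreement
    # chance agreement from the marginal probabilities
    p_e = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_a - p_e) / (1 - p_e)
```

Identical maps yield k = 1, while agreement no better than chance yields k = 0.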
To evaluate the index, buffer zones were generated around both the extracted network and the reference one. The buffer width was set to 5 m according to a previous work carried out on the Cordon area using the same dataset, where the analysis of results was performed with the same quality measure (Pirotti and Tarolli, 2010). To maintain homogeneity, the same buffer width was used for both the Cordon and the Miozza case studies.
The extraction procedure generates a network showing substantial agreement between extracted and reference features for both applications: Cohen's k is 0.78 for the Cordon area and 0.63 for the Miozza basin.

Final remarks
This work presented a mathematical and statistical approach combining topographic attributes for unsupervised channel network identification in complex mountainous terrain. Our primary focus was to develop a method that accurately describes the drainage network using objective thresholds, without a priori knowledge of the study area or of the input dataset. The methodology includes two main aspects: (a) normalization of the openness and minimum curvature maps according to their QQ-plot_thr and their combination with the upslope area to highlight potential surface convergences, and (b) a thresholding procedure based on the standardized values of the weighted upslope area. The methodology was applied to a rectangular area on one study site and to a fully organized basin used as a test site, and both extracted networks were compared with the field-surveyed ones.
The joint use of curvature and openness makes the method independent of artifacts in the input data. The skewness/kernel evaluation through the polynomial "fitting/enforcing" approach selects the optimum kernel size without requiring an assessment of the quality of the input maps, and it identifies kernels that had already proven optimal in previous work on one of the two study areas. The distribution-based approach is also independent of any assessment of the morphology of the areas: skewness is identified statistically and does not imply morphological similarity between the basins, because it considers not the spatial location and organization of concavities and convexities but, more generally, their location within the distributions. The use of statistical operators as objective indexes results in a network that is correctly delineated and strongly consistent with the surface morphology, without assessing the study areas. Automatic network detection based on thresholding operations proved efficient in terms of time consumption and valid for associating shapes and patterns with the actual topographic signature of flow processes. Finally, the shortest-cost-path procedure applied to the filtered maps allows the definition of a meaningfully connected network.
The proposed approach, however, still presents one limitation: in areas with highly complex morphology, surface features not related to channel networks are also detected. Network extraction carried out using openness and curvature independently shows some flaws, detecting localized patterns and misleading noisy cells that do not actually represent the features. The method can nevertheless provide a quantitative and qualitative description of the network and give overall information about the position and orientation of local convergences. The analysis of surface entropy proved in this case to be a useful and objective tool to assist the user in discarding doubtful extractions. The only drawback is that entropy has to be applied interactively to assist in the task of automatic network mapping, although noise can then be discarded objectively (entropy higher than the mean). In areas with lower morphological complexity, the procedure is instead fully automatic in all its steps. The results obtained on two areas of different complexity, in terms of morphology, network shape and structural setting, show promising effectiveness for practical applications.

Fig. 1 .
Fig. 1. Maps showing the location of the study area in the Cordon basin (A) and the test area in the Miozza basin (B). The drainage network and surveyed channel heads are shown.

Fig. 2 .
Fig. 2. Example of complex morphology in the upper part of the Cordon basin. The high degree of complexity (A) and the rapid slope change (B) define two of the main issues related to channel network extraction in this area, according to topographic parameters and classic thresholding procedures respectively.

Fig. 3 .
Fig. 3. Flow chart of the proposed methodology. Local morphology is highlighted through topographic attributes (minimum curvature C_min, positive openness φ_L, negative openness ψ_L). The optimum kernel size (n*) is chosen through the analysis of the relationship between skewness and kernel width (n). Topographic attributes computed with the optimum kernel are analyzed through QQ-plots to identify the thresholds used to normalize each map. Flow convergence is evaluated through the multiple-flow upslope area (A_MDF) weighted according to a matrix depending on the normalized topographic attributes. The network is then identified by the positive values of the weighted-area standard score.
(Figs. 8a  and b and 9a and b).

Fig. 4 .
Fig. 4. Example of the kernel size effect on minimum curvature (A) and negative openness (B) for the Cordon area, for increasing window sizes (n) of 3, 15 and 33 cells.

Fig. 5 .
Fig. 5. Skewness of each parameter according to window size (n) for the Cordon study site (A) and the Miozza basin (B).

Fig. 6 .
Fig. 6. Derivative computed by the polynomial "fitting/enforcing" approach for curvature and openness evaluation, for both the Cordon study site (A) and the Miozza basin (B). Detailed views of the points where the derivatives equal zero are shown on the right side of the figure: (i) for the Cordon evaluation and (ii) for the Miozza one.

Fig. 7 .
Fig. 7. Cordon study area: topographic attribute map normalization. Example of a QQ-plot (A) for minimum curvature and identification of the threshold (i) to apply in order to normalize the map (QQ-plot_thr). Minimum curvature for n = 11 (B) and the derived normalized map (C) are shown.

Fig. 10 .
Fig. 10. Cordon study area: local entropy according to flow convergences (A) and identification of meaningful (blue) and doubtful (red) pixels (B).

Fig. 11 .
Fig. 11. Cordon study area: reference network (A) and network extracted through the proposed methodology (B).
In Eqs. (20) and (21), i is the number of class values, P(x.i) and P(xi.) are the column and row marginal probabilities, respectively, and P(xii) are the agreeing extracted values. Perfect agreement results in a Cohen's k value of 1.0, while a value of 0 indicates a level of agreement due to chance.