Incorporating landscape characteristics in a distance metric for interpolating between observations of stream water chemistry

Abstract. Spatial patterns of water chemistry along stream networks can be quantified using synoptic or “snapshot” sampling. The basic idea is to sample stream water at many points over a relatively short period of time. Even for intense sampling campaigns, the number of sample points is limited and interpolation methods, like kriging, are commonly used to produce continuous maps of water chemistry based on the point observations from the synoptic sampling. Interpolated concentrations are influenced heavily by how distance between points along the stream network is defined. In this study, we investigate different ways to define distance and test these based on data from a snapshot sampling campaign in a 37-km² watershed in the Catskill Mountains region (New York State). Three distance definitions (or metrics) were compared: Euclidean or straight-line distance, in-stream distance, and in-stream distance adjusted according to characteristics of the local contributing area, i.e., an adjusted in-stream distance. Using the adjusted distance metric resulted in a lower cross-validation error of the interpolated concentrations, i.e., a better agreement of kriging results with measurements, than the other distance definitions. The adjusted distance metric can also be used in an exploratory manner to test which landscape characteristics are most influential for the spatial patterns of stream water chemistry and, thus, to target future investigations to gain process-based understanding of in-stream chemistry dynamics.


Introduction
Synoptic or "snapshot" stream water sampling allows for the quantification of baseflow water chemistry (and quality) throughout a catchment (e.g., Grayson et al., 1997; Bernhardt et al., 2003; Wayland et al., 2003). This type of sampling provides information on spatial patterns at the landscape scale that enables insights into biogeochemical behavior throughout a stream network at low-flow conditions. The goal of such sampling campaigns is often to infer the relation between landscape characteristics along a stream continuum and stream water quality (Grayson et al., 1997; Salvia et al., 1999; Wayland et al., 2003). This goal is in tune with emergent paradigms in freshwater ecology that view river ecosystems as riverscapes closely connected with their catchment landscape (Fausch et al., 2002; Tetzlaff et al., 2007). Since baseflow chemical concentrations are usually temporally persistent, the observations made during a synoptic campaign are indicative of the health of an ecosystem.
During a synoptic sampling campaign, water samples are typically taken at ∼100 locations along the stream network. This sampling density, however, is still sparse compared to the heterogeneity found in natural stream systems. Sampling significantly more points is usually not possible due to practical constraints of sample collection associated with covering large distances and the cost of analysis associated with processing large numbers of samples. The question is how we can infer spatially continuous information about the stream network from point observations such as those obtained during a synoptic sampling campaign.
One option is statistical modeling of in-stream water chemistry/quality from point observations. Foran et al. (2000) and Alexander et al. (2002) provide a good overview of the various modeling techniques available for monitoring and predicting stream water quality. To link in-stream observations with landscape characteristics, models need to include some spatially referenced component. For example, the popular SPARROW model (Smith et al., 1997) uses a hybrid approach combining conventional regression methods with spatial data based on landscape characteristics and stream properties to predict continuous water quality from point observations. It is a statistically calibrated regression model with mechanistic components (e.g., surface-water flow paths, first-order loss functions) that has been applied at large scales for modeling nutrient transport (Smith et al., 1997; Preston and Brakebill, 1999; Alexander et al., 2000, 2002, 2004). SPARROW improves on previous regression approaches by including a spatial referencing of watershed attributes, which increases their correlation with water quality measurements (Smith et al., 1997). This and other models, however, require explicit functions with source-specific coefficients that must be calibrated to describe land-water delivery and in-stream delivery before they can be implemented empirically. This approach implies assumptions about how various sources (point and non-point) in the landscape influence stream water quality. Such assumptions are inherent to all stream export models with a mechanistic component.
Geostatistics offers an approach to interpolate between point observations using the spatial structure of the sampling campaign that does not traditionally contain a mechanistic component. Semivariogram models and kriging, the core techniques of most geostatistics, are usually based on traditional Euclidean, or straight-line, distance metrics to distribute weights between neighboring observation points when interpolating to an unsampled location. This definition of distance is used by most geostatistics packages because they are designed for interpolating surfaces on a continuous, two-dimensional plane (Christakos, 2000). Euclidean distance, however, may not be suitable for stream networks because it fails to represent the spatial configuration, connectivity, directionality and relative position of sites in a stream network (Smith et al., 1997; Yuan, 2004; Ganio et al., 2005; Peterson et al., 2007). This has led to a recent increase in studies using hydrologic or in-stream distance measures to explore spatial patterns in stream networks (e.g., Dent and Grimm, 1999; Gardner et al., 2003; Legleiter et al., 2003; Torgersen et al., 2004; Ganio et al., 2005; Peterson et al., 2006, 2007; Skøien et al., 2006, 2007). In-stream distance metrics restrict connections from one point to another to pathways within the stream network and can be defined in two variants: symmetric and asymmetric. Symmetric in-stream distance is taken as the shortest hydrologic distance between two points when movement is not limited by flow direction, while asymmetric in-stream distance requires that water flow from one location to another for two points in a stream network to be connected (Peterson et al., 2007). In this study we only use symmetric in-stream distances.
In addition to restrictions in path, additive measures that represent relative network position based on stream conditions, such as flow volume, stream order, or watershed area, have been used to weight hydrologic distance measures to make them more ecologically representative (Peterson et al., 2007). For example, Cressie et al. (2006) use classification variables to group similar stream locations based on the idea that "locations that are subject to similar outside influences might be expected to have similar data values". Recent work by Skøien et al. (2006, 2007) provides a method (Top-kriging) which takes both the area and the nested nature of catchments into account to estimate streamflow-related variables in ungauged catchments. This concept focuses on manipulation of the semivariogram estimate and builds upon the early work of Gottschalk (1993a, 1993b), with extension by Sauquet et al. (2000), on calculating covariance along a river network in order to interpolate along the network. Directional trees corresponding to drainage network structure (i.e., channel width) have been used to modify the geostatistical framework (Monestiez et al., 2005; Bailly et al., 2006). Chokmani and Ouarda (2004) used a physiographical space-based kriging method incorporating physiographical and meteorological characteristics of stream gauging stations with multivariate analysis techniques to modify in-stream distance. Still, applying geostatistical techniques to stream networks is a relatively new field of research and the limited findings to date do not clearly indicate which distance measure to use.
In this paper, we propose a new distance metric that incorporates information from the surrounding watershed that potentially influences stream water chemistry. This allows for direct coupling between the stream network and the landscape that contains it. The new metric adjusts the in-stream distance between any two points based on the degree of similarity of relevant properties in their up-slope contributing watersheds. For example, two positions in a stream that have contributing areas with very similar characteristics would be considered "virtually" closer together than two positions that have contributing areas with different characteristics. This new metric, named the adjusted distance metric, does not rely on explicit assumptions about how landscape controls influence water quality and can be used with existing geostatistical methods. In this way both the physical distance between points and the connection of the stream to the surrounding landscape are considered. This provides a way to explore possible first-order controls on stream chemistry by quantifying their relative influence on how we interpolate observations. Such information can then be used to guide future sampling schemes based on initial synoptic campaigns.

Synoptic data
The synoptic data were point measurements of stream water chemistry. A snapshot sampling campaign consisting of 117 manually collected grab samples was conducted over one day during a spring recession flow period (26 April 2001) for the Townbrook Research Watershed in the Catskill region of New York State (Fig. 1). This 37-km² watershed in the headwaters of the Cannonsville Reservoir basin in Delaware County ranges in elevation from 493 to 989 m above mean sea level with slopes ranging from 0 to 43°. The main channel (Townbrook) flows primarily east-west through the southern half of the watershed with an outlet at the farthest west point in the watershed. On the sampling day, the mean stream flow at the outlet was 0.77 m³/s, corresponding to a specific runoff of 1.8 mm/d. The grab samples were analyzed at the US Department of Agriculture, Agricultural Research Service (USDA-ARS) laboratory at University Park, PA for various nutrient and major cation and anion concentrations (Table 1). Note that N and P refer to the nitrate-nitrogen and orthophosphate forms of the nutrients, respectively. The analytical procedures used were standard methods for each constituent similar to those outlined by McHale et al. (2004) and Burns et al. (2006). For measurements where constituent concentrations were below detection limits (accounting for less than 13% of all measurements), a value of half the detection limit was used in the further analyses.
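The substitution for measurements below the detection limit can be illustrated with a minimal sketch. The function name, the example concentrations, and the detection limit of 0.5 are hypothetical and are not taken from the study data; below-detection measurements are assumed to be reported either as missing values or as numbers smaller than the limit.

```python
def substitute_below_detection(values, detection_limit):
    """Replace missing or below-limit values with half the detection limit."""
    half_limit = detection_limit / 2.0
    return [
        half_limit if (v is None or v < detection_limit) else v
        for v in values
    ]


# Hypothetical concentrations and detection limit (same arbitrary units).
raw = [1.2, None, 0.3, 4.0]
cleaned = substitute_below_detection(raw, detection_limit=0.5)
```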

Landscape characteristic data
The landscape characteristic data were spatial information about the watershed used for defining attributes for the various subwatersheds. Characteristics were selected that are commonly considered as first-order controls on stream water concentrations at the landscape scale (Table 2). The landscape characteristics used in this study were derived from topographic, landuse, and soil-type spatial information from various published sources. The Soil Survey Geographic (SSURGO) database (USDA-NRCS, 2000) was used to define soil depth, organic matter content, and porosity for each unique designation unit of the soils map (commonly referred to as the map unit identifier (MUID)). This links the graphic features of the soils map to attribute data defined from soil surveys. Soil depth was defined as the depth from the soil surface to the lower boundary (restrictive layer). The organic matter content for each MUID was calculated as the average of the upper and lower organic matter contents reported in the SSURGO database. Porosity for each MUID was taken directly from the SSURGO database.
The topographic wetness index, ln(a/tanβ) (Beven and Kirkby, 1979), where tanβ is the local slope and a is the upslope area, A, per unit contour length, was computed from a 10×10 m USGS digital elevation model (DEM) (USGS, 1992) using a multiple-flow-direction algorithm for determination of the upslope area (Seibert and McGlynn, 2007). In addition to the topographic wetness index, its components were also considered individually as attributes, i.e., the local slope, tanβ, and the logarithm of the upslope area, ln(A). Landuse characteristics were based on Thematic Mapper data (NYCDEP, personal correspondence, 1999). The watershed is primarily forested at higher elevations (away from the main stream channel) and used agriculturally (including pasture and cropping) at lower elevations (near the main stream channel). For this study, we considered only these two landcover classes (which together accounted for more than 96% of the total watershed area).
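As an illustration of the index for a single grid cell, the following sketch evaluates ln(a/tanβ) from an upslope area, a contour length, and a local slope. The function name, the example values, and the small minimum slope used to avoid division by zero on flat cells are assumptions made for illustration; the multiple-flow-direction routing that produces the upslope area is not reproduced here.

```python
import math


def topographic_wetness_index(upslope_area_m2, contour_length_m, tan_beta,
                              min_tan_beta=1e-4):
    """Return ln(a / tan(beta)), with a = upslope area per unit contour length."""
    a = upslope_area_m2 / contour_length_m
    # Assumed safeguard: clamp very small slopes so flat cells stay finite.
    slope = max(tan_beta, min_tan_beta)
    return math.log(a / slope)


# Hypothetical cell: 1000 m2 upslope area, 10 m contour length, slope of 0.1.
twi = topographic_wetness_index(1000.0, 10.0, 0.1)
```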

Defining an adjusted distance metric
Consider two separate points ($i$ and $j$) in a stream network. Based on a Euclidean distance metric (Fig. 2a), the points are separated by a distance defined simply by a straight-line path ($E_{ij}$) based solely on the coordinates of the points. Using a symmetric in-stream distance metric (Fig. 2b), these two points are separated by a distance determined by the path of the stream ($d_{ij}$).
Both points $i$ and $j$ also have a local contributing area with certain landscape characteristics. These characteristics can be used to define attributes for each point ($a_i$ and $a_j$, respectively) as the area-weighted average of any quantifiable landscape characteristic in the contributing area (e.g., amount of forest, soil porosity, number of septic systems, or land surface slope). How similar or different two positions in the stream are with respect to the composition of their contributing areas can be determined by the absolute difference in attribute ($a_{ij}$):

$$a_{ij} = \left| a_i - a_j \right| \qquad (1)$$

For example, consider a map of forested versus non-forested landuse for a catchment. In Fig. 2c, this is represented as crosshatched or non-crosshatched areas for forested or non-forested landuse, respectively. In this simple case where the landscape characteristic has a binary spatial distribution, the attribute at any point in the stream would represent the percentage coverage of the characteristic over the local contributing area (approximately 30% and 90% forested landuse for $a_i$ and $a_j$, respectively, in Fig. 2c). When the landscape characteristic is defined over a continuous range of values (e.g., soil depth), the attribute at any point in the stream would represent the average value of the characteristic over the domain of the contributing area (e.g., average soil depth).

For a stream distance metric to incorporate information about both the topology of the stream network and the composition of the contributing areas, it would need to use some combination of $d_{ij}$ and $a_{ij}$. The measures $d_{ij}$ and $a_{ij}$ have different units and must be scaled for direct comparison. This can be accomplished by dividing each by its median ($d_{\mathrm{median}}$ and $a_{\mathrm{median}}$, respectively) over all pairs of sample points in the stream network. The median is used because it is typically more robust than the mean as a measure of central tendency in the presence of outlier values.
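The attribute difference and the median scaling described above can be sketched as follows. The function names and the three example attribute values are hypothetical (the 30% and 90% values echo the forest-cover example in the text, and the third is invented); the sketch assumes the attribute at each point has already been computed.

```python
from itertools import combinations
from statistics import median


def attribute_differences(attributes):
    """Return {(i, j): |a_i - a_j|} for every pair of points with i < j."""
    return {
        (i, j): abs(attributes[i] - attributes[j])
        for i, j in combinations(range(len(attributes)), 2)
    }


def scale_by_median(pair_values):
    """Divide every pairwise value by the median over all pairs."""
    m = median(pair_values.values())
    if m == 0:
        raise ValueError("median of pairwise values is zero; cannot scale")
    return {pair: value / m for pair, value in pair_values.items()}


# Hypothetical percent forest cover in the contributing areas of three points.
forest_percent = [30.0, 90.0, 50.0]
a_ij = attribute_differences(forest_percent)
a_scaled = scale_by_median(a_ij)
```

The same `scale_by_median` helper could be applied to a dictionary of pairwise in-stream distances so that both measures become dimensionless.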
These scaled values can then be combined into the adjusted distance metric, $h_{ij}$, which combines physical distance and contributing area similarity. We propose a simple linear weighting of $d_{ij}$ and $a_{ij}$:

$$h_{ij} = (1 - \omega)\,\frac{d_{ij}}{d_{\mathrm{median}}} + \omega\,\frac{a_{ij}}{a_{\mathrm{median}}} \qquad (2)$$

where ω is a weighting factor varying from 0 to 1. The weighting factor allows us to adjust the relative importance of the physical distance between points and the similarity/dissimilarity of their local contributing areas. For ω equal to 0, the adjusted distance equals the in-stream distance between two points scaled by the median of all in-stream distances. With a small value for ω, the physical in-stream distance between two points dominates, whereas with higher values the adjusted distance becomes more dominated by the differences in the characteristic of the local contributing areas. Of course, other formulations are possible for combining $d_{ij}$ and $a_{ij}$, and Eq. 2 can easily be generalized to consider more than one landscape characteristic. However, these variations are beyond the scope of this proof-of-concept study.

The goal of this study is to investigate the merits of including the characteristics of the contributing area in defining distance metrics. We accomplished this by interpolating stream water chemistry along a stream network based on point observations using three different distance metrics: the Euclidean, in-stream, and above proposed adjusted (Eq. 2) metrics. Data from a synoptic sampling campaign in the Catskill Mountains, NY, as described above, were used as a test case. We evaluated the different distance measures by computing the cross-validation error associated with each interpolation.
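A minimal sketch of the linear weighting in Eq. 2 is given below. It assumes the weighting takes the form (1 − ω)·d/d_median + ω·a/a_median, which is consistent with the stated behavior at ω = 0; the function name and all numerical values are hypothetical.

```python
def adjusted_distance(d_ij, a_ij, d_median, a_median, omega):
    """Combine scaled in-stream distance and scaled attribute difference."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie between 0 and 1")
    return (1.0 - omega) * d_ij / d_median + omega * a_ij / a_median


# Hypothetical pair: 200 m apart in-stream, attributes differing by 10 units,
# with medians of 100 m and 40 units over all pairs.
h_equal_weight = adjusted_distance(200.0, 10.0, 100.0, 40.0, omega=0.5)
h_distance_only = adjusted_distance(200.0, 10.0, 100.0, 40.0, omega=0.0)
h_attribute_only = adjusted_distance(200.0, 10.0, 100.0, 40.0, omega=1.0)
```

In this example the two points are far apart along the stream (twice the median) but have similar contributing areas (a quarter of the median difference), so increasing ω moves them "virtually" closer together.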

Calculating distance metrics
The distance between any two points in the stream was computed for three stream distance metrics (i.e., the Euclidean, in-stream, and adjusted in-stream distance metrics). The stream network was defined by thresholding the upslope area map at a value of 5 ha, which gives a rasterization of the stream network on the same 10×10 m grid as the DEM. Euclidean distance between two points was defined as a straight line between points based on the coordinates of each point. Using network-modeling techniques, an ArcView (ESRI, Inc., 2006) script similar to that used by Gardner et al. (2003) was written to calculate the distance between points for the symmetric in-stream distance metric using a path restricted to the stream.
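The idea of a symmetric in-stream distance can be sketched as a shortest-path search on an undirected graph of stream segments. This is not the ArcView script used in the study; it swaps in Dijkstra's algorithm from the Python standard library, and the small Y-shaped network, node names, and segment lengths are hypothetical.

```python
import heapq


def instream_distances(segments, source):
    """Shortest path length from source to every node, ignoring flow direction.

    segments: iterable of (node_u, node_v, length) stream segments.
    """
    graph = {}
    for u, v, length in segments:
        # Undirected edges: movement is not limited by flow direction.
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical network: tributaries A and B join at confluence C, then flow
# to outlet O. Lengths are in metres.
segments = [("A", "C", 300.0), ("B", "C", 400.0), ("C", "O", 500.0)]
dist_from_a = instream_distances(segments, "A")
```

Here points A and B sit on different tributaries, so their symmetric in-stream distance runs through the confluence (700 m) even though they might be much closer in a straight line.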
The specific values of the adjusted distance metric depend on the landscape characteristic selected to define attribute values. As a first step, the landscape characteristics considered in this study were computed for each grid cell along the rasterized stream network. This was done by first delineating the local contributing area for each stream cell and then determining the average of each landscape characteristic listed in Table 2 within this contributing area. The contributing areas were computed based on the DEM. A multiple-flow-direction algorithm (Seibert and McGlynn, 2007) was used to compute the downslope accumulation of catchment area and the local input of area entering the stream network at a certain stream cell. Along the stream network all area was routed in the direction of the steepest gradient. Once the contributing area for each point in the stream network was determined, the average of each landscape characteristic over that area was calculated. This defines several attribute values (one for each landscape characteristic) at each point along the stream network. Based on these values, the attribute differences $a_{ij}$ between any two points $i$ and $j$ in the stream network could be calculated using Eq. 1 for each attribute. The distances $d_{ij}$ were computed using the symmetric in-stream distance metric. The distance $d_{ij}$ reflects the topology of the stream network and does not depend on the selected attribute. The adjusted distance metric between two points in the stream was defined from Eq. 2 for each attribute with ω allowed to vary from 0.1 to 1 in intervals of 0.1 to facilitate computations (for ω=0 the adjusted distance equals the symmetric in-stream distance scaled by the median and is redundant). This resulted in a different adjusted distance metric for each landscape characteristic listed in Table 2 at each increment of ω (i.e., 8 characteristics times 10 increments of ω, or 80 possible adjusted distance metrics).
Instead of choosing a priori which combination of landscape characteristic and ω to use for a certain constituent (Table 1), we considered all possible combinations and selected for each constituent the best performing one based on error analysis of the resultant interpolation (see Sect. 2.5 below).
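The two steps described above, averaging a characteristic over a contributing area and enumerating the candidate metrics, can be sketched as follows. The function name and the ten-cell contributing area are hypothetical, and the list of eight characteristic names is an assumption inferred from the data description rather than a copy of Table 2.

```python
from itertools import product


def area_weighted_mean(values, areas):
    """Average a landscape characteristic over a contributing area."""
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("contributing area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Hypothetical contributing area of ten 10 x 10 m cells; 1 = forested.
forest_flags = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
cell_areas = [100.0] * 10
forest_fraction = area_weighted_mean(forest_flags, cell_areas)

# Assumed labels for the eight characteristics (not taken from Table 2).
CHARACTERISTICS = [
    "soil_depth", "organic_matter", "porosity", "wetness_index",
    "slope", "ln_upslope_area", "forest_landuse", "agricultural_landuse",
]
OMEGAS = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
candidates = list(product(CHARACTERISTICS, OMEGAS))
```

Using cell areas as weights allows fractional contributions, which arise when a multiple-flow-direction algorithm routes only part of a cell towards a given stream cell.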

Geostatistical analysis
Ordinary kriging was used to interpolate between the synoptic sampling points for each constituent based on exponential models fit to calculated semivariograms (Cressie, 1985). To calculate semivariograms, the distances between all sampling locations, $x_{ij}$, defined using either the Euclidean ($E_{ij}$), in-stream ($d_{ij}$), or adjusted in-stream ($h_{ij}$) distance metrics from above, were divided into lag bins of a given distance, $x$, defining the semivariance for each lag bin, $\gamma_s(x)$, as

$$\gamma_s(x) = \frac{1}{2N(x)} \sum_{(i,j)} \left( Y_i - Y_j \right)^2 \qquad (3)$$

where $N(x)$ is the number of pairs, $Y_i$ and $Y_j$ are the constituent of interest at sampling points $i$ and $j$, respectively, with summation over pairs $(i, j)$ for the lag bin. The average bin semivariance was plotted against the average bin distance to create the sample semivariogram. This describes variance between two sampling locations in space as a function of distance and is fitted by a function (also called a model) to create the semivariogram. The main parameters of the fitted semivariogram model are the nugget, the sill, and the range. The sample semivariograms were fitted with an exponential semivariogram model of the form

$$\gamma_e(x) = \sigma_0^2 + \left( \sigma_\infty^2 - \sigma_0^2 \right) \left[ 1 - \exp\left( -\frac{x}{\lambda} \right) \right] \qquad (4)$$

where $\gamma_e(x)$ is the fitted semivariogram model, $\sigma_0^2$ is the nugget, $\sigma_\infty^2$ is the sill, and $\lambda$ is the correlation length. The models were fit using an automated fitting procedure (Cressie, 1985, 1991). The fitted semivariogram model provides a means to interpolate the constituent of interest between sampling locations using kriging, which generates predictions at unobserved locations by weighting the influence of neighboring sampled locations based on their distance and configuration and, in the case of the adjusted distance metric, landscape characteristics.
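The binning and model evaluation described above can be sketched as follows. The sketch assumes the semivariance is half the mean squared difference within each lag bin and that the exponential model has the form nugget + (sill − nugget)(1 − exp(−x/λ)). The weighted least-squares criterion is one common choice in the spirit of Cressie (1985) and is not necessarily the exact procedure used in the study; the parameter search itself is omitted. All names and values are hypothetical.

```python
import math


def sample_semivariogram(pair_distances, values, bin_width):
    """Return [(mean distance, semivariance, n_pairs)] for each non-empty bin.

    pair_distances: {(i, j): distance} under any chosen distance metric.
    """
    bins = {}
    for (i, j), x in pair_distances.items():
        stats = bins.setdefault(int(x // bin_width), [0.0, 0.0, 0])
        stats[0] += x                              # sum of distances
        stats[1] += (values[i] - values[j]) ** 2   # sum of squared differences
        stats[2] += 1                              # number of pairs N(x)
    return [
        (s[0] / s[2], s[1] / (2.0 * s[2]), s[2])
        for _, s in sorted(bins.items())
    ]


def exponential_model(x, nugget, sill, corr_length):
    """Assumed exponential semivariogram: rises from the nugget to the sill."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-x / corr_length))


def weighted_loss(sample, nugget, sill, corr_length):
    """Sum over bins of N * (sample / model - 1)^2; assumes model values > 0."""
    return sum(
        n * (g / exponential_model(x, nugget, sill, corr_length) - 1.0) ** 2
        for x, g, n in sample
    )


# Hypothetical data: three sampling points and their pairwise distances.
distances = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.0}
concentrations = [1.0, 3.0, 2.0]
sample = sample_semivariogram(distances, concentrations, bin_width=1.5)
```

A fitting routine would then minimise `weighted_loss` over the nugget, sill, and correlation length, for example with a grid search or a numerical optimiser.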

Selecting the best performing adjusted distance metric
All combinations of landscape characteristics (Table 2) and ω were considered for defining distance for interpolating each constituent (Table 1). The best performing adjusted distance metric was then selected using the cross-validation of the kriging interpolation based on each combination of landscape characteristic and ω. Cross-validation, which describes how well a kriging interpolation fits observed data, was performed with a "leave-one-out" methodology. This methodology omits a sampling location from the analysis and then estimates its value using the remaining sampling locations. After repeating for all sampling locations, a cross-validation error ($K_{\mathrm{RMSE}}$) was calculated as the root mean squared error from the differences between estimates and actual observations of the constituent concentrations as

$$K_{\mathrm{RMSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - E_i \right)^2} \qquad (5)$$

where $Y_i$ is the observed concentration of the constituent at point $i$ in the stream network, $E_i$ is the kriging estimated concentration of the constituent at point $i$ in the stream network, and $n$ is the number of points considered, i.e., the number of samples. We computed the leave-one-out cross-validation error ($K_{\mathrm{RMSE}}$) for every possible combination of landscape characteristic and ω. The combination that resulted in the lowest $K_{\mathrm{RMSE}}$ for each constituent was selected as the best performing adjusted distance metric for interpolating that particular constituent. For comparison, ordinary kriging interpolations and cross-validation were also performed using the Euclidean and in-stream distance metrics.

For additional comparison, cumulative error distribution functions were created for each constituent interpolated using each distance metric. Here, error is taken as the difference between the observed constituent concentration and the predicted (interpolated) constituent concentration at each sampling location. Such curves reflect the effects of observed extreme values or bias for each interpolation.
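A leave-one-out cross-validation of an ordinary kriging interpolation can be sketched in plain Python as follows. The sketch assumes a precomputed symmetric matrix of pairwise distances under any of the three metrics, an exponential semivariogram of the form nugget + (sill − nugget)(1 − exp(−x/λ)), and a small Gaussian-elimination solver in place of a linear algebra library. All function names and example values are hypothetical, and the sketch includes neither a search neighbourhood nor a check that the semivariogram remains valid for non-Euclidean distances.

```python
import math


def exponential_gamma(x, nugget, sill, corr_length):
    """Assumed exponential semivariogram; zero at zero separation."""
    if x == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-x / corr_length))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w


def krige(D, values, known, target, gamma):
    """Ordinary kriging estimate at index target from the indices in known."""
    n = len(known)
    A = [[gamma(D[i][j]) for j in known] + [1.0] for i in known]
    A.append([1.0] * n + [0.0])  # constraint: weights sum to one
    b = [gamma(D[i][target]) for i in known] + [1.0]
    weights = solve(A, b)[:n]    # drop the Lagrange multiplier
    return sum(w * values[i] for w, i in zip(weights, known))


def loo_rmse(D, values, gamma):
    """Leave-one-out cross-validation root mean squared error."""
    n = len(values)
    squared = 0.0
    for t in range(n):
        known = [i for i in range(n) if i != t]
        squared += (values[t] - krige(D, values, known, t, gamma)) ** 2
    return math.sqrt(squared / n)


def gamma(x):
    # Hypothetical fitted parameters.
    return exponential_gamma(x, nugget=0.1, sill=1.0, corr_length=2.0)


# Hypothetical example: four points along a single reach.
positions = [0.0, 1.0, 2.0, 3.0]
D = [[abs(p - q) for q in positions] for p in positions]
observed = [2.0, 3.0, 2.5, 4.0]
k_rmse = loo_rmse(D, observed, gamma)
```

Selecting the best performing adjusted metric would then amount to building one distance matrix per combination of landscape characteristic and ω, refitting the semivariogram for each, and keeping the combination with the smallest cross-validation error.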

Visual comparison
Due to the large scale of the synoptic campaign and the number of constituents considered in this study, we present visual comparison results only for the nutrient concentrations of N, K and P. Semivariograms based on Euclidean, in-stream, and adjusted distance metrics for the observed concentrations of N, K and P show the relationship between variations among observations and the distance separating measurements (Fig. 3). The points (i.e., the sample semivariogram) represent the average semivariance of observations binned according to observation separation distance. The curves are the fitted exponential models describing variance between observations as a function of separation distance. The landscape characteristic and value of ω that provided the best performing adjusted distance metric, and that were used to generate the semivariograms, differed among constituents (Table 3). In addition, the exponential semivariogram models fitted to the sample semivariograms developed from observed data using Eq. 3 had different nugget, sill, and range parameters (Table 4). Note that, in order to allow comparison between the three metrics, we scaled all distances by dividing by the maximum distance for each metric and, thus, range values have no units. Examples of N, K and P interpolations made with ordinary kriging for the three distance metrics are shown for two first-order tributaries and their downstream confluence (Fig. 4). The observed values and sampling locations are highlighted in the first column of Fig. 4 for N, K and P, respectively. The southern ends of these tributaries, which flow north-south before joining the main stream channel of the watershed, have higher observed values for each of the three nutrients than the northern ends. It should be noted that there is more agricultural land draining through the southern ends of the tributaries than the northern ends. The northern end of this region is more upland in position and primarily covered with forest.
For each point in the stream that is not directly sampled, an interpolated value is estimated using ordinary kriging based on the semivariogram models from Fig. 3. These interpolations allow the visual comparison of how each distance metric represents small-scale variations in nutrient concentrations along the stream.

Quantitative evaluation
The kriging interpolations for all the constituents using the three different distance metrics were evaluated by computing $K_{\mathrm{RMSE}}$ from cross-validation (Table 3). Cross-validation quantifies how well the interpolation "predicts" locations where concentrations are known. There is a reduction in cross-validation error for almost all constituents (except N and P, where there is a slight increase) when a symmetric in-stream distance metric was used compared to a Euclidean distance metric (Table 3). The change in $K_{\mathrm{RMSE}}$ found using the in-stream distance versus Euclidean distance ranged from a slight increase of 4.2% for P concentrations to a reduction of 37.9% for Mg concentrations, with an average reduction of 21.2% for all constituents. Using the adjusted distance metric resulted in an even larger reduction in cross-validation error for all constituents (Table 3). The change in $K_{\mathrm{RMSE}}$ found using the adjusted distance versus Euclidean distance ranged from a reduction of 10.8% for P concentrations to a reduction of 43.1% for Mg concentrations, with an average reduction of 30.1% for all constituents. When comparing the adjusted in-stream distance metric to the in-stream distance metric, there was a reduction in cross-validation errors for all constituents. The values of $K_{\mathrm{RMSE}}$ using the adjusted in-stream distance were on average 11.0% lower than when the in-stream distance was used directly; for the individual constituents this reduction of $K_{\mathrm{RMSE}}$ ranged from 6.2% for Cl to 16.0% for N.
For each of the nine constituents considered in this study, a cumulative error distribution was created using each distance metric (Fig. 5). These cumulative error distributions sum the ranked at-site error (observed concentration minus predicted concentration) for every sampling location in the river network for each constituent and each distance metric. From this analysis, there is a slight shift in most cumulative error curves to the left of the vertical zero line, indicating slight overprediction by all distance metrics. Also, there tends to be more spread in general when interpolation is made using the Euclidean distance compared to interpolations based on both the symmetric in-stream and adjusted distance metrics. Overall, though, these cumulative error distributions indicate that there are no clear effects of extreme values or strong bias in the interpolation.
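One way to build such a curve is sketched below. It assumes the cumulative error distribution is the empirical cumulative distribution of the ranked at-site errors, which is an interpretation of the description above rather than a confirmed detail of Fig. 5. The function name and the example concentrations are hypothetical.

```python
def cumulative_error_distribution(observed, predicted):
    """Return [(error, cumulative fraction)] for ranked at-site errors.

    Error is observed minus predicted, so negative errors are overpredictions.
    """
    errors = sorted(o - p for o, p in zip(observed, predicted))
    n = len(errors)
    return [(e, (rank + 1) / n) for rank, e in enumerate(errors)]


# Hypothetical observed and cross-validated concentrations at three sites.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 1.0, 3.5]
curve = cumulative_error_distribution(observed, predicted)

# Fraction of sites that were overpredicted (error below zero).
fraction_overpredicted = sum(1 for e, _ in curve if e < 0) / len(curve)
```

A curve shifted to the left of zero, as in this example where two of three sites have negative errors, corresponds to the slight overprediction described in the text.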

Discussion
Different landscape characteristics and weighting factors were found to give the best performing adjusted distance metrics for different constituents. For example, an adjusted distance metric using the topographic wetness index and ω=0.2 provided the best results for interpolating Na concentrations, while using average porosity and ω=0.5 gave the best results for interpolating Ca concentrations in this stream network. This variation is expected since different processes control in-stream concentrations for different constituents. The outlined methodology made no a priori assumptions about primary mechanisms or about how, and to what extent, the different landscape characteristics influence in-stream concentrations. Such restrictions based on assumptions of primary mechanisms are inherent to existing stream export models, such as SPARROW (Smith et al., 1997), which require calibration by constituent and location to give continuous representations of stream water chemistry. The opportunity to be used as an exploratory analysis tool is one advantage of the proposed geostatistical technique. Representations of stream water chemistry made using geostatistical techniques are drawn directly from observations and, thus, reflect the tight coupling inherent between stream water chemistry and landscape characteristics. This is especially true during low-flow conditions (which are present during most synoptic campaigns), as the mean transit time of water in the landscape and the contact time of water with the landscape both increase.
Of the three metrics considered in this study, Euclidean distance performed the worst based on $K_{\mathrm{RMSE}}$ for most constituents (with N and P being exceptions). This result (with respect to the symmetric in-stream distance metric) is expected and is similar to the results seen by Little et al. (1997) and Gardner et al. (2003). The better performance of both the symmetric in-stream distance metric and the adjusted in-stream distance metric is attributed to a more appropriate representation of distance when the travel path between two locations is restricted to the stream. Looking at the interpolation of K concentration (Fig. 4), for example, there is a heavy influence of the low concentration observed at the northern end of the smaller tributary on the middle section of the longer stream when using the Euclidean distance metric. This is exhibited as a light colored region in the interpolation of K concentrations in the longer stream (Fig. 4). The influence of this low-concentration sample is lower using the symmetric in-stream distance metric because the sample is farther away from the middle section of the longer stream. Using the adjusted distance metric, the kriging interpolation of K concentration becomes, in effect, smoothed out and more closely resembles the change in landuse composition moving downstream. The contributing area draining into the northern end of the longer stream reach is covered by 35% forested land. This composition decreases to only 15% forested landuse at the southern end of the reach, reflecting the incorporation of more agricultural land in the lowland area. A similar control of landuse on K concentration has been seen in other watershed studies (e.g., Williams et al., 2005; Tripler et al., 2006) and illustrates an advantage of interpolation methods using the adjusted distance metric for developing hypotheses about the mechanisms controlling landscape-stream connections for different constituents (which is the goal of many synoptic campaigns).
As another example, percentage agricultural landuse was the most suitable landscape characteristic for interpolating P concentrations in the stream network (Table 3). This relationship between P concentrations and landuse agrees with the findings of previous studies in the Catskill Region of New York State based on multi-year data (Lyon et al., 2006).
Identifying links between landscape characteristics and stream water chemistry is difficult and typically requires sampling that covers various flow conditions and seasons at numerous locations. This is especially true for constituents where the mechanisms controlling stream water concentration are not well established or understood. The question is: how does one identify the best locations to collect samples in a stream network? The adjusted distance metric interpolation approach could be used in an investigatory mode such that first-order controls of stream water chemistry are identified from an initial synoptic sampling campaign. Transition zones (e.g., stream reaches where the best performing attribute undergoes much change) or hot spots (e.g., positions in the stream network where the best performing attribute is extremely high or low) could then be targeted in future investigations to gain process-based understanding connecting in-stream chemistry and landscape characteristics. This approach could allow more effective sampling strategies and, thus, reduce costs.
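The targeting idea above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in this study: the function names, the thresholds, and the attribute values (a made-up profile of percentage forested land ordered from upstream to downstream) are all assumptions chosen for the example.

```python
def find_transition_zones(values, min_change):
    """Return indices i where the attribute changes by at least
    min_change between consecutive stream cells i and i + 1."""
    return [
        i
        for i in range(len(values) - 1)
        if abs(values[i + 1] - values[i]) >= min_change
    ]


def find_hot_spots(values, low, high):
    """Return indices of stream cells whose attribute value is at or
    below `low`, or at or above `high` (hypothetical thresholds)."""
    return [i for i, v in enumerate(values) if v <= low or v >= high]


# Made-up attribute profile (percent forested land), ordered from
# upstream to downstream; these are NOT measured values.
forest_pct = [35.0, 34.0, 33.0, 22.0, 21.0, 15.0]

transitions = find_transition_zones(forest_pct, min_change=5.0)
hot_spots = find_hot_spots(forest_pct, low=15.0, high=35.0)
```

In practice the thresholds would have to be chosen per attribute and per watershed; the sketch only shows that both kinds of target can be read directly from the attribute map of the stream network.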
The adjusted distance metric, while improving interpolations of water quality observations in terms of cross-validation error, comes at a computational cost. Because we made no assumptions about mechanisms, we needed to test multiple landscape characteristics to determine which performs "best" for interpolation. This means producing numerous stream network maps describing the contributing area composition for each "point" in the stream. With the stream network rasterized on a 10×10 m grid, our 37 km2 study watershed contained a stream network consisting of over 12 000 grid cells. This is quite a large domain to model, especially since we need to compute differences between each pair of cells for Eq. 1 (resulting in 12 000×12 000=144 000 000 calculations per attribute). An alternative approach for large systems would be to represent the stream network treating the stream order as the smallest unit. While computationally "faster", this method would not be able to show variations at the sub-reach scale, which is often the scale of interest in synoptic campaigns and ecohydrological studies.
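The size of this computation can be checked with a short Python sketch. The helper names are hypothetical, the storage estimate assumes 8-byte floating point values, and the absolute difference used here is only an illustrative stand-in for the attribute comparison, not the form of Eq. 1.

```python
def pairwise_cost(n_cells, bytes_per_value=8, symmetric=False):
    """Return (number of pairwise values, storage in gigabytes).

    With symmetric=True only the unique pairs (i < j) are counted,
    which is valid when the difference from i to j equals the
    difference from j to i.
    """
    if symmetric:
        n_pairs = n_cells * (n_cells - 1) // 2
    else:
        n_pairs = n_cells * n_cells
    return n_pairs, n_pairs * bytes_per_value / 1e9


def attribute_differences(values):
    """Absolute difference in a landscape attribute between every
    pair of stream cells (illustrative stand-in, not Eq. 1)."""
    return [[abs(a - b) for b in values] for a in values]


# Full 12 000 x 12 000 matrix, as in the text.
full_pairs, full_gb = pairwise_cost(12_000)

# Unique pairs only, assuming a symmetric difference.
half_pairs, half_gb = pairwise_cost(12_000, symmetric=True)

# Tiny example with two cells (values are made up).
small = attribute_differences([35.0, 15.0])
```

Under these assumptions the full matrix holds 144 000 000 values (roughly 1.15 GB per attribute), and exploiting symmetry would roughly halve both the number of calculations and the storage.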
Another possible shortcoming of the adjusted distance metric (in the form presented in this study) is that it is based on a symmetrical representation of in-stream distance. It has been pointed out that stream water chemistry is strongly influenced by longitudinal transport mechanisms and that movement occurs primarily in the downstream direction (Closs et al., 2004; Peterson et al., 2006). With this in mind, several recent studies have focused on developing weighted asymmetrical in-stream distance metrics (e.g., Peterson et al., 2006, 2007; Cressie et al., 2006; Ver Hoef et al., 2006) or on better incorporating the organization of nested catchments (e.g., Skøien et al., 2006, 2007). The influence of such directionality is likely limited during low-flow conditions. In addition, symmetric in-stream distance may better represent the integration of contributing areas (flow paths) moving down a single stream reach. It is easy, for example, to imagine a scenario where a point source (or highly concentrated region of non-point source) exists between two observation positions along a stream. Using an asymmetric metric, any interpolations made between these two observation points would not reflect this point source (since the influence of the point source is felt only by the downstream position). While this may not matter at larger spatial scales, it is extremely important at smaller scales, where such sources can directly affect the ecology and health of the river system.

Concluding remarks
Synoptic sampling campaigns can be used to represent stream water chemistry at the watershed scale. It is often desirable to derive a spatially continuous map of stream water chemistry from such campaigns using interpolation techniques (such as geostatistics and kriging). These techniques are heavily influenced by how we define distance between points. In this study, we developed and evaluated an adjusted distance metric that couples the distance between in-stream chemistry observations with landscape characteristics. Ordinary kriging based on this adjusted distance metric better matched observations (i.e., resulted in smaller cross-validation errors) than kriging based on either Euclidean or in-stream distance metrics for our test watershed. This adjusted distance metric can also be used to help identify first-order landscape controls on stream chemistry dynamics and to target future sampling campaigns.