Detecting dominant changes in irregularly sampled multivariate water quality data sets

Time series of groundwater and stream water quality often exhibit substantial temporal and spatial variability, whereas typical existing monitoring data sets, e.g. from environmental agencies, are usually characterized by relatively low sampling frequency and irregular sampling in space and/or time. This complicates the differentiation between anthropogenic influence and natural variability as well as the detection of changes in water quality which indicate changes in single drivers. We suggest the new term “dominant changes” for changes in multivariate water quality data which concern (1) multiple variables, (2) multiple sites and (3) long-term patterns and present an exploratory framework for the detection of such dominant changes in data sets with irregular sampling in space and time. Firstly, a non-linear dimension-reduction technique was used to summarize the dominant spatiotemporal dynamics in the multivariate water quality data set in a few components. Those were used to derive hypotheses on the dominant drivers influencing water quality. Secondly, different sampling sites were compared with respect to median component values. Thirdly, time series of the components at single sites were analysed for longterm patterns. We tested the approach with a joint stream water and groundwater data set quality consisting of 1572 samples, each comprising sixteen variables, sampled with a spatially and temporally irregular sampling scheme at 29 sites in northeast Germany from 1998 to 2009. The first four components were interpreted as (1) an agriculturally induced enhancement of the natural background level of solute concentration, (2) a redox sequence from reducing conditions in deep groundwater to post-oxic conditions in shallow groundwater and oxic conditions in stream water, (3) a mixing ratio of deep and shallow groundwater to the streamflow and (4) sporadic events of slurry application in the agricultural practice. Dominant changes were observed for the first two components. The changing intensity of the first component was interpreted as response to the temporal variability of the thickness of the unsaturated zone. A steady increase in the second component at most stream water sites pointed towards progressing depletion of the denitrification capacity of the deep aquifer.


Introduction
Numerous high-frequency sampling studies unravelled the high temporal variability of stream water quality (e.g.Kirchner et al., 2004;Cassidy and Jordan, 2011;Halliday et al., 2012;Neal et al., 2012;Wade et al., 2012;Aubert et al., 2013;Kirchner and Neal, 2013;Tunaley et al. 2016;Rode et al., 2016;Blaen et al., 2017).Therefore, monitoring water quantity and quality on the timescale of the hydrological response of the catchment is a key requirement for understanding water quality dynamics and its driving processes in detail (Kirchner et al., 2004;Neal et al., 2012;Halliday et Published by Copernicus Publications on behalf of the European Geosciences Union. al., 2012).While the development of sensor technology, data loggers and transmission technology hopefully will help to significantly increase the number of high-frequency monitoring programmes in the future, most of the existing monitoring programmes so far applied a rather low sampling frequency.Nonetheless, there is common agreement that for short periods with high-frequency data, longer periods of low-frequency monitoring provide invaluable context (Burt et al., 2011;Neal et al., 2012;Halliday et al., 2012;Bieroza et al., 2014).This is especially true for existing long-term records which are required as reference to distinguish between natural short-term and long-term variability of the observed variables and the assessment of the effects of anthropogenic influence on water quality such as changes in land use in the catchment (Burt et al., 2008;Howden et al., 2011).
The intriguing temporal and spatial variability in water quality monitoring data sets can in most cases hardly be related to single causal factors.Instead, a variety of biogeochemical processes (e.g.Stumm and Morgan, 1996;Neal, 2004;Beudert et al., 2015), climatic (e.g.Neal, 2004) and hydrological (e.g.Molenat et al., 2008) variability and anthropogenic influences, for example agricultural (e.g.Basu et al., 2010Basu et al., , 2011;;Aubert et al., 2013) or forestal (e.g.Neal, 2004) land use, land use change (e.g.Scanlon et al., 2007;Raymond et al., 2008) or urbanization (e.g.Kroeze et al., 2013), interact at different scales impeding identification of clear cause-effect relationships.Usually a single solute is affected by numerous different drivers at different scales (see e.g.Molenat et al., 2008;Lischeid et al., 2010;Schuetz et al., 2016 for NO − 3 ).Inversely, a single driver usually has an impact on various solutes (Massmann et al., 2004;Lischeid and Bittersohl, 2008).This suggests that trend analyses of single variables might easily be misleading with respect to the identification of driving factors.For this purpose techniques which are able to account for the interaction of multiple drivers and observed variables are preferable.
On the other hand, despite their complexity, catchments are highly constrained systems.Usually only a few dominant processes determine the main dynamics of streamflow, groundwater head or water quality (Grayson and Blöschl, 2000;Sivakumar, 2004;Lischeid et al., 2016).Using joint information from different solutes is an established way to derive hypotheses on processes or other causal factors that are dominant in the monitored data.For this purpose, dimensionreduction techniques, especially the linear principal component analysis (PCA), have been used in analyses of multivariate water quality data for a long time, mostly as an exploratory tool for descriptive process identification (e.g.Usunoff and Guzmán-Guzmán, 1989;Haag and Westrich, 2002;Cloutier et al., 2008) or for determining mixing ratios (e.g.Hooper et al., 1990;Capell et al., 2011).If the analysed data consist of time series of one or several variables observed at different sites, then the temporal features of the results of the dimension reduction can be analysed in a spatially explicit way, e.g. with respect to seasonal patterns or long-term developments at the monitored sites (Lischeid and Bittersohl, 2008;Lischeid et al., 2010).
However, many of the methods commonly used for analysing temporal developments in monitoring data sets require regularly sampled data.In practice the spatiotemporal design of sampling campaigns and monitoring networks often evolves during the sampling period in an irregular way.In order to obtain a regularly sampled data set, additional information with a different sampling design, e.g. from pilot studies or single sampling campaigns, might not be utilized in the analysis at all.Further irregularities in the spatiotemporal structure of environmental monitoring data sets arise typically during the monitoring itself from a variety of reasons such as failure of sensors or data loggers, measurement errors, loss of samples or periods of ice or drought.Thus, in environmental monitoring practice, data sets with gaps and periods with corrupted measurements are more the rule rather than the exception (see e.g.Zhang et al., 2018, for river quality data).Lischeid et al. (2010) suggested a combination of exploratory data analysis methods to detect and analyse dominant processes and their temporal development in multivariate water quality data sets that is capable of dealing with irregular time series.We built on that and extended it towards the detection of "dominant changes" in time series of multivariate water quality data that are monitored at different sites, i.e. at different parts of a catchment or in different catchments within a region.In analogy to the dominant process concept (Grayson and Blöschl, 2000;Sivakumar, 2004), we use the term "dominant changes" in a broad and descriptive sense referring to systemic changes that clearly exceed the usual range of heterogeneities in the temporal, spatial or inter-variable structure of the observed water quality data.The changes we considered as dominant were those that concerned (1) main components of the multivariate water quality data set rather than single water quality variables (multivariate components), (2) behaviour at various sites rather than at single sites (multiple sites), and (3) long-term behaviour rather than short-term fluctuations or single events (long-term patterns).
To identify the dominant changes, we combined exploratory data analysis methods for non-linear dimension reduction, spectral analysis, linear and non-linear trend estimation, and a monotonic trend test in one exploratory framework.The suggested approach was tested with a multivariate water quality data set that has been sampled with a spatially and temporally irregular sampling scheme in northeast Germany from 1998 to 2009.In the following, we present and discuss the results of our case study according to the three aspects of dominant changes: (1) multivariate components, (2) multiple sites and (3) long-term patterns.We continue with a discussion of (4) effects of the irregular sampling and (5) methodological aspects of the exploratory framework.

Study area
The study area is the upper part of the basin of the Ucker river located in the northeast of Germany, about 90 km north of Berlin, which drains to the Baltic Sea another 50 km further north.It is part of the Leibniz Centre for Agricultural Landscape Research (ZALF) long-term monitoring region AgroScapeLab Quillow, the LTER-D (Long Term Ecological Research Network, Germany) and the TERENO (Terrestrial Environmental Observatories, http://teodoor.icg.kfa-juelich.de, last access: 8 October 2017) Northeastern German Lowland Observatory.Water samples have been taken in the adjacent catchments of Dauergraben (78.9 km 2 ), Stierngraben (104.8 km 2 ), and Quillow (399.4 km 2 ) with its subcatchments Strom (235.8 km 2 ) and Peege (25.6 km 2 ) (Fig. 1).At the ZALF weather station Dedelow, which is situated approximately 500 m northeast of Q_97 (Fig. 1a), a mean annual precipitation of 550 mm and a mean annual temperature of 8.9 • C was observed for the hydrological years within the study period (November 1997to October 2009).The mean annual climatic water balance for this period, calculated from daily precipitation and potential evapotranspiration, was found to be −103 mm, exhibiting high interannual variability with −148 mm in the summer half year and +45 mm in the winter half year.The topography of the region developed basically during the Pomeranian stage and the Mecklenburgian stage of the Weichselian ice age, i.e. 15 200 to 14 100 years before present.Altitude varies from 20 m in the lowlands of the Ucker river to more than 100 m above sea level in the southwestern part of the study area.During the Pleistocene, repeated advances and recessions of the ice sheet deposited highly heterogeneous unconsolidated sediments of about 150 to 200 m thickness.The base consists of a thick Oligocene clay layer which separates the upper freshwater groundwater system from saline groundwater underneath.Based on borehole surveys, up to seven aquifers divided by layers of till have been identified within the unconsolidated Quaternary sediments.In some parts of the region patches of halophilous plants are found in the lowlands, indicating local upwelling of saline groundwater from the underlying Tertiary aquifer through windows of the Oligocene clay layer.
Loamy and sandy loamy soils that developed from the till substrate prevail.Most of the region is intensively used as cropland, although the fraction of arable land differs between the catchments (Table 1).Forests comprise only a minor fraction of the area (Table 1).Land cover did not change within the study period from 1998 to 2009.The riparian zone of the catchments is mostly used as grassland, underlain by peat and organic and sandy fluvial deposits.The hummocky landscape includes about 1300 closed drainage basins and small ponds with an area of the water surface < 1 ha (Kalettka and Rudat, 2006;Lischeid et al., 2016).Many of the larger depressions have been connected by ditches to facilitate drainage.Partly, these ditches have later been replaced by underground pipes for land reclamation.In addition, agricultural soils are extensively drained by subsurface tile drainage systems.From the 13th century until the end of the 19th century, the energy of the natural water courses was also occasionally used to power mills.Today, those mills are not active any longer and have been replaced in most cases by weirs for water management or ramps.For more details on the study site, please see Merz and Steidl (2015).

Sampling and analysis
The monitoring aimed to cover the spatial and temporal variability of water quality along the Quillow stream, its tributaries and the adjacent streams.The main focus of the monitoring was the Quillow catchment.Here, eight sampling sites were located along the main stream and another four at each of the two tributaries Peege and Strom (Fig. 1 and Supplement Table S1).At the streams Dauergraben and Stierngraben and at the Ucker river, stream water quality was monitored at one site respectively.Stream water sampling started in 1998 and was performed until 2009.Discharge data were only available at sites Q_93 and S_118 (Fig. 1a).Thus, we did not include it in the presented analysis.Groundwater quality was monitored in the Quillow catchment only, close to the middle reaches of the stream and close to the mouth of the Peege tributary, from 2000 to 2008 (Fig. 1b).At this site, an up to 15 m thick horizontal till layer separates a shallow and very heterogeneous unconfined aquifer from a mainly confined deep aquifer.The separating till layer crops out further downstream (Merz and Steidl, 2015).Both aquifers were monitored (Table S2).The deep aquifer is known to be confined except at well Gd_204.Groundwater level in the deep aquifer was measured daily with automatic data loggers at wells Gd_198, Gd_201, Gd_203 and Gd_204 (Merz and Steidl, 2014a).
Groundwater quality (Merz and Steidl, 2014b) and stream water quality (Kalettka and Steidl, 2014) monitoring in the Quillow catchment covers a wide range of water quality parameters.For the multivariate analysis in this study, we considered from the joint groundwater and stream water quality data set only the 16 variables with less than 5 % missing values, i.e.NH + 4 , NO − 3 , NO − 2 , PO 3− 4 , Na + , K + , Mg 2+ , Ca 2+ , Cl − , O 2 , pH, water temperature, redox potential (Eh), electric conductivity (EC), SO 2− 4 and dissolved organic carbon (DOC) (Table S3).Each sample contained measurements of all 16 variables.Those water samples for which more than 2 of the 16 monitored variables were missing were excluded from the analysis, resulting in a set of 1572 samples.In total, 0.69 % of the values in the data set were missing.In addition, we considered HCO − 3 and Fe 2+ concentration from the groundwater monitoring (Table S3).
The sites were sampled roughly similarly across seasons (Fig. 2a).The most important systematic deviations from this rule were the Peege sites and the most upstream sites of the Quillow (Figs. 2a and 1), which often fall dry in summer (Merz and Steidl, 2015).Further details on the data and measurement methods are provided by Merz and Steidl (2015).The selection of water quality data used in this article and the groundwater level data have been published under the CC-BY 4.0 license and can be found in Lehr et al. (2018) and Merz and Steidl (2014a) respectively.

Data preprocessing
Missing values were replaced by the median of the respective variable.This concerned at most DOC (3.44 % of the values) and NO − 2 (2.54 %), whereas the percentage of missing values was less than 2 % for each of the other 14 variables (Table S3).Values below detection limit were replaced by 0.5 times that limit.To achieve equally weighted variables the values were z-normalized to zero mean and unit standard deviation for each variable separately.

Exploratory framework
To identify the dominant changes, we firstly used the nonlinear dimension-reduction technique isometric feature mapping (Isomap) to derive the main multivariate water quality components.To account for the interaction of groundwater and stream water, both groundwater and stream water samples have been analysed together in one joint analysis.Secondly, we studied differences between the sites with respect to median component values.Thirdly, we analysed the time series of the components at sites with more than 50 samples.Seasonal patterns were analysed with the Lomb-Scargle approach (Lomb, 1976;Scargle, 1982Scargle, " 1989) ) and -if significant -were subtracted from the series prior to trend analyses.Please note that the term "seasonal" refers to the annual cycle throughout the article.Linear trends were estimated with the Theil-Sen estimator and tested for significance with the Mann-Kendall test.Non-linear trends were depicted with the locally weighted regression (LOESS) approach (Cleveland, 1979;Cleveland and Devlin, 1988).We then related resulting low-frequency patterns to the long-term groundwater head dynamics, likewise determined as the LOESS smooth of the de-seasonalized series.Time series analysis at different sites allowed us to check whether long-term patterns were consistent, pointing to more general effects in the study area.
As the methods do not require regularly sampled data in space or time, we considered every sample as additional information of the spatiotemporal variability of the observed water quality in the study area rather than noise.Consequently, irrespective of irregularities of sampling intervals at a site or differences in sampling intervals and numbers of samples between the different sites, we included as many samples in the analysis as possible to increase the informative value and support the representativeness of the study in space and time.This might lead to a bias in the determination of the components, as well as in the estimation of the trends of the components and their significance, if deviations from a regular sampling scheme follow a systematic pattern.
To check for that, we tested the distribution of sampling intervals at all sites with N > 50 (Table S1) for normality with the Shapiro-Wilk test and the temporal development of the lengths of the sampling intervals for the whole observation period for monotonic trends with the Mann-Kendall test.For all tested sites a Gaussian distribution of sampling intervals as well as a monotonic trend of the length of sampling intervals during the observation period was rejected.

Dimension reduction
Dimension-reduction methods aim to represent a data set with a given number of dimensions (here the number of measured hydrochemical variables) in a new data space with substantially fewer dimensions.This is achieved by projecting the data in a new ordination system which makes a more efficient use of the intrinsic structures of the data set than the original one.The axes of the new ordination system are usually called components or dimensions.In the following, we will use the term "components".For the values of a component we will use the term "scores".The reduction of the data set's dimensionality is achieved by considering only some of the new components for further analysis.The selection process is a trade-off between reduction of the dimensionality and minimizing the loss of potentially informative structures.Typically only the first few components are selected as they depict the main structures in the data set.
In the projection, different methods focus on different aspects of the data.For example PCA aims at maximizing variance on the first components, classical multidimensional scaling (CMDS) at preserving the inter-point distances of the input data in the projection and self-organizing maps (SOM) at preserving the neighbourhood relations (topology) of the input data in the projection (Lee and Verleysen, 2007).In the last years, a variety of non-linear dimensionreduction methods have been developed (Van der Maaten et al., 2009).Although being sensitive to noisy data, isometric feature mapping (Tenenbaum et al., 2000) was one of the best-performing approaches when applied to real-world data (Geng et al., 2005).It has been successfully applied in environmental research disciplines, e.g.biodiversity studies (Mahecha et al., 2007), soil sciences (Schilli et al., 2010), climatology (Gámez et al., 2004) and biogeochemistry (Weyer et al., 2014).

Principal component analysis
In our study, the well-established linear principal component analysis (PCA) served as benchmark for the non-linear isometric feature mapping.PCA is one of the most widespread dimension-reduction methods going back to research of Pearson (1901) and Hotelling (1933).For a brief introduction to PCA, please see for example Jolliffe and Cadima (2016) and for a comprehensive one see Jolliffe (2002).PCA aims  GdP_201 ( 25) GsP_202 ( 11) GdQ_198 ( 28) GsQ_199 ( 18) GsP_200 ( 6) GdQ_204 ( 25) GdQ_205 ( 2) P_109 ( 8) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0 50 100 150 Days between subsequent sampling dates to successively maximize the variance of the data set on the new calculated components.The scores of the components are calculated as weighted linear combinations of the original variables.The weights (loadings) of the linear combination define the axes of the data space in which the data are projected.The loadings are the eigenvectors derived from an eigenvalue decomposition of the covariance matrix of the analysed variables.If the analysed variables are z-normalized, as was done here, their covariance matrix is equivalent to the (Pearson) correlation matrix.The components are ordered with decreasing size of their eigenvalues.The share of variance that is assigned to a component is proportional to the size of its eigenvalue in relation to the sum of all eigenvalues.Thus, the ratio of total variance that is captured by the considered components gives a measure of performance of the PCA.PCA was performed in R (R Core Team, 2017) with the function "princomp" of the default package "stats".

Isometric feature mapping
Isometric feature mapping (Isomap) is a non-linear extension of CMDS.It aims to approximate the global non-linearity in a data set by local linear fittings (Geng et al., 2005).This is done by mapping approximated geodesic inter-point distances to an Euclidean distance matrix via a neighbourhood graph G (Tenenbaum et al., 2000).The geodesic distance between two points is the distance along the surface of a (nonlinear) manifold, in contrast to the straight-line Euclidean distance (Geng et al., 2005).The neighbourhood graph G consists of segments that connect every data point to its k nearest neighbours directly via Euclidean distances.For all unconnected points the shortest path along the neighbourhood graph G is computed as the smallest sum of connected segments via the Dijkstra algorithm (Dijkstra, 1959).This approximation of the geodesic distances allows the adaptation of G to the global non-linear structures in a data set.The only free parameter k has to be optimized by checking the performance of several runs.The more linear the data, the higher the optimum k will be.If k equals the possible number of connections of one data point to all other data points, the approximations of the geodesic distances are equal to the Euclidean distances and the Isomap results are congruent with those of CMDS and linear PCA (Gámez et al., 2004).Finally the neighbourhood graph G is embedded in the Euclidean space.
In contrast to PCA, assessing performance based on the eigenvalues of the components is not applicable for Isomap.Performance of the dimension reduction of the Isomap approach was assessed and compared to performance of the PCA by the squared Pearson correlation coefficient (R 2 ) of the inter-point distances in the high-dimensional data space and in the low-dimensional projection spanned by selected components (Lischeid and Bittersohl, 2008;Lischeid et al., 2010).A perfect fit would yield a value of 1 and a value of 0 reflects no correlation between the distance matrices of the original data and of the projection.Please note that with this measure the contribution of single components to the overall performance does not necessarily decrease monotonically with increasing order of the components, as it is the case for the eigenvalue-based performance measure of PCA.For the local assessment of the representation of inter-point distances at the individual sites, only the data points from the respective sites were used.Because the selection of data points at a site is only a subset of the global data set for which the dimension reduction was performed, the performances regarding the representation of inter-point distances differ between the individual sites as well as compared to the overall performance for the global data set.At some sites it can even happen that adding more components does not improve the representation of inter-point distances in the low-dimensional projection for every component.Isomap and the determination of the distance matrices were performed with the R package "vegan" (Oksanen et al., 2017).

Interpretation of components
The analysis focused on those components that explained a major fraction of the total inter-point distances.The considered components were regarded to reflect dominant drivers influencing water quality.Here, the term "driver" was used for biogeochemical and hydrological processes as well as for anthropogenic influences affecting water quality.Correspondingly we formulated a hypothesis for each considered component.The interpretation of the components is based on analysing (i) the correlations between measured variables and component scores as well as (ii) spatial and temporal patterns of the scores.
Correlation between scores of a selected component cp x and values of single variables might be blurred due to the effects of other components on the same variable.We excluded those effects by analysing the relationships between scores of the selected component cp x and the residuals of the multiple linear regression (mlr) of the single variable v i at hand and the remaining other considered components CP\x (residuals): where CP\x is the set of m considered components, without the selected component cp x ; and β 0 and β j are the intercept and coefficients of the regression mlr (v i , CP\x) = β 0 + j ∈{CP\x} β j cp j + residuals. (2) To assess the relationships between components and residuals we used bivariate scatterplots.To summarize the relationships between components and residuals we used the Spearman rank correlation, which enables us to consider nonlinear relationships as well, as long as they are monotonic.In addition, it is less sensitive to extreme values than the Pearson correlation.

Time series analysis
At sites with more than 50 samples, time series of component scores were analysed for seasonal patterns, linear trends and non-linear trends.The sites were compared with respect to the identified long-term patterns to detect general patterns in the study area.The significance level for trend and frequencies in this study was set to p ≤ 0.05.At each site, the fractions of variance of a time series that were assigned to its seasonal pattern, linear trend or non-linear trend were determined as the R 2 of the respective pattern with the component series.In the case of significant seasonal patterns, the estimations of the trends were based on the de-seasonalized series.Accordingly, the fractions of variance assigned to the trends were determined as the R 2 of the trend pattern with the deseasonalized series.The decomposition of the time series in a seasonal component and a non-linear trend derived with LOESS was inspired by the seasonal-trend decomposition procedure based on LOESS (STL) of Cleveland et al. (1990).

Lomb-Scargle method
Standard Fourier analysis requires an equidistant time series, which was not given in our study.Therefore, the estimation of seasonal patterns in the time series was done with the Lomb-Scargle method, which is an extension of Fourier analysis to the unevenly spaced case genuinely invented in astrophysics (Lomb, 1976;Scargle, 1982).The application of the Lomb-Scargle method in this study follows to a large extent the workflow suggested by Glynn et al. (2006) as well as Hocke and Kämpfer (2009).Details are given in Appendix A.

Theil-Sen estimator and Mann-Kendall test
The linear trend was estimated with the non-parametric Theil-Sen estimator, which is the median of all inter-point slopes in a time series (Theil, 1950a, b, c;Sen, 1968).The Mann-Kendall test (Mann, 1945;Kendall, 1990) was used to test for significant monotonic trends.Identified trends are not necessarily linear.Being based on rank correlation, data do not have to obey any specific distribution.Please note that we did not account for the effect of overestimation of the significance of trends with the Mann-Kendall test due to shortterm autocorrelation (Yue et al., 2002).That would have required an assessment of the lag-1 autocorrelation which was hampered by the irregular sampling.Neither did we consider long-term memory and its effects on the statistical significance of the trends (Cohn and Lins, 2005;Zhang et al., 2018).Consequently, we did not consider the possible effects of the irregular sampling on the long-term memory (fractal scaling) of the water quality series either (Zhang et al., 2018).Due to the limited number of samples per year and non-equidistant sampling, the seasonal Mann-Kendall test was not applicable (Fig. 2).Instead, significant seasonal patterns according to the Lomb-Scargle approach were subtracted prior to trend analysis.The Mann-Kendall test was performed with the R package "Kendall" (McLeod, 2011).

Locally weighted regression (LOESS)
We assessed non-linear trends and low-frequency patterns with locally weighted regression (LOESS; Cleveland, 1979;Cleveland and Devlin, 1988), where the smoothing is done by local fitting of a second-order polynomial to each point x in the data set using weighted least squares.The weights for each value to be fitted are scaled to the range from 0 to 1 by the distance d(x) between x and its qth closest point.The ratio of q to the number n of all data points, i.e. the span of the local regression smoother, defines the degree of smoothing.We used the default smoothing span, which is a proportion of q/n = 0.75 of x's nearest neighbours.Data points further away than the qth data point do not contribute to the regression.Within the range of the span, the weights w i of the neighbouring points x i in the least squares fit decrease with increasing distance of x i to x symmetrically around x according to the tricubic weighting function ] 3 ) 3 .Again, significant seasonal patterns according to the Lomb-Scargle approach were subtracted prior to trend analysis.For details about choosing different LOESS parametrizations, please see Cleveland (1979) as well as Cleveland and Devlin (1988).
Local extrema of the LOESS smooth were identified with the R package "EMD" (Kim andOh, 2009, 2014.).

Multivariate components
We achieved the best performance of the Isomap dimension reduction with k = 1300 (Table 2).In the following, results are presented for the first four Isomap components representing 88 % of the inter-point distances of the total data set.For single sites (with more than 15 samples), between 29 % and 97 % of the respective inter-point distances were represented (Table S4).
The first component depicted 42 % of the inter-point distances of the total data set.Plotting residuals of the variables versus the first component showed strong positive correlations for NO − 3 , Na + , K + , Mg 2+ , Ca 2+ , Cl − , EC, SO 2− 4 , DOC and slightly weaker, but still positive, correlations for O 2 and Eh.Temperature was the only variable correlating negatively with the first component (Fig. 3).Visualization of the component scores versus residuals of solute concentration revealed predominantly linear relationships (Fig. S1 in the Supplement).
The second component reflected 18 % of the inter-point distances in the data.It exhibited clear positive correlation with O 2 concentration, pH and Eh, and weaker correlation with Na + , K + and DOC.It was inversely correlated with Ca 2+ , EC and SO 2− 4 (Figs. 3 and S2).In the groundwater samples, HCO − 3 and Fe 2+ had been determined as well.Both solutes were negatively correlated with this component (Fig. 4a).NO − 3 concentration in the deep groundwater samples was very low (with 27 % of the samples below detection limit) and did not show any clear correlation with the second component.Low component scores in the groundwater came along with high Ca 2+ and HCO − 3 concentration.The relationship of scores of component one and two in the groundwater is shown in Fig. 4b.Except for the two shallow wells close to the Peege stream (Gs_200, Gs_202; see Fig. 1b), scores of the first and second component are negatively related (Fig. 4b).
The third component represented 6 % of the inter-point distances in the data set.The residuals exhibited positive correlation for Na + , Mg 2+ , Cl − , pH and temperature.Negative q q q q q q q q q q q q q q q q q q q q q q q q q 0 1 2 3 4 5 6 −6 −4 −2 0 Fe 2+ q q q q q q q qq q q q q q q q q q q q q q q q q 6.8 7.2 7.6 −6 −4 −2 0 pH q q q q q q q q q q q q q q q q q q q q q q q q q −6 −4 −2 0 −6 −4 −2 0 Component 1 Component 2 q q q q q q qq q q q q q q q q q q q q q q q q q q Gs_199 Gs_200 Gs_202 Gd_198 Gd_201 Gd_203 Gd_204 Gd_205 correlations were found for NO − 3 , Ca 2+ , O 2 , Eh and DOC (Figs. 3 and S3).
Another 22 % of the inter-point distances in the data were assigned to the fourth component.Residuals of the component scores showed negative correlation for NH + 4 , PO 3− 4 , K + , temperature, and DOC and positive correlation for O 2 (Figs. 3 and S4).The range of component values was spanned mainly by single large values of NH + 4 , PO 3− 4 and K + that cannot be explained with the preceding three components (Fig. S4).This highlights the importance of particular events for the fourth component.

Multiple sites
Median values of scores of the first component clearly differed between streams (Fig. 5a).At the Strom sites, the median score values were considerably lower than those from the other stream water sampling sites.The median values of scores of the sites at the Quillow and Stierngraben showed intermediate values followed by the Ucker site, the Peege sites and finally the Dauergraben with the highest median score value.Groundwater samples in general exhibited consistently low scores for the first component, but without clear differences between deep and shallow groundwater samples.Mixing of water from different streams was visible at site Q_93 downstream of the confluence of the Quillow (Q_95) and of the Strom stream (S_118), as well as at site Q_100 downstream of the confluence of Q_104, Q_102 and P_107 (Figs. 1 and 5a).
Stream water samples exhibited the highest scores of the second component, whereas low scores were limited to deep groundwater samples, and shallow groundwater sam-ples were in an intermediate position (Fig. 5b).Median values of the stream water sites were approximately on the same level except for the sites Q_103, Q_106 and U_128, which exhibited noticeably higher median values than the other stream water sites, and the two Peege sites P_109 and P_108, which exhibited median values on the same level as the shallow groundwater sites Gs_199 and G_200.The scores in the deep groundwater clearly showed the largest absolute values, indicating the significance of deep groundwater for this component (Fig. 5b).
Scores of the third component in the deep groundwater were consistently higher than in shallow groundwater, while the stream water samples covered the whole range of values (Fig. 5c).The lowest scores of the third component were found at the Peege sites and in the shallow groundwater, and the highest scores were found at Ucker, Dauergraben and the deep groundwater.At the Quillow stream, scores tended to increase from the spring to the outlet.The effect of mixing of tributaries with different water qualities was visible along the course of the Peege and Quillow streams downstream of the respective confluences at the sites P_108, Q_95 and Q_93 (Figs. 1 and 5c).
The range of values of the fourth component was strongly biased towards negative values, caused by single events at some sites which exhibited very low values (Fig. 5d).

Long-term patterns
Time series of scores of the components were studied at sites with more than 50 temporal replicates.This applied for 13 stream water sites (Table S1).All dominant frequencies (for details, please see Appendix A) interpreted as seasonal pat- q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −5 0 5 10 Component 1

Groundwater
Stream water (a) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −4 0 4 Component 2 (b) q q q q q q q q q q q q q q q q q q q q q q q −4 0 4 Component 3

(c)
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −20 Sites with N < 13 are marked with "∼", those with N < 3 with "X".Labels: P, Peege; Q, Quillow; S, Strom; St, Stierngraben; U, Ucker; D, Dauergraben; Gs, shallow groundwater; Gd, deep groundwater.
terns had a period length in the range between 350 and 380 days.For de-seasonalization these seasonal patterns were subtracted from the time series prior to analysis for linear and non-linear trends.Most of the time series of the scores of the first component exhibited clear seasonal patterns with maximum scores during the winter season (Figs. 6 and 7).Between 30 % and 67 % of the variance was assigned to the seasonal pattern.At all sites we found significant negative monotonic trends (Fig. 6).The strongest decline was found at site D_112 and the weakest trend at site Q_97 (not shown).The linear trend comprised between 9 % and 48 % of the variance of the deseasonalized time series (Fig. 6).In contrast, the LOESS smooth depicted 14 % to 57 % of the variance (Fig. 6).It showed a decrease until December 2004 approximately and an increase thereafter (Fig. 8).The de-seasonalized time se-ries of groundwater heads showed a similar behaviour, with the minimum water level in June 2006 (Fig. 8).Timing of the minimum values of the scores of the first component varied between sites, spanning a range from 17 February 2004 to 17 March 2009 (Fig. 8).As an example, Fig. 7 gives the time series of scores of the first component at site Q_93, the seasonal pattern extracted from the series and the de-seasonalized time series with the non-linear trend estimated with the LOESS smoother.
Unlike for the first component, only 5 of the 13 considered time series of the second component exhibited a clear significant seasonal pattern, accounting for 17 % to 48 % of the variance (Fig. 6).The maxima of the seasonal patterns of the sites at Quillow and Ucker were in spring and at Stierngraben and Dauergraben in summer.In contrast, significant monotonic trends were found at most of the stream water sampling sites.All significant trends of the second component were positive.The linear trend comprised between 5 % and 16 % of the variance of the time series, while the LOESS smooth comprised between 4 % and 25 %.
Values of the third component showed a clear seasonal pattern with maxima in summer (Fig. 6).Between 30 % and 60 % of the variance was assigned to the seasonal signal.The only exception was site D_112 were the seasonal pattern was distorted by strong maxima in the winters of 2003, 2004 and 2007.Only at four sites were significant linear trends found.All of them were negative, comprising between 6 % and 13 % of the variance.The LOESS smooth depicted between 0 % and 21 % of the variance.
For the fourth component, significant seasonal patterns with maxima in summer were observed at 7 of the 13 analysed series, comprising between 17 % and 61 % of the variance (Fig. 6).Five sites showed a significant monotonic trend, comprising between 5 % and 10 % of the variance.A negative trend was observed at site St_133 only.Four sites showed a positive trend.The LOESS smooth depicted between 1 % and 16 % of the variance.

Multivariate components
Non-linear Isomap performed in this study only slightly better with respect to the representation of inter-point distances than PCA (Table 2), suggesting that mainly linear relationships were of importance for the overall dynamics in the data set.As there were only minor differences, we will present in the following the results of Isomap only.
For PCA and Isomap, the first component represents by definition the correlation structure that predominantly can be extracted from the set of variables as a whole.If all the loadings of the first component of a PCA have the same sign, it is a weighted average of all the analysed variables (Jolliffe, 2002;Jolliffe and Cadima, 2016).The stronger the analysed variables are linearly correlated, the more the first component approximates the arithmetic mean of all variables (for examples with hydrometric data see Lischeid, et al., 2010;Lehr et al., 2015).Furthermore, the first component serves as reference for all the subsequent components.
In this study each sample of the multivariate water quality data set is uniquely defined by a sampling site and a sampling date.Thus, the first component depicted (a) for each sampling site the pattern that was most prominent in the time series of the variables correlating with the first component and q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Period [days]: 368 q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q qq qq q q q q q qq q q q q q q q q q q q q q q q qq q q q qq q qq q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q 1998 2000 De-seasonalized comp 1 LOESS q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q qq qq q q q q q qq q q q q q q q q q q q q q q q qq q q q qq q qq q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q (a) (b) Groundwater level in m a.s.l.
q Gd_198 q Gd_201 q Gd_203 q Gd_204 q qq q qq qqq qqqq q qqq q qqq qq q qq qq qq q q qq q q q qq qqqqqqqqqq q q qqq q q qq q qq qqqq qq qqqq q qqq qq q qq q qq q qq qq qq q qq q qq qq qq qq qq qq qq q qq q qqq q q q qq qqq q qq qq qq qq qq qq qq qq q qq q q q q qq q qqq qqq qq q q q qq q qqq qq qqq q qq qq qq qq qq q qq q qq q qq q qq q q qq qq qq qqq qqq qqq qqqq qqq qqq qqq qqq qqqq qq qq qq q qq qqqq qqqqq qqqq qq qqq qq qqq qq qqq qq q qq qqq qqq qqq qqq qqq q qqq qqq qqqq qq qqq qqq qqqqqq qqqqqqqq qqq qqqqq qq qq qq qqqqq qq qqqq qq qqqq qqqq qqq qqq qq qqq qqq qq q qq qq q qq q qq qq qq qq qq q qq qq qq q qqq qq qqq qqq qq qqq qqq qqq qqq qqqq qqq qqq qqq qqqqq qqqq qq qqqqq q qq qqqqq qqqqq qqqqqqq qq qqqq qqqqq qqqqq qq qqq qqq qqq qqqqq qq q qq qqqqq q qq qqqq qq q qqqq qqq qqq qqq qqq qqq qqqq qqq qqq qqqq qqq qqq qqqqq qqqq qq qqqqq qq qqqqqq qqqq qqqqqq q qq qqqq qqq qqq qqq qqq qqq qqq qq q qq qq q qq q qq qq qq q qq q qq qqq q q q qqq qqq qqq qq qqq qqq qqq qqq qqq qqq qq qqq qqq qqqq qqqqq qq qqqqq qq qqq qqq qqqq qqqqq qq qq qqqq qqqq qqqq qqqq qqq qqq qq qq qq q qq q qq qqq qq qqq q qq qqq q q qqq qq qqq qqq qq qqq qqq qqqq qqq qqqq qqq qqq qqq qqqqq qqqq qq qqqqq qq qqqqq qqqqq qqqq qqq qq qqq q qqq qqq qq qqq q qqq qqq qq qqq qq qq q qq qqq qqq qqq qqq qqq q qqqq qqqq qqqq qqqq qqqq qqqq q qqqqqq qqq qqq qqqqqqqqq qq qqqqq qq qqqqqqqqqq qqqqqqq qq qqqq qqqqq qqqq qqq qq qq q qq qq q qq qq qq qq qq q qq qq qq q qq qq qq qq qq qqq qqq qqq qqq qqq qqq qqqq qqq qqq q qqqqq qq qq qqqqq qq qqqqqqqqqq qqqqqqq qq qqqq qqqqqqqqqqqq qqq qqq qq qq qq q qq q qq qq qq qq qq q qq qq qq q qq qq qq qq qq qq qq qq qq qq qq qq qq qqq qqq q qqq qq qqqqq qq qqqqqqqqqq qqqqqqq qq qqqq qqqqqq qqqqq q qqq qqq q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq qq q qq qq qqq qq qq qq qq qq qq qq qq qq qq q qq q qq q qq qq qqq qq qq qqqq q qq qq qq qq qqq q qqq qqq q qq qqq q qqq qqq qqq qqq qqq qqq q . Left y axis: LOESS smooth of time series of the first component at sites with N > 50 in grey.If a significant seasonal pattern was present, this was removed before smoothing.Right y axis: LOESS smooth of the de-seasonalized groundwater level at four sites in black.The black dots mark the minima of the LOESSsmoothed series.
The whole study region is characterized by relatively intense agriculture (Table 1).Thus, in addition to the natural background, we assume a general effect of the agricultural practice on the solute concentration level and the dynamics of the water quality series in the area.Enhanced concentration of NO − 3 , Cl − , SO 2− 4 and Ca 2+ is typical for groundwater and stream water in regions with intense agriculture compared to forested areas (Broers and van der Grift, 2004;Fitzpatrick et al., 2007;Lischeid and Kalettka, 2012).Nitrogen and potassium are the main ingredients of mineral fertilizers.Cl − and SO 2− 4 are the dominating anions in potassium fertilizers.SO 2− 4 is a major ingredient of phosphorus fertilizers and in some nitrogen fertilizers.Calcite is present in some nitrogen fertilizers or is applied separately via liming.DOC might be leached from slurry application or via tile drains after mechanical destruction of topsoil aggregates during tillage (Graeber et al., 2012).In addition, cations from the soil matrix might be leached by an enhanced anion concentration (mainly NO − 3 ) (Jessen et al., 2017).Overall the application of fertilizers and other agricultural practices like tillage tend to enhance the solute concentration of seepage water (Pierson-Wickmann et al., 2009).Thus, we interpreted the first component as the enhancement of the natural background level of solute concentration due to agricultural practices.
Compared to the first component, the relationships of the second component with Eh, pH and O 2 concentration were more clearly expressed (Figs. 3 and S2).The range of the scores of the second component was spanned by the lowest values in the deep groundwater and the highest values in the stream water (Fig. 5b), whereas shallow groundwater exhibited intermediate scores.This sequence corresponds to redox conditions expected in those water categories.Thus, we interpreted the second component as a redox-controlled component covering a sequence from reducing conditions in deep groundwater to post-oxic conditions in shallow groundwater and oxic conditions in stream water.O 2 and NO − 3 concentration in deep groundwater samples usually was below the detection limit, which is a common feature in this region (Merz et al., 2009).NO − 3 in seepage and groundwater can be denitrified by microorganisms which use the oxidation of sulfides to sulfate as electron donor for denitrification (Massmann et al., 2003;Jørgensen et al., 2009).We ascribed the high SO 2− 4 and Fe 2+ concentration to the oxidation of pyrite (Figs.4a  and S2).Pyrite and other sulfides are abundant in the Pleistocenic sediments of northern Germany (e.g.Weymann et al., 2010).Consequently, the pH decreases, calcite gets dissolved and the HCO − 3 concentration increases.Part of the released Ca 2+ replaces Na + and K + being sorbed to clay minerals.
We interpreted the clear separation in the third component between relatively low scores for the shallow aquifer and relatively high scores for the deep aquifer as a reflection of two opposing gradients (Fig. 5c).High concentration of NO − 3 , O 2 and DOC and relatively high Eh values being negatively related to the third component (Fig. 3) are indicative of groundwater close to the surface, whereas enhanced concentration of the positively related solutes Na + , Mg 2+ and Cl − is characteristic for local upwelling of saline groundwater from the underlying Tertiary aquifers at greater depth (Hannemann and Schirrmeister, 1998;Tesmer et al., 2007).The scores of the stream water samples, in turn, reflect the mixing ratio of groundwater from the two aquifers to the streamflow.We expect the baseflow maintained from the deep aquifer to be relatively enriched with geogenic solutes compared to the water that stems from the shallow aquifer or faster-responding flow components like tile drain discharge and surface runoff.Water from the shallow aquifer is expected to be relatively enriched with solutes originating from the surface compared to water that stems from the deep aquifer.
The range of values of the fourth component was dominated by single extremely low scores, reflecting samples with high concentration of NH + 4 , PO 3− 4 and K + (Fig. S4).The catchments of the analysed streams are only sparsely populated and mainly characterized by intensive agriculture (Table 1).In agricultural landscapes slurry is a typical source in which those nutrients occur in high concentration (Hooda et al., 2000).We are not aware of any other high-concentration sources of this combination of nutrients in the region.The small number of scores with very low scores implied that there were merely single events occurring at some of the sites only.This fits the finding that the timing of slurry application is crucial for the amount of nutrient loss to the streams (Hooda et al., 2000;Cherobim et al., 2017).Thus, we interpreted the negative peaks of the fourth component as sporadic events of slurry application, being either unintentionally directly applied to the stream during the spreading of the slurry or being leached via surface runoff and tile drain discharge after application.

Multiple sites
The interpretation of the first component as an agriculturally induced enhancement of the natural background level of most of the water quality variables is consistent with the spatial pattern of median component scores at the different sites.The highest scores were found in the Dauergraben stream and in the Peege stream (Fig. 5a).Both catchments are characterized by intense agriculture, a relatively dense network of tile drains and hardly any buffer strips along the streams leading to a rapid transmission of solute-enriched waters from the fields to the streams.In contrast, the Strom stream exhibited the lowest scores among all streams.Compared to the other streams, the valley of the Strom stream is clearly deep cut.Therefore, the Strom stream is expected to receive along its whole length continuous and substantial groundwater inflow from the deep aquifer.In addition, the valley slopes are covered with forest and are not in agricultural use, acting as a Hydrol.Earth Syst.Sci., 22, 4401-4424, 2018 www.hydrol-earth-syst-sci.net/22/4401/2018/ buffer strip for the agricultural impact.Furthermore, the fraction of arable land in the Strom catchment is smallest and the fraction of woodland is largest compared to the other catchments (Table 1).Main parts of the Strom catchment are situated within a nature conservation area, furthermore limiting the agricultural impact in its riparian zone.Deep groundwater, shallow groundwater and the stream water were well separated by the second component (Fig. 5b).Exceptions were the sites at the Peege stream, which are mainly supplied with water from tile drainage and the shallow aquifer and consequently yield median values similar to the shallow groundwater.The largest positive median values of the second component, being higher than those of the other stream water sites, were observed at sites with less than 13 samples (Q_103 and Q_106) and at the site U_128, which received at least partly waters from a different region than the other stream water sites (Figs. 1 and 5b).Thus, for the purpose of this study, we restricted our analysis on the spatial variability of the redox component to the categories of deep groundwater, shallow groundwater and stream water.
However, we took a closer look at the non-linear structure that became apparent for the deep groundwater samples in some of the residual plots of the second component (Fig. S2).In addition, we related the groundwater values of the second component to the first component and the HCO − 3 and Fe 2+ concentration (Fig. 4).The negative relationship between the second component and the first component in the deep groundwater suggests that the agricultural load represented by the first component acts as a driver for the sulfide oxidation represented by the second component.Among all deep groundwater wells, the deepest groundwater well Gd_198 exhibited the lowest scores of the first component (Fig. 5a) and the highest scores of the second component (Fig. 4b and Fig. 5b).This suggests that due to the relatively low agricultural load the oxidation of sulfides was the least pronounced among all deeper wells.Similar relationships between the extent of sulfate oxidation in the aquifer and agricultural NO − 3 input have been found in other studies (e.g.Zhang et al., 2009;Jessen et al., 2017, and references therein).
We expected the ratio of groundwater from the deep aquifer contributing to the streamflow to increase in general with increasing catchment size.The Peege stream is mainly fed by the shallow aquifer and yielded consequently median values of the third component similar to the shallow groundwater sites (Fig. 5c).In comparison the streams of Quillow, Strom and Stierngraben exhibited higher median values, indicating a larger proportion of groundwater from the deep aquifer contributing to runoff in comparison to the Peege stream.The sites U_128 and D_112 showed the highest median values of the third component among the stream water sites, being equal or even higher than those of the deep groundwater sites (Fig. 5c).Both sites have subsurface catchments that do not include the deep groundwater sampling sites in this study.Site D_112 is on the eastern side of the river Ucker, while all groundwater sampling sites are on the western side of it (Fig. 1a).In addition, its higher median value of the third component was partly due to several peaks during the winter time.Those coincide with high values of Cl − .These might indicate road salt application, but we did not investigate this further, as the winter peaks considered only this single site.Site U_128 is situated at the outlet of the lake Unteruckersee upstream of the confluence of the Quillow stream (Fig. 1a).There, we did not expect a contribution of the groundwater sampled in the Quillow catchment either.
All the stream water sampling sites with negative peaks of the fourth component are located near arable fields which are known to get fertilized by slurry (Fig. 5d).For example the two most affected sites Q_102 and Q_103 receive slurry input from a large pig farm close by (Gernot Verch, personal communication, 2018).Overall, only a few slurry input events accounted for 22 % of the representation of the interpoint distances of all the water quality samples of the water quality data set in the Isomap projection (Fig. 5d).However, the performance of the representation of the inter-point distances after adding the fourth component differed substantially between the different sites (Table S4).In the case of site S_121 the representation of inter-point distances with four components (R 2 = 0.68) was even slightly worse than with three components (R 2 = 0.66) (Table S4).This indicated an anomaly at this specific site compared to all other sites with respect to the fourth component, i.e. the solutes which mainly determine the fourth component.We traced this phenomenon back to one single sample from 25 May 2004 which comprised relatively high DOC values and at the same time relatively low values of K + , which is opposite the relationships indicative of the fourth component (Fig. 3).The deterioration of the representation of the inter-point distances after adding the fourth component at this site vanished in an Isomap analysis which was performed without this sample.We were not able to find an explanation for this exceptional sample.However, it underlined that by applying a dimension-reduction method every single sample is put into the perspective of the global features of the data set as depicted by the components.Overall, the fourth component underlines the necessity of developing and using methods in environmental data analysis, which enable us to consider non-linear processes as well as singular and site-specific events.

Long-term patterns
Dominant changes were observed for the first two components (Fig. 6).We interpreted the non-linear long-term trend of the first component at most stream water sites (Fig. 8) as the response of stream water quality to the interannual variability of depth to groundwater.An increase in the thickness of the unsaturated zone leads in general to longer residence time of seepage water, increasing retardation and buffering of topsoil seepage water, which reduces the solute concen-tration originating from the surface in the seepage water and consequently reduces the values of the first component.
Trends similarly shaped to the non-linear trend of the first component of stream water quality were observed for the water level in the deep groundwater.In general, the turning points of the deep groundwater head time series lag behind those of the scores of the first component of the stream water sites by approximately 1.5 years (Fig. 8).The earlier date of the turning point at groundwater gauge Gd_204 in October 2005 is most probably an artefact, caused by the effect of the large time gaps in 2006 and 2007 on the de-seasonalizing at this site and has to be considered with care.
We suggest that the time lag between stream water chemistry and water level in the deep aquifer is due to different response times to changes in the moisture conditions of the unsaturated zone.Compared to the relatively fast response of the stream water quality, the groundwater level in the deep aquifer reacts slower.In general, the overall trend of groundwater recharge reflects a relatively slow response to changes in the regional water balance.The velocity of seepage in the sediments of the upstream region of the Quillow catchment is estimated to be roughly 1 m per year.
The seasonal patterns, i.e. the annual variability, in the time series of the scores of the first component in the streams were ascribed to transient hydraulic decoupling of the mostly affected topsoils from the streams in summer.Usually there is hardly any seepage during the dry summer months at all.This leads often to drought in the uppermost stream reaches (Fig. 2a and Merz and Steidl, 2015).Thus, shallow groundwater and tile drain discharge, both sources with relatively high agricultural load, did not contribute to stream discharge during these periods and larger areas of the catchment got hydraulically decoupled from the stream network (Merz and Steidl, 2015).Similar effects of the seasonal variability of the hydrological connectivity of streams, groundwater and tile drainage in agricultural catchments on the concentration level of solutes originating from agriculture in the stream water have been reported, e.g. for NO − 3 in the Schaugraben study catchment in the north of Germany (Wriedt et al., 2007) and for NO − 3 and Cl − in the Kervidy-Naizin catchment in western France (Molenat et al., 2008;Aubert et al., 2013).
The other dominant change in stream water chemistry observed in this study was the continuous increase in the second component at most stream water sites (Fig. 6).All of the sampling sites with very low values of the second component were in the deep aquifer (Fig. 5b).The positive trends of the second component at most stream water sites were ascribed to changes in the chemistry of the baseflow originating from groundwater.Considering the interpretation of the second component, this translates into enhanced oxidation of geogenic sulfides in the deeper aquifer due to the continuous input of NO − 3 and DOC originating from agriculture and subsequent calcite dissolution.Geogenic sulfides, such as pyrite, serve as electron donors for denitrification.The consumption of the geogenic sulfides is irreversible and might lead to the depletion of the denitrification capacity in the deep aquifer in the long run (Merz et al., 2009;Zhang et al., 2009;Merz and Steidl, 2015).Consequently, buffering of NO − 3 surplus from agricultural land use is expected to decrease and NO − 3 concentration in the groundwater and the stream water is expected to increase.The hypothesized long-term development should be of concern for the water resources and environmental protection agencies with respect to future water quality and related international commitments, such as the Water Framework Directive (EU, 2000), the Groundwater Directive (EU, 2006) and the Nitrates Directive (EU, 1991) of the European Union.Substantial time lags have to be considered for the response of groundwater quality to measures that reduce leaching of NO − 3 (e.g.Pierson-Wickmann et al., 2009;Meals et al., 2010).In the Quillow catchment, we expect travel times in the order of magnitude of decades for the seepage water to reach the deep aquifer.
We did not observe dominant changes for the other two water quality components during the course of the observation period.The main temporal feature of the third component was a very distinct and steady seasonal pattern, as could be expected for the mixing ratio of groundwater from the deep aquifer.All stream water sites with n > 50, except for D_112, showed a distinct seasonal pattern with maximum scores in the summer, which is consistent with the assumption that the fraction of deep groundwater in the streams is highest during this period (Fig. 6).The seasonal pattern at site D_112 was disturbed by the winter peaks we ascribed to road salt application (Sect.5.2).
Because of its strong dependence from single events (Fig. 5d), the results of the estimation of the seasonal patterns and the trends of the fourth component have to be considered with care.The maxima of the seasonal pattern in summer at some sites were interpreted as reduced nutrient inputs to the stream due to nutrient uptake of plants and maximum buffering capacity of the unsaturated zone in summer.There were no indications for any effects of those events that we ascribed to the direct effect of slurry application on samples taken on the subsequent sampling dates at the affected sites.This is presumably due to the width of the sampling interval (Fig. 2).
In the case of dependence of a component from single events, change might be also related to clustering of those events during certain parts of the series, either for series at single sites or sets of series.Most of the extreme events of the fourth component appeared during the first half of the observation period (not shown).However, because of the small number of clearly outstanding events, we did not investigate this further (Fig. 5d).
In this study, the presented analysis of changes in water quality was limited by the temporal resolution of the data.Aspects such as long-term memory effects, as indicated by fractal scaling of solute series (Kirchner et al., 2000) and the observed scale-crossing non-self-averaging behaviour of solute series (Kirchner and Neal, 2013), were not considered.

Effects of the irregular sampling
There was an obvious spatial bias with a focus on the Quillow catchment itself, conditioned by the focus of the monitoring (Sect.2.2, Fig. 1).Stream sampling sites were only partly independent from each other, as the same streams had been sampled along different reaches.This needs to be considered in the interpretation of the components.In our exploratory approach, differences between subsequent stream reaches helped to identify the effects of tributaries or groundwater that recharged between the respective sampling sites.
In that way, the stream was used as a measurement device for biogeochemical processes and water-borne solute transport in different parts of the catchment and the interlinkages of groundwater and stream water.
It is important to note that our approach does not require the same number of samples per site (Fig. 2).The derived components constitute a frame in which all samples are integrated independent of the number of samples per site.Thus, in our application we get the information of how those sites with very few samples group or behave in relation to the others.Even a few samples might indicate for example that the respective site behaves similar to other sites with respect to some components and very different with respect to other components.The influence of single samples for the integration of the different sites into the global pattern of the water quality relationships summarized by the fourth component is an illustrative example for that (Sect.5.2).Thus, even occasional sampling at some sites helps in assessing the strength of effects of the respective drivers at these sites and might support or contradict hypotheses on spatial variability and related long-term patterns of those influences.This information would be lost if those samples would be excluded beforehand.
In addition, the approach followed here does not require identical temporal sampling resolution at all sites or synchronous sampling dates.Thus, a strictly regular sampling design, which is hardly feasible, is no prerequisite.Correspondingly, data from different monitoring programs could be used for a joint analysis.Sampling intervals at the sampling sites with N > 50 were not normally distributed and biased towards deviations that are longer than the median (Fig. 2b).Several series exhibited large time gaps.However, as sampling intervals did not change systematically throughout the monitoring period we assume that the effects on the results of the significance test with Mann-Kendall were negligible (Sect.3.2).In comparison, the trend estimations with the Theil-Sen estimator and LOESS are more robust, as they incorporate the exact sampling dates explicitly in the calculations.Thus, we do not expect major effects on the sign of the Theil-Sen estimator or the general shape of the LOESS smooth at the given temporal resolution.
In general, the interpretation of the components should consider the temporal structure of the data set.For example in this study the drying out of the streams at the Peege sites and the most upstream sites of the Quillow in summer was the most important systematic deviation from an otherwise roughly similar sampling across seasons (Fig. 2a).This information was included in the interpretation of the first component (Sect.5.3).If the monitoring would in general not have been performed roughly similarly across seasons, e.g. if one or more seasons would in general be missing, the estimation of the seasonality would not be applicable.If the monitoring would be such that there would be different seasons sampled in different years, this would have to be considered in the estimation of the trend.

Exploratory framework
The application of a dimension-reduction approach was motivated by the assumption that drivers influencing water quality usually affect more than one solute and that single solutes are affected by more than one driver.Like in preceding studies (e.g.Lischeid and Bittersohl, 2008;Lischeid et al., 2010), the representation of water quality data in low-dimensional space required only a few components to capture the main features of the data set.
Whether the relationships in the data set are mainly linear ones, as in this study, or whether there are considerably nonlinear relationships as well is usually not known in advance.Thus, if the aim is to consider and check for possible nonlinear relationships in the analysis we recommend using PCA as a linear benchmark for Isomap (demonstrated by Lischeid and Bittersohl, 2008).In a straightforward way this allows for (1) assessing whether the dominant correlation structures in the data set are mainly linear or non-linear and (2) identifying those components, samples, sites and periods deviating from the linear behaviour as captured by the PCA.
Based on the correlation of component scores and residuals, we formulated for each considered component a hypothesis on a dominant driver influencing water quality.Again, whether the relationships are linear, as they were for most of the global relationships in this study (Figs.S1-S4), is usually not known beforehand.Summarizing the relationships between residuals and components with Spearman rank correlation enables us to consider non-linear relationships between residuals and components as well, as long as they are monotonic.However, the main benefit in this study was that the Spearman rank correlation is less sensitive to extreme values compared to the Pearson correlation.This concerned especially the assessment of the relationships of the residuals of SO 2− 4 and Cl − with the second component and the residuals of PO 3− 4 and NH + 4 with the fourth component (Figs.S2  and S4), which were more strongly expressed with Pearson correlation due to a few single extreme values.The derived correlations differ from default loadings of PCA, which are defined as the coefficients of the linear combination of the analysed variables which is used to calculate the principal component scores.Those coefficients, scaled by the square root of the eigenvalue of the respective component, are equivalent to the Pearson correlation of PCA component scores and analysed variables.It is important to note that the differences in the evaluation of the correlations of components and the measured variables might lead to different interpretations of the components.
The treatment of censored values can substantially affect the derived components and the subsequent interpretation of the results and has to be considered carefully (Helsel, 2012, and references therein).For the application of Isomap, it is required to provide numerical values for the values below the detection limit.For simplicity, we here used half the detection limit as a maker for values below the detection limit.We checked for the effect of this substitution by comparing the Isomap results of the presented analysis with another Isomap analysis in which we excluded the two most affected variables, NO − 2 and PO 3− 4 (Fig. S4).The correlation of the Isomap scores of the interpreted components 1 to 4 of version 1 (with NO − 2 and PO 3− 4 ) versus version 2 (without NO − 2 and PO 3− 4 ) yielded R 2 values of 0.99 for the first component, 0.98 for the second component, 0.97 for the third component and 0.64 for the fourth component.The comparison of the two versions with respect to the Spearman rank correlations of Isomap scores of the first four components and the residuals (please see Fig. 3 for the respective values of version 1) yielded R 2 values of 0.98 for the first component, 0.99 for the second component, 0.99 for the third component and 0.88 for the fourth component.Thus, the first three components are virtually identical.The fourth component is affected, because PO 3− 4 is one of the important variables for this component (Fig. 3).Nevertheless, the similarity of the correlations of Isomap scores and the fourth component of both versions suggests that the characteristics of the fourth component were not merely introduced by the substitution of the values below the detection limit for PO 3− 4 .Thus, overall, the substitution did not substantially affect the interpretation of the considered components; however, we acknowledge that the replacement of censored data with some fraction of the reporting limit is not generally appropriate for dealing with censored data (Gilliom and Helsel, 1986;Singh and Nocerino, 2002;Helsel, 2005Helsel, , 2006Helsel, , 2012)).For data sets which are more heavily affected by censored values, other dimension-reduction methods such as the rank-based approaches suggested by Helsel (2012) should be preferred.
For data sets in which the number of measured variables differs between the sites there is a trade-off between the number of considered variables versus the number of considered sites.Depending on the focus of the study different selections of the data set can be used.For example if the main focus of the study is to analyse the multivariate water quality dynam-ics in detail it might be worthwhile to disregard some sites to be able to include more variables.If the focus is to maintain the spatial coverage of the monitoring, like in this study, more sites might be of more value than additional variables.Depending on the available resources a third option would be to perform two analyses, one focusing on more variables, one on more sites and comparing the results.If it is possible to link the considered components, like we did in the preceding paragraph, this proceeding can be used for spatial extrapolation of the hypotheses derived from the version which included more variables.However, in our case the sketched trade-off was not dramatic.Thus, we excluded only the variables with more than 5 % missing values (Sect.2.2) to keep the possible effect of any method of replacement rather low.
To prevent adding variables with little information gain it is recommendable to perform a correlation analysis beforehand and rule out highly correlated variables.For this purpose we recommend not to rely only on a numerical measure of correlation, but to visually examine the scatterplots of the respective variables to check for systematic deviations from the global relationship.There might be for example some sites or seasons in which the otherwise tight relationship gets weaker, pointing to local or temporal phenomena.
Technically it is possible to combine other data than solutes (e.g.sediment data and biological indicators) together with the solutes in one joined data set for the derivation of the components.However, the multivariate components derived by the dimension-reduction approach are the basis of the subsequent interpretation of the results.It has to be considered as well that all included variables are equally weighted due to the z scaling prior to the dimension reduction.Thus, including other types of data might in some cases complicate the interpretation.In general, we recommend not mixing variables with different scales of measures (e.g.nominal variables and ratio scaled variables) in the database for the derivation of the components.
Instead, data which were not used in the derivation of the components can be used as additional information for their interpretation.For example in this study we used in addition to the spatiotemporal features of the components other variables like groundwater level series, Fe + 2 and HCO − 3 concentration from the groundwater samples, the spatial distribution of land use and expert knowledge on the study area for the derivation of the hypotheses.A thorough testing of the hypotheses, for example through hydrochemical modelling or numerical experiments with virtual catchments, was out of the scope of this study.
However, an interpretation of the components as distinct drivers is no prerequisite for the further analysis of the components.In any case, the components constitute, and can simply be used as, a condensed representation of similar behaviour among the analysed variables according to the constraints of the used dimension-reduction method.
For PCA and Isomap each component describes subsequently the correlation structure that is most prominent in Hydrol.Earth Syst.Sci., 22, 4401-4424, 2018 www.hydrol-earth-syst-sci.net/22/4401/2018/ the remainder of what has not already been assigned to the higher-ranked components.This implies that each component has to be interpreted with respect to the higher-ranked components.Also, the consideration of the respective other components in the interpretation of a component can be helpful to carve out its characteristics as it was done here with the residuals of the multiple linear regression of the respective three other components and the measured variables (e.g.Fig. S1).Beyond that, it can be helpful to elucidate the interaction of the components as it was done here for example for scores of the first and second component (Fig. 4b).
The sites differed substantially with respect to the median values of the four analysed multivariate components (Fig. 5).However, these components comprised the largest fraction of the inter-point distances at any single site with more than 18 samples (Table S4).We conclude that our results identified major regional phenomena rather than site-specific peculiarities.This is consistent with the prior assumption that there are a few dominant drivers which determine the main stream water and groundwater quality dynamics in the region.This provides some confidence to hypothesize that these drivers presumably play a major role even in adjacent catchments that have not been sampled so far.
To detect and characterize the dominant changes in the multivariate water quality data we explored whether there were shifts in time in specific components, whether they were linear or non-linear in nature and whether trends occurred at many or only at single sites.For example for the scores of the first component, the Mann-Kendall approach identified monotonic trends at various stream water sampling sites (Fig. 6).However, the linear trend estimation failed to detect the non-linear trend that was observed at many series (Fig. 8).This reflects the well-known sensitivity of global linear trend estimation to low-frequency patterns that are not entirely covered by the observation period (Koutsoyiannis, 2006;Milliman et al., 2008;Lins and Cohn, 2011).
The LOESS smooths of the de-seasonalized series, on the other hand, did clearly reveal the similarity between the long-term behaviour of groundwater level in the deep aquifer and the series of the first component.In our exploratory approach, the LOESS smooth of the de-seasonalized series served as a descriptive tool for illustrating rather than for proving non-linear long-term patterns.No significance test was applied.The outcome of the LOESS smoother highly depends on the parameterization of the approach (i.e. the degree of smoothness) that would have to be justified prior to the testing of significance.

Conclusions
We suggested and tested an exploratory approach for the detection of dominant changes in multivariate water quality data sets with irregular sampling in space and time.The combination of the selected methods aimed to provide a broadly applicable exploratory framework for typical existing monitoring data sets, e.g. from environmental agencies, which are often characterized by relatively low sampling frequency and irregularities of the sampling in space and/or time.In the approach, we applied a dimension-reduction method to derive multivariate water quality components and analysed their spatiotemporal features with respect to changes that concerned more than single sites, short-term fluctuations or single events.
The components can be used irrespective of an interpretation as drivers influencing water quality.By definition, the components are a compact description of the common dynamics among the water quality variables.Thus, similar behaviour in space and time among the water quality variables as well as systematic changes in the multivariate water quality data can be addressed in a purely descriptive manner.This can be used for example to test the often implicit assumption of constant boundary conditions of scientific process and modelling studies.Furthermore, the components and their spatiotemporal features per se can serve as reference for further studies, e.g.detailed process studies with higher temporal resolution, and the assessment of future developments of water quality in an area.In this study, the components were used to develop hypotheses on dominant drivers influencing water quality and to analyse the temporal and spatial variability of those influences.
It is emphasized that the presented approach is readily applicable with data from common monitoring programs without specific requirements concerning sampling frequency or regular distribution of sampling sites, sampling dates and sampling intervals, except that there should be no systematic bias in the respective distribution.Even variables which have to be excluded from the derivation of the components, for example because of the amount of missing values or because they have been monitored only at subsets of the sampling sites, can be related to the components as additional information for their interpretation.For example in this study we used the concentration of Fe 2+ and HCO − 3 in the groundwater as additional information for the interpretation of the second component.Thus, the approach allows an efficient use of existing monitoring data as well as the consideration of often neglected irregular pieces of data from for example pilot studies or single sampling campaigns.Irregularities in the structure of a data set are not seen as a fundamental hindrance, but as an additional source of information.We see this as a major advantage for the analysis of comprehensive water quality monitoring programs, both from a scientific perspective and from a more applied point of view of for example water resources and environmental agencies.Therefore, we recommend the approach especially for the exploratory assessment of existing long-term low-frequency multivariate water quality monitoring data sets.

Figure 1 .
Figure 1.Map of the study area.Coordinates of WGS84 / UTM zone 33N are given in m.(a) Stream water monitoring sites and the location of the study area (upper Ucker river catchment) within Germany.(b) Section with the included groundwater monitoring sites.For better readability only the number of the ID of the monitoring sites is shown.

Figure 2 .
Figure 2. (a) Sampling dates at the sites for the whole monitoring period.(b) Box plots of the variability of sampling intervals during the monitoring period.For better readability, the maximum of the x axis is limited to 180 days.The median (red) and mean (blue) of sampling intervals are shown separately for the groundwater and stream water sites.Grey vertical lines mark the 1-day, 1-week and 1-month interval.(a, b) The dashed horizontal line separates groundwater sites (bottom) from stream water sites (top).Labels: P, Peege; Q, Quillow; S, Strom; St, Stierngraben; U, Ucker; D, Dauergraben; Gs, shallow groundwater; Gd, deep groundwater.The number of samples at each site is given in brackets.Names of the sites with more than 50 samples are printed bold.

Figure 3 .
Figure 3. Spearman rank correlation of a component and the residuals of the multiple linear regression of the measured variable and the remaining three other components.

Figure 4
Figure 4. (a) Selection of variables versus scores of component 2 for the groundwater samples.Concentration in mg L −1 .(b) Scores of component 1 versus component 2 at the groundwater sites.

Figure 6 .
Figure6.Fraction of variance of the time series of the Isomap component scores of sites with N > 50 assigned to the seasonal pattern (dark grey) and the trend estimated by the linear Theil-Sen estimator (mid-grey) as well as the non-linear LOESS smooth (light grey).Fraction of variance is derived as the R 2 of the scores of the respective component with the seasonal pattern or the estimated trend.Only significant seasonal patterns and linear trends are shown.The sign of the linear Theil-Sen estimator is given in the respective line.The number of samples at each site is given in brackets.
(b) between the sampling sites the difference in concentration level of those variables.High values of the first component indicate synchronous appearance of relatively high Eh and EC together with relatively high concentration of NO − 3 , Cl − ,

Figure 7 .
Figure 7. (a) Time series of scores of the first component at site Q_93 (N = 126) in black and the seasonal pattern estimated with Lomb-Scargle in grey.(b) The de-seasonalized series in black and the non-linear trend estimated with LOESS in grey.
C.Lehr et al.:Detecting dominant changes in multivariate water quality

Table 2 .
Cumulated R 2 of the reproduction of the inter-point distances of the data in the projection by the first ten components of the best Isomap run and linear PCA.