Higher statistical moments and an outlier detection technique as two alternative methods that capture long-term changes in continuous environmental data



Introduction
Variability is a fundamental feature that shapes ecological and evolutionary processes, yet most traditional statistics tend to be based on overly simplistic assumptions about variability. For example, many commonly used statistical approaches depend on the assumptions of normality and equality of variance, hardly the norm in nature. Application of such methods requires the use of statistical transformations (e.g., logarithmic or ranks) that force variability out of the data, arguably eliminating potentially interesting variability. This issue is particularly troublesome for understanding temporally continuous phenomena, where measures of central tendency (e.g., the mean or median) convey little information about the overall pattern of variability. Indeed, previous studies have illustrated the importance of understanding continuous phenomena based on their temporal patterns (Colwell, 1974) and the behavior of extremes (Gaines and Denny, 1993), rather than traditional descriptors of central tendency.
Here, we explore two approaches that quantify and visualize changes in empirical distributions of continuous environmental variables over time, focusing on natural phenomena that can be described as "regimes". Typically, regimes are quantified in terms of the magnitude, frequency, duration, and timing of events, which seems simple enough, but such has not been the case in practice. Consider, for example, the case of stream flow regimes, where well over 150 partially redundant statistical descriptors have been derived (Olden and Poff, 2003). To complicate matters further, such regimes can be partitioned into different temporal frames or grains (e.g., Steel and Lange, 2007; Arismendi et al., 2013a). If one wishes to efficiently provide an overall description of a regime or to envision changes in regimes over time without resorting to a plethora of descriptors, the solutions are not immediately obvious.
We use two approaches to address the general problem of describing regimes and their temporal changes, using thermal regimes of streams as an illustrative example. First, using frequency analysis, we examine patterns of variability and long-term shifts in stream temperature using higher statistical moments (skewness and kurtosis) of empirical distributions by season across decades. Second, we combine a non-metric multidimensional scaling (N-MDS) ordination technique and highest density region (HDR) plots to detect anomalous years. To illustrate the utility of these approaches, we apply them to contrast predictions and questions about long-term responses of thermal regimes of streams to changing terrestrial climates and other human-related water uses (Fig. 1).


Thermal regime of streams as an illustrative example
Temperature is a fundamental driver of ecosystem processes in freshwaters (Shelford, 1931; Fry, 1947; Magnuson et al., 1979; Vannote and Sweeney, 1980). Short-term (daily/weekly/monthly) descriptors of mean and maximum temperatures during summertime are frequently used for characterizations of thermal habitat availability and quality (McCullough et al., 2009), definitions of regulatory thresholds (Groom et al., 2011), and predictions about possible influences of climate change on streams (Mohseni et al., 2003; Mantua et al., 2010; Arismendi et al., 2013a, b). These simple descriptors can serve as useful first approximations, but do not capture the full range of thermal conditions that the aquatic biota experience at daily, seasonal, or annual intervals (see Poole and Berman, 2001; Webb et al., 2008). Both human impacts and climate change have been shown to affect thermal regimes of streams at a variety of temporal scales, across seasons or years (e.g., Steel and Lange, 2007; Arismendi et al., 2012, 2013a, b). For example, the recent warming climate may lead to different responses of streams that may not be captured using only average or maximum temperature values (Arismendi et al., 2012). Daily minimum stream temperatures in winter are showing more warming than daily maximum values during summer (Arismendi et al., 2013a; for air temperatures see Donat and Alexander, 2012). In human-modified streams, seasonal shifts in stream temperatures and earlier warmer temperatures have been recorded following removal of riparian vegetation (Johnson and Jones, 2000).
However, simple threshold descriptors cannot characterize these shifts. Using the first approach (higher statistical moments), we examine whether a recent warming climate has led to a shift in the shape of the stream temperature distribution or whether stream temperature has simply moved entirely to the right without any change in shape. In addition, we compare these potential shifts in the distribution of stream temperature between regulated and unregulated streams. Using the second approach (outlier detection technique), we address the question of whether anomalous years are repeatedly detected across stream types (regulated and unregulated) and examine whether those anomalous years represent a regional influence of the climate or alternatively highlight the importance of local factors. Previous studies have shown that detecting long-term changes of thermal regimes of streams is complex and that the use of only traditional statistical approaches may overlook a variety of responses of ecological relevance (Arismendi et al., 2013a, b).
Material and methods

Study sites and time series
We selected long-term gage stations (US Geological Survey and US Forest Service) that monitored year-round daily stream temperature in Oregon, California, and Idaho (n = 10; Table 1). The sites were selected based on (1) availability of continuous daily records for at least 31 years (1 January 1979 to 31 December 2009) and (2) complete information for time series of daily minimum (min), mean (mean), and maximum (max) stream temperature for at least 93 % of the period of record. Half of the sites (n = 5) were located in unregulated streams (sites 1-5) and the other half were in regulated streams (sites 6-10). Regulated streams were those with reservoirs constructed before 1978. Time series were carefully inspected and, for the outlier analysis only (see below), we interpolated missing data following Arismendi et al. (2013a). The percentage of daily missing records of each time series was less than 7 %. For analyses of higher statistical moments, we grouped daily stream temperature data at each site by season (winter: December-February; spring: March-May; summer: June-August; fall: September-November) and compared them among the three decades 1980-1989, 1990-1999, and 2000-2009. This grouping ensured enough observations to adequately represent the tails of the respective distributions at a seasonal scale.
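The seasonal and decadal grouping described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names and the `(date, temperature)` record layout are hypothetical, and it assumes each day is assigned to a decade by its calendar year (so a December record is not moved into the following winter) and that records outside 1980-2009 are left out of the decadal comparison.

```python
from datetime import date

# Season definitions as given in the text (month number -> season).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_of(day):
    """Return the season name for a calendar date."""
    return SEASON_BY_MONTH[day.month]


def decade_of(day):
    """Return a decade label such as '1990-1999', or None outside 1980-2009.

    Assumption: assignment is by calendar year of the record.
    """
    if 1980 <= day.year <= 2009:
        start = day.year - day.year % 10
        return f"{start}-{start + 9}"
    return None


def group_by_season_and_decade(records):
    """Group (date, temperature) pairs by (season, decade).

    Missing temperatures (None) are skipped rather than interpolated.
    """
    groups = {}
    for day, temperature in records:
        decade = decade_of(day)
        if temperature is None or decade is None:
            continue
        groups.setdefault((season_of(day), decade), []).append(temperature)
    return groups
```

Each resulting list holds the daily values for one season within one decade at one site, which is the unit on which skewness and kurtosis are later estimated.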

Higher statistical moments
To compare stream temperatures across sites, we standardized time series of daily temperature values using a Z-transformation as follows:

$$ST_i = \frac{T_i - \mu}{\sigma}$$

where $ST_i$ was the standardized temperature at day $i$, $T_i$ was the actual temperature value at day $i$ (°C), $\mu$ was the mean and $\sigma$ was the standard deviation of the respective time series considering the entire time period. Higher statistical moments of skewness and kurtosis are often considered problematic in parametric statistics, where data are often assumed to be normal. In reality, however, these moments can be useful to describe changes in environmental variables over long-term periods (see Shen et al., 2011; Donat and Alexander, 2012). Skewness addresses the question of whether or not a certain variable is symmetrically distributed around its mean value. With respect to temperature, positive skewness of the distribution (skewed right) indicates colder conditions are more common (Fig. 1c), whereas negative skewness (skewed left) represents increasing prevalence of warmer conditions (Fig. 1a). Therefore, increases in the skewness over time could occur with increases in warm conditions, decreases in cold conditions, or both.
Kurtosis describes the structure of the distribution between the center and the tails, representing the dispersion around its "shoulders". In other words, as the probability mass decreases around its shoulders it may increase in the center, the tails, or both, resulting in a rise in the peakedness, the tail weight, or both, and thus the dispersion of the distribution around its shoulders increases. The reference standard is the normal distribution, for which excess kurtosis (kurtosis minus three) equals zero (mesokurtic). A sharp peak in a distribution that is more extreme than a normal distribution (excess kurtosis exceeding zero) represented less dispersion in the observations over the tails (leptokurtic). Distributions with higher kurtosis tend to have "tails" that are more accentuated. Therefore, observations are spread more evenly throughout the tails. A distribution with tails more flattened than the normal distribution (excess kurtosis below zero) described higher frequencies spread across the tails (platykurtic). With respect to temperature, a leptokurtic distribution may indicate that average conditions are much more frequent and that there is a lower proportion of both extreme cold and warm values (Fig. 1c). A platykurtic distribution represents a more even spread across all values, with a higher proportion of both extreme cold and warm values (Fig. 1a). Therefore, increases in the kurtosis over time would occur with decreases in extreme conditions, increases of average conditions, or both. Skewness and excess kurtosis are dimensionless and were estimated as follows:

$$\mathrm{Skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(T_i - \mu)^3}{\sigma^3}$$

$$\mathrm{Excess\ kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(T_i - \mu)^4}{\sigma^4} - 3$$

where $n$ represented the number of records of the time series, $T_i$ was the temperature of day $i$, and $\mu$ and $\sigma$ were the mean and standard deviation of the time series.
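As a rough illustration of how skewness and excess kurtosis are obtained from a set of daily temperatures, the sketch below uses the plain moment estimators. It is a minimal example, not the authors' R code: the function name is hypothetical, and the use of the population standard deviation (dividing by n) with no small-sample bias correction is an assumption.

```python
import math


def skewness_and_excess_kurtosis(temps):
    """Return (skewness, excess kurtosis) of a sequence of temperatures.

    Assumption: moment estimators with the population standard deviation
    (divide by n); no bias correction is applied.
    """
    n = len(temps)
    mu = sum(temps) / n
    sigma = math.sqrt(sum((t - mu) ** 2 for t in temps) / n)
    if sigma == 0.0:
        raise ValueError("moments are undefined for a constant series")
    skewness = sum((t - mu) ** 3 for t in temps) / (n * sigma ** 3)
    # Subtracting 3 sets the normal distribution as the zero reference.
    excess_kurtosis = sum((t - mu) ** 4 for t in temps) / (n * sigma ** 4) - 3.0
    return skewness, excess_kurtosis
```

For instance, a season dominated by cold days with a single warm day, such as `[1, 1, 1, 1, 10]`, yields positive skewness, in line with the interpretation that positive skewness reflects more common colder conditions.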
To define the status of the skewness for the stream temperature distribution in a particular season and decade, we used two criteria. First, we classified the amount of skewness in three categories following Bulmer (1979): "highly skewed" (if skewness was < −1 or > 1), "moderately skewed" (if skewness was between −1 and −0.5 or between 0.5 and 1), and "symmetric" (if skewness was between −0.5 and 0.5). Second, we statistically tested whether the skewness coefficient was different from zero following Cramer (1998):

$$SES = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}, \qquad Z_{Skewness} = \frac{\mathrm{Skewness}}{SES}$$

where $SES$ was the standard error of skewness, $Z_{Skewness}$ the test statistic, and $n$ the number of records of the time series. The critical value of $Z_{Skewness}$ was approximately 2 (two-tailed test of skewness = 0 at a significance level of 0.05). If $Z_{Skewness}$ was < −2, the temperature distribution was likely negatively skewed ("negative skewed"); if $Z_{Skewness}$ was > 2, the temperature distribution was likely positively skewed ("positive skewed").
However, if $Z_{Skewness}$ was between −2 and 2, we could not reject the null hypothesis that the skewness was zero ("non-significant"). We used similar procedures to test whether the excess kurtosis was different from zero following Cramer (1998):

$$SEK = 2\,SES\,\sqrt{\frac{n^2-1}{(n-3)(n+5)}}, \qquad Z_{Kurtosis} = \frac{\mathrm{Excess\ kurtosis}}{SEK}$$

where $SES$ was the standard error of skewness, $SEK$ the standard error of excess kurtosis, $Z_{Kurtosis}$ the test statistic, and $n$ represented the number of records of the time series. If $Z_{Kurtosis}$ was < −2, the temperature distribution likely had "negative excess kurtosis or platykurtic"; if $Z_{Kurtosis}$ was > 2, the temperature distribution likely had "positive excess kurtosis or leptokurtic". Finally, if $Z_{Kurtosis}$ was between −2 and 2, we could not reject the null hypothesis ("non-significant"). We computed higher statistical moments using R ver. 2.15.1 (R Development Core Team, 2012).
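The two classification criteria can be sketched in a few lines. The example below is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the standard-error expressions are the commonly cited large-sample formulas for skewness and kurtosis, and the handling of values falling exactly on a category boundary (not specified in the text) is an assumption.

```python
import math


def standard_error_skewness(n):
    """Standard error of skewness (SES) for n records."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def standard_error_kurtosis(n):
    """Standard error of excess kurtosis (SEK) for n records."""
    return 2.0 * standard_error_skewness(n) * math.sqrt(
        (n * n - 1.0) / ((n - 3) * (n + 5))
    )


def skewness_amount(skewness):
    """Three-category amount of skewness using the thresholds in the text.

    Assumption: values exactly at 0.5 or 1 fall in 'moderately skewed'.
    """
    magnitude = abs(skewness)
    if magnitude > 1.0:
        return "highly skewed"
    if magnitude >= 0.5:
        return "moderately skewed"
    return "symmetric"


def skewness_state(skewness, n, critical=2.0):
    """Classify skewness by its Z statistic against a critical value of about 2."""
    z = skewness / standard_error_skewness(n)
    if z < -critical:
        return "negative skewed"
    if z > critical:
        return "positive skewed"
    return "non-significant"


def kurtosis_state(excess_kurtosis, n, critical=2.0):
    """Classify excess kurtosis by its Z statistic against the same critical value."""
    z = excess_kurtosis / standard_error_kurtosis(n)
    if z < -critical:
        return "platykurtic"
    if z > critical:
        return "leptokurtic"
    return "non-significant"
```

With roughly 900 daily records per season and decade, the standard errors become small, so even modest departures from zero can be flagged as significant; this is one reason to read the test alongside the amount-of-skewness categories.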

Outlier detection procedure
We explored and examined features expressed by thermal regimes of streams, which may not be captured using only traditional approaches of summary statistics (for streamflow see Chebana et al., 2012). We considered an entire year as one finite-dimensional observation (365 days of daily minimum stream temperature). Using a non-metric multidimensional scaling (N-MDS) unconstrained ordination technique, we compared the similarity among years of the Euclidean distance of standardized temperatures for each day within a year across all years. The N-MDS analysis places each year in a multivariate space in the most parsimonious arrangement (relative to each other) with no a priori hypotheses. Based on an iterative optimization procedure (999 random starts), we minimized a measure of disagreement, or stress, between the distances among years in 2-D and their original dissimilarities (for a detailed explanation see Kruskal, 1964). The resulting coordinates 1 and 2 from the 2-D plot provided a collective index of how unique a given year was (Fig. 1b and d). In N-MDS the order of the axes was arbitrary and the coordinates represented no meaningful absolute scales for the axes. Fundamental to this method are the relative distances between points: closer points indicate a higher degree of similarity, whereas more dissimilar points are positioned farther apart. We performed the N-MDS analyses using the software Primer ver. 6.1.15 (Clarke, 1993; Clarke and Gorley, 2006).
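The input to the ordination is a year-by-year dissimilarity matrix. The sketch below shows one way such a matrix could be built from daily values; it does not perform the N-MDS itself, which was run in Primer. The function names are hypothetical, and the sketch assumes each year is supplied as an equal-length list of daily minimum temperatures (for example 365 values, with leap days already dropped) and that standardization uses the mean and standard deviation of the entire series.

```python
import math


def standardize_years(years):
    """Z-standardize daily values using the mean and SD of the whole series.

    `years` is a list of equal-length lists, one list of daily values per year.
    Assumption: population standard deviation over all years pooled.
    """
    values = [v for year in years for v in year]
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [[(v - mu) / sigma for v in year] for year in years]


def euclidean_distance_matrix(years):
    """Return the symmetric matrix of Euclidean distances between years."""
    k = len(years)
    distances = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(years[i], years[j])))
            distances[i][j] = d
            distances[j][i] = d
    return distances
```

A matrix like this could then be passed to any non-metric MDS routine that accepts precomputed dissimilarities, yielding two coordinates per year.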
Using the two coordinates of each point (year) from the 2-D plot produced by the N-MDS ordination procedure, we created a bivariate highest density region (HDR) box-plot (Hyndman, 1996). The HDR plot has typically been produced using the first two principal component scores from a traditional principal component analysis (PCA) (Hyndman, 1996; Chebana et al., 2012). However, in this study, we modified this procedure, taking advantage of the higher flexibility and lack of assumptions of the N-MDS analysis (Everitt, 1978; Kenkel and Orloci, 1986) to provide the two coordinates needed to create the HDR plot. In the HDR box-plot, regions are defined based on a probability coverage (e.g., 50, 90, or 95 %) such that all points (years) within the probability coverage region have higher density estimates than any of the points outside the region (Fig. 1b and d). Points lying outside the probability coverage region represent anomalous years (in Fig. 1b and d see outlier years). We created the HDR plots using the package "hdrcde" (Hyndman et al., 2012) in R ver.
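The idea of flagging years that fall outside a highest density region can be illustrated with a small stand-in. This sketch is not the "hdrcde" implementation: it assumes a Gaussian product kernel with a Scott-type bandwidth, and it takes the density cutoff as a sample quantile of the densities evaluated at the observed points. The function names and these choices are illustrative assumptions.

```python
import math


def _sample_sd(values):
    """Sample standard deviation (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def kernel_density(points, at, hx, hy):
    """Bivariate Gaussian product-kernel density estimate at location `at`."""
    total = 0.0
    for x, y in points:
        ux = (at[0] - x) / hx
        uy = (at[1] - y) / hy
        total += math.exp(-0.5 * (ux * ux + uy * uy))
    return total / (len(points) * 2.0 * math.pi * hx * hy)


def hdr_outliers(points, coverage=0.95):
    """Return indices of points outside the estimated HDR of given coverage.

    Assumptions: Scott-type bandwidth (sd * n**(-1/6) per axis for two
    dimensions) and a cutoff equal to the (1 - coverage) sample quantile of
    the densities at the observed points.
    """
    n = len(points)
    factor = n ** (-1.0 / 6.0)
    hx = _sample_sd([p[0] for p in points]) * factor
    hy = _sample_sd([p[1] for p in points]) * factor
    densities = [kernel_density(points, p, hx, hy) for p in points]
    ranked = sorted(densities)
    # Small tolerance guards against floating-point error in (1 - coverage) * n.
    cutoff = ranked[int((1.0 - coverage) * n + 1e-9)]
    return [i for i, d in enumerate(densities) if d < cutoff]
```

Because the cutoff is a quantile of the observed densities, roughly a fixed share of years is flagged at any coverage level; what carries information is which years are flagged and whether the same years recur across sites.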

Results
Stream temperature empirical distributions were distinctive among seasons, but seasons were relatively similar across sites (Fig. 2). Winter had the narrowest range and highest frequency of observations at colder standardized temperature categories (−1.3, −0.7). The second highest proportion of observations in the year also consisted of colder values, occurring during spring in unregulated streams and during summer at four of the five regulated sites. This shift of frequency was likely due to release of warmer water from the reservoir management upstream. Temperature distributions during winter had high overlap with those during spring. Fall distributions showed the broadest range, with similar proportions across a number of temperature values.
Changes in the shape of empirical distributions among seasons over decades were not immediately evident, but the state of skewness and kurtosis captured these changes in cases when lower statistical moments (average and standard deviation) did not show differences (e.g., site 1 during fall and summer in Fig. 3; Tables 2 and 3; Supplement). The utility of combining skewness and kurtosis to detect changes in distributional shapes over time is illustrated by unregulated sites 1 and 2 during fall (Tables 2 and 3; Supplement). At these sites, there were shifts between the last two decades from negatively skewed to more symmetrical distributions and from mesokurtic to platykurtic states, suggesting that a proportion of the probability mass moved from the center into the tails due to recent less extreme cold and more warm conditions. Overall, in most unregulated sites, kurtosis changed its state with recent increases during winter, summer, and spring (Table 3; Supplement). Winter and summer mostly had negatively skewed distributions, whereas spring generally had positively skewed distributions or those with little change across decades, except for site 3 (Table 2; Supplement). Decadal changes in both skewness and kurtosis during winter and summer observed in unregulated sites suggest the probability mass moved from the shoulders into warmer values at the center, but maintained the tail weight of the extreme colder conditions (Fig. 3; Tables 2 and 3; Supplement). In spring, however, the probability mass diminished around the shoulders, apparently due to decreases in the importance of extreme colder conditions. Collectively, these findings illustrate how higher statistical moments may describe the complexity of temporal changes in stream temperature among seasons and highlight how shifts may occur at different portions of the distribution (e.g., extreme cold, average, or warm conditions).
In regulated sites, we observed shifts toward colder temperatures (e.g., sites 6 and 9 during summer and fall in Fig. 3; Supplement), suggesting that local influences of water regulation mask climate-related impacts. This illustrated mixed effects on skewness and kurtosis due to climate and water regulation, especially during spring, winter, and summer (Tables 2 and 3; Fig. 3; Supplement). In particular, in spring, patterns of skewness were similar to those in unregulated sites, whereas patterns of kurtosis were in opposite directions (more platykurtic in regulated sites). This can be explained by the water discharged from reservoirs in spring, which was a mix of the cool inflows to the reservoir and the cold water stored in the reservoir itself from the winter, even as the surface of the reservoir warmed because of increasing solar radiation. Patterns of skewness and kurtosis seen in regulated sites also highlight the influences of site-dependent water management coupled with climatic influences. This is illustrated by the skewness of sites 7 and 8 compared to sites 9 and 10 in winter (Table 2) and the high variability of the state of skewness among sites in summer.
Outliers representing years (Fig. 4; Supplement) were identified as points outside 95 % confidence intervals (CI). During the study period, year 1992 was identified as an outlier at five sites (or eight sites at 90 % CI) and years 1987 and 2008 were outliers at four sites. Most unregulated sites had between two and four outlier years (or between four and five at 90 % CI), whereas regulated sites had two or fewer (or between three and four at 90 % CI, except site 10, which had seven), a result consistent with the notion that reservoirs buffer extremes and homogenize temperatures among years.
The confidence region for unregulated sites appeared to be more irregularly shaped than that for regulated sites, which suggests that stream regulation may tightly cluster and homogenize years, buffering the influence of extreme regional climatic conditions.
The outlier-detection method captured years with anomalies in either magnitude or timing of events (Fig. 4; Supplement). For example, years 1992 and 1987 were outliers, likely due to the magnitude of warming throughout the year. At other sites, such as unregulated sites 3, 4 and 5, the outlier years were most likely due to increased temperatures in seasons other than summertime, and not related to higher summertime temperatures.

Discussion
Here we show the utility of using higher statistical moments and outlier detection as alternative approaches to capture long-term changes in empirical distributions of environmental regimes and to assess whether these changes are consistent across site types. Stream ecosystems are exposed to multiple climatic and non-climatic forces which may differentially affect their hydrological regimes (e.g., temperature and streamflow). In particular, we show that potential timing and magnitude of responses of stream temperature to both the recent warming climate and other human-related impacts vary among seasons, years, and across sites. Central tendency statistics may or may not capture these alterations of thermal regimes, which could be relevant to infer their ecological and management implications. For example, a regime may change through increases in both extreme cold and warm conditions while average values are maintained. Increased understanding of the shape of empirical distributions by season or year will also help researchers and resource managers evaluate potential impacts of shifting environmental regimes on organisms and processes across a range of disturbance types. Empirical distributions are a simple but comprehensive way to examine high frequency measurements that include the full range of values. Higher statistical moments provide useful information to characterize and compare regimes and can show which season could be most responsive to disturbances. This could help improve predictive models of climate change impacts by incorporating full regimes into scenarios.
The outlier detection technique used here takes advantage of all available information and represents a more complete and realistic view of environmental regimes. When single metrics are used to describe environmental regimes, they have to be selected and thus information must be compressed. Often selection means simplification, resulting in the compression or loss of information (e.g., Arismendi et al., 2013a). By examining the whole empirical distribution of temperature, we can provide a better characterization of shifts over time or following other disturbances than simple thresholds or descriptors. In particular, our findings suggest a differential resilience of unregulated streams to the recent warming climate that is likely related to local conditions of watersheds (e.g., Arismendi et al., 2012). Regulated and unregulated sites located in the same watershed (sites 2, 7, and 8 in Table 1 and Supplement) may share similar outlier years (e.g., year 2008 as a cold-water outlier), suggesting strong climatic influences during those years. However, sites located close to one another (unregulated sites 3 and 4 in Table 1) may not necessarily share the same outlier years (they share only year 1987), likely because local drivers may be more important than regional climate forces. By using the outlier detection technique, we illustrate its utility to describe regional vs. local influences of climate on streams. For example, the technique can reveal a differential vulnerability of streams to regional or local climate changes by characterizing years with extreme conditions or those when seasonal shifts occurred.
In conclusion, our two approaches complement traditional summary statistics by helping to explain the long-term behavior of continuous environmental variables across seasons and years. We illustrate this using temperature of streams in unregulated and regulated sites as an example. In particular, we show water regulation may mask climate-related influences. Using cold-water mountain streams from similar regions, we characterize responses and changes in thermal regimes that are useful in representing the influences of both local impacts and regional climatic forcing. Although we did not include a broad range of stream types, our analysis of stream temperatures within the set of streams considered herein was sufficient to illustrate the utility of the two approaches. The two approaches are transferable to other types of continuous environmental variable measurements and regions to examine seasonal and annual responses and climate or human-related influences (e.g., for streamflow see Chebana et al., 2012; for air temperature see Shen et al., 2011). These analyses will be useful to characterize how regimes of continuous phenomena have changed in the past, how they may respond in the future, or to identify the type and timing of their resilience.

Supplementary material related to

Fig. 1. Conceptual diagram showing hypothesized long-term responses of water temperature at both seasonal (upper panel) and annual (lower panel) scales in unregulated (left panel) and regulated (right panel) streams. In the upper panel we show examples of changes in skewness and kurtosis for temperature distributions affected by a warming climate and stream regulation in a given season. For instance, less cold temperatures and an overall shift toward warmer values may occur in unregulated streams (a), whereas in regulated streams (c) the influence of the reservoir may reduce both extreme cold and warm temperatures, confounding the effect of the climate. In the lower panel we illustrate the use of N-MDS and HDR plots for detecting outlier years in regulated and unregulated sites (the shaded area represents a given coverage probability). Outliers 1 and 2 represent anomalous years since they are located outside the confidence region. For instance, in unregulated streams (b) individual years are less clustered due to more heterogeneous responses to the warming climate, whereas in regulated streams (d) they are more clustered because the reservoir may homogenize temperatures across years.

Table 1. Location and characteristics of unregulated (n = 5) and regulated (n = 5) streams at the gaging sites. Percent of gaps in the stream temperature time series from January 1979 to December 2009 used in this study.

Table 3. State of kurtosis of probability distributions of daily minimum stream temperature by season and decade at unregulated and regulated sites. Platykurtic distributions are indicated by "↔" and leptokurtic distributions are indicated by " ". Mesokurtic distributions and non-significant kurtosis of distributions are not shown (see Supplement for more details).