Decomposing a time series into independent trend, seasonal and random components

Abstract. Many time series observations in hydrology and climate show large seasonal variations, and it has long been common practice to separate the original data into trend, seasonal and random components. We were interested in using that decomposition approach as a basis for understanding variability in hydro-climatic time series. For that purpose, it is desirable that the trend, seasonal and random components are independent, so that the variance of the original time series equals the sum of the variances of the three components. We show that when the trend component is estimated in the traditional way, either as a linear trend or as a moving average, the resulting decomposition does not produce independent components. Instead, we introduce the two-way ANOVA model, rarely adopted in studies of hydro-climatic variability, and define the trend as equal to the annual anomaly. This traditional approach produces a decomposition with three independent components. We then use global land precipitation data to demonstrate a simple application showing how this decomposition method can be used as a basis for comparing hydro-climatic variability. We anticipate that the three-part decomposition based on the two-way ANOVA approach will prove useful for future applications that seek to understand the space-time dimensions of hydro-climatic variability.



Introduction
Many climatic and hydrologic time series contain large seasonal oscillations, and it has long been standard practice to consider such time series as being composed of three components: a long-term trend, a seasonal cycle (or seasonal oscillation) and a random component (Kendall et al., 1983, p. 429; von Storch and Zwiers, 1999). In practice, the trend component is usually removed first using an approach such as (linear) trend removal (e.g., Kedem and Fokianos, 2002), or sometimes a moving average (e.g., Adhikari and Agrawal, 2013). Other trend removal techniques are possible (e.g., higher-order polynomial, exponential, etc.) depending on the nature of the time series. Once the trend component has been removed, the mean seasonal cycle is calculated and the remaining part of the original time series is assigned to the random component. The details are well known.
Applications of the time series decomposition vary but are usually directed towards analysis and forecasting.
One possible application of the three-part decomposition described above, yet to be fully explored in the climatic and hydrologic sciences, is to provide a basis for understanding the variability of a time series. For example, assume we have a monthly precipitation time series that has been decomposed into the three components noted above. We can then ask how much of the overall variability is due to each of the three parts.
Given that the precipitation time series is the sum of three components, it follows that the total variance of the time series is the sum of the variances of the three components plus three additional terms, each equal to twice the covariance between one pair of components. If the three covariances were all zero, the partitioning of the total variation between the components would be greatly simplified, since the total variance would be just the sum of the variances of the three separate components. A time series decomposition with that property could provide an extremely useful basis for preparing a climatology of the variability, as opposed to a climatology of the mean. For example, by decomposing a precipitation time series into three independent components, we could use a ternary diagram to display, in a single diagram, how the variability is partitioned between those three components.
The aim of this study is to investigate whether it is possible to identify a time series decomposition approach that separates a time series into the long-term trend along with seasonal and random components, where the covariances between the three components are all zero. In other words, the decomposition is such that the three components are independent. We use monthly precipitation data for various case studies, but the underlying results are equally applicable to other variables (e.g., temperature, runoff, evapotranspiration, etc.). The paper begins with the standard three-part decomposition described above, using two widely used methods to estimate the long-term trend. The first subtracts a linear trend, while the second represents the trend as a moving average. We find that neither of these much-used approaches produces a time series decomposition with independent components. We then introduce a decomposition method based on the traditional two-way ANOVA model (e.g., Miller and Kahn, 1962; Sun et al., 2010) in which the covariances are all zero. While the traditional two-way ANOVA model has been widely used in the analysis of scientific experiments, it has received little attention for the analysis of hydro-climatic variability. To demonstrate the potential of the approach, we then apply it to global land precipitation data to produce maps of the variability.

Precipitation Data
We use monthly rainfall data from site observations collected by the Australian Bureau of Meteorology (http://www.bom.gov.au/). We selected three sites to show a variety of different precipitation time series (Fig. S1). The first is at Darwin Airport (12.42 °S, 130.89 °E, data period: 1941-2017), located in northern Australia.
The precipitation at Darwin Airport has a distinct wet-dry season combined with a long-term upward trend in precipitation. The results for Darwin Airport are reported in the main text. In the supporting material we show results at two further sites with very different rainfall characteristics. The second site, Donnybrook (33.57 °S, 115.82 °E, data period: 1906-2017), is located in a winter-dominant precipitation regime in southwest Australia and shows a long-term decline in precipitation. The final site, Cobar (Lerida) (31.70 °S, 145.70 °E, data period: 1883-1997), is located in the arid centre of New South Wales, with precipitation that is highly variable from year to year but with no distinct seasonality and no long-term trend.
In a later part of the paper, we use a gridded global precipitation dataset prepared by the Climatic Research Unit (CRU, TS4.01 database, monthly, 1901-2016, global 0.5° × 0.5°) (Harris et al., 2014) to give an example of how the two-way ANOVA model can be used to categorize and compare variability.

Statement of the Problem
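We use a monthly precipitation time series (P(t)) for q years and separate it into components that describe a long-term trend (P_a(t)), monthly means (P_m(t)) and a random residual component (P_r(t)), such that

$$P(t) = P_a(t) + P_m(t) + P_r(t) \qquad (1)$$

The temporal variance of P(t) is then

$$\sigma_P^2 = \sigma_{P_a}^2 + \sigma_{P_m}^2 + \sigma_{P_r}^2 + 2\,\mathrm{cov}(P_a, P_m) + 2\,\mathrm{cov}(P_a, P_r) + 2\,\mathrm{cov}(P_m, P_r) \qquad (2)$$

We test traditional time series decomposition methods and seek a method where the three covariances in Eq. (2) are all zero.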
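Eq. (2) is an algebraic identity that holds for any three series summing to P(t), whatever method produced them. The short Python sketch below checks it numerically on a toy series; the component shapes, the series length and the helper names `variance` and `covariance` are illustrative assumptions, and population (divide-by-n) moments are used throughout.

```python
import random


def variance(x):
    """Population variance (divisor n) of a sequence."""
    n = len(x)
    m = sum(x) / n
    return sum((v - m) ** 2 for v in x) / n


def covariance(x, y):
    """Population covariance (divisor n) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


# Hypothetical components for 10 years of monthly data (not real observations).
random.seed(0)
n = 120
pa = [0.1 * t for t in range(n)]                        # trend-like ramp
pm = [50.0 if t % 12 < 6 else 0.0 for t in range(n)]    # crude seasonal step
pr = [random.gauss(0.0, 10.0) for _ in range(n)]        # noise
p = [a + m + r for a, m, r in zip(pa, pm, pr)]          # Eq. (1)

lhs = variance(p)
rhs = (variance(pa) + variance(pm) + variance(pr)
       + 2 * covariance(pa, pm) + 2 * covariance(pa, pr)
       + 2 * covariance(pm, pr))                        # right-hand side of Eq. (2)
print(abs(lhs - rhs) < 1e-6)
```

Only when the three covariance terms vanish does the total variance reduce to the sum of the three component variances.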

Evaluating Two Widely-Used Time Series Decomposition Methods
In this section we use monthly time series for precipitation at Darwin to evaluate whether two widely-used methods produce decompositions where the individual components are independent (i.e., covariances are zero).
The original data for Darwin cover the period 1941-2017, but we report the decomposition for the shorter period 1942-2016 to account for the loss of data at either end due to the moving average procedure (section 4.2).

Time Series Decomposition Using Linear Trend Removal
In this approach the mean of the time series is first subtracted and a linear regression is fitted to the monthly anomalies. The resulting regression is then used to calculate the long-term trend component, which is subsequently removed. The monthly means are then calculated and the random component is set equal to the remainder. The results for Darwin are shown in Fig. 1 (see Figs. S2 and S3 for equivalent results at Donnybrook and Cobar). The resulting variance-covariance matrix is shown in Fig. 1e. The overall (temporal) variance of the original time series is 33716.12 (mm mon⁻¹)². The variances of the three components do sum to the total temporal variance: because the trend is estimated by least squares, the covariances sum to zero. However, the individual covariances are not all zero. In general, whenever the fitted slope is non-zero (i.e., the trend component is not constant), the covariances between the trend component and the other two components are also non-zero.
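A minimal Python sketch of the linear-trend decomposition described above, applied to a hypothetical toy series (seasonal cycle, weak trend and Gaussian noise; not the Darwin record). The function name `decompose_linear` is illustrative, and the sketch assumes the series covers whole years so that every calendar month appears equally often.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Population covariance (divisor n)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def decompose_linear(p, months=12):
    """Linear-trend decomposition of a monthly series covering whole years.

    P_a: least-squares line fitted to the anomalies P - mean(P);
    P_m: calendar-month means of P - P_a, repeated every year;
    P_r: remainder, so that P = P_a + P_m + P_r.
    """
    n = len(p)
    if n % months:
        raise ValueError("series must contain whole years")
    t = list(range(n))
    pbar, tbar = mean(p), mean(t)
    anom = [v - pbar for v in p]
    slope = (sum((ti - tbar) * a for ti, a in zip(t, anom))
             / sum((ti - tbar) ** 2 for ti in t))
    # OLS fitted values; the intercept is -slope * tbar because the anomalies have zero mean.
    pa = [slope * (ti - tbar) for ti in t]
    detrended = [v - a for v, a in zip(p, pa)]
    month_means = [mean(detrended[k::months]) for k in range(months)]
    pm = [month_means[i % months] for i in range(n)]
    pr = [d - m for d, m in zip(detrended, pm)]
    return pa, pm, pr


# Hypothetical toy series: 30 years with a seasonal cycle, a trend and noise.
random.seed(1)
p = [100 + 80 * math.cos(2 * math.pi * (i % 12) / 12) + 0.2 * i + random.gauss(0, 30)
     for i in range(30 * 12)]
pa, pm, pr = decompose_linear(p)
c_am, c_ar, c_mr = cov(pa, pm), cov(pa, pr), cov(pm, pr)
print(f"cov(Pa,Pm)={c_am:.3f}  cov(Pa,Pr)={c_ar:.3f}  cov(Pm,Pr)={c_mr:.2e}")
```

On such a series cov(P_a, P_m) and cov(P_a, P_r) are typically non-zero but cancel, while cov(P_m, P_r) is zero to rounding because the residual averages to zero within each calendar month.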

Time Series Decomposition Using Moving Average Trend Removal
In this approach the calculation is as before, except that a moving average is used to represent the long-term trend component. In general, one could use a moving average of any period (e.g., months, years or decades). We use a 24-month moving average, but the same general conclusions will hold for other periods. The results for Darwin are shown in Fig. 2 (see Figs. S4 and S5 for equivalent results at Donnybrook and Cobar). The resulting variance-covariance matrix is shown in Fig. 2e. Here, the covariances are substantial. For example, the covariance of the trend and monthly mean components (cov(P_a, P_m) = 864.00 (mm mon⁻¹)²) is actually larger than the variance of the trend component (σ²_{P_a} = 581.34 (mm mon⁻¹)²). The conclusion is that the moving average method is not suitable for the intended purpose.
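The moving-average variant can be sketched in the same way. The text does not say how the even 24-month window is centred, so this sketch assumes a centred 2 × 24 moving average (half weights on the two end months), which drops 12 months at each end, consistent with reporting 1942-2016 from 1941-2017 data. The function names and the toy series are hypothetical.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Population covariance (divisor n)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def centred_moving_average(x, window=24):
    """Centred 2 x `window` moving average for an even window (an assumed choice).

    The end weights are halved so the average is centred on each month;
    `window // 2` values are lost at each end of the series.
    """
    half = window // 2
    weights = [0.5 / window] + [1.0 / window] * (window - 1) + [0.5 / window]
    return [sum(w * x[i - half + j] for j, w in enumerate(weights))
            for i in range(half, len(x) - half)]


def decompose_moving_average(p, window=24, months=12):
    """P_a = moving average; P_m = calendar-month means of P - P_a; P_r = remainder.

    Also returns the trimmed series, since the moving average shortens it.
    """
    half = window // 2
    if half % months or len(p) % months:
        raise ValueError("sketch assumes whole years and a trim of whole years")
    pa = centred_moving_average(p, window)
    core = p[half:len(p) - half]  # starts in the same calendar month as p
    detrended = [v - a for v, a in zip(core, pa)]
    month_means = [mean(detrended[k::months]) for k in range(months)]
    pm = [month_means[i % months] for i in range(len(core))]
    pr = [d - m for d, m in zip(detrended, pm)]
    return core, pa, pm, pr


random.seed(2)
p = [100 + 80 * math.cos(2 * math.pi * (i % 12) / 12) + 0.2 * i + random.gauss(0, 30)
     for i in range(30 * 12)]  # hypothetical toy series
core, pa, pm, pr = decompose_moving_average(p)
for name, (x, y) in {"Pa,Pm": (pa, pm), "Pa,Pr": (pa, pr), "Pm,Pr": (pm, pr)}.items():
    print(f"cov({name}) = {cov(x, y):.3f}")
```

Unlike the least-squares line, nothing in the moving average forces cov(P_a, P_m) and cov(P_a, P_r) to cancel.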

Summary
The above evaluation of two widely used traditional methods shows that, while the covariances between the three components were generally small (but not always, e.g., the covariance between the moving average and monthly mean components in Fig. 2), they were not zero. In the next section, we present a three-part decomposition method with the desired property that the covariances between the three components are zero.

Introducing a Time Series Decomposition Method based on a Two-way ANOVA Model
On further investigation we realised that a traditional two-way analysis of variance (ANOVA) model (e.g., Miller and Kahn, 1962), which has been widely adopted in designing agricultural experiments (e.g., Clewer and Scarisbrick, 2001), would meet the criterion we set, i.e., that the three components be independent. Briefly, the temporal mean of the entire (monthly) time series is first subtracted and the anomaly for each year is calculated. The resulting variance-covariance matrix is shown in Fig. 3e. The covariances are all zero, which demonstrates that the overall temporal variance (Fig. 3a, σ²_P = 33716.12 (mm mon⁻¹)²) is the sum of the variances of the three independent components. (The same result holds at the Donnybrook and Cobar sites; see Figs. S6 and S7.) We further include a mathematical proof (see Appendix) that the covariances are zero in all cases using this approach. We conclude that a time series decomposition based on the traditional two-way ANOVA model has the desired properties.
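A minimal Python sketch of the two-way ANOVA decomposition as described in the text: the annual anomaly is spread evenly over each year to form P_a, the calendar-month means form P_m, and the remainder is P_r. The function name and toy series are hypothetical, and the sketch assumes a complete record of whole years, which the zero-covariance property relies on.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Population covariance (divisor n)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def decompose_two_way_anova(p, months=12):
    """Two-way ANOVA style decomposition of a monthly series covering whole years.

    P_a: annual anomaly (annual mean minus grand mean) spread evenly over the year;
    P_m: calendar-month means (grand mean plus centred monthly effect);
    P_r: remainder, so that P = P_a + P_m + P_r.
    """
    n = len(p)
    if n % months:
        raise ValueError("series must contain whole years")
    years = n // months
    grand = mean(p)
    annual = [mean(p[l * months:(l + 1) * months]) for l in range(years)]
    # Month means of P and of P - P_a coincide because the annual anomalies sum to zero.
    monthly = [mean(p[k::months]) for k in range(months)]
    pa = [annual[i // months] - grand for i in range(n)]
    pm = [monthly[i % months] for i in range(n)]
    pr = [v - a - m for v, a, m in zip(p, pa, pm)]
    return pa, pm, pr


random.seed(3)
p = [100 + 80 * math.cos(2 * math.pi * (i % 12) / 12) + 0.2 * i + random.gauss(0, 30)
     for i in range(30 * 12)]  # hypothetical toy series
pa, pm, pr = decompose_two_way_anova(p)
print(max(abs(cov(pa, pm)), abs(cov(pa, pr)), abs(cov(pm, pr))))  # rounding-level value
```

Because all three covariances vanish, the variances of P_a, P_m and P_r add up to the variance of P without any cross terms.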

Variability in Global Precipitation
We use a global land precipitation database to demonstrate an application of the traditional two-way ANOVA model decomposition described above. The data are from the CRU database (monthly, 1901-2016, 0.5° × 0.5°), for which we have calculated the overall temporal variance at each grid-box (Fig. 4a) as well as the percentages of the total variance due to the annual anomaly (Fig. 4b), monthly (Fig. 4c) and random (Fig. 4d) components.
(The variances for each component are shown in Fig. S8.) Inspection of Fig. 4a shows that the largest temporal variance of precipitation is generally near the equator. In tropical Africa and South America, that variation is dominated by the monthly component (Fig. 4c), highlighting a key point: in these regions the random component of (monthly) precipitation accounts for a relatively small fraction of the total variance. However, that result is not universal throughout the tropics. For example, several regions throughout South East Asia (e.g., Indonesia, Malaysia) show the opposite pattern, with a low fraction of the total variance due to the monthly (seasonal) component (Fig. 4c) and a correspondingly large fraction due to the random component. Presumably those parts of South East Asia would also be more drought-prone compared to tropical Africa and South America. Another key feature is that the fraction of the total variance explained by the annual (trend) component is small everywhere (Fig. 4b).
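The per-grid-box percentages can be computed directly from the annual and monthly means. The Python sketch below assumes each grid-box series has already been extracted as a list of whole-year monthly totals (reading the CRU TS files is not shown), and the two grid boxes are hypothetical toy series, not CRU values. Because the covariances are zero, the three percentages add up to 100.

```python
import math
import random


def variance_percentages(p, months=12):
    """Percentages of the temporal variance of a whole-year monthly series due to
    the annual-anomaly, monthly-mean and random components (two-way ANOVA split)."""
    n = len(p)
    if n % months:
        raise ValueError("series must contain whole years")
    years = n // months
    grand = sum(p) / n
    annual = [sum(p[l * months:(l + 1) * months]) / months for l in range(years)]
    monthly = [sum(p[k::months]) / years for k in range(months)]
    var_total = sum((v - grand) ** 2 for v in p) / n
    if var_total == 0:
        raise ValueError("constant series has no variance to partition")
    var_a = sum((u - grand) ** 2 for u in annual) / years    # P_a repeats each annual anomaly
    var_m = sum((u - grand) ** 2 for u in monthly) / months  # P_m repeats each month mean
    var_r = sum((p[l * months + k] - annual[l] - monthly[k] + grand) ** 2
                for l in range(years) for k in range(months)) / n  # P_r has zero mean
    return tuple(100 * v / var_total for v in (var_a, var_m, var_r))


# Two hypothetical grid boxes (toy data): one strongly seasonal, one mostly random.
random.seed(4)
grid = {
    "seasonal_box": [150 + 120 * math.cos(2 * math.pi * (i % 12) / 12) + random.gauss(0, 15)
                     for i in range(40 * 12)],
    "random_box": [100 + random.gauss(0, 60) for _ in range(40 * 12)],
}
percentages = {box: variance_percentages(series) for box, series in grid.items()}
for box, (fa, fm, fr) in percentages.items():
    print(f"{box}: annual {fa:.1f}%  monthly {fm:.1f}%  random {fr:.1f}%")
```

In a real application the same function would simply be applied to each land grid-box in turn.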
To further demonstrate the utility of the approach, we use a ternary diagram to show the fractional partitioning of the total variance to the three components (Fig. 5).Note that this is only possible because the three components are independent.In future work we plan a much more comprehensive assessment of hydro-climatic variability using this approach.
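Placing each grid-box on a ternary diagram only requires mapping its three fractions to planar coordinates. The Python sketch below shows one common mapping; the vertex layout (annual, monthly and random corners) and the function name `ternary_xy` are assumptions, and drawing the triangle itself is left to a plotting library.

```python
import math


def ternary_xy(f_annual, f_monthly, f_random):
    """Map a three-part variance partition to (x, y) in an equilateral triangle.

    Assumed layout: annual vertex at (0, 0), monthly vertex at (1, 0),
    random vertex at (0.5, sqrt(3)/2). Inputs may be fractions or percentages.
    """
    total = f_annual + f_monthly + f_random
    if total <= 0 or min(f_annual, f_monthly, f_random) < 0:
        raise ValueError("fractions must be non-negative with a positive sum")
    m, r = f_monthly / total, f_random / total
    return m + 0.5 * r, (math.sqrt(3) / 2) * r


print(ternary_xy(10.0, 60.0, 30.0))
```

Normalising by the sum means the inputs can be fractions or percentages; the mapping is meaningful only because the three variances sum to the total variance.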

Discussion and Conclusion
Decomposition of a time series into trend, seasonal and random components has long been used in many disciplines, including studies in hydrology and climate. The emphasis in those studies is often on analysis and forecasting. However, we were interested in investigating variability, and for that application the central attribute of the chosen decomposition method was whether the covariances between the three components would be zero.
If that were to hold then the total variance would be the sum of the variances of the three components, which would eliminate the potential complexity arising from the covariance components.
On investigation we found that the two most commonly used methods for removing the trend (linear and moving average) do not generally produce components that are independent (Figs. 1, 2). Interestingly, in the example precipitation time series used here, the moving average approach often produced a covariance between the trend (24-month moving average) and monthly components that exceeded the variance of the trend component (Figs. 2, S4). That approach is clearly not suitable for our intended application. The linear trend, by contrast, often produced small covariances, with the added feature that the covariance of the trend and monthly components (cov(P_a, P_m)) was of the same magnitude as, but opposite in sign to, the covariance of the trend and random components (cov(P_a, P_r)). This pattern is a design feature of the linear regression method. In particular, the linear regression produces a trend component (P_a) and a remainder (P_m + P_r) that are independent by design (i.e., cov(P_a, P_m + P_r) = 0). This leads directly to the above-noted cancellation (i.e., cov(P_a, P_m) + cov(P_a, P_r) = 0), but the individual covariances are generally not zero. In contrast, the classic two-way ANOVA model separates a time series into trend, monthly and residual components and was designed to preserve independence among those three components. However, that classic method has not, to our knowledge, generally been adopted to investigate variability in hydro-climatic time series. Our numerical results (Figs. 3, S6, S7) and mathematical proof (Appendix) that the three components are independent demonstrate the utility of this method in decomposing a time series for studies of variability. One important point is that the seasonal component (here defined as monthly) repeats over all years of the time series.
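The cancellation for the linear method can be written out in a few lines. This is a sketch that assumes, as described for that method, that P_a(t) is the least-squares fit to the anomalies P(t) − P̄, and it writes e(t), a symbol introduced here, for the regression residual; it uses only the standard least-squares property that residuals are uncorrelated with fitted values.

```latex
\begin{aligned}
P(t) - \overline{P} &= P_a(t) + e(t), \qquad \operatorname{cov}(P_a, e) = 0,\\
P_m(t) + P_r(t) &= P(t) - P_a(t) = \overline{P} + e(t),\\
\operatorname{cov}(P_a, P_m) + \operatorname{cov}(P_a, P_r)
  &= \operatorname{cov}(P_a, \overline{P} + e) = \operatorname{cov}(P_a, e) = 0.
\end{aligned}
```

Nothing in these steps forces either covariance to vanish on its own, which is why the individual terms are generally non-zero.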
Hence caution is needed in applying this approach when the amplitude of the seasonal component is known to change with time, as has been observed, for example, for the seasonal cycle of atmospheric CO2 (Zeng et al., 2014; Piao et al., 2017).
As an application, we applied the two-way ANOVA model to explore the variability in global precipitation. The temporal variance of precipitation is clearly separated into distinct regimes. In one regime, the total variance is dominated by the monthly means (seasonal component), while in the other it is dominated by the random (residual) component. This separation shows good agreement with previous studies based on different approaches that investigate the predictability of precipitation (Jiang et al., 2016, 2017). In particular, those regions with a high predictability of precipitation also have a high fraction of the total variance that is due to the seasonal component. We expect that a separation of the variance based on this approach will prove useful for many other applications, especially in studies seeking to understand hydro-climatic variability.
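We evenly distribute the annual mean anomaly in the l th year (see Eq. (A12)) to all p months in the same year to define P_a(t) as

$$P_a(t) = [\underbrace{u_a(1)-\overline{P},\ \ldots,\ u_a(1)-\overline{P}}_{\text{1st year}},\ \ldots,\ \underbrace{u_a(q)-\overline{P},\ \ldots,\ u_a(q)-\overline{P}}_{q\text{th year}}] \qquad (A13)$$

We obtain the monthly mean component P_m(t) by repeating u_m(k) (see Eq. (A4)) for all q years as follows,

$$P_m(t) = [\underbrace{u_m(1),\ \ldots,\ u_m(p)}_{\text{1st year}},\ \ldots,\ \underbrace{u_m(1),\ \ldots,\ u_m(p)}_{q\text{th year}}] \qquad (A14)$$

With P(t), P_a(t) and P_m(t) now all defined, P_r(t) is the residual component, P_r(t) = P(t) − P_a(t) − P_m(t), and substituting from Eqs. (A2), (A13) and (A14) we have,

$$P_r(t) = [\underbrace{P(1)-u_a(1)+\overline{P}-u_m(1),\ \ldots,\ P(p)-u_a(1)+\overline{P}-u_m(p)}_{\text{1st year}},\ \ldots,\ \underbrace{P((q-1)p+1)-u_a(q)+\overline{P}-u_m(1),\ \ldots,\ P(qp)-u_a(q)+\overline{P}-u_m(p)}_{q\text{th year}}] \qquad (A16)$$

Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-601. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 7 December 2018. © Author(s) 2018. CC BY 4.0 License.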
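As a sketch of why P_a(t) and P_m(t) are uncorrelated under Eqs. (A13) and (A14), assume (as the surrounding text implies) that the series covers q complete years of p months, that u_a(l) is the mean of the p values in year l and u_m(k) is the mean of month k over the q years, so that P̄ = (1/q) Σ u_a(l) = (1/p) Σ u_m(k). With population moments:

```latex
\begin{aligned}
\operatorname{mean}(P_a) &= \frac{1}{q}\sum_{l=1}^{q}\bigl(u_a(l) - \overline{P}\bigr) = 0,\\
\operatorname{cov}(P_a, P_m) &= \frac{1}{pq}\sum_{l=1}^{q}\sum_{k=1}^{p}\bigl(u_a(l) - \overline{P}\bigr)\,u_m(k)
  - \operatorname{mean}(P_a)\,\operatorname{mean}(P_m)\\
  &= \frac{1}{pq}\Bigl[\sum_{l=1}^{q}\bigl(u_a(l) - \overline{P}\bigr)\Bigr]\Bigl[\sum_{k=1}^{p}u_m(k)\Bigr] - 0 = 0.
\end{aligned}
```

A similar argument, using the fact that the entries of P_r(t) in Eq. (A16) sum to zero over the months of each year and over the years of each calendar month, gives cov(P_a, P_r) = cov(P_m, P_r) = 0.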
The long-term trend component in each month is calculated by evenly distributing the annual anomaly in each year to every month in the same year. Once the trend component is extracted from the original time series, the monthly means are calculated and the random component is set equal to the remainder. It should be noted that in the traditional two-way ANOVA model, the original time series is actually decomposed into four components, i.e., the long-term mean (constant), net (or centred) annual and monthly components (that have zero means) and the residual component. In this study, we combine the long-term mean and the centred monthly component of the two-way ANOVA model to produce the monthly means component. The results for Darwin are shown in Fig. 3. (See Figs. S6, S7 for equivalent results at Donnybrook and Cobar.)


Figure 4. Variability of global land precipitation based on the CRU database (1901-2016) using the two-way ANOVA model.

Figure 5. Ternary diagram showing decomposition of the temporal variance into the three independent components using the two-way ANOVA model.

Figure 1. Decomposition of monthly precipitation time series at Darwin (1942-2016) using linear trend removal. Panels show the (a) original observations (P), (b) linear trend (P_a), (c) monthly means (P_m), (d) residual random component (P_r) and (e) the variance-covariance matrix for the three components (P_a, P_m and P_r).

Figure 2. Decomposition of monthly precipitation time series at Darwin (1942-2016) using 24-month moving average trend removal. Panels show the (a) original observations (P), (b) 24-month moving average trend (P_a), (c) monthly means (P_m), (d) residual random component (P_r) and (e) the variance-covariance matrix for the three components (P_a, P_m and P_r).

Figure 3. Decomposition of monthly precipitation time series at Darwin (1942-2016) using the two-way ANOVA model. Panels show the (a) original observations (P), (b) annual anomaly (P_a), (c) monthly means (P_m), (d) residual random component (P_r) and (e) the variance-covariance matrix for the three components (P_a, P_m and P_r).
A.1.2 Mean of P_a(t), P_m(t) and P_r(t)