Simulations of regional or global climate models are often used for climate change impact assessment. To eliminate systematic errors, which are inherent to all climate model simulations, a number of post-processing (statistical downscaling) methods have been proposed recently. In addition to basic statistical properties of simulated variables, some of these methods also consider a dependence structure between or within variables. In the present paper we assess the changes in cross- and auto-correlation structures of daily precipitation in six regional climate model simulations. In addition, the effect of outliers is explored, with a distinction made between ordinary outliers (i.e. values exceptionally small or large) and dependence outliers (values deviating from dependence structures). It is demonstrated that correlation estimates can be strongly influenced by a few outliers even in large datasets. Consequently, any statistical downscaling method relying on sample correlation can provide misleading results. An exploratory procedure is proposed to detect the dependence outliers in multivariate data and to quantify their impact on correlation structures.
The investigation of climate change impact on the hydrological cycle is one of the crucial topics in the field of water resources management and planning (Mehrotra and Sharma, 2015). Simulations of regional and global climate models (RCMs and GCMs) represent a fundamental data source for climate change impact studies. It is well known that raw climate model outputs cannot be used directly in impact studies due to inherent biases which are found even for basic statistical properties (Chen et al., 2015). The bias is caused primarily by a simplified representation of important physical processes (Solomon et al., 2007), which often results from low spatial resolution of the RCMs.
Therefore, many methods have been developed to post-process the climate model outputs in order to move their statistical indicators closer to observations. An overview of these methods is presented, e.g. by Maraun et al. (2010). Precipitation is a key input into hydrological climate change impact studies and at the same time it belongs to meteorological variables that are most affected by bias. The comparison of correction methods commonly used for precipitation data is provided by Teutschbein and Seibert (2012). Nevertheless, these standard methods correct only the bias in statistical indicators (mean, variance, distribution function) of individual variables. The bias in persistence parameters of time series as well as the bias in cross-dependence structures between variables is often neglected. However, the dependence structures of the meteorological variables affect the hydrological response of a catchment (Bárdossy and Pegram, 2012), and thus their inadequate representation in the data can impair hydrological impact studies (Teng et al., 2015; Hanel et al., 2017).
In recent years several studies attempted to overcome this limitation. Hoffmann and Rath (2012) and Piani and Haerter (2012) focused on the relationship between precipitation and temperature data from a single location. Bárdossy and Pegram (2012) developed two procedures correcting a spatial correlation structure of RCM precipitation. Mao et al. (2015) proposed a stochastic multivariate procedure based on copulas. Johnson and Sharma (2012) developed a procedure correcting common statistics (mean, variance) together with lag-1 autocorrelation in multiple timescales. The procedure was later extended with a recursive approach by Mehrotra and Sharma (2015) and subsequently with a non-parametric quantile mapping by Mehrotra and Sharma (2016) to correct the bias in auto- and cross-dependence structures across multiple timescales. An approach based on the principal components was presented by Hnilica et al. (2017), correcting bias in cross-covariance and cross-correlation structures.
This study focuses on the temporal stability of dependence structures. We evaluate the temporal changes in cross- and auto-correlation structures in multivariate precipitation data simulated by an ensemble of climate models. We further investigate whether the magnitude of the changes considerably exceeds the natural variability. Finally, attention is paid to the effect of outlying values, which can significantly affect the correlations and can thus lead to artefacts in bias-corrected time series.
The paper is organised as follows. In Sect. 2 the data used in this study are presented and Sect. 3 describes the methodology. In Sect. 4 the results are reported and in Sect. 5 their consequences for climate change impact studies are discussed.
The daily precipitation data from six EURO-CORDEX (Giorgi et al., 2009)
regional climate models were considered. The ensemble of models was composed
of two RCMs (CCLM, RCA) driven by three GCMs (EC_EARTH, HadGEM2-ES and MPI-ESM-LR); see Table 1 for an overview. The simulations
with 0.11
Global and regional climate models used in the present study.
Location of the considered grid boxes in the Czech Republic.
The numbering of individual pairs of grid boxes. The figure depicts the correlation matrix, the orders of rows and columns correspond to the grid box labels from Fig. 1. The sub-diagonal part of the (symmetrical) matrix was used for the numbering of individual pairs of grid boxes – the numbers inside of the matrix represent the identifiers used in Fig. 4.
Overview of the changes in correlation structures for all models:
The wet and dry periods were treated separately in this study. The
cross-correlations were calculated in two stages. Firstly the binary
cross-correlations were calculated to assess the correspondence of wet and dry
periods, using the time series with the values replaced by 0 (dry day) or by
1 (wet day). In the second stage the cross-correlations of overlapping wet
periods were calculated. The auto-correlations were analysed through the
lag-1 auto-correlation coefficient, where only the non-zero pairs of
neighbouring values
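The three correlation measures described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the wet-day threshold of 0 is an assumption, since the text only states that dry and wet days are coded as 0 and 1 and that non-zero pairs are used for the auto-correlation.

```python
import numpy as np

# Assumption: a day counts as wet if its precipitation exceeds this value.
WET_THRESHOLD = 0.0


def binary_cross_correlation(x, y, threshold=WET_THRESHOLD):
    """Pearson correlation of the wet/dry indicator series (1 = wet, 0 = dry)."""
    wet_x = (x > threshold).astype(float)
    wet_y = (y > threshold).astype(float)
    return np.corrcoef(wet_x, wet_y)[0, 1]


def wet_cross_correlation(x, y, threshold=WET_THRESHOLD):
    """Pearson correlation using only the days that are wet in both series."""
    both_wet = (x > threshold) & (y > threshold)
    return np.corrcoef(x[both_wet], y[both_wet])[0, 1]


def lag1_wet_autocorrelation(x, threshold=WET_THRESHOLD):
    """Lag-1 correlation using only pairs of neighbouring days that are both wet."""
    prev, nxt = x[:-1], x[1:]
    both_wet = (prev > threshold) & (nxt > threshold)
    return np.corrcoef(prev[both_wet], nxt[both_wet])[0, 1]


# Tiny illustrative series (made-up values, not model output).
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 4.0, 2.0, 1.0])
y = np.array([0.0, 0.5, 3.0, 2.0, 1.0, 6.0, 0.0, 2.0])

print(binary_cross_correlation(x, y))
print(wet_cross_correlation(x, y))
print(lag1_wet_autocorrelation(x))
```

Separating the binary stage from the wet-period stage keeps the occurrence of precipitation and its amounts from being mixed in a single coefficient.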
The individual grid boxes were labelled by numbers 1–12, as shown by labels
in Fig. 1. The cross-correlation between the grid boxes
The changes in correlation coefficients were calculated as
The sampling variability of individual cross- and auto-correlation was
investigated to assess the statistical significance of their changes. The
confidence intervals were derived using the block bootstrap approach
(Davison and Hinkley, 1997). Specifically, the confidence interval around
the correlation
One-year blocks from the time series for basins
Step 1 was repeated 1000 times. The 95 % confidence interval was derived as a range between the 0.025 and 0.975 quantiles of the resampled correlations.
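A block bootstrap of this kind might look as follows. The sketch assumes that whole years are drawn with replacement and that the same years are used for both series, which preserves their cross-dependence; the function name, the `years` label array and the synthetic data are illustrative only. The plain Pearson correlation is used for brevity, and any other statistic (e.g. the wet-period correlation) could be substituted.

```python
import numpy as np


def block_bootstrap_ci(x, y, years, n_boot=1000, seed=0):
    """95 % interval of corr(x, y) obtained by resampling one-year blocks."""
    rng = np.random.default_rng(seed)
    # Positions of the days belonging to each one-year block.
    blocks = [np.flatnonzero(years == lab) for lab in np.unique(years)]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Assumption: blocks are drawn with replacement, as many as there
        # are years, and the same blocks are applied to both series.
        chosen = rng.integers(0, len(blocks), size=len(blocks))
        idx = np.concatenate([blocks[k] for k in chosen])
        stats[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    # Range between the 0.025 and 0.975 quantiles of the resampled values.
    return np.quantile(stats, [0.025, 0.975])


# Synthetic 30-year daily series (illustrative, not model output).
rng = np.random.default_rng(1)
years = np.repeat(np.arange(30), 365)
x = rng.gamma(shape=0.7, scale=5.0, size=years.size)
y = 0.6 * x + rng.gamma(shape=0.7, scale=3.0, size=years.size)

lo, hi = block_bootstrap_ci(x, y, years)
print(lo, hi)
```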
The block approach was chosen to preserve seasonal variability in the
bootstrap samples. For the presentation of confidence intervals, a unique
identifier (ID) was assigned to each pair of grid boxes, numbered by rows
of the correlation matrix; the scheme is depicted in Fig. 2. The confidence
intervals for auto-correlation were derived in the same way using 1-year
blocks of time series. Because the blocks are selected randomly, the
beginning of each block is independent of the end of the previous block.
To minimise the bias introduced by block resampling, the potentially
affected data (the joints of adjacent blocks) were not considered in the
calculation of the serial correlation.
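One way to exclude the block joints is to form the lag-1 pairs inside each one-year block before resampling, so that no pair ever spans two blocks. The following sketch takes this route; it is an assumed implementation with hypothetical names, and the synthetic series merely stands in for a precipitation record.

```python
import numpy as np


def bootstrap_lag1_ci(x, years, n_boot=1000, seed=0):
    """95 % interval of the lag-1 auto-correlation of wet-day pairs."""
    rng = np.random.default_rng(seed)
    pairs = []
    for lab in np.unique(years):
        block = x[years == lab]
        # Pairs are built within one block only, so the joints between
        # resampled blocks never contribute to the statistic.
        prev, nxt = block[:-1], block[1:]
        wet = (prev > 0) & (nxt > 0)  # only non-zero neighbouring values
        pairs.append((prev[wet], nxt[wet]))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        chosen = rng.integers(0, len(pairs), size=len(pairs))
        prev = np.concatenate([pairs[k][0] for k in chosen])
        nxt = np.concatenate([pairs[k][1] for k in chosen])
        stats[b] = np.corrcoef(prev, nxt)[0, 1]
    return np.quantile(stats, [0.025, 0.975])


# Synthetic persistent series with dry days (illustrative only).
rng = np.random.default_rng(2)
years = np.repeat(np.arange(30), 365)
z = np.zeros(years.size)
for t in range(1, z.size):
    z[t] = 0.5 * z[t - 1] + rng.normal()
x = np.maximum(z, 0.0)

lo, hi = bootstrap_lag1_ci(x, years)
print(lo, hi)
```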
The confidence intervals around the correlations from the control and future periods were used to visually assess their overlap. In addition, formal bootstrap-based tests of the significance of individual changes were performed. In each of the 1000 bootstrap steps the correlation of the resampled control data was subtracted from the correlation of the resampled future data. A change was considered insignificant if the confidence interval of these differences contained zero.
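The test on differences can be sketched as below. The sketch assumes that the control and future periods are resampled independently in one-year blocks and that the 0.025 and 0.975 quantiles of the differences form the interval; all names and the synthetic data are hypothetical.

```python
import numpy as np


def year_blocks(years):
    """Positions of the days belonging to each one-year block."""
    return [np.flatnonzero(years == lab) for lab in np.unique(years)]


def resampled_corr(x, y, blocks, rng):
    """Correlation of one block-bootstrap replicate of the pair (x, y)."""
    chosen = rng.integers(0, len(blocks), size=len(blocks))
    idx = np.concatenate([blocks[k] for k in chosen])
    return np.corrcoef(x[idx], y[idx])[0, 1]


def change_is_significant(ctrl, fut, n_boot=1000, seed=0):
    """ctrl and fut are (x, y, years) tuples for the two periods."""
    rng = np.random.default_rng(seed)
    x_c, y_c, years_c = ctrl
    x_f, y_f, years_f = fut
    blocks_c, blocks_f = year_blocks(years_c), year_blocks(years_f)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Future minus control correlation for one pair of replicates.
        diffs[b] = (resampled_corr(x_f, y_f, blocks_f, rng)
                    - resampled_corr(x_c, y_c, blocks_c, rng))
    lo, hi = np.quantile(diffs, [0.025, 0.975])
    # The change is significant if the interval does not contain zero.
    return not (lo <= 0.0 <= hi), (lo, hi)


# Synthetic example: the correlation rises from about 0.2 to about 0.9.
rng = np.random.default_rng(1)
years = np.repeat(np.arange(30), 365)


def make_pair(rho):
    a = rng.normal(size=years.size)
    b = rho * a + np.sqrt(1 - rho**2) * rng.normal(size=years.size)
    return a, b, years


ctrl, fut = make_pair(0.2), make_pair(0.9)
significant, interval = change_is_significant(ctrl, fut)
print(significant, interval)
```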
The 95 % confidence intervals of the individual
cross-correlation coefficients for overlapping wet periods for all models.
The identifiers of grid-box pairs (ID) are explained in Fig. 2. The blue
lines separate identifiers located in successive rows in the correlation
matrix (see Fig. 2). The arrow marks the confidence intervals around the
In the case of 12-dimensional data, the change in the cross-correlation
structure consists of changes in
Figure 3a and b present the changes in the binary cross-correlations and
in the cross-correlations of wet periods, respectively. As seen from the
figures, the binary correlations are relatively stable; their changes range
approximately from
The 95 % confidence intervals of the individual lag-1 auto-correlation coefficients for all models.
Figure 3c presents the changes in lag-1 auto-correlations; the box plots for
individual models are compiled from 12 changes in time series from
individual grid boxes. The changes range from
The significance of the changes in wet-period correlations was assessed using a block bootstrap. Figure 4 presents the 95 % confidence intervals of individual cross-correlations for all models. The blue dividers identify the successive rows below the diagonal in the correlation matrix. In general, the majority of changes show little significance; the intervals from control and future periods overlap considerably (except models 1A and 2A, which in many cases show exceptionally wide intervals for the future period). Figure 5 shows the same for lag-1 auto-correlations of individual grid boxes. Also in this case the majority of changes do not exceed the sampling variability; the most significant changes are reached by the model 3B, but the overall trend is a drop in the future.
To verify these results, the significance of individual changes was tested using the bootstrap approach. The results of the tests correspond well with the visual assessment presented in Figs. 4 and 5. In the case of cross-correlations, four changes were found significant for model 1A, none for model 1B, two for model 2A, eight for model 2B, two for model 3A and none for model 3B. In the case of auto-correlations, significant changes were found only for model 3B. Note, however, that the fraction of significant changes might be larger if a fixed reference were used for calculating correlations and auto-correlations (see Sect. 3).
The previous section demonstrated that in some cases the changes in cross-correlation show little significance despite their high absolute values, which is particularly related to the models 1A and 2A. At the same time, it can be seen in Fig. 4 that some confidence intervals for these models are exceptionally wide. Further analyses showed that this instability of correlation estimates is introduced by outlying values, which cause seeming changes in the correlation structures.
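The sensitivity of the sample correlation to a single value can be reproduced with synthetic data. The sketch below is purely illustrative: it uses Gaussian samples rather than precipitation, and the sample size, correlation and outlier position are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 10_000, 0.3

# Two correlated series without outliers.
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
r_clean = np.corrcoef(x, y)[0, 1]

# Append one point that is extreme in both coordinates.
x_out = np.append(x, 60.0)
y_out = np.append(y, 60.0)
r_contaminated = np.corrcoef(x_out, y_out)[0, 1]

print(r_clean, r_contaminated)
```

A single additional value among ten thousand shifts the estimate noticeably, which is the kind of instability that shows up as wide bootstrap intervals.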
In the simulation of the model 2A, the sample correlation
The effect of outliers on correlation structures of model 2A in
the future period (the outliers are circled):
The difference between the ordinary and dependence outliers. The dashed lines define the standard coordinate system; the solid lines define an alternative coordinate system. The outlying points in the standard coordinate system are ordinary outliers (point A); the outlying points in the alternative coordinate system are denoted as dependence outliers (point deviating from the dependence structure, point B). The construction of alternative coordinates is explained in the text.
Outlying values also affect the auto-correlation. The largest change in the
auto-correlation was achieved by the model 2A, where
The examples showed that outliers can distort cross- and auto-correlation
structures of a large dataset comprising many thousands of values.
Nevertheless, it should be noted that not every extreme value necessarily
affects the correlation (as seen in Fig. 6b). Therefore, a more specific
concept of outliers is presented in this study. Values deviating from the
correlation structure are denoted as
The demonstration of the exploratory procedure:
The problem is that the presence of outliers is not easily detected from
the changes in dependence structures. It can be indicated indirectly by
the analysis of sampling variability; nevertheless, wide confidence
intervals do not necessarily imply the presence of outliers.
Alternatively, outliers can be found by visually checking the individual
pairs of datasets. We propose a procedure that allows significant
dependence outliers to be identified and their effect on the correlation
structure to be assessed. The procedure consists of three steps:
1. The most outlying (multivariate) value is found in the data (in alternative coordinates).
2. The value is removed from the data and a new correlation matrix is calculated.
3. A difference between the new and the previous correlation matrix is calculated and recorded.
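The three steps can be sketched as follows. Two choices in this sketch are assumptions rather than the procedure as published: the most outlying value is taken as the row with the largest Mahalanobis distance (equivalent to the Euclidean distance in principal-component coordinates scaled to unit variance), and the difference between successive correlation matrices is measured by the Frobenius norm. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def outlier_removal_trace(data, n_steps=10):
    """Repeat the three steps and return the recorded differences."""
    data = np.asarray(data, dtype=float).copy()
    corr_prev = np.corrcoef(data, rowvar=False)
    changes = []
    for _ in range(n_steps):
        # Step 1: find the most outlying row (assumed measure: squared
        # Mahalanobis distance from the sample mean).
        centred = data - data.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)
        worst = int(np.argmax(d2))
        # Step 2: remove it and recompute the correlation matrix.
        data = np.delete(data, worst, axis=0)
        corr_new = np.corrcoef(data, rowvar=False)
        # Step 3: record the difference (assumed measure: Frobenius norm).
        changes.append(np.linalg.norm(corr_new - corr_prev))
        corr_prev = corr_new
    return np.array(changes)


# Synthetic two-dimensional data with one planted dependence outlier.
rng = np.random.default_rng(3)
n, rho = 5000, 0.6
a = rng.normal(size=n)
b = rho * a + np.sqrt(1 - rho**2) * rng.normal(size=n)
data = np.vstack([np.column_stack([a, b]), [10.0, -10.0]])

changes = outlier_removal_trace(data, n_steps=10)
print(changes)
```

In this example the first recorded difference is much larger than the following ones, because the planted point lies far from the dependence structure while the remaining removals only trim the ordinary tails of the sample.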
These three steps are repeated. The difference in step 3 is quantified
through
The procedure is demonstrated on two simple two-dimensional examples. Figure
8a depicts the sequence of
This procedure is very useful as it allows a large set of multivariate
data to be explored as a whole. The
The detection of dependence outliers for complete 12-dimensional data from all models in the future period. The strong outliers in data from the models 1A and 2A are clearly distinguishable.
The examples presented demonstrate that outliers can strongly affect the cross- and auto-correlation structures of the data comprising many thousands of values. In general, it must be stressed that the presence of outliers cannot be considered as a bias. The extreme precipitation values as well as the dependence outliers naturally occur. Nevertheless, although the dependence structures are markedly influenced by a small number of outliers, they characterise the data as a whole. Therefore a substantial bias can arise when data with noticeable outliers are used to assess the dependence structures, or when their dependence structures are used, e.g. for calibration of the bias correction functions. The cross- and auto-correlation structures are the key ingredients in several multivariate bias correction methods; for examples see Mehrotra and Sharma (2015) and Mehrotra and Sharma (2016). The results based on these methods can be devalued by outliers; see the Supplement to this paper.
From this point of view there is no need to distinguish between real
extremes and “genuine” outliers (for example measurement errors). The real
extremes as well as genuine outliers affect the correlation structures in
the same way, which subsequently affects the bias corrections (or stochastic
generators). Therefore the dependence outliers, regardless of their origin,
should be removed from the calibration data. The appropriate tool for
testing the presence of outliers is the analysis of the difference
The analysis of significance showed that in most cases the correlations are stable in time; their changes are insignificant and are caused by outlying values. Therefore the climate projection can be interpreted as a linear transformation of an initial state, because a nonlinear transformation would change the correlations substantially. From this point of view a reasonable scenario of future precipitation can be obtained by the corresponding linear transformation of observations, i.e. by the multiplicative delta method (Déqué, 2007). Such an approach avoids the problems of complex bias correction methods (e.g. their increasing complexity and unclear effect on climate change signal), which have recently been the subjects of serious criticism, for example by Ehret et al. (2012) or Maraun et al. (2017).
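The link between the multiplicative delta method and stable correlations can be illustrated with a short sketch. It assumes a single scaling factor per grid box, computed as the ratio of simulated future to control means; in practice the factors are often derived separately for months or seasons. The function name and all data are hypothetical.

```python
import numpy as np


def multiplicative_delta(obs, sim_control, sim_future):
    """Scale observations by the simulated future/control ratio of means."""
    # One factor per column (grid box); an assumed, simplified variant.
    factor = sim_future.mean(axis=0) / sim_control.mean(axis=0)
    return obs * factor, factor


# Synthetic data for three grid boxes (illustrative only).
rng = np.random.default_rng(5)
common = rng.gamma(shape=0.7, scale=5.0, size=(3000, 1))
obs = common * np.array([1.0, 0.8, 1.2]) + rng.gamma(0.7, 2.0, size=(3000, 3))
sim_control = rng.gamma(shape=0.7, scale=5.0, size=(3000, 3))
sim_future = rng.gamma(shape=0.7, scale=6.0, size=(3000, 3))

scenario, factor = multiplicative_delta(obs, sim_control, sim_future)

# Scaling each series by a positive constant leaves correlations unchanged.
print(np.corrcoef(obs, rowvar=False))
print(np.corrcoef(scenario, rowvar=False))
```

Because the Pearson correlation is invariant to multiplication by a positive constant, the scenario inherits the cross- and auto-correlation structure of the observations.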
The RCM data, the source codes and the plot data
are available online at
The supplement related to this article is available online at:
JH and MH designed the study and wrote the paper. JH wrote the source codes. VP provided the theoretical background for the principal component analysis and for the bootstrap. All authors participated in the interpretation of the results.
The authors declare that they have no conflict of interest.
This study was supported by the Czech Science Foundation (Jan Hnilica from grant no. 16-05665S and Martin Hanel from grant no. 16-16549S). Moreover, the financial support from RVO: 67985874 is gratefully acknowledged. We acknowledge the World Climate Research Programme's Working Group on Regional Climate, and the Working Group on Coupled Modelling (former coordinating body of CORDEX and responsible panel for CMIP5). We also thank the climate modelling groups of the CLM community and the Rossby Centre (Swedish Meteorological and Hydrological Institute) for producing and making available their model outputs.
This paper was edited by András Bárdossy and reviewed by Geoff Pegram and Ashish Sharma.