This study discusses the effect of empirical-statistical bias correction methods like quantile mapping (QM) on the temperature change signals of climate simulations. We show that QM regionally alters the mean temperature climate change signal (CCS) derived from the ENSEMBLES multi-model data set by up to 15 %. Such modification is currently the subject of intense debate and is often regarded as a deficiency of bias correction methods. However, an analytical treatment reveals that this modification corresponds to the effect of intensity-dependent model errors on the CCS. Such errors cause, if uncorrected, biases in the CCS. QM removes these intensity-dependent errors and can therefore potentially lead to an improved CCS. An analysis similar to that for the multi-model mean CCS has been conducted for the variance of CCSs in the multi-model ensemble. It shows that this indicator for model uncertainty is artificially inflated by intensity-dependent model errors. Therefore, QM also has the potential to serve as an empirical constraint on model uncertainty in climate projections. However, any improvement of simulated CCSs by empirical-statistical bias correction methods can only be realized if the model error characteristics are sufficiently time-invariant.

Society is increasingly demanding reliable projections of future climate change to analyze adaptation options and costs, to explore climate change mitigation benefits, and to support political decisions. Such climate projections are usually generated with general circulation models (GCMs) of rather coarse spatial resolution, which are refined by dynamical or statistical downscaling methods (e.g., Giorgi and Mearns, 1991; Fowler et al., 2007). Currently, an increasing number of climate change impact investigations rely on dynamical downscaling methods, i.e., the use of regional climate models (RCMs, e.g., Giorgi and Mearns, 1991, 1999; Wang et al., 2004; Rummukainen, 2010). However, even the newest generation of RCMs features considerable systematic errors (e.g., Kotlarski et al., 2014), which complicates the direct application of RCM results in climate change impact research. RCM output is therefore usually post-processed with empirical-statistical “bias correction” methods (e.g., Déqué, 2007; Themeßl et al., 2011) before it is used as input for impact models, such as hydrological models. Bias correction methods have been demonstrated to successfully reduce systematic model errors (i.e., the difference between historical model output and meteorological observations), but the knowledge about how they influence the climate change signal (CCS; i.e., the long-term average difference between a future and a past climate simulation) is very limited so far.

A relation between model errors and CCS has been discussed by Christensen et al. (2008), who found that monthly temperature errors of RCMs over Europe often depend on the observed monthly mean temperature and that in warmer months, errors are often larger than in colder months (or vice versa). Such “intensity-dependent” errors can be shown to alter the temperature CCS (Christensen et al., 2008; Themeßl et al., 2012; Boberg and Christensen, 2012).

Bias correction methods like quantile mapping (QM) modify the CCS. For example, Themeßl et al. (2012) and Dosio et al. (2012) showed that QM modifies the CCS of RCMs in operation over Europe in some regions and seasons, and found a lower summer temperature CCS in eastern Europe as well as a higher winter temperature CCS in Scandinavia after bias correction with QM. Currently, such modifications are often regarded as an undesired deficiency of bias correction methods (e.g., Hempel et al., 2013). However, Maurer and Pierce (2014) recently claimed that QM may have no negative effect on the quality of the CCS and demonstrated that QM does not deteriorate the multi-model mean precipitation CCS in a GCM ensemble.

In this paper we go a step further and argue that, under the assumption of time-invariant model error characteristics, the modification of the CCS by QM can be interpreted as improvement, rather than as deterioration, since it is capable of mitigating intensity-dependent model errors. To support this hypothesis, we develop a linearized analytical description of the effect of intensity-dependent model errors on the CCS. This framework allows the impact of such errors to be investigated, not only on the multi-model mean CCSs in an ensemble of climate simulations, but also on the inter-model variability, which is often used as a measure of uncertainty in climate projections (e.g., Hawkins and Sutton, 2009, 2011; Prein et al., 2011). Furthermore, we compare the analytical correction of the CCS to the correction by QM.

In Sect. 2, the QM method is described and its effect on the temperature CCS of the ENSEMBLES multi-model data set is demonstrated. In Sect. 3, the error characteristics of the ENSEMBLES models are analyzed, and in Sect. 4 we present an analytical formulation of intensity-dependent model errors and their effects on the CCS. In Sect. 5 these effects are compared to the effects of QM on CCSs, and in Sect. 6 a summary is given and conclusions are drawn.

The basic assumption of QM is that model errors depend on the value of the simulated variable. This concept of intensity-dependent errors is a rough simplification of actual model error characteristics, since model errors are not only influenced by the local value of the simulated variable. However, we will demonstrate that errors and local values correlate well in many cases (Sect. 3). The concept is simple yet powerful, since it separates, e.g., cold from hot regimes, or drizzle from heavy precipitation regimes, and therefore accounts for potentially very different model errors under the associated regimes. It should be emphasized that intensity-dependent model errors are equivalent to a misrepresentation of variability, i.e., to differences between the observed and modeled width of the density distribution. Figure 1 demonstrates that intensity-dependent error characteristics with a positive slope correspond to overestimated variability, if the model error is defined as the difference between the inverse modeled and observed empirical cumulative distribution functions (ECDFs). Similarly, a negative error slope corresponds to underestimation of variability.

QM is a distribution-based bias correction method (e.g., Panofsky and Brier, 1958; Wood et al., 2004) that maps a modeled historical ECDF to an observed ECDF, with the mapping function shown in Fig. 1c for an artificial example. It is a well-established method to prepare climate model output as input for hydrological models (e.g., Déqué, 2007; Maraun et al., 2010; Themeßl et al., 2011) and has been successfully applied to daily precipitation sums and air temperature of RCMs and GCMs by Dobler and Ahrens (2008), Piani et al. (2010a, b), Dosio and Parulo (2011), Dosio et al. (2012), Maurer and Pierce (2014) and others. Furthermore, Themeßl et al. (2011) showed for daily precipitation sums that QM outperforms six other prominent bias correction techniques.

In our study, a non-parametric version of QM is used (Themeßl et al., 2011, 2012; Wilcke et al., 2013), as suggested by Gudmundsson et al. (2012). The ECDFs are constructed from 930 values (30 years × 31 days) for each day of the year, based on modeled and observed data of a 30-year reference period (1961–1990) and a 31-day moving window centered on the day under consideration. Our implementation of QM is not restricted to the range of observed values in the reference period, since the correction is extrapolated beyond the calibration range by using the correction term of the highest and lowest quantile, respectively. Note that this implies constant (not intensity-dependent) error characteristics outside the calibration range. As discussed by Bellprat et al. (2013), such constant errors at high temperatures outside the calibration range may be more realistic in many cases than a linear extrapolation.
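
The correction step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `fit_qm` and `apply_qm`, the choice of 101 quantile nodes, and the omission of the day-of-year moving window are assumptions made for brevity. The sketch only shows how an additive correction term per quantile is derived from a reference period and held constant outside the calibration range.

```python
import numpy as np

def fit_qm(model_ref, obs_ref, n_nodes=101):
    """Derive additive correction terms from a reference period.

    Returns the modeled quantiles and the correction (observed minus
    modeled) at each quantile node. n_nodes is an illustrative choice.
    """
    probs = np.linspace(0.0, 1.0, n_nodes)   # includes lowest and highest quantile
    q_mod = np.quantile(model_ref, probs)
    q_obs = np.quantile(obs_ref, probs)
    return q_mod, q_obs - q_mod

def apply_qm(values, q_mod, correction):
    """Apply the correction to new model values.

    np.interp returns the first/last correction term for values outside
    [q_mod[0], q_mod[-1]], i.e. a constant correction beyond the
    calibration range.
    """
    values = np.asarray(values, dtype=float)
    return values + np.interp(values, q_mod, correction)

# Artificial example: a model with an offset of 1 and 50 % too much variability
obs_ref = np.linspace(0.0, 20.0, 930)
model_ref = 1.0 + 1.5 * obs_ref
q_mod, corr = fit_qm(model_ref, obs_ref)
print(apply_qm([16.0, 40.0, -5.0], q_mod, corr))
```

In this artificial case a modeled value of 16 is mapped back to 10, while the values 40 and -5 lie outside the calibration range and receive the correction terms of the highest and lowest quantile, respectively.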

Intensity-dependent model errors of a model that overestimates daily
temperature variability (artificial data).

Some restrictions apply to the application of QM to climate scenarios: as pointed out by Eden et al. (2012), internal variability causes differences between a GCM simulation and observations, which cannot be separated from actual model errors if QM is applied to GCM-driven RCMs, as in our case (see Sect. 2.2). By using rather long calibration periods (30 years) and by focusing on temperature, which is less affected by natural variability than, e.g., precipitation, we try to minimize this effect. In addition, our multi-model approach further reduces dependence on natural variability. However, in the interpretation of the results, some noise due to natural variability has to be taken into account. As with all empirical-statistical downscaling and bias correction methods, the application of QM to future climate simulations is based on the assumption of time-invariant model error characteristics. This stationarity assumption obviously cannot be directly assessed for future periods and it can be expected to be violated to some degree. However, several studies demonstrate the skill of empirical-statistical bias correction methods, either for past periods independent of the calibration period under ongoing climate change (e.g., Piani et al., 2010a; Themeßl et al., 2012; Gudmundsson et al., 2012; Wilcke et al., 2013), or for future periods using a pseudo-reality approach (Maraun, 2012). Furthermore, Teutschbein and Seibert (2013) show that correction methods like QM perform better under non-stationary conditions than widely used linear transformations or delta-change approaches. This gives confidence that empirical-statistical bias correction with QM is useful not only for historical simulations, but also, though with degraded performance, for future climate simulations.
However, in a strict interpretation, the results and conclusions of this study are only valid under the assumption of time-invariant model errors and it is still subject to further investigation to determine the severity of this restriction. Although such investigation is outside the scope of our study, we want to mention that the new centennial re-analyses of ECMWF (ERA-20C) and NOAA-CIRES (V2c) offer a promising new test bed for the investigation of the long-term stability of model error characteristics.

We apply QM to a set of 15 GCM-driven regional climate simulations for Europe from the ENSEMBLES multi-model data set (van der Linden and Mitchell, 2009). The ENSEMBLES models are operated on a 25 km grid and extend to 2100. In the following, we show the results for daily mean temperature, but the analysis of daily minimum and maximum temperatures gives very similar results. The application of our analysis to other variables such as precipitation is in principle straightforward, but the linearization applied in Sect. 4 can be expected to be less appropriate for precipitation than for temperature. Further investigation is needed to fully reveal the effect of QM on the precipitation CCS. The major motivation for focusing on temperature here is its relatively simple error characteristic and its significant climate trend, which facilitates the demonstration of the effect of QM on the CCS.

As observational reference, the ENSEMBLES gridded observational data set (E-OBS, Haylock et al., 2008) is used. It is a European land-only daily high-resolution (25 km grid spacing) data set for five meteorological parameters, including daily mean temperature.

Subsequently, we show the effect of QM on the multi-model mean CCS and on the
standard deviation of CCSs for the periods 2021–2050 and 2070–2099, both
compared with the reference period 1971–2000. In Fig. 2 the spatial patterns
of the difference between the uncorrected and the corrected multi-model mean
temperature CCS are shown for different seasons in the middle (left) and at the end
(right) of the 21st century. At the end of the century, differences exceed

Differences between uncorrected and corrected (QM) multi-model mean temperature CCS. The reference period is 1971–2000. The left panels refer to CCSs in the mid-21st century (2021–2050), the right panels to the end of the 21st century (2070–2099). Blue colors indicate areas where the uncorrected model is colder than the corrected model; red colors vice versa.

Differences between uncorrected and corrected (QM) multi-model standard deviation. The reference period is 1971–2000. The left panels refer to CCSs in the mid-21st century (2021–2050), the right panels to the end of the 21st century (2070–2099). Blue colors indicate areas where the uncorrected ensemble features a smaller standard deviation; orange colors vice versa.

Figure 3 shows the spatial pattern of the difference between the uncorrected
and the corrected standard deviation of CCSs as a measure of model
uncertainty. In most regions, model uncertainty is larger in the uncorrected
model ensemble (orange colors), particularly in regions where the CCS is
overestimated (see Fig. 2). The overestimation locally peaks at
0.5

After having demonstrated and quantified the effects of QM on the CCS and the model uncertainty in the ENSEMBLES multi-model ensemble, the rest of this paper is devoted to the explanation of these effects.

Temperature error characteristics (model minus observation) of the HC (left panels) and SMHI (right panels) RCMs in eight subregions of Europe (sub-panels) and each month of the year.

Since intensity-dependent model errors are the prime suspects for causing the demonstrated effect of QM on the CCS, we investigate whether such errors exist in the ENSEMBLES RCMs. Due to their contrasting error characteristics, two of the RCMs are discussed in more detail: HadRM3Q3 (driven by the HadCM3Q3 global climate model), operated by the Hadley Centre (HC), and RCA (driven by run 3 of the ECHAM5 global climate model), operated by the Swedish Meteorological and Hydrological Institute (SMHI). Since model error characteristics are known to vary strongly between regions, Europe is separated into eight subregions following Rockel and Woth (2007), which are marked in Figs. 2 and 3: the British Isles (BI), France (FR), central Europe (ME), Scandinavia (SC), the Iberian Peninsula (IP), the Mediterranean (MD), the Alps (AL) and eastern Europe (EA).

The following characterization of model errors is based on daily mean temperature ECDFs, which are averaged over each month and subregion. For each model, only the range between the 10th and 90th percentiles is used in order to avoid the noisy tails of empirical distributions. The ECDFs of the grid points in each subregion are sampled over this range on a daily basis and the daily model error characteristics are derived for each grid point by subtracting the inverse observed from the inverse modeled ECDF (see Fig. 1). Further, the grid point error characteristics are averaged over each subregion and each month of the year. As an example, Fig. 4 shows the daily temperature error characteristics of the HC and SMHI models.
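
The derivation of an error characteristic for a single grid point can be sketched as follows. This is an illustration only: the function names, the number of percentile nodes, and the least-squares fit of the error against the observed quantiles are assumptions of this sketch, and the averaging over grid points, months and subregions is omitted.

```python
import numpy as np

def error_characteristic(model, obs, n_nodes=81):
    """Model error as a function of intensity for one grid point.

    The error is the inverse modeled ECDF minus the inverse observed
    ECDF, sampled between the 10th and 90th percentiles to avoid the
    noisy tails. n_nodes is an illustrative choice.
    """
    probs = np.linspace(0.10, 0.90, n_nodes)
    q_mod = np.quantile(model, probs)
    q_obs = np.quantile(obs, probs)
    return q_obs, q_mod - q_obs

def error_slope(q_obs, error):
    """Slope of a straight line fitted to the error characteristic
    (assumption: fitted against the observed quantiles)."""
    slope, _intercept = np.polyfit(q_obs, error, 1)
    return slope

# Artificial example: offset of 1 and 30 % overestimated variability
obs = np.linspace(-10.0, 20.0, 1000)
model = 1.0 + 1.3 * obs
q_obs, err = error_characteristic(model, obs)
print(round(error_slope(q_obs, err), 3))
```

A positive slope, as in this artificial example, corresponds to overestimated variability; a negative slope corresponds to underestimated variability.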

Both models are affected by strongly intensity-dependent errors, but the
error characteristics of the two models differ substantially. While the HC
model features positive error slopes (up to 0.5) in most seasons and regions,
the SMHI model has mainly negative slopes (up to

In order to analyze whether such single-model error slopes cancel out in the multi-model ensemble, the ensemble average error characteristics (bold lines) in SC and EA are shown in Fig. 5 together with those of all 15 individual models (light lines). In SC, a considerable negative multi-model average slope exists in most parts of the year (minimum in July). In contrast, positive slopes can be found in EA in summer (maximum in July). Several other regions, like AL, feature only minor multi-model average slopes, but in turn larger slope variability (see Figs. S9 to S12).

Having shown and quantified the intensity-dependence of model errors in the ENSEMBLES multi-model data set, we subsequently give a simplified analytical description to highlight the mechanism of how such errors act on the CCS in a multi-model ensemble.

Temperature error characteristics (modeled minus observed) of the ENSEMBLES models in SC (left panels) and EA (right panels). The light lines show the error characteristics of the individual models, the bold line shows the ensemble average. The number in the lower right corner of each panel denotes multi-model average error slope.

Let

Averaging over 30 years, taking the difference between a future and a past
period, and neglecting the residual, yields the linearized effect of the
intensity-dependent model error on the CCS:

For a multi-model ensemble, the ensemble mean CCS and the multi-model
variance of the CCS are relevant. To derive the effect of intensity-dependent
errors on these quantities, the error slope can be written as the sum of the
ensemble mean error slope (

The effect of intensity-dependent errors on the second important quantity in
a multi-model ensemble, the variance of CCSs (which is often interpreted as a
measure of uncertainty), can be described with the linearized model as well.
Using Eqs. (5) and (6), the variance can be expressed as

The linearized error characterization leads to a simple way to correct the
CCS of single models following Eq. (3), the multi-model mean CCS following
Eq. (6), and the multi-model variance of CCS following Eq. (9). Error slopes,
climate change signals, their variability, and their covariance are
calculated based on the comparison of historical simulations with
observations and applied to results of future simulations. Such correction
assumes not only a linear error slope, but also time-invariant error
characteristics. The linearly corrected multi-model mean temperature CCS is
listed in Table 1 (

Multi-model mean temperature error slopes (

Multi-model variance of the temperature CCSs (

In Table 1 the terms contributing to errors in the multi-model mean CCS (see
Eq. 6) are listed for all subregions and seasons. Multi-model mean error
slopes (

In Table 2, the terms contributing to errors in the estimated variance of a
multi-model ensemble (Eq. 9) are listed: two positive offset terms,

Estimated errors in the multi-model mean CCS due to intensity-dependent model errors. The reference period is 1971–2000. The orange colors refer to CCSs in the mid-21st century (2021–2050), the blue colors to the end of the 21st century (2070–2099). Light colors correspond to the estimation of the error by QM, dark colors to LC.

Table 2 also lists the uncorrected (

The knowledge about the influence of
empirical-statistical bias correction methods like QM on the CCS of climate
simulations is very limited so far. For the ENSEMBLES multi-model data set it
has been demonstrated that QM dampens projected summer warming in
southeastern Europe and France by about 0.5

Estimated errors in the multi-model standard deviation of the temperature CCS due to intensity-dependent model errors. The reference period is 1971–2000. The orange colors refer to CCSs in the mid-21st century (2021–2050), the blue colors to the end of the 21st century (2070–2099). Light colors correspond to the estimation of the error by QM, dark colors to LC.

To support this hypothesis, we analytically formulated the effect of intensity-dependent model errors on the CCS and showed that they erroneously modify the CCS. Positive error slopes lead to an exaggeration of the CCS and negative slopes dampen it. This is the case for a single model's CCS as well as for the multi-model mean CCS in a model ensemble, which is additionally exaggerated by high variability among the single-model CCSs. A comparison of this analytically determined error and the effect of QM on the mean CCS in the ENSEMBLES multi-model data set leads to largely similar results. This confirms that the effect of QM on the CCS is mainly caused by the correction of intensity-dependent errors and that such modification can be regarded as an improvement, if roughly time-invariant model error characteristics can be assumed.
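
The mechanism can be illustrated with a minimal worked sketch. It assumes, purely for illustration, that the error of model $i$ is linear in the error-free temperature $T$, with offset $a_i$ and slope $s_i$; these symbols are chosen for this sketch and are not necessarily the notation of Sect. 4.

```latex
% Assumption of this sketch: linear, time-invariant error of model i
T^{\mathrm{mod}}_i = T + a_i + s_i\,T
% Difference of 30-year means (future minus past); the offset a_i cancels
\Delta T^{\mathrm{mod}}_i = (1 + s_i)\,\Delta T_i
% Ensemble mean over N models, with \bar{s} the mean slope
\overline{\Delta T^{\mathrm{mod}}}
  = (1 + \bar{s})\,\overline{\Delta T} + \mathrm{cov}(s, \Delta T),
\qquad
\mathrm{cov}(s, \Delta T)
  = \frac{1}{N}\sum_{i=1}^{N}\,(s_i - \bar{s})\,(\Delta T_i - \overline{\Delta T})
```

Under these assumptions, a single model with a slope of $s_i = 0.1$ and an error-free CCS of 3 K would simulate a CCS of 3.3 K, whereas a slope of $s_i = -0.1$ would yield 2.7 K. A constant offset $a_i$ has no effect on the CCS in this sketch.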

With regard to the variance of the CCSs in a multi-model ensemble, the analytical description reveals that intensity-dependent model errors lead to an overestimation of variance. Since variability of CCSs in a multi-model ensemble is often used as an indicator for model uncertainty, intensity-dependent model errors can be regarded as responsible for parts of the model uncertainty in the CCS. This further implies that the correction of intensity-dependent errors by QM should lead to a smaller variance and therefore constitute an empirical constraint on climate model uncertainty. However, we could only partly demonstrate this very desirable effect by the application of QM to the ENSEMBLES data set. In most regions and seasons, the analytical correction as well as QM reduce the variance as expected, but particularly in the winter season of longer-term simulations, QM often increases it, which could not be fully explained so far and needs further investigation.

Generally, our results indicate that empirical-statistical bias correction methods that correct for intensity-dependence in model errors can lead to improved estimates of future climate change. The improvements primarily refer to the mean CCS, but an empirical constraint on uncertainty in multi-model climate projections also seems to be feasible. A limitation of these results is that any potential improvement can only be realized if the assumption of time-invariant model error characteristics holds sufficiently well. It is still subject to further investigation to determine the severity of this restriction.

A. Gobiet is responsible for the general concept and conduct of the study, for the analytical description presented in Sect. 4, for the interpretation of the results and for writing the text. M. Suklitsch contributed the analysis of the error characteristics of the ENSEMBLES models and G. Heinrich the analysis of the effect of QM on the climate change signal in the ENSEMBLES data set. Both were also involved in the discussion of the results and contributed to parts of the text.

This study was partly funded by the EU FP7 projects ACQWA (grant agreement
212250) and IMPACT2C (grant agreement 282746). We acknowledge the E-OBS data
set from the EU-FP6 project ENSEMBLES
(