Commonly used bias correction methods such as quantile mapping (QM) assume that the function of error correction values between the modeled and observed distributions is stationary, or time invariant. This article finds that this function of error correction values cannot be assumed to be stationary. As a result, QM lacks justification to inflate/deflate various moments of the climate change signal. Previous adaptations of QM, most notably quantile delta mapping (QDM), have been developed that do not rely on this assumption of stationarity. Here, we outline a methodology called scaled distribution mapping (SDM), which is conceptually similar to QDM but more explicitly accounts for the frequency of rain days and the likelihood of individual events. The SDM method is found to outperform QM, QDM, and detrended QM in its ability to better preserve raw climate model projected changes to meteorological variables such as temperature and precipitation.

Bias correction of climate model projections is often performed in order to properly assess the impacts of climate change on human and environmental resources (Berg et al., 2003; Ines and Hansen, 2006; Muerth et al., 2013; Teng et al., 2015). Removing model bias is particularly useful for impact studies involving hydrological models, where runoff is a nonlinear function of precipitation (Christensen et al., 2008; Maurer and Hidalgo, 2008; Hagemann et al., 2011). Global climate models (GCMs) provide large-scale projections for many climate variables (IPCC, 2013). However, many climate processes and landscape features are not resolved at the coarse resolution of current GCMs. To bridge this gap, regional climate models (RCMs) are commonly used to downscale GCM data to a higher resolution. Even though RCMs can provide added value (Fowler et al., 2007; Feser et al., 2011; Di Luca et al., 2013; Kotlarski et al., 2014), systematic errors in the model output still exist (Mearns et al., 2012; Sillmann et al., 2013).

Numerous statistical bias correction methodologies have been developed to remove systematic model errors (Schmidli et al., 2006; Boé et al., 2007; Lenderink et al., 2007; Leander et al., 2008; Gellens and Roulin, 2012; Chen et al., 2013). These methods adjust the modeled mean, variance, and/or higher moments of the distribution of climate variables to more closely match the observations. Quantile mapping (QM) has been a widely used method due to its ability to handle higher-order moments in addition to being computationally efficient (Wood et al., 2004; Piani et al., 2010; Themeßl et al., 2011; Gudmundsson et al., 2012; Teutschbein and Seibert, 2013). Standard QM assumes that the function of error correction values found in a calibration period can be applied to any time period of interest. This is referred to as the stationarity assumption or the time-invariant assumption (Christensen et al., 2008; Maraun, 2012; Themeßl et al., 2012; Brekke et al., 2013; Chen et al., 2013). The assumption of stationarity in QM is responsible for inflating (or otherwise altering) the raw model projections of climate change (Maurer and Pierce, 2014).
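The standard QM procedure can be sketched as a toy, rank-based, additive implementation. This is an illustrative simplification, not the implementation used in any of the cited studies; the function name and the nearest-rank quantile matching are our own assumptions:

```python
import bisect

def empirical_qm(obs, mod_hist, mod_fut):
    """Toy additive quantile mapping.

    Each future value is corrected by the obs-minus-model error at its
    quantile in the *calibration* period, i.e., the correction function
    is assumed stationary (the assumption questioned in the text).
    """
    obs_s = sorted(obs)
    hist_s = sorted(mod_hist)
    n = len(hist_s)
    corrected = []
    for x in mod_fut:
        # empirical rank of x within the sorted historical model values
        rank = min(bisect.bisect_left(hist_s, x), n - 1)
        q = rank / (n - 1) if n > 1 else 0.5
        # observed value at (approximately) the same quantile
        obs_val = obs_s[round(q * (len(obs_s) - 1))]
        corrected.append(x + (obs_val - hist_s[rank]))
    return corrected
```

With a purely additive model bias, this recovers the observations exactly; under a warming climate, future values beyond the calibration range keep sampling the correction from the highest calibration quantile.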

In this study, we focus specifically on the performance of bias correction and not issues related to downscaling/upscaling (Maraun, 2013). We have divided the study into two main sections: Sects. 3 and 4. In Sect. 3, we begin by testing the stationarity assumption used in QM and the implications of this assumption. Second, we use a synthetic example to investigate the potential advantages of using a parametric instead of a non-parametric approach. Third, we illustrate the problems associated with validating bias correction methods using a split-sample or cross-validation test. In Sect. 4, a new bias correction method called scaled distribution mapping (SDM) is outlined. Finally, Sect. 5 compares and discusses the performances of SDM with other methods.

The climate model data in this study are projections of daily mean temperature
and precipitation from the KNMI-RACMO22E regional climate model. The
KNMI-RACMO22E RCM was forced with the ICHEC-EC-EARTH GCM, and it is one of
the model projection runs in the EURO-CORDEX project. The data can be found
on any of the ESGF repositories containing EURO-CORDEX (e.g.,

Over the years, numerous bias correction methods have been developed using univariate and multivariate approaches (Gellens and Roulin, 1998; Wood et al., 2004; Schmidli et al., 2006; Boé et al., 2007; Leander and Buishand, 2007; Lenderink et al., 2007; Li et al., 2010; Maraun et al., 2010; Piani et al., 2010; Themeßl et al., 2011; Piani and Haerter, 2012). In this study, we focus on univariate bias correction methods.

Many popular existing bias correction methods have been reviewed and compared, and QM was found to outperform the other methods (Gudmundsson et al., 2012; Teutschbein and Seibert, 2012, 2013; Chen et al., 2013). At the same time, studies have pointed out serious problems that arise when using QM for bias correction (Hagemann et al., 2011; Themeßl et al., 2012). In particular, the method can alter the raw model projected changes (Themeßl et al., 2012; Maurer and Pierce, 2014). This inflation or deflation of the raw simulated climate change signal exists as an artifact of the stationarity assumption. The impact that the stationarity assumption has on QM bias corrected data is discussed in more detail in Sect. 3.1.

More recently, the standard non-parametric QM method has been adapted to more explicitly preserve the raw modeled climate change signals (Michelangeli et al., 2009; Olsson et al., 2009; Willems and Vrac, 2011; Sunyer et al., 2015; Wang and Chen, 2014; Cannon et al., 2015). Hempel et al. (2013) and Bürger et al. (2013) both used a form of detrended QM (DETQM) that better preserved monthly trends, but the daily values are still subject to the stationarity assumption, which can ultimately result in altering the raw modeled projected change. Extending previous work (e.g., Olsson et al., 2009; Bürger et al., 2013), Cannon et al. (2015) modified the QM method and outlined an approach called quantile delta mapping (QDM). QDM is a break from other typical QM methods insofar as it is not constrained by the stationarity assumption. An example of QM versus QDM is shown in Fig. 1. In the traditional QM method, a raw modeled value is always corrected by the same value of bias or error that is determined by its respective quantile in the calibration period. On the other hand, QDM multiplies observed values by the ratio of the modeled values (period of interest divided by calibration period) at the same quantiles. Our proposed bias correction methodology, SDM, shares some similarities with QDM; however, there are three important distinctions: (1) SDM uses a parametric model instead of a non-parametric one, (2) SDM and QDM handle days with zero rainfall very differently, and (3) SDM more accurately accounts for the differences in the modeled variances, for temperature, between the period of interest and the calibration period.
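The contrast between the two corrections can be sketched with simple rank-based helpers. The function names, the nearest-rank quantile rule, and the multiplicative form of QDM (as used for precipitation) are illustrative assumptions, not the authors' code:

```python
from bisect import bisect_left

def quantile(sorted_vals, tau):
    """Value at ECDF level tau in a sorted sample (nearest rank)."""
    i = min(int(tau * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[i]

def ecdf(sorted_vals, x):
    """Empirical CDF level of x within a sorted sample."""
    return bisect_left(sorted_vals, x) / len(sorted_vals)

def qm_correct(x, obs_s, hist_s):
    # QM: correct x by the obs-minus-model error at x's quantile
    # in the calibration period (stationarity assumption).
    tau = ecdf(hist_s, x)
    return x + (quantile(obs_s, tau) - quantile(hist_s, tau))

def qdm_correct(x, obs_s, hist_s, fut_s):
    # QDM (multiplicative form): scale the observed quantile by the
    # model's own future-to-historical ratio at x's quantile in the
    # future period, so the raw relative change is carried over.
    tau = ecdf(fut_s, x)
    ratio = quantile(fut_s, tau) / quantile(hist_s, tau)
    return quantile(obs_s, tau) * ratio
```

If the model doubles every quantile between the historical and future periods, `qdm_correct` doubles the corresponding observed quantile, by construction preserving the raw relative change.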

Schematic of the quantile mapping versus quantile delta mapping
methodologies. The red lines are the bias corrected values for the future
model. The arrows in each panel illustrate the bias correction of a future
modeled value at ECDF

Independent of downscaling, bias correction with QM is well known to alter
the raw modeled climate change signal (Hagemann et al., 2011; Themeßl
et al., 2012; Brekke et al., 2013; Maurer et al., 2013; Pierce et al., 2013;
Maurer and Pierce, 2014). This alteration of the raw modeled climate change
signal can be attributed to the stationarity assumption, which implies that
the error correction values established in a calibration period can be
applied to any time period within or outside the calibration time period.
These error correction values can differ, in both magnitude and sign, as a
function of quantile. In the context of a warming climate, raw model
projected temperature values will result in QM disproportionately sampling
error correction values from higher quantiles that were established in the
calibration period. Depending on whether the error correction values at the
higher quantiles are greater or lesser than those at lower quantiles, this
will inflate or deflate the raw model projected climate change. For example,
consider a model that, in the calibration period, has an uncorrected (raw) value
of 10

Altering the raw model's climate change signal with QM could be defensible if the stationarity assumption itself were justified. We tested the stationarity assumption by investigating whether the error correction values are independent of the calibration period. To do this, the same daily data in the future time period (2071–2100) were bias corrected via standard QM, separately for each month, using two different calibration periods (1951–1980 and 1976–2005). Figure 2 shows how sensitive QM is to using two different calibration periods to bias correct these same future values. Figure 2a and b show the sensitivity for temperature and precipitation, respectively. There are instances where this sensitivity to the calibration period is nearly as large as the raw model projected mean changes. This illustrates how unstable the error correction values in QM can be. If stationarity were a valid assumption, all the map colors in Fig. 2 would be much closer to gray. The calibration period largely influences the error correction values and, as a result, the stationarity assumption is invalid. As an additional test, we performed the bias correction using a parametric implementation of QM and found the error correction values to be equally sensitive to the chosen calibration period.
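A toy version of this sensitivity test can be run on synthetic data: bias correct the same "future" series with QM calibrated on two different periods and compare the results. The Gaussian "climates" and their parameters below are invented purely for illustration:

```python
import random

random.seed(1)

def qm(obs, hist, fut):
    """Rank-based additive quantile mapping (toy version)."""
    obs_s, hist_s = sorted(obs), sorted(hist)
    n = len(hist_s)
    out = []
    for x in fut:
        r = min(sum(h <= x for h in hist_s), n - 1)
        out.append(x + obs_s[r] - hist_s[r])
    return out

# Two calibration periods drawn from slightly different synthetic climates.
obs_a = [random.gauss(10.0, 2.0) for _ in range(300)]
hist_a = [random.gauss(11.0, 2.5) for _ in range(300)]
obs_b = [random.gauss(10.5, 2.0) for _ in range(300)]
hist_b = [random.gauss(11.2, 2.2) for _ in range(300)]
future = [random.gauss(14.0, 2.5) for _ in range(300)]

corr_a = qm(obs_a, hist_a, future)
corr_b = qm(obs_b, hist_b, future)
mean_diff = abs(sum(corr_a) / 300 - sum(corr_b) / 300)
# If stationarity held, mean_diff would be near zero; in general it is not.
```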

The sensitivity of quantile mapping (QM) to the calibration
period. The same daily data in the future time period (2071–2100) were bias
corrected, separately for each month, using two different calibration
periods (1951–1980 and 1976–2005). Panels

The results shown in Fig. 2 have broad implications for QM. First, it
shows that one cannot confidently correct a specific modeled value with a
specific error correction value. As an example, a raw modeled value of
50 mm can be corrected by

Synthetic data of observations (Obs) and modeled (Mod) data. Each
data set consists of 200 values randomly sampled from the same gamma
distribution with shape parameter equal to 0.8 and scale parameter equal to
12.0. Panel

In a non-parametric method such as QM, there is an implicit assumption that
each respective quantile is equally probable. In other words, the largest
observed and modeled quantiles both correspond to the same ECDF value. Therefore, is it safe to
assume that events that share the same ECDF value are equally probable? To
test this assumption, synthetic precipitation data were used to evaluate if
events of equal quantiles can be treated as being equally probable. Figure 3
shows two distributions (referred to as Obs and Mod), each consisting of 200
values randomly sampled from the same gamma distribution with shape
parameter equal to 0.8 and scale parameter equal to 12.0. Figure 3a shows
the empirical distributions, whereas Fig. 3b shows the fitted distributions
of the same data. In this example, the largest quantiles from Obs and Mod
are 32.8 and 57.3. In a non-parametric method like standard QM, the
correction applied to the largest Mod quantile (to bias correct this
quantile) is simply the difference between the largest quantile from Mod and
Obs (depicted by the blue line with arrows). The absolute value of this
difference is 24.5. When gamma distributions are fit to the empirical Obs
and Mod data, it is found that the largest quantile of Obs (with the value
of 32.8) corresponds to a fitted cumulative distribution function (CDF) of 0.993, while the largest quantile of
Mod (with the value of 57.3) corresponds to a fitted CDF of 0.998. Using the
fitted distributions, one can find the corresponding Mod value at the same
expected Obs CDF of 0.993. In this example, that value is 44.8. Then our
expected difference between events of the same expected probability has been
reduced from 24.5 to 12.0 (where 24.5

Observed (Obs) and modeled (Mod) values for distribution sizes
of 100 and 10 000. As in Fig. 3, the values are randomly sampled from the
same gamma distribution with shape parameter equal to 0.8 and scale
parameter equal to 12.0. Panels

Figure 4 shows the results of 1000 randomly generated Obs and Mod samples for distribution sizes of 100 and 10 000. Again, the values are randomly sampled from the same gamma distribution with shape parameter equal to 0.8 and scale parameter equal to 12.0. Figure 4a and d show the possible scenarios of ECDFs for the different distribution sizes. Figure 4b and e show the counts of the absolute differences between the extreme values using the non-parametric method (blue line) and the parametric method (green line). Similarly, Fig. 4c and f show the counts of the absolute differences, averaged over the entire distribution, between the values using the non-parametric and the parametric methods. One can clearly see the usefulness of a parametric method over a non-parametric method as the sample size increases. With a sample size of 10 000 values, there can still be large differences between the most extreme quantiles of each distribution. However, with larger sample sizes, we are converging on identical distributions for Obs and Mod. Therefore, the magnitude of the differences between extreme quantiles is simply due to sampling noise. On the other hand, if a distribution is fit to the data first, then one gains information regarding the expected probabilities of specific events taking place with respect to the underlying distribution. This allows the parametric method to reduce the error associated with sampling noise relative to the non-parametric method.
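The sampling-noise argument for the non-parametric pairing of extremes can be reproduced on synthetic data with the same gamma parameters as in Figs. 3 and 4 (the trial count and random seed are arbitrary choices of ours):

```python
import random

random.seed(42)
SHAPE, SCALE, N, TRIALS = 0.8, 12.0, 200, 1000

diffs = []
for _ in range(TRIALS):
    obs = [random.gammavariate(SHAPE, SCALE) for _ in range(N)]
    mod = [random.gammavariate(SHAPE, SCALE) for _ in range(N)]
    # Non-parametric pairing: treat the two sample maxima as "equally
    # probable" and take their absolute difference, as standard QM
    # implicitly does for the largest quantile.
    diffs.append(abs(max(obs) - max(mod)))

mean_gap = sum(diffs) / TRIALS
# Although both samples come from the identical distribution, the
# implied extreme-quantile "correction" is routinely far from zero.
```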

Split-sample or cross-validation tests are commonly used to validate how well bias correction methods perform under changing conditions (Maurer and Pierce, 2014; Klemeš, 1986; Wang and Chen, 2014; Piani et al., 2010). Typically, a period is chosen for calibrating the bias correction parameters and then different bias correction methods' performances are compared in a validation period. For example, bias correction parameters might be fit in a calibration period such as 1951–1980. Then, the performances of different methods are evaluated by comparing the bias corrected data to observations for the period 1981–2010. Unfortunately, split-sample tests that directly compare observations to bias corrected model data, for a time period outside of calibration, cannot distinguish between bias correction methodological performance and the performance of the underlying raw model. In this study, we define model performance by how well the raw model simulates changes to the statistical distribution of a climate variable with respect to observed changes to the distribution.

Conflation of apparent bias correction skill (using QM) and model
performance. The data are daily temperature values for the month of June for
the period 1981–2010 after calibrating on the period 1951–1980. Panel

The following example is used to illustrate how split-sample tests are not
suitable to validate bias correction methods. Standard QM was used to bias
correct daily values of June temperature for the period 1981–2010
(validation period) after calibrating on the period 1951–1980. In this
example, evaluation of performance was measured by the mean absolute error
(MAE) between observed and bias corrected quantiles. Lower values of MAE
indicate better performance and reflect better distributional agreement
between the bias corrected and observed values, averaged across all
quantiles. Figure 5a shows a

Next, the differences between the raw modeled changes and the observed
changes (depicted by Fig. 5c) are removed from each grid cell. The copper
scatter in Fig. 5e shows the QM bias corrected data after removing the
temperature model performance error of the mean (1.4

Figure 5 illustrates how a split-sample or cross-validation test does not
distinguish between methodological performance and raw model performance.
Using pseudo-realities (Maraun et al., 2010) could tell us something more
about the robustness of methods in different scenarios, but the performance
of the bias correction method still cannot be separated from how well
individual models simulate raw projected changes relative to the other
models. In our example, we only looked at QM method performance and how this
relates to model performance. Obviously, there will be differences from one
bias correction method to another. However, in a split-sample test, method
and model performance are conflated. Consider a case where a raw model
projects changes across Europe that are 1

Part of the validation procedure must ensure that the observed and modeled distributions in the calibration or historical period are statistically similar (Maraun et al., 2015). However, as shown here, validating bias corrected data in another period outside of the calibration period against observations will not isolate bias correction methodological performance. Model performance will obscure the performance of the bias correction method. Instead, validation (or evaluation) should measure how well the raw modeled projected changes to the entire distribution are captured or preserved by the bias correction method between any two periods.

Previous sections have shown that QM lacks justification for introducing inflation/deflation to the climate change signal (Sect. 3.1). This section introduces a bias correction method named SDM and its performance is compared to standard QM in addition to more recent and similar methods such as DETQM and QDM. Methodological performance is evaluated using raw model projected changes to the leading moments (mean, standard deviation, and skewness) instead of individual quantiles because of our findings concerning the sampling noise associated with extreme values in the distributions (Sect. 3.2).

A new bias correction methodology, called SDM, is proposed in this study. The conceptual framework of the method is quite similar to QDM (Fig. 1). However, as previously mentioned, our method has important differences that will be discussed in more detail in the “Performance” section (Sect. 4.2). The SDM method makes no assumption of stationarity. It scales the observed distribution by raw model projected changes in magnitude, rain-day frequency (for precipitation), and likelihood of events. The scaling changes as a function of the bias correction period. The next two subsections outline the SDM methodology for precipitation and temperature. Similar to other bias correction methods, a pre-screening of appropriate GCMs/RCMs is advised. Bias correction will not, and should not, be expected to fix serious model deficiencies (garbage in – garbage out; e.g., Noguer et al., 1998). There are a few important differences between the implementation of SDM for precipitation and temperature. First, SDM scales the distribution of precipitation by a multiplicative (relative) amount, while temperature is scaled by an additive (absolute) amount. Second, only values of positive precipitation exceeding a specified threshold (e.g., 0.1 mm) are used to build the distributions, while with temperature all values are used. Third, temperature data are first detrended, then bias corrected, and finally the trends are added back in. As a result, the variance is not inflated by temporal trends.

Bárdossy and Pegram (2012) showed that bias corrected data can still have systematic biases at different spatial scales. They illustrated that when raw modeled precipitation is less correlated across grid cells (lower average of its correlation matrix) than observations, the modeled extremes will be systematically underestimated for larger spatial scales (averaging across tens to hundreds of grid cells) with respect to observations. Similarly, an RCM with a higher average correlation matrix, with respect to observations, will overestimate extremes. We agree that this can be problematic for impact studies that rely on spatially distributed data across multiple grid cells (e.g., modeling hydrological extremes). In an effort to properly reflect the statistical properties of the observations in the calibration period at a variety of scales, we advocate re-correlating the data (Bárdossy and Pegram, 2012) prior to implementing SDM.

The SDM methodology for bias correcting daily precipitation is illustrated by the example in Fig. 6. The observed and modeled data are from the same grid cell in the European domain for the month of April. The observed and historical model periods correspond to 1971–2000. The future model period is 2071–2100. The future period can more generally be thought of as any time period one desires to bias correct, and it can be the same period as the raw historical model period. To illustrate the SDM methodology, we have chosen to fit a gamma distribution to the observed and modeled data for all grid cells. However, the gamma distribution will likely not be suitable for all studies involving precipitation, especially with respect to extremes (Papalexiou et al., 2013). Depending on the type of analysis involved (e.g., changes to extreme precipitation, projected changes to the number of days with precipitation above 0.1 mm), the data should be fit with the distribution deemed most appropriate by the user. If a user does choose to fit a different distribution, that distribution can simply take the place of the gamma distribution outlined in the methodology. The SDM methodology for precipitation is now outlined in the following steps.

Illustration of the scaled distribution mapping (SDM) methodology
for precipitation. Panel

Step (1): set a raw modeled minimum precipitation threshold. In this study,
we have used 0.1 mm as a threshold (which is the minimum amount of observed
precipitation). When presenting the results, we additionally look at the
impact of using a larger threshold of 1.0 mm. Any values below the threshold
are set to 0.0 mm. Next, the days with precipitation are separated from days
with no rainfall. Figure 6a shows the sorted values of precipitation for the
observations, the raw historical model, and the raw future model. In this
example, there are 434 observed rain days, 525 raw historical model rain
days, and 593 raw future model rain days. Our expected number of bias
corrected future model rain days,

Step (2): fit gamma distribution parameters (or those of the most appropriate
distribution determined by the user), using maximum likelihood, to the
positive observed precipitation values (Fig. 6b), and the raw historical
and future modeled values (Fig. 6c). The probability density function of
the gamma distribution is f(x) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k), with shape parameter k and scale parameter θ.
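As a simple, dependency-free stand-in for the maximum likelihood fit used here (in practice one would call an MLE routine from a statistics library), the shape and scale parameters can be estimated by the method of moments:

```python
def gamma_moments_fit(values):
    """Method-of-moments estimate of gamma shape (k) and scale (theta).

    A stand-in for the maximum likelihood fit described in the text:
    for a gamma distribution, mean = k * theta and var = k * theta**2,
    so k = mean**2 / var and theta = var / mean.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    shape = mean * mean / var   # k
    scale = var / mean          # theta
    return shape, scale
```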

Step (3): calculate the scaling between the fitted raw future model
distribution and the fitted raw historical distribution at all of the CDF
values corresponding to the precipitation events of the raw future model
time series. The scaling is calculated as

Step (4): calculate the recurrence intervals for the three sorted arrays of
positive precipitation. The recurrence interval array, RI, is calculated as

Step (5): find the scaled or adjusted RI for the raw future model. This is
calculated as

Step (6): the initial array of bias corrected values can now be calculated as

Step (7): as a final step, the bias corrected values for positive rain days are placed back into the modeled time series in the correct temporal locations. Consider in this example that the maximum raw modeled precipitation amount fell on 28 April 2078. Then, the largest bias corrected value will be reinserted back into that day. This is applied to the rest of the 489 positive values of precipitation. Originally, there were 593 raw modeled future rain days; therefore, the smallest 103 raw modeled values will have no precipitation after bias correction. Similar to this example, GCMs/RCMs more often overestimate the frequency of rain days (Leander et al., 2008). However, in the case of the model underestimating the rain-day frequency, the SDM method is currently not adjusting the original raw modeled rain-day frequency. It should be kept in mind, however, that even if the model underestimated the frequency, the impact on the distribution is significantly less. In this example, if the raw model had 10 % fewer rain days than observations, that would have only translated to a 0.4 % reduction in total precipitation. This is due to the fact that the method is preferentially filling the largest events first.
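The reinsertion logic of step (7) might be sketched as follows. The function name and data layout are our own assumptions; only the behavior (preserving temporal locations and preferentially filling the largest events first) follows the text:

```python
def reinsert(raw_fut, corrected_sorted):
    """Place bias corrected wet-day values back into the time series.

    raw_fut: raw future daily precipitation (zeros on dry days).
    corrected_sorted: bias corrected positive amounts; may be fewer
    than the raw number of wet days when the expected rain-day
    frequency is reduced. The largest corrected value goes to the day
    with the largest raw value, and so on; leftover small raw wet days
    become dry (0.0), so the largest events are filled first.
    """
    out = [0.0] * len(raw_fut)
    wet_idx = [i for i, v in enumerate(raw_fut) if v > 0.0]
    # days ordered by raw amount, largest first
    wet_idx.sort(key=lambda i: raw_fut[i], reverse=True)
    for day, value in zip(wet_idx, sorted(corrected_sorted, reverse=True)):
        out[day] = value
    return out
```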

In the case that the historical model period and the bias correction period of interest completely overlap, the bias corrected data will be exactly the same as the observed distribution. There would be no difference between the distributions in magnitude, likelihood, or rain-day frequency, and therefore the observed distribution undergoes no scaling.

The SDM methodology for temperature is outlined in the following steps. Again, the future period is a general representation of any time period one desires to bias correct, and it can be the same period as the raw historical model period.

Step (1): detrend the raw modeled and observed time series in order to get a more accurate measure of the natural variability (we have used a linear trend, though any trend line could be used). These trends are added back in at the end, but until then, all subsequent steps use the detrended time series.

Step (2): fit a normal probability distribution function to the detrended observed, raw historical modeled, and the raw future modeled time series. For a normal distribution, the fitted parameters are simply the empirical mean and standard deviation. Next, using these fitted normal distributions, find the corresponding CDF values for the temperature events that occurred in the three time series. As with precipitation, set an upper and lower threshold for the CDF values (e.g., 0.0001, 0.9999). Again, imprecise rounding can lead to the CDF function providing a value of 1.0 or 0.0, which corresponds to an infinite temperature value.

Step (3): calculate the scaling between the fitted raw future model
distribution and the fitted raw historical distribution at each probability
of the events taking place in the raw future model time series. The scaling
is then calculated as

Step (4): next, calculate the recurrence intervals for the three sorted arrays
of temperature with

Step (5): find

Step (6): the initial array of bias corrected values can now be calculated as

Step (7): reinsert the bias corrected values,
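The explicitly stated ingredients of the temperature procedure (linear detrending, a normal fit via the empirical mean and standard deviation, and clipped CDF values) can be sketched as follows; the SDM scaling steps themselves are omitted, and all function names are our own:

```python
from statistics import NormalDist, mean

def linear_detrend(series):
    """Least-squares linear detrend; returns (residuals, slope, intercept).

    The trend removed here is added back in after bias correction
    (step 1 and step 7 of the temperature methodology).
    """
    n = len(series)
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(series)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series)) \
        / sum((ti - t_bar) ** 2 for ti in t)
    intercept = y_bar - slope * t_bar
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, series)]
    return resid, slope, intercept

def fit_normal(detrended):
    """Fitted normal: empirical mean and (population) standard deviation."""
    mu = mean(detrended)
    sigma = (sum((v - mu) ** 2 for v in detrended) / len(detrended)) ** 0.5
    return NormalDist(mu, sigma)

def clipped_cdf(dist, x, lo=0.0001, hi=0.9999):
    """CDF value clipped away from 0/1 so the inverse CDF stays finite."""
    return min(max(dist.cdf(x), lo), hi)
```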

The SDM method presented in Sect. 4.1.1 and 4.1.2 attempts to best preserve the raw model projected changes to different moments of the distribution. However, the temporal evolution of the climate change signal might not be captured. Consider the case where SDM is used to bias correct January temperature values for the period 2011–2100 using the calibration period 1971–2000. Then, the changes to the bias corrected distribution of 2011–2100 versus 1971–2000 will be very similar to the raw model projected changes to the distribution. On the other hand, investigating a temporal subset of the bias correction period can yield undesirable results. For example, the mean change between 2011–2040 and 1971–2000 might not be as close for the bias corrected and the raw data. If one desires to have the climate change signal properly preserved across a variety of timescales, the SDM method must be discretized into smaller blocks. For this study, the authors used 30-year periods with a 10-year sliding window in order to bias correct the middle 10 years. We began by bias correcting the period 2011–2020 using the period 2001–2030 as our period of interest (future period) and 1971–2000 as our calibration period. Next, we bias corrected 2021–2030 using the period 2011–2040 (period of interest) along with the same calibration period. With no modeled data beyond 2100, the period 2091–2100 used 2081–2100 as the period of interest. A user can adjust the length of the period of interest, the number of years to save, and the length of the sliding window to more or less strongly follow the raw modeled temporal evolution of climate change.
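The sliding-window schedule described above might be generated like this (a sketch under our own naming assumptions; the clamping of the final window follows the 2091–2100 example in the text):

```python
def sliding_windows(first_year=2011, last_year=2100, window=30, save=10):
    """Schedule of (period_of_interest, bias_corrected_years) blocks.

    30-year windows slid by 10 years, keeping the middle 10 years of
    each; the final window is truncated at the last modeled year, so
    2091-2100 uses 2081-2100 as its period of interest.
    """
    half = (window - save) // 2
    schedule = []
    for start in range(first_year, last_year + 1, save):
        poi = (start - half, min(start - half + window - 1, last_year))
        saved = (start, min(start + save - 1, last_year))
        schedule.append((poi, saved))
    return schedule
```

For the defaults this yields ((2001, 2030), (2011, 2020)) as the first block and ((2081, 2100), (2091, 2100)) as the last, matching the examples in the text.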

The amount of inflation (or deflation) introduced to the raw model
projected change to the mean by the SDM and QM methods. The mean climate
change was evaluated between the periods 2071–2100 and 1971–2000. Panels

Figure 7 illustrates the amount of inflation (or deflation) that was
introduced to the raw model projected change to the mean by the SDM and
standard QM methods. The mean climate change was evaluated between the
periods 2071–2100 and 1971–2000. Figure 7a and b show the difference, by
month, between the raw mean temperature changes and the bias corrected mean
temperature changes when using SDM and QM, respectively. Similarly, Fig. 7c and d show
the same but for relative change to precipitation. For both
temperature and precipitation, SDM has minimal inflation to the raw model
projected mean change. In contrast, using QM leads to inflation greater than
1

Same as Fig. 7, but depicting the inflation of the standard deviation for SDM versus QM.

The temporal evolution of the SDM and QM performance, for all grid cells, is
illustrated for temperature in Fig. 9. As discussed in Sect. 4.1.3,
the authors implemented SDM using 30-year periods with a 10-year sliding
window in order to bias correct the middle 10 years. The amount of
inflation/deflation to the climate change signals is illustrated for the
mean, standard deviation, skewness, and the trend. Each colored grid cell
in the figure depicts the spatial MAE between the raw
model and bias corrected changes for all grid cells. For example, consider
the lower left grid cell situated in the SDM row and mean column. This grid
cell shows the spatial MAE between the raw modeled and bias corrected mean
changes between the periods 2071–2100 and 1971–2000 for the month of January
(it is the spatial MAE of the January panel of Fig. 7a). What is most
noticeable is how the QM's alteration (inflation/deflation) of the climate
change signal increases as a function of the projected time period. When the
projected period is furthest from the calibration period (2071–2100), the
alteration to the leading three moments are the greatest. In contrast, SDM
is seen to outperform QM in its ability to better preserve the raw projected
changes. Furthermore, the performance of SDM does not degrade as a function
of the projected time period. Figure 10 shows the same as Fig. 9, but for
precipitation. Again, SDM performs much better in preserving the raw
projected changes. This is especially true for projected changes to the mean
and the standard deviation. It should be noted that all methods perform
worse for precipitation in the summer months. This can be explained by some
regions (e.g., Spain) having very few days with precipitation, which cannot
be fit well by either a parametric or a non-parametric method. Using a

Performance of SDM and QM for bias correcting temperature with varying outlook periods (i.e., 2011–2040, 2021–2050, …). The color bars correspond to the mean absolute error (MAE) between the raw model and bias corrected changes for the leading three moments of the distribution in addition to trends. The lower left grid cell situated in the SDM row and “Mean” column is the spatial MAE between the raw modeled and bias corrected mean changes between the periods 2071–2100 and 1971–2000 for the month of January (it is the MAE over all of Europe calculated from Fig. 7a).

Same as Fig. 9, but for precipitation.

Same as Fig. 9, but compares the temperature performances of SDM to the more recent and similar methods of QDM and DETQM.

Figure 11 compares the temperature performances of SDM to the more recent and similar methods of QDM and DETQM. SDM performs best at preserving the raw projected changes for the leading three moments of the distribution. SDM and QDM perform equally well in preserving changes to the mean, but SDM yields smaller MAEs for the standard deviation and skewness. This can be attributed to one of the main differences between the two methods: QDM shifts the observed distribution by the absolute differences between modeled quantiles (future minus past), but this additive adjustment does not properly scale the higher moments of the distribution. In contrast, SDM applies the rightmost term, (
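This distinction can be illustrated with a toy sketch using synthetic data (an illustration of the additive versus multiplicative idea, not either method's full implementation; the distribution parameters and variable names such as `spread_ratio` are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: the model projects a +2 degree mean shift and a 50 %
# increase in spread, while the observations are narrower than the
# modeled past period.
mod_past = rng.normal(10.0, 2.0, 50000)
mod_fut = rng.normal(12.0, 3.0, 50000)
obs = rng.normal(9.0, 1.5, 50000)

q = np.linspace(0.005, 0.995, 199)
obs_q = np.quantile(obs, q)

# Additive quantile deltas (QDM-like for temperature): add the model's
# absolute per-quantile change to the observed quantiles.
additive = obs_q + (np.quantile(mod_fut, q) - np.quantile(mod_past, q))

# Multiplicative scaling of anomalies (the flavor of SDM's scaling term):
# rescale observed anomalies by the model's relative change in spread.
spread_ratio = mod_fut.std() / mod_past.std()
mean_change = mod_fut.mean() - mod_past.mean()
scaled = obs.mean() + mean_change + (obs_q - obs_q.mean()) * spread_ratio

print(additive.std() / obs_q.std())  # ~1.67: relative spread change distorted
print(scaled.std() / obs_q.std())    # ~1.50: model's relative change preserved
```

Because the observed spread (1.5) is smaller than the modeled past spread (2.0), adding the model's absolute widening inflates the observed relative change in spread (to about 1.67 instead of the modeled 1.5), whereas scaling the anomalies reproduces it.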

Same as Fig. 10, but compares the precipitation performances of SDM to the more recent and similar methods of QDM and DETQM.

Figure 12 compares the performances of the same three methods, but for precipitation. Again, SDM outperforms QDM and DETQM. Moreover, the average MAEs for Figs. 11 and 12 are statistically significantly smaller for SDM (

The impact of using different precipitation thresholds on
methodological performance. Each bar is the average MAE across all grid
cells, months and outlook periods (e.g., blue bar in panel

Additionally, we investigated the impact of using a different precipitation threshold. A precipitation amount of 0.1 mm is very small and will have no noticeable impact for many applications. In those cases, it may be more appropriate to use a larger precipitation threshold. Figure 13 shows the average performance of each method when using precipitation thresholds of 0.1 and 1.0 mm. The performance of SDM improves with the larger threshold, while the performances of the other methods remain approximately the same. The improvement of SDM with the 1.0 mm threshold can be explained by the larger threshold leading to a better fit of the distribution. For observations, averaged across all grid cells, we found that precipitation amounts of less than 1.0 mm (and
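The effect of the threshold on the fit can be sketched with synthetic data (a hypothetical illustration using a simple method-of-moments gamma fit, not the paper's fitting procedure; the 30 % wet-day fraction and gamma parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily precipitation: 30 % wet days drawn from a gamma
# distribution, the rest dry or drizzle days below 0.5 mm.
n = 20000
is_wet = rng.random(n) < 0.3
precip = np.where(is_wet, rng.gamma(0.8, 6.0, size=n), rng.random(n) * 0.5)

fits = {}
for threshold in (0.1, 1.0):
    wet_days = precip[precip >= threshold]
    # Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean.
    mean, var = wet_days.mean(), wet_days.var()
    fits[threshold] = {"wet_frac": wet_days.size / n,
                       "shape": mean**2 / var,
                       "scale": var / mean}

for t, f in fits.items():
    print(f"threshold={t} mm: wet fraction={f['wet_frac']:.2f}, "
          f"gamma shape={f['shape']:.2f}, scale={f['scale']:.2f}")
```

With the 0.1 mm threshold, drizzle amounts dominate the wet-day sample and pull the fitted parameters far from those of the distribution of meaningful precipitation amounts; the 1.0 mm threshold excludes them.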

Bias correction methods are used extensively in impact assessment studies (Ines and Hansen, 2006; Muerth et al., 2013; Teng et al., 2015). The application of these methods, however, is not without controversy (Ehret et al., 2012). A number of important questions require consideration: (1) does independently applying bias correction to different meteorological variables (e.g., separately to precipitation and temperature) adversely alter the thermodynamically consistent spatiotemporal fields provided by climate models? (2) Do bias correction methods avoid pushing the corrected values beyond physically realistic limits? (3) Can GCMs/RCMs with large biases be reliable in their projections of climate change? (4) How can we avoid falsely treating substantial model deficiencies as bias and correcting them as such? These are difficult questions, and more reflection and investigation are required before we arrive at indisputable answers. In any case, for the foreseeable future, scientists will continue to use bias correction methods for impact assessment studies.

Statistical bias correction methods vary considerably and can have a large influence on the expected regional impacts of climate change. Multiple studies have previously concluded that QM is one of the better existing bias correction methods (Gudmundsson et al., 2012; Teutschbein and Seibert, 2012, 2013; Chen et al., 2013). However, our analysis highlighted two issues that challenge this conclusion. First, we demonstrated that the stationarity assumption is invalid and, as a result, the climate change signal cannot be justifiably altered using QM. Second, split-sample and cross-validation tests do not isolate the performance of the bias correction methods themselves; their results are conflated with how well the raw model(s) simulate the observed changes to the leading moments of the distribution. In light of these issues, the performance of a bias correction method should be validated on how well it preserves the raw model projected changes across the entire distribution.
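This validation criterion can be written as a simple metric (a minimal sketch of the idea, not the paper's exact evaluation; `change_preservation_mae` and the toy data are hypothetical):

```python
import numpy as np

def change_preservation_mae(obs, mod_hist, mod_fut, corr_fut,
                            q=np.linspace(0.01, 0.99, 99)):
    """MAE between the raw model projected quantile changes and the changes
    implied by the bias corrected future relative to the observations.
    Smaller values mean the climate change signal is better preserved."""
    raw_change = np.quantile(mod_fut, q) - np.quantile(mod_hist, q)
    corr_change = np.quantile(corr_fut, q) - np.quantile(obs, q)
    return float(np.mean(np.abs(raw_change - corr_change)))

# Toy check: a correction that adds the model's projected change to the
# observations preserves the signal; one that inflates it does not.
rng = np.random.default_rng(2)
obs = rng.normal(9.0, 1.5, 10000)
mod_hist = rng.normal(10.0, 2.0, 10000)
mod_fut = mod_hist + 2.0                 # model projects a uniform +2 change

mae_preserving = change_preservation_mae(obs, mod_hist, mod_fut, obs + 2.0)
mae_inflating = change_preservation_mae(obs, mod_hist, mod_fut, obs + 3.0)
print(mae_preserving, mae_inflating)     # 0.0 versus 1.0
```

Because `np.quantile` is shift-equivariant, the signal-preserving correction scores an MAE of zero here, while the inflated one scores exactly the size of its inflation.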

In this study, we presented the SDM bias correction methodology, which scales the observed distribution by raw model projected changes in magnitude, rain-day frequency (for precipitation), and the likelihood of events. SDM was evaluated and found to outperform traditional QM as well as the more recent and conceptually similar QDM and DETQM methods. We advocate using a bias correction method, such as SDM, that scales the observed distribution by the simulated changes across the modeled distribution. As a result, one need not rely on the invalid stationarity assumption.

The data are available at ESGF (e.g.,

The authors declare that they have no conflict of interest.

The work was supported by the project ÖKS15,
funded by the Austrian Federal Ministry of Agriculture, Forestry,
Environment and Water Management. The authors had additional support from
the United States Bureau of Reclamation, and the federal state government of
Austria with the projects CHC-FloodS (id: B368584) and DEUCALION II (id:
B464795), both funded by the Austrian Klima- und Energiefonds through the
Austrian Climate Research Program (ACRP). We acknowledge the World Climate
Research Programme's Working Group on Regional Climate, and the Working
Group on Coupled Modelling, the former coordinating body of CORDEX and responsible panel for CMIP5. We also thank the climate modeling groups (Royal Netherlands Meteorological Institute (KNMI)) for producing and making available their model output. We also acknowledge the Earth System Grid Federation infrastructure, an international effort led by the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison, the European Network for Earth System Modelling, and other partners in the Global
Organisation for Earth System Science Portals (GO-ESSP). We acknowledge the
E-OBS data set from the EU-FP6 project ENSEMBLES
(