Bias correction can modify climate model simulated precipitation changes without adverse effect on the ensemble mean

When applied to remove climate model biases in precipitation, quantile mapping can in some settings modify the simulated difference in mean precipitation between two eras. This has important implications when the precipitation is used to drive an impacts model that is sensitive to changes in precipitation. The tendency of quantile mapping to alter model-predicted changes is demonstrated using synthetic precipitation distributions and elucidated with a simple theoretical analysis, which shows that the alteration of model-predicted changes can be controlled by the ratio of model to observed variance. To further evaluate the effects of quantile mapping in a more realistic setting, we use daily precipitation output from 11 atmospheric general circulation models (AGCMs), forced by observed sea surface temperatures, over the conterminous United States to compare precipitation differences before and after quantile mapping bias correction. The effectiveness of the bias correction is not assessed, only its effect on precipitation differences. The change in seasonal mean (winter, DJF, and summer, JJA) precipitation between two historical periods is compared to examine whether the bias correction tends to amplify or diminish an AGCM’s simulated precipitation change. In some cases the trend modification can be as large as the original simulated change, though the areas where this occurs vary among AGCMs, so the ensemble median shows smaller trend modification. Results show that quantile mapping improves the correspondence with observed changes in some locations and degrades it in others. While not representative of a future where natural precipitation variability is much smaller than that due to external forcing, these results suggest that at least for the next several decades the influence of quantile mapping on seasonal precipitation trends does not systematically degrade projected differences.


Introduction
In translating simulated precipitation projections produced by general circulation models (GCMs) for local and regional climate impact studies, a process of downscaling is needed (e.g., Christensen et al., 2007; Fowler et al., 2007; Murphy, 1999). While "perfect-prognosis" downscaling estimates fine-scale projections by assuming the predictors are realistically simulated (Eden et al., 2012), any "model output statistics" (MOS, Glahn and Lowry, 1972) approach by design includes some form of bias correction to remove the time-invariant GCM biases, allowing the signal, or change, simulated by the GCM to be isolated to some degree from the systematic errors. This is critical in applications such as hydrology, where runoff is a nonlinear function of precipitation, and so is highly sensitive to model biases.
A common method for bias correction is quantile mapping (QM), which has been shown to be an effective method for removing some GCM biases at relatively little computational expense (Li et al., 2010; Maraun et al., 2010; Panofsky and Brier, 1968; Piani et al., 2010; Themeßl et al., 2011; Wood et al., 2004). This method has been employed in creating several widely used data sets of downscaled GCM output for the United States and global land areas (Girvetz et al., 2009; Maurer et al., 2014). The use of these data sets in hundreds of studies, and the extensive application of QM by many others, has led to recent efforts to study some of the assumptions and effects of QM bias correction (Maraun, 2012, 2013; Maurer et al., 2013).
One important effect of QM is that it can change the GCM trend, so that the raw GCM simulated change is modified during the bias correction process, an effect that can be large relative to other sources of uncertainty such as variability among GCMs (Brekke et al., 2013; Hagemann et al., 2011; Maraun, 2013; Pierce et al., 2013; Themeßl et al., 2011). This has raised concerns regarding the effect of modifying the precipitation change simulated by GCMs, especially for water-constrained regions where climate adaptation plans hinge on projected changes in water supply (Barsugli, 2010).
In this paper we examine the effect of QM on simulated precipitation changes between two historic periods, and focus on the question of whether the simulated changes are systematically altered by QM.
While historic GCM simulations include the climatic response to forcings such as changes in atmospheric greenhouse gas concentrations, solar variability, etc., they are unsynchronized with historic natural variability (Eden et al., 2012). This natural, or internal, variability of precipitation can be dominant even at timescales as long as 50 yr (Deser et al., 2012; Maraun et al., 2010), and may play a substantial role in GCM variability in future projections through the mid-21st century (Hawkins and Sutton, 2011). Thus, the differences in a regional precipitation change between two periods in a GCM's historic simulation compared to the observed change result from both GCM biases in sensitivity to external forcing and the fact that natural variability is not synchronized with the observed record. Only the former represents a bias in the GCM. To lessen this effect, this study uses model output contributed as part of the Atmospheric Model Intercomparison Project (AMIP) experiment. These AMIP runs are performed with an atmospheric general circulation model (AGCM) on which observed sea surface temperatures and sea ice are imposed, with the same greenhouse gas concentrations as the historical simulations, so the simulated natural variability is more closely tied to observations. This provides a test where the effects of unsynchronized low frequency natural variability between the models are diminished relative to unconstrained historic runs. The improved representation of trends in AMIP simulated precipitation, as compared to unconstrained historical runs, has been demonstrated (Hoerling et al., 2010).
In this study we do not separate the different sources of variability, but apply a QM bias correction as it is typically done, where the QM recognizes the difference between a simulated and observed variable (calling the difference "bias"), but is blind to the source of the difference. As the sources of this aggregate "bias" change in the future, for example, when the precipitation trends forced by increased atmospheric greenhouse gas concentrations dominate regional precipitation variability, it is conceivable that the effect of QM on the simulated trends may change. It is also possible that the relative importance of different mechanisms driving regional precipitation (e.g., large-scale circulation, orographic enhancement, convective storms) will change in the future (Cloke et al., 2013; Maraun et al., 2010), altering the climate model biases and ultimately the effect of QM on trends. Thus, the findings from this experiment should be limited to the historic period and the next few decades, when natural precipitation variability constitutes a similar proportion of the variability as over the three most recent decades.
As noted by Eden et al. (2012), techniques such as QM cannot correct for certain types of biases, such as GCM errors in large-scale circulation producing storm tracks very different from observations. Thus, Eden et al. (2012) suggest that QM only be applied to the portion of the bias due to climate model parameterization or orography; to apply QM to the aggregate bias as done here (and in most applications of QM) can result in less robust bias removal. It should, however, be emphasized that this study does not examine the effectiveness of QM at reducing differences between observed and simulated precipitation, but only its effect on mean precipitation changes over multi-decadal timescales. This experiment examines whether there are coherent modifications induced by QM to the simulated precipitation changes, and if so, whether they might have a tendency to improve or degrade the projected changes.

Methods and data
As an observational baseline, we used the daily precipitation dataset of Livneh et al. (2013), which has a spatial extent of the conterminous United States, a spatial resolution of 1/16° (approximately 6 km), and includes the period 1915-2011. The period from 1979 (the beginning of the AMIP model output) to 2005 was aggregated to a 1° spatial resolution for this bias correction exercise, which is a typical spatial resolution used when bias correcting GCMs (e.g., Li et al., 2010; Wood et al., 2004). The 1° spatial scale was selected here to correspond to a scale finer than the highest resolution climate model used in this study. We included only those 1° cells where at least 25 % of the area was land area included in the Livneh et al. (2013) data set.
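The aggregation and land-fraction screening described above can be sketched as follows. This is a minimal illustration and not the processing code used for the study: it assumes each 1° cell is covered by a 16 × 16 block of 1/16° cells, that cells without data are marked `None`, and that the land-area fraction can be approximated by the fraction of fine cells that hold data. The function name `aggregate_block` is hypothetical.

```python
from statistics import fmean


def aggregate_block(block, min_land_fraction=0.25):
    """Average one block of fine-grid cells into a single coarse-cell value.

    `block` is a list of rows; cells without observations (ocean or outside
    the data set) are None. Returns None when too little of the block is land.
    """
    cells = [value for row in block for value in row]
    valid = [value for value in cells if value is not None]
    # Approximate the land-area fraction by the fraction of cells with data
    # (an assumption; true cell areas vary slightly with latitude).
    if len(valid) / len(cells) < min_land_fraction:
        return None
    return fmean(valid)


# A 16 x 16 block in which exactly one quarter of the cells hold data.
example_block = [[2.0] * 16 for _ in range(4)] + [[None] * 16 for _ in range(12)]
print(aggregate_block(example_block))
```

A block with exactly 25 % valid cells is retained, matching the "at least 25 %" criterion in the text.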
We obtained simulated daily precipitation from the historical AMIP runs for 11 AGCMs, listed in Table A1, from the CMIP5 multimodel ensemble archive (Taylor et al., 2012). For all of the AGCMs we used the run identified as r1i1p1, with the exception of GISS-E2-R for which we used r6i1p1 since that had the available variables and periods for this study. From the CMIP5 AMIP runs we extracted the 1979-2005 period and bilinearly interpolated the data onto the same 1° grid as the observations. QM is then applied (independently) to each 1° grid cell in the domain. QM is extensively discussed elsewhere (e.g., Gudmundsson et al., 2012; references cited above) and only a brief summary is presented here. QM bias correction is an empirical statistical technique that matches the quantile of an AGCM simulated value to the observed value at the same quantile. The quantiles are determined by sorting AGCM output and observations for the same historical base (or calibration) period, and constructing cumulative distribution functions (CDFs) for each. We used a version of QM bias correction essentially following Maurer et al. (2010), with one variation. Maurer et al. considered each month independently, so that for January a 15 yr period would have a CDF defined by 31 days × 15 yr = 465 points. One modification for this application is that, to avoid abrupt inconsistencies between months, we used a moving 31 day window centered on each day, producing a separate set of CDFs for each day of year (Dobler et al., 2012; Thrasher et al., 2012). This method employs a nonparametric quantile mapping; that is, there is no fitting of a theoretical probability distribution to the data in creating the CDFs. While both parametric and nonparametric approaches are widely used in QM, nonparametric methods have shown higher skill in reducing systematic errors in modeled precipitation (Gudmundsson et al., 2012).
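The nonparametric mapping and the moving 31 day window can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation used here: the function names are hypothetical, a 365 day year is assumed, probabilities are interpolated linearly between sorted sample values, and values outside the training range are clamped to the observed extremes rather than extrapolated.

```python
from bisect import bisect_right


def window_days(day_of_year, half_width=15, year_length=365):
    """Days of year in a moving window centered on `day_of_year` (wraps around)."""
    return [(day_of_year - 1 + offset) % year_length + 1
            for offset in range(-half_width, half_width + 1)]


def empirical_cdf(sorted_sample, value):
    """Non-exceedance probability of `value`, interpolated between sorted points."""
    if value <= sorted_sample[0]:
        return 0.0
    if value >= sorted_sample[-1]:
        return 1.0
    # Here sorted_sample[i] <= value < sorted_sample[i + 1].
    i = bisect_right(sorted_sample, value) - 1
    frac = (value - sorted_sample[i]) / (sorted_sample[i + 1] - sorted_sample[i])
    return (i + frac) / (len(sorted_sample) - 1)


def empirical_quantile(sorted_sample, prob):
    """Value at probability `prob`, interpolated between sorted points."""
    pos = prob * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    return sorted_sample[lo] + (pos - lo) * (sorted_sample[hi] - sorted_sample[lo])


def quantile_map(value, model_train, obs_train):
    """Replace a model value by the observed value at the same empirical quantile."""
    prob = empirical_cdf(sorted(model_train), value)
    return empirical_quantile(sorted(obs_train), prob)


# Pooling the 31 days around 1 January over 15 years gives 31 x 15 = 465 values.
print(len(window_days(1)) * 15)
```

In practice `model_train` and `obs_train` would hold the training-period values pooled over the days returned by `window_days` for the day being corrected.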
The period 1979-1993 is used to train the QM, which is then applied to 1994-2005. The difference in precipitation between 1994-2005 and 1979-1993 is assessed both before and after bias correction. We compared the raw interpolated AGCM (raw) and the bias corrected (BC) shifts relative to observations (obs) in precipitation between the two periods for winter (DJF) and summer (JJA). We used a difference in daily precipitation, in millimeters, as a metric, for example:

$$\Delta P_x = \overline{P}_{x,\,1994\text{-}2005} - \overline{P}_{x,\,1979\text{-}1993}, \tag{1}$$

where the subscript x is either obs, raw or BC for observations, raw AGCM, or bias corrected AGCM precipitation, and the overbar indicates a mean for the period. To quantify the effect of the BC on the precipitation change between the two periods, we used a trend modification index, TM, defined as

$$\mathrm{TM} = \left| \Delta P_{\mathrm{BC}} - \Delta P_{\mathrm{obs}} \right| - \left| \Delta P_{\mathrm{raw}} - \Delta P_{\mathrm{obs}} \right|, \tag{2}$$

where vertical bars are the absolute value. This index has the property of having values greater than 0 where the bias correction degrades the correspondence between the climate model and observed precipitation changes. Equation (2) emphasizes that we examine changes in terms of differences rather than ratios (or fractions).
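The two metrics can be computed as in the sketch below. It assumes that TM is the absolute error of the bias corrected change minus the absolute error of the raw change, which is consistent with the stated property that positive values indicate degradation. The function names and the sample numbers are hypothetical.

```python
from statistics import fmean


def mean_change(later_period, earlier_period):
    """Difference in period-mean daily precipitation (mm) between two periods."""
    return fmean(later_period) - fmean(earlier_period)


def trend_modification(change_bc, change_raw, change_obs):
    """TM index: positive where bias correction moves the change away from obs."""
    return abs(change_bc - change_obs) - abs(change_raw - change_obs)


# Hypothetical changes (mm per day) at one grid cell.
tm = trend_modification(change_bc=0.9, change_raw=0.5, change_obs=0.4)
print("degraded" if tm > 0 else "improved or unchanged")
```

Because the index is built from differences of absolute errors, it is expressed in the same units as the precipitation change itself.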

Results and discussion
Figure 1 presents an illustration of one way in which quantile mapping can change the trend or shift simulated by a GCM. The plot uses a synthetic data set of daily precipitation generated using a gamma distribution, similar to Piani et al. (2010). The data for synthetic observations have a mean of 30, as do the data for synthetic GCM for the overlapping historic period, so the GCM shows no bias in mean daily precipitation for the overlapping historic period, but is given a −30 % observations. The original change at the 80th percentile is 15.6, and the post-BC change is 21.2; at the 20th percentile the original change is 7.4 and the post-BC change is 8.6.
Figure 2 continues with the synthetic data from Fig. 1, but presents probability distribution functions to illustrate more clearly the effect of the imposed bias in variance on the projected change through the bias correction process. Figure 2a shows that the 40 % increase in the raw GCM data is amplified to a 56 % increase by the QM process. If the synthetic distribution were symmetrical, a comparable decrease in GCM simulated mean would be amplified in the opposite direction, and if projected changes were negative as often as positive, then this amplifying effect would be offset and the quantile mapping would have little net effect on trends or shifts. However, because the distributions in Fig. 2a are bounded and positively skewed, even when equivalent increases and decreases are projected, the net effect of an underestimated variance is for quantile mapping to amplify the trend. This is illustrated in Fig. 2b, where the same observed and raw GCM historic distributions are used, but a 40 % decrease in mean value is imposed on the raw future GCM projection. In this case, the shift is only slightly affected by quantile mapping, changing from a 40 % decrease to a 39 % decrease. Thus, an underestimate of variance for a bounded, positively skewed distribution, common for daily precipitation (Wilks, 1989), will have a tendency during quantile mapping bias correction to amplify projected trends or shifts (Maraun, 2013). Conversely, overestimation of variance will tend to dampen projected trends.
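A synthetic experiment of this kind can be reproduced in outline with the sketch below. It is not the configuration behind Figs. 1 and 2: the gamma shape parameter, the choice to give the model a standard deviation 30 % below that of the observations, the imposition of the future change by scaling the historic values, the sample size, and the seed are all assumptions made for illustration, and the function names are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import fmean


def make_transfer(model_hist, obs_hist):
    """Nearest-rank quantile mapping built from equal-length training samples."""
    model_sorted, obs_sorted = sorted(model_hist), sorted(obs_hist)

    def transfer(value):
        rank = bisect_right(model_sorted, value)   # count of model values <= value
        return obs_sorted[max(rank - 1, 0)]        # clamp below the smallest value

    return transfer


def synthetic_changes(scale_factor, n=5000, seed=0):
    """Return (raw change, bias corrected change) in the mean for a scaled future."""
    rng = random.Random(seed)
    mean, obs_shape = 30.0, 2.0                    # obs_shape is an assumed value
    model_shape = obs_shape / 0.7 ** 2             # same mean, std 30 % lower (assumed)
    obs = [rng.gammavariate(obs_shape, mean / obs_shape) for _ in range(n)]
    hist = [rng.gammavariate(model_shape, mean / model_shape) for _ in range(n)]
    future = [scale_factor * value for value in hist]
    transfer = make_transfer(hist, obs)
    raw_change = fmean(future) - fmean(hist)
    bc_change = (fmean([transfer(v) for v in future])
                 - fmean([transfer(v) for v in hist]))
    return raw_change, bc_change


for factor in (1.4, 0.6):
    raw, bc = synthetic_changes(factor)
    print(f"factor {factor}: raw change {raw:.1f}, bias corrected change {bc:.1f}")
```

Comparing the two printed lines shows how an imposed increase and an imposed decrease of equal relative size are treated differently by the mapping when the model underestimates the observed spread.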
BCR < 1 (bias correction reduces the model change) when the model difference between the pth percentile and median value is larger than the observed difference between the pth percentile and the median value, i.e., when the model has too much variance. Similarly, where BCR > 1, bias correction will increase the model change (when the model has less variance than observed). Furthermore, Eq. (3) indicates that QM does not alter the sign of the model-predicted change (at least in this simple case) and that the alteration of the change is insensitive to any positive or negative bias between the model and observations, being affected only by the relative variance of the two. From this simple synthetic demonstration it can be inferred that, if there were a preponderance of GCMs with biases in variance in the same direction, the net effect of QM on the simulated difference between eras could be systematically in one direction, even with random biases in the mean.
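The relationship described above can be written compactly. As a sketch, suppose the model median moves from its historical median $M_{50}$ to the value at its historical pth percentile, $M_p$, and let $O_{50}$ and $O_p$ be the observed values at the same two quantiles. These symbols are introduced here for illustration and may differ from the notation of Eq. (3).

```latex
\mathrm{BCR}
  = \frac{\Delta_{\mathrm{BC}}}{\Delta_{\mathrm{raw}}}
  = \frac{O_p - O_{50}}{M_p - M_{50}}
```

Because both quantile functions are nondecreasing, the numerator and denominator share the same sign, so the ratio is non-negative and the sign of the change is preserved. Adding a constant offset to every model value leaves $M_p - M_{50}$ unchanged, which is why a bias in the mean alone does not affect the ratio.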
In reality trends in non-normally distributed variables cannot be represented just by changes in the median, and GCMs exhibit much more complex biases than simply an overestimate or underestimate of variance, with differing biases at different times, in different seasons, and at different quantiles, for example (Boberg and Christensen, 2012; Maurer et al., 2013; Themeßl et al., 2011), all of which can affect the modification of GCM simulated changes by QM. Thus, simply characterizing a GCM as exhibiting a certain bias in standard deviation will not exactly predict the effect of bias correction on trends. In any case, for illustration, Fig. 3 shows the ensemble median of biases in standard deviation, expressed as a ratio of simulated to observed standard deviation, for the 11 AGCMs included in this study for two seasons: DJF and JJA. This shows areas where there appears to be consistent underprediction of standard deviation by a majority of AGCMs, such as in the southeastern portion of the domain. This means there may be a potential for the trends in the raw output from many of the AGCMs to be modified by the bias correction process.
Analyzing actual precipitation simulations, Figs. 4 and 5 show that bias correction does not generally change the pattern of regions that are simulated as becoming wetter or drier, as suggested by Eq. (3), since the left and center columns are broadly similar. However, the difference between the bias corrected and raw AGCM precipitation changes for some regions is of a magnitude that is comparable to the projected change itself. While the differences (right columns in Figs. 4 and 5) show that there are large areas where the BC process produces a wettening or drying effect for each AGCM, there is considerable variation among the AGCMs.
While not shown here, for JJA precipitation the changes due to the BC process for each AGCM appear slightly less prominent than for DJF relative to the raw AGCM precipitation changes between the two periods. Figure 6 shows the ensemble median change and the interquartile range (IQR) between the BC and raw precipitation differences for the two periods for both DJF and JJA. The left column represents the ensemble median effect of BC on the seasonal mean precipitation difference between 1994-2005 and 1979-1993. The IQR in Fig. 6 is analogous to the standard deviation, representing the spread of the AGCMs about the median. In general, where the ensemble median has the greatest magnitude, the IQR is also large, indicating high variability among the models in the effect of BC on the precipitation change. The changes in precipitation differences induced by the BC process in Fig. 6 can be a cause for concern. While in large portions of the domain they are small in comparison to the observed difference in mean precipitation between the two periods (Fig. 7), at many individual points the effect can be substantial. For example, for the DJF median panel in Fig. 6, there is a swath of dark blue grid cells along the southern west coast, with a median effect of the BC on the precipitation trend of 0.4 mm d⁻¹ or higher. This would be an important modification based on the observed differences in Fig. 7, with a median change between the periods of 0.5-1.0 mm d⁻¹. In addition, the DJF IQR for these cells is greater than 0.5 mm d⁻¹, indicating that 25 % of the AGCMs would show trend modifications by BC in excess of approximately 0.65 mm d⁻¹ (the median plus half of the IQR), which is on the order of the observed trend in Fig. 7. This latter point makes clear the importance of using an ensemble of climate models rather than one or a few, since the regions of enhancement/reduction of trends are not coherent across different models and the effect diminishes when combined into an ensemble.
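The ensemble statistics used in this discussion can be computed per grid cell as in the following sketch. The function name and the 11 sample values are hypothetical, and the quartiles use linear interpolation between sorted values (the "inclusive" method of Python's `statistics.quantiles`), which is one of several common conventions.

```python
from statistics import median, quantiles


def ensemble_median_iqr(values):
    """Median and interquartile range of one quantity across the ensemble."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return median(values), q75 - q25


# Hypothetical BC-minus-raw changes (mm per day) from 11 models at one grid cell.
effects = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1]
med, iqr = ensemble_median_iqr(effects)
# Rough estimate of the upper quartile, as in the text: median plus half the IQR.
print(med, iqr, med + iqr / 2)
```

The median plus half of the IQR equals the 75th percentile only when the quartiles are symmetric about the median, so it is an approximation for skewed ensembles.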
Perhaps more importantly, in Fig. 6, some areas where the BC process appears (in the median) to produce much wetter conditions than the raw AGCM are also areas where the observed difference between the 1994-2005 and 1979-1993 periods is considerably higher than the AGCMs simulate. One example is the Pacific Northwest, where Figs. 4 and 5 show more than half the models simulating drying DJF conditions between 1994-2005 and 1979-1993, in distinct contrast to the wettening trend in the observations (Fig. 7). It should be emphasized that the BC only adjusts the quantiles of the AGCM to match those of observations within a 15 yr training period; there is no attempt to match trends, either within the 15 yr training period or over longer periods. Thus, any trends are inherited directly from the AGCM, though the QM can, as discussed above, modify these.
This raises the question of whether the change induced by BC in the precipitation change (or trend) between the two periods degrades or improves the correspondence between simulated and observed trends in any systematic way. In terms of the link between the trend modification and variance, this is equivalent to asking if models with variances that are too large tend to have trends that are too large, and vice versa. The TM index described above is used to illustrate this for each AGCM for DJF in Fig. 8. Values in blue (negative values) show where the effect of the BC results in an improved representation of the observed difference in precipitation between the two periods, and red (positive values) indicate a degraded precipitation trend due to BC. It is evident that over the entire domain, for each AGCM there are areas of both improvement and degradation. This suggests that with an ensemble of 11 AGCMs as used in this effort the BC produces no consistent improvement or degradation in the simulated AGCM precipitation change. While the effect of BC on the trend can be significant, it tends as often as not to bring AGCM simulated trends closer to observed trends for the periods used in this study. However, there are isolated locations where the trend appears to be degraded for most model simulations, which could be of particular interest for impacts studies. One such case is the southwestern portion of the domain, where Fig. 9 (center panels) shows the grid cells for which JJA precipitation trends are degraded for 75 % of the AGCM simulations by the BC process. For these locations, it may be beneficial to retain the raw GCM simulated trend during impacts analysis studies. Conversely, in Fig. 9 (right panels) there are many grid cells in the northeast where DJF precipitation trends are improved by BC for most of the AGCM simulations.
One of the driving motivations for much downscaling is the investigation of regional and local hydrological impacts of climate change (Fowler et al., 2007). Since the runoff response to changing precipitation is highly nonlinear (Wigley and Jones, 1985), changes in precipitation are amplified in their translation to runoff changes. This emphasizes the importance of ensuring that the projected precipitation trends not be degraded during the BC process, since the implications would be for even greater biases in projected runoff changes.

Summary and conclusions
Quantile mapping bias correction has been shown to modify the projected changes, or trends, produced by climate models. This is of critical concern regarding precipitation projections, where changes to the raw climate model output can have significant impacts on the implications for water supply and management in the face of climate change. The resulting discrepancy between the raw climate model output and bias corrected output leaves some ambiguity as to whether the bias correction should be modified to preserve the original climate model simulated changes. It is emphasized that this study is only concerned with the effect of quantile mapping on precipitation trends. It includes no assessment of the effectiveness of quantile mapping at reducing biases, which would be enhanced by considering the different sources of bias.
The historical changes in daily mean precipitation simulated by 11 atmospheric general circulation models, driven by observed sea surface temperatures and sea ice to preserve observed variability in boundary conditions, were examined across the conterminous United States. The differences were compared between precipitation for two periods, 1979-1993 and 1994-2005 for all AGCMs, both before and after a quantile mapping bias correction, and gridded observed precipitation. We consider winter and summer precipitation separately.
We found that the bias correction did produce different precipitation changes from the raw AGCM output, with a wettening effect in some locations and a drying effect in others. While there was some spatial consistency in regions showing a tendency for bias correction to make the projections wetter or drier, the skill, measured as a correspondence to observed changes, was more variable, with different AGCMs responding to bias correction differently. Taken as an ensemble, the bias correction had no coherent, overwhelming negative or positive effect on the correspondence of the simulated to observed precipitation changes between periods. Reliance on a single AGCM or a small sample of AGCMs, however, could, for some regions, result in a degraded simulated trend in precipitation due to bias correction.
Based on these results, it does not appear that there is a clear advantage to either preserving the raw AGCM simulated trend in precipitation during bias correction or allowing the trend to be modified by the process. In most locations, as long as a reasonable ensemble size is used, even though the trend in seasonal precipitation may be modified in the process, it may be as likely as not to be beneficial to do so. Similar to the suggestions by others (Cloke et al., 2013), it may be prudent for practitioners to examine the projected trends in raw AGCM output as well as in bias corrected output, to be completely transparent as to the effects of bias correction on trends.
These findings are limited to the extent of this study, namely seasonal mean precipitation for the observed periods used here. This focus was motivated by the observation of changes in trends in mean precipitation produced by quantile mapping. Since changes in the magnitude of extreme precipitation events are important for assessing many impacts to society, future efforts will examine the effect of quantile mapping bias correction on trends in extreme events. Quantile mapping can have different effects at the tails of distributions (Li et al., 2010), and changes in the projected trends in extreme events due to quantile mapping have not been explored. Furthermore, the bias correction was performed at a 1° spatial scale, so that the observations are comparable to the scale of the climate models. At finer scales, the biases between interpolated AGCM output and observations would be expected to be much more heterogeneous, and the impact of quantile mapping bias correction at finer scales could be quite different from that found here, though employing quantile mapping to downscale to fine scales has been found to be problematic (Maraun, 2013).

Fig. 1. Cumulative distribution functions for a synthetic demonstration set of observed, GCM simulated historic, and GCM projected future precipitation data.

Fig. 2. Probability density functions for the same synthetic data in Fig. 1, but including the post-bias correction GCM future projection.

Fig. 4. For GCMs 1-6, the change in mean DJF precipitation between 1979-1993 and 1994-2005 for the raw GCM output (left column) and bias corrected GCM output (center); the difference between the two is in the right column.

Fig. 6. Ensemble median difference between the BC and raw differences in precipitation between 1994-2005 and 1979-1993 for DJF (top row) and JJA (bottom row). Right column is the interquartile range (IQR), defined as the 75th percentile minus the 25th percentile.

Fig. 8. For DJF, the TM index (described in the text) values for each GCM.

Fig. 9. For DJF and JJA, the ensemble median TM index value (left panels), the locations of grid cells (dark rectangles) where the 25th percentile TM index value exceeds 0 (center panels), and the grid cells where the 75th percentile value is less than 0 (right panels).

Figure 9 summarizes the results for the ensemble in Fig. 8 and the similar ensemble for JJA. The median TM values (left panels) tend to lie close to zero, and neither degraded (TM > 0) nor improved (TM < 0) values dominate the picture for either DJF or JJA. The center panels highlight regions where 75 % of the AGCMs show a degraded change in precipitation (relative to the observed change) due to the BC process. These cases constitute 4.3 % of the grid cells for DJF and 13.0 % of the grid cells for JJA. The right panels show the grid cells where 75 % of the AGCMs show improved correspondence with the observed change after BC. These cover 26.2 % and 4.5 % of the domain for DJF and JJA, respectively.
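The classification behind the center and right panels can be sketched as follows, using the criteria given in the Fig. 9 caption: a cell counts as degraded when the 25th percentile of TM across the models exceeds 0, and as improved when the 75th percentile is below 0. The function name, cell labels, and TM values are hypothetical, and the percentile convention (linear interpolation) is an assumption.

```python
from statistics import quantiles


def classify_cells(tm_by_cell):
    """Split grid cells into those degraded or improved for most models.

    `tm_by_cell` maps a cell identifier to the TM values of all models there.
    """
    degraded, improved = [], []
    for cell, tm_values in tm_by_cell.items():
        q25, _, q75 = quantiles(tm_values, n=4, method="inclusive")
        if q25 > 0:        # about 75 % of the models have TM > 0
            degraded.append(cell)
        elif q75 < 0:      # about 75 % of the models have TM < 0
            improved.append(cell)
    return degraded, improved


# Hypothetical TM values for 11 models at four grid cells.
tm_by_cell = {
    "a": [0.1] * 11,
    "b": [-0.2] * 11,
    "c": [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.1, -0.1, 0.2, -0.2],
    "d": [0.05 * i for i in range(-5, 6)],
}
degraded, improved = classify_cells(tm_by_cell)
print(len(degraded) / len(tm_by_cell), len(improved) / len(tm_by_cell))
```

Dividing the length of each list by the number of cells gives domain fractions comparable in kind to the percentages quoted above.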

Table A1. Climate models used in this study.

Acknowledgements. This research was supported in part by the California Energy Commission Public Interest Energy Research (PIER) Program. We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table A1 of this paper) for producing and making available their model output. For CMIP the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison provided coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.