Climate simulations often suffer from statistical biases with respect to
observations or reanalyses. It is therefore common to correct (or adjust)
those simulations before using them as inputs into impact models. However,
most bias correction (BC) methods are univariate and so do not account for
the statistical dependences linking the different locations and/or physical
variables of interest. In addition, they are often deterministic, and
stochasticity is frequently needed to investigate climate uncertainty and to
add constrained randomness to climate simulations that do not possess a
realistic variability. This study presents a multivariate method of rank
resampling for distributions and dependences (R

Climate change impact studies aim to investigate and understand the
consequences of the potential evolutions of the climate system. Impacts can
be hydrological with changes in seasonal flows and water resources driven by
precipitation changes

Over the last decade, most of the developed – and therefore applied – bias
correction (BC) methods focused on the adjustment of the mean

to propose a multivariate BC (MBC) method for both multi-site and multi-variable simulations;

to relax the temporal constraints of EC-BC on the corrected data ranks in order to let the climate model drive more temporal properties and their evolutions and therefore express its own temporal dynamics;

to introduce some stochasticity in the MBC outputs, or at least to enable the proposed MBC method to provide multiple corrected scenarios.

To apply, investigate and evaluate the proposed R

The ERA-Interim

Moreover, Sect.

In many of the multivariate BC development papers, the notion of “copula
functions” is used. Indeed, those functions characterize the rank dependence
structure of most multivariate joint distributions
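
As a purely illustrative aside, the rank-based nature of this dependence structure can be checked numerically. The short Python sketch below uses hypothetical toy values (not data from this study) and a hypothetical helper `ranks`; it shows that strictly increasing transformations of the marginals leave the joint ranks unchanged.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical toy series standing in for temperature and precipitation.
t2 = [2.1, 5.4, 3.3, 7.8, 6.0]
pr = [0.4, 3.2, 0.9, 8.5, 1.7]

# Strictly increasing transformations: they modify the marginal
# distributions but not the ordering of the values.
t2_transformed = [1.5 * x + 4.0 for x in t2]
pr_transformed = [x ** 2 for x in pr]  # increasing because all pr values are >= 0

# The pairs of ranks (i.e., the empirical rank dependence) are identical.
rank_pairs_before = list(zip(ranks(t2), ranks(pr)))
rank_pairs_after = list(zip(ranks(t2_transformed), ranks(pr_transformed)))
print(rank_pairs_before == rank_pairs_after)
```

This is why a rank-preserving univariate correction changes the marginal distributions without altering the rank dependence of the simulations.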

The EC-BC approach

The R

As in EC-BC or any “marginal/dependence” approach,
each dimension (variable/location) is first corrected independently
from the others by a univariate BC method. In the present study, the
CDF-t method is used

Then, a dimension is selected (i.e., one physical variable at one given location) to serve as a “reference dimension” for the shuffling. For this specific dimension, the time sequence of the ranks of the 1d-bias-corrected data is kept untouched. Note that this sequence is therefore the same as that of the ranks of the simulations to be corrected, at least with a rank-preserving BC method, as is the case for CDF-t.

Next, for each time step

Once this time step

Steps 2 to 4 are then repeated successively until each dimension has served as the reference dimension.
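
The rank resampling described in these steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the reference implementation: the function name `rank_resample` and the toy values are hypothetical, the reference and corrected samples are assumed to have the same length and no ties, and the matching step is assumed to look up the reference time step at which the reference dimension has the same rank as the corrected one.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_resample(reference, corrected, ref_dim):
    """Shuffle 1d-corrected data so that they follow the reference rank structure.

    reference, corrected: lists of columns (one list of values per dimension),
    all of the same length (assumption of this sketch).
    ref_dim: index of the reference dimension, whose time sequence is kept.
    """
    n = len(corrected[ref_dim])
    ref_ranks = [ranks(column) for column in reference]
    cor_ranks_ref_dim = ranks(corrected[ref_dim])
    # sorted_values[d][r - 1] is the corrected value of rank r in dimension d.
    sorted_values = [sorted(column) for column in corrected]
    # Reference time step at which the reference dimension has a given rank.
    time_of_rank = {r: t for t, r in enumerate(ref_ranks[ref_dim])}

    shuffled = [[None] * n for _ in corrected]
    for t in range(n):
        # Reference time step sharing the rank of the reference dimension.
        t_star = time_of_rank[cor_ranks_ref_dim[t]]
        for d in range(len(corrected)):
            # Each dimension takes its corrected value whose rank equals
            # the rank observed in the reference data at time t_star.
            shuffled[d][t] = sorted_values[d][ref_ranks[d][t_star] - 1]
    return shuffled

# Hypothetical 3-dimensional toy data of sample size 4.
reference = [[1.0, 2.0, 3.0, 4.0], [10.0, 30.0, 20.0, 40.0], [0.4, 0.3, 0.2, 0.1]]
corrected = [[3.5, 1.5, 4.5, 2.5], [11.0, 12.0, 13.0, 14.0], [0.5, 0.6, 0.7, 0.8]]

result = rank_resample(reference, corrected, ref_dim=0)
# Repeating with each dimension as reference yields one scenario per dimension.
scenarios = [rank_resample(reference, corrected, d) for d in range(3)]
```

In this sketch, the reference dimension keeps its corrected time sequence, every other dimension keeps its corrected marginal values, and only the order in time of those values is changed.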

One example of 3-dimensional reference data and results from the
1d-bias correction of sample size 4 for illustration of the R

Results of the R

An example is now given to illustrate the functioning of R

Moreover, step 4 assumes that these copula (dependence) functions are stable
in time (i.e., stationary) and correspond to those from the reference data.
This assumption makes it possible to apply the proposed R

In the present study, the CDF-t univariate adjustment method

This section describes the comparisons performed between the
different BC methods in order to evaluate the proposed R

Recall that, for each tested BC method applied to ERA-I reanalyses with SAFRAN data as reference, the calibration period is 1980–1994, while the correction/evaluation period is 1995–2009. Moreover, each calibration/evaluation is performed for daily temperature and precipitation time series on 1506 grid cells in the southeast of France over a 6-month “winter” (15 October to 14 April) and a 6-month “summer” (15 April to 14 October).
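
For readers wishing to reproduce this experimental design, the split of daily dates into periods and seasons can be sketched as follows. The function names `season` and `period` are hypothetical, and the assignment of a day to a period by its calendar year (so that a winter straddling 1994/1995 is split between the two periods) is an assumption of this sketch.

```python
from datetime import date

def season(day):
    """Return 'summer' for 15 April to 14 October, 'winter' otherwise."""
    month_day = (day.month, day.day)
    return "summer" if (4, 15) <= month_day <= (10, 14) else "winter"

def period(day):
    """Return the period of a day, based on its calendar year (assumption)."""
    if 1980 <= day.year <= 1994:
        return "calibration"
    if 1995 <= day.year <= 2009:
        return "evaluation"
    return None

# Example: a day is used to calibrate the winter correction if
# period(day) == "calibration" and season(day) == "winter".
example = date(1987, 1, 20)
print(period(example), season(example))
```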

First of all, the 1-dimensional CDF-t bias correction

a 2-dimensional R

a 1506-dimensional R

a 3012-dimensional version, where temperature and precipitation for all 1506 grid cells are corrected jointly. Only one 3012d-BC is needed here.

In the following Sect.

Inter-variable Spearman correlation maps in winter over the
evaluation period from:

In this section, all analyses are carried out for the winter season, but the main conclusions also hold for the summer results, which are displayed in the Supplement.

First, the BC results are compared in terms of inter-variable correlations.
To do so, the Spearman correlation between the temperature and precipitation time
series has been computed for each of the 1506 grid cells, and the resulting
maps are shown in Fig.
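
As an illustration of this diagnostic, a grid-cell-wise Spearman correlation can be sketched in plain Python as below. All function names are hypothetical. Ties are given average ranks, which is a common convention and an assumption here, since daily precipitation contains many identical (dry-day) values.

```python
def average_ranks(values):
    """Return ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation, i.e., Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_map(t2_by_cell, pr_by_cell):
    """One inter-variable Spearman correlation per grid cell."""
    return [spearman(t2, pr) for t2, pr in zip(t2_by_cell, pr_by_cell)]
```

Here `t2_by_cell` and `pr_by_cell` are assumed to be lists holding one daily time series per grid cell, so that the returned list can be plotted as a map.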

Maps of first temperature predominant empirical
orthogonal functions (EOFs) in winter over the evaluation
period for

The evaluation is now performed in terms of inter-site and spatial
correlation. A principal component analysis (PCA) is first carried out on
each physical variable (i.e., temperature and precipitation) separately but
for the whole region of interest (i.e., 1506 grid cells). However, before
applying the PCA, the areal mean of each day has been removed from the data of that day.
Indeed, the data present a high day-to-day variability within the region of
interest. This strongly impacts the PCA that shows a predominant empirical
orthogonal function (EOF) almost uniform over the region if the areal mean is
not removed (not shown). Moreover, as precipitation presents a skewed
distribution, all zero precipitation values are set to a small positive
value (

Same as Fig.

Eigenvalues

Correlograms in winter over the evaluation period
for

Maps of lag-1 day temperature auto-correlations in winter over the
evaluation period
for

Same as Fig.

Box plots of the mean absolute error (MAE) values calculated on lag-1 to lag-7 day Pearson
correlations for:
ERA-I; 1d-BC; 2d-BC; 1506d-BC of T2 or PR (example for first reference variable);
3012d-BC with five different reference temperature locations.
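
One possible way of computing such an error measure is sketched below in Python. The function names are hypothetical, and the sketch assumes that, for a given grid cell, the MAE is the average over lags 1 to 7 of the absolute difference between the auto-correlations of a dataset and those of the reference.

```python
def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lag_correlations(series, max_lag=7):
    """Pearson auto-correlations of a series for lags 1 to max_lag."""
    return [pearson(series[:-k], series[k:]) for k in range(1, max_lag + 1)]

def lag_mae(candidate, reference, max_lag=7):
    """Mean absolute error between the lagged correlations of two series."""
    cand = lag_correlations(candidate, max_lag)
    ref = lag_correlations(reference, max_lag)
    return sum(abs(a - b) for a, b in zip(cand, ref)) / max_lag
```

Applying `lag_mae` to each grid cell, with the reference series as second argument, would give one value per grid cell, whose distribution can then be summarized by a box plot.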

Values of

To quantify those results further, various Spearman and
Pearson correlation matrices were computed for the different datasets (SAFRAN,
ERA-I and the BC results) in the evaluation period over the 1506 locations:

on temperature vs. temperature (resulting in a 1506

on precipitation vs. precipitation (1506

on temperature vs. precipitation (1506

on (temperature, precipitation) vs. (temperature, precipitation)
(3012

Other analyses of the spatial properties derived for the different BC methods
were also performed (e.g., quantile-quantile plots of the daily areal means)
but are not provided here since their conclusions were the same as in the
presented figures: 1d-BC approximately preserves ERA-I properties that are
biased with respect to SAFRAN's; 2d-BC changes the ERA-I spatial statistics
but does not necessarily improve them, while 1506d- and 3012d-BC via
R

The proposed R

For illustration purposes, in order to evaluate and compare the different BC
methods when applied to regional climate simulations over a historical period
and in a future climate change context, two RCMs driven by the same GCM are
used to provide simulations to be corrected. Those RCMs are (i) the “Weather
Research and Forecasting” (WRF) regional climate model

(left column) Inter-variable Pearson correlations between T2
and PR in winter for each grid cell and (right column) changes in inter-variable
Pearson correlations from the historical period to the 2071–2100
period;

Spatial correlograms of temperature

This subsection contains a short evaluation of the BC methods applied to the
RCM simulations over the 1980–2009 historical period, as well as an
illustration of how the tested BC methods behave and differ from each other
in a climate change context, both in terms of inter-variable and inter-site
dependences. As one objective of this subsection is to evaluate the changes
from the historical (1980–2009) to the future (2071–2100) time periods, in
order to save space, the evaluations of the BC methods applied to the RCM
simulations are performed directly over the whole historical period
(1980–2009), without cross-validation. Nevertheless, when applying the same
cross-validation exercise as was done with ERA-I in
Sect.

A new multivariate bias correction approach was proposed, making it possible to correct
not only the marginal (univariate) distributions of the climate variables of
interest but also the statistical dependences between the variables, as well
as the dependences between the different locations over a given geographical
domain. This approach relies on the previously developed “Empirical Copula – Bias Correction”

R

The different BC versions were then also tested and compared on climate
simulations from the WRF and RCA4 regional climate models (RCMs) over the
1980–2009 historical period as well as the 2071–2100 future time period. The
2071–2100 bias corrections were not made to evaluate the methods (since no
reference data are available for the future) but rather to illustrate how the
different multivariate R

The possible future developments of this work are both methodological and
applied. First, as stated earlier, the variability/stochasticity introduced
in the current R

Moreover, based on the results presented in this study, the assumption of
conservation of the dependence structure seems reasonable for the inter-site
aspects (Fig.

Furthermore, the R

More generally, there is not yet a complete intercomparison of the
multivariate bias adjustment methods. As the need for such multivariate
methods becomes crucial for many impact studies, intercomparison exercises are
now essential to evaluate the various existing methodologies and to make
distinctions, not only between “marginals/dependence” and “successive
conditional” correction approaches for example but also between different
methods and assumptions within each approach. While such an intercomparison
study should first be performed from the climate point of view (i.e., in
terms of the quality of the corrected climate variables and their various
properties), it should also be conducted from the perspective of some
specific impacts and impact models, in order to understand how the quality of
the bias-adjusted simulations transfers into the often non-linear impact model
outputs. To do so, applying a high-dimensional R

Finally, the selection of an “optimal” reference dimension, or at least some
preferential ones, is certainly a necessary future step. The notion of
optimality here may nevertheless depend on the context of the correction and on
the subsequent use of the multivariate bias-corrected data. Still, simple selection
methods can be imagined. For example, a logical choice would be to
select the dimension for which the temporal dynamics of the model to be
corrected is the most similar to that of the observations over the
calibration period. In such a case, that could correspond to the dimension
for which the Spearman rank correlation (or an auto-correlation value) is the
closest to that of the reference (observational) data. Of course, other
selections are possible, but this question is left for future work. In the
same spirit, we could also consider a “multivariate” reference vector. For
example, instead of relying on a univariate reference dimension, the latter
could be a pair (or more generally a

ERA-Interim temperature and precipitation
datasets can be accessed through the ECMWF website at

The multivariate BC method is applied to

The R

Apply separate univariate BC to each dimension.
We obtain

Compute the

Compute the

Choose one dimension

Find

For time

For all dimensions

Repeat steps 4 (a–c) for all dimensions until

The author declares that he has no conflict of interest.

This work has been partially supported by the ANR-project StaRMIP, the
VW-project CE:LLO, the ERA4CS EUPHEME and CoCliServ projects, and the LABEX
IPSL project. All computations were made in R. An R package containing
functions for the R